Efficient Interleaved Batch Matrix Solvers for CUDA

09/10/2019
by Andrew Gloster, et al.

In this paper we present a new methodology for data accesses when solving batches of tridiagonal and pentadiagonal matrices that all share the same LHS matrix. By storing only one copy of this matrix we achieve a significant reduction in storage overheads, and we show that there is also a performance increase in terms of compute time. Combined, these two results lead to an implementation that is more efficient than the current state-of-the-art algorithms cuThomasBatch and cuPentBatch, allowing a greater number of systems to be solved on a single GPU.
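To illustrate the shared-LHS idea, below is a minimal CUDA sketch (not the authors' implementation) of a batched Thomas solve in which one tridiagonal LHS is stored once and only the right-hand sides vary per system. The kernel name, the helper precomputeCStar, and the exact interleaved indexing d[row * batchCount + sys] are assumptions made for illustration; the layout is chosen so that consecutive threads read consecutive memory locations (coalesced access), which is the kind of access pattern an interleaved batch solver relies on.

```cuda
// Sketch: batched Thomas algorithm with a single shared tridiagonal LHS.
// One thread solves one system; the RHS array is stored interleaved as
// d[row * batchCount + sys] so neighbouring threads make coalesced accesses.
#include <cuda_runtime.h>
#include <vector>

// Host-side precomputation of the shared modified upper diagonal c*.
// Because every system has the same LHS, this is done once for the whole batch.
void precomputeCStar(const std::vector<double>& a, const std::vector<double>& b,
                     const std::vector<double>& c, std::vector<double>& cStar)
{
    const int n = static_cast<int>(b.size());
    cStar[0] = c[0] / b[0];
    for (int i = 1; i < n; ++i)
        cStar[i] = c[i] / (b[i] - a[i] * cStar[i - 1]);
}

__global__ void thomasSharedLHS(const double* a,     // sub-diagonal, length n (a[0] unused)
                                const double* b,     // main diagonal, length n
                                const double* cStar, // precomputed c*, length n
                                double* d,           // RHS in, solution out: n * batchCount, interleaved
                                int n, int batchCount)
{
    const int sys = blockIdx.x * blockDim.x + threadIdx.x;
    if (sys >= batchCount) return;

    // Forward sweep acts only on this system's RHS; the LHS factorisation is shared.
    d[sys] /= b[0];
    for (int i = 1; i < n; ++i) {
        const double m = 1.0 / (b[i] - a[i] * cStar[i - 1]);
        d[i * batchCount + sys] =
            (d[i * batchCount + sys] - a[i] * d[(i - 1) * batchCount + sys]) * m;
    }

    // Back substitution, overwriting the RHS with the solution.
    for (int i = n - 2; i >= 0; --i)
        d[i * batchCount + sys] -= cStar[i] * d[(i + 1) * batchCount + sys];
}
```

A typical launch would use one thread per system, e.g. `thomasSharedLHS<<<(batchCount + 255) / 256, 256>>>(dA, dB, dCStar, dD, n, batchCount);`. Storing a, b and c* only once is what removes the per-system LHS storage overhead described above.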
