
Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

by Moritz Kreutzer, et al.

Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In our test scenarios the pJDS format cuts the overall spMVM memory footprint on the GPGPU by up to 70%. Using a suitable performance model we identify performance bottlenecks on the node level that invalidate some types of matrix structures for efficient multi-GPGPU parallelization. For appropriate sparsity patterns we extend previous work on distributed-memory parallel spMVM to demonstrate a scalable hybrid MPI-GPGPU code, achieving efficient overlap of communication and computation.
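
To make the memory-overhead argument concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an ELLPACK-style layout pads every row to the length of the longest row, and that a pJDS-style layout sorts rows by descending length and pads each block of consecutive rows only to the longest row in that block. The function names, the `block_rows` parameter and the example row lengths are hypothetical and chosen for illustration only. The sketch counts stored slots (values plus padding) rather than bytes.

```python
def ellpack_slots(row_lengths):
    """Slots stored when every row is padded to the longest row
    (assumed ELLPACK-style layout)."""
    if not row_lengths:
        return 0
    return len(row_lengths) * max(row_lengths)


def pjds_slots(row_lengths, block_rows=32):
    """Slots stored when rows are sorted by descending length and each
    block of `block_rows` consecutive rows is padded only to the longest
    row in that block (assumed pJDS-style layout; `block_rows` is a
    hypothetical parameter)."""
    ordered = sorted(row_lengths, reverse=True)
    total = 0
    for start in range(0, len(ordered), block_rows):
        block = ordered[start:start + block_rows]
        # After sorting, the first row of a block is its longest row.
        total += len(block) * block[0]
    return total


# Hypothetical matrix: one long row among several short ones.
row_lengths = [8, 1, 1, 1, 2, 2, 1, 1]
nonzeros = sum(row_lengths)
ell = ellpack_slots(row_lengths)
pjds = pjds_slots(row_lengths, block_rows=4)
reduction = 1.0 - pjds / ell
print(nonzeros, ell, pjds, reduction)
```

In this toy case a single long row forces the ELLPACK-style layout to pad every row, while the sorted, block-wise layout confines most of the padding to the block that contains the long row. The achievable saving depends entirely on how uneven the row lengths are; for matrices with nearly constant row length the two counts coincide.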



