Efficient Distributed-Memory Parallel Matrix-Vector Multiplication with Wide or Tall Unstructured Sparse Matrices

12/03/2018
by Jonathan Eckstein, et al.

This paper presents an efficient technique for matrix-vector and vector-transpose-matrix multiplication in distributed-memory parallel computing environments, where the matrices are unstructured, sparse, and have substantially more columns than rows or vice versa. Our method allows for parallel I/O, does not require extensive preprocessing, and has the same communication complexity as matrix-vector multiplies with column or row partitioning. Our implementation of the method uses MPI. We partition the matrix by individual nonzero elements, rather than by row or column, and use an "overlapped" vector representation that is matched to the matrix. The transpose multiplies use matrix-specific MPI communicators and reductions, which we show can be set up efficiently. The proposed technique achieves a good per-processor work balance even if some of the columns are dense, while keeping communication costs relatively low.
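
The idea of partitioning by individual nonzero elements can be illustrated with a small serial simulation. The sketch below is not the authors' implementation and does not use MPI: it assumes the matrix is given as (row, column, value) triples, splits them into nearly equal contiguous chunks in column-major order, lets each simulated process multiply its own chunk, and then sums the partial results per row, which stands in for a reduction. The function names `partition_nonzeros`, `local_multiply`, and `simulated_matvec` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def partition_nonzeros(triples, num_procs):
    """Split (row, col, value) triples into num_procs nearly equal chunks.

    Assumption for this sketch: nonzeros are ordered column-major and cut
    into contiguous runs, so a dense column may span several processes.
    """
    ordered = sorted(triples, key=lambda t: (t[1], t[0]))
    n = len(ordered)
    return [
        ordered[p * n // num_procs:(p + 1) * n // num_procs]
        for p in range(num_procs)
    ]


def local_multiply(chunk, x):
    """Compute one simulated process's partial contributions to y = A x."""
    partial = defaultdict(float)
    for row, col, value in chunk:
        partial[row] += value * x[col]
    return dict(partial)


def simulated_matvec(triples, x, num_rows, num_procs):
    """Return (y, per-process nonzero counts) for y = A x.

    The loop over chunks plays the role of the parallel processes; adding
    the partial results into y plays the role of a sum reduction.
    """
    chunks = partition_nonzeros(triples, num_procs)
    y = [0.0] * num_rows
    for chunk in chunks:
        for row, contribution in local_multiply(chunk, x).items():
            y[row] += contribution
    return y, [len(chunk) for chunk in chunks]


if __name__ == "__main__":
    # A wide 2 x 6 matrix with six nonzeros, spread over three processes.
    triples = [
        (0, 0, 1.0), (1, 0, 2.0), (0, 1, 3.0),
        (1, 3, 4.0), (0, 4, 5.0), (1, 5, 6.0),
    ]
    y, loads = simulated_matvec(triples, [1.0] * 6, num_rows=2, num_procs=3)
    print(y, loads)
```

Because the chunks are cut by nonzero count rather than by row or column, each simulated process receives nearly the same number of multiply-add operations regardless of how the nonzeros are distributed among columns. In a real distributed setting the final summation would be carried out with collective reductions over the processes that share a row, which this sketch does not attempt to model.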

research
05/20/2017

Sparse Matrix Multiplication On An Associative Processor

Sparse matrix multiplication is an important component of linear algebra...
research
03/04/2023

Optimization of SpGEMM with Risc-V vector instructions

The Sparse GEneral Matrix-Matrix multiplication (SpGEMM) C = A × B is a ...
research
05/25/2020

On Optimal Partitioning For Sparse Matrices In Variable Block Row Format

The Variable Block Row (VBR) format is an influential blocked sparse mat...
research
06/27/2012

Matrix Tile Analysis

Many tasks require finding groups of elements in a matrix of numbers, sy...
research
11/15/2021

Recognizing Series-Parallel Matrices in Linear Time

A series-parallel matrix is a binary matrix that can be obtained from an...
research
02/17/2022

Fast Dynamic Updates and Dynamic SpGEMM on MPI-Distributed Graphs

Sparse matrix multiplication (SpGEMM) is a fundamental kernel used in ma...
research
07/31/2020

Load Plus Communication Balancing in Contiguous Partitions for Distributed Sparse Matrices: Linear-Time Algorithms

We study partitioning to parallelize multiplication of one or more dense...