SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations

Important workloads, such as machine learning and graph analytics applications, heavily involve sparse linear algebra operations. These operations use sparse matrix compression as an effective means to avoid storing zeros and performing unnecessary computation on zero elements. However, compression techniques like Compressed Sparse Row (CSR) that are widely used today introduce significant instruction overhead and expensive pointer-chasing operations to discover the positions of the non-zero elements. In this paper, we identify the discovery of the positions (i.e., indexing) of non-zero elements as a key bottleneck in sparse matrix-based workloads, which greatly reduces the benefits of compression. We propose SMASH, a hardware-software cooperative mechanism that enables highly-efficient indexing and storage of sparse matrices. The key idea of SMASH is to explicitly enable the hardware to recognize and exploit sparsity in data. To this end, we devise a novel software encoding based on a hierarchy of bitmaps. This encoding can be used to efficiently compress any sparse matrix, regardless of the extent and structure of sparsity. At the same time, the bitmap encoding can be directly interpreted by the hardware. We design a lightweight hardware unit, the Bitmap Management Unit (BMU), that buffers and scans the bitmap hierarchy to perform highly-efficient indexing of sparse matrices. SMASH exposes an expressive and rich ISA to communicate with the BMU, which enables its use in accelerating any sparse matrix computation. We demonstrate the benefits of SMASH on four use cases that include sparse matrix kernels and graph analytics applications.


page 5

page 10


Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra

Sparse linear algebra is crucial in many application domains, but challe...

SpChar: Characterizing the Sparse Puzzle via Decision Trees

Sparse matrix computation is crucial in various modern applications, inc...

Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra

This paper describes REAP, a software-hardware approach that enables hig...

PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices

Deep neural network (DNN) has emerged as the most important and popular ...

RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations

The training of graph neural networks (GNNs) is extremely time consuming...

SpComp: A Sparsity Structure-Specific Compilation of Matrix Operations

Sparse matrix operations involve a large number of zero operands which m...

Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

Sparse matrices are the key ingredients of several application domains, ...

Please sign up or login with your details

Forgot password? Click here to reset