Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

11/22/2020
by Bahar Asgari, et al.

Sparse matrices are key ingredients of several application domains, from scientific computing to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed that eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful at enabling faster processing. In other words, although they allow faster data transfer and improve memory bandwidth utilization – the classic challenge of sparse problems – their decompression mechanism can potentially create a computation bottleneck. Not only does this challenge remain unresolved, it becomes more serious with the advent of domain-specific architectures (DSAs), which aim to improve performance more aggressively. The performance implications of using various formats along with DSAs, however, have not been extensively studied by prior work. To fill this gap in knowledge, we characterize the impact of seven frequently used sparse formats on performance, using a DSA for sparse matrix-vector multiplication (SpMV) implemented on an FPGA with high-level synthesis (HLS) tools, an increasingly popular method for developing DSAs. Seeking a fair comparison, we tailor and optimize the HLS implementation of decompression for each format. We thoroughly explore diverse metrics, including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads.
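To make the compute pattern the paper characterizes concrete, the sketch below shows SpMV over the compressed sparse row (CSR) format, one widely used sparse format: the row-pointer and column-index arrays eliminate the storage of zero entries, and decoding them on the fly during the multiply is the "decompression" work that can become the computation bottleneck the abstract describes. This is a minimal C sketch, not code from the paper; all identifiers (csr_spmv, row_ptr, col_idx, vals) are illustrative.

/* Minimal sketch (not from the paper): y = A * x with A stored in CSR.
 * row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros in vals/col_idx.
 * Decoding these index arrays on the fly is the "decompression" step
 * that saves memory bandwidth but can stall a compute pipeline. */
#include <stdio.h>

void csr_spmv(int m, const int *row_ptr, const int *col_idx,
              const double *vals, const double *x, double *y) {
    for (int i = 0; i < m; i++) {
        double acc = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            acc += vals[k] * x[col_idx[k]];  /* gather via column index */
        y[i] = acc;
    }
}

int main(void) {
    /* 3x3 matrix [[4,0,1],[0,3,0],[2,0,5]] with zero entries eliminated */
    int row_ptr[] = {0, 2, 3, 5};
    int col_idx[] = {0, 2, 1, 0, 2};
    double vals[]  = {4, 1, 3, 2, 5};
    double x[] = {1, 2, 3}, y[3];
    csr_spmv(3, row_ptr, col_idx, vals, x, y);
    for (int i = 0; i < 3; i++)
        printf("y[%d] = %g\n", i, y[i]);  /* prints 7, 6, 17 */
    return 0;
}

The irregular, data-dependent access x[col_idx[k]] is exactly why a DSA that streams the compressed arrays can end up decompression-bound rather than bandwidth-bound.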

