SMASH: Sparse Matrix Atomic Scratchpad Hashing

05/29/2021
by   Kaustubh Shivdikar, et al.
15

Sparse matrices, more specifically SpGEMM kernels, are commonly found in a wide range of applications, spanning graph-based path-finding to machine learning algorithms (e.g., neural networks). A particular challenge in implementing SpGEMM kernels has been the pressure placed on DRAM memory. One approach to tackle this problem is to use an inner product method for the SpGEMM kernel implementation. While the inner product produces fewer intermediate results, it can end up saturating the memory bandwidth, given the high number of redundant fetches of the input matrix elements. Using an outer product-based SpGEMM kernel can reduce redundant fetches, but at the cost of increased overhead due to extra computation and memory accesses for producing/managing partial products. In this thesis, we introduce a novel SpGEMM kernel implementation based on the row-wise product approach. We leverage atomic instructions to merge intermediate partial products as they are generated. The use of atomic instructions eliminates the need to create partial product matrices. To evaluate our row-wise product approach, we map an optimized SpGEMM kernel to a custom accelerator designed to accelerate graph-based applications. The targeted accelerator is an experimental system named PIUMA, being developed by Intel. PIUMA provides several attractive features, including fast context switching, user-configurable caches, globally addressable memory, non-coherent caches, and asynchronous pipelines. We tailor our SpGEMM kernel to exploit many of the features of the PIUMA fabric. This thesis compares our SpGEMM implementation against prior solutions, all mapped to the PIUMA framework. We briefly describe some of the PIUMA architecture features and then delve into the details of our optimized SpGEMM kernel. Our SpGEMM kernel can achieve 9.4x speedup as compared to competing approaches.

READ FULL TEXT

page 17

page 22

page 34

page 37

page 39

page 40

research
02/20/2020

SpArch: Efficient Architecture for Sparse Matrix Multiplication

Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous...
research
07/08/2023

Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

We propose Rosko – row skipping outer products – for deriving sparse mat...
research
11/24/2016

Automating the Last-Mile for High Performance Dense Linear Algebra

High performance dense linear algebra (DLA) libraries often rely on a ge...
research
01/03/2018

Computing the Sparse Matrix Vector Product using Block-Based Kernels Without Zero Padding on Processors with AVX-512 Instructions

The sparse matrix-vector product (SpMV) is a fundamental operation in ma...
research
02/09/2023

DeepCAM: A Fully CAM-based Inference Accelerator with Variable Hash Lengths for Energy-efficient Deep Neural Networks

With ever increasing depth and width in deep neural networks to achieve ...
research
01/25/2023

Flexagon: A Multi-Dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing

Sparsity is a growing trend in modern DNN models. Existing Sparse-Sparse...
research
09/15/2019

Inner-product Kernels are Asymptotically Equivalent to Binary Discrete Kernels

This article investigates the eigenspectrum of the inner product-type ke...

Please sign up or login with your details

Forgot password? Click here to reset