PopSparse: Accelerated block sparse matrix multiplication on IPU

03/29/2023
by Zhiyi Li, et al.

Reducing the computational cost of running large scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general purpose accelerators (GPAs) such as NVIDIA GPUs using low precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs as well as any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile-time; and dynamic, where it can change each time the model is run. We present benchmark results for matrix multiplication for both of these modes on IPU with a range of block sizes, matrix sizes and densities. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%).
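To make the block-sparse format concrete, below is a minimal NumPy sketch of the operation being benchmarked: a matrix whose nonzeros are stored as dense b x b blocks, multiplied by a dense matrix. The function name and storage layout here are illustrative assumptions for exposition only; the actual PopSparse kernels live in Graphcore's Poplar SDK and run on IPU, and their API is not reproduced here.

    # Illustrative block-sparse matmul (NOT the PopSparse API): A is stored
    # as a list of dense b x b nonzero blocks plus block-grid coordinates.
    import numpy as np

    def block_sparse_matmul(blocks, block_rows, block_cols, m, block_size, B):
        """Compute A @ B where the m x k matrix A is block-sparse.

        blocks     -- list of dense (block_size x block_size) arrays
        block_rows -- block-row index of each nonzero block
        block_cols -- block-column index of each nonzero block
        """
        b = block_size
        out = np.zeros((m, B.shape[1]), dtype=B.dtype)
        for blk, r, c in zip(blocks, block_rows, block_cols):
            # Each nonzero block touches b rows of the output and b rows of B,
            # so only the stored blocks contribute FLOPs.
            out[r * b:(r + 1) * b] += blk @ B[c * b:(c + 1) * b]
        return out

    # Example: a 4 x 4 matrix with 2 x 2 blocks on the diagonal
    # (50% density at block granularity), in FP16 as in the paper's benchmarks.
    rng = np.random.default_rng(0)
    blocks = [rng.standard_normal((2, 2)).astype(np.float16) for _ in range(2)]
    B = rng.standard_normal((4, 3)).astype(np.float16)
    y = block_sparse_matmul(blocks, [0, 1], [0, 1], 4, 2, B)

In the static-sparsity mode described above, the block coordinates would be known at compile time, letting the compiler schedule work and memory layout ahead of execution; in the dynamic mode they can change on every run, which explains why static sparsity generally performs better.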

