TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

09/01/2020
by   Mostafa Mahmoud, et al.

TensorDash is a hardware-level technique that enables data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency. TensorDash combines a low-cost sparse input operand interconnect, comprising an 8-input multiplexer per multiplier input, with an area-efficient hardware scheduler. Although the interconnect allows only a very limited set of movements per operand, the scheduler can effectively extract sparsity when it is present in the activations, weights, or gradients of neural networks. Over a wide set of models covering various applications, TensorDash accelerates the training process by 1.95× while being 1.89× more energy-efficient (1.6× more energy-efficient when on-chip and off-chip memory accesses are taken into account). While TensorDash works with any datatype, we demonstrate it with both single-precision floating-point and bfloat16 units.
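The core idea can be illustrated in software. The sketch below is a simplified analogy, not the TensorDash design itself: it counts processing cycles for a dense MAC array versus an idealized zero-skipping scheduler that packs only effectual (both-operands-nonzero) pairs onto the multiplier lanes. All function names are hypothetical, and the idealized scheduler ignores the limited operand-movement constraint imposed by the real 8-input multiplexers, so it gives an upper bound on the achievable speedup.

```python
def effectual_pairs(a, b):
    """Keep only operand pairs where both values are nonzero;
    only these pairs contribute to the MAC result."""
    return [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]

def dense_cycles(a, b, lanes=8):
    """A dense PE spends a lane-slot on every pair, zeros included."""
    return -(-len(a) // lanes)  # ceiling division

def sparse_cycles(a, b, lanes=8):
    """Idealized zero-skipping: pack only effectual pairs onto the
    lanes (upper bound; real hardware restricts operand movement)."""
    return max(1, -(-len(effectual_pairs(a, b)) // lanes))

# Example: streams where 3 of every 4 pairs are ineffectual.
a = [1, 0, 2, 0] * 8
b = [3, 4, 0, 5] * 8
assert sum(x * y for x, y in zip(a, b)) == \
       sum(x * y for x, y in effectual_pairs(a, b))  # same dot product
print(dense_cycles(a, b), sparse_cycles(a, b))  # fewer cycles when sparse
```

Because skipped pairs contribute zero to the accumulation, the sparse schedule computes the exact same dot product in fewer cycles, which is the source of both the speedup and the energy savings.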

