S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks

06/15/2021
by Jianlei Yang, et al.

Convolutional neural networks (CNNs) have achieved great success in cognitive tasks. However, executing CNNs demands a large amount of computing resources and generates heavy memory traffic, which poses a severe challenge for computing system design. By optimizing parallel execution and data reuse in convolution, systolic architectures show great advantages in accelerating CNN computation. However, the regular internal data transmission paths of traditional systolic architectures prevent them from fully leveraging the benefits of neural network sparsity, and deploying fine-grained sparsity on existing systolic architectures is greatly hindered by the computational overhead it incurs. In this work, we propose S2Engine, a novel systolic architecture that fully exploits the sparsity in CNNs while maximizing data reuse. S2Engine transmits compressed data internally and allows each processing element to dynamically select aligned data from the compressed dataflow during convolution. Compared to a naive systolic array, S2Engine achieves about 3.2× and 3.0× improvements in speed and energy efficiency, respectively.
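The key idea in the abstract, that each processing element selects aligned operands from compressed dataflows, can be illustrated with a small sketch. This is not the authors' implementation, only a minimal software analogy of a sparse multiply-accumulate: both streams carry only nonzero entries as (index, value) pairs, and a product is accumulated only when the indices of a weight and an activation align. The function names (`compress`, `sparse_dot`) are illustrative assumptions.

```python
# Hedged sketch (not the S2Engine hardware): emulating a processing element
# that selects aligned operands from two compressed (sparse) data streams.

def compress(dense):
    """Keep only nonzero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]

def sparse_dot(weights, activations):
    """Multiply-accumulate only where both compressed streams share an
    index, skipping all zero operands entirely."""
    w = compress(weights)
    a = compress(activations)
    acc, wi, ai = 0, 0, 0
    while wi < len(w) and ai < len(a):
        (iw, vw), (ia, va) = w[wi], a[ai]
        if iw == ia:      # indices align: perform the MAC
            acc += vw * va
            wi += 1
            ai += 1
        elif iw < ia:     # unmatched weight: skip it
            wi += 1
        else:             # unmatched activation: skip it
            ai += 1
    return acc

# Zeros never reach the multiplier, which is the source of the
# speed and energy savings that sparsity-aware designs target.
print(sparse_dot([0, 2, 0, 3], [1, 4, 0, 5]))  # 2*4 + 3*5 = 23
```

In hardware, this index-matching must happen dynamically inside each processing element as compressed data flows through the array, which is exactly what a rigid, dense systolic transmission path cannot do.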


Related research:

- MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks (10/30/2018)
- SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training (07/21/2020)
- FSCNN: A Fast Sparse Convolution Neural Network Inference System (12/17/2022)
- DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis (04/02/2019)
- Efficient Implementation of Multi-Channel Convolution in Monolithic 3D ReRAM Crossbar (04/01/2020)
- 5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory (06/08/2019)
- Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks (01/31/2018)
