Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

05/16/2020
by Zhi-Gang Liu, et al.

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA), a pipelined 2D array of processing elements (PEs) with very efficient local data movement, is well suited to accelerating GEMM and is widely deployed in industry. In this work, we describe two significant improvements to the traditional SA architecture that specifically optimize it for CNN inference. First, we generalize the traditional scalar PE into a Tensor-PE, which gives rise to a family of new Systolic Tensor Array (STA) microarchitectures. The STA family increases intra-PE operand reuse and datapath efficiency, reducing circuit area and power dissipation by as much as 2.08x and 1.36x respectively, compared to the conventional SA at iso-throughput with INT8 operands. Second, we extend this design to support a novel block-sparse data format called density-bound block (DBB). This variant (STA-DBB) achieves 3.14x area and 1.97x power improvements over the SA baseline at iso-throughput when processing specially trained DBB-sparse models, while remaining fully backwards compatible with dense models.
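The density-bound block idea is to constrain sparsity at block granularity: each fixed-size block of the weight matrix carries at most a bounded number of nonzero values, and a fully dense block trivially satisfies the bound, which is what keeps the design backwards compatible with dense models. The following minimal NumPy sketch illustrates one way such a bound can be imposed and checked; the block size of 8, the bound of 4 nonzeros, and the function names are illustrative assumptions, not the configuration or code from the paper.

```python
# Minimal sketch of a density-bound block (DBB) constraint: every length-`block_size`
# block of a weight row keeps at most `max_nnz` nonzeros. Block size and bound are
# assumptions for illustration, not the paper's actual parameters.
import numpy as np

def prune_to_dbb(weights: np.ndarray, block_size: int = 8, max_nnz: int = 4) -> np.ndarray:
    """Zero out the smallest-magnitude entries of each block so at most
    `max_nnz` nonzeros remain per block."""
    rows, cols = weights.shape
    assert cols % block_size == 0, "columns must be divisible by the block size"
    pruned = weights.copy()
    for r in range(rows):
        for c in range(0, cols, block_size):
            block = pruned[r, c:c + block_size]
            drop = np.argsort(np.abs(block))[: block_size - max_nnz]
            block[drop] = 0
    return pruned

def is_dbb(weights: np.ndarray, block_size: int = 8, max_nnz: int = 4) -> bool:
    """Check the density bound: no block exceeds `max_nnz` nonzeros."""
    blocks = weights.reshape(weights.shape[0], -1, block_size)
    return bool((np.count_nonzero(blocks, axis=-1) <= max_nnz).all())

# A DBB-pruned INT8 weight matrix still multiplies as an ordinary dense GEMM,
# accumulating in INT32 as is typical for INT8 inference.
rng = np.random.default_rng(0)
W = rng.integers(-128, 128, size=(4, 16)).astype(np.int8)
W_dbb = prune_to_dbb(W.astype(np.int32)).astype(np.int8)
x = rng.integers(-128, 128, size=(16, 1)).astype(np.int8)
y = W_dbb.astype(np.int32) @ x.astype(np.int32)
assert is_dbb(W_dbb)
```

In hardware, the benefit of such a bound is predictability: because every block is guaranteed to contain no more than `max_nnz` nonzeros, each Tensor-PE can be provisioned with a fixed, smaller number of multipliers per block rather than the worst-case dense count, which is where the area and power savings quoted in the abstract come from.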

Related research

09/04/2020  Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration
07/16/2021  S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration
03/10/2018  Towards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs
07/28/2021  SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication
07/20/2020  HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs
09/06/2023  The Case for Asymmetric Systolic Array Floorplanning
11/25/2021  A Dense Tensor Accelerator with Data Exchange Mesh for DNN and Vision Workloads
