EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

08/30/2019
by   Lukas Cavigelli, et al.

In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, and other tasks, the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power and energy constraints at low cost, as well as for boosting throughput in data centers, is growing rapidly. This has sparked a surge of research into specialized hardware accelerators. Their performance is typically limited by I/O bandwidth, their power consumption is dominated by I/O transfers to off-chip memory, and on-chip memories occupy a large part of the silicon area. We introduce and evaluate a novel, hardware-friendly, and lossless compression scheme for the feature maps present within convolutional neural networks. We present hardware architectures and synthesis results for the compressor and decompressor in 65nm. With a throughput of one 8-bit word/cycle at 600MHz, they fit into 2.8kGE and 3.0kGE of silicon area, respectively - together less than the size of seven 8-bit multiply-add units at the same throughput. We show that an average compression ratio of 5.1x for AlexNet, 4x for VGG-16, 2.4x for ResNet-34, and 2.2x for MobileNetV2 can be achieved - a gain of 45-70% over existing methods. Our approach also works effectively for various number formats, has a low frame-to-frame variance in compression ratio, and achieves even higher compression factors on gradient maps during training than on feature maps during inference.
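The abstract does not spell out the compression scheme itself. As a rough illustration of the zero run-length component that bit-plane-based schemes of this kind exploit for sparse, post-ReLU feature maps, the following Python sketch encodes zero runs and non-zero literals and estimates the resulting bit count. The function names, symbol widths, and cost model are assumptions chosen for illustration, not the authors' EBPC bitstream format.

```python
import numpy as np

def zero_rle_encode(values, max_run=16):
    """Toy zero run-length encoder for post-ReLU feature maps.

    Simplified illustration only: the actual EBPC scheme combines this
    kind of zero-run coding with bit-plane compression of the non-zero
    values; the symbol widths used here are illustrative assumptions.
    """
    symbols = []          # (kind, payload) pairs: ("Z", run length) or ("NZ", literal)
    run = 0
    for v in values:
        if v == 0:
            run += 1
            if run == max_run:        # emit a full-length zero-run symbol
                symbols.append(("Z", run))
                run = 0
        else:
            if run:                   # flush any pending zero run
                symbols.append(("Z", run))
                run = 0
            symbols.append(("NZ", int(v)))   # non-zero literal
    if run:
        symbols.append(("Z", run))
    return symbols

def estimate_bits(symbols, run_bits=4, literal_bits=8):
    # 1 flag bit per symbol plus payload width (toy cost model)
    return sum(1 + (run_bits if kind == "Z" else literal_bits) for kind, _ in symbols)

# Example: a sparse activation row, as typically produced by a ReLU layer
row = np.array([0, 0, 0, 7, 0, 0, 12, 0, 0, 0, 0, 3], dtype=np.uint8)
enc = zero_rle_encode(row)
print(enc, estimate_bits(enc), "bits vs", row.size * 8, "bits uncompressed")
```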


Related research:

- 10/01/2018: Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
- 12/17/2020: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons
- 09/25/2019: CAT: Compression-Aware Training for bandwidth reduction
- 04/20/2021: Microshift: An Efficient Image Compression Algorithm for Hardware
- 04/18/2021: Barrier-Free Large-Scale Sparse Tensor Accelerator (BARISTA) For Convolutional Neural Networks
- 09/10/2019: Boosting Throughput and Efficiency of Hardware Spiking Neural Accelerators using Time Compression Supporting Multiple Spike Codes
- 01/19/2019: Surface Compression Using Dynamic Color Palettes
