CAT: Compression-Aware Training for bandwidth reduction

09/25/2019
by Chaim Baskin, et al.

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a major energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value. A reference implementation accompanies the paper at https://github.com/CAT-teams/CAT.
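The abstract does not spell out the training objective, but the core idea it describes (steering the network toward low-entropy feature maps so a classical entropy coder compresses them well at inference) can be sketched with a differentiable soft-histogram entropy estimate added to the task loss. The PyTorch sketch below is a minimal illustration under that assumption; the function soft_entropy, the bin count, the temperature, and the weight lam are all hypothetical choices, not the authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def soft_entropy(x, num_bins=16, temperature=10.0):
        # Differentiable estimate, in bits per value, of the entropy of a
        # tensor whose values lie near the integer quantization levels
        # 0 .. num_bins - 1. A soft histogram assigns each value to every
        # bin with softmax weights, keeping the estimate differentiable
        # (illustrative formulation, not the paper's exact one).
        centers = torch.arange(num_bins, dtype=x.dtype, device=x.device)
        logits = -temperature * (x.reshape(-1, 1) - centers) ** 2  # (N, bins)
        assign = F.softmax(logits, dim=1)       # soft one-hot per value
        p = assign.mean(dim=0)                  # empirical bin probabilities
        return -(p * torch.log2(p + 1e-12)).sum()

    # Sanity check: uniformly distributed 4-bit values should give an
    # estimate close to log2(16) = 4 bits per value.
    x = torch.randint(0, 16, (1000,)).float()
    print(soft_entropy(x))

    # Hypothetical training step: task loss plus an entropy penalty over
    # the quantized activations collected from the network, where lam is
    # a tunable regularization weight:
    #   loss = F.cross_entropy(outputs, targets) \
    #        + lam * sum(soft_entropy(a) for a in activations)

At inference time, the resulting low-entropy quantized feature maps can be compressed losslessly with a classical entropy coder (e.g., Huffman coding) before being written to off-chip memory, which is where the bandwidth reduction comes from.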
