Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression

by   Shihui Yin, et al.

Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement them on power/area-constrained embedded platforms. To reduce the network size, several studies investigated compression by introducing element-wise or row-/column-/block-wise sparsity via pruning and regularization. In addition, many recent works have focused on reducing precision of activations and weights with some reducing down to a single bit. However, combining various sparsity structures with binarized or very-low-precision (2-3 bit) neural networks have not been comprehensively explored. In this work, we present design techniques for minimum-area/-energy DNN hardware with minimal degradation in accuracy. During training, both binarization/low-precision and structured sparsity are applied as constraints to find the smallest memory footprint for a given deep learning algorithm. The DNN model for CIFAR-10 dataset with weight memory reduction of 50X exhibits accuracy comparable to that of the floating-point counterpart. Area, performance and energy results of DNN hardware in 40nm CMOS are reported for the MNIST dataset. The optimized DNN that combines 8X structured compression and 3-bit weight precision showed 98.4


page 1

page 2

page 3

page 4


Deep Learning Training on the Edge with Low-Precision Posits

Recently, the posit numerical format has shown promise for DNN data repr...

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

Quantization enables efficient acceleration of deep neural networks by r...

Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks

Training with larger number of parameters while keeping fast iterations ...

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks

Deep neural networks are gaining in popularity as they are used to gener...

Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update

Representing deep neural networks (DNNs) in low-precision is a promising...

Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks

Efforts to reduce the numerical precision of computations in deep learni...

Revisiting BFloat16 Training

State-of-the-art generic low-precision training algorithms use a mix of ...

Please sign up or login with your details

Forgot password? Click here to reset