Ternary Residual Networks

07/15/2017
by Abhisek Kundu, et al.

Sub-8-bit representation of DNNs incurs some discernible loss of accuracy despite rigorous (re)training at low precision. Such a loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this accuracy drop, we introduce the notion of residual networks, where we add more low-precision edges to sensitive branches of the sub-8-bit network to compensate for the lost accuracy. Further, we present a perturbation theory to identify such sensitive edges. Aided by this trade-off between accuracy and compute, the 8-2 model (8-bit activations, ternary weights), enhanced by ternary residual edges, turns out to be sophisticated enough to achieve very high accuracy (∼ 1% drop from our FP32 baseline), despite ∼ 1.6× reduction in model size, ∼ 26× reduction in number of multiplications, and a potential ∼ 2× power-performance gain compared to the 8-8 representation, on the state-of-the-art deep network ResNet-101 pre-trained on the ImageNet dataset. Moreover, depending on the varying accuracy requirements in a dynamic environment, the deployed low-precision model can be upgraded or downgraded on the fly by partially enabling or disabling residual connections. For example, disabling the least important residual connections in the above enhanced network, the accuracy drop is ∼ 2% (from FP32), despite ∼ 1.9× reduction in model size, ∼ 32× reduction in number of multiplications, and a potential ∼ 2.3× power-performance gain compared to the 8-8 representation. Finally, all the ternary connections are sparse in nature, and the ternary residual conversion can be done in a resource-constrained setting with no low-precision (re)training.
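To make the residual idea concrete, here is a minimal NumPy sketch (not the paper's exact method): a weight matrix W is approximated as alpha*T1 + beta*T2, where T1 is a primary ternary tensor and T2 is a ternary "residual edge" fitted to the quantization error left by T1. The function names, the fixed-sparsity threshold, and the mean-magnitude scaling rule below are illustrative assumptions; the paper instead selects and sizes residual edges via its perturbation analysis.

```python
import numpy as np

def ternarize(w, sparsity=0.7):
    """Ternarize a weight tensor: keep the sign of the largest-magnitude
    entries, zero the rest, and fit one scaling factor.
    (Illustrative threshold rule, not the paper's scheme.)"""
    thresh = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) > thresh
    t = np.sign(w) * mask                      # entries in {-1, 0, +1}
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha, t

def ternary_with_residual(w):
    """Approximate W as alpha*T1 + beta*T2: the second ternary tensor
    (the 'residual edge') ternarizes the error of the first."""
    alpha, t1 = ternarize(w)
    residual = w - alpha * t1
    beta, t2 = ternarize(residual)
    return (alpha, t1), (beta, t2)

# Example: how much does the residual edge reduce quantization error?
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
(alpha, t1), (beta, t2) = ternary_with_residual(w)
err1 = np.linalg.norm(w - alpha * t1) / np.linalg.norm(w)
err2 = np.linalg.norm(w - alpha * t1 - beta * t2) / np.linalg.norm(w)
print(f"relative error, ternary only: {err1:.3f}, with residual edge: {err2:.3f}")
```

Because T1 and T2 are both sparse ternary tensors, enabling the residual edge adds only cheap additions and a second scaling factor, and disabling it recovers the smaller, less accurate model, which is the upgrade/downgrade trade-off described above.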


