Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

06/15/2022
by Clemens JS Schaefer, et al.

The large compute and memory cost of deep neural networks (DNNs) often precludes their use on resource-constrained devices. Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques encompassing progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed-precision convolutional neural networks (CNNs) targeting edge computing. Our method establishes a new Pareto frontier in model accuracy and memory footprint, demonstrating a range of quantized models that deliver best-in-class accuracy below 4.3 MB of weights (wgts.) and activations (acts.). Our main contributions are: (i) hardware-aware heterogeneous differentiable quantization with tensor-sliced learned precision, (ii) targeted gradient modification for wgts. and acts. to mitigate quantization errors, and (iii) a multi-phase learning schedule to address instability in learning arising from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across a range of models, including EfficientNet-Lite0 (e.g., 4.14 MB of wgts. and acts. at 67.66% accuracy) and MobileNetV2 (e.g., 3.51 MB of wgts. and acts. at 65.39% accuracy).
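The abstract does not spell out implementation details, but the core mechanism it builds on, differentiable ("fake") quantization with a learned step size and straight-through gradient estimation, can be sketched as follows. This is a minimal PyTorch illustration, not the authors' code: the `LearnedStepQuantizer` class, its initialization, and the LSQ-style gradient-scaling heuristic are assumptions made for the sake of example.

```python
import torch
import torch.nn as nn


def grad_scale(x, scale):
    # Forward: identity. Backward: gradient multiplied by `scale`.
    return (x - x * scale).detach() + x * scale


def round_ste(x):
    # Forward: round to nearest integer. Backward: straight-through (identity).
    return (x.round() - x).detach() + x


class LearnedStepQuantizer(nn.Module):
    # Fake-quantizer with a learned step size for a fixed bit width.
    def __init__(self, num_bits=4, signed=True):
        super().__init__()
        if signed:
            self.qmin, self.qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        else:
            self.qmin, self.qmax = 0, 2 ** num_bits - 1
        # Learned step size (scale), trained jointly with the model weights.
        self.step = nn.Parameter(torch.tensor(0.1))

    def forward(self, x):
        # Down-scale the step-size gradient relative to the weight gradients
        # (an LSQ-style heuristic; the paper's exact rule may differ).
        g = 1.0 / float(x.numel() * self.qmax) ** 0.5
        s = grad_scale(self.step, g)
        # Quantize: scale, clip to the integer grid, round with an STE.
        x_int = round_ste(torch.clamp(x / s, self.qmin, self.qmax))
        # Dequantize so downstream layers see "fake-quantized" real values.
        return x_int * s


if __name__ == "__main__":
    q = LearnedStepQuantizer(num_bits=4)
    w = torch.randn(64, 32, requires_grad=True)
    loss = q(w).pow(2).sum()
    loss.backward()  # gradients reach both `w` and `q.step` through the STE
```

Because the rounding and clipping steps are wrapped in straight-through estimators, both the model parameters and the quantizer's step size receive gradients and can be optimized end-to-end; per-slice precision, as in the paper, would amount to instantiating such a quantizer per tensor slice with its own (learned) bit width.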
