
Trained Uniform Quantization for Accurate and Efficient Neural Network Inference on Fixed-Point Hardware

by Sambhav R. Jain, et al.
Xilinx Inc.
Stanford University

We propose a method of training quantization clipping thresholds for uniform symmetric quantizers using standard backpropagation and gradient descent. Our quantizers are constrained to use power-of-2 scale factors and per-tensor scaling for both weights and activations, constraints that make them well suited to fixed-point hardware implementation. Training under these difficult constraints is enabled by a combination of three techniques: accurate threshold gradients that achieve a range-precision trade-off, training the thresholds in the log domain, and training with an adaptive gradient optimizer. We refer to this collection of techniques as Adaptive-Gradient Log-domain Threshold Training (ALT). We present analytical support for the general robustness of our methods and empirically validate them on various CNNs for ImageNet classification. We achieve floating-point or near-floating-point accuracy on traditionally difficult networks such as MobileNets in fewer than 5 epochs of quantized (8-bit) retraining. Finally, we present Graffitist, a framework that enables immediate quantization of TensorFlow graphs using our methods. Code available at .
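To make the quantizer constraints concrete, here is a minimal NumPy sketch of a uniform symmetric quantizer whose scale factor is forced to a power of 2 via a log-domain clipping threshold. This is an illustrative reading of the abstract, not the Graffitist implementation: the function name, the `log2_t` parameterization, and the ceiling on the log-threshold are assumptions made for the example.

```python
import numpy as np

def fake_quantize(x, log2_t, bits=8):
    """Uniform symmetric fake-quantization with a power-of-2 scale factor.

    `log2_t` plays the role of the trainable log-domain clipping
    threshold: rounding it up to an integer makes the effective
    threshold t = 2**ceil(log2_t) a power of 2, so the per-tensor
    scale factor s = t / 2**(bits-1) is also a power of 2.
    """
    n = 2 ** (bits - 1)                      # 128 quantization levels per side for 8-bit signed
    s = 2.0 ** np.ceil(log2_t) / n           # power-of-2 scale factor
    q = np.clip(np.round(x / s), -n, n - 1)  # quantize to integers and clip to range
    return q * s                             # dequantize back to the real domain

x = np.array([-3.7, -0.2, 0.05, 1.9, 7.5])
print(fake_quantize(x, log2_t=2.0))  # threshold t = 4.0, scale s = 4/128
```

Values inside the clipping range (here [-4, 4)) are rounded to the nearest multiple of the scale, while outliers such as 7.5 saturate at the largest representable level; moving `log2_t` trades clipping error against rounding error, which is the range-precision trade-off the trained thresholds navigate.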



Code Repositories


Graph Transforms to Quantize and Retrain Deep Neural Nets in TensorFlow.
