Trained Uniform Quantization for Accurate and Efficient Neural Network Inference on Fixed-Point Hardware

03/19/2019
by Sambhav R. Jain, et al. (Xilinx Inc. and Stanford University)

We propose a method of training quantization clipping thresholds for uniform symmetric quantizers using standard backpropagation and gradient descent. Our quantizers are constrained to use power-of-2 scale factors and per-tensor scaling for weights and activations, constraints that make our methods well suited to hardware implementation. Training under these strict constraints is enabled by a combination of three techniques: using accurate threshold gradients to balance the range-precision trade-off, training thresholds in the log domain, and training with an adaptive gradient optimizer. We refer to this collection of techniques as Adaptive-Gradient Log-domain Threshold Training (ALT). We present analytical support for the general robustness of our methods and empirically validate them on various CNNs for ImageNet classification. We achieve floating-point or near-floating-point accuracy on traditionally difficult networks such as MobileNets in fewer than 5 epochs of quantized (8-bit) retraining. Finally, we present Graffitist, a framework that enables immediate quantization of TensorFlow graphs using our methods. Code is available at https://github.com/Xilinx/graffitist.
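To make the abstract's ingredients concrete, below is a minimal, hypothetical sketch (TensorFlow-style Python) of a uniform symmetric fake-quantizer with a power-of-2, per-tensor scale factor whose clipping threshold is trained in the log domain, with rounding and ceiling passed through via straight-through estimators. The names (alt_fake_quant, round_ste, ceil_ste), the 8-bit default, the MSE stand-in loss, and the initial threshold are illustrative assumptions, not the paper's or Graffitist's actual implementation.

    import tensorflow as tf

    def round_ste(z):
        # Round in the forward pass; identity in the backward pass
        # (straight-through estimator).
        return z + tf.stop_gradient(tf.round(z) - z)

    def ceil_ste(z):
        # Ceil in the forward pass; identity in the backward pass.
        return z + tf.stop_gradient(tf.math.ceil(z) - z)

    def alt_fake_quant(x, log2_t, bits=8):
        # Hypothetical uniform symmetric fake-quantizer: per-tensor,
        # power-of-2 scale factor, threshold trained in the log domain.
        n = 2.0 ** (bits - 1)
        t = tf.pow(2.0, ceil_ste(log2_t))   # clipping threshold (power of 2)
        s = t / n                           # scale factor (also a power of 2)
        q = tf.clip_by_value(round_ste(x / s), -n, n - 1)
        return q * s                        # simulated-quantized tensor

    # Usage sketch: one step of threshold training with an adaptive optimizer.
    log2_t = tf.Variable(4.0)               # initial threshold t = 2**4
    opt = tf.keras.optimizers.Adam(learning_rate=1e-2)
    x = tf.random.normal([1024], stddev=3.0)
    with tf.GradientTape() as tape:
        xq = alt_fake_quant(x, log2_t)
        loss = tf.reduce_mean(tf.square(xq - x))   # stand-in for the task loss
    grads = tape.gradient(loss, [log2_t])
    opt.apply_gradients(zip(grads, [log2_t]))

Because rounding and ceiling are treated as identity in the backward pass, the gradient reaching log2_t is the analytic gradient of the clipped, scaled output with respect to the scale: clipped values contribute terms that tend to enlarge the threshold (more range), while in-range values contribute rounding-error terms that tend to shrink it (more precision), which reflects the range-precision trade-off the abstract refers to. An adaptive optimizer such as Adam is presumably what keeps step sizes reasonable given the very different magnitudes of weight and threshold gradients.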

Related Research

06/21/2018
Quantizing deep convolutional networks for efficient inference: A whitepaper
We present an overview of techniques for quantizing convolutional neural...

02/08/2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Quantization enables efficient acceleration of deep neural networks by r...

02/10/2017
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
This paper presents incremental network quantization (INQ), a novel meth...

06/15/2020
Neural gradients are lognormally distributed: understanding sparse and quantized training
Neural gradient compression remains a main bottleneck in improving train...

11/29/2021
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation
The nonuniform quantization strategy for compressing neural networks usu...

10/26/2021
Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes
Quantization is a popular technique that transforms the parameter repres...

02/16/2023
Shared Microexponents: A Little Shifting Goes a Long Way
This paper introduces Block Data Representations (BDR), a framework for ...

Code Repositories

graffitist
Graph Transforms to Quantize and Retrain Deep Neural Nets in TensorFlow.
https://github.com/Xilinx/graffitist