Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance

01/31/2023
by Ian Colbert, et al.

We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference. We leverage weight normalization as a means of constraining parameters during training using accumulator bit width bounds that we derive. We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline. We then show that this reduction translates to increased design efficiency for custom FPGA-based accelerators. Finally, we show that our algorithm not only constrains weights to fit into an accumulator of user-defined bit width, but also increases the sparsity and compressibility of the resulting weights. Across all of our benchmark models trained with 8-bit weights and activations, we observe that constraining the hidden layers of quantized neural networks to fit into 16-bit accumulators yields an average 98.2% sparsity while maintaining 99.2% of the floating-point performance.
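
The overflow-avoidance condition can be illustrated with a simple worst-case argument: the magnitude of a dot product between integer weights w and unsigned N-bit activations is bounded by ||w||_1 * (2^N - 1), so a signed P-bit accumulator cannot overflow whenever ||w||_1 * (2^N - 1) <= 2^(P-1) - 1. The sketch below is a minimal illustration of this kind of check, not the authors' implementation; the function names are hypothetical and the bound shown is a conservative worst-case form rather than the exact bound derived in the paper.

```python
import numpy as np

def fits_accumulator(q_weights: np.ndarray, act_bits: int, acc_bits: int) -> bool:
    """True if the worst-case dot product between these integer weights and
    unsigned `act_bits`-bit activations fits a signed `acc_bits` accumulator.
    Illustrative worst-case L1 bound, not the paper's exact derivation."""
    max_act = 2 ** act_bits - 1                       # largest unsigned activation
    worst_case = np.abs(q_weights).sum() * max_act    # |y| <= ||w||_1 * max|x|
    return worst_case <= 2 ** (acc_bits - 1) - 1      # signed range upper limit

def min_accumulator_bits(q_weights: np.ndarray, act_bits: int) -> int:
    """Smallest signed accumulator bit width P such that the worst-case
    dot product provably cannot overflow (hypothetical helper)."""
    max_act = 2 ** act_bits - 1
    worst_case = np.abs(q_weights).sum() * max_act
    # Need worst_case <= 2^(P-1) - 1, i.e. P >= ceil(log2(worst_case + 1)) + 1.
    return int(np.ceil(np.log2(worst_case + 1))) + 1

# Example: one output channel of 8-bit weights checked against a 16-bit accumulator.
rng = np.random.default_rng(0)
w = rng.integers(-128, 128, size=64)
print(fits_accumulator(w, act_bits=8, acc_bits=16))   # dense 8-bit rows typically fail
print(min_accumulator_bits(w, act_bits=8))            # width a dense row would need
```

In the training algorithm described in the abstract, weight normalization is used to keep the weights' L1 norm within this kind of accumulator budget during training, so a check like the one above holds by construction at inference time; driving the L1 norm down is also what encourages the reported sparsity.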

Related research:

- A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance (08/25/2023). We present accumulator-aware quantization (A2Q), a novel weight quantiza...
- Instant Quantization of Neural Networks using Monte Carlo Methods (05/29/2019). Low bit-width integer weights and activations are very important for eff...
- Searching for Low-Bit Weights in Quantized Neural Networks (09/18/2020). Quantized neural networks with low-bit weights and activations are attra...
- Low Precision RNNs: Quantizing RNNs Without Losing Accuracy (10/20/2017). Similar to convolution neural networks, recurrent neural networks (RNNs)...
- Scalar Arithmetic Multiple Data: Customizable Precision for Deep Neural Networks (09/27/2018). Quantization of weights and activations in Deep Neural Networks (DNNs) i...
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference (12/10/2022). Although considerable progress has been obtained in neural network quant...
- Combinatorial optimization for low bit-width neural networks (06/04/2022). Low-bit width neural networks have been extensively explored for deploym...
