A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance

08/25/2023
by Ian Colbert, et al.

We present accumulator-aware quantization (A2Q), a novel weight quantization method designed to train quantized neural networks (QNNs) to avoid overflow when using low-precision accumulators during inference. A2Q introduces a unique formulation inspired by weight normalization that constrains the L1-norm of model weights according to accumulator bit width bounds that we derive. Thus, in training QNNs for low-precision accumulation, A2Q also inherently promotes unstructured weight sparsity to guarantee overflow avoidance. We apply our method to deep learning-based computer vision tasks to show that A2Q can train QNNs for low-precision accumulators while maintaining model accuracy competitive with a floating-point baseline. In our evaluations, we consider the impact of A2Q on both general-purpose platforms and programmable hardware. However, we primarily target model deployment on FPGAs because they can be programmed to fully exploit custom accumulator bit widths. Our experiments show that accumulator bit width significantly impacts the resource efficiency of FPGA-based accelerators. On average across our benchmarks, A2Q offers up to a 2.3x reduction in resource utilization over 32-bit accumulator counterparts while maintaining 99.2% of the floating-point model accuracy.
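The core mechanism the abstract describes, constraining the L1-norm of a layer's quantized weights by a bound derived from the accumulator bit width, can be illustrated with a short sketch. The bound used below is the conservative worst-case condition ||w||_1 * max|x| <= 2^(P-1) - 1 for a signed P-bit accumulator and unsigned N-bit activations; the function names and the simple rescaling step are illustrative assumptions, not the authors' implementation, which folds the constraint into a weight-normalization-style reparameterization during training.

```python
# Illustrative sketch only: a conservative L1-norm bound that prevents a signed
# P-bit accumulator from overflowing when accumulating products of unsigned
# N-bit inputs with integer weights, plus a simple rescaling that enforces it.
import numpy as np

def l1_bound(acc_bits: int, input_bits: int) -> float:
    # Worst-case condition: ||w||_1 * (2^N - 1) <= 2^(P-1) - 1.
    return (2 ** (acc_bits - 1) - 1) / (2 ** input_bits - 1)

def constrain_weights(w: np.ndarray, acc_bits: int, input_bits: int) -> np.ndarray:
    # Rescale one output channel's weights so their L1 norm respects the
    # accumulator bound; weights pushed toward zero collapse to exact zeros
    # after re-quantization, which is where the unstructured sparsity arises.
    bound = l1_bound(acc_bits, input_bits)
    l1 = np.abs(w).sum()
    return w if l1 <= bound else w * (bound / l1)

# Example: a 512-element dot product with 8-bit unsigned activations must fit
# in a 16-bit signed accumulator.
rng = np.random.default_rng(0)
w = rng.integers(-8, 8, size=512).astype(np.float64)
w_c = constrain_weights(w, acc_bits=16, input_bits=8)
print(np.abs(w_c).sum(), "<=", l1_bound(16, 8))  # constrained L1 norm vs. bound
```

Because the rescaling shrinks weights toward zero whenever the bound binds, re-quantizing the constrained weights naturally yields the unstructured sparsity mentioned in the abstract.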

