PACT: Parameterized Clipping Activation for Quantized Neural Networks

05/16/2018
by Jungwook Choi, et al.

Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed, but most of these techniques have focused on quantizing weights, which are relatively small compared to activations. This paper proposes a novel quantization scheme for activations during training that enables neural networks to work well with ultra-low-precision weights and activations without any significant accuracy degradation. The technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter α that is optimized during training to find the right quantization scale. PACT allows activations to be quantized to arbitrary bit precision while achieving much better accuracy than published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inference performance, due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.
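The core mechanism described in the abstract, an activation clipped at a learnable level α and then uniformly quantized to k bits, can be summarized in a few lines. Below is a minimal sketch, assuming PyTorch; the class names, the straight-through gradient details, and the defaults (4 bits, initial α of 10) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal PACT-style activation quantizer sketch (assumes PyTorch).
# alpha is a learnable clipping level; gradients reach it through a
# straight-through estimator on the rounding step.
import torch
import torch.nn as nn


class _PACTQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, alpha, k):
        ctx.save_for_backward(x, alpha)
        # Clip activations to [0, alpha], then quantize uniformly to k bits.
        y = torch.clamp(x, min=0.0, max=alpha.item())
        scale = (2 ** k - 1) / alpha
        return torch.round(y * scale) / scale

    @staticmethod
    def backward(ctx, grad_output):
        x, alpha = ctx.saved_tensors
        # Straight-through estimator: pass gradients only where 0 < x < alpha.
        grad_x = grad_output * ((x > 0) & (x < alpha)).float()
        # Gradient w.r.t. alpha accumulates where the input was clipped at alpha.
        grad_alpha = (grad_output * (x >= alpha).float()).sum().view(1)
        return grad_x, grad_alpha, None


class PACT(nn.Module):
    """Drop-in replacement for ReLU with a trainable clipping level alpha."""

    def __init__(self, bits=4, init_alpha=10.0):
        super().__init__()
        self.bits = bits
        self.alpha = nn.Parameter(torch.tensor([init_alpha]))

    def forward(self, x):
        return _PACTQuantize.apply(x, self.alpha, self.bits)
```

In use, a layer such as `PACT(bits=4)` would replace the ReLU in each block, and `alpha` is updated by the optimizer like any other parameter, which is what lets training find the quantization scale rather than fixing it by hand.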


