BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

02/08/2020
by Milos Nikolic, et al.

Neural networks have demonstrably achieved state-of-the-art accuracy using low-bitlength integer quantization, yielding both execution-time and energy benefits on existing hardware designs that support short bitlengths. However, the question of finding the minimum bitlength for a desired accuracy remains open. We introduce a training method for minimizing inference bitlength at any granularity while maintaining accuracy. Furthermore, we propose a regularizer that penalizes large-bitlength representations throughout the architecture and show how it can be modified to minimize other quantifiable criteria, such as the number of operations or the memory footprint. We demonstrate that our method learns thrifty representations while maintaining accuracy. On ImageNet, the method produces an average per-layer bitlength of 4.13 bits on AlexNet and 3.76 bits on ResNet18, remaining within 2.0% of the baseline accuracy.
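
To make the idea concrete, below is a minimal sketch of a learnable-bitlength quantizer with a bitlength regularizer, written against PyTorch. This is an illustration under our own assumptions, not the paper's exact formulation: the names BitQuantizer, bit_penalty, and lam are hypothetical, the quantizer is a simple symmetric uniform scheme with straight-through estimators, and the penalty weights bitlengths by element counts as a memory-footprint proxy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitQuantizer(nn.Module):
    """Uniform symmetric quantizer whose bitlength is a learnable parameter."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        # Continuous bitlength; rounded to an integer in the forward pass.
        self.bits = nn.Parameter(torch.tensor(float(init_bits)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Straight-through estimator for rounding the bitlength itself.
        b = self.bits + (self.bits.round() - self.bits).detach()
        b = torch.clamp(b, min=1.0)
        # Scale chosen so the tensor's maximum magnitude maps near the top code.
        scale = x.abs().max().clamp_min(1e-8) / (2.0 ** (b - 1.0))
        y = x / scale
        # Straight-through estimator for rounding the values.
        y = y + (y.round() - y).detach()
        # Clamp to the representable signed range for b bits.
        lo, hi = -(2.0 ** (b - 1.0)), 2.0 ** (b - 1.0) - 1.0
        y = torch.maximum(torch.minimum(y, hi), lo)
        return y * scale


def bit_penalty(quantizers_and_counts):
    """Sum of learned bitlengths weighted by element counts: a proxy for memory
    footprint. Swapping the weights (e.g. per-layer MAC counts) would instead
    target other criteria such as operation cost."""
    return sum(q.bits * n for q, n in quantizers_and_counts)


# Hypothetical usage: quantize a layer's weights and trade accuracy against
# bitlength through the regularization strength `lam`.
layer = nn.Linear(128, 10)
wq = BitQuantizer(init_bits=8.0)
x = torch.randn(32, 128)
out = F.linear(x, wq(layer.weight), layer.bias)
lam = 1e-6  # larger values push harder toward short bitlengths
loss = out.pow(2).mean() + lam * bit_penalty([(wq, layer.weight.numel())])
loss.backward()
```

In a full training loop, the penalty would simply be added to the task loss, so the regularization strength controls the trade-off between accuracy and how aggressively bitlengths shrink.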
