Neural Networks with Quantization Constraints

10/27/2022
by Ignacio Hounie, et al.

Enabling low-precision implementations of deep learning models without considerable performance degradation is necessary in resource- and latency-constrained settings. Moreover, exploiting differences in sensitivity to quantization across layers can allow mixed-precision implementations to achieve a considerably better computation-performance trade-off. However, backpropagating through the quantization operation requires introducing gradient approximations, and choosing which layers to quantize is challenging for modern architectures due to the large search space. In this work, we present a constrained learning approach to quantization-aware training. We formulate low-precision supervised learning as a constrained optimization problem and show that, despite its non-convexity, the resulting problem is strongly dual and does away with gradient estimations. Furthermore, we show that dual variables indicate the sensitivity of the objective with respect to constraint perturbations. We demonstrate that the proposed approach exhibits competitive performance in image classification tasks, and we leverage the sensitivity result to apply layer-selective quantization based on the value of dual variables, leading to considerable performance improvements.
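The constrained formulation described above can be illustrated with a minimal primal-dual sketch. The toy below trains a logistic regression under the constraint that the weights stay close to their quantized version, ||w − Q(w)||² ≤ ε, alternating gradient descent on the weights with dual ascent on a multiplier λ; the final value of λ plays the role of the sensitivity signal the abstract mentions. All names, the fixed-step quantizer, the step sizes, and the single (rather than per-layer) constraint are illustrative assumptions, not the paper's actual algorithm or architecture.

```python
import numpy as np

def quantize(w, step=0.25, w_max=2.0):
    """Uniform fixed-step quantizer with clipping (hypothetical choice;
    the paper's quantizer may differ)."""
    return np.clip(np.round(w / step) * step, -w_max, w_max)

def primal_dual_qat(X, y, eps=1e-3, lr_w=0.1, lr_lam=0.5, steps=300, seed=0):
    """Sketch of primal-dual quantization-aware training for logistic
    regression under the constraint ||w - Q(w)||^2 <= eps.

    The dual variable lam is updated by ascent on the constraint slack;
    a large final lam signals that the loss is sensitive to enforcing
    quantization (the sensitivity interpretation from the abstract).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d) * 0.1
    lam = 0.0
    for _ in range(steps):
        # primal step: gradient of logistic loss + lam * ||w - Q(w)||^2,
        # treating Q as locally constant (piecewise-constant a.e.)
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
        grad_loss = X.T @ (p - y) / n
        grad_pen = 2.0 * lam * (w - quantize(w))
        w -= lr_w * (grad_loss + grad_pen)
        # dual step: ascent on the constraint slack, projected to lam >= 0
        slack = np.sum((w - quantize(w)) ** 2) - eps
        lam = max(0.0, lam + lr_lam * slack)
    return quantize(w), lam
```

In a per-layer version, each layer would carry its own multiplier, and layers whose multipliers converge to large values would be the natural candidates to keep at higher precision.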


Related research

10/13/2021
Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
Quantization is a widely used technique to compress and accelerate deep ...

11/10/2019
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
Quantization is an effective method for reducing memory footprint and in...

03/04/2021
Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization
Since model quantization helps to reduce the model size and computation ...

07/10/2023
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Quantizing neural networks is one of the most effective methods for achi...

12/20/2022
CSMPQ: Class Separability Based Mixed-Precision Quantization
Mixed-precision quantization has received increasing attention for its c...

03/16/2022
Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance
The exponentially large discrete search space in mixed-precision quantiz...

08/02/2021
Multi-objective Recurrent Neural Networks Optimization for the Edge – a Quantization-based Approach
The compression of deep learning models is of fundamental importance in ...
