Channel-wise Hessian Aware trace-Weighted Quantization of Neural Networks

08/19/2020
by   Xu Qian, et al.

Second-order information has proven effective for identifying redundancy in neural network weights and activations. A recent paper proposed using the Hessian traces of weights and activations for mixed-precision quantization and achieved state-of-the-art results. However, prior work only selects bit widths per layer, even though the redundancy of different channels within a layer also varies widely; this is mainly because determining bits for each individual channel is too expensive for the original methods. Here, we introduce Channel-wise Hessian Aware trace-Weighted Quantization (CW-HAWQ). CW-HAWQ uses the Hessian trace to determine the relative sensitivity order of the different channels of activations and weights. Moreover, CW-HAWQ uses a deep reinforcement learning (DRL) agent based on Deep Deterministic Policy Gradient (DDPG) to find the optimal ratios of the different quantization bit widths, and then assigns bits to channels according to the Hessian trace order. The number of states in CW-HAWQ is much smaller than in traditional AutoML-based mixed-precision methods, since we only need to search over the ratios of the quantization bit widths. Comparing CW-HAWQ with the state of the art shows that it achieves better results on multiple networks.
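The core assignment step described above can be sketched as follows: given per-channel Hessian traces and the bit-width ratios chosen by the agent, rank channels by trace and give the most sensitive channels the highest precision. This is a minimal illustration, not the paper's implementation; the function name and the ratio format are assumptions.

```python
# Hypothetical sketch of CW-HAWQ's channel-wise bit assignment:
# channels are ranked by Hessian trace (a sensitivity proxy) and the
# agent-chosen ratios decide how many channels get each bit width.

def assign_bits_by_trace(traces, bit_ratios):
    """Assign quantization bit widths to channels.

    traces:     per-channel Hessian traces (larger = more sensitive).
    bit_ratios: dict mapping bit width -> fraction of channels,
                e.g. {8: 0.25, 4: 0.5, 2: 0.25}; fractions sum to 1.
    Returns a list of bit widths in the original channel order.
    """
    n = len(traces)
    # Channel indices ordered from most to least sensitive.
    order = sorted(range(n), key=lambda i: traces[i], reverse=True)
    bits = [0] * n
    pos = 0
    # Fill the most sensitive channels with the widest bit widths first.
    for b in sorted(bit_ratios, reverse=True):
        count = round(bit_ratios[b] * n)
        for i in order[pos:pos + count]:
            bits[i] = b
        pos += count
    # Channels left over from rounding fall back to the smallest width.
    for i in order[pos:]:
        bits[i] = min(bit_ratios)
    return bits
```

Because the search space is only the ratio vector (not a per-channel bit choice), the DRL agent's action space stays small regardless of how many channels a layer has.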


