The widespread application of deep neural nets in recent years has seen explosive growth in the size of training sets, the number of parameters of the nets, and the amount of computing power needed to train them. At present, very deep neural nets with many millions of weights are common in computer vision and speech applications. Many of these applications are particularly useful in small devices, such as mobile phones, cameras or other sensors, which have limited computation, memory and communication bandwidth, and short battery life. It then becomes desirable to compress a neural net so that its memory storage is smaller and/or its runtime is faster and consumes less energy.
Neural net compression was a problem of interest already in the early days of neural nets, driven for example by the desire to implement neural nets in VLSI circuits. However, the current wave of deep learning work has resulted in a flurry of papers by many academic and particularly industrial labs proposing various ways to compress deep nets, some new and some not so new (see related work). Various standard forms of compression have been used in one way or another, such as low-rank decomposition, quantization, binarization, pruning and others. In this paper we focus on quantization, where the ordinarily unconstrained, real-valued weights of the neural net are forced to take values within a codebook with a finite number of entries. This codebook can be adaptive, so that its entries are learned together with the quantized weights, or (partially) fixed, which includes specific approaches such as binarization, ternarization or powers-of-two approaches.
Among compression approaches, quantization is of great interest because even crudely quantizing the weights of a trained net (for example, reducing the precision from double to single) produces considerable compression with little degradation of the loss of the task at hand (say, classification). However, this ignores the fact that the quantization is not independent of the loss, and indeed achieving a really low number of bits per weight (even just 1 bit, i.e., binary weights) would incur a large loss and make the quantized net unsuitable for practical use. Previous work has applied a quantization algorithm to a previously trained, reference net, or incorporated ad-hoc modifications to the basic backpropagation algorithm during training of the net. However, none of these approaches are guaranteed to produce upon convergence (if convergence occurs at all) a net that has quantized weights and has optimal loss among all possible quantized nets.
In this paper, our primary objectives are: 1) to provide a mathematically principled statement of the quantization problem that involves the loss of the resulting net, and 2) to provide an algorithm that can solve that problem up to local optima in an efficient and convenient way. Our starting point is a recently proposed formulation of the general problem of model compression as a constrained optimization problem (Carreira-Perpiñán, 2017). We develop this for the case where the constraints represent the optimal weights as coming from a codebook. This results in a “learning-compression” (LC) algorithm that alternates SGD optimization of the loss over real-valued weights (but with a quadratic regularization term) with quantization of the current real-valued weights. The quantization step takes a form that follows necessarily from the problem definition without ad-hoc decisions: k-means for adaptive codebooks, and an optimal assignment for fixed codebooks such as binarization, ternarization or powers-of-two (with possibly an optimal global scale). We then show experimentally that we can compress deep nets considerably more than previous quantization algorithms—often, all the way to the maximum possible compression, a single bit per weight, without significant error degradation.
2 Related work on quantization of neural nets
Much work exists on compressing neural nets, using quantization, low-rank decomposition, pruning and other techniques, see Carreira-Perpiñán (2017) and references therein. Here we focus exclusively on work based on quantization. Quantization of neural net weights was recognized as an important problem early in the neural net literature, often with the goal of efficient hardware implementation, and has received much attention recently. The main approaches are of two types. The first one consists of using low-precision, fixed-point or other weight representations through some form of rounding, even single-bit (binary) values. This can be seen as quantization using a fixed codebook (i.e., with predetermined values). The second approach learns the codebook itself as a form of soft or hard adaptive quantization. There is also work on using low-precision arithmetic directly during training (see Gupta et al., 2015 and references therein) but we focus here on work whose goal is to quantize a neural net of real-valued, non-quantized weights.
2.1 Quantization with a fixed codebook
Work in the 1980s and 1990s explored binarization, ternarization and general powers-of-two quantization (Fiesler et al., 1990; Marchesi et al., 1993; Tang and Kwan, 1993). These same quantization forms have been revisited in recent years (Hwang and Sung, 2014; Courbariaux et al., 2015; Rastegari et al., 2016; Hubara et al., 2016; Li et al., 2016; Zhou et al., 2016; Zhu et al., 2017), with impressive results on large neural nets trained on GPUs, but not much innovation algorithmically. The basic idea in all these papers is essentially the same: to modify backpropagation so that it encourages binarization, ternarization or some other form of quantization of the neural net weights. The modification involves evaluating the gradient of the loss at the quantized weights (using a specific quantization or “rounding” operator that maps a continuous weight to a quantized one) but applying the update (gradient or SGD step) to the continuous (non-quantized) weights. Specific details vary, such as the quantization operator or the type of codebook; the latter has recently seen a plethora of minor variations (Hwang and Sung, 2014; Courbariaux et al., 2015; Rastegari et al., 2016; Zhou et al., 2016; Li et al., 2016; Zhu et al., 2017).
One important problem with these approaches is that their modification of backpropagation is ad-hoc, without guarantees of converging to a net with quantized weights and low loss, or of converging at all. Consider binarization to {−1, +1} for simplicity. The gradient is computed at a binarized weight vector, of which there are a finite number (2^P, corresponding to the hypercube corners), and none of these will in general have gradient zero. Hence training will never stop, and the iterates will oscillate indefinitely. Practically, this is stopped after a certain number of iterations, at which time the weight distribution is far from binarized (see fig. 2 in Courbariaux et al., 2015), so a drastic binarization must still be done. Given these problems, it is surprising that these techniques do seem to be somewhat effective empirically in quantizing the weights while achieving little loss degradation, as reported in the papers above. Exactly how effective they are, on what type of nets, and why, remains an open research question.
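The modified-backpropagation scheme described above can be sketched on a made-up one-layer problem; the quadratic loss, target vector and learning rate below are illustrative assumptions, not taken from any of the cited papers. The gradient is evaluated at the binarized weights, but the update is applied to the continuous ones, and the continuous weights never approach ±1:

```python
import numpy as np

# Toy loss L(w) = 0.5*||w - t||^2 with a hypothetical target t;
# its gradient is w - t.
t = np.array([0.7, -0.3])

def grad(w):
    return w - t

w = np.array([0.1, 0.1])        # continuous ("real-valued") weights
lr = 0.1
for _ in range(100):
    wb = np.sign(w)             # quantization ("rounding") operator
    g = grad(wb)                # gradient evaluated at the *binarized* weights
    w = w - lr * g              # ...but update applied to the *continuous* weights

# The iterates oscillate: the continuous weights stay far from {-1, +1},
# so a drastic binarization must still be applied at the end.
far_from_binary = np.max(np.abs(np.abs(w) - 1.0))
```

Because no binarized corner has zero gradient, the sign pattern keeps flipping and the continuous weights hover near the decision boundary instead of converging.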
In our LC algorithm, the optimization essentially happens in the continuous weight space by minimizing a well-defined objective (the penalized function in the L step), but this is regularly corrected by a quantization operator (C step), so that the algorithm gradually converges to a truly quantized weight vector while achieving a low loss (up to local optima). The form of both L and C steps, in particular of the quantization operator (our compression mapping Π), follows in a principled, optimal way from the constrained form of the problem (1). That is, given a desired form of quantization (e.g. binarization), the form of the C step is determined, and the overall algorithm is guaranteed to converge to a valid (binary) solution.
Also, we emphasize that there is little practical reason to use certain fixed codebooks, such as {−1, +1} or {−1, 0, +1}, instead of an adaptive codebook such as {c1, c2} with c1, c2 ∈ ℝ. The latter is obviously less restrictive, so it will incur a lower loss. And its hardware implementation is about as efficient: to compute a scalar product of an activation vector with a quantized weight vector, all we require is to sum the activation values for each centroid and to do two floating-point multiplications (with c1 and c2). Indeed, our experiments in section 5.3 show that using an adaptive codebook with K = 2 clearly beats using {−1, +1}.
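The efficiency argument can be made concrete with a small NumPy sketch, assuming a hypothetical adaptive codebook of K = 2 centroids: the scalar product with the quantized weight vector needs only one activation sum per centroid and two multiplications, regardless of the number of weights.

```python
import numpy as np

rng = np.random.default_rng(0)
P = 8
a = rng.standard_normal(P)            # activation vector
c = np.array([-0.37, 0.52])           # hypothetical adaptive codebook, K = 2
kappa = rng.integers(0, 2, size=P)    # assignment of each weight to a centroid

# Naive scalar product with the decompressed (quantized) weights:
w = c[kappa]
ref = a @ w

# Equivalent computation: sum the activations per centroid,
# then do two floating-point multiplications.
s0 = a[kappa == 0].sum()
s1 = a[kappa == 1].sum()
fast = c[0] * s0 + c[1] * s1
```

Both computations give the same result; the second avoids materializing the full weight vector and uses only K multiplications.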
2.2 Quantization with an adaptive codebook
Quantization with an adaptive codebook is, obviously, more powerful than with a fixed codebook, even though it has to store the codebook itself. Quantization using an adaptive codebook has also been explored in the neural nets literature, using approaches based on soft quantization (Nowlan and Hinton, 1992; Ullrich et al., 2017) or hard quantization (Fiesler et al., 1990; Marchesi et al., 1993; Tang and Kwan, 1993; Gong et al., 2015; Han et al., 2015), and we discuss this briefly.
Given a set of real-valued elements (scalars or vectors), in adaptive quantization we represent (“quantize”) each element by exactly one entry in a codebook. The codebook and the assignment of values to codebook entries should minimize a certain distortion measure, such as the squared error. Learning the codebook and assignment is done by an algorithm, possibly approximate (such as k-means for the squared error). Quantization is related to clustering and often one can use the same algorithm for both (e.g. k-means), but the goal is different: quantization seeks to minimize the distortion rather than to model the data as clusters. For example, a set of values uniformly distributed in [0, 1] shows no clusters but may be subject to quantization for compression purposes. In our case of neural net compression, we have an additional peculiarity that complicates the optimization: the quantization and the weight values themselves should be jointly learned to minimize the loss of the net on the task.
Two types of clustering exist, hard and soft clustering. In hard clustering, each data point is assigned to exactly one cluster (e.g. k-means clustering). In soft clustering, we have a probability distribution over points and clusters (e.g. Gaussian mixture clustering). Likewise, two basic approaches exist for neural net quantization, based on hard and soft quantization. We review each next.
In hard quantization, each weight is assigned to exactly one codebook value. This is the usual meaning of quantization. It is a difficult problem because, even if the loss is differentiable over the weights, the assignment makes the problem inherently combinatorial. Previous work (Gong et al., 2015; Han et al., 2015) has run a quantization step (k-means) as a postprocessing step on a reference net (which was trained to minimize the loss). This is suboptimal in that it does not learn the weights, codebook and assignment jointly. We call this “direct compression” and discuss it in more detail in section 3.4. Our LC algorithm does learn the weights, codebook and assignment jointly, and converges to a local optimum of problem (1).
In soft quantization, the assignment of values to codebook entries is based on a probability distribution. This was originally proposed by Nowlan and Hinton (1992) as a way to share weights softly in a neural net with the goal of improving generalization, and has been recently revisited with the goal of compression (Ullrich et al., 2017). The idea is to penalize the loss with the negative log-likelihood of a Gaussian mixture (GM) model on the scalar weights of the net. This has the advantage of being differentiable and of coadapting the weights and the GM parameters (proportions, means, variances). However, it does not uniquely assign each weight to one mean; in fact, the resulting distribution of weights is far from quantized. It simply encourages the creation of Gaussian clusters of weights, and one has to assign weights to means as a postprocessing step, which is suboptimal. The basic problem is that a GM is a good model (better than k-means) for noisy or uncertain data, but that is not what we have here. Quantizing the weights for compression implies a constraint that certain weights must take exactly the same value, without noise or uncertainty, while optimizing the loss. We seek an optimal assignment that is truly hard, not soft. Indeed, a GM prior is to quantization what a quadratic prior (i.e., weight decay) is to sparsity: a quadratic prior encourages all weights to be small but does not encourage some weights to be exactly zero, just as a GM prior encourages weights to form Gaussian clusters but not to become groups of identical weights.
3 Neural net quantization as constrained optimization and the “learning-compression” (LC) algorithm
As noted in the introduction, compressing a neural net optimally means finding the compressed net that has (locally) lowest loss. Our first goal is to formulate this mathematically in a way that is amenable to nonconvex optimization techniques. Following Carreira-Perpiñán (2017), we define the following model compression as constrained optimization problem:

    min_{w,Θ} L(w)   s.t.   w = Δ(Θ),   (1)
where w ∈ ℝ^P are the real-valued weights of the neural net, L(w) is the loss to be minimized (e.g. cross-entropy for a classification task on some training set), and the constraint w = Δ(Θ) indicates that the weights must be the result of decompressing a low-dimensional parameter vector Θ. The decompression mapping Δ corresponds to quantization and will be described in section 4. Problem (1) is equivalent to the unconstrained problem “min_Θ L(Δ(Θ))”, but this is nondifferentiable with quantization (where Δ is a discrete mapping), and introducing the auxiliary variable w will lead to a convenient algorithm.
Our second goal is to solve this problem via an efficient algorithm. Although this might be done in different ways, a particularly simple one was proposed by Carreira-Perpiñán (2017) that achieves separability between the data-dependent part of the problem (the loss) and the data-independent part (the weight quantization). First, we apply a penalty method to solve (1). We consider here the augmented Lagrangian method (Nocedal and Wright, 2006), where λ ∈ ℝ^P are the Lagrange multiplier estimates (all norms are ℓ2 throughout the paper unless indicated otherwise):

    L_A(w, Θ, λ; μ) = L(w) − λᵀ(w − Δ(Θ)) + (μ/2) ‖w − Δ(Θ)‖².   (2)
The augmented Lagrangian method works as follows. For fixed λ, we optimize L_A(w, Θ, λ; μ) over (w, Θ) accurately enough. Then, we update the Lagrange multiplier estimates as λ ← λ − μ(w − Δ(Θ)). Finally, we increase μ. We repeat this process and, in the limit as μ → ∞, the iterates (w, Θ) tend to a local KKT point (typically, a local minimizer) of the constrained problem (1). A simpler but less effective penalty method, the quadratic-penalty method, results from setting λ = 0 throughout; we do not describe it explicitly, see Carreira-Perpiñán (2017).
Finally, in order to optimize L_A over (w, Θ), we use alternating optimization. This gives rise to the following two steps:
L step: learning

    min_w L(w) + (μ/2) ‖w − Δ(Θ) − (1/μ)λ‖².

This involves optimizing a regularized version of the loss, which pulls the optimizer towards the currently quantized weights. For neural nets, it can be solved with stochastic gradient descent (SGD).
C step: compression (here, quantization)

    Θ = Π(w − (1/μ)λ) = argmin_Θ ‖w − (1/μ)λ − Δ(Θ)‖².

We describe this in section 4. Solving this problem is equivalent to optimally quantizing the current real-valued weights w − (1/μ)λ, and can be seen as finding their orthogonal projection on the feasible set of quantized nets.
This algorithm was called the “learning-compression” (LC) algorithm by Carreira-Perpiñán (2017).
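The alternation above can be sketched end-to-end on a toy instance where both steps have closed form. The quadratic loss, target vector and penalty schedule below are made-up illustrative assumptions; the quantization is binarization, so the C step simply rounds each shifted weight to the nearest entry of {−1, +1}:

```python
import numpy as np

# Toy instance: quadratic loss L(w) = 0.5*||w - t||^2 (hypothetical target t)
# and a fixed codebook {-1, +1}, so the C step rounds to the nearest sign.
t = np.array([0.7, -1.4, 0.2])

mu, a = 1e-3, 1.5          # penalty parameter and its multiplicative schedule
lam = np.zeros_like(t)     # Lagrange multiplier estimates
w = t.copy()               # L-step variable, initialized at the reference net
theta = np.sign(w)         # C-step variable, initialized at direct compression

for _ in range(60):
    # L step: min_w L(w) + mu/2 * ||w - theta - lam/mu||^2 (closed form here,
    # since the loss is quadratic; with a neural net this would be SGD).
    w = (t + mu * theta + lam) / (1.0 + mu)
    # C step: project w - lam/mu onto the feasible set {-1, +1}^P.
    theta = np.sign(w - lam / mu)
    # Multiplier update, then increase mu.
    lam = lam - mu * (w - theta)
    mu *= a

# In the limit, the two weight vectors coincide and the net is truly binarized.
```

As μ grows, the real-valued weights w and the quantized weights θ ∈ {−1, +1}^P converge to each other, and the final θ is the binarization with lowest loss for this toy problem.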
We note that, throughout the optimization, there are two weight vectors that evolve simultaneously and coincide in the limit as μ → ∞: w (or, more precisely, w − (1/μ)λ) contains real-valued, non-quantized weights (and this is what the L step optimizes over); and Δ(Θ) contains quantized weights (and Θ is what the C step optimizes over). In the C step, Δ(Θ) is the projection of the current w − (1/μ)λ on the feasible set of quantized vectors. In the L step, w optimizes the loss while being pulled towards the current Δ(Θ).
The formulation (1) and the LC algorithm have two crucial advantages. The first one is that we get a convenient separation between learning and quantization which allows one to solve each step by reusing existing code. The data-dependent part of the optimization is confined within the L step. This part is the more computationally costly, requiring access to the training set and the neural net, and usually implemented in a GPU using SGD. The data-independent part of the optimization, i.e., the compression of the weights (here, quantization), is confined within the C step. This needs access only to the vector of current, real-valued weights (not to the training set or the actual neural net).
The second advantage is that the form of the C step is determined by the choice of quantization form (defined by Δ), and the algorithm designer need not worry about modifying backpropagation or SGD in any way for convergence to a valid solution to occur. For example, if a new form of quantization were discovered and we wished to use it, all we have to do is put it in the decompression mapping form w = Δ(Θ) and solve the compression mapping problem (5) (which depends only on the quantization technique, and for which a known algorithm may exist). This is unlike much work in neural net quantization, where various, somewhat arbitrary quantization or rounding operations are incorporated in the usual backpropagation training (see section 2), which makes it unclear what problem the overall algorithm is optimizing, if it does optimize anything at all.
In section 4, we solve the compression mapping problem (5) for the adaptive and fixed codebook cases. For now, it suffices to know that it will involve running k-means with an adaptive codebook and a form of rounding with a fixed codebook.
3.1 Geometry of the neural net quantization problem
Figure 1:
Plots 1–3 (top row): illustration of the uncompressed model space (w-space ℝ^P), the contour lines of the loss L (green lines), and the set of compressed models (the feasible set F_w, grayed areas), for a generic compression technique Δ. The Θ-space is not shown. w̄ optimizes L but is infeasible (no Θ can decompress into it). The direct compression w^DC = Δ(Π(w̄)) is feasible but not optimally compressed (not optimal in the feasible set). w* is optimally compressed. Plot 2 shows two local optima w̄₁ and w̄₂ of the loss L, and their respective DC points (the contour lines are omitted to avoid clutter). Plot 3 shows several feasible sets, corresponding to different compression levels (smaller codebooks give more compression).
Plots 4–5 (bottom row): illustration when Δ corresponds to quantization, in the particular case of a codebook of size K = 1 and a 2-weight net, so w = (w₁, w₂) and Θ = c ∈ ℝ. Plot 4 is the joint (w, c) space and plot 5 is its projection in w-space (as in plot 1). In plot 4, the black line is the feasible set F, corresponding to the constraints w₁ = c and w₂ = c. In plot 5, the black line is the feasible set F_w, corresponding to the constraint w₁ = w₂. The red line is the quadratic-penalty method path, which for this simple case is a straight line segment from the beginning of the path to the solution. We mark three points: blue represents the reference net w̄ at the DC codebook (the beginning of the path); red is the solution (the end of the path); and white is the direct compression point w^DC.
Problem (1) can be written as min_{w,Θ} L(w) s.t. (w, Θ) ∈ F, where the objective function is the loss on the real-valued weights and the feasible set on w and the low-dimensional parameters Θ is:

    F = {(w, Θ): w = Δ(Θ)}.

We also define the feasible set in w-space:

    F_w = {w ∈ ℝ^P: w = Δ(Θ) for some Θ},

which contains all high-dimensional models w that can be obtained by decompressing some low-dimensional model Θ. Fig. 1 (plots 1–3) illustrates the geometry of the problem in general.
Solving the C step requires minimizing ‖w − Δ(Θ)‖² over Θ (where we write w instead of w − (1/μ)λ for simplicity of notation):

    Π(w) = argmin_Θ ‖w − Δ(Θ)‖².
We call Δ the decompression mapping and Π the compression mapping. In quantization, these have the following meaning:
Θ consists of the codebook (if the codebook is adaptive) and the assignments of weights to codebook entries. The assignments can be encoded as 1-of-K vectors or directly as indices in {1, …, K} for a codebook with K entries.
The decompression mapping Δ uses the codebook and assignments as a lookup table to generate a real-valued but quantized weight vector w = Δ(Θ). This vector is used in the L step as a regularizer.
The compression mapping Π optimally learns a codebook and assignments given a real-valued, non-quantized weight vector (using k-means or a form of rounding, see section 4). All the C step does is apply the compression mapping.
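The decompression mapping is literally a table lookup, which in NumPy is a single indexing operation (the codebook values and assignments below are made-up):

```python
import numpy as np

C = np.array([-0.5, 0.1, 0.9])        # hypothetical codebook, K = 3
kappa = np.array([2, 0, 0, 1, 2])     # assignments, one index per weight (P = 5)

# Decompression mapping Delta(Theta): look each weight up in the codebook.
w = C[kappa]
```

Storing `kappa` (2 bits per weight here) plus the small codebook is what yields the compression.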
As shown by Carreira-Perpiñán (2017), the compression mapping finds the orthogonal projection of w on the feasible set F_w, which we call w^DC = Δ(Π(w)).
For quantization, the geometry of the constrained optimization formulation is as follows. The feasible set F_w can be written as the union of a combinatorial number of linear subspaces (each containing the origin), where w is of the form (c_{κ(1)}, …, c_{κ(P)}) for a fixed assignment κ. Each such subspace defines a particular assignment of the P weights to the K centroids. There are K^P possible assignments. If we knew the optimal assignment, the feasible set would be a single linear subspace, and the weights could be eliminated (using w_i = c_{κ(i)}) to yield an unconstrained objective over the K tunable centroids (“shared weights” in neural net parlance), which would be simple to optimize. What makes the problem hard is that we do not know the optimal assignment. Depending on the dimensions P and K, these subspaces may look like lines, planes, etc., always passing through the origin in ℝ^P. Geometrically, the union of these subspaces is a feasible set with both a continuous structure (within each subspace) and a discrete one (the number of subspaces is finite but very large).
Fig. 1 (plots 4–5) shows the actual geometry for the case of a net with P = 2 weights and a codebook with K = 1 centroid. This can be exactly visualized in 3D because the assignment variables are redundant and can be eliminated: the problem is min L(w₁, w₂) s.t. w₁ = c, w₂ = c. The compression mapping is easily seen to be c = Π(w) = (w₁ + w₂)/2, and Δ(c) = (c, c) is indeed the orthogonal projection of (w₁, w₂) onto the diagonal line w₁ = w₂ in w-space (the feasible set). This particular case is, however, misleading in that the constraints involve a single linear subspace rather than the union of a combinatorial number of subspaces. It can be solved simply and exactly by setting w₁ = w₂ = c and eliminating variables into L(c, c).
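The projection claim in this toy case is easy to verify numerically (the weight values below are made-up):

```python
import numpy as np

# Toy case from the text: P = 2 weights, K = 1 centroid, so the feasible
# set is the diagonal line w1 = w2 = c.
w = np.array([0.8, 0.2])

c = w.mean()                     # compression mapping: c = (w1 + w2)/2
w_quant = np.array([c, c])       # decompression: Delta(c) = (c, c)

# Delta(c) equals the orthogonal projection of w onto the diagonal direction:
diag = np.array([1.0, 1.0]) / np.sqrt(2.0)
proj = (w @ diag) * diag
```

Here `c = 0.5` and both constructions give the point (0.5, 0.5) on the diagonal.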
3.2 Convergence of the LC algorithm
Convergence of the LC algorithm to a local KKT point (theorem 5.1 in Carreira-Perpiñán, 2017) is guaranteed for smooth problems (continuously differentiable loss L and decompression mapping Δ) if μ → ∞ and optimization of the penalty function (2) is done accurately enough for each μ. However, in quantization the decompression mapping Δ is discrete, given by a lookup table, so the theorem does not apply.
In fact, neural net quantization is an NP-complete problem even in simple cases. For example, consider least-squares linear regression with weights in {−1, +1}. This corresponds to binarization of a single-layer, linear neural net. The loss is quadratic, so the optimization problem is a binary quadratic problem over the weights, which is NP-complete (Garey and Johnson, 1979). However, the LC algorithm will still converge to a “local optimum” in the same sense that the k-means algorithm is said to converge to a local optimum: the L step cannot improve given the C step, and vice versa. While this will generally not be the global optimum of problem (1), it will be a good solution in that the loss will be low (because the L step continuously minimizes it in part), and the LC algorithm is guaranteed to converge to a weight vector that satisfies the quantization constraints (e.g. weights in {−1, +1} for binarization). Our experiments confirm the effectiveness of the LC algorithm for quantization, consistently outperforming other approaches over a range of codebook types and sizes.
3.3 Practicalities of the LC algorithm
As usual with path-following algorithms, ideally one would follow the path of iterates closely until μ → ∞, by increasing the penalty parameter μ slowly. In practice, in order to reduce computing time, we increase μ more aggressively by following a multiplicative schedule μ_k = μ₀ a^k with μ₀ > 0 and a > 1. However, it is important to use a small enough μ₀ that allows the algorithm to explore the solution space before committing to specific assignments for the weights.
The L step with a large training set typically uses SGD. As recommended by Carreira-Perpiñán (2017), we use a clipped schedule for the learning rates of the form η′_t = min(η_t, 1/μ_k), where t is the epoch index and {η_t} is a learning-rate schedule for the reference net (i.e., for μ = 0). This ensures convergence and avoids erratic updates as μ becomes large.
We initialize w = w̄ (the reference net) and Θ = Π(w̄) (its direct compression), which is the exact solution for μ → 0⁺, as we show in the next section. We stop the LC algorithm when ‖w − Δ(Θ)‖ is smaller than a set tolerance, i.e., when the real-valued and quantized weights are nearly equal. We take as solution the quantized weights w = Δ(Θ), given by the codebook and assignments in Θ.
The runtime of the C step is negligible compared to that of the L step. With a fixed codebook, the C step is a simple assignment per weight. With an adaptive codebook, the C step runs k-means, each iteration of which is linear on the number of weights P. The number of iterations that k-means runs is a few tens in the first C step (initialized by k-means++ on the reference weights) and just about one in subsequent C steps (because k-means is warm-started), as seen in our experiments. So the runtime is dominated by the L steps, i.e., by optimizing the loss.
3.4 Direct compression and iterated direct compression
The quadratic-penalty and augmented-Lagrangian methods define a path of iterates (w(μ), Θ(μ)) for μ ≥ 0 that converges to a local solution as μ → ∞. The beginning of this path is of special importance, and was called direct compression (DC) by Carreira-Perpiñán (2017). Taking the limit μ → 0⁺ and assuming an initial λ = 0, we find that w = argmin_w L(w) = w̄ and Θ = Π(w̄). Hence, this corresponds to training a reference, non-quantized net and then quantizing it regardless of the loss (or, equivalently, projecting w̄ on the feasible set). As illustrated in fig. 1, this is suboptimal (i.e., it does not produce the compressed net with lowest loss), more so the farther the reference is from the feasible set. This will happen when the feasible set is small, i.e., when the codebook size K is small (so the compression level is high). Indeed, our experiments show that for large K (around 32 bits/weight) DC is practically identical to the result of the LC algorithm, but as K decreases (e.g. 1 to 4 bits/weight) the loss of DC becomes larger and larger than that of the LC algorithm.
A variation of direct compression consists of “iterating” it, as follows. We first optimize the loss L to obtain the reference net w̄ and then quantize it with k-means into Θ^DC = Π(w̄). Next, we optimize L again, initializing from Δ(Θ^DC), and then we compress the result; etc. This was called “iterated direct compression (iDC)” by Carreira-Perpiñán (2017). iDC should not improve at all over DC if the loss optimization were exact and there were a single optimum: it would simply cycle forever between the reference weights w̄ and the DC weights Δ(Θ^DC). However, in practice iDC may improve somewhat over DC, for two reasons. 1) With local optima of L, we might converge to a different optimum after the quantization step (see fig. 1, plot 2). However, at some point this will end up cycling between some reference net (some local optimum of L) and its quantized net. 2) In practice, SGD-based optimization of the loss with large neural nets is approximate; we stop SGD well before it has converged. This implies the iterates never fully reach w̄, and keep oscillating forever somewhere between w̄ and Δ(Θ^DC).
DC and iDC have in fact been proposed recently for quantization, although without the context that our constrained optimization framework provides. Gong et al. (2015) applied k-means to quantize the weights of a reference net, i.e., DC. The “trained quantization” of Han et al. (2015) tries to improve over this by iterating the process, i.e., iDC. In our experiments, we verify that neither DC nor iDC converges to a local optimum of problem (1), while our LC algorithm does.
4 Solving the C step: compression by quantization
The C step consists of solving the optimization problem of eq. (8): Π(w) = argmin_Θ ‖w − Δ(Θ)‖², where w is a vector of real-valued weights. This is a quadratic distortion (or least-squares error) problem, which results from selecting a quadratic penalty in the augmented Lagrangian (2). It is possible to use other penalties (e.g. the ℓ1 norm), but the quadratic penalty gives rise to simpler optimization problems, and we focus on it in this paper. We now describe how to write quantization as a mapping Δ in parameter space and how to solve the optimization problem (8).
Quantization consists of approximating real-valued vectors in a training set by vectors in a codebook. Since in our case the vectors are the weights of a neural net, we will write the training set as w₁, …, w_P. Although in practice with neural nets we quantize scalar weight values directly (not weight vectors), we develop the formulation using vector quantization, for generality. Hence, if we use a codebook with K entries, the number of bits used to store each weight vector is log₂ K.
We consider two types of quantization: using an adaptive codebook, where we learn the optimal codebook for the training set; and using a fixed codebook, which is then not learned (although we will consider learning a global scale).
4.1 Adaptive codebook
The decompression mapping is a table lookup w_i = c_{κ(i)} for each weight vector in the codebook C = {c₁, …, c_K}, where κ: {1, …, P} → {1, …, K} is a discrete mapping that assigns each weight vector to one codebook vector. The compression mapping results from finding the best (in the least-squares sense) codebook and mapping for the “dataset” w₁, …, w_P, i.e., from solving the optimization problem

    min_{C,Z} Σ_{i=1}^P Σ_{k=1}^K z_{ik} ‖w_i − c_k‖²   s.t.   z_{ik} ∈ {0, 1},  Σ_{k=1}^K z_{ik} = 1,  i = 1, …, P,   (9)

which we have rewritten equivalently using binary assignment variables Z = (z_{ik}). This follows by writing z_{ik} = 1 if k = κ(i) and z_{ik} = 0 otherwise, and verifying by substituting these values that Σ_{k=1}^K z_{ik} c_k = c_{κ(i)}.
So in this case the low-dimensional parameters are Θ = (C, Z), the decompression mapping can be written elementwise as w_i = c_{κ(i)} for i = 1, …, P, and the compression mapping results from running the k-means algorithm. The low-dimensional parameters are of two types: the assignments Z are “private” (each weight w_i has its own κ(i)), and the codebook C is “shared” by all weights. In the pseudocode of fig. 2, we write the optimally quantized weights as Δ(Π(w)).
Problem (9) is the well-known quadratic distortion problem (Gersho and Gray, 1992). It is NP-complete and it is typically solved approximately by k-means using a good initialization, such as that of k-means++ (Arthur and Vassilvitskii, 2007). As is well known, k-means is an alternating optimization algorithm that iterates the following two steps: in the assignment step we update the assignments independently given the centroids (codebook); in the centroid step we update the centroids independently by setting them to the mean of their assigned points. Each iteration reduces the distortion or leaves it unchanged. The algorithm converges in a finite number of iterations to a local optimum where Z cannot improve given C and vice versa.
In practice with neural nets we quantize scalar weight values directly, i.e., each w_i is a real value. Computationally, k-means is considerably faster with scalar values than with vectors. If the vectors have dimension D, with P data points and K centroids, each iteration of k-means takes O(PKD) runtime because of the assignment step (the centroid step is O(PD), by scanning through the points and accumulating each mean incrementally). But in dimension D = 1, each iteration can be done exactly in O(P log K), by using a binary search over the sorted centroids in the assignment step, which then takes O(K log K) for sorting and O(P log K) for assigning, for a total of O(P log K) (since K ≤ P).
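The fast scalar assignment step can be sketched with NumPy's binary search, using made-up weights and centroids, and checked against the brute-force O(PK) assignment:

```python
import numpy as np

def assign_scalar(w, centroids):
    """Assignment step of scalar k-means in O(P log K): binary search over
    the sorted centroids, then pick the nearer of the two bracketing ones.
    Returns assignments into the *sorted* codebook, and that codebook."""
    c = np.sort(centroids)
    idx = np.searchsorted(c, w)               # insertion positions, O(P log K)
    idx = np.clip(idx, 1, len(c) - 1)         # keep a valid (left, right) pair
    left, right = c[idx - 1], c[idx]
    nearer_right = np.abs(right - w) < np.abs(w - left)
    return np.where(nearer_right, idx, idx - 1), c

w = np.array([-0.9, -0.1, 0.05, 0.4, 1.2])        # hypothetical scalar weights
assignments, c = assign_scalar(w, np.array([0.5, -1.0, 0.0]))

# Brute-force O(P*K) assignment for comparison:
brute = np.argmin(np.abs(w[:, None] - c[None, :]), axis=1)
```

The centroid step then averages the weights assigned to each centroid, so a full scalar k-means iteration stays O(P log K).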
4.1.1 Why $k$-means?
The fact that we use $k$-means in the C step is not an arbitrary choice of a quantization algorithm (among the many such algorithms we could use instead). It is a necessary consequence of two assumptions: 1) the fact that we want to assign weights to elements of a codebook, which dictates the form of the decompression mapping $\boldsymbol{\Delta}$ (this is not really an assumption, because any form of quantization works like this); and 2) the fact that the penalty used in the augmented Lagrangian is quadratic, so that the C step is a quadratic distortion problem.
We could choose a different penalty instead of the quadratic penalty $\|\mathbf{w} - \boldsymbol{\Delta}(\boldsymbol{\Theta})\|_2^2$, as long as it is zero if the constraint is satisfied and positive otherwise (for example, the $\ell_1$ penalty). In the grand scheme of things, the choice of penalty is not important, because the role of the penalty is to enforce the constraints gradually, so that in the limit $\mu \to \infty$ the constraints are satisfied and the weights are quantized: $\mathbf{w} = \boldsymbol{\Delta}(\boldsymbol{\Theta})$. Any penalty satisfying the positivity condition above will achieve this. The choice of penalty does have two effects: it may change the local optimum we converge to (although it is hard to have control over this); and, more importantly, it has a role in the optimization algorithm used in the L and C steps: the quadratic penalty is easier to optimize. As an example, imagine we used the penalty $\|\mathbf{w} - \boldsymbol{\Delta}(\boldsymbol{\Theta})\|_1$. This means that the L step would have the form:
$$\min_{\mathbf{w}}\; L(\mathbf{w}) + \mu\,\|\mathbf{w} - \boldsymbol{\Delta}(\boldsymbol{\Theta})\|_1$$
that is, an $\ell_1$-regularized loss. This is a nonsmooth problem. One can develop algorithms to optimize it, but it is harder than with the quadratic regularizer. The C step would have the form (again, we write $\mathbf{w}$ instead of $\mathbf{w} - \frac{1}{\mu}\boldsymbol{\lambda}$ for simplicity of notation):
$$\min_{\mathcal{C},\,\kappa}\; \sum_{i=1}^{P} \|\mathbf{w}_i - \mathbf{c}_{\kappa(i)}\|_1.$$
With scalar weights $w_i$, this can be solved by alternating optimization as in $k$-means: the assignment step is identical, but the centroid step uses the median instead of the mean of the points assigned to each centroid (the $k$-medians algorithm). There are a number of other distortion measures developed in the quantization literature (Gersho and Gray, 1992, section 10.3) that might be used as penalties and are perhaps convenient with some losses or applications. With a fixed codebook, as we will see in the next section, the form of the C step is the same regardless of the penalty.
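The $k$-medians variant just described can be sketched as follows (a minimal illustration of ours, not the paper's implementation; only the centroid step differs from $k$-means):

```python
import numpy as np

def kmedians_1d(w, c, iters=100):
    """Alternating optimization for the l1 distortion: the assignment step is
    identical to k-means; the centroid step takes the median, not the mean."""
    c = np.sort(np.asarray(c, dtype=float))
    for _ in range(iters):
        k = np.argmin(np.abs(w[:, None] - c[None, :]), axis=1)   # assignment
        c_new = np.array([np.median(w[k == j]) if np.any(k == j) else c[j]
                          for j in range(len(c))])
        c_new.sort()
        if np.allclose(c_new, c):
            break
        c = c_new
    return c, k
```

With a single centroid and the points $\{0, 1, 10\}$, the centroid becomes the median ($1.0$), insensitive to the outlier, whereas the mean ($\approx 3.67$) would be pulled towards it.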
On the topic of the choice of penalty, a possible concern one could raise is that of outliers in the data. When used for clustering, $k$-means is known to be sensitive to outliers and nonconvexities of the data distribution. Consider the following situations, for simplicity using just $K = 1$ centroid in 1D. First, if the dataset has an outlier, it will pull the centroid towards it, away from the rest of the data (note that this is not a local-optima issue; this is the global optimum). For compression purposes, it may seem a waste of that centroid not to put it where most of the data is. With the $\ell_1$ penalty, the centroid would be insensitive to the outlier. Second, if the dataset consists of two separate groups, the centroid will end up in the middle of both, where there is no data, for both $k$-means and the $\ell_1$ penalty. Again, this may seem a waste of the centroid. Other clustering algorithms have been proposed to ensure the centroids lie where there is distribution mass, such as the $K$-modes algorithm (Carreira-Perpiñán and Wang, 2013; Wang and Carreira-Perpiñán, 2014). However, these concerns are misguided, because neural net compression is not a data modeling problem: one has to consider the overall LC algorithm, not the C step in isolation. While in the C step the centroids approach the data (the weights), in the L step the weights approach the centroids; in the limit both coincide, the distortion is zero and there are no outliers. It is of course possible that the LC algorithm converges to a bad local optimum of the neural net quantization problem, which is NP-complete, but this can happen for various reasons. In section 5.2 of the experiments we run the LC algorithm on a model whose weights contain clear outliers and demonstrate that the solution found makes sense.
4.2 Fixed codebook
Now, we consider quantization using a fixed codebook, i.e., the codebook entries $\mathcal{C} = \{\mathbf{c}_1,\dots,\mathbf{c}_K\}$ are fixed and we do not learn them; we learn only the weight assignments $\mathbf{Z}$. (We can also achieve pruning together with quantization by having one centroid fixed to zero; we study this in more detail in a future paper.) In this way we can derive algorithms for compression of the weights based on approaches such as binarization or ternarization, which have also been explored in the literature of neural net compression, implemented as modifications to backpropagation (see section 2.1).
The compression mapping of eq. (8) now results from solving the optimization problem
$$\min_{\mathbf{Z}}\; \sum_{i=1}^{P} \sum_{k=1}^{K} z_{ik}\,\|\mathbf{w}_i - \mathbf{c}_k\|^2 \quad \text{s.t.}\quad z_{ik} \in \{0,1\},\ \sum_{k=1}^{K} z_{ik} = 1,\ i = 1,\dots,P. \qquad (10)$$
This is no longer NP-complete, unlike the joint optimization over codebook and assignments in (9). It has a closed-form solution for each $\mathbf{w}_i$ separately, where we assign $\mathbf{w}_i$ to its closest codebook entry, $\kappa(i) = \arg\min_{k} \|\mathbf{w}_i - \mathbf{c}_k\|$, with ties broken arbitrarily, for $i = 1,\dots,P$. That is, each weight is compressed as its closest codebook entry (in Euclidean distance). Therefore, we can write the compression mapping explicitly as $\boldsymbol{\Pi}(\mathbf{w}_i) = \mathbf{c}_{\kappa(i)}$ separately for each weight $\mathbf{w}_i$, $i = 1,\dots,P$.
So in this case the low-dimensional parameters are $\mathbf{Z}$ (or $\kappa$), the decompression mapping can be written elementwise as $\mathbf{w}_i = \mathbf{c}_{\kappa(i)}$ for $i = 1,\dots,P$ (as with the adaptive codebook), and the compression mapping can also be written elementwise as $\boldsymbol{\Pi}(\mathbf{w}_i) = \mathbf{c}_{\kappa(i)}$ for $i = 1,\dots,P$. The low-dimensional parameters are all private (the assignments $\mathbf{Z}$ or $\kappa$). The codebook is shared by all weights, but it is not learned. In the pseudocode of fig. 3, we use the notation $\boldsymbol{\Delta}(\boldsymbol{\Theta})$ to write the optimally quantized weights.
This simplifies further in the scalar case, i.e., when the weights to be quantized are scalars. Here, we can write the codebook as an array of scalars sorted increasingly, $c_1 < c_2 < \cdots < c_K$. The elementwise compression mapping can be written generically for $i = 1,\dots,P$ as:
$$\kappa(i) = k \quad \text{such that} \quad \frac{c_{k-1} + c_k}{2} \le w_i < \frac{c_k + c_{k+1}}{2} \qquad (11)$$
since the codebook defines Voronoi cells that are the intervals between midpoints of adjacent centroids. This can be written more compactly as $\kappa(i) = k$, where $k$ satisfies $\frac{1}{2}(c_{k-1} + c_k) \le w_i < \frac{1}{2}(c_k + c_{k+1})$ and we define $c_0 = -\infty$ and $c_{K+1} = +\infty$. Computationally, this can be done in $\mathcal{O}(\log K)$ using a binary search, although in practice $K$ is small enough that a linear search in $\mathcal{O}(K)$ makes little difference. To use the compression mapping in the C step of the LC algorithm given in section 3, $w_i$ equals either a scalar weight, for the quadratic-penalty method, or a shifted scalar weight $w_i - \frac{1}{\mu}\lambda_i$, for the augmented Lagrangian method. The L step of the LC algorithm always takes the form given in eq. (4).
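This scalar C step can be sketched as follows (the function name is ours; the codebook is assumed sorted increasingly, and ties at a midpoint are broken towards the lower cell by the search):

```python
import numpy as np

def quantize_fixed_codebook(w, codebook):
    """Assign each scalar weight to its nearest codebook entry.
    The midpoints of adjacent (sorted) entries delimit the Voronoi cells,
    so each assignment is a binary search: O(log K) per weight."""
    c = np.asarray(codebook, dtype=float)      # sorted increasingly
    mid = (c[:-1] + c[1:]) / 2                 # the K-1 cell boundaries
    k = np.searchsorted(mid, w)                # kappa(i) for every weight
    return k, c[k]
```

For example, with codebook $\{-1, 0, 1\}$ the boundaries are $\pm 0.5$, so a weight of $0.6$ maps to $1$ and a weight of $0.4$ maps to $0$.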
Again, this quantization algorithm in the C step is not an arbitrary choice: it follows necessarily from the way any codebook-based quantization works. Furthermore, and unlike in the adaptive codebook case, with scalar weights the solution (11) is independent of the choice of penalty, because the order of the real numbers is unique (so using a quadratic or an $\ell_1$ penalty will result in the same step).
Application to binarization, ternarization and powers-of-two
Some particular cases of the codebook are of special interest because their implementation is very efficient: binary $\{-1, +1\}$, ternary $\{-1, 0, +1\}$ and general powers of two $\{0, \pm 1, \pm 2^{-1}, \dots, \pm 2^{-C}\}$. These are all well known in digital filter design, where one seeks to avoid floating-point multiplications by using fixed-point binary arithmetic and powers-of-two or sums-of-powers-of-two multipliers (which result in shift or shift-and-add operations instead). This accelerates the computation and requires less hardware.
We give the solution of the C step for these cases in fig. 5 (see proofs in the appendix). Instead of giving the compression mapping $\boldsymbol{\Pi}$, we give directly a quantization operator $Q(w)$ that maps a real-valued weight to its optimal codebook entry. Hence, $Q$ corresponds to compressing and then decompressing the weights, elementwise: $Q(w) = \boldsymbol{\Delta}(\boldsymbol{\Pi}(w))$, where $w$ is a scalar weight. In the expressions for $Q$, we define the floor function as $\lfloor x \rfloor = j$ if $j \le x < j + 1$ and $j$ is an integer, and the sign function as follows:
$$\operatorname{sgn}(w) = \begin{cases} -1, & w < 0 \\ +1, & w \ge 0. \end{cases}$$
Note that the generic $k$-means algorithm (which occurs in the C step of our LC algorithm) solves problem (10), and hence its particular cases, exactly in one iteration: the centroid step does nothing (since the centroids are not learnable) and the assignment step is identical to the expressions for $\kappa$ in eq. (11) or for $Q$ in fig. 5. However, the expressions in fig. 5 are more efficient, especially for the powers-of-two case, which runs in $\mathcal{O}(1)$ per weight (while the generic $k$-means assignment step would run in $\mathcal{O}(\log K)$).
[Figure 5 about here: the C-step solution, i.e., the quantization operator $Q(w)$, for binarization with scale, ternarization with scale, and powers of two; in each case the optimal scale $a$ is computed from the weight magnitudes.]
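As a concrete illustration, for binarization with scale the optimal scale is the average weight magnitude, $a = \frac{1}{P}\sum_i |w_i|$, and $Q(w_i) = a \operatorname{sgn}(w_i)$ (the same formula as in Rastegari et al., 2016, cited below). A minimal sketch of ours, using the $\operatorname{sgn}(0) = +1$ convention above:

```python
import numpy as np

def binarize_with_scale(w):
    """Q(w_i) = a * sgn(w_i) with a = mean(|w_i|) over all weights,
    and sgn(0) taken as +1."""
    w = np.asarray(w, dtype=float)
    a = np.mean(np.abs(w))
    return a * np.where(w >= 0, 1.0, -1.0)
```

For example, the weights $(1, -2, 3)$ give $a = 2$ and quantize to $(2, -2, 2)$.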
4.2.1 Fixed codebook with adaptive scale
Fixed codebook values such as $\{-1, +1\}$ or $\{-1, 0, +1\}$ may produce a large loss, because the good weight values may be much bigger or much smaller than $\pm 1$. One improvement is to rescale the weights, or equivalently rescale the codebook elements, by a scale parameter $a$, which is itself learned. The low-dimensional parameters now are $\boldsymbol{\Theta} = \{a, \mathbf{Z}\}$, where $a$ is a shared parameter and the assignments $\mathbf{Z}$ are private. The decompression mapping can be written elementwise as $w_i = a\,c_{\kappa(i)}$ for $i = 1,\dots,P$. The compression mapping results from solving the optimization problem
$$\min_{a,\,\mathbf{Z}}\; \sum_{i=1}^{P} \sum_{k=1}^{K} z_{ik}\,(w_i - a\,c_k)^2 \quad \text{s.t.}\quad z_{ik} \in \{0,1\},\ \sum_{k=1}^{K} z_{ik} = 1,\ i = 1,\dots,P. \qquad (13)$$
In general, this can be solved by alternating optimization over $\mathbf{Z}$ and $a$:
Assignment step: assign $w_i$ to $\kappa(i) = \arg\min_{k} (w_i - a\,c_k)^2$ for $i = 1,\dots,P$.
Scale step: $a = \sum_{i=1}^{P} c_{\kappa(i)}\,w_i \big/ \sum_{i=1}^{P} c_{\kappa(i)}^2$.
Like $k$-means, this will stop in a finite number of iterations, and may converge to a local optimum. With scalar weights, each iteration is $\mathcal{O}(P \log K)$ by using binary search in the assignment step and incremental accumulation in the scale step.
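The alternating scheme above can be sketched as follows (function name ours; for clarity this uses a brute-force $\mathcal{O}(PK)$ assignment instead of the binary search):

```python
import numpy as np

def quantize_with_scale(w, codebook, iters=100):
    """Alternate the assignment step (nearest scaled codebook entry) with the
    scale step, the least-squares fit a = sum(c_k(i) w_i) / sum(c_k(i)^2)."""
    w = np.asarray(w, dtype=float)
    c = np.asarray(codebook, dtype=float)
    a = 1.0
    k = np.zeros(len(w), dtype=int)
    for _ in range(iters):
        k = np.argmin((w[:, None] - a * c[None, :]) ** 2, axis=1)  # assignment
        ck = c[k]
        denom = np.sum(ck ** 2)
        if denom == 0:                 # all weights assigned to a zero entry
            break
        a_new = np.sum(ck * w) / denom                             # scale step
        if np.isclose(a_new, a):
            break
        a = a_new
    return a, k
```

For example, the weights $(2, -2, 2)$ with codebook $\{-1, +1\}$ converge in one iteration to $a = 2$ and a perfect reconstruction.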
Application to binarization and ternarization with scale
For some special cases we can solve problem (13) exactly, without the need for an iterative algorithm. We give the solution for binarization and ternarization with scale in fig. 5 (see proofs in the appendix). Again, we give directly the scalar quantization operator $Q(w)$. The form of the solution is a rescaled version of the case without scale, where the optimal scale $a$ is the average magnitude of a certain set of weights. Note that, given the scale $a$, the weights can be quantized elementwise by applying $Q$, but solving for the scale involves all the weights $w_1,\dots,w_P$.
Some of our quantization operators are equal to rounding procedures used in previous work on neural net quantization: binarization (without scale) by taking the sign of the weight is well known, and our formula for binarization with scale is the same as in Rastegari et al. (2016). Ternarization with scale was considered by Li et al. (2016), but the solution they give is only approximate; the correct, optimal solution is given in our theorem A.3. As we have mentioned before, those approaches incorporate rounding in the backpropagation algorithm in a heuristic way, and the resulting algorithm does not solve problem (1). In the framework of the LC algorithm, the solution of the C step (the quantization operator) follows necessarily; there is no need for heuristics.
It is possible to consider more variations of the above, such as a codebook $\{-a, +b\}$ or $\{-a, 0, +b\}$ with learnable scales $a, b > 0$, but there is little point to it. We should simply use a learnable codebook $\{c_1, c_2\}$ or $\{c_1, c_2, c_3\}$ without restrictions on $c_1$, $c_2$ or $c_3$ and run $k$-means in the C step.
Computing the optimal scale with $P$ weights has a runtime of $\mathcal{O}(P)$ in the case of binarization with scale and $\mathcal{O}(P \log P)$ in the case of ternarization with scale. In ternarization, the sums can be done cumulatively in $\mathcal{O}(P)$, so the total runtime is dominated by the sort, which is $\mathcal{O}(P \log P)$. It may be possible to avoid the sort using a heap and reduce the total runtime to $\mathcal{O}(P)$.
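The sort-and-cumulative-sums computation can be sketched as follows (a derivation-level sketch of ours, not the paper's code: if the $j$ largest magnitudes are the nonzeros, the optimal scale is their mean and the squared error is $\sum_i w_i^2 - s_j^2/j$ with $s_j$ the sum of the top-$j$ magnitudes, so we maximize $s_j^2/j$ over $j$):

```python
import numpy as np

def ternarize_with_scale(w):
    """Ternarization with scale: quantize each weight to {-a, 0, +a}.
    Sort magnitudes descending (O(P log P)); cumulative sums s_j then give
    the error reduction s_j^2 / j for each count j of nonzeros in O(P)."""
    w = np.asarray(w, dtype=float)
    mags = np.sort(np.abs(w))[::-1]
    s = np.cumsum(mags)                                   # s_j, top-j sums
    j = int(np.argmax(s ** 2 / np.arange(1, len(w) + 1))) + 1
    a = s[j - 1] / j                                      # mean of top-j magnitudes
    thresh = mags[j - 1]                                  # top-j weights -> +/- a
    return np.where(np.abs(w) >= thresh, a * np.sign(w), 0.0)
```

For example, the weights $(3, -3, 0.1)$ give $j = 2$ nonzeros, $a = 3$, and quantize to $(3, -3, 0)$.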
We evaluate our learning-compression (LC) algorithm for quantizing neural nets of different sizes with different compression levels (codebook sizes $K$), in several tasks and datasets: linear regression on MNIST and classification on MNIST and CIFAR10. We compare LC with direct compression (DC) and iterated direct compression (iDC), which correspond to the previous works of Gong et al. (2015) and Han et al. (2015), respectively. By using the codebook values $\{-1, +1\}$, we also compare with BinaryConnect (Courbariaux et al., 2015), which aims at learning binary weights. In summary, our experiments 1) confirm our theoretical arguments about the behavior of (i)DC, and 2) show that LC achieves loss values (in training and test) comparable to those algorithms at low compression levels, but drastically outperforms all of them at high compression levels (which are the most desirable in practice). We reach the maximum possible compression (1 bit/weight) without significant error degradation in all the networks we describe (except in the linear regression case).
We used the Theano (Theano Development Team, 2016) and Lasagne (Dieleman et al., 2015)
libraries. Throughout we use the augmented Lagrangian, because we found it not only faster but far more robust than the quadratic penalty, in particular in setting the SGD hyperparameters. We initialize all algorithms from a reasonably (but not necessarily perfectly) well-trained reference net. The initial iteration ($\mu \to 0^+$) of LC gives the DC solution. The C step (also for iDC) consists of $k$-means run until convergence, initialized from the previous iteration’s centroids (a warm start). For the first compression, we use the $k$-means++ initialization (Arthur and Vassilvitskii, 2007). This first compression may take several tens of $k$-means iterations, but subsequent ones need very few, often just one (figs. 7 and 10).
We report the loss and classification error in training and test. We only quantize the multiplicative weights in the neural net, not the biases. This is because the biases span a larger range than the multiplicative weights, hence requiring higher precision, and anyway there are very few biases in a neural net compared to the number of multiplicative weights.
We calculate compression ratios as
$$\rho(K) = \frac{\#\text{bits(reference)}}{\#\text{bits(quantized)}} = \frac{b\,(P_1 + P_0)}{P_1 \lceil \log_2 K \rceil + b\,P_0 + b\,K}$$
where $K$ is the codebook size; $P_1$ and $P_0$ are the number of multiplicative weights and biases, respectively; and we use 32-bit floats to represent real values (so $b = 32$). Note that it is important to quote the base value of $b$, or otherwise the compression ratio is arbitrary and can be inflated. For example, if we set $b = 64$ (double precision), all the compression ratios in our experiments would double.
Since for our nets $P_1 \gg P_0 + K$, we have $\rho(K) \approx b / \lceil \log_2 K \rceil = 32 / \lceil \log_2 K \rceil$.
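Numerically, with $K$ the codebook size, $P_1$ the number of multiplicative weights, $P_0$ the number of biases and $b = 32$ bits per float (a small helper of ours: the quantized net stores each weight in $\lceil \log_2 K \rceil$ bits, plus the biases and the $K$ codebook entries as $b$-bit floats):

```python
import math

def compression_ratio(K, P1, P0, b=32):
    """rho(K) = b (P1 + P0) / (P1 ceil(log2 K) + b P0 + b K)."""
    return b * (P1 + P0) / (P1 * math.ceil(math.log2(K)) + b * (P0 + K))
```

For example, with $P_1 = 10^6$ weights, $P_0 = 1000$ biases and $K = 2$, the ratio is about 31, close to the asymptotic value $32 / \lceil \log_2 K \rceil = 32$.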
5.1 Interplay between loss, model complexity and compression level
Firstly, we conduct a simple experiment to understand the interplay between loss, model complexity and compression level, here given by classification error, number of hidden units and codebook size, respectively. One important reason why compression is practically useful is that it may be better to train a large, accurate model and compress it than to train a smaller model and not compress it in the first place (there has been some empirical evidence supporting this, e.g. Denil et al., 2013). Also, many papers show that surprisingly large compression levels are possible with some neural nets (in several of our experiments with quantization, we can quantize all the way to one bit per weight with nearly no loss degradation). Should we expect very large compression levels without loss degradation in general?
The answer to these questions depends on the relation between loss, model complexity and compression. Here, we explore this experimentally in a simple setting: a classification neural net with inputs of dimension $D$, outputs of dimension $d$ (the number of classes) and $H$ fully connected hidden tanh units, trained to minimize the average cross-entropy. We use our LC algorithm to quantize the net using a codebook of size $K$. The size in bits of the resulting nets is as follows (assuming floating-point values of $b$ bits). For the reference (non-quantized) net: $b\,P_1$ (multiplicative weights) plus $b\,P_0$ (biases), total $b\,(P_1 + P_0)$. For a quantized net, it is the sum of $P_1 \lceil \log_2 K \rceil$ (for the quantized weights), $b\,P_0$ (for the non-quantized biases) and $b\,K$ (for the codebook), total $P_1 \lceil \log_2 K \rceil + b\,(P_0 + K)$.
We explore the space of optimal nets over $H$ and $K$ in order to determine the best operational point that achieves a target loss with the smallest net, that is, we want to solve the following optimization problem:
$$\min_{H,\,K}\; \#\text{bits}(H, K) \quad \text{s.t.}\quad \text{loss}(H, K) \le L_{\text{target}}.$$
We use the entire MNIST training set of 60 000 handwritten digit images, hence $D = 784$ and $d = 10$. We train a reference net of $H$ units and compress it using a codebook of size $K$, for