Differentiable Model Compression via Pseudo Quantization Noise

04/20/2021
by Alexandre Défossez et al.

We propose to add independent pseudo quantization noise to model parameters during training to approximate the effect of a quantization operator. This method, DiffQ, is differentiable both with respect to the unquantized parameters and to the number of bits used. Given a single hyper-parameter expressing the desired balance between quantized model size and accuracy, DiffQ can optimize the number of bits used per individual weight or group of weights in a single training run. We experimentally verify that our method outperforms state-of-the-art quantization techniques on several benchmarks and architectures for image classification, language modeling, and audio source separation. For instance, on the Wikitext-103 language modeling benchmark, DiffQ compresses a 16-layer transformer model by a factor of 8, equivalent to 4-bit precision, while losing only 0.5 points of perplexity. Code is available at: https://github.com/facebookresearch/diffq
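
The abstract leaves the exact noise model implicit, but the core mechanism can be sketched concretely. Below is a minimal PyTorch illustration, assuming a uniform min-max quantizer with step size (max − min) / (2^b − 1) and additive uniform noise of matching scale; the function name `pseudo_quant_noise`, the size-penalty form, and the weight `lam` are illustrative assumptions, not DiffQ's actual API, and DiffQ learns bits per weight group rather than the single scalar used here for brevity.

```python
import torch

def pseudo_quant_noise(w: torch.Tensor, bits: torch.Tensor) -> torch.Tensor:
    """Add uniform noise matching the step of a min-max uniform quantizer.

    The result is differentiable with respect to both `w` and `bits`,
    since `bits` only enters through the smooth step size `delta`.
    """
    delta = (w.max() - w.min()) / (2.0 ** bits - 1.0)  # quantization step
    noise = torch.empty_like(w).uniform_(-0.5, 0.5)    # ~ U[-1/2, 1/2], no grad
    return w + delta * noise

# Illustrative usage: `bits` is learned jointly with the weights, and a
# penalty proportional to the model size (in bits) drives it down.
w = torch.randn(256, 256, requires_grad=True)
bits = torch.tensor(8.0, requires_grad=True)

task_loss = pseudo_quant_noise(w, bits).pow(2).mean()  # stand-in for the real loss
size_penalty = bits * w.numel()                        # model size in bits
lam = 1e-7                                             # hypothetical size/accuracy trade-off
loss = task_loss + lam * size_penalty
loss.backward()  # gradients flow to both `w` and `bits`
```

Because the noise scale shrinks smoothly as `bits` grows, gradient descent on this combined loss can trade model size against accuracy through the single hyper-parameter, which is the behavior the abstract describes.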

Related research

11/24/2021 · Sharpness-aware Quantization for Deep Neural Networks
12/11/2022 · Error-aware Quantization through Noise Tempering
04/15/2020 · Training with Quantization Noise for Extreme Model Compression
05/23/2021 · Post-Training Sparsity-Aware Quantization
09/07/2019 · LAMAL: LAnguage Modeling Is All You Need for Lifelong Language Learning
02/15/2021 · FAT: Learning Low-Bitwidth Parametric Representation via Frequency-Aware Transformation
