Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks

10/29/2020
by   Julieta Martinez, et al.

Compressing large neural networks is an important step for their deployment in resource-constrained computational platforms. In this context, vector quantization is an appealing framework that expresses multiple parameters using a single code, and has recently achieved state-of-the-art network compression on a range of core vision and natural language processing tasks. Key to the success of vector quantization is deciding which parameter groups should be compressed together. Previous work has relied on heuristics that group the spatial dimension of individual convolutional filters, but a general solution remains unaddressed. Such a solution is desirable for pointwise convolutions (which dominate modern architectures), linear layers (which have no notion of spatial dimension), and convolutions (when more than one filter is compressed to the same codeword). In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function. We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress. Finally, we rely on an annealed quantization algorithm to better compress the network and achieve higher final accuracy. We show results on image classification, object detection, and segmentation, reducing the gap with the uncompressed model by 40 to 70% with respect to the current state of the art.
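
To make the two key ideas concrete, here is a minimal NumPy sketch (not the authors' implementation): first the permutation invariance of two adjacent layers with an elementwise nonlinearity, then a toy vector quantizer that maps each parameter group to a single codeword. The layer shapes, the row-wise grouping, and the plain k-means step are illustrative assumptions; the paper instead searches over permutations and uses an annealed quantization algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer network: y = W2 @ relu(W1 @ x).
W1 = rng.standard_normal((8, 4))   # layer 1 weights: 4 inputs -> 8 units
W2 = rng.standard_normal((3, 8))   # layer 2 weights: 8 inputs -> 3 outputs
x = rng.standard_normal(4)

def forward(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# Permute the output units of layer 1 and, correspondingly, the input
# columns of layer 2. Because ReLU acts elementwise, the composed
# function is unchanged; this is the invariance the paper exploits.
perm = rng.permutation(8)
W1_p = W1[perm, :]
W2_p = W2[:, perm]
assert np.allclose(forward(W1, W2, x), forward(W1_p, W2_p, x))

# Toy vector quantization: treat each row of W1_p as one parameter
# group and replace it with the nearest of k codewords (plain k-means
# here; the paper's actual algorithm anneals this step).
def kmeans(X, k, iters=25):
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = X[assign == j].mean(axis=0)
    return C, assign

codebook, codes = kmeans(W1_p, k=2)
W1_q = codebook[codes]   # each 4-dim row is now stored as a single index
print("quantization error:", np.linalg.norm(W1_p - W1_q))
```

The point of searching over permutations is that functionally equivalent reorderings can make the parameter groups easier to cluster, so the same codebook budget yields a lower quantization error.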

Related research:

- Training with Quantization Noise for Extreme Model Compression (04/15/2020)
  We tackle the problem of producing compact models, maximizing their accu...
- Compressing Deep Convolutional Networks using Vector Quantization (12/18/2014)
  Deep convolutional neural networks (CNN) has become the most promising m...
- Compressed Object Detection (02/04/2021)
  Deep learning approaches have achieved unprecedented performance in visu...
- Wavelet Feature Maps Compression for Image-to-Image CNNs (05/24/2022)
  Convolutional Neural Networks (CNNs) are known for requiring extensive c...
- Vector Quantization for Machine Vision (03/30/2016)
  This paper shows how to reduce the computational cost for a variety of c...
- And the Bit Goes Down: Revisiting the Quantization of Neural Networks (07/12/2019)
  In this paper, we address the problem of reducing the memory footprint o...
