Model compression as constrained optimization, with application to neural nets. Part V: combining compressions

Model compression is generally performed by quantization, low-rank approximation or pruning, for which various algorithms have been proposed in recent years. One fundamental question is: which types of compression work best for a given model? Or, even better: can we improve by combining compressions in a suitable way? We formulate this generally as the problem of optimizing the loss subject to the constraint that the weights equal an additive combination of separately compressed parts, and we give an algorithm to learn the parameters of those parts. Experimentally with deep neural nets, we observe that 1) we can find significantly better models in the error-compression space, indicating that different compression types have complementary benefits, and 2) the best type of combination depends exquisitely on the type of neural net. For example, we can compress ResNets and AlexNet to just 1 bit per weight without degrading the error, at the cost of adding a few floating-point weights. VGG nets, however, are better compressed by combining low rank with a few floating-point weights.
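For concreteness, the constrained formulation above can be written out. The following is a minimal sketch in the decompression-mapping notation of the LC framework of Part I: L is the training loss, w the model weights, and each low-dimensional parameter vector θ_k defines one compressed part via a decompression mapping Δ_k (e.g. a codebook and assignments for quantization, or factor matrices for low rank).

```latex
% Loss optimization with the weights constrained to equal an additive
% combination of K separately compressed parts:
\min_{\mathbf{w},\,\boldsymbol{\theta}_1,\dots,\boldsymbol{\theta}_K}
  L(\mathbf{w})
\quad \text{s.t.} \quad
  \mathbf{w} = \sum_{k=1}^{K} \boldsymbol{\Delta}_k(\boldsymbol{\theta}_k)

% A standard way to handle the constraint (the splitting the LC
% framework is built on) is a quadratic penalty, optimized by
% alternating an L step over w and a C step over the theta_k while
% the penalty parameter mu is driven to infinity:
Q(\mathbf{w},\{\boldsymbol{\theta}_k\};\mu) =
  L(\mathbf{w}) + \frac{\mu}{2}
  \Bigl\| \mathbf{w} - \sum_{k=1}^{K}
  \boldsymbol{\Delta}_k(\boldsymbol{\theta}_k) \Bigr\|^2
```

In the C step the loss drops out, so the parts are fitted to the current weights in the squared-error sense. For the "1 bit per weight plus a few floating-point weights" combination mentioned above, that fit can be done by alternating two closed-form steps: a scale times a sign pattern for the binary part, and the top-k residual entries for the sparse part. Below is a hypothetical NumPy sketch; the function name fit_additive_parts, the single per-tensor scale and the fixed alternation schedule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fit_additive_parts(w, k_sparse, n_iters=20):
    """Fit w ~= q + s, where q = a*sign(.) is a 1-bit quantized part
    (one float scale a per tensor) and s is a sparse floating-point
    correction with at most k_sparse nonzero entries. Hypothetical
    sketch of an additive-combination C step, not the paper's code."""
    s = np.zeros_like(w)
    q = np.zeros_like(w)
    for _ in range(n_iters):
        # Binary part: min_a ||r - a*sign(r)||^2 over the residual
        # r = w - s has the closed-form solution a = mean(|r|).
        r = w - s
        a = np.abs(r).mean()
        q = a * np.sign(r)
        # Sparse part: the best k-sparse L2 fit to the residual
        # w - q keeps its k largest-magnitude entries.
        r = w - q
        s = np.zeros_like(w)
        top = np.argsort(np.abs(r))[-k_sparse:]
        s[top] = r[top]
    return q, s

# Example: 10,000 weights -> 1 bit each, plus 50 floats of correction.
rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)
q, s = fit_additive_parts(w, k_sparse=50)
print("relative fit error:", np.linalg.norm(w - (q + s)) / np.linalg.norm(w))
```

Both inner steps are exact L2 fits (a = mean(|r|) minimizes ||r - a*sign(r)||², and keeping the k largest-magnitude residual entries is the best k-sparse approximation), so the alternation cannot increase the fitting error.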

Related research

07/05/2017 · Model compression as constrained optimization, with application to neural nets. Part I: general framework
Compressing neural nets is an active research problem, given the large s...

07/13/2017 · Model compression as constrained optimization, with application to neural nets. Part II: quantization
We consider the problem of deep neural net compression by quantization: ...

10/30/2018 · DeepTwist: Learning Model Compression via Occasional Weight Distortion
Model compression has been introduced to reduce the required hardware re...

05/15/2020 · A flexible, extensible software framework for model compression based on the LC algorithm
We propose a software framework based on the ideas of the Learning-Compr...

03/29/2021 · Deep Compression for PyTorch Model Deployment on Microcontrollers
Neural network deployment on low-cost embedded systems, hence on microco...

11/21/2022 · Learning Low-Rank Representations for Model Compression
Vector Quantization (VQ) is an appealing model compression method to obt...

07/22/2020 · Compressing invariant manifolds in neural nets
We study how neural networks compress uninformative input space in model...
