Model compression as constrained optimization, with application to neural nets. Part I: general framework

Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model compression as constrained optimization. This includes many types of compression: quantization, low-rank decomposition, pruning, lossless compression and others. Then, we give a general algorithm to optimize this nonconvex problem based on the augmented Lagrangian and alternating optimization. This results in a "learning-compression" algorithm, which alternates a learning step of the uncompressed model, independent of the compression type, with a compression step of the model parameters, independent of the learning task. This simple, efficient algorithm is guaranteed to find the best compressed model for the task in a local sense under standard assumptions. We present separately in several companion papers the development of this general framework into specific algorithms for model compression based on quantization, pruning and other variations, including experimental results on compressing neural nets and other models.
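To make the formulation concrete, here is a sketch of the constrained problem and its augmented Lagrangian as the abstract describes them; the notation (weights w, compression parameters Θ, decompression mapping Δ) is our own shorthand, and any symbol not named in the abstract should be read as an assumption:

```latex
% Model compression as constrained optimization: w are the uncompressed
% weights, \Theta the low-dimensional compression parameters, and
% \Delta the decompression mapping (e.g., codebook lookup for quantization).
\min_{\mathbf{w},\,\boldsymbol{\Theta}} \; L(\mathbf{w})
  \quad \text{s.t.} \quad \mathbf{w} = \boldsymbol{\Delta}(\boldsymbol{\Theta})

% Augmented Lagrangian with multiplier estimates \lambda and penalty \mu > 0:
\mathcal{L}_A(\mathbf{w},\boldsymbol{\Theta},\boldsymbol{\lambda};\mu) =
  L(\mathbf{w})
  - \boldsymbol{\lambda}^{\top}\bigl(\mathbf{w}-\boldsymbol{\Delta}(\boldsymbol{\Theta})\bigr)
  + \tfrac{\mu}{2}\,\bigl\|\mathbf{w}-\boldsymbol{\Delta}(\boldsymbol{\Theta})\bigr\|^{2}
```

Alternating optimization of this Lagrangian yields the two steps the abstract names: an L step that trains the uncompressed weights with a quadratic pull toward the current compressed model, and a C step that compresses the current weights independently of the learning task. Below is a minimal Python sketch of that loop, assuming hypothetical `loss_grad`, `compress`, and `decompress` callables; these names are placeholders for illustration, not the paper's API:

```python
import numpy as np

def lc_algorithm(w, loss_grad, compress, decompress, mu_schedule,
                 n_inner=200, lr=1e-3):
    """Learning-compression (LC) loop: alternate L and C steps on the
    augmented Lagrangian while the penalty mu is driven upward.

    w           flat array of uncompressed model weights
    loss_grad   w -> gradient of the task loss L(w)
    compress    v -> Theta, solving min_Theta ||v - decompress(Theta)||^2
                (e.g., k-means for quantization, SVD for low-rank)
    decompress  Theta -> weights of the compressed model
    mu_schedule increasing penalty values mu_1 < mu_2 < ...
    """
    theta = compress(w)                  # initialize from direct compression
    lam = np.zeros_like(w)               # Lagrange multiplier estimates
    for mu in mu_schedule:
        # L step: gradient descent on
        # L(w) + (mu/2)||w - decompress(theta) - lam/mu||^2.
        # Ordinary training plus a quadratic term, so it is
        # independent of the compression type.
        for _ in range(n_inner):
            g = loss_grad(w) + mu * (w - decompress(theta) - lam / mu)
            w = w - lr * g
        # C step: compress the shifted weights; independent of the task.
        theta = compress(w - lam / mu)
        # Multiplier update for the augmented Lagrangian.
        lam -= mu * (w - decompress(theta))
    return decompress(theta), theta
```

As mu grows, the constraint w = Δ(Θ) is enforced progressively, which is how the iterates approach a compressed model that is locally optimal for the task under the standard assumptions the abstract mentions.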

Related research

- Model compression as constrained optimization, with application to neural nets. Part V: combining compressions (07/09/2021)
- A flexible, extensible software framework for model compression based on the LC algorithm (05/15/2020)
- TamperNN: Efficient Tampering Detection of Deployed Neural Nets (03/01/2019)
- Model compression as constrained optimization, with application to neural nets. Part II: quantization (07/13/2017)
- L-GreCo: An Efficient and General Framework for Layerwise-Adaptive Gradient Compression (10/31/2022)
- Entropy-Constrained Training of Deep Neural Networks (12/18/2018)
- The Convex Information Bottleneck Lagrangian (11/25/2019)
