A flexible, extensible software framework for model compression based on the LC algorithm

05/15/2020
by Yerlan Idelbayev et al.

We propose a software framework based on the ideas of the Learning-Compression (LC) algorithm that allows a user to compress a neural network or other machine learning model using different compression schemes with minimal effort. Currently, the supported compressions include pruning, quantization, low-rank methods (including automatically learning the layer ranks), and combinations of these, and the user can choose different compression types for different parts of a neural network. The LC algorithm alternates two types of steps until convergence: a learning (L) step, which trains a model on a dataset (using an algorithm such as SGD); and a compression (C) step, which compresses the model parameters (using a compression scheme such as low-rank or quantization). This decoupling of the "machine learning" aspect from the "signal compression" aspect means that changing the model or the compression type amounts to calling the corresponding subroutine in the L or C step, respectively. The library fully supports this by design, which makes it flexible and extensible. This does not come at the expense of performance: the runtime needed to compress a model is comparable to that of training the model in the first place, and the compressed model is competitive in terms of prediction accuracy and compression ratio with other algorithms (which are often specialized for specific models or compression schemes). The library is written in Python and PyTorch and is available on GitHub.
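To make the L/C alternation concrete, here is a minimal sketch of one possible instantiation in PyTorch: the L step trains the model with a quadratic penalty coupling the weights to their compressed counterparts, and the C step compresses the current weights, here with truncated SVD (low-rank). The names (lc_compress, truncated_svd) and hyperparameters are illustrative assumptions for this sketch, not the library's actual API.

import torch

def truncated_svd(W, rank):
    # C step for one weight matrix: best rank-`rank` approximation of W.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vh[:rank]

def lc_compress(model, loss_fn, loader, rank=8, mu=1e-3,
                outer_steps=10, l_epochs=1):
    # Illustrative LC loop: compress only 2-D weight matrices.
    params = [p for p in model.parameters() if p.dim() == 2]
    thetas = [p.detach().clone() for p in params]  # compressed targets
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(outer_steps):
        # L step: train with a penalty (mu/2) * ||w - theta||^2 per matrix.
        for _ in range(l_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                loss = loss + sum(0.5 * mu * (p - t).pow(2).sum()
                                  for p, t in zip(params, thetas))
                loss.backward()
                opt.step()
        # C step: compress the current weights (low-rank in this sketch).
        thetas = [truncated_svd(p.detach(), rank) for p in params]
        mu *= 1.5  # gradually tighten the coupling between w and theta
    # After the final C step, replace the weights with their compressed values.
    with torch.no_grad():
        for p, t in zip(params, thetas):
            p.copy_(t)
    return model

Note how the decoupling shows up in the code: swapping truncated_svd for a pruning or quantization subroutine changes only the C step, leaving the L step and the outer loop untouched.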

Related research

07/05/2017
Model compression as constrained optimization, with application to neural nets. Part I: general framework
Compressing neural nets is an active research problem, given the large s...

10/30/2018
DeepTwist: Learning Model Compression via Occasional Weight Distortion
Model compression has been introduced to reduce the required hardware re...

07/09/2021
Model compression as constrained optimization, with application to neural nets. Part V: combining compressions
Model compression is generally performed by using quantization, low-rank...

09/26/2021
Quantization for Distributed Optimization
Massive amounts of data have led to the training of large-scale machine ...

05/06/2022
Online Model Compression for Federated Learning with Large Models
This paper addresses the challenges of training large neural network mod...

06/14/2018
Scalable Neural Network Compression and Pruning Using Hard Clustering and L1 Regularization
We propose a simple and easy to implement neural network compression alg...

08/04/2020
PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning
Lossy gradient compression has become a practical tool to overcome the c...
