Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

08/24/2022
by Elias Frantar, et al.

We consider the problem of model compression for deep neural networks (DNNs) in the challenging post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data. This problem has become popular in view of the emerging software and hardware support for executing models compressed via pruning and/or quantization with speedup, and well-performing solutions have been proposed independently for both compression approaches. In this paper, we introduce a new compression framework which covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves upon the practical performance of existing post-training methods. At the technical level, our approach is based on the first exact and efficient realization of the classical Optimal Brain Surgeon (OBS) framework of [LeCun, Denker, and Solla, 1990] at the scale of modern DNNs, which we further extend to cover weight quantization. This is enabled by a series of algorithmic developments which may be of independent interest. From the practical perspective, our experimental results show that it can improve significantly upon the compression-accuracy trade-offs of existing post-training methods, and that it can even enable the accurate joint application of both pruning and quantization in a post-training setting.
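
For intuition, the classical single-weight OBS step that the paper builds on can be sketched in a few lines of NumPy. This is a minimal illustration of the textbook update, not the paper's algorithm: the function names, the assumption of a precomputed inverse Hessian H_inv, and the grid parameter are ours, and the quantization variant simply swaps "set the chosen weight to zero" for "round it to the nearest grid point".

```python
import numpy as np

def obs_prune_one_weight(w, H_inv):
    """Zero the single weight whose removal least increases the layer
    loss, then compensate the remaining weights (the classical
    Optimal Brain Surgeon update)."""
    # Saliency of zeroing weight q: w_q^2 / (2 * [H^{-1}]_{qq})
    q = int(np.argmin(w ** 2 / (2 * np.diag(H_inv))))
    # Compensation: delta_w = -(w_q / [H^{-1}]_{qq}) * H^{-1} e_q
    w = w - (w[q] / H_inv[q, q]) * H_inv[:, q]
    w[q] = 0.0  # enforce exact removal despite floating-point error
    return w, q

def obs_quantize_one_weight(w, H_inv, grid):
    """The same update with zeroing replaced by rounding to the
    nearest point of a quantization grid (an illustrative sketch)."""
    # Nearest grid value for every weight
    w_rounded = grid[np.abs(grid[None, :] - w[:, None]).argmin(axis=1)]
    err = w - w_rounded
    # Saliency of rounding weight q: err_q^2 / (2 * [H^{-1}]_{qq})
    q = int(np.argmin(err ** 2 / (2 * np.diag(H_inv))))
    w = w - (err[q] / H_inv[q, q]) * H_inv[:, q]
    w[q] = w_rounded[q]  # the chosen weight lands exactly on the grid
    return w, q
```

Repeating such steps compresses a layer one weight at a time, but doing so naively also requires recomputing the inverse Hessian after every choice; making that loop exact and efficient at the scale of modern DNNs is where the paper's algorithmic contributions lie.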


Related research

05/23/2022
OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
As Deep Neural Networks (DNNs) usually are overparameterized and have mi...

07/28/2022
CrAM: A Compression-Aware Minimizer
We examine the question of whether SGD-based optimization of deep neural...

07/20/2020
Differentiable Joint Pruning and Quantization for Hardware Efficiency
We present a differentiable joint pruning and quantization (DJPQ) scheme...

01/31/2022
SPDY: Accurate Pruning with Speedup Guarantees
The recent focus on the efficiency of deep neural networks (DNNs) has le...

10/06/2020
Characterising Bias in Compressed Models
The popularity and widespread use of pruning and quantization is driven ...

04/30/2021
Post-training deep neural network pruning via layer-wise calibration
We present a post-training weight pruning method for deep neural network...

03/13/2020
A Privacy-Preserving DNN Pruning and Mobile Acceleration Framework
To facilitate the deployment of deep neural networks (DNNs) on resource-...
