Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

02/28/2023
by   Riade Benbaki, et al.

The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning. CHITA's main workhorse performs combinatorial optimization updates on a memory-friendly representation of local quadratic approximation(s) of the loss function. On a standard benchmark of pretrained models and datasets, CHITA leads to significantly better sparsity-accuracy tradeoffs than competing methods. For example, for MLPNet with only 2% of the weights retained, our approach improves the accuracy by 63% relative to the state of the art. Furthermore, when used in conjunction with fine-tuning SGD steps, our method achieves significant accuracy gains over state-of-the-art approaches.
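To make the idea concrete, here is a minimal sketch of sparsity-constrained pruning via a local quadratic proxy, solved with iterative hard thresholding. This is an illustrative toy, not the authors' CHITA implementation: the function name `iht_prune`, the layer-wise objective ||Xw − Xw0||², and the fixed step size are all assumptions made for the example.

```python
import numpy as np

def iht_prune(X, w0, k, steps=100, lr=None):
    """Prune dense weights w0 down to k nonzeros by (approximately)
    minimizing the local quadratic proxy ||X w - X w0||^2 subject to
    ||w||_0 <= k, using iterative hard thresholding (IHT).
    Illustrative sketch only -- not the CHITA algorithm itself."""
    y = X @ w0                 # target activations of the dense layer
    w = w0.copy()
    if lr is None:
        # step size 1/L, where L = ||X||_2^2 bounds the Hessian's top eigenvalue
        lr = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(steps):
        grad = X.T @ (X @ w - y)       # gradient of the quadratic proxy
        w = w - lr * grad              # gradient step
        idx = np.argsort(np.abs(w))[:-k]
        w[idx] = 0.0                   # hard-threshold: keep k largest weights
    return w
```

The hard-thresholding step is what makes this a (simple) combinatorial update: rather than pruning weights one at a time, the support of size k is re-selected jointly at every iteration while the surviving weights are updated to compensate.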


Related research

05/15/2020 · Movement Pruning: Adaptive Sparsity by Fine-Tuning
Magnitude pruning is a widely used strategy for reducing model size in p...

09/10/2019 · Differentiable Mask Pruning for Neural Networks
Pruning of neural networks is one of the well-known and promising model ...

03/09/2022 · The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks
Neural networks tend to achieve better accuracy with training if they ar...

03/03/2023 · R-TOSS: A Framework for Real-Time Object Detection using Semi-Structured Pruning
Object detectors used in autonomous vehicles can have high memory and co...

11/23/2020 · Synthesis and Pruning as a Dynamic Compression Strategy for Efficient Deep Neural Networks
The brain is a highly reconfigurable machine capable of task-specific ad...

05/24/2016 · Local Perturb-and-MAP for Structured Prediction
Conditional random fields (CRFs) provide a powerful tool for structured ...

08/04/2023 · Pruning a neural network using Bayesian inference
Neural network pruning is a highly effective technique aimed at reducing...
