Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging

06/29/2023
by   Max Zimmer, et al.

Neural networks can be significantly compressed by pruning, leading to sparse models that require considerably less storage and fewer floating-point operations while maintaining predictive performance. Model soups (Wortsman et al., 2022) improve generalization and out-of-distribution performance by averaging the parameters of multiple models into a single one, without increasing inference time. However, identifying models in the same loss basin to leverage both sparsity and parameter averaging is challenging, as averaging arbitrary sparse models reduces the overall sparsity due to differing sparse connectivities. In this work, we address these challenges by demonstrating that exploring a single retraining phase of Iterative Magnitude Pruning (IMP) with varying hyperparameter configurations, such as batch ordering or weight decay, produces models that are suitable for averaging and share the same sparse connectivity by design. Averaging these models significantly enhances generalization performance compared to their individual components. Building on this idea, we introduce Sparse Model Soups (SMS), a novel method for merging sparse models by initiating each prune-retrain cycle with the averaged model of the previous phase. SMS preserves sparsity, exploits the benefits of sparse networks, is modular and fully parallelizable, and substantially improves IMP's performance. Additionally, we demonstrate that SMS can be adapted to enhance the performance of state-of-the-art pruning-during-training approaches.
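The prune-retrain-average cycle described above lends itself to a compact sketch. The following PyTorch code is a minimal, illustrative reading of the SMS idea, not the authors' implementation: the helper names (magnitude_mask, apply_mask, average_state_dicts), the sparsity schedule, the number of soup candidates, and the user-supplied retrain_fn are all assumptions chosen here for illustration.

```python
# Illustrative sketch of the Sparse Model Soups (SMS) idea in PyTorch.
# All helper names and hyperparameters below are assumptions for the sketch.
import copy
import torch
import torch.nn as nn


def magnitude_mask(model: nn.Module, sparsity: float) -> dict:
    # Global magnitude pruning: keep the (1 - sparsity) fraction of weights
    # with the largest absolute value; only multi-dimensional tensors
    # (conv/linear weights) are considered for pruning.
    scores = torch.cat([p.detach().abs().flatten()
                        for _, p in model.named_parameters() if p.dim() > 1])
    k = max(1, int((1.0 - sparsity) * scores.numel()))
    threshold = torch.topk(scores, k, largest=True).values.min()
    return {n: (p.detach().abs() >= threshold).float()
            for n, p in model.named_parameters() if p.dim() > 1}


def apply_mask(model: nn.Module, mask: dict) -> None:
    # Enforce the shared sparse connectivity in-place.
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in mask:
                p.mul_(mask[n])


def average_state_dicts(models: list) -> dict:
    # Uniform "soup": average the parameters of models sharing one mask.
    # Since all candidates have zeros in the same positions, the average
    # keeps exactly the same sparse connectivity.
    avg = copy.deepcopy(models[0].state_dict())
    for key, val in avg.items():
        if val.is_floating_point():
            avg[key] = torch.stack([m.state_dict()[key] for m in models]).mean(dim=0)
    return avg


def sparse_model_soups(model, retrain_fn, sparsities=(0.5, 0.75, 0.875),
                       num_candidates=3):
    # One prune -> retrain-several-candidates -> average cycle per phase.
    # retrain_fn(candidate, seed) is a user-supplied retraining routine;
    # varying the seed changes e.g. the batch ordering between candidates.
    for sparsity in sparsities:
        mask = magnitude_mask(model, sparsity)
        apply_mask(model, mask)
        candidates = []
        for seed in range(num_candidates):
            torch.manual_seed(seed)
            candidate = copy.deepcopy(model)
            retrain_fn(candidate, seed)
            apply_mask(candidate, mask)   # keep the connectivity fixed
            candidates.append(candidate)
        # The averaged model seeds the next prune-retrain cycle.
        model.load_state_dict(average_state_dicts(candidates))
    return model
```

Because every candidate is projected onto the same mask before averaging, the uniform average has zeros in exactly the same positions and sparsity is preserved across phases; the candidate retraining runs are independent of one another and can therefore be executed in parallel.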


Related research

05/24/2023 · SWAMP: Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
Given the ever-increasing size of modern neural networks, the significan...

10/15/2020 · A Deeper Look at the Layerwise Sparsity of Magnitude-based Pruning
Recent discoveries on neural network pruning reveal that, with a careful...

08/21/2020 · Training Sparse Neural Networks using Compressed Sensing
Pruning the weights of neural networks is an effective and widely-used t...

06/12/2020 · Dynamic Model Pruning with Feedback
Deep neural networks often have millions of parameters. This can hinder ...

10/01/2021 · Powerpropagation: A sparsity inducing weight reparameterisation
The training of sparse neural networks is becoming an increasingly impor...

03/02/2023 · Average of Pruning: Improving Performance and Stability of Out-of-Distribution Detection
Detecting Out-of-distribution (OOD) inputs have been a critical issue fo...

06/09/2023 · How Sparse Can We Prune A Deep Network: A Geometric Viewpoint
Overparameterization constitutes one of the most significant hallmarks o...
