Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

12/16/2020
by Xiangyu Chang, et al.

Deep networks are typically trained with many more parameters than the size of the training dataset. Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists, perhaps counterintuitively, in building lightweight models. Specifically, it suggests that overparameterization benefits model pruning/sparsification. This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning in the overparameterized regime. The theory presented addresses the following core question: "Should one train a small model from the beginning, or first train a large model and then prune?" We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning rather than simply training with the known informative features. This leads to a new double descent in the training of sparse models: growing the original model, while preserving the target sparsity, improves the test accuracy as one moves beyond the overparameterization threshold. Our analysis further reveals the benefit of retraining by relating it to feature correlations. We find that the above phenomena are already present in linear and random-features models. Our technical approach advances the toolset of high-dimensional analysis and precisely characterizes the asymptotic distribution of overparameterized least squares. The intuition gained by analytically studying simpler models is numerically verified on neural networks.
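As a rough illustration of the comparison the abstract poses (train a small model from the start vs. train a large model and then prune), the sketch below fits a minimum-norm least-squares model in a toy linear setting, magnitude-prunes it to a target sparsity, and retrains on the retained features. This is not the paper's experiment: the dimensions, noise level, and magnitude-based pruning rule are assumptions made only for illustration.

```python
# Minimal sketch (illustrative assumptions, not the paper's setup):
# compare "train small on known informative features" against
# "train large, then prune to the same sparsity, then retrain".
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 400, 20           # samples, total features, target sparsity (hypothetical)
beta = np.zeros(p)
beta[:s] = rng.normal(size=s)    # informative features are the first s coordinates
sigma = 0.5                      # label noise level (hypothetical)

def sample(m):
    X = rng.normal(size=(m, p))
    y = X @ beta + sigma * rng.normal(size=m)
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(2000)

def min_norm_fit(X, y):
    # Minimum-l2-norm least-squares solution; the pseudo-inverse covers
    # both the under- and overparameterized regimes.
    return np.linalg.pinv(X) @ y

def test_error(w, cols):
    return np.mean((X_test[:, cols] @ w - y_test) ** 2)

# (a) Oracle small model: train only on the s known informative features.
w_small = min_norm_fit(X_train[:, :s], y_train)

# (b) Train large, then prune: fit all p features, keep the s largest
#     coefficients in magnitude, and optionally retrain on that support.
w_big = min_norm_fit(X_train, y_train)
support = np.argsort(np.abs(w_big))[-s:]
w_pruned = np.zeros(p)
w_pruned[support] = w_big[support]
w_retrained = min_norm_fit(X_train[:, support], y_train)

print("small model       :", test_error(w_small, np.arange(s)))
print("prune, no retrain :", test_error(w_pruned, np.arange(p)))
print("prune + retrain   :", test_error(w_retrained, support))
```

Sweeping the number of fitted features p while holding the target sparsity s fixed is the kind of experiment in which the sparse double descent described above would be expected to appear.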


Related research

06/17/2022 · Sparse Double Descent: Where Network Pruning Aggravates Overfitting
People usually believe that network pruning not only reduces the computa...

03/19/2019 · Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Interpolators -- estimators that achieve zero training error -- have att...

12/04/2019 · Deep Double Descent: Where Bigger Models and More Data Hurt
We show that a variety of modern deep learning tasks exhibit a "double-d...

02/21/2020 · Generalisation error in learning with random features and the hidden manifold model
We study generalised linear regression and classification for a syntheti...

09/21/2022 · Deep Double Descent via Smooth Interpolation
Overparameterized deep networks are known to be able to perfectly fit th...

06/21/2023 · Quantifying lottery tickets under label noise: accuracy, calibration, and complexity
Pruning deep neural networks is a widely used strategy to alleviate the ...

09/22/2022 · EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models
Neural models are known to be over-parameterized, and recent work has sh...
