Sparse Double Descent: Where Network Pruning Aggravates Overfitting

06/17/2022
by Zheng He, et al.

It is widely believed that network pruning not only reduces the computational cost of deep networks, but also prevents overfitting by decreasing model capacity. However, our work surprisingly discovers that network pruning can sometimes even aggravate overfitting. We report an unexpected sparse double descent phenomenon: as we increase model sparsity via network pruning, test performance first gets worse (due to overfitting), then gets better (due to relieved overfitting), and finally gets worse again (due to forgetting useful information). While recent studies have focused on deep double descent with respect to model overparameterization, they have not recognized that sparsity may also cause double descent. In this paper, we make three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. Second, to explain this phenomenon, we propose a learning distance interpretation: the curve of the ℓ_2 learning distance of sparse models (from initialized parameters to final parameters) correlates well with the sparse double descent curve and may reflect generalization better than minima flatness. Third, in the context of sparse double descent, a winning ticket in the lottery ticket hypothesis surprisingly may not always win.
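For intuition only, here is a minimal sketch of the two ingredients the abstract refers to: sweeping model sparsity via one-shot global magnitude pruning, and measuring the ℓ_2 learning distance ‖θ_final − θ_init‖_2 from a sparse model's initialization to its final parameters. The toy MLP, the sparsity grid, and the omitted train() step are placeholders, not the authors' experimental setup.

```python
# Illustrative sketch, not the paper's code: sweep sparsity with one-shot
# global magnitude pruning and report the l2 learning distance at each level.
import copy
import torch
import torch.nn as nn

def global_magnitude_masks(model, sparsity):
    """Build binary masks that zero out the smallest-magnitude fraction
    `sparsity` of all parameters (real setups usually prune weights only)."""
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters()])
    threshold = torch.quantile(all_weights, sparsity)
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters()}

def apply_masks(model, masks):
    """Zero out the pruned entries in place."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.mul_(masks[name])

def l2_learning_distance(init_params, model):
    """||theta_final - theta_init||_2: how far the sparse model travels
    from its (masked) initialization to its final parameters."""
    sq = 0.0
    for name, p in model.named_parameters():
        sq += (p.detach() - init_params[name]).pow(2).sum().item()
    return sq ** 0.5

base = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
for sparsity in [0.5, 0.9, 0.99]:                 # sweep model sparsity
    model = copy.deepcopy(base)
    masks = global_magnitude_masks(model, sparsity)
    apply_masks(model, masks)                     # prune at initialization
    init_params = {n: p.detach().clone() for n, p in model.named_parameters()}
    # train(model, masks) would go here, keeping masked weights at zero
    print(f"sparsity={sparsity}: distance={l2_learning_distance(init_params, model):.3f}")
```

Under the paper's interpretation, plotting this distance against sparsity should trace a curve resembling the test-error curve itself, which is what makes it a candidate generalization indicator.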


Related research

Deep Double Descent: Where Bigger Models and More Data Hurt (12/04/2019)
We show that a variety of modern deep learning tasks exhibit a "double-d...

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks (12/16/2020)
Deep networks are typically trained with many more parameters than the s...

Towards Understanding the Overfitting Phenomenon of Deep Click-Through Rate Prediction Models (09/04/2022)
Deep learning techniques have been applied widely in industrial recommen...

Does Double Descent Occur in Self-Supervised Learning? (07/15/2023)
Most investigations into double descent have focused on supervised model...

Understanding the double descent curve in Machine Learning (11/18/2022)
The theory of bias-variance used to serve as a guide for model selection...

Deep Double Sparsity Encoder: Learning to Sparsify Not Only Features But Also Parameters (08/23/2016)
This paper emphasizes the significance to jointly exploit the problem st...

The Quest of Finding the Antidote to Sparse Double Descent (08/31/2023)
In energy-efficient schemes, finding the optimal size of deep learning m...
