The Quest of Finding the Antidote to Sparse Double Descent

08/31/2023
by Victor Quétu, et al.

In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as a model's sparsity increases, its performance first worsens, then improves, and finally deteriorates again. Such non-monotonic behavior raises serious questions about the optimal model size needed to maintain high performance: the model must be sufficiently over-parametrized, but too many parameters waste training resources. In this paper, we aim to find this trade-off efficiently. More precisely, we tackle the occurrence of the sparse double descent and present solutions to avoid it. First, we show that a simple ℓ_2 regularization method can mitigate this phenomenon, but at the cost of the performance/sparsity trade-off. To overcome this problem, we then introduce a learning scheme in which knowledge distillation regularizes the student model. Supported by experimental results obtained on typical image classification setups, we show that this approach avoids the sparse double descent.


Related research

02/26/2023 · Can we avoid Double Descent in Deep Neural Networks?
Finding the optimal size of deep learning models is very actual and of b...

07/26/2023 · Sparse Double Descent in Vision Transformers: real or phantom threat?
Vision transformers (ViT) have been of broad interest in recent theoreti...

03/02/2023 · Dodging the Sparse Double Descent
This paper presents an approach to addressing the issue of over-parametr...

06/17/2022 · Sparse Double Descent: Where Network Pruning Aggravates Overfitting
People usually believe that network pruning not only reduces the computa...

05/25/2023 · Dropout Drops Double Descent
In this paper, we find and analyze that we can easily drop the double de...

08/23/2016 · Deep Double Sparsity Encoder: Learning to Sparsify Not Only Features But Also Parameters
This paper emphasizes the significance to jointly exploit the problem st...

02/18/2022 · Geometric Regularization from Overparameterization explains Double Descent and other findings
The volume of the distribution of possible weight configurations associa...
