Sparse Double Descent in Vision Transformers: real or phantom threat?

07/26/2023
by Victor Quétu, et al.

Vision Transformers (ViTs) have attracted broad interest in recent theoretical and empirical work. Their attention-based design, which imposes weaker inductive biases than convolutional architectures, helps them identify key features and patterns within images and achieve state-of-the-art accuracy in image analysis. Meanwhile, recent studies have reported a "sparse double descent" phenomenon in modern deep-learning models, in which, counter-intuitively, extremely over-parametrized models can generalize well even as they are heavily pruned. This raises practical questions about the optimal model size and launches the quest for the best trade-off between sparsity and performance: are Vision Transformers also prone to sparse double descent, and can such a phenomenon be avoided? Our work investigates the occurrence of sparse double descent in ViTs. While prior work has shown that traditional architectures such as ResNet are condemned to the sparse double descent phenomenon, for ViTs we observe that an optimally-tuned ℓ_2 regularization relieves it. However, everything comes at a cost: the optimal λ sacrifices the potential compression of the ViT.
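The two ingredients the abstract juxtaposes, pruning a model to a target sparsity and penalizing weight norms with ℓ_2 regularization, can be illustrated with a minimal sketch. This is not the paper's pipeline: it assumes one-shot magnitude pruning and a plain weight-decay penalty, and the function names are illustrative.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries so that a `sparsity`
    fraction of the weights becomes zero (one-shot magnitude pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude acts as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def l2_penalty(weights, lam):
    """Weight-decay term lam * ||w||_2^2, added to the training loss;
    tuning lam is what the abstract refers to as choosing the optimal λ."""
    return lam * np.sum(weights ** 2)

# Example: pruning half the weights removes the two smallest-magnitude entries.
w = np.array([[0.5, -0.01], [0.002, -1.2]])
pruned = magnitude_prune(w, 0.5)
penalty = l2_penalty(w, 0.1)
```

In a real experiment this prune step would be applied iteratively at increasing sparsity levels, retraining in between, to trace out the sparse double descent curve.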


Related research

08/31/2023 · The Quest of Finding the Antidote to Sparse Double Descent
In energy-efficient schemes, finding the optimal size of deep learning m...

02/26/2023 · Can we avoid Double Descent in Deep Neural Networks?
Finding the optimal size of deep learning models is very actual and of b...

12/11/2020 · Avoiding The Double Descent Phenomenon of Random Feature Models Using Hybrid Regularization
We demonstrate the ability of hybrid regularization methods to automatic...

07/15/2023 · Does Double Descent Occur in Self-Supervised Learning?
Most investigations into double descent have focused on supervised model...

11/18/2022 · Understanding the double descent curve in Machine Learning
The theory of bias-variance used to serve as a guide for model selection...

03/02/2023 · Dodging the Sparse Double Descent
This paper presents an approach to addressing the issue of over-parametr...

10/13/2022 · Vision Transformers provably learn spatial structure
Vision Transformers (ViTs) have achieved comparable or superior performa...
