Guillotine Regularization: Improving Deep Networks Generalization by Removing their Head

06/27/2022
by Florian Bordes, et al.

One unexpected technique that emerged in recent years consists of training a Deep Network (DN) with a Self-Supervised Learning (SSL) method and then using this network on downstream tasks with its last few layers entirely removed. This usually overlooked trick is actually critical for SSL methods to achieve competitive performance. For example, on ImageNet classification, more than 30 percentage points can be gained that way. This is somewhat vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last layer) would be the one to use for the best downstream generalization performance. But it seems not to be, and this study sheds some light on why. This trick, which we name Guillotine Regularization (GR), is in fact a generically applicable form of regularization that has also been used to improve generalization performance in transfer-learning scenarios. In this work, through theory and experiments, we formalize GR and identify the underlying reasons behind its success in SSL methods. Our study shows that this trick is essential to SSL performance for two main reasons: (i) improper data augmentations used to define the positive pairs during training, and/or (ii) suboptimal selection of the hyper-parameters of the SSL loss.
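The sketch below illustrates the idea in PyTorch, under assumptions not taken from the paper: the module names (SSLModel, backbone, projector, linear_probe) and the toy dimensions are illustrative only. The point it shows is the structural one from the abstract: the SSL loss is applied to the projector output during pretraining, while Guillotine Regularization evaluates downstream tasks on the backbone output, i.e. with the last few layers removed.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of Guillotine Regularization (not the authors' code):
# an SSL model is trained as backbone + projector, but downstream evaluation
# uses the backbone representation, with the projector "guillotined" off.

class SSLModel(nn.Module):
    def __init__(self, in_dim=512, feat_dim=256, proj_dim=128):
        super().__init__()
        # Backbone: produces the representation kept for downstream tasks.
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        # Projector head: the last few layers on which the SSL invariance
        # criterion is enforced during pretraining, removed afterwards.
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.backbone(x)   # representation used downstream
        z = self.projector(h)  # embedding fed to the SSL loss
        return h, z

model = SSLModel()
x = torch.randn(8, 512)        # stand-in for a batch of augmented inputs
h, z = model(x)

# During pretraining, the SSL loss (contrastive or other invariance criterion)
# is computed on z. After training, only the backbone is kept, and a downstream
# head (here a linear probe with 10 illustrative classes) is fit on h.
linear_probe = nn.Linear(h.shape[1], 10)
logits = linear_probe(h.detach())
print(logits.shape)  # torch.Size([8, 10])
```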
