What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation

07/12/2023
by   Gabriele Merlin, et al.

The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task and has become commonplace across many areas of machine learning. While pretraining is empirically observed to be beneficial for a range of tasks, the reasons for this effect are not yet well understood. In this work, we examine the relationship between pretrained vision transformers and their finetuned versions on several benchmark datasets and tasks. We present new metrics that specifically investigate the degree to which invariances learned by a pretrained model are retained or forgotten during finetuning. Using these metrics, we present a suite of empirical findings, including that pretraining induces transferable invariances in shallow layers and that invariances from deeper pretrained layers are compressed towards shallower layers during finetuning. Together, these findings contribute to understanding some of the reasons for the success of pretrained models and the changes a pretrained model undergoes when finetuned on a downstream task.
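The abstract does not spell out the invariance metrics themselves. As a rough illustration of the idea, the sketch below scores each transformer block of a ViT by how similar its features are for an image and an augmented view of the same image (a simple cosine-similarity proxy for augmentation invariance), which could then be compared block by block between a pretrained model and its finetuned counterpart. The `timm` model access, the augmentation choices, and the token-averaged features are all illustrative assumptions, not the authors' method.

```python
# A minimal, hypothetical sketch (not the paper's actual metric):
# score each transformer block by how similar its features are for an
# image and an augmented view of it. Higher = more augmentation-invariant.
import torch
import torch.nn.functional as F
import timm  # assumed here for convenient access to ViT blocks
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.ColorJitter(0.4, 0.4, 0.4),
])

def layer_invariance(model, images, num_aug=4):
    """Per-block mean cosine similarity between features of clean and
    augmented views. `images` is a float tensor of shape (B, 3, 224, 224)."""
    feats = {}

    def make_hook(idx):
        def hook(_module, _inputs, output):
            # Average over tokens: (B, N, D) -> (B, D).
            feats.setdefault(idx, []).append(output.mean(dim=1))
        return hook

    handles = [blk.register_forward_hook(make_hook(i))
               for i, blk in enumerate(model.blocks)]
    with torch.no_grad():
        model(images)                                  # clean pass
        clean = {i: v[0] for i, v in feats.items()}
        scores = {i: [] for i in clean}
        for _ in range(num_aug):
            feats.clear()
            model(augment(images))                     # augmented pass
            for i, v in feats.items():
                scores[i].append(
                    F.cosine_similarity(clean[i], v[0]).mean())
    for h in handles:
        h.remove()
    return {i: torch.stack(s).mean().item() for i, s in scores.items()}

# Usage: compare the two models block by block on the same batch.
pretrained = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
# finetuned = ...  # same architecture after finetuning on the target task
# inv_pre = layer_invariance(pretrained, images)
# inv_ft  = layer_invariance(finetuned, images)
```

Plotting the two per-block score profiles against depth would show, for instance, whether invariances in shallow blocks survive finetuning while those in deeper blocks are altered, which is the kind of question the paper's findings address.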


Related research

03/09/2021 · Pretrained Transformers as Universal Computation Engines
We investigate the capability of a transformer pretrained on natural lan...

07/21/2022 · TinyViT: Fast Pretraining Distillation for Small Vision Transformers
Vision transformer (ViT) recently has drawn great attention in computer ...

02/17/2022 · When, Why, and Which Pretrained GANs Are Useful?
The literature has proposed several methods to finetune pretrained GANs ...

03/02/2023 · Learning to Grow Pretrained Models for Efficient Transformer Training
Scaling transformers has led to significant breakthroughs in many domain...

06/01/2023 · TMI! Finetuned Models Leak Private Information from their Pretraining Data
Transfer learning has become an increasingly popular technique in machin...

05/11/2023 · Extending Audio Masked Autoencoders Toward Audio Restoration
Audio classification and restoration are among major downstream tasks in...

03/03/2023 · Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective
Coreset selection is among the most effective ways to reduce the trainin...
