Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

02/21/2022
by Ananya Kumar, et al.

When transferring a pretrained model to a downstream task, two popular methods are full fine-tuning (updating all the model parameters) and linear probing (updating only the last linear layer, the "head"). It is well known that fine-tuning leads to better accuracy in-distribution (ID). However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large. On 10 distribution shift datasets (Breeds-Living17, Breeds-Entity30, DomainNet, CIFAR → STL, CIFAR10.1, FMoW, ImageNetV2, ImageNet-R, ImageNet-A, ImageNet-Sketch), fine-tuning obtains on average 2% higher accuracy ID but 7% lower accuracy OOD than linear probing. We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simple setting: fine-tuning overparameterized two-layer linear networks. We prove that the OOD error of fine-tuning is high when we initialize with a fixed or random head: while fine-tuning learns the head, the lower layers of the neural network change simultaneously and distort the pretrained features. Our analysis suggests that the easy two-step strategy of linear probing then full fine-tuning (LP-FT), sometimes used as a fine-tuning heuristic, combines the benefits of both fine-tuning and linear probing. Empirically, LP-FT outperforms both fine-tuning and linear probing on the above datasets (1% better ID, 10% better OOD than full fine-tuning).
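To make the LP-FT recipe concrete, below is a minimal PyTorch sketch under illustrative assumptions (a torchvision ResNet-50 backbone, SGD, and placeholder learning rates and epoch counts; `train_loader` and `num_classes` come from the downstream task and are not specified in the paper's abstract). The head is first trained with the backbone frozen, and full fine-tuning then starts from that probed head rather than a random one.

```python
# Minimal LP-FT sketch in PyTorch. The ResNet-50 backbone, optimizer settings,
# and epoch counts are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision import models


def lp_ft(train_loader, num_classes, device="cuda"):
    """Linear probing (stage 1) followed by full fine-tuning (stage 2)."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh head for the task
    model.to(device)
    criterion = nn.CrossEntropyLoss()

    def train(params, lr, epochs):
        optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)
        for _ in range(epochs):
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                criterion(model(images), labels).backward()
                optimizer.step()

    # Stage 1: linear probing. Freeze the backbone so only the head is updated;
    # the pretrained features stay untouched while the head becomes well fit.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True
    train(model.fc.parameters(), lr=1e-2, epochs=10)

    # Stage 2: full fine-tuning starting from the probed head. Because the head
    # is no longer random, the backbone receives better-aligned gradients, which
    # is what limits feature distortion in the paper's analysis.
    for p in model.parameters():
        p.requires_grad = True
    train(model.parameters(), lr=1e-4, epochs=10)

    return model
```

The smaller learning rate in stage 2 is a common but separate choice; the essential ingredient of LP-FT is only that full fine-tuning begins from the linearly probed head instead of a fixed or randomly initialized one.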

Related research

07/22/2022 · Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data
Classification on long-tailed distributed data is a challenging problem,...

10/19/2022 · lo-fi: distributed fine-tuning without communication
When fine-tuning large neural networks, it is common to use multiple nod...

04/25/2022 · Fine-tuning Pruned Networks with Linear Over-parameterization
Structured pruning compresses neural networks by reducing channels (filt...

08/15/2023 · Domain-Aware Fine-Tuning: Enhancing Neural Network Adaptability
Fine-tuning pre-trained neural network models has become a widely adopte...

02/11/2023 · How to prepare your task head for finetuning
In deep learning, transferring information from a pretrained network to ...

03/15/2021 · How Many Data Points is a Prompt Worth?
When fine-tuning pretrained models for classification, researchers eithe...

08/27/2020 · A Flexible Selection Scheme for Minimum-Effort Transfer Learning
Fine-tuning is a popular way of exploiting knowledge contained in a pre-...
