Surgical Fine-Tuning Improves Adaptation to Distribution Shifts

10/20/2022
by Yoonho Lee, et al.

A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. Moreover, the type of distribution shift influences which subset is more effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best. We validate our findings systematically across seven real-world data tasks spanning three types of distribution shifts. Theoretically, we prove that for two-layer neural networks in an idealized setting, first-layer tuning can outperform fine-tuning all layers. Intuitively, fine-tuning more parameters on a small target dataset can cause information learned during pre-training to be forgotten, and the relevant information depends on the type of shift.
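To make the idea concrete, the following is a minimal PyTorch sketch of surgical fine-tuning for an input-level shift such as image corruption, where the paper finds that tuning only the first layers works best. The ResNet-18 backbone and the specific modules unfrozen here are illustrative assumptions, not the authors' released code.

import torch
import torchvision

# Surgical fine-tuning sketch: freeze every parameter, then unfreeze only
# the subset of layers suited to the suspected distribution shift.
# ResNet-18 and the chosen modules are illustrative, not from the paper.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

for p in model.parameters():
    p.requires_grad = False

# For input-level shifts (e.g., image corruptions), tune the earliest
# layers; for output-level shifts (e.g., label shift), unfreezing the
# final layers (model.fc) would be the analogous choice.
for module in (model.conv1, model.bn1, model.layer1):
    for p in module.parameters():
        p.requires_grad = True

# Optimize only the unfrozen subset on the small target dataset.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3,
    momentum=0.9,
)

Because gradients are computed only for the unfrozen block, the remaining pre-trained features are preserved exactly; this is the mechanism the abstract appeals to when it warns that tuning more parameters on a small target dataset can cause pre-trained information to be forgotten.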


Related Research

11/21/2019 · AdaFilter: Adaptive Filter Fine-tuning for Deep Transfer Learning
There is an increasing number of pre-trained deep neural network models....

04/22/2022 · Alleviating Representational Shift for Continual Fine-tuning
We study a practical setting of continual learning: fine-tuning on a pre...

07/03/2023 · Surgical fine-tuning for Grape Bunch Segmentation under Visual Domain Shifts
Mobile robots will play a crucial role in the transition towards sustain...

11/06/2016 · Beyond Fine Tuning: A Modular Approach to Learning on Small Data
In this paper we present a technique to train neural network models on s...

08/27/2020 · A Flexible Selection Scheme for Minimum-Effort Transfer Learning
Fine-tuning is a popular way of exploiting knowledge contained in a pre-...

02/10/2021 · Partial transfusion: on the expressive influence of trainable batch norm parameters for transfer learning
Transfer learning from ImageNet is the go-to approach when applying deep...

09/19/2023 · Test-Time Training for Speech
In this paper, we study the application of Test-Time Training (TTT) as a...
