How to prepare your task head for finetuning

by Yi Ren, et al.

In deep learning, transferring information from a pretrained network to a downstream task by finetuning has many benefits. The choice of task head plays an important role in finetuning, as the pretrained and downstream tasks are usually different. Although many different designs for finetuning exist, a full understanding of when and why these algorithms work has been elusive. We analyze how the choice of task head controls feature adaptation and hence influences downstream performance. By decomposing the learning dynamics of adaptation, we find that the key aspect is the training accuracy and loss at the beginning of finetuning, which determine the "energy" available for feature adaptation. We identify a significant trend in how changes in this initial energy affect the resulting features after finetuning. Specifically, as the energy increases, the Euclidean and cosine distances between the resulting and original features increase, while their dot products (and the norm of the resulting features) first increase and then decrease. Inspired by this, we give several practical principles that lead to better downstream performance. We analytically prove this trend in an overparameterized linear setting and verify its applicability to different experimental settings.
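The four quantities named in the abstract (Euclidean distance, cosine distance, dot product, and feature norm) can be computed directly from two feature matrices. The sketch below is only an illustration under assumptions not stated in the source: the function name `adaptation_metrics` is hypothetical, features are assumed to be stored as arrays of shape `[n_samples, dim]` with nonzero rows, and each metric is averaged over samples. It is not the authors' evaluation code.

```python
import numpy as np

def adaptation_metrics(z0, zt):
    """Compare original features z0 with finetuned features zt.

    Both inputs are assumed to have shape [n_samples, dim] and nonzero rows.
    Every returned value is a mean over samples (an assumption of this sketch).
    """
    z0 = np.asarray(z0, dtype=float)
    zt = np.asarray(zt, dtype=float)

    dot = np.sum(z0 * zt, axis=1)        # per-sample dot product
    norm0 = np.linalg.norm(z0, axis=1)   # per-sample norm before finetuning
    normt = np.linalg.norm(zt, axis=1)   # per-sample norm after finetuning

    return {
        "euclidean": float(np.mean(np.linalg.norm(zt - z0, axis=1))),
        "cosine_distance": float(np.mean(1.0 - dot / (norm0 * normt))),
        "dot_product": float(np.mean(dot)),
        "norm": float(np.mean(normt)),
    }
```

Calling this on features extracted before and after finetuning, for several task-head initializations with different initial training loss, would be one way to check whether the distances grow while the dot product and norm rise and then fall.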




