A Theoretical Analysis of Fine-tuning with Linear Teachers

07/04/2021
by Gal Shachaf, et al.

Fine-tuning is a common practice in deep learning, achieving excellent generalization results on downstream tasks using relatively little training data. Although widely used in practice, it lacks a strong theoretical understanding. We analyze the sample complexity of this scheme for regression with linear teachers in several architectures. Intuitively, the success of fine-tuning depends on the similarity between the source tasks and the target task; however, measuring this similarity is non-trivial. We show that a relevant measure considers the relation between the source task, the target task, and the covariance structure of the target data. In the setting of linear regression, we show that under realistic conditions a substantial reduction in sample complexity is plausible when the above measure is low. For deep linear regression, we present a novel result regarding the inductive bias of gradient-based training when the network is initialized with pretrained weights. Using this result, we show that the similarity measure for this setting is also affected by the depth of the network. We further present results on shallow ReLU models, and analyze the dependence of sample complexity there on source and target tasks. We empirically demonstrate our results for both synthetic and realistic data.
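As a rough illustration of the linear-regression setting described above (a minimal sketch with our own variable names, not code from the paper): when the target problem is under-determined, gradient descent initialized at the pretrained source weights converges to the interpolating solution closest to that initialization, so the benefit of fine-tuning over training from scratch is governed by how the task difference relates to the target data covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20                              # dimension exceeds number of target samples
w_src = rng.normal(size=d)                 # hypothetical pretrained (source) teacher
w_tgt = w_src + 0.1 * rng.normal(size=d)   # a target teacher similar to the source

X = rng.normal(size=(n, d))                # target inputs (identity covariance here)
y = X @ w_tgt                              # noiseless linear-teacher labels

# Gradient descent on 0.5 * ||X w - y||^2, initialized at the pretrained weights.
w = w_src.copy()
lr = 1.0 / np.linalg.norm(X, ord=2) ** 2   # step size safely below 2 / lambda_max
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)

# Implicit bias of gradient descent: it converges to the interpolator closest (in L2)
# to its initialization, i.e. w_src + pinv(X) @ (y - X @ w_src), so the excess risk
# is controlled by how w_tgt - w_src interacts with the target covariance.
w_implicit = w_src + np.linalg.pinv(X) @ (y - X @ w_src)
print("distance to implicit-bias solution:", np.linalg.norm(w - w_implicit))
print("fine-tuned estimation error:       ", np.linalg.norm(w - w_tgt))
print("from-scratch (zero init) error:    ", np.linalg.norm(np.linalg.pinv(X) @ y - w_tgt))
```

In this sketch the fine-tuned error is much smaller than the from-scratch error because the unlearned (null-space) component of the target teacher is largely inherited from the similar source teacher.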


Related research:

Few-Shot Learning via Learning the Representation, Provably (02/21/2020)
This paper studies few-shot learning via representation learning, where ...

Representation Transfer Learning via Multiple Pre-trained models for Linear Regression (05/25/2023)
In this paper, we consider the problem of learning a linear regression m...

Using a Cross-Task Grid of Linear Probes to Interpret CNN Model Predictions On Retinal Images (07/23/2021)
We analyze a dataset of retinal images using linear probes: linear regre...

Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning (02/28/2017)
Deep neural networks require a large amount of labeled training data dur...

A Theoretical Study of The Effects of Adversarial Attacks on Sparse Regression (12/21/2022)
This paper analyzes ℓ_1 regularized linear regression under the challeng...

Transfer Learning Can Outperform the True Prior in Double Descent Regularization (03/09/2021)
We study a fundamental transfer learning process from source to target l...

How to prepare your task head for finetuning (02/11/2023)
In deep learning, transferring information from a pretrained network to ...
