A Stability Analysis of Fine-Tuning a Pre-Trained Model

01/24/2023
by Zihao Fu, et al.

Fine-tuning a pre-trained model (such as BERT, ALBERT, RoBERTa, T5, or GPT) has proven to be one of the most promising paradigms in recent NLP research. However, numerous recent works indicate that fine-tuning suffers from an instability problem: tuning the same model under the same setting can result in significantly different performance. Many methods have been proposed to address this problem, but there is no theoretical understanding of why and how they work. In this paper, we propose a novel theoretical stability analysis of fine-tuning that focuses on two commonly used settings, namely full fine-tuning and head tuning. We define the stability under each setting and prove the corresponding stability bounds. These bounds explain why and how several existing methods stabilize the fine-tuning procedure. Besides explaining most of the observed empirical findings, our theoretical framework also helps in the design of effective and provable methods. Based on our theory, we propose three novel strategies to stabilize the fine-tuning procedure: the Maximal Margin Regularizer (MMR), the Multi-Head Loss (MHLoss), and Self Unsupervised Re-Training (SURT). We extensively evaluate the proposed approaches on 11 widely used real-world benchmark datasets, as well as on hundreds of synthetic classification datasets. The results show that our methods significantly stabilize the fine-tuning procedure and corroborate our theoretical analysis.
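The abstract does not spell out the exact form of the proposed regularizers. As a rough illustration of the idea behind a max-margin-style penalty in the spirit of MMR, a minimal PyTorch sketch is given below; the function name max_margin_penalty, the hinge threshold of 1.0, and the weight lambda_mmr are illustrative assumptions, not the paper's formulation.

import torch
import torch.nn.functional as F

def max_margin_penalty(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Hinge-style penalty on small classification margins.

    Illustrative sketch only, not the paper's exact MMR. The margin of an
    example is the logit of the true class minus the largest competing
    logit; encouraging larger margins is one way to make the learned
    decision boundary, and hence the fine-tuning run, less sensitive to
    random seeds.
    """
    # Logit assigned to the correct class, shape (batch,).
    true_logit = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Mask out the true class before taking the max over competitors.
    masked = logits.scatter(1, labels.unsqueeze(1), float("-inf"))
    best_other = masked.max(dim=1).values
    margin = true_logit - best_other
    # Only margins below the (assumed) threshold of 1.0 are penalized.
    return F.relu(1.0 - margin).mean()

# Usage inside a fine-tuning step (lambda_mmr is a hypothetical weight):
# loss = F.cross_entropy(logits, labels) + lambda_mmr * max_margin_penalty(logits, labels)

Under the same caveat, MHLoss could be sketched analogously by averaging the cross-entropy over several independently initialized classification heads, and SURT by an unsupervised re-training pass on the task inputs before supervised fine-tuning.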
