Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping

10/19/2022
by Chenghao Yang, et al.

Fine-tuning large pretrained language models (PLMs) has established many state-of-the-art results. Despite its superior performance, such fine-tuning can be unstable, resulting in significant variance in performance and posing risks for practical applications. Previous works have attributed this instability to catastrophic forgetting in the top layers of PLMs, which suggests that iteratively fine-tuning layers in a top-down manner is a promising solution. In this paper, we first point out that this strategy does not always work, owing to the different convergence speeds of different layers/modules. Inspired by this observation, we propose a simple component-wise gradient norm clipping method that adjusts the convergence speed of different components. Experimental results demonstrate that our method achieves consistent improvements in generalization performance, convergence speed, and training stability. The codebase can be found at https://github.com/yangalan123/FineTuningStability.
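The core idea described in the abstract is to clip the gradient norm of each component (e.g., each layer or module) separately, instead of applying a single global clipping threshold, so that components converging at different speeds stay balanced. The following is a minimal PyTorch sketch of that idea; the component grouping (per encoder layer, plus embeddings and classifier head) and the threshold value are illustrative assumptions, not the paper's exact configuration, for which see the linked codebase.

```python
# Minimal sketch of component-wise gradient norm clipping.
# The grouping and max_norm value below are illustrative assumptions.
from torch.nn.utils import clip_grad_norm_
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Treat the embeddings, each encoder layer, and the classifier head as components.
components = {
    "embeddings": list(model.bert.embeddings.parameters()),
    "classifier": list(model.classifier.parameters()),
}
for i, layer in enumerate(model.bert.encoder.layer):
    components[f"layer_{i}"] = list(layer.parameters())

def clip_componentwise(components, max_norm=1.0):
    """Clip the gradient norm of each component separately, so that
    fast-converging components do not dominate slow-converging ones."""
    for name, params in components.items():
        clip_grad_norm_(params, max_norm)

# In the training loop, call clip_componentwise(components) after
# loss.backward() and before optimizer.step().
```

In a standard fine-tuning loop this call replaces the usual single global clip_grad_norm_ over all model parameters.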
