Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

02/15/2020
by Jesse Dodge, et al.

Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing. This process, however, is often brittle: even with the same hyperparameter values, distinct random seeds can lead to substantially different results. To better understand this phenomenon, we experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds. We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials. Further, we examine two factors influenced by the choice of random seed: weight initialization and training data order. We find that both contribute comparably to the variance of out-of-sample performance, and that some weight initializations perform well across all tasks explored. On small datasets, we observe that many fine-tuning trials diverge part of the way through training, and we offer best practices for practitioners to stop training less promising runs early. We publicly release all of our experimental data, including training and validation scores for 2,100 trials, to encourage further analysis of training dynamics during fine-tuning.
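As a rough illustration of the experimental setup described above, the sketch below separates the two random factors the paper studies: the seed that initializes the task-specific classifier head and the seed that determines training data order. This is a minimal sketch assuming a PyTorch-style pipeline; the helper names (make_head, make_loader), the toy dataset, and all hyperparameters are illustrative and not taken from the authors' released code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_head(init_seed, hidden_size=768, num_labels=2):
    # Seed only the initialization of the task-specific classifier head.
    torch.manual_seed(init_seed)
    return torch.nn.Linear(hidden_size, num_labels)

def make_loader(data_seed, dataset, batch_size=32):
    # Seed only the shuffling that determines the order of training examples.
    g = torch.Generator()
    g.manual_seed(data_seed)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, generator=g)

# Toy stand-in for an encoded GLUE training set.
features = torch.randn(128, 768)
labels = torch.randint(0, 2, (128,))
dataset = TensorDataset(features, labels)

# Each (init_seed, data_seed) pair corresponds to one fine-tuning run;
# sweeping a grid of pairs lets the two sources of variance be compared.
for init_seed in range(3):
    for data_seed in range(3):
        head = make_head(init_seed)
        loader = make_loader(data_seed, dataset)
        # ... fine-tune the pretrained encoder plus `head` on `loader`,
        # evaluate on the validation set, and record the score for this pair.
```

Holding one seed fixed while varying the other is what allows the variance in validation performance to be attributed separately to weight initialization and to data order.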
