An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models

06/17/2021
by Xueqing Liu, et al.

The performance of fine-tuned pre-trained language models depends largely on the hyperparameter configuration. In this paper, we investigate how well modern hyperparameter optimization (HPO) methods perform when fine-tuning pre-trained language models. First, we study and report the performance of three HPO algorithms when fine-tuning two state-of-the-art language models on the GLUE dataset. We find that, given the same time budget, HPO often fails to outperform grid search for two reasons: insufficient time budget and overfitting. We propose two general strategies and an experimental procedure to systematically troubleshoot HPO's failure cases. By applying the procedure, we observe that HPO can succeed with more appropriate settings of the search space and time budget; however, in certain cases overfitting remains. Finally, we make suggestions for future work. Our implementation can be found at https://github.com/microsoft/FLAML/tree/main/flaml/nlp/.
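To make the comparison in the abstract concrete, the following is a minimal sketch of an equal-budget comparison between grid search and one simple HPO baseline (random search). It is not the paper's implementation and does not use the FLAML API: the search space, the `toy_scores` objective, and the noise levels are made up for illustration, and the budget is counted in trials as a stand-in for the time budget used in the paper. The gap between the validation and test score of the selected configuration is one rough way to look at the overfitting the abstract mentions.

```python
import itertools
import math
import random

# Hypothetical search space, loosely modeled on common fine-tuning grids.
GRID = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
    "num_epochs": [2, 3, 4],
}


def toy_scores(config, rng):
    """Return (validation, test) scores from a made-up objective.

    A real study would fine-tune a model here; this stand-in only
    mimics a smooth optimum plus independent evaluation noise.
    """
    true_quality = (
        0.90
        - 0.02 * abs(math.log10(config["learning_rate"]) - math.log10(3e-5))
        - 0.005 * abs(config["num_epochs"] - 3)
    )
    val = true_quality + rng.gauss(0, 0.01)
    test = true_quality + rng.gauss(0, 0.01)
    return val, test


def grid_search(budget, rng):
    """Evaluate up to `budget` grid points; return the best by validation score."""
    configs = [dict(zip(GRID, values)) for values in itertools.product(*GRID.values())]
    trials = [(c, *toy_scores(c, rng)) for c in configs[:budget]]
    return max(trials, key=lambda t: t[1])


def random_search(budget, rng):
    """Sample `budget` configs from a wider continuous space; return the best by validation."""
    trials = []
    for _ in range(budget):
        config = {
            "learning_rate": 10 ** rng.uniform(-5.5, -4.0),
            "batch_size": rng.choice([16, 32]),
            "num_epochs": rng.randint(2, 5),
        }
        trials.append((config, *toy_scores(config, rng)))
    return max(trials, key=lambda t: t[1])


def compare(budget=24, seed=0):
    """Run both methods with the same trial budget and report the val/test gap."""
    results = {}
    for name, method in [("grid", grid_search), ("random", random_search)]:
        config, val, test = method(budget, random.Random(seed))
        results[name] = {"config": config, "val": val, "test": test, "gap": val - test}
    return results


if __name__ == "__main__":
    for name, result in compare().items():
        print(name, result)
```

Because the configuration is selected by its validation score, that score tends to be optimistic relative to the test score, and the effect can grow as more configurations are tried. This toy setup only illustrates the mechanism; it says nothing about the magnitudes reported in the paper.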


