Revisiting Few-sample BERT Fine-tuning

06/10/2020
by Tianyi Zhang, et al.

We study the problem of few-sample fine-tuning of BERT contextual representations, and identify three sub-optimal choices in current, broadly adopted practices. First, we observe that the omission of the gradient bias correction in the optimizer results in fine-tuning instability. We also find that parts of the BERT network provide a detrimental starting point for fine-tuning, and simply re-initializing these layers speeds up learning and improves performance. Finally, we study the effect of training time, and observe that commonly used recipes often do not allocate sufficient time for training. In light of these findings, we revisit recently proposed methods to improve few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe a decrease in their relative impact when modifying the fine-tuning process based on our findings.
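The three changes described in the abstract are straightforward to apply in practice. The sketch below, assuming the HuggingFace transformers library with PyTorch (not the authors' released code), shows how one might enable Adam bias correction and re-initialize the top encoder layers before fine-tuning; the learning rate, the number of re-initialized layers, and the longer training schedule are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch (not the authors' released code), assuming the HuggingFace
# `transformers` library and PyTorch; hyperparameters are illustrative.
import torch
from transformers import AdamW, BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# 1) Use debiased Adam. The original BERTAdam recipe drops the bias-correction
#    terms; keeping them (correct_bias=True; torch.optim.AdamW applies them by
#    default) reduces fine-tuning instability on small datasets.
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=True)

# 2) Re-initialize the top encoder layers, which can be a poor starting point
#    for fine-tuning. How many layers to re-initialize (here 4) is a tunable choice.
def reinit_top_layers(model, num_layers=4):
    for layer in model.bert.encoder.layer[-num_layers:]:
        for module in layer.modules():
            if isinstance(module, torch.nn.Linear):
                module.weight.data.normal_(mean=0.0, std=model.config.initializer_range)
                if module.bias is not None:
                    module.bias.data.zero_()
            elif isinstance(module, torch.nn.LayerNorm):
                module.weight.data.fill_(1.0)
                module.bias.data.zero_()

reinit_top_layers(model, num_layers=4)

# 3) Train longer than the common 3-epoch recipe, e.g. by increasing the number
#    of epochs or max training steps in the fine-tuning loop (loop omitted here).
```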
