Noise Stability Regularization for Improving BERT Fine-tuning

07/10/2021
by Hang Hua, et al.

Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when only a small number of training samples is available, a brittleness often reflected in high sensitivity to random seeds. In this paper, we propose to tackle this problem based on the noise stability property of deep nets, which has been investigated in the recent literature (Arora et al., 2018; Sanyal et al., 2020). Specifically, we introduce a novel and effective regularization method for improving fine-tuning on NLP tasks, referred to as Layer-wise Noise Stability Regularization (LNSR). We extend existing theory on adding noise to the input and prove that our method yields a more stable regularization effect. We provide supporting evidence by experimentally confirming that well-performing models show low sensitivity to noise and that fine-tuning with LNSR exhibits clearly higher generalizability and stability. Furthermore, our method demonstrates advantages over other state-of-the-art algorithms, including L2-SP (Li et al., 2018), Mixout (Lee et al., 2020) and SMART (Jiang et al., 2020).
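For intuition, here is a minimal PyTorch sketch of one way to realize a layer-wise noise stability penalty: perturb each layer's input with Gaussian noise and penalize the squared change in that layer's output. The helper name `lnsr_penalty`, the `layers` list, and the noise scale `sigma` are illustrative assumptions, not the authors' released implementation, which may differ in where noise is injected and how the layer-wise terms are weighted.

```python
import torch
import torch.nn.functional as F

def lnsr_penalty(layers, hidden, sigma=0.01):
    """Accumulate a layer-wise noise stability penalty (sketch).

    `layers` is assumed to be a list of modules that each map a
    hidden-state tensor to a tensor of the same shape (e.g. Transformer
    encoder blocks); `sigma` scales the injected Gaussian noise.
    """
    penalty = hidden.new_zeros(())
    for layer in layers:
        clean = layer(hidden)                                      # output on the clean input
        noisy = layer(hidden + sigma * torch.randn_like(hidden))   # output under input noise
        penalty = penalty + F.mse_loss(noisy, clean)               # penalize the output change
        hidden = clean                                             # propagate the clean path
    return penalty

# Hypothetical use inside a fine-tuning step, where lambda_lnsr trades off
# the task loss against noise stability:
#   loss = task_loss + lambda_lnsr * lnsr_penalty(encoder_layers, embedding_output)
```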


Related research

06/08/2020 · On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Fine-tuning pre-trained transformer-based language models such as BERT h...

04/27/2022 · On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations
Recent work has shown that deep learning models in NLP are highly sensit...

10/13/2021 · Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance
Fine-tuning of large pre-trained image and language models on small cust...

07/29/2021 · Term Expansion and FinBERT fine-tuning for Hypernym and Synonym Ranking of Financial Terms
Hypernym and synonym matching are among the mainstream Natural Language...

05/06/2022 · A Data Cartography based MixUp for Pre-trained Language Models
MixUp is a data augmentation strategy where additional samples are gener...

11/22/2021 · Finding the Winning Ticket of BERT for Binary Text Classification via Adaptive Layer Truncation before Fine-tuning
In light of the success of transferring language models into NLP tasks, ...

09/05/2021 · Teaching Autoregressive Language Models Complex Tasks By Demonstration
This paper demonstrates that by fine-tuning an autoregressive language m...
