AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning

10/12/2022
by Tao Yang, et al.

Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting. Motivated by this observation, we propose Attribution-Driven Dropout (AD-DROP), which randomly discards a subset of high-attribution positions, encouraging the model to rely more on low-attribution positions when making predictions and thereby reducing overfitting. We also develop a cross-tuning strategy that alternates between plain fine-tuning and AD-DROP, so that high-attribution positions are not dropped excessively. Extensive experiments on various benchmarks show that AD-DROP yields consistent improvements over baselines. Analysis further confirms that AD-DROP serves as a strategic regularizer to prevent overfitting during fine-tuning.
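As a rough illustration of the idea, the PyTorch sketch below builds a dropout mask over attention positions from their attribution scores. It is not the authors' implementation: it uses a simple gradient-times-attention proxy in place of the paper's self-attention attribution, and the function and parameter names (attribution_mask, drop_rate, candidate_ratio) are illustrative assumptions.

# Minimal sketch of attribution-driven attention dropout (assumptions noted above).
import torch

def attribution_mask(attn_probs: torch.Tensor,
                     attn_grads: torch.Tensor,
                     drop_rate: float = 0.3,
                     candidate_ratio: float = 0.3) -> torch.Tensor:
    """Return a 0/1 mask that discards a random subset of HIGH-attribution positions.

    attn_probs : (batch, heads, q_len, k_len) attention probabilities
    attn_grads : gradients of the loss w.r.t. attn_probs (same shape)
    """
    # Proxy attribution score: attention weight times its gradient, pooled over heads.
    scores = (attn_probs * attn_grads).sum(dim=1, keepdim=True)
    # Candidate set: the top-k highest-attribution key positions per query.
    k = max(1, int(candidate_ratio * scores.size(-1)))
    topk = scores.topk(k, dim=-1).indices
    candidate = torch.zeros_like(scores).scatter_(-1, topk, 1.0).bool()
    # Randomly drop positions only within the high-attribution candidate set.
    drop = candidate & (torch.rand_like(scores) < drop_rate)
    return (~drop).float()  # 1 = keep, 0 = drop; broadcasts over heads

One plausible way to use the mask is to multiply it into the attention probabilities (or add a large negative value to the masked logits) on a second forward pass, and, as described in the abstract, alternate epochs of ordinary fine-tuning with epochs of AD-DROP via the cross-tuning schedule.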


Related research

Bi-Drop: Generalizable Fine-tuning for Pre-trained Language Models via Adaptive Subnetwork Optimization (05/24/2023)
Pretrained language models have achieved remarkable success in a variety...

A Kernel-Based View of Language Model Fine-Tuning (10/11/2022)
It has become standard to solve NLP tasks by fine-tuning pre-trained lan...

Whodunit? Learning to Contrast for Authorship Attribution (09/23/2022)
Authorship attribution is the task of identifying the author of a given ...

Automatic Evaluation of Attribution by Large Language Models (05/10/2023)
A recent focus of large language model (LLM) development, as exemplified...

SesameBERT: Attention for Anywhere (10/08/2019)
Fine-tuning with pre-trained models has achieved exceptional results for...

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers (03/02/2023)
Despite their remarkable achievement, gigantic transformers encounter si...

DropKey (08/04/2022)
In this paper, we focus on analyzing and improving the dropout technique...
