PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models

10/22/2022
by Yupeng Zhang, et al.

A wide range of NLP tasks benefit from fine-tuning pretrained language models (PLMs). However, a directly fine-tuned model typically contains many redundant parameters that contribute little to the downstream task. We argue that the gap between pretraining and downstream tasks hinders the training of these redundant parameters and leads to suboptimal overall performance. In this paper, we present PATS (Perturbation According To Sensitivity), a noisy training mechanism that accounts for each parameter's importance to the downstream task when fine-tuning PLMs. The main idea of PATS is to add larger noise to parameters with lower sensitivity and smaller noise to those with higher sensitivity, so that more parameters are activated to contribute to the downstream task while the sensitive ones remain largely unaffected. Extensive experiments on GLUE benchmark tasks show that PATS consistently improves the fine-tuning of PLMs of different sizes, and that the parameters of well-performing models have more concentrated sensitivity distributions, which empirically supports the effectiveness of our method.
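
The mechanism described above can be illustrated with a brief sketch. The following PyTorch snippet is a minimal, hypothetical illustration of sensitivity-aware noise injection, not the authors' released implementation: it assumes sensitivity can be approximated by the magnitude of a parameter times its gradient, and the function name, noise scale, and normalization are illustrative choices.

```python
import torch

def perturb_by_sensitivity(model, noise_std=1e-3, eps=1e-12):
    """Illustrative sketch of sensitivity-aware perturbation (not the paper's code).

    Sensitivity is approximated here by |parameter * gradient|; parameters with
    lower sensitivity receive proportionally larger Gaussian noise.
    """
    with torch.no_grad():
        for param in model.parameters():
            if param.grad is None:
                continue
            # Sensitivity proxy: magnitude of parameter times its gradient
            # (an assumption; the paper may define sensitivity differently).
            sensitivity = (param * param.grad).abs()
            # Map high sensitivity -> small noise, low sensitivity -> large noise.
            relative = sensitivity / (sensitivity.mean() + eps)
            scale = 1.0 / (1.0 + relative)
            param.add_(torch.randn_like(param) * noise_std * scale)
```

In a typical fine-tuning loop, such a perturbation would be applied after loss.backward(), so that the gradients needed for the sensitivity proxy are available when the noise is scaled.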


Related research

09/13/2021
Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
Recent pretrained language models extend from millions to billions of pa...

02/24/2022
NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better
Effectively finetuning pretrained language models (PLMs) is critical for...

07/19/2023
Gradient Sparsification For Masked Fine-Tuning of Transformers
Fine-tuning pretrained self-supervised language models is widely adopted...

06/17/2021
Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
Pretrained language models have achieved state-of-the-art performance wh...

08/23/2023
DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration
The visual models pretrained on large-scale benchmarks encode general kn...

03/15/2022
Data Contamination: From Memorization to Exploitation
Pretrained language models are typically trained on massive web-based da...

05/03/2022
ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models
Nowadays, pretrained language models (PLMs) have dominated the majority ...
