NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better

02/24/2022
by Chuhan Wu, et al.

Effectively finetuning pretrained language models (PLMs) is critical for their success in downstream tasks. However, PLMs risk overfitting the pretraining tasks and data, which often differ from the target downstream tasks, and vanilla finetuning can struggle to overcome this gap, leading to suboptimal performance. In this paper, we propose a very simple yet effective method named NoisyTune, which helps finetune PLMs better on downstream tasks by adding a small amount of noise to the PLM parameters before finetuning. More specifically, we propose a matrix-wise perturbation method that adds uniform noise scaled by the standard deviation of each parameter matrix, which accounts for the varied characteristics of different types of parameters in PLMs. Extensive experiments on the GLUE English benchmark and the XTREME multilingual benchmark show that NoisyTune consistently improves the performance of different PLMs on many downstream tasks.
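The perturbation described above is simple enough to show as a short sketch. Below is a minimal PyTorch illustration of matrix-wise noise added before finetuning; the function name `noisy_tune`, the default noise intensity of 0.15, and the exact noise form (uniform noise in [-λ/2, +λ/2] scaled by each matrix's standard deviation) are assumptions made for illustration, not the authors' released implementation.

```python
import torch


def noisy_tune(model, noise_lambda=0.15):
    """Perturb each parameter matrix of a pretrained model before finetuning.

    Each parameter receives uniform noise in [-noise_lambda/2, +noise_lambda/2]
    scaled by that parameter matrix's own standard deviation, so matrices with
    different value ranges are perturbed proportionally.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.numel() < 2:
                continue  # skip scalar parameters, whose sample std is undefined
            noise = (torch.rand_like(param) - 0.5) * noise_lambda * param.std()
            param.add_(noise)
    return model


# Hypothetical usage with a Hugging Face model (identifiers are illustrative):
# from transformers import AutoModel
# model = AutoModel.from_pretrained("bert-base-uncased")
# noisy_tune(model, noise_lambda=0.15)
# ...then attach a task head and finetune on the downstream task as usual.
```

Scaling by each matrix's own standard deviation is what makes the perturbation "matrix-wise": embedding, attention, and layer-norm parameters live on very different scales, so a single global noise level would either swamp some matrices or barely touch others.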


Related research

10/22/2022
PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models
A wide range of NLP tasks benefit from the fine-tuning of pretrained lan...

06/17/2021
Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
Pretrained language models have achieved state-of-the-art performance wh...

10/07/2020
A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
Autoregressive language models pretrained on large corpora have been suc...

04/24/2023
PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques
Recent parameter-efficient finetuning (PEFT) techniques aim to improve o...

01/18/2023
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss
Current vision language pretraining models are dominated by methods usin...

06/11/2021
Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation
A well-known limitation in pretrain-finetune paradigm lies in its inflex...

06/16/2023
How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese
This paper investigates the effect of tokenizers on the downstream perfo...
