Better Fine-Tuning by Reducing Representational Collapse

by   Armen Aghajanyan, et al.

Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampling from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning when possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse; the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including DailyMail/CNN, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representation collapse; the pre-trained models maintain more generalizable representations every time they are fine-tuned.


page 1

page 2

page 3

page 4


Improving language models fine-tuning with representation consistency targets

Fine-tuning contextualized representations learned by pre-trained langua...

Knowledge is a Region in Weight Space for Fine-tuned Language Models

Research on neural networks has largely focused on understanding a singl...

Scene Understanding for Autonomous Driving

To detect and segment objects in images based on their content is one of...

Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning

Pre-Trained Models have been widely applied and recently proved vulnerab...

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

Prompt tuning, which only tunes continuous prompts with a frozen languag...

Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy

Current state-of-the-art results in computer vision depend in part on fi...

Büyük dil modellerinin Türkçe verisetleri ile eğitilmesi ve ince ayarlanması

Large language models have advanced enormously, gained vast attraction a...

Please sign up or login with your details

Forgot password? Click here to reset