Transformer-based pretrained language models are a battle-tested solution to a plethora of natural language processing tasks. In this paradigm, a transformer-based language model is first trained on copious amounts of text, then fine-tuned on task-specific data. BERT Devlin et al. (2019), XLNet Yang et al. (2019), and RoBERTa Liu et al. (2019) are some of the most well-known models, representing the current state of the art in natural language inference, question answering, and sentiment classification, to name a few. These models are extremely expressive, consisting of at least a hundred million parameters, a hundred attention heads, and a dozen layers.
An emerging line of work questions the need for such a parameter-loaded model, especially on a single downstream task. Michel et al. (2019), for example, note that only a few attention heads need to be retained in each layer for acceptable effectiveness. Kovaleva et al. (2019) find that, on many tasks, just the last few layers change the most after the fine-tuning process. We take these observations as evidence that only the last few layers necessarily need to be fine-tuned.
The central objective of our paper is, then, to determine how many of the last layers actually need fine-tuning. Why is this an important subject of study? Pragmatically, a reasonable cutoff point saves computational memory when fine-tuning across multiple tasks, which bolsters the effectiveness of existing parameter-saving methods Houlsby et al. (2019). Pedagogically, understanding the relationship between the number of fine-tuned layers and the resulting model quality may guide future work in modeling.
Our research contribution is a comprehensive evaluation, across multiple pretrained transformers and datasets, of the number of final layers that need fine-tuning. We show that, on most tasks, we need to fine-tune only a fourth of the final layers to reach within 10% of full-model quality. Surprisingly, on SST-2, a sentiment classification dataset, we find that not fine-tuning all of the layers leads to improved quality.
2 Background and Related Work
2.1 Pretrained Language Models
In the pretrained language modeling paradigm, a language model (LM) is trained on vast amounts of text, then fine-tuned on a specific downstream task. Peters et al. (2018) are among the first to successfully apply this idea, outperforming the previous state of the art in question answering, textual entailment, and sentiment classification. Their model, dubbed ELMo, comprises a two-layer BiLSTM pretrained on the Billion Word Corpus Chelba et al. (2014).
Furthering this approach with more data and improved modeling, Devlin et al. (2019) pretrain deep 12- and 24-layer bidirectional transformers Vaswani et al. (2017) on the entirety of Wikipedia and BooksCorpus Zhu et al. (2015). Their approach, called BERT, achieves state of the art across all tasks in the General Language Understanding Evaluation (GLUE) benchmark Wang et al. (2018), as well as the Stanford Question Answering Dataset (Rajpurkar et al., 2016).
As a result of this development, a flurry of recent papers has followed this more-data-plus-better-models principle. Two prominent examples include XLNet Yang et al. (2019) and RoBERTa Liu et al. (2019), both of which contest the present state of the art. XLNet proposes to pretrain two-stream attention-augmented transformers on an autoregressive LM objective, instead of the original cloze and next sentence prediction (NSP) tasks from BERT. RoBERTa primarily argues for pretraining longer, using more data, and removing the NSP task from BERT.
2.2 Layerwise Interpretability
The prevailing evidence in the neural network literature suggests that earlier layers extract universal features, while later ones perform task-specific modeling. Zeiler and Fergus (2014) visualize the per-layer activations in image classification networks, finding that the first few layers function as corner and edge detectors, and the final layers as class-specific feature extractors. Gatys et al. (2016) demonstrate that the low- and high-level notions of content and style are separable in convolutional neural networks, with lower layers capturing content and higher layers style.
Pretrained transformers. In the NLP literature, similar observations have been made for pretrained language models. Clark et al. (2019) analyze BERT’s attention and observe that the bottom layers attend broadly, while the top layers capture linguistic syntax. Kovaleva et al. (2019) find that the last few layers of BERT change the most after task-specific fine-tuning. Similar to our work, Houlsby et al. (2019) fine-tune the top layers of BERT, as part of the baseline comparison for their parameter-efficient adapter approach. However, none of these studies comprehensively examines the number of necessary final layers across multiple pretrained transformers and datasets.
3 Experimental Setup
We conduct our experiments on NVIDIA Tesla V100 GPUs with CUDA v10.1. We run the models from the Transformers library (v2.1.1; Wolf et al., 2019) using PyTorch v1.2.0.
3.1 Models and Datasets
We choose BERT Devlin et al. (2019) and RoBERTa Liu et al. (2019) as the subjects of our study, since they represent the state of the art and share the same architecture. XLNet Yang et al. (2019) is another alternative; however, it uses a slightly different attention structure, and our preliminary experiments encountered reproducibility difficulties with the Transformers library. Each model has base and large variants that contain 12 and 24 layers, respectively. We denote them by appending the variant name as a subscript to the model name.
| Model | Embeddings | Per layer | Output | Total |
|---|---|---|---|---|
| BERT-base | 24M (22%) | 7M (7%) | 0.6M (0.5%) | 110M |
| RoBERTa-base | 39M (31%) | 7M (6%) | 0.6M (0.5%) | 125M |
| BERT-large | 32M (10%) | 13M (4%) | 1M (0.3%) | 335M |
| RoBERTa-large | 52M (15%) | 13M (4%) | 1M (0.3%) | 355M |
Within each variant, the two models display slight variability in parameter count—110 and 125 million in the base variant, and 335 and 355 million in the large one. These differences are mostly attributed to RoBERTa using many more embedding parameters—about 63% more for both variants. For in-depth, layerwise statistics, see Table 1.
For our datasets, we use the GLUE benchmark, which comprises tasks in natural language inference, sentiment classification, linguistic acceptability, and semantic similarity. Specifically, for natural language inference (NLI), it provides the Multi-Genre NLI (MNLI; Williams et al., 2018), Question NLI (QNLI; Wang et al., 2018), Recognizing Textual Entailment (RTE; Bentivogli et al., 2009), and Winograd NLI (WNLI; Levesque et al., 2012) datasets. For semantic textual similarity and paraphrasing, it contains the Microsoft Research Paraphrase Corpus (MRPC; Dolan and Brockett, 2005), the Semantic Textual Similarity Benchmark (STS-B; Cer et al., 2017), and Quora Question Pairs (QQP; Iyer et al.). Finally, its single-sentence tasks consist of the binary-polarity Stanford Sentiment Treebank (SST-2; Socher et al., 2013) and the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018).
3.2 Fine-Tuning Procedure
Our fine-tuning procedure closely resembles those of BERT and RoBERTa. We choose the Adam optimizer Kingma and Ba (2014) with a batch size of 16 and fine-tune BERT for 3 epochs and RoBERTa for 10, following the original papers. For hyperparameter tuning, the best learning rate differs for each task, and each set of original authors chooses one within a small interval; thus, we perform a line search over that interval with a fixed step size. We report the best results in Table 2.
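The line search itself is straightforward to sketch. The helper below is illustrative only—the interval endpoints and step size in the example are placeholders, not the values used in our experiments—and simply evaluates a dev-set metric at evenly spaced learning rates, keeping the best.

```python
def line_search(evaluate, lo, hi, step):
    """Evaluate a scoring function at lr = lo, lo + step, ..., up to hi,
    and return the best (learning_rate, score) pair."""
    best_lr, best_score = None, float("-inf")
    lr = lo
    while lr <= hi + 1e-12:  # small epsilon guards against float drift
        score = evaluate(lr)
        if score > best_score:
            best_lr, best_score = lr, score
        lr += step
    return best_lr, best_score
```

In practice, `evaluate` would fine-tune the model at the given learning rate and return its development-set score.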
On each model, we freeze the embeddings and the weights of the first N layers, then fine-tune the rest using the best hyperparameters of the full model. Specifically, if L is the number of layers, we explore a range of values of N between 0 and L. Due to computational limitations, we set half as the cutoff point for the exhaustive per-layer sweep. Additionally, we restrict our comprehensive all-datasets exploration to the base variant of BERT, since the large model variants and RoBERTa are much more computationally intensive. On the smaller CoLA, SST-2, MRPC, and STS-B datasets, we comprehensively evaluate both models. These choices do not substantially affect our analysis.
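As a concrete sketch of this freezing scheme, the predicate below decides, from a parameter's name, whether it belongs to the embeddings or to one of the first N transformer layers. The name patterns follow the Hugging Face BERT convention (`embeddings.*`, `encoder.layer.<i>.*`); they are an assumption for illustration, not our actual code.

```python
import re

def frozen(param_name: str, n_frozen_layers: int) -> bool:
    """Return True if this parameter should be frozen (not fine-tuned)."""
    if param_name.startswith("embeddings."):
        return True  # embeddings are always frozen in this scheme
    match = re.match(r"encoder\.layer\.(\d+)\.", param_name)
    if match is not None:
        return int(match.group(1)) < n_frozen_layers
    return False  # pooler and task-specific layers stay trainable
```

With a real model, one would then set `param.requires_grad = not frozen(name, n)` for each `(name, param)` pair yielded by `model.named_parameters()`.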
4.1 Operating Points
We examine three operating points: two extreme operating points and an intermediate one. The extremes are self-explanatory, indicating fine-tuning all or none of the nonoutput layers. The intermediate point denotes the number of layers necessary to reach at least 90% of the full model quality, excluding CoLA, which is an outlier.
From the reported results in Tables 3–5, fine-tuning only the output layer and the task-specific layers is insufficient for all tasks—see the rows corresponding to 0, 12, and 24 frozen layers. However, we find that fine-tuning the first half of the model is unnecessary; the base models, for example, need fine-tuning of only 3–5 layers out of the 12 to reach 90% of the original quality—see Table 4, middle subrow of each row group. Similarly, fine-tuning only a fourth of the layers is sufficient for the large models (see Table 5): only 6 layers out of 24 for BERT and 7 for RoBERTa.
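The 90% criterion can be made precise with a small helper: given dev scores keyed by the number of fine-tuned final layers, it returns the fewest layers whose score reaches the threshold relative to the full model. This is a hypothetical reimplementation of the selection rule, not code from our experiments.

```python
def min_layers_for_parity(scores: dict, threshold: float = 0.9) -> int:
    """scores maps a count of fine-tuned final layers to a dev metric.
    Return the smallest count reaching threshold * full-model score."""
    full_score = scores[max(scores)]  # all layers fine-tuned
    for n in sorted(scores):
        if scores[n] >= threshold * full_score:
            return n
    return max(scores)  # fallback: only the full model qualifies
```

For example, with base-model scores at 0, 3, 6, and 12 fine-tuned layers, the helper picks the smallest count clearing 90% of the 12-layer score.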
4.2 Per-Layer Study
In Figure 1, we examine how the relative quality changes with the number of frozen layers. To compute a relative score, we subtract each frozen model’s results from its corresponding full model. The relative score aligns the two baselines at zero, allowing the fair comparison of the transformers. The graphs report the average of five trials to reduce the effects of outliers.
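Following the description above, the relative score is simply each setting's trial average subtracted from the full model's average. A minimal sketch, with an assumed input layout:

```python
from statistics import mean

def relative_scores(frozen_trials: dict, full_trials: list) -> dict:
    """frozen_trials maps a frozen-layer count to its list of trial scores;
    full_trials holds the full model's trial scores. Subtracting each
    setting's average from the full-model average puts the baseline at zero."""
    baseline = mean(full_trials)
    return {n: baseline - mean(trials) for n, trials in frozen_trials.items()}
```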
When every component except the output layer and the task-specific layer is frozen, the fine-tuned model achieves only 64% of the original quality, on average. As more layers are fine-tuned, the model effectiveness often improves drastically—see CoLA and STS-B, the first and fourth vertical pairs of subfigures from the left. This demonstrates that the gains do not decompose additively with respect to the number of frozen initial layers. Fine-tuning subsequent layers shows diminishing returns, with every model rapidly approaching the baseline quality once half of the network is fine-tuned; hence, we believe that half is a reasonable cutoff point for characterizing the models.
Finally, for the large variants of BERT and RoBERTa on SST-2 (second subfigure from both the top and the left), we observe a surprisingly consistent increase in quality when freezing 12–16 layers. This finding suggests that these models may be overparameterized for SST-2.
5 Conclusions and Future Work
In this paper, we present a comprehensive evaluation of the number of final layers that need to be fine-tuned for pretrained transformer-based language models. We find that only a fourth of the layers necessarily need to be fine-tuned to obtain 90% of the original quality. One line of future work is to conduct a similar, more fine-grained analysis on the contributions of the attention heads.
This research was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, and enabled by computational resources provided by Compute Ontario and Compute Canada.
- Bentivogli et al. (2009) Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge. In TAC 2009 Workshop.
- Cer et al. (2017) Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation.
- Chelba et al. (2014) Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2014. One billion word benchmark for measuring progress in statistical language modeling. In Fifteenth Annual Conference of the International Speech Communication Association.
- Clark et al. (2019) Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. 2019. What does BERT look at? An analysis of BERT’s attention. arXiv:1906.04341.
- Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Dolan and Brockett (2005) William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing.
- Gatys et al. (2016) Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. 2016. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Houlsby et al. (2019) Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning.
- Iyer et al. Shankar Iyer, Nikhil Dandekar, and Kornel Csernai. First Quora dataset release: Question pairs.
- Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980.
- Kovaleva et al. (2019) Olga Kovaleva, Alexey Romanov, Anna Rogers, and Anna Rumshisky. 2019. Revealing the dark secrets of BERT. arXiv:1908.08593.
- Levesque et al. (2012) Hector Levesque, Ernest Davis, and Leora Morgenstern. 2012. The Winograd schema challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning.
- Liu et al. (2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692.
- Michel et al. (2019) Paul Michel, Omer Levy, and Graham Neubig. 2019. Are sixteen heads really better than one? arXiv:1905.10650.
- Peters et al. (2018) Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Rajpurkar et al. (2016) Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
- Socher et al. (2013) Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.
- Wang et al. (2018) Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP.
- Warstadt et al. (2018) Alex Warstadt, Amanpreet Singh, and Samuel R. Bowman. 2018. Neural network acceptability judgments. arXiv:1805.12471.
- Williams et al. (2018) Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- Wolf et al. (2019) Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace’s Transformers: State-of-the-art natural language processing. arXiv:1910.03771.
- Yang et al. (2019) Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: generalized autoregressive pretraining for language understanding. arXiv:1906.08237.
- Zeiler and Fergus (2014) Matthew D. Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision.
- Zhu et al. (2015) Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision.