A Closer Look at How Fine-tuning Changes BERT

06/27/2021
by Yichu Zhou, et al.

Given the prevalence of pre-trained contextualized representations in today's NLP, there have been several efforts to understand what information such representations contain. A common strategy for using these representations is to fine-tune them for an end task. Yet how fine-tuning for a task changes the underlying space is less studied. In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space. Our experiments reveal that fine-tuning improves performance because it pushes points associated with a label away from points of other labels. By comparing the representations before and after fine-tuning, we also discover that fine-tuning does not change the representations arbitrarily; instead, it adjusts them to the downstream task while preserving the original structure. Finally, using carefully constructed experiments, we show that fine-tuning can encode the training set in the representations, suggesting an overfitting problem of a new kind.
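
The central finding, that fine-tuning makes the points of each label more separable, can be illustrated with a small probing experiment. The sketch below is a minimal, hypothetical version of classifier-based probing, one of the standard techniques the abstract alludes to: it extracts frozen [CLS] vectors from a pre-trained BERT and from a publicly fine-tuned checkpoint, then fits a linear probe and reports a separability score. The checkpoint names, toy data, and probe choice are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of classifier-based probing: compare how separable
# the label clusters are in frozen representations from a pre-trained vs.
# a fine-tuned BERT. Checkpoints and data are illustrative assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score
from transformers import AutoModel, AutoTokenizer

def embed(model_name, texts):
    """Return last-layer [CLS] vectors with the encoder frozen."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    with torch.no_grad():
        batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
        return model(**batch).last_hidden_state[:, 0].numpy()

# Tiny sentiment-style toy set (made up, for illustration only).
texts = ["a great movie", "thoroughly enjoyable", "wonderful acting",
         "i loved every minute", "a dull movie", "a complete waste",
         "terrible pacing", "i hated every minute"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for name in ["bert-base-uncased",                    # pre-trained
             "textattack/bert-base-uncased-SST-2"]:  # fine-tuned on SST-2
    X = embed(name, texts)
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    # Higher probe accuracy and silhouette score mean the label clusters
    # are farther apart, i.e. fine-tuning has pushed the labels apart.
    print(name, probe.score(X, labels), silhouette_score(X, labels))
```

On real task data one would probe each layer and score the probe on a held-out split; the toy set here only shows the mechanics of the comparison.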


Related research:

04/29/2020 · What Happens To BERT Embeddings During Fine-tuning?
While there has been much recent work studying how linguistic informatio...

08/29/2022 · Resolving inconsistencies of runtime configuration changes through change propagation and adjustments
A system configuration may be modified at runtime to adapt the system to...

09/10/2021 · How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy
It is widely accepted that fine-tuning pre-trained language models usual...

06/10/2020 · Revisiting Few-sample BERT Fine-tuning
We study the problem of few-sample fine-tuning of BERT contextual repres...

04/22/2022 · Alleviating Representational Shift for Continual Fine-tuning
We study a practical setting of continual learning: fine-tuning on a pre...

05/31/2021 · On the Interplay Between Fine-tuning and Composition in Transformers
Pre-trained transformer language models have shown remarkable performanc...

05/15/2017 · Tuning Modular Networks with Weighted Losses for Hand-Eye Coordination
This paper introduces an end-to-end fine-tuning method to improve hand-e...