A Closer Look at How Fine-tuning Changes BERT

by Yichu Zhou, et al.

Given the prevalence of pre-trained contextualized representations in today's NLP, there have been several efforts to understand what information such representations contain. A common strategy for using these representations is to fine-tune them for an end task, yet how fine-tuning changes the underlying embedding space is less studied. In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space. Our experiments reveal that fine-tuning improves performance because it pushes points associated with a label away from points with other labels. By comparing the representations before and after fine-tuning, we also discover that fine-tuning does not change the representations arbitrarily; instead, it adjusts them to the downstream task while preserving the original structure. Finally, using carefully constructed experiments, we show that fine-tuning can encode training sets in a representation, suggesting a new kind of overfitting problem.
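The separation effect described above can be illustrated with a toy sketch. This is not the paper's actual probing setup: the synthetic Gaussian clusters stand in for BERT representations before and after fine-tuning, and the nearest-centroid "probe" is a deliberately minimal classifier chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_representations(separation, n_per_class=200, dim=16):
    """Two label clusters in a toy 'embedding space'; `separation`
    controls how far apart the class means are (an illustrative stand-in
    for pre-trained vs. fine-tuned representations)."""
    mean_a = np.zeros(dim)
    mean_b = np.zeros(dim)
    mean_b[0] = separation
    x = np.vstack([rng.normal(mean_a, 1.0, (n_per_class, dim)),
                   rng.normal(mean_b, 1.0, (n_per_class, dim))])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return x, y

def nearest_centroid_accuracy(x, y):
    """A minimal 'probe': classify each point by its nearest class centroid."""
    centroids = np.stack([x[y == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    pred = dists.argmin(axis=1)
    return (pred == y).mean()

# Pre-trained-like space: the two label clusters overlap heavily.
x_pre, y_pre = make_representations(separation=0.5)
# Fine-tuned-like space: the same clusters pushed apart.
x_ft, y_ft = make_representations(separation=4.0)

acc_pre = nearest_centroid_accuracy(x_pre, y_pre)
acc_ft = nearest_centroid_accuracy(x_ft, y_ft)
print(acc_pre, acc_ft)
```

Once points with the same label are pushed away from other labels, even this weak probe separates them almost perfectly, mirroring why a simple classifier on top of fine-tuned representations performs well.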




What Happens To BERT Embeddings During Fine-tuning?

While there has been much recent work studying how linguistic informatio...

Resolving inconsistencies of runtime configuration changes through change propagation and adjustments

A system configuration may be modified at runtime to adapt the system to...

How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy

It is widely accepted that fine-tuning pre-trained language models usual...

Revisiting Few-sample BERT Fine-tuning

We study the problem of few-sample fine-tuning of BERT contextual repres...

Alleviating Representational Shift for Continual Fine-tuning

We study a practical setting of continual learning: fine-tuning on a pre...

On the Interplay Between Fine-tuning and Composition in Transformers

Pre-trained transformer language models have shown remarkable performanc...

Tuning Modular Networks with Weighted Losses for Hand-Eye Coordination

This paper introduces an end-to-end fine-tuning method to improve hand-e...