A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension

06/02/2020
by Jie Cai, et al.

Pre-trained models have brought significant improvements to many NLP tasks and have been analyzed extensively, but little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g., synonyms are closer to each other) and that fine-tuning further improves capabilities that require more complicated reasoning (e.g., coreference resolution, entity boundary detection). However, verifying these arguments analytically and quantitatively is challenging, and few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g., coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task. Specifically, we identify five phenomena in MRC. For each of these pairwise probing tasks, we compare the performance of each layer's hidden representations from pre-trained and fine-tuned BERT. The proposed pairwise probe alleviates distraction from inaccurate model training and enables a robust and quantitative comparison. Our experimental analysis leads to highly confident conclusions: (1) fine-tuning has little effect on fundamental, low-level information and general semantic tasks; (2) for the specific abilities required by downstream tasks, fine-tuned BERT outperforms pre-trained BERT, and the gap becomes obvious after the fifth layer.
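The layer-wise comparison described above can be pictured as training a simple classifier on frozen pair representations taken from each BERT layer. The following is a minimal sketch, assuming the HuggingFace transformers and scikit-learn libraries; the fine-tuned checkpoint path, the toy pair data, and the span indices are illustrative placeholders, not the paper's actual probe setup or datasets.

```python
# Minimal sketch of a layer-wise pairwise probe (illustrative only).
# Assumptions: HuggingFace transformers + scikit-learn; toy pair data below
# is hypothetical, as is the fine-tuned checkpoint path.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import BertModel, BertTokenizerFast


def span_vector(hidden_states, layer, start, end):
    """Mean-pool the token vectors of a span at a given layer.

    hidden_states is a tuple of (1, seq_len, hidden) tensors; index 0 is the
    embedding layer, indices 1..12 are the transformer layers of BERT-base.
    """
    return hidden_states[layer][0, start:end].mean(dim=0)


def probe_features(model, tokenizer, examples, layer):
    """Build concatenated pair representations for one layer."""
    feats, labels = [], []
    model.eval()
    for text, (s1, e1), (s2, e2), label in examples:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**enc, output_hidden_states=True)
        v1 = span_vector(out.hidden_states, layer, s1, e1)
        v2 = span_vector(out.hidden_states, layer, s2, e2)
        feats.append(torch.cat([v1, v2]).numpy())
        labels.append(label)
    return feats, labels


def layerwise_probe_accuracy(model_name, examples, num_layers=12):
    """Train a simple pairwise classifier on each layer's frozen features."""
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained(model_name)
    accs = []
    for layer in range(1, num_layers + 1):
        X, y = probe_features(model, tokenizer, examples, layer)
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        accs.append(clf.score(X, y))  # use a held-out split in practice
    return accs


# Toy pairs: (text, span1 token indices, span2 token indices, match label).
# Indices count WordPiece positions after [CLS]; real pairs would come from
# an MRC corpus annotated for the five phenomena.
examples = [
    ("Mary met John and she smiled .", (1, 2), (5, 6), 1),
    ("Mary met John and he smiled .", (1, 2), (5, 6), 0),
]

pretrained_accs = layerwise_probe_accuracy("bert-base-uncased", examples)
# finetuned_accs = layerwise_probe_accuracy("path/to/mrc-finetuned-bert", examples)
```

Running the same routine on the pre-trained and the MRC-fine-tuned checkpoints and plotting accuracy per layer is one way to visualize where the two models diverge, which is the kind of comparison the abstract describes.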


