Investigating Transferability in Pretrained Language Models

04/30/2020
by Alex Tamkin, et al.

While probing is a common technique for identifying knowledge in the representations of pretrained models, it is unclear whether this technique can explain the downstream success of models like BERT, which are trained end-to-end during finetuning. To address this question, we compare probing with a different measure of transferability: the decrease in finetuning performance of a partially-reinitialized model. This technique reveals that in BERT, layers with high probing accuracy on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks. In addition, dataset size impacts layer transferability: the less finetuning data one has, the more important the middle and later layers of BERT become. Furthermore, BERT does not simply find a better initializer for individual layers; instead, interactions between layers matter, and reordering BERT's layers prior to finetuning significantly harms evaluation metrics. These results provide a way of understanding the transferability of parameters in pretrained language models, revealing the fluidity and complexity of transfer learning in these models.
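To make the partial-reinitialization and layer-reordering measures concrete, the sketch below is a minimal illustration (not the authors' released code), assuming the Hugging Face transformers library and a two-label GLUE-style task; the chosen layer indices and helper names are illustrative assumptions.

```python
# A minimal sketch, assuming Hugging Face transformers and a two-label task;
# the block indices below are illustrative, not those used in the paper.
import random

import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

def reinitialize_blocks(model, block_indices):
    """Reset the selected encoder blocks to fresh random weights,
    leaving the remaining pretrained parameters untouched."""
    for i in block_indices:
        # apply() visits every submodule of the block; _init_weights is the
        # initializer transformers uses when building a model from scratch.
        model.bert.encoder.layer[i].apply(model._init_weights)

def shuffle_blocks(model, seed=0):
    """Permute the order of BERT's encoder blocks prior to finetuning."""
    rng = random.Random(seed)
    order = list(range(len(model.bert.encoder.layer)))
    rng.shuffle(order)
    model.bert.encoder.layer = torch.nn.ModuleList(
        [model.bert.encoder.layer[i] for i in order]
    )

# Example: wipe the last four blocks of bert-base (layers 8-11) before finetuning.
reinitialize_blocks(model, block_indices=range(8, 12))
```

Finetuning the partially-reinitialized (or shuffled) model and comparing its score to the intact pretrained baseline yields the performance drop that serves as the transferability measure described above.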


Related research

04/14/2020 · What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models
Experiments with transfer learning on pre-trained language models such a...

03/10/2022 · PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks
With the increasing abundance of pretrained models in recent years, the ...

04/25/2020 · Quantifying the Contextualization of Word Representations with Semantic Class Probing
Pretrained language models have achieved a new state of the art on many ...

09/07/2023 · Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems
Transferring the knowledge of large language models (LLMs) is a promisin...

09/05/2019 · Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs
Though state-of-the-art sentence representation models can perform tasks...

10/18/2021 · BERMo: What can BERT learn from ELMo?
We propose BERMo, an architectural modification to BERT, which makes pre...

04/26/2020 · Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
We present an efficient method of utilizing pretrained language models, ...
