Masking as an Efficient Alternative to Finetuning for Pretrained Language Models

04/26/2020
by   Mengjie Zhao, et al.

We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred simultaneously. Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
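The abstract does not spell out implementation details, but the core idea, learning a binary mask over frozen pretrained weights instead of updating them, can be sketched as follows. This is a minimal illustration rather than the authors' exact scheme: the MaskedLinear wrapper, the 0.5 threshold, the score initialization, and the choice of which BERT sublayers to mask are all assumptions made for the example.

```python
import torch
import torch.nn as nn
from transformers import BertModel  # only needed for the usage example below


class MaskedLinear(nn.Module):
    """Keeps a pretrained linear layer frozen and learns a binary mask over its weights.

    Real-valued scores are thresholded to a 0/1 mask in the forward pass;
    gradients reach the scores through a straight-through estimator.
    """

    def __init__(self, pretrained: nn.Linear, threshold: float = 0.5):
        super().__init__()
        # Frozen copies of the pretrained weight and bias.
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
                     if pretrained.bias is not None else None)
        # Trainable mask scores, initialized just above the threshold so training
        # starts from the fully unmasked (i.e. original pretrained) layer.
        self.scores = nn.Parameter(torch.full_like(self.weight, threshold + 1e-2))
        self.threshold = threshold

    def forward(self, x):
        hard = (self.scores > self.threshold).float()
        # Straight-through estimator: the forward pass uses the hard 0/1 mask,
        # the backward pass treats the binarization as the identity.
        mask = hard + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)


# Hypothetical usage with a Hugging Face BERT encoder: freeze every pretrained
# parameter, wrap some feed-forward sublayers, and train only the mask scores.
model = BertModel.from_pretrained("bert-base-uncased")
for p in model.parameters():
    p.requires_grad = False
for layer in model.encoder.layer:
    layer.intermediate.dense = MaskedLinear(layer.intermediate.dense)
    layer.output.dense = MaskedLinear(layer.output.dense)

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

Under a setup like this, serving several tasks requires only one copy of the pretrained weights plus one binary (one bit per weight) mask per task, which is where the smaller memory footprint relative to keeping a finetuned copy per task comes from.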


Related research

06/15/2020
FinBERT: A Pretrained Language Model for Financial Communications
Contextual pretrained language models, such as BERT (Devlin et al., 2019...

04/14/2021
Disentangling Representations of Text by Masking Transformers
Representations from large pretrained models such as BERT encode a range...

04/25/2020
Quantifying the Contextualization of Word Representations with Semantic Class Probing
Pretrained language models have achieved a new state of the art on many ...

10/08/2020
Large Product Key Memory for Pretrained Language Models
Product key memory (PKM) proposed by Lample et al. (2019) enables to imp...

04/30/2020
Investigating Transferability in Pretrained Language Models
While probing is a common technique for identifying knowledge in the rep...

06/04/2022
Instance-wise Prompt Tuning for Pretrained Language Models
Prompt Learning has recently gained great popularity in bridging the gap...

05/24/2023
Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models
While pre-trained language models (PLMs) have shown evidence of acquirin...
