On the Influence of Masking Policies in Intermediate Pre-training

04/18/2021
by   Qinyuan Ye, et al.

Current NLP models are predominantly trained through a pretrain-then-finetune pipeline, where models are first pretrained on a large text corpus with a masked language modeling (MLM) objective, then finetuned on the downstream task. Prior work has shown that inserting an intermediate pre-training phase, with heuristic MLM objectives that resemble downstream tasks, can significantly improve final performance. However, it is still unclear (1) in what cases such intermediate pre-training is helpful, (2) whether hand-crafted heuristic objectives are optimal for a given task, and (3) whether an MLM policy designed for one task is generalizable beyond that task. In this paper, we perform a large-scale empirical study to investigate the effect of various MLM policies in intermediate pre-training. Crucially, we introduce methods to automate the discovery of optimal MLM policies, by learning a masking model through either direct supervision or meta-learning on the downstream task. We investigate the effects of using heuristic, directly supervised, and meta-learned MLM policies for intermediate pre-training, on eight selected tasks across three categories (closed-book QA, knowledge-intensive language tasks, and abstractive summarization). Most notably, we show that learned masking policies outperform the heuristic of masking named entities on TriviaQA, and masking policies learned on one task can positively transfer to other tasks in certain cases.
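To make the notion of a masking policy concrete, the sketch below is a minimal illustration, not the authors' implementation: a policy is treated as any function that scores tokens, and the masking step replaces the highest-scoring tokens with the mask symbol. The `apply_masking_policy` helper and the capitalization heuristic standing in for named-entity masking are assumptions made for illustration; a learned policy from the paper would correspond to swapping in a trained masking model as `score_fn`.

```python
import random
from typing import Callable, List

MASK = "[MASK]"

def apply_masking_policy(
    tokens: List[str],
    score_fn: Callable[[List[str]], List[float]],
    mask_rate: float = 0.15,
) -> List[str]:
    """Replace the highest-scoring tokens with [MASK].

    `score_fn` encodes the masking policy: a heuristic (e.g. prefer
    named-entity tokens) or a learned model that assigns each token
    a masking score.
    """
    k = max(1, round(mask_rate * len(tokens)))
    scores = score_fn(tokens)
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [MASK if i in top else tok for i, tok in enumerate(tokens)]

def capitalized_token_policy(tokens: List[str]) -> List[float]:
    # Toy stand-in for "mask named entities": score capitalized,
    # non-sentence-initial tokens highly; a real pipeline would use an NER tagger.
    return [1.0 if (i > 0 and tok[:1].isupper()) else 0.1 * random.random()
            for i, tok in enumerate(tokens)]

if __name__ == "__main__":
    sentence = "The Eiffel Tower was completed in 1889 in Paris .".split()
    print(apply_masking_policy(sentence, capitalized_token_policy, mask_rate=0.3))
    # -> ['The', '[MASK]', '[MASK]', 'was', 'completed', 'in', '1889', 'in', '[MASK]', '.']
```

Under this framing, the heuristic, directly supervised, and meta-learned policies compared in the paper differ only in how the token scores are produced, while the intermediate pre-training loop itself stays the same.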

