Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

09/04/2021
by Atsuki Yamaguchi, et al.

Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder, in a multi-class setting over the entire vocabulary. When pretraining, it is common to pair MLM with other auxiliary objectives at the token or sequence level to improve downstream performance (e.g. next sentence prediction). However, no previous work has examined whether simpler objectives, linguistically intuitive or not, can be used standalone as the main pretraining objective. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve performance comparable to or better than MLM with a BERT-BASE architecture. We further validate our methods on smaller models, showing that pretraining a model with 41% of BERT-BASE's parameters (BERT-MEDIUM) results in only a 1% drop in GLUE score with our best objective.

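The contrast between MLM and a token-level classification objective is easiest to see in how training examples are built. The sketch below is illustrative only: it assumes BERT-style integer token ids (a hypothetical MASK_ID of 103 and the usual -100 ignore index) and uses shuffled-token detection as one example of a token-level task, in the spirit of the objectives the paper explores rather than its exact formulation.

```python
import random

MASK_ID = 103          # hypothetical id for the [MASK] token (BERT-style)
IGNORE_INDEX = -100    # positions excluded from the loss

def build_mlm_example(token_ids, mask_prob=0.15):
    """MLM: replace a random sample of tokens with [MASK]; the label at a
    masked position is the original token id (a multi-class target over the
    whole vocabulary); all other positions are ignored by the loss."""
    inputs, labels = [], []
    for tok in token_ids:
        if random.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(tok)            # predict the original token
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)   # position not scored
    return inputs, labels

def build_shuffle_detection_example(token_ids, corrupt_prob=0.15):
    """Illustrative token-level alternative: permute a random subset of
    positions and ask the model to classify, per token, whether it moved.
    The per-token target is binary instead of a vocabulary-sized softmax."""
    positions = [i for i in range(len(token_ids)) if random.random() < corrupt_prob]
    permuted = positions[:]
    random.shuffle(permuted)
    inputs = list(token_ids)
    for src, dst in zip(positions, permuted):
        inputs[dst] = token_ids[src]
    labels = [int(inp != orig) for inp, orig in zip(inputs, token_ids)]
    return inputs, labels

if __name__ == "__main__":
    toks = [7592, 2088, 2003, 1037, 3231, 6251, 1012]   # dummy token ids
    print(build_mlm_example(toks))
    print(build_shuffle_detection_example(toks))
```

One appeal of such token-level classification targets, under these assumptions, is that the per-token label space stays tiny compared with MLM's softmax over the entire vocabulary.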