Position Masking for Language Models

06/02/2020
by Andy Wagner, et al.

Masked language modeling (MLM) pre-training models such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. This is an effective technique which has led to good results on all NLP benchmarks. We propose to expand upon this idea by masking the positions of some tokens along with the masked input token ids. We follow the same standard approach as BERT, masking a percentage of the token positions and then predicting their original values using an additional fully connected classifier stage. This approach has shown good performance gains (a 0.3% improvement) on SQuAD, in addition to an improvement in convergence time. On the Graphcore IPU, BERT Base with position masking converges using only 50% of the tokens required in the original BERT paper.
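The abstract does not include implementation details, but the idea can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' code: the helper `mask_positions`, the `PositionMaskingHead` classifier, the sentinel position id, and the hyperparameter values are assumptions made for the example.

```python
# Minimal sketch of position masking on top of a BERT-style encoder (PyTorch).
# All names and constants below are illustrative assumptions, not released code.
import torch
import torch.nn as nn

VOCAB_SIZE = 30522      # BERT-Base vocabulary size
MAX_SEQ_LEN = 512       # maximum sequence length = number of position classes
HIDDEN_SIZE = 768       # BERT-Base hidden size
MASK_TOKEN_ID = 103     # [MASK] token id in the BERT vocabulary
MASK_POSITION_ID = 0    # assumed sentinel position id used for masked positions


def mask_positions(input_ids, position_ids, mask_prob=0.15):
    """Mask a percentage of tokens: replace both the token id with [MASK]
    and the position id with a sentinel, keeping the originals as labels."""
    mask = torch.rand(input_ids.shape) < mask_prob
    token_labels = torch.where(mask, input_ids, torch.full_like(input_ids, -100))
    pos_labels = torch.where(mask, position_ids, torch.full_like(position_ids, -100))
    masked_ids = torch.where(mask, torch.full_like(input_ids, MASK_TOKEN_ID), input_ids)
    masked_pos = torch.where(mask, torch.full_like(position_ids, MASK_POSITION_ID), position_ids)
    return masked_ids, masked_pos, token_labels, pos_labels


class PositionMaskingHead(nn.Module):
    """Additional fully connected classifier that predicts the original
    position of each masked token from the encoder's hidden states."""

    def __init__(self, hidden_size=HIDDEN_SIZE, max_seq_len=MAX_SEQ_LEN):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, max_seq_len)

    def forward(self, hidden_states, pos_labels):
        logits = self.classifier(hidden_states)            # (batch, seq, max_seq_len)
        return nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), pos_labels.view(-1), ignore_index=-100
        )


# Example usage with random tensors standing in for a real encoder's output.
batch, seq_len = 2, 16
input_ids = torch.randint(1000, VOCAB_SIZE, (batch, seq_len))
position_ids = torch.arange(seq_len).repeat(batch, 1)
masked_ids, masked_pos, token_labels, pos_labels = mask_positions(input_ids, position_ids)
hidden_states = torch.randn(batch, seq_len, HIDDEN_SIZE)   # stand-in encoder output
position_loss = PositionMaskingHead()(hidden_states, pos_labels)
```

In such a setup the position classifier would sit alongside the standard MLM head on top of the encoder, and the position loss would simply be added to the usual MLM loss during pre-training.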
