MPNet: Masked and Permuted Pre-training for Language Understanding

04/20/2020
by Kaitao Song, et al.

BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. We argue that XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to make the model see a full sentence, thus reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB of text corpora) and fine-tune it on a variety of downstream tasks (GLUE, SQuAD, etc.). Experimental results show that MPNet outperforms MLM and PLM by a large margin, and achieves better results on these tasks compared with previous state-of-the-art pre-training methods (e.g., BERT, XLNet, RoBERTa) under the same model setting. We release the code and pre-trained model on GitHub.
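To make the objective more concrete, the following is a minimal, hypothetical Python sketch (not the released implementation) of how a masked-and-permuted input could be constructed: positions are permuted as in PLM, the last few positions are designated as the predicted part, and mask placeholders for those positions keep their original position ids so the model sees position information for the full sentence. The function name, ratio, and mask symbol are illustrative assumptions.

```python
import random

MASK = "[M]"  # illustrative mask symbol, not the actual vocabulary token


def mpnet_style_inputs(tokens, pred_ratio=0.15, seed=0):
    """Toy sketch of a masked-and-permuted input (assumption, not the paper's code).

    1. Permute the token positions (as in PLM).
    2. Treat the last c positions of the permutation as the predicted part.
    3. Feed the non-predicted tokens plus [M] placeholders for every predicted
       position; each slot keeps its original position id, so position
       information of the whole sentence is visible (unlike PLM), while the
       predicted tokens can be modeled autoregressively so each one can depend
       on those predicted before it (unlike MLM).
    """
    rng = random.Random(seed)
    n = len(tokens)
    c = max(1, int(round(n * pred_ratio)))

    perm = list(range(n))
    rng.shuffle(perm)
    non_pred, pred = perm[: n - c], perm[n - c :]

    # Non-predicted tokens followed by mask placeholders for the predicted part.
    input_tokens = [tokens[i] for i in non_pred] + [MASK] * c
    # Original position ids for every slot, covering the full sentence.
    position_ids = non_pred + pred
    # Tokens to predict, in permuted order.
    targets = [(i, tokens[i]) for i in pred]
    return input_tokens, position_ids, targets


if __name__ == "__main__":
    toks = ["the", "task", "is", "sentence", "classification"]
    inp, pos, tgt = mpnet_style_inputs(toks, pred_ratio=0.4)
    print(inp)  # e.g. ['is', 'the', 'classification', '[M]', '[M]']
    print(pos)  # position ids for every slot, spanning the whole sentence
    print(tgt)  # [(position, token), ...] for the predicted part
```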
