COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

02/16/2021 · by Yu Meng, et al.

We present COCO-LM, a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences. COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences. It creates more challenging pretraining inputs, where noises are sampled based on their likelihood in the auxiliary language model. COCO-LM then pretrains with two tasks: The first task, corrective language modeling, learns to correct the auxiliary model's corruptions by recovering the original tokens. The second task, sequence contrastive learning, ensures that the language model generates sequence representations that are invariant to noises and transformations. In our experiments on the GLUE and SQuAD benchmarks, COCO-LM outperforms recent pretraining approaches in various pretraining settings and few-shot evaluations, with higher pretraining efficiency. Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.







1 Introduction

Pretrained language models (PLMs) (Devlin et al., 2019; Radford et al., 2019; Raffel et al., 2019) have revolutionized the way AI systems process natural languages. By pretraining on large text corpora (Raffel et al., 2019; Brown et al., 2020) and scaling Transformers (Vaswani et al., 2017) to millions and billions of parameters (Devlin et al., 2019; Raffel et al., 2019), the state of the art in many language-related tasks has been refreshed at a historic speed in the past several years.

On the other hand, within the standard language model pretraining framework, it is observed that the empirical performance of PLMs on downstream tasks only improves linearly with the exponential growth of parameter size and pretraining cost (Kaplan et al., 2020). This is unsustainable as PLMs have reached trillions of parameters (Brown et al., 2020; Fedus et al., 2021).

Recent research has revealed some intrinsic limitations of existing pretraining frameworks that may cause this sub-linear efficiency. One challenge is that pretraining with randomly altered texts (e.g., randomly masked tokens) yields many non-informative signals that are no longer useful after a certain amount of pretraining (Roberts et al., 2020; Guu et al., 2020; Ye et al., 2020). Another is that pretraining at the token level does not explicitly learn sequence-level language semantics, and Transformers may not generalize to higher-level semantics efficiently during pretraining (Li et al., 2020; Thakur et al., 2020).

In this paper, we aim to overcome these limitations with a new self-supervised learning framework, COCO-LM, that pretrains Language Models by COrrecting and COntrasting text sequences with more challenging noises. To construct more informative pretraining signals, COCO-LM leverages an auxiliary language model, similar to the generator in ELECTRA (Clark et al., 2020b), to corrupt text sequences by sampling more contextually plausible noises from its masked language modeling (MLM) probability. Different from the replaced token detection task in ELECTRA, COCO-LM revives a language modeling task, corrective language modeling (CLM), which pretrains the Transformer to not only detect the challenging noises in the corrupted texts, but also correct them via a multi-task setting.

To improve the learning of sequence level semantics, COCO-LM introduces a sequence level pretraining task, sequence contrastive learning (SCL), that uses contrastive learning to push the model to map a corrupted text sequence and the cropped original sequence close together in the representation space, while keeping them away from other random sequences. This encourages the model to leverage more information from the entire sequence and to produce sequence representations that are invariant to token-level alterations.

COCO-LM significantly improves the generalization ability of language models on a variety of downstream tasks in the GLUE (Wang et al., 2018) and SQuAD (Rajpurkar et al., 2016) benchmarks. It outperforms recent approaches (Clark et al., 2020b; He et al., 2020; Ke et al., 2020) by large margins (e.g., clear accuracy gains on MNLI and EM gains on SQuAD 2.0 with base model training). It is also more cost-effective and has better few-shot ability in downstream tasks. Our thorough analyses also reveal that the benefits of COCO-LM come from its challenging pretraining signals, more contextualized token representations, and regularized sequence representations.

We plan to open-source our code and pretrained models.

2 Related Work

Designing better pretraining tasks than standard language modeling (Bengio et al., 2003; Devlin et al., 2019) is an important research topic in language representation learning (Radford et al., 2019; Song et al., 2019). For example, XLNet proposes permutation language modeling that conducts MLM in an autoregressive manner (Yang et al., 2019); UniLM uses pseudo MLM to unify autoregressive and MLM tasks for both language representation and generation (Dong et al., 2019; Bao et al., 2020). Lewis et al. (2019) conduct a thorough study of these variants and show MLM is still among the most effective in many applications.

One way to make MLM more informative is to mask more informative positions/spans (Joshi et al., 2019; Song et al., 2019; Guu et al., 2020) or to automatically learn masking positions (Ye et al., 2020). By masking more informative tokens, the pretrained language models focus more on the semantics required to recover those tokens (e.g., entities and attributes). This significantly boosts the generalization ability of the pretrained models in semantic-centric tasks, including more accurate question answering and more factually correct language generation (Guu et al., 2020; Roberts et al., 2020; Rosset et al., 2020).

Instead of optimizing mask positions, ELECTRA (Clark et al., 2020b) employs an auxiliary network to corrupt the input sequence with more challenging noises. It uses the MLM trained auxiliary model to replace tokens with samples from its MLM probability, and pretrains the main Transformer to detect the replaced tokens via binary classification. The two networks are pretrained jointly: The auxiliary model generates more and more challenging noises for the main Transformer to detect; the main Transformer so trained achieves strong performance in downstream tasks.

Despite its empirical advantage, there are concerns about whether ELECTRA's binary classification task misses some properties of language modeling. The ELECTRA authors explored a standard language model task on the corrupted text sequence (All-Token LM), but observed performance degradation (Clark et al., 2020b). ELECTRIC (Clark et al., 2020a) proposes a language model task that contrasts the original tokens against noises sampled from a cloze model. Although it underperforms ELECTRA on GLUE, ELECTRIC still pretrains a language model, which can be used in tasks like scoring the feasibility of a generated text sequence. MC-BERT uses a multiple-choice task which selects original tokens from plausible alternatives and performs on par with ELECTRA (Xu et al., 2020). COCO-LM also leverages an auxiliary network to generate pretraining inputs but uses two new pretraining tasks, one of which is a language modeling task.

Another frontier in pretraining research is to incorporate sentence level signals, for example, next sentence prediction (Devlin et al., 2019), sentence ordering (Lan et al., 2019), and previous sentence prediction (Wang et al., 2019). However, RoBERTa found the next sentence prediction task not beneficial and only uses the token level MLM task (Liu et al., 2019). The benefits of sentence level pretraining tasks are usually observed on some specific tasks (Chi et al., 2020; Lewis et al., 2020) such as modeling long-form texts (Ravula et al., 2020) and grounded question answering (Guu et al., 2020).

Recent successes of contrastive learning with language have mainly been achieved in the fine-tuning stage. Gunel et al. (2020) conduct supervised contrastive learning on GLUE and improve few-shot accuracy in fine-tuning. Xiong et al. (2020) use contrastive learning in dense text retrieval, using relevant query-document labels to construct contrast pairs. CERT (Fang and Xie, 2020) conducts continued training from BERT using contrastive pairs generated from back-translation (Pham et al., 2020) but underperforms RoBERTa.

3 Method

Figure 1: The overview of COCO-LM. The auxiliary Transformer is pretrained by MLM. We sample output tokens from its LM probability to construct a corrupted sequence, which is used as the pretraining input of the main Transformer for Corrective Language Modeling. The corrupted sequence also forms a positive sequence pair with the cropped original sequence in Sequence Contrastive Learning.

In this section, we first recap ELECTRA-Style language model pretraining and then present COCO-LM.

3.1 Preliminary

In the masked language modeling (MLM) (Devlin et al., 2019) task, the pretraining model, often a Transformer (Vaswani et al., 2017), takes an input sequence X^masked with some tokens randomly replaced by [MASK] symbols and learns to predict the original tokens:

X^masked → Transformer → H;  h_i → LM Head → x_i^MLM,

where H = [h_1, ..., h_n] is the contextualized representation of the input sequence, X^MLM is the sequence with masked positions filled with MLM-predicted tokens, and the LM Head is a classification layer that learns to predict the original token x_t from the vocabulary V with the following probability:

p_MLM(x_t | h_i) = exp(x_t^T h_i) / Σ_{x' ∈ V} exp(x'^T h_i).

The token embeddings x_t are parameters shared between the Transformer input and the LM Head output layer. The Transformer is trained via the cross-entropy loss between X^MLM and X^orig on the masked positions (Devlin et al., 2019).
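A minimal numpy sketch of these MLM pieces, with a toy vocabulary and random stand-in hidden states (a real model computes them with a Transformer; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["the", "cat", "sat", "on", "mat", "[MASK]"]
V, D = len(VOCAB), 8  # vocabulary size, embedding dimension

# Token embeddings shared between the Transformer input and the
# LM Head output layer (weight tying, as described above).
E = rng.normal(size=(V, D))

def lm_head(h):
    """Probability over the vocabulary given a contextualized vector h."""
    logits = E @ h
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def mlm_loss(original_ids, masked_positions, hidden_states):
    """Cross entropy against the original tokens, on masked positions only."""
    losses = [-np.log(lm_head(hidden_states[i])[original_ids[i]])
              for i in masked_positions]
    return float(np.mean(losses))

# Toy stand-ins: "the [MASK] sat on [MASK]" with "cat" and "mat" masked.
original_ids = [0, 1, 2, 3, 4]
masked_positions = [1, 4]
hidden_states = rng.normal(size=(5, D))
print(round(mlm_loss(original_ids, masked_positions, hidden_states), 3))
```

Only the masked positions contribute to the loss, which is why random masking can waste updates on trivial tokens, as discussed below.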

The randomly chosen masks do not always provide the best pretraining signals: Many masked tokens are trivial common words that may not push the Transformer to capture meaningful language semantics (Guu et al., 2020), while some might be too hard or have many false negatives (Xu et al., 2020). Pretraining on those masks is not guaranteed to elevate the language model’s generalization ability.

Clark et al. (2020b) developed a new framework, ELECTRA, which instead of working on the masked sequences directly, first leverages an auxiliary MLM model ("generator") to infer a sequence X^RTD as the pretraining input for the main network ("discriminator"). The latter learns to detect which tokens are replaced via a binary classification task called "replaced token detection":

X^masked → Auxiliary Transformer → X^RTD;  X^RTD → Main Transformer → 1(x_i^RTD = x_i^orig).

The main Transformer detects whether each input token is kept or replaced, i.e., whether x_i^RTD = x_i^orig, using a sigmoid binary classification head. The two networks are pretrained side-by-side: the auxiliary Transformer is trained by MLM and outputs more plausible and challenging token replacements X^RTD; the main network learns to better detect the deceiving replacements in X^RTD.
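The corrupt-then-detect pipeline can be sketched as follows (toy distributions; names are illustrative). Note one subtlety: a sampled token that happens to equal the original is labeled as kept, not replaced:

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(original_ids, masked_positions, mlm_probs):
    """Auxiliary model step: fill each masked position with a token sampled
    from its MLM probability, producing the corrupted input for the main model."""
    corrupted = list(original_ids)
    for i in masked_positions:
        corrupted[i] = int(rng.choice(len(mlm_probs[i]), p=mlm_probs[i]))
    return corrupted

def rtd_labels(original_ids, corrupted_ids):
    """Main model targets: 1 if a token equals the original, 0 if replaced.
    A regenerated original counts as kept."""
    return [int(o == c) for o, c in zip(original_ids, corrupted_ids)]

original = [0, 1, 2, 3, 4]
masked = [1, 3]
# Toy MLM distributions (vocabulary of 5) for the two masked positions.
probs = {1: np.array([0.1, 0.6, 0.1, 0.1, 0.1]),
         3: np.array([0.3, 0.3, 0.2, 0.1, 0.1])}
corrupted = corrupt(original, masked, probs)
print(corrupted, rtd_labels(original, corrupted))
```

As the auxiliary MLM improves, its samples concentrate on contextually plausible tokens, which is what makes the detection task progressively harder.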

The auxiliary network is solely used to construct pretraining signals and is discarded after pretraining. The main Transformer is fine-tuned for downstream tasks and is quite effective in a wide range of them (Clark et al., 2020b). The source of its empirical advantage, however, remains somewhat of a mystery (Clark et al., 2020a). After all, the main Transformer is not even trained as a language model; it merely learns from the binary classification task, yet performs well on tasks where modeling sophisticated language semantics is required (Wang et al., 2018).

Clark et al. (2020b) explored an All-Token MLM task which trains the main Transformer to predict the original token besides detecting replacements, but it decreased the generalization ability of ELECTRA. Later, Clark et al. (2020a) proposed ELECTRIC that uses language modeling probabilities to distinguish the original tokens from noises sampled from a cloze model instead of the auxiliary language model. It does not outperform ELECTRA but maintains the language modeling capability, which is necessary in some applications (Clark et al., 2020a).

3.2 Pretraining by Correcting and Contrasting

COCO-LM also employs an auxiliary Transformer to corrupt text sequences with more challenging noises—we later show this is critical for the pretrained model’s generalization ability (Sec. 5.2). Different from ELECTRA (Clark et al., 2020b), COCO-LM first revives the language modeling task on the corrupted text sequences. Then it introduces a new sequence level task with contrastive learning (Sec. 3.2.2). The framework of COCO-LM is illustrated in Figure 1.

3.2.1 Corrective Language Modeling

Corrective Language Modeling (CLM) is a token level pretraining task: Given a text sequence X^MLM corrupted by the auxiliary network, CLM aims to recover the original tokens:

X^MLM → Main Transformer → X^orig.

The noises in X^MLM are considered plausible in context by the auxiliary network and are thus more challenging.

The main Transformer performs the CLM task as:

X^MLM → Main Transformer → H;  h_i → CLM Head → x_i^orig.

Here H = [h_1, ..., h_n] are the representations from the main Transformer. The CLM Head is similar to the one used in All-Token MLM (Clark et al., 2020b): a standard language modeling softmax plus a copy mechanism:

p_CLM(x_t | h_i) = p_copy(h_i) · 1(x_t = x_i^MLM) + (1 − p_copy(h_i)) · p_LM(x_t | h_i),

where 1(·) is the indicator function and p_copy(h_i) is a learnable weight. The copy mechanism adds the probability of copying the input word x_i^MLM using the binary classification layer p_copy.
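A toy numpy sketch of such a softmax-plus-copy head (all names are illustrative, not the authors' implementation):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def clm_head(lm_logits, input_token_id, copy_logit):
    """Mix a vocabulary softmax with a copy mechanism:
    p = p_copy * 1[x = input token] + (1 - p_copy) * p_LM(x)."""
    p_copy = 1.0 / (1.0 + np.exp(-copy_logit))  # sigmoid copy gate
    p = (1.0 - p_copy) * softmax(lm_logits)
    p[input_token_id] += p_copy                 # indicator term: copy the input word
    return p

# With a confident copy gate, the distribution concentrates on the input token.
p = clm_head(np.array([0.2, 1.5, -0.3, 0.0]), input_token_id=2, copy_logit=2.0)
print(p.round(3))
```

When the gate is off (very negative copy logit), the head falls back to the plain LM softmax, so the copy mechanism only intervenes where the model believes the input token was original.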

As shown in Clark et al. (2020b), pretraining with only a language modeling loss (All-Token MLM) performs worse than the simple binary replaced token detection. We find this is mainly due to All-Token MLM's ineffectiveness in handling the noises from the auxiliary language model: recovering the original token is much harder than merely detecting the replacement.

CLM improves the learning of token recovery using two techniques: a standard multi-task setting that explicitly learns the copy mechanism using binary labels, and a stop-gradient (sg) layer that shields the copy mechanism from disturbance by the hard language modeling task:

L_copy = BinaryCE(p_copy(h_i), y_i^copy),   (1)
L_CLM = L_copy + λ · L_LM, with L_LM using sg(p_copy(h_i)),   (2)

where y_i^copy = 1(x_i^MLM = x_i^orig) is the ground truth of the copy mechanism. The sg in Eqn. (2) denotes that gradients from L_LM do not update p_copy; λ is a hyperparameter balancing the two tasks. The binary cross-entropy loss in Eqn. (1) is dedicated to learning the copying probability. We prevent the learning of L_copy from being disturbed by the harder LM task L_LM. This way, the main Transformer first learns the easier binary classification task, and uses the learned copy mechanism to improve the learning of the harder task.
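The multi-task loss can be sketched as follows (numpy; λ and all names are illustrative; real frameworks implement the stop gradient via gradient detachment, e.g. `.detach()` in PyTorch):

```python
import numpy as np

def binary_ce(p, y):
    """Copy loss: binary cross entropy on whether each token was original (y=1)."""
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def clm_loss(p_copy, y_copy, p_lm_on_original, lam=0.5):
    """L_CLM = L_copy + lambda * L_LM. In a real implementation, the copy
    probability feeding the LM term is wrapped in a stop-gradient so that
    the hard LM loss cannot disturb the copy gate."""
    l_copy = binary_ce(p_copy, y_copy)
    l_lm = float(-np.log(np.clip(p_lm_on_original, 1e-9, None)).mean())
    return l_copy + lam * l_lm

p_copy = np.array([0.9, 0.2, 0.8])            # predicted copy probabilities
y_copy = np.array([1.0, 0.0, 1.0])            # 1 = token was original, 0 = replaced
p_lm_on_original = np.array([0.7, 0.4, 0.6])  # LM probability of the original token
print(round(clm_loss(p_copy, y_copy, p_lm_on_original), 4))
```

The binary term stays well-behaved even when the LM term is large, mirroring the easy-task-first behavior the paper describes.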

3.2.2 Sequence Contrastive Learning

Besides token level pretraining, COCO-LM introduces a sequence contrastive learning task (SCL), which pretrains the model to provide better sequence level representations.

Specifically, in SCL, each original sequence X^orig is transformed separately via MLM replacement (X^MLM) and random cropping (X^crop). A training batch B contains both MLM-replaced and cropped sequences (the crop operation keeps a random contiguous span of the original sequence to maintain its major meaning). We use the following contrastive learning loss to align the sequence representations of the positive pair (X^MLM, X^crop), in contrast to random pairs as negatives:

L_SCL = −log [ exp(cos(s^MLM, s^crop)/τ) / Σ_{X' ∈ B, X' ≠ X^MLM} exp(cos(s^MLM, s')/τ) ],

where s is the L2-normalized [CLS] sequence representation and τ is the temperature. This contrastive learning task requires the sequence embeddings of X^MLM and X^crop to be close to each other while away from other random sequences in the same batch B. This encourages the main network to produce representations invariant to minor token-level alterations (Purushwalkam and Gupta, 2020).
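This InfoNCE-style objective can be sketched with toy embeddings standing in for the [CLS] vectors (names and temperature are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def scl_loss(anchor, positive, negatives, tau=1.0):
    """Contrastive loss over L2-normalized sequence embeddings: pull the
    (corrupted, cropped) pair together, push in-batch negatives away."""
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negatives)
    pos = np.exp(a @ p / tau)
    negs = np.exp(n @ a / tau)
    return float(-np.log(pos / (pos + negs.sum())))

base = rng.normal(size=16)
z_crop = base + 0.05 * rng.normal(size=16)  # embedding of the cropped sequence
z_mlm = base + 0.05 * rng.normal(size=16)   # embedding of the corrupted sequence
z_rand = rng.normal(size=(7, 16))           # other sequences in the batch

aligned = scl_loss(z_mlm, z_crop, z_rand)
shuffled = scl_loss(rng.normal(size=16), z_crop, z_rand)
print(round(aligned, 3), round(shuffled, 3))
```

A pair derived from the same original sequence yields a lower loss than a random anchor, which is exactly the alignment pressure SCL applies during pretraining.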

As the first step to leverage contrastive learning in sequence level pretraining, we keep everything straightforward: A simple cropping as data augmentation and the default temperature is used in the softmax. Advanced data transformations (Qu et al., 2020) and hyperparameter explorations (Oord et al., 2018; Chen et al., 2020) may further improve COCO-LM but are reserved for future work.

3.2.3 COCO-LM Training

Putting the two tasks together, the pretraining framework of COCO-LM can be summarized as:

X^orig → Auxiliary Transformer (MLM) → X^MLM;  X^orig → Random Crop → X^crop,   (3)
L_main = L_CLM + L_SCL.   (4)

In Eqn. (3) we construct the pretraining signals for COCO-LM: the auxiliary network is pretrained by standard MLM to provide corrupted training sequences X^MLM; the original sequence is cropped to form a simple augmentation X^crop. In Eqn. (4) we leverage these signals to pretrain the main network, by correcting the replaced tokens at the token level (CLM) and by contrasting the representations of the replaced and cropped texts at the sequence level (SCL). The auxiliary network and the main network are pretrained side-by-side in COCO-LM's self-supervised learning framework.

Base Models Wikipedia + BookCorpus
BERT (Devlin et al., 2019) 84.50/- 91.30 91.70 93.20 58.90 68.60 87.30 89.50 83.13 73.70 76.30
RoBERTa (Liu et al., 2019) 84.70/- 92.70 79.70
XLNet (Yang et al., 2019) 85.80/- 92.70 81.33
ALBERT (Lan et al., 2019) 83.50/- 91.70 79.40 82.30
ELECTRA (Clark et al., 2020b) 86.00/85.29 89.96 91.85 93.38 64.33 70.80 84.88 89.10 83.74 80.50 83.30
MC-BERT (Xu et al., 2020) 85.68/85.24 89.65 91.34 92.34 62.10 74.96 85.96 88.01 83.73
BERT+Rel-Pos (Ke et al., 2020) 85.81/85.84 91.12 92.16 92.90 55.43 71.46 89.26 88.94 83.39
UniLM V2 (Bao et al., 2020) 86.10/86.10 93.20 80.90 83.60
DeBERTa (He et al., 2020) 86.30/86.20 79.30 82.50
TUPE (Ke et al., 2020) 86.21/86.19 91.30 92.17 93.26 63.56 73.56 89.89 89.23 84.90
RoBERTa (Ours) 85.61/85.51 91.34 91.80 93.86 58.64 69.03 87.50 86.53 83.03 77.71 80.51
ELECTRA (Ours) 86.92/86.72 91.86 92.56 93.64 66.50 75.28 88.46 88.04 85.39 79.74 82.58
COCO-LM Base 88.52/88.30 92.04 93.13 93.30 64.01 85.42 91.51 88.61 87.05 82.32 85.12
Base++ Models Bigger Training Data and/or More Training Steps
XLNet (Yang et al., 2019) 86.80/- 91.40 91.70 94.70 60.20 74.00 88.20 89.50 84.56 80.20
RoBERTa (Liu et al., 2019) 87.60/- 91.90 92.80 94.80 63.60 78.70 90.20 91.20 86.35 80.50 83.70
UniLM V2 (Bao et al., 2020) 88.50/- 91.70 93.50 95.10 65.20 81.30 91.80 91.00 87.09 83.30 86.10
DeBERTa (He et al., 2020) 88.80/88.50 83.10 86.20
CLEAR (Wu et al., 2020) 86.70/- 90.00 92.90 94.50 64.30 78.30 89.20 89.80 85.71
COCO-LM Base++ 89.66/89.60 92.14 93.64 94.32 68.31 84.03 90.63 89.54 87.78 83.53 86.61
Table 1: Single model results on the GLUE and SQuAD 2.0 development sets. All our runs are medians of five runs on GLUE and averages on SQuAD 2.0. Results not available in public reports are marked with "–". Our evaluation metrics are Spearman correlation for STS-B, Matthews correlation for CoLA, and accuracy for the other GLUE tasks. AVG is the average of the eight GLUE tasks.

4 Experimental Setup

This section describes our experiment setups.

Pretraining Setting: We employ two standard pretraining settings, base and base++. Base is the standard BERT base training configuration (Devlin et al., 2019): pretraining on Wikipedia and BookCorpus (Zhu et al., 2015) ( GB of texts) for million samples on token sequences (or K batches with batch size).

Base++ trains the model with the same configuration but larger corpora and/or more training steps. We follow the settings in XLNet (Yang et al., 2019), RoBERTa (Liu et al., 2019), and UniLM V2 (Bao et al., 2020), which add OpenWebText, CC-News (Liu et al., 2019), and STORIES (Trinh and Le, 2018), for a total of GB of texts. We train for billion samples (with batch size), the same as Liu et al. (2019).

There are inevitable variations in the pretraining corpora used in different work. Our base corpus is obtained from the authors of MC-BERT (Xu et al., 2020) and TUPE (Ke et al., 2020). Our base++ corpus is most similar to those used in UniLM (Dong et al., 2019; Bao et al., 2020).

Downstream Tasks: We use the tasks included in the GLUE benchmark (Wang et al., 2018) and SQuAD 2.0 reading comprehension (Rajpurkar et al., 2016). The fine-tuning protocols are based on the open-source implementations released by Ke et al. (2020) for GLUE tasks and by huggingface (Wolf et al., 2019) for SQuAD. All pretrained models are evaluated with the same fine-tuning protocols, and the reported results are the median/average of five/ten random seeds on GLUE/SQuAD. Please refer to the Appendix for more details.

Model Architecture: Our main network uses the RoBERTa base architecture (Liu et al., 2019): layer Transformer, hidden size, and BPE tokenization with vocabulary size (Sennrich et al., 2015), plus T5 relative position encoding (Raffel et al., 2019). Our auxiliary network is the same except that it uses a shallower -layer Transformer (still with the same hidden size).

Baseline RoBERTa (Ours) 85.61/85.51 91.34 91.80 93.86 58.64 69.03 87.50 86.53 83.03
ELECTRA (Ours) 86.92/86.72 91.86 92.56 93.64 66.50 75.28 88.46 88.04 85.39
Original COCO-LM Base 88.67/88.35 92.02 93.00 94.08 65.41 85.42 91.51 88.61 87.05
Pretraining Task CLM Only 88.64/88.40 92.03 93.14 93.86 66.95 80.90 89.90 88.45 86.72
SCL Only 88.62/88.14 92.14 93.45 93.86 64.70 82.57 90.38 89.35 86.86
Architecture w/o. Rel-Pos 88.20/87.75 92.17 93.44 93.75 68.09 82.64 91.19 88.90 87.27
w/o. Shallow-Aux 88.05/87.75 91.88 92.71 93.64 63.73 81.53 89.50 88.24 86.14
Noise Construction w. Randomly Sampled Noises 84.94/84.74 91.36 91.08 91.63 40.82 70.50 87.34 84.86 80.30
w. Fixed Auxiliary 87.94/87.98 92.03 92.96 93.18 64.68 81.53 89.98 88.22 86.32
CLM Setup (No SCL) All-Token LM Only 87.17/86.97 91.74 92.58 93.75 61.02 73.54 88.70 87.70 84.51
CLM w/o. Copy 88.02/87.87 91.81 93.11 94.53 65.71 76.60 89.42 88.17 85.91
CLM w/o. Stop-grad 88.53/88.19 91.95 92.88 94.32 67.52 80.76 89.66 88.78 86.78
Table 2: Ablation results on GLUE Dev. Variations in each group eliminate (w/o.), keep only (Only), or switch (w.) one component.

Baselines: We list the reported numbers from many recent studies on GLUE and SQuAD, where available (more details in Appendix C). To reduce the variance in data processing/environments and provide fair comparisons, we also implement, pretrain, and fine-tune RoBERTa and ELECTRA under exactly the same settings, marked with "(Ours)".

Implementation Details: Our implementation is built upon the open-source release of MC-BERT (Xu et al., 2020) and its ELECTRA reproduction based on fairseq (Ott et al., 2019). Standard hyperparameters in pretraining and fine-tuning are used. We conduct pretraining on our Nvidia DGX-2 boxes. The hyperparameter settings and pretraining environments are listed in Appendix D.

(a) MNLI-m
(b) MNLI-mm
Figure 2: COCO-LM Base accuracy on MNLI Dev. sets (y-axes) at different pretraining hours on four DGX-2 nodes ( V100 GPUs). The final training hours and accuracy of RoBERTa (Ours) and ELECTRA (Ours) are measured in exactly the same settings and computing environments.

5 Evaluation Results

In this section, we first present the overall evaluation results and ablations of various techniques in COCO-LM. Then we analyze the influence of its two pretraining tasks.

5.1 Overall Results

Table 1 shows the results of COCO-LM. The smaller GLUE tasks (CoLA, RTE, MRPC, and STS-B) are unstable: much pretraining research omits them, and more advanced fine-tuning strategies are required for stable evaluations (Aghajanyan et al., 2020). On tasks where fine-tuning is more stable (e.g., MNLI and SQuAD), COCO-LM provides the biggest improvements, for example, clear gains on MNLI-m accuracy, SQuAD 2.0 EM, and GLUE AVG over the best baselines in the base setting.

For fine-tuning and inference, COCO-LM incurs no extra computation cost as it has the same architecture as BERT besides relative position embeddings. The extra computation cost in pretraining for better pretrained models is often considered a worthwhile one-time investment. Still, we show the MNLI accuracy of COCO-LM at different pretraining hours versus the full RoBERTa (Ours) and ELECTRA (Ours) runs in Figure 2. The full pretraining of COCO-LM requires more GPU hours than ELECTRA, with the added cost from CLM and SCL, while both are more costly than RoBERTa due to the auxiliary network. However, COCO-LM turns out to be a better choice in both accuracy and efficiency: it outperforms ELECTRA and RoBERTa by more than 1 point on MNLI with the same compute, while requiring less compute to reach the same accuracy.

5.2 Ablations

We conduct ablation studies of COCO-LM base on the GLUE Dev. sets (Table 2). We reduce variance by fixing seeds and picking median-performing checkpoints from multiple pretraining runs. Nevertheless, there remains randomness in pretraining that leads to small variations on MNLI.

Pretraining Task. CLM or SCL individually provides significantly better performance than previous approaches on MNLI. Their advantages are better observed on different tasks, for example, CLM on MNLI-mm and SCL on STS-B. Combining the two in COCO-LM provides a better overall average. In later experiments we further analyze the behavior of these two pretraining tasks.

Architecture. The two notable differences in the Transformer architecture of COCO-LM are relative position encoding (Rel-Pos) and a shallow auxiliary network instead of the deeper but skinnier one in ELECTRA (Clark et al., 2020b). Removing Rel-Pos leads to better numbers on some tasks but significantly hurts MNLI; its higher GLUE AVG is mainly contributed by CoLA. Using a shallow auxiliary network is more effective than ELECTRA's deeper generator with a smaller hidden dimension.

Pretraining Signal Construction. Similar to ELECTRA, COCO-LM uses the auxiliary network to sample more challenging noises to push the main language model. This is critical as the same model but with Randomly Sampled Noises performs worse than vanilla RoBERTa. Pretraining the two networks side-by-side provides a learning curriculum for the main network, as the noises from the auxiliary network start from near random and become more challenging along the way. Pretraining the main network with a pretrained and Fixed Auxiliary network performs worse.

CLM Setup. Switching the multi-task learning in CLM to the All-Token MLM loss (Clark et al., 2020b) significantly reduces the model’s generalization ability. The copy mechanism and the stop gradient operation are also important to maintain CLM’s effectiveness. The next experiments analyze how our CLM setup helps handle the challenging noises from the auxiliary network.

(a) Copy Acc. (Replaced)
(b) Copy Acc. (Original)
(c) CLM Acc. (Replaced)
(d) CLM Acc. (Original)
Figure 3: The training curves of CLM variations in COCO-LM. All x-axes are training steps (in K scale) and y-axes mark the tracked status: Copy mechanism Accuracy (i.e., the main network’s binary classification accuracy) on (a) the replaced tokens and (b) the original tokens. CLM Accuracy (i.e., the accuracy of outputting the original tokens) on (c) the replaced tokens and (d) the original tokens.
Figure 4: The entropy of learned attention weights (after softmax) in different Transformer layers. The x-axis marks the layer index (smaller means closer to input tokens). The entropy is averaged on all tokens in the MNLI corpus without fine-tuning.

5.3 Language Modeling with More Challenging Noises

In this experiment we analyze how CLM helps overcome the challenging noises and enables better pretraining of the main network with a language modeling loss. In Figure 3, we show the pretraining curves of CLM and three variations: CLM without the copy mechanism (w/o. Copy), CLM without the stop gradient operation (w/o. Stop-Grad), and All-Token MLM Only, which uses only the LM loss.

The copy mechanism is important to help the model detect challenging noises: CLM (w/o. Copy) mistakes many original tokens for noises, with much worse LM accuracy on original tokens (Figure 3(d)). This hurts its generalization ability, as shown in Table 2. Also, the noises from the auxiliary network make it quite challenging to learn to copy and correct solely from the language modeling loss, as can be seen from the large gap between the copy accuracy of All-Token MLM Only and the rest (Figures 3(a) and 3(b)). The gap becomes even larger as pretraining goes on, showing that the copy mechanism's training is disturbed too much and never recovers. The stop gradient operation further helps avoid the disturbance from the hard LM loss to the classification loss.

In summary, the multi-task learning and stop gradient designs in CLM are important for the language model to effectively learn from the challenging pretraining signals instead of being confused by them.

5.4 More Contextualized Transformer

What are the differences made by pretraining a Transformer with COCO-LM? In the following, we analyze the learned attentions and token representations in the main Transformer pretrained by COCO-LM and baselines.

More Spread Attention Weights. We first calculate the entropy of the attention weights (Clark et al., 2019) learned by COCO-LM base variations and compare them with RoBERTa (Ours) in Figure 4. All COCO-LM variations have significantly higher attention entropy in their last four layers than RoBERTa, indicating that each token attends to many other tokens rather than concentrating on a few. The more challenging noises in COCO-LM require the main Transformer to consider a wider range of context.
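The entropy measure used here can be sketched as follows (a numpy illustration of the metric, not the authors' measurement script):

```python
import numpy as np

def attention_entropy(attn):
    """Mean entropy (in nats) of per-token attention distributions; higher
    values mean attention is spread over more context tokens."""
    p = np.clip(attn, 1e-12, 1.0)
    return float((-p * np.log(p)).sum(axis=-1).mean())

n = 8
uniform = np.full((n, n), 1.0 / n)  # each token attends everywhere equally
peaked = np.eye(n) * 0.92 + 0.01    # each token mostly attends to itself
print(round(attention_entropy(uniform), 3), round(attention_entropy(peaked), 3))
```

Uniform attention attains the maximum entropy log(n), while peaked attention scores far lower, so a higher average entropy directly reflects more spread attention.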

More Contextualized Token Representations. Next, we calculate the self-similarity of the representations of the same word when it appears in different contexts (Ethayarajh, 2019); a lower self-similarity indicates a more contextualized Transformer. The results are shown in Table 3.

All Words Stop Words
Model Self Rand. Diff. Self Rand. Diff.
RoBERTa 0.781 0.603 0.178 0.812 0.664 0.148
ELECTRA 0.722 0.603 0.119 0.682 0.606 0.076
COCO-LM 0.738 0.626 0.112 0.699 0.639 0.060
 CLM Only 0.721 0.616 0.105 0.718 0.651 0.067
 SCL Only 0.680 0.595 0.085 0.669 0.619 0.050
Table 3:

Average cosine similarity between representations of the same word in different contexts (Self), random word pairs (Rand.) and their differences (Diff.) in MNLI corpus, without fine-tuning.
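The self-similarity statistic behind Table 3 can be sketched as follows (in the spirit of Ethayarajh, 2019; a toy illustration with hypothetical vectors, not the authors' script):

```python
import numpy as np

rng = np.random.default_rng(3)

def self_similarity(reps):
    """Average pairwise cosine similarity among representations of one word
    in different contexts. Lower values mean more contextualized vectors."""
    r = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    i, j = np.triu_indices(len(reps), k=1)
    return float((r @ r.T)[i, j].mean())

static = np.tile(rng.normal(size=16), (5, 1))  # identical vector in every context
contextual = rng.normal(size=(5, 16))          # strongly context-dependent vectors
print(round(self_similarity(static), 3), round(self_similarity(contextual), 3))
```

A perfectly static word embedding scores 1.0; the further the score drops below that, the more the representation changes with context, which is how the Self column in Table 3 should be read.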

(a) Without SCL
(b) With SCL
Figure 5: The cosine similarity between the [CLS] embeddings of positive and negative sequence pairs during pretraining.
(a) MNLI-m
(b) MNLI-mm
Figure 6: Few-shot accuracy on MNLI with a fraction of MNLI training set used (x-axes). The error bars mark the max/min and the solid lines are the average of five fine-tuning runs.
(a) Without SCL
(b) With SCL
Figure 7: The t-SNE plots of learned sequence representations with or without SCL. The points are sampled from the most semantically similar sentence pairs in STS-B (based on their score labels). The [CLS] embeddings are obtained without fine-tuning. Some similar pairs are randomly selected and marked by the same shapes.

Both ELECTRA and COCO-LM have significantly more contextualized representations than RoBERTa. Their challenging noises require the main Transformers to rely on more contexts to distinguish and recover replaced tokens, while [MASK] tokens in RoBERTa can be easily recognized by the model. Among COCO-LM variations, SCL Only is significantly more contextualized than CLM Only. We further analyze the behaviors and influences of the SCL task in the next experiment.

5.5 Contrastive Learning Analyses

The last group of experiments show various notable characteristics of sequence contrastive learning. All the experiments are conducted under the base pretraining setting.

Contrastive Learning As Regularization. Our contrastive learning task is simple: matching a positive sequence pair (i.e., a cropped sub-sequence and an MLM-corrupted sequence) among random pairs. Even without SCL, one would expect the Transformer to map two sequences with many overlapping terms closer in the representation space by default. However, as shown in Figure 5, this is not the case: when pretrained without SCL, the cosine similarity of the positive pairs is actually lower than that of random negatives. The representation space without SCL is also so anisotropic that random pairs have high cosine similarity. The explicit sequence level training with SCL is necessary to regularize the sequence representation space, aligning similar sequences and decoupling random ones.

Better Few-Shot Ability. One advantage of the more regularized sequence representations with SCL is improved few-shot ability. As shown in Figure 6, SCL provides notable improvements under few-shot settings: “w. SCL” outperforms “w/o. SCL” on MNLI-m/mm when only a fraction of the fine-tuning labels is used. With only a small fraction of MNLI labels, “w. SCL” reaches MNLI accuracy better than RoBERTa (Ours) fine-tuned with full data, and with a larger fraction of labels it performs on par with ELECTRA (Ours) fine-tuned with full data.

Alignment and Uniformity. Another advantage of contrastive learning, known from visual representation learning, is that it provides better alignment of related pairs and allocates random points more uniformly in the space (Wang and Isola, 2020). To study whether this holds in language representation learning, we plot the representations of semantically similar STS-B sentence pairs from COCO-LM in Figure 7 using t-SNE (Coenen et al., 2019). The similar sentence pairs (marked by the same shapes) are aligned closer when pretrained with SCL; their average cosine similarity is higher with SCL than without.

Uniformity is less evident. Both figures show non-uniform patterns, perhaps because the random negatives used in SCL are not sufficient to regularize the representation space. More sophisticated negative sample construction might improve the uniformity of the language representation space (He et al., 2019; Xiong et al., 2020).
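The alignment and uniformity notions of Wang and Isola (2020) have simple empirical estimators: alignment is the mean distance between positive pairs, and uniformity is the log of the average Gaussian potential over all pairs, both on unit-normalized embeddings. A sketch follows, with their commonly used defaults (alpha=2, t=2) as assumptions:

```python
import numpy as np

def alignment(x, y, alpha=2):
    """Mean distance (raised to alpha) between positive pairs of
    unit-normalized embeddings; lower is better (Wang & Isola, 2020)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return np.mean(np.linalg.norm(x - y, axis=1) ** alpha)

def uniformity(x, t=2):
    """Log of the average Gaussian potential between all distinct
    pairs; lower means points are spread more uniformly."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    n = len(x)
    off = sq[~np.eye(n, dtype=bool)]                     # drop self-pairs
    return np.log(np.mean(np.exp(-t * off)))
```

For instance, two antipodal points score a much lower (better) uniformity than two collapsed points, matching the qualitative reading of Figure 7.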

6 Conclusions

In this paper, we present COCO-LM, a self-supervised learning framework that pretrains language models via correcting and contrasting text sequences with more challenging noises. The advantages of COCO-LM over previous pretraining approaches include pretraining with more challenging noises from the auxiliary language model, a multi-task corrective language modeling setting that robustly learns to recover original tokens, and a sequence contrastive learning task that regularizes sequence representations during pretraining.

Our experiments demonstrate that COCO-LM not only provides better generalization ability, but also enjoys higher efficiency in terms of downstream task performance achieved per pretraining hour. More importantly, we conduct extensive analyses on the influence of each technique in COCO-LM and their behaviors in different conditions. We hope our studies will inspire more future explorations for more effective and efficient pretraining frameworks including better construction of pretraining signals, more contrastive learning techniques, and new pretraining tasks.


  • A. Aghajanyan, A. Shrivastava, A. Gupta, N. Goyal, L. Zettlemoyer, and S. Gupta (2020) Better fine-tuning by reducing representational collapse. arXiv preprint arXiv:2008.03156. Cited by: §5.1.
  • H. Bao, L. Dong, F. Wei, W. Wang, N. Yang, X. Liu, Y. Wang, S. Piao, J. Gao, M. Zhou, and H. Hon (2020) UniLMv2: pseudo-masked language models for unified language model pre-training. In Preprint, Cited by: Appendix C, §2, Table 1, §4, §4.
  • Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin (2003) A neural probabilistic language model. Journal of Machine Learning Research 3 (Feb), pp. 1137–1155. Cited by: §2.
  • L. Bentivogli, P. Clark, I. Dagan, and D. Giampiccolo (2009) The fifth pascal recognizing textual entailment challenge.. In TAC, Cited by: Appendix A.
  • T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei (2020) Language models are few-shot learners. External Links: 2005.14165 Cited by: §1, §1.
  • D. Cer, M. Diab, E. Agirre, I. Lopez-Gazpio, and L. Specia (2017) Semeval-2017 task 1: semantic textual similarity-multilingual and cross-lingual focused evaluation. arXiv preprint arXiv:1708.00055. Cited by: Appendix A.
  • T. Chen, S. Kornblith, M. Norouzi, and G. Hinton (2020) A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, Cited by: §3.2.2.
  • Z. Chi, L. Dong, F. Wei, N. Yang, S. Singhal, W. Wang, X. Song, X. Mao, H. Huang, and M. Zhou (2020) Infoxlm: an information-theoretic framework for cross-lingual language model pre-training. arXiv preprint arXiv:2007.07834. Cited by: §2.
  • K. Clark, U. Khandelwal, O. Levy, and C. D. Manning (2019) What does BERT look at? an analysis of BERT’s attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 276–286. Cited by: §5.4.
  • K. Clark, M. Luong, Q. Le, and C. D. Manning (2020a) Pre-training transformers as energy-based cloze models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 285–294. Cited by: §2, §3.1, §3.1.
  • K. Clark, M. Luong, Q. V. Le, and C. D. Manning (2020b) ELECTRA: pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations, Cited by: §1, §1, §2, §2, §3.1, §3.1, §3.1, §3.2.1, §3.2.1, §3.2, Table 1, §5.2, §5.2.
  • A. Coenen, E. Reif, A. Yuan, B. Kim, A. Pearce, F. Viégas, and M. Wattenberg (2019) Visualizing and measuring the geometry of bert. arXiv preprint arXiv:1906.02715. Cited by: §5.5.
  • I. Dagan, O. Glickman, and B. Magnini (2005) The pascal recognising textual entailment challenge. In Machine Learning Challenges Workshop, pp. 177–190. Cited by: Appendix A.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186. Cited by: §1, §2, §2, §3.1, Table 1, §4.
  • W. B. Dolan and C. Brockett (2005) Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005), Cited by: Appendix A.
  • L. Dong, N. Yang, W. Wang, F. Wei, X. Liu, Y. Wang, J. Gao, M. Zhou, and H. Hon (2019) Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems, pp. 13042–13054. Cited by: §2, §4.
  • K. Ethayarajh (2019) How contextual are contextualized word representations? comparing the geometry of bert, elmo, and gpt-2 embeddings. arXiv preprint arXiv:1909.00512. Cited by: §5.4.
  • H. Fang and P. Xie (2020) CERT: contrastive self-supervised learning for language understanding. arXiv preprint arXiv:2005.12766. Cited by: §2.
  • W. Fedus, B. Zoph, and N. Shazeer (2021) Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. arXiv preprint arXiv:2101.03961. Cited by: §1.
  • D. Giampiccolo, B. Magnini, I. Dagan, and B. Dolan (2007) The third pascal recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing, pp. 1–9. Cited by: Appendix A.
  • B. Gunel, J. Du, A. Conneau, and V. Stoyanov (2020) Supervised contrastive learning for pre-trained language model fine-tuning. arXiv preprint arXiv:2011.01403. Cited by: §2.
  • K. Guu, K. Lee, Z. Tung, P. Pasupat, and M. Chang (2020) Realm: retrieval-augmented language model pre-training. arXiv preprint arXiv:2002.08909. Cited by: §1, §2, §2, §3.1.
  • R. B. Haim, I. Dagan, B. Dolan, L. Ferro, D. Giampiccolo, B. Magnini, and I. Szpektor (2006) The second pascal recognising textual entailment challenge. In Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment, Cited by: Appendix A.
  • K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick (2019) Momentum contrast for unsupervised visual representation learning. arXiv preprint arXiv:1911.05722. Cited by: §5.5.
  • P. He, X. Liu, J. Gao, and W. Chen (2020) DeBERTa: decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654. Cited by: §1, Table 1.
  • M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer, and O. Levy (2019) SpanBERT: improving pre-training by representing and predicting spans. arXiv preprint arXiv:1907.10529. Cited by: §2.
  • J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei (2020) Scaling laws for neural language models. arXiv preprint arXiv:2001.08361. Cited by: §1.
  • G. Ke, D. He, and T. Liu (2020) Rethinking the positional encoding in language pre-training. arXiv preprint arXiv:2006.15595. Cited by: §1, Table 1, §4, §4.
  • Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut (2019) Albert: a lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942. Cited by: §2, Table 1.
  • M. Lewis, M. Ghazvininejad, G. Ghosh, A. Aghajanyan, S. Wang, and L. Zettlemoyer (2020) Pre-training via paraphrasing. Advances in Neural Information Processing Systems 33. Cited by: §2.
  • M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer (2019) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461. Cited by: §2.
  • B. Li, H. Zhou, J. He, M. Wang, Y. Yang, and L. Li (2020) On the sentence embeddings from pre-trained language models. arXiv preprint arXiv:2011.05864. Cited by: §1.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692. Cited by: §2, Table 1, §4, §4.
  • A. v. d. Oord, Y. Li, and O. Vinyals (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748. Cited by: §3.2.2.
  • M. Ott, S. Edunov, A. Baevski, A. Fan, S. Gross, N. Ng, D. Grangier, and M. Auli (2019) Fairseq: a fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations, Cited by: §4.
  • H. Pham, Q. Xie, Z. Dai, and Q. V. Le (2020) Meta pseudo labels. arXiv preprint arXiv:2003.10580. Cited by: §2.
  • S. Purushwalkam and A. Gupta (2020) Demystifying contrastive self-supervised learning: invariances, augmentations and dataset biases. arXiv preprint arXiv:2007.13916. Cited by: §3.2.2.
  • Y. Qu, D. Shen, Y. Shen, S. Sajeev, J. Han, and W. Chen (2020) CoDA: contrast-enhanced and diversity-promoting data augmentation for natural language understanding. arXiv preprint arXiv:2010.08670. Cited by: §3.2.2.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019) Language models are unsupervised multitask learners. OpenAI blog 1 (8), pp. 9. Cited by: §1, §2.
  • C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu (2019) Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683. Cited by: §1, §4.
  • P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang (2016) SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2383–2392. Cited by: Appendix A, §1, §4.
  • A. Ravula, C. Alberti, J. Ainslie, L. Yang, P. M. Pham, Q. Wang, S. Ontanon, S. K. Sanghai, V. Cvicek, and Z. Fisher (2020) ETC: encoding long and structured inputs in transformers. Cited by: §2.
  • A. Roberts, C. Raffel, and N. Shazeer (2020) How much knowledge can you pack into the parameters of a language model?. arXiv preprint arXiv:2002.08910. Cited by: §1, §2.
  • C. Rosset, C. Xiong, M. Phan, X. Song, P. Bennett, and S. Tiwary (2020) Knowledge-aware language model pretraining. arXiv preprint arXiv:2007.00655. Cited by: §2.
  • R. Sennrich, B. Haddow, and A. Birch (2015) Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909. Cited by: §4.
  • I. Shankar, D. Nikhil, and C. Kornél (2017) First quora dataset release: question pairs. External Links: Link Cited by: Appendix A.
  • R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing, pp. 1631–1642. Cited by: Appendix A.
  • K. Song, X. Tan, T. Qin, J. Lu, and T. Liu (2019) MASS: masked sequence to sequence pre-training for language generation. In International Conference on Machine Learning, pp. 5926–5936. Cited by: §2, §2.
  • N. Thakur, N. Reimers, J. Daxenberger, and I. Gurevych (2020) Augmented sbert: data augmentation method for improving bi-encoders for pairwise sentence scoring tasks. arXiv preprint arXiv:2010.08240. External Links: Link Cited by: §1.
  • T. H. Trinh and Q. V. Le (2018) A simple method for commonsense reasoning. arXiv preprint arXiv:1806.02847. Cited by: §4.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008. Cited by: §1, §3.1.
  • A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman (2018) Glue: a multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461. Cited by: §1, §3.1, §4.
  • T. Wang and P. Isola (2020) Understanding contrastive representation learning through alignment and uniformity on the hypersphere. arXiv preprint arXiv:2005.10242. Cited by: §5.5.
  • W. Wang, B. Bi, M. Yan, C. Wu, Z. Bao, J. Xia, L. Peng, and L. Si (2019) Structbert: incorporating language structures into pre-training for deep language understanding. arXiv preprint arXiv:1908.04577. Cited by: §2.
  • A. Warstadt, A. Singh, and S. R. Bowman (2019) Neural network acceptability judgments. Transactions of the Association for Computational Linguistics 7, pp. 625–641. Cited by: Appendix A.
  • A. Williams, N. Nangia, and S. Bowman (2018) A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1112–1122. Cited by: Appendix A.
  • T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al. (2019) HuggingFace’s transformers: state-of-the-art natural language processing. arXiv preprint arXiv:1910.03771. Cited by: §4.
  • Z. Wu, S. Wang, J. Gu, M. Khabsa, F. Sun, and H. Ma (2020) CLEAR: contrastive learning for sentence representation. arXiv preprint arXiv:2012.15466. Cited by: Table 1.
  • L. Xiong, C. Xiong, Y. Li, K. Tang, J. Liu, P. Bennett, J. Ahmed, and A. Overwijk (2020) Approximate nearest neighbor negative contrastive learning for dense text retrieval. arXiv preprint arXiv:2007.00808. Cited by: §2, §5.5.
  • Z. Xu, L. Gong, G. Ke, D. He, S. Zheng, L. Wang, J. Bian, and T. Liu (2020) MC-bert: efficient language pre-training via a meta controller. arXiv preprint arXiv:2006.05744. Cited by: Appendix C, §2, §3.1, Table 1, §4, §4.
  • Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le (2019) XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems, pp. 5754–5764. Cited by: §2, Table 1, §4.
  • Q. Ye, B. Z. Li, S. Wang, B. Bolte, H. Ma, X. Ren, W. Yih, and M. Khabsa (2020) Studying strategically: learning to mask for closed-book qa. arXiv preprint arXiv:2012.15856. Cited by: §1, §2.
  • Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler (2015) Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision, pp. 19–27. Cited by: §4.

Appendix A GLUE Tasks

We provide more details of the tasks included in the GLUE benchmark. Their statistics are listed in Table 4.

MNLI: Multi-genre Natural Language Inference (Williams et al., 2018) contains 393K training examples obtained via crowdsourcing. The task is to predict whether a given premise sentence entails, contradicts, or is neutral with respect to a given hypothesis sentence.

QQP: Question Pairs (Shankar et al., 2017) contains 364K training examples from the Quora question-answering website. The task is to determine whether a pair of questions are semantically equivalent.

QNLI: Question Natural Language Inference contains 108K training examples derived from the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016). The task is to predict whether a given sentence contains the answer to a given question sentence.

SST-2: Stanford Sentiment Treebank (Socher et al., 2013) contains 67K training examples extracted from movie reviews with human-annotated sentiment scores. The task is to determine whether the sentence has positive or negative sentiment.

CoLA: Corpus of Linguistic Acceptability (Warstadt et al., 2019) contains 8.5K training examples from books and journal articles on linguistic theory. The task is to determine whether a given sentence is linguistically acceptable or not.

RTE: Recognizing Textual Entailment (Dagan et al., 2005; Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009) contains 2.5K training examples from textual entailment challenges. The task is to predict whether a given premise sentence entails a given hypothesis sentence or not.

MRPC: Microsoft Research Paraphrase Corpus (Dolan and Brockett, 2005) contains 3.7K training examples from online news sources. The task is to predict whether two sentences are semantically equivalent or not.

STS-B: Semantic Textual Similarity (Cer et al., 2017) contains 5.7K training examples drawn from multiple sources with human annotations of sentence-pair semantic similarity. The task is to predict how semantically similar two sentences are on a 0 to 5 scale.

Appendix B SQuAD Fine-Tuning Details

Our pretraining code is built on top of the MC-BERT codebase, including its data and training pipelines. As an artifact of the data pre-processing (specifically, the punctuation and whitespace handling in fairseq), we have to adjust the start and end span offsets in the SQuAD training data to match those in the pre-processed text. After model inference, we post-process the predicted offsets by reversing this adjustment to recover offsets in the raw data format. As a result, our SQuAD implementation is not exactly the same as those of previous approaches based on the HuggingFace codebase. This makes SQuAD score comparisons between our methods and previously reported methods imperfect, due to the different pre-processing and post-processing applied and our smaller hyperparameter search space in fine-tuning. The SQuAD results of our own baseline runs are fair comparisons.
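The offset bookkeeping described above can be sketched as follows. This is a hypothetical illustration: the helper name and the simple character-alignment normalization are our assumptions, not the actual fairseq pre-processing logic.

```python
def build_offset_map(raw, processed):
    """Greedily align a raw string with its normalized version
    (characters only deleted, never inserted) and return a
    raw -> processed character index map, so answer span offsets
    can be adjusted before training and reversed afterwards."""
    mapping, j = {}, 0
    for i, ch in enumerate(raw):
        if j < len(processed) and ch == processed[j]:
            mapping[i] = j
            j += 1
    return mapping

raw = "What  is  COCO-LM ?"            # double spaces, detached '?'
processed = "What is COCO-LM?"          # whitespace-normalized version
m = build_offset_map(raw, processed)
inverse = {v: k for k, v in m.items()}  # processed -> raw, for post-processing
```

A predicted span in the processed text is mapped back through `inverse` to obtain offsets in the raw data format, mirroring the reversal step described in the paragraph above.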

Appendix C The Origins of Reported Baseline Scores

The baseline results listed in Table 1 are obtained from their original papers except the following: BERT from (Bao et al., 2020), RoBERTa GLUE from and SQuAD from (Bao et al., 2020), ELECTRA GLUE from (Xu et al., 2020), XLNet base++ from (Bao et al., 2020), RoBERTa base++ SQuAD from (Bao et al., 2020). When multiple papers report different scores for the same method, we use the highest of them in our comparisons.

Appendix D More Implementation Details

Pretraining and Fine-tuning Costs. The pretraining cost of COCO-LM’s CLM task is similar to ELECTRA’s: BERT plus the auxiliary network, which is a fraction of the main network in size. Adding the SCL task requires one more forward and backward pass on the cropped sequence. On V100 GPUs, one pretraining run takes hours in the base setting and about two to three weeks in the base++ setting. The fine-tuning costs are the same as BERT plus relative position encodings, as the same main Transformer model is used.

Dataset Size Task Metric(s) Domain
MNLI 393K Inference Accuracy Misc.
QQP 364K Similarity Accuracy/F1 Social QA
QNLI 108K QA/Inference Accuracy Wikipedia
SST-2 67K Sentiment Accuracy Movie Reviews
CoLA 8.5K Acceptability Matthews corr. Misc.
RTE 2.5K Inference Accuracy Misc.
MRPC 3.7K Paraphrase Accuracy/F1 News
STS-B 5.7K Similarity Pearson/Spearman corr. Misc.
Table 4: The list of benchmarks in GLUE, their training data size, language tasks, evaluation metrics, and domain of corpus.
Parameters Pre-training (base) Pre-training (base++) GLUE Fine-tuning SQuAD Fine-tuning
Max Steps 125K 1.95M - -
Max Epochs - - {3, 5, 10} 2
Peak Learning Rate 5e-4 2e-4 {1e-6, 5e-6, 2e-5} {2e-5, 5e-5}
Batch Size 2048 2048 {16, 32} {32, 48}
Learning Rate Decay Linear Linear Linear Linear
Warm-up Proportion 8% 2.5e-4% 6% 10%
Sequence Length 512 512 512 384
Adam ε 1e-6 1e-6 1e-6 1e-6
Adam (β1, β2) (0.9, 0.98) (0.9, 0.98) (0.9, 0.98) (0.9, 0.98)
Clip Norm 2.0 2.0 - -
Dropout 0.1 0.1 0.1 -
Table 5: Hyperparameters used in pretraining and hyperparameter ranges searched for fine-tuning.
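The "Linear" decay and warm-up proportion rows of Table 5 describe a linear warm-up to the peak learning rate followed by linear decay to zero. A minimal sketch (the function name is ours; the base-setting values below are taken from the table):

```python
def lr_at_step(step, peak_lr, total_steps, warmup_prop):
    """Linear warm-up for the first warmup_prop fraction of steps,
    then linear decay from peak_lr to zero over the remainder."""
    warmup_steps = int(total_steps * warmup_prop)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # linear decay from peak to 0 over the remaining steps
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

# base pretraining setting from Table 5: 125K steps, peak 5e-4, 8% warm-up
```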

MLM Mode for Corrective Language Modeling. When creating the MLM-replaced sequence, we find it slightly improves downstream task performance to disable dropout (i.e., set the auxiliary MLM to inference mode) when computing the auxiliary network’s output distribution from which plausible replacement tokens are sampled. We hypothesize that this leads to more stable generation of challenging replacement tokens for the main Transformer to correct, and thus improves downstream task results.

Masking Special Tokens for MLM Training. BERT only masks real tokens (other than artificial symbols like [SEP] and [CLS]) for MLM training, while RoBERTa also masks special tokens. We follow the RoBERTa setting which results in slightly improved performance for some tasks.

Appendix E Hyperparameter Settings

Tuning pretraining hyperparameters is often too costly, so we keep most hyperparameters at their defaults. The auxiliary MLM pretraining uses the standard 15% [MASK] ratio. The crop transformation in the SCL task keeps a contiguous sub-sequence of the original sequence at a fixed crop ratio. The softmax temperature in the SCL task is fixed. All pretraining tasks in COCO-LM have equal weights except the binary classification task, which is up-weighted since its loss is much lower than those of the LM tasks, which classify over the full vocabulary. All token embeddings are shared between the auxiliary Transformer and the main Transformer. The detailed hyperparameters used in pretraining and fine-tuning are listed in Table 5.
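The two input transformations above can be sketched as follows. The 15% mask ratio follows the standard MLM setting; the 0.9 crop ratio and both helper names are illustrative assumptions, since the paper's exact crop value is elided in this copy (and the likelihood-based replacement sampling from the auxiliary model is not shown):

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]"):
    """Sketch of standard MLM masking: replace ~15% of positions
    with [MASK] (position selection only; the auxiliary model's
    replacement sampling is a separate step not modeled here)."""
    out = list(tokens)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    for i in random.sample(range(len(tokens)), n_mask):
        out[i] = mask_token
    return out

def crop(tokens, crop_ratio=0.9):
    """Sketch of the SCL crop transformation: keep one contiguous
    sub-sequence covering crop_ratio of the original sequence."""
    keep = int(len(tokens) * crop_ratio)
    start = random.randint(0, len(tokens) - keep)
    return tokens[start:start + keep]
```

In SCL, the cropped sub-sequence and the MLM-corrupted sequence of the same input form the positive pair.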

All reported methods use the exact same (or equivalent) set of hyperparameters for pretraining and fine-tuning for fair comparison. For COCO-LM and all the baselines implemented under our setting, all fine-tuning hyperparameters are searched per task; the median/average of five/ten runs with the same set of five/ten random seeds are reported on GLUE/SQuAD.
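The reporting protocol above (median or average over a fixed set of seeds, with max/min spread as in the Figure 6 error bars) can be sketched as:

```python
import statistics

def report_runs(scores):
    """Summarize per-seed fine-tuning scores: median (as reported
    on GLUE), mean (as reported on SQuAD / few-shot curves), and
    max/min (the error-bar spread)."""
    return {
        "median": statistics.median(scores),
        "mean": statistics.mean(scores),
        "max": max(scores),
        "min": min(scores),
    }
```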

Appendix F More Discussions on PLM Research

Currently, the biggest challenge in PLM research and development is perhaps its prohibitive computation cost. On the one hand, PLMs have influenced a wide range of tasks, and any further technical improvement can matter a great deal for downstream applications, considering how widely PLMs are deployed in NLP. On the other hand, the expensive computation and long experimental cycles pose great challenges for careful and thorough studies of the problem space, as any test of a new design comes with considerable computing cost: pretraining a new language model can easily consume thousands of dollars, or even millions for extra-large models.

Such challenges call for more systematic evaluation pipelines that can accurately and reliably judge whether or not a new PLM is really better than previous ones. Currently, the evaluation of PLMs largely relies on GLUE-style benchmarks, which contain a set of tasks weighted equally; usually the average performance over these tasks is treated as the final measure of a PLM’s effectiveness. However, we find that the small tasks in GLUE have very high variance, which may provide unreliable indications of a PLM’s performance. For example, on CoLA and RTE, fine-tuning with different random seeds from the same pretrained checkpoint can easily result in a several-point difference between the best and the worst seed. In Table 1, the standard BERT model has the best STS-B performance, but that does not mean later models are inferior to BERT. In contrast, large tasks like MNLI give relatively stable and consistent results for the same model pretrained/fine-tuned with different random seeds, and thus serve as better indicators of PLMs’ effectiveness.

In this paper, we try to improve the robustness of our observations, for example, by reporting downstream performance at different training times for future comparisons under limited computing budgets, and by making our code publicly available for reproducibility. We hope our efforts will facilitate future research that improves the community’s understanding and development of this important problem space.