Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

04/07/2022
by Yu Meng, et al.

We present AMOS, a new framework that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Unlike ELECTRA, which trains a single MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better from challenging replaced tokens, we learn mixture weights over the auxiliary MLMs' outputs that maximize the discriminator loss, backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we also propose a way to assemble the multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about one point on the GLUE benchmark for BERT base-sized models.
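The abstract compresses the whole mechanism into a few sentences, so a minimal PyTorch sketch of the adversarial mixing step may help. This is not the authors' implementation: the tiny `generators`, the `embed`/`head` discriminator, and `mix_logits` are illustrative stand-ins (real AMOS uses transformer MLMs and a full encoder), and the exact point at which the Gumbel-Softmax is applied is a simplifying assumption here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
V, T, B, K, H = 100, 8, 2, 3, 64  # vocab size, seq length, batch, #generators, hidden dim

# Stand-ins for K auxiliary MLM generators of different sizes: each maps
# token ids to logits over the vocabulary.
generators = torch.nn.ModuleList(
    torch.nn.Sequential(torch.nn.Embedding(V, d), torch.nn.Linear(d, V))
    for d in (32, 64, 128)
)

# Toy discriminator: token embedding plus a binary "was this token replaced?" head.
embed = torch.nn.Embedding(V, H)
head = torch.nn.Linear(H, 1)

# Learnable mixture logits over the K generators; in AMOS these are trained
# adversarially, i.e. to *maximize* the discriminator loss.
mix_logits = torch.nn.Parameter(torch.zeros(K))

inputs = torch.randint(V, (B, T))
mask = torch.zeros(B, T, dtype=torch.bool)
mask[:, ::2] = True                      # pretend every other position is masked
masked = inputs.masked_fill(mask, 0)     # token id 0 stands in for [MASK]

# Each generator samples a candidate replacement at every position.
candidates = torch.stack([
    torch.multinomial(F.softmax(g(masked), dim=-1).view(-1, V), 1).view(B, T)
    for g in generators
])                                       # (K, B, T)

# Gumbel-Softmax over the mixture keeps the choice of generator differentiable,
# so the discriminator's gradient can reach mix_logits.
w = F.gumbel_softmax(mix_logits, tau=1.0)               # (K,)

# Mix the discriminator-side embeddings of the K candidates at masked positions.
mixed_emb = torch.einsum("k,kbth->bth", w, embed(candidates))
disc_in = torch.where(mask.unsqueeze(-1), mixed_emb, embed(inputs))

# ELECTRA-style replaced-token detection: a position counts as replaced if the
# dominant generator's sample differs from the original token.
labels = (mask & (candidates[w.argmax()] != inputs)).float()
disc_loss = F.binary_cross_entropy_with_logits(head(disc_in).squeeze(-1), labels)

# The encoder minimizes disc_loss; mix_logits is updated to maximize it, e.g.
# by ascending (rather than descending) its gradient. The generators themselves
# would be trained with their usual MLM loss, omitted here.
disc_loss.backward()
print(disc_loss.item(), mix_logits.grad)
```

Mixing candidate token embeddings, rather than the discrete tokens themselves, is what preserves a differentiable path from the discriminator loss back to the mixture weights; sampling a single hard token would cut that gradient.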

Related research

12/31/2020 | AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding
Advances in English language representation enabled a more sample-effici...

04/13/2022 | METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
We present an efficient method of pretraining large-scale autoencoding l...

02/16/2021 | COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
We present COCO-LM, a new self-supervised learning framework that pretra...

02/04/2023 | Representation Deficiency in Masked Language Modeling
Masked Language Modeling (MLM) has been one of the most prominent approa...

11/30/2022 | BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch?
Pretrained transformer models have achieved state-of-the-art results in ...

05/21/2023 | Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers
This paper explores the effectiveness of model-generated signals in impr...

06/25/2021 | Learning to Sample Replacements for ELECTRA Pre-Training
ELECTRA pretrains a discriminator to detect replaced tokens, where the r...
