MC-BERT: Efficient Language Pre-Training via a Meta Controller

06/10/2020 · Zhenhui Xu et al. · Microsoft Research and Peking University

Pre-trained contextual representations (e.g., BERT) have become the foundation for state-of-the-art results on many NLP tasks. However, large-scale pre-training is computationally expensive. ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator. Our studies reveal that ELECTRA's success is mainly due to the reduced complexity of its pre-training task: the binary classification (replaced token detection) is more efficient to learn than the generation task (masked language modeling). However, such a simplified task is less semantically informative. To achieve better efficiency and effectiveness, we propose a novel meta-learning framework, MC-BERT. The pre-training task is a multi-choice cloze test with a reject option, where a meta controller network provides the training input and the candidates. Results on the GLUE natural language understanding benchmark demonstrate that our proposed method is both efficient and effective: it outperforms baselines on GLUE semantic tasks given the same computational budget.




1 Introduction

In natural language processing, pre-trained contextual representations are widely used to help downstream tasks without sufficient labeled data. Previous works 

(Radford et al., 2019; Yang et al., 2019; Devlin et al., 2018; Liu et al., 2019) train contextual language representations on self-supervised generation tasks. For example, BERT (Devlin et al., 2018) randomly masks a small subset of the unlabeled input sequence and trains a generator to recover the original input.¹ Such tasks require only unlabeled free text, and Raffel et al. (2019) show that a large dataset is crucial to a pre-trained model's performance. Pre-training over such large-scale data consumes huge computational resources, which raises a critical concern in terms of high energy cost (Strubell et al., 2019).

¹ In BERT, among all tokens to be predicted, 80% are replaced by the [MASK] token, 10% are replaced by a random token, and 10% are left unchanged.
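As an illustration, BERT's corruption step can be sketched in a few lines of Python. This is a toy sketch of the 15% / 80-10-10 scheme described above; the function name and interface are our own, not the paper's.

```python
import random

MASK = "[MASK]"

def bert_mask(tokens, vocab, select_prob=0.15, rng=None):
    """Return (corrupted_tokens, target_positions) under BERT's masking scheme."""
    rng = rng or random.Random(0)
    corrupted, targets = list(tokens), []
    for i in range(len(tokens)):
        if rng.random() < select_prob:
            targets.append(i)  # the generator must recover tokens[i] here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK               # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return corrupted, targets
```

The generator is then trained to predict the original token at each position in `target_positions`.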

ELECTRA (Clark et al., 2019) is a successful attempt to boost the efficiency of pre-training. The learning framework of ELECTRA consists of a discriminator and a generator. Given a sentence, it corrupts the sentence by replacing some words with plausible alternatives sampled from the generator. Then, the discriminator is trained to predict whether a word in the corrupted sentence was replaced by the generator. Finally, the learned discriminator will be used in downstream tasks. Unlike previous generation tasks where the model makes predictions only on a small number (e.g., 15% in BERT) of masked positions, the discriminative task proposed in ELECTRA is defined over all input tokens. According to Clark et al. (2019), this approach has better sample efficiency and, consequently, accelerated training.

In Section 3, we provide empirical studies on ELECTRA, showing that ELECTRA's vital advantage is the reduced complexity of its pre-training task: the binary classification (replaced token detection) is easier to learn than generation tasks (i.e., predicting one word out of the entire vocabulary), such as the masked language modeling (MLM) used by BERT. We train two variants of ELECTRA. In the first, we replace the simple discriminative task with a more complex task, and this modification significantly slows down convergence. In the second, we train the discriminator only on a sampled subset of positions, and convergence is not significantly affected. These studies show that, for efficient training, reduced task complexity matters much more than sample efficiency. Still, the replaced token detection task of ELECTRA is less informative than generation tasks: detecting replaced tokens requires less semantic information than recovering the original input. Detailed analysis on the GLUE natural language understanding benchmark shows that ELECTRA's advantage over BERT is less significant on semantic tasks than on syntactic tasks.

Figure 1: The learning framework of MC-BERT. Given a sentence, the meta controller first corrupts the sentence by replacing a small subset of tokens with sampled plausible alternatives. It then creates token candidates for each position. The generator uses the corrupted sentence as input and learns to correct each word by predicting over the candidates (Taylor, 1953).

In Section 4, we propose MC-BERT, a novel language pre-training method that uses a Meta Controller to manage the training of a generator, as shown in Figure 1. The pre-training task is comparable to a multiple-choice cloze test. Unlike BERT, the MC-BERT generator only needs to classify among a small set of candidates, which reduces the task complexity. Unlike ELECTRA, MC-BERT still trains a generator, which learns richer semantic information.

In Section 5, we conduct experiments and evaluate all models on the GLUE natural language understanding benchmark (Wang et al., 2018). Results show that MC-BERT is more efficient and achieves better accuracy than the baselines on most of the semantic understanding tasks.

2 Background

Current state-of-the-art natural language understanding systems learn pre-trained contextual representations by encoding each word's surrounding context. The encoders are trained on self-supervised tasks using large-scale unlabeled corpora. For instance, Peters et al. (2018); Radford et al. (2018) train language models using LSTMs (Hochreiter and Schmidhuber, 1997) or Transformer decoders (Vaswani et al., 2017), and use the hidden states of the networks as the contextual representation. Devlin et al. (2018); Liu et al. (2019) use the masked language modeling task and achieve state-of-the-art performance on natural language understanding tasks. Alternatively, XLNet (Yang et al., 2019) and UniLM (Dong et al., 2019) design permuted and bidirectional language modeling tasks.

The exploding demand for computation, together with the resulting massive energy cost (Strubell et al., 2019), has become an obstacle to the application of pre-training. Unfortunately, to the best of our knowledge, only a limited number of works aim at improving the training efficiency of such models. You et al. (2019) accelerate BERT pre-training with large-batch optimization, but at the cost of massive computational resources. Gong et al. (2019) observe that BERT parameters in different layers have structural similarity and reduce training time through implicit parameter sharing. A notable improvement is ELECTRA (Clark et al., 2019), the starting point of our work. We discuss ELECTRA in detail in Section 3.

3 A Deep Dive into ELECTRA

ELECTRA consists of a generator network $G$ and a discriminator network $D$, both of which use Transformer encoders as their backbone. Formally, we use $V$ to denote the vocabulary of tokens and $x = (x_1, \dots, x_n)$ to denote a sentence of $n$ tokens, where $x_i \in V$, $i = 1, \dots, n$. $x^{\text{mask}}$ denotes a masked version of $x$ in which the MASK operator randomly replaces the token at each position by a mask symbol [MASK] with equal probability.


Let $x^{\text{mask}}$ be the input. At each masked position, the generator learns to predict the correct token from the vocabulary: for any masked position $i$ in $x^{\text{mask}}$, let $p_G(v \mid x^{\text{mask}})$ be the probability that $G$ predicts $v \in V$ as the missing token, satisfying $\sum_{v \in V} p_G(v \mid x^{\text{mask}}) = 1$. We use $p_G(\cdot \mid x^{\text{mask}})$ to denote this probability distribution over $V$. The generator is trained to minimize the MLM loss

$$\mathcal{L}_{\text{MLM}}(G) = \mathbb{E}\Big[ -\sum_{i \in \text{masked}(x)} \log p_G(x_i \mid x^{\text{mask}}) \Big], \tag{1}$$
where the expectation is taken over the random draw of masked positions. Other details of the generator can be found in Devlin et al. (2018); Clark et al. (2019).
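As a toy numeric illustration of Eq. 1, the loss averages the negative log-likelihood of the original token at each masked position. The logits and vocabulary below are made up for illustration; this is not the paper's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mlm_loss(logits_per_pos, original_ids, masked_positions):
    """Average negative log-likelihood of the original token at masked positions."""
    total = 0.0
    for i in masked_positions:
        probs = softmax(logits_per_pos[i])
        total += -math.log(probs[original_ids[i]])
    return total / len(masked_positions)
```

With uniform logits over a vocabulary of size |V|, the loss equals log |V|, the entropy of a blind guess, which illustrates why predicting over a vocabulary of tens of thousands of tokens is a hard task.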

In ELECTRA, the generator predicts the missing tokens and fills the corresponding masked positions, but its predictions may differ from the original sentence. We denote the sentence generated by $G$ as $x^G$, in which each token is defined as

$$x_i^G = \begin{cases} \hat{x}_i \sim p_G(\cdot \mid x^{\text{mask}}) & \text{if position } i \text{ is masked}, \\ x_i & \text{otherwise}. \end{cases} \tag{2}$$
The discriminator $D$ learns to classify whether each token in $x^G$ is the same as the original one. To achieve this, $D$ uses a Transformer encoder to compute the contextual representations $h(x^G) = (h_1, \dots, h_n)$, where $h_i$ is a $d$-dimensional contextual embedding for position $i$. Then, $D$ introduces a binary classifier with parameters $w$ to estimate the probability that $x_i^G$ is the same as the original token, i.e.,

$$p_D(x_i^G = x_i \mid x^G) = \sigma(w^\top h_i). \tag{3}$$
The learning objective of $D$ is to minimize the classification error, formally

$$\mathcal{L}_{\text{Disc}}(D) = \mathbb{E}\Big[ \sum_{i=1}^{n} -\mathbb{1}[x_i^G = x_i] \log p_D(x_i^G = x_i \mid x^G) - \mathbb{1}[x_i^G \neq x_i] \log\big(1 - p_D(x_i^G = x_i \mid x^G)\big) \Big]. \tag{4}$$
The generator and discriminator are jointly optimized according to Eq. 1 and 4. After training, the discriminator will be used in downstream tasks.
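For concreteness, the replaced token detection objective in Eq. 4 amounts to per-position binary cross-entropy. The sketch below uses made-up probabilities and our own function name; it only illustrates the form of the loss.

```python
import math

def rtd_loss(p_same, original, generated):
    """p_same[i]: D's predicted probability that position i was NOT replaced."""
    total = 0.0
    for p, o, g in zip(p_same, original, generated):
        # positive class: token unchanged; negative class: token replaced
        total += -math.log(p) if o == g else -math.log(1.0 - p)
    return total / len(original)
```

Note that, unlike the MLM loss, every position contributes a term, and each term involves only a two-way decision.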

3.1 The Real Advantage of ELECTRA over BERT

Clark et al. (2019) claim that ELECTRA yields higher training efficiency than BERT due to higher sample efficiency. While the MLM loss (Eq. 1) of BERT is calculated over a sampled masked subset of positions (e.g., 15%), the loss of the discriminator in ELECTRA (Eq. 4) is calculated over all input positions. Therefore, learning signals from more positions can be used to optimize the model parameters, resulting in more efficient training.

However, there is another critical difference between BERT and ELECTRA: BERT learns to predict the correct word from the entire vocabulary $V$, whose size is in the tens of thousands. In contrast, ELECTRA's discriminator learns from a much simpler pre-training task, i.e., predicting whether each word was replaced or not. This reduced task complexity may also lead to training acceleration.

Given the above two crucial differences between ELECTRA and BERT, we conduct controlled experiments to examine which of them is more critical for efficient training.

Experimental setup

We conduct experiments to analyze the effects of higher sample efficiency or reduced task complexity on training efficiency. We use the same dataset, model architectures, and other hyperparameters as ELECTRA-Base

(Clark et al., 2019). The pre-trained models are evaluated on GLUE benchmark (General Language Understanding Evaluation) (Wang et al., 2018). We leave detailed experiment setups in Section 5.1.

To study the effects of sample efficiency, we design a modified version of ELECTRA, called ELECTRA-sample. Unlike ELECTRA, which calculates the loss of $D$ over all input positions, ELECTRA-sample only calculates the loss over 50% of input positions (all masked positions plus a sampled subset of non-masked positions). ELECTRA-sample has lower sample efficiency than the original ELECTRA, but it keeps the same task complexity. If sample efficiency is essential to training efficiency, we can expect ELECTRA-sample to perform worse than ELECTRA.

To study whether a more complex pre-training task will reduce ELECTRA’s training efficiency, we design a modified version of ELECTRA, called ELECTRA-complex. Instead of training the model to check whether each word in a corrupted sentence is replaced, ELECTRA-complex learns to predict the correct word from the entire vocabulary at each position. If the task simplification is essential for ELECTRA’s success, we can expect much slower training by ELECTRA-complex.
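The position-subsampling step of ELECTRA-sample can be sketched as follows: the discriminator loss is computed over all masked positions plus enough randomly chosen non-masked ones to cover 50% of the input. The function name and interface here are hypothetical.

```python
import random

def sampled_loss_positions(n, masked_positions, frac=0.5, rng=None):
    """All masked positions plus random non-masked ones, covering frac of n."""
    rng = rng or random.Random(0)
    masked = set(masked_positions)
    non_masked = [i for i in range(n) if i not in masked]
    extra = max(0, int(frac * n) - len(masked))  # how many extras we still need
    return sorted(masked | set(rng.sample(non_masked, extra)))
```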


As we study training efficiency, we focus on each model's performance over the first several epochs. For all experiments, we dump four checkpoints at 20k, 50k, 100k, and 200k steps, corresponding to 2%, 5%, 10%, and 20% of all pre-training steps. All checkpoints are then fine-tuned on three downstream tasks: CoLA, RTE, and STS-B.

Figure 2: Performance of modified ELECTRA models on downstream tasks.

From Figure 2, we can see that ELECTRA-sample’s performance is only slightly worse than ELECTRA in most of the checkpoints, although its sample efficiency is halved. This fact indicates that sample efficiency has little impact on the performance of the model.

However, from Figure 2, we can see that ELECTRA-complex’s performance is consistently worse than ELECTRA by a large margin in almost every checkpoint. This fact indicates that reducing task complexity is important to improving pre-training efficiency.

Drawbacks of the discriminative task

It is worth noting that the discriminative task is not as informative as the generation task. Formally, let the random variable $x$ be a sentence with any underlying distribution and $x^G$ be the corrupted sentence. We define a binary vector $r = (r_1, \dots, r_n)$ where $r_i = \mathbb{1}[x_i^G = x_i]$; $r$, therefore, is the target of the discriminative task. We have the conditional entropy $H(r \mid x, x^G) = 0$ since $r$ is a deterministic function of $x$ and $x^G$. Then, it is straightforward to see that $H(r \mid x^G) \le H(x \mid x^G)$.
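The inequality can be spelled out with the chain rule for entropy:

```latex
% r is a deterministic function of (x, x^G), so H(r | x, x^G) = 0. Then
H(r \mid x^G) \;\le\; H(r, x \mid x^G)
            \;=\; H(x \mid x^G) + H(r \mid x, x^G)
            \;=\; H(x \mid x^G).
% The binary targets r therefore carry at most as much information about the
% original sentence as the generative targets x.
```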

Empirical results in Table 3 of Clark et al. (2019) and in Table 3 of this paper also show that ELECTRA’s advantage over BERT mainly lies in syntactic tasks (CoLA) instead of semantic tasks, which require the model to capture richer semantic information. These facts inspire us to design more informative pre-training tasks beyond ELECTRA.

4 Pre-training with a Meta Controller

In this section, we introduce a novel pre-training method, MC-BERT. We still pre-train a generator (instead of a discriminator) to learn more semantic information, but we use a meta controller to improve its training efficiency. We continue to use all notations defined in Section 3 in this section.

4.1 Mc-Bert

Our method trains two Transformer encoders: a generator $G$ and a meta controller $MC$. The generator serves as the primary model and will be used in downstream tasks, while the meta controller guides the generator's training.

The meta controller $MC$ is trained using the MLM loss defined in Eq. 1. Given an input sentence $x$, the meta controller guides the training of the generator in two ways:

  • Similar to ELECTRA, $MC$ generates a corrupted sentence $\hat{x}$ as shown in Eq. 2.

  • $MC$ creates a set of token candidates $S_i$ for each position $i$, with $|S_i| = k + 1$, where $k$ is a small integer.

The generator uses $\hat{x}$ as input and learns to correct the sentence using the given candidates $S_i$ at each position $i$. In the following, we denote by $S = (S_1, \dots, S_n)$ the tuple of all candidate sets.

Label Leaking and Reject Options

It is non-trivial to construct a meaningful $S_i$ for training $G$. First, $S_i$ should contain useful negative candidates, which provide $G$ with informative learning signals. Moreover, the learning process may suffer from label leaking. Concretely, if the ground truth token appears in the candidate set of every non-replaced position, the generator can easily make correct predictions by choosing the input token, since the ground truth is always the same as the input token at a non-replaced position. Because most positions are non-replaced, this problem leads to ineffective training of $G$. However, we cannot fix this problem by removing the ground truth token from $S_i$, since this would result in an invalid classification task in which no candidate is correct.

To address this problem, we construct $S_i$ in a novel way motivated by the theory of voting (Feddersen and Pesendorfer, 1999; Ambrus et al., 2017). We introduce a special category, "None of the above" ([NOTA]), as a reject option. Given a corrupted sentence $\hat{x}$, for position $i$: if $\hat{x}_i = x_i$ (position $i$ is not masked, or the prediction of $MC$ is correct), we sample negative tokens without replacement according to $p_{MC}(\cdot \mid x^{\text{mask}})$ and use them together with [NOTA] as $S_i$; in this case, we hope $G$ selects [NOTA] from $S_i$, indicating that the input token is correct. If $\hat{x}_i \neq x_i$ (position $i$ is masked and the prediction of $MC$ is wrong), we sample negative tokens according to $p_{MC}(\cdot \mid x^{\text{mask}})$ and use them together with $x_i$ as $S_i$; in this case, we hope $G$ chooses $x_i$ from $S_i$. Formally, we construct $S_i$ as

$$S_i = \begin{cases} \{v_1, \dots, v_k\} \cup \{\text{[NOTA]}\} & \text{if } \hat{x}_i = x_i, \\ \{v_1, \dots, v_k\} \cup \{x_i\} & \text{if } \hat{x}_i \neq x_i, \end{cases} \qquad v_j \sim p_{MC}(\cdot \mid x^{\text{mask}}). \tag{5}$$
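A minimal sketch of this construction for a single position is shown below. Uniform sampling stands in for the meta controller's output distribution, and all names are our own rather than the paper's.

```python
import random

NOTA = "[NOTA]"

def make_candidates(input_token, original_token, vocab, k, rng):
    """Build (candidate_set, correct_answer) for one position."""
    # exclude tokens that could collide with the input or the ground truth
    pool = [v for v in vocab if v not in (input_token, original_token)]
    negatives = rng.sample(pool, k)  # stand-in for sampling from p_MC
    if input_token == original_token:
        # input already correct: the generator should pick the reject option
        return negatives + [NOTA], NOTA
    # input was replaced: the ground-truth token is among the candidates
    return negatives + [original_token], original_token
```

This makes the label-leaking fix concrete: at non-replaced positions the correct answer is always [NOTA], so the generator cannot succeed by simply copying the input token.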
All negatives are drawn without replacement. We use $p_G(\cdot \mid \hat{x}, S_i)$ to denote the output distribution of $G$ over $S_i$. Given the contextual representations $h_i$ produced by $G$ and the token embedding matrix $E$ (including [NOTA]) in $G$,

$$p_G(v \mid \hat{x}, S_i) = \frac{\exp(e_v^\top h_i)}{\sum_{v' \in S_i} \exp(e_{v'}^\top h_i)}, \quad v \in S_i, \tag{6}$$

where $e_v$ denotes the embedding of token $v$.
The loss function of $G$ is defined as the negative log likelihood for a $(k+1)$-class classification problem:

$$\mathcal{L}_{\text{MC}}(G) = \mathbb{E}\Big[ -\sum_{i=1}^{n} \log p_G(y_i \mid \hat{x}, S_i) \Big], \tag{7}$$

where $y_i = \text{[NOTA]}$ if $\hat{x}_i = x_i$ and $y_i = x_i$ otherwise.
We optimize a combined loss of Eq. 1 and Eq. 7:

$$\mathcal{L} = \mathcal{L}_{\text{MLM}}(MC) + \lambda \mathcal{L}_{\text{MC}}(G). \tag{8}$$
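As a toy illustration of Eq. 6, the generator normalizes token scores over the small candidate set rather than the full vocabulary. The embeddings below are made up, and the function name is ours.

```python
import math

def candidate_softmax(embeddings, h, candidates):
    """Softmax over candidate tokens only; embeddings maps token -> vector."""
    scores = {c: sum(a * b for a, b in zip(embeddings[c], h)) for c in candidates}
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}
```

Restricting the normalization to the $k+1$ candidates is what turns the generator's task into a small multi-choice problem instead of a $|V|$-way prediction.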
4.2 Discussions

Ground Truth: He is overweight as he eats a lot.

Model     | Question                     | Choices                                                    | Answer
BERT      | He is ____ as he eats a lot. | All tokens: abandon, able, about, …                        | overweight
ELECTRA   | He is a as he eats a lot.    | Right, Wrong                                               | Wrong
MC-BERT   | He is tiny as he eats a lot. | A. overweight  B. healthy  C. smart  D. None of the above  | A

Table 1: Example of the task comparisons between BERT/ELECTRA and our proposed MC-BERT.

The example in Table 1 illustrates the difference between MC-BERT and BERT/ELECTRA in terms of their pre-training tasks. From Table 1, we can see that BERT solves a general cloze problem: it masks some tokens and requires the learner to pick correct tokens from the entire vocabulary. The task is very complex. ELECTRA learns from detecting replaced tokens, which is a binary classification problem similar to grammar checking. This task is less complex, but the learning signal of ELECTRA is less informative.

Our MC-BERT is similar to multi-choice cloze tests that are widely used in practice, such as the GRE verbal test. Moreover, the input sequence and the candidates are given by the meta controller network, which gradually increases the difficulty of the generator's pre-training task. In the beginning, the meta controller is not well trained, so it provides the generator with easy multi-choice questions, from which the generator can learn efficiently. As the meta controller outputs more meaningful token alternatives and negative candidates, the generator is forced to make predictions relying on deep semantic information from the context. In summary, MC-BERT strikes a good balance between training efficiency and the richness of semantic information learned by the model.

Our method is related to curriculum learning (Bengio et al., 2009). Curriculum learning suggests that some instances are easier to learn, and the model training should first focus on easy instances and then on the hard ones. Our work is different from curriculum learning in that we consider the complexity of the self-supervised tasks rather than the difficulty of instances.

Note that our methodology is quite general. As the main idea is to simplify the generation task using a meta controller, it can easily be extended to a broad class of self-supervised pre-training methods, such as XLNet (Yang et al., 2019) and UniLM (Dong et al., 2019).

5 Experiments

In this section, we compare our proposed MC-BERT with BERT and ELECTRA on a wide range of tasks. We implement all methods in PyTorch based on fairseq (Auli et al., 2017).² For BERT, we use the fairseq implementation of RoBERTa (an optimized version of BERT) (Liu et al., 2019). We use RoBERTa to refer to BERT in the remainder of this section.

² Code has been anonymously released for review.

5.1 Experimental Setup

Batch size {16, 32}
Maximum epoch 10
Learning rate {1e-5, …, 8e-5}
Warm-up ratio 0.06
Weight decay 0.1
Table 2: Hyperparameter search spaces for fine-tuning. Other hyperparameters are kept the same as pre-training.

Model architecture

We use the same architecture for RoBERTa, the discriminator of ELECTRA, and the generator of MC-BERT, with all hyperparameters set to those of BERT-Base (110M parameters). The only difference among these three models lies in the number of categories in the output layer. Clark et al. (2019) recommend using a small generator for better efficiency. For a fair comparison, we set the architecture of our meta controller to be the same as that of the ELECTRA generator.


We use the same pre-training corpus as Devlin et al. (2018), which consists of roughly 3,400M words from the English Wikipedia corpus and BookCorpus.³ We apply byte pair encoding (BPE) (Sennrich et al., 2015) with the same vocabulary size as BERT, where $|V| = 32{,}768$.

³ As BookCorpus (Zhu et al., 2015) is no longer freely distributed, we follow the suggestions of Devlin et al. (2018) and collect the corpus by crawling it ourselves.

We construct the inputs of the MLM models (RoBERTa, the ELECTRA generator, and the meta controller of MC-BERT) in the same way as Devlin et al. (2018). For MC-BERT, we fix the number of token candidates $k$ and the weight $\lambda$ of the generator's loss, unless otherwise specified.

We use the same sequence lengths, batch sizes, and training steps as Devlin et al. (2018) for all models. In total, we train each model for 1 million steps. We use the same optimizer configuration as Liu et al. (2019) and the same learning rate scheduling scheme as Devlin et al. (2018). We train all models on 8 NVIDIA Tesla V100 GPUs.


We use the GLUE (General Language Understanding Evaluation) benchmark (Wang et al., 2018) as the downstream tasks to evaluate the performance of the pre-trained models. GLUE consists of nine tasks. CoLA is a syntactic task in which the model checks the linguistic acceptability of each sentence. The other tasks, such as SST-2 (sentiment analysis), STS-B (semantic textual similarity), and MNLI (natural language inference), are semantic tasks. Detailed descriptions of each task are given in the supplementary materials.

We run each configuration with ten different random seeds and take the average of these ten scores as the performance of this configuration. We report the best score over all configurations.

Task               Model     4%     8%     16%    32%    64%    100%
Syntactic (CoLA)   RoBERTa   27.21  42.23  47.00  50.69  57.40  57.41
                   ELECTRA   44.83  58.15  61.05  61.49  65.72  64.34
                   MC-BERT   39.20  53.27  57.96  59.20  62.05  62.10
Semantic (8 tasks) RoBERTa   76.40  80.06  81.83  82.85  84.41  84.65
                   ELECTRA   79.57  82.76  84.22  85.23  86.15  86.52
                   MC-BERT   79.78  83.23  84.28  85.46  86.63  86.82
Table 3: Results on the GLUE benchmark. The percentages of pre-train FLOPs denote the progress of pre-training.

5.2 Experiment Results

To compare efficiency fairly, we define a list of computational budgets (in terms of FLOPs). For each experiment, we dump the checkpoint trained with the respective computational cost and then fine-tune it on the downstream tasks. All corresponding results are shown in Table 3.

Syntactic tasks

Table 3 shows that our proposed MC-BERT is significantly better than RoBERTa under different computational constraints, indicating that MC-BERT is much more efficient than RoBERTa. On the other hand, on this particular task, CoLA, ELECTRA outperforms both RoBERTa and MC-BERT, because the pre-training task of ELECTRA is more aligned with CoLA. As discussed in Sections 3 and 4, the replaced token detection pre-training task of ELECTRA mainly provides the model with syntactic information, making it particularly strong on the linguistic acceptability task. We therefore focus on comparing the models on the other eight tasks, which require deeper semantic understanding.

Semantic tasks

We report the average performance of each checkpoint on the eight tasks. As shown in Table 3, MC-BERT consistently outperforms RoBERTa and ELECTRA in almost all checkpoints, which indicates that MC-BERT is more efficient than RoBERTa and ELECTRA in learning semantic information from texts. In Figure 3 (Left and Middle), we show the learning curves of two semantic tasks, RTE and MRPC. For both tasks, MC-BERT achieves higher performance than ELECTRA and RoBERTa under the same computational budgets. For tasks that require deeper semantic understanding, our proposed MC-BERT has more significant advantages in terms of efficiency and effectiveness than the baselines do.

Figure 3: Left: Model performances on RTE; Middle: Model performances on MRPC; Right: GLUE scores.


The above experimental results show that MC-BERT outperforms BERT on all the tasks, indicating the effectiveness of using a meta controller to help the generator’s training. They also suggest that the generator-discriminator framework in ELECTRA is not the only way to achieve better efficiency.

We plot the final GLUE scores of all model checkpoints in Figure 3 (Right). MC-BERT is competitive with ELECTRA in terms of the average performance over the nine tasks. However, MC-BERT does better on the eight semantic tasks but worse on the one syntactic task.

5.3 Effect of hyper-parameters

We examine the effect of hyper-parameters used in MC-BERT. We follow the experimental settings described above and assess the pre-trained models’ performance on the RTE task. The experimental results are shown in Figure 4.

Effect of varying $k$

If $k$ is very large, e.g., close to $|V|$, MC-BERT becomes comparable to ELECTRA-complex (see Section 3), so degraded performance is expected. In Figure 4, we compare the performance given two reasonably small values of $k$. One setting performs slightly better than the other, but the difference is insignificant.

Effect of varying $\lambda$

Since $\lambda$ serves as a trade-off between learning the meta controller and learning the generator, a larger $\lambda$ puts more weight on optimizing the generator rather than the meta controller. To check the effect of varying $\lambda$, Figure 4 compares models trained with two values of $\lambda$. The models trained with the smaller $\lambda$ are consistently better, which implies that too large a $\lambda$ may hurt performance due to an under-optimized meta controller.

Figure 4: Effect of hyper-parameters in MC-BERT.

6 Conclusion and Future Work

In this work, we propose MC-BERT, which uses a meta controller to manage the complexity of the pre-training task. The pre-training task is a multi-choice cloze test with a reject option, "None of the above". Extensive experiments demonstrate that MC-BERT is more efficient than BERT and learns deeper semantic information than ELECTRA does. It outperforms several baselines on semantic understanding tasks given the same computational budget. We will continue exploring further roles for the meta controller, e.g., smartly choosing which positions to mask and which sentences to batch.


  • A. Ambrus, B. Greiner, and A. Sastro (2017) The case for nil votes: voter behavior under asymmetric information in compulsory and voluntary voting systems. Journal of Public Economics 154, pp. 34–48. Cited by: §4.1.
  • M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin (2017) Convolutional sequence to sequence learning. In Proc. of International Conference on Machine Learning. Cited by: §5.
  • Y. Bengio, J. Louradour, R. Collobert, and J. Weston (2009) Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pp. 41–48. Cited by: §4.2.
  • K. Clark, M. Luong, Q. V. Le, and C. D. Manning (2019) ELECTRA: pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations, Cited by: Appendix A, §C.1, §C.1, §C.2, §1, §2, §3.1, §3.1, §3.1, §3, §5.1, MC-BERT: Efficient Language Pre-Training via a Meta Controller.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §B.1, §B.2, §C.1, §C.2, §1, §2, §3, §5.1, §5.1, §5.1, footnote 4.
  • L. Dong, N. Yang, W. Wang, F. Wei, X. Liu, Y. Wang, J. Gao, M. Zhou, and H. Hon (2019) Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems, pp. 13042–13054. Cited by: §2, §4.2.
  • T. J. Feddersen and W. Pesendorfer (1999) Abstention in elections with asymmetric information and diverse preferences. American Political Science Review 93 (2), pp. 381–398. Cited by: §4.1.
  • L. Gong, D. He, Z. Li, T. Qin, L. Wang, and T. Liu (2019) Efficient training of bert by progressively stacking. In International Conference on Machine Learning, pp. 2337–2346. Cited by: §2.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §2.
  • P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst (2007) Moses: open source toolkit for statistical machine translation. In ACL. Cited by: §B.1.
  • Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut (2019) ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942. Cited by: §B.2.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) Roberta: a robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. Cited by: §B.1, §B.2, §1, §2, §5.1, §5, MC-BERT: Efficient Language Pre-Training via a Meta Controller.
  • M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer (2018) Deep contextualized word representations. arXiv preprint arXiv:1802.05365. Cited by: §2.
  • A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever (2018) Improving language understanding by generative pre-training. URL https://s3-us-west-2. amazonaws. com/openai-assets/research-covers/language-unsupervised/language_ understanding_paper. pdf. Cited by: §2.
  • A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019) Language models are unsupervised multitask learners. OpenAI Blog 1 (8). Cited by: §1.
  • C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu (2019) Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683. Cited by: §1.
  • R. Sennrich, B. Haddow, and A. Birch (2015) Neural machine translation of rare words with subword units. CoRR abs/1508.07909. External Links: Link, 1508.07909 Cited by: §B.1, §5.1.
  • E. Strubell, A. Ganesh, and A. McCallum (2019) Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243. Cited by: §1, §2.
  • W. L. Taylor (1953) “Cloze procedure”: a new tool for measuring readability. Journalism Quarterly 30 (4), pp. 415–433. External Links: Document, Link, Cited by: Figure 1.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008. Cited by: §2.
  • A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman (2018) GLUE: A multi-task benchmark and analysis platform for natural language understanding. CoRR abs/1804.07461. External Links: Link, 1804.07461 Cited by: §C.1, §1, §3.1, §5.1.
  • Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le (2019) XLNet: generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237. Cited by: §B.2, §1, §2.
  • Y. You, J. Li, S. Reddi, J. Hseu, S. Kumar, S. Bhojanapalli, X. Song, J. Demmel, and C. Hsieh (2019) Large batch optimization for deep learning: training bert in 76 minutes. arXiv preprint arXiv:1904.00962 1 (5). Cited by: §2.
  • Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler (2015) Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In arXiv preprint arXiv:1506.06724, Cited by: footnote 4.

Appendix A Model Details

The architecture settings for RoBERTa, ELECTRA, and MC-BERT are listed in Table 4. For the encoder used in downstream tasks, we use the same architecture for all three models. For the generator of ELECTRA, we use the same model size as Clark et al. [2019]. We also set the size of the meta controller of MC-BERT to be the same as that of the ELECTRA generator.

Hyperparameter         Encoder   Meta controller / generator
Number of layers       12        12
Hidden size            768       256
FFN inner hidden size  3072      1024
Attention heads        12        4
Attention head size    64        64
Embedding size         768       768
Table 4: Model specifications. "Encoder" denotes the Transformer used in downstream tasks, with the same base architecture for all models. "Meta controller / generator" denotes the meta controller of MC-BERT and the generator of ELECTRA, respectively, which are also set to be the same.

Appendix B Pre-Training Details

B.1 Dataset

We use the same dataset as BERT Devlin et al. [2018], which consists of BooksCorpus and English Wikipedia. Concatenating the two yields a corpus of roughly 3400M words in total. Following the practices of Devlin et al. [2018], we first segment documents into sentences with spaCy; then we normalize, lower-case, and tokenize the text using the Moses decoder [Koehn et al., 2007]; finally, we apply byte pair encoding (BPE) [Sennrich et al., 2015]. We randomly split the documents into one training set and one validation set, with a training-validation ratio of 199:1 for pre-training. The vocabulary consists of 32,768 tokens. Following Liu et al. [2019], we pack each input with full sentences sampled contiguously from the corpus, such that the total length is at most 512 tokens.
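The packing scheme described above can be sketched as follows. This is a minimal greedy sketch, not the paper's actual preprocessing code; the function name and the truncation of overlong sentences are our assumptions:

```python
# Sketch: greedily pack consecutive tokenized sentences into blocks of
# at most max_len tokens, in the RoBERTa-style "full sentences" fashion
# described above. Sentences are assumed to be lists of token ids.

def pack_sentences(tokenized_sentences, max_len=512):
    """Pack contiguous sentences into blocks of at most max_len tokens."""
    blocks, current = [], []
    for sent in tokenized_sentences:
        if current and len(current) + len(sent) > max_len:
            # The next sentence would overflow the block: flush it.
            blocks.append(current)
            current = []
        # A single sentence longer than max_len is truncated to fit
        # (an assumption; the paper does not specify this corner case).
        current.extend(sent[:max_len])
    if current:
        blocks.append(current)
    return blocks
```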

B.2 Hyperparameters

The pre-training hyperparameters are set mostly the same as in BERT Devlin et al. [2018]. However, as suggested by recent works [Yang et al., 2019; Liu et al., 2019; Lan et al., 2019], we remove the next sentence prediction (NSP) pre-training task. The details are listed in Table 5.

Hyperparameter       Pre-training Value
Learning rate        1e-4
Learning rate decay  Linear
Decay steps          1,000,000
Warmup steps         10,000
Adam ε               1e-6
Adam (β1, β2)        (0.9, 0.98)
Batch size           256
Dropout              0.1
Attention dropout    0.1
Weight decay         0.01
Table 5: Pre-training hyperparameter settings.
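The warmup-then-linear-decay schedule implied by Table 5 can be written out as follows. This is a sketch under our reading of the table (linear warmup over 10,000 steps to the peak rate, then linear decay to zero at step 1,000,000); the function name is ours:

```python
# Sketch of the learning-rate schedule from Table 5.
PEAK_LR = 1e-4        # "Learning rate" in Table 5
WARMUP_STEPS = 10_000  # "Warmup steps"
TOTAL_STEPS = 1_000_000  # "Decay steps"

def learning_rate(step):
    """Linear warmup to PEAK_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Decay linearly from the peak down to zero at TOTAL_STEPS.
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))
```

In practice this would be attached to the Adam optimizer (ε = 1e-6, β = (0.9, 0.98)) as a per-step multiplier on the base rate.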

Appendix C Down-Stream Details

C.1 GLUE Tasks

We use the GLUE (General Language Understanding Evaluation) dataset [Wang et al., 2018] as the downstream tasks to evaluate the performance of the pre-trained models. GLUE contains nine tasks that have been widely used for evaluation. Following BERT Devlin et al. [2018] and ELECTRA Clark et al. [2019], we skip WNLI in our experiments, because few submissions on the leaderboard do better than predicting the majority class for this task; we evaluate on the remaining eight: CoLA, RTE, MRPC, STS-B, SST-2, QNLI, QQP, and MNLI-m/mm. The specifications of these tasks are listed in Table 6.

Notably, we strictly adopt the official metrics to evaluate performance on the GLUE tasks. The scores reported in ELECTRA Clark et al. [2019], however, are not computed with the official metrics: they use Spearman correlation for STS-B (instead of the average of Spearman and Pearson correlation), Matthews correlation for CoLA, and accuracy for all other GLUE tasks (instead of the average of F1-score and accuracy for MRPC and QQP).
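As an illustration of the official convention for the multi-metric tasks, the MRPC/QQP score averages F1 and accuracy. A pure-Python sketch (the function name is ours, not from any GLUE tooling):

```python
# Sketch: official GLUE score for MRPC and QQP, i.e. the average of
# F1 (on the positive class) and accuracy. Labels and predictions are
# assumed to be 0/1 integers.

def mrpc_official(preds, labels):
    """Average of F1-score and accuracy, as used on the GLUE leaderboard."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return (f1 + accuracy) / 2
```

Reporting accuracy alone, as ELECTRA does, drops the F1 term and can shift the score noticeably on class-imbalanced test sets.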

Corpus Size Task #Class Metric(s) Domain
Syntactic Tasks
CoLA 8.5k Acceptability 2 Matthews correlation Misc.
Semantic Tasks
RTE 2.5k Inference 2 Accuracy Misc.
MRPC 3.7k Paraphrase 2 Accuracy/F1 News
STS-B 5.7k Similarity - Pearson/Spearman corr. Misc.
SST-2 67k Sentiment 2 Accuracy Movie reviews
QNLI 108k QA/Inference 2 Accuracy Wikipedia
QQP 364k Similarity 2 Accuracy/F1 Social QA questions
MNLI-m/mm 393k Inference 3 Accuracy Misc.
Table 6: Specification of GLUE tasks.

C.2 Fine-Tuning Details

For fine-tuning, most hyperparameters are also the same as in BERT Devlin et al. [2018]. We perform an exhaustive search over batch size and learning rate to obtain reasonable performance numbers; the details of the search space are given in the main paper. Our search space is much larger than the settings used in both BERT Devlin et al. [2018] and ELECTRA Clark et al. [2019], giving higher confidence in the reported numbers. Apart from the hyperparameters in the search space, all other settings are the same as in pre-training.
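The exhaustive search over batch size and learning rate amounts to a simple grid search. A sketch follows; the grid values below are illustrative placeholders, NOT the actual search space (which is specified in the main paper), and `fine_tune_and_eval` is a hypothetical callback that fine-tunes with the given settings and returns a dev-set score:

```python
import itertools

BATCH_SIZES = [16, 32]                # placeholder grid, not the paper's
LEARNING_RATES = [1e-5, 2e-5, 3e-5]   # placeholder grid, not the paper's

def grid_search(fine_tune_and_eval):
    """Try every (batch size, learning rate) pair; keep the best dev score."""
    best_score, best_config = float("-inf"), None
    for bs, lr in itertools.product(BATCH_SIZES, LEARNING_RATES):
        score = fine_tune_and_eval(batch_size=bs, learning_rate=lr)
        if score > best_score:
            best_score, best_config = score, (bs, lr)
    return best_config, best_score
```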

FLOPs        Model    CoLA   SST-2  MRPC   STS-B  QQP    MNLI-m/mm    QNLI   RTE    Avg.
(#train)              8.5k   67k    3.7k   5.7k   364k   393k         108k   2.5k   -
4% (2e18)    RoBERTa  27.21  87.58  63.04  80.62  86.76  76.93/77.69  85.22  54.28  76.40
             ELECTRA  44.83  87.72  74.89  83.50  87.38  77.48/78.08  85.75  59.96  79.57
             MC-BERT  39.20  88.50  76.29  83.54  87.31  77.68/78.27  85.63  59.23  79.89
8% (4e18)    RoBERTa  42.23  90.70  71.22  85.08  88.15  79.71/79.78  87.82  57.72  80.06
             ELECTRA  58.15  90.15  80.75  86.04  88.64  80.63/80.93  88.32  64.62  82.76
             MC-BERT  53.28  91.11  80.91  86.11  88.51  80.80/80.95  88.23  66.85  83.23
16% (8e18)   RoBERTa  47.00  91.56  76.53  86.22  88.68  81.36/81.55  88.98  59.38  81.83
             ELECTRA  61.05  91.56  82.50  87.37  89.14  82.32/82.28  89.90  66.76  84.22
             MC-BERT  57.96  91.96  82.18  86.93  89.11  82.04/82.30  89.46  68.14  84.28
32% (1.6e19) RoBERTa  50.69  92.22  78.30  86.72  89.13  82.61/82.41  90.05  61.03  82.85
             ELECTRA  61.49  92.31  84.12  88.46  89.57  83.85/83.87  90.80  67.51  85.23
             MC-BERT  59.20  92.67  83.98  87.31  89.37  83.69/83.58  90.37  70.85  85.46
64% (3.2e19) RoBERTa  57.40  93.08  82.07  87.72  89.31  84.40/84.47  91.04  63.24  84.41
             ELECTRA  65.72  92.82  85.22  88.79  89.99  85.42/84.80  91.31  69.77  86.15
             MC-BERT  62.05  92.41  85.83  88.23  89.75  85.11/84.73  91.15  74.13  86.63
100% (5e19)  RoBERTa  57.41  93.15  83.22  88.14  89.24  84.69/84.63  91.02  63.10  84.65
             ELECTRA  64.33  93.38  84.88  89.10  89.96  86.00/85.29  91.85  70.80  86.52
             MC-BERT  62.10  92.34  85.96  88.01  89.65  85.68/85.24  91.34  74.96  86.82

Table 7: The detailed results on the GLUE benchmark (except WNLI).

Appendix D Detailed Results

Due to the space limitation in the main text, the detailed experimental results are listed here in Table 7. The scores on all tasks are listed for each checkpoint. "Avg." is the average over the semantic tasks. The number below each task name denotes its number of training examples, and the metrics are those listed in Table 6. Following the standard practice for computing GLUE scores, we report the arithmetic average of all metrics for tasks with multiple metrics (MRPC, QQP, STS-B), and we average the MNLI-m and MNLI-mm scores to obtain the final MNLI score.

As can be seen from the table, when the downstream task has abundant training data, e.g., QNLI, MNLI, and QQP, RoBERTa, ELECTRA, and our proposed method perform similarly. However, when the downstream training data is small, ELECTRA and our method are significantly better than RoBERTa.