Pre-Training Transformers as Energy-Based Cloze Models

12/15/2020
by Kevin Clark, et al.

We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
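
To make the scoring idea concrete, below is a minimal PyTorch sketch of how an energy-based cloze model of this kind might be used and trained. It assumes a hypothetical `model` that returns one scalar energy per token position (lower energy meaning the token is more plausible in its context); the simplified binary NCE objective shown here omits the noise-distribution correction term used in the paper, and the function names (`score_sequence`, `electric_nce_loss`) are illustrative, not part of the released Electric code.

```python
import torch
import torch.nn.functional as F

def score_sequence(model, token_ids):
    """Score one candidate sequence (e.g. a speech-recognition n-best
    hypothesis) in a single forward pass.

    `model` is assumed to return one scalar energy per token position;
    the negated sum of energies acts as a pseudo-log-likelihood, so every
    position is scored at once instead of masking each token in turn as a
    masked language model would require.
    """
    energies = model(token_ids)            # shape: (seq_len,)
    return -energies.sum().item()

def electric_nce_loss(energies, is_noise):
    """Simplified noise-contrastive-estimation-style training objective.

    `energies`: per-position energies for a sequence in which some tokens
                were replaced by samples from a noise distribution (e.g. a
                small masked language model).
    `is_noise`: boolean tensor marking the replaced positions.

    The model is trained as a binary classifier separating original tokens
    from noise samples; with a logistic link, -energy serves as the logit
    for "original". The full objective in the paper additionally folds in
    the noise distribution's probability, which is omitted here.
    """
    logits = -energies
    targets = (~is_noise).float()          # 1 = original token, 0 = noise
    return F.binary_cross_entropy_with_logits(logits, targets)

# Example with dummy values:
energies = torch.tensor([0.2, 1.5, 0.1, 2.3])
is_noise = torch.tensor([False, True, False, True])
loss = electric_nce_loss(energies, is_noise)
```

Because every position receives an energy in one pass, re-ranking an n-best list costs a single forward computation per hypothesis, which is the source of the speed advantage over masked language models mentioned above.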


