
Cloze-driven Pretraining of Self-attention Networks

03/19/2019
by Alexei Baevski, et al.

We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. Our model solves a cloze-style word reconstruction task, where each word is ablated and must be predicted given the rest of the text. Experiments demonstrate large performance gains on GLUE and new state-of-the-art results on NER as well as constituency parsing benchmarks, consistent with the concurrently introduced BERT model. We also present a detailed analysis of a number of factors that contribute to effective pretraining, including data domain and size, model capacity, and variations on the cloze objective.
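The cloze objective described above can be illustrated with a minimal Python sketch, assuming whitespace-tokenized text. The helper names `cloze_examples` and `cloze_loss` and the `prob_fn` predictor interface are hypothetical. The sketch shows only how every word is ablated in turn and scored given both its left and right context. It does not reproduce the paper's model architecture or training setup.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical predictor interface: returns P(target | left context, right context).
ProbFn = Callable[[List[str], List[str], str], float]


def cloze_examples(tokens: List[str]) -> List[Tuple[List[str], List[str], str]]:
    """Build one (left context, right context, target) triple per position.

    Every token is ablated in turn and must be predicted from the
    remaining text on both sides.
    """
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[:i]        # words before the ablated position
        right = tokens[i + 1:]   # words after the ablated position
        examples.append((left, right, target))
    return examples


def cloze_loss(tokens: List[str], prob_fn: ProbFn) -> float:
    """Mean negative log-likelihood of each token given its two-sided context."""
    if not tokens:
        raise ValueError("need at least one token")
    nll = 0.0
    for left, right, target in cloze_examples(tokens):
        nll -= math.log(prob_fn(left, right, target))
    return nll / len(tokens)


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for left, right, target in cloze_examples(sentence):
        print(" ".join(left), "[_]", " ".join(right), "->", target)

    # Toy baseline: a uniform guess over the distinct words in the sentence.
    vocab = set(sentence)
    uniform: ProbFn = lambda left, right, target: 1.0 / len(vocab)
    print("uniform cloze loss:", cloze_loss(sentence, uniform))
```

With a uniform predictor over the five distinct words of "the cat sat on the mat", the loss equals log 5. A model that uses the surrounding context well would aim to push this value lower.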

