XLNet: Generalized Autoregressive Pretraining for Language Understanding

06/19/2019
by   Zhilin Yang, et al.

With the capability of modeling bidirectional contexts, denoising-autoencoding-based pretraining approaches such as BERT achieve better performance than pretraining approaches based on autoregressive language modeling. However, by relying on corrupting the input with masks, BERT neglects the dependency between masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
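The key idea in the abstract, maximizing the expected likelihood over all permutations of the factorization order, can be illustrated with a minimal conceptual sketch: sample a factorization order for a sequence, then score each token autoregressively given only the tokens that precede it in that order. This is not the authors' implementation; the function name and the cond_log_prob scorer below are hypothetical stand-ins for any autoregressive model.

    # Conceptual sketch of a permutation language modeling objective.
    # One sampled permutation approximates the expectation over all
    # factorization orders described in the abstract.
    import math
    import random

    def permutation_lm_log_likelihood(tokens, cond_log_prob):
        """Log-likelihood of `tokens` under one sampled factorization order.

        cond_log_prob(target, context) is a hypothetical callback returning
        log p(target | context); any autoregressive scorer could fill this role.
        """
        order = list(range(len(tokens)))
        random.shuffle(order)  # sample one factorization order z
        total = 0.0
        for t, pos in enumerate(order):
            context = [tokens[p] for p in order[:t]]  # tokens earlier in the order
            total += cond_log_prob(tokens[pos], context)
        return total

    if __name__ == "__main__":
        # Toy usage: a dummy uniform scorer over a 10-word vocabulary.
        dummy = lambda target, context: math.log(1.0 / 10)
        print(permutation_lm_log_likelihood(["the", "cat", "sat"], dummy))

Because every position can appear late in some sampled order, each token is eventually predicted with context from both sides, which is how the autoregressive formulation captures bidirectional context without masking the input.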


