Unified Language Model Pre-training for Natural Language Understanding and Generation

by   Li Dong, et al.

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling objectives: unidirectional (both left-to-right and right-to-left), bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. We can fine-tune UniLM as a unidirectional decoder, a bidirectional encoder, or a sequence-to-sequence model to support various downstream natural language understanding and generation tasks. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, our model achieves new state-of-the-art results on three natural language generation tasks, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.63 (2.16 absolute improvement), pushing the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), and the SQuAD question generation BLEU-4 to 22.88 (6.50 absolute improvement).


page 1

page 2

page 3

page 4


Unified Vision-Language Pre-Training for Image Captioning and VQA

This paper presents a unified Vision-Language Pre-training (VLP) model. ...

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stan...

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

We propose to pre-train a unified language model for both autoencoding a...

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Recently, the pre-trained language model, BERT (Devlin et al.(2018)Devli...

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

In this paper, we take the advantage of previous pre-trained models (PTM...

SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs

In this paper, we propose SPBERT, a transformer-based language model pre...

Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Unsupervised pre-training of large neural models has recently revolution...