Unified Language Model Pre-training for Natural Language Understanding and Generation

05/08/2019
by Li Dong, et al.

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling objectives: unidirectional (both left-to-right and right-to-left), bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and using specific self-attention masks to control what context the prediction conditions on. We can fine-tune UniLM as a unidirectional decoder, a bidirectional encoder, or a sequence-to-sequence model to support various downstream natural language understanding and generation tasks. UniLM compares favorably with BERT on the GLUE benchmark, and on the SQuAD 2.0 and CoQA question answering tasks. Moreover, our model achieves new state-of-the-art results on three natural language generation tasks, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.63 (2.16 absolute improvement), pushing the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), and raising the SQuAD question generation BLEU-4 to 22.88 (6.50 absolute improvement).
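The core mechanism, conditioning one shared Transformer on different contexts purely through self-attention masks, can be illustrated with a short sketch. The PyTorch snippet below is a minimal, illustrative construction of the three mask patterns the abstract describes, not the authors' released implementation; the function names and the final masking-plus-softmax step are assumptions added for demonstration.

```python
import torch


def bidirectional_mask(seq_len: int) -> torch.Tensor:
    # Every position may attend to every other position (cloze-style objective).
    return torch.ones(seq_len, seq_len, dtype=torch.bool)


def left_to_right_mask(seq_len: int) -> torch.Tensor:
    # Position i may attend only to positions 0..i (causal LM objective).
    return torch.ones(seq_len, seq_len).tril().bool()


def seq2seq_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    # Source tokens attend bidirectionally within the source segment;
    # target tokens attend to the full source plus their leftward target prefix.
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    mask[:, :src_len] = True
    mask[src_len:, src_len:] = torch.ones(tgt_len, tgt_len).tril().bool()
    return mask


# Applying a mask: disallowed positions are set to -inf before the softmax,
# so the shared Transformer assigns them zero attention weight.
seq_len = 6
scores = torch.randn(seq_len, seq_len)  # dummy attention logits
masked = scores.masked_fill(~left_to_right_mask(seq_len), float("-inf"))
attn = torch.softmax(masked, dim=-1)
```

Because only the mask changes between objectives, the same Transformer parameters are reused across all three, which is what allows the fine-tuned model to act as a bidirectional encoder, a unidirectional decoder, or a sequence-to-sequence model.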


Related research

09/24/2019 · Unified Vision-Language Pre-Training for Image Captioning and VQA
This paper presents a unified Vision-Language Pre-training (VLP) model. ...

10/11/2018 · BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stan...

02/28/2020 · UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
We propose to pre-train a unified language model for both autoencoding a...

08/13/2019 · StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Recently, the pre-trained language model BERT (Devlin et al., 2018) ...

09/13/2021 · CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation
In this paper, we take advantage of previous pre-trained models (PTM...

06/18/2021 · SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs
In this paper, we propose SPBERT, a transformer-based language model pre...

07/29/2019 · Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Unsupervised pre-training of large neural models has recently revolution...