Unified Language Model Pre-training for Natural Language Understanding and Generation

05/08/2019
by Li Dong, et al.

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling objectives: unidirectional (both left-to-right and right-to-left), bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. We can fine-tune UniLM as a unidirectional decoder, a bidirectional encoder, or a sequence-to-sequence model to support various downstream natural language understanding and generation tasks. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, our model achieves new state-of-the-art results on three natural language generation tasks, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.63 (2.16 absolute improvement), pushing the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), and raising the SQuAD question generation BLEU-4 to 22.88 (6.50 absolute improvement).
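
The unification described in the abstract hinges on the self-attention mask: the same shared Transformer is run under different masks that decide which positions each prediction may condition on. Below is a minimal PyTorch sketch of those masks; the helper name self_attention_mask and the tensor layout are illustrative assumptions, not the released UniLM code.

```python
import torch

def self_attention_mask(seq_len, mode, src_len=None):
    """Return a (seq_len, seq_len) mask where mask[i, j] = 1 means
    position i may attend to position j.

    The modes mirror the pre-training objectives described above
    (unidirectional left-to-right / right-to-left, bidirectional,
    and sequence-to-sequence). Note: this helper is a sketch, not
    the authors' implementation.
    """
    ones = torch.ones(seq_len, seq_len)
    if mode == "bidirectional":
        return ones                      # every token sees every token
    if mode == "left-to-right":
        return torch.tril(ones)          # token i sees positions j <= i
    if mode == "right-to-left":
        return torch.triu(ones)          # token i sees positions j >= i
    if mode == "seq2seq":
        assert src_len is not None and 0 < src_len <= seq_len
        mask = torch.zeros(seq_len, seq_len)
        mask[:, :src_len] = 1.0          # every token sees the full source segment
        tgt_len = seq_len - src_len
        # target tokens additionally see the target prefix up to themselves
        mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len))
        return mask                      # source rows never see target columns
    raise ValueError(f"unknown mode: {mode}")

# Usage: a 6-token sequence whose first 4 tokens form the source segment.
mask = self_attention_mask(6, "seq2seq", src_len=4)
scores = torch.randn(6, 6)                              # raw attention scores
scores = scores.masked_fill(mask == 0, float("-inf"))   # block disallowed positions
weights = torch.softmax(scores, dim=-1)                 # weights over allowed context only
```

Because the mask is applied to the attention logits (as -inf at blocked positions) before the softmax, one set of Transformer parameters can serve all three objectives; only the mask changes between them.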


Related research

09/24/2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
This paper presents a unified Vision-Language Pre-training (VLP) model. ...

10/11/2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stan...

02/28/2020
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
We propose to pre-train a unified language model for both autoencoding a...

08/13/2019
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Recently, the pre-trained language model, BERT (Devlin et al.(2018)Devli...

07/01/2023
BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained Transformer
BatGPT is a large-scale language model designed and trained jointly by W...

06/18/2021
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs
In this paper, we propose SPBERT, a transformer-based language model pre...

07/29/2019
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Unsupervised pre-training of large neural models has recently revolution...
