UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

02/28/2020
by Hangbo Bao, et al.

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM). Given an input text with masked tokens, we rely on conventional masks to learn inter-relations between corrupted tokens and context via autoencoding, and pseudo masks to learn intra-relations between masked spans via partially autoregressive modeling. With well-designed position embeddings and self-attention masks, the context encodings are reused to avoid redundant computation. Moreover, conventional masks used for autoencoding provide global masking information, so that all the position embeddings are accessible in partially autoregressive language modeling. In addition, the two tasks pre-train a unified language model as a bidirectional encoder and a sequence-to-sequence decoder, respectively. Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks across several widely used benchmarks.
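
The sketch below illustrates, in simplified form, how a PMLM-style input could be assembled: conventional [M] masks stay in place for the autoencoding view, while pseudo masks [P] and the original span tokens are appended with position ids reused from the masked positions, and a self-attention mask controls what each block may see. This is not the authors' code; the token names, span ordering, and the exact attention rules are illustrative assumptions.

    # Minimal sketch of PMLM-style input construction (assumptions noted above).
    import numpy as np

    MASK, PSEUDO = "[M]", "[P]"

    def build_pmlm_inputs(tokens, spans):
        """tokens: list of str; spans: masked spans in factorization order,
        each a (start, end) pair with end exclusive."""
        masked_pos = {i for s, e in spans for i in range(s, e)}

        # Conventional part: replace masked tokens with [M] (autoencoding view).
        inp = [MASK if i in masked_pos else t for i, t in enumerate(tokens)]
        pos = list(range(len(tokens)))
        block = [("ctx", None)] * len(tokens)   # which block each position belongs to

        # Appended part: for each span, pseudo masks [P] then the original tokens,
        # both reusing the span's original position ids.
        for j, (s, e) in enumerate(spans):
            inp += [PSEUDO] * (e - s); pos += list(range(s, e)); block += [("p", j)] * (e - s)
            inp += tokens[s:e];        pos += list(range(s, e)); block += [("gold", j)] * (e - s)

        # Self-attention mask: query q may attend to key k iff allowed[q, k] is True.
        # Simplified rules: context (incl. [M]) is fully bidirectional; [P] and gold
        # tokens of span j see the context, gold tokens of earlier spans, and their own span.
        n = len(inp)
        allowed = np.zeros((n, n), dtype=bool)
        for q in range(n):
            bq, jq = block[q]
            for k in range(n):
                bk, jk = block[k]
                if bq == "ctx":
                    allowed[q, k] = (bk == "ctx")
                else:
                    allowed[q, k] = (bk == "ctx") or (bk == "gold" and jk < jq) or (jk == jq)
        return inp, pos, allowed

    # Example: mask spans (4, 6) then (1, 2) of a six-token input.
    ids, pos, mask = build_pmlm_inputs(["x1", "x2", "x3", "x4", "x5", "x6"], spans=[(4, 6), (1, 2)])
    # ids -> ['x1', '[M]', 'x3', 'x4', '[M]', '[M]', '[P]', '[P]', 'x5', 'x6', '[P]', 'x2']
    # pos -> [0, 1, 2, 3, 4, 5, 4, 5, 4, 5, 1, 1]

Because the appended [P] and gold tokens reuse the masked positions' embeddings and only the attention mask changes, the encoding of the unmasked context can be computed once and shared by both objectives.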

Related research

05/08/2019
Unified Language Model Pre-training for Natural Language Understanding and Generation
This paper presents a new Unified pre-trained Language Model (UniLM) tha...

04/14/2020
PALM: Pre-training an Autoencoding Autoregressive Language Model for Context-conditioned Generation
Self-supervised pre-training has emerged as a powerful technique for nat...

12/10/2021
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Most existing vision-language pre-training methods focus on understandin...

06/26/2023
MotionGPT: Human Motion as a Foreign Language
Though the advancement of pre-trained large language models unfolds, the...

07/01/2023
BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained Transformer
BatGPT is a large-scale language model designed and trained jointly by W...

08/04/2022
Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models
Although masked language models are highly performant and widely adopted...

09/20/2023
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new st...
