All NLP Tasks Are Generation Tasks: A General Pretraining Framework

03/18/2021
by Zhengxiao Du, et al.

There have been various types of pretraining architectures, including autoregressive models (e.g., GPT), autoencoding models (e.g., BERT), and encoder-decoder models (e.g., T5). NLP tasks, meanwhile, differ in nature, with three main categories: classification, unconditional generation, and conditional generation. However, no single pretraining framework performs best across all tasks, which complicates model development and selection. We propose GLM (General Language Model), a novel pretraining framework that addresses this challenge. Compared to previous work, our architecture has three major benefits: (1) it performs well on classification, unconditional generation, and conditional generation tasks with one single pretrained model; (2) it outperforms BERT-like models on classification due to improved consistency between pretraining and finetuning; (3) it naturally handles variable-length blank filling, which is crucial for many downstream tasks. Empirically, GLM substantially outperforms BERT on the SuperGLUE natural language understanding benchmark with the same amount of pretraining data. Moreover, GLM with 1.25x the parameters of BERT-Large achieves the best performance on NLU, conditional generation, and unconditional generation simultaneously, demonstrating its generalizability to different downstream tasks.
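To make the "variable-length blank filling" idea concrete, here is a minimal, self-contained sketch: spans of the input are removed and replaced by mask placeholders, and the removed spans become autoregressive generation targets. The function name blank_infill_example, the [MASK]/[START] token strings, and the context/target split below are illustrative assumptions for exposition, not the paper's exact implementation.

```python
import random


def blank_infill_example(tokens, span_lengths, seed=0):
    """Toy illustration of variable-length blank filling.

    Selected spans are cut out of the input and replaced by [MASK]
    placeholders (the corrupted context); the removed spans become the
    autoregressive generation targets. Token names and the context/target
    layout are assumptions for illustration, not GLM's actual code.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    # Pick sorted start positions, one per span to blank out.
    starts = sorted(rng.sample(range(len(tokens) - max(span_lengths)),
                               len(span_lengths)))
    context, targets, cursor = [], [], 0
    for start, length in zip(starts, span_lengths):
        start = max(start, cursor)            # keep spans non-overlapping
        context += tokens[cursor:start] + ["[MASK]"]
        targets += ["[START]"] + tokens[start:start + length]
        cursor = start + length
    context += tokens[cursor:]
    # A model that reads `context` bidirectionally and generates `targets`
    # autoregressively covers both understanding- and generation-style tasks
    # with a single objective.
    return context, targets


src = "the quick brown fox jumps over the lazy dog".split()
print(blank_infill_example(src, span_lengths=[2, 1]))
```

Because the blanked spans can have arbitrary lengths, the same corruption scheme can mimic both short masked-token prediction (classification-style tasks) and long-span generation (conditional or unconditional text generation).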
