
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

by Jian Yang, et al.
Beihang University
The University of Hong Kong

Pre-trained models have achieved remarkable success in natural language processing (NLP). However, existing pre-training methods underutilize the benefits of language understanding for generation. Inspired by Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training that introduces an auxiliary discriminator, unifying language understanding and generation in a single model. Our model, named GanLM, is trained with two pre-training objectives: replaced token detection and replaced token denoising. Specifically, given masked source sentences, the generator outputs the target distribution, and the discriminator predicts whether the target tokens sampled from that distribution are incorrect. The target sentence is then replaced with the misclassified tokens to construct a noisy previous context, which is used to generate the gold sentence. Both tasks improve language understanding and generation by selectively using the denoising data. Extensive experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models (PLMs) and achieves state-of-the-art performance.
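
The data flow behind the two objectives can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the dictionary-based generator distribution, and the hard-coded discriminator predictions are all hypothetical, and "misclassified" is read here as "positions where the discriminator's prediction disagrees with the replaced/original label", which is one interpretation of the abstract. No model is trained; the sketch only shows how the detection labels and the noisy previous context might be built.

```python
import random


def sample_replacements(gold, masked_positions, generator_probs, rng):
    """Sample one token per masked position from a (hypothetical) generator distribution."""
    sampled = list(gold)
    for pos in masked_positions:
        tokens = list(generator_probs[pos].keys())
        weights = list(generator_probs[pos].values())
        sampled[pos] = rng.choices(tokens, weights=weights, k=1)[0]
    return sampled


def detection_labels(gold, sampled):
    """Replaced token detection targets: 1 where the sampled token differs from gold."""
    return [int(s != g) for s, g in zip(sampled, gold)]


def noisy_previous_context(gold, sampled, labels, predictions):
    """Keep the sampled token only where the discriminator's prediction was wrong.

    Assumption: positions the discriminator classified correctly fall back to gold.
    """
    return [s if p != y else g
            for g, s, y, p in zip(gold, sampled, labels, predictions)]


# Toy example; the distributions are degenerate so the result is deterministic.
gold = "the cat sat on the mat".split()
masked_positions = [1, 5]
generator_probs = {1: {"dog": 1.0}, 5: {"mat": 1.0}}  # hypothetical generator output
rng = random.Random(0)

sampled = sample_replacements(gold, masked_positions, generator_probs, rng)
labels = detection_labels(gold, sampled)
predictions = [0, 0, 0, 0, 0, 0]  # hypothetical discriminator output: "all original"
noisy = noisy_previous_context(gold, sampled, labels, predictions)
```

In this toy run the discriminator fails to flag "dog", so "dog" stays in the decoder's previous context while the training target remains the gold sentence. If the discriminator had flagged it correctly, the context would revert to the gold tokens.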



SegaBERT: Pre-training of Segment-aware BERT for Language Understanding

Pre-trained language models have achieved state-of-the-art results in va...

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

This paper presents a new pre-trained language model, DeBERTaV3, which i...

MC-BERT: Efficient Language Pre-Training via a Meta Controller

Pre-trained contextual representations (e.g., BERT) have become the foun...

Learning to Sample Replacements for ELECTRA Pre-Training

ELECTRA pretrains a discriminator to detect replaced tokens, where the r...

Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction

The needs for precisely estimating a student's academic performance have...

Improving negation detection with negation-focused pre-training

Negation is a common linguistic feature that is crucial in many language...

AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding

Advances in English language representation enabled a more sample-effici...