Denoising based Sequence-to-Sequence Pre-training for Text Generation

08/22/2019
by Liang Wang, et al.

This paper presents a new sequence-to-sequence (seq2seq) pre-training method, PoDA (Pre-training of Denoising Autoencoders), which learns representations suitable for text generation tasks. Unlike encoder-only (e.g., BERT) or decoder-only (e.g., OpenAI GPT) pre-training approaches, PoDA jointly pre-trains both the encoder and decoder by denoising noise-corrupted text, and it also has the advantage of keeping the network architecture unchanged in the subsequent fine-tuning stage. Meanwhile, we design a hybrid model of Transformer and pointer-generator networks as the backbone architecture for PoDA. We conduct experiments on two text generation tasks: abstractive summarization and grammatical error correction. Results on four datasets show that PoDA can improve model performance over strong baselines without using any task-specific techniques and significantly speed up convergence.
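
To make the pre-training objective concrete, the sketch below shows one way a noisy (source, target) pair can be built for a denoising seq2seq model: the corrupted sequence goes to the encoder, and the decoder is trained to reconstruct the clean original. The specific noising operations (token deletion and random replacement), the function name `corrupt`, and the rates are illustrative assumptions; the abstract does not spell out the exact corruption scheme PoDA uses.

```python
import random

def corrupt(tokens, p_delete=0.1, p_replace=0.1, vocab=None, rng=None):
    """Apply simple token-level noise (deletion and random replacement).

    The corrupted sequence is fed to the encoder; the decoder is trained
    to reconstruct the original, clean sequence.
    NOTE: hypothetical sketch -- the paper's actual noising scheme may differ.
    """
    rng = rng or random.Random()
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            continue                          # drop the token
        if r < p_delete + p_replace and vocab:
            noisy.append(rng.choice(vocab))   # substitute a random vocabulary token
        else:
            noisy.append(tok)                 # keep the token unchanged
    return noisy

# Pre-training pair: (noisy source, clean target)
clean = "the quick brown fox jumps over the lazy dog".split()
vocab = ["a", "cat", "runs", "red"]
source = corrupt(clean, vocab=vocab, rng=random.Random(0))
target = clean
print(" ".join(source), "->", " ".join(target))
```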

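The abstract also names a hybrid of Transformer and pointer-generator networks as the backbone. The sketch below shows the standard pointer-generator output distribution of See et al. (2017), in which a generation probability p_gen interpolates between the decoder's vocabulary softmax and a copy distribution scattered from the encoder attention weights. PoDA's exact variant may differ; the function name and toy numbers here are hypothetical.

```python
import numpy as np

def pointer_generator_dist(p_vocab, attention, source_ids, p_gen, vocab_size):
    """Mix the vocabulary distribution with a copy distribution built from
    encoder attention, following the pointer-generator of See et al. (2017).

    p_vocab    : (vocab_size,) softmax over the output vocabulary
    attention  : (src_len,)    attention weights over source positions
    source_ids : (src_len,)    vocabulary ids of the source tokens
    p_gen      : scalar in [0, 1], probability of generating vs. copying
    """
    copy_dist = np.zeros(vocab_size)
    np.add.at(copy_dist, source_ids, attention)   # scatter-add attention onto source token ids
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist

# Toy example: 5-word vocabulary, 3 source tokens
p_vocab = np.array([0.1, 0.2, 0.3, 0.3, 0.1])
attention = np.array([0.5, 0.3, 0.2])
source_ids = np.array([2, 4, 2])
print(pointer_generator_dist(p_vocab, attention, source_ids, p_gen=0.7, vocab_size=5))
```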