STEP: Sequence-to-Sequence Transformer Pre-training for Document Summarization

04/04/2020
by Yanyan Zou, et al.

Abstractive summarization aims to rewrite a long document into a shorter form, which is usually modeled as a sequence-to-sequence (Seq2Seq) learning problem. Seq2Seq Transformers are powerful models for this problem. Unfortunately, training large Seq2Seq Transformers on limited supervised summarization data is challenging. We therefore propose STEP (shorthand for Sequence-to-Sequence Transformer Pre-training), which can be trained on large-scale unlabeled documents. Specifically, STEP is pre-trained with three different tasks, namely sentence reordering, next sentence generation, and masked document generation. Experiments on two summarization datasets show that each of the three tasks improves performance by a large margin over a heavily tuned large Seq2Seq Transformer that already includes a strong pre-trained encoder. Pre-training STEP with our best task, we outperform the best published abstractive model by 0.8 ROUGE-2 on CNN/DailyMail and by 2.4 ROUGE-2 on New York Times.
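
The abstract names three self-supervised objectives but does not spell out how training pairs are built from unlabeled text. The following is a minimal, hypothetical sketch (not the authors' released code) of how source/target pairs for the three tasks could be derived from one unlabeled document; the sentence splitting, the "<mask>" token, the 15% mask ratio, and the 50/50 document split are illustrative assumptions only.

import random

MASK = "<mask>"  # placeholder mask symbol; the actual vocabulary token may differ

def sentence_reordering(sentences, rng):
    # Source: sentences in shuffled order; target: the original document.
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return " ".join(shuffled), " ".join(sentences)

def next_sentence_generation(sentences, split_ratio=0.5):
    # Source: the first part of the document; target: the remaining sentences.
    cut = max(1, int(len(sentences) * split_ratio))
    return " ".join(sentences[:cut]), " ".join(sentences[cut:])

def masked_document_generation(sentences, rng, mask_ratio=0.15):
    # Source: the document with a fraction of tokens masked; target: the full document.
    tokens = " ".join(sentences).split()
    masked = [MASK if rng.random() < mask_ratio else tok for tok in tokens]
    return " ".join(masked), " ".join(tokens)

if __name__ == "__main__":
    rng = random.Random(0)
    doc = [
        "STEP is pre-trained on large-scale unlabeled documents.",
        "Three self-supervised tasks provide the training signal.",
        "The pre-trained model is then fine-tuned on summarization data.",
    ]
    print(sentence_reordering(doc, rng))
    print(next_sentence_generation(doc))
    print(masked_document_generation(doc, rng))

Under this reading, each pair would be fed to the Seq2Seq Transformer as (encoder input, decoder target) during pre-training, before fine-tuning on the labeled summarization datasets.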

Related research

05/16/2019  HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
Neural extractive summarization models usually employ a hierarchical enc...

04/15/2021  Hierarchical Learning for Generation with Long Source Sequences
One of the challenges for current sequence to sequence (seq2seq) models ...

01/13/2020  ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
In this paper, we present a new sequence-to-sequence pre-training model ...

10/16/2021  PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
Recently proposed pre-trained generation models achieve strong performan...

03/29/2020  Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
Pre-trained sequence-to-sequence (seq-to-seq) models have significantly ...

06/07/2021  Attention Temperature Matters in Abstractive Summarization Distillation
Recent progress of abstractive text summarization largely relies on larg...

03/21/2022  DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
Large-scale pre-trained sequence-to-sequence models like BART and T5 ach...
