TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising

01/03/2020
by ZiYi Yang, et al.

Text summarization aims to extract the essential information from a piece of text and transform it into a concise version. Existing unsupervised abstractive summarization models rely on recurrent neural network frameworks and ignore abundant unlabeled corpora. To address these issues, we propose TED, a transformer-based unsupervised summarization system pretrained on large-scale data. We first leverage the lead bias in news articles to pretrain the model on large-scale corpora. We then finetune TED on target domains through theme modeling and a denoising autoencoder to enhance the quality of its summaries. Notably, TED outperforms all unsupervised abstractive baselines on the NYT, CNN/DM, and English Gigaword datasets, which cover various document styles. Further analysis shows that the summaries generated by TED are abstractive and contain even higher proportions of novel tokens than those from supervised models.
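
The lead-bias pretraining idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the leading sentences of a news article serve as a pseudo-summary and the remainder serves as the model input. The function names, the naive regex sentence splitter, and the default of three lead sentences are illustrative assumptions, not details taken from the abstract.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def lead_bias_pair(article: str, num_lead: int = 3) -> Tuple[str, str]:
    """Build one (source, target) pretraining pair from an unlabeled article.

    The first `num_lead` sentences become the pseudo-summary (target) and the
    remaining sentences become the input (source). `num_lead` is a hypothetical
    parameter chosen for illustration.
    """
    sentences = split_sentences(article)
    if len(sentences) <= num_lead:
        raise ValueError("article is too short to split into source and target")
    target = " ".join(sentences[:num_lead])
    source = " ".join(sentences[num_lead:])
    return source, target


if __name__ == "__main__":
    demo = "A storm hit the coast. Thousands lost power. Crews are working. Repairs may take days."
    src, tgt = lead_bias_pair(demo, num_lead=2)
    print("source:", src)
    print("target:", tgt)
```

Pairs built this way need no human-written summaries, which is what makes large unlabeled news corpora usable for pretraining.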


research
11/02/2020

Liputan6: A Large-scale Indonesian Dataset for Text Summarization

In this paper, we introduce a large-scale Indonesian summarization datas...
research
10/18/2018

Unsupervised Neural Text Simplification

The paper presents a first attempt towards unsupervised neural text simp...
research
06/18/2021

Subjective Bias in Abstractive Summarization

Due to the subjectivity of the summarization, it is a good practice to h...
research
12/25/2019

Make Lead Bias in Your Favor: A Simple and Effective Method for News Summarization

Lead bias is a common phenomenon in news summarization, where early part...
research
06/08/2019

Sentence Centrality Revisited for Unsupervised Summarization

Single document summarization has enjoyed renewed interests in recent ye...
research
02/11/2020

Two Huge Title and Keyword Generation Corpora of Research Articles

Recent developments in sequence-to-sequence learning with neural network...
research
09/07/2018

Unsupervised Sentence Compression using Denoising Auto-Encoders

In sentence compression, the task of shortening sentences while retainin...
