Transforming Wikipedia into Augmented Data for Query-Focused Summarization

by Haichao Zhu, et al.

The manual construction of a query-focused summarization corpus is costly and time-consuming. The limited size of existing datasets makes it challenging to train data-driven summarization models. In this paper, we use Wikipedia to automatically collect a large query-focused summarization dataset, named WIKIREF, of more than 280,000 examples, which can serve as a means of data augmentation. Moreover, we develop a query-focused summarization model based on BERT to extract summaries from the documents. Experimental results on three DUC benchmarks show that the model pre-trained on WIKIREF already achieves reasonable performance. After fine-tuning on the specific datasets, the model with data augmentation outperforms the state of the art on the benchmarks.




