Transforming Wikipedia into Augmented Data for Query-Focused Summarization

by Haichao Zhu, et al.

The manual construction of a query-focused summarization corpus is costly and time-consuming. The limited size of existing datasets makes it challenging to train data-driven summarization models. In this paper, we use Wikipedia to automatically collect a large query-focused summarization dataset (named WIKIREF) of more than 280,000 examples, which can serve as a means of data augmentation. Moreover, we develop a query-focused summarization model based on BERT to extract summaries from the documents. Experimental results on three DUC benchmarks show that the model pre-trained on WIKIREF already achieves reasonable performance. After fine-tuning on the specific datasets, the model with data augmentation outperforms the state of the art on the benchmarks.


OASum: Large-Scale Open Domain Aspect-based Summarization

Aspect or query-based summarization has recently caught more attention, ...

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

The progress in Query-focused Multi-Document Summarization (QMDS) has be...

ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization

Neural models trained with large amount of parallel data have achieved i...

Few-shot Query-Focused Summarization with Prefix-Merging

Query-focused summarization has been considered as an important extensio...

Aspect-Oriented Summarization through Query-Focused Extraction

A reader interested in a particular topic might be interested in summari...

Text Summarization with Latent Queries

The availability of large-scale datasets has driven the development of n...

Exploring Neural Models for Query-Focused Summarization

Query-focused summarization (QFS) aims to produce summaries that answer ...
