Transforming Wikipedia into Augmented Data for Query-Focused Summarization

11/08/2019
by Haichao Zhu, et al.

The manual construction of a query-focused summarization corpus is costly and time-consuming, and the limited size of existing datasets makes training data-driven summarization models challenging. In this paper, we use Wikipedia to automatically collect a large query-focused summarization dataset (named WIKIREF) of more than 280,000 examples, which can serve as a means of data augmentation. Moreover, we develop a BERT-based query-focused summarization model that extracts summaries from the documents. Experimental results on three DUC benchmarks show that the model pre-trained on WIKIREF already achieves reasonable performance. After fine-tuning on the specific datasets, the model with data augmentation outperforms the state of the art on these benchmarks.
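To make the task concrete, the sketch below shows the general shape of query-focused extractive summarization: score every document sentence against the query, keep the top k, and emit them in document order. It is an illustration under loud assumptions, not the authors' model. The BERT-based relevance scorer described above is swapped out for a plain lexical-overlap score, and the names `tokenize`, `overlap_score`, and `extract_summary` are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, sentence: str) -> float:
    # ASSUMPTION: toy stand-in for a learned (e.g. BERT-based) relevance score.
    # Returns the fraction of distinct query terms that also occur in the sentence.
    q = set(tokenize(query))
    if not q:
        return 0.0
    s = set(tokenize(sentence))
    return len(q & s) / len(q)


def extract_summary(query: str, sentences: List[str], k: int = 1) -> List[str]:
    # Rank sentence indices by relevance to the query (stable sort, so ties
    # keep their original order), take the top k, then restore document order.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: overlap_score(query, sentences[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = [
        "The city hosts a film festival every May.",
        "Its river port handles grain exports.",
        "Grain prices fell after the port closed.",
    ]
    print(extract_summary("grain port", doc, k=2))
```

In a real system the scoring function is where the learning happens; the select-and-reorder step around it stays essentially the same regardless of which scorer is plugged in.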



03/02/2021

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

The progress in Query-focused Multi-Document Summarization (QMDS) has be...
01/14/2022

ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization

Neural models trained with large amount of parallel data have achieved i...
10/15/2021

Aspect-Oriented Summarization through Query-Focused Extraction

A reader interested in a particular topic might be interested in summari...
12/14/2021

Exploring Neural Models for Query-Focused Summarization

Query-focused summarization (QFS) aims to produce summaries that answer ...
05/31/2021

Text Summarization with Latent Queries

The availability of large-scale datasets has driven the development of n...
04/01/2016

AttSum: Joint Learning of Focusing and Summarization with Neural Attention

Query relevance ranking and sentence saliency ranking are the two main t...
09/17/2021

Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization

This paper explores three simple data manipulation techniques (synthesis...