Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

03/02/2021
by Ramakanth Pasunuru, et al.

Progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale, high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties: QMDSCNN has real summaries but simulated queries, while QMDSIR has real queries but simulated summaries. To cover both the real-summary and real-query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with multiple documents. Empirical results demonstrate that our data augmentation and encoding methods outperform baseline models on automatic metrics, as well as in human evaluations along multiple attributes.

