DeepAI
Log In Sign Up

AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization

10/23/2020
by   Sayali Kulkarni, et al.
0

Summarization is the task of compressing source document(s) into coherent and succinct passages. This is a valuable tool to present users with concise and accurate sketch of the top ranked documents related to their queries. Query-based multi-document summarization (qMDS) addresses this pervasive need, but the research is severely limited due to lack of training and evaluation datasets as existing single-document and multi-document summarization datasets are inadequate in form and scale. We propose a scalable approach called AQuaMuSe to automatically mine qMDS examples from question answering datasets and large document corpora. Our approach is unique in the sense that it can general a dual dataset – for extractive and abstractive summaries both. We publicly release a specific instance of an AQuaMuSe dataset with 5,519 query-based summaries, each associated with an average of 6 input documents selected from an index of 355M documents from Common Crawl. Extensive evaluation of the dataset along with baseline summarization model experiments are provided.

READ FULL TEXT
02/17/2020

GameWikiSum: a Novel Large Multi-Document Summarization Dataset

Today's research progress in the field of multi-document summarization i...
10/27/2020

QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications

Query-based document summarization aims to extract or generate a summary...
11/28/2016

Improving Multi-Document Summarization via Text Classification

Developed so far, multi-document summarization has reached its bottlenec...
04/14/2017

Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

Concept maps can be used to concisely represent important information an...
03/02/2021

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

The progress in Query-focused Multi-Document Summarization (QMDS) has be...
05/08/2021

D2S: Document-to-Slide Generation Via Query-Based Text Summarization

Presentations are critical for communication in all areas of our lives, ...