AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization

10/23/2020
by Sayali Kulkarni, et al.

Summarization is the task of compressing source document(s) into coherent and succinct passages. This is a valuable tool to present users with a concise and accurate sketch of the top-ranked documents related to their queries. Query-based multi-document summarization (qMDS) addresses this pervasive need, but research is severely limited by the lack of training and evaluation datasets, as existing single-document and multi-document summarization datasets are inadequate in form and scale. We propose a scalable approach called AQuaMuSe to automatically mine qMDS examples from question answering datasets and large document corpora. Our approach is unique in the sense that it can generate a dual dataset, for both extractive and abstractive summaries. We publicly release a specific instance of an AQuaMuSe dataset with 5,519 query-based summaries, each associated with an average of 6 input documents selected from an index of 355M documents from Common Crawl. Extensive evaluation of the dataset, along with baseline summarization model experiments, is provided.
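To make the idea of mining a qMDS example from a question answering pair and a document corpus more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the AQuaMuSe pipeline: the abstract does not say how input documents are matched to a summary, so the bag-of-words cosine similarity, the `min_score` threshold, and the function name `mine_qmds_example` are all hypothetical. The default `k=6` only echoes the reported average of 6 input documents per summary.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mine_qmds_example(query, answer, corpus, k=6, min_score=0.1):
    """Build one query-based multi-document summarization example.

    ASSUMPTION: the QA answer is treated as the target summary, and input
    documents are the corpus entries most similar to that summary. The
    similarity measure and threshold are placeholders, not the paper's method.
    `corpus` maps a document id to its text.
    """
    target = Counter(tokenize(answer))
    scored = [
        (cosine(target, Counter(tokenize(text))), doc_id)
        for doc_id, text in corpus.items()
    ]
    # Keep only documents that clear the (hypothetical) relevance threshold,
    # ordered by descending score with the document id as a tie-breaker.
    kept = sorted((p for p in scored if p[0] >= min_score), key=lambda p: (-p[0], p[1]))
    return {
        "query": query,
        "summary": answer,
        "documents": [doc_id for _, doc_id in kept[:k]],
    }
```

Calling `mine_qmds_example(query, answer, corpus)` with a dictionary of document texts returns the query, the answer used as the summary, and the ids of up to `k` matching documents. A realistic system would likely replace the token-overlap score with a stronger retrieval method, since an index of 355M documents cannot be scanned linearly as this sketch does.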

