HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles

10/07/2021
by Odellia Boni, et al.

We present HowSumm, a novel large-scale dataset for the task of query-focused multi-document summarization (qMDS), which targets the use case of generating actionable instructions from a set of sources. This use case differs from those covered by existing multi-document summarization (MDS) datasets and is applicable to educational and industrial scenarios. We employed automatic methods, and leveraged statistics from existing human-crafted qMDS datasets, to create HowSumm from wikiHow website articles and the sources they cite. We describe the creation of the dataset and discuss the unique features that distinguish it from other summarization corpora. Automatic and human evaluations of both extractive and abstractive summarization models on the dataset reveal that there is room for improvement.

research
10/27/2020

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Multi-document summarization is a challenging task for which there exist...
research
05/20/2020

A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal

Multi-document summarization (MDS) aims to compress the content in large...
research
06/04/2021

AgreeSum: Agreement-Oriented Multi-Document Summarization

We aim to renew interest in a particular multi-document summarization (M...
research
09/01/2020

SuperPAL: Supervised Proposition ALignment for Multi-Document Summarization and Derivative Sub-Tasks

Multi-document summarization (MDS) is a challenging task, often decompos...
research
10/27/2020

QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications

Query-based document summarization aims to extract or generate a summary...
research
03/02/2021

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

The progress in Query-focused Multi-Document Summarization (QMDS) has be...
research
12/16/2021

A Proposition-Level Clustering Approach for Multi-Document Summarization

Text clustering methods were traditionally incorporated into multi-docum...
