Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

10/27/2020
by   Yao Lu, et al.

Multi-document summarization is a challenging task for which few large-scale datasets exist. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, obtained with several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.

10/07/2021

HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles

We present HowSumm, a novel large-scale dataset for the task of query-fo...
06/04/2019

Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model

Automatic generation of summaries from multiple news articles is a valua...
07/19/2019

What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks

We introduce 'extreme summarization', a new single-document summarizatio...
12/13/2021

Keyphrase Generation Beyond the Boundaries of Title and Abstract

Keyphrase generation aims at generating phrases (keyphrases) that best d...
05/20/2020

A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal

Multi-document summarization (MDS) aims to compress the content in large...
08/27/2018

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

We introduce extreme summarization, a new single-document summarization ...
04/18/2021

Generating Related Work

Communicating new research ideas involves highlighting similarities and ...