BookSum: A Collection of Datasets for Long-form Narrative Summarization

05/18/2021
by Wojciech Kryściński et al.

The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies and often contain strong layout and stylistic biases. While relevant, such datasets offer limited challenges for future generations of text summarization systems. We address these issues by introducing BookSum, a collection of datasets for long-form narrative summarization. Our dataset covers source documents from the literature domain, such as novels, plays, and stories, and includes highly abstractive, human-written summaries at three levels of granularity of increasing difficulty: paragraph, chapter, and book level. The domain and structure of our dataset pose a unique set of challenges for summarization systems, including processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures. To facilitate future work, we trained and evaluated multiple extractive and abstractive summarization models as baselines for our dataset.

