On Generating Extended Summaries of Long Documents

by   Sajad Sotudeh, et al.

Prior work in document summarization has mainly focused on generating short summaries of a document. While this type of summary helps get a high-level view of a given document, it is desirable in some cases to know more detailed information about its salient points that can't fit in a short summary. This is typically the case for longer documents such as a research paper, legal document, or a book. In this paper, we present a new method for generating extended summaries of long papers. Our method exploits hierarchical structure of the documents and incorporates it into an extractive summarization model through a multi-task learning approach. We then present our results on three long summarization datasets, arXiv-Long, PubMed-Long, and Longsumm. Our method outperforms or matches the performance of strong baselines. Furthermore, we perform a comprehensive analysis over the generated results, shedding insights on future research for long-form summary generation task. Our analysis shows that our multi-tasking approach can adjust extraction probability distribution to the favor of summary-worthy sentences across diverse sections. Our datasets, and codes are publicly available at https://github.com/Georgetown-IR-Lab/ExtendedSumm



There are no comments yet.


page 7


On Extractive and Abstractive Neural Document Summarization with Transformer Language Models

We present a method to produce abstractive summaries of long documents t...

EmailSum: Abstractive Email Thread Summarization

Recent years have brought about an interest in the challenging task of s...

Long Document Summarization in a Low Resource Setting using Pretrained Language Models

Abstractive summarization is the task of compressing a long document int...

A Divide-and-Conquer Approach to the Summarization of Academic Articles

We present a novel divide-and-conquer method for the summarization of lo...

NetReAct: Interactive Learning for Network Summarization

Generating useful network summaries is a challenging and important probl...

HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization

Document structure is critical for efficient information consumption. Ho...

SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

In the summarization domain, a key requirement for summaries is to be fa...

Code Repositories


On Generating Extended Summaries of Long Documents

view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.