On Generating Extended Summaries of Long Documents

12/28/2020
by Sajad Sotudeh, et al.

Prior work in document summarization has mainly focused on generating short summaries of a document. While this type of summary gives a high-level view of a document, in some cases it is desirable to know more about its salient points than a short summary can hold. This is typically the case for longer documents such as research papers, legal documents, or books. In this paper, we present a new method for generating extended summaries of long papers. Our method exploits the hierarchical structure of the documents and incorporates it into an extractive summarization model through a multi-task learning approach. We then present our results on three long summarization datasets: arXiv-Long, PubMed-Long, and Longsumm. Our method outperforms or matches the performance of strong baselines. Furthermore, we perform a comprehensive analysis of the generated results, offering insights for future research on long-form summary generation. Our analysis shows that our multi-task approach can shift the extraction probability distribution in favor of summary-worthy sentences across diverse sections. Our datasets and code are publicly available at https://github.com/Georgetown-IR-Lab/ExtendedSumm
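The abstract does not spell out the form of the multi-task objective. As a minimal sketch only, assuming the auxiliary task is predicting which section each sentence belongs to and that the two losses are combined with a weight `alpha` (both are assumptions made for illustration, as are all function and parameter names below), a joint loss over per-sentence predictions might look like this:

```python
import math

EPS = 1e-12  # guards against log(0)


def binary_cross_entropy(prob, label):
    """Loss for one sentence's 'include in summary' probability."""
    prob = min(max(prob, EPS), 1.0 - EPS)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def cross_entropy(probs, target_index):
    """Loss for one sentence's predicted section distribution."""
    return -math.log(max(probs[target_index], EPS))


def multitask_loss(select_probs, select_labels,
                   section_probs, section_labels, alpha=0.5):
    """Hypothetical joint objective: selection loss + alpha * section loss.

    select_probs   : per-sentence probability of being summary-worthy
    select_labels  : 0/1 gold extraction labels
    section_probs  : per-sentence distribution over section types
    section_labels : gold section index for each sentence
    alpha          : assumed weight of the auxiliary (section) task
    """
    n = len(select_probs)
    selection = sum(binary_cross_entropy(p, y)
                    for p, y in zip(select_probs, select_labels)) / n
    section = sum(cross_entropy(d, s)
                  for d, s in zip(section_probs, section_labels)) / n
    return selection + alpha * section, selection, section


# Toy example: three sentences, three section types.
total, sel, sec = multitask_loss(
    select_probs=[0.9, 0.2, 0.7],
    select_labels=[1, 0, 1],
    section_probs=[[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    section_labels=[0, 1, 2],
)
print(total, sel, sec)
```

Setting `alpha=0` reduces this to a plain extractive objective, which makes it easy to compare against a single-task baseline in this toy setting.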


