TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation

06/02/2022
by Sajad Sotudeh et al.

Many scientific papers, such as those in the arXiv and PubMed collections, have abstracts ranging from 50 to 1000 words, with an average length of approximately 200 words; longer abstracts typically convey more information about the source paper. Until recently, scientific summarization research had typically focused on generating short, abstract-like summaries, following the existing datasets used for the task. In domains where the source text is long-form, such as scientific documents, a summary of this kind cannot go beyond a general, coarse overview to convey the salient information of the source document. Recent interest in tackling this problem has motivated the curation of scientific datasets, arXiv-Long and PubMed-Long, containing human-written summaries of 400-600 words, thereby providing a venue for research on generating long (extended) summaries. Extended summaries facilitate a faster read while providing details beyond coarse information. In this paper, we propose TSTR, an extractive summarizer that uses the introductory information of documents as pointers to their salient information. Evaluations on two existing large-scale extended summarization datasets show statistically significant improvements in Rouge and average Rouge (F1) scores over strong baselines and the state of the art, except in one case. Comprehensive human evaluations favor our generated extended summaries in terms of cohesion and completeness.
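The abstract only states that TSTR treats a paper's introductory information as pointers to its salient content; it does not give the scoring function. As a rough illustration of that general idea, and explicitly not TSTR's actual model, the sketch below ranks body sentences by plain lexical overlap with the introduction and keeps the top k in document order. The names `STOPWORDS`, `tokenize`, `overlap_score`, and `intro_guided_extract`, the stopword list, and the overlap heuristic are all hypothetical choices made for this example.

```python
import re

# Hypothetical, tiny stopword list so function words do not inflate overlap.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "we", "with", "was", "at", "is"})


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens, with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def overlap_score(sentence: str, intro_vocab: set[str]) -> float:
    """Fraction of a sentence's content tokens that also appear in the introduction."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(t in intro_vocab for t in tokens) / len(tokens)


def intro_guided_extract(intro: list[str], body: list[str], k: int) -> list[str]:
    """Hypothetical intro-guided extractor: pick the k body sentences most
    lexically similar to the introduction, returned in document order."""
    intro_vocab = {t for sentence in intro for t in tokenize(sentence)}
    # sorted() is stable (also with reverse=True), so ties keep document order.
    ranked = sorted(
        range(len(body)),
        key=lambda i: overlap_score(body[i], intro_vocab),
        reverse=True,
    )
    # Re-sort the chosen indices so the extract reads in the paper's order.
    return [body[i] for i in sorted(ranked[:max(k, 0)])]


if __name__ == "__main__":
    intro = ["We study extended summarization of scientific papers."]
    body = [
        "The weather was pleasant today.",
        "Extended summaries of scientific papers provide details.",
        "We evaluate summarization with Rouge.",
        "Lunch was served at noon.",
    ]
    print(intro_guided_extract(intro, body, k=2))
    # ['Extended summaries of scientific papers provide details.', 'We evaluate summarization with Rouge.']
```

Ranking first and then restoring document order keeps the extended summary readable as a sequence; a practical system would likely replace the lexical `overlap_score` heuristic with a learned sentence scorer.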

12/28/2020

On Generating Extended Summaries of Long Documents

Prior work in document summarization has mainly focused on generating sh...
03/29/2022

LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents

Identifying keyphrases (KPs) from text documents is a fundamental task i...
01/30/2023

LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization

While human evaluation remains best practice for accurately judging the ...
10/04/2021

Leveraging Information Bottleneck for Scientific Document Summarization

This paper presents an unsupervised extractive approach to summarize sci...
02/09/2023

Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

Writing a survey paper on one research topic usually needs to cover the ...
04/14/2017

Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

Concept maps can be used to concisely represent important information an...
01/10/2021

Summaformers @ LaySumm 20, LongSumm 20

Automatic text summarization has been widely studied as an important tas...
