An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics

07/03/2022
by Huan Yee Koh, et al.

Long documents such as academic articles and business reports are the standard format for detailing important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short, concise texts capturing the most important information would therefore significantly aid the reader's comprehension. Recently, with the advent of neural architectures, significant research effort has gone into advancing automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study of the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.

