How Far are We from Robust Long Abstractive Summarization?

10/30/2022
by Huan Yee Koh, et al.

Abstractive summarization has made tremendous progress in recent years. In this work, we perform fine-grained human annotations to evaluate long-document abstractive summarization systems (i.e., models and metrics) with the aim of deploying them to generate reliable summaries. For long-document abstractive models, we show that the constant pursuit of state-of-the-art ROUGE results can lead models to generate more relevant summaries, but not more factual ones. For long-document evaluation metrics, our human evaluation results show that ROUGE remains the best at evaluating the relevancy of a summary. They also reveal important limitations of factuality metrics in detecting different types of factual errors, as well as the reasons behind the effectiveness of BARTScore. We then suggest promising directions for developing factual consistency metrics. Finally, we release our annotated long-document dataset in the hope that it can contribute to the development of metrics across a broader range of summarization settings.
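
The relevance-versus-factuality gap the abstract describes is easy to see at a small scale. Below is a minimal sketch, assuming the pip-installable rouge-score package; the reference and summary strings are hypothetical illustrations, not from the paper's dataset. Note how ROUGE rewards n-gram overlap even when a summary inverts the source's claim.

```python
# Minimal sketch: ROUGE rewards lexical overlap, not factual accuracy.
# Assumes the `rouge-score` package (pip install rouge-score); the
# reference/summary strings below are hypothetical illustrations.
from rouge_score import rouge_scorer

reference = "The company reported a 10% rise in quarterly revenue."
faithful = "Quarterly revenue at the company rose by 10%."
unfaithful = "The company reported a 10% fall in quarterly revenue."  # factual error

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

for name, summary in [("faithful", faithful), ("unfaithful", unfaithful)]:
    scores = scorer.score(reference, summary)
    print(name, {k: round(v.fmeasure, 3) for k, v in scores.items()})

# The unfaithful summary scores *higher* here despite inverting the claim,
# illustrating why ROUGE tracks relevance but not factual consistency.
```

BARTScore, by contrast, scores a summary by the likelihood a pretrained BART model assigns to its tokens given the source. Here is a hedged sketch of that core idea using Hugging Face's transformers library (not the authors' exact implementation):

```python
# BARTScore-style scoring sketch: average log-likelihood of summary
# tokens conditioned on the source document under BART.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn").eval()

def bart_loglik(source: str, summary: str) -> float:
    """Return the mean log-likelihood of summary tokens given the source."""
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        loss = model(**inputs, labels=labels).loss  # mean token cross-entropy
    return -loss.item()  # higher (less negative) = more plausible summary
```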

Related research

07/30/2021 · EmailSum: Abstractive Email Thread Summarization
05/19/2022 · SNaC: Coherence Error Detection for Narrative Summarization
03/23/2021 · SAFEval: Summarization Asks for Fact-based Evaluation
05/08/2021 · D2S: Document-to-Slide Generation Via Query-Based Text Summarization
05/02/2020 · On Faithfulness and Factuality in Abstractive Summarization
12/20/2022 · BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics
01/30/2023 · LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
