How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation

09/14/2022
by Julius Steen, et al.

Automatically evaluating the coherence of summaries is of great significance both to enable cost-efficient summarizer evaluation and as a tool for improving coherence by selecting high-scoring candidate summaries. While many different approaches have been suggested to model summary coherence, they are often evaluated using disparate datasets and metrics. This makes it difficult to understand their relative performance and identify ways forward towards better summary coherence modelling. In this work, we conduct a large-scale investigation of various methods for summary coherence modelling on an even playing field. Additionally, we introduce two novel analysis measures, intra-system correlation and bias matrices, that help identify biases in coherence measures and provide robustness against system-level confounders. While none of the currently available automatic coherence measures are able to assign reliable coherence scores to system summaries across all evaluation metrics, large-scale language models fine-tuned on self-supervised tasks show promising results, as long as fine-tuning takes into account that they need to generalize across different summary lengths.
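To illustrate why a within-system analysis can guard against system-level confounders, here is a minimal sketch of an intra-system correlation. It rests on an assumption, since the abstract does not spell out the definition: the measure is taken to be Kendall's tau between automatic and human coherence scores, computed separately over each system's summaries and then averaged across systems. The function names, the record layout, and the toy scores are all hypothetical and are not taken from the paper or its toolbox.

```python
from collections import defaultdict
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b of two equal-length score lists (None if undefined)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # pair tied in both rankings carries no information
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else None


def intra_system_correlation(records):
    """Average per-system tau over (system, measure_score, human_score) records.

    Assumed definition: correlate within each system, then take the mean,
    skipping systems where tau is undefined.
    """
    by_system = defaultdict(lambda: ([], []))
    for system, measure_score, human_score in records:
        by_system[system][0].append(measure_score)
        by_system[system][1].append(human_score)
    taus = [kendall_tau_b(m, h) for m, h in by_system.values()]
    taus = [t for t in taus if t is not None]
    return sum(taus) / len(taus) if taus else None


# Toy data (invented): the measure prefers system B overall, but within each
# system it ranks the summaries in exactly the wrong order.
records = [
    ("A", 0.3, 1), ("A", 0.2, 2), ("A", 0.1, 3),
    ("B", 0.9, 4), ("B", 0.8, 5), ("B", 0.7, 6),
]
pooled = kendall_tau_b([r[1] for r in records], [r[2] for r in records])
intra = intra_system_correlation(records)
```

On this toy data the pooled correlation is mildly positive (0.2) because the measure separates the two systems correctly, while the intra-system value is -1.0, which exposes that the measure cannot rank summaries from the same system.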

research
05/19/2022

SNaC: Coherence Error Detection for Narrative Summarization

Progress in summarizing long texts is inhibited by the lack of appropria...
research
07/11/2022

SummScore: A Comprehensive Evaluation Metric for Summary Quality Based on Cross-Encoder

Text summarization models are often trained to produce summaries that me...
research
05/24/2023

Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks

Research on automated text summarization relies heavily on human and aut...
research
05/24/2023

Neural Summarization of Electronic Health Records

Hospital discharge documentation is among the most essential, yet time-c...
research
12/29/2020

Is human scoring the best criteria for summary evaluation?

Normally, summary quality measures are compared with quality scores prod...
research
07/24/2017

Thread Reconstruction in Conversational Data using Neural Coherence Models

Discussion forums are an important source of information. They are often...
research
08/29/2019

Nearly-Linear uncertainty measures

Several easy to understand and computationally tractable imprecise proba...
