Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics

11/08/2020
by Manik Bhandari, et al.

In text summarization, evaluating the efficacy of automatic metrics without human judgments has recently become popular. One exemplar work concludes that automatic metrics strongly disagree when ranking high-scoring summaries. In this paper, we revisit their experiments and find that their observations stem from the fact that metrics disagree when ranking summaries from any narrow scoring range, not only the high-scoring one. We hypothesize that this may be because summaries within a narrow scoring range are similar to each other and are thus difficult to rank. Apart from the width of the scoring range of summaries, we analyze three other properties that impact inter-metric agreement: Ease of Summarization, Abstractiveness, and Coverage. To encourage reproducible research, we make all our analysis code and data publicly available.
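
As an illustration of the idea of measuring inter-metric agreement within a scoring range, the following is a minimal sketch and not the authors' released analysis code. It assumes agreement is measured with Kendall's tau (here the simple tau-a variant) and that summaries are binned by the score of one metric. The function names (`kendall_tau`, `agreement_in_range`, `synthetic_scores`) and the synthetic data generator are hypothetical and exist only for this example.

```python
import random
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    if not pairs:
        raise ValueError("need at least two items to rank")
    score = 0
    for i, j in pairs:
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
        score += (prod > 0) - (prod < 0)
    return score / len(pairs)


def agreement_in_range(metric_a, metric_b, low, high):
    """Tau between two metrics over summaries whose metric_a score is in [low, high]."""
    keep = [i for i, a in enumerate(metric_a) if low <= a <= high]
    return kendall_tau([metric_a[i] for i in keep], [metric_b[i] for i in keep])


def synthetic_scores(n=400, noise=0.1, seed=0):
    """Hypothetical data: two noisy metrics observing the same latent quality."""
    rng = random.Random(seed)
    quality = [rng.random() for _ in range(n)]
    metric_a = [q + rng.gauss(0, noise) for q in quality]
    metric_b = [q + rng.gauss(0, noise) for q in quality]
    return metric_a, metric_b


if __name__ == "__main__":
    a, b = synthetic_scores()
    full = agreement_in_range(a, b, float("-inf"), float("inf"))
    print(f"full range: tau = {full:.3f}")
    # Compare agreement inside low, middle, and high bins of equal width.
    for low in (0.0, 0.4, 0.8):
        tau = agreement_in_range(a, b, low, low + 0.2)
        print(f"[{low:.1f}, {low + 0.2:.1f}]: tau = {tau:.3f}")
```

Running the script prints the agreement over the full range next to the agreement inside three equally wide bins, so the effect of narrowing the range can be inspected at the low, middle, and high ends alike. The synthetic scores are made up for illustration and say nothing about real summarization metrics.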

