Is human scoring the best criteria for summary evaluation?

12/29/2020
by Oleg Vasilyev, et al.

Normally, summary quality measures are compared against quality scores produced by human annotators: a higher correlation with human scores is taken as a fair indicator of a better measure. We discuss observations that cast doubt on this view and attempt to show that an alternative indicator is possible. Given a family of measures, we explore a criterion for selecting the best measure that does not rely on correlations with human scores. Our observations for the BLANC family of measures suggest that the criterion is universal across very different styles of summaries.
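For context, the conventional selection procedure that the abstract questions can be sketched in a few lines: each candidate measure scores a set of summaries, and the measure whose scores correlate best with human annotations is declared the winner. The sketch below illustrates this with Spearman rank correlation; the scores and measure names are hypothetical placeholders, not data from the paper.

```python
# Minimal sketch of the conventional "correlate with humans" selection
# procedure. All scores and measure names below are hypothetical.
from scipy.stats import spearmanr

# Hypothetical per-summary scores: human annotators rate each summary,
# and each candidate measure assigns its own score to the same summaries.
human_scores = [3.0, 4.5, 2.0, 5.0, 3.5]
measure_scores = {
    "measure_a": [0.12, 0.30, 0.05, 0.41, 0.22],
    "measure_b": [0.40, 0.10, 0.33, 0.25, 0.18],
}

# Under the standard view, the "best" measure is simply the one whose
# scores correlate most strongly with the human annotations.
for name, scores in measure_scores.items():
    rho, _ = spearmanr(scores, human_scores)
    print(f"{name}: Spearman rho = {rho:.3f}")
```

The paper's point is that this ranking inherits whatever noise and bias the human annotations contain, which motivates looking for a selection criterion, within a family of measures such as BLANC, that does not depend on human scores at all.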


Related research

04/12/2021
Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings
We propose a new reference-free summary quality evaluation measure, with...

02/23/2020
Fill in the BLANC: Human-free quality estimation of document summaries
We present BLANC, a new approach to the automatic estimation of document...

10/13/2020
Sensitivity of BLANC to human-scored qualities of text summaries
We explore the sensitivity of a document summary quality estimator, BLAN...

09/14/2022
How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation
Automatically evaluating the coherence of summaries is of great signific...

03/27/2019
Rethinking the Evaluation of Video Summaries
Video summarization is a technique to create a short skim of the origina...

09/30/2022
Equity Scores for Public Transit Lines from Open-Data and Accessibility Measures
Current transit suffers from an evident inequity: the level of service o...

04/11/2022
Human vs Objective Evaluation of Colourisation Performance
Automatic colourisation of grey-scale images is the process of creating ...
