Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization

08/04/2023
by Mousumi Akter, et al.
Although widely used for evaluating extractive summarization, the ROUGE metric has long been criticized for its lack of semantic awareness and its disregard for the ranking quality of the summarizer. Previous research has addressed these issues by proposing Sem-nCG, a gain-based automated metric that is both rank-aware and semantic-aware. However, Sem-nCG does not consider the amount of redundancy present in a model-generated summary and currently does not support evaluation with multiple reference summaries. Addressing both limitations simultaneously is not trivial. Therefore, in this paper, we propose a redundancy-aware Sem-nCG metric and demonstrate how this new metric can be used to evaluate model summaries against multiple references. We also explore different ways of incorporating redundancy into the original metric through extensive experiments. Experimental results demonstrate that the new redundancy-aware metric exhibits a higher correlation with human judgments than the original Sem-nCG metric in both single-reference and multiple-reference scenarios.


research
10/05/2020

Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning

Evaluation of a document summarization system has been a critical factor...
research
06/26/2021

A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy

In recent years, reference-based and supervised summarization evaluation...
research
07/25/2019

Summary Refinement through Denoising

We propose a simple method for post-processing the outputs of a text sum...
research
11/22/2019

Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering

Community question answering (CQA) gains increasing popularity in both a...
research
01/14/2022

Multi-Narrative Semantic Overlap Task: Evaluation and Benchmark

In this paper, we introduce an important yet relatively unexplored NLP t...
research
09/20/2019

Towards Neural Language Evaluators

We review three limitations of BLEU and ROUGE -- the most popular metric...
research
05/26/2023

UMSE: Unified Multi-scenario Summarization Evaluation

Summarization quality evaluation is a non-trivial task in text summariza...
