Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary

10/01/2020
by Daniel Deutsch, et al.

Recently, there has been growing interest in using question-answering (QA) models to evaluate the content quality of summaries. While previous work has shown initial promising results in this direction, its experimentation has been limited, leading to a poor understanding of the utility of QA in evaluating summary content. In this work, we perform an extensive evaluation of a QA-based metric for summary content quality, measuring its performance with today's state-of-the-art models as well as estimating its potential upper-bound performance. We analyze a proposed metric, QAEval, which is more widely applicable than previous work. We show that QAEval already achieves state-of-the-art performance at scoring summarization systems, beating all other metrics including the gold-standard Pyramid Method, while its performance on individual summaries is at best competitive with other automatic metrics. Through a careful analysis of each component of QAEval, we identify the performance bottlenecks and estimate that, given human-level performance on its components, QAEval's summary-level results could approach those of the Pyramid Method.
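The abstract does not spell out the metric's pipeline, so the sketch below illustrates how a QA-based content metric of this general kind can be assembled: questions derived from the reference summary are answered against the candidate summary, and the answer overlap is averaged into a score. This is a simplified illustration, not the paper's implementation: the question-answer pairs are written by hand here (QAEval generates them automatically from the reference), the QA model name and token-F1 scoring are assumptions, and there is no answerability check, which the full metric would need for questions the candidate cannot answer.

```python
# Minimal sketch of a QA-based summary content metric in the spirit of QAEval.
# Assumptions (not from the paper): hand-written QA pairs stand in for learned
# question generation, and token F1 stands in for learned answer verification.
from collections import Counter

from transformers import pipeline  # pip install transformers


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and the gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def qa_content_score(candidate_summary: str, qa_pairs: list[tuple[str, str]]) -> float:
    """Answer reference-derived questions against the candidate summary and
    average the answer F1 scores; higher means more reference content kept."""
    qa_model = pipeline(
        "question-answering", model="distilbert-base-cased-distilled-squad"
    )
    scores = []
    for question, gold_answer in qa_pairs:
        result = qa_model(question=question, context=candidate_summary)
        scores.append(token_f1(result["answer"], gold_answer))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy example: two questions written from a hypothetical reference summary.
    candidate = "The council approved the new park budget of $2 million on Tuesday."
    qa_pairs = [
        ("How much is the park budget?", "$2 million"),
        ("When was the budget approved?", "Tuesday"),
    ]
    print(f"QA content score: {qa_content_score(candidate, qa_pairs):.3f}")
```

Because the QA model here is extractive, it always returns some span from the candidate; the paper's component analysis is precisely about how much better question generation, question answering, and answer verification would each need to be for such a pipeline to close the gap to the Pyramid Method.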


Related research

05/07/2020
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
Neural abstractive summarization models are prone to generate content in...

12/16/2021
QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization
Factual consistency is an essential quality of text summarization models...

11/15/2021
Question-Based Salient Span Selection for More Controllable Text Summarization
In this work, we propose a method for incorporating question-answering (...

05/24/2023
Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks
Research on automated text summarization relies heavily on human and aut...

04/11/2019
Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation
Conducting a manual evaluation is considered an essential part of summar...

11/27/2020
FFCI: A Framework for Interpretable Automatic Evaluation of Summarization
In this paper, we propose FFCI, a framework for automatic summarization ...

10/06/2022
Just ClozE! A Fast and Simple Method for Evaluating the Factual Consistency in Abstractive Summarization
The issue of factual consistency in abstractive summarization has attrac...
