Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics

04/21/2022
by Daniel Deutsch, et al.

Question answering-based summarization evaluation metrics must automatically determine whether the QA model's prediction is correct, a task known as answer verification. In this work, we benchmark the lexical answer verification methods used by current QA-based metrics as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC outperforms the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to improved overall QA-based metric quality: in some scenarios, using a worse verification method, or using none at all, performs comparably to using the best verification method, a result that we attribute to properties of the datasets.
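To make the notion of lexical answer verification concrete, the following is a minimal sketch of two overlap-based checks commonly used in QA evaluation: exact match and token-level F1. The abstract does not specify which lexical methods were benchmarked, so the normalization steps and the function names (`normalize`, `exact_match`, `token_f1`) are illustrative assumptions, not the authors' implementation.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, strip punctuation and English articles, then split on whitespace.

    Assumption: this SQuAD-style normalization is illustrative; the paper's
    exact preprocessing is not given in the abstract.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> bool:
    """Return True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between the two answers."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction such as "Barack Obama" against the reference "Obama" fails exact match but receives partial credit from token F1. Learned comparison methods such as BERTScore and LERC aim to go further by scoring answers that are paraphrases with little or no token overlap.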

research
12/16/2021

QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization

Factual consistency is an essential quality of text summarization models...
research
04/15/2021

Towards Deconfounding the Influence of Subject's Demographic Characteristics in Question Answering

Question Answering (QA) tasks are used as benchmarks of general machine ...
research
10/06/2022

Just ClozE! A Fast and Simple Method for Evaluating the Factual Consistency in Abstractive Summarization

The issue of factual consistency in abstractive summarization has attrac...
research
04/08/2021

Video Question Answering with Phrases via Semantic Roles

Video Question Answering (VidQA) evaluation metrics have been limited to...
research
11/27/2020

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

In this paper, we propose FFCI, a framework for automatic summarization ...
research
06/25/2022

Evaluation of Semantic Answer Similarity Metrics

There are several issues with the existing general machine translation o...
research
07/01/2022

Conditional Generation with a Question-Answering Blueprint

The ability to convey relevant and faithful information is critical for ...