QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization

12/16/2021
by Alexander R. Fabbri, et al.

Factual consistency is an essential quality of text summarization models in practical settings. Existing work on evaluating this dimension can be broadly categorized into two lines of research: entailment-based metrics and question answering (QA)-based metrics. However, differing experimental setups presented in recent work lead to contrasting conclusions as to which paradigm performs best. In this work, we conduct an extensive comparison of entailment-based and QA-based metrics, demonstrating that carefully choosing the components of a QA-based metric is critical to performance. Building on those insights, we propose an optimized metric, which we call QAFactEval, that leads to a 15% average improvement over previous QA-based metrics on the SummaC factual consistency benchmark. Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance on this benchmark. Furthermore, we find that QA-based and entailment-based metrics offer complementary signals, and we combine the two into a single, learned metric for a further performance boost. Through qualitative and quantitative analyses, we point to question generation and answerability classification as two critical components for future work in QA-based metrics.
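To make the notion of "components of a QA-based metric" concrete, the sketch below shows one generic way such a metric can be assembled: select answer spans from the summary, generate a question for each, answer it against the source, and score the overlap. This is not the QAFactEval implementation. Every function name here (`qa_consistency`, `token_f1`, and the `toy_*` stand-ins) is hypothetical, and the toy components are deliberately trivial placeholders for the learned models a real metric would use.

```python
from collections import Counter
from typing import Callable, List, Optional


def token_f1(predicted: str, gold: str) -> float:
    """Token-level F1 between two answer strings (a common answer-overlap score)."""
    p, g = predicted.lower().split(), gold.lower().split()
    if not p or not g:
        return 0.0
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def qa_consistency(
    source: str,
    summary: str,
    select_answers: Callable[[str], List[str]],
    generate_question: Callable[[str, str], str],
    answer_question: Callable[[str, str], Optional[str]],
) -> float:
    """Average answer overlap over questions generated from the summary.

    A question the QA component deems unanswerable from the source
    (signalled here by None) contributes a score of 0.
    """
    scores = []
    for answer in select_answers(summary):
        question = generate_question(summary, answer)
        predicted = answer_question(question, source)
        scores.append(0.0 if predicted is None else token_f1(predicted, answer))
    return sum(scores) / len(scores) if scores else 0.0


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_select_answers(summary: str) -> List[str]:
    # Treat any token containing a digit as an answer candidate.
    return [t.strip(".,") for t in summary.split() if any(c.isdigit() for c in t)]


def toy_generate_question(summary: str, answer: str) -> str:
    # Cloze-style question: mask the first occurrence of the answer.
    return summary.replace(answer, "[MASK]", 1)


def toy_answer_question(question: str, source: str) -> Optional[str]:
    # Find the word left of the mask in the source and return the word after it.
    tokens = question.split()
    idx = next((i for i, t in enumerate(tokens) if "[MASK]" in t), None)
    if idx is None or idx == 0:
        return None
    left = tokens[idx - 1]
    src = source.split()
    for i in range(len(src) - 1):
        if src[i] == left:
            return src[i + 1].strip(".,")
    return None  # "unanswerable" from the source


source = "The company hired 40 engineers in 2021."
toy = (toy_select_answers, toy_generate_question, toy_answer_question)
consistent = qa_consistency(source, "The company hired 40 engineers.", *toy)
inconsistent = qa_consistency(source, "The company hired 90 engineers.", *toy)
unanswerable = qa_consistency(source, "The company fired 90 engineers.", *toy)
```

Because each component is passed in as a function, swapping the answer selector, question generator, QA model, or overlap score changes the metric without touching the scoring loop, which is the kind of component-level comparison the abstract describes.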

research
04/21/2022

Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics

Question answering-based summarization evaluation metrics must automatic...
research
10/06/2022

Just ClozE! A Fast and Simple Method for Evaluating the Factual Consistency in Abstractive Summarization

The issue of factual consistency in abstractive summarization has attrac...
research
10/01/2020

Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary

Recently, there has been growing interest in using question-answering (Q...
research
05/26/2023

AlignScore: Evaluating Factual Consistency with a Unified Alignment Function

Many text generation applications require the generated text to be factu...
research
10/14/2021

MoFE: Mixture of Factual Experts for Controlling Hallucinations in Abstractive Summarization

Neural abstractive summarization models are susceptible to generating fa...
research
05/18/2022

Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner

Large language models have achieved high performance on various question...
research
10/31/2022

RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees

Interpreting the reasoning process from questions to answers poses a cha...
