Q^2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering

04/16/2021
by   Or Honovich, et al.
0

Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the source text they rely on. As a consequence, such models are unreliable, limiting their real-world applicability. Inspired by recent work on evaluating factual consistency in abstractive summarization (Durmus et al., 2020; Wang et al., 2020), we propose an automatic evaluation metric for factual consistency in knowledge-grounded dialogue models using automatic question generation and question answering. Unlike previous works which use naive token-based comparison of answer spans, our metric makes use of co-reference resolution and natural language inference capabilities which greatly improve its performance. To foster proper evaluation, we curate a novel dataset of state-of-the-art dialogue system outputs for the Wizard-of-Wikipedia dataset (Dinan et al., 2019), which we manually annotate for factual consistency. We perform a thorough meta-evaluation of our metric against other metrics using the new dataset and two others, where it greatly outperforms the baselines.

READ FULL TEXT
research
10/20/2018

A Knowledge-Grounded Multimodal Search-Based Conversational Agent

Multimodal search-based dialogue is a challenging new task: It extends v...
research
08/28/2019

DeepCopy: Grounded Response Generation with Hierarchical Pointer Networks

Recent advances in neural sequence-to-sequence models have led to promis...
research
04/11/2022

TRUE: Re-evaluating Factual Consistency Evaluation

Grounded text generation systems often generate text that contains factu...
research
05/30/2017

Generative Models of Visually Grounded Imagination

It is easy for people to imagine what a man with pink hair looks like, e...
research
11/10/2019

Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training

Generative dialogue models currently suffer from a number of problems wh...
research
04/13/2020

Public Self-consciousness for Endowing Dialogue Agents with Consistent Persona

Although consistency has been a long-standing issue in dialogue agents, ...
research
08/23/2023

Evaluation of Faithfulness Using the Longest Supported Subsequence

As increasingly sophisticated language models emerge, their trustworthin...

Please sign up or login with your details

Forgot password? Click here to reset