Evaluating Groundedness in Dialogue Systems: The BEGIN Benchmark

04/30/2021
by Nouha Dziri, et al.

Knowledge-grounded dialogue agents are systems designed to conduct a conversation based on externally provided background information, such as a Wikipedia page. Such dialogue agents, especially those based on neural network language models, often produce responses that sound fluent but are not justified by the background information. Progress towards addressing this problem requires developing automatic evaluation metrics that can quantify the extent to which responses are grounded in background information. To facilitate evaluation of such metrics, we introduce the Benchmark for Evaluation of Grounded INteraction (BEGIN). BEGIN consists of 8113 dialogue turns generated by language-model-based dialogue systems, accompanied by human annotations specifying the relationship between the system's response and the background information. These annotations are based on an extension of the natural language inference paradigm. We use the benchmark to demonstrate the effectiveness of adversarially generated data for improving an evaluation metric based on existing natural language inference datasets.
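
To make the NLI-based evaluation idea concrete, here is a minimal sketch, not the paper's actual metric, of scoring groundedness with an off-the-shelf NLI model: the background evidence is treated as the premise and the system response as the hypothesis, and the entailment probability serves as the groundedness score. The checkpoint name (roberta-large-mnli) and the use of a raw probability rather than a calibrated decision rule are assumptions for illustration; the paper further improves such metrics by fine-tuning on adversarially generated data.

```python
# Minimal sketch: groundedness as NLI entailment probability.
# Assumption: "roberta-large-mnli" is used purely as an example checkpoint;
# any NLI classifier with a (contradiction, neutral, entailment) head works.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

def groundedness_score(evidence: str, response: str) -> float:
    """Probability that the evidence (premise) entails the response
    (hypothesis), used as a proxy for how grounded the response is."""
    inputs = tokenizer(evidence, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1).squeeze()
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    return probs[2].item()

evidence = "The Eiffel Tower is located in Paris and was completed in 1889."
print(groundedness_score(evidence, "The Eiffel Tower was finished in 1889."))  # high
print(groundedness_score(evidence, "The Eiffel Tower is in London."))          # low
```

A plain NLI score like this conflates unsupported-but-plausible responses with contradictions and generic chit-chat, which is precisely the gap BEGIN's extended annotation scheme and its human-labeled turns are meant to expose.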

Related research:

05/01/2020 · Learning an Unreferenced Metric for Online Dialogue Evaluation
Evaluating the quality of a dialogue interaction between two agents is a...

04/22/2022 · FaithDial: A Faithful Benchmark for Information-Seeking Dialogue
The goal of information-seeking dialogue is to respond to seeker queries...

06/04/2021 · Improving Computer Generated Dialog with Auxiliary Loss Functions and Custom Evaluation Metrics
Although people have the ability to engage in vapid dialogue without eff...

07/15/2023 · A Dialogue System for Assessing Activities of Daily Living: Improving Consistency with Grounded Knowledge
In healthcare, the ability to care for oneself is reflected in the "Acti...

05/23/2023 · WikiChat: A Few-Shot LLM-Based Chatbot Grounded with Wikipedia
Despite recent advances in Large Language Models (LLMs), users still can...

05/22/2023 · Evaluating Pragmatic Abilities of Image Captioners on A3DS
Evaluating grounded neural language model performance with respect to pr...

06/16/2022 · DialogueScript: Using Dialogue Agents to Produce a Script
We present a novel approach to generating scripts by using agents with d...
