Evaluating Coherence in Dialogue Systems using Entailment

04/06/2019
by Nouha Dziri, et al.

Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experiments for assessing response quality, which is expensive, time-consuming, and not scalable. Moreover, judges tend to evaluate only a small number of dialogues, so minor differences in evaluation configuration may lead to inconsistent results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment of conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can serve as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and providing an unbiased estimate of response quality.
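The weakness of word-overlap metrics described above can be illustrated with a minimal sketch. The function below computes BLEU-style clipped n-gram precision (without the brevity penalty or geometric averaging of full BLEU); the example strings and function name are illustrative, not taken from the paper. Two equally acceptable replies to "how are you?" receive opposite scores simply because one shares the reference's wording:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped to the reference count (BLEU's modified precision)."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

reference = "i am doing great thanks for asking"
# Two equally valid responses to "how are you?"
close_wording = "i am doing great thanks"
different_wording = "pretty good how about you"

print(ngram_precision(close_wording, reference, 1))      # 1.0
print(ngram_precision(different_wording, reference, 1))  # 0.0
```

The second response is coherent and on-topic, yet its unigram precision is zero; this is the kind of case that motivates semantic metrics, such as the entailment-based approach, which instead asks whether the dialogue history (premise) supports the response (hypothesis).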


