Towards a Metric for Automated Conversational Dialogue System Evaluation and Improvement

09/26/2019
by Jan Deriu, et al.

We present "AutoJudge", an automated evaluation method for conversational dialogue systems. The method first generates dialogues via self-talk, i.e. a dialogue system talking to itself. It then uses human ratings of these dialogues to train an automated judgement model. Our experiments show that AutoJudge correlates well with human ratings and can be used to automatically evaluate dialogue systems, even deployed ones. In a second part, we attempt to apply AutoJudge to improve existing systems. This works well for re-ranking a set of candidate utterances. However, our experiments show that AutoJudge cannot be used as a reward for reinforcement learning, even though the metric can distinguish good from bad dialogues. We discuss potential reasons, but note that this remains an open question for further research.
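The re-ranking use case described above can be sketched in a few lines: a judge model assigns each candidate utterance a quality score given the dialogue so far, and the system picks the top-scoring one. In this minimal sketch, `judge_score` is a hypothetical stand-in (a toy heuristic), not the trained AutoJudge model, whose architecture and API are not specified here.

```python
def judge_score(dialogue_history, candidate):
    """Hypothetical judge: score the quality of appending `candidate`
    to `dialogue_history` (higher is better). A trained judgement
    model would replace this toy heuristic."""
    # Toy heuristic: prefer longer candidates, penalize exact repeats.
    repetition_penalty = 0.5 if candidate in dialogue_history else 1.0
    return len(candidate.split()) * repetition_penalty

def rerank(dialogue_history, candidates):
    """Return candidate utterances sorted best-first by judge score."""
    return sorted(
        candidates,
        key=lambda c: judge_score(dialogue_history, c),
        reverse=True,
    )

history = ["Hi!", "Hello, how can I help?"]
candidates = ["Hi!", "I was wondering about my order status.", "Ok."]
print(rerank(history, candidates)[0])
# prints "I was wondering about my order status."
```

The same scoring function could in principle serve as a reinforcement-learning reward, but as the abstract notes, that use did not work in the authors' experiments.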


Related research

10/25/2019 · Measuring Conversational Fluidity in Automated Dialogue Agents
We present an automated evaluation method to measure fluidity in convers...

09/23/2019 · Towards Best Experiment Design for Evaluating Dialogue System Output
To overcome the limitations of automated metrics (e.g. BLEU, METEOR) for...

06/10/2020 · Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols
As conversational AI-based dialogue management has increasingly become a...

05/16/2023 · Mirages: On Anthropomorphism in Dialogue Systems
Automated dialogue or conversational systems are anthropomorphised by de...

12/02/2018 · A Study on Dialogue Reward Prediction for Open-Ended Conversational Agents
The amount of dialogue history to include in a conversational agent is o...

06/06/2022 · Detecting Interlocutor Confusion in Situated Human-Avatar Dialogue: A Pilot Study
In order to enhance levels of engagement with conversational systems, ou...

11/19/2022 · Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Automation of dialogue system evaluation is a driving force for the effi...
