Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols

06/10/2020
by Sarah E. Finch, et al.

As conversational AI-based dialogue management has become an increasingly popular research topic, the need for a standardized and reliable evaluation procedure has grown more pressing. Researchers currently use a variety of evaluation protocols to assess chat-oriented dialogue management systems. This makes it difficult to conduct fair comparative studies across different approaches and to gain meaningful insight into their value. To advance this research, a more robust evaluation protocol must be put in place. This paper presents a comprehensive synthesis of both automated and human evaluation methods for dialogue systems. It identifies their shortcomings and accumulates evidence about the most effective evaluation dimensions. A total of 20 papers from the last two years are surveyed to analyze three types of evaluation protocols: automated, static, and interactive. Finally, the evaluation dimensions used in these papers are compared against our expert evaluation of the system-user dialogue data collected from the Alexa Prize 2020.
