
Learning an Unreferenced Metric for Online Dialogue Evaluation

05/01/2020
by Koustuv Sinha, et al.

Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them do not generalize to unseen datasets and/or require a human-generated reference response during inference, making them infeasible for online evaluation. Here, we propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances and leverages the temporal transitions between them. We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
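The abstract describes the approach only at a high level: encode each utterance with a pre-trained language model and score a candidate response by how well it follows the dialogue context, with no reference response required. The following is a minimal illustrative sketch in Python, assuming a BERT-style encoder from the Hugging Face transformers library and a hypothetical, untrained transition head (transition_scorer, score_response are names introduced here for illustration); it is not the authors' exact model, whose architecture and training procedure are given in the full paper.

# Sketch of an unreferenced dialogue-evaluation scorer: a pre-trained encoder
# produces latent utterance representations, and a (hypothetical) transition
# head scores how plausibly a response follows the dialogue context.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(utterance: str) -> torch.Tensor:
    """Encode one utterance into a fixed-size latent vector ([CLS] token)."""
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # shape: (1, hidden_size)

# Hypothetical transition scorer: maps (context embedding, response embedding)
# to a scalar in [0, 1]. In practice such a head would be trained, e.g. to
# separate true next utterances from negatively sampled ones.
transition_scorer = torch.nn.Sequential(
    torch.nn.Linear(2 * encoder.config.hidden_size, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 1),
    torch.nn.Sigmoid(),
)

def score_response(context: list[str], response: str) -> float:
    """Score a candidate response given only the dialogue context --
    no human-written reference response is needed."""
    context_vec = torch.mean(torch.cat([embed(u) for u in context], dim=0),
                             dim=0, keepdim=True)
    response_vec = embed(response)
    features = torch.cat([context_vec, response_vec], dim=-1)
    return transition_scorer(features).item()

print(score_response(["hi, how are you?", "doing well, thanks! you?"],
                     "pretty good, just got back from a run."))

Because the scorer conditions only on the context and the candidate response, it can be applied online, as each response is generated, which is the setting the paper targets.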


Related Research

04/30/2021

Evaluating Groundedness in Dialogue Systems: The BEGIN Benchmark

Knowledge-grounded dialogue agents are systems designed to conduct a con...
12/14/2020

Time to Transfer: Predicting and Evaluating Machine-Human Chatting Handoff

Is a chatbot able to completely replace the human agent? The short answer ...
04/10/2020

Designing Precise and Robust Dialogue Response Evaluators

An automatic dialogue response evaluator has been proposed as an alternativ...
04/13/2021

On the Use of Linguistic Features for the Evaluation of Generative Dialogue Systems

Automatically evaluating text-based, non-task-oriented dialogue systems ...
11/01/2020

Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems

Many automatic evaluation metrics have been proposed to score the overal...
08/24/2020

How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics

Though generative dialogue modeling is widely seen as a language modelin...
04/06/2019

Evaluating Coherence in Dialogue Systems using Entailment

Evaluating open-domain dialogue systems is difficult due to the diversit...

Code Repositories

online_dialog_eval

Online Dialog Evaluation Metric - ACL Submission

