Open-Domain Dialog Evaluation using Follow-Ups Likelihood

09/12/2022
by Maxime De Bruyn, et al.

Automatic evaluation of open-domain dialogs remains an unsolved problem, and existing methods do not correlate strongly with human annotations. This paper presents a new automated evaluation method using follow-ups: we measure the probability that a language model will continue the conversation with a fixed set of follow-ups (e.g., "not really relevant here", "what are you trying to say"). When compared against twelve existing methods, our new evaluation achieves the highest correlation with human evaluations.
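As a rough illustration of the idea, the sketch below scores a candidate response by the average log-likelihood a causal language model assigns to a small set of negative follow-ups appended to the dialog. The choice of GPT-2, the follow-up list, and the averaging scheme are assumptions made for illustration, not the paper's exact setup.

```python
# Minimal sketch of follow-up likelihood scoring (not the authors' exact code).
# Assumptions: GPT-2 as the language model and a small illustrative follow-up set;
# the paper's actual model and follow-up list may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical negative follow-ups; a response looks weaker if these become more likely.
FOLLOW_UPS = [
    "not really relevant here",
    "what are you trying to say",
]

def follow_up_log_likelihood(context: str, follow_up: str) -> float:
    """Average log-probability of the follow-up tokens given the dialog context."""
    ctx_ids = tokenizer.encode(context, return_tensors="pt")
    fu_ids = tokenizer.encode(" " + follow_up, return_tensors="pt")
    input_ids = torch.cat([ctx_ids, fu_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probabilities over the next token at every position except the last.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Positions whose next-token prediction corresponds to a follow-up token.
    fu_positions = range(ctx_ids.shape[1] - 1, input_ids.shape[1] - 1)
    token_lps = [log_probs[pos, input_ids[0, pos + 1]] for pos in fu_positions]
    return torch.stack(token_lps).mean().item()

def score_response(dialog_history: str, response: str) -> float:
    """Higher (less negative) follow-up likelihood suggests a weaker response."""
    context = dialog_history + "\n" + response + "\n"
    return sum(follow_up_log_likelihood(context, fu) for fu in FOLLOW_UPS) / len(FOLLOW_UPS)

print(score_response("Hi, how are you?", "I am fine, thanks. And you?"))
```

In this sketch a lower (more negative) score is better, since it means the model finds the dismissive follow-ups unlikely after the response; the paper compares such scores against human judgments across systems.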


Related research

01/11/2017
RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems
Open-domain human-computer conversation has been attracting increasing a...

07/24/2019
Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References
The aim of this paper is to mitigate the shortcomings of automatic evalu...

06/22/2022
GODEL: Large-Scale Pre-Training for Goal-Directed Dialog
We introduce GODEL (Grounded Open Dialogue Language Model), a large pre-...

06/21/2019
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
Building an open-domain conversational agent is a challenging problem. C...

02/20/2021
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
Reliable automatic evaluation of dialogue systems under an interactive e...

12/18/2022
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems
There has been great recent advancement in human-computer chat. However,...

10/13/2020
On the Efficiency of K-Means Clustering: Evaluation, Optimization, and Algorithm Selection
This paper presents a thorough evaluation of the existing methods that a...
