Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems

06/21/2019
by Asma Ghandeharioun, et al.
MIT

Building an open-domain conversational agent is a challenging problem. Current evaluation methods, mostly post-hoc judgments of single-turn responses, do not capture conversation quality in a realistic interactive context. In this paper, we investigate interactive human evaluation and provide evidence for its necessity; we then introduce a novel, model-agnostic, and dataset-agnostic method to approximate it. In particular, we propose a self-play scenario in which the dialog system talks to itself, and we calculate a combination of proxies such as sentiment and semantic coherence on the conversation trajectory. We show that this metric captures the human-rated quality of a dialog model better than any automated metric known to date, achieving a significant Pearson correlation (r>.7, p<.05). To compare the strengths of this novel metric and interactive evaluation against state-of-the-art metrics and single-turn evaluation, we perform extensive experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation at the utterance level. Finally, we open-source the interactive evaluation platform we built and the dataset we collected to allow researchers to efficiently deploy and evaluate generative dialog models.
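At its core, the proposed evaluation can be sketched in a few lines: roll out a conversation by feeding the model its own responses, score each turn with cheap proxies, and combine them. The sketch below is illustrative only; `respond` stands in for a trained dialog model, and the bag-of-words coherence measure, toy sentiment lexicon, and equal weights are placeholder assumptions rather than the authors' implementation.

```python
"""Minimal sketch of the self-play evaluation idea from the abstract.

Placeholder assumptions for illustration: `respond` stands in for a
trained dialog model; the bag-of-words coherence proxy, toy sentiment
lexicon, and equal weights are not the authors' implementation.
"""
import math
from collections import Counter
from statistics import mean


def bow_cosine(a: str, b: str) -> float:
    """Semantic-coherence proxy: cosine similarity of bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


POSITIVE = {"great", "good", "love", "happy", "nice", "fun"}
NEGATIVE = {"bad", "hate", "sad", "boring", "awful"}


def sentiment(utterance: str) -> float:
    """Sentiment proxy in [-1, 1] from a toy lexicon."""
    tokens = utterance.lower().split()
    if not tokens:
        return 0.0
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in tokens) / len(tokens)


def self_play_score(respond, seed: str, turns: int = 10,
                    w_sent: float = 0.5, w_coh: float = 0.5) -> float:
    """Let the model talk to itself, then combine per-turn proxies
    computed over the resulting conversation trajectory."""
    history = [seed]
    for _ in range(turns):
        history.append(respond(history))
    sent = mean(sentiment(u) for u in history[1:])
    coh = mean(bow_cosine(a, b) for a, b in zip(history, history[1:]))
    return w_sent * sent + w_coh * coh


if __name__ == "__main__":
    # Stand-in "model": echo the last utterance with a cheerful tag.
    bot = lambda history: history[-1] + " that sounds fun"
    print(f"self-play score: {self_play_score(bot, 'hi there'):.3f}")
```

In the paper, a score of this kind is validated by its Pearson correlation with interactive human ratings across models (the abstract reports r>.7, p<.05); the particular proxies and weights matter only insofar as they preserve that correlation.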


Related Research

01/11/2017
RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems
Open-domain human-computer conversation has been attracting increasing a...

05/01/2020
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
The lack of meaningful automatic evaluation metrics for dialog has imped...

09/17/2019
Hierarchical Reinforcement Learning for Open-Domain Dialog
Open-domain dialog generation is a challenging problem; maximum likeliho...

10/12/2021
We've had this conversation before: A Novel Approach to Measuring Dialog Similarity
Dialog is a core building block of human natural language interactions. ...

12/31/2020
Discovering Dialog Structure Graph for Open-Domain Dialog Generation
Learning interpretable dialog structure from human-human dialogs yields ...

09/12/2022
Open-Domain Dialog Evaluation using Follow-Ups Likelihood
Automatic evaluation of open-domain dialogs remains an unsolved problem....

09/11/2019
Proposal Towards a Personalized Knowledge-powered Self-play Based Ensemble Dialog System
This is the application document for the 2019 Amazon Alexa competition. ...

Code Repositories

neural_chat

Code for training, evaluating, and interacting with neural network dialog models, and for training them with reinforcement learning. Code to deploy a web server that hosts the models live online is available at: https://github.com/asmadotgh/neural_chat_web



neural_chat_web

The server portion of the Neural Chat project, for deploying chatbots on the web. It is accompanied by another repository that contains the chatbot models; for training, evaluating, and interacting with our open-sourced neural dialog models, use https://github.com/natashamjaques/neural_chat.


