EnDex: Evaluation of Dialogue Engagingness at Scale

10/22/2022
by Guangxuan Xu, et al.

We propose EnDex, the first human-reaction-based model for evaluating dialogue engagingness. EnDex is trained on the 80k-example Reddit-based Engagement Dataset (RED), curated with a novel distant-supervision framework. Engagingness is a key measure that captures the high-level quality of AI dialogue systems and closely reflects actual user experience. However, the shortage of labeled data and the abstract, broad definition of engagingness make it challenging to develop an automatic metric. Our work departs from mainstream approaches that train binary classifiers on synthetic negative examples and instead proposes a solution based on distant supervision from human-reaction feedback. To support the soundness of the EnDex metric, we offer a theoretical foundation for engagement, an extensive ablation study, and empirical evidence of high correlation on five engagingness-related datasets. We will release code, an off-the-shelf EnDex model, and a large-scale dataset upon paper publication to facilitate future research.
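
The key idea above is to replace synthetic negatives with weak labels derived from how real readers reacted to a response. The sketch below illustrates that distant-supervision step under assumed signals: the reaction features (reply count, upvotes) and thresholds are illustrative guesses, not the paper's actual RED curation rules, and the helper names are hypothetical.

    # Minimal sketch of distant supervision from human-reaction feedback:
    # weakly label a Reddit response as engaging (1) or not (0) from the
    # reactions it received, yielding (context, response, label) pairs on
    # which a binary engagingness classifier could be fine-tuned.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class RedditResponse:
        context: str        # parent post/comment the response replies to
        text: str           # the response itself
        num_replies: int    # direct replies the response attracted
        score: int          # net upvotes

    def weak_label(resp: RedditResponse,
                   min_replies: int = 1,
                   min_score: int = 2) -> int:
        # Hypothetical rule: a response that drew follow-up replies and
        # positive votes is treated as engaging; an ignored one is not.
        return int(resp.num_replies >= min_replies and resp.score >= min_score)

    def build_training_pairs(responses: List[RedditResponse]) -> List[Tuple[str, str, int]]:
        # Turn raw Reddit responses into (context, response, label) examples.
        return [(r.context, r.text, weak_label(r)) for r in responses]

    if __name__ == "__main__":
        toy = [
            RedditResponse("What's your favorite hike?", "Angels Landing, the chains are wild!", 4, 12),
            RedditResponse("What's your favorite hike?", "idk", 0, 0),
        ]
        for ctx, resp, label in build_training_pairs(toy):
            print(label, "|", resp)

In practice the thresholds and reaction signals would need to match the paper's actual curation criteria; the point is only that the supervision comes from observed human reactions rather than constructed negatives.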


