DynaEval: Unifying Turn and Dialogue Level Evaluation

06/02/2021
by Chen Zhang, et al.

A dialogue is essentially a multi-turn interaction among interlocutors, and effective evaluation metrics should reflect the dynamics of that interaction. Existing automatic metrics focus largely on turn-level quality and ignore these dynamics. To address this, we propose DynaEval, a unified automatic evaluation framework that not only performs turn-level evaluation but also holistically considers the quality of the entire dialogue. In DynaEval, a graph convolutional network (GCN) models a dialogue in its entirety: the graph nodes represent individual utterances and the edges represent the dependencies between pairs of utterances. A contrastive loss is then applied to distinguish well-formed dialogues from carefully constructed negative samples. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model and correlates strongly with human judgements across multiple dialogue evaluation aspects at both the turn and dialogue levels.
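
The two ingredients named in the abstract, a graph over utterances and a contrastive objective, can be illustrated with a small sketch. This is not the DynaEval implementation. The windowed edge rule, the weight-free mean propagation (a row-normalised adjacency applied to the features, standing in for a trained GCN layer), the linear scorer, and the margin ranking loss used as the contrastive objective are all assumptions made for illustration, and every function name is hypothetical.

```python
from typing import List

Vector = List[float]


def build_adjacency(num_utterances: int, window: int = 1) -> List[List[int]]:
    """Link each utterance to itself and to utterances within `window` turns.

    The windowed rule is an assumption; the abstract only says that edges
    represent dependencies between pairs of utterances.
    """
    return [
        [1 if abs(i - j) <= window else 0 for j in range(num_utterances)]
        for i in range(num_utterances)
    ]


def propagate(features: List[Vector], adjacency: List[List[int]]) -> List[Vector]:
    """One simplified propagation step: each node becomes the mean of its neighbours.

    A real GCN layer would also apply learned weights and a non-linearity.
    """
    dim = len(features[0])
    updated = []
    for row in adjacency:
        neighbours = [features[j] for j, edge in enumerate(row) if edge]
        updated.append(
            [sum(vec[d] for vec in neighbours) / len(neighbours) for d in range(dim)]
        )
    return updated


def dialogue_score(features: List[Vector], scorer_weights: Vector, window: int = 1) -> float:
    """Mean-pool the propagated node vectors and apply a linear scorer (assumed)."""
    hidden = propagate(features, build_adjacency(len(features), window))
    dim = len(scorer_weights)
    pooled = [sum(node[d] for node in hidden) / len(hidden) for d in range(dim)]
    return sum(w * p for w, p in zip(scorer_weights, pooled))


def margin_ranking_loss(positive_score: float, negative_score: float, margin: float = 1.0) -> float:
    """Zero once the well-formed dialogue outscores the negative sample by `margin`."""
    return max(0.0, margin - (positive_score - negative_score))
```

In training, the score of an original dialogue would be passed as `positive_score` and the score of a perturbed copy as `negative_score`. How the negative samples are constructed is not described in the abstract, so the sketch leaves that step out.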

research
10/08/2020

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

Automatically evaluating dialogue coherence is a challenging but high-de...
research
03/18/2022

DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations

Automatic evaluation metrics are essential for the rapid development of ...
research
06/27/2023

C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

Existing reference-free turn-level evaluation metrics for chatbots inade...
research
04/07/2022

Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances

Dialogue State Tracking (DST) is primarily evaluated using Joint Goal Ac...
research
09/06/2019

ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons

While dialogue remains an important end-goal of natural language researc...
research
08/22/2019

A Neural Model for Dialogue Coherence Assessment

Dialogue quality assessment is crucial for evaluating dialogue agents. A...
research
01/12/2020

Stochastic Natural Language Generation Using Dependency Information

This article presents a stochastic corpus-based model for generating nat...
