Relevance in Dialogue: Is Less More? An Empirical Comparison of Existing Metrics, and a Novel Simple Metric

06/03/2022
by Ian Berlot-Attwell, et al.

In this work, we evaluate various existing dialogue relevance metrics, find strong dependency on the dataset, often with poor correlation with human scores of relevance, and propose modifications to reduce data requirements and domain sensitivity while improving correlation. Our proposed metric achieves state-of-the-art performance on the HUMOD dataset while reducing measured sensitivity to dataset by 37 pretrained language model, and using only 3,750 unannotated human dialogues and a single negative example. Despite these limitations, we demonstrate competitive performance on four datasets from different domains. Our code, including our metric and experiments, is open sourced.



research
11/15/2022

ED-FAITH: Evaluating Dialogue Summarization on Faithfulness

Abstractive summarization models typically generate content unfaithful t...
research
03/28/2013

Relevance As a Metric for Evaluating Machine Learning Algorithms

In machine learning, the choice of a learning algorithm that is suitable...
research
06/19/2022

MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue

Automatic open-domain dialogue evaluation is a crucial component of dial...
research
04/06/2020

PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems

Open-domain generative dialogue systems have attracted considerable atte...
research
06/06/2021

Semantic-Enhanced Explainable Finetuning for Open-Domain Dialogues

In this paper, we propose to combine pretrained language models with the...
research
04/13/2021

On the Use of Linguistic Features for the Evaluation of Generative Dialogue Systems

Automatically evaluating text-based, non-task-oriented dialogue systems ...
research
10/22/2022

EnDex: Evaluation of Dialogue Engagingness at Scale

We propose EnDex, the first human-reaction based model to evaluate dialo...
