On the Use of Linguistic Features for the Evaluation of Generative Dialogue Systems

04/13/2021
by Ian Berlot-Attwell, et al.

Automatically evaluating text-based, non-task-oriented dialogue systems (i.e., 'chatbots') remains an open problem. Previous approaches have faced challenges ranging from poor correlation with human judgment to poor generalization, and have often required a gold-standard reference for comparison or human-annotated data. Extending existing evaluation methods, we propose that a metric based on linguistic features may be able to maintain good correlation with human judgment and be interpretable, without requiring a gold-standard reference or human-annotated data. To support this proposition, we measure and analyze various linguistic features on dialogues produced by multiple dialogue models. We find that the features' behaviour is consistent with the known properties of the models tested and is similar across domains. We also demonstrate that this approach exhibits promising properties, such as zero-shot generalization to new domains, on the related task of evaluating response relevance.
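The abstract does not say which linguistic features are measured. As a minimal sketch, assuming a few simple surface features chosen here purely for illustration (mean response length, distinct-1/distinct-2 n-gram diversity, and the fraction of responses that are questions), the following shows how a reference-free feature profile can be computed from a model's responses alone. The function names, the naive tokenizer, and the example responses are all hypothetical and are not the paper's feature set or data.

```python
from statistics import mean


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def distinct_n(responses: list[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams across all responses."""
    ngrams = []
    for response in responses:
        tokens = tokenize(response)
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def question_rate(responses: list[str]) -> float:
    """Fraction of responses that end with a question mark."""
    return sum(r.strip().endswith("?") for r in responses) / len(responses)


def linguistic_profile(responses: list[str]) -> dict[str, float]:
    """Reference-free profile built from hypothetical, illustrative features.

    No gold-standard reply and no human annotation is used: every value
    is computed from the model's own responses.
    """
    if not responses:
        raise ValueError("need at least one response")
    return {
        "mean_length": mean(len(tokenize(r)) for r in responses),
        "distinct_1": distinct_n(responses, 1),
        "distinct_2": distinct_n(responses, 2),
        "question_rate": question_rate(responses),
    }


# Hypothetical outputs from two dialogue models for the same three prompts.
generic = ["I don't know", "I don't know", "I don't know"]
varied = [
    "The weather is nice today",
    "Have you seen the new film?",
    "I love hiking in the hills",
]

for name, responses in [("generic", generic), ("varied", varied)]:
    print(name, linguistic_profile(responses))
```

Comparing such profiles across models, without any reference responses, illustrates the kind of analysis the abstract describes; whether these particular features track human judgment is not established by this sketch.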
