Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation

06/29/2017
by Shikhar Sharma, et al.

Automated metrics such as BLEU are widely used in the machine translation literature. They have also recently been used in the dialogue community to evaluate dialogue response generation. However, previous work on dialogue response generation has shown that these metrics do not correlate strongly with human judgment in the non-task-oriented dialogue setting. Task-oriented dialogue responses are expressed in narrower domains and exhibit lower diversity. It is thus reasonable to expect that these automated metrics would correlate well with human judgment in the task-oriented setting, where the generation task consists of translating dialogue acts into a sentence. We conduct an empirical study to test whether this is the case. Our findings indicate that these automated metrics correlate more strongly with human judgments in the task-oriented setting than has been observed in the non-task-oriented setting. We also observe that these metrics correlate even better on datasets that provide multiple ground-truth reference sentences. In addition, we show that some of the currently available corpora for task-oriented language generation can be solved with simple models, and we advocate for more challenging datasets.
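To make the kind of analysis described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified word-overlap metric (BLEU-style clipped unigram precision, rather than full BLEU) and uses Spearman rank correlation as one possible way to compare metric scores with human ratings. The function names `modified_unigram_precision`, `average_ranks`, and `spearman` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from math import sqrt


def modified_unigram_precision(candidate, references):
    """BLEU-style clipped unigram precision against one or more references.

    Assumption: whitespace tokenization and lowercasing; full BLEU also uses
    higher-order n-grams and a brevity penalty, which are omitted here.
    """
    cand_counts = Counter(candidate.lower().split())
    if not cand_counts:
        return 0.0
    # Clip each token's count by its maximum count in any single reference.
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref.lower().split()).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in cand_counts.items())
    return clipped / sum(cand_counts.values())


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)
```

In this sketch, adding a second reference can only raise (never lower) the clipped precision of a candidate, because the clip for each token is the maximum over references. Given per-response metric scores and human ratings for the same responses, `spearman(metric_scores, human_ratings)` returns a value in [-1, 1] that summarizes how closely the two rankings agree.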


