Hi Model, generating 'nice' instead of 'good' is not as bad as generating 'rice'! Towards Context and Semantic Infused Dialogue Generation Loss Function and Evaluation Metric

09/11/2023
by   Abhisek Tiwari, et al.

Over the past two decades, dialogue modeling has made significant strides, moving from simple rule-based responses to personalized and persuasive response generation. Despite these advancements, however, the objective functions and evaluation metrics for dialogue generation have remained stagnant: cross-entropy and BLEU, respectively. These lexical metrics have two key limitations. (a) Word-to-word matching without semantic consideration: they assign the same (zero) credit whether the model generates 'nice' or 'rice' in place of the reference 'good'. (b) No dialogue context in evaluation: even a response that is relevant to the ongoing dialogue may be penalized for not matching the gold utterance in the corpus. In this paper, we first investigate these limitations comprehensively and then propose a new loss function, Semantic Infused Contextualized diaLogue (SemTextualLogue) loss. Furthermore, we formulate a new evaluation metric, Dialuation, which incorporates both context relevance and semantic appropriateness when evaluating a generated response. We conducted experiments on two benchmark dialogue corpora, covering both task-oriented and open-domain scenarios, and found that dialogue generation models trained with the SemTextualLogue loss attained superior performance (in both quantitative and qualitative evaluation) compared to the traditional cross-entropy loss across the datasets and evaluation metrics.
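The 'nice' vs. 'rice' point can be made concrete with a minimal sketch (not the paper's implementation): a BLEU-style unigram overlap scores both wrong words identically against the reference 'good', while even a crude embedding similarity separates them. The tiny 2-d vectors below are hand-made assumptions for illustration only, not real word embeddings.

```python
import math

def unigram_overlap(candidate: str, reference: str) -> float:
    """BLEU-1-style precision: fraction of candidate tokens found in the reference."""
    cand, ref = candidate.split(), reference.split()
    return sum(1 for tok in cand if tok in ref) / len(cand)

# Toy 2-d "embeddings" (illustrative assumption): sentiment words cluster together.
toy_vectors = {
    "good": (0.90, 0.10),
    "nice": (0.85, 0.20),  # semantically close to 'good'
    "rice": (0.05, 0.95),  # unrelated food word
}

def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm

# Lexical matching: both wrong words receive the same zero credit.
print(unigram_overlap("nice", "good"))  # 0.0
print(unigram_overlap("rice", "good"))  # 0.0

# Semantic similarity distinguishes them: 'nice' is far closer to 'good'.
print(cosine(toy_vectors["nice"], toy_vectors["good"]) >
      cosine(toy_vectors["rice"], toy_vectors["good"]))  # True
```

A semantic-infused loss in the spirit of the paper would blend such a similarity signal into the training objective instead of relying on exact-token cross-entropy alone.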


