Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances

04/07/2022
by Suvodip Dey, et al.

Dialogue State Tracking (DST) is primarily evaluated using Joint Goal Accuracy (JGA), defined as the fraction of turns where the ground-truth dialogue state exactly matches the prediction. In DST, the dialogue state (or belief state) for a given turn generally contains all the intents expressed by the user up to that turn. Because the belief state is cumulative, a single misprediction tends to carry forward, making it difficult to get a correct prediction at any later turn. Thus, although JGA is a useful metric, it can be harsh at times and underestimate the true potential of a DST model. Moreover, because of inconsistencies in annotation, an improvement in JGA can sometimes coincide with a decrease in turn-level (non-cumulative) belief state prediction performance. Using JGA as the only metric for model selection may therefore not be ideal for all scenarios. In this work, we discuss various evaluation metrics used for DST along with their shortcomings. To address the existing issues, we propose a new evaluation metric named Flexible Goal Accuracy (FGA). FGA is a generalized version of JGA. Unlike JGA, however, it gives penalized rewards to mispredictions that are locally correct, i.e., mispredictions whose root cause lies in an earlier turn. By doing so, FGA flexibly considers the performance of both cumulative and turn-level prediction and provides better insight than the existing metrics. We also show that FGA is a better discriminator of DST model performance.
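
To make the contrast between cumulative and turn-level evaluation concrete, the following is a minimal Python sketch, not the authors' implementation. It computes JGA as defined above, plus a simple turn-level accuracy over the slot-value pairs that change at each turn. The slot names, the toy dialogue, and the helper functions `turn_level_update` and `turn_level_accuracy` are illustrative assumptions. FGA itself is not implemented here, since the abstract does not specify its penalty.

```python
from typing import Dict, List

# A belief state maps slot names to values and is cumulative up to a turn.
State = Dict[str, str]


def joint_goal_accuracy(gold: List[State], pred: List[State]) -> float:
    """Fraction of turns whose predicted state exactly matches the gold state."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of turns")
    if not gold:
        return 0.0
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold)


def turn_level_update(prev: State, curr: State) -> State:
    """Slots added or changed at this turn (hypothetical helper).

    Slot deletions are ignored in this simplified sketch.
    """
    return {slot: value for slot, value in curr.items() if prev.get(slot) != value}


def turn_level_accuracy(gold: List[State], pred: List[State]) -> float:
    """Fraction of turns whose non-cumulative update matches the gold update."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of turns")
    if not gold:
        return 0.0
    correct = 0
    prev_gold: State = {}
    prev_pred: State = {}
    for g, p in zip(gold, pred):
        if turn_level_update(prev_gold, g) == turn_level_update(prev_pred, p):
            correct += 1
        prev_gold, prev_pred = g, p
    return correct / len(gold)


# Toy four-turn dialogue (made-up data): the model gets "hotel-stars" wrong
# at turn 2 and carries that error forward, but every later update is right.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes",
     "hotel-day": "monday"},
]
pred = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},
    {"hotel-area": "north", "hotel-stars": "3", "hotel-parking": "yes"},
    {"hotel-area": "north", "hotel-stars": "3", "hotel-parking": "yes",
     "hotel-day": "monday"},
]

print(joint_goal_accuracy(gold, pred))   # 0.25
print(turn_level_accuracy(gold, pred))   # 0.75
```

In this toy case a single early error drives JGA down to 0.25 even though three of the four turn-level updates are correct. This is the gap between cumulative and turn-level performance that the abstract describes.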

research
03/07/2022

Mismatch between Multi-turn Dialogue and its Evaluation Metric in Dialogue State Tracking

Dialogue state tracking (DST) aims to extract essential information from...
research
06/02/2021

DynaEval: Unifying Turn and Dialogue Level Evaluation

A dialogue is essentially a multi-turn interaction among interlocutors. ...
research
12/03/2018

Toward Scalable Neural Dialogue State Tracking Model

The latency in the current neural based dialogue state tracking models p...
research
09/16/2020

Neural Dialogue State Tracking with Temporally Expressive Networks

Dialogue state tracking (DST) is an important part of a spoken dialogue ...
research
09/16/2019

Domain Transfer in Dialogue Systems without Turn-Level Supervision

Task oriented dialogue systems rely heavily on specialized dialogue stat...
research
11/10/2019

Efficient Dialogue State Tracking by Selectively Overwriting Memory

Recent works in dialogue state tracking (DST) focus on an open vocabular...
research
06/27/2023

C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

Existing reference-free turn-level evaluation metrics for chatbots inade...
