Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation

05/15/2020
by Ryuichi Takanobu, et al.

There is growing interest in developing goal-oriented dialog systems that help users accomplish complex tasks through multi-turn conversations. Although many methods have been devised to evaluate and improve the performance of individual dialog components, there is a lack of comprehensive empirical study on how different components contribute to the overall performance of a dialog system. In this paper, we perform a system-wise evaluation and present an empirical analysis of different types of dialog systems, composed of different modules and tested in different settings. Our results show that (1) a pipeline dialog system trained using fine-grained supervision signals at different component levels often obtains better performance than systems that use joint or end-to-end models trained on coarse-grained labels, (2) component-wise, single-turn evaluation results are not always consistent with the overall performance of a dialog system, and (3) despite the discrepancy between simulators and human users, simulated evaluation is still a valid alternative to costly human evaluation, especially in the early stages of development.


