Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator

10/26/2022
by   Qinyuan Cheng, et al.

Task-Oriented Dialogue (TOD) systems are drawing increasing attention in recent studies. Current methods focus on constructing pre-trained models or fine-tuning strategies, while the evaluation of TOD is limited by a policy mismatch problem: during evaluation, the user utterances are taken from the annotated dataset, although they should respond to the system's previous responses, which can have many valid alternatives besides the annotated texts. In this work, we therefore propose an interactive evaluation framework for TOD. We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues. In addition, we introduce a sentence-level score and a session-level score to measure sentence fluency and session coherence in the interactive evaluation. Experimental results show that RL-based TOD systems trained with our proposed user simulator can achieve nearly 98% inform and success rates in the interactive evaluation on the MultiWOZ dataset, and that the proposed scores measure response quality in addition to the inform and success rates. We hope that our work will encourage simulator-based interactive evaluations in the TOD task.
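The interactive setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `ToyUserSimulator`, `ToySystem`, `run_session`, the dialogue-act dictionaries, and the tiny `DATABASE` are all hypothetical stand-ins for the pre-trained simulator, the dialogue system, and MultiWOZ. The inform and success checks are simplified readings of those metrics (an offered entity matches the user's constraints; all requested slots are answered correctly).

```python
from dataclasses import dataclass, field

# Hypothetical toy database; a real evaluation would use the MultiWOZ ontology.
DATABASE = [
    {"name": "A", "area": "north", "price": "cheap", "phone": "111"},
    {"name": "B", "area": "south", "price": "cheap", "phone": "222"},
]


@dataclass
class ToyUserSimulator:
    """Hypothetical stand-in for a goal-oriented user simulator."""
    constraints: dict                      # slots the user informs, e.g. {"area": "south"}
    requests: list                         # slots the user wants back, e.g. ["phone"]
    informed: set = field(default_factory=set)
    received: dict = field(default_factory=dict)

    def respond(self, system_act: dict) -> dict:
        # Record any requested values the system just provided.
        for slot, value in system_act.get("provide", {}).items():
            if slot in self.requests:
                self.received[slot] = value
        # Inform the next constraint that has not been mentioned yet.
        for slot, value in self.constraints.items():
            if slot not in self.informed:
                self.informed.add(slot)
                return {"inform": {slot: value}}
        # Then ask for the next requested slot that is still missing.
        for slot in self.requests:
            if slot not in self.received:
                return {"request": slot}
        return {"bye": True}


class ToySystem:
    """Hypothetical rule-based dialogue system over the toy database."""

    def __init__(self, database):
        self.database = database
        self.belief = {}

    def respond(self, user_act: dict) -> dict:
        # Track the user's constraints, then look up matching entities.
        self.belief.update(user_act.get("inform", {}))
        matches = [e for e in self.database
                   if all(e.get(s) == v for s, v in self.belief.items())]
        if not matches:
            return {"nomatch": True}
        act = {"offer": matches[0]["name"]}
        if "request" in user_act:
            slot = user_act["request"]
            act["provide"] = {slot: matches[0][slot]}
        return act


def run_session(user, system, max_turns=10):
    """Let the simulator and the system talk, then score the generated session."""
    offered, system_act = None, {}
    for _ in range(max_turns):
        # The user reacts to the system's actual previous response,
        # not to a fixed annotated utterance.
        user_act = user.respond(system_act)
        if user_act.get("bye"):
            break
        system_act = system.respond(user_act)
        offered = system_act.get("offer", offered)
    entity = next((e for e in system.database if e["name"] == offered), None)
    inform = entity is not None and all(
        entity.get(s) == v for s, v in user.constraints.items())
    success = inform and all(
        user.received.get(s) == entity[s] for s in user.requests)
    return inform, success


if __name__ == "__main__":
    user = ToyUserSimulator({"area": "south", "price": "cheap"}, ["phone"])
    print(run_session(user, ToySystem(DATABASE)))
```

Averaging the two booleans over many sampled user goals would give session-level inform and success rates. The sentence-level and session-level quality scores mentioned in the abstract are not modeled here.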
