
Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems

by Weiwei Sun, et al.

Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we propose the following task: simulating user satisfaction for the evaluation of task-oriented dialogue systems. The purpose of the task is to increase the evaluation power of user simulations and to make the simulation more human-like. To overcome a lack of annotated data, we propose a user satisfaction annotation dataset, USS, that includes 6,800 dialogues sampled from multiple domains, spanning real-world e-commerce dialogues, task-oriented dialogues constructed through Wizard-of-Oz experiments, and movie recommendation dialogues. All user utterances in those dialogues, as well as the dialogues themselves, have been labeled based on a 5-level satisfaction scale. We also share three baseline methods for user satisfaction prediction and action prediction tasks. Experiments conducted on the USS dataset suggest that distributed representations outperform feature-based methods. A model based on hierarchical GRUs achieves the best performance in in-domain user satisfaction prediction, while a BERT-based model has better cross-domain generalization ability.




CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

To advance multi-domain (cross-domain) dialogue modeling as well as alle...

Understanding User Satisfaction with Task-oriented Dialogue Systems

Dialogue systems are evaluated depending on their type and purpose. Two ...

Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems

Task-oriented dialogue systems (TDSs) are assessed mainly in an offline ...

Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator

Task-Oriented Dialogue (TOD) systems are drawing more and more attention...

A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS

Recently, spoken dialogue systems have been widely deployed in a variety...

Domain-Independent turn-level Dialogue Quality Evaluation via User Satisfaction Estimation

An automated metric to evaluate dialogue quality is vital for optimizing...

Endowing Empathetic Dialogue Systems with Personas

Empathetic dialogue systems have been shown to improve user satisfaction...