Dialogue Evaluation with Offline Reinforcement Learning

09/02/2022
by   Nurul Lubis, et al.
0

Task-oriented dialogue systems aim to fulfill user goals through natural language interactions. They are ideally evaluated with human users, which however is unattainable to do at every iteration of the development phase. Simulated users could be an alternative, however their development is nontrivial. Therefore, researchers resort to offline metrics on existing human-human corpora, which are more practical and easily reproducible. They are unfortunately limited in reflecting real performance of dialogue systems. BLEU for instance is poorly correlated with human judgment, and existing corpus-based metrics such as success rate overlook dialogue context mismatches. There is still a need for a reliable metric for task-oriented systems with good generalization and strong correlation with human judgements. In this paper, we propose the use of offline reinforcement learning for dialogue evaluation based on a static corpus. Such an evaluator is typically called a critic and utilized for policy optimization. We go one step further and show that offline RL critics can be trained on a static corpus for any dialogue system as external evaluators, allowing dialogue performance comparisons across various types of systems. This approach has the benefit of being corpus- and model-independent, while attaining strong correlation with human judgements, which we confirm via an interactive user trial.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/18/2022

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

Conventionally, generation of natural language for dialogue agents may b...
research
05/17/2018

Neural User Simulation for Corpus-based Policy Optimisation for Spoken Dialogue Systems

User Simulators are one of the major tools that enable offline training ...
research
02/20/2021

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

Reliable automatic evaluation of dialogue systems under an interactive e...
research
03/10/2021

Causal-aware Safe Policy Improvement for Task-oriented dialogue

The recent success of reinforcement learning's (RL) in solving complex t...
research
09/29/2017

Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings

We present an optimised multi-modal dialogue agent for interactive learn...
research
07/23/2023

On the Effectiveness of Offline RL for Dialogue Response Generation

A common training technique for language models is teacher forcing (TF)....
research
06/10/2021

Shades of BLEU, Flavours of Success: The Case of MultiWOZ

The MultiWOZ dataset (Budzianowski et al.,2018) is frequently used for b...

Please sign up or login with your details

Forgot password? Click here to reset