Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems

05/17/2023
by Sarthak Ahuja, et al.

Off-policy reinforcement learning has been a driving force for state-of-the-art conversational AI, leading to more natural human-agent interactions and improving user satisfaction for goal-oriented agents. However, in large-scale commercial settings, it is often challenging to balance policy improvements against experience continuity across the broad spectrum of applications handled by such systems. In the literature, off-policy evaluation and guard-railing on aggregate statistics have commonly been used to address this problem. In this paper, we propose a method for curating and leveraging high-precision samples sourced from historical regression incident reports to validate, safeguard, and improve policies prior to online deployment. We conducted extensive experiments using data from a real-world conversational system and actual regression incidents. The proposed method is currently deployed in our production system to protect customers against broken experiences and enable long-term policy improvements.

research
09/17/2022

Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems

Recently, self-learning methods based on user satisfaction metrics and c...
research
04/14/2022

Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems

Skill routing is an important component in large-scale conversational sy...
research
11/08/2021

Safe Optimal Design with Applications in Policy Learning

Motivated by practical needs in online experimentation and off-policy le...
research
10/21/2020

Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents

Turn-level user satisfaction is one of the most important performance me...
research
05/29/2020

Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents

Measuring user satisfaction level is a challenging task, and a critical ...
research
11/06/2019

Feedback-Based Self-Learning in Large-Scale Conversational AI Agents

Today, most large-scale conversational AI agents (e.g. Alexa, Siri, or G...
research
12/01/2016

Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

The ability to perform effective off-policy learning would revolutionize...
