Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

12/23/2022
by   Zuyue Fu, et al.
0

Motivated by the human-machine interaction such as training chatbots for improving customer satisfaction, we study human-guided human-machine interaction involving private information. We model this interaction as a two-player turn-based game, where one player (Alice, a human) guides the other player (Bob, a machine) towards a common goal. Specifically, we focus on offline reinforcement learning (RL) in this game, where the goal is to find a policy pair for Alice and Bob that maximizes their expected total rewards based on an offline dataset collected a priori. The offline setting presents two challenges: (i) We cannot collect Bob's private information, leading to a confounding bias when using standard RL methods, and (ii) a distributional mismatch between the behavior policy used to collect data and the desired policy we aim to learn. To tackle the confounding bias, we treat Bob's previous action as an instrumental variable for Alice's current decision making so as to adjust for the unmeasured confounding. We develop a novel identification result and use it to propose a new off-policy evaluation (OPE) method for evaluating policy pairs in this two-player turn-based game. To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob. Finally, we prove that under mild assumptions such as partial coverage of the offline data, the policy pair obtained through our method converges to the optimal one at a satisfactory rate.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/18/2022

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

We study the offline reinforcement learning (RL) in the face of unmeasur...
research
05/26/2022

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

We study offline reinforcement learning (RL) in partially observable Mar...
research
11/15/2022

Offline Reinforcement Learning with Adaptive Behavior Regularization

Offline reinforcement learning (RL) defines a sample-efficient learning ...
research
06/01/2023

Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding

A prominent challenge of offline reinforcement learning (RL) is the issu...
research
10/15/2022

A Policy-Guided Imitation Approach for Offline Reinforcement Learning

Offline reinforcement learning (RL) methods can generally be categorized...
research
12/16/2020

Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation

Most of the existing deep reinforcement learning (RL) approaches for ses...
research
03/20/2023

A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations

We study the offline contextual bandit problem, where we aim to acquire ...

Please sign up or login with your details

Forgot password? Click here to reset