On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

10/13/2021
by   Guy Tennenholtz, et al.

We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning. We begin by defining the problem of learning from confounded expert data in a contextual MDP setup. We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup. We then discuss the problem of distribution shift between the expert data and the online environment when the data is only partially observable. We prove possibility and impossibility results for imitation learning under arbitrary distribution shift of the missing covariates. When additional external reward is provided, we propose a sampling procedure that addresses the unknown shift and prove convergence to an optimal solution. Finally, we validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
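To make the distribution-shift problem concrete, here is a minimal sketch of a toy setting. It is not the paper's algorithm or one of its experiments. It assumes a hypothetical two-action problem with a binary hidden context `u`, an expert that always plays `a = u`, and a reward of 1 when the action matches the context. The learner never observes `u`, so cloning the expert can only recover the expert's marginal action distribution. All function names and probabilities are illustrative assumptions.

```python
def marginal_expert_policy(p_u1):
    """Action distribution a cloned policy recovers when the context u is hidden.

    Assumption: the expert plays a = u, so P(a = 1) equals P(u = 1) in the data.
    """
    return {0: 1.0 - p_u1, 1: p_u1}


def expected_reward(policy, p_u1):
    """Expected reward of a context-blind policy when reward = 1 iff a == u."""
    return policy[1] * p_u1 + policy[0] * (1.0 - p_u1)


# Hypothetical context frequencies: the expert data and the online
# environment disagree on how often u = 1 occurs.
p_data, p_online = 0.8, 0.2

cloned = marginal_expert_policy(p_data)

# Under the data distribution the cloned policy looks reasonable ...
reward_in_data = expected_reward(cloned, p_data)

# ... but under the shifted online distribution it does worse than chance.
reward_online = expected_reward(cloned, p_online)

print(f"reward under data distribution:   {reward_in_data:.2f}")
print(f"reward under online distribution: {reward_online:.2f}")
```

In this toy setting the expert itself earns reward 1 under either distribution, because it sees `u`. The gap comes entirely from the shift in the unobserved covariate, which the cloned policy has no way to detect from the expert data alone.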


research
02/06/2023

DITTO: Offline Imitation Learning with World Models

We propose DITTO, an offline imitation learning algorithm which uses wor...
research
08/12/2022

Causal Imitation Learning with Unobserved Confounders

One of the common ways children learn is by mimicking adults. Imitation ...
research
07/23/2020

Bridging the Imitation Gap by Adaptive Insubordination

Why do agents often obtain better reinforcement learning policies when i...
research
02/12/2021

Scalable Bayesian Inverse Reinforcement Learning

Bayesian inference over the reward presents an ideal solution to the ill...
research
04/03/2018

Learning to Search via Self-Imitation

We study the problem of learning a good search policy. To do so, we prop...
research
06/22/2019

Learning Belief Representations for Imitation Learning in POMDPs

We consider the problem of imitation learning from expert demonstrations...
research
09/24/2019

Avoidance Learning Using Observational Reinforcement Learning

Imitation learning seeks to learn an expert policy from sampled demonstr...
