Sequential Causal Imitation Learning with Unobserved Confounders

by Daniel Kumor, et al.

"Monkey see monkey do" is an age-old adage, referring to naïve imitation without a deep understanding of a system's underlying mechanics. Indeed, if a demonstrator has access to information unavailable to the imitator (monkey), such as a different set of sensors, then no matter how perfectly the imitator models its perceived environment (See), attempting to reproduce the demonstrator's behavior (Do) can lead to poor outcomes. Imitation learning in the presence of a mismatch between demonstrator and imitator has been studied in the literature under the rubric of causal imitation learning (Zhang et al., 2020), but existing solutions are limited to single-stage decision-making. This paper investigates the problem of causal imitation learning in sequential settings, where the imitator must make multiple decisions per episode. We develop a graphical criterion that is necessary and sufficient for determining the feasibility of causal imitation, providing conditions under which an imitator can match a demonstrator's performance despite differing capabilities. Finally, we provide an efficient algorithm for determining imitability and corroborate our theory with simulations.
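
To make the failure mode concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the binary confounder `U`, the reward function, and all function names are hypothetical, chosen only to illustrate how copying a demonstrator's observed action distribution can fall short when the demonstrator acts on a variable the imitator cannot see. Expectations are computed by exact enumeration rather than sampling.

```python
from itertools import product

# Hypothetical setup (an assumption for illustration, not the paper's model):
# U is a fair binary variable seen by the demonstrator but not the imitator.
P_U = {0: 0.5, 1: 0.5}


def reward(x, u):
    """Reward is 1 when the action matches the hidden variable, else 0."""
    return 1.0 if x == u else 0.0


def demonstrator_policy(u):
    """P(X = x | U = u): the demonstrator observes U and always plays X = U."""
    return {u: 1.0, 1 - u: 0.0}


def expected_reward_demonstrator():
    """Exact expected reward of the demonstrator, enumerating U and X."""
    return sum(
        P_U[u] * demonstrator_policy(u)[x] * reward(x, u)
        for u, x in product((0, 1), repeat=2)
    )


def observed_action_marginal():
    """P(X = x) as seen by an imitator who cannot observe U."""
    return {
        x: sum(P_U[u] * demonstrator_policy(u)[x] for u in (0, 1))
        for x in (0, 1)
    }


def expected_reward_imitator(policy):
    """Exact expected reward when X is drawn from `policy` independently of U."""
    return sum(
        P_U[u] * policy[x] * reward(x, u)
        for u, x in product((0, 1), repeat=2)
    )


if __name__ == "__main__":
    naive_policy = observed_action_marginal()  # copy the observed behavior
    print("demonstrator:", expected_reward_demonstrator())
    print("naive imitator:", expected_reward_imitator(naive_policy))
```

In this toy, the imitator reproduces the demonstrator's action frequencies exactly, yet its actions carry no information about `U`, so its expected reward is lower. Whether such a gap can be closed in a given causal structure is the kind of question the graphical criterion described above is meant to answer.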



Causal Imitation Learning with Unobserved Confounders

One of the common ways children learn is by mimicking adults. Imitation ...

Tracking the Race Between Deep Reinforcement Learning and Imitation Learning – Extended Version

Learning-based approaches for solving large sequential decision making p...

Deconfounded Imitation Learning

Standard imitation learning can fail when the expert demonstrators have ...

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

We propose a computationally efficient algorithm that combines compresse...

Learning to Generalize for Sequential Decision Making

We consider problems of making sequences of decisions to accomplish task...

Maximum Causal Tsallis Entropy Imitation Learning

In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) ...

Feedback in Imitation Learning: The Three Regimes of Covariate Shift

Imitation learning practitioners have often noted that conditioning poli...