Robust Asymmetric Learning in POMDPs

12/31/2020
by Andrew Warrington, et al.

Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial information. We derive an objective to instead train the expert to maximize the expected reward of the imitating agent policy, and use it to construct an efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and the agent. We show that A2D produces an expert policy that the agent can safely imitate, in turn outperforming policies learned by imitating a fixed expert.
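
The flaw described above can be illustrated with a minimal sketch. This toy one-step problem is not taken from the paper: the states, actions, reward values, and every function name below are assumptions chosen for illustration. The expert sees a hidden state and always picks the matching risky action. The agent sees nothing, so cloning the expert can at best reproduce the expert's actions averaged over the hidden state.

```python
# Hypothetical toy problem (not from the paper): one step, two hidden states.
STATES = [0, 1]
PRIOR = {0: 0.5, 1: 0.5}          # the agent cannot observe the state
ACTIONS = ["left", "right", "wait"]


def reward(state, action):
    """'wait' is safe; a risky action pays off only if it matches the state."""
    if action == "wait":
        return 0.0
    correct = "left" if state == 0 else "right"
    return 1.0 if action == correct else -10.0


def expert_policy(state):
    """Fully observed expert: always takes the matching risky action."""
    return "left" if state == 0 else "right"


def imitated_policy():
    """Best the blind agent can do by imitation: the expert's action
    distribution marginalized over the hidden state."""
    probs = {a: 0.0 for a in ACTIONS}
    for s in STATES:
        probs[expert_policy(s)] += PRIOR[s]
    return probs


def expected_reward(action_probs):
    """Expected reward of a state-independent (blind) stochastic policy."""
    return sum(
        PRIOR[s] * p * reward(s, a)
        for s in STATES
        for a, p in action_probs.items()
    )


expert_value = sum(PRIOR[s] * reward(s, expert_policy(s)) for s in STATES)
imitator_value = expected_reward(imitated_policy())
best_blind_value = max(
    expected_reward({a: 1.0 if a == b else 0.0 for a in ACTIONS})
    for b in ACTIONS
)

print("expert (sees state):      ", expert_value)
print("agent imitating expert:   ", imitator_value)
print("best policy without state:", best_blind_value)
```

In this toy setting the imitating agent does worse than simply waiting, which is the kind of sub-optimal or unsafe behavior the abstract attributes to imitating a fixed expert. The sketch does not implement A2D itself; it only shows why an expert that ignores what the trainee cannot see can be a poor target for imitation.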


