Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

11/13/2019
by Dipendra Misra, et al.

We present an algorithm, HOMER, for exploration and reinforcement learning in rich-observation environments that are summarizable by an unknown latent state space. The algorithm interleaves representation learning, to identify a new notion of kinematic state abstraction, with strategic exploration that uses the learned abstraction to reach new states. The algorithm provably explores the environment with sample complexity scaling polynomially in the number of latent states and the time horizon, and, crucially, with no dependence on the size of the observation space, which could be infinitely large. This exploration guarantee further enables sample-efficient global policy optimization for any reward function. On the computational side, we show that the algorithm can be implemented efficiently whenever certain supervised learning problems are tractable. Empirically, we evaluate HOMER on a challenging exploration problem, where we show that the algorithm is exponentially more sample efficient than standard reinforcement learning baselines.
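A core component of the approach described above is learning the kinematic state abstraction via a contrastive supervised learning problem: a classifier is trained to distinguish real transitions (x, a, x') from "imposter" transitions in which x' is swapped for an observation drawn from the marginal at the same time step. The sketch below shows only this dataset-construction step, in a hedged form; the function and variable names are ours, not from the paper's implementation, and the downstream classifier and policy training are omitted.

```python
import random


def make_contrastive_dataset(transitions, rng):
    """Build the binary dataset for learning kinematic state abstraction.

    transitions: list of real (x, a, x') tuples collected at one time step.
    Returns labeled examples: label 1 for a real transition, label 0 for
    an imposter where x' is replaced by a next-observation sampled from
    the empirical marginal at the same time step.
    """
    data = []
    # Empirical marginal over next observations at this time step.
    next_obs_pool = [xp for (_, _, xp) in transitions]
    for (x, a, xp) in transitions:
        data.append(((x, a, xp), 1))
        # An imposter may occasionally coincide with the true x';
        # that is expected under this kind of contrastive objective.
        fake_xp = rng.choice(next_obs_pool)
        data.append(((x, a, fake_xp), 0))
    return data
```

A classifier trained on such data must internally represent which observations are kinematically inseparable (i.e., interchangeable with respect to transition dynamics), and its learned representation can then serve as the abstraction that exploration policies target.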

Related research

01/31/2022 · Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
We present BRIEE (Block-structured Representation learning with Interlea...

03/13/2023 · Fast exploration and learning of latent graphs with aliased observations
Consider this scenario: an agent navigates a latent graph by performing ...

12/03/2019 · Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation
In many environments, only a relatively small subset of the complete sta...

08/29/2018 · Approximate Exploration through State Abstraction
Although exploration in reinforcement learning is well understood from a...

03/01/2022 · A Theory of Abstraction in Reinforcement Learning
Reinforcement learning defines the problem facing agents that learn to m...

10/30/2022 · Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction
The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the...

11/20/2019 · Bayesian Curiosity for Efficient Exploration in Reinforcement Learning
Balancing exploration and exploitation is a fundamental part of reinforc...
