Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

01/31/2022, by Xuezhou Zhang, et al.

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states. BRIEE interleaves latent state discovery, exploration, and exploitation, and provably learns a near-optimal policy with sample complexity scaling polynomially in the number of latent states, actions, and the time horizon, with no dependence on the size of the potentially infinite observation space. Empirically, we show that BRIEE is more sample efficient than HOMER, the state-of-the-art Block MDP algorithm, and other empirical RL baselines on challenging rich-observation combination-lock problems that require deep exploration.
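The abstract describes the Block MDP setting only in words. The sketch below is a minimal, self-contained illustration of that structure, assuming a toy tabular latent space, Gaussian emissions, and a random interaction policy; the class ToyBlockMDP, its dimensions, and all names are illustrative assumptions, not the environments or the BRIEE algorithm from the paper.

```python
import numpy as np


class ToyBlockMDP:
    """Tiny Block MDP sketch: tabular latent dynamics, high-dimensional observations."""

    def __init__(self, num_states=3, num_actions=2, obs_dim=32, horizon=5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.horizon = horizon
        self.num_actions = num_actions
        self.num_states = num_states
        # Latent transition kernel: P[s, a] is a distribution over next latent states.
        self.P = self.rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))
        # Reward depends only on the latent state and action.
        self.R = self.rng.uniform(size=(num_states, num_actions))
        # Each latent state emits observations clustered around its own mean,
        # standing in for the disjoint emission blocks of a Block MDP.
        self.emission_means = 3.0 * self.rng.normal(size=(num_states, obs_dim))
        self.state = 0
        self.t = 0

    def _emit(self):
        # The agent never sees self.state, only this noisy rich observation.
        return self.emission_means[self.state] + self.rng.normal(size=self.emission_means.shape[1])

    def reset(self):
        self.state, self.t = 0, 0
        return self._emit()

    def step(self, action):
        reward = self.R[self.state, action]
        self.state = self.rng.choice(self.num_states, p=self.P[self.state, action])
        self.t += 1
        return self._emit(), reward, self.t >= self.horizon


# Random-policy rollout, just to show the interaction protocol.
env = ToyBlockMDP()
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = env.rng.integers(env.num_actions)
    obs, reward, done = env.step(action)
    total_reward += reward
print(f"episode return under a random policy: {total_reward:.2f}")
```

The point of the sketch is that the learner only ever receives the output of _emit(): an algorithm such as BRIEE must recover the small latent state space from those observations while simultaneously exploring and exploiting, which is why its sample complexity can depend on the number of latent states rather than the size of the observation space.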

Related research

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning (11/13/2019)
Reinforcement Learning in Rich-Observation MDPs using Spectral Methods (11/11/2016)
Provably Efficient Exploration for RL with Unsupervised Learning (03/15/2020)
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics (10/17/2021)
Nearly Optimal Latent State Decoding in Block MDPs (08/17/2022)
Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles (09/18/2023)
Provably efficient RL with Rich Observations via Latent State Decoding (01/25/2019)
