On Polynomial Time PAC Reinforcement Learning with Rich Observations

03/01/2018
by Christoph Dann, et al.

We study the computational tractability of provably sample-efficient (PAC) reinforcement learning in episodic environments with high-dimensional observations. We present new sample-efficient algorithms for environments with deterministic hidden-state dynamics but stochastic rich observations. These methods are computationally efficient alternatives to prior algorithms that rely on enumerating exponentially many functions. We show that the only known statistically efficient algorithm for the more general stochastic-transition setting requires NP-hard computation that cannot be implemented via standard optimization primitives. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.
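
To make the setting concrete, below is a minimal toy sketch of an environment of this type: a combination-lock style episodic MDP whose hidden state evolves deterministically under the agent's actions, while the agent only ever sees a stochastic, high-dimensional observation emitted from the hidden state. The class name, the embedding scheme, and the specific dynamics are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

class DeterministicLatentEnv:
    """Toy episodic environment with deterministic hidden-state dynamics
    and stochastic rich observations. Illustrative only; the names and
    dynamics here are assumptions, not taken from the paper."""

    def __init__(self, horizon=5, n_actions=2, obs_dim=32, noise=0.1, seed=0):
        self.horizon = horizon
        self.n_actions = n_actions
        self.obs_dim = obs_dim
        self.noise = noise
        self.rng = np.random.default_rng(seed)
        # Fixed embedding per (level, latent state): the agent sees a noisy
        # high-dimensional emission of the hidden state, never the state itself.
        self.embeddings = {}

    def _emit(self, h, s):
        key = (h, s)
        if key not in self.embeddings:
            self.embeddings[key] = self.rng.standard_normal(self.obs_dim)
        # Stochastic rich observation: fixed embedding plus Gaussian noise.
        return self.embeddings[key] + self.noise * self.rng.standard_normal(self.obs_dim)

    def reset(self):
        self.h, self.s = 0, 0
        return self._emit(self.h, self.s)

    def step(self, action):
        assert 0 <= action < self.n_actions
        # Deterministic hidden-state transition: the next latent state is a
        # fixed function of (current latent state, action).
        self.s = (2 * self.s + action) % (2 ** self.horizon)
        self.h += 1
        # Reward only at the end of the episode, and only for the one latent
        # state reached by playing action 1 at every step (the "combination").
        reward = 1.0 if (self.h == self.horizon and self.s == 2 ** self.horizon - 1) else 0.0
        done = self.h == self.horizon
        return self._emit(self.h, self.s), reward, done

if __name__ == "__main__":
    env = DeterministicLatentEnv(seed=1)
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done = env.step(1)  # the rewarded combination is all 1s
    print(reward)  # 1.0
```

In this toy instance only one of the 2^horizon action sequences is rewarded, so a learner that cannot decode the latent state from the noisy observations is reduced to exponential search; this is the flavor of difficulty that motivates algorithms avoiding enumeration over exponentially many functions.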


Related research

06/22/2020 · Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
Partial observability is a common challenge in many reinforcement learni...

02/08/2016 · PAC Reinforcement Learning with Rich Observations
We propose and study a new model for reinforcement learning with rich ob...

06/29/2020 · Extracting Latent State Representations with Linear Dynamics from Rich Observations
Recently, many reinforcement learning techniques were shown to have prov...

12/09/2019 · Optimism in Reinforcement Learning with Generalized Linear Function Approximation
We design a new provably efficient algorithm for episodic reinforcement ...

05/24/2023 · Replicable Reinforcement Learning
The replicability crisis in the social, behavioral, and data sciences ha...

04/12/2023 · Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL
We study the design of sample-efficient algorithms for reinforcement lea...

02/07/2020 · Causally Correct Partial Models for Reinforcement Learning
In reinforcement learning, we can learn a model of future observations a...
