A PAC RL Algorithm for Episodic POMDPs

05/25/2016
by Zhaohan Daniel Guo, et al.

Many interesting real-world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on which the algorithm may not achieve near-optimal performance. Our algorithm is suitable for an important class of episodic POMDPs. Our approach builds on recent advances in the method of moments for latent variable model estimation.


research
07/12/2022

PAC Reinforcement Learning for Predictive State Representations

In this paper we study online Reinforcement Learning (RL) in partially o...
research
03/03/2023

POPGym: Benchmarking Partially Observable Reinforcement Learning

Real world applications of Reinforcement Learning (RL) are often partial...
research
01/09/2017

Reinforcement Learning via Recurrent Convolutional Neural Networks

Deep Reinforcement Learning has enabled the learning of policies for com...
research
12/10/2021

Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning

This paper proposes a new sequential model learning architecture to solv...
research
04/17/2018

On Improving Deep Reinforcement Learning for POMDPs

Deep Reinforcement Learning (RL) recently emerged as one of the most com...
research
05/23/2018

When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms

Efficient exploration is one of the key challenges for reinforcement lea...
research
12/23/2019

Variational Recurrent Models for Solving Partially Observable Control Tasks

In partially observable (PO) environments, deep reinforcement learning (...
