Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

06/24/2022
by Masatoshi Uehara, et al.

We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG) systems, and Predictive State Representations (PSRs), as well as a newly introduced model, Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transition. Under this framework, we propose an actor-critic style algorithm that is capable of agnostic policy learning. Given a policy class of memory-based policies (policies that look at a fixed-length window of recent observations) and a value function class of functions that take both memory and future observations as inputs, our algorithm learns to compete against the best memory-based policy in the given policy class. For certain examples, such as undercomplete observable tabular POMDPs, observable LQGs, and observable POMDPs with latent low-rank transition, our algorithm implicitly leverages their special properties and can even compete against the globally optimal policy without paying an exponential dependence on the horizon in its sample complexity.
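To make the notion of a memory-based policy concrete, here is a minimal sketch of a policy that acts only on a fixed-length window of its most recent observations. It assumes discrete observations encoded as integers and follows the abstract's description (a window of recent observations only); the names `MemoryPolicy`, `window`, and `rule` are hypothetical and not taken from the paper, and the sketch illustrates the policy class only, not the actor-critic learning algorithm.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# Assumption: observations are discrete and encoded as integers.
Obs = int
Action = int


class MemoryPolicy:
    """Hypothetical memory-based policy: acts on the last `window` observations."""

    def __init__(self, window: int, rule: Callable[[Tuple[Obs, ...]], Action]):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self.rule = rule  # maps the current memory (a tuple) to an action
        self._memory: Deque[Obs] = deque(maxlen=window)

    def reset(self) -> None:
        # Start of a new episode: forget all past observations.
        self._memory.clear()

    def act(self, obs: Obs) -> Action:
        # Append the newest observation; once full, the deque drops the oldest.
        self._memory.append(obs)
        return self.rule(tuple(self._memory))


if __name__ == "__main__":
    # Toy rule (hypothetical): choose action 1 if at least two of the
    # remembered observations equal 1, otherwise choose action 0.
    policy = MemoryPolicy(window=3, rule=lambda m: 1 if sum(m) >= 2 else 0)
    actions = [policy.act(o) for o in [1, 0, 1, 1, 0, 0]]
    print(actions)  # [0, 0, 1, 1, 1, 0]
```

Because the policy never sees more than `window` observations, a policy class of this form can be searched without reasoning over the full, growing history; the trade-off is that it can only compete with the best policy in that restricted class unless the underlying model has extra structure.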

research
07/12/2022

PAC Reinforcement Learning for Predictive State Representations

In this paper we study online Reinforcement Learning (RL) in partially o...
research
07/04/2012

A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies

We consider the estimation of the policy gradient in partially observabl...
research
11/23/2019

Combined Model for Partially-Observable and Non-Observable Task Switching: Solving Hierarchical Reinforcement Learning Problems

An integral function of fully autonomous robots and humans is the abilit...
research
12/01/2013

Efficient Learning and Planning with Compressed Predictive States

Predictive state representations (PSRs) offer an expressive framework fo...
research
05/29/2018

The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making

Off-policy reinforcement learning enables near-optimal policy from subop...
research
06/24/2022

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

We study reinforcement learning with function approximation for large-sc...
research
02/20/2022

Learning to Control Partially Observed Systems with Finite Memory

We consider the reinforcement learning problem for partially observed Ma...