Learning to Control Partially Observed Systems with Finite Memory

02/20/2022
by   Semih Cayci, et al.

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access only to noisy observations of the underlying controlled Markov chain. We study a natural actor-critic method that employs a finite internal memory for policy parameterization and a multi-step temporal difference learning algorithm for policy evaluation. We establish, to the best of our knowledge, the first non-asymptotic global convergence result for actor-critic methods in partially observed systems under function approximation. In particular, in addition to the function approximation and statistical errors that also arise in MDPs, we explicitly characterize the error due to the use of finite-state controllers. This additional error is stated in terms of the total variation distance between the traditional belief state in POMDPs and the posterior distribution of the hidden state under a finite-state controller. Further, we show that for sliding-block controllers this error can be made small by using larger block sizes.
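The finite-memory actor-critic idea can be illustrated with a minimal, self-contained sketch. Everything below is an illustrative assumption rather than the paper's actual setup: a toy two-hidden-state POMDP with a "guess the hidden state" reward, a sliding block of the last m observations indexing a tabular softmax actor, and a critic trained with n-step temporal difference updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-hidden-state POMDP (all parameters are illustrative assumptions).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.1, 0.9], [0.9, 0.1]]])  # P[s, a] = next-hidden-state distribution
EPS = 0.1      # observation noise: the emitted bit flips with probability EPS
m = 3          # sliding-block size: the controller remembers the last m observations
GAMMA = 0.95
N_STEP = 4     # multi-step TD horizon for the critic
T = 20         # rollout length per episode

theta = np.zeros((2 ** m, 2))  # actor: softmax logits, one row per memory state
V = np.zeros(2 ** m)           # critic: one value estimate per memory state

def mem_idx(window):
    """Encode the last m binary observations as an integer index."""
    return int("".join(map(str, window)), 2)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def observe(s):
    return s ^ int(rng.random() < EPS)

for _ in range(3000):
    s = int(rng.integers(2))
    window = [observe(s)] * m          # crude initialization of the memory block
    mems, acts, rews = [], [], []
    for _ in range(T):
        z = mem_idx(window)
        a = int(rng.choice(2, p=softmax(theta[z])))
        r = 1.0 if a == s else 0.0     # toy reward: action matches hidden state
        mems.append(z); acts.append(a); rews.append(r)
        s = int(rng.choice(2, p=P[s, a]))
        window = window[1:] + [observe(s)]
    mems.append(mem_idx(window))       # final memory state, used for bootstrapping
    for t in range(T):
        # n-step return, bootstrapped from the critic's current estimate
        n = min(N_STEP, T - t)
        G = sum(GAMMA ** k * rews[t + k] for k in range(n))
        G += GAMMA ** n * V[mems[t + n]]
        delta = G - V[mems[t]]
        V[mems[t]] += 0.05 * delta                 # critic: multi-step TD update
        grad = -softmax(theta[mems[t]])
        grad[acts[t]] += 1.0                       # grad of log softmax policy
        theta[mems[t]] += 0.05 * delta * grad      # actor: policy-gradient step

# Evaluate the learned finite-memory policy greedily.
s = int(rng.integers(2)); window = [observe(s)] * m; total = 0.0
for _ in range(2000):
    a = int(np.argmax(theta[mem_idx(window)]))
    total += 1.0 if a == s else 0.0
    s = int(rng.choice(2, p=P[s, a]))
    window = window[1:] + [observe(s)]
avg_reward = total / 2000
```

With observation noise of 0.1, a controller that tracks recent observations can earn well above the 0.5 average reward of a uniformly random policy; enlarging m corresponds to the larger block sizes discussed in the abstract.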


Related research:
- 07/04/2012 · A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies
- 10/18/2019 · On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation
- 07/19/2022 · Actor-Critic based Improper Reinforcement Learning
- 06/24/2022 · Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems
- 10/15/2020 · Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes
- 07/06/2019 · Entropic Regularization of Markov Decision Processes
- 01/03/2022 · Monitoring and Anomaly Detection Actor-Critic Based Controlled Sensing
