Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes

10/15/2020
by   Ali Devran Kara, et al.

In the theory of Partially Observed Markov Decision Processes (POMDPs), the existence of optimal policies has in general been established by converting the original partially observed stochastic control problem into a fully observed one on the belief space, leading to a belief-MDP. However, computing an optimal policy for this fully observed model, and hence for the original POMDP, using classical dynamic or linear programming methods is challenging even if the original system has finite state and action spaces, since the state space of the fully observed belief-MDP model is always uncountable. Furthermore, very few rigorous approximation results exist, since the required regularity conditions often entail a tedious study involving spaces of probability measures, leading to properties such as Feller continuity. In this paper, we rigorously establish near optimality of finite window control policies in POMDPs under mild non-linear filter stability conditions and the assumption that the measurement and action sets are finite (and the state space is real vector valued). We also establish a rate of convergence result relating the finite window memory size to the approximation error bound, where the rate of convergence is exponential under explicit and testable geometric filter stability conditions. While many experimental results and a few rigorous asymptotic convergence results exist, an explicit rate of convergence result is, to our knowledge, new in the literature.
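A finite window policy of the kind described above can be thought of as a lookup from the last N measurements and actions to an action, bypassing the uncountable belief space entirely. The sketch below is a minimal illustration of that idea, not the paper's construction; the class name, the table-based representation, and the default-action fallback are all assumptions made for the example.

```python
from collections import deque


class FiniteWindowPolicy:
    """Toy finite-memory feedback policy for a POMDP.

    Instead of tracking the full belief state, the policy conditions
    only on a sliding window of the last `window_size` observations
    and the actions taken alongside them. Windows not present in the
    policy table fall back to a default action. This is an illustrative
    sketch, not the paper's construction.
    """

    def __init__(self, window_size, default_action):
        self.obs_window = deque(maxlen=window_size)   # recent measurements
        self.act_window = deque(maxlen=window_size)   # recent actions
        self.table = {}  # (observations, actions) -> action
        self.default_action = default_action

    def act(self, observation):
        """Record the new observation and return the policy's action."""
        self.obs_window.append(observation)
        key = (tuple(self.obs_window), tuple(self.act_window))
        action = self.table.get(key, self.default_action)
        self.act_window.append(action)  # the next window sees this action
        return action
```

Since the measurement and action sets are finite, the table has finitely many keys for each window size N; the paper's result bounds how far the best such policy is from the optimal belief-based policy, with the bound shrinking (exponentially, under geometric filter stability) as N grows.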


