Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity

11/12/2021
by Ali Devran Kara, et al.

Reinforcement learning algorithms often require finiteness of the state and action spaces of Markov decision processes (MDPs), and various efforts have been made in the literature to extend such algorithms to continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions converges to a limit, and furthermore this limit satisfies an optimality equation which leads to near optimality, either with explicit performance bounds or with guaranteed asymptotic optimality. Our approach builds on (i) viewing quantization as a measurement kernel, and thus a quantized MDP as a POMDP, (ii) utilizing near-optimality and convergence results of Q-learning for POMDPs, and (iii) near-optimality of finite state model approximations for MDPs with weakly continuous kernels, which we show to correspond to the fixed point of the constructed POMDP. Thus, our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
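To make the quantization idea concrete, here is a minimal sketch of Q-learning run over uniformly quantized states and actions of a continuous MDP. Everything specific in it is an illustrative assumption rather than the paper's construction: the hypothetical 1-D dynamics and cost in `step`, the bin counts `N_X` and `N_U`, the exploration rate, and the 1/n step-size schedule. The paper's analysis concerns general standard Borel models and treats the quantizer as a measurement kernel of a POMDP; the code below only illustrates the resulting finite-space learning loop.

```python
import numpy as np

# Hypothetical 1-D MDP on the state space [0, 1] with actions in [-1, 1].
# The dynamics and cost below are illustrative stand-ins, not from the paper.
def step(x, u, rng):
    x_next = np.clip(0.9 * x + 0.1 * u + 0.05 * rng.standard_normal(), 0.0, 1.0)
    cost = (x - 0.5) ** 2 + 0.01 * u ** 2
    return x_next, cost

N_X, N_U = 20, 5                      # state / action bin counts (assumed)
GAMMA = 0.95                          # discount factor
x_bins = np.linspace(0.0, 1.0, N_X)   # representative quantized states
u_bins = np.linspace(-1.0, 1.0, N_U)  # representative quantized actions

def quantize(x):
    # The quantizer acts like a measurement kernel: the learner only sees
    # the index of the nearest representative state, never x itself.
    return int(np.argmin(np.abs(x_bins - x)))

Q = np.zeros((N_X, N_U))
visits = np.zeros((N_X, N_U))         # visit counts for decreasing step sizes
rng = np.random.default_rng(0)

x = 0.5
for t in range(200_000):
    i = quantize(x)
    # epsilon-greedy exploration over the finite quantized action set
    j = int(rng.integers(N_U)) if rng.random() < 0.1 else int(np.argmin(Q[i]))
    x_next, c = step(x, u_bins[j], rng)
    visits[i, j] += 1
    alpha = 1.0 / visits[i, j]        # step size alpha_t = 1/n (assumed schedule)
    # Standard finite-space Q-learning update on the quantized model
    # (cost minimization, hence min over actions).
    Q[i, j] += alpha * (c + GAMMA * np.min(Q[quantize(x_next)]) - Q[i, j])
    x = x_next

# Greedy policy extracted from the learned Q-values on the bins
policy = u_bins[np.argmin(Q, axis=1)]
```

Once the states and actions are quantized, the update is ordinary finite-space Q-learning; the paper's contribution is showing that, under weak continuity of the transition kernel, this iteration converges and its limit yields a near-optimal policy for the original continuous model, with the approximation error controlled by the quantization.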


Related research

10/15/2020 · Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes
In the theory of Partially Observed Markov Decision Processes (POMDPs), ...

03/22/2021 · Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability
In this paper, for POMDPs, we provide the convergence of a Q learning al...

09/19/2017 · Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
In this paper, a sparse Markov decision process (MDP) with novel causal ...

12/18/2020 · Voronoi Progressive Widening: Efficient Online Solvers for Continuous Space MDPs and POMDPs with Provably Optimal Components
Markov decision processes (MDPs) and partially observable MDPs (POMDPs) ...

05/20/2020 · MDPs with Unawareness in Robotics
We formalize decision-making problems in robotics and automated control ...

09/30/2022 · On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs
We show two average-reward off-policy control algorithms, Differential Q...

10/10/2022 · Generalized Optimality Guarantees for Solving Continuous Observation POMDPs through Particle Belief MDP Approximation
Partially observable Markov decision processes (POMDPs) provide a flexib...
