Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

06/05/2023
by Saghar Adler, et al.
Models of many real-life applications, such as queuing models of communication networks or computing systems, have a countably infinite state-space. Algorithmic and learning procedures that have been developed to produce optimal policies mainly focus on finite state settings, and do not directly apply to these models. To overcome this lacuna, in this work we study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes (MDPs) governed by an unknown parameter θ∈Θ, and defined on a countably-infinite state space 𝒳=ℤ_+^d, with finite action space 𝒜, and an unbounded cost function. We take a Bayesian perspective with the random unknown parameter θ^* generated via a given fixed prior distribution on Θ. To optimally control the unknown MDP, we propose an algorithm based on Thompson sampling with dynamically-sized episodes: at the beginning of each episode, the posterior distribution formed via Bayes' rule is used to produce a parameter estimate, which then decides the policy applied during the episode. To ensure the stability of the Markov chain obtained by following the policy chosen for each parameter, we impose ergodicity assumptions. From this condition and using the solution of the average cost Bellman equation, we establish an Õ(√(|𝒜|T)) upper bound on the Bayesian regret of our algorithm, where T is the time-horizon. Finally, to elucidate the applicability of our algorithm, we consider two different queuing models with unknown dynamics, and show that our algorithm can be applied to develop approximately optimal control algorithms.
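The episode structure described above — sample a parameter from the posterior at the start of each episode, follow that parameter's policy for the episode, and update the posterior from observed transitions — can be sketched on a toy discrete-time queue. Everything below is a hypothetical illustration, not the paper's actual construction: the two-point parameter set, the arrival/service probabilities, the threshold policies, and the doubling episode schedule are all assumptions made for the sketch.

```python
import random

# Hypothetical parameter family Theta = {0, 1}: candidate arrival probabilities.
ARRIVAL = {0: 0.3, 1: 0.7}
# Known service probabilities per action (0: slow server, 1: fast server).
SERVICE = {0: 0.4, 1: 0.8}

def trans_prob(theta, x, a, x_next):
    """P(x_next | x, a) under parameter theta for a single-server queue."""
    p, q = ARRIVAL[theta], SERVICE[a]
    up = p * (1 - q)                       # arrival, no departure
    down = q * (1 - p) if x > 0 else 0.0   # departure, no arrival
    stay = 1.0 - up - down
    if x_next == x + 1:
        return up
    if x_next == x - 1:
        return down
    return stay if x_next == x else 0.0

def step(theta, x, a, rng):
    """Sample the next queue length under the true parameter."""
    r = rng.random()
    if r < trans_prob(theta, x, a, x + 1):
        return x + 1
    if x > 0 and r < trans_prob(theta, x, a, x + 1) + trans_prob(theta, x, a, x - 1):
        return x - 1
    return x

def policy(theta, x):
    # Hypothetical per-parameter policy: serve fast above a threshold,
    # with a lower threshold when the sampled arrival rate is high.
    return 1 if x >= (1 if theta == 1 else 3) else 0

def thompson_sampling(T=4000, seed=0):
    rng = random.Random(seed)
    prior = {0: 0.5, 1: 0.5}
    theta_star = 1 if rng.random() < prior[1] else 0   # true parameter ~ prior
    posterior = dict(prior)
    x, t, ep_len = 0, 0, 1
    while t < T:
        # Start of episode: sample a parameter from the posterior and
        # commit to its policy for the whole episode.
        theta_hat = 1 if rng.random() < posterior[1] else 0
        for _ in range(ep_len):
            if t >= T:
                break
            a = policy(theta_hat, x)
            x_next = step(theta_star, x, a, rng)
            # Exact Bayes update from the observed transition.
            w = {th: posterior[th] * trans_prob(th, x, a, x_next)
                 for th in posterior}
            z = sum(w.values())
            if z > 0:
                posterior = {th: w[th] / z for th in w}
            x, t = x_next, t + 1
        ep_len *= 2   # dynamically sized (here: doubling) episodes
    return theta_star, posterior
```

Running the sketch, the posterior mass on the true parameter should concentrate near 1 as transitions accumulate, which is the mechanism behind the regret analysis: once the posterior concentrates, the sampled policy matches the optimal one for the true MDP on most episodes.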
