Models and algorithms for skip-free Markov decision processes on trees

We introduce a class of models for multidimensional control problems which we call skip-free Markov decision processes on trees. We describe and analyse an algorithm applicable to Markov decision processes of this type that are skip-free in the negative direction. Starting with the finite average cost case, we show that the algorithm combines the advantages of both value iteration and policy iteration -- it is guaranteed to converge to an optimal policy and optimal value function after a finite number of iterations but the computational effort required for each iteration step is comparable with that for value iteration. We show that the algorithm can also be used to solve discounted cost models and continuous time models, and that a suitably modified algorithm can be used to solve communicating models.


1 Introduction

Markov decision processes (MDPs) provide a class of stochastic optimisation models that have found wide applicability to problems in Operational Research. The standard methods for computing an optimal policy are based on value iteration, policy iteration and linear programming algorithms [Whi93]. Each approach has its advantages and disadvantages. In particular, each step in value iteration is relatively computationally inexpensive, but the value function may take some time to converge and the algorithm provides no direct check that it has computed the optimal value function and an optimal policy. Conversely, each step in policy iteration may be computationally expensive, but the algorithm can be proved to converge in a finite number of steps, confirms when it has converged and automatically identifies the optimal value function and an optimal policy on exit.

Here we focus on models with special structure, in that they are skip-free in the negative direction [Kei65, p.10] or skip-free to the left [StWe89]; i.e. whatever the action taken, the process cannot pass from one state to a ‘lower’ state without passing through all the intervening states. Such skip-free models arise naturally in many areas where OR is applied. The most obvious examples are the control of discrete time random walks and continuous time birth and death processes [Ser81] such as queueing control problems with single unit arrivals and departures (see, for example, StWe89 and references therein). In these basic one-dimensional models, the state space is (a subset of) the integer lattice and transitions are only possible to the next higher or lower integer state. However there are several other standard OR models that fall within the wider one-dimensional skip-free framework, including examples from the areas of queueing control with batch arrivals [StWe89], inventory control [Mil81] and reliability and maintenance [Der70, Tho82].

Previous treatments of controlled skip-free processes have considered only the one-dimensional formulation. For processes with the ‘skip-free to the left’ property, work has focused on qualitative properties, in particular the existence of monotone optimal policies for models with appropriately structured cost functions [StWe89, StWe99]. Conversely, work on processes with the corresponding ‘skip-free to the right’ property has concentrated on analysis of an approximating bisection method for countable state space models [WiSt86, WiSt00]. We note that skip-free type ideas have also been exploited in a different direction by [Whi05] and citing authors, where the emphasis has been on reducing the computational complexity associated with policy iteration for quasi birth-death processes.

An intuitive way of characterising the essential features of our finite skip-free recurrent model is that the model is skip-free if and only if the state space can be identified with the graph of a finite tree, rooted at 0, with each state corresponding to a unique node in the tree, and such that for every action a, the only possible transitions from a state i under action a are either to its ‘parent’ state or to a state in the subtree rooted at i, with appropriate modifications for the root state 0, which has no parent, and for terminal nodes, which have only a parent and no descendants.

In this setting, the one-dimensional skip-free model above, with state space {0, 1, …, M}, corresponds to the simplest case where the tree reduces to a single linearly ordered branch connecting the root node 0 through states 1, …, M−1 to the terminal node M, and transitions from state i are possible only to states i−1, i, i+1, …, M. However, the analysis extends easily to cases with a richer, possibly multidimensional, state space, where the appropriate model is in terms of transitions on a finite tree. Examples of genuinely skip-free models with multidimensional state spaces arise in simple multi-class queueing systems with batch arrivals [YeSe94, He00, and references therein], but such treatments have focused mainly on describing the behaviour of the process for a fixed set of parameters (actions) rather than comparing actions in an optimality framework.

The rest of the paper is organized as follows. We start by describing models for average cost finite state recurrent MDPs that are skip-free in the negative direction, illustrating our approach with a motivating example. We then propose a skip-free algorithm that combines the advantages of value iteration and policy iteration: the computational effort required for each iteration step is comparable with that for value iteration, but the algorithm is guaranteed to converge after a finite number of iterations and automatically identifies the optimal value function and an optimal policy on exit. We go on to show that the algorithm can also be used to solve discounted cost models and continuous time models, and that a suitably modified algorithm can be used to solve communicating models. Finally, we build on the relationship between the average cost problem and a corresponding x-revised first passage problem to provide a proof of the main theorem and identify other possible variants of the algorithm.

2 The skip-free MDP model

Consider a discrete time Markov decision process (MDP) with finite state space S over an infinite time horizon t = 0, 1, 2, …. Associated with each state is a non-empty finite set of possible actions; since S is finite, we assume without loss of generality that the set A of actions is the same for each state. If action a is chosen when the process is in state i at time t, then the process incurs an immediate cost c_i(a) and the next state is j with probability p_{ij}(a).

A policy is a sequence of (possibly history dependent and randomised) rules for choosing the action at each given time point t. A deterministic decision rule corresponds to a function δ : S → A and specifies taking action δ(i) when the process is in state i. A stationary deterministic policy is one which always uses the same deterministic decision rule δ at each time point. Where the meaning is clear from the context, we use the same notation δ for both the decision rule and the corresponding stationary deterministic policy.

The expected average cost incurred by a policy π with initial state i is given by

 g_π(i) = lim sup_{T→∞} (1/T) E_π[ ∑_{t=0}^{T−1} c_{X_t}(a_t) | X_0 = i ],

where X_t is the state at time t and a_t is the action chosen at time t under π. Similarly, for a given discount factor β ∈ (0, 1), the total expected discounted cost incurred by a policy π with initial state i is given by

 v_π(i) = E_π[ ∑_{t=0}^{∞} β^t c_{X_t}(a_t) | X_0 = i ].

We say an MDP model is recurrent if the transition matrix P_δ corresponding to every stationary deterministic policy δ consists of a single recurrent class. We say an MDP model is communicating if, for every pair of states i and j in S, j is reachable from i under some (stationary deterministic) policy; i.e. there exists a policy δ, with corresponding transition matrix P_δ, and an integer n ≥ 1, such that (P_δ^n)_{ij} > 0.

When S is a subset of the integer lattice, we say the MDP model is skip-free in the negative direction [Kei65, StWe89] if p_{ij}(a) = 0 for all j < i − 1 and a ∈ A, i.e. the process cannot move from state i to a state with index less than i − 1 without passing through all the intermediate states. We will often find it easier to work in terms of the upper tail probabilities p̄_{ik}(a) = ∑_{j≥k} p_{ij}(a). To avoid degeneracy, we assume that each state is reachable from state 0 and that for each i ≥ 1, p_{i,i−1}(a) > 0 for at least one a. In this setting, a recurrent model requires that, for all i, p_{i,i−1}(a) > 0 for i ≥ 1 and p̄_{i,i+1}(a) > 0 for i < M, for all a ∈ A. In contrast a communicating model allows there to be i and a with p_{i,i−1}(a) = 0 and/or p̄_{i,i+1}(a) = 0.

To apply this idea in a wider context, we note that the essence of a skip-free model is that:  (i) there is a single distinguished state, say 0;  (ii) for any other state i there is a unique shortest path from i to 0;  (iii) from each state i the process can only make transitions to either the adjacent state in the unique path from i to 0, or to some state j for which i lies in the unique shortest path from j to 0.

In the finite one dimensional case, for each l ≥ 1 there is exactly one state for which the shortest path to state 0 has length l. Thus there is a mapping of the states to the integers such that the distinguished state maps to 0 and the state for which the shortest path has length l maps to l. In a more general setting, for each l there may be more than one state for which the shortest path has length l. In this case, rather than mapping to the integer lattice, there is a fixed tree (in the graph theoretic sense) such that each state corresponds to a unique node of the tree, with the distinguished state mapping to the root node. It may help to visualise movement between states in terms of the corresponding movement between nodes on the tree.

To formalise this general model, we start by considering a finite rooted tree T with a given edge set, with root node 0. The tree structure implies that for each pair of nodes i and j there is a unique minimal path (set of edges) in the tree that connects i and j. Thus the nodes in the tree can be partitioned into level sets L_0 = {0}, L_1, …, L_M such that, for l ≥ 1, a node i lies in L_l if and only if the minimal path from i to 0 contains exactly l edges.

For adjacent nodes i and j, we say i is the parent of j and j is a child of i if the minimal path from j to 0 passes through i. More generally, for i ≠ j, we say j is a descendant of i if the minimal path from j to 0 passes through i. Each node i ≠ 0 has a unique parent. We write ρ(i) for the parent of i, we write D(i) for the set of descendants of i, and we write T(i) for (the nodes of the) sub-tree rooted at i, so T(i) = {i} ∪ D(i). A state with no descendants is said to be a terminal state, so all states in the highest level L_M are terminal states. For simplicity of presentation we will assume that these are the only terminal states; the analysis easily extends to cases where intermediate levels can also contain some terminal states. For each i, we write Π(i) for the set of states following 0 in the unique minimal path in the tree connecting 0 to i, so if the path passes through r intermediate states and takes the form 0 → i_1 → ⋯ → i_r → i, then Π(i) = {i_1, …, i_r, i}.

Now consider a finite MDP with state space S and action space A. Assume we can construct a rooted tree T such that (i) the states in S correspond to the nodes of T, and (ii) for every state i and action a, the only possible transitions from state i under action a are either to its parent state ρ(i) or to a state in the subtree T(i) rooted at i, with appropriate modifications for the root state 0, which has no parent, and for terminal nodes, which have only a parent and no descendants. We will say that such an MDP is skip-free (in the negative direction) on the tree T. As with the integer lattice model above, it is often convenient to work in terms of the upper tail probabilities p̄_{ik}(a) = ∑_{j∈T(k)} p_{ij}(a), corresponding to the probability that the next transition from state i under action a is to a state in the subtree rooted at k.
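To make the tree bookkeeping concrete, the following Python sketch shows one possible representation of the structures above (a hypothetical choice of data structure on our part; the paper does not prescribe one): `parent` encodes the parent map ρ, `build_tree` recovers children lists and the level of each node, and `descendants` computes D(i).

```python
def build_tree(parent):
    """Given parent[i] for nodes 1..M (parent[0] is None for the root),
    return children lists and the level of each node."""
    m = len(parent)
    children = [[] for _ in range(m)]
    level = [0] * m
    for i in range(1, m):
        children[parent[i]].append(i)
    # levels: breadth-first from the root
    queue = [0]
    while queue:
        nxt = []
        for i in queue:
            for ch in children[i]:
                level[ch] = level[i] + 1
                nxt.append(ch)
        queue = nxt
    return children, level

def descendants(children, i):
    """All states in the sub-tree rooted at i, excluding i itself, i.e. D(i)."""
    out, stack = [], list(children[i])
    while stack:
        j = stack.pop()
        out.append(j)
        stack.extend(children[j])
    return out
```

For the linear one-dimensional chain, `parent = [None, 0, 1, …, M−1]` and `descendants` simply returns the states above i; for a genuine tree the same code gives the sub-tree T(i) ∖ {i}.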

To illustrate and motivate the general case, where a multidimensional model is required, consider ([He00, YeSe94]) a single-server multi-class queueing system with J customer classes and finite capacity N (including the job, if any, in service). Assume the service discipline is pre-emptive but otherwise takes no account of class. A job that arrives when the system is not full enters service immediately and the job currently in service at that point returns to the head of the buffer. When a job completes service, the server next serves the job at the head of the buffer. Any job that arrives when the system is full is lost.

The model is most naturally formulated in continuous time, with exponential inter-arrival and service time distributions, though it can easily be translated to a discrete time setting using the methods of Section 4.2. Assume class j jobs arrive at rate λ_j and complete service at class and action dependent rate μ_j(a), where different actions a ∈ A correspond to different service levels. Since the model needs to keep track of the class of each job as it enters service, we take the state to be the multidimensional vector s = (s_1, …, s_N), where s_1 denotes the class of the job currently in service, s_n for n ≥ 2 denotes the class of the job waiting for service in the buffer in place n − 1, and s_n = 0 if the nth place is empty. Assume costs are incurred at rate c_s(a), reflecting both holding costs and action costs.

The possible transitions under the model are the completion of the job currently in service, corresponding to the transition (s_1, s_2, …, s_N) → (s_2, …, s_N, 0), or the arrival of a class j job (j = 1, …, J) to a partially full system, corresponding to the transition (s_1, …, s_m, 0, …, 0) → (j, s_1, …, s_m, 0, …, 0).

This model cannot be represented as a skip-free MDP with linear structure, i.e. with each state having exactly one child. To see this, let u denote a state with all N places occupied, say u = (1, s_2, …, s_N), let v denote the state (2, s_2, …, s_N), differing from u in only the first component, and let w denote the state (s_2, …, s_N, 0). The only possible direct transitions to and from u are from and to w (an arrival of a class 1 job and a service completion, respectively), and similarly for v. If each state is restricted to having just one child, then the only possibilities are either (i) u has no parent (so u is the root state) and w is the unique child of u, or (ii) v has no parent (so v is the root state) and w is the unique child of v. In case (i), v can have no children, so none of the other states can reach the root state, as they cannot reach it in a skip-free manner under any policy; in case (ii) u can have no children and a similar argument applies.

However we can represent the model as a skip-free MDP on a tree as follows. We take L_0 to contain the state corresponding to the empty queue and take the level sets L_n to each contain the states of the form (s_1, …, s_n, 0, …, 0) with s_1, …, s_n ∈ {1, …, J}. Given a state s = (s_1, …, s_n, 0, …, 0) in L_n, we assign it parent (s_2, …, s_n, 0, …, 0) and assign it children of the form (j, s_1, …, s_n, 0, …, 0) (with appropriate modifications for n = 0 and n = N). The set of descendants of s is the set of all states of the form (t_1, …, t_m, s_1, …, s_n, 0, …, 0) for m ≥ 1 (where there are N − n − m trailing 0s). The possible transitions under the model correspond exactly to transitions from a state to its parent or to one of its children, so the MDP satisfies the conditions required for it to be skip-free in the negative direction on the tree T. Figure 1 illustrates the tree corresponding to the state space for a small example system. Extensions with direct transitions to more general descendants, of the form (t_1, …, t_m, s_1, …, s_n, 0, …, 0) with m > 1, are possible if batch arrivals are allowed, subject to appropriate capacity constraints.

3 The skip-free algorithm

For finite recurrent MDP models, the solution to the expected average cost problem can be characterised by the corresponding average cost optimality equations [Put94, §8.4]

 h_i = min_{a∈A} { c_i(a) − g + ∑_{j∈S} p_{ij}(a) h_j },  i ∈ S  (1)

in that (i) there exist real numbers g and h_i, i ∈ S, satisfying the optimality equations; (ii) the optimal average cost is the same for each initial state and is given by g; (iii) the optimality equations uniquely determine g and determine the h_i up to an arbitrary additive constant; (iv) the stationary deterministic policy δ* is an average cost optimal policy, where, for each i ∈ S, δ*(i) is an action achieving the minimum on the rhs of (1).

It follows from (iv) above that there is an optimal policy in the class of stationary deterministic policies. We therefore restrict attention from now on to stationary deterministic policies, writing ‘policy’ as a shorthand for ‘stationary deterministic policy’ and writing g_δ for the average cost under a given stationary deterministic policy δ.

For each i, we can interpret h_i − h_0 as the asymptotic relative difference in the total cost that results from starting the process in state i rather than state 0, under the stationary deterministic policy δ*. Thus the differences h_i − h_j are uniquely defined, but the quantities h_i themselves are defined only up to an arbitrary additive constant. We focus on the particular solution normalised by setting h_0 = 0 and refer to the corresponding h_i as the normalised relative costs under an optimal policy.

In general, the optimality equations (1) cannot be solved directly. Instead an optimal policy in the class of stationary deterministic policies is usually found by methods based on value iteration, policy iteration or linear programming, or combinations of these approaches [Put94]. For skip-free models, however, we have the following simplification.

Lemma 1

For finite recurrent skip-free average cost MDPs, the optimality equations (1) are equivalent to the equations

 y_i = min_a { (c_i(a) − x) / p_{iρ(i)}(a) },  i ∈ L_M  (2a)
 y_i = min_a { (c_i(a) − x + ∑_{k∈D(i)} p̄_{ik}(a) y_k) / p_{iρ(i)}(a) },  i ∈ L_{M−1}, …, L_1  (2b)
 0 = min_a { c_0(a) − x + ∑_{k∈D(0)} p̄_{0k}(a) y_k }  (2c)

in that (i) these equations also have unique solutions x and y_i, i ∈ S ∖ {0}; (ii) the optimal average cost is g = x, and the normalised relative costs under an optimal policy satisfy h_i = y_{i_1} + ⋯ + y_{i_r} + y_i, where 0 → i_1 → ⋯ → i_r → i is the unique minimal path from 0 to i; (iii) an optimal stationary deterministic policy is given by δ*(i) = a_i, where, for i ≠ 0, a_i is any action minimising the rhs of the corresponding equation for y_i, and a_0 is an action minimising the rhs in (2c).

Proof For skip-free models, the only possible transitions from state i are to its parent state ρ(i), to state i itself, or to a state in D(i). Thus equations (1) take the form

 h_i = min_{a∈A} { c_i(a) − g + ∑_{j∈D(i)} p_{ij}(a) h_j + p_{ii}(a) h_i + p_{iρ(i)}(a) h_{ρ(i)} },  i ∈ S  (3)

with appropriate modification to give the normalised solution with h_0 = 0. Values g and h_i satisfy (3) if and only if in each equation the rhs evaluated at a is greater than or equal to h_i for all a ∈ A, with equality for at least one a. With appropriate modifications for the root node and for terminal nodes, simple rearrangement shows that h_i is less than or equal to the rhs of (3) for a given a if and only if h_i − h_{ρ(i)} is less than or equal to the corresponding rhs of (2) for that a, and that equality in one expression implies equality in the other.

Now write x for g and, for each i ∈ S ∖ {0}, write y_i for h_i − h_{ρ(i)}. For each i, write 0 → i_1 → ⋯ → i_r → i for the unique minimal path from 0 to i. For each state j on this path, the preceding state is the parent ρ(j), so that h_j = h_{ρ(j)} + y_j. Hence h_i = y_{i_1} + ⋯ + y_{i_r} + y_i. Now if j is a descendant of i and k is in the path connecting i and j, then j is a descendant of k, i.e. j is in the subtree rooted at k, and vice versa. Thus for fixed i and a we have that ∑_{j∈T(i)} p_{ij}(a)(h_j − h_{ρ(i)}) = ∑_{k∈T(i)} p̄_{ik}(a) y_k.

Taking account of the modifications for the root state 0 and the terminal states, and the fact that p̄_{ii}(a) = 1 − p_{iρ(i)}(a), it follows that there are g and h_i satisfying (3) if and only if there are values x and y_i satisfying (2).

In the optimality equations (2), the value of y_i for i ∈ L_M depends only on x, and in each subsequent equation the value of y_i depends only on x and the values of y_k for k ∈ D(i). Thus, if the value of x were known, it would be easy to compute the y_i in turn for i ∈ L_M, L_{M−1}, …, L_1 and to determine the corresponding policy which takes the optimal action in each state.

This observation motivates an iterative approach to finding an average cost optimal policy: (i) choose an initial policy δ_0 and compute its expected average cost g_{δ_0}; (ii) given a current policy δ_n with expected average cost g_{δ_n}, compute an updated policy δ_{n+1} by setting x = g_{δ_n} and solving (2a) and (2b), and compute its expected average cost g_{δ_{n+1}}; (iii) iterate until convergence. This approach forms the basis for the following skip-free algorithm. Its properties are set out in the subsequent theorem.

Skip-free algorithm

1. Initialisation:

Choose an arbitrary initial policy δ_0. Perform a single iteration of step 2 below, with x = 0 and with the minimisation over a restricted to the single value a = δ_0(i), i ∈ S. Set n = 1.

2. Iteration:

Set x = x_n.

For i ∈ L_M compute: y_i = min_a { (c_i(a) − x) / p_{iρ(i)}(a) }, and let a_i be a minimising action.

For i ∈ L_{M−1}, …, L_1 compute: y_i = min_a { (c_i(a) − x + ∑_{k∈D(i)} p̄_{ik}(a) y_k) / p_{iρ(i)}(a) }, and let a_i be a minimising action.

For i = 0 compute: w = min_a { c_0(a) − x + ∑_{k∈D(0)} p̄_{0k}(a) y_k }, and let a_0 be a minimising action.

Set δ_{n+1}(i) = a_i for i ∈ S and set x_{n+1} = g_{δ_{n+1}}, the expected average cost of the updated policy δ_{n+1}.

3. Termination:

If x_{n+1} = x_n then stop. Return δ_{n+1} as an optimal policy, return x_{n+1} as the optimal average cost, and for each i return h_i = y_{i_1} + ⋯ + y_{i_r} + y_i, summed along the minimal path 0 → i_1 → ⋯ → i_r → i, as the corresponding normalised relative cost. Otherwise increase n by 1 and return to step 2.
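To make the step 2 computations concrete, the following Python sketch implements equations (2) for the one-dimensional case (states 0, …, M, with D(i) = {i+1, …, M} and ρ(i) = i − 1). It is a hypothetical illustration, not the paper's algorithm: rather than updating x via the average cost of the updated policy, it pins x down by scalar bisection on the rhs of (2c), which is strictly decreasing in x and vanishes at the optimal average cost.

```python
def tail_prob(p_row, k):
    """pbar_{ik}(a) = P(next state >= k) for one transition row {j: prob}."""
    return sum(prob for j, prob in p_row.items() if j >= k)

def solve_given_x(P, c, x):
    """Solve (2a)-(2b) for a trial value x and evaluate the root equation (2c).
    P[i][a] = {j: prob}, c[i][a] = immediate cost; the model is assumed
    recurrent, so p_{i,i-1}(a) > 0 throughout."""
    M = len(P) - 1
    y = [0.0] * (M + 1)            # y[0] is unused
    policy = [None] * (M + 1)
    for i in range(M, 0, -1):      # states M down to 1
        best = None
        for a in P[i]:
            rhs = c[i][a] - x + sum(tail_prob(P[i][a], k) * y[k]
                                    for k in range(i + 1, M + 1))
            val = rhs / P[i][a][i - 1]      # divide by p_{i,i-1}(a)
            if best is None or val < best:
                best, policy[i] = val, a
        y[i] = best
    residual = None                # rhs of (2c); zero at the optimal x
    for a in P[0]:
        val = c[0][a] - x + sum(tail_prob(P[0][a], k) * y[k]
                                for k in range(1, M + 1))
        if residual is None or val < residual:
            residual, policy[0] = val, a
    return y, policy, residual

def optimal_average_cost(P, c, lo=0.0, hi=100.0, tol=1e-10):
    """Bisection on the root residual, which is decreasing in x.
    Assumes the optimal average cost lies in [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        _, _, r = solve_given_x(P, c, mid)
        if r > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Each call to `solve_given_x` performs only the simple evaluations of step 2; the bisection outer loop plays the role that the policy-cost update x_{n+1} = g_{δ_{n+1}} plays in the algorithm above.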

Theorem 2

Consider the skip-free algorithm above applied to a finite recurrent skip-free average cost MDP model. Then:
(i) At each iteration either x_{n+1} < x_n, so δ_{n+1} is a strict improvement on δ_n, or x_{n+1} = x_n. In the latter case x_n is the optimal average cost, δ_{n+1} is an optimal average cost policy, and the corresponding normalised relative costs are given by h_i = y_{i_1} + ⋯ + y_{i_r} + y_i, summed along the minimal path from 0 to i.
(ii) The algorithm converges after a finite number of iterations.

Remarks (1) The motivation for the particular choice of action in state 0 is given in the remarks following the proof of the theorem. (2) The updates are particularly simple in the one dimensional case, where D(i) = {i + 1, …, M} and ρ(i) = i − 1. Here (2a) simplifies to y_M = min_a { (c_M(a) − x)/p_{M,M−1}(a) } and (2b) simplifies to y_i = min_a { (c_i(a) − x + ∑_{k=i+1}^{M} p̄_{ik}(a) y_k)/p_{i,i−1}(a) }. (3) The computational requirement for each iteration in step 2 of the algorithm is clearly similar to that of the corresponding step in value iteration, in that it only requires simple evaluations rather than the solution of a set of equations. While the algorithm is also similar to policy evaluation in that it returns the average cost of policy δ_{n+1} at the end of the nth iteration, it differs from standard policy iteration in that the values of y_i returned do not correspond to the relative costs under δ_{n+1}. Only at convergence do the relative costs and average cost correspond to the same (optimal) policy. (4) The basic principle underlying this iterative approach appears to be similar to that used in [Low74], but the results there were restricted to a very specific model with simple birth and death structure. Other treatments of skip-free models [WiSt86, StWe89, StWe99, WiSt00] have used iterative methods to search for a good approximation to the average cost, based on the values of current and previous approximations, or have used the form of the optimality equations to derive qualitative properties of the solution, in particular monotonicity of optimal policies, but neither approach explicitly identified the simple skip-free improvement algorithm described here.

4 Discounted, continuous and communicating models

The skip-free algorithm can also be used to solve discounted cost and continuous time problems, in each case by transforming the problem into an equivalent average cost problem. Moreover, a suitably modified algorithm can be used to solve communicating models. For ease of presentation, we focus on the one dimensional case, indicating how the argument can be extended to the general model as required.

4.1 Discounted cost models

Consider a recurrent MDP model that is skip-free in the negative direction, with state space {0, 1, …, M}, finite action space A, transition probabilities p_{ij}(a), immediate costs c_i(a) and discount factor β ∈ (0, 1). Following [Der70, p.31], we construct an average cost MDP with modified state space {0, 1, …, M + 1} and modified transition probabilities and immediate costs given by:

 p′_{ij}(a) = β p_{ij}(a),  c′_i(a) = c_i(a),  i, j = 0, 1, …, M, a ∈ A
 p′_{M+1,M}(a) = β,  c′_{M+1}(a) = 0,  a ∈ A
 p′_{i,M+1}(a) = 1 − β,  i = 0, 1, …, M+1, a ∈ A

In the spirit of similar models [Low74, WiSt86], we note that this new average cost MDP inherits from the original model the property of being skip-free in the negative direction.

Let g′ and h′_i be the optimal average cost and the corresponding relative costs for the new average cost problem, normalised by setting h′_{M+1} = 0. From above, g′ and h′_i, i = 0, …, M + 1, are the unique solutions to the optimality equations (1), and any set of actions achieving the minimum on the rhs defines an optimal policy. In terms of the original parameters, these equations take the form

 h′_{M+1} = −g′ + β h′_M + (1 − β) h′_{M+1}
 h′_i = min_a { c_i(a) − g′ + β ∑_{j=0}^{M} p_{ij}(a) h′_j + (1 − β) h′_{M+1} },  i = 0, …, M

Now set v_i = h′_i + g′/(1 − β). Then rewriting the equations for h′_i in terms of v_i, we see that the v_i satisfy the equations

 v_i = min_a { c_i(a) + β ∑_{j=0}^{M} p_{ij}(a) v_j },  i = 0, …, M.

Thus the v_i satisfy the optimality equations for the discounted cost problem, and so represent the unique optimal discounted cost function [Put94, p.148].

Finally, let x′ and y′_j be the solutions returned by the skip-free algorithm applied to the new skip-free average cost problem. Then x′ = g′ and h′_j = −(y′_{j+1} + ⋯ + y′_{M+1}). Thus the optimal value function for the discounted problem is given explicitly in terms of the output of the algorithm by

 v_j = x′/(1 − β) − (y′_{j+1} + ⋯ + y′_{M+1}),  j = 0, …, M

and a policy which is optimal for the modified average cost problem is also optimal for the original discounted cost problem.

The extension to the general skip-free MDP tree model is straightforward, requiring just the addition of an extra state for each terminal state (node) to preserve the skip-free property. This extra state becomes the terminal node in that branch. Transitions from this extra state are to the corresponding previous terminal node, with probability β, or back to itself, with probability 1 − β. Transition probabilities from non-terminal states are modified as above, by setting p′_{ij}(a) = β p_{ij}(a) if j is a non-terminal node of the modified tree and by assigning the remaining transition probability to the newly added terminal nodes of the modified sub-tree rooted at i. The precise assignment may be chosen arbitrarily – for example, each new terminal node in the modified sub-tree may be chosen with equal probability – as long as the total probability sums to 1 − β.
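As a concrete illustration of the one-dimensional construction above, the following hypothetical Python sketch builds the modified average cost model (P′, c′) from a discounted model; the dict-of-rows representation is our own, not the paper's.

```python
def discounted_to_average(P, c, beta):
    """Build the modified average cost model of Section 4.1 (one-dimensional
    case).  P[i][a] = {j: prob} on states 0..M; returns (P2, c2) on states
    0..M+1, where the extra state M+1 receives the leaked probability 1 - beta
    and feeds back into state M at zero cost."""
    M = len(P) - 1
    P2, c2 = [], []
    for i in range(M + 1):
        # p'_{ij}(a) = beta * p_{ij}(a) for j = 0..M
        P2.append({a: {j: beta * pr for j, pr in row.items()}
                   for a, row in P[i].items()})
        for a in P2[i]:
            P2[i][a][M + 1] = 1.0 - beta       # p'_{i,M+1}(a) = 1 - beta
        c2.append(dict(c[i]))                  # c'_i(a) = c_i(a)
    actions = list(P[0].keys())
    # the new state M+1: probability beta back to M, 1 - beta to itself
    P2.append({a: {M: beta, M + 1: 1.0 - beta} for a in actions})
    c2.append({a: 0.0 for a in actions})       # c'_{M+1}(a) = 0
    return P2, c2
```

The returned model is again skip-free in the negative direction, so the skip-free algorithm applies to it directly, and the discounted value function can then be read off via the displayed formula for v_j.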

4.2 Continuous time models

Consider a continuous time Markov decision process (CTMDP) with finite state space S and finite action space A. Assume that when the current action is a and the process is in state i, the process incurs costs at rate c_i(a) and makes transitions to state j at rate q_{ij}(a) (where transitions back to the same state are allowed). For infinite horizon problems, under either an average cost or a discounted cost criterion, we can restrict attention to stationary policies and to models in which decisions are made only at transition epochs [Put94, p.560]. For simplicity of presentation we again restrict attention to recurrent models and defer treatment of unichain and communicating models to Section 4.3. As for MDPs, we say a CTMDP is skip-free in the negative direction if the process cannot move from a state i to a state with index less than i − 1 without passing through all the intermediate states, i.e. q_{ij}(a) = 0 for all j < i − 1 and a ∈ A.

To apply the skip-free algorithm, we first convert the model to an equivalent uniformised model [Lip75] with rate Λ ≥ max_{i,a} ∑_{j≠i} q_{ij}(a). In this model, when the current action is a and the process is in state i, transitions back to state i occur at rate Λ − ∑_{j≠i} q_{ij}(a) while transitions to each state j ≠ i occur at rate q_{ij}(a), so that overall transitions occur at uniform rate Λ. Next we construct a discrete time problem with the same state and action space, where for i, j ∈ S and a ∈ A the transition probabilities are given by p_{ij}(a) = q_{ij}(a)/Λ for j ≠ i and p_{ii}(a) = 1 − ∑_{j≠i} q_{ij}(a)/Λ, with immediate costs c_i(a) unchanged. If the original CTMDP is recurrent and skip-free, then the discretised model is recurrent and skip-free and can be solved using the algorithm.

Finally, let δ and g be the optimal policy and the optimal average cost identified by the algorithm for the discrete time problem. Then the optimal policy and the optimal average cost for the uniformised continuous time problem are the same as δ and g, and the normalised relative costs for the uniformised problem are given in terms of those for the discrete problem by h_i/Λ [Put94, §11.5].
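The uniformisation step can be sketched as follows (a hypothetical helper in the same dict-of-rows representation used above; the rate Λ must dominate every total exit rate):

```python
def uniformise(q, r, Lam):
    """Convert a CTMDP with transition rates q[i][a] = {j: rate} and cost
    rates r[i][a] into a discrete time MDP (P, c): p_{ij}(a) = q_{ij}(a)/Lam
    for j != i, with the leftover probability as a self-loop, and costs
    carried over unchanged."""
    P, c = [], []
    for i, row in enumerate(q):
        P.append({})
        for a, rates in row.items():
            total = sum(rate for j, rate in rates.items() if j != i)
            assert total <= Lam, "Lam must be at least the total exit rate"
            probs = {j: rate / Lam for j, rate in rates.items() if j != i}
            probs[i] = probs.get(i, 0.0) + 1.0 - total / Lam
            P[i][a] = probs
        c.append(dict(r[i]))
    return P, c
```

Because uniformisation leaves the stationary distribution of each policy unchanged, the discretised model has the same average cost as the original CTMDP, as stated above.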

4.3 Communicating models

So far we have assumed the MDP model is recurrent. There are natural applications for which this assumption excludes sensible policies, such as policies that are recurrent only on a strict subset of the state space. Simple examples include: maintenance/replacement problems where a policy might specify replacing an item when its state reached some lower level with an item of some higher level; inventory problems where a policy might reorder when the stock reached some lower level and/or reorder up to some upper level; queueing control problems where a policy might turn the server off when the queue size reached some lower level and/or might refuse to admit new entrants when the queue size reached some upper level. In each case, determining optimal values for these levels might be part of the problem. In this section we extend our result to the wider class of communicating MDP models, to enable us to address examples like these.

We say an MDP model is communicating if, for every pair of states i and j in S, j is reachable from i under some (stationary deterministic) policy; i.e. there exists a policy δ, with corresponding transition matrix P_δ, and an integer n ≥ 1, such that (P_δ^n)_{ij} > 0. We say that a policy δ is unichain if P_δ decomposes into a single recurrent class plus a (possibly empty) set of transient states; if there is more than one recurrent class we say δ is multichain. Let δ be a multichain policy and, for each recurrent class R of P_δ, let g_R denote the average cost under δ starting in a state in R, and let R* be a recurrent set with smallest average cost, say g*. Because the model is skip-free, R* must consist of a sequence of consecutive states l, l + 1, …, m; again, because the model is skip-free, the action in each state greater than m can be changed if necessary so that R* is reachable from that state; finally, because the model is communicating, the action in each state less than l can be changed if necessary so that R* is reachable from that state. Denote by δ′ the new policy created by changing actions in this way, if necessary, but leaving the actions in R* unchanged. Then δ′ is unichain by construction, and the average cost starting in each state is g*, which is no greater than the average cost starting in that state under δ. Thus, for average cost skip-free communicating models, nothing is lost by restricting attention to unichain policies.

In contrast to recurrent models, communicating models allow there to be i and a with p_{i,i−1}(a) = 0 and/or p̄_{i,i+1}(a) = 0. For each m ∈ {0, 1, …, M}, let U_m be the (possibly empty) set of unichain policies δ for which p̄_{m,m+1}(δ(m)) = 0 but p̄_{i,i+1}(δ(i)) > 0 for i < m (where we take p̄_{M,M+1}(a) = 0 for all a in the case m = M). Every unichain policy must be in U_m for some m. Partition the possible actions for each state m into A⁰(m) = {a : p̄_{m,m+1}(a) = 0} and its complement A⁺(m), where A⁰(m) may be empty but A⁺(m) is non-empty for m < M by the assumptions of the skip-free model in Section 2. Then for a unichain policy δ in U_m, we have that δ(m) lies in A⁰(m); that states 0, …, m are recurrent by definition; and that states m + 1, …, M are transient.

Thus the minimum average cost over policies in U_m is the same as the minimum average cost for a modified skip-free MDP model S_m with the same transition probabilities and immediate costs but with reduced state space {0, 1, …, m} and with state-dependent action spaces A(i) = A for i < m and A(m) = A⁰(m). In this notation, the model of Section 2 corresponds to S_M, and state m plays the same role of recurrent distinguished state in S_m that state M plays in S_M. If we compare the result of applying the skip-free algorithm to S_M with the result of applying it to S_m, we see that, for the same current value of x, the algorithm computes the same quantities in the states the two models share. However, in state m, the skip-free algorithm applied to S_m computes quantities appropriate to the distinguished state, with the minimisation restricted to A⁰(m),

and computes an updated ‘minimising’ policy for S_m together with its average cost.

This motivates the following modified skip-free algorithm. First, it includes these extra computations for each state m, so that, in a single iteration, it simultaneously computes the updated policy and its average cost for each sub-model S_m. Secondly, at the end of the nth iteration it sets x_{n+1} to be the smallest of these average costs, and sets δ_{n+1} to be the corresponding policy, where ties are broken by choosing the policy with the smallest index m. Say the minimum average cost at this stage is achieved by a policy with index m*. Then, by the properties of the skip-free algorithm applied to S_{m*}, at the end of the next iteration either (i) x_{n+2} < x_{n+1}, in which case δ_{n+2} is a strict improvement on δ_{n+1}; or (ii) x_{n+2} = x_{n+1}, so that x_{n+1} is the minimum average cost and δ_{n+1} is an optimal average cost policy for starting states 0, …, m*. In this case, because the model is communicating, it is possible [Put94, p.351] to modify the actions chosen by the policy in the, now transient, states m* + 1, …, M so that the modified policy satisfies the optimality equations for all states and is an average cost optimal policy. We summarise this discussion in the following theorem.

Theorem 3

Consider the skip-free algorithm modified as above applied to a finite communicating discrete time average cost skip-free MDP model with state space S. Then:
(i) At each iteration of the skip-free algorithm either the value of x strictly decreases and the new policy is a strict improvement on the old one, or the value of x is unchanged and for some m the current policy satisfies the optimality equations for the states in S_m.
(ii) The modified skip-free algorithm converges after a finite number of iterations.
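The outer step of the modified algorithm can be sketched in a few lines. In the Python fragment below the sub-model solvers, policy names and cost values are all hypothetical stand-ins; each solver is assumed to return, for the current value of x, the updated minimising policy and its average cost g_m for the sub-problem with distinguished state m:

```python
# Schematic outer step of the modified skip-free algorithm (a sketch,
# not the paper's pseudocode). `submodels[m]` is a hypothetical solver
# for the sub-problem on S_m with m as the distinguished state.

def modified_iteration(x, submodels):
    results = {m: solve(x) for m, solve in submodels.items()}
    # x_{n+1} = min_m g_m, breaking ties by the smallest index m
    m_star = min(results, key=lambda m: (results[m][1], m))
    policy, g = results[m_star]
    return m_star, policy, g

# Illustrative stand-ins for the sub-model solvers:
submodels = {0: lambda x: ("d0", 2.0), 1: lambda x: ("d1", 1.5)}
m_star, policy, g = modified_iteration(3.0, submodels)
```

Note that the tie-breaking rule is encoded in the sort key `(g_m, m)`, so that among sub-models attaining the minimum average cost the one with the smallest index is chosen, as in the description above.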

Finally, note that it is easy to check whether a skip-free model is communicating. An assumption of the (non-degenerate) skip-free model was that state 0 is reachable from each state. It follows that a skip-free MDP with state space {0, 1, …, N} is communicating if and only if N is reachable from 0 under at least one stationary deterministic policy. Let n_0 = 0, let n_1 be the index of the maximum state reachable in one step from state 0 under some action, and for k ≥ 1 let n_{k+1} be the index of the maximum state reachable in one step from some state i ≤ n_k under some action. As the state space is finite, the non-decreasing sequence terminates, say with state n*. Since the model is skip-free, n* is the largest state that is reachable from all states below it, and the model is communicating if and only if n* = N.
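For the one-dimensional case this check takes only a few lines of code. The sketch below assumes the upward reachability structure has been summarised as `reach_up[i]`, the largest state reachable from state i in one step under some action (a hypothetical representation; the model's transition probabilities would supply it):

```python
def is_communicating(N, reach_up):
    # reach_up[i]: largest state reachable from i in one step under
    # some action. Moves towards state 0 are assumed always available,
    # as in the skip-free model of Section 2.
    n = reach_up[0]
    while True:
        m = max(reach_up[i] for i in range(n + 1))
        if m == n:
            break  # the sequence has terminated at n* = n
        n = m
    # the model is communicating iff n* equals the maximal state N
    return n == N

# From 0 we can reach 2, from {0,1,2} we can reach 3, and from
# {0,...,3} we can reach 5 = N, so this model is communicating:
print(is_communicating(5, [2, 3, 3, 5, 5, 5]))  # True
print(is_communicating(5, [1, 1, 2, 5, 5, 5]))  # False: stuck at n* = 1
```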

The extension to general skip-free communicating models is straightforward. Again, the idea is that for each state m the skip-free algorithm is modified so that in passing it solves the corresponding sub-problem with state space S_m and with state m as the distinguished state, and then computes the optimal updated average cost and policy by minimising over the costs and policies for each of the sub-problems.

5 Proof of Theorem 2

We start our analysis of the average cost MDP model by defining a related problem (or class of problems) that we will call the x-revised first return problem. The model for this problem has the same state space, the same action space and the same transition probabilities as the average cost model. However, for each fixed x, the immediate costs in the corresponding x-revised problem are revised downward by x, so c_i(a) is revised to c_i(a) − x. Whereas the original problem was to find a policy that minimised the expected average cost, the objective for this new problem is to find a policy that minimises the expected x-revised cost until first return to state 0, where, for a process starting with X_0 = 0, we define the first return epoch to state 0 to be the smallest value n such that n ≥ 1 and X_n = 0. The MDP is assumed recurrent under any stationary deterministic policy, so the first return epoch is well defined and almost surely finite.

For a fixed policy d, starting in state 0, write τ(d) for the expected first return epoch under d, C(d) for the expected first return cost under d, and H(d,x) for the expected x-revised first return cost under d. The average costs and the x-revised costs under d are related by the equations

 g(d) = C(d)/τ(d),   H(d,x) = C(d) − x τ(d),   g(d) = x + H(d,x)/τ(d), (4)

where the first equation follows from viewing the average cost problem from a renewal-reward perspective [Ros70, p.160] and noting that state 0 is recurrent under any stationary deterministic policy d, and the second follows from noting that the expected x-revised cost under d until first return to state 0 is just the original expected cost adjusted downwards by an amount x per period over an expected time period τ(d).
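These identities are easy to check numerically. The sketch below uses a toy three-state skip-free chain with made-up transition probabilities and costs (not a model from the paper): it computes τ(d), C(d) and H(d,x) by fixed-point iteration on the first-passage equations and confirms the relations in (4):

```python
def first_passage_to_zero(P, c, iters=200):
    # Expected time T[i] and cost K[i] to first reach state 0 from i,
    # under the fixed policy's transition matrix P and costs c,
    # computed by fixed-point iteration (state 0 is absorbing here).
    n = len(P)
    T = [0.0] * n
    K = [0.0] * n
    for _ in range(iters):
        T = [0.0] + [1.0 + sum(P[i][j] * T[j] for j in range(n)) for i in range(1, n)]
        K = [0.0] + [c[i] + sum(P[i][j] * K[j] for j in range(n)) for i in range(1, n)]
    return T, K

# Toy skip-free chain: downward moves drop by at most one state.
P = [[0.0, 1.0, 0.0],   # from 0: up to 1
     [0.5, 0.0, 0.5],   # from 1: down to 0 or up to 2
     [0.0, 1.0, 0.0]]   # from 2: down to 1
c = [1.0, 2.0, 3.0]

T, K = first_passage_to_zero(P, c)
tau = 1.0 + sum(P[0][j] * T[j] for j in range(3))   # expected first return epoch
C   = c[0] + sum(P[0][j] * K[j] for j in range(3))  # expected first return cost
g   = C / tau                                       # g(d) = C(d)/tau(d)
x   = 1.5
H   = C - x * tau                                   # H(d,x) = C(d) - x*tau(d)
assert abs(g - (x + H / tau)) < 1e-9                # g(d) = x + H(d,x)/tau(d)
```

Here τ(d) = 4, C(d) = 8 and g(d) = 2; taking x = 1.5 gives H(d,x) = 2, and x + H(d,x)/τ(d) recovers g(d) as the third equation in (4) asserts.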

Lemma 4

For fixed x, let a_i be actions minimising the rhs in equations (2a) and (2b) and let y_i be the corresponding values. Set

 a_0 = argmin_a {(c_0(a) − x + ∑_{k∈D(0)} p̄_{0k}(a) y_k)/(1 − p_{00}(a))}, (5)

and let d be the policy that takes action a_i in state i. Then d minimises the expected x-revised cost until first return to state 0, and the expected x-revised first return cost under d is

 H(d,x) = (c_0(a_0) − x + ∑_{k∈D(0)} p̄_{0k}(a_0) y_k)/(1 − p_{00}(a_0)). (6)
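The computation in (5)–(6) is a one-line minimisation once the y_k are known. The following sketch assumes hypothetical action data for state 0 and child values y_k already obtained from equations (2a)–(2b):

```python
def distinguished_state_step(actions, x, y):
    # actions[a] = (c0, p00, pbar) where pbar[k] is the one-step
    # probability of moving from state 0 to child k in D(0), and
    # y[k] is the minimal expected x-revised first-passage cost
    # from k back to 0. All data here are illustrative assumptions.
    def h(a):
        c0, p00, pbar = actions[a]
        return (c0 - x + sum(pbar[k] * y[k] for k in pbar)) / (1.0 - p00)
    a0 = min(actions, key=h)   # the argmin in (5)
    return a0, h(a0)           # (a0, H(d,x)) as in (6)

# Two hypothetical actions in state 0, one child k = 1 with y_1 = 3:
actions = {"u": (2.0, 0.5, {1: 0.5}), "v": (1.0, 0.0, {1: 1.0})}
a0, H = distinguished_state_step(actions, x=1.0, y={1: 3.0})
```

With these numbers action "u" gives (2 − 1 + 1.5)/0.5 = 5 and action "v" gives (1 − 1 + 3)/1 = 3, so the minimisation in (5) selects a_0 = "v" with H(d,x) = 3.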

Proof Since the process is Markov and skip-free in the negative direction, it follows that a policy minimises the expected x-revised cost until first return to state 0 if and only if it also minimises the expected x-revised total cost until first passage to state 0 for each starting state i ≠ 0, and hence minimises the expected x-revised cost until first passage from each state i to its parent i⁻. For the one-dimensional case where S = {0, 1, …, N}, this problem has been called the x-revised first passage problem [StWe89]. For fixed x and i ≠ 0, let a_i be actions minimising the rhs in equations (2a) and (2b) and let y_i be the corresponding values. Then they show that the policy that takes action a_i in state i is optimal for the x-revised first passage problem and the minimal expected cost until first passage from i to i − 1 is given by y_i. With only minor notational changes, their results extend directly to the general case where S corresponds to the nodes of a tree, the state i − 1 is replaced by the parent i⁻ and the state i + 1 is replaced by the set of children D(i). It follows that the policy that uses actions a_i in states i ≠ 0 has the property that for each state i it also minimises the expected total x-revised cost until first passage to state 0 and that the minimum expected x-revised total cost until first passage to state 0, starting in state i, is given by the sum of the values y_j along the path from i to 0.
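The path-sum conclusion of the proof can be illustrated on a small tree (the parent structure and y-values below are invented for illustration): because the process is skip-free in the negative direction, a first passage from i to 0 must pass through every ancestor of i, so the per-edge costs y_j along the path simply add:

```python
parent = {1: 0, 2: 1, 3: 1}    # hypothetical tree rooted at state 0
y = {1: 2.0, 2: 1.5, 3: 0.5}   # y[i]: minimal expected x-revised cost
                               # of first passage from i to its parent

def cost_to_root(i):
    # Sum the y-values along the path from i down to state 0.
    total = 0.0
    while i != 0:
        total += y[i]
        i = parent[i]
    return total

print(cost_to_root(2))  # 1.5 + 2.0 = 3.5
```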

Now consider a process that starts in state 0. Under a policy that specifies action a_0 in state 0, the expected time until the process first leaves state 0 is