Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

07/06/2017
by Jan Křetínský, et al.

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism for expressing performance-related properties. Strategy iteration is one of the solution techniques applicable in this context. In many other contexts it is the technique of choice because of its advantages over, e.g., value iteration, such as precision or the possibility of domain-knowledge-aware initialization. For MDPs, however, it is rarely used, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all of its advantages.
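For readers unfamiliar with the baseline, the following is a minimal sketch of textbook (Howard-style) strategy iteration for mean payoff. It is not the accelerated method of the paper. It assumes a unichain MDP, meaning every memoryless strategy induces a single recurrent class, and it uses a hypothetical dictionary encoding. Each round evaluates the current strategy by solving g + h(s) = r(s, σ(s)) + Σ_t P(s, σ(s), t) h(t) with h(0) = 0. It then switches a state's action only when some action strictly improves the one-step lookahead. The helper names `solve_linear`, `evaluate`, `q_value`, `strategy_iteration` and the toy `EXAMPLE_MDP` are illustrative assumptions. Exact rational arithmetic via `fractions.Fraction` is one way to obtain the precision the abstract mentions.

```python
from fractions import Fraction

# Hypothetical encoding (an assumption of this sketch, not the paper's format):
# mdp[s] maps each action name to (reward, {successor_state: probability}).
# States are 0..n-1; a strategy is a list giving one action per state.
EXAMPLE_MDP = [
    {"stay": (1, {0: Fraction(1)}), "go": (0, {1: Fraction(1)})},
    {"a": (3, {0: Fraction(1, 2), 1: Fraction(1, 2)}), "b": (2, {0: Fraction(1)})},
]


def solve_linear(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be non-singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def evaluate(mdp, strategy):
    """Gain g and bias h of a fixed strategy, assuming it induces a unichain.

    Solves g + h(s) - sum_t P(s, strategy[s], t) h(t) = r(s, strategy[s])
    for all s, plus h(0) = 0 to pin down the additive constant in h.
    Unknowns are ordered as [g, h(0), ..., h(n-1)].
    """
    n = len(mdp)
    A, b = [], []
    for s in range(n):
        reward, succ = mdp[s][strategy[s]]
        row = [Fraction(0)] * (n + 1)
        row[0] = Fraction(1)  # coefficient of the gain g
        row[1 + s] += 1  # + h(s)
        for t, p in succ.items():
            row[1 + t] -= Fraction(p)  # - P(s, a, t) h(t)
        A.append(row)
        b.append(Fraction(reward))
    anchor = [Fraction(0)] * (n + 1)
    anchor[1] = Fraction(1)  # h(0) = 0
    A.append(anchor)
    b.append(Fraction(0))
    x = solve_linear(A, b)
    return x[0], x[1:]


def q_value(mdp, s, a, h):
    """One-step lookahead r(s, a) + sum_t P(s, a, t) h(t)."""
    reward, succ = mdp[s][a]
    return Fraction(reward) + sum(Fraction(p) * h[t] for t, p in succ.items())


def strategy_iteration(mdp, strategy):
    """Textbook strategy iteration for unichain mean-payoff MDPs (maximization)."""
    strategy = list(strategy)
    while True:
        g, h = evaluate(mdp, strategy)
        changed = False
        for s in range(len(mdp)):
            best = max(mdp[s], key=lambda a: q_value(mdp, s, a, h))
            # Switch only on strict improvement, so ties never cause cycling.
            if q_value(mdp, s, best, h) > q_value(mdp, s, strategy[s], h):
                strategy[s] = best
                changed = True
        if not changed:
            return g, h, strategy
```

Starting from `["stay", "b"]` (gain 1), `strategy_iteration(EXAMPLE_MDP, ["stay", "b"])` passes through `["stay", "a"]` and stops at `["go", "a"]` with gain 2. The dense exact solver keeps the sketch short. Scaling to large MDPs would need sparse or iterative evaluation, which is exactly the cost the paper sets out to reduce.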
