Efficient Strategy Iteration for Mean Payoff in Markov Decision Processes

by   Jan Křetínský, et al.

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice due to advantages over e.g. value iteration, such as precision or possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.


page 1

page 2

page 3

page 4


A Method for Speeding Up Value Iteration in Partially Observable Markov Decision Processes

We present a technique for speeding up the convergence of value iteratio...

Faster Algorithms for Quantitative Analysis of Markov Chains and Markov Decision Processes with Small Treewidth

Discrete-time Markov Chains (MCs) and Markov Decision Processes (MDPs) a...

Concentration bounds for SSP Q-learning for average cost MDPs

We derive a concentration bound for a Q-learning algorithm for average c...

Toward Solving 2-TBSG Efficiently

2-TBSG is a two-player game model which aims to find Nash equilibriums a...

Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure

We consider Markov Decision Processes (MDPs) in which every stationary p...

Strategy Representation by Decision Trees with Linear Classifiers

Graph games and Markov decision processes (MDPs) are standard models in ...

Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version

In this paper we propose an algorithm for polynomial-time reinforcement ...