Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives

04/19/2023
by Jan Křetínský, et al.

A classic solution technique for Markov decision processes (MDP) and stochastic games (SG) is value iteration (VI). Due to its good practical performance, this approximate approach is typically preferred over exact techniques, even though no practical bounds on the imprecision of the result could be given until recently. As a consequence, even the most used model checkers could return arbitrarily wrong results. Over the past decade, different works derived stopping criteria, indicating when the precision reaches the desired level, for various settings, in particular MDP with reachability, total reward, and mean payoff, and SG with reachability. In this paper, we provide the first stopping criteria for VI on SG with total reward and mean payoff, yielding the first anytime algorithms in these settings. To this end, we provide the solution in two flavours: first through a reduction to the MDP case, and second directly on SG. The former is simpler and automatically benefits from any advances on MDP. The latter allows for more local computations, aiming at better practical efficiency. Our solution unifies the previously mentioned approaches for MDP and SG and their underlying ideas. To achieve this, we isolate objective-specific subroutines and identify objective-independent concepts. These structural concepts, while surprisingly simple, form the very essence of the unified solution.
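To make the notion of a stopping criterion concrete, the following is a minimal, hypothetical sketch of interval (bounded) value iteration for MDP reachability, the simplest of the settings mentioned above; it is not the paper's algorithm for SG with total reward or mean payoff. All state names and transition probabilities are invented for illustration. The sketch iterates a lower and an upper bound on the reachability probability and stops once their gap is below a chosen epsilon; in general the upper bound is only sound after collapsing end components, a subtlety this toy model avoids by having none besides its absorbing states.

```python
EPS = 1e-6

# Toy MDP (made up for illustration): each state maps actions to lists of
# (successor, probability) pairs. "goal" and "sink" are absorbing.
MDP = {
    0: {"a": [(1, 0.5), ("sink", 0.5)], "b": [("goal", 0.1), (0, 0.9)]},
    1: {"a": [("goal", 0.7), ("sink", 0.3)]},
    "goal": {"loop": [("goal", 1.0)]},
    "sink": {"loop": [("sink", 1.0)]},
}

def interval_value_iteration(mdp, goal, sink, eps=EPS):
    # Lower bound: 1 at the goal, 0 elsewhere; upper bound: 0 at the sink, 1 elsewhere.
    L = {s: (1.0 if s == goal else 0.0) for s in mdp}
    U = {s: (0.0 if s == sink else 1.0) for s in mdp}
    iterations = 0
    # Stopping criterion: L and U sandwich the true value, so a gap below eps
    # certifies eps-precision of either bound at any time (an anytime guarantee).
    while max(U[s] - L[s] for s in mdp) >= eps:
        iterations += 1
        for bound in (L, U):
            new = {}
            for s, actions in mdp.items():
                if s in (goal, sink):
                    new[s] = bound[s]  # absorbing states keep their value
                else:
                    # Bellman backup for the maximizing player
                    new[s] = max(
                        sum(p * bound[t] for t, p in succ)
                        for succ in actions.values()
                    )
            bound.update(new)
    return L, U, iterations

lo, up, n = interval_value_iteration(MDP, "goal", "sink")
print(f"after {n} iterations, value of state 0 lies in [{lo[0]:.7f}, {up[0]:.7f}]")
```

By contrast, the naive criterion of stopping when consecutive iterates of a single (lower) bound differ by less than epsilon can terminate arbitrarily far from the true value, which is the imprecision issue the abstract refers to. In the SG setting, the Bellman backup additionally minimizes over actions in the states controlled by the opposing player.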

