Condition numbers of stochastic mean payoff games and what they say about nonarchimedean semidefinite programming

02/21/2018 ∙ by Xavier Allamigeon, et al. ∙ 0

Semidefinite programming can be considered over any real closed field, including fields of Puiseux series equipped with their nonarchimedean valuation. Nonarchimedean semidefinite programs encode parametric families of classical semidefinite programs, for sufficiently large values of the parameter. Recently, a correspondence has been established between nonarchimedean semidefinite programs and stochastic mean payoff games with perfect information. This correspondence relies on tropical geometry. It allows one to solve generic nonarchimedean semidefinite feasibility problems, of large scale, by means of stochastic game algorithms. In this paper, we show that the mean payoff of these games can be interpreted as a condition number for the corresponding nonarchimedean feasibility problems. This number measures how close a feasible instance is from being infeasible, and vice versa. We show that it coincides with the maximal radius of a ball in Hilbert's projective metric, that is included in the feasible set. The geometric interpretation of the condition number relies in particular on a duality theorem for tropical semidefinite feasibility programs. Then, we bound the complexity of the feasibility problem in terms of the condition number. We finally give explicit bounds for this condition number, in terms of the characteristics of the stochastic game. As a consequence, we show that the simplest algorithm to decide whether a stochastic mean payoff game is winning, namely value iteration, has a pseudopolynomial complexity when the number of random positions is fixed.

1. Introduction

1.1. Motivation

Semidefinite programming (SDP) consists in optimizing a linear function over a spectrahedron, the latter being the intersection of a cone of positive semidefinite matrices with an affine space. Semidefinite programs arise in a number of applications from engineering sciences and combinatorial optimization. We refer the reader to [BPT13, GM12] for more background on the theory and applications of semidefinite programming.

Spectrahedra form a class of convex semialgebraic sets. Even though these sets are usually defined over the field of real numbers, their definition is meaningful over any real closed field. In particular, the complexity of SDP and related questions can be investigated over real closed nonarchimedean fields, like fields of Puiseux series. Such nonarchimedean SDP instances, which arise in perturbation theory, encode parametric families of classical SDP instances (over the reals), for large enough (or small enough) values of the parameter. The study of the nonarchimedean case is also motivated by unsettled questions concerning the complexity of ordinary SDP. Indeed, the latter are solvable in “polynomial time” only in a restricted sense. More precisely, complexity bounds for SDP, obtained by the ellipsoid or interior point methods, are only polynomial in the log of certain metric estimates whose bit-size can be doubly exponential in the size of the input [dKV16]. It is unknown whether the SDP feasibility problem belongs to NP.

Semidefinite feasibility problems over the nonarchimedean valued field of Puiseux series have been studied in [AGS18]. It is shown there that, under a genericity condition, these problems are equivalent to solving stochastic mean payoff games with perfect information and finite state and action spaces. Stochastic mean payoff games have an unsettled complexity: they belong to NP ∩ coNP, but no polynomial time algorithm is currently known [Con92, AM09]. However, several practically efficient algorithms to solve stochastic mean payoff games have been developed. In this way, one can solve nonarchimedean semidefinite instances of a scale probably unreachable by interior point methods. For instance, the benchmarks presented in [AGS18] show that large random instances of these problems could be solved by value iteration in a few seconds. Hard instances are experimentally concentrated in a small “phase transition” region of the parameter space.

1.2. Main results

In order to explain why value iteration is so efficient on many nonarchimedean SDP instances, we introduce here a notion of condition number for stochastic mean payoff games. Essentially, for a feasible instance, the condition number is the inverse of the distance of the data to an infeasible instance, and vice versa. We show that this condition number is determined by the absolute value of the mean payoff. We establish a universal bound for the time of convergence of value iteration, involving the condition number and an auxiliary metric estimate, the distance of the point $0$ to the set of “bias vectors” (Theorem 18). Then, we effectively bound the condition number and the latter distance, for stochastic mean payoff games with perfect information (Theorems 20 and 21). We arrive, in particular, at a bound that becomes pseudopolynomial when the number of “random” positions of the game is fixed.

To arrive at these results, we develop a metric geometry approach to the condition number. We use Hilbert’s projective metric, which arises in Perron–Frobenius theory [Nus88]. The same metric, up to a logarithmic change of variable, arises in tropical geometry [CGQ04, AGG12]. We also prove duality results for stochastic mean payoff games, showing, essentially, that the condition numbers of the primal and dual problems coincide. In summary, our main results show that the complexity of value iteration is governed by metric geometry properties: this leads to a general method to derive complexity bounds, which can be applied to various classes of Shapley operators.

1.3. Related works

When specialized to stochastic mean payoff games with perfect information, our bounds should be compared with the one of Boros, Elbassioni, Gurvich, and Makino [BEGM15]. The authors of [BEGM15] generalize the “pumping” algorithm, developed for deterministic games by Gurvich, Karzanov, and Khachiyan [GKK88], to the case of stochastic games. The resulting algorithm is also pseudopolynomial if the number of random positions is fixed. The algorithm of Ibsen-Jensen and Miltersen [IJM12] yields a stronger bound in the case of simple stochastic games, still assuming that the number of random positions is fixed.

The duality results in Section 4.1 extend to stochastic games some duality results for deterministic games by Grigoriev and Podolskii [GP15]. In contrast, our approach builds on [AGG12], deriving duality results from a minimax Collatz–Wielandt type theorem of Nussbaum [Nus86]. Other duality results, by Bodirsky and Mamino, in the context of satisfiability problems, have appeared in [BM16].

1.4. Organization of the paper

Earlier results on the relation between nonarchimedean semidefinite programming and stochastic mean payoff games are presented in Section 2, leading to the introduction of the notion of condition number. Some background on nonlinear Perron–Frobenius theory is presented in Section 3. The new results are included in Section 4, in which we characterize the condition number, and in Section 5, in which we derive complexity estimates for value iteration in terms of the condition number. This is a preliminary announcement of the results. The proofs will appear in a subsequent version.

2. Motivation: the correspondence between nonarchimedean semidefinite programming and stochastic mean payoff games

In this section, we summarize some of the main results of [AGS18], which motivate the present work. Throughout this paper, given $n \in \mathbb{N}$, we denote the set $\{1, \dots, n\}$ by $[n]$.

2.1. Nonarchimedean semidefinite programs

We start by introducing semidefinite programming over nonarchimedean fields. More specifically, the model of nonarchimedean field used in this paper is the field $\mathbb{K}$ of (absolutely convergent generalized real) Puiseux series, which are series in the parameter $t$ of the form

(1) $\mathbf{x} = \sum_{i \geq 1} c_{\lambda_i} t^{\lambda_i}$,

where (i) $(\lambda_i)_{i \geq 1}$ is a strictly decreasing sequence of real numbers that is either finite or unbounded, (ii) $c_{\lambda_i} \in \mathbb{R} \setminus \{0\}$ for all $\lambda_i$, and (iii) the series Eq. 1 is absolutely convergent for $t$ sufficiently large. There is also a special, empty series, which is denoted by $0$. The field $\mathbb{K}$ is ordered, with a total order defined by $\mathbf{x} > 0$ if $c_{\lambda_1} > 0$. In addition, it is known that $\mathbb{K}$ is a real closed field [vdDS98]. Actually, our approach applies to other nonarchimedean fields with a real value group [AGS16], but it is helpful to have a concrete field in mind, like $\mathbb{K}$. Henceforth, we denote by $\mathbb{K}_{\geq 0}$ the set of nonnegative series, i.e., the series $\mathbf{x}$ that satisfy $\mathbf{x} > 0$ or $\mathbf{x} = 0$.

Given symmetric matrices $Q^{(0)}, \dots, Q^{(n)} \in \mathbb{K}^{m \times m}$, we define the associated spectrahedron (over Puiseux series) as the set

(2) $\mathcal{S} := \{ x \in \mathbb{K}_{\geq 0}^n : Q^{(0)} + x_1 Q^{(1)} + \dots + x_n Q^{(n)} \text{ is PSD} \}$,

where “PSD” stands for positive semidefinite. (We point out that the definition of positive semidefinite matrices makes sense over any real closed field.) The problem which we are interested in is to determine whether a spectrahedron over Puiseux series is empty or not. This corresponds to the analog of the semidefinite feasibility problem over the field $\mathbb{K}$. This problem is also related to the standard semidefinite feasibility problem over the field of real numbers associated with the spectrahedra

(3) $\mathcal{S}(t) := \{ x \in \mathbb{R}_{\geq 0}^n : Q^{(0)}(t) + x_1 Q^{(1)}(t) + \dots + x_n Q^{(n)}(t) \text{ is PSD} \}$

for $t$ large enough. Here, $Q^{(k)}(t)$ stands for the real symmetric matrix obtained by evaluating the entries of $Q^{(k)}$ at the value $t$. The relation between the problem over Puiseux series and the one over real numbers is described in the following lemma, and is a consequence of quantifier elimination over real closed fields:

Lemma 1.

The spectrahedron Eq. 2 over the field $\mathbb{K}$ is empty if and only if, for $t$ sufficiently large, the spectrahedron Eq. 3 over $\mathbb{R}$ is empty.

In this paper, we consider a slightly different problem which already retains much of the difficulty of the semidefinite feasibility problem over the field $\mathbb{K}$: given symmetric matrices $Q^{(1)}, \dots, Q^{(n)} \in \mathbb{K}^{m \times m}$, determine whether the following spectrahedral cone

(4) $\mathcal{C} := \{ x \in \mathbb{K}_{\geq 0}^n : x_1 Q^{(1)} + \dots + x_n Q^{(n)} \text{ is PSD} \}$

is trivial, meaning that it is reduced to the zero point. We refer to [AGS18] for further details on the relation between the original semidefinite feasibility problems and the problems above for spectrahedral cones.

2.2. Valuation map and tropical semifield

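As a nonarchimedean field, $\mathbb{K}$ is equipped with a valuation map $\mathrm{val} \colon \mathbb{K} \to \mathbb{R} \cup \{-\infty\}$ defined by $\mathrm{val}(\mathbf{x}) = \lambda_1$ for $\mathbf{x}$ as in Eq. 1, and $\mathrm{val}(0) = -\infty$. This valuation map has the following properties:

(5) $\mathrm{val}(\mathbf{x} + \mathbf{y}) \leq \max(\mathrm{val}(\mathbf{x}), \mathrm{val}(\mathbf{y}))$
(6) $\mathrm{val}(\mathbf{x} \mathbf{y}) = \mathrm{val}(\mathbf{x}) + \mathrm{val}(\mathbf{y})$

We point out that equality holds in Eq. 5 as soon as the leading terms of $\mathbf{x}$ and $\mathbf{y}$ do not cancel. In particular, this condition is satisfied when $\mathbf{x}, \mathbf{y} \geq 0$.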
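The two valuation properties can be checked on small examples. The sketch below is illustrative only: it assumes series with finitely many terms, stored as hypothetical `{exponent: coefficient}` dictionaries, and the helper names `add`, `mul`, and `val` are ours, not the paper's.

```python
from collections import defaultdict

def add(x, y):
    """Sum of two truncated series given as {exponent: coefficient} dicts."""
    out = defaultdict(float)
    for series in (x, y):
        for exp, coef in series.items():
            out[exp] += coef
    # Drop terms whose coefficients cancelled.
    return {e: c for e, c in out.items() if c != 0}

def mul(x, y):
    """Product of two truncated series."""
    out = defaultdict(float)
    for ex, cx in x.items():
        for ey, cy in y.items():
            out[ex + ey] += cx * cy
    return {e: c for e, c in out.items() if c != 0}

def val(x):
    """Valuation: the largest exponent, or -inf for the empty (zero) series."""
    return max(x) if x else float("-inf")

x = {2: 3, 0.5: -1}   # 3 t^2 - t^(1/2)
y = {2: -3, 1: 4}     # -3 t^2 + 4 t, leading term cancels with x
z = {2: 1}            # t^2, leading term does not cancel with x

print(val(add(x, y)), max(val(x), val(y)))  # strict inequality in Eq. 5
print(val(add(x, z)), max(val(x), val(z)))  # equality in Eq. 5
print(val(mul(x, y)), val(x) + val(y))      # Eq. 6
```

The first line shows the inequality can be strict when leading terms cancel, while the second shows equality when they do not.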

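The tropical (or max-plus) semifield $\mathbb{T}$ can be thought of as the image of $\mathbb{K}$ by the valuation map. More precisely, this semifield is defined as the set $\mathbb{T} := \mathbb{R} \cup \{-\infty\}$ endowed with the addition $x \oplus y := \max(x, y)$ and the multiplication $x \odot y := x + y$. The term “semifield” refers to the fact that the addition does not have an opposite law. The reader may consult [BCOQ92, But10, MS15] for more information on the tropical semifield.

The operations above are extended in the usual way to matrices with entries in $\mathbb{T}$. The resulting matrix product is also denoted by $\odot$. Henceforth, for any $x \in \mathbb{T}^n$ and $\lambda \in \mathbb{T}$, we denote by $\lambda + x$ the vector of $\mathbb{T}^n$ with entries $\lambda + x_i$. Finally, the neutral element for addition in $\mathbb{T}$ is $-\infty$, and we use the same symbol for any vector that has all components equal to $-\infty$.

We consider $\mathbb{T}$ equipped with the topology defined by the distance $d(x, y) := |e^x - e^y|$, and $\mathbb{T}^n$ equipped with the product topology. On $\mathbb{R}^n$ we also use Hilbert’s seminorm [CGQ04], defined by $\|x\|_H := \mathbf{t}(x) - \mathbf{b}(x)$, where $\mathbf{t}(x) := \max_{i \in [n]} x_i$ and $\mathbf{b}(x) := \min_{i \in [n]} x_i$. This seminorm induces a norm on the quotient space of $\mathbb{R}^n$ by the tropical parallelism relation, which is defined by: $x \sim y$ if, and only if, there exists $\lambda \in \mathbb{R}$ such that $x = \lambda + y$. We denote by $B_H(u, r)$ the Hilbert ball of center $u$ and radius $r$, i.e., $B_H(u, r) := \{ x \in \mathbb{R}^n : \|x - u\|_H \leq r \}$. We also endow $\mathbb{T}$ with the standard order $\leq$, which is extended to vectors entrywise.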
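As a concrete illustration of the max-plus matrix product and of Hilbert's seminorm (maximum entry minus minimum entry), here is a minimal sketch. The function names `maxplus_matvec` and `hilbert_seminorm` are hypothetical, and vectors are plain Python lists with `float("-inf")` standing for the tropical zero.

```python
NEG_INF = float("-inf")

def maxplus_matvec(A, x):
    """Max-plus product: entry i is max over j of A[i][j] + x[j]."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

def hilbert_seminorm(x):
    """Hilbert's seminorm of a finite vector: max entry minus min entry."""
    return max(x) - min(x)

A = [[0, NEG_INF],
     [2, -1]]
x = [1, 5]

print(maxplus_matvec(A, x))                     # tropical matrix-vector product
print(hilbert_seminorm(x))                      # seminorm of x
print(hilbert_seminorm([v + 10 for v in x]))    # unchanged by adding a constant
```

The last line illustrates why this is only a seminorm on vectors but a norm modulo tropical parallelism: adding the same constant to every entry does not change its value.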

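Another algebraic structure that we will use in this paper is the completed min-plus semiring $\overline{\mathbb{R}}_{\min}$, which is the set $\mathbb{R} \cup \{-\infty, +\infty\}$ equipped with $(x, y) \mapsto \min(x, y)$ as addition and $(x, y) \mapsto x + y$ as multiplication (with the convention $(+\infty) + (-\infty) = +\infty$). The corresponding matrix product for matrices with entries in $\overline{\mathbb{R}}_{\min}$ will be denoted by $\odot'$. Given $A \in \mathbb{T}^{m \times n}$, the operator $A^\sharp$ is defined by:

$A^\sharp(y) := (-A^\top) \odot' y$,

where $A^\top$ denotes the transpose of $A$. The operator $A^\sharp$ will be called the adjoint of $A$, being an adjoint in a categorical sense as it satisfies the following property:

(7) $A \odot x \leq y \iff x \leq A^\sharp(y)$

for any $x \in \mathbb{T}^n$ and $y \in \mathbb{T}^m$.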

2.3. Stochastic zero-sum games with mean payoff

A stochastic mean payoff game can be specified by two matrices $A \in \mathbb{T}^{m \times n}$ and $B \in \mathbb{T}^{m \times q}$, and a row-stochastic matrix $P \in \mathbb{R}^{q \times n}$, where $m, n, q \in \mathbb{N}$. The rules of the game are as follows. Two players, called Max and Min, control disjoint sets of states, respectively indexed by $[m]$ and $[n]$, and alternately move a pawn over these states. When the pawn is located on a state $j \in [n]$ of Player Min, she selects a state $i \in [m]$ such that $A_{ij} \neq -\infty$, moves the pawn to state $i$ and pays to Player Max the amount $-A_{ij}$. When the pawn is on a state $i \in [m]$ of Player Max, he selects a state $k \in [q]$ such that $B_{ik} \neq -\infty$, moves the pawn to state $k$ and receives from Player Min the payment $B_{ik}$. Finally, at state $k \in [q]$ the pawn is moved by nature to state $j \in [n]$ with probability $P_{kj}$.

We shall make the following finiteness assumption, which assures that players Max and Min have at least one move available in each state.

Assumption 1.

Every row of $B$ has at least one finite entry, and the same is true for every column of $A$.

A (positional) strategy for Player Min is a function $\sigma \colon [n] \to [m]$ such that $A_{\sigma(j) j} \neq -\infty$ for all $j \in [n]$. Similarly, a (positional) strategy for Player Max is a function $\tau \colon [m] \to [q]$ such that $B_{i \tau(i)} \neq -\infty$ for all $i \in [m]$. If Min and Max play according to the strategies $\sigma$ and $\tau$, and start from state $j \in [n]$, the movement of the pawn is described by a Markov chain over the disjoint union $[n] \uplus [m] \uplus [q]$. Then, the payoff (of Player Max) is defined as the average payoff

$g_j(\sigma, \tau) := \lim_{N \to \infty} \mathbb{E}_{\sigma \tau} \Big( \frac{1}{N} \sum_{p=1}^{N} \big( -A_{i_p j_p} + B_{i_p k_p} \big) \Big)$,

where $\mathbb{E}_{\sigma \tau}$ refers to the expectation over the trajectories $j_1, i_1, k_1, j_2, \dots$, with respect to the probability measure determined by these strategies. The objective of Players Min and Max is to find a strategy which respectively minimizes and maximizes the payoff. Liggett and Lippman [LL69] showed that there exists a pair of optimal strategies $(\sigma^*, \tau^*)$, which satisfies

$g_j(\sigma^*, \tau) \leq g_j(\sigma^*, \tau^*) \leq g_j(\sigma, \tau^*)$

for every initial state $j \in [n]$ and pair of Min/Max strategies $(\sigma, \tau)$. In this case, the quantity $g_j(\sigma^*, \tau^*)$ is referred to as the value of the game when starting from state $j$. The state $j$ is said to be winning (for Player Max) when the associated value is nonnegative. It is said to be strictly winning for the same player if the associated value is positive. A dual terminology applies to Player Min.

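With any such game is associated a Shapley operator, which is the map $F \colon \mathbb{T}^n \to \mathbb{T}^n$ defined by

(8) $F_j(x) := \min_{i \in [m]} \Big( -A_{ij} + \max_{k \in [q]} \big( B_{ik} + \sum_{l \in [n]} P_{kl} x_l \big) \Big), \quad j \in [n]$,

i.e., $F(x) = A^\sharp \odot' (B \odot (P x))$, where $P x$ denotes the usual matrix-vector product of $P$ and $x$. The finiteness assumption (Assumption 1) on the entries of the matrices implies that $F$ preserves both $\mathbb{R}^n$ and $\mathbb{T}^n$. It is convenient to consider the vector $F^k(0)$, for $k \in \mathbb{N}$, where $F^k$ denotes the $k$th iterate of $F$. The $j$th entry of $F^k(0)$ represents the value of the game in finite horizon $k$ with initial state $j$, associated with the same data. The vector

$\chi(F) := \lim_{k \to \infty} F^k(0) / k$

is known as the escape rate vector of $F$. We shall recall in Section 3 why this escape rate exists. It is known that

(9) $\chi_j(F)$ is the value of the mean payoff game with initial state $j$, for all $j \in [n]$,

i.e., the value of the mean payoff game coincides with the limit of the mean value per time unit of the finite horizon game, as the horizon tends to infinity. In this way, solving a mean payoff game reduces to a dynamical systems issue: computing the escape rate vector of a Shapley operator.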
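The reduction to a dynamical systems question can be made concrete with a small sketch. It assumes the Shapley operator has the min-max-average form suggested by the rules of the game (Min pays $-A_{ij}$, Max receives $B_{ik}$, nature averages with the rows of $P$); this explicit form, the function names, and the toy data are assumptions made for illustration.

```python
NEG_INF = float("-inf")

def shapley(A, B, P, x):
    """One application of the (assumed) Shapley operator to the vector x."""
    # Nature: expected continuation value from each random state k.
    px = [sum(p * xl for p, xl in zip(row, x)) for row in P]
    # Max: best reward plus continuation from each Max state i
    # (forbidden moves carry -inf and never attain the max).
    best_max = [max(b + v for b, v in zip(row, px)) for row in B]
    # Min: cheapest move from each Min state j
    # (forbidden moves carry +inf after negation and never attain the min).
    return [min(-A[i][j] + best_max[i] for i in range(len(A)))
            for j in range(len(x))]

def escape_rate_estimate(A, B, P, horizon):
    """Approximate the escape rate by F^k(0) / k for k = horizon."""
    x = [0.0] * len(P[0])
    for _ in range(horizon):
        x = shapley(A, B, P, x)
    return [v / horizon for v in x]

# Toy game: two Min states, two Max states, two random states.
A = [[0, NEG_INF], [NEG_INF, 0]]
B = [[1, NEG_INF], [NEG_INF, -2]]
P = [[0.5, 0.5], [0.5, 0.5]]

print(shapley(A, B, P, [0.0, 0.0]))
print(escape_rate_estimate(A, B, P, 1000))
```

In this toy game each step yields reward 1 or -2 with equal long-run frequency, so the normalized iterates approach -0.5 in both coordinates, at rate of order 1/k.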

Mean payoff games can be defined in different guises: this only changes the explicit form of the Shapley operator, without impact on the complexity of the problem, as shown by the following remark.

Remark 2.

Here, we assumed that Players Min, Max, and nature, play successively, in this order. Starting with Player Max, instead of Min, while keeping the same circular order, would result in replacing the Shapley operator by its cyclic conjugate

(10)

defined on . If the order was changed in a non cyclic way, nature playing for instance after Min and before Max, then, the original Shapley operator would be replaced by:

(11)

for some matrices , , and . It is also convenient to consider the effect of Players Max and Min swapping their roles in the original game. This would amount to replacing by:

(12)

recalling that denotes the transposition. Observe that . Moreover, if can be factored as , and being any compositions of maps of the form , , or , it can be shown that where is the recession function of , whose evaluation is straightforward. Hence, one can recover the escape rate vector of from the escape rate vector of its cyclic conjugate , and vice versa. Therefore, for an operator given in any of the forms Eqs. 12, 11, 10 and 8, the complexity of computing the escape rate is independent of the choice of the special form.

2.4. Zero-sum games associated with nonarchimedean semidefinite programs

The correspondence between semidefinite feasibility problems for spectrahedral cones and stochastic mean payoff games is given in the next theorem:

Theorem 3.

With every spectrahedral cone $\mathcal{C}$ of the form Eq. 4 is associated a stochastic mean payoff game that satisfies the following property: if the valuations of the entries of the matrices $Q^{(1)}, \dots, Q^{(n)}$ are chosen in a generic way, then $\mathcal{C}$ is nontrivial if and only if at least one state in the associated game is winning.

This correspondence is established in [AGS18] by considering the following problem:

Find $u \in \mathbb{T}^n$, not identically $-\infty$, such that $u \leq F(u)$,

where $F$ is the Shapley operator of the game associated with the spectrahedral cone $\mathcal{C}$. This problem is said to be feasible when it admits a solution, and infeasible otherwise. We point out that this problem is feasible if, and only if, the associated stochastic mean payoff game has a winning state. Equivalently, this amounts to the fact that the set

(13) $\mathcal{S}(F) := \{ u \in \mathbb{T}^n : u \leq F(u) \}$

is nontrivial, meaning that it is not reduced to the point $-\infty$.

The correspondence between nonarchimedean semidefinite programming and stochastic games is simpler to present if we assume that the matrices are (negated) Metzler matrices, which means that their off-diagonal entries are nonpositive. In this case, if the genericity assumption of Theorem 3 is satisfied, is precisely the image under the valuation map of . Similarly, we can consider the problem:

where stands for the fact that for all . This problem is feasible if, and only if, the set is strictly nontrivial, meaning that there exists such that . This corresponds to the property where every state of the game has a positive value.

The Shapley operator and the associated feasibility problems and provide further conditions under which game algorithms are directly applicable to solve nonarchimedean feasibility problems, disregarding the genericity conditions of Theorem 3.

Theorem 4.

For any Metzler matrices , we have:

  1. if is infeasible, or equivalently, is trivial, then is trivial.

  2. if is feasible, or equivalently, is strictly nontrivial, then is strictly nontrivial, meaning that there exists such that the matrix is positive definite.

Following the analogy with the classical condition number in linear programming (see, e.g., [Ren95]), we are interested in finding a numerical quantity measuring (the inverse of) the distance to triviality when the instance is nontrivial or to nontriviality when it is trivial. In more detail, we define the condition number of the above problem by:

(14)

if is feasible, and

(15)

if is infeasible (with the convention ). Here, stands for the map , where the addition is understood entrywise, and stands for the sup-norm, i.e., . The condition number of the problem is defined in the same way as in Eq. 14 and Eq. 15 but replacing by .

Remark 5.

Looking for additive perturbations of the form $u + F$ is a canonical approach; such perturbations have already been used to reveal the ergodicity properties of the game [AGH15]. This is also the finite dimensional analogue of perturbing the Hamiltonian of a Hamilton–Jacobi PDE by adding a potential [FR13].

Remark 6.

In [AGS18], the Shapley operator associated with a nonarchimedean SDP feasibility problem is written as instead of . As there are reductions between the games corresponding to both forms (as discussed in Remark 2), we consider here a Shapley operator in the latter form. This is more suitable to state the complexity estimates in Section 5.

3. Preliminary results of nonlinear Perron–Frobenius theory

In this section, we recall some elements of nonlinear Perron–Frobenius theory which will be used to study the condition numbers introduced above. To do so, we next axiomatize essential properties of the Shapley operators considered in Section 2.3, following the “operator approach” of stochastic games [RS01, Ney03].

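A self-map $F$ of $\mathbb{T}^n$ is said to be order-preserving when

$x \leq y \implies F(x) \leq F(y)$ for all $x, y \in \mathbb{T}^n$,

and additively homogeneous when

$F(\lambda + x) = \lambda + F(x)$ for all $\lambda \in \mathbb{T}$ and $x \in \mathbb{T}^n$.

We point out that any order-preserving and additively homogeneous self-map $F$ of $\mathbb{T}^n$ that preserves $\mathbb{R}^n$ is nonexpansive in the sup-norm, meaning that

$\|F(x) - F(y)\|_\infty \leq \|x - y\|_\infty$ for all $x, y \in \mathbb{R}^n$.

Given an order-preserving and additively homogeneous self-map $F$ of $\mathbb{T}^n$, the vectors $u \in \mathbb{T}^n$ satisfying $u \leq F(u)$ can be thought of as the nonlinear analogues of subharmonic functions. A central role in determining the existence of such vectors is played by the limit $\lim_{k \to \infty} F^k(u) / k$, for $u \in \mathbb{R}^n$. When this limit exists, it can be shown to be independent of the choice of $u$, and so it coincides with the escape rate vector of $F$. The following theorem of Kohlberg implies that the limit does exist when $F$ preserves $\mathbb{R}^n$ and its restriction to $\mathbb{R}^n$ is piecewise affine (meaning that $\mathbb{R}^n$ can be covered by finitely many polyhedra such that $F$ restricted to any of them is affine).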
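The three properties used above (order preservation, additive homogeneity, and nonexpansiveness in the sup-norm) can be tested numerically on random finite vectors. The sketch below is only a sanity check by sampling, not a proof; the toy operator `F` and the helper `check_properties` are hypothetical examples.

```python
import random

def F(x):
    """Toy operator built from max, min, shifts and an average (illustrative)."""
    return [max(x[0] - 1.0, x[1] + 2.0),
            min(x[0] + 1.0, 0.5 * (x[0] + x[1]))]

def sup_norm(v):
    return max(abs(c) for c in v)

def check_properties(op, n, trials=1000, tol=1e-9, seed=0):
    """Return True if no sampled pair violates the three properties."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [xi + rng.uniform(0, 5) for xi in x]   # y >= x entrywise
        lam = rng.uniform(-5, 5)
        fx, fy = op(x), op(y)
        # Order-preserving: x <= y implies op(x) <= op(y).
        if any(a > b + tol for a, b in zip(fx, fy)):
            return False
        # Additively homogeneous: op(lam + x) = lam + op(x).
        shifted = op([xi + lam for xi in x])
        if any(abs(s - (a + lam)) > tol for s, a in zip(shifted, fx)):
            return False
        # Nonexpansive in the sup-norm.
        gap = sup_norm([a - b for a, b in zip(fx, fy)])
        if gap > sup_norm([a - b for a, b in zip(x, y)]) + tol:
            return False
    return True

print(check_properties(F, 2))
print(check_properties(lambda x: [2 * x[0], x[1]], 2))  # not homogeneous
```

The second call uses a map that doubles one coordinate, which is order-preserving but not additively homogeneous, so the check rejects it.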

Theorem 7.

[Koh80] A piecewise affine self-map $F$ of $\mathbb{R}^n$ that is nonexpansive in any norm admits an invariant half-line, meaning that there exist $z, w \in \mathbb{R}^n$ such that

$F(z + \beta w) = z + (\beta + 1) w$

for any $\beta \in \mathbb{R}$ large enough. In particular, the escape rate vector $\chi(F)$ exists, and is given by the vector $w$.

Kohlberg’s theorem applies to Shapley operators of stochastic mean payoff games with finite state and action spaces and perfect information. Indeed, the Shapley operator Eq. 8 of the game described in Section 2.3 is order-preserving and additively homogeneous, and its restriction to $\mathbb{R}^n$ is piecewise affine.

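For a general order-preserving and additively homogeneous self-map of $\mathbb{T}^n$, the escape rate vector may not exist. We can still, however, recover information about the sequences $(F^k(x)/k)_k$ through the Collatz–Wielandt numbers of $F$. Assuming that $F$ is a continuous, order-preserving, and additively homogeneous self-map of $\mathbb{T}^n$, we define the upper Collatz–Wielandt number of $F$ by:

(16) $\overline{\mathrm{cw}}(F) := \inf \{ \mu \in \mathbb{R} : \exists u \in \mathbb{R}^n, \ F(u) \leq \mu + u \}$,

and the lower Collatz–Wielandt number of $F$ by:

(17) $\underline{\mathrm{cw}}(F) := \sup \{ \mu \in \mathbb{R} : \exists u \in \mathbb{R}^n, \ \mu + u \leq F(u) \}$.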

A relation between the escape rate vector and the upper Collatz–Wielandt number is given in the next theorem, which is derived in [AGG12] from a minimax result of Nussbaum [Nus86].

Theorem 8.

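[AGG12, Lemma 2.8] Let $F$ be a continuous, order-preserving, and additively homogeneous self-map of $\mathbb{T}^n$. Then,

$\lim_{k \to \infty} \max_{i \in [n]} F_i^k(x) / k = \overline{\mathrm{cw}}(F)$

for any $x \in \mathbb{R}^n$.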

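It is known that an order-preserving and additively homogeneous self-map of $\mathbb{R}^n$ admits a unique continuous extension to $\mathbb{T}^n$, see [BNS03]. Then, as noted in [AGG12, Remark 2.10], the previous result can be dualized when $F$ preserves $\mathbb{R}^n$.

Corollary 9.

Let $F$ be a continuous, order-preserving, and additively homogeneous self-map of $\mathbb{T}^n$ that preserves $\mathbb{R}^n$. Then,

$\lim_{k \to \infty} \min_{i \in [n]} F_i^k(x) / k = \underline{\mathrm{cw}}(F)$

for any $x \in \mathbb{R}^n$.

As a consequence, when the escape rate vector $\chi(F)$ exists, we simply have

$\overline{\mathrm{cw}}(F) = \max_{i \in [n]} \chi_i(F)$ and $\underline{\mathrm{cw}}(F) = \min_{i \in [n]} \chi_i(F)$.

Specializing this to the case where $F$ is the Shapley operator of a game, the quantities $\overline{\mathrm{cw}}(F)$ and $\underline{\mathrm{cw}}(F)$ respectively correspond to the greatest and smallest values of the states for the mean payoff problem.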

In the sequel, we will consider especially the situation in which there is a vector $u \in \mathbb{R}^n$ and a scalar $\lambda \in \mathbb{R}$ such that

(18) $F(u) = \lambda + u$.

The scalar $\lambda$, which is unique, is known as the ergodic constant, and Eq. 18 is referred to as the ergodic equation. We will denote this scalar by $\rho(F)$ as it is a nonlinear extension of the spectral radius. The vector $u$ is known as a bias, or a potential. It is easily seen that if $F$ admits such a bias vector, then

$\chi_i(F) = \rho(F)$ for all $i \in [n]$,

and the condition that $\rho(F) \geq 0$ means that the game is winning for every initial state. The existence of a bias vector is guaranteed by certain “ergodicity” assumptions [AGH15].
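The step from the ergodic equation to the escape rate can be spelled out as follows. Here $F$ denotes the operator, $u$ a bias vector, and $\lambda$ the ergodic constant (our notation for this sketch); the argument uses only additive homogeneity and sup-norm nonexpansiveness.

```latex
\begin{align*}
F(u) = \lambda + u
  &\implies F^2(u) = F(\lambda + u) = \lambda + F(u) = 2\lambda + u \\
  &\implies F^k(u) = k\lambda + u \quad \text{for all } k \geq 1
     \quad \text{(by induction)}, \\
\frac{F^k(u)}{k} = \lambda + \frac{u}{k}
  &\xrightarrow[k \to \infty]{} (\lambda, \dots, \lambda), \\
\left\| \frac{F^k(0)}{k} - \frac{F^k(u)}{k} \right\|_\infty
  &\leq \frac{\|u\|_\infty}{k} \xrightarrow[k \to \infty]{} 0 .
\end{align*}
```

Hence the normalized iterates started from $0$ have the same limit, and every coordinate of the escape rate vector equals $\lambda$.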

4. Metric geometry properties of condition numbers

4.1. Condition numbers vs Collatz–Wielandt numbers, and duality

We point out that the definitions given in Section 2.4 of the condition numbers and can be generalized to any continuous, order-preserving, and additively homogeneous self-map of . The next proposition provides a characterization of these condition numbers in terms of the Collatz–Wielandt numbers of .

Proposition 10.

Let be a continuous, order-preserving, and additively homogeneous self-map of . Then,

We define the dual of the mean payoff game of Section 2.4 as the one whose Shapley operator is . The following theorem will allow us to relate with and .

Theorem 11 (Duality theorem).

Let and , where and satisfy Assumption 1, has at least one finite entry per row, and is a row-stochastic matrix. Then,

As a consequence of Theorem 11, we obtain:

Corollary 12.

Let and , where and satisfy Assumption 1, has at least one finite entry per row, and is a row-stochastic matrix. Then,

  1. The condition number of coincides with the condition number of .

  2. Either is feasible or is feasible.

  3. Only one of the problems and can be feasible.

4.2. A geometric characterization of condition numbers

In this section, we study the inner radius of the feasible sets of games, that is, given the Shapley operator of a game, we study the maximal radius of a Hilbert ball contained in the set Eq. 13.

We start with the following simple lemma.

Lemma 13.

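Let $F$ be an order-preserving and additively homogeneous self-map of $\mathbb{T}^n$. Assume $u \in \mathbb{R}^n$ and $\lambda \geq 0$ are such that $\lambda + u \leq F(u)$. Then, the Hilbert ball $B_H(u, \lambda)$ is contained in the set $\mathcal{S}(F)$ of Eq. 13.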
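The inclusion stated in this lemma can be probed by sampling. The sketch below assumes the hypothesis has the form "radius + center ≤ F(center)" and that the feasible set consists of the vectors x with x ≤ F(x); the toy operator and helper names are hypothetical, chosen so that the feasible set is easy to describe by hand.

```python
import random

def F(x):
    """Toy operator: each coordinate depends only on the other one."""
    return [x[1] + 1.0, x[0] + 1.0]

def in_feasible_set(op, x, tol=1e-9):
    """Test x <= op(x) entrywise, up to a small tolerance."""
    return all(xi <= fi + tol for xi, fi in zip(x, op(x)))

def sample_hilbert_ball(center, radius, rng):
    """Random point whose Hilbert distance to center is at most radius."""
    shift = rng.uniform(-100, 100)  # balls are invariant under constant shifts
    return [c + shift + rng.uniform(0, radius) for c in center]

center, radius = [0.0, 0.0], 1.0
# Hypothesis of the lemma for this data: radius + center <= F(center).
assert all(radius + c <= f for c, f in zip(center, F(center)))

rng = random.Random(0)
samples = [sample_hilbert_ball(center, radius, rng) for _ in range(1000)]
print(all(in_feasible_set(F, x) for x in samples))  # every sample is feasible
print(in_feasible_set(F, [2.0, 0.0]))               # a point outside the ball
```

For this operator the feasible finite vectors are exactly those with |x_1 - x_2| ≤ 1, so the ball of radius 1 around the origin fills the whole feasible set and the point (2, 0) is rejected.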

For the condition in the previous lemma to be also necessary for the inclusion to hold, we need an additional assumption on .

Definition 1.

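An order-preserving and additively homogeneous self-map $F$ of $\mathbb{T}^n$ is said to be diagonal free when $F_i(x)$ is independent of $x_i$ for all $i \in [n]$. In other words, $F$ is diagonal free if for all $i \in [n]$, and for all $x, y \in \mathbb{T}^n$ such that $x_j = y_j$ for $j \neq i$, we have $F_i(x) = F_i(y)$.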

Lemma 14.

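When $F$ is diagonal free, for any $u \in \mathbb{R}^n$ and $\lambda \geq 0$ the Hilbert ball $B_H(u, \lambda)$ is contained in the set $\mathcal{S}(F)$ of Eq. 13 only if $\lambda + u \leq F(u)$.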

If is not diagonal free, the conclusion of Lemma 14 does not necessarily hold, as shown in the next example.

Example 15.

Let us consider the order-preserving and additively homogeneous map , where and . Then, for , it can be verified that . However, we have

and so .

As a consequence of Lemmas 14 and 13, we obtain:

Theorem 16.

Let $F$ be a diagonal free self-map of $\mathbb{T}^n$. Then, the set $\mathcal{S}(F)$ of Eq. 13 contains a Hilbert ball of positive radius if and only if the lower Collatz–Wielandt number $\underline{\mathrm{cw}}(F)$ of Eq. 17 is positive. Moreover, when $\mathcal{S}(F)$ contains a Hilbert ball of positive radius, the supremum of the radii of the Hilbert balls contained in $\mathcal{S}(F)$ coincides with $\underline{\mathrm{cw}}(F)$.

Sergeev established in [Ser07] a characterization of the inner radius of polytropes, which corresponds to the special case of Theorem 16 in which $F$ is the Shapley operator of a game with only one player and deterministic transitions.

Remark 17.

The condition in Theorem 16 is not too restrictive. Indeed, it can be shown that in most cases of interest, if the Shapley operator $F$ of a mean payoff game is not diagonal free, one can construct another mean payoff game such that its Shapley operator is diagonal free and the inner radius of its feasible set coincides with the one of $F$.

5. Bounding the complexity of value iteration by the condition numbers

In this section, $F$ is an order-preserving and additively homogeneous self-map of $\mathbb{T}^n$ which preserves $\mathbb{R}^n$. We also assume that $F$ admits a bias vector $u \in \mathbb{R}^n$, as in Eq. 18.

5.1. A universal complexity bound for value iteration

The most straightforward idea to solve a mean payoff game is probably value iteration: we infer whether or not the mean payoff game is winning by solving the finite horizon game, for a large enough horizon. This is formalized in Fig. 1.

procedure ValueIteration($F$)
    ▷ $F$ a Shapley operator from $\mathbb{R}^n$ to $\mathbb{R}^n$
    ▷ The algorithm will report whether Player Max or Player Min wins the mean payoff game represented by $F$
    $u := 0 \in \mathbb{R}^n$
    while $\max_i u_i \geq 0$ and $\min_i u_i \leq 0$ do
        $u := F(u)$    ▷ At iteration $\ell$, $u = F^\ell(0)$ is the value vector of the game in finite horizon $\ell$
    end while
    if $\max_i u_i < 0$ then return “Player Min wins”
    else return “Player Max wins”
    end if
end procedure
Figure 1. Basic value iteration algorithm.
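A minimal executable sketch of the procedure in Fig. 1 is given below. Two points are assumptions of this sketch rather than statements from the text: the loop is taken to run while the current vector has both a nonnegative and a nonpositive entry, and a `max_iter` guard is added so that the function cannot loop forever when the mean payoff is zero. The operator is passed as a Python callable on lists.

```python
def value_iteration(F, n, max_iter=100_000):
    """Iterate u := F(u) from u = 0 until all entries share a strict sign.

    Returns the winner and the number of applications of F performed.
    """
    u = [0.0] * n
    for iterations in range(max_iter + 1):
        if max(u) < 0:
            return "Player Min wins", iterations
        if min(u) > 0:
            return "Player Max wins", iterations
        u = F(u)
    raise RuntimeError("no decision within max_iter iterations")

# Toy operators (hypothetical): the first has ergodic constant -0.5,
# the second has ergodic constant +1.
def F_min(u):
    avg = 0.5 * (u[0] + u[1])
    return [1.0 + avg, -2.0 + avg]

def F_max(u):
    return [v + 1.0 for v in u]

print(value_iteration(F_min, 2))
print(value_iteration(F_max, 2))
```

For `F_min` the iterates are (1, -2), (0.5, -2.5), (0, -3), (-0.5, -3.5), so the decision is reached after four applications of the operator; the larger the gap between the entries of a bias vector relative to the size of the ergodic constant, the longer the loop runs.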

We next show, in Theorem 18, that this value iteration algorithm terminates and is correct, provided the mean payoff of the game is nonzero (i.e., $\rho(F) \neq 0$), and the operations are performed in exact arithmetic. We shall see in Corollaries 25 and 26 that these two restrictions can be eliminated, at the price of an increase of the complexity bound.

It is convenient to introduce the following metric estimate, which represents the minimal Hilbert’s seminorm of a bias vector

Since is assumed to have a bias vector , we have and . Hence, by Proposition 10,

We shall denote by this common quantity.

Note that has a remarkable interpretation, as the value of an auxiliary game, in which there is an initial stage, at which Player Max can decide either to keep his role or to swap it with the role of Player Min. Then, the two players play the mean payoff game as usual. Swapping roles amounts to replacing by the Shapley operator . Observe also that as noted in Remark 2. Hence, the value of this modified game is precisely .

The following result bounds the complexity of value iteration in terms of and of the condition number .

Theorem 18.

Suppose that the Shapley operator has a bias vector and that the ergodic constant is nonzero. Then, procedure ValueIteration terminates after

iterations and returns the correct answer.

5.2. Bounding the condition number and the bias vector of a stochastic mean payoff game

We next bound the condition number , and the metric estimate , when is a Shapley operator of a stochastic game with perfect information and finite action spaces. As in Section 2.3, we assume that

(19)

where has at least one finite entry per column, has at least one finite entry per row, and is a row-stochastic matrix. To obtain explicit bounds, we will assume that the finite entries of the matrices and are integers, and we set

This is not more special than assuming that the finite entries of and are rational numbers (we may always rescale rational payments so that they become integers). We also assume that the probabilities are rational, and that they have a common denominator , , where for all and .

We say that a state $k \in [q]$ is nondeterministic if there are at least two distinct indices $j, j' \in [n]$ such that $P_{kj} > 0$ and $P_{kj'} > 0$.

The following lemma improves an estimate in [BEGM15].

Lemma 19.

Suppose that a Markov chain with states is irreducible, and that the transition probabilities are rational numbers whose denominators divide an integer . Let denote the number of states with at least possible successors. Let denote the invariant measure of the chain. Then, the least common denominator of the rational numbers is not greater than .
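To see what the quantity in this lemma looks like on a small case, the sketch below computes the invariant measure of an irreducible chain in exact rational arithmetic and then the least common denominator of its entries. The solver (Gauss-Jordan elimination on the stationarity equations plus the normalization constraint) and the example chain are illustrative choices; `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

def stationary_distribution(P):
    """Invariant measure of an irreducible chain, in exact arithmetic."""
    n = len(P)
    # Rows 0..n-2: sum_i pi_i * (P[i][j] - [i == j]) = 0 for j < n - 1.
    M = [[Fraction(P[i][j]) - (1 if i == j else 0) for i in range(n)]
         + [Fraction(0)] for j in range(n - 1)]
    # Last row: normalization, sum_i pi_i = 1.
    M.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

# Transition probabilities with denominators dividing 6; states 0 and 2
# have two possible successors each, state 1 has one.
P = [[Fraction(1, 2), Fraction(1, 2), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(1)],
     [Fraction(1, 3), Fraction(0), Fraction(2, 3)]]

pi = stationary_distribution(P)
print(pi)
print(lcm(*(p.denominator for p in pi)))
```

On this chain the invariant measure is (1/3, 1/6, 1/2), whose least common denominator is 6.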

We deduce the following result.

Theorem 20.

Let be a Shapley operator as above, still supposing that has a bias vector and that is nonzero. If is the number of nondeterministic states of the game, then .

To bound , we use the following idea. For , let denote the value of the discounted game associated with , meaning that . Since represents a zero-sum game with perfect information and finite state and action spaces, it is known that has a Laurent series expansion in powers of with a pole of order at most  at , see [Koh80]. We can deduce from this that the limit of as exists and that it is a bias, which we call the Blackwell bias. By working out the limit, we arrive at the following estimate.

Theorem 21.

Let be the Shapley operator in Eq. 19, still supposing that it has a bias vector, and let be its Blackwell bias. Then,

By combining Theorems 20 and 21, we arrive at the following.

Corollary 22.

Let be the Shapley operator in Eq. 19, still supposing that it has a bias vector and that is nonzero. Then, procedure ValueIteration stops after

(20)

iterations and correctly decides which of the two players is winning.

We next show that when specialized to deterministic games, the universal estimate of Theorem 18 gives precisely the complexity bound of Zwick–Paterson [ZP96].

Lemma 23.

Let , where , and suppose that there exists such that . Then

where is defined as in Section 5.2, setting .

For deterministic games with integer payments, the mean payoff is given by the average weight of a circuit, which has length at most . It follows that , unless . Note also that . By applying Theorem 20, we arrive at the following bound for the number of iterations of the algorithm in Fig. 1.
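The integrality argument used above can be spelled out as follows. The symbols are assumptions, since the original notation is not recoverable from this copy: $\eta$ stands for the mean payoff, $k$ for the length of the optimal circuit, $N$ for the maximal possible circuit length, and $w_1, \dots, w_k$ for the integer weights along the circuit.

```latex
\[
  \eta \;=\; \frac{w_1 + \dots + w_k}{k},
  \qquad w_1, \dots, w_k \in \mathbb{Z}, \quad 1 \le k \le N .
\]
% The numerator is an integer, so if it is nonzero its absolute value is at least 1:
\[
  \eta \neq 0
  \;\Longrightarrow\;
  |\eta| \;=\; \frac{|w_1 + \dots + w_k|}{k} \;\ge\; \frac{1}{k} \;\ge\; \frac{1}{N} .
\]
```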

Corollary 24 (Compare with [ZP96]).

Let be the Shapley operator of a deterministic game, where the finite entries of are integers. If there exists such that with , then

The assumption that is used in Theorem 18 can be relaxed, by appealing to the following perturbation and scaling argument. This leads to a bound in which the exponents of and of are increased.

Corollary 25.

Let . Then, procedure ValueIteration, applied to the perturbed and rescaled Shapley operator , satisfies

iterations, and this holds unconditionally. If the algorithm reports that Max wins, then Max is winning in the original mean payoff game. If the algorithm reports that Min wins, then Min is strictly winning in the original mean payoff game.

The algorithm in Fig. 1 can be adapted to work in finite precision arithmetic. Consider the variant of the main body of this algorithm, given in Fig. 2. We assume that each evaluation of the Shapley operator is performed with an error of at most in the sup-norm.

, ,
while  and  do
      ; The operator is evaluated in approximate arithmetic, so that is at most at distance in the sup-norm from its true value.
end while
if  then return “Player Min wins”
end if
if  then return “Player Max wins”
end if
Figure 2. Modification of the basic value iteration algorithm to work in finite precision arithmetic.

Corollary 26.

Let be the Shapley operator in Eq. 19, still supposing that it has a bias vector and that is nonzero. Let . Then, for any , value iteration performed with a numerical precision of at each step (i.e., the algorithm in Fig. 2) stops after

(21)

iterations and correctly decides which of the two players is winning.

Observe that Eq. 21 is the bound Eq. 20 multiplied by .

6. Concluding remarks

We introduced a notion of condition number for stochastic mean payoff games, and bounded the complexity of value iteration in terms of this condition number. Whereas condition numbers are familiar for problems over archimedean fields, this leads to an appropriate notion of condition number for nonarchimedean semidefinite programming. In particular, our present results explain, at least in part, the perhaps surprising benchmarks of [AGS18], revealing that random nonarchimedean semidefinite feasibility instances with generic valuations can be simpler to solve than their archimedean analogues. In some sense, “good conditioning” provides a quantitative version of “genericity,” and most instances in [AGS18] are well conditioned. This raises the issue of evaluating the condition number on random instances. It is also an interesting question to investigate whether the solution of nonarchimedean SDP could be used, in general, to solve archimedean SDP, and vice versa.

Acknowledgement

The second author thanks Vladimir Gurvich for enlightening discussions on the pumping algorithm of [GKK88] and its extension to stochastic games in [BEGM15].

References

  • [AGG12] M. Akian, S. Gaubert, and A. Guterman. Tropical polyhedra are equivalent to mean payoff games. Int. J. Algebra Comput., 22(1):125001 (43 pages), 2012.
  • [AGH15] M. Akian, S. Gaubert, and A. Hochart. Ergodicity conditions for zero-sum games. Discrete Contin. Dyn. Syst., 35(9):3901–3931, 2015.
  • [AGS16] X. Allamigeon, S. Gaubert, and M. Skomra. Tropical spectrahedra. arXiv:1610.06746, 2016.
  • [AGS18] X. Allamigeon, S. Gaubert, and M. Skomra. Solving generic nonarchimedean semidefinite programs using stochastic game algorithms. J. Symb. Comp., 85:25–54, 2018.
  • [AM09] D. Andersson and P. B. Miltersen. The complexity of solving stochastic games on graphs. In Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC), volume 5878 of Lecture Notes in Comput. Sci., pages 112–121. Springer, 2009.
  • [BCOQ92] F. Baccelli, G. Cohen, G. J. Olsder, and J.-P. Quadrat. Synchronization and Linearity: An Algebra for Discrete Event Systems. Wiley, 1992.
  • [BEGM15] E. Boros, K. Elbassioni, V. Gurvich, and K. Makino. A pseudo-polynomial algorithm for mean payoff stochastic games with perfect information and few random positions. arXiv:1508.03431, 2015.
  • [BM16] M. Bodirsky and M. Mamino. Max-closed semilinear constraint satisfaction. In Proceedings of the 11th International Computer Science Symposium in Russia (CSR), volume 9691 of Lecture Notes in Comput. Sci., pages 88–101. Springer, 2016.
  • [BNS03] A. D. Burbanks, R. D. Nussbaum, and C. T. Sparrow. Extension of order-preserving maps on a cone. Proc. Roy. Soc. Edinburgh Sect. A, 133(1):35–59, 2003.
  • [BPT13] G. Blekherman, P. A. Parrilo, and R. R. Thomas. Semidefinite Optimization and Convex Algebraic Geometry, volume 13 of MOS-SIAM Ser. Optim. SIAM, Philadelphia, PA, 2013.
  • [But10] P. Butkovič. Max-linear Systems: Theory and Algorithms. Springer Monogr. Math. Springer, London, 2010.
  • [CGQ04] G. Cohen, S. Gaubert, and J.-P Quadrat. Duality and separation theorems in idempotent semimodules. Linear Algebra Appl., 379:395–422, 2004.
  • [Con92] A. Condon. The complexity of stochastic games. Inform. and Comput., 96(2):203–224, 1992.
  • [dKV16] E. de Klerk and F. Vallentin. On the Turing model complexity of interior point methods for semidefinite programming. SIAM J. Optim., 26(3):1944–1961, 2016.
  • [FR13] A. Figalli and L. Rifford. Aubry sets, Hamilton–Jacobi equations, and the Mañé conjecture. In Geometric analysis, mathematical relativity, and nonlinear partial differential equations, volume 599 of Contemp. Math., pages 83–104. AMS, 2013.
  • [GKK88] V. A. Gurvich, A. V. Karzanov, and L. G. Khachiyan. Cyclic games and finding minimax mean cycles in digraphs. Zh. Vychisl. Mat. Mat. Fiz., 28(9):1406–1417, 1988.
  • [GM12] B. Gärtner and J. Matoušek. Approximation Algorithms and Semidefinite Programming. Springer, Heidelberg, 2012.
  • [GP15] D. Grigoriev and V. V. Podolskii. Tropical effective primary and dual Nullstellensätze. In Proceedings of the 32nd International Symposium on Theoretical Aspects of Computer Science (STACS), volume 30 of LIPIcs. Leibniz Int. Proc. Inform., pages 379–391, Wadern, 2015. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
  • [IJM12] R. Ibsen-Jensen and P. B. Miltersen. Solving simple stochastic games with few coin toss positions. In Algorithms – ESA 2012, volume 7501 of Lecture Notes in Comput. Sci., pages 636–647. Springer, 2012.
  • [Koh80] E. Kohlberg. Invariant half-lines of nonexpansive piecewise-linear transformations. Math. Oper. Res., 5(3):366–372, 1980.
  • [LL69] T. M. Liggett and S. A. Lippman. Stochastic games with perfect information and time average payoff. SIAM Rev., 11(4):604–607, 1969.
  • [MS15] D. Maclagan and B. Sturmfels. Introduction to Tropical Geometry, volume 161 of Grad. Stud. Math. AMS, Providence, RI, 2015.
  • [Ney03] A. Neyman. Stochastic games and nonexpansive maps. In A. Neyman and S. Sorin, editors, Stochastic Games and Applications, volume 570 of Nato Science Series C, chapter 26, pages 397–415. Kluwer Academic Publishers, 2003.
  • [Nus86] R. D. Nussbaum. Convexity and log convexity for the spectral radius. Linear Algebra Appl., 73:59–122, 1986.
  • [Nus88] R. D. Nussbaum. Hilbert’s projective metric and iterated nonlinear maps. Mem. Amer. Math. Soc., 75(391), 1988.
  • [Ren95] J. Renegar. Incorporating condition measures into the complexity theory of linear programming. SIAM J. Optim., 5(3):506–524, 1995.
  • [RS01] D. Rosenberg and S. Sorin. An operator approach to zero-sum repeated games. Israel J. Math., 121:221–246, 2001.
  • [Ser07] S. Sergeev. Max-plus definite matrix closures and their eigenspaces. Linear Algebra Appl., 421(2):182–201, 2007.
  • [vdDS98] L. van den Dries and P. Speissegger. The real field with convergent generalized power series. Trans. Amer. Math. Soc., 350(11):4377–4421, 1998.
  • [ZP96] U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoret. Comput. Sci., 158(1–2):343–359, 1996.