 # Polyhedral value iteration for discounted games and energy games

We present a deterministic algorithm, solving discounted games with n nodes in n^O(1)· (2 + √(2))^n-time. For bipartite discounted games our algorithm runs in n^O(1)· 2^n-time. Prior to our work no deterministic algorithm running in time 2^o(nlog n) regardless of the discount factor was known. We call our approach polyhedral value iteration. We rely on a well-known fact that the values of a discounted game can be found from the so-called optimality equations. In the algorithm we consider a polyhedron obtained by relaxing optimality equations. We iterate points on the border of this polyhedron by moving each time along a carefully chosen shift as far as possible. This continues until the current point satisfies optimality equations. Our approach is heavily inspired by a recent algorithm of Dorfman et al. (ICALP 2019) for energy games. For completeness, we present their algorithm in terms of polyhedral value iteration. Our exposition, unlike the original algorithm, does not require edge weights to be integers and works for arbitrary real weights.

## 1 Introduction

We study discounted games, mean payoff games and energy games. All three kinds of games are played on finite weighted directed graphs between two players called Max and Min. Players shift a pebble along the edges of a graph. Nodes of the graph are partitioned into two subsets, one where Max controls the pebble and the other where Min controls the pebble. One should also indicate in advance a starting node (a node where the pebble is located initially). By making infinitely many moves players give rise to an infinite sequence of edges $e_1, e_2, e_3, \ldots$ of the graph (here $e_i$ is the $i$th edge passed by the pebble). The outcome of the game is a real number determined by the sequence $w_1, w_2, w_3, \ldots$, where $w_i$ is the weight of the edge $e_i$. We assume that the outcome serves as the amount of fine paid by player Min to player Max. In other words, the goal of Max is to maximize the outcome and the goal of Min is to minimize it.

The outcome is computed differently in discounted, mean payoff and energy games.

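• the outcome of a discounted game is

$$\sum_{i=1}^{\infty}\lambda^{i-1}w_i,$$

where $\lambda\in(0,1)$ is a real number fixed in advance and called the discount factor.

• the outcome of a mean payoff game is

$$\limsup_{n\to\infty}\frac{w_1+\ldots+w_n}{n}.$$

• the outcome of an energy game is

$$\begin{cases}1 & \text{the sequence } (w_1+w_2+\ldots+w_n),\ n\in\mathbb{N}, \text{ is bounded from below},\\ 0 & \text{otherwise},\end{cases}$$

(we interpret outcome $1$ as victory of Max and outcome $0$ as victory of Min).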

In all three games every starting node has a value, i.e., a real number such that (a) there is a Max’s strategy guaranteeing that the outcome is at least this number and (b) there is a Min’s strategy guaranteeing that the outcome is at most this number. Moreover [24, 7, 5] these two strategies can always be chosen to be positional and independent of the starting node. Positionality means that a strategy never makes two different moves in the same node. The property of having such a pair of strategies is often called positional determinacy.

We study algorithmic problems that arise from these games. Namely, the value problem is a problem of finding values of a given game. The decision problem is a problem of comparing the value of a node with a given threshold. Another fundamental problem is to find positional strategies establishing the value of a game.

Motivation. Positionally determined games are of great interest in the design of algorithms and in computational complexity. Specifically, these games serve as a source of problems that are in NP $\cap$ coNP but not known to be in P.

Below we survey algorithms for discounted, mean payoff and energy games (including our contribution). Mean payoff and discounted games are also studied in the context of dynamic systems. Positionally determined games in general have a broad impact on formal languages and automata theory.

Value problem vs. decision problem. The value problem, as more general one, is at least as hard as the decision problem. On the other hand, the values in discounted and mean payoff games can be obtained from a play of two positional strategies. Hence, the bit-length of values is polynomial in bit-length of weights of edges and (in case of discounted games) bit-length of discount factor. This makes the value problem polynomial-time reducible to the decision problem via binary search. For energy games there is no difference between these two problems at all.

On the other hand, for discounted and mean payoff games the value problem may turn out to be harder for strongly polynomial algorithms. Indeed, in the reduction given above one manipulates directly with binary representations of weights (to identify a range containing values). This is prohibited for strongly polynomial algorithms.

Reductions, structural complexity. It is known that Max wins in an energy game if and only if the value of the corresponding mean payoff game is non-negative. Hence, energy games are equivalent to the decision problem for mean payoff games with threshold $0$. Any other threshold $\alpha$ is reducible to threshold $0$ by adding $-\alpha$ to all the weights. So energy games and mean payoff games are polynomial-time equivalent.

The decision problem for discounted games lies in UP $\cap$ coUP. In turn, mean payoff games are polynomial-time reducible to discounted games. Hence, the same UP $\cap$ coUP upper bound applies to mean payoff and energy games. None of these problems is known to lie in P.

Algorithms for discounted games. There are two classical approaches to discounted games. In the value iteration approach, going back to Shapley, one manipulates with a real vector indexed by the nodes of the graph. The vector of values of a discounted game is known to be a fixed point of an explicit contracting operator. By applying this operator repeatedly to an arbitrary initial vector one obtains a sequence converging to the vector of values. Using this, Littman gave a deterministic -time algorithm solving the value problem for discounted games. Here is the number of nodes, is the discount factor and is the bit-length of input. This gives a polynomial time algorithm for .

Strategy iteration approach, going back to Howard  (see also ), can be seen as a sophisticated way of iterating positional strategies of players. Hansen et al.  showed that strategy iteration solves the value problem for discounted games in deterministic strongly -time. Unlike Littman’s algorithm, for this algorithm is strongly polynomial.

More recently, interior point methods were applied to discounted games. As of now, however, these methods do not outperform the algorithm of Hansen et al.

For all these algorithms the running time depends on $\lambda$ (exponentially in the bit-length of $\lambda$). As far as we know, no deterministic algorithm with running time $2^{o(n\log n)}$ regardless of the value of $\lambda$ was known. One can get $2^{O(n\log n)}$ time by simply trying all possible positional strategies of one of the players. Our main result pushes this bound down to $2^{O(n)}$. More precisely, we show the following.

###### Theorem 1.

The values of a discounted game on a graph with $n$ nodes can be found in deterministic strongly $n^{O(1)}\cdot(2+\sqrt{2})^n$-time.

We also obtain a better bound for a special case of discounted games, namely for bipartite discounted games. We call a discounted game bipartite if in the underlying graph each edge is either an edge from a Max’s node to a Min’s node or an edge from a Min’s node to a Max’s node. In other words, in a bipartite discounted game players can only make moves alternatively.

###### Theorem 2.

The values of a bipartite discounted game on a graph with $n$ nodes can be found in deterministic strongly $n^{O(1)}\cdot 2^n$-time.

Our algorithm is the fastest known deterministic algorithm for discounted games when . For bipartite discounted games it is the fastest one for . For smaller discounts the algorithm Hansen et al. outperforms ours. One should also mention that their algorithm is applicable to more general stochastic discounted games, while our algorithm is not.

In addition, it is known that randomized algorithms can solve discounted games faster, namely, in time [18, 11, 1]. These algorithms are based on formulating discounted games as an LP-type problem .

Algorithms for mean payoff and energy games. For mean payoff and energy games it is usually assumed that weights of edges are integers, and running time often involves a parameter , the largest absolute weight. In case of rational weights one can simply multiply them by a common denominator.

Zwick and Paterson  gave an algorithm solving the value problem for mean payoff games in pseudopolynomial time, namely, in time (see also ). Brim et al.  improved the polynomial factor before . In turn, Fijalkow et al.  slightly improved the dependence on (from to ).

There are algorithms with running time depending on much better (at the cost that they are exponential in ). Lifshits and Pavlov  gave -time algorithm for energy games (here the running time does not depend at all on ). Recently, Dorfman et al.  pushed down to by giving a -time algorithm for energy games. They also claim (without proof) that factor can be removed. At the cost of an extra factor these algorithms can be lifted to the value problem for mean payoff games.

All these algorithms are deterministic. As for randomized algorithms, the state-of-the-art is -time, the same as for discounted games.

We show that:

###### Theorem 3.

For of an energy game on nodes one can find all the nodes where wins in deterministic strongly -time.

This certifies that for the algorithm of Dorfman et al.  factor can be removed. More importantly, unlike the algorithm of Dorfman et al., our algorithm is strongly -time. I.e., our algorithm can be performed for arbitrary real weights (assuming basic arithmetic operations with them are carried out by an oracle).

The main reason we provide the proof of Theorem 3 is for the sake of exposition. Our result for discounted games is highly inspired by the Dorfman et al. algorithm. So we find it instructive to give Theorem 3 along with Theorem 1. We also believe that our exposition is more transparent for the reasons discussed below.

### 1.1 Our technique

Arguably, our approach arises more naturally for discounted games, yet it is rooted in the algorithm of Dorfman et al. for mean payoff games.

For discounted games we iterate a real vector $x$ with coordinates indexed by the nodes of the graph, until $x$ coincides with the vector of values. Thus, our approach can also be called value iteration. However, it differs significantly from the classical value iteration, and we call it polyhedral value iteration.

We rely on the well-known fact that the vector of values is the unique solution to the so-called optimality equations. The optimality equations are a set of conditions that can be naturally split into two parts. The first part is just a system of linear inequalities over $x$, where each node has some subset of inequalities associated specifically with this node. They express the fact that the players cannot improve the value in a node. The second part states that among the inequalities associated with a node there is one turning into an equality. This part represents the fact that values can be attained.

By throwing away the second part we obtain a polyhedron containing the vector of values. We call this polyhedron the optimality polyhedron. Of course, besides the vector of values it contains other points too.

We initialize $x$ by finding any point belonging to the optimality polyhedron. There is little chance that $x$ will satisfy the optimality equations. So until it does, we do the following. We compute a shift directed from $x$ to the interior of the optimality polyhedron. We move along this shift as far as possible, until the border of the optimality polyhedron is reached. This point on the border will be the new value of $x$.

We choose the shift in a very specific way. We consider an auxiliary discrete game which we call the discounted normal play game. The graph of the game depends on which inequalities of the optimality polyhedron are tight on $x$. The values of this game determine a shift for $x$. The rules of the game guarantee that such a shift does not violate tight inequalities. Hence our shift does not immediately lead us outside the optimality polyhedron.

It turns out that this process converges to the vector of values. Moreover, it does so in $O(n(2+\sqrt{2})^n)$ steps. The complexity analysis is split into two independent parts. First, we indicate some properties of how the underlying discounted normal play games change from one point to another in the algorithm. This leads to a definition of an abstract process of iterating discounted normal play games according to certain rules. In the second part of the argument we care only about this abstract process (called below DNP games iteration) and forget about the context of discounted games. We show that DNP games iteration can last only $O(n(2+\sqrt{2})^n)$ steps.

It turns out that in essentially the same language one can present the algorithm of Dorfman et al. Here we search not for the solution to optimality equations but for a vector of potentials certifying that one of the players wins in certain nodes. Dorfman et al. build upon a potential lifting algorithm of Brim et al. They notice that in the algorithm of Brim et al. a lot of consecutive iterations may turn out to be lifting the same set of nodes. Instead, Dorfman et al. perform all these iterations at once, accelerating the algorithm of Brim et al. We notice that this can be seen as one step of polyhedral value iteration, but now for mean payoff games.

The polyhedron inside which it all happens is a limit of optimality polyhedra as $\lambda\to 1$. This resembles a well-known representation of mean payoff games as a limit of discounted games.

Again, the complexity analysis is carried out by considering DNP games iteration. In case of mean payoff games one can impose stronger restrictions on this abstract process, and this leads to a better bound.

DNP games iteration is implicit in the complexity analysis of Dorfman et al. We believe that “abstractization” makes their argument more transparent. It also might lead to some other applications besides discounted games.

## 2 Preliminaries

### 2.1 Discounted games

To specify a discounted game one has to specify

• a finite directed graph $G=(V,E)$ in which every node has at least one out-going edge, i.e., in which no node is a sink;

• a partition of the set of nodes $V$ into two disjoint subsets $V_{\mathrm{Max}}$ and $V_{\mathrm{Min}}$;

• a weight function $w\colon E\to\mathbb{R}$;

• a real number $\lambda\in(0,1)$ called the discount factor.

Discounted games are played between two players called Max and Min. There is a pebble which at each moment of time is located in one of the nodes of $G$. First, we have to specify a node where the pebble is located initially. After that, at each move of the game the pebble is shifted along some edge of $G$ by one of the players. Namely, if currently the pebble is in a node $a\in V_{\mathrm{Max}}$, then player Max has to move the pebble to some node $b$ satisfying $(a,b)\in E$. Similarly, if currently the pebble is in a node $a\in V_{\mathrm{Min}}$, then player Min has to move the pebble to some node $b$ satisfying $(a,b)\in E$. Since in $G$ every node has at least one out-going edge, it is always possible to make a move.

By making infinitely many moves according to the rules above players obtain an infinite path of the graph $G$. If $e_1,e_2,e_3,\ldots$ are the edges of this path (in the order they are visited), then the outcome of the game is determined by the corresponding sequence of weights:

$$w_1=w(e_1),\quad w_2=w(e_2),\quad w_3=w(e_3),\quad\ldots$$

Namely, player Min pays to player Max a fine of size

$$\sum_{i=1}^{\infty}\lambda^{i-1}w_i. \qquad (1)$$

In other words, the goal of Max is to maximize (1) and the goal of Min is to minimize (1).

For any discounted game and for any starting node there exists a real number, called the value of the game in this node, such that:

• there is a Max’s strategy guaranteeing that (1) is at least this number;

• there is a Min’s strategy guaranteeing that (1) is at most this number.

Moreover, the values can be found from the following system of equations called optimality equations:

$$\begin{aligned} x_a &= \max_{e=(a,b)\in E} w(e)+\lambda x_b, && a\in V_{\mathrm{Max}}, && (2)\\ x_a &= \min_{e=(a,b)\in E} w(e)+\lambda x_b, && a\in V_{\mathrm{Min}}, && (3)\end{aligned}$$

where the system is over a real vector $x\in\mathbb{R}^V$ with coordinates indexed by the nodes of the graph. More specifically, (a) there exists exactly one solution $x$ to (2)–(3) and (b) for any node $a$ the value of the game in $a$ coincides with $x_a$.

This characterization of the values of discounted games goes back to Shapley. Let us sketch Shapley’s argument for the reader’s convenience. The fact that (2)–(3) has exactly one solution follows from the Banach fixed point theorem. Observe that the set of solutions to (2)–(3) coincides with the set of fixed points of the following mapping:

$$\Delta\colon\mathbb{R}^V\to\mathbb{R}^V,\qquad \Delta(x)_a=\begin{cases}\max\limits_{e=(a,b)\in E} w(e)+\lambda x_b & a\in V_{\mathrm{Max}},\\ \min\limits_{e=(a,b)\in E} w(e)+\lambda x_b & a\in V_{\mathrm{Min}}.\end{cases}$$

It remains to notice that $\Delta$ is $\lambda$-contracting with respect to the $\|\cdot\|_\infty$-norm.

Now, let $x$ be the solution to (2)–(3). We have to come up with a Max’s strategy and a Min’s strategy proving that the value in every node $a$ exists and coincides with $x_a$. Consider the Max’s strategy that from a node $a\in V_{\mathrm{Max}}$ moves along an edge on which the maximum in (2) is attained. Similarly, consider the Min’s strategy that from a node $a\in V_{\mathrm{Min}}$ moves along an edge on which the minimum in (3) is attained. It is not hard to verify that

• if the game starts in $a$ and Max follows this strategy, then (1) is at least $x_a$;

• if the game starts in $a$ and Min follows this strategy, then (1) is at most $x_a$.

Remarkably, these two strategies do not depend on the starting node $a$. Moreover, they are positional, i.e., the moves they make depend only on the current node and not on the path to this node. Thus, discounted games belong to the class of positionally determined games.

In this paper we are interested in the algorithmic problem of finding, for a given discounted game and for every node $a\in V$, the value of the game in $a$. By throwing away the context of discounted games one can simply say that we are interested in finding the solution to (2)–(3).
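The contraction argument sketched above also yields the classical value iteration scheme, which the polyhedral approach of this paper is contrasted with. The following is a minimal Python sketch of it, assuming the game is given as a dictionary from edges to weights; the names `apply_delta`, `value_iteration` and the tolerance `tol` are illustrative and do not come from the paper.

```python
def apply_delta(x, weight, max_nodes, lam):
    """Apply the operator Delta once: optimise w(e) + lam * x_b over out-going edges."""
    new_x = {}
    for a in x:
        options = [w + lam * x[b] for (src, b), w in weight.items() if src == a]
        new_x[a] = max(options) if a in max_nodes else min(options)
    return new_x


def value_iteration(nodes, weight, max_nodes, lam, tol=1e-12):
    """Iterate Delta from the zero vector until successive iterates differ by < tol."""
    x = {a: 0.0 for a in nodes}
    while True:
        new_x = apply_delta(x, weight, max_nodes, lam)
        if max(abs(new_x[a] - x[a]) for a in nodes) < tol:
            return new_x
        x = new_x


# Toy game (an assumption for illustration): Max owns "a", Min owns "b".
weight = {("a", "a"): 1.0, ("a", "b"): 0.0, ("b", "a"): 0.0}
values = value_iteration(["a", "b"], weight, {"a"}, lam=0.5)
```

In this toy game the self-loop at `a` is worth $1/(1-\lambda)=2$ to Max, and Min's only move from `b` gives $\lambda\cdot 2=1$. The number of iterations of this scheme grows as $\lambda$ approaches $1$, which is the dependence the paper aims to avoid.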

### 2.2 Energy games

Energy games [3, 5] are also played between two players called Max and Min. They have the same underlying mechanics as discounted games. Namely, the game takes place on a directed graph $G=(V,E)$ (with no sinks) equipped with a partition of $V$ into sets $V_{\mathrm{Max}}$ and $V_{\mathrm{Min}}$ and with a weight function $w\colon E\to\mathbb{R}$. In the same way players produce an infinite sequence $w_1,w_2,w_3,\ldots$ of weights of the edges they visit. Now there is no discount factor and no fine paid by Min to Max. Instead, depending on the sequence $w_1,w_2,w_3,\ldots$, either Max or Min wins. More precisely, player Max wins if the sequence of partial sums $(w_1+w_2+\ldots+w_n)$, $n\in\mathbb{N}$, is bounded from below. Player Min wins otherwise.

Energy games are also positionally determined. More precisely, there is always a Max’s positional strategy $\sigma$ and a Min’s positional strategy $\tau$ such that for every starting node either $\sigma$ is a Max’s winning strategy or $\tau$ is a Min’s winning strategy. This follows from positional determinacy of the more general mean payoff games and requires a more elaborate argument than for discounted games.

It is instructive to provide a characterization of positional winning strategies in energy games in terms of cycles. First, by the weight of a cycle we mean the sum of the weights of its edges. We call a cycle positive if its weight is positive. In the same way we define negative cycles, zero cycles and so on. Now, for a Max’s positional strategy $\sigma$ let $G^\sigma$ be the graph obtained from $G$ by removing edges that start in $V_{\mathrm{Max}}$ and are not consistent with $\sigma$. I.e., in $G^\sigma$ each node from $V_{\mathrm{Max}}$ has exactly one out-going edge, namely the one used by $\sigma$ in this node. It is easy to see that $\sigma$ is winning for Max in the energy game with starting node $s$ if and only if in the graph $G^\sigma$ only non-negative cycles are reachable from $s$.

Similarly, for a Min’s positional strategy $\tau$ one can define the graph $G^\tau$ where only edges used by $\tau$ are left for nodes in $V_{\mathrm{Min}}$. Then $\tau$ is winning for Min in the energy game starting in a node $s$ if and only if only negative cycles are reachable from $s$ in $G^\tau$.

In this notation positional determinacy means that there is always a positional Max’s strategy $\sigma$ and a positional Min’s strategy $\tau$ such that for every node $s$ either only non-negative cycles are reachable from $s$ in $G^\sigma$ or only negative cycles are reachable from $s$ in $G^\tau$.
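The cycle characterization can be checked mechanically for a given Max's positional strategy. The sketch below is one possible way to do it, using Bellman-Ford to detect a negative cycle reachable from the starting node; this is a standard technique swapped in for illustration, not a procedure from the paper, and the function name and input encoding are assumptions.

```python
import math


def max_strategy_wins(nodes, weight, max_nodes, sigma, start):
    """True iff no negative cycle is reachable from `start` once Max is fixed to sigma."""
    # Keep every edge of Min and, for Max, only the edge chosen by sigma.
    edges = [(a, b, w) for (a, b), w in weight.items()
             if a not in max_nodes or sigma[a] == b]
    dist = {start: 0}
    for _ in range(len(nodes) - 1):          # Bellman-Ford relaxation rounds
        for a, b, w in edges:
            if a in dist and dist[a] + w < dist.get(b, math.inf):
                dist[b] = dist[a] + w
    # A further possible relaxation witnesses a reachable negative cycle.
    return not any(a in dist and dist[a] + w < dist.get(b, math.inf)
                   for a, b, w in edges)


# Toy energy game (an assumption for illustration): Max owns "a", Min owns "b".
weight = {("a", "b"): -1, ("a", "a"): -1, ("b", "a"): 2}
good = max_strategy_wins(["a", "b"], weight, {"a"}, {"a": "b"}, "a")
bad = max_strategy_wins(["a", "b"], weight, {"a"}, {"a": "a"}, "a")
```

Moving to `b` closes a cycle of weight $1$, so that strategy is winning for Max, whereas staying on the self-loop of weight $-1$ is not.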

We consider the algorithmic problem of finding all the nodes where Max wins (equivalently, all the nodes where Min wins).

### 2.3 Bipartite graphs and games

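In the paper we use the term “bipartite” for directed graphs $G=(V,E)$ equipped with a partition of $V$ into sets $V_{\mathrm{Max}}$ and $V_{\mathrm{Min}}$. Namely, we call such a graph bipartite if $E\subseteq(V_{\mathrm{Max}}\times V_{\mathrm{Min}})\cup(V_{\mathrm{Min}}\times V_{\mathrm{Max}})$. Next, by a bipartite discounted game or a bipartite energy game we mean a game played on a bipartite graph.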

## 3 $n^{O(1)}\cdot(2+\sqrt{2})^n$-time algorithm for discounted games

In this section we give an algorithm establishing Theorems 1 and 2.

We consider a discounted game on a graph $G=(V,E)$ with a weight function $w\colon E\to\mathbb{R}$ and with a partition of $V$ between the players given by the sets $V_{\mathrm{Max}}$ and $V_{\mathrm{Min}}$. We assume that $G$ has $n$ nodes and $m$ edges.

In Subsection 3.1 we define auxiliary games that we call discounted normal play games. We use these games both in the formulation of the algorithm and in the complexity analysis. In Subsection 3.2 we define the so-called optimality polyhedron by relaxing the optimality equations (2)–(3).

The algorithm is given in Subsection 3.3. In the algorithm we iterate the points of the optimality polyhedron in search of the solution to (2)–(3). First we initialize by finding any point belonging to the optimality polyhedron. Then for the current point we define a shift which does not immediately lead us outside the optimality polyhedron. In the definition of the shift we use discounted normal play games. To obtain the next point we move as far as possible along the shift until we reach the border. We do so until the current point satisfies (2)–(3). Along the way we also take some measures to prevent the bit-length of the current point from growing super-polynomially.

This process always terminates and, in fact, can take only $O(n(2+\sqrt{2})^n)$ iterations. Moreover, for bipartite discounted games it can take only steps. A proof of this is deferred to Section 4.

### 3.1 Discounted normal play games.

These games will always be played on directed graphs with the same set of nodes as $G$. Given such a graph $G'=(V,E')$, we equip it with the same partition of $V$ into $V_{\mathrm{Max}}$ and $V_{\mathrm{Min}}$ as in $G$. There may be sinks in $G'$.

Two players called Max and Min move a pebble along the edges of $G'$. Player Max controls the pebble in the nodes from $V_{\mathrm{Max}}$ and player Min controls the pebble in the nodes from $V_{\mathrm{Min}}$. If the pebble reaches a sink of $G'$ after $i$ moves, then the player who cannot make a move pays a fine of size $\lambda^i$ to his opponent. Here $\lambda$ is the discount factor from our discounted game. If the pebble never reaches a sink, i.e., if the play lasts infinitely long, then the players pay each other nothing.

By the outcome of the play we mean the income of player Max. Thus, the outcome is

• positive, if the play ends in a sink from $V_{\mathrm{Min}}$;

• zero, if the play lasts infinitely long;

• negative, if the play ends in a sink from $V_{\mathrm{Max}}$.

It is not hard to see that in this game players have optimal positional strategies. Moreover, if $\delta(a)$ is the value of this game in the node $a$, then

$$\begin{aligned} \delta(s) &= -1, && \text{if } s \text{ is a sink from } V_{\mathrm{Max}}, && (4)\\ \delta(s) &= 1, && \text{if } s \text{ is a sink from } V_{\mathrm{Min}}, && (5)\\ \delta(a) &= \lambda\cdot\max_{(a,b)\in E'}\delta(b), && \text{if } a\in V_{\mathrm{Max}} \text{ and } a \text{ is not a sink}, && (6)\\ \delta(a) &= \lambda\cdot\min_{(a,b)\in E'}\delta(b), && \text{if } a\in V_{\mathrm{Min}} \text{ and } a \text{ is not a sink}. && (7)\end{aligned}$$

We omit proofs of these facts as below we only require the following

###### Proposition 4.

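For any $G'=(V,E')$ there exists exactly one solution to (4)–(7), which can be found in strongly polynomial time.

Before proving Proposition 4 let us note that for graphs with $n$ nodes any solution $\delta$ to (4)–(7) satisfies $\delta(a)\in\{1,\lambda,\ldots,\lambda^{n-1},0,-\lambda^{n-1},\ldots,-\lambda,-1\}$ for every node $a$. Indeed, if $a$ is not a sink, then by (6)–(7) the node $a$ has an out-going edge leading to a node $b$ with $\delta(a)=\lambda\delta(b)$. By following these edges we either reach a sink after at most $n-1$ steps (and then $\delta(a)=\pm\lambda^i$ for some $i\in\{0,1,\ldots,n-1\}$) or we go to a loop. For all the nodes $b$ on a loop of length $l$ we have $\delta(b)=\lambda^l\delta(b)$, which means that $\delta=0$ everywhere on the loop (recall that $\lambda\in(0,1)$). Thus, if we reach such a loop from $a$, we also have $\delta(a)=0$.

From this it is also clear that $\delta(a)=1$ if and only if $a\in V_{\mathrm{Min}}$ and $a$ is a sink of $G'$. Similarly, $\delta(a)=-1$ if and only if $a\in V_{\mathrm{Max}}$ and $a$ is a sink of $G'$.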

###### Proof of Proposition 4.

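To show the existence of a solution and its uniqueness we employ the Banach fixed point theorem. Consider the set of all vectors $f\in\mathbb{R}^V$ satisfying

$$f(s)=1 \text{ for all sinks } s\in V_{\mathrm{Min}},\qquad f(t)=-1 \text{ for all sinks } t\in V_{\mathrm{Max}}.$$

Define the following mapping $\rho$ from this set to itself:

$$\rho(f)(a)=\begin{cases}-1 & a \text{ is a sink from } V_{\mathrm{Max}},\\ 1 & a \text{ is a sink from } V_{\mathrm{Min}},\\ \lambda\cdot\max\limits_{(a,b)\in E'} f(b) & a\in V_{\mathrm{Max}} \text{ and } a \text{ is not a sink},\\ \lambda\cdot\min\limits_{(a,b)\in E'} f(b) & a\in V_{\mathrm{Min}} \text{ and } a \text{ is not a sink}.\end{cases}$$

The set of solutions to (4)–(7) coincides with the set of $f$ such that $\rho(f)=f$. It remains to notice that $\rho$ is $\lambda$-contracting with respect to the $\|\cdot\|_\infty$-norm.

Now let us explain how to find the solution $\delta$ to (4)–(7) in strongly polynomial time. Let us first determine for every $i\in\{0,1,\ldots,n-1\}$ the set $V_i=\{a\in V : \delta(a)=\lambda^i\}$. It is clear that $V_0$ coincides with the set of sinks of the graph $G'$ which lie in $V_{\mathrm{Min}}$. Next, the set $V_i$ can be determined in strongly polynomial time once $V_0,\ldots,V_{i-1}$ are given. Indeed, by (6)–(7) the set $V_i$ consists of

• all $a\in V_{\mathrm{Max}}\setminus V_{<i}$ that have an out-going edge leading to $V_{i-1}$;

• all non-sink $a\in V_{\mathrm{Min}}\setminus V_{<i}$ such that all edges starting at $a$ lead to $V_{<i}$.

Here $V_{<i}=V_0\cup\ldots\cup V_{i-1}$. In this way we determine all the sets $V_0,\ldots,V_{n-1}$. In a similar way one can determine all the nodes $a$ with $\delta(a)<0$ and also the exact value of $\delta$ in these nodes. All the remaining nodes satisfy $\delta(a)=0$. ∎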
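The layer-by-layer computation from the proof can be sketched in Python as follows. This is an illustrative reading of the argument rather than the authors' code: the function name `solve_dnp`, the edge-list encoding and the use of `Fraction` for exact arithmetic are assumptions. The same loop is run twice, once for the nodes with positive value and once, with the roles of the players swapped, for the nodes with negative value.

```python
from fractions import Fraction


def solve_dnp(nodes, edges, max_nodes, lam):
    """Solve (4)-(7): values of the discounted normal play game on (nodes, edges)."""
    succ = {a: [b for (s, b) in edges if s == a] for a in nodes}
    delta = {a: Fraction(0) for a in nodes}      # nodes never reached below keep value 0
    for sign in (1, -1):
        # Layer 0: sinks of Min (value 1) for sign=1, sinks of Max (value -1) for sign=-1.
        layer = {a for a in nodes if not succ[a] and (a in max_nodes) == (sign == -1)}
        solved, value = set(), Fraction(sign)
        while layer:
            for a in layer:
                delta[a] = value
            solved |= layer
            value *= lam
            nxt = set()
            for a in nodes:
                if a in solved or not succ[a]:
                    continue
                if (a in max_nodes) == (sign == 1):
                    # The player who gains from this sign needs one edge into the last layer.
                    ok = any(b in layer for b in succ[a])
                else:
                    # The opponent must have all edges leading into solved layers.
                    ok = all(b in solved for b in succ[a])
                if ok:
                    nxt.add(a)
            layer = nxt
    return delta


# Toy graph (an assumption for illustration): Max owns a, c, t; Min owns b, s.
edges = [("a", "s"), ("a", "t"), ("b", "a"), ("b", "c"), ("c", "c")]
delta = solve_dnp(["a", "b", "c", "s", "t"], edges, {"a", "c", "t"}, Fraction(1, 2))
```

Here `s` and `t` are sinks with values $1$ and $-1$, Max moves from `a` to `s` and gets $\lambda$, the loop at `c` has value $0$, and Min escapes from `b` to `c` to secure $0$.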

### 3.2 Optimality polyhedron

By the optimality polyhedron we mean the set of all $x\in\mathbb{R}^V$ satisfying the following inequalities:

$$\begin{aligned} w(e)+\lambda x_b-x_a &\leqslant 0 && \text{for } e=(a,b)\in E,\ a\in V_{\mathrm{Max}}, && (8)\\ w(e)+\lambda x_b-x_a &\geqslant 0 && \text{for } e=(a,b)\in E,\ a\in V_{\mathrm{Min}}. && (9)\end{aligned}$$

Note that the solution to the optimality equations (2)–(3) belongs to the optimality polyhedron.

Let $x$ be a point of the optimality polyhedron. We call a vector $\delta\in\mathbb{R}^V$ a valid shift for $x$ if for all small enough $\varepsilon>0$ the vector $x+\varepsilon\delta$ belongs to the optimality polyhedron. To determine whether a shift $\delta$ is valid for $x$ it is enough to look at the edges which are tight for $x$. Namely, we call an edge $e=(a,b)\in E$ tight for $x$ if $w(e)+\lambda x_b-x_a=0$, i.e., if the corresponding inequality in (8)–(9) becomes an equality on $x$. It is clear that $\delta$ is valid for $x$ if and only if

$$\begin{aligned} \lambda\delta(b)-\delta(a) &\leqslant 0 && \text{whenever } (a,b)\in E,\ a\in V_{\mathrm{Max}} \text{ and } (a,b) \text{ is tight for } x, && (10)\\ \lambda\delta(b)-\delta(a) &\geqslant 0 && \text{whenever } (a,b)\in E,\ a\in V_{\mathrm{Min}} \text{ and } (a,b) \text{ is tight for } x. && (11)\end{aligned}$$

Discounted normal play games can be used to produce for any point $x$ of the optimality polyhedron a valid shift for $x$. Namely, let $E_x$ be the set of edges that are tight for $x$ and consider the graph $G_x=(V,E_x)$. I.e., $G_x$ is the subgraph of $G$ containing only the edges that are tight for $x$. An important observation is that $x$ is the solution to the optimality equations (2)–(3) if and only if in $G_x$ there are no sinks.

Define $\delta_x$ to be the solution to (4)–(7) for $G_x$. Note that $\delta_x$ is a valid shift for $x$, as (6)–(7) imply (10)–(11). Note also that as long as $x$ does not satisfy (2)–(3), i.e., as long as the graph $G_x$ has sinks, the vector $\delta_x$ is not zero.

Let us also define a procedure that we use in our algorithm to control the bit-length of the current point. The input to the procedure is a subset $S\subseteq E$. The output is a point $x$ of the optimality polyhedron satisfying $S\subseteq E_x$. If there is no such $x$, the output is “not found”. In other words, consider the polyhedron obtained from (8)–(9) by turning the inequalities corresponding to edges from $S$ into equalities. The output of the procedure is a point of this polyhedron, if this polyhedron is not empty. In particular, on the input $S=\varnothing$ the procedure simply finds a point belonging to the optimality polyhedron.

Note that each inequality in (8)–(9) contains exactly two variables. Hence the output of the procedure can be computed in strongly polynomial time.

### 3.3 The algorithm

Some remarks:

• we can find $\delta_x$ in strongly polynomial time by Proposition 4;

• the value of $\varepsilon_{\max}$ can be found as in the simplex method. Indeed, $\varepsilon_{\max}$ is the smallest $\varepsilon>0$ for which there exists an inequality in (8)–(9) which is tight for $x+\varepsilon\delta_x$ but not for $x$. Thus, to find $\varepsilon_{\max}$ it is enough to solve at most $m$ linear one-variable equations and compute the minimum over the positive solutions to these equations.

• in fact, $\varepsilon_{\max}<+\infty$ throughout the algorithm, i.e., we cannot move along $\delta_x$ forever. To show this, it is enough to indicate $\varepsilon>0$ and an inequality in (8)–(9) which is tight for $x+\varepsilon\delta_x$ but not for $x$. First, since $x$ does not yet satisfy the optimality equations (2)–(3), there exists a sink $s$ of the graph $G_x$. Assume that $s\in V_{\mathrm{Max}}$, the argument in the case $s\in V_{\mathrm{Min}}$ is similar. The graph $G$ is sinkless, so there exists an edge $e=(s,b)\in E$. The edge $e$ is not tight for $x$ (otherwise $s$ is not a sink of $G_x$). Hence $w(e)+\lambda x_b-x_s<0$. The left-hand side of the same inequality for $x+\varepsilon\delta_x$ looks as follows:

$$w(e)+\lambda x_b-x_s+\varepsilon\cdot(\lambda\delta_x(b)-\delta_x(s)).$$

In turn, the node $s$ is a sink of $G_x$ from $V_{\mathrm{Max}}$, hence $\delta_x(s)=-1$ and $\lambda\delta_x(b)-\delta_x(s)=\lambda\delta_x(b)+1\geqslant 1-\lambda>0$. I.e., the left-hand side of the inequality for the edge $e$ increases as $\varepsilon$ increases, so for some positive $\varepsilon$ it will become tight.

• One could consider a version of Algorithm 1 where we do not use the procedure and simply move to the point $x+\varepsilon_{\max}\delta_x$. A problem with this version is that it is not clear why the bit-length of the coordinates of $x$ is polynomially bounded throughout the algorithm. In turn, if we use the procedure, this problem does not occur. Indeed, we maintain the property that $x$ is an output of a strongly polynomial time algorithm on a polynomially bounded input.
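The remarks above can be assembled into a short Python sketch of the simplified version mentioned in the last remark, the one that moves directly to $x+\varepsilon_{\max}\delta_x$ and does not control bit-length. It is a reconstruction from the prose under stated assumptions, not the paper's Algorithm 1: all names are illustrative, exact rational arithmetic via `Fraction` stands in for the arithmetic oracle, and the starting point ($\pm\max_e|w(e)|/(1-\lambda)$ on the nodes of Max and Min respectively) is one convenient point of the optimality polyhedron chosen here, not the procedure from Subsection 3.2.

```python
from fractions import Fraction


def dnp_values(nodes, succ, max_nodes, lam):
    """Values (4)-(7) of the discounted normal play game on the graph given by succ."""
    delta = {a: Fraction(0) for a in nodes}
    for sign in (1, -1):
        # Layer 0: sinks of Min (value 1) for sign=1, sinks of Max (value -1) for sign=-1.
        layer = {a for a in nodes if not succ[a] and (a in max_nodes) == (sign == -1)}
        solved, value = set(), Fraction(sign)
        while layer:
            for a in layer:
                delta[a] = value
            solved |= layer
            value *= lam
            nxt = set()
            for a in nodes:
                if a in solved or not succ[a]:
                    continue
                if (a in max_nodes) == (sign == 1):
                    ok = any(b in layer for b in succ[a])
                else:
                    ok = all(b in solved for b in succ[a])
                if ok:
                    nxt.add(a)
            layer = nxt
    return delta


def polyhedral_value_iteration(nodes, weight, max_nodes, lam):
    """Walk along the border of the optimality polyhedron until G_x has no sinks."""
    bound = max(abs(w) for w in weight.values()) / (1 - lam)
    x = {a: (bound if a in max_nodes else -bound) for a in nodes}  # satisfies (8)-(9)
    while True:
        slack = {(a, b): w + lam * x[b] - x[a] for (a, b), w in weight.items()}
        tight = {a: [b for (s, b) in weight if s == a and slack[(s, b)] == 0]
                 for a in nodes}
        if all(tight[a] for a in nodes):     # no sinks in G_x: x solves (2)-(3)
            return x
        delta = dnp_values(nodes, tight, max_nodes, lam)
        steps = []
        for (a, b), s in slack.items():
            rate = lam * delta[b] - delta[a]
            # A non-tight inequality becomes tight only if its slack moves towards 0.
            if s != 0 and rate != 0 and (s < 0) == (rate > 0):
                steps.append(-s / rate)
        eps_max = min(steps)
        x = {a: x[a] + eps_max * delta[a] for a in nodes}


# Toy game (an assumption for illustration): Max owns "a", Min owns "b".
weight = {("a", "a"): Fraction(1), ("a", "b"): Fraction(0), ("b", "a"): Fraction(0)}
values = polyhedral_value_iteration(["a", "b"], weight, {"a"}, Fraction(1, 2))
```

On this toy game the start point is $x_a=2$, $x_b=-2$; the only sink of $G_x$ is `b`, so the shift raises $x_b$ until the edge from `b` to `a` becomes tight at $x_b=1$, and the loop stops after one step.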

## 4 Discounted games: complexity analysis

Let $x_0,x_1,x_2,\ldots$ be the sequence of points from the optimality polyhedron that arise in Algorithm 1. The argument consists of two parts:

• first, we show that the sequence of graphs $G_{x_0},G_{x_1},G_{x_2},\ldots$ can be obtained in an abstract process that we call discounted normal play games iteration (DNP games iteration for short), see Subsection 4.2;

• second, we show that any sequence of $n$-node graphs that can be obtained in DNP games iteration has length $O(n(2+\sqrt{2})^n)$, see Subsection 4.3.

This will establish Theorem 1. To show Theorem 2 note that if $G$ is bipartite, then so are $G_{x_0},G_{x_1},G_{x_2}$ and so on. Thus, it is enough to demonstrate that:

• any sequence of bipartite -node graphs that can be obtained in DNP games iteration has length , see Subsection 4.4.

First of all, we have to give a definition of DNP games iteration (Subsection 4.1).

### 4.1 Definition of DNP games iteration

Consider a directed graph $H=(V,E_H)$ and let $\delta_H$ be the solution to (4)–(7) for $H$. We say that an edge $(a,b)\in E_H$ is optimal for $H$ if $\delta_H(a)=\lambda\delta_H(b)$. Next, we say that a pair $(a,b)\in V\times V$ is improving for $H$ if one of the following two conditions holds:

• $a\in V_{\mathrm{Max}}$ and $\delta_H(a)<\lambda\delta_H(b)$;

• $a\in V_{\mathrm{Min}}$ and $\delta_H(a)>\lambda\delta_H(b)$.

Note that an improving pair of nodes cannot be an edge of $H$ because of (6)–(7).

Consider another directed graph $H'$ over the same set of nodes as $H$. We say that $H'$ can be obtained from $H$ in one step of DNP games iteration if $H'$ contains all edges of $H$ that are optimal for $H$ and also at least one pair of nodes which is improving for $H$. I.e., we can erase some non-optimal edges of $H$, and then we can add some edges that are not in $H$; in particular, we should add at least one improving pair.

Finally, we say that a sequence of graphs $H_0,H_1,H_2,\ldots$ can be obtained in DNP games iteration if for all $j$ the graph $H_{j+1}$ can be obtained from $H_j$ in one step of DNP games iteration.
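The definition of one step can be turned into a direct check. The Python sketch below assumes that the values of the discounted normal play game on the old graph are already available as a dictionary; the function name `is_dnp_step` and the encoding of graphs as sets of pairs are illustrative choices, not notation from the paper.

```python
from fractions import Fraction


def is_dnp_step(old_edges, new_edges, delta_old, max_nodes, lam):
    """Check whether new_edges can follow old_edges in one step of DNP games iteration."""
    # Optimal edges of the old graph: delta(a) = lam * delta(b).
    optimal = {(a, b) for (a, b) in old_edges if delta_old[a] == lam * delta_old[b]}

    def improving(a, b):
        if a in max_nodes:
            return delta_old[a] < lam * delta_old[b]
        return delta_old[a] > lam * delta_old[b]

    keeps_optimal = optimal <= set(new_edges)
    adds_improving = any(improving(a, b) for (a, b) in new_edges)
    return keeps_optimal and adds_improving


# Toy data (an assumption for illustration): Max owns "a", Min owns "b".
lam = Fraction(1, 2)
delta_old = {"a": Fraction(0), "b": Fraction(1)}   # values on the old graph
old = {("a", "a")}
step_ok = is_dnp_step(old, {("a", "a"), ("b", "a")}, delta_old, {"a"}, lam)
step_bad = is_dnp_step(old, {("a", "a")}, delta_old, {"a"}, lam)
```

In the old graph `b` is a sink of Min with value $1$, so the pair `("b", "a")` is improving for Min and adding it is a legal step; keeping the graph unchanged is not, since no improving pair is added.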

### 4.2 Why the sequence $G_{x_0},G_{x_1},G_{x_2},\ldots$ can be obtained in DNP games iteration

Let $x$ and $x'$ be two consecutive points in the algorithm. We have to show that the graph $G_{x'}$ can be obtained from $G_x$ in one step of DNP games iteration. By definition of the procedure the graph $G_{x'}$ contains all edges of the graph $G_y$, where $y=x+\varepsilon_{\max}\delta_x$. Hence it is enough to show the following:

(a) all the edges of the graph $G_x$ that are optimal for $G_x$ are also in the graph $G_y$;

(b) there is an edge of the graph $G_y$ which is an improving pair for the graph $G_x$.

Proof of (a). Take any edge $e=(a,b)$ of the graph $G_x$ which is optimal for $G_x$. The left-hand side of (8)–(9) for the edge $e$ on the point $y$ looks as follows:

$$w(e)+\lambda x_b-x_a+\varepsilon_{\max}\cdot(\lambda\delta_x(b)-\delta_x(a)). \qquad (12)$$

The last term of (12) is $0$ as $e$ is an optimal edge of $G_x$. Since $e$ is tight for $x$, it is also tight for $y$, i.e., it also belongs to $G_y$.

Proof of (b). In fact, any edge of the graph $G_y$ which is not in the graph $G_x$ is an improving pair for $G_x$. Assume $e=(a,b)$ is an edge of $G_y$ but not of $G_x$. Hence $e$ is tight for $y$ but not for $x$. I.e., (12) is $0$ for $e$, but

• $w(e)+\lambda x_b-x_a<0$ if $a\in V_{\mathrm{Max}}$;

• $w(e)+\lambda x_b-x_a>0$ if $a\in V_{\mathrm{Min}}$.

This means that $\lambda\delta_x(b)-\delta_x(a)>0$ if $a\in V_{\mathrm{Max}}$ and $\lambda\delta_x(b)-\delta_x(a)<0$ if $a\in V_{\mathrm{Min}}$. Therefore $(a,b)$ is an improving pair for $G_x$.

It only remains to note that there exists an edge of $G_y$ which is not an edge of $G_x$. Indeed, otherwise all inequalities that are tight for $y$ would have been tight already for $x$. Then $\varepsilon_{\max}$ could be increased, a contradiction.

### 4.3 $O(n(2+\sqrt{2})^n)$ bound on the length of DNP games iteration

The argument has the following structure.

• Step 1. For every directed graph we define two vectors .

• Step 2. We define a linear ordering of vectors from called alternating lexicographic ordering.

• Step 3. We show that in each step of DNP games iteration (a) neither nor decrease and (b) either or increase (in the alternating lexicographic ordering).

• Step 4. We bound the number of values and can take. By step 3 this bound (multiplied by 2) is also a bound on the length of DNP games iteration.

Step 1. The first coordinate of the vector equals the number of nodes with (all such nodes are from ). The other coordinates are divided into consecutive pairs. In the th pair we first have the number of nodes from with , and then the number of nodes from with .

The vector is defined similarly, with the roles of and and and reversed. The first coordinate of equals the number of nodes with (all such nodes are from ). The other coordinates are divided into consecutive pairs. In the th pair we first have the number of nodes from with , and then the number of nodes from with .

Step 2. The alternating lexicographic ordering is the lexicographic order in which the even coordinates are compared by the standard ordering of integers and the odd coordinates by the reverse of the standard ordering (coordinates are numbered starting from 1). For example,

$$(3,2,3) < (2,3,2), \qquad (2,3,1) > (2,2,7),$$

in the alternating lexicographic order.
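The ordering can be made concrete with a small comparison routine. The following is a minimal sketch, not part of the original algorithm: the names `alt_lex_key` and `alt_lex_less` are hypothetical, and coordinates are assumed to be numbered from 1, which is the convention that matches the two examples above.

```python
from typing import Sequence, Tuple


def alt_lex_key(v: Sequence[int]) -> Tuple[int, ...]:
    """Return a tuple whose ordinary lexicographic order coincides with
    the alternating lexicographic order of v.

    Odd coordinates (1st, 3rd, ...) are compared by the reversed integer
    order, so they are negated; even coordinates are left unchanged.
    """
    # enumerate() is 0-based, so index 0 is the 1st (odd) coordinate.
    return tuple(-c if i % 2 == 0 else c for i, c in enumerate(v))


def alt_lex_less(u: Sequence[int], v: Sequence[int]) -> bool:
    """True if u is strictly smaller than v in the alternating order."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return alt_lex_key(u) < alt_lex_key(v)


# The two examples from the text.
print(alt_lex_less((3, 2, 3), (2, 3, 2)))  # (3,2,3) < (2,3,2)
print(alt_lex_less((2, 2, 7), (2, 3, 1)))  # (2,3,1) > (2,2,7)
```

Mapping each vector to a key keeps the comparison a plain tuple comparison, so a list of vectors can also be sorted with `sorted(vectors, key=alt_lex_key)`.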

Step 3. This step relies on the following lemma.

###### Lemma 5.

Assume that a graph can be obtained from a graph in one step of DNP games iteration. Then

(a) if for some it holds that , then is greater than in the alternating lexicographic order.

(b) if for some it holds that , then is greater than in the alternating lexicographic order.

Assume Lemma 5 is proved.

• Why can neither nor decrease? If does not exceed in the alternating lexicographic order, then for every by Lemma 5. On the other hand, and are determined by these sets, so . A similar argument works for and as well.

• Why must either or increase? Assume that neither is greater than nor is greater than in the alternating lexicographic order. By Lemma 5 we have for every that and . This means that functions and coincide. On the other hand, the graph contains as an edge a pair of nodes which is improving for . Since , this means that this pair is also improving for . Hence this pair cannot be an edge of the graph , a contradiction.

We now proceed to a proof of Lemma 5. Let us stress that in the proof we do not use the fact that contains an improving pair for . We only use the fact that contains all optimal edges of .

###### Proof of Lemma 5.

We only prove (a); the proof of (b) is similar. Let be the smallest element of for which . First consider the case . We claim that in this case the first coordinate of is smaller than the first coordinate of . Indeed, is the number of sinks from in the graph . In turn, is the number of sinks from in the graph . On the other hand, all sinks of are also sinks of . Indeed, nodes that are not sinks of have in an outgoing optimal edge. All these edges are also in . Hence . Equality is not possible because otherwise , a contradiction with the fact that .

Now assume that . Then the sets and are distinct. Hence there are two cases.

• First case: .

• Second case: and .

In both cases the first coordinates of and coincide, because for all . Moreover, in the second case we also have . We claim that in the first case we have and in the second case we have . The rest of the proof is devoted to this claim, as it clearly implies that exceeds in the alternating lexicographic order.

Proving in the first case. Since the sets and are distinct, it is enough to show that . For that we take any with and show that . By (67) there is an edge of the graph with . We also have that , because . On the other hand, since , the edge is optimal for , hence this edge is also in the graph . So in the graph there is an edge from to a node with . Hence by (6) we have . It remains to show why it is impossible that . Indeed, then for some . On the other hand, the node is not in the set . Hence the sets and are distinct, contradiction with the minimality of .

Proving in the second case. Since the sets and are distinct, it is enough to show that . For that we take any with and show that . It is clear that , because otherwise for some we would have that the sets and are distinct ( would belong to the first set and not to the second one). This would give us a contradiction with the minimality of . Thus, it remains to show that . Assume that this is not the case, i.e., . Since , the node is not a sink of (this would mean that ). Hence by (7) there exists an edge in the graph with . Then we also have that , because by minimality of we have and hence . But the edge is optimal for , so the edge is also in the graph . This means that in the graph there is an edge from to a node with