# Symbolic Approximation of Weighted Timed Games

Weighted timed games are zero-sum games played by two players on a timed automaton equipped with weights, where one player wants to minimise the accumulated weight while reaching a target. Weighted timed games are notoriously difficult and quickly undecidable, even when restricted to non-negative weights. For non-negative weights, the largest class that can be analysed has been introduced by Bouyer, Jaziri and Markey in 2015. Though the value problem is undecidable, the authors show how to approximate the value by considering regions with a refined granularity. In this work, we extend this class to incorporate negative weights, allowing one to model energy for instance, and prove that the value can still be approximated, with the same complexity. In addition, we show that a symbolic algorithm, relying on the paradigm of value iteration, can be used as an approximation schema on this class.

## 1 Introduction

The design of programs verifying some real-time specifications is a notoriously difficult problem, because such programs must take care of delicate timing issues, and are difficult to debug a posteriori. One research direction to ease the design of real-time software is to automatise the process. The situation may be modelled into a timed game, played by a controller and an antagonistic environment: they act, in a turn-based fashion, over a timed automaton [2], namely a finite automaton equipped with real-valued variables, called clocks, evolving with a uniform rate. A simple, yet realistic, objective for the controller is to reach a target location. We are thus looking for a strategy of the controller, that is, a recipe dictating how to play so that the target is reached no matter how the environment plays. Reachability timed games are decidable [4], and EXPTIME-complete [19].

Weighted extensions of these games have been considered in order to measure the quality of the winning strategy for the controller [9, 1]: when the controller has several winning strategies in a given reachability timed game, the quantitative version of the game helps choose a good one with respect to some metrics. This means that the game now takes place over a weighted (or priced) timed automaton [5, 3], where transitions are equipped with weights, and locations with rates of weights (the cost is then proportional to the time spent in this location, with the rate as proportional coefficient). While solving the optimal reachability problem on weighted timed automata has been shown to be PSPACE-complete [6] (i.e. the same complexity as the non-weighted version), weighted timed games are known to be undecidable [12]. This has led to many restrictions in order to regain decidability, the first and most interesting one being the class of strictly non-Zeno cost with only non-negative weights (in transitions and locations) [9]: this hypothesis requires that every execution of the timed automaton that follows a cycle of the region automaton has a weight far from 0 (in interval $[1,+\infty)$, for instance).

Negative weights are crucial when one wants to model energy or other resources that can grow or decrease during the execution of the system to study. In [16], we have recently extended the strictly non-Zeno cost restriction to weighted timed games in the presence of negative weights in transitions and/or locations. We have described there the class of divergent weighted timed games where each execution that follows a cycle of the region automaton has a weight far from 0, i.e. in $(-\infty,-1]\cup[1,+\infty)$. We were able to obtain a doubly-exponential-time algorithm to compute the values and almost-optimal strategies, while deciding the divergence of a weighted timed game is PSPACE-complete. These complexity results match the ones that could be obtained in the non-negative case from [9, 1].

The techniques used to obtain the results of [16] cannot be extended if the conditions are slightly relaxed. For instance, if we add the possibility for an execution of the timed automaton following a cycle of the region automaton to have weight exactly 0, the decision problem is known to be undecidable [10], even with non-negative weights only. For this extension, in the presence of non-negative weights only, an approximation schema has been proposed to compute arbitrarily close estimates of the optimal value [10]. To this end, the authors consider regions with a refined granularity so as to control the precision of the approximation. In this work, our contribution is two-fold: first, we extend the class considered in [10] to the presence of negative weights; second, we show that the approximation can be obtained using a symbolic computation, based on the paradigm of value iteration.

More precisely, we define the class of almost-divergent weighted timed games where, for each strongly connected component (SCC) of the region automaton, executions following a cycle of this SCC have weights either all in $(-\infty,-1]\cup\{0\}$, or all in $\{0\}\cup[1,+\infty)$. In contrast, the divergent condition is equivalent to the same property on the strongly connected components, but without the presence of singleton $\{0\}$. Given an almost-divergent weighted timed game, an initial configuration $c$ and a threshold $\varepsilon$, we compute a value that we guarantee to be $\varepsilon$-close to the optimal value when the play starts from $c$. Moreover, we prove that deciding if a weighted timed game is almost-divergent is a PSPACE-complete problem.

In order to approximate almost-divergent weighted timed games, we first adapt the approximation schema of [10] to our setting. At the very core of their schema is the notion of kernels that collect all cycles of weight exactly 0 in the game. Then, a semi-unfolding of the game (in which kernels are not unfolded) of bounded depth is shown to be equivalent to the original game. Adapting this schema to negative weights requires addressing new issues:

• The definition and the approximation of these kernels is much more intricate in our setting (see Sections 4 and 6). Indeed, with only non-negative weights, a cycle of weight 0 only encounters locations and transitions with weight 0. This is no longer the case with arbitrary weights, both for discrete weights on transitions (that could alternate between weight $+1$ and $-1$, e.g.) and continuous rates on locations: for this continuous part, this requires keeping track of the real-time dynamics of the game.

• Some configurations may have value $-\infty$. While it is undecidable in general whether a configuration has value $-\infty$, we prove that it is decidable for almost-divergent weighted timed games (see Lemma 5).

• The identification of an adequate bound to define an equivalent semi-unfolding of bounded depth is more difficult in our setting, as having guarantees on weight accumulation is harder (we can lose accumulated weight). We deal with this by evaluating how large the value of a configuration can be, provided it is not infinite. This is presented in Section 5.

We also develop, in Section 7, a more symbolic approximation schema, in the sense that it avoids the a priori refinement of regions. Instead, all computations are performed in a symbolic way using the techniques developed in [1]. This allows one to share as much as possible between the different computations. Compared with the evaluation of MDPs or of quantitative games such as mean-payoff or discounted-payoff games, this is the same improvement as the one obtained by using value iteration instead of techniques based on unfolding the model into a finite tree, which may contain the same location many times.

## 2 Weighted timed games

##### Clocks, guards and regions

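We let $X$ be a finite set of variables called clocks. A valuation of clocks is a mapping $\nu\colon X\to\mathbb{R}_{\geq 0}$. For a valuation $\nu$, $d\in\mathbb{R}_{\geq 0}$ and $Y\subseteq X$, we define the valuation $\nu+d$ as $(\nu+d)(x)=\nu(x)+d$, for all $x\in X$, and the valuation $\nu[Y\leftarrow 0]$ as $(\nu[Y\leftarrow 0])(x)=0$ if $x\in Y$, and $(\nu[Y\leftarrow 0])(x)=\nu(x)$ otherwise. The valuation $\mathbf{0}$ assigns $0$ to every clock. A guard on clocks of $X$ is a conjunction of atomic constraints of the form $x\bowtie c$, where ${\bowtie}\in\{\leq,<,=,>,\geq\}$ and $c\in\mathbb{Q}$ (we allow for rational coefficients as we will refine the granularity in the following). Guard $\overline{g}$ is the closed version of a satisfiable guard $g$ where every open constraint $x<c$ or $x>c$ is replaced by its closed version $x\leq c$ or $x\geq c$. A valuation $\nu$ satisfies an atomic constraint $x\bowtie c$ if $\nu(x)\bowtie c$. The satisfaction relation is extended to all guards $g$ naturally, and denoted by $\nu\models g$. We let $\mathsf{Guard}(X)$ denote the set of guards over $X$.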
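The operations on valuations and guards defined above can be illustrated by a minimal Python sketch. The encoding is an assumption made for illustration only: valuations are dictionaries from clock names to exact rationals, a guard is a list of triples `(clock, operator, constant)`, and the function names `delay`, `reset`, `satisfies` and `closed` are hypothetical.

```python
from fractions import Fraction
import operator

# Comparison operators allowed in atomic constraints (hypothetical encoding).
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def delay(nu, d):
    """Valuation nu + d: every clock advances by the same delay d."""
    return {x: v + d for x, v in nu.items()}

def reset(nu, clocks):
    """Valuation nu[Y <- 0]: clocks in Y are set to 0, the others are unchanged."""
    return {x: (Fraction(0) if x in clocks else v) for x, v in nu.items()}

def satisfies(nu, guard):
    """A guard is a conjunction, encoded as a list of (clock, op, constant)."""
    return all(OPS[op](nu[x], c) for x, op, c in guard)

def closed(guard):
    """Closed version of a guard: strict comparisons become non-strict."""
    weaken = {"<": "<=", ">": ">="}
    return [(x, weaken.get(op, op), c) for x, op, c in guard]

nu = {"x": Fraction(1, 2), "y": Fraction(0)}
g = [("x", "<", Fraction(2)), ("y", ">=", Fraction(0))]
print(satisfies(delay(nu, Fraction(1)), g))             # x = 3/2 satisfies x < 2
print(satisfies(delay(nu, Fraction(3, 2)), closed(g)))  # x = 2 satisfies x <= 2
```

Exact rationals are used rather than floating-point numbers so that the boundary cases that distinguish a guard from its closed version are tested exactly.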

We rely on the crucial notion of regions, as introduced in the seminal work on timed automata [2]: intuitively, a region is a set of valuations that are all time-abstract bisimilar. We will need some refinement of regions, with respect to a granularity $1/N$, with $N\in\mathbb{N}_{>0}$. Formally, with respect to the set $X$ of clocks and a constant $M$, a $1/N$-region $r$ is a subset of valuations characterised by a vector $(\iota_x)_{x\in X}$ of integers and the order of fractional parts of $N\nu(x)$, given as a partition $X=X_0\uplus X_1\uplus\cdots\uplus X_m$ of clocks: a valuation $\nu$ is in this $1/N$-region $r$ if (i) $\lfloor N\nu(x)\rfloor=\iota_x$, for all clocks $x\in X$; (ii) $N\nu(x)$ is an integer for all $x\in X_0$; (iii) all clocks $x\in X_i$ satisfy that $N\nu(x)$ have the same fractional part, for all $0\leq i\leq m$. We denote by $\mathsf{Reg}_N(X,M)$ the set of $1/N$-regions, and we write $\mathsf{Reg}(X,M)$ as a shorthand for $\mathsf{Reg}_1(X,M)$. We recover the traditional notion of region for $N=1$. E.g., the figure on the right depicts regions as well as their refinement. For any integer guard $g$, either all valuations of a given $1/N$-region satisfy $g$, or none of them do. A $1/N$-region $r'$ is said to be a time successor of the $1/N$-region $r$ if there exist $\nu\in r$, $\nu'\in r'$, and $d>0$ such that $\nu'=\nu+d$. Moreover, for $Y\subseteq X$, we let $r[Y\leftarrow 0]$ be the $1/N$-region where clocks of $Y$ are reset.
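To make the effect of the granularity concrete, here is a small Python sketch that computes a canonical key for the region of a valuation. It follows the standard region construction of timed automata, applied to valuations scaled by $N$, and it assumes that all clocks are bounded (so no cap by the maximal constant is applied). The function name `region_key` and the shape of the key are assumptions of this sketch, not notation from the paper.

```python
from fractions import Fraction
from math import floor

def region_key(nu, N=1):
    """Canonical key of the 1/N-region of a (bounded) valuation nu.

    Two valuations get the same key iff, after scaling by N, they agree on
    integer parts, on which clocks have a zero fractional part, and on the
    relative order of fractional parts.
    """
    scaled = {x: Fraction(v) * N for x, v in nu.items()}
    ints = tuple(sorted((x, floor(s)) for x, s in scaled.items()))
    fracs = {x: s - floor(s) for x, s in scaled.items()}
    # Group clocks by fractional part, listed by increasing fractional part.
    groups = tuple(
        (f == 0, frozenset(x for x in fracs if fracs[x] == f))
        for f in sorted(set(fracs.values()))
    )
    return ints, groups

nu1 = {"x": Fraction(1, 5), "y": Fraction(7, 10)}
nu2 = {"x": Fraction(3, 10), "y": Fraction(9, 10)}
print(region_key(nu1) == region_key(nu2))        # same region for N = 1
print(region_key(nu1, 2) == region_key(nu2, 2))  # separated by granularity 1/2
```

In this example both valuations lie in the same classical region, but refining the granularity to $1/2$ separates them, because scaling by $2$ makes the two fractional parts of `nu1` coincide while those of `nu2` remain distinct.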

##### Weighted timed games

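A weighted timed game (WTG) is then a tuple $\mathcal{G}=\langle L=L_{\mathsf{Min}}\uplus L_{\mathsf{Max}},\Delta,\mathsf{wt},L_t,\mathsf{wt}_t\rangle$ where $L_{\mathsf{Min}}$ and $L_{\mathsf{Max}}$ are finite disjoint subsets of locations belonging to $\mathsf{Min}$ and $\mathsf{Max}$, respectively, $\Delta\subseteq L\times\mathsf{Guard}(X)\times 2^X\times L$ is a finite set of transitions, $\mathsf{wt}\colon\Delta\uplus L\to\mathbb{Z}$ is the weight function, associating an integer weight with each transition and location, $L_t\subseteq L_{\mathsf{Min}}$ is a subset of target locations for player $\mathsf{Min}$, and $\mathsf{wt}_t\colon L_t\times\mathbb{R}_{\geq 0}^X\to\mathbb{R}\cup\{-\infty,+\infty\}$ is a function mapping each target location and valuation of the clocks to a final weight of $\mathbb{R}\cup\{-\infty,+\infty\}$ (possibly $0$, $+\infty$, or $-\infty$). The addition of target weights is not standard, but we will use it in the process of solving those games: anyway, it is possible to simply map each target location to the weight $0$, allowing us to recover the standard definition. Without loss of generality, we suppose the absence of deadlocks except on target locations, i.e. for each location $\ell\in L\setminus L_t$ and valuation $\nu$, there exists $(\ell,g,Y,\ell')\in\Delta$ such that $\nu\models g$, and no transitions start in $L_t$.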

The semantics of a WTG $\mathcal{G}$ is defined in terms of a game played on an infinite transition system whose vertices are configurations of the WTG. A configuration is a pair $(\ell,\nu)$ with a location $\ell$ and a valuation $\nu$ of the clocks. Configurations are split into players according to the location. A configuration is final if its location is a target location of $L_t$. The alphabet of the transition system is given by $\mathbb{R}_{\geq 0}\times\Delta$ and will encode the delay that a player wants to spend in the current location, before firing a certain transition. For every delay $d\in\mathbb{R}_{\geq 0}$, transition $\delta=(\ell,g,Y,\ell')\in\Delta$ and valuation $\nu$, there is an edge $(\ell,\nu)\xrightarrow{d,\delta}(\ell',\nu')$ if $\nu+d\models g$ and $\nu'=(\nu+d)[Y\leftarrow 0]$. The weight of such an edge is given by $d\times\mathsf{wt}(\ell)+\mathsf{wt}(\delta)$. An example is depicted on Figure 1.

A finite play is a finite sequence of consecutive edges . We denote by the length of . The concatenation of two finite plays and , such that ends in the same configuration as starts, is denoted by . We let be the set of all finite plays in , whereas (resp. ) denote the finite plays that end in a configuration of (resp. ). A play is then a maximal sequence of consecutive edges (it is either infinite or it reaches ).

A strategy for (resp. ) is a mapping (resp. ) such that for all finite plays (resp. ) ending in non-target configuration , there exists an edge . A play or finite play conforms to a strategy of (resp. ) if for all such that belongs to (resp. ), we have that . A strategy is memoryless if for all finite plays ending in the same configuration, we have that . For all strategies and of players and , respectively, and for all configurations , we let be the outcome of and , defined as the only play conforming to and and starting in .

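The objective of $\mathsf{Min}$ is to reach a target configuration, while minimising the accumulated weight up to the target. Hence, we associate to every finite play $\rho=(\ell_0,\nu_0)\xrightarrow{d_0,\delta_0}(\ell_1,\nu_1)\xrightarrow{d_1,\delta_1}\cdots(\ell_k,\nu_k)$ its cumulated weight, taking into account both discrete and continuous costs: $\mathsf{wt}_\Sigma(\rho)=\sum_{i=0}^{k-1}\big(d_i\times\mathsf{wt}(\ell_i)+\mathsf{wt}(\delta_i)\big)$. Then, the weight of a play $\rho$, denoted by $\mathsf{wt}(\rho)$, is defined by $+\infty$ if $\rho$ is infinite (does not reach $L_t$), and $\mathsf{wt}_\Sigma(\rho)+\mathsf{wt}_t(\ell_T,\nu)$ if it ends in $(\ell_T,\nu)$ with $\ell_T\in L_t$. Then, for all locations $\ell$ and valuations $\nu$, we let $\mathsf{Val}_{\mathcal{G}}(\ell,\nu)$ be the value of $(\ell,\nu)$ in $\mathcal{G}$, defined as $\mathsf{Val}_{\mathcal{G}}(\ell,\nu)=\inf_{\sigma_{\mathsf{Min}}}\sup_{\sigma_{\mathsf{Max}}}\mathsf{wt}(\mathsf{Play}((\ell,\nu),\sigma_{\mathsf{Min}},\sigma_{\mathsf{Max}}))$, where $\mathsf{Play}((\ell,\nu),\sigma_{\mathsf{Min}},\sigma_{\mathsf{Max}})$ is the outcome of the two strategies from $(\ell,\nu)$, and where the order of the infimum and supremum does not matter, since WTGs are known to be determined (the determinacy result is stated in [13] for WTG, called priced timed games, with one clock, but the proof does not use the assumption on the number of clocks). We say that a strategy $\sigma^\star_{\mathsf{Min}}$ of $\mathsf{Min}$ is $\varepsilon$-optimal if, for all $(\ell,\nu)$, and all strategies $\sigma_{\mathsf{Max}}$ of $\mathsf{Max}$, $\mathsf{wt}(\mathsf{Play}((\ell,\nu),\sigma^\star_{\mathsf{Min}},\sigma_{\mathsf{Max}}))\leq\mathsf{Val}_{\mathcal{G}}(\ell,\nu)+\varepsilon$. It is said optimal if this holds for $\varepsilon=0$. A symmetric definition holds for optimal strategies of $\mathsf{Max}$. If the game is clear from the context, we may drop the index $\mathcal{G}$ from all previous notations.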
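As a worked illustration of the cumulated weight, the following Python sketch sums, along a finite play, the continuous cost (delay times the rate of the location) and the discrete cost (weight of the transition). The encoding of a play as a list of `(location, delay, transition)` triples, the example weights and the function names are assumptions made for this sketch.

```python
from fractions import Fraction

def edge_weight(rate, delay, transition_weight):
    """Weight of one edge: time spent times the location rate, plus the transition weight."""
    return delay * rate + transition_weight

def cumulated_weight(play, loc_weight, trans_weight):
    """Cumulated weight of a finite play given as (location, delay, transition) triples."""
    return sum(edge_weight(loc_weight[l], d, trans_weight[t]) for l, d, t in play)

# Hypothetical weights: rate 2 in l0, rate -3 in l1 (a negative rate, e.g. energy gain).
loc_weight = {"l0": 2, "l1": -3}
trans_weight = {"t0": 1, "t1": 0}
play = [("l0", Fraction(1, 2), "t0"), ("l1", Fraction(1, 3), "t1")]

# (1/2 * 2 + 1) + (1/3 * (-3) + 0) = 2 - 1 = 1
print(cumulated_weight(play, loc_weight, trans_weight))
```

The second edge has a negative weight, which shows how the accumulated weight can decrease along a play once negative rates are allowed.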

As usual in related work [1, 9, 10], we assume that the input WTGs have guards where all constants are integers, and all clocks are bounded, i.e. there is a constant such that every transition of the WTG is equipped with a guard such that implies for all clocks . We denote by (resp. , ) the maximal weight in absolute values of locations (resp. of transitions, edges) of , i.e.  (resp. , ). We also assume that the output weight functions are piecewise linear with a finite number of pieces and are continuous on each region. Notice that the zero output weight function satisfies this property. Moreover, the computations we will perform in the following maintain this property as an invariant, and use it to prove their correctness.

##### Region and corner abstractions

The region automaton, or region game, (abbreviated as when ) of a game is the WTG with locations and all transitions with such that the model of guard (i.e. all valuations such that ) is a -region , time successor of such that satisfies the guard , and . Distribution of locations to players, final locations and weights are taken according to . We call path a finite or infinite sequence of transitions in this automaton, and we denote by the paths. A play  in is projected on a path in , by replacing every edge by the transition , where (resp. ) is the -region containing (resp. ): we say that follows the path . It is important to notice that, even if is a cycle (i.e. starts and ends in the same location of the region game), there may exist plays following it in that are not cycles, due to the fact that regions are sets of valuations. By projecting away the region information of , we simply obtain:

###### Lemma 1

For all , -regions , and , .

On top of regions, we will need the corner-point abstraction techniques introduced in [8]. A valuation  is said to be a corner of a -region , if it belongs to the topological closure  and has coordinates multiple of (). We call corner state a triple  that contains information about a location of the region-game , and a corner of the -region . Every region has at most  corners. We now define the corner-point abstraction  of a WTG  as the WTG obtained as a refinement of where guards on transitions are enforced to stay on one of the corners of the current -region: the locations of are all corner states of , associated to each player accordingly, and transitions are all such that there exists a transition of such that the model of guard is a corner satisfying the guard (recall that is the closed version of ), , and there exist two valuations , such that for some (the latter condition ensures that the transition between corners is not spurious). Because of this closure operation, we must also define properly the final weight function: we simply define it over the only valuation  reachable in location (with ) by (the limit is well defined since is piecewise linear with a finite number of pieces on region ).

The corner-point abstraction can be seen as a weighted game (with final weights), i.e. a WTG without clocks (which means that there are only weights on transitions), by removing guards, resets and rates of locations, and replacing the weights of transitions by the actual weight of jumping from one corner to another: a transition becomes an edge from one corner state to another with weight $d\times\mathsf{wt}(\ell)+\mathsf{wt}(\delta)$ (for all possible values of the delay $d$, which requires allowing multi-edges; the only case where several edges could link two corners using the same transition is when all clocks are reset by the transition, in which case there is a choice for the delay $d$). Note that the delay $d$ is necessarily a rational of the form $k/N$ with $k\in\mathbb{N}$, since it must relate corners of $1/N$-regions. In particular, this proves that the cumulated weight of a finite play in the corner-point abstraction is indeed a rational number with denominator $N$.

We will call corner play a play in the corner-point abstraction : it can also be interpreted as a timed execution in where all guards are closed (as explained in the definition above). It straightforwardly projects on a finite path in the region game : in this case, we say again that follows . Figure 2 depicts a play, its projected path in the region game and one of its associated corner plays.

Corner plays allow one to obtain faithful information on the plays that follow the same path:

###### Lemma 2

If  is a finite path in , the set  is an interval bounded by the minimum and the maximum values of the set .

##### Value iteration

We will rely on the value iteration algorithm described in [1] for a WTG .

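If $V$ represents a value function (i.e. a mapping from configurations of $\mathcal{G}$ to a value in $\mathbb{R}\cup\{-\infty,+\infty\}$), we denote by $V_\ell(\nu)$ the image $V(\ell,\nu)$, for better readability, and by $V_\ell$ the function mapping each valuation $\nu$ to $V_\ell(\nu)$. One step of the game is summarised in the following operator $\mathcal{F}$ mapping each value function $V$ to a value function $V'=\mathcal{F}(V)$ defined by $V'_\ell(\nu)=\mathsf{wt}_t(\ell,\nu)$ if $\ell\in L_t$, and otherwise

$$V'_\ell(\nu)=\begin{cases}\sup_{(\ell,\nu)\xrightarrow{d,\delta}(\ell',\nu')}\big[d\times\mathsf{wt}(\ell)+\mathsf{wt}(\delta)+V_{\ell'}(\nu')\big] & \text{if }\ell\in L_{\mathsf{Max}}\\[1ex] \inf_{(\ell,\nu)\xrightarrow{d,\delta}(\ell',\nu')}\big[d\times\mathsf{wt}(\ell)+\mathsf{wt}(\delta)+V_{\ell'}(\nu')\big] & \text{if }\ell\in L_{\mathsf{Min}}\end{cases}\tag{1}$$

where $(\ell,\nu)\xrightarrow{d,\delta}(\ell',\nu')$ ranges over valid edges in $\mathcal{G}$. Then, starting from $V^0$ mapping every configuration to $+\infty$, except for the targets mapped to $\mathsf{wt}_t$, we let $V^i=\mathcal{F}(V^{i-1})$ for all $i>0$. The value function $V^i$ represents the value $\mathsf{Val}^i_{\mathcal{G}}$, which is intuitively what $\mathsf{Min}$ can guarantee when forced to reach the target in at most $i$ steps.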

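More formally, we define the weight $\mathsf{wt}^i(\rho)$ of a maximal play $\rho$ at horizon $i$, as $\mathsf{wt}(\rho)$ if $\rho$ reaches a target state in at most $i$ steps, and $+\infty$ otherwise. Using this alternative definition of the weight of a play, we can obtain a new game value $\mathsf{Val}^i_{\mathcal{G}}$. Then, if $\mathcal{G}$ is a tree of depth $d$, $\mathsf{Val}_{\mathcal{G}}=\mathsf{Val}^i_{\mathcal{G}}$ if $i\geq d$.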

The mappings $V^0_\ell$ are piecewise linear for all $\ell$, and $\mathcal{F}$ preserves piecewise linearity over regions, so all iterates $V^i_\ell$ are piecewise linear with a finite number of pieces. In [1], it is proved that $V^i_\ell$ has a number of pieces (and can be computed within a complexity) exponential in $i$ and in the size of $\mathcal{G}$ when $\mathsf{wt}_t=0$. This result can be extended to handle negative weights in $\mathcal{G}$ and output weights $\mathsf{wt}_t$.
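The shape of this iteration can be sketched in Python on a finite, untimed weighted game (for instance a corner-point abstraction seen as a weighted game). This is a deliberate simplification and an assumption of the sketch: there are no clocks, so value functions are plain numbers per vertex instead of piecewise linear functions, and the infimum and supremum become a minimum and a maximum over finitely many edges. All names are hypothetical.

```python
import math

def value_iteration_step(V, edges, owner, final):
    """One application of the operator: targets keep their final weight,
    Min minimises and Max maximises over outgoing edges."""
    new = {}
    for v in V:
        if v in final:
            new[v] = final[v]
            continue
        options = [w + V[u] for (w, u) in edges[v]]
        new[v] = min(options) if owner[v] == "Min" else max(options)
    return new

def value_iteration(edges, owner, final, steps):
    """Start from +infinity everywhere except on targets, then iterate."""
    V = {v: (final[v] if v in final else math.inf) for v in edges}
    for _ in range(steps):
        V = value_iteration_step(V, edges, owner, final)
    return V

# Hypothetical game: a belongs to Min, b to Max, t is the target with final weight 0.
edges = {"a": [(5, "t"), (1, "b")], "b": [(2, "t"), (1, "a")], "t": []}
owner = {"a": "Min", "b": "Max", "t": "Min"}
final = {"t": 0}

print(value_iteration(edges, owner, final, 1))  # b is still +inf: Max can avoid t in one step
print(value_iteration(edges, owner, final, 3))  # stabilises at a = 5, b = 6
```

After one step, vertex `b` still has value $+\infty$ because $\mathsf{Max}$ can move back to `a`, which matches the intuition that the $i$-th iterate is what $\mathsf{Min}$ can guarantee within $i$ steps. The sketch requires every non-target vertex to have at least one outgoing edge, in line with the no-deadlock assumption.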

## 3 Results

We consider the value problem that asks, given a WTG $\mathcal{G}$, a location $\ell$ and a threshold $\alpha\in\mathbb{Q}$, to decide whether $\mathsf{Val}_{\mathcal{G}}(\ell,\mathbf{0})\leq\alpha$. In the context of timed games, optimal strategies may not exist. We generally focus on finding $\varepsilon$-optimal strategies, that guarantee the optimal value, up to a small error $\varepsilon>0$. Moreover, when the value problem is undecidable, we also consider the approximation problem that consists, given a precision $\varepsilon\in\mathbb{Q}_{>0}$, in computing an $\varepsilon$-approximation of $\mathsf{Val}_{\mathcal{G}}(\ell,\mathbf{0})$.

In the one-player case, computing the optimal value and an $\varepsilon$-optimal strategy for weighted timed automata is known to be PSPACE-complete [6]. In the two-player case, the value problem of WTGs (also called priced timed games in the literature) is undecidable with 3 clocks [12, 10], or even 2 clocks in the presence of negative weights [15] (for the existence problem asking if a strategy of player $\mathsf{Min}$ can guarantee a given threshold). To obtain decidability, one possibility is to limit the number of clocks to 1: then, there is an exponential-time algorithm to compute the value as well as $\varepsilon$-optimal strategies in the presence of non-negative weights only [7, 20, 17], whereas the problem is only known to be PTIME-hard. A similar result can be lifted to arbitrary weights, under restrictions on the resets of the clock in cycles [13].

The other possibility to obtain a decidability result [9, 16] is to enforce a semantical property of divergence (originally called strictly non-Zeno cost): it asks that every play following a cycle in the region automaton has weight far from 0. It allows the authors to prove that playing for only a bounded number of steps is equivalent to the original game, which boils down to the problem of computing the value of a tree-shaped weighted timed game using the value iteration algorithm.

Other objectives, not directly related to optimal reachability, have been considered in [11] for weighted timed games, like mean-payoff and parity objectives. In this work, the authors manage to solve these problems for the so-called class of -robust WTGs that they introduce. This class includes the class we consider, but is decidable in 2-.

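In [16], we generalised the strictly non-Zeno cost property of [9, 16] to weighted timed games with both positive and negative weights: we called them divergent weighted timed games. This article relaxes the divergence property, to introduce almost-divergent weighted timed games. We first define formally these classes of games. A cycle $\pi$ of the region game $\mathcal{R}(\mathcal{G})$ is said to be a positive cycle (resp. a 0-cycle, or a negative cycle) if every finite play $\rho$ following $\pi$ has cumulated weight $\mathsf{wt}_\Sigma(\rho)\geq 1$ (resp. $\mathsf{wt}_\Sigma(\rho)=0$, or $\mathsf{wt}_\Sigma(\rho)\leq -1$). A strongly connected component (SCC) $S$ of $\mathcal{R}(\mathcal{G})$ is said to be positive (resp. negative) if every cycle $\pi$ in $S$ is positive (resp. negative). An SCC $S$ of $\mathcal{R}(\mathcal{G})$ is said to be non-negative (resp. non-positive) if every play $\rho$ following a cycle in $S$ satisfies either $\mathsf{wt}_\Sigma(\rho)=0$ or $\mathsf{wt}_\Sigma(\rho)\geq 1$ (resp. either $\mathsf{wt}_\Sigma(\rho)=0$ or $\mathsf{wt}_\Sigma(\rho)\leq -1$).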

###### Definition 1

A WTG $\mathcal{G}$ is divergent if every SCC of its region game $\mathcal{R}(\mathcal{G})$ is either positive or negative. As a generalisation, a WTG $\mathcal{G}$ is almost-divergent when every SCC of $\mathcal{R}(\mathcal{G})$ is either non-negative or non-positive.
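The case distinction in these definitions can be sketched in Python. The sketch relies on Lemma 2: the weights of the plays following a path form an interval bounded by the minimal and maximal weights of the corner plays following it. It assumes granularity 1, so that corner weights are integers, and it takes as input a finite list of cycles, each summarised by the hypothetical pair `(lo, hi)` of its extreme corner weights. Since an SCC has infinitely many cycles, this only illustrates the classification and is not a decision procedure.

```python
def classify_cycle(lo, hi):
    """Classify a cycle from the min (lo) and max (hi) weights of its corner plays."""
    if lo == hi == 0:
        return "zero"        # every play following the cycle has weight 0
    if lo >= 1:
        return "positive"    # every play has weight at least 1
    if hi <= -1:
        return "negative"    # every play has weight at most -1
    return "other"           # some play has a weight strictly between -1 and 1, not 0

def classify_scc(cycles):
    """Classify an SCC from a finite list of (lo, hi) summaries of its cycles."""
    kinds = {classify_cycle(lo, hi) for lo, hi in cycles}
    if kinds <= {"positive"}:
        return "positive"
    if kinds <= {"negative"}:
        return "negative"
    if kinds <= {"positive", "zero"}:
        return "non-negative"   # also covers an SCC with 0-cycles only
    if kinds <= {"negative", "zero"}:
        return "non-positive"
    return "not almost-divergent"

print(classify_scc([(1, 3), (0, 0)]))    # non-negative
print(classify_scc([(1, 2), (-3, -1)]))  # not almost-divergent
```

A divergent game only allows the first two outcomes for each SCC, whereas an almost-divergent game also allows the two outcomes that mix in 0-cycles.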

In [16], we showed that we can decide in 2-EXPTIME the value problem for divergent WTGs. Unfortunately, it is shown in [10] that this problem is undecidable for almost-divergent WTGs (already with non-negative weights only, where almost-divergent WTGs are called simple). They propose a solution to the approximation problem, again with non-negative weights only. Our first result is the following extension of their result:

###### Theorem 3.1

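Given an almost-divergent WTG $\mathcal{G}$, a location $\ell$ and $\varepsilon\in\mathbb{Q}_{>0}$, we can compute an $\varepsilon$-approximation of $\mathsf{Val}_{\mathcal{G}}(\ell,\mathbf{0})$ in time doubly-exponential in the size of $\mathcal{G}$ and polynomial in $1/\varepsilon$. Moreover, deciding if a WTG is almost-divergent is PSPACE-complete.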

To obtain this result, we follow an approximation schema that we now outline. First, we will always reason on the region game of the almost-divergent WTG . The goal is to compute an -approximation of for some state , with the region where every clock value is 0. As already recalled, techniques of [1] allow one to compute the (exact) values of a WTG played on a finite tree, using operator . The idea is thus to decompose as much as possible the game in a WTG over a tree. First, we decompose the region game into SCCs (left of Figure 3).

During the approximation process, we must think about the final weight functions as the previously computed approximations of the values of SCCs below the current one. We will keep as an invariant that final weight functions are piecewise linear functions with a finite number of pieces, and are continuous on each region.

For an SCC of and an initial state of provided by the SCC decomposition, we show that the game on the SCC is equivalent to a game on a tree built from a semi-unfolding (see middle of Figure 3) of from of finite depth, with certain nodes of the tree being kernels. These kernels are some parts of that contain all cycles of weight 0. The semi-unfolding is stopped either when reaching a final location, or when some location (or kernel) has been visited for a certain fixed number of times: such locations deep enough are called stop leaves.

Our second result is a more symbolic approximation schema based on the value iteration only. It is more symbolic in the sense that it does not require the SCC decomposition, the computation of kernels nor the semi-unfolding of the game in a tree.

###### Theorem 3.2

Let be an almost-divergent WTG such that for all configurations. Then the sequence converges towards and for every , we can compute an integer such that is an -approximation of for all configurations.

###### Remark 1

In a weighted timed game, it is easy to detect the set of states with value $+\infty$: these are all the states from which $\mathsf{Min}$ cannot ensure reachability of a target configuration whose final weight is different from $+\infty$. It can therefore be computed by an attractor computation, and is indeed a property constant on each region. In particular, removing those states from the region game does not affect the value of any other state and can be done in complexity linear in the size of the region game. We will therefore assume that the considered WTGs have no configurations with value $+\infty$.
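The attractor computation mentioned in this remark can be sketched in Python on a finite turn-based graph such as a region game. The encoding (successor lists, an `owner` map, a set of targets with finite final weight) and the function name are assumptions of this sketch. For readability it uses a simple fixpoint loop, which is quadratic in the worst case; the linear bound requires the usual backward propagation with counters.

```python
def min_attractor(edges, owner, targets):
    """States from which Min can force reaching `targets`.

    edges: dict mapping each state to the list of its successors.
    owner: dict mapping each state to "Min" or "Max".
    """
    attr = set(targets)
    changed = True
    while changed:
        changed = False
        for v, succs in edges.items():
            if v in attr or not succs:
                continue
            if owner[v] == "Min":
                ok = any(u in attr for u in succs)   # Min picks one good successor
            else:
                ok = all(u in attr for u in succs)   # Max cannot escape
            if ok:
                attr.add(v)
                changed = True
    return attr

# Hypothetical graph: Max can loop forever in b, and c never reaches the target t.
edges = {"a": ["b", "t"], "b": ["b", "a"], "c": ["c"], "t": []}
owner = {"a": "Min", "b": "Max", "c": "Max", "t": "Min"}

winning = min_attractor(edges, owner, {"t"})
print(sorted(winning))               # states with a value different from +inf
print(sorted(set(edges) - winning))  # states with value +inf, to be removed
```

States outside the attractor are exactly those with value $+\infty$ in this finite setting, so they can be removed before running the approximation.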

## 4 Kernels of an almost-divergent WTG

The approximation procedure described before uses the so-called kernels in order to group together all cycles of weight 0. We study those kernels and give a characterisation allowing computability. Contrary to the non-negative case, the situation is more complex in our arbitrary case, since weights of both locations and transitions may differ from 0 in the kernel. Moreover, it is not trivial (and may not be true in a non almost-divergent WTG) to know whether it is sufficient to consider only simple cycles, i.e. cycles without repetitions.

To answer these questions, let us first analyse the cycles of $\mathcal{R}(\mathcal{G})$ that we will encounter. Since we are in an almost-divergent game, by Lemma 2, all cycles $\pi=t_1\cdots t_n$ of $\mathcal{R}(\mathcal{G})$ (with $t_1,\ldots,t_n$ transitions of $\mathcal{R}(\mathcal{G})$) are either 0-cycles, positive cycles or negative cycles. Additionally, in an SCC $S$ of $\mathcal{R}(\mathcal{G})$, we cannot find both positive and negative cycles by definition. Moreover, we can classify a cycle by looking only at the corner plays following it.

###### Lemma 3

A cycle $\pi$ is a 0-cycle iff there exists a corner play $\rho$ following $\pi$ with $\mathsf{wt}_\Sigma(\rho)=0$.

###### Proof

If $\pi$ is a 0-cycle, every such corner play $\rho$ will have weight $0$, by Lemma 2. Reciprocally, if such a corner play exists, all corner plays following $\pi$ have weight $0$: otherwise, again by Lemma 2, the set of weights of the plays following $\pi$ would have non-empty intersection with the set $(-1,0)\cup(0,1)$, which would contradict the almost-divergence.

An important result is that 0-cycles are stable by rotation. This is not trivial because plays following a cycle can start and end in different valuations, therefore changing the starting state of the cycle could a priori change the plays that follow it and their weights.

###### Lemma 4

Let $\pi$ and $\pi'$ be paths of $\mathcal{R}(\mathcal{G})$ such that $\pi\pi'$ is a cycle. Then, $\pi\pi'$ is a 0-cycle iff $\pi'\pi$ is a 0-cycle.

###### Proof

Since  is a cycle, and , so  is correctly defined.

First, since there are finitely many corners, by constructing a long enough play following an iterate of , we can obtain a corner play that starts and ends in the same corner. Formally, we define two sequences of region corners  and . We start by choosing any . Let  be a corner of  such that  is accessible from  by following . For every , let  be a corner of  such that  is accessible from  by following , and let  be a corner of  such that  is accessible from  by following . We stop the construction at the first  such that there exists  with . Additionally, we let  and . This process is bounded since  has at most  corners.

For every , let  be the weight of a play from  to  along , and let  be the weight of a play from  to  along . The concatenation of the two plays has weight , since it follows the 0-cycle . Therefore, all corner plays from to following  have the same weight , and the same applies for . For every , the concatenation of and is a play from to , of weight , following . Since  is a cycle, and the game is almost-divergent, all possible values of have the same sign.

Finally, we can construct a corner play from to by concatenating the plays . That play has weight . This implies that the terms , of constant sign, are all equal to . As a consequence, the concatenation of and is a corner play following of weight . By Lemma 3, we deduce that  is a 0-cycle.

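We will now construct the kernel $\mathsf{K}$ as the subgraph of $\mathcal{R}(\mathcal{G})$ containing all 0-cycles. Formally, let $T_{\mathsf{K}}$ be the set of transitions of $\mathcal{R}(\mathcal{G})$ belonging to a simple 0-cycle, and $S_{\mathsf{K}}$ be the set of states covered by $T_{\mathsf{K}}$. We define the kernel $\mathsf{K}$ of $\mathcal{R}(\mathcal{G})$ as the subgraph of $\mathcal{R}(\mathcal{G})$ defined by $S_{\mathsf{K}}$ and $T_{\mathsf{K}}$. Transitions of $\mathcal{R}(\mathcal{G})$ not in $T_{\mathsf{K}}$ with starting state in $S_{\mathsf{K}}$ are called the output transitions of $\mathsf{K}$. We define it using only simple 0-cycles in order to ensure its computability. However, we now show that this is of no harm, since the kernel contains exactly all the 0-cycles, which will be crucial in the approximation schema we present in Section 6.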

###### Proposition 1

A cycle of $\mathcal{R}(\mathcal{G})$ is entirely in $\mathsf{K}$ if and only if it is a 0-cycle.

###### Proof

We prove that every 0-cycle is in $\mathsf{K}$ by induction on the length of the cycles. The initialisation contains only cycles of length $1$, that are in $\mathsf{K}$ by construction. If we consider a cycle $\pi$ of length $n>1$, it is either simple or it can be rotated and decomposed into $\pi'\pi''$, $\pi'$ and $\pi''$ being smaller cycles. Let $\rho$ be a corner play following $\pi$. We denote by $\rho'$ the prefix of $\rho$ following $\pi'$ and $\rho''$ the suffix following $\pi''$. It holds that $\mathsf{wt}_\Sigma(\rho')=-\mathsf{wt}_\Sigma(\rho'')$, and in an almost-divergent SCC this implies $\mathsf{wt}_\Sigma(\rho')=\mathsf{wt}_\Sigma(\rho'')=0$. Therefore, by Lemma 3 both $\pi'$ and $\pi''$ are 0-cycles, and they must be in $\mathsf{K}$ by induction hypothesis. Note that this reasoning proves that every cycle contained in a longer 0-cycle is also a 0-cycle.

We now prove that every cycle in is a 0-cycle. By construction, every transition is part of a simple 0-cycle. Thus, to every transition , we can associate a path  such that is a simple 0-cycle (rotate the simple cycle if necessary). We can prove (using both Lemmas 3 and 4) the following property by relying on another pumping argument on corners: If is a path in , then is a 0-cycle of . Now, if is a cycle of in , there exists a cycle  such that is a 0-cycle, therefore is a 0-cycle.
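The construction of the kernel from simple 0-cycles can be illustrated by a Python sketch on a finite weighted graph. This is a strong simplification, stated as an assumption: each transition carries one fixed weight, whereas in the region game the weight of a cycle depends on the plays that follow it and a 0-cycle is recognised through its corner plays (Lemma 3). The enumeration of simple cycles below is exponential in the worst case, and all names are hypothetical.

```python
def simple_cycles(edges):
    """Yield every simple cycle as a list of transitions (source, weight, target).

    edges: dict mapping each state to a list of (weight, successor) pairs.
    Each cycle is reported once, starting from its smallest state, so states
    must be comparable (strings here).
    """
    for start in sorted(edges):
        stack = [(start, [])]
        while stack:
            u, path = stack.pop()
            visited = {e[0] for e in path} | {u}
            for w, v in edges[u]:
                if v == start:
                    yield path + [(u, w, v)]
                elif v > start and v not in visited:
                    stack.append((v, path + [(u, w, v)]))

def kernel(edges):
    """Transitions lying on a simple cycle of weight 0, and the states they cover."""
    transitions = set()
    for cycle in simple_cycles(edges):
        if sum(w for _, w, _ in cycle) == 0:
            transitions.update(cycle)
    states = {u for u, _, _ in transitions} | {v for _, _, v in transitions}
    return states, transitions

# Hypothetical non-negative SCC: a -> b -> a has weight 0, b -> c -> b has weight 2.
edges = {"a": [(1, "b")], "b": [(-1, "a"), (2, "c")], "c": [(0, "b")]}
states, transitions = kernel(edges)
print(sorted(states))       # ['a', 'b']
print(sorted(transitions))  # [('a', 1, 'b'), ('b', -1, 'a')]
```

In this example the transition from `b` to `c` starts in the kernel without belonging to it, so it plays the role of an output transition. Note also that the 0-cycle alternates weights $+1$ and $-1$, which cannot happen with non-negative weights only.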

## 5 Semi-unfolding of almost-divergent WTGs

Given an almost-divergent WTG $\mathcal{G}$, we describe the construction of its semi-unfolding (as depicted in Figure 3). This crucially relies on the absence of states with value $-\infty$, so we explain how to deal with them first:

###### Lemma 5

In an SCC of $\mathcal{R}(\mathcal{G})$, the set of configurations with value $-\infty$ is a union of regions computable in time linear in the size of $\mathcal{R}(\mathcal{G})$.

###### Proof (Sketch of proof)

If the SCC is non-negative, the cumulated weight cannot decrease along a cycle, thus, the only way to obtain value $-\infty$ is to jump in a final state with final weight $-\infty$. We can therefore compute this set of states with an attractor for $\mathsf{Min}$.

If the SCC is non-positive, we let  (resp. ) be the set of target states where  is bounded (resp. has value ). We also define (resp. ), the set of transitions of whose end state belongs to (resp. ). Notice that the kernel cannot contain target states since they do not have outgoing transitions. We can prove that a configuration has value iff it belongs to a state where player  can ensure the LTL formula on transitions: . The procedure to detect states thus consists of four attractor computations, which can be done in time linear in .

We can now assume that no states of have value , and that the output weight function maps all configurations to . Since is piecewise linear with finitely many pieces, is bounded. Let denote the bound of , ranging over all target configurations.

We now explain how to build the semi-unfolding . We only build the semi-unfolding of an SCC of starting from some state of the region game, since it is then easy to glue all the semi-unfoldings together to get the one of the full game. Since every configuration has finite value, we can prove that values of the game are bounded by . As a consequence, we can find a bound linear in , and such that a play that visits some state outside the kernel more than times has weight strictly above , hence is useless for the value computation. This leads to considering the semi-unfolding of (nodes in the kernel are not unfolded, see Figure 3) such that each node not in the kernel is encountered at most times along a branch: the end of each branch is called a stop leaf of the semi-unfolding. In particular, the depth of is bounded by , and thus is polynomial in , and