Strategy Representation by Decision Trees with Linear Classifiers

by   Pranav Ashok, et al.

Graph games and Markov decision processes (MDPs) are standard models in reactive synthesis and verification of probabilistic systems with nondeterminism. The class of ω-regular winning conditions (e.g., safety, reachability, liveness, and parity conditions) provides a robust and expressive specification formalism for properties that arise in the analysis of reactive systems. The resolutions of nondeterminism in games and MDPs are represented as strategies, and we consider succinct representation of such strategies. The decision-tree data structure from machine learning retains the flavor of decisions of strategies and allows entropy-based minimization to obtain succinct trees. However, in contrast to traditional machine-learning problems where small errors are allowed, for winning strategies in graph games and MDPs no error is allowed, and the decision tree must represent the entire strategy. In this work we propose decision trees with linear classifiers for representation of strategies in graph games and MDPs. We have implemented strategy representation using this data structure and we present experimental results for problems on graph games and MDPs, which show that this new data structure yields a much more succinct strategy representation than standard decision trees.



1 Introduction

Graph games and MDPs. Graph games and Markov decision processes (MDPs) are classical models in reactive synthesis. In graph games, there is a finite-state graph, where the vertices are partitioned into states controlled by the two players, namely, player 1 and player 2, respectively. In each round the state changes according to a transition chosen by the player controlling the current state. Thus, the outcome of the game being played for an infinite number of rounds is an infinite path through the graph, which is called a play. In MDPs, instead of an adversarial player 2, there are probabilistic choices. An objective specifies a subset of plays that are satisfactory. A strategy for a player is a recipe to specify the choice of the transitions for states controlled by the player. In games, given an objective, a winning strategy for a player from a state ensures the objective irrespective of the strategy of the opponent. In MDPs, given an objective, an almost-sure winning strategy from a state ensures the objective with probability 1.

Reactive synthesis and verification. The above models play a crucial role in various areas of computer science, in particular analysis of reactive systems. In reactive-system analysis, the vertices and edges of a graph represent the states and transitions of a reactive system, and the two players represent controllable versus uncontrollable decisions during the execution of the system. The reactive synthesis problem asks for construction of winning strategies in an adversarial environment, and almost-sure winning strategies in a probabilistic environment. Reactive synthesis for games has a long history, starting from the work of Church [18, 14], and has been extensively studied [48, 15, 28, 39], with many applications in synthesis of discrete-event and reactive systems [49, 45], modeling [23, 1], refinement [29], verification [21, 4], testing [6], compatibility checking [20], etc. Similarly, MDPs have been extensively used in verification of probabilistic systems [5, 34, 22]. In all the above applications, the objectives are ω-regular, and the ω-regular sets of infinite paths provide an important and robust paradigm for reactive-system specifications [38, 50].

Strategy representation. The strategies are the most important objects as they represent the witness to winning/almost-sure winning. The strategies can represent desired controllers in reactive synthesis and protocols, and formally they can be interpreted as a lookup table that specifies for every controlled state of the player the transition to choose. As a data structure to represent strategies, there are some desirable properties, which are as follows: (a) succinctness, i.e., small strategies are desirable, since smaller strategies represent efficient controllers; (b) explanatory, i.e., the representation explains the decisions of the strategies. While one standard data structure representation for strategies is binary decision diagrams (BDDs) [2, 13], recent works have shown that decision trees [46, 40] from machine learning provide an attractive alternative data structure for strategy representation [9, 11]. The two key advantages of decision trees are: (a) Decision trees utilize various predicates to make decisions and thus retain the inherent flavor of the decisions of the strategies; and (b) there are entropy-based algorithmic approaches for decision tree minimization [46, 40]. However, one of the key challenges in using decision trees for strategy representation is that while in traditional machine-learning applications errors are allowed, for winning and almost-sure winning strategies errors are not permitted.

Our contributions. While decision trees are a basic data structure in machine learning, various extensions of them have been considered. In particular, they have been extended with linear classifiers [12, 47, 26, 36]. Informally, a linear classifier is a predicate that checks inequality of a linear combination of variables against a constant. In this work, we consider decision trees with linear classifiers for strategy representation in graph games and MDPs, which has not been considered before. First, for representing strategies where no errors are permitted, we present a method to avoid errors both in decision trees and in linear classification. Second, we present a new method (that is not entropy-based) for choosing predicates in the decision trees, which further improves the succinctness of decision trees with linear classifiers. We have implemented our approach, and applied it to examples of reactive synthesis from SYNTCOMP benchmarks [31], model-checking examples from PRISM benchmarks [35], and synthesis of randomly generated LTL formulae [44]. Our experimental results show significant improvement in succinctness of strategy representation with the new data structure as compared to standard decision trees.

2 Stochastic Graph Games and Strategies

2.1 Informal description

Stochastic graph games. We denote the set of probability distributions over a finite set $Z$ as $\mathcal{D}(Z)$. A stochastic graph game is a tuple $G = \langle S_1, S_2, A_1, A_2, \delta \rangle$, where:

  1. $S_1$ and $S_2$ are finite sets of states for player 1 and player 2, respectively, and $S = S_1 \cup S_2$ denotes the set of all states;

  2. $A_1$ and $A_2$ are finite sets of actions for player 1 and player 2, respectively, and $A = A_1 \cup A_2$ denotes the set of all actions; and

  3. $\delta\colon (S_1 \times A_1) \cup (S_2 \times A_2) \to \mathcal{D}(S)$ is a transition function that given a player 1 state and a player 1 action, or a player 2 state and a player 2 action, gives the probability distribution over the successor states.

We consider two special cases of stochastic graph games, namely:

  1. graph games, where for each $(s, a)$ in the domain of $\delta$, $\delta(s, a)(s') = 1$ for some $s' \in S$.

  2. Markov decision processes (MDPs), where $S_2 = \emptyset$ and $A_2 = \emptyset$.
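The tuple and its two special cases can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `StochasticGame`, its field names, and the dictionary encoding of the transition function are all hypothetical, and the sketch assumes that only successors with positive probability are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StochasticGame:
    """Hypothetical encoding of a stochastic graph game <S1, S2, A1, A2, delta>."""
    states1: frozenset
    states2: frozenset
    actions1: frozenset
    actions2: frozenset
    # (state, action) -> {successor: probability}; assumption: only
    # successors with positive probability are listed.
    delta: dict

    def is_graph_game(self):
        # Graph game: every transition is a Dirac distribution.
        return all(
            len(dist) == 1 and next(iter(dist.values())) == 1
            for dist in self.delta.values()
        )

    def is_mdp(self):
        # MDP: player 2 has neither states nor actions.
        return not self.states2 and not self.actions2


# A two-state toy game in which both players move deterministically.
toy = StochasticGame(
    states1=frozenset({"s0"}),
    states2=frozenset({"t0"}),
    actions1=frozenset({"a"}),
    actions2=frozenset({"b"}),
    delta={("s0", "a"): {"t0": 1}, ("t0", "b"): {"s0": 1}},
)
```

The toy instance is a graph game but not an MDP, since player 2 controls the state `t0`.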

We consider stochastic graph games with several classical objectives, namely, safety (resp. its dual reachability), Büchi (resp. its dual co-Büchi), and parity objectives.

Stochastic graph games with variables. Consider a finite subset of natural numbers $X \subseteq \mathbb{N}$, and a finite set of variables $\mathit{Var}$ over $X$, partitioned into state-variables and action-variables $\mathit{Var} = \mathit{Var}_S \uplus \mathit{Var}_A$ ($\uplus$ denotes a disjoint union). A valuation is a function that assigns values from $X$ to the variables. Let $X^{\mathit{Var}_S}$ (resp., $X^{\mathit{Var}_A}$) denote the set of all valuations to the state-variables (resp., the action-variables). We associate a stochastic graph game $G$ with a set of variables $\mathit{Var}$, such that (i) each state $s \in S$ is associated with a unique valuation in $X^{\mathit{Var}_S}$, and (ii) each action $a \in A$ is associated with a unique valuation in $X^{\mathit{Var}_A}$.







Figure 1: A reactive system with two request channels.
Example 1

Consider a simple system that receives requests for two different channels A and B. The requests become pending and at a later point a response handles a request for the respective channel. A controller must ensure that (i) the request-pending queues do not overflow (their sizes are 2 and 3 for channels A and B, respectively), and that (ii) no response is issued for a channel without a pending request. The system can be modeled by the graph game depicted in Fig. 1. The states of player 1 (controller issuing responses) are labeled with valuations of state-variables capturing the number of pending requests for channel A and B, respectively. For brevity of presentation, the action labels (corresponding to valuations of a single action-variable) are shown only outgoing from one state, with a straightforward generalization for all other states of player 1. Further, for clarity of presentation, the labels of states and actions for player 2 (environment issuing requests, with filled blue-colored states and actions) are omitted. The controller must ensure the safety objective of avoiding the four error states.

Strategy representation. The algorithmic problem treated in this work considers representation of memoryless almost-sure winning strategies for stochastic graph games with variables. Given a stochastic graph game and an objective, a memoryless strategy for player $i \in \{1, 2\}$ is a function that resolves the nondeterminism for player $i$ by choosing the next action based on the currently visited state. Further, a strategy is almost-sure winning if it ensures the given objective with probability 1 irrespective of the strategy of the other player. In synthesis and verification of reactive systems, the problems often reduce to computation of memoryless almost-sure winning strategies for stochastic graph games, where the state space and action space are represented by a set of variables. In practice, such problems arise from various sources, e.g., AIGER specifications [30], LTL synthesis [44], PRISM model checking [34].

2.2 Detailed description

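Plays. Given a stochastic graph game $G$, a play is an infinite sequence of state-action pairs $\langle s_0 a_0 s_1 a_1 \ldots \rangle$ such that for all $i \ge 0$ we have that $(s_i, a_i) \in S_j \times A_j$ for some $j \in \{1, 2\}$, and $\delta(s_i, a_i)(s_{i+1}) > 0$. We denote by $\mathit{Plays}(G)$ the set of all plays in $G$.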

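Objectives. An objective for a stochastic graph game $G$ is a Borel set $\varphi \subseteq \mathit{Plays}(G)$. We consider the following objectives:

  1. Reachability and safety. Given a set $T \subseteq S$ of target states, the reachability objective requires that a state in $T$ is eventually visited. Formally, $\mathit{Reach}(T) = \{\langle s_0 a_0 s_1 a_1 \ldots \rangle \in \mathit{Plays}(G) \mid \exists i \ge 0\colon s_i \in T\}$. The dual of reachability objectives are safety objectives, where a set $F \subseteq S$ of safe states is given, and the safety objective requires that only states in $F$ are visited. Formally, $\mathit{Safe}(F) = \{\langle s_0 a_0 s_1 a_1 \ldots \rangle \in \mathit{Plays}(G) \mid \forall i \ge 0\colon s_i \in F\}$.

  2. Parity. For an infinite play $\varrho$ we denote by $\mathit{Inf}(\varrho)$ the set of states that occur infinitely often in $\varrho$. Let $p\colon S \to \mathbb{N}$ be a given priority function. The parity objective $\mathit{Parity}(p) = \{\varrho \in \mathit{Plays}(G) \mid \min\{p(s) \mid s \in \mathit{Inf}(\varrho)\} \text{ is even}\}$ requires that the minimum of the priorities of the states visited infinitely often is even. The dual of the parity objective requires that the minimum of the priorities visited infinitely often is odd. For a special case of priority functions $p\colon S \to \{0, 1\}$, the corresponding parity objective (resp., its dual) is called Büchi (resp., co-Büchi).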

Memoryless strategies. Given a stochastic graph game $G$, a strategy is a recipe for a player how to choose actions to extend finite prefixes of plays. Specifically, a memoryless strategy is a strategy where the player performs each choice based solely on the currently visited state. Formally, a memoryless strategy for player 1 is a function $\sigma\colon S_1 \to A_1$ that given a currently visited state chooses the next action. Analogously, a memoryless strategy for player 2 is a function $\pi\colon S_2 \to A_2$. We denote by $\Sigma$ and $\Pi$ the sets of all memoryless strategies for player 1 and player 2 in $G$, respectively. Given two strategies $\sigma \in \Sigma$, $\pi \in \Pi$, and a starting state $s \in S$, they induce a unique probability measure $\mathbb{P}^{\sigma,\pi}_{s}$ over the Borel sets of $\mathit{Plays}(G)$. In the special case of graph games, the two strategies and the starting state induce a unique play $\langle s_0 a_0 s_1 a_1 \ldots \rangle$ such that $s_0 = s$ and for all $i \ge 0$, the action $a_i$ is the one chosen in $s_i$ by the strategy of the player controlling $s_i$, and $\delta(s_i, a_i)(s_{i+1}) = 1$. The strategies we consider in this work are all memoryless strategies.

Winning and almost-sure winning strategies. Given a stochastic graph game $G$ and an objective $\varphi$, an almost-sure winning strategy from state $s \in S$ is a strategy $\sigma$ for player 1 such that for all strategies $\pi$ of player 2 the objective holds with probability 1 under the probability measure induced by $\sigma$, $\pi$, and $s$, i.e., $\mathbb{P}^{\sigma,\pi}_{s}[\varphi] = 1$. A fundamental result for stochastic graph games with parity (resp., safety/reachability) objectives shows that (i) there is a memoryless almost-sure winning strategy if and only if there is a general (i.e., utilizing the past and nondeterminism) almost-sure winning strategy, and (ii) a memoryless almost-sure winning strategy satisfies the objective with probability 1 even against general strategies of the opposing player [16]. In the special case of graph games, an almost-sure winning strategy $\sigma$ ensures for all $\pi$ that the unique play induced by $\sigma$, $\pi$, and $s$ belongs to $\varphi$, and is referred to as a winning strategy.

Reactive synthesis and strategies. In the analysis of reactive systems, most properties that arise in practice are ω-regular objectives, which capture important desirable properties, such as safety, liveness, and fairness. The class of ω-regular objectives is expressible by the linear-time temporal logic (LTL) framework. The problem of synthesis from LTL specifications has received huge attention [19], and the LTL synthesis problem can be reduced to solving graph games with parity objectives. Moreover, given a model and a specification, the fundamental model checking problem asks to produce a witness that the model satisfies the specification. In model checking of probabilistic systems, the witness for a property is a policy that ensures the property almost-surely. In such settings, it is natural to consider graph games and MDPs where the state space and action space are represented by a set of variables.

3 Decision Trees and Decision Tree Learning

Here we recall decision trees (DT), representing strategies by DT, and learning DT.

Figure 2: A decision tree for .

A decision tree (DT) over $\mathbb{N}^d$ is a tuple $\mathcal{T} = (T, \rho, \theta)$ where $T$ is a finite rooted binary (ordered) tree, $\rho$ assigns to every inner node an (in)equality predicate comparing arithmetical expressions over variables $\{x_1, \ldots, x_d\}$, and $\theta$ assigns to every leaf a value YES or NO. The language $\mathcal{L}(\mathcal{T}) \subseteq \mathbb{N}^d$ of the tree is defined as follows. For a vector $\bar{x} = (x_1, \ldots, x_d) \in \mathbb{N}^d$, we find a path $P$ from the root to a leaf such that for each inner node $n$ on the path, $\bar{x} \models \rho(n)$ (i.e., the predicate $\rho(n)$ is satisfied with valuation $\bar{x}$) iff the first child of $n$ is on $P$. Denote the leaf on this particular path by $\ell$. Then $\bar{x}$ is in the language of $\mathcal{T}$ iff $\theta(\ell) = \mathit{YES}$. Intuitively, $\mathcal{L}(\mathcal{T})$ captures the set of vectors accepted by the tree $\mathcal{T}$, i.e., vectors with an accepting path in the tree (ending with YES). An example is illustrated in Fig. 2 with the first children connected with unbroken arrows and the second children with dashed ones.

The (usually finite) set of predicates in the co-domain of $\rho$ is denoted by $\mathit{Pred}$. In the example above, the predicates are comparisons of variables to constants.

Representing strategies by DT has been introduced in [9]. The dimension of data points here is $d = |\mathit{Var}|$. The data points are tuples of natural numbers representing state-action pairs, thus we also write them as $(s, a)$. The strategy induced by a decision tree $\mathcal{T}$ allows playing $a$ in $s$ iff $(s, a) \in \mathcal{L}(\mathcal{T})$.
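The definition of the language of a tree amounts to a simple root-to-leaf walk. Below is a minimal sketch of that walk; the class names `Leaf` and `Inner`, the function `in_language`, and the example predicates are illustrative choices and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    accept: bool  # True stands for YES, False for NO


@dataclass
class Inner:
    predicate: Callable[[tuple], bool]
    first: "Node"   # child followed when the predicate is satisfied
    second: "Node"  # child followed otherwise


Node = Union[Leaf, Inner]


def in_language(node, point):
    """Follow the unique path induced by the predicates and read the leaf."""
    while isinstance(node, Inner):
        node = node.first if node.predicate(point) else node.second
    return node.accept


# Hypothetical tree over two variables: accept if x0 <= 1,
# otherwise accept only if x1 != 0.
tree = Inner(
    lambda v: v[0] <= 1,
    Leaf(True),
    Inner(lambda v: v[1] == 0, Leaf(False), Leaf(True)),
)
```

For a strategy, a point is a state-action tuple, and the tree allows the action in the state exactly when `in_language` returns `True`.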

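A given input strategy $\sigma$ for player $i$ defines the sets (i) $\mathit{Good} = \{(s, \sigma(s)) \mid s \in S_i\}$, (ii) $\mathit{Bad} = \{(s, a) \in S_i \times A_i \mid a \neq \sigma(s)\}$, and (iii) $\mathit{Train} = \mathit{Good} \uplus \mathit{Bad}$ ($\uplus$ denotes a disjoint union). Further, given a subset $\mathit{data} \subseteq \mathit{Train}$, we define $\mathit{Maj}(\mathit{data})$ as (i) YES if $|\mathit{data} \cap \mathit{Good}| \ge |\mathit{data} \cap \mathit{Bad}|$, and (ii) NO otherwise. When strategies need to be represented exactly, as in the case of games, the trees have to classify all decisions correctly [11]. This in turn causes difficulties not faced in standard DT learning [40], as described below.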

Example 2

Consider the reactive system and the corresponding game described in Example 1. Consider a strategy for the controller (player 1) in this system that (i) waits in state , (ii) issues a response for channel B when there are more pending requests for channel B than pending requests for channel A, and (iii) issues a response for channel A in all other cases. Then, the strategy induces:
, and . The task is to represent exactly, i.e., to accept all examples and reject all examples.
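To make the positive and negative examples of Example 2 tangible, the sketch below enumerates them under an assumed encoding. The numeric action codes, the choice of the state with no pending requests as the state where the controller waits, and the restriction to non-error states with queue bounds 2 and 3 are assumptions made for illustration, since the extracted text does not preserve the exact sets.

```python
from itertools import product

# Hypothetical numeric encoding of the single action-variable.
WAIT, RESPONSE_A, RESPONSE_B = 0, 1, 2


def sigma(pending_a, pending_b):
    """Controller strategy of Example 2 under the assumptions stated above."""
    if (pending_a, pending_b) == (0, 0):  # assumed waiting state
        return WAIT
    if pending_b > pending_a:
        return RESPONSE_B
    return RESPONSE_A


# Non-error controller states: at most 2 pending for A and 3 for B.
states = list(product(range(3), range(4)))
actions = (WAIT, RESPONSE_A, RESPONSE_B)

good = {(a, b, sigma(a, b)) for a, b in states}
bad = {(a, b, act) for a, b in states for act in actions} - good
train = good | bad
```

Each state contributes exactly one positive example and two negative ones, which already hints at the class imbalance that matters for the choice of a split metric later on.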

Learning DT from the set $\mathit{Good}$ of positive examples and the set $\mathit{Bad}$ of negative examples is described in Algorithm 1. A node with all the data points is gradually split into offspring until the point where each leaf contains only elements of $\mathit{Good}$ or only elements of $\mathit{Bad}$. Note that in the classical DT learning algorithms such as ID3 [46], one can also stop this process earlier to prevent overfitting, which induces smaller trees with a classification error, unacceptable in the strategy representation.

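1: Input: $\mathit{Train} \subseteq \mathbb{N}^d$ partitioned into subsets $\mathit{Good}$ and $\mathit{Bad}$.
2: Output: A decision tree $\mathcal{T}$ such that $\mathcal{L}(\mathcal{T}) \cap \mathit{Train} = \mathit{Good}$.
3: /* train $\mathcal{T}$ on positive set $\mathit{Good}$ and negative set $\mathit{Bad}$ */
5: push (root, $\mathit{Train}$) into queue $q$
6: while $q$ nonempty do
7:     pop $(\ell, \mathit{data})$ from $q$
8:     if $\mathit{data} \subseteq \mathit{Good}$ or $\mathit{data} \subseteq \mathit{Bad}$ then
9:         $\theta(\ell) \leftarrow$ YES if $\mathit{data} \subseteq \mathit{Good}$, otherwise NO
10:     else
11:         $\rho(\ell) \leftarrow$ predicate selected by a split procedure
12:         create children $\ell_{sat}$ and $\ell_{unsat}$ of $\ell$
13:         push $(\ell_{sat}, \mathit{data}_{sat})$, $(\ell_{unsat}, \mathit{data}_{unsat})$ into $q$
Algorithm 1 Basic decision-tree learning algorithm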
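1: Input: $\mathit{data} \subseteq \mathbb{N}^d$ partitioned into its positive and negative samples.
2: Output: A predicate $p \in \mathit{Pred}$ maximizing information gain on $\mathit{data}$.
4: for $p \in \mathit{Pred}$ do
5:     $\mathit{ig}[p] \leftarrow$ information gain of $p$ on $\mathit{data}$
6: if $\max_{p} \mathit{ig}[p] = 0$ then   /* condition checks if information gain failed */
7:     for $p \in \mathit{Pred}$ do
Algorithm 2 Split procedure – information gain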

The choice of the predicate to split a node with is described in Algorithm 2. From the finite set $\mathit{Pred}$ we pick the predicate which maximizes information gain (i.e., decrease of entropy [40]). The set of considered predicates is typically domain-specific, and finitely restricted in a natural way. In this work, we consider (in)equality predicates that compare values of variables to constants; a natural finite restriction is to consider only constants that appear in the dataset. Again, due to the need for fully expanded trees with no error, we need to guarantee that we can split all nodes with mixed data even if none of the predicates provides any information gain in one step. This issue is addressed in [11] as follows. Whenever no positive information gain can be achieved by any predicate, a predicate is chosen according to a different, very simple formula: a heuristic that always returns a positive number. One possible option suggested in [11] is captured on Line 8.
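The entropy-based part of the split procedure can be sketched as follows. This is an illustrative reading of information gain as the decrease of binary entropy; the function names and the dictionary of named predicates are hypothetical, and the fallback heuristic of [11] for the case of zero gain is deliberately not reproduced because its formula is not given in the text.

```python
from math import log2


def entropy(points, good):
    """Binary entropy of the good/bad labels within `points`."""
    if not points:
        return 0.0
    p = sum(1 for x in points if x in good) / len(points)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(points, good, predicate):
    """Entropy of `points` minus the weighted entropy of the two partitions."""
    sat = [x for x in points if predicate(x)]
    unsat = [x for x in points if not predicate(x)]
    remainder = sum(
        len(part) / len(points) * entropy(part, good) for part in (sat, unsat)
    )
    return entropy(points, good) - remainder


def best_predicate(points, good, predicates):
    """Name of the predicate with maximal information gain."""
    return max(
        predicates,
        key=lambda name: information_gain(points, good, predicates[name]),
    )


# One-dimensional toy data: the two smallest points are good.
points = [(0,), (1,), (2,), (3,)]
good = {(0,), (1,)}
predicates = {
    "x0 <= 0": lambda v: v[0] <= 0,
    "x0 <= 1": lambda v: v[0] <= 1,
}
```

On this toy data the predicate `x0 <= 1` splits the classes perfectly, so it attains the maximal gain of one bit and is selected.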

4 Decision Trees with Linear Classifiers

In this section, we develop an algorithm for constructing decision trees with linear classifiers in the leaf nodes. As we are interested in representation of winning and almost-sure winning strategies, we have to address the challenge of allowing no error in the strategy representation. Thus we consider an algorithm that provably represents a given strategy in its entirety. Furthermore, we present a split procedure for decision-tree algorithms, which aims to propose predicates leading into small trees with linear classifiers.

4.1 Linear classifiers in the leaf nodes

Figure 3: (triangles) and (circles). No horizontal or vertical classifier can separate , but is linearly separable (by a slanted classifier).

During the construction of a decision tree for a given dataset, each node corresponds to a certain subset of the dataset. This subset exactly captures the data points from the dataset that would reach the node starting from the root and progressing based on the predicates visited along the travelled path (as explained in Section 3). Notably, there might be other data points also reaching this node from the root, however, they are not part of the dataset, and thus their outcome on the tree is irrelevant for the correct dataset representation. This insight allows us to propose a decision-tree algorithm with more expressive terminal (i.e., leaf) nodes, and in this work we consider linear classifiers as the leaf nodes.

Given two vectors $x, y \in \mathbb{R}^d$, their dot product (or scalar product) is defined as $x \cdot y = \sum_{i=1}^{d} x_i y_i$. Given a weight vector $w \in \mathbb{R}^d$ and a bias term $b \in \mathbb{R}$, a linear classifier $c_{w,b}$ is defined as $c_{w,b}(x) = \mathit{YES}$ if $w \cdot x \ge b$, and $c_{w,b}(x) = \mathit{NO}$ otherwise.

Informally, a linear classifier checks whether a linear combination of vector values is greater than or equal to a constant. Intuitively, we consider strategies as good and bad vectors of natural numbers, and we use linear classifiers to decide for a given vector whether it is good or bad. On a more general level, a linear classifier partitions the space into two half-spaces, and a given vector gets classified based on the half-space it belongs to.

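Consider a finite dataset $\mathit{Train} \subseteq \mathbb{N}^d$ partitioned into subsets $\mathit{Good}$ and $\mathit{Bad}$. A linear classifier $c_{w,b}$ separates $\mathit{Train}$, if for every $x \in \mathit{Train}$ we have that $c_{w,b}(x) = \mathit{YES}$ iff $x \in \mathit{Good}$. The corresponding decision problem asks, given a dataset $\mathit{Train}$, for existence of a weight vector $w$ and bias $b$ such that the linear classifier $c_{w,b}$ separates $\mathit{Train}$. In such a case we say that $\mathit{Train}$ is linearly separable. Fig. 3 provides an illustration. There are efficient oracles for the decision problem of linear separability, e.g., linear-programming solvers.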
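One standard way to phrase linear separability as a linear-programming feasibility problem is sketched below. The formulation is an illustration and is not taken from the paper; the margin constant 1 is an arbitrary normalization.

```latex
\begin{aligned}
\text{find } \quad & w \in \mathbb{R}^d,\ b \in \mathbb{R} \\
\text{such that } \quad & w \cdot x \ge b     && \text{for all } x \in \mathit{Good}, \\
                        & w \cdot x \le b - 1 && \text{for all } x \in \mathit{Bad}.
\end{aligned}
```

Because the dataset is finite, any separating pair $(w, b)$ leaves a positive gap $\epsilon = b - \max_{x \in \mathit{Bad}} w \cdot x$, and scaling both $w$ and $b$ by $1/\epsilon$ yields a feasible point of the program. Conversely, every feasible point gives a classifier that accepts exactly the positive samples, so feasibility coincides with linear separability.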

[Figure: decision tree with nodes labeled "action = wait", "action = responseA", "pendingA + pendingB", and "pendingA - pendingB" (twice).]
Figure 4: A decision tree for the system’s controller.
Example 3

We illustrate the idea of representing strategies by decision trees with linear classifiers. Consider the game described in Example 1 and the controller strategy for this game described in Example 2. An example of a decision tree that represents the strategy is displayed in Fig. 4. The input samples with action end in and get classified by the leftmost linear classifier, and the samples with action get classified by the rightmost linear classifier. Finally, the samples with action are rejected if there are no pending requests to channel A, and otherwise they get classified by the bottommost linear classifier. Note that the decision tree accepts each sample from and rejects each sample from , and thus indeed represents the strategy .

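1: Input: $\mathit{Train} \subseteq \mathbb{N}^d$ partitioned into subsets $\mathit{Good}$ and $\mathit{Bad}$.
2: Output: A decision tree $\mathcal{T}$ such that $\mathcal{L}(\mathcal{T}) \cap \mathit{Train} = \mathit{Good}$.
3: /* train $\mathcal{T}$ on positive set $\mathit{Good}$ and negative set $\mathit{Bad}$ */
5: push (root, $\mathit{Train}$) into queue $q$
6: while $q$ nonempty do
7:     pop $(\ell, \mathit{data})$ from $q$
8:     if $\mathit{data} \subseteq \mathit{Good}$ or $\mathit{data} \subseteq \mathit{Bad}$ then
9:         $\theta(\ell) \leftarrow$ YES if $\mathit{data} \subseteq \mathit{Good}$, otherwise NO
10:     else if $\mathit{data}$ is linearly separable by a classifier $c$ then
11:         $\theta(\ell) \leftarrow c$
12:     else
13:         $\rho(\ell) \leftarrow$ predicate selected by a split procedure
14:         create children $\ell_{sat}$ and $\ell_{unsat}$ of $\ell$
15:         push $(\ell_{sat}, \mathit{data}_{sat})$, $(\ell_{unsat}, \mathit{data}_{unsat})$ into $q$
16: return $\mathcal{T}$
Algorithm 3 Learning algorithm for decision trees with linear classifiers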

We are now ready to describe our algorithm for representing strategies as decision trees with linear classifiers. Algorithm 3 presents the pseudocode. At the beginning, in Line 5 the queue $q$ is initialized with the root node and the whole training set $\mathit{Train}$. Intuitively, the queue maintains the tree nodes that are to be processed, and in every iteration of the loop (Line 6) one node gets processed. First, in Line 7 the node $\ell$ gets popped together with $\mathit{data}$, which is the subset of $\mathit{Train}$ that would reach $\ell$ from the root node. If $\mathit{data}$ contains only samples from $\mathit{Good}$ (resp., only samples from $\mathit{Bad}$), then $\ell$ becomes a leaf node with YES (resp., NO) as the answer (Line 9). If $\mathit{data}$ contains samples from both, but is linearly separable by some classifier, then $\ell$ becomes a leaf node with this classifier (Line 11). Otherwise, $\ell$ becomes an inner node. In Line 13 it gets assigned a predicate by an external split procedure and in Line 14 two children of $\ell$ are created. Finally, in Line 15, $\mathit{data}$ is partitioned into the subset that satisfies the chosen predicate of $\ell$ and the subset that does not, and the two children of $\ell$ are pushed into the queue with the two subsets, to be processed in later iterations. Once there are no more nodes to be processed, the final decision tree is returned.
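The loop described above can be sketched in a few dozen lines. Two parts of this sketch are stand-ins and not the authors' method: the separability oracle is a perceptron with an epoch limit (it can miss a separator that a linear-programming solver would find, in which case the node is simply split further), and the split procedure just takes the first predicate with two nonempty partitions. All function names and the dictionary-based tree encoding are hypothetical.

```python
from collections import deque


def find_separator(points, good, max_epochs=1000):
    """Perceptron stand-in for the oracle: returns (w, b) such that
    w.x >= b holds exactly on the good points, or None if not found."""
    w, b = [0] * len(points[0]), 0
    for _ in range(max_epochs):
        updated = False
        for x in points:
            score = sum(wi * xi for wi, xi in zip(w, x)) - b
            if x in good and score <= 0:
                w, b, updated = [wi + xi for wi, xi in zip(w, x)], b - 1, True
            elif x not in good and score >= 0:
                w, b, updated = [wi - xi for wi, xi in zip(w, x)], b + 1, True
        if not updated:
            return w, b
    return None


def learn_tree(train, good, predicates):
    root = {}
    queue = deque([(root, list(train))])
    while queue:
        node, data = queue.popleft()
        n_good = sum(1 for x in data if x in good)
        if n_good in (0, len(data)):           # homogeneous: pure leaf
            node["leaf"] = n_good > 0
            continue
        separator = find_separator(data, good)
        if separator is not None:              # classifier leaf
            node["classifier"] = separator
            continue
        for pred in predicates:                # stand-in split procedure
            sat = [x for x in data if pred(x)]
            unsat = [x for x in data if not pred(x)]
            if sat and unsat:
                break
        else:
            raise ValueError("no predicate splits the data")
        node["predicate"], node["sat"], node["unsat"] = pred, {}, {}
        queue.append((node["sat"], sat))
        queue.append((node["unsat"], unsat))
    return root


def accepts(node, x):
    while "predicate" in node:
        node = node["sat"] if node["predicate"](x) else node["unsat"]
    if "leaf" in node:
        return node["leaf"]
    w, b = node["classifier"]
    return sum(wi * xi for wi, xi in zip(w, x)) >= b


# XOR-like toy data: not linearly separable as a whole.
train = [(0, 0), (0, 1), (1, 0), (1, 1)]
good = {(0, 0), (1, 1)}
tree = learn_tree(train, good, [lambda v: v[0] <= 0])
```

On the XOR-like data the root cannot be a classifier leaf, so it is split once, and each child is then closed by a linear classifier.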

Construction of decision trees with linear classifiers. We present a simple running example that illustrates the key points of Algorithm 3. Fig. 5 captures the flow of construction and Fig. 6 presents the output decision tree.

(a) Step 1: We start with the entire input training set $\mathit{Train}$. In Line 8, we check whether $\mathit{Train}$ is homogeneous, and it is not. Then in Line 10, we check whether $\mathit{Train}$ is linearly separable, and it is not. Thus the root node becomes an inner node.
(b) Step 2: An external procedure proposes a predicate for the root node (Line 13), and $\mathit{Train}$ is partitioned based on the predicate. Two children with the corresponding partitions are created (Line 15); they will need to be processed.
(c) Step 3: Further iterations of the main loop (Line 6) process the two created children. One child contains a homogeneous dataset. Hence in the iteration when it is processed, in Line 9 it becomes a pure leaf node.
(d) Step 4: The second child has a dataset which is not homogeneous, but it is linearly separable. Thus in its iteration, in Line 11 it becomes a classifier leaf node. No more nodes are left to be processed and so the algorithm concludes.
Figure 5: Running Algorithm 3 on a sample dataset consisting of circle and triangle points. The decision tree thus obtained is depicted in Fig. 6.
Figure 6: The decision tree resulting from the above steps.

Correctness. We now prove the correctness of Algorithm 3. In other words, we show that given a strategy in the form of a training set, Algorithm 3 can be used to provably represent the training set (i.e., the strategy) without errors.

Theorem 4.1

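Let $G$ be a stochastic graph game, and let $\sigma$ be a memoryless strategy for player $i$ that defines a training set $\mathit{Train}$ partitioned into $\mathit{Good}$ and $\mathit{Bad}$. Consider an arbitrary split procedure that considers only predicates from $\mathit{Pred}$ which produce nonempty sat- and unsat-partitions. Given $\mathit{Train}$ as input, Algorithm 3 using the split procedure outputs a decision tree $\mathcal{T}$ such that $\mathcal{L}(\mathcal{T}) \cap \mathit{Train} = \mathit{Good}$, which means that for all $x \in \mathit{Train}$ we have that $x \in \mathcal{L}(\mathcal{T})$ iff $x \in \mathit{Good}$. Thus $\mathcal{T}$ represents the strategy $\sigma$.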


We consider stochastic graph games with variables over a finite domain , thus . Recall that given a decision tree constructed by Algorithm 3, assigns to every inner node a predicate from , and assigns to every leaf either , or , or a linear classifier that classifies elements from into resp. .

Partial correctness. Consider Algorithm 3 with input , and let be the output decision tree. Consider an arbitrary , note that it belongs to . Consider the leaf corresponding to in . There is a unique path for down the tree from its root, induced by the predicates in the inner nodes given by . Thus is well-defined. At some point during the algorithm, was popped from the queue in Line 7, together with a dataset , and note that . Since is a leaf, there are three cases to consider:

  1. . Then , which implies . The assignment happened in Line 9, so (i) the condition in Line 8 was satisfied, and (ii) . Thus , which implies . By the definition of , we have .

  2. . Then , which implies . The assignment happened in Line 9, so (i) the condition in Line 8 was satisfied, and (ii) . Thus , which implies . By the definition of , we have .

  3. . This assignment happened in Line 11. Thus the condition in Line 10 was satisfied, and hence linearly separates . As , we have that iff . This gives that iff .

The desired result follows.

Total correctness. Algorithm 3 uses a split procedure that considers only predicates from $\mathit{Pred}$ which produce nonempty sat- and unsat-partitions. Thus the algorithm maintains the following invariant for every path in $\mathcal{T}$ starting from the root: for each predicate $p \in \mathit{Pred}$, there is at most one inner node $n$ in the path such that $\rho(n) = p$. This invariant is indeed maintained, since any predicate considered a second time in a path necessarily produces an empty data partition, and such predicates are not considered by the split procedure that selects predicates for $\rho$ (in Line 13 of Algorithm 3).

From the above we have that the length of any path in $\mathcal{T}$ starting from the root is at most $2 \cdot |\mathit{Var}| \cdot |X|$, i.e., twice the number of variables times the size of the variable domain. We prove that the number of iterations of the loop in Line 6 is finite. The branch from Line 12 happens finitely many times, since it adds two vertices (in Line 14) to the decision tree and we have the bound on the path lengths in $\mathcal{T}$. Since only the branch from Line 12 pushes elements into the queue $q$, and each iteration of the loop pops an element from $q$ in Line 7, the number of loop iterations (Line 6) is indeed finite. This proves termination, which together with partial correctness proves total correctness.∎

4.2 Splitting criterion for small decision trees with classifiers

During construction of decision trees, the predicates for the inner nodes are chosen based on a supplied metric, which heuristically attempts to select predicates leading to small trees. The entropy-based information gain is the most prevalent metric to construct decision trees, in machine learning [40, 46] as well as in formal methods [3, 9, 27, 42]. Algorithm 2 presents a split procedure utilizing information gain, supplemented with a stand-in metric proposed in [11].

In this section, we propose a new metric and we develop a split procedure around it. When selecting predicates for the inner nodes, we exploit the knowledge that in the descendants the data will be tested for linear separability. Thus for a given predicate, the metric tries to estimate, roughly speaking, how well-separable the corresponding data partitions are. While the metric is well-studied in machine learning, to the best of our knowledge, the corresponding decision-tree-split procedure is novel, both in machine learning and in formal methods.

Figure 7: True/False Positive/Negative.

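True/false positive/negative. Consider a fixed linear classifier $c$, and a sample $x \in \mathit{Train}$ such that $c(x) = \mathit{YES}$. If $x \in \mathit{Good}$, then $x$ is a true positive (TP) w.r.t. the classifier $c$, otherwise $x \in \mathit{Bad}$ and thus $x$ is a false positive (FP). Consider a different sample $x' \in \mathit{Train}$ such that $c(x') = \mathit{NO}$. If $x' \in \mathit{Bad}$, then $x'$ is a true negative (TN), otherwise $x' \in \mathit{Good}$ and $x'$ is a false negative (FN). Fig. 7 summarizes the terminology.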

True/false positive rate. Consider a fixed linear classifier $c$ and a fixed dataset $\mathit{Train} = \mathit{Good} \uplus \mathit{Bad}$. We denote by $|\mathit{TP}|$ the number of true positives within $\mathit{Train}$ w.r.t. the classifier $c$. Similarly we denote $|\mathit{FP}|$ for false positives. Then, the true positive rate (TPR) is defined as $|\mathit{TP}| / |\mathit{Good}|$, and the false positive rate (FPR) is $|\mathit{FP}| / |\mathit{Bad}|$. Intuitively, TPR describes the fraction of good samples that are correctly classified, whereas FPR describes the fraction of bad samples that are misclassified as good.

Area under the curve. Consider a fixed dataset $\mathit{Train}$ and a fixed weight vector $w$. In what follows we describe a metric that evaluates $w$ w.r.t. $\mathit{Train}$. First, consider a set of boundaries, which are the dot products of $w$ with samples from $\mathit{Train}$. Formally, $B = \{w \cdot x \mid x \in \mathit{Train}\}$. Further, consider $b^{*} = \max B + \epsilon$ for some $\epsilon > 0$. Then, consider the set of linear classifiers that “hit” the boundaries, plus a classifier that rejects all samples. Formally, $C = \{c_{w,b} \mid b \in B \cup \{b^{*}\}\}$. Now, the receiver operating characteristic (ROC) is a curve that plots TPR against FPR for the classifiers in $C$. Intuitively, the ROC curve captures, for a fixed set of weights, how changing the bias term affects TPR and FPR of the resulting classifier. Ideally, we want the TPR to increase rapidly when the bias is weakened, while the FPR increases as little as possible. We consider the area under the ROC curve (denoted AUC) as the metric to evaluate the weight vector $w$ w.r.t. the dataset $\mathit{Train}$. Intuitively, the faster the TPR increases, and the slower the FPR increases, the bigger the area under the ROC curve (AUC) will be.

Fig. 8 provides an intuitive illustration of the concept, where the weight vector $w$ is held fixed. The classifiers are then shown on the left subfigure, and the corresponding ROC curve (with the shaded area under the curve, AUC) is shown on the right subfigure. Note that the points in the ROC curve correspond to the classifiers obtained for the individual boundaries, and they capture their (FPR, TPR). The extra point $(0, 0)$ corresponds to the classifier that rejects all samples.
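The construction of the ROC curve from the boundaries can be sketched directly. In this sketch the area is computed with the trapezoidal rule, which is an assumption about how consecutive ROC points are connected; the function names are hypothetical, and both classes are assumed to be nonempty.

```python
def roc_points(w, points, good):
    """(FPR, TPR) pairs for the classifiers whose bias hits each boundary,
    starting from the classifier that rejects every sample."""
    def dot(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    boundaries = sorted({dot(x) for x in points}, reverse=True)
    n_good = sum(1 for x in points if x in good)
    n_bad = len(points) - n_good
    curve = [(0.0, 0.0)]  # bias above every boundary: nothing is accepted
    for b in boundaries:  # weaken the bias step by step
        tp = sum(1 for x in points if dot(x) >= b and x in good)
        fp = sum(1 for x in points if dot(x) >= b and x not in good)
        curve.append((fp / n_bad, tp / n_good))
    return curve


def auc(w, points, good):
    """Area under the ROC curve via the trapezoidal rule."""
    curve = roc_points(w, points, good)
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(curve, curve[1:])
    )


# One-dimensional toy data: the two largest points are good.
points = [(0,), (1,), (2,), (3,)]
good = {(2,), (3,)}
```

With the weight vector `(1,)` all good samples are accepted before any bad one, so the curve rises to full TPR at zero FPR and the area is maximal; the opposite weight vector `(-1,)` orders the samples the wrong way round and yields the minimal area.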

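[Figure: (a) Classifiers for a fixed weight vector, iterating over the bias. (b) ROC curve (x-axis: False Positive Rate, y-axis: True Positive Rate) and the shaded area under the curve.]
Figure 8: Area under the curve for a fixed weight vector w.r.t. the dataset (triangles and circles mark the two classes of samples).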
Algorithm 4 Split procedure – area under the curve
1: Input: the training set, partitioned into the subsets of good and bad samples.
2: Output: A predicate maximizing the area under the ROC curves of the two induced partitions.
4: for each considered predicate do

Algorithm 4 presents a split procedure that uses as the metric to select predicates. Each considered predicate partitions input into the subset that satisfies the predicate and the subset that does not. Then, in Lines 5 and 6, two weight vectors are obtained by solving the linear least squares problem on the data partitions. This is a classical problem in statistics with a known closed-form solution, and Appendix 0.A provides detailed description of the problem. Finally, the score for the predicate equals the sum of for the two weight vectors with respect to their corresponding data partitions (Line 7). At the end, in Line 8 the predicate with maximum score is selected.
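The split procedure described above can be sketched in a few dozen lines. This is an illustration, not the authors' implementation, and it rests on several assumptions: least-squares targets are +1 for good and -1 for bad samples, no bias column is used, a partition containing only one class scores 1, and linearly dependent columns receive weight 0. All function names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List, Optional, Tuple

Sample = Tuple[int, ...]
Predicate = Callable[[Sample], bool]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def least_squares(rows: List[Sample], targets: List[int]) -> List[Fraction]:
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    d = len(rows[0])
    a = [[Fraction(sum(r[i] * r[j] for r in rows)) for j in range(d)]
         + [Fraction(sum(r[i] * t for r, t in zip(rows, targets)))]
         for i in range(d)]
    pivots, row = [], 0
    for col in range(d):
        p = next((k for k in range(row, d) if a[k][col] != 0), None)
        if p is None:
            continue  # dependent column: its weight stays 0
        a[row], a[p] = a[p], a[row]
        piv = a[row][col]
        a[row] = [v / piv for v in a[row]]
        for k in range(d):
            if k != row and a[k][col] != 0:
                f = a[k][col]
                a[k] = [vk - f * vr for vk, vr in zip(a[k], a[row])]
        pivots.append((row, col))
        row += 1
    w = [Fraction(0)] * d
    for r, c in pivots:
        w[c] = a[r][d]
    return w


def auc(w, good, bad) -> Fraction:
    """Trapezoidal area under the ROC curve obtained by sweeping the bias."""
    boundaries = sorted({dot(w, x) for x in good + bad}, reverse=True)
    area, prev = Fraction(0), (Fraction(0), Fraction(0))
    for b in boundaries:
        fpr = Fraction(sum(dot(w, x) >= b for x in bad), len(bad))
        tpr = Fraction(sum(dot(w, x) >= b for x in good), len(good))
        area += (fpr - prev[0]) * (tpr + prev[1]) / 2
        prev = (fpr, tpr)
    return area


def partition_score(good, bad) -> Fraction:
    if not good or not bad:
        return Fraction(1)  # assumption: a single-class partition counts as separable
    w = least_squares(good + bad, [1] * len(good) + [-1] * len(bad))  # Lines 5 and 6
    return auc(w, good, bad)


def best_split(good, bad, predicates) -> Optional[Tuple[int, Fraction]]:
    """Return (index, score) of the predicate with the maximum summed AUC."""
    best = None
    for i, pred in enumerate(predicates):  # main loop (Line 4)
        score = sum((partition_score([x for x in good if pred(x) == side],
                                     [x for x in bad if pred(x) == side])
                     for side in (True, False)), Fraction(0))  # Line 7
        if best is None or score > best[1]:
            best = (i, score)
    return best  # Line 8


good = [(2,), (3,)]
bad = [(0,), (1,), (4,), (5,)]
predicates = [lambda x: x[0] < 3, lambda x: x[0] < 1, lambda x: x[0] < 5]
result = best_split(good, bad, predicates)
```

In this one-dimensional example the good samples form an interval, so no single linear classifier separates the data. Splitting at `x[0] < 3` yields two partitions that are each separable, which is why that predicate obtains the maximum score of 2.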

The choice of the area under the curve as the split metric is motivated by heuristically estimating well-separability of data in the setting of strategy representation. A simpler metric of accuracy (i.e., the fraction of correctly classified samples) may seem a natural choice for the estimate of well-separability. However, in strategy representation, the data is typically very imbalanced, i.e., the set of good samples is typically much smaller than the set of bad samples. As a result, for all considered predicates the corresponding proposed classifiers focus heavily on the bad samples and neglect the few good samples. Thus all classifiers achieve remarkable accuracy, which gives us little information on the choice of a predicate. This is a well-known insight: in machine learning, the accuracy metric is notoriously problematic in the case of disproportionate classes. On the other hand, the area-under-the-curve metric, utilizing the invariance of bias, is able to focus also on the sparse subset, thus providing better estimates of well-separability.

5 Experiments

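Throughout our experiments, we consider the following construction algorithms:

  • NoLC: the basic decision-tree algorithm, without linear classifiers.
  • LC-ent: the algorithm with linear classifiers and entropy-based splits.
  • LC-auc: the algorithm with linear classifiers and auc-based splits.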

For the experimental evaluation of the construction algorithms, we consider multiple sources of problems that arise naturally in reactive synthesis and reduce to stochastic graph games with integer variables. These variables provide semantic information about the states (resp., actions) they identify, so a strategy-representation method utilizing predicates over the variables produces naturally interpretable output. Moreover, there is an inherent internal structure in the states and their valuations, which machine-learning algorithms can exploit to produce more succinct representations of strategies.

Given a game and an objective, we use an explicit solver to obtain an almost-sure winning strategy. Then we consider the strategy as a list of played (good) and non-played (bad) actions for each state, which can be used directly as an input training set. We evaluate the construction algorithms based on succinctness of representation, which we express as the number of non-pure nodes (i.e., nodes with either a predicate or a linear classifier). Further experimental details are presented in Appendix 0.B.
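The conversion of a strategy into a training set can be sketched as follows. The sketch assumes that a sample is the concatenation of a state valuation and an action valuation, and that the strategy and the available actions are given as dictionaries; these encodings and the function name are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

State = Tuple[int, ...]
Action = Tuple[int, ...]
Sample = Tuple[int, ...]


def strategy_to_training_set(
    available: Dict[State, Set[Action]],
    strategy: Dict[State, Set[Action]],
) -> Tuple[List[Sample], List[Sample]]:
    """Split all state-action pairs into played (good) and non-played (bad) samples."""
    good, bad = [], []
    for state, actions in available.items():
        played = strategy.get(state, set())
        for action in sorted(actions):
            sample = state + action  # state variables followed by action variables
            (good if action in played else bad).append(sample)
    return good, bad


available = {(0, 0): {(0,), (1,)}, (0, 1): {(0,), (1,), (2,)}}
strategy = {(0, 0): {(1,)}, (0, 1): {(0,)}}
good, bad = strategy_to_training_set(available, strategy)
```

Even in this tiny example the bad samples outnumber the good ones, since the strategy plays a single action in each state.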

5.1 Graph games and winning strategies

We consider two sources of problems reducible to strategy representation in graph games, namely, AIGER safety synthesis [30] and LTL synthesis [44].

5.1.1 AIGER – Scheduling of Washing Cycles.

The goal of this problem is to design a centralized controller for a system of washing tanks running in parallel. The system is parametrized by the number of tanks, the time limit to fill a tank with water after a request, the delay after which the tank has to be emptied again, and a number of tanks per one shared water pipe. The controller has to ensure that all requests are satisfied within the specified time limit.

The problem has been introduced in the second year of SYNTCOMP [31], the reactive synthesis competition. The problem is implicitly described in the form of an AIGER safety specification [30], which uses circuits with input, output, and latch Boolean variables. This reduces directly to graph games with $\{0,1\}$-valued integer variables and safety objectives. The state-variables represent for each tank whether it is currently filled, and the current deadline for filling (resp., emptying). The action-variables capture environment requests to fill water tanks, and the controller commands to fill (resp., empty) water tanks. We consider 364 datasets, where the numbers of samples range from 640 to 1024000, and the numbers of variables range from 16 to 62.

Figure 9: Scheduling of Washing Cycles.

We illustrate the results in Fig. 9. Both subfigures plot the ratios of sizes for two considered algorithms. Each dot represents a dataset, the -axis captures the ratios, and the two red lines represent equality and order-of-magnitude improvement, respectively. The left figure considers the size ratios of the basic decision-tree algorithm (NoLC) and the algorithm with linear classifiers and entropy-based splits (LC-ent). The arithmetic, geometric, and harmonic means of the ratios are , , and , respectively. The right figure considers the basic algorithm (NoLC) and the algorithm with linear classifiers and auc-based splits (LC-auc). The arithmetic, geometric, and harmonic means of the ratios are , , and , respectively.
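The three means of size ratios can be computed with Python's standard library. The sketch below is only an illustration: it uses the NoLC and LC-auc tree sizes of the leader2, leader3, and leader4 rows of Table 1 as input, and it assumes that a ratio is the baseline size divided by the size obtained with linear classifiers.

```python
from statistics import geometric_mean, harmonic_mean, mean

# (NoLC size, LC-auc size) for leader2, leader3, and leader4 in Table 1.
sizes = [(25, 11), (61, 23), (152, 45)]

# Assumed direction of the ratio: baseline size over improved size.
ratios = [baseline / improved for baseline, improved in sizes]

arithmetic = mean(ratios)
geometric = geometric_mean(ratios)
harmonic = harmonic_mean(ratios)
```

For positive ratios the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean, so reporting all three gives a sense of how strongly a few large ratios dominate the average.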

5.1.2 LTL synthesis.

In reactive synthesis, most properties considered in practice are ω-regular objectives, which can be specified as linear-time temporal logic (LTL) formulae over input/output signals [44]. Given an LTL formula and input/output signal partitioning, the controller synthesis for this specification is reducible to solving a graph game with parity objective.

In our experiments, we consider LTL formulae randomly generated using the tool SPOT [25]. Then, we use the tool Rabinizer [32] to translate the formulae into deterministic parity automata. Crucially, the states of these automata contain semantic information retained by Rabinizer during the translation. We consider an encoding of the semantic information (given as sets of LTL formulae and permutations) into binary vectors. The encoding aims to capture the inherent structure within automaton states, which can later be exploited during strategy representation. Finally, for each parity automaton we consider various input/output partitionings of signals, and thus we obtain parity graph games with $\{0,1\}$-valued integer variables. The whole pipeline is described in detail in [11].

We consider graph games with liveness (parity-2) and strong fairness (parity-3) objectives. In total we consider 917 datasets, with numbers of samples ranging from 48 to 8608, and numbers of variables ranging from 38 to 128.

Figure 10: LTL synthesis.

Fig. 10 illustrates the results, where both subfigures plot the ratios of sizes (captured on the -axis) for two considered algorithms. The left figure considers the basic decision-tree algorithm and the algorithm with linear classifiers and entropy-based splits (). The arithmetic, geometric, and harmonic means of the ratios are , , and , respectively. The right figure considers the basic decision-tree algorithm and the algorithm with linear classifiers and auc-based splits (). The arithmetic, geometric, and harmonic means of the ratios are , , and , respectively.

5.2 MDPs and almost-sure winning strategies

5.2.1 LTL synthesis with randomized environment.

In LTL synthesis, given a formula and an input/output signal partitioning, there may be no controller that satisfies the LTL specification. In such a case, it is natural to consider a different setting where the environment is not antagonistic, but behaves randomly instead. There are LTL specifications that are unsatisfiable, but become satisfiable when a randomized environment is considered. This special case of LTL synthesis reduces to solving MDPs with almost-sure parity objectives [17]. Note that in this setting, the precise probabilities of environment actions are immaterial, as they have no effect on the existence of a controller ensuring an objective almost-surely (i.e., with probability 1).

We consider 414 instances of LTL synthesis reducible to graph games with co-Büchi (i.e., parity-) objective, where the LTL specification is unsatisfiable, but becomes satisfiable with randomized environment (which reduces to MDPs with almost-sure co-Büchi objective). The examples have been obtained by the same pipeline as the one described in the previous subsection. In the examples, the sizes of range from 80 to 26592, and the sizes of range from 38 to 74.

Figure 11: LTL synthesis with randomized environment.

The experimental results are summarized in Fig. 11. The two subfigures plot the ratios of sizes (captured on the -axis) for two considered algorithms. The left figure considers the basic decision-tree algorithm and the algorithm with linear classifiers and entropy-based splits (). The arithmetic, geometric, and harmonic means of the ratios are , , and , respectively. The right figure considers the basic decision-tree algorithm and the algorithm with linear classifiers and auc-based splits (). The arithmetic, geometric, and harmonic means of the ratios are , , and , respectively.

5.2.2 PRISM model checking.

We consider model checking of probabilistic systems in the model checker PRISM [34]. Given an implicit description of a probabilistic system in PRISM, and a reachability/safety LTL formula as a specification, the model checking problem of the model and the specification reduces to construction of an almost-sure winning strategy in an MDP with nonnegative integer variables. The state-variables correspond to the variables in the implicit PRISM model description, i.e., local states of the modules, counter values, etc. The action-variables capture the id of the module performing an action, and the id of the action performed by the module.

Model             Specification        Samples  Variables  NoLC  LC-ent  LC-auc
coin2_K1          F[finished&agree]       1820          7   142     135      45
coin2_K2          F[finished&agree]       3484          7   270     261      55
coin2_K3          F[finished&agree]       5148          7   386     373      60
coin2_K4          F[finished&agree]       6812          7   536     520      55
coin2_K9          F[finished&agree]      15132          7  1137    1123      68
coin3_K1          F[finished&agree]      27854          9   772     713     298
coin3_K2          F[finished&agree]      51566          9  1142    1074     316
coin3_K3          F[finished&agree]      75278          9  1580    1500     378
coin3_K4          F[finished&agree]      98990          9  2047    1967     388
coin4_K0          F[finished&agree]      52458         11   742     632     221
coin5_K0          F[finished&agree]     451204         13  2572    1626     566
csma2_2           F[succ_min_bo2]         8590         13    70      52      32
csma2_2           F[max_col3]            10380         13    65      54      54
csma2_3           F[succ_min_bo3]        25320         13    66      48      35
csma2_3           F[max_col4]            28730         13    63      48      59
csma2_4           F[succ_min_bo4]        73110         13    60      42      40
csma2_4           F[max_col5]            79580         13    54      41      59
firewire_abst     F[exists_leader]        2535          4    12      10       8
firewire_impl_01  F[exists_leader]       22633         12    99      86      71
firewire_impl_02  F[exists_leader]       37180         12   101      85      81
firewire_impl_05  F[exists_leader]       90389         12   102      85      72
leader2           F[elected]               204         12    25      18      11
leader3           F[elected]              3249         17    61      34      23
leader4           F[elected]             38016         22   152      92      45
mer10             G[!err_G]             499632         19   552     510     124
mer20             G[!err_G]             954282         19   963     922     124
mer30             G[!err_G]            1408932         19  1373    1332     126
wlan0             F[both_sent]           27380         14   244     198     232
wlan1             F[both_sent]           81940         14   272     200     286
wlan2             F[both_sent]          275140         14   288     206     353
zeroconf          F[configured]         268326         24   413     330     376
Table 1: PRISM model checking.

Table 1 presents the PRISM experimental results, where we consider various case studies available from the PRISM benchmark suite [35] (e.g., communication protocols). The columns of the table represent the considered model and specification, the number of samples and the number of variables of the dataset, and the decision-tree sizes for the three considered construction algorithms (NoLC, LC-ent, and LC-auc).

In this set of experiments, we have noticed several cases where the split heuristic based on the area under the curve achieves significantly worse results. Namely, in csma, wlan, and zeroconf, it is mostly outperformed by the information-gain split procedure, and sometimes it is outperformed even by standard decision trees without linear classifiers. This was caused by certain variables repeatedly having high scores (for different thresholds) when constructing some branches of the tree, even though subsequent choices of the predicates made little progress towards linearly separating the data. We were able to mitigate the cases of bad predicate suggestions, e.g., by penalizing the predicates on the variables that already appear in the path to the current node (that is about to be split); however, the inferior overall performance in these benchmarks persists. This discovery motivates considering various combinations of the area-under-the-curve and information-gain methods, e.g., using information gain as a stand-in metric in cases where the area under the curve yields poor scores for all considered predicates.
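One possible form of the suggested combination is sketched below: the predicate is chosen by its area-under-the-curve score unless all such scores fall below a threshold, in which case information gain is used instead. The threshold `min_auc_score` and the function names are hypothetical; the text does not fix a concrete combination.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[int, ...]


def entropy(n_good: int, n_bad: int) -> float:
    """Binary entropy of a set with the given class counts."""
    total = n_good + n_bad
    h = 0.0
    for n in (n_good, n_bad):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(good: List[Sample], bad: List[Sample],
                     pred: Callable[[Sample], bool]) -> float:
    """Entropy of the data minus the weighted entropy of the two partitions."""
    parts = [([x for x in good if pred(x) == side],
              [x for x in bad if pred(x) == side]) for side in (True, False)]
    total = len(good) + len(bad)
    rest = sum((len(g) + len(b)) / total * entropy(len(g), len(b))
               for g, b in parts if g or b)
    return entropy(len(good), len(bad)) - rest


def choose_predicate(auc_scores: Sequence[float], gains: Sequence[float],
                     min_auc_score: float) -> int:
    """Index of the best predicate; falls back to information gain
    when every AUC score is below the (hypothetical) threshold."""
    scores = auc_scores if max(auc_scores) >= min_auc_score else gains
    return max(range(len(scores)), key=scores.__getitem__)


gain = information_gain([(2,), (3,)], [(0,), (1,)], lambda x: x[0] < 2)
by_auc = choose_predicate([1.2, 1.3], [0.5, 0.1], min_auc_score=1.25)
by_gain = choose_predicate([1.2, 1.3], [0.5, 0.1], min_auc_score=1.5)
```

With the lower threshold the second predicate wins on its AUC score; with the higher threshold no AUC score is considered good enough and the first predicate wins on information gain.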

6 Related Work

Strategy representation. Previous non-explicit representations of strategies for verification or synthesis purposes typically used BDDs [51] or automata [41, 43] and do not explain the decisions by the current valuation of variables. Classical decision trees have been widely used in machine learning as classifiers that naturally explain a decision [40]. They have also been considered for representation of values and thus implicitly strategies for MDPs in [8, 7]. In the context of verification, this approach has been modified to capture strategies guaranteed to be $\varepsilon$-optimal, for MDPs [9], partially observable MDPs [10], and (non-stochastic) games [11]. Learning a compact decision-tree representation of an MDP strategy was also investigated in [37] for the case of body sensor networks.

Linear extensions of decision trees have been considered already in [24] for combinatorial optimization problems. In the field of machine learning, combinations of decision trees and linear models have been proposed as interpretable models for classification and regression [12, 47, 26, 36]. A common feature of these works is that they do not aim at classifying the training set without any errors, as in classification tasks this would bear the risk of overfitting. In contrast, our usage requires learning trees that fully fit the data.

The closest to our approach is the work of Neider et al. [42], which learns decision trees with linear classifiers in the leaves in order to capture functions with generally non-Boolean co-domains. Since the aim there is not to classify, but to fully represent a function, our approach is better tailored to representing strategies. Indeed, since the trees and the lines in the leaves of [42] are generated from counterexamples in the learning process, the following issues arise. Firstly, each counterexample has to be captured exactly using a generated line. In geometric terms, each point has to lie on a line, while in our approach we only need to separate positive and negative points by lines, which clearly requires fewer lines. Secondly, the generation of lines is done online and based on the single discussed point (counterexample). As a result, lines that would work for more points are not preferred, while our approach maximizes the utility of a generated line with respect to the complete dataset and thus generally prefers smaller solutions. Unfortunately, even after discussion with the authors of [42], there is no compilable version of their implementation at the time of writing, and no experimental confirmation of the above observations could be obtained.

7 Conclusion and Future Work

In this work, we consider strategy representation by an extension of decision trees. Namely, we consider linear classifiers as the leaf nodes of decision trees. We note that the decision-tree framework proposed in this work is more general. Consider an arbitrary data structure , with an efficient decision oracle for existence of an instance of representing a given dataset without error. Then, our scheme provides a straightforward way of constructing decision trees with instances of as the leaf nodes.
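The general scheme can be sketched as a recursive construction parameterized by the decision oracle and the split procedure. The sketch below is illustrative: `threshold_oracle` and `median_split` are toy stand-ins (a threshold on the first variable in place of a linear classifier), all names are hypothetical, and the toy split assumes that the samples have pairwise distinct first coordinates.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

Sample = Tuple[int, ...]
Predicate = Callable[[Sample], bool]


@dataclass
class Leaf:
    model: object  # an instance of the leaf data structure


@dataclass
class Inner:
    predicate: Predicate
    sat: "Tree"
    unsat: "Tree"


Tree = Union[Leaf, Inner]


def build(good: List[Sample], bad: List[Sample], oracle, choose_split) -> Tree:
    """Make a leaf if the oracle finds an error-free instance, else split and recurse."""
    model = oracle(good, bad)
    if model is not None:
        return Leaf(model)
    pred = choose_split(good, bad)
    sat = build([x for x in good if pred(x)], [x for x in bad if pred(x)],
                oracle, choose_split)
    unsat = build([x for x in good if not pred(x)], [x for x in bad if not pred(x)],
                  oracle, choose_split)
    return Inner(pred, sat, unsat)


def threshold_oracle(good, bad):
    """Toy oracle: a threshold on the first variable that fits the data, or None."""
    if not good or not bad:
        return ("const", bool(good))
    g = [x[0] for x in good]
    b = [x[0] for x in bad]
    if min(g) > max(b):
        return (">=", min(g))
    if max(g) < min(b):
        return ("<=", max(g))
    return None


def median_split(good, bad) -> Predicate:
    """Toy split: compare the first variable with the median value."""
    values = sorted(x[0] for x in good + bad)
    t = values[len(values) // 2]
    return lambda x: x[0] < t


tree = build([(2,), (3,)], [(0,), (1,), (4,), (5,)], threshold_oracle, median_split)
```

Replacing `threshold_oracle` by a test for linear separability and `median_split` by one of the split procedures gives the shape of the construction considered in this work; the sketch only shows how the two components plug into the recursion.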

Besides representation algorithms that provably represent the entire input strategy, one can consider models where an error may occur and the data structure is refined into a more precise one only when the represented strategy is not winning. Here we can consider more expressive models in the leaves, too. This could capture representation of controllers exhibiting more complicated functions, e.g., a quadratic polynomial capturing that a robot navigates closely (in Euclidean distance) to a given point, or deep neural networks capturing more complicated structure that is difficult to access directly.


Acknowledgments. This work has been partially supported by DFG Grant No KR 4890/2-1 (SUV: Statistical Unbounded Verification), TUM IGSSE Grant 10.06 (PARSEC), Czech Science Foundation grant No. 18-11193S, Vienna Science and Technology Fund (WWTF) Project ICT15-003, the Austrian Science Fund (FWF) NFN Grants S11407-N23 (RiSE/SHiNE) and S11402-N23 (RiSE/SHiNE).


  • [1] M. Abadi, L. Lamport, and P. Wolper. Realizable and unrealizable specifications of reactive systems. In ICALP, pages 1–17, 1989.
  • [2] S. B. Akers. Binary decision diagrams. IEEE Transactions on Computers, C-27(6):509–516, 1978.
  • [3] R. Alur, R. Bodík, E. Dallal, D. Fisman, P. Garg, G. Juniwal, H. Kress-Gazit, P. Madhusudan, M. M. K. Martin, M. Raghothaman, S. Saha, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. Syntax-guided synthesis. In Dependable Software Systems Engineering, pages 1–25. 2015.
  • [4] R. Alur, T. Henzinger, and O. Kupferman. Alternating-time temporal logic. Journal of the ACM, 49:672–713, 2002.
  • [5] C. Baier and J. Katoen. Principles of Model Checking. 2008.
  • [6] A. Blass, Y. Gurevich, L. Nachmanson, and M. Veanes. Play to test. In FATES, pages 32–46, 2005.
  • [7] C. Boutilier and R. Dearden. Approximate value trees in structured dynamic programming. In ICML, pages 54–62, 1996.
  • [8] C. Boutilier, R. Dearden, and M. Goldszmidt. Exploiting structure in policy construction. In IJCAI, pages 1104–1113, 1995.
  • [9] T. Brázdil, K. Chatterjee, M. Chmelík, A. Fellner, and J. Křetínský. Counterexample explanation by learning small strategies in Markov decision processes. In CAV, pages 158–177, 2015.
  • [10] T. Brázdil, K. Chatterjee, M. Chmelík, A. Gupta, and P. Novotný. Stochastic shortest path with energy constraints in POMDPs: (extended abstract). In AAMAS, pages 1465–1466, 2016.
  • [11] T. Brázdil, K. Chatterjee, J. Křetínský, and V. Toman. Strategy representation by decision trees in reactive synthesis. In TACAS, pages 385–407, 2018.
  • [12] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. 1984.
  • [13] R. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, 1986.
  • [14] J. Büchi. On a decision method in restricted second-order arithmetic. In International Congress on Logic, Methodology, and Philosophy of Science, pages 1–11, 1962.
  • [15] J. Büchi and L. Landweber. Solving sequential conditions by finite-state strategies. Transactions of the AMS, 138:295–311, 1969.
  • [16] K. Chatterjee. Stochastic Omega-Regular Games. PhD thesis, University of California at Berkeley, USA, 2007.
  • [17] K. Chatterjee, T. A. Henzinger, B. Jobstmann, and R. Singh. Measuring and synthesizing systems in probabilistic environments. Journal of the ACM, 62(1):9:1–9:34, 2015.
  • [18] A. Church. Logic, arithmetic, and automata. In International Congress of Mathematicians, pages 23–35, 1962.
  • [19] E. M. Clarke, T. A. Henzinger, H. Veith, and R. Bloem, editors. Handbook of Model Checking, chapter: Games and Synthesis. 2018.
  • [20] L. de Alfaro and T. Henzinger. Interface automata. In FSE, pages 109–120, 2001.
  • [21] L. de Alfaro, T. Henzinger, and F. Mang. Detecting errors before reaching them. In CAV, pages 186–201, 2000.
  • [22] C. Dehnert, S. Junges, J. Katoen, and M. Volk. A storm is coming: A modern probabilistic model checker. In CAV, pages 592–600, 2017.
  • [23] D. Dill. Trace Theory for Automatic Hierarchical Verification of Speed-independent Circuits. 1989.
  • [24] D. P. Dobkin. A nonlinear lower bound on linear search tree programs for solving knapsack problems. Journal of Computer and System Sciences, 13(1):69–73, 1976.
  • [25] A. Duret-Lutz, A. Lewkowicz, A. Fauchille, T. Michaud, E. Renault, and L. Xu. Spot 2.0 - A framework for LTL and ω-automata manipulation. In ATVA, pages 122–129, 2016.
  • [26] E. Frank, Y. Wang, S. Inglis, G. Holmes, and I. H. Witten. Using model trees for classification. Machine learning, 32(1):63–76, 1998.
  • [27] P. Garg, D. Neider, P. Madhusudan, and D. Roth. Learning invariants using decision trees and implication counterexamples. In POPL, pages 499–512, 2016.
  • [28] Y. Gurevich and L. Harrington. Trees, automata, and games. In STOC, pages 60–65, 1982.
  • [29] T. Henzinger, O. Kupferman, and S. Rajamani. Fair simulation. I&C, 173:64–81, 2002.
  • [30] S. Jacobs. Extended AIGER format for synthesis. CoRR, abs/1405.5793, 2014.
  • [31] S. Jacobs, R. Bloem, R. Brenguier, R. Könighofer, G. A. Pérez, J. Raskin, L. Ryzhyk, O. Sankur, M. Seidl, L. Tentrup, and A. Walker. The second reactive synthesis competition (SYNTCOMP 2015). In SYNT, pages 27–57, 2015.
  • [32] Z. Komárková and J. Křetínský. Rabinizer 3: Safraless translation of LTL to small deterministic automata. In ATVA, pages 235–241, 2014.
  • [33] P. Kontschieder, M. Fiterau, A. Criminisi, and S. R. Bulò. Deep neural decision forests. In IJCAI, pages 4190–4194, 2016.
  • [34] M. Z. Kwiatkowska, G. Norman, and D. Parker. PRISM: probabilistic symbolic model checker. In TOOLS, pages 200–204, 2002.
  • [35] M. Z. Kwiatkowska, G. Norman, and D. Parker. The PRISM benchmark suite. In QEST, pages 203–204, 2012.
  • [36] N. Landwehr, M. Hall, and E. Frank. Logistic model trees. In ECML, pages 241–252, 2003.
  • [37] S. Liu, A. Panangadan, C. S. Raghavendra, and A. Talukder. Compact representation of coordinated sampling policies for body sensor networks. In Advances in Communication and Networks, pages 6–10, 2010.
  • [38] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems: Specification. 1992.
  • [39] R. McNaughton. Infinite games played on finite graphs. Annals of Pure and Applied Logic, 65:149–184, 1993.
  • [40] T. M. Mitchell. Machine Learning. 1997.
  • [41] D. Neider. Small strategies for safety games. In ATVA, pages 306–320, 2011.
  • [42] D. Neider, S. Saha, and P. Madhusudan. Synthesizing piece-wise functions by learning classifiers. In TACAS, pages 186–203, 2016.
  • [43] D. Neider and U. Topcu. An automaton learning approach to solving safety games over infinite graphs. In TACAS, pages 204–221, 2016.
  • [44] A. Pnueli. The temporal logic of programs. In FOCS, pages 46–57, 1977.
  • [45] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In POPL, pages 179–190, 1989.
  • [46] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
  • [47] J. R. Quinlan. Learning with Continuous Classes. In Australian Joint Conference on Artificial Intelligence, pages 343–348, 1992.
  • [48] M. Rabin. Automata on Infinite Objects and Church’s Problem. Conference Series in Mathematics. 1969.
  • [49] P. Ramadge and W. Wonham. Supervisory control of a class of discrete-event processes. SIAM Journal of Control and Optimization, 25(1):206–230, 1987.
  • [50] W. Thomas. Languages, automata, and logic. In Handbook of Formal Languages, pages 389–455. 1997.
  • [51] R. Wimmer, B. Braitling, B. Becker, E. M. Hahn, P. Crouzen, H. Hermanns, A. Dhama, and O. Theel. Symblicit calculation of long-run averages for concurrent probabilistic systems. In QEST, pages 27–36, 2010.


Appendix 0.A Linear least squares problem

In this work, we consider a dataset to be a set of tuples (samples) over natural numbers. Consider these samples arbitrarily ordered; then the dataset can be viewed as a matrix $X \in \mathbb{N}^{n \times d}$, where $n$ is the number of samples and $d$ is the dimension of the samples. Let $x_i$ denote the i-th sample, and let $x_{i,j}$ denote the j-th dimension in the i-th sample. We recall that the dataset is partitioned into good samples and bad samples. Consider a vector $y$ whose i-th entry $y_i$ records whether the i-th sample $x_i$ is good or bad.

The linear least squares problem gets a matrix $X$ and a vector $y$ as input, and outputs a vector of weights $w$ that minimizes a certain error metric, i.e., $w = \arg\min_{w'} \mathit{Err}(w')$. The error metric to be minimized in this problem is the squared error, formally

$$\mathit{Err}(w) = \sum_{i=1}^{n} (x_i \cdot w - y_i)^2 = \lVert Xw - y \rVert^2 .$$

When the columns of $X$ are linearly independent (which is mostly the case in strategy datasets), the minimization has a unique solution $w$, and the closed-form expression to obtain the solution is as follows.

$$w = (X^{\top} X)^{-1} X^{\top} y$$

When some columns of $X$ are linearly dependent, the solution is no longer unique. To obtain some solution, one considers an arbitrary maximal subset of linearly independent columns, and then the above expression can be used with the resulting sub-matrix.
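As a small worked instance of the closed-form solution, the sketch below solves the normal equations for two-dimensional samples with exact rational arithmetic. The target encoding (+1 for good, -1 for bad samples) and the function names are assumptions made for the illustration.

```python
from fractions import Fraction
from typing import List, Tuple


def least_squares_2d(rows: List[Tuple[int, int]],
                     targets: List[int]) -> Tuple[Fraction, Fraction]:
    """Closed-form least-squares weights for two-column data (explicit 2x2 inverse)."""
    s00 = sum(x0 * x0 for x0, _ in rows)
    s01 = sum(x0 * x1 for x0, x1 in rows)
    s11 = sum(x1 * x1 for _, x1 in rows)
    t0 = sum(x0 * y for (x0, _), y in zip(rows, targets))
    t1 = sum(x1 * y for (_, x1), y in zip(rows, targets))
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("columns are linearly dependent")
    return (Fraction(s11 * t0 - s01 * t1, det),
            Fraction(s00 * t1 - s01 * t0, det))


def squared_error(w, rows, targets) -> Fraction:
    """Sum of squared differences between w . x and the target of x."""
    return sum(((w[0] * x0 + w[1] * x1 - y) ** 2
                for (x0, x1), y in zip(rows, targets)), Fraction(0))


rows = [(5, 0), (5, 1), (5, 2), (5, 3)]
targets = [1, 1, -1, -1]  # assumed encoding: +1 for good, -1 for bad
w = least_squares_2d(rows, targets)
err = squared_error(w, rows, targets)
```

Here the resulting weights order both good samples above both bad samples, so a suitable bias separates this small dataset; in general the least-squares weights only approximate a separator.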

In this work, we use the linear least squares problem to obtain a set of weights for strategy subsets that are not linearly separable. In such a case, the solution of this problem gives us a classifier (with bias ) that, intuitively, attempts to minimize the separation error on average (following the squared error metric). We obtain such classifiers in Lines 5 and 6 of Algorithm 4, and subsequently we compute the area under the ROC curve for them.

Appendix 0.B Experimental details

Here we provide additional details about the experiments and the results. We have implemented all algorithms in Python, using the scikit-learn library to manipulate classifiers, ROC curves, etc. For our experiments we have used a Linux machine with Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (12 CPUs) and 128GB of RAM.

Figure 12: Scheduling of Washing Cycles – overall summary.

Considering linear classifiers instead of pure leaves (LC-ent instead of NoLC) typically has a detrimental effect on the construction time. However, this is not always the case, as sometimes the decrease in output tree size outweighs the overhead of testing linear separability for classifiers.

Considering the area-under-the-curve split procedure instead of the information-gain one (LC-auc instead of LC-ent) caused a significant increase in construction time. However, we stress that this is largely due to our prototypical implementation. In the split procedure, the main loop (Line 4) considering all possible predicates is embarrassingly parallel. Hence an optimized parallel implementation of the procedure is expected to suffer minimal overhead in construction time.

Fig. 12, Fig. 13, and Fig. 14 provide a one-plot summary of the experimental results for the safe scheduling of washing cycles, LTL synthesis, and LTL synthesis with randomized environment, respectively. In all three figures, each column corresponds to a benchmark. The colored dots capture the sizes of decision trees for the considered algorithms, namely, blue for , green for , and red for .

Figure 13: LTL synthesis – overall summary.
Figure 14: LTL synthesis with randomized environment – overall summary.