1 Introduction
Graph games and MDPs. Graph games and Markov decision processes (MDPs) are classical models in reactive synthesis. In graph games, there is a finite-state graph, where the vertices are partitioned into states controlled by the two players, namely, player 1 and player 2, respectively. In each round the state changes according to a transition chosen by the player controlling the current state. Thus, the outcome of the game played for an infinite number of rounds is an infinite path through the graph, which is called a play. In MDPs, instead of an adversarial player 2, there are probabilistic choices. An objective specifies a subset of plays that are satisfactory. A strategy for a player is a recipe that specifies the choice of transitions for states controlled by the player. In games, given an objective, a winning strategy for a player from a state ensures the objective irrespective of the strategy of the opponent. In MDPs, given an objective, an almost-sure winning strategy from a state ensures the objective with probability 1.
Reactive synthesis and verification. The above models play a crucial role in various areas of computer science, in particular the analysis of reactive systems. In reactive-system analysis, the vertices and edges of a graph represent the states and transitions of a reactive system, and the two players represent controllable versus uncontrollable decisions during the execution of the system. The reactive synthesis problem asks for the construction of winning strategies in an adversarial environment, and almost-sure winning strategies in a probabilistic environment. Reactive synthesis for games has a long history, starting from the work of Church [18, 14], and has been extensively studied [48, 15, 28, 39], with many applications in synthesis of discrete-event and reactive systems [49, 45], modeling [23, 1], refinement [29], verification [21, 4], testing [6], compatibility checking [20], etc. Similarly, MDPs have been extensively used in verification of probabilistic systems [5, 34, 22]. In all the above applications, the objectives are regular, and the regular sets of infinite paths provide an important and robust paradigm for reactive-system specifications [38, 50].
Strategy representation. Strategies are the most important objects, as they represent the witness to winning/almost-sure winning. Strategies can represent desired controllers in reactive synthesis and protocols, and formally they can be interpreted as a lookup table that specifies for every controlled state of the player the transition to choose. A data structure representing strategies should have the following desirable properties: (a) succinctness, i.e., small strategies are desirable, since smaller strategies represent efficient controllers; and (b) explanatory power, i.e., the representation should explain the decisions of the strategies. While one standard data structure for representing strategies is binary decision diagrams (BDDs) [2, 13], recent works have shown that decision trees [46, 40] from machine learning provide an attractive alternative data structure for strategy representation [9, 11]. The two key advantages of decision trees are: (a) decision trees utilize various predicates to make decisions and thus retain the inherent flavor of the decisions of the strategies; and (b) there are entropy-based algorithmic approaches for decision-tree minimization [46, 40]. However, one of the key challenges in using decision trees for strategy representation is that while in traditional machine-learning applications errors are allowed, for winning and almost-sure winning strategies errors are not permitted.
Our contributions. While decision trees are a basic data structure in machine learning, various extensions of them have been considered. In particular, they have been extended with linear classifiers [12, 47, 26, 36]. Informally, a linear classifier is a predicate that checks an inequality between a linear combination of variables and a constant. In this work, we consider decision trees with linear classifiers for strategy representation in graph games and MDPs, which has not been considered before. First, for representing strategies where no errors are permitted, we present a method to avoid errors both in the decision trees and in the linear classification. Second, we present a new method (that is not entropy-based) for choosing predicates in the decision trees, which further improves the succinctness of decision trees with linear classifiers. We have implemented our approach, and applied it to examples of reactive synthesis from SYNTCOMP benchmarks [31], model-checking examples from PRISM benchmarks [35], and synthesis of randomly generated LTL formulae [44]. Our experimental results show significant improvement in succinctness of strategy representation with the new data structure as compared to standard decision trees.
2 Stochastic Graph Games and Strategies
2.1 Informal description
Stochastic graph games.
We denote the set of probability distributions over a finite set $X$ as $\mathcal{D}(X)$. A stochastic graph game is a tuple $G = \langle S_1, S_2, A_1, A_2, \delta \rangle$, where:
$S_1$ and $S_2$ are finite sets of states for player 1 and player 2, respectively, and $S = S_1 \cup S_2$ denotes the set of all states;

$A_1$ and $A_2$ are finite sets of actions for player 1 and player 2, respectively, and $A = A_1 \cup A_2$ denotes the set of all actions; and

$\delta\colon (S_1 \times A_1) \cup (S_2 \times A_2) \to \mathcal{D}(S)$ is a transition function that, given a player 1 state and a player 1 action, or a player 2 state and a player 2 action, gives the probability distribution over the successor states.
We consider two special cases of stochastic graph games, namely:

graph games, where for each $(s,a)$ in the domain of $\delta$ we have $\delta(s,a)(s') = 1$ for some $s' \in S$.

Markov decision processes (MDPs), where $S_2 = \emptyset$ and $A_2 = \emptyset$.
We consider stochastic graph games with several classical objectives, namely, safety (resp. its dual reachability), Büchi (resp. its dual co-Büchi), and parity objectives.
Stochastic graph games with variables. Consider a finite subset of natural numbers $D \subset \mathbb{N}$, and a finite set of variables $V$ over $D$, partitioned into state-variables $V_S$ and action-variables $V_A$ (i.e., $V = V_S \uplus V_A$, where $\uplus$ denotes a disjoint union). A valuation is a function that assigns values from $D$ to the variables. Let $\mathit{Val}_S$ (resp., $\mathit{Val}_A$) denote the set of all valuations of the state-variables (resp., the action-variables). We associate a stochastic graph game $G$ with a set of variables $V$, such that (i) each state $s \in S$ is associated with a unique valuation from $\mathit{Val}_S$, and (ii) each action $a \in A$ is associated with a unique valuation from $\mathit{Val}_A$.
Example 1
Consider a simple system that receives requests for two different channels A and B. The requests become pending, and at a later point a response handles a request for the respective channel. A controller must ensure that (i) the request-pending queues do not overflow (their sizes are 2 and 3 for channels A and B, respectively), and that (ii) no response is issued for a channel without a pending request. The system can be modeled by the graph game depicted in Fig. 1. The states of player 1 (the controller issuing responses) are labeled with valuations of state-variables capturing the number of pending requests for channels A and B, respectively. For brevity of presentation, the action labels (corresponding to valuations of a single action-variable) are shown outgoing from only one state, with a straightforward generalization to all other states of player 1. Further, for clarity of presentation, the labels of states and actions for player 2 (the environment issuing requests, with filled blue-colored states and actions) are omitted. The controller must ensure the safety objective of avoiding the four error states.
Strategy representation. The algorithmic problem treated in this work is the representation of memoryless almost-sure winning strategies for stochastic graph games with variables. Given a stochastic graph game and an objective, a memoryless strategy for a player is a function that resolves the nondeterminism for the player by choosing the next action based on the currently visited state. Further, a strategy is almost-sure winning if it ensures the given objective irrespective of the strategy of the other player. In synthesis and verification of reactive systems, the problems often reduce to the computation of memoryless almost-sure winning strategies for stochastic graph games, where the state space and action space are represented by a set of variables. In practice, such problems arise from various sources, e.g., AIGER specifications [30], LTL synthesis [44], PRISM model checking [34].
2.2 Detailed description
Plays. Given a stochastic graph game $G$, a play is an infinite sequence of state-action pairs $(s_0, a_0)(s_1, a_1)\cdots$ such that for all $i \geq 0$ we have $a_i \in A_j$ for $s_i \in S_j$ (with $j \in \{1, 2\}$), and $\delta(s_i, a_i)(s_{i+1}) > 0$. We denote by $\mathsf{Plays}$ the set of all plays in $G$.
Objectives. An objective for a stochastic graph game $G$ is a Borel set $\varphi \subseteq \mathsf{Plays}$. We consider the following objectives:

Reachability and safety. Given a set $T \subseteq S$ of target states, the reachability objective requires that a state in $T$ is eventually visited. Formally, $\mathsf{Reach}(T) = \{(s_0,a_0)(s_1,a_1)\cdots \in \mathsf{Plays} \mid \exists i \geq 0\colon s_i \in T\}$. The dual of reachability objectives are safety objectives, where a set $F \subseteq S$ of safe states is given, and the safety objective requires that only states in $F$ are visited. Formally, $\mathsf{Safe}(F) = \{(s_0,a_0)(s_1,a_1)\cdots \in \mathsf{Plays} \mid \forall i \geq 0\colon s_i \in F\}$.

Parity. For an infinite play $\rho$ we denote by $\mathsf{Inf}(\rho)$ the set of states that occur infinitely often in $\rho$. Let $p\colon S \to \mathbb{N}$ be a given priority function. The parity objective $\mathsf{Parity}(p) = \{\rho \in \mathsf{Plays} \mid \min\{p(s) \mid s \in \mathsf{Inf}(\rho)\} \text{ is even}\}$ requires that the minimum of the priorities of the states visited infinitely often is even. The dual of the parity objective requires that the minimum of the priorities visited infinitely often is odd. For the special case of priority functions $p\colon S \to \{0, 1\}$, the corresponding parity objective (resp., its dual) is called Büchi (resp., co-Büchi).
Memoryless strategies. Given a stochastic graph game $G$, a strategy is a recipe for a player that prescribes how to choose actions to extend finite prefixes of plays. Specifically, a memoryless strategy is a strategy where the player performs each choice based solely on the currently visited state. Formally, a memoryless strategy for player 1 is a function $\sigma\colon S_1 \to A_1$ that, given the currently visited state, chooses the next action. Analogously, a memoryless strategy for player 2 is a function $\pi\colon S_2 \to A_2$. We denote by $\Sigma$ and $\Pi$ the sets of all memoryless strategies for player 1 and player 2 in $G$, respectively. Given two strategies $\sigma \in \Sigma$, $\pi \in \Pi$, and a starting state $s_0$, they induce a unique probability measure $\mathbb{P}^{\sigma,\pi}_{s_0}$ over the Borel sets of $\mathsf{Plays}$. In the special case of graph games, the two strategies and the starting state induce a unique play $(s_0, a_0)(s_1, a_1)\cdots$ such that for all $i \geq 0$ we have $a_i = \sigma(s_i)$ if $s_i \in S_1$ and $a_i = \pi(s_i)$ if $s_i \in S_2$, and $\delta(s_i, a_i)(s_{i+1}) = 1$ for some $s_{i+1} \in S$. The strategies we consider in this work are all memoryless strategies.
Winning and almost-sure winning strategies. Given a stochastic graph game $G$ and an objective $\varphi$, an almost-sure winning strategy from a state $s_0$ is a strategy $\sigma \in \Sigma$ such that for all strategies $\pi \in \Pi$ we have $\mathbb{P}^{\sigma,\pi}_{s_0}(\varphi) = 1$. A fundamental result for stochastic graph games with parity (resp., safety/reachability) objectives shows that (i) there is a memoryless almost-sure winning strategy if and only if there is a general (i.e., utilizing the past and nondeterminism) almost-sure winning strategy, and (ii) a memoryless almost-sure winning strategy satisfies the objective with probability 1 even against general strategies of the opposing player [16]. In the special case of graph games, an almost-sure winning strategy $\sigma$ ensures for all $\pi \in \Pi$ that the induced play belongs to $\varphi$, and is referred to as a winning strategy.
Reactive synthesis and strategies. In the analysis of reactive systems, most properties that arise in practice are regular objectives, which capture important desirable properties such as safety, liveness, and fairness. The class of regular objectives is expressible by the linear-time temporal logic (LTL) framework. The problem of synthesis from LTL specifications has received considerable attention [19], and the LTL synthesis problem can be reduced to solving graph games with parity objectives. Moreover, given a model and a specification, the fundamental model checking problem asks to produce a witness that the model satisfies the specification. In model checking of probabilistic systems, the witness for a property is a policy that ensures the property almost-surely. In such settings, it is natural to consider graph games and MDPs where the state space and action space are represented by a set of variables.
3 Decision Trees and Decision Tree Learning
Here we recall decision trees (DT), the representation of strategies by DT, and DT learning.
A decision tree (DT) over $D^d$ is a tuple $\mathcal{T} = (T, \rho, \theta)$, where $T$ is a finite rooted binary (ordered) tree, $\rho$ assigns to every inner node an (in)equality predicate comparing arithmetical expressions over the variables $\{x_1, \ldots, x_d\}$, and $\theta$ assigns to every leaf a value $\mathit{YES}$ or $\mathit{NO}$. The language $\mathcal{L}(\mathcal{T}) \subseteq D^d$ of the tree is defined as follows. For a vector $\bar{x} \in D^d$, we find the path $p$ from the root to a leaf such that for each inner node $n$ on the path, the predicate $\rho(n)$ is satisfied by the valuation $\bar{x}$ iff the first child of $n$ is on $p$. Denote the leaf on this particular path by $\ell$. Then $\bar{x}$ is in the language $\mathcal{L}(\mathcal{T})$ of $\mathcal{T}$ iff $\theta(\ell) = \mathit{YES}$. Intuitively, $\mathcal{L}(\mathcal{T})$ captures the set of vectors accepted by the tree $\mathcal{T}$, i.e., vectors with an accepting path in the tree (ending with $\mathit{YES}$). An example is illustrated in Fig. 2, with the first children connected by unbroken arrows and the second children by dashed ones. The (usually finite) set of predicates in the codomain of $\rho$ is denoted by $\mathcal{P}$. In the example above, the predicates in $\mathcal{P}$ are comparisons of variables to constants.
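The path-based acceptance just described can be sketched in a few lines of Python (a hypothetical encoding, not fixed by the paper): nodes are dicts, inner nodes carry a predicate comparing one variable to a constant, and leaves carry the YES/NO answer as a Boolean.

```python
# A sketch of path-based DT evaluation; the dict encoding and field names
# are our own illustration, not taken from the paper.

def dt_accepts(node, x):
    """Follow the unique root-to-leaf path for vector x; accept iff the
    leaf is labeled YES (True)."""
    while "leaf" not in node:
        var, op, c = node["predicate"]       # e.g. (0, "<=", 2) means x[0] <= 2
        sat = x[var] <= c if op == "<=" else x[var] == c
        node = node["sat"] if sat else node["unsat"]
    return node["leaf"]

# Example tree: accept exactly the vectors with x[0] <= 2 and x[1] == 1.
tree = {
    "predicate": (0, "<=", 2),
    "sat": {"predicate": (1, "==", 1),
            "sat": {"leaf": True}, "unsat": {"leaf": False}},
    "unsat": {"leaf": False},
}
```

For instance, `dt_accepts(tree, (1, 1))` follows the sat-sat path to a YES leaf, while `(3, 1)` falls into the unsat NO leaf immediately.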
Representing strategies by DT has been introduced in [9]. The dimension of the data points here is $d = |V_S| + |V_A|$. The data points are tuples of natural numbers representing state-action pairs, thus we also write them as $(s, a)$. The strategy induced by a decision tree $\mathcal{T}$ allows playing $a$ in $s$ iff $(s, a) \in \mathcal{L}(\mathcal{T})$.
A given input strategy $\sigma$ for a player defines the sets (i) $\mathit{Good}$ of played state-action pairs $(s, \sigma(s))$, (ii) $\mathit{Bad}$ of non-played state-action pairs $(s, a)$ with $a \neq \sigma(s)$, and (iii) the training set $\mathit{Train} = \mathit{Good} \uplus \mathit{Bad}$ (where $\uplus$ denotes a disjoint union). Further, given a subset $\mathcal{D} \subseteq \mathit{Train}$, we define the label of a sample in $\mathcal{D}$ as (i) $\mathit{Good}$ if it belongs to $\mathit{Good}$, and (ii) $\mathit{Bad}$ otherwise. When strategies need to be represented exactly, as in the case of games, the trees have to classify all decisions correctly [11]. This in turn causes difficulties not faced in standard DT learning [40], as described below.
Example 2
Consider the reactive system and the corresponding game described in Example 1.
Consider a strategy $\sigma$ for the controller (player 1) in this system that (i) waits in the state $(0,0)$ with no pending requests, (ii) issues a response for channel B when there are more pending requests for channel B than pending requests for channel A, and (iii) issues a response for channel A in all other cases. Then, the strategy $\sigma$ induces the corresponding sets $\mathit{Good}$ of played and $\mathit{Bad}$ of non-played state-action pairs.
The task is to represent $\sigma$ exactly, i.e., to accept all $\mathit{Good}$ examples and reject all $\mathit{Bad}$ examples.
Learning a DT from the set $\mathit{Good}$ of positive examples and the set $\mathit{Bad}$ of negative examples is described in Algorithm 1. A node with all the data points is gradually split into children until each leaf contains only elements of $\mathit{Good}$ or only elements of $\mathit{Bad}$. Note that in classical DT learning algorithms such as ID3 [46], one can also stop this process earlier to prevent overfitting, which induces smaller trees at the cost of a classification error, unacceptable in strategy representation.
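The fully expanded learning loop, together with an information-gain chooser in the spirit of Algorithms 1 and 2, can be sketched as follows (an illustrative Python rendering with single-variable threshold predicates; the helper names are ours, not the paper's):

```python
import math

def entropy(p, n):
    """Binary entropy of a node containing p Good and n Bad samples."""
    if p == 0 or n == 0:
        return 0.0
    q = p / (p + n)
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def choose_split(good, bad):
    """Pick an (x_i <= c) predicate maximizing information gain, skipping
    predicates with an empty partition; zero-gain predicates stay eligible,
    so mixed nodes over distinct points can always be split further."""
    best, best_gain = None, -1.0
    dim = len((good + bad)[0])
    for i in range(dim):
        for c in sorted({x[i] for x in good + bad}):
            pred = lambda x, i=i, c=c: x[i] <= c
            gs, bs = sum(pred(x) for x in good), sum(pred(x) for x in bad)
            gu, bu = len(good) - gs, len(bad) - bs
            if gs + bs == 0 or gu + bu == 0:
                continue                     # empty partition: no progress
            total = len(good) + len(bad)
            gain = entropy(len(good), len(bad)) \
                - (gs + bs) / total * entropy(gs, bs) \
                - (gu + bu) / total * entropy(gu, bu)
            if gain > best_gain:
                best, best_gain = pred, gain
    return best

def learn_dt(good, bad):
    """Fully expanded learning: split until every leaf is pure."""
    if not bad:
        return {"leaf": True}
    if not good:
        return {"leaf": False}
    pred = choose_split(good, bad)
    return {"predicate": pred,
            "sat": learn_dt([x for x in good if pred(x)],
                            [x for x in bad if pred(x)]),
            "unsat": learn_dt([x for x in good if not pred(x)],
                              [x for x in bad if not pred(x)])}
```

On the toy data $\mathit{Good} = \{0, 1\}$, $\mathit{Bad} = \{3\}$ (one variable), the chooser selects $x_1 \leq 1$ and the tree has two pure leaves.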
The choice of the predicate to split a node with is described in Algorithm 2. From the finite set $\mathcal{P}$ of considered predicates¹ we pick the one which maximizes information gain (i.e., the decrease of entropy [40]). Again, due to the need for fully expanded trees with no error, we need to guarantee that we can split all nodes with mixed data even if no predicate provides any information gain in one step. This issue is addressed in [11] as follows: whenever no positive information gain can be achieved by any predicate, a predicate is chosen according to a different, very simple formula, using a heuristic that always returns a positive number. One possible option suggested in [11] is captured on Line 8.

¹ The set of considered predicates is typically domain-specific, and finitely restricted in a natural way. In this work, we consider (in)equality predicates that compare values of variables to constants. A natural finite restriction is to consider only constants that appear in the dataset.

4 Decision Trees with Linear Classifiers
In this section, we develop an algorithm for constructing decision trees with linear classifiers in the leaf nodes. As we are interested in the representation of winning and almost-sure winning strategies, we have to address the challenge of allowing no error in the strategy representation. Thus we consider an algorithm that provably represents a given strategy in its entirety. Furthermore, we present a split procedure for decision-tree algorithms, which aims to propose predicates leading to small trees with linear classifiers.
4.1 Linear classifiers in the leaf nodes
During the construction of a decision tree for a given dataset, each node corresponds to a certain subset of the dataset. This subset exactly captures the data points from the dataset that would reach the node starting from the root and progressing based on the predicates along the traversed path (as explained in Section 3). Notably, there might be other data points that also reach this node from the root; however, they are not part of the dataset, and thus their outcome on the tree is irrelevant for the correct representation of the dataset. This insight allows us to propose a decision-tree algorithm with more expressive terminal (i.e., leaf) nodes, and in this work we consider linear classifiers as the leaf nodes.
Given two vectors $\bar{u}, \bar{v} \in \mathbb{R}^d$, their dot product (or scalar product) is defined as $\bar{u} \cdot \bar{v} = \sum_{i=1}^{d} u_i \cdot v_i$. Given a weight vector $\bar{w} \in \mathbb{R}^d$ and a bias term $b \in \mathbb{R}$, a linear classifier $c_{\bar{w},b}$ is defined as $c_{\bar{w},b}(\bar{x}) = \mathit{YES}$ if $\bar{w} \cdot \bar{x} \geq b$, and $\mathit{NO}$ otherwise.
Informally, a linear classifier checks whether a linear combination of vector values is greater than or equal to a constant. Intuitively, we consider strategies as good and bad vectors of natural numbers, and we use linear classifiers to decide for a given vector whether it is good or bad. On a more general level, a linear classifier partitions the space into two half-spaces, and a given vector gets classified based on the half-space it belongs to.
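As a small illustration, a linear classifier under the convention above (accept iff the dot product meets the bias) can be written as follows (names are illustrative):

```python
# Illustrative rendering of a linear classifier c_{w,b}: accept x iff w·x >= b.
def linear_classifier(w, b):
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) >= b

c = linear_classifier((1, 2), 3)   # the half-space x1 + 2*x2 >= 3
```

The vector $(1,1)$ lies exactly on the boundary ($1 + 2 = 3$) and is accepted, while $(0,1)$ falls into the rejecting half-space.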
Consider a finite dataset $\mathcal{D}$ partitioned into subsets $\mathit{Good}$ and $\mathit{Bad}$. A linear classifier $c_{\bar{w},b}$ separates $\mathcal{D}$ if for every $\bar{x} \in \mathcal{D}$ we have that $c_{\bar{w},b}(\bar{x}) = \mathit{YES}$ iff $\bar{x} \in \mathit{Good}$. The corresponding decision problem asks, given a dataset $\mathcal{D}$, for the existence of a weight vector $\bar{w}$ and a bias $b$ such that the linear classifier $c_{\bar{w},b}$ separates $\mathcal{D}$. In such a case we say that $\mathcal{D}$ is linearly separable. Fig. 3 provides an illustration. There are efficient oracles for the decision problem of linear separability, e.g., linear-programming solvers.
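In place of a linear-programming oracle, a simple perceptron loop can serve as a heuristic separability check in a sketch: it finds a strictly separating hyperplane whenever the data is linearly separable and the iteration budget suffices, and it returns an inconclusive answer otherwise. All names are illustrative; a production implementation would use an LP solver as the text suggests.

```python
def separating_classifier(good, bad, max_iters=10000):
    """Heuristic stand-in for an LP-based separability oracle: run the
    perceptron rule and demand strict separation (Good strictly above
    the bias, Bad strictly below). Returns (w, b) on success, or None
    when the iteration budget runs out (non-separable or hard data)."""
    dim = len((good + bad)[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1) for x in good] + [(x, -1) for x in bad]
    for _ in range(max_iters):
        mistakes = 0
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) - b
            if y * score <= 0:               # wrong side (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b -= y
                mistakes += 1
        if mistakes == 0:                    # a full clean pass: separated
            return w, b
    return None
```

The strict inequalities matter for exact strategy representation: a returned pair $(w, b)$ classifies every Good sample as YES and every Bad sample as NO, with no sample on the boundary.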
Example 3
We illustrate the idea of representing strategies by decision trees with linear classifiers. Consider the game described in Example 1 and the controller strategy for this game described in Example 2. An example of a decision tree that represents the strategy is displayed in Fig. 4. The input samples with the wait action get classified by the leftmost linear classifier, and the samples with the response action for channel B get classified by the rightmost linear classifier. Finally, the samples with the response action for channel A are rejected if there are no pending requests for channel A, and otherwise they get classified by the bottommost linear classifier. Note that the decision tree accepts each sample from $\mathit{Good}$ and rejects each sample from $\mathit{Bad}$, and thus indeed represents the strategy $\sigma$.
We are now ready to describe our algorithm for representing strategies as decision trees with linear classifiers; Algorithm 3 presents the pseudocode. At the beginning, in Line 5, the queue is initialized with the root node and the whole training set $\mathit{Train}$. Intuitively, the queue maintains the tree nodes that are yet to be processed, and in every iteration of the loop (Line 6) one node gets processed. First, in Line 7, a node gets popped together with $\mathcal{D}$, which is the subset of $\mathit{Train}$ that would reach the node from the root. If $\mathcal{D}$ contains only samples from $\mathit{Good}$ (resp., only samples from $\mathit{Bad}$), then the node becomes a leaf with $\mathit{YES}$ (resp., $\mathit{NO}$) as the answer (Line 9). If $\mathcal{D}$ contains samples from both, but is linearly separable by some classifier, then the node becomes a leaf with this classifier (Line 11). Otherwise, the node becomes an inner node. In Line 13 it gets assigned a predicate by an external split procedure, and in Line 14 two children of the node are created. Finally, in Line 15, $\mathcal{D}$ is partitioned into the subset that satisfies the chosen predicate and the subset that does not, and the two children are pushed into the queue with the two subsets, to be processed in later iterations. Once there are no more nodes to be processed, the final decision tree is returned.
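The queue-based construction just walked through can be sketched as follows, with the split procedure and the separability oracle passed in as parameters (an illustrative rendering, not the paper's pseudocode; leaves store either a Boolean or a classifier):

```python
from collections import deque

def build_tree(good, bad, split, separator):
    """Illustrative queue-based construction: `split(g, b)` proposes a
    predicate for an inner node, `separator(g, b)` returns a linear
    classifier separating the data or None. Field names are our own."""
    root = {}
    queue = deque([(root, good, bad)])
    while queue:
        node, g, b = queue.popleft()
        if not b:                            # only Good samples reach here
            node["leaf"] = True
        elif not g:                          # only Bad samples reach here
            node["leaf"] = False
        else:
            clf = separator(g, b)
            if clf is not None:              # mixed but linearly separable
                node["leaf"] = clf
            else:                            # mixed and not separable: split
                pred = split(g, b)
                node["predicate"], node["sat"], node["unsat"] = pred, {}, {}
                queue.append((node["sat"],
                              [x for x in g if pred(x)],
                              [x for x in b if pred(x)]))
                queue.append((node["unsat"],
                              [x for x in g if not pred(x)],
                              [x for x in b if not pred(x)]))
    return root
```

Evaluation mirrors plain decision trees, except that a leaf holding a classifier applies it to the queried vector instead of answering a constant YES/NO.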
Construction of decision trees with linear classifiers. We present a simple running example that illustrates the key points of Algorithm 3. Fig. 5 captures the flow of construction and Fig. 6 presents the output decision tree.
Correctness. We now prove the correctness of Algorithm 3. In other words, we show that given a strategy in the form of a training set, Algorithm 3 can be used to provably represent the training set (i.e., the strategy) without errors.
Theorem 4.1
Let $G$ be a stochastic graph game, and let $\sigma$ be a memoryless strategy for a player that defines a training set $\mathit{Train}$ partitioned into $\mathit{Good}$ and $\mathit{Bad}$. Consider an arbitrary split procedure that considers only predicates from $\mathcal{P}$ which produce nonempty sat- and unsat-partitions. Given $\mathit{Train}$ as input, Algorithm 3 using the split procedure outputs a decision tree $\mathcal{T}$ such that $\mathcal{L}(\mathcal{T}) \cap \mathit{Train} = \mathit{Good}$, which means that for all $(s,a) \in \mathit{Train}$ we have that $(s,a) \in \mathcal{L}(\mathcal{T})$ iff $(s,a) \in \mathit{Good}$. Thus $\mathcal{T}$ represents the strategy $\sigma$.
Proof
We consider stochastic graph games with variables over a finite domain $D$, thus the training set $\mathit{Train} \subseteq D^d$ is finite. Recall that given a decision tree $\mathcal{T} = (T, \rho, \theta)$ constructed by Algorithm 3, $\rho$ assigns to every inner node a predicate from $\mathcal{P}$, and $\theta$ assigns to every leaf either $\mathit{YES}$, or $\mathit{NO}$, or a linear classifier that classifies elements of $D^d$ into $\mathit{YES}$ resp. $\mathit{NO}$.
Partial correctness. Consider Algorithm 3 with input $\mathit{Train}$, and let $\mathcal{T}$ be the output decision tree. Consider an arbitrary sample $(s,a) \in \mathit{Train}$. Consider the leaf $\ell$ corresponding to $(s,a)$ in $\mathcal{T}$: there is a unique path for $(s,a)$ down the tree from its root, induced by the predicates in the inner nodes given by $\rho$, thus $\ell$ is well-defined. At some point during the algorithm, $\ell$ was popped from the queue in Line 7, together with a dataset $\mathcal{D}$, and note that $(s,a) \in \mathcal{D}$. Since $\ell$ is a leaf, there are three cases to consider: (i) if $\theta(\ell) = \mathit{YES}$, then $\mathcal{D}$ contains only samples from $\mathit{Good}$ (Line 9), so $(s,a) \in \mathit{Good}$ and $(s,a) \in \mathcal{L}(\mathcal{T})$; (ii) if $\theta(\ell) = \mathit{NO}$, then symmetrically $(s,a) \in \mathit{Bad}$ and $(s,a) \notin \mathcal{L}(\mathcal{T})$; and (iii) if $\theta(\ell)$ is a linear classifier, then it separates $\mathcal{D}$ (Line 11), so it accepts $(s,a)$ iff $(s,a) \in \mathit{Good}$.
The desired result follows.
Total correctness. Algorithm 3 uses a split procedure that considers only predicates from $\mathcal{P}$ which produce nonempty sat- and unsat-partitions. Thus the algorithm maintains the following invariant for every path in $\mathcal{T}$ starting from the root: for each predicate $P \in \mathcal{P}$, there is at most one inner node $n$ in the path such that $\rho(n) = P$. This invariant is indeed maintained, since any predicate considered for the second time in a path necessarily produces an empty data partition, and such predicates are not considered by the split procedure that selects predicates for $\rho$ (in Line 13 of Algorithm 3).
From the above we have that the length of any path in $\mathcal{T}$ starting from the root is at most $2 \cdot |V| \cdot |D|$, i.e., twice the number of variables times the size of the variable domain. We prove that the number of iterations of the loop in Line 6 is finite. The branch from Line 12 is taken finitely many times, since it adds two vertices (in Line 14) to the decision tree and we have the bound on the path lengths in $\mathcal{T}$. Since only the branch from Line 12 pushes elements into the queue, and each iteration of the loop pops an element from the queue in Line 7, the number of loop iterations (Line 6) is indeed finite. This proves termination, which together with partial correctness proves total correctness.∎
4.2 Splitting criterion for small decision trees with classifiers
During the construction of decision trees, the predicates for the inner nodes are chosen based on a supplied metric, which heuristically attempts to select predicates leading to small trees. The entropy-based information gain is the most prevalent metric for constructing decision trees, in machine learning [40, 46] as well as in formal methods [3, 9, 27, 42]. Algorithm 2 presents a split procedure utilizing information gain, supplemented with a stand-in metric proposed in [11].
In this section, we propose a new metric and develop a split procedure around it. When selecting predicates for the inner nodes, we exploit the knowledge that in the descendants the data will be tested for linear separability. Thus, for a given predicate, the metric tries to estimate, roughly speaking, how well-separable the corresponding data partitions are. While the metric is well-studied in machine learning, to the best of our knowledge the corresponding decision-tree split procedure is novel, both in machine learning and in formal methods.
True/false positive/negative. Consider a fixed linear classifier $c_{\bar{w},b}$, and a sample $\bar{x}$ such that $c_{\bar{w},b}(\bar{x}) = \mathit{YES}$. If $\bar{x} \in \mathit{Good}$, then $\bar{x}$ is a true positive ($\mathit{TP}$) w.r.t. the classifier $c_{\bar{w},b}$, otherwise $\bar{x} \in \mathit{Bad}$ and thus $\bar{x}$ is a false positive ($\mathit{FP}$). Consider a different sample $\bar{x}'$ such that $c_{\bar{w},b}(\bar{x}') = \mathit{NO}$. If $\bar{x}' \in \mathit{Bad}$, then $\bar{x}'$ is a true negative ($\mathit{TN}$), otherwise $\bar{x}' \in \mathit{Good}$ and $\bar{x}'$ is a false negative ($\mathit{FN}$). Fig. 7 summarizes the terminology.
True/false positive rate. Consider a fixed linear classifier $c_{\bar{w},b}$ and a fixed dataset $\mathcal{D} = \mathit{Good} \uplus \mathit{Bad}$. We denote by $\#\mathit{TP}$ the number of true positives within $\mathcal{D}$ w.r.t. the classifier $c_{\bar{w},b}$, and similarly $\#\mathit{FP}$ for false positives. Then, the true positive rate ($\mathit{TPR}$) is defined as $\#\mathit{TP} / |\mathit{Good}|$, and the false positive rate ($\mathit{FPR}$) is $\#\mathit{FP} / |\mathit{Bad}|$. Intuitively, $\mathit{TPR}$ describes the fraction of good samples that are correctly classified, whereas $\mathit{FPR}$ describes the fraction of bad samples that are misclassified as good.
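Counting the four outcomes and the two rates is mechanical; a minimal sketch (illustrative names; Good/Bad given as lists of tuples, classifiers as Boolean functions):

```python
# Counting classification outcomes for a classifier clf (a Boolean
# function) on a dataset split into Good and Bad.

def confusion(clf, good, bad):
    """Return (TP, FP, TN, FN): accepted Good, accepted Bad,
    rejected Bad, rejected Good."""
    tp = sum(clf(x) for x in good)
    fp = sum(clf(x) for x in bad)
    return tp, fp, len(bad) - fp, len(good) - tp

def rates(clf, good, bad):
    """Return (TPR, FPR) = (TP/|Good|, FP/|Bad|)."""
    tp, fp, _, _ = confusion(clf, good, bad)
    return tp / len(good), fp / len(bad)
```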
Area under the curve. Consider a fixed dataset $\mathcal{D}$ and a fixed weight vector $\bar{w}$. In what follows we describe a metric that evaluates $\bar{w}$ w.r.t. $\mathcal{D}$. First, consider the set of boundaries, which are the dot products of $\bar{w}$ with the samples from $\mathcal{D}$; formally, $B = \{\bar{w} \cdot \bar{x} \mid \bar{x} \in \mathcal{D}\}$. Further, consider some $b_{\top} > \max(B)$. Then, consider the set of linear classifiers that "hit" the boundaries, plus a classifier that rejects all samples; formally, $C = \{c_{\bar{w},b} \mid b \in B \cup \{b_{\top}\}\}$. Now, the receiver operating characteristic (ROC) is a curve that plots $\mathit{TPR}$ against $\mathit{FPR}$ for the classifiers in $C$. Intuitively, the ROC curve captures, for a fixed set of weights, how changing the bias term affects the $\mathit{TPR}$ and $\mathit{FPR}$ of the resulting classifier. Ideally, we want the $\mathit{TPR}$ to increase rapidly as the bias is weakened, while the $\mathit{FPR}$ increases as little as possible. We consider the area under the ROC curve (denoted $\mathit{AUC}$) as the metric to evaluate the weight vector $\bar{w}$ w.r.t. the dataset $\mathcal{D}$. Intuitively, the faster the $\mathit{TPR}$ increases, and the slower the $\mathit{FPR}$ increases, the bigger the area under the ROC curve ($\mathit{AUC}$) will be.
Fig. 8 provides an intuitive illustration of the concept for a fixed weight vector $\bar{w}$. The classifiers are shown in the left subfigure, and the corresponding ROC curve (with the shaded area under the curve, the $\mathit{AUC}$) is shown in the right subfigure. Note that the points of the ROC curve correspond to the classifiers from $C$, and they capture their ($\mathit{FPR}$, $\mathit{TPR}$). The extra point $(0, 0)$ corresponds to the classifier that rejects all samples.
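The AUC of a fixed weight vector can be computed exactly as described above: sweep the bias over the boundary values from the strictest downwards, record one (FPR, TPR) point per classifier plus the reject-all point, and take the trapezoidal area under the curve. A sketch (illustrative; assumes both Good and Bad are nonempty):

```python
def roc_auc(w, good, bad):
    """AUC of the weight vector w over the dataset Good/Bad: sweep the
    bias over the boundaries (dot products of w with the samples) and
    take the trapezoidal area under the resulting (FPR, TPR) curve."""
    dot = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    biases = sorted({dot(x) for x in good + bad}, reverse=True)
    points = [(0.0, 0.0)]                    # the reject-all classifier
    for b in biases:                         # classifiers accept iff w·x >= b
        tp = sum(dot(x) >= b for x in good)
        fp = sum(dot(x) >= b for x in bad)
        points.append((fp / len(bad), tp / len(good)))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2    # trapezoid between two points
    return area
```

A weight vector that ranks every Good sample above every Bad one yields an AUC of 1, the best achievable value.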
Algorithm 4 presents a split procedure that uses $\mathit{AUC}$ as the metric to select predicates. Each considered predicate partitions the input dataset into the subset that satisfies the predicate and the subset that does not. Then, in Lines 5 and 6, two weight vectors are obtained by solving the linear least squares problem on the two data partitions. This is a classical problem in statistics with a known closed-form solution, and Appendix 0.A provides a detailed description of the problem. Finally, the score for the predicate equals the sum of the $\mathit{AUC}$ values for the two weight vectors with respect to their corresponding data partitions (Line 7). At the end, in Line 8, the predicate with the maximum score is selected.
The choice of $\mathit{AUC}$ as the split metric is motivated by heuristically estimating the well-separability of data in the setting of strategy representation. A simpler metric, accuracy (i.e., the fraction of correctly classified samples), may seem a natural choice for estimating well-separability. However, in strategy representation the data is typically very imbalanced, i.e., the sizes of $\mathit{Good}$ are typically much smaller than the sizes of $\mathit{Bad}$. As a result, for all considered predicates the corresponding proposed classifiers focus heavily on the $\mathit{Bad}$ samples and neglect the few $\mathit{Good}$ samples. Thus all classifiers achieve remarkable accuracy, which gives us little information for the choice of a predicate. This is a well-known insight: in machine learning, the accuracy metric is notoriously problematic in the case of disproportionate classes. On the other hand, the $\mathit{AUC}$ metric, being invariant to the bias term, is able to focus also on the sparse $\mathit{Good}$ subset, thus providing better estimates of well-separability.
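A toy computation (hypothetical numbers) makes the accuracy pitfall concrete: on a dataset with one Good sample and nine Bad ones, the trivial classifier that rejects everything already reaches 90% accuracy while being useless for representing the played action.

```python
# Hypothetical 1-vs-9 imbalanced dataset: rejecting everything is 90% accurate.
good = [(5,)]
bad = [(i,) for i in range(9)]
reject_all = lambda x: False

correct = sum(reject_all(x) for x in good) + sum(not reject_all(x) for x in bad)
accuracy = correct / (len(good) + len(bad))   # 9/10 despite never accepting Good
```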
5 Experiments
Throughout our experiments, we consider the following construction algorithms:

Basic decision trees (Algorithm 1 with Algorithm 2), as considered in [11].

Decision trees with linear classifiers (Algorithm 3) and the entropy-based splitting procedure (Algorithm 2).

Decision trees with linear classifiers (Algorithm 3) and the AUC-based splitting procedure (Algorithm 4).
For the experimental evaluation of the construction algorithms, we consider multiple sources of problems that arise naturally in reactive synthesis and reduce to stochastic graph games with integer variables. These variables provide semantic information about the states (resp., actions) they identify, so a strategy-representation method utilizing predicates over the variables produces naturally interpretable output. Moreover, there is an inherent internal structure in the states and their valuations, which machine-learning algorithms can exploit to produce a more succinct representation of strategies.
Given a game and an objective, we use an explicit solver to obtain an almost-sure winning strategy. Then we consider the strategy as a list of played ($\mathit{Good}$) and non-played ($\mathit{Bad}$) actions for each state, which can be used directly as an input training set ($\mathit{Train}$). We evaluate the construction algorithms based on succinctness of representation, which we express as the number of non-pure nodes (i.e., nodes with either a predicate or a linear classifier). Further experimental details are presented in Appendix 0.B.
5.1 Graph games and winning strategies
We consider two sources of problems reducible to strategy representation in graph games, namely, AIGER safety synthesis [30] and LTL synthesis [44].
5.1.1 AIGER – Scheduling of Washing Cycles.
The goal of this problem is to design a centralized controller for a system of washing tanks running in parallel. The system is parametrized by the number of tanks, the time limit for filling a tank with water after a request, the delay after which the tank has to be emptied again, and the number of tanks per shared water pipe. The controller has to ensure that all requests are satisfied within the specified time limit.
The problem has been introduced in the second year of SYNTCOMP [31], the most important and well-known synthesis competition. The problem is implicitly described in the form of an AIGER safety specification [30], which uses circuits with input, output, and latch Boolean variables. This reduces directly to graph games with valued integer variables and safety objectives. The state-variables represent, for each tank, whether it is currently filled, and the current deadline for filling (resp., emptying). The action-variables capture the environment requests to fill water tanks, and the controller commands to fill (resp., empty) water tanks. We consider 364 datasets, where the sizes of the training sets range from 640 to 1024000, and the numbers of variables range from 16 to 62.
We illustrate the results in Fig. 9. Both subfigures plot the ratios of sizes for two considered algorithms. Each dot represents a dataset, the axes capture the ratios, and the two red lines represent equality and an order-of-magnitude improvement, respectively. The left subfigure considers the size ratios of the basic decision-tree algorithm and the algorithm with linear classifiers and entropy-based splits. The arithmetic, geometric, and harmonic means of these ratios are , , and , respectively. The right subfigure considers the basic algorithm and the algorithm with linear classifiers and AUC-based splits. The arithmetic, geometric, and harmonic means of these ratios are , , and , respectively.

5.1.2 LTL synthesis.
In reactive synthesis, most properties considered in practice are regular objectives, which can be specified as linear-time temporal logic (LTL) formulae over input/output signals [44]. Given an LTL formula and an input/output signal partitioning, the controller synthesis for this specification is reducible to solving a graph game with a parity objective.
In our experiments, we consider LTL formulae randomly generated using the tool SPOT [25]. Then we use the tool Rabinizer [32] to translate the formulae into deterministic parity automata. Crucially, the states of these automata contain semantic information retained by Rabinizer during the translation. We consider an encoding of the semantic information (given as sets of LTL formulae and permutations) into binary vectors. The encoding aims to capture the inherent structure within the automaton states, which can later be exploited during strategy representation. Finally, for each parity automaton we consider various input/output partitionings of signals, and thus we obtain parity graph games with valued integer variables. The whole pipeline is described in detail in [11].
We consider graph games with liveness (parity-2) and strong fairness (parity-3) objectives. In total we consider 917 datasets, with dataset sizes ranging from 48 to 8608, and sample dimensions ranging from 38 to 128.
Fig. 10 illustrates the results, where both subfigures plot the ratios of decision-tree sizes for the two considered algorithms. The left figure considers the basic decision-tree algorithm and the algorithm with linear classifiers and entropy-based splits (LCent); the right figure considers the basic decision-tree algorithm and the algorithm with linear classifiers and auc-based splits (LCauc). For both comparisons, we report the arithmetic, geometric, and harmonic means of the ratios.
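The three means summarize the ratio distributions differently; for positive ratios they always satisfy harmonic ≤ geometric ≤ arithmetic. A minimal sketch of their computation (the function name is ours):

```python
from math import prod
from statistics import harmonic_mean

def ratio_means(ratios):
    """Arithmetic, geometric, and harmonic means of a list of positive ratios."""
    n = len(ratios)
    arithmetic = sum(ratios) / n
    geometric = prod(ratios) ** (1 / n)
    harmonic = harmonic_mean(ratios)
    return arithmetic, geometric, harmonic

a, g, h = ratio_means([1.0, 4.0])
# a == 2.5, g == 2.0, h == 1.6; for positive data h <= g <= a always holds
assert h <= g <= a
```

The geometric mean is the least sensitive of the three to a few extreme ratios, which is why reporting all three gives a more robust picture of the size improvements.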
5.2 MDPs and almost-sure winning strategies
5.2.1 LTL synthesis with randomized environment.
In LTL synthesis, given a formula and an input/output signal partitioning, there may be no controller that satisfies the LTL specification. In such a case, it is natural to consider a different setting where the environment is not antagonistic, but instead behaves randomly. There are LTL specifications that are unsatisfiable, but become satisfiable when a randomized environment is considered. This special case of LTL synthesis reduces to solving MDPs with almost-sure parity objectives [17]. Note that in this setting, the precise probabilities of environment actions are immaterial, as they have no effect on the existence of a controller ensuring an objective almost-surely (i.e., with probability 1).
We consider 414 instances of LTL synthesis reducible to graph games with a co-Büchi (a special case of parity) objective, where the LTL specification is unsatisfiable, but becomes satisfiable with a randomized environment (which reduces to MDPs with an almost-sure co-Büchi objective). The examples have been obtained by the same pipeline as the one described in the previous subsection. In the examples, the dataset sizes range from 80 to 26592, and the sample dimensions range from 38 to 74.
The experimental results are summarized in Fig. 11. The two subfigures plot the ratios of decision-tree sizes for the two considered algorithms. The left figure considers the basic decision-tree algorithm and the algorithm with linear classifiers and entropy-based splits (LCent); the right figure considers the basic decision-tree algorithm and the algorithm with linear classifiers and auc-based splits (LCauc). For both comparisons, we report the arithmetic, geometric, and harmonic means of the ratios.
5.2.2 PRISM model checking.
We consider model checking of probabilistic systems in the model checker PRISM [34]. Given an implicit description of a probabilistic system in PRISM, and a reachability/safety LTL formula as a specification, the model checking problem for the model and the specification reduces to the construction of an almost-sure winning strategy in an MDP with non-negative integer variables. The state-variables correspond to the variables in the implicit PRISM model description, i.e., local states of the modules, counter values, etc. The action-variables capture the id of the module performing an action, and the id of the action performed by the module.
Model  Specification  Size  Dim  NoLC  LCent  LCauc
coin2_K1  F[finished&agree]  1820  7  142  135  45 
coin2_K2  F[finished&agree]  3484  7  270  261  55 
coin2_K3  F[finished&agree]  5148  7  386  373  60 
coin2_K4  F[finished&agree]  6812  7  536  520  55 
coin2_K9  F[finished&agree]  15132  7  1137  1123  68 
coin3_K1  F[finished&agree]  27854  9  772  713  298 
coin3_K2  F[finished&agree]  51566  9  1142  1074  316 
coin3_K3  F[finished&agree]  75278  9  1580  1500  378 
coin3_K4  F[finished&agree]  98990  9  2047  1967  388 
coin4_K0  F[finished&agree]  52458  11  742  632  221 
coin5_K0  F[finished&agree]  451204  13  2572  1626  566 
csma2_2  F[succ_min_bo2]  8590  13  70  52  32 
csma2_2  F[max_col3]  10380  13  65  54  54 
csma2_3  F[succ_min_bo3]  25320  13  66  48  35 
csma2_3  F[max_col4]  28730  13  63  48  59 
csma2_4  F[succ_min_bo4]  73110  13  60  42  40 
csma2_4  F[max_col5]  79580  13  54  41  59 
firewire_abst  F[exists_leader]  2535  4  12  10  8 
firewire_impl_01  F[exists_leader]  22633  12  99  86  71 
firewire_impl_02  F[exists_leader]  37180  12  101  85  81 
firewire_impl_05  F[exists_leader]  90389  12  102  85  72 
leader2  F[elected]  204  12  25  18  11 
leader3  F[elected]  3249  17  61  34  23 
leader4  F[elected]  38016  22  152  92  45 
mer10  G[!err_G]  499632  19  552  510  124 
mer20  G[!err_G]  954282  19  963  922  124 
mer30  G[!err_G]  1408932  19  1373  1332  126 
wlan0  F[both_sent]  27380  14  244  198  232 
wlan1  F[both_sent]  81940  14  272  200  286 
wlan2  F[both_sent]  275140  14  288  206  353 
zeroconf  F[configured]  268326  24  413  330  376 
The table above presents the PRISM experimental results, where we consider various case studies from the PRISM benchmark suite [35] (e.g., communication protocols). The columns of the table represent the considered model and specification, the dataset size and dimension, and the decision-tree sizes for the three considered construction algorithms (NoLC, LCent, LCauc).
In this set of experiments, we have noticed several cases where the auc-based split heuristic achieves significantly worse results. Namely, in csma, wlan, and zeroconf, it is mostly outperformed by the information-gain split procedure, and sometimes even by standard decision trees without linear classifiers. This was caused by certain variables repeatedly obtaining high auc scores (for different thresholds) during the construction of some branches of the tree, even though the subsequent predicate choices made little progress toward linearly separating the data. We were able to mitigate some cases of bad predicate suggestions, e.g., by penalizing predicates on variables that already appear on the path to the current node (the one about to be split); however, the inferior overall performance on these benchmarks persists. This discovery motivates considering various combinations of the auc and information-gain methods, e.g., using information gain as a stand-in metric in cases where auc yields poor scores for all considered predicates.
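One possible shape of the penalization just described can be sketched as follows. This is our own illustrative sketch, not the actual implementation: the function names, the rank-sum auc computation, and the multiplicative penalty factor are all assumptions.

```python
# Sketch: score candidate split variables by auc, but penalize variables
# already used on the path to the current node, to discourage repeatedly
# picking the same variable without progress toward linear separability.

def auc_score(values, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    # Count pairs where a positive sample outranks a negative one (ties: 0.5).
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def best_split(samples, labels, used_vars, penalty=0.5):
    """Pick (variable, score) maximizing the penalized |auc - 0.5|."""
    best = None
    for var in range(len(samples[0])):
        # Distance from 0.5: both very high and very low auc are informative.
        score = abs(auc_score([s[var] for s in samples], labels) - 0.5)
        if var in used_vars:
            score *= penalty  # discourage re-using variables on the path
        if best is None or score > best[1]:
            best = (var, score)
    return best
```

For example, with `samples = [(0, 5), (1, 6), (0, 7), (1, 8)]` and `labels = [0, 0, 1, 1]`, the second coordinate separates the classes perfectly and is chosen; adding it to `used_vars` halves its score but may still leave it the best available predicate, which matches the persistence of the problem reported above.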
6 Related Work
Strategy representation. Previous non-explicit representations of strategies for verification or synthesis purposes typically used BDDs [51] or automata [41, 43], and do not explain decisions in terms of the current valuation of variables. Classical decision trees have been widely used in machine learning as classifiers that naturally explain their decisions [40]. They have also been considered for representing values, and thus implicitly strategies, for MDPs in [8, 7]. In the context of verification, this approach has been modified to capture strategies guaranteed to be optimal for MDPs [9], partially observable MDPs [10], and (non-stochastic) games [11]. Learning a compact decision-tree representation of an MDP strategy was also investigated in [37] for the case of body sensor networks.
Linear extensions of decision trees have been considered as early as [24], for combinatorial optimization problems. In the field of machine learning, combinations of decision trees and linear models have been proposed as interpretable models for classification and regression [12, 47, 26, 36]. A common feature of these works is that they do not aim at classifying the training set without any errors, as in classification tasks this would bear the risk of overfitting. In contrast, our usage requires learning trees that fully fit the data.
The closest to our approach is the work of Neider et al. [42], which learns decision trees with linear classifiers in the leaves in order to capture functions with generally non-Boolean codomains. Since our aim is not to classify, but to fully represent a function, our approach is better tailored to representing strategies. Indeed, since the trees and the lines in the leaves of [42] are generated from counterexamples in the learning process, the following issues arise. First, each counterexample has to be captured exactly by a generated line: with the geometric intuition, each point has to lie on a line, while in our approach we only need to separate positive and negative points by lines, clearly requiring fewer lines. Second, the generation of lines is done online, based on a single point (the current counterexample). As a result, lines that would work for more points are not preferred, while our approach maximizes the utility of a generated line with respect to the complete dataset and thus generally prefers smaller solutions. Unfortunately, even after discussions with the authors of [42], no compilable version of their implementation was available at the time of writing, so the above observations could not be confirmed experimentally.
7 Conclusion and Future Work
In this work, we consider strategy representation by an extension of decision trees; namely, we consider linear classifiers as the leaf nodes of decision trees. We note that the decision-tree framework proposed in this work is more general. Consider an arbitrary data structure with an efficient decision oracle for the existence of an instance of the structure representing a given dataset without error. Then, our scheme provides a straightforward way of constructing decision trees with instances of this data structure as the leaf nodes.
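The generalized scheme can be sketched as follows. This is illustrative only: here the "data structure" is a constant label, whose oracle simply checks label purity, the predicate choice is deliberately naive (a real implementation would use a split metric), and the sketch assumes samples with equal coordinates have equal labels.

```python
# Minimal sketch of the generalized scheme: a decision tree whose leaves are
# instances of an arbitrary data structure, parameterized by a decision oracle
# that returns an instance representing a node's data without error, or None.

def pure_leaf_oracle(samples, labels):
    # The simplest instance: a constant label represents the data iff pure.
    return labels[0] if len(set(labels)) == 1 else None

def build_tree(samples, labels, oracle, depth=0):
    leaf = oracle(samples, labels)
    if leaf is not None:
        return ("leaf", leaf)
    # Naive predicate choice: split on coordinate depth (mod dimension).
    var = depth % len(samples[0])
    left = [(s, y) for s, y in zip(samples, labels) if s[var] == 0]
    right = [(s, y) for s, y in zip(samples, labels) if s[var] == 1]
    if not left or not right:  # trivial split, try the next coordinate
        return build_tree(samples, labels, oracle, depth + 1)
    return ("node", var,
            build_tree([s for s, _ in left], [y for _, y in left],
                       oracle, depth + 1),
            build_tree([s for s, _ in right], [y for _, y in right],
                       oracle, depth + 1))
```

Swapping `pure_leaf_oracle` for a linear-separability test recovers the trees with linear classifiers studied in this work; any other data structure with an efficient representability oracle fits the same skeleton.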
Besides representation algorithms that provably represent the entire input strategy, one can consider models where an error may occur, and the data structure is refined into a more precise one only when the represented strategy is not winning. Here we can consider more expressive models in the leaves too. This could capture representations of controllers exhibiting more complicated functions, e.g., a quadratic polynomial capturing that a robot navigates close (in Euclidean distance) to a given point, or deep neural networks capturing more complicated structure that is difficult to access directly [33].
Acknowledgments. This work has been partially supported by DFG Grant No. KR 4890/2-1 (SUV: Statistical Unbounded Verification), TUM IGSSE Grant 10.06 (PARSEC), Czech Science Foundation grant No. 18-11193S, Vienna Science and Technology Fund (WWTF) Project ICT15-003, and the Austrian Science Fund (FWF) NFN Grants S11407-N23 (RiSE/SHiNE) and S11402-N23 (RiSE/SHiNE).
References
 [1] M. Abadi, L. Lamport, and P. Wolper. Realizable and unrealizable specifications of reactive systems. In ICALP, pages 1–17, 1989.
 [2] S. B. Akers. Binary decision diagrams. IEEE Transactions on Computers, C-27(6):509–516, 1978.
 [3] R. Alur, R. Bodík, E. Dallal, D. Fisman, P. Garg, G. Juniwal, H. KressGazit, P. Madhusudan, M. M. K. Martin, M. Raghothaman, S. Saha, S. A. Seshia, R. Singh, A. SolarLezama, E. Torlak, and A. Udupa. Syntaxguided synthesis. In Dependable Software Systems Engineering, pages 1–25. 2015.
 [4] R. Alur, T. Henzinger, and O. Kupferman. Alternatingtime temporal logic. Journal of the ACM, 49:672–713, 2002.
 [5] C. Baier and J. Katoen. Principles of Model Checking. 2008.
 [6] A. Blass, Y. Gurevich, L. Nachmanson, and M. Veanes. Play to test. In FATES, pages 32–46, 2005.
 [7] C. Boutilier and R. Dearden. Approximate value trees in structured dynamic programming. In ICML, pages 54–62, 1996.
 [8] C. Boutilier, R. Dearden, and M. Goldszmidt. Exploiting structure in policy construction. In IJCAI, pages 1104–1113, 1995.
 [9] T. Brázdil, K. Chatterjee, M. Chmelík, A. Fellner, and J. Křetínský. Counterexample explanation by learning small strategies in Markov decision processes. In CAV, pages 158–177, 2015.
 [10] T. Brázdil, K. Chatterjee, M. Chmelík, A. Gupta, and P. Novotný. Stochastic shortest path with energy constraints in POMDPs: (extended abstract). In AAMAS, pages 1465–1466, 2016.
 [11] T. Brázdil, K. Chatterjee, J. Křetínský, and V. Toman. Strategy representation by decision trees in reactive synthesis. In TACAS, pages 385–407, 2018.
 [12] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. 1984.
 [13] R. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, 1986.
 [14] J. Büchi. On a decision method in restricted secondorder arithmetic. In International Congress on Logic, Methodology, and Philosophy of Science, pages 1–11, 1962.
 [15] J. Büchi and L. Landweber. Solving sequential conditions by finitestate strategies. Transactions of the AMS, 138:295–311, 1969.
 [16] K. Chatterjee. Stochastic OmegaRegular Games. PhD thesis, University of California at Berkeley, USA, 2007.
 [17] K. Chatterjee, T. A. Henzinger, B. Jobstmann, and R. Singh. Measuring and synthesizing systems in probabilistic environments. Journal of the ACM, 62(1):9:1–9:34, 2015.
 [18] A. Church. Logic, arithmetic, and automata. In International Congress of Mathematicians, pages 23–35, 1962.
 [19] E. M. Clarke, T. A. Henzinger, H. Veith, and R. Bloem, editors. Handbook of Model Checking, chapter: Games and Synthesis. 2018.
 [20] L. de Alfaro and T. Henzinger. Interface automata. In FSE, pages 109–120, 2001.
 [21] L. de Alfaro, T. Henzinger, and F. Mang. Detecting errors before reaching them. In CAV, pages 186–201, 2000.
 [22] C. Dehnert, S. Junges, J. Katoen, and M. Volk. A storm is coming: A modern probabilistic model checker. In CAV, pages 592–600, 2017.
 [23] D. Dill. Trace Theory for Automatic Hierarchical Verification of Speedindependent Circuits. 1989.
 [24] D. P. Dobkin. A nonlinear lower bound on linear search tree programs for solving knapsack problems. Journal of Computer and System Sciences, 13(1):69–73, 1976.
 [25] A. DuretLutz, A. Lewkowicz, A. Fauchille, T. Michaud, E. Renault, and L. Xu. Spot 2.0  A framework for LTL and automata manipulation. In ATVA, pages 122–129, 2016.
 [26] E. Frank, Y. Wang, S. Inglis, G. Holmes, and I. H. Witten. Using model trees for classification. Machine learning, 32(1):63–76, 1998.
 [27] P. Garg, D. Neider, P. Madhusudan, and D. Roth. Learning invariants using decision trees and implication counterexamples. In POPL, pages 499–512, 2016.
 [28] Y. Gurevich and L. Harrington. Trees, automata, and games. In STOC, pages 60–65, 1982.
 [29] T. Henzinger, O. Kupferman, and S. Rajamani. Fair simulation. I&C, 173:64–81, 2002.
 [30] S. Jacobs. Extended AIGER format for synthesis. CoRR, abs/1405.5793, 2014.
 [31] S. Jacobs, R. Bloem, R. Brenguier, R. Könighofer, G. A. Pérez, J. Raskin, L. Ryzhyk, O. Sankur, M. Seidl, L. Tentrup, and A. Walker. The second reactive synthesis competition (SYNTCOMP 2015). In SYNT, pages 27–57, 2015.
 [32] Z. Komárková and J. Křetínský. Rabinizer 3: Safraless translation of LTL to small deterministic automata. In ATVA, pages 235–241, 2014.
 [33] P. Kontschieder, M. Fiterau, A. Criminisi, and S. R. Bulò. Deep neural decision forests. In IJCAI, pages 4190–4194, 2016.
 [34] M. Z. Kwiatkowska, G. Norman, and D. Parker. PRISM: probabilistic symbolic model checker. In TOOLS, pages 200–204, 2002.
 [35] M. Z. Kwiatkowska, G. Norman, and D. Parker. The PRISM benchmark suite. In QEST, pages 203–204, 2012.
 [36] N. Landwehr, M. Hall, and E. Frank. Logistic model trees. In ECML, pages 241–252, 2003.
 [37] S. Liu, A. Panangadan, C. S. Raghavendra, and A. Talukder. Compact representation of coordinated sampling policies for body sensor networks. In Advances in Communication and Networks, pages 6–10, 2010.
 [38] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems: Specification. 1992.
 [39] R. McNaughton. Infinite games played on finite graphs. Annals of Pure and Applied Logic, 65:149–184, 1993.
 [40] T. M. Mitchell. Machine Learning. 1997.
 [41] D. Neider. Small strategies for safety games. In ATVA, pages 306–320, 2011.
 [42] D. Neider, S. Saha, and P. Madhusudan. Synthesizing piecewise functions by learning classifiers. In TACAS, pages 186–203, 2016.
 [43] D. Neider and U. Topcu. An automaton learning approach to solving safety games over infinite graphs. In TACAS, pages 204–221, 2016.
 [44] A. Pnueli. The temporal logic of programs. In FOCS, pages 46–57, 1977.
 [45] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In POPL, pages 179–190, 1989.
 [46] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.

 [47] J. R. Quinlan. Learning with continuous classes. In Australian Joint Conference on Artificial Intelligence, pages 343–348, 1992.
 [48] M. Rabin. Automata on Infinite Objects and Church’s Problem. Conference Series in Mathematics. 1969.
 [49] P. Ramadge and W. Wonham. Supervisory control of a class of discreteevent processes. SIAM Journal of Control and Optimization, 25(1):206–230, 1987.
 [50] W. Thomas. Languages, automata, and logic. In Handbook of Formal Languages, pages 389–455. 1997.
 [51] R. Wimmer, B. Braitling, B. Becker, E. M. Hahn, P. Crouzen, H. Hermanns, A. Dhama, and O. Theel. Symblicit calculation of longrun averages for concurrent probabilistic systems. In QEST, pages 27–36, 2010.
Appendix
Appendix 0.A Linear least squares problem
In this work, we consider a dataset $D$ to be a set of tuples (samples) over the natural numbers. Considering these samples in an arbitrary but fixed order, $D$ can be viewed as a matrix $A \in \mathbb{N}^{n \times d}$, where $n$ is the number of samples and $d$ is the dimension of the samples. Let $A_i$ denote the $i$-th sample, and let $A_{i,j}$ denote the $j$-th dimension of the $i$-th sample. We remind that $D$ is partitioned into good samples and bad samples. Consider a vector $b \in \{0,1\}^n$ constructed as follows: $b_i = 1$ iff the $i$-th sample $A_i$ is good.
The linear least squares problem gets a matrix $A$ and a vector $b$ as input, and outputs a vector of weights $\hat{w}$ that minimizes a certain error metric, i.e., $\hat{w} = \arg\min_{w} \mathrm{Err}(w)$. The error metric to be minimized in this problem is the squared error, formally
$\mathrm{Err}(w) = \lVert Aw - b \rVert_2^2 = \sum_{i=1}^{n} (A_i \cdot w - b_i)^2.$
When the columns of $A$ are linearly independent (which is mostly the case in strategy datasets), the minimization has a unique solution $\hat{w}$, and the closed-form expression to obtain the solution is
$\hat{w} = (A^\top A)^{-1} A^\top b.$
In the case when some columns of $A$ are linearly dependent, the solution is no longer unique. To obtain some solution, one considers an arbitrary maximal subset of linearly independent columns, and then the above expression can be used with the resulting submatrix.
In this work, we use the linear least squares problem to obtain a set of weights for strategy subsets that are not linearly separable. In such a case, the solution of this problem gives us a classifier (with a bias term) that, intuitively, attempts to minimize the separation error on average (following the squared-error metric). We obtain such classifiers in Lines 5 and 6 of Algorithm 4, and subsequently we compute the area under the ROC curve for them.
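The closed-form solution can be checked numerically, e.g., against NumPy's least-squares solver (assuming NumPy is available; the small dataset below is illustrative, with the last column of ones acting as the bias term):

```python
import numpy as np

# Check the closed-form least-squares solution w = (A^T A)^{-1} A^T b
# against numpy's solver on a small strategy-like dataset.
A = np.array([[1.0, 0.0, 1.0],   # last column: constant 1, the bias term
              [2.0, 1.0, 1.0],
              [3.0, 1.0, 1.0],
              [4.0, 2.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])  # 1 for good samples, 0 for bad

w_closed = np.linalg.inv(A.T @ A) @ (A.T @ b)      # valid: columns independent
w_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # numerically robust route
assert np.allclose(w_closed, w_lstsq)
```

In practice `lstsq` (or a QR/SVD-based solver) is preferred over forming the inverse explicitly, since it also handles the linearly dependent case mentioned above.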
Appendix 0.B Experimental details
Here we provide additional details about the experiments and the results. We have implemented all algorithms in Python, using the scikit-learn library to manipulate classifiers, ROC curves, etc. For our experiments we have used a Linux machine with an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (12 CPUs) and 128GB of RAM.
Considering linear classifiers instead of pure leaves (LCent and LCauc instead of NoLC) typically has a detrimental effect on the construction time. However, this is not always the case, as sometimes the decrease in output tree size outweighs the overhead of testing linear separability for the classifiers.
Considering the auc-based split procedure instead of the information-gain one (LCauc instead of LCent) caused a significant increase in construction time. However, we stress that this is largely due to our prototypical implementation. In the auc-based split procedure, the main loop (Line 4), which considers all possible predicates, is embarrassingly parallel. Hence an optimized parallel implementation of the procedure is expected to suffer minimal overhead in construction time.
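The parallelization argument can be sketched as follows. This is our own illustrative sketch with a placeholder scoring function, not the actual implementation: since each candidate predicate is scored independently, the scores can be computed concurrently and the maximum taken afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

def score_predicate(pred, data):
    # Placeholder for the (expensive) per-predicate scoring, e.g. the auc
    # computation; here simply the fraction of samples satisfying pred.
    return sum(1 for x in data if pred(x)) / len(data)

def best_predicate_parallel(predicates, data, workers=4):
    # All predicates are scored independently -> embarrassingly parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda p: score_predicate(p, data), predicates))
    i = max(range(len(scores)), key=scores.__getitem__)
    return predicates[i], scores[i]

preds = [lambda x: x > 0, lambda x: x > 5]
best, score = best_predicate_parallel(preds, [1, 2, 6, 7])
# x > 0 holds for all four samples, so it wins with score 1.0
```

With process-based workers (or an implementation outside the interpreter lock) the wall-clock time of the loop scales down with the number of cores, which supports the claim of minimal overhead.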
Fig. 12, Fig. 13, and Fig. 14 provide a one-plot summary of the experimental results for the safe scheduling of washing cycles, LTL synthesis, and LTL synthesis with randomized environment, respectively. In all three figures, each column corresponds to a benchmark. The colored dots capture the sizes of the decision trees for the considered algorithms, namely, blue for NoLC, green for LCent, and red for LCauc.