dtControl: Decision Tree Learning Algorithms for Controller Representation

02/12/2020 ∙ by Pranav Ashok, et al.

Decision tree learning is a popular classification technique most commonly used in machine learning applications. Recent work has shown that decision trees can be used to represent provably-correct controllers concisely. Compared to representations using lookup tables or binary decision diagrams, decision trees are smaller and more explainable. We present dtControl, an easily extensible tool for representing memoryless controllers as decision trees. We give a comprehensive evaluation of various decision tree learning algorithms applied to 10 case studies arising out of correct-by-construction controller synthesis. These algorithms include two new techniques, one for using arbitrary linear binary classifiers in the decision tree learning, and one novel approach for determinizing controllers during the decision tree construction. In particular the latter turns out to be extremely efficient, yielding decision trees with a single-digit number of decision nodes on 5 of the case studies.




1. Introduction

Formal synthesis of controllers enforcing complex specifications on cyber-physical systems has gained significant attention in the last few years. This is mainly due to the need for formally verified control strategies for complex tasks; these tasks are usually specified using temporal logic or as (in)finite strings over automata. Several techniques and tools provide automated, correct-by-construction controller synthesis for cyber-physical systems by utilizing symbolic models (a.k.a. finite abstractions) (Tabuada, 2009; Belta et al., 2017), in which the uncountable continuous states and inputs are aggregated to finitely many symbolic states and inputs via quantization (a.k.a. discretization). The so-called symbolic controllers are then computed by utilizing algorithmic machinery from computer science and mapped back for use in the original systems. State-of-the-art tools for synthesizing such controllers include SCOTS (Rungger and Zamani, 2016), pFaces (Khaled and Zamani, 2019), QUEST (Jagtap and Zamani, 2017), Pessoa (Mazo et al., 2010), CoSyMA (Mouelhi et al., 2013), and Uppaal Stratego (David et al., 2015). These tools output controllers as huge lists of state-action pairs (a.k.a. lookup tables).

Storing these symbolic controllers in memory is a major problem because they usually need to run on embedded devices with limited memory. However, if we do not store the controllers as lookup tables, but take advantage of decision trees (DT) (Mitchell, 1997), which exploit their hidden structure to represent them in a more compact way, we can mitigate this problem. As shown in (Ashok et al., 2019b), DTs can be orders of magnitude smaller than lookup tables. Such a concise representation opens the door for better readability, understandability, and explainability of the controllers, while reducing memory requirements and preserving correctness guarantees. Moreover, human-understandable controllers may also provide insight into the models themselves, thus aiding their validation, as we illustrate in the example below.

Our setting is inherently different from the usual use of DTs in machine learning: there, in order to generalize well, DTs typically do not fit the training data exactly; in contrast, in this work, DTs have to represent the given controllers exactly in order to preserve their correctness guarantees. Therefore, our requirements on DTs differ: besides small size and explainability, we also require a perfect fit. Consequently, it is necessary to thoroughly re-evaluate current DT-learning algorithms and possibly also modify them.

A basic technique used to represent controllers more concisely is to determinize them, i.e. to make them not (maximally) permissive but to retain only a single action for each state. To this end, one can use, for instance, the action with the minimum norm from a reference input, when least energy consuming controllers are preferred (Meyer et al., 2017), or the previously applied action (if possible), when lazy controllers are preferred (Mazo et al., 2010; Mouelhi et al., 2013). Such a size reduction by determinization can be applied as pre-processing before learning the DT representation of the controller, typically also yielding a smaller DT. Alternatively, one can apply other kinds of reduction by determinization as post-processing after constructing the DT. For instance, in the “safe pruning” of (Ashok et al., 2019b), the DT constructed for the maximally permissive controller is modified as follows: the leaves of the tree are merged in a bottom-up fashion, thereby reducing the size of the tree and partially determinizing the controller. In contrast, here we introduce a novel approach for determinizing the controllers during the construction of the DT, with advantages over both pre-processing and post-processing methods. Firstly, since the choice of the action for each state greatly affects the size and structure of the DT, it is advantageous to guide the choice by the concrete, already built part of the DT, compared to the a-priori choices made by pre-processing approaches. Secondly, while the post-processing approaches have to construct a large tree first, our new technique constructs an already reduced tree, avoiding the intermediate large one and thus scaling better.

Motivating Example



Figure 1. Decision tree for the temperature controller

Consider a temperature control system running in a building with 10 rooms, with heaters installed in only 2 of the rooms, as described in (Jagtap and Zamani, 2017). The permissive controller maintaining the temperatures of all the rooms within a certain range, obtained using SCOTS, is a lookup table with 52,488 state-action pairs. By naively determinizing, we get a lookup table with 26,244 symbolic states (i.e. the domain of the controller) and their respective actions. Standard DT learning, e.g. (Breiman et al., 1984), applied to these two lookup tables yields DTs with 8,648 and 2,703 decision nodes, respectively. While this is an improvement, it is far from being explainable. With the help of our novel determinization strategy presented in Section 4.2, we are able to obtain a decision tree with only 3 (!) decision nodes, see Figure 1. Apart from obtaining a compact and easily implementable controller representation while preserving correctness guarantees, the result is so small that it is immediately explainable and, moreover, allows us to improve on the implementation: one can readily see that we only need to install temperature sensors in two rooms instead of all 10 rooms, which reduces the system deployment cost as well as the bandwidth required to transfer the state information to the controller. Only 4 symbols (leaves of the tree) need to be transferred to realize the controller.

We also obtain a controller with very few nodes for the cruise-control model of (Larsen et al., 2015). From such a clear representation one immediately notices that the controller makes the car decelerate when the car in front of it is far away. This counter-intuitive behaviour has thus revealed a bug in the model, which did not actually describe the intended behaviour of the system.

The contribution of this paper can be summarized as follows:

  • We present dtControl, an open-source tool to convert formally verified controllers to decision trees preserving their correctness guarantees.

    dtControl has a simple input format and already supports automated conversion for controllers generated by two state-of-the-art tools: Uppaal Stratego (David et al., 2015) and SCOTS (Rungger and Zamani, 2016). It supports several output formats, most importantly graphical output as DOT files, useful for further analysis and visual presentation, and C source code, useful for closed-loop simulation or for loading onto embedded devices.

  • We introduce a new technique for using arbitrary binary classifiers in the DTs and a novel approach for determinizing controllers during the DT learning. Our approach is tuned towards obtaining extremely small, explainable DTs. In 5 out of 8 case studies where it is applicable (the original controllers are non-deterministic), it produces trees with single-digit numbers of decision nodes.

  • We present a comprehensive evaluation of 8 DT-learning algorithms on 10 case studies.

Related Work

DTs (Mitchell, 1997, Chapter 3) are a well-known class of data structures, particularly known for their interpretability, used mostly by machine learning practitioners in classification or regression tasks. Our work is based on well-known algorithms for decision tree learning, namely CART (Breiman et al., 1984), C4.5 (Quinlan, 1993) and OC1 (Murthy et al., 1993).

There has been previous work on combining decision trees with classifiers, namely Perceptrons (Utgoff, 1988), Logistic Regression models (Landwehr et al., 2003), piece-wise functions (Neider et al., 2016) or Support-Vector Machines (Christou and Efremidis, 2007; Ashok et al., 2019a). We generalize those approaches by allowing for arbitrary binary classifiers to be used in our trees. Additionally, those methods are either restricted to only use two labels, which is not applicable for controllers with more than two possible actions, or they only allow linear classifiers in leaf nodes (Ashok et al., 2019a; Neider et al., 2016). In contrast, our approach is applicable with an arbitrary number of actions and also leverages the power of linear classifiers in inner nodes.

An alternative to DTs are binary decision diagrams (BDD) (Bryant, 1986). As seen in (Ashok et al., 2019b; Brázdil et al., 2018, 2015), BDDs have several disadvantages: firstly, they do not retain the inherent flavour of decisions of strategies as maps from states to actions due to their bit-level representation and, hence, are hardly explainable. Secondly, they are notoriously hard to minimize (Brázdil et al., 2018), also because finding the best variable ordering is NP-complete (Bryant, 1986). BDDs only allow binary classification, so the actions have to be joined with the state space to represent a controller. The recent result in (Zapreev et al., 2018) discusses various heuristic-based determinization algorithms for BDDs representing controllers; however, they still suffer from the disadvantages of BDDs mentioned above. Algebraic decision diagrams (ADD) (Bahar et al., 1997) are an extension of BDDs that allow more than two labels, i.e. they associate every action with a leaf node. However, they still suffer from the same drawbacks as BDDs. In (Girard, 2013) ADDs are used for controller representation; however, no concrete algorithm is provided.

The formal methods community has made use of decision trees to represent controllers and counterexamples arising out of model checking Markov decision processes, stochastic games and LTL synthesis (Brázdil et al., 2015; Ashok et al., 2019a, b; Brázdil et al., 2018). DTs have also been used to represent learnt policies from reinforcement learning (Pyeatt et al., 2001). However, in contrast to our paper, (Pyeatt et al., 2001) does not preserve safety guarantees, only considers axis-aligned splits and does not consider non-determinism. (Julian et al., 2018) suggests the possibility of using regression trees for representing policies, whereas we consider classification trees.

2. Tool

dtControl is an easy-to-use open-source tool for post-processing memoryless symbolic controllers into various compact and more interpretable representations. We report the input and output formats as well as the algorithms that are currently supported. Note that the tool can easily be extended with new formats and algorithms. dtControl is distributed as an easy-to-install pip package (pip is a standard package-management system used to install and manage software packages written in Python; see https://pypi.org/project/dtcontrol/) along with a user and developer manual (available at https://dtcontrol.readthedocs.io/en/latest/).

dtControl works with Python version 3.6.7 or higher. The core of the tool, which runs the learning algorithms, requires numpy, pandas and scikit-learn (Pedregosa et al., 2011). Optionally, dtControl may also require the C-based oblique decision tree tool OC1 (Murthy et al., 1993).

Input formats

dtControl currently accepts controllers in three formats: (i) a raw comma-separated values (CSV) format with each row consisting of a vector of state variables concatenated with a vector of input variables; (ii) a sparse matrix format used by SCOTS; and (iii) the raw strategy produced by Uppaal Stratego. More details about the various formats are described in the user manual.


dtControl offers a range of parameters to adjust the DT learning algorithm, which are described in Section 4.

Output formats

dtControl outputs the decision tree in the DOT graph representation language (for visual presentation of the tree), as well as C code that can be directly used for implementation; see Appendix A for the DOT and C output that dtControl produces for the DT in Figure 1. Additionally, dtControl reports statistics for every constructed tree, namely its size, the minimum number of bits required to represent the symbols in the obtained controller, and the construction time.
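To give a feel for what a DOT rendering of a decision tree looks like, the following sketch serialises a small tree into DOT text. It is purely illustrative: the nested-tuple tree encoding, the function name `to_dot`, the node names and the attributes are all assumptions made for this example and need not match the files that dtControl writes.

```python
def to_dot(tree):
    """Serialise a tree to DOT text.

    A tree is either a leaf label (any non-tuple value) or a tuple
    (predicate_text, true_subtree, false_subtree). This encoding is
    hypothetical and only serves the illustration.
    """
    lines = ["digraph dt {"]
    counter = [0]  # running node id, shared by the nested function

    def visit(node):
        node_id = counter[0]
        counter[0] += 1
        if isinstance(node, tuple):
            text, true_child, false_child = node
            lines.append(f'  n{node_id} [label="{text}"];')
            true_id = visit(true_child)
            lines.append(f'  n{node_id} -> n{true_id} [label="true"];')
            false_id = visit(false_child)
            lines.append(f'  n{node_id} -> n{false_id} [label="false"];')
        else:
            # Leaves are drawn as boxes to set them apart from decisions.
            lines.append(f'  n{node_id} [label="{node}", shape=box];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


example = ("x[1] <= 20.625", "heaters on", "heaters off")
dot_text = to_dot(example)
```

The resulting text can be passed to any Graphviz front end to draw the tree.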

3. Preliminaries - Decision tree learning

A decision tree (DT) over the domain $X$ with the set of labels $A$ is a tuple $(T, \rho, \theta)$, where $T$ is a finite full binary tree (every node has exactly 0 or 2 children), $\theta$ assigns to every leaf node (node with 0 children) a label $a \in A$ and $\rho$ assigns to every inner node (node with 2 children, also called decision node) of the tree a predicate, which is a boolean function $X \to \{\mathit{true}, \mathit{false}\}$.

The semantics of a DT is as follows: given a state $x \in X$, there is a unique decision path through the tree starting from the root node (the only node with no parent) to a leaf node $\ell$. This means that the label for state $x$ is $\theta(\ell)$. The decision path is defined by starting at the root node, and then for each decision node $n$ evaluating the predicate on the state, i.e. computing $\rho(n)(x)$, and picking the left child if the predicate is true and the right child otherwise.

For example, consider the DT in Figure 1: $T$ has 7 nodes, 3 of which are decision nodes (including the root node) and 4 of which are leaf nodes. A state of the system is a vector of 10 temperatures, one per room. To find the decision for a given state, we first evaluate the predicate in the root node. If the temperature in the second room is smaller than 20.625, the predicate is true and we go to the left child. We evaluate the next predicate in the same fashion and arrive at a leaf node, whose label gives us a safe control input, for instance to turn on both heaters.
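The decision-path semantics can be sketched in a few lines of Python. The `Node` class and `decide` function are names chosen for this illustration; the tree below only mimics the shape of Figure 1 (3 decision nodes, 4 leaves), and apart from the root threshold 20.625 all thresholds and leaf labels are made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

State = Sequence[float]


@dataclass
class Node:
    """A leaf if `label` is set, otherwise a decision node."""
    label: Optional[str] = None
    predicate: Optional[Callable[[State], bool]] = None
    left: Optional["Node"] = None   # followed when the predicate is true
    right: Optional["Node"] = None  # followed when the predicate is false


def decide(node: Node, state: State) -> str:
    """Follow the unique decision path from `node` down to a leaf."""
    while node.label is None:
        node = node.left if node.predicate(state) else node.right
    return node.label


# Hypothetical tree: 3 decision nodes and 4 leaves, 7 nodes in total.
tree = Node(
    predicate=lambda x: x[1] <= 20.625,  # temperature in the second room
    left=Node(
        predicate=lambda x: x[0] <= 20.0,  # made-up threshold
        left=Node(label="both heaters on"),
        right=Node(label="second heater on"),
    ),
    right=Node(
        predicate=lambda x: x[0] <= 20.0,  # made-up threshold
        left=Node(label="first heater on"),
        right=Node(label="both heaters off"),
    ),
)

cold_state = [19.0, 20.0] + [21.0] * 8  # 10 room temperatures
warm_state = [22.0, 22.0] + [21.0] * 8
```

Calling `decide(tree, cold_state)` evaluates two predicates and ends in the leaf for turning on both heaters; only the first two of the ten temperatures are ever inspected.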

All DT learning algorithms implemented in dtControl follow the same underlying structure: given a finite set $C \subseteq X \times A$ of feature-label pairs, the algorithm returns a DT that represents $C$ precisely; this means that for every $(x, a) \in C$, the leaf node of the decision path for $x$ has the label $a$. In the setting of this paper, $C$ is a controller, features are states and labels are actions (we use the term actions instead of control inputs to avoid confusion, since the control inputs are the outputs of a DT).

To learn the DT, the algorithm tries to minimize the entropy of $C$, denoted $H(C)$, by splitting it according to a predicate. Formally,

$$H(C) = -\sum_{a \in A} p(a) \log_2 p(a),$$

where, for some $a \in A$, $p(a) = \frac{|\{(x, a') \in C \mid a' = a\}|}{|C|}$ is the empirical probability of label $a$ being in $C$ (with the convention $0 \log_2 0 = 0$); the notation $|\cdot|$ denotes the cardinality of a set. The underlying algorithm works recursively as follows:

  • Base case: If $H(C) = 0$, i.e. all pairs have the same label $a$, then return the following DT: the tree $T$ has only a single node $n$, the leaf labelling $\theta$ satisfies $\theta(n) = a$, and the predicate assignment $\rho$ has no domain in this case, as there are no decision nodes.

  • Recursive case: If $H(C) > 0$, $C$ needs to be split; for that, we use some predicate $\mathit{pred} \in \mathrm{PREDS}$ which splits $C$, where the set PREDS to be picked here is a parameter of the algorithm that is discussed in Section 4.1. We pick the predicate that minimizes the entropy after the split, i.e.,

    $$\mathit{pred} = \arg\min_{q \in \mathrm{PREDS}} \left( \frac{|C_t^q|}{|C|} H(C_t^q) + \frac{|C_f^q|}{|C|} H(C_f^q) \right),$$

    where $C_t^q = \{(x, a) \in C \mid q(x)\}$ and $C_f^q = C \setminus C_t^q$ are the parts of $C$ on which $q$ is true and false, respectively.

    Intuitively, the best predicate is the one which is able to split $C$ into two parts which are as homogeneous as possible. Given the best predicate, we recursively call the algorithm on the subsets resulting from the split, getting two DTs $(T_t, \rho_t, \theta_t)$ and $(T_f, \rho_f, \theta_f)$; the indices $t$ and $f$ indicate whether the predicate was true or false, respectively. Then we return the following DT: the tree $T$ has the root node $r$, with the left child being the root of $T_t$ and the right child the root of $T_f$. $\theta$ uses $\theta_t$ for leaves of the left sub-tree and $\theta_f$ for the right sub-tree. $\rho$ is defined similarly on the inner nodes of the left and right sub-trees, with the addition that $\rho(r) = \mathit{pred}$, i.e. the predicate of the root of $T$ is the predicate we used for the split.
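The recursion above can be sketched as follows. This is a minimal illustration, not dtControl's implementation: it assumes axis-aligned predicates of the form `x[i] <= c`, scores a split by the size-weighted entropy of its two parts, encodes trees as nested tuples, and requires that no state occurs twice with different labels. All function names are chosen for this example.

```python
import math
from collections import Counter


def entropy(pairs):
    """Empirical entropy (in bits) of the labels in a list of (state, label)."""
    counts = Counter(label for _, label in pairs)
    total = len(pairs)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def split(pairs, i, c):
    """Partition the pairs by the predicate x[i] <= c."""
    true_part = [p for p in pairs if p[0][i] <= c]
    false_part = [p for p in pairs if p[0][i] > c]
    return true_part, false_part


def split_entropy(pairs, i, c):
    """Size-weighted entropy of the two parts produced by x[i] <= c."""
    t, f = split(pairs, i, c)
    n = len(pairs)
    return len(t) / n * entropy(t) + len(f) / n * entropy(f)


def candidates(pairs):
    """All (i, c) with c a value seen in dimension i, except the largest
    one, which would put every pair on the 'true' side."""
    for i in range(len(pairs[0][0])):
        for c in sorted({x[i] for x, _ in pairs})[:-1]:
            yield i, c


def learn(pairs):
    """Return a label (leaf) or ((i, c), true_subtree, false_subtree)."""
    if entropy(pairs) == 0:  # base case: a single label remains
        return pairs[0][1]
    i, c = min(candidates(pairs), key=lambda ic: split_entropy(pairs, *ic))
    t, f = split(pairs, i, c)
    return ((i, c), learn(t), learn(f))


def classify(tree, x):
    """Follow the decision path of state x (labels must not be tuples)."""
    while isinstance(tree, tuple):
        (i, c), true_subtree, false_subtree = tree
        tree = true_subtree if x[i] <= c else false_subtree
    return tree


controller = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "b"), ((1, 1), "b")]
tree = learn(controller)
```

On this toy controller the first state variable alone determines the action, so a single decision node suffices, and every pair is classified exactly as in the input.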

The symbolic controllers designed by SCOTS and Uppaal Stratego are generated by correct-by-construction synthesis procedures. In order to use these controllers for the original systems (i.e. with infinite continuous states and inputs), we need to refine the controllers. For more details on refinement procedures, we refer the interested reader to (Reissig et al., 2016; Tabuada, 2009; Larsen et al., 2018).

dtControl preserves the correctness guarantees by representing the symbolic controllers precisely, i.e. iterating until the entropy in all leaf nodes is 0. In the case of determinization, dtControl represents one of the deterministic sub-controllers precisely, which is chosen on-the-fly during the construction.

4. Methods

There are two parameters of dtControl: the set of predicates to consider (PREDS) and the way in which non-determinism is handled. For each of these, dtControl implements existing ideas and introduces new ones. Here, we only report the high-level ideas; for a more detailed description, refer to the user or developer manual.

4.1. Predicates

4.1.1. Existing idea: Axis-aligned splits

In the standard algorithms, e.g. (Breiman et al., 1984; Quinlan, 1993), only axis-aligned splits are considered, i.e. predicates that can only have the form $x_i \le c$, where $x_i$ is one of the state variables, $i \in \{1, \dots, n\}$, and $c \in \mathbb{R}$. In our setting, the set of possible predicates is greatly restricted due to discretization (quantization). The number of splits to be evaluated for each variable $x_i$ is equal to the number of discrete values of $x_i$.

4.1.2. Existing idea: Oblique splits

Besides the standard axis-aligned splits, dtControl also supports predicates of the form $w^\top x \le c$, where $w \in \mathbb{R}^n$ and $c \in \mathbb{R}$. These oblique predicates (Murthy et al., 1993) incorporate information from multiple state variables in a single split and thus have the potential to greatly simplify the induced decision tree (Ashok et al., 2019a). However, due to combinatorial explosion, it is too costly to simply enumerate all possible oblique predicates even in the discretized space, which is why various heuristics are employed instead (Murthy et al., 1993). In this regard, dtControl supports the usage of predicates obtained using (an adapted version of) the OC1 algorithm (Murthy et al., 1993).
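As a small illustration of why oblique predicates can shorten a tree, the sketch below builds a predicate from a weight vector and a threshold. The helper name `oblique_predicate` and the example weights are assumptions of this sketch; an axis-aligned split is the special case in which exactly one weight is non-zero.

```python
def oblique_predicate(weights, threshold):
    """Return the predicate x -> (weights . x <= threshold)."""
    def predicate(x):
        return sum(w * xi for w, xi in zip(weights, x)) <= threshold
    return predicate


# A diagonal boundary x0 + x1 <= 1 expressed as one oblique split.
below_diagonal = oblique_predicate([1.0, 1.0], 1.0)

# The axis-aligned split x1 <= 20.625 as a special case.
second_room_cold = oblique_predicate([0.0, 1.0], 20.625)
```

A tree restricted to axis-aligned splits would need a staircase of several decision nodes to approximate the diagonal boundary that `below_diagonal` captures in one node.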

4.1.3. New technique: Using binary machine-learnt classifiers

It is possible to find non-axis-aligned predicates splitting the controller by using classification techniques from machine learning. As our main goal is for the resulting tree to be explainable, we want to avoid complex predicates, and thus we restrict the classifiers we consider in two ways: (i) we only consider linear classifiers, and (ii) we restrict to binary classifiers, so that the resulting tree is binary.

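We use these binary linear classifiers in a way that is similar to the classical one-vs-the-rest classification, e.g. (Bishop, 2007, Chapter 4): for each action $a$, we train a classifier $c_a$ that tries to separate the states with that action from the rest. We then pick the classifier whose predicate minimizes the entropy after the split, i.e. the classifier $c_{a^*}$ with

$$a^* = \arg\min_{a} \left( \frac{|C_t^a|}{|C|} H(C_t^a) + \frac{|C_f^a|}{|C|} H(C_f^a) \right),$$

where $H$ denotes the entropy and $C_t^a$ and $C_f^a$ are the parts of the controller $C$ that $c_a$ classifies as true and false, respectively.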

We considered various linear classification techniques including Logistic Regression (Bishop, 2007, Chapter 4), linear Support Vector Machines (SVM) (Bishop, 2007, Chapter 7), Perceptrons (Bishop, 2007, Chapter 5), and Naive Bayes (Zhang, 2004). However, the latter two yielded significantly larger DTs in all of our experiments, so dtControl does not offer these algorithms to the end-user.
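The one-vs-the-rest selection can be sketched without any external library. The code below is an illustration under stated assumptions, not dtControl's implementation (which builds on scikit-learn): logistic regression is trained by plain batch gradient descent with hand-picked hyperparameters, splits are scored by size-weighted entropy, and all function names are invented for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def linear(w, b, x):
    """Value of the linear function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logreg(xs, ys, epochs=2000, lr=0.5):
    """Logistic regression by batch gradient descent; ys are 0.0 or 1.0."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = 1.0 / (1.0 + math.exp(-linear(w, b, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def best_one_vs_rest(pairs):
    """Train one classifier per action and keep the one whose split has
    the lowest size-weighted entropy. Returns (entropy, action, w, b),
    or None if no classifier splits the data."""
    xs = [x for x, _ in pairs]
    labels = [a for _, a in pairs]
    best = None
    for action in sorted(set(labels)):
        w, b = train_logreg(xs, [1.0 if a == action else 0.0 for a in labels])
        inside = [a for x, a in pairs if linear(w, b, x) > 0]
        outside = [a for x, a in pairs if linear(w, b, x) <= 0]
        if not inside or not outside:
            continue  # this classifier puts everything on one side
        after = (len(inside) * entropy(inside)
                 + len(outside) * entropy(outside)) / len(pairs)
        if best is None or after < best[0]:
            best = (after, action, w, b)
    return best


pairs = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "b"),
         ((1.0, 1.0), "b"), ((3.0, 0.0), "c"), ((3.0, 1.0), "c")]
best = best_one_vs_rest(pairs)
```

In this toy data set the actions "a" and "c" can each be separated from the rest by a line, whereas "b" lies between them and cannot; the selected predicate therefore isolates one action completely and leaves the other two for the next level of the tree.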

In summary, dtControl currently supports four possibilities for the set PREDS: axis-aligned predicates, the modified oblique split heuristic from (Murthy et al., 1993) and oblique splits obtained either via logistic regression or linear SVM classifiers. Due to the modular structure of the code, it is easy to extend the existing approaches or add new methods, as described in our developer manual.

4.2. Non-determinism

In the general algorithm described in Section 3, for the sake of simplicity, we restricted our procedure to controllers that deterministically choose a single control input. In case of non-deterministic (also called permissive) controllers, the tuples in the controller have the form $(x, U)$, where $U$ is now a set of admissible control inputs. One approach to handle non-determinism is to simply assign a unique label to each set, and hence reduce the setting to the case where for every state there is only a single label. This means that the DT algorithm can be used in exactly the same way as described in Section 3. This method retains all information that was initially present in the given controller.

The disadvantage of handling non-determinism like this is that the number of unique classes may be as large as $2^{|A|}$, where $A$ is the set of all actions, since every set of actions may become a label of its own. In order to avoid this blow-up and optimize memory, one can decide to determinize the controller. If we have some knowledge about which value of a control input is optimal, e.g. from domain knowledge or since it was computed by an optimization algorithm as in Uppaal Stratego (David et al., 2015), this information can be used, eliminating the non-deterministic choice. Otherwise, one can use a standard determinization approach, e.g. picking the value with the minimum norm. The tree can then simply be constructed from the determinized labels. Additionally, we propose the following alternative to these determinization approaches.
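The unique-label reduction for permissive controllers amounts to a dictionary from action sets to fresh labels. The following sketch uses integer labels and the name `unique_labels`; both are choices made for this illustration.

```python
def unique_labels(controller):
    """Give every distinct set of admissible actions its own integer label.

    `controller` is a list of (state, set_of_actions) pairs. Returns the
    relabelled pairs and the dictionary needed to map labels back to sets.
    """
    label_of = {}
    labelled = []
    for state, actions in controller:
        key = frozenset(actions)  # sets with equal members share a label
        label = label_of.setdefault(key, len(label_of))
        labelled.append((state, label))
    return labelled, label_of


permissive = [((0,), {1, 2}), ((1,), {2, 1}), ((2,), {3})]
labelled, label_of = unique_labels(permissive)
```

Because the dictionary is kept, the set of admissible actions can be recovered from any leaf label, so no permissiveness is lost.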

Novel determinization approach: Maximal frequencies

Our new determinization technique MaxFreq aims to minimize the size of the resulting DT. The underlying general idea is simple: if many of the data points share the same label, a DT learning algorithm should group them together under the common label. This idea naturally gives a determinizing strategy when applied in our context.

Consider a set $C$ of pairs $(x, U)$ of a state and a set of actions. The goal is to identify for each state a single action which can be assigned to it. Let $f$ be the function for action frequency, which maps every action $a$ to its number of occurrences in $C$, i.e. $f(a) = |\{(x, U) \in C \mid a \in U\}|$. Then, for each state $x$ such that $|U| > 1$, we re-assign to $x$ the single label which appears with the highest frequency. Formally, our determinization procedure produces for each pair $(x, U) \in C$ an action $a^*$, where

$$a^* = \arg\max_{a \in U} f(a).$$

Once we have determinized $C$, we can use any method presented in Section 4.1 to find a predicate for the current node. After the set is split, the procedure is recursively applied to both child nodes, recomputing the action frequency each time.
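One determinization step of this kind can be sketched as follows. The function name and the tie-breaking rule (prefer the smaller action when two actions are equally frequent) are assumptions of this sketch; the text above does not prescribe how ties are resolved.

```python
from collections import Counter


def max_freq_determinize(pairs):
    """Replace every set of admissible actions by its most frequent member.

    `pairs` is a list of (state, set_of_actions). The frequency of an
    action is the number of pairs whose set contains it.
    """
    freq = Counter(a for _, actions in pairs for a in actions)
    # Sort key: highest frequency first, then smallest action (assumed
    # tie-breaking rule, chosen only to make the result reproducible).
    return [(x, min(actions, key=lambda a: (-freq[a], a)))
            for x, actions in pairs]


node_data = [((0,), {1, 2}), ((1,), {2, 3}), ((2,), {2}), ((3,), {1, 3})]
determinized = max_freq_determinize(node_data)
```

In a tree construction, this function would be called again on each of the two subsets produced by a split, so that the frequencies always reflect only the states that reach the current node.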

In summary, dtControl offers 3 different possibilities to handle non-determinism: unique labels retaining the information, determinizing upfront by picking the action with the minimal norm, and using the novel heuristic MaxFreq.
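For comparison, upfront determinization by minimal norm needs no frequency information and can be applied once before learning. The sketch below assumes that control inputs are tuples of numbers, that the Euclidean norm is used with the zero vector as reference, and that ties are broken by tuple order; these are assumptions of the example rather than details given above.

```python
import math


def min_norm_determinize(pairs):
    """Keep, for every state, the admissible input with the smallest norm."""
    def norm(u):
        return math.sqrt(sum(c * c for c in u))
    # Ties in the norm are broken by tuple order (assumed rule).
    return [(x, min(inputs, key=lambda u: (norm(u), u)))
            for x, inputs in pairs]


permissive = [((0,), {(3.0, 4.0), (1.0, 0.0)}), ((1,), {(-2.0,), (0.5,)})]
deterministic = min_norm_determinize(permissive)
```

Unlike MaxFreq, the choice made here for one state is independent of all other states, which is why it cannot adapt to the part of the tree that has already been built.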

5. Experiments

Most permissive controller Determinized controller
Case Study Lookup table CART LinSVM LogReg OC1 MaxFreq MaxFreqLC MinNorm MinNormLC
Single-input non-deterministic
cartpole (Jagtap et al., 2018) 271 127 126 100 92 6 7 56 39
2D Thermal (Girard, 2013) 40,311 14 14 8 12 5 4 8 4
helicopter (Jagtap et al., 2018) 280,539 3,174 2,895 1,877 115 134 677 526
cruise (Larsen et al., 2015) 295,615 494 543 392 374 2 2 282 197
dcdc (Rungger and Zamani, 2016) 593,089 136 140 70 90 5 5 11 11
Multi-input non-deterministic
10D Thermal (Jagtap and Zamani, 2017) 26,244 8,649 67 74 2,263 4 10 2,704 28
truck_trailer (Khaled and Zamani, 2019) 1,386,211 169,195 21,598 12,611 95,417 30,888
traffic (Swikir and Zamani, 2019) 16,639,662 6,287 4,477 98 80 690
Multi-input deterministic
vehicle (Rungger and Zamani, 2016) 48,018 6,619 6,592 5,195 4,886 n/a n/a n/a n/a
aircraft (Rungger et al., 2015) 2,135,056 456,929 407,523 n/a n/a n/a n/a
Table 1. Results of running the various methods on 10 different case studies. The ‘Lookup table’ column gives the size of the domain of the original controller. For all other columns, the number of decision paths in the constructed tree is indicated. The case studies are grouped by the number of control inputs, and the methods by whether they preserve non-determinism. A missing entry indicates that the computation did not finish within 3 hours; n/a indicates that the approach is not applicable (we cannot determinize, as the model is already deterministic).

All experiments were conducted on a server running on an Intel Xeon W-2123 processor with a clock speed of 3.60GHz and 64 GB RAM. We ran the unique-label approach with all 4 possible predicate classes (see Section 4.1): axis-aligned predicates (CART) (Breiman et al., 1984), oblique predicates with linear support-vector machines (LinSVM), logistic regression (LogReg), and the heuristic from (Murthy et al., 1993), called OC1. Note that all these resulting trees represent the maximally permissive controller for the finite abstraction. Additionally, on all the non-deterministic models we ran our novel determinization approach (see Section 4.2) with axis-aligned predicates (MaxFreq), and with oblique predicates (MaxFreqLC, where LC stands for linear classifier). For the results in Table 1, we used logistic regression as the linear classifier, because it reliably performed well. As a competitor for our determinization approach we use a-priori determinization with the minimum norm, again both with axis-aligned predicates (MinNorm) and with logistic regression for linear predicates (MinNormLC). Additionally, we compare to random a-priori determinization, to get an impression of cases where MinNorm would not be a natural choice but no better alternative is available. However, since the results are always worse, we only report the numbers in Appendix B. Since some of the algorithms rely on randomization, we ran all experiments three times and report the median.

We run the discussed algorithms on ten case studies, five of which are marked as multi-input, i.e. their control inputs are multi-dimensional vectors. All our algorithms work by giving each multi-dimensional control input a single action label, and then working on these labels as in the case of single-dimensional control inputs.

In order to compare the sizes of the representations of the controllers fairly, we compare them in two different ways. Firstly, the straightforward way is to compare the number of nodes used in the DT and the number of rows in the lookup table, which we do in Table 2 in Appendix B. However, a practically more relevant comparison should reflect the number of state symbols needed to capture the behaviour of the controller; these can also be directly related to memory requirements. To this end, in Table 1 for DTs we report the number of decision paths, as these induce a partitioning of the state space into symbolic states. For more information on this and an example, see Figure 2 and the discussion in Section 6.

Besides comparing DTs to the lookup tables, we also compare them to BDDs. However, BDDs do not directly correspond to the state symbols. Hence we refrain from the state-symbols comparison and do not report BDD sizes in Table 1, but only in Appendix B. There, we compare the number of nodes in the BDDs to the number of nodes (not decision paths) generated by our DT algorithms. The BDDs were generated using SCOTS for all models but the two from Uppaal Stratego, cruise and 2D Thermal; for these two, we used the dd and autoref Python libraries. The BDDs were minimized as much as possible by calling reordering heuristics until convergence. The results show that the DT algorithms which determinize or which do not use oblique predicates are more scalable, as they were able to compute the result for all case studies, while BDDs timed out on dcdc and traffic. Depending on the case study, BDDs are usually in the same order of magnitude as CART, sometimes better, sometimes worse. On the one hand, on 10D Thermal and truck_trailer, BDDs have an order of magnitude fewer nodes; on the other hand, CART is able to produce results for dcdc and traffic. Compared to MaxFreq, truck_trailer is the only exception, where the best BDD has a quarter of the size; on all other models, MaxFreq is at least one order of magnitude better.


(a) Decision tree representation

(b) Non-uniform quantizer as a coder on the sensor side
Symbol Input
1 -1.6
2 -3.7
3 3.9
4 3.6
5 -2.9
6 2.2
(c) Lookup table for the DT-based controller
Figure 2. End-to-end usage of a DT-based controller: First, a DT representation is synthesized with the help of dtControl (the result of running MaxFreq on cartpole is shown here). Then a non-uniform quantizer is implemented at the sensor side, which for each decision path (i.e. a region in the state space) sends a state symbol to the controller. At the controller, this symbol gives the actual control input. In this case, the information that needs to be sent over the sensor-controller channel is 3 bits per time unit. The theoretical lower bound on the data rate required to achieve invariance in this example is 1 bit per time unit (Tomar et al., 2017).

6. Discussion

Table 1 shows that DTs are smaller than the lookup tables in every case study. In the case of DTs exactly representing the most permissive controller, our linear-classifier-based algorithm, LogReg, generally performs better than the standard DT learning algorithm CART. An inspection of the trees showed that oblique splits indeed aid in this reduction. In order to save memory, however, our determinizing algorithms may be used. Here, MaxFreq and its linear classifier variant, MaxFreqLC, easily outperform all other discussed algorithms, returning trees which can be drawn on a single sheet of paper in most of our case studies! The controller produced by MaxFreq for the case study cartpole is depicted in Figure 2(a).

Apart from the compact representation of the controllers and efficient determinization, dtControl makes controllers more understandable. This helps in analysing the systems and the corresponding controllers; a few such analyses were mentioned for the temperature control example in the introduction. Another application is that dtControl learns how to efficiently partition the state space. In general, the tools synthesizing symbolic controllers use uniform partitioning, i.e. a uniform quantizer is used to discretize the state set. Therefore, they need a large number of symbols to represent the state set. dtControl aggregates state symbols where the same control input is admissible to reduce the number of symbols required. In other words, dtControl provides a scheme to design non-uniform quantizers (i.e., state encoders with non-uniform partitioning of the state set), illustrated in Figure 2(b).

The entries in Table 1 correspond to the necessary number of state symbols. For instance, consider the cartpole example in Table 1. The controller obtained using SCOTS  requires symbols to represent the domain of the controller, which implies that one needs to send 9 bits per time unit over the sensor-controller channel to achieve invariance. After processing the controller using dtControl  with MaxFreq, we only need 6 symbols to represent the controller, corresponding to only 3 bits information. One can directly relate this idea of constructing efficient static coders to the notion of invariance feedback entropy introduced in (Tomar et al., 2017). This notion characterizes the necessary state information required by any coder-controller to enforce the invariance condition in the closed loop. For example, in the case of cartpole, the theoretical lower-bound on average bit rate for any static coder-controller to achieve invariance is 1 (obtained through the invariance feedback entropy (Tomar et al., 2017)), which is not far from 3, computed using dtControl.
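The relation between the number of state symbols and the number of bits per time unit is a ceiling of a base-2 logarithm. The helper below (its name is chosen for this illustration) computes it with integer arithmetic and reproduces the cartpole figures quoted above.

```python
def bits_needed(num_symbols):
    """Smallest number of bits that can distinguish `num_symbols` symbols."""
    if num_symbols < 1:
        raise ValueError("at least one symbol is required")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (num_symbols - 1).bit_length()


lookup_table_bits = bits_needed(271)  # cartpole, lookup table from SCOTS
max_freq_bits = bits_needed(6)        # cartpole, decision paths of MaxFreq
```

With 271 symbols this gives 9 bits and with 6 symbols it gives 3 bits, matching the numbers in the text.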

In summary, one can use the results provided in this paper to construct efficient coder-controllers for invariance properties, which is an active topic in the domain of information-based control (Nair et al., 2007).

7. Conclusion

We presented dtControl, an open-source, easily extensible tool for post-processing controllers synthesized by various tools such as SCOTS and Uppaal Stratego into small, efficient and interpretable representations. The tool allows for a comparison between various representations in terms of size and performance, and it can export the controller both as a graphic and as code. We also presented a new determinization technique, MaxFreq, which easily converts non-deterministic controllers into extremely small deterministic decision trees. Further algorithms for controller representation were thoroughly evaluated and made accessible to the end-user. We believe these small representations will not only save memory but also help in understanding and validating the model. As for future work, dtControl can be extended with

  • further input and output formats, to also support tools such as pFaces (Khaled and Zamani, 2019) and QUEST (Jagtap and Zamani, 2017);

  • different predicates: this can be other, possibly even non-linear or non-binary, machine-learning classifiers or richer algebraic predicates utilizing domain knowledge;

  • other impurity measures instead of entropy, which decide the predicate used for the split.

This work was supported in part by the H2020 ERC Starting Grant AutoCPS (grant agreement no 804639), the German Research Foundation (DFG) through the grants ZA 873/1-1 and KR 4890/2-1 Statistical Unbounded Verification, and the TUM International Graduate School of Science and Engineering (IGSSE) grant 10.06 PARSEC.


  • P. Ashok, T. Brázdil, K. Chatterjee, J. Křetínský, C. H. Lampert, and V. Toman (2019a) Strategy representation by decision trees with linear classifiers. In QEST (1), pp. 109–128. Cited by: §1, §1, §4.1.2.
  • P. Ashok, J. Křetínský, K. G. Larsen, A. Le Coënt, J. H. Taankvist, and M. Weininger (2019b) SOS: safe, optimal and small strategies for hybrid markov decision processes. In QEST (1), D. Parker and V. Wolf (Eds.), pp. 147–164. Cited by: §1, §1, §1, §1.
  • R. I. Bahar, E. A. Frohm, C. M. Gaona, G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi (1997) Algebraic decision diagrams and their applications. Formal Methods in System Design 10 (2/3), pp. 171–206. Cited by: §1.
  • C. Belta, B. Yordanov, and E. A. Gol (2017) Formal methods for discrete-time dynamical systems. Vol. 89, Springer. Cited by: §1.
  • C. M. Bishop (2007) Pattern recognition and machine learning, 5th edition. Information science and statistics, Springer. Cited by: §4.1.3, §4.1.3.
  • T. Brázdil, K. Chatterjee, M. Chmelik, A. Fellner, and J. Kretínský (2015) Counterexample explanation by learning small strategies in markov decision processes. In CAV (1), Lecture Notes in Computer Science, Vol. 9206, pp. 158–177. Cited by: §1, §1.
  • T. Brázdil, K. Chatterjee, J. Kretínský, and V. Toman (2018) Strategy representation by decision trees in reactive synthesis. In TACAS (1), Lecture Notes in Computer Science, Vol. 10805, pp. 385–407. Cited by: §1, §1.
  • L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone (1984) Classification and regression trees. Wadsworth. Cited by: §1, §1, §4.1.1, §5.
  • R. E. Bryant (1986) Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers C-35 (8), pp. 677–691. Cited by: §1.
  • I. T. Christou and S. Efremidis (2007) An evolving oblique decision tree ensemble architecture for continuous learning applications. In AIAI, IFIP, Vol. 247, pp. 3–11. Cited by: §1.
  • A. David, P. G. Jensen, K. G. Larsen, M. Mikucionis, and J. H. Taankvist (2015) Uppaal stratego. In TACAS, Lecture Notes in Computer Science, Vol. 9035, pp. 206–211. Cited by: 1st item, §1, §4.2.
  • A. Girard (2013) Low-complexity quantized switching controllers using approximate bisimulation. Nonlinear Analysis: Hybrid Systems 10, pp. 34–44. Cited by: Table 2, §1, Table 1.
  • P. Jagtap, F. Abdi, M. Rungger, M. Zamani, and M. Caccamo (2018) Software fault tolerance for cyber-physical systems via full system restart. arXiv preprint arXiv:1812.03546. Cited by: Table 2, Table 1.
  • P. Jagtap and M. Zamani (2017) QUEST: a tool for state-space quantization-free synthesis of symbolic controllers. In International Conference on Quantitative Evaluation of Systems, pp. 309–313. Cited by: Table 2, §1, §1, Table 1, 1st item.
  • K. D. Julian, M. J. Kochenderfer, and M. P. Owen (2018) Deep neural network compression for aircraft collision avoidance systems. CoRR abs/1810.04240. Cited by: §1.
  • M. Khaled and M. Zamani (2019) pFaces: an acceleration ecosystem for symbolic control. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pp. 252–257. Cited by: Table 2, §1, Table 1, 1st item.
  • N. Landwehr, M. A. Hall, and E. Frank (2003) Logistic model trees. In ECML, Lecture Notes in Computer Science, Vol. 2837, pp. 241–252. Cited by: §1.
  • K. G. Larsen, A. Le Coënt, M. Mikucionis, and J. H. Taankvist (2018) Guaranteed control synthesis for continuous systems in uppaal tiga. In Cyber Physical Systems. Model-Based Design - 8th International Workshop, CyPhy 2018, and 14th International Workshop, WESE 2018, Turin, Italy, October 4-5, 2018, Revised Selected Papers, R. D. Chamberlain, W. Taha, and M. Törngren (Eds.), Lecture Notes in Computer Science, Vol. 11615, pp. 113–133. Cited by: §3.
  • K. G. Larsen, M. Mikucionis, and J. H. Taankvist (2015) Safe and optimal adaptive cruise control. In Correct System Design, Lecture Notes in Computer Science, Vol. 9360, pp. 260–277. Cited by: Table 2, §1, Table 1.
  • M. Mazo, A. Davitian, and P. Tabuada (2010) Pessoa: a tool for embedded controller synthesis. In International Conference on Computer Aided Verification, pp. 566–569. Cited by: §1, §1.
  • P. J. Meyer, M. Rungger, M. Luttenberger, J. Esparza, and M. Zamani (2017) Quantitative implementation strategies for safety controllers. arXiv preprint arXiv:1712.05278. Cited by: §1.
  • T. M. Mitchell (1997) Machine learning. McGraw Hill series in computer science, McGraw-Hill. Cited by: §1, §1.
  • S. Mouelhi, A. Girard, and G. Gössler (2013) CoSyMA: a tool for controller synthesis using multi-scale abstractions. In Proceedings of the 16th international conference on Hybrid systems: computation and control, pp. 83–88. Cited by: §1, §1.
  • S. K. Murthy, S. Kasif, S. Salzberg, and R. Beigel (1993) OC1: A randomized induction of oblique decision trees. In AAAI, pp. 322–327. Cited by: §1, §2, §4.1.2, §4.1.3, §5.
  • G. N. Nair, F. Fagnani, S. Zampieri, and R. J. Evans (2007) Feedback control under data rate constraints: an overview. Proc. of the IEEE 95 (1), pp. 108–137. Cited by: §6.
  • D. Neider, S. Saha, and P. Madhusudan (2016) Synthesizing piece-wise functions by learning classifiers. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 186–203. Cited by: §1.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830. Cited by: §2.
  • L. D. Pyeatt, A. E. Howe, et al. (2001) Decision tree function approximation in reinforcement learning. In Proceedings of the third international symposium on adaptive systems: evolutionary computation and probabilistic graphical models, Vol. 2, pp. 70–77. Cited by: §1.
  • J. R. Quinlan (1993) C4.5: programs for machine learning. Morgan Kaufmann. Cited by: §1, §4.1.1.
  • G. Reissig, A. Weber, and M. Rungger (2016) Feedback refinement relations for the synthesis of symbolic controllers. IEEE Transactions on Automatic Control 62 (4), pp. 1781–1796. Cited by: §3.
  • M. Rungger and M. Zamani (2016) SCOTS: A tool for the synthesis of symbolic controllers. In HSCC, pp. 99–104. Cited by: Table 2, 1st item, §1, Table 1.
  • M. Rungger, A. Weber, and G. Reissig (2015) State space grids for low complexity abstractions. In 2015 54th IEEE Conference on Decision and Control (CDC), pp. 6139–6146. Cited by: Table 2, Table 1.
  • A. Swikir and M. Zamani (2019) Compositional synthesis of symbolic models for networks of switched systems. IEEE Control Systems Letters 3 (4), pp. 1056–1061. Cited by: Table 2, Table 1.
  • P. Tabuada (2009) Verification and control of hybrid systems: a symbolic approach. Springer Science & Business Media. Cited by: §1, §3.
  • M. S. Tomar, M. Rungger, and M. Zamani (2017) Invariance feedback entropy of uncertain control systems. arXiv preprint arXiv:1706.05242. Cited by: Figure 2, §6.
  • P. E. Utgoff (1988) Perceptron trees: A case study in hybrid concept representations. In AAAI, pp. 601–606. Cited by: §1.
  • I. S. Zapreev, C. Verdier, and M. Mazo (2018) Optimal symbolic controllers determinization for BDD storage. In ADHS, Cited by: §1.
  • H. Zhang (2004) The optimality of naive bayes. In Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference, Miami Beach, Florida, USA, V. Barr and Z. Markov (Eds.), pp. 562–567. Cited by: §4.1.3.

Appendix A Output of dtControl for DT in Figure 1

The following is the C-code for the DT in Figure 1, and Figure 3 shows the corresponding DOT output.

if (x[1] <= 20.625) {
        if (x[4] <= 20.625) {
                result[0] = 1.0f;
                result[1] = 1.0f;
        }
        else {
                result[0] = 1.0f;
                result[1] = 0.0f;
        }
}
else {
        if (x[4] <= 20.625) {
                result[0] = 0.0f;
                result[1] = 1.0f;
        }
        else {
                result[0] = 0.0f;
                result[1] = 0.0f;
        }
}

Figure 3. The DOT output of dtControl for the DT in Figure 1, as displayed by Graphviz.
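To call such exported code from a program, the tree can be wrapped in a function. The following is only a sketch: the function name classify, its signature, and the array sizes (at least five entries in x, since the tree reads x[1] and x[4], and two entries in result) are assumptions and not necessarily what dtControl itself emits.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical wrapper around the exported tree.
 * x:      state vector with at least 5 entries (only x[1] and x[4] are read)
 * result: output vector with 2 entries, one per control input
 */
void classify(const float x[], float result[])
{
    assert(x != NULL && result != NULL);
    if (x[1] <= 20.625) {
        if (x[4] <= 20.625) {
            result[0] = 1.0f;
            result[1] = 1.0f;
        } else {
            result[0] = 1.0f;
            result[1] = 0.0f;
        }
    } else {
        if (x[4] <= 20.625) {
            result[0] = 0.0f;
            result[1] = 1.0f;
        } else {
            result[0] = 0.0f;
            result[1] = 0.0f;
        }
    }
}
```

For a hypothetical state with x[1] = 20.0 and x[4] = 21.0, the call writes (1.0, 0.0) into result.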

Appendix B Additional experimental results

In Table 2, we compare our algorithms as described in Section 5 to the size of BDDs representing the controllers and to the idea of randomly determinizing the controller before applying the DT algorithms. Unlike in Table 1, we report the full number of nodes, not the number of decision paths, to make the comparison to BDDs fairer. For clarity, we did not include all the algorithms from Table 1. However, if needed, one can compute the number of nodes for every algorithm by multiplying the number of decision paths in Table 1 by two and then subtracting one.
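This conversion rule follows from a standard counting argument, sketched here under the assumption that every decision node has exactly two children. Each decision path ends in one leaf, and a binary tree with p leaves has p - 1 decision nodes:

```latex
\[
  \#\text{nodes}
  \;=\; \underbrace{p}_{\text{leaves (decision paths)}}
  \;+\; \underbrace{(p - 1)}_{\text{decision nodes}}
  \;=\; 2p - 1 .
\]
```

For example, a tree with p = 6 decision paths would have 2 · 6 - 1 = 11 nodes, 5 of which are decision nodes.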