Classification of Time-Series Data Using Boosted Decision Trees

10/01/2021
by Erfan Aasi, et al.
Boston University
Lehigh University

Time-series data classification is central to the analysis and control of autonomous systems, such as robots and self-driving cars. Temporal logic-based learning algorithms have been proposed recently as classifiers of such data. However, current frameworks are either inaccurate for real-world applications, such as autonomous driving, or they generate long and complicated formulae that lack interpretability. To address these limitations, we introduce a novel learning method, called Boosted Concise Decision Trees (BCDTs), to generate binary classifiers that are represented as Signal Temporal Logic (STL) formulae. Our algorithm leverages an ensemble of Concise Decision Trees (CDTs) to improve the classification performance, where each CDT is a decision tree that is empowered by a set of techniques to generate simpler formulae and improve interpretability. The effectiveness and classification performance of our algorithm are evaluated on naval surveillance and urban-driving case studies.


I Introduction

To cope with the complexity of robotic tasks, machine learning (ML) techniques have been employed to capture their temporal and logical structure from time-series data. One of the main problems in ML is the two-class classification problem, where the goal is to build a classifier that distinguishes desired system behaviors from undesired ones. Traditional ML algorithms focus on building such classifiers; however, the resulting models are often hard to understand and offer little insight into the system. Motivated by the readability and interpretability of temporal logic formulas [6], there has been great interest in applying formal methods to ML in recent years [1, 11, 25, 21, 26, 14, 27].

Signal Temporal Logic (STL) [18] is a specification language used to express temporal properties of real-valued signals. In this paper, we use STL to generate specifications of time-series system behaviors. Early methods for mining temporal properties from data mostly focus on parameter synthesis, given template formulas [1, 12, 10, 2]. These works require the designer to have a good understanding of the system properties. In addition, learning algorithms based on formula templates may not derive new knowledge from the data. In [15], a general supervised learning framework that can infer both the structure and the parameters of a formula from data is presented. The approach is based on lattice search and parameter synthesis, which makes it general, but inefficient. An efficient decision tree-based framework to learn STL formulas is explored in [4, 3], where the nodes of the tree contain simple formulae that are tuned optimally from a predefined set of primitives. In [20], the authors propose a systematic enumeration-based method to learn short, interpretable STL formulas. Other works in the area of learning temporal logic formulae consider learning from positive examples only [11], clustering [25] (i.e., an unsupervised setting), active learning [17], and automata-based methods for untimed specifications [21, 26].

Most existing algorithms for learning STL formulas either do not achieve good classification performance for real-world applications, such as autonomous driving, or do not produce interpretable output: they generate long and complicated specifications. In this paper, to address these concerns, we introduce Boosted Concise Decision Trees (BCDTs) to learn STL formulas from labeled time-series data. To improve on the classification accuracy of existing works, we use a boosting method to combine multiple models with weak classification power. The weak learning models are bounded-depth decision trees, called Concise Decision Trees (CDTs). Each CDT is a Decision Tree (DT) [5], empowered by a set of techniques, called conciseness techniques, to generate simpler formulae and improve the interpretability of the final output. We also use a heuristic method in the BCDT algorithm to prune the ensemble of trees, which further helps with the interpretability of the formulae. To relate STL and BCDTs, we establish a connection between boosted trees and weighted STL (wSTL) formulas [19], which have weights associated with Boolean and temporal operators. We show performance gains and improved interpretability of our method compared to existing works in naval surveillance and urban-driving scenarios.

The main contributions of the paper are: (a) a novel inference algorithm based on boosted decision trees, which has better classification performance than related approaches, (b) a set of heuristic techniques to generate simple STL formulae from decision trees that improve interpretability, (c) two case studies in naval surveillance and urban-driving that highlight the classification performance and interpretability of our proposed learning algorithm.

II Preliminaries

Let $\mathbb{R}$, $\mathbb{Z}$, and $\mathbb{Z}_{\geq 0}$ denote the sets of real, integer, and non-negative integer numbers, respectively. With a slight abuse of notation, given $a, b \in \mathbb{Z}$, we use $[a, b] = \{t \in \mathbb{Z} \mid a \leq t \leq b\}$. The cardinality of a set $S$ is denoted by $|S|$. A (discrete-time) signal $s$ is a function $s : [0, T] \to \mathbb{R}^n$ that maps each (discrete) time point $t \in [0, T]$ to an $n$-dimensional vector $s(t)$ of real values, where $T \in \mathbb{Z}_{\geq 0}$. Each component of $s$ is denoted as $s_j$, $j \in [1, n]$.

Signal Temporal Logic (STL) was introduced in [18]. Informally, the STL formulas used in this paper are made of predicates defined over components of real-valued signals in the form $s_j \sim \pi$, where $\pi \in \mathbb{R}$ is a threshold and $\sim \in \{\leq, \geq\}$, which are connected using Boolean operators, such as $\neg$ (negation), $\wedge$ (conjunction), $\vee$ (disjunction), and temporal operators, such as $G_{[a,b]}$ (always) and $F_{[a,b]}$ (eventually). The semantics are defined over signals. For example, formula $G_{[3,6]}(s_1 \leq 1)$ means that, for all times 3, 4, 5, 6, component $s_1$ of a signal is less than or equal to 1. STL has both qualitative and quantitative semantics. We use $s \models \phi$ to denote Boolean satisfaction. The quantitative semantics is given by a robustness degree (function) [7] $\rho(s, \phi)$, which captures the degree of satisfaction of a formula $\phi$ by a signal $s$. Positive robustness ($\rho(s, \phi) > 0$) implies Boolean satisfaction $s \models \phi$, while negative robustness ($\rho(s, \phi) < 0$) implies violation $s \not\models \phi$.
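To make the quantitative semantics concrete, the following minimal Python sketch (our illustration, not code from the paper) computes the discrete-time robustness of a threshold predicate and of the always and eventually operators over a single signal component; the function names and signal layout are assumptions of this sketch.

import numpy as np

def rho_pred(s, t, pi, leq=True):
    # Robustness of s[t] <= pi is pi - s[t]; flipped for s[t] >= pi.
    return pi - s[t] if leq else s[t] - pi

def rho_always(s, a, b, pi, leq=True):
    # G_[a,b]: worst-case (minimum) predicate robustness over the interval.
    return min(rho_pred(s, t, pi, leq) for t in range(a, b + 1))

def rho_eventually(s, a, b, pi, leq=True):
    # F_[a,b]: best-case (maximum) predicate robustness over the interval.
    return max(rho_pred(s, t, pi, leq) for t in range(a, b + 1))

s1 = np.array([0.2, 0.5, 0.9, 0.8, 0.7, 0.4, 0.3])   # one signal component
print(rho_always(s1, 3, 6, 1.0))   # 0.2 > 0, so s satisfies G_[3,6](s_1 <= 1)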

Weighted STL (wSTL) [19] is an extension of STL that has the same qualitative semantics as STL, but has weights associated with the Boolean and temporal operators, which modulate its robustness degree. In this paper, we restrict our attention to a fragment of wSTL with weights on conjunctions only. For example, the wSTL formula $\phi_1 \wedge^{w} \phi_2$, with $w = (w_1, w_2)$, denotes that $\phi_1$ and $\phi_2$ must hold with priorities $w_1$ and $w_2$, respectively. The priorities capture the satisfaction importance of their corresponding formulas.

Parametric STL (PSTL) [1] is an extension of STL, where the endpoints of the time intervals in the temporal operators and the thresholds in the predicates are parameters. The set of all possible valuations of all parameters in a PSTL formula $\psi$ is called the parameter space and is denoted by $\Theta$. A particular valuation is denoted by $\theta \in \Theta$ and the corresponding STL formula by $\psi_\theta$.

III Problem Formulation

III-A Motivating Example

Consider the maritime surveillance scenario from [15, 3] (see Fig. 1). The goal is to detect anomalous vessel behaviors by looking at their trajectories. A vessel behaving normally approaches from the open sea and heads directly towards the harbor, while a vessel with anomalous behaviors either veers to the island and then heads to the harbor, or it approaches other vessels in the passage between the peninsula and the island and then returns to the open sea.

In the scenario's dataset [3], the signals are represented as 2-dimensional trajectories with planar coordinates $x$ and $y$. The labels indicate the type of a vessel's behavior (normal or anomalous). In Fig. 1(b) and Fig. 1(c), we show the $x$ and $y$ components of some signals, respectively, over time. For better visualization, we show the signals over a part of their time horizon. In Fig. 1(b), one of the areas that distinguishes between positive and negative signals is the area between two threshold lines on the $x$ component, over a common time interval. By using the restricted STL from [3], a conjunction of two single-predicate temporal formulas can be used to describe this area and distinguish between positive and negative signals. This can obviously be simplified to a single temporal formula over the conjunction of the two predicates.

Fig. 1: (a) Naval surveillance scenario [15], where normal trajectories are shown in green and anomalous trajectories in blue and magenta; (b) $x$ and (c) $y$ components of naval trajectories. The green and red trajectories belong to the normal and anomalous behaviors, respectively.

Similarly, in Fig. 1(c), we can describe the separation area between two threshold lines on the $y$ component by an STL formula in the restricted STL from [3], which can be simplified in the same way. Considering the common time interval between the separation areas in Fig. 1(b) and Fig. 1(c), we can further combine the two simplified formulas into a single shorter, easier-to-read formula over both the $x$ and $y$ components. As will be shown next, in this paper we use such formulas, which are simpler than the ones in [3], to classify signals without losing classification accuracy.

III-B Problem Statement

Let $C = \{C_p, C_n\}$ be the set of possible (positive and negative) classes. We consider a labeled data set with $N$ data samples $S = \{(s^i, \ell^i)\}_{i=1}^{N}$, where $s^i$ is the $i$-th signal and $\ell^i \in C$ is its label.

Problem 1

Given a labeled data set $S = \{(s^i, \ell^i)\}_{i=1}^{N}$, find an STL formula $\phi$ that minimizes the Misclassification Rate (MCR) defined below:

$MCR(\phi) = \frac{1}{N}\,\big|\{\, s^i : (s^i \models \phi \wedge \ell^i = C_n) \vee (s^i \not\models \phi \wedge \ell^i = C_p) \,\}\big|$    (1)
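For concreteness, Eq. (1) can be evaluated as in the short sketch below (our own convention, not the authors' code); `robustness` is an assumed placeholder for any robustness implementation, such as the one sketched in Sec. II, and labels are encoded as +1 for $C_p$ and -1 for $C_n$.

def mcr(signals, labels, robustness):
    # Fraction of signals whose predicted class (positive iff the robustness
    # of the candidate formula is >= 0) disagrees with their label.
    wrong = sum(1 for s, l in zip(signals, labels)
                if (robustness(s) >= 0) != (l == +1))
    return wrong / len(signals)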

IV Solution

We propose a solution to Pb. 1 based on BCDTs (Alg. 1). Our algorithm grows multiple binary CDTs based on AdaBoost [9], which combines weak classifiers with simple formulae, trained on weighted data samples. The weights of the data samples represent the difficulty of classifying them correctly. After training a weak classifier, the weights of correctly classified samples are decreased and the weights of misclassified samples are increased. In Sec. IV-A and IV-B, we introduce the construction methods for BCDTs and a single CDT, respectively. We describe the methods' meta-parameters in Sec. IV-C, while in Sec. IV-D we explain the conciseness techniques and their connection with interpretability. In Sec. IV-E, we describe the translation of BCDTs to STL formulas.

IV-A Boosted Concise Decision Trees Algorithm

The BCDT algorithm in Alg. 1 is based on the AdaBoost method [24]. The algorithm takes as input the labeled data set $S$, the number of learners (trees) $K$, and the weak learning model $WL$, which is the algorithm used to construct CDTs (explained in Alg. 3). The CDTs are binary decision trees, where the formulas of the nodes are primitives (see Sec. IV-C) with general rectangular predicates of the form $A\,s(t) \leq b$, with $b \in \mathbb{R}^n$, $A$ as the identity matrix, and $s(t)$ the signal value at time $t$, i.e., axis-aligned threshold constraints on the signal components.

1: Input: labeled data set S, number of trees K, weak learning model WL
2: Output: final classifier f
3: Initialize: w_i = 1/N, i = 1, ..., N
4: for k = 1, ..., K:
5:     T_k <- WL(S, w)
6:     err_k <- Σ_{i : T_k(s^i) ≠ ℓ^i} w_i
7:     α_k <- (1/2) ln((1 - err_k)/err_k) if 0 < err_k < 1/2;  α_k <- α_max if err_k = 0
8:     w_i <- w_i · exp(-α_k ℓ^i T_k(s^i)), then normalize w
9: f <- weighted majority vote over {(T_k, α_k)} if all α_k < α_max; otherwise the simplest α_max-weight tree T*
10: return f
Algorithm 1 Boosted Concise Decision Trees (BCDT)

In Alg. 1, initially all data samples are weighted equally (line 3). The algorithm iterates over the number of trees (line 4). At each iteration, the weak learning algorithm constructs a single CDT based on the data set $S$ and the current samples' weights (line 5). Next, the misclassification error $err_k$ of the constructed tree is computed (line 6). If the current tree has weak classification performance but better than random guessing ($0 < err_k < 1/2$), its weight is computed based on the original AdaBoost method; if it has perfect classification performance, i.e., it classifies all signals correctly ($err_k = 0$), a big value $\alpha_{max}$ is assigned to its weight (line 7). At the end of each iteration, the samples' weights are updated and normalized based on the performance of the current tree (line 8). To compute the final output of the algorithm, we use a heuristic method to prune the ensemble of trees, which generates simpler formulae and improves interpretability. Inspired by heuristic methods for pruning ensembles of decision trees in [5, 16], we compute the final output as follows (line 9): if the weights of all trees are less than $\alpha_{max}$, the final output is the weighted majority vote over all the CDTs (as in the AdaBoost method); otherwise, if there are one or more trees with weight $\alpha_{max}$, the final output is given by the $\alpha_{max}$-weight tree that has the simplest STL formula, denoted by $T^*$. As a metric to compare the simplicity of formulas, we use the number of Boolean and temporal operators. This pruning method helps reduce the generalization error in the test phase and produces simpler formulas. We show its advantages with empirical results in Sec. V.

The final output assigns a label to each data sample. For simplicity, we abuse notation and identify $C_p$ with $+1$ and $C_n$ with $-1$, so that $T_k(s^i) \in \{-1, +1\}$ for all $i$. Note that one of the main assumptions in boosting methods is that each weak learner performs slightly better than random guessing (i.e., coin tossing). Therefore, in Alg. 1, if any newly generated tree performs worse than random guessing ($err_k \geq 1/2$), we simply discard it and generate another tree. An illustration of Alg. 1 is shown in Fig. 3.
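The Python sketch below mirrors the boosting loop and the pruning heuristic of Alg. 1. It is a minimal illustration under stated assumptions, not the authors' implementation: `learn_cdt` and `formula_size` are hypothetical interfaces, and the sketch stops early at the first perfectly classifying tree.

import math

ALPHA_MAX = 1e6  # stand-in for the "big value" given to perfectly classifying trees

def bcdt(signals, labels, learn_cdt, K):
    # learn_cdt(signals, labels, weights) -> tree with .predict(s) in {+1, -1}
    # and .formula_size() counting Boolean/temporal operators (assumed API).
    N = len(signals)
    w = [1.0 / N] * N                                   # line 3: uniform weights
    trees, alphas = [], []
    while len(trees) < K:                               # line 4
        tree = learn_cdt(signals, labels, w)            # line 5
        preds = [tree.predict(s) for s in signals]
        err = sum(wi for wi, p, l in zip(w, preds, labels) if p != l)  # line 6
        if err >= 0.5:
            continue              # worse than random guessing: discard the tree
        if err == 0:              # perfect tree: give it the big weight, stop early
            trees.append(tree); alphas.append(ALPHA_MAX)
            break
        alpha = 0.5 * math.log((1 - err) / err)         # line 7
        trees.append(tree); alphas.append(alpha)
        w = [wi * math.exp(-alpha * p * l)              # line 8: update weights...
             for wi, p, l in zip(w, preds, labels)]
        z = sum(w)
        w = [wi / z for wi in w]                        # ...and normalize them
    perfect = [t for t, a in zip(trees, alphas) if a >= ALPHA_MAX]
    if perfect:                   # line 9: pruning heuristic
        best = min(perfect, key=lambda t: t.formula_size())
        return lambda s: best.predict(s)
    return lambda s: 1 if sum(a * t.predict(s)
                              for t, a in zip(trees, alphas)) >= 0 else -1

The returned object is a plain callable classifier; in the paper, the ensemble is instead translated to a wSTL formula (Sec. IV-E).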

IV-B Construction of a Concise Decision Tree

Decision Trees (DTs) [5, 22] are sequential decision models with hierarchical structures. In our algorithm, DTs operate on signals with the goal of predicting their labels. Inspired by [3], we present the Concise Decision Tree (CDT) method in Alg. 3, which extends the DT construction algorithm by applying conciseness techniques (see Sec. IV-D). To limit the complexity of CDTs, we consider three meta-parameters in Alg. 3: (1) PSTL primitives, capturing the possible ways to split the data at each node; (2) impurity measures, to select the best primitive at each node; and (3) stop conditions, to limit the CDTs' growth. The meta-parameters are explained in Sec. IV-C.

To explain Alg. 3, we first introduce the parameterized primitive optimization method presented in Alg. 2. This method has the same meta-parameters as Alg. 3 and takes as input (1) the set of labeled signals $S$ at the current node, (2) the path formula $\phi_{path}$ from the root to the current node, (3) a set of input primitives prim, and (4) the depth $h$ from the root to the node. In line 4, if the stop conditions are satisfied, a label is computed according to the best classification quality under the impurity measure defined in Sec. IV-C2 (we identify the labels $C_p$ and $C_n$ with $+1$ and $-1$, respectively). Otherwise, the best primitive from the input primitive set is computed based on the impurity measure in Sec. IV-C2. Within Alg. 3, we use Alg. 2 to find the best primitive, with its optimal valuation, at each node from the input primitive set prim.

1: Meta-Parameters: PSTL primitives, impurity measure, stop conditions
2: Input: signals S, path formula φ_path, primitive set prim, depth h
3: Output: optimal primitive φ* (or a leaf label c*)
4: if stop(S, φ_path, h) then
5:     c* <- label in {+1, -1} with the best classification quality on S (Sec. IV-C2)
6: else
7:     φ* <- primitive in prim, with optimal valuation θ ∈ Θ, maximizing the impurity measure given S and φ_path
8: return φ* (or c*)
Algorithm 2 Parameterized Primitive Optimization

Alg. 3 is recursive, and takes as input (1) the set of labeled signals $S$ at the current node, referred to as the parent node, (2) the path formula $\phi_{path}$ from the root to the parent node, (3) the depth $h$ from the root to the node, and (4) the candidate formula $\phi_c$ for the node. The construction of each CDT starts at the root with $\phi_{path} = \top$, $h = 0$, and the candidate formula returned by Alg. 2 on the full data set.

1: Meta-Parameters: PSTL primitives, impurity measure, stop conditions
2: Input: signals S, path formula φ_path, depth h, candidate formula φ_c
3: Output: sub-tree T
4: if stop(S, φ_path, h) then
5:     c* <- leaf label computed by Alg. 2
6:     return leaf(c*)
7: create a non-terminal node associated with φ_c
8: φ <- φ_path ∧ φ_c
9: (S_⊤, S_⊥) <- partition of S into signals satisfying / violating φ
10: for each child set S_child in (S_⊤, S_⊥) do
11:     φ_child <- optimal primitive for S_child given φ, from Alg. 2
12:     φ_new <- Conciseness(φ_c, φ_child, S, φ_path, h)   (Alg. 4)
13:     if φ_new has a better impurity measure than φ_c:
14:         return CDT(S, φ_path, h, φ_new)
15: T_⊤ <- CDT(S_⊤, φ, h + 1, candidate primitive of the left child)
16: T_⊥ <- CDT(S_⊥, φ, h + 1, candidate primitive of the right child)
17: return the sub-tree rooted at the node, with branches T_⊤ and T_⊥
Algorithm 3 Concise Decision Tree (CDT) method

At the start of Alg. 3, the stop conditions are checked (line 4). If they are satisfied, a single leaf is returned, marked with the label computed according to Alg. 2 (lines 5-6). Otherwise, a non-terminal node associated with the candidate formula $\phi_c$ is created (line 7). The formula $\phi$ is the updated path formula from the root, taking into account the candidate primitive of the parent node (line 8). Next, the data set is partitioned according to the new formula (line 9), where $S_\top$ and $S_\bot$ are the sets of signals that satisfy and violate $\phi$, respectively.

Following the structure of the tree, first for the left child of the node ($S_\top$) and then for the right child ($S_\bot$), we follow these steps (line 10): first, the candidate primitive $\phi_{child}$ for the child is computed from the corresponding set (line 11), based on Alg. 2. Then, by applying the conciseness method (explained in Sec. IV-D) to the combination of the parent's candidate formula $\phi_c$ and the child's candidate primitive $\phi_{child}$, we find a new formula $\phi_{new}$ (line 12) as a new candidate for the parent node. If the impurity measure of the new candidate formula is better than that of the previous candidate (line 13), the algorithm is repeated for the current node with $\phi_c$ replaced by $\phi_{new}$ (line 14). The decision tree method in [3] is based on incremental impurity reduction at each node of the tree. Following the same idea, we argue that whenever applying the conciseness techniques at a node yields a new candidate formula with better impurity reduction than the previous one, the new candidate leads to a stronger classifier with a simpler final specification. Finally, we continue the construction of the tree for the left and right children (lines 15-16), and the sub-tree for the parent node is returned (line 17).
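The control flow of Alg. 3 can be summarized by the schematic Python recursion below. It is a sketch: every helper, including `stop`, `leaf_label`, `best_primitive` (standing in for Alg. 2), `combine` (standing in for Alg. 4), `impurity_gain`, and `partition`, is an assumed placeholder, and formulas are encoded as nested tuples.

def cdt(S, path_formula, depth, candidate,
        stop, leaf_label, best_primitive, combine, impurity_gain, partition):
    if stop(S, depth):                                     # lines 4-6
        return ("leaf", leaf_label(S))
    phi = ("and", path_formula, candidate)                 # line 8
    S_sat, S_viol = partition(S, phi)                      # line 9
    child_prims = []
    for S_child in (S_sat, S_viol):                        # line 10
        child = best_primitive(S_child, phi, depth + 1)    # line 11 (Alg. 2)
        new_cand = combine(candidate, child, S, path_formula, depth)  # line 12 (Alg. 4)
        if impurity_gain(S, path_formula, new_cand) > \
           impurity_gain(S, path_formula, candidate):      # line 13
            return cdt(S, path_formula, depth, new_cand, stop, leaf_label,
                       best_primitive, combine, impurity_gain, partition)  # line 14
        child_prims.append(child)
    left = cdt(S_sat, phi, depth + 1, child_prims[0], stop, leaf_label,
               best_primitive, combine, impurity_gain, partition)   # line 15
    right = cdt(S_viol, phi, depth + 1, child_prims[1], stop, leaf_label,
                best_primitive, combine, impurity_gain, partition)  # line 16
    return ("node", candidate, left, right)                # line 17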

IV-C Meta-Parameters

IV-C1 PSTL primitives

The splitting rules at each node are simple PSTL formulas, called primitives [3]. Here we use the first-order primitives from [3]: $G_{[t_1,t_2]}(s_j \sim \pi)$ or $F_{[t_1,t_2]}(s_j \sim \pi)$, with $\sim \in \{\leq, \geq\}$, where the decision parameters are $t_1$, $t_2$, and $\pi$.
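For illustration (the class layout below is ours, not the paper's), a first-order primitive and one of its valuations might be represented as:

from dataclasses import dataclass

@dataclass
class FirstOrderPrimitive:
    op: str          # "G" (always) or "F" (eventually)
    j: int           # index of the signal component s_j
    leq: bool        # True for <=, False for >=
    t1: int = 0      # interval start  (decision parameter)
    t2: int = 0      # interval end    (decision parameter)
    pi: float = 0.0  # threshold       (decision parameter)

# One valuation theta = (t1, t2, pi): the formula G_[3,6](s_1 <= 1).
phi = FirstOrderPrimitive(op="G", j=1, leq=True, t1=3, t2=6, pi=1.0)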

IV-C2 Impurity measure

We use the Misclassification Gain (MG) impurity measure [5] as a criterion to select the best primitive at each node. Given a finite set of signals $S$, an STL formula $\phi$, and the subsets $S_\top$ and $S_\bot$ of $S$ partitioned based on satisfaction and violation of $\phi$, we have

$MG(S, \phi) = MR(S) - \sum_{\otimes \in \{\top, \bot\}} p_\otimes \, MR(S_\otimes)$,

where $MR(\cdot)$ denotes the misclassification rate of the best constant label on a set, and the parameters $p_\top, p_\bot$ are partition weights computed based on the signals' labels and their satisfaction of $\phi$. Here, we extend the robustness-based impurity measures in [3] to account for the sample weights $w_i$ from the BCDT in Alg. 1. The boosted impurity measures are defined by the partition weights

$p_\otimes = \sum_{i \,:\, s^i \in S_\otimes} w_i \Big/ \sum_{i \,:\, s^i \in S} w_i, \quad \otimes \in \{\top, \bot\}$    (2)

This formulation also works for other types of impurity measures, such as information and Gini gains [23].
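One plausible weight-aware instantiation of this measure, consistent with the description above (the paper's exact partition weights in Eq. (2) may differ), is sketched below.

def weighted_mr(labels, weights):
    # Weighted misclassification rate of the best constant label on a set.
    pos = sum(w for l, w in zip(labels, weights) if l == +1)
    tot = sum(weights)
    return min(pos, tot - pos) / tot if tot > 0 else 0.0

def misclassification_gain(labels, weights, satisfies):
    # satisfies[i] is True iff signal i satisfies the candidate formula;
    # partition weights are the normalized sums of boosting weights.
    top = [(l, w) for l, w, sat in zip(labels, weights, satisfies) if sat]
    bot = [(l, w) for l, w, sat in zip(labels, weights, satisfies) if not sat]
    tot = sum(weights)
    gain = weighted_mr(labels, weights)
    for part in (top, bot):
        if part:
            ls, ws = zip(*part)
            gain -= (sum(ws) / tot) * weighted_mr(ls, ws)
    return gain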

IV-C3 Stop Conditions

There are multiple stopping conditions that can be considered for terminating Alg. 3. We stop the growth of the trees either when they reach a given depth, or when the majority of the signals at a node belong to the same class.
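A sketch of these two conditions (with illustrative parameter values of our choosing; a high purity threshold stands in for "the majority of the signals belong to the same class"):

def stop(S, depth, max_depth=3, purity=0.99):
    # S is assumed to be a list of (signal, label) pairs with labels in {+1, -1}.
    frac_pos = sum(1 for _, label in S if label == +1) / len(S)
    return depth >= max_depth or max(frac_pos, 1 - frac_pos) >= purity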

IV-D Conciseness

We propose the conciseness method, presented in Alg. 4, to improve the simplicity and interpretability of STL formulas. This algorithm takes as input the candidate primitive $\phi_c$ of the parent node, the candidate primitive $\phi_{child}$ of one of its children (either left or right), the set of signals $S$, the path formula $\phi_{path}$, and the depth $h$ at the parent node. The output of the algorithm is a new candidate primitive for the parent node, denoted by $\phi_{new}$.

First, the method constructs a new PSTL primitive for the parent node, denoted by $\psi$, by combining the candidate primitives of the parent and the child nodes (line 3), where the combination operator is denoted by $\otimes$. This is done by considering the possible ways of combining two candidate primitives, which are explained below. Then, the optimal valuation of the new PSTL primitive is computed by applying the optimization method in Alg. 2 (line 4).

1: Input: parent candidate φ_c, child candidate φ_child, signals S, path formula φ_path, depth h
2: Output: new candidate primitive φ_new
3: ψ <- φ_c ⊗ φ_child
4: φ_new <- optimal valuation of ψ computed by Alg. 2
5: return φ_new
Algorithm 4 Conciseness Method

Next, we present heuristic techniques to combine two primitives and generate shorter PSTL formulae:

IV-D1 Combination of Always operators

If the candidate primitive of the parent node is $G_{[a_1,b_1]}(p_1)$ and the candidate primitive of its child is $G_{[a_2,b_2]}(p_2)$, we construct the new PSTL primitive $G_{[a,b]}(p_1 \wedge p_2)$ for their combination, where the interval endpoints $a$ and $b$ are free parameters to be optimized by Alg. 2. For example, given $G_{[a_1,b_1]}(x \leq c_1)$ and $G_{[a_2,b_2]}(y \leq c_2)$, the combined PSTL primitive is $G_{[a,b]}((x \leq c_1) \wedge (y \leq c_2))$.

IV-D2 Combination of Eventually operators

Similar to the combination of always operators, if the candidate primitive of the parent node is $F_{[a_1,b_1]}(p_1)$ and the candidate primitive of its child is $F_{[a_2,b_2]}(p_2)$, we construct the new PSTL primitive $F_{[a,b]}(p_1 \wedge p_2)$.
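Both rules can be sketched as a single combination step (our tuple encoding; the re-optimization of the free interval endpoints is handed off to Alg. 2):

def combine_primitives(parent, child):
    # A primitive is (op, interval, predicate) with op in {"G", "F"}.
    op_p, _, pred_p = parent
    op_c, _, pred_c = child
    if op_p == op_c and op_p in ("G", "F"):
        # G_[a1,b1](p1) and G_[a2,b2](p2) -> G_[a,b](p1 and p2); same for F.
        # The new endpoints (a, b) are left as free PSTL parameters.
        return (op_p, ("a", "b"), ("and", pred_p, pred_c))
    return None  # no combination rule applies

print(combine_primitives(("G", (0, 20), "x<=c1"), ("G", (10, 30), "y<=c2")))
# ('G', ('a', 'b'), ('and', 'x<=c1', 'y<=c2'))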

In Fig. 2, we provide an example of how Alg. 4 works.

Fig. 2: Example of applying the conciseness method during the construction of a CDT for the naval surveillance data set. On the left, for the parent node 1, the candidate primitive with respect to the available signals is $\phi_1$. After partitioning the signals based on $\phi_1$, the candidate primitive for the left child (node 2) is computed as $\phi_2$. Considering the combination of $\phi_1$ and $\phi_2$, the conciseness method constructs the new candidate PSTL primitive $\psi = \phi_1 \otimes \phi_2$, whose optimal valuation according to the impurity measure in Sec. IV-C2 is $\phi_{new}$. Due to the higher impurity reduction of $\phi_{new}$ compared to $\phi_1$, $\phi_{new}$ is chosen as the new candidate primitive for the parent node 1 on the right. We then recompute the partitioning of the signals in $S$ according to $\phi_{new}$, as well as the candidate primitives for the left and right children at nodes 2 and 3. Based on the conciseness techniques in Sec. IV-D, no further combination of the candidate primitive of parent node 1 with those of its children is possible, so we continue the CDT construction for the left and right children.

Remark: In this paper we consider the techniques mentioned above to generate shorter formulas. However, there are other ways to combine primitives and improve the interpretability and expressivity of formulas; for example, the candidate primitive of a child node can be nested inside the temporal operator of its parent's candidate primitive, yielding patterns such as eventually-always formulas. We will investigate other ways of combining primitives in future work.

IV-E Decision trees to formulas

We use the method from [3] to convert a CDT to an STL formula. The algorithm is invoked from the root, and builds the formula that captures all the branches that end in leaves marked $C_p$. The BCDT method returns a set of formulas $\{\phi_k\}_{k=1}^{K}$ and associated weights $\{\alpha_k\}_{k=1}^{K}$. The STL formula $\bigwedge_{k=1}^{K} \phi_k$ is the overall output formula. However, using wSTL [19], we can also encode the tree weights and express $\phi = \bigwedge^{\alpha_k}_{k} \phi_k$ (see Fig. 3).
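The conversion can be sketched as below (our illustration of the method from [3] as described here, reusing the tuple encoding from the Sec. IV-B sketch): each branch ending in a $C_p$ leaf contributes the conjunction of the node formulas along it, negated on violating edges, and the branches are joined by disjunction.

def positive_paths(node, path=()):
    # Collect one conjunction per branch that ends in a +1 (C_p) leaf.
    if node[0] == "leaf":
        if node[1] == +1 and path:
            conj = path[0]
            for f in path[1:]:
                conj = ("and", conj, f)
            return [conj]
        return []
    _, phi, sat_child, viol_child = node      # ("node", phi, left, right)
    return (positive_paths(sat_child, path + (phi,)) +
            positive_paths(viol_child, path + (("not", phi),)))

def tree_to_formula(root):
    # Assumes at least one positive leaf below a non-trivial root.
    disjuncts = positive_paths(root)
    formula = disjuncts[0]
    for d in disjuncts[1:]:
        formula = ("or", formula, d)
    return formula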

Fig. 3: Illustration of the BCDT Alg. 1. The CDTs and their weights are used in the construction of the final classifier in Alg. 1 and of its corresponding wSTL formula in Sec. IV-E.

V Case Studies

We demonstrate the effectiveness and computational advantages of our method with two case studies. The first is the naval surveillance scenario from Sec. III-A. The second is an urban-driving scenario, implemented in the CARLA simulator [8]. We use the Particle Swarm Optimization (PSO) method [13] for solving the optimization problems in Alg. 2; the parameters of the PSO method are tuned empirically. We run the case studies on a standard desktop computer.
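For completeness, a minimal from-scratch PSO of the kind that could solve the parameter-valuation problems in Alg. 2 is sketched below (the coefficients and defaults are illustrative, not the authors' tuned values); it maximizes an objective over a box-shaped parameter space $\Theta$.

import numpy as np

def pso(objective, lower, upper, n_particles=30, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    # Maximize objective(theta) for theta in the box [lower, upper].
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))  # positions
    v = np.zeros_like(x)                                           # velocities
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    g = pbest[np.argmax(pbest_val)].copy()                         # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([objective(p) for p in x])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[np.argmax(pbest_val)].copy()
    return g, pbest_val.max()

# Example use: tune theta = (t1, t2, pi) of G_[t1,t2](s_j <= pi) by maximizing
# an impurity-gain objective over Theta = [0, T] x [0, T] x [min s_j, max s_j].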

V-A Naval Surveillance

We compare our inference algorithm with the methods from [3] (the DTL4STL tool) and [20]. The dataset is composed of 2000 signals, with 1000 normal and 1000 anomalous trajectories. Each signal has 61 timepoints. See Fig. 4 for some example trajectories. We test our algorithm with 5-fold cross-validation and a maximum tree depth of 3 (as in [3]). The results are provided in Table I for different values of $K$ in Alg. 1: TR-M and TR-S are the mean and standard deviation of the MCR in the training phase, respectively; TE-M and TE-S are the mean and standard deviation of the MCR in the test phase; R is the runtime; and m is the number of times that applying the conciseness method during the construction of the CDTs produced a simpler formula.

With $K = 3$, we find a set of concise trees that are able to classify all signals correctly in the test phase. As an example from one of the folds, the learned wSTL formula is a weighted conjunction of the three trees' formulas; by applying the heuristic idea explained in Alg. 1, the final output of the BCDT algorithm is computed as the formula of the simplest perfectly classifying tree, a short STL formula over the $x$ and $y$ components.

K    TR-M (%)   TR-S (%)   TE-M (%)   TE-S (%)   R         m
1    0.36       0.35       0.95       0.97       11m 8s    4
2    0.34       0.21       0.55       0.33       30m 47s   14
3    0.01       0.02       0.0        0.0        33m 16s   10
4    0.05       0.1        0.1        0.12       61m 33s   29
TABLE I: Results of the BCDT algorithm on the naval surveillance data set for different values of K.
Fig. 4: Examples of trajectories from the naval surveillance case study. The green and red trajectories belong to normal and anomalous behaviors, respectively. For the learned formula, the thresholds of the always and eventually operators are shown by solid and dashed black lines, respectively.

In [3], using first-order primitives and a maximum tree depth of 3, the authors report an MCR with mean 1.3 and standard deviation 0.28 for this data set. To provide a fair comparison, we re-ran the algorithm from [3] on the same computer and the same data set used for our algorithm; the total runtime was 33 seconds, and the formulas learned in the folds are considerably longer than ours. Compared to the method from [3], our algorithm thus obtains better classification performance, in addition to simpler and more interpretable formulas. In [20], the authors obtain a higher test-phase MCR mean, with a total runtime of 45 minutes. From the interpretability point of view, the formulas learned by our algorithm and by [20] are both simple and easy to interpret, but our algorithm has better classification performance.

V-B Urban Driving

Consider an autonomous vehicle (referred to as ego) driving in the urban environment shown in Fig. 5. The scenario also contains a pedestrian and another car, which is assumed to be driven by a "reasonable" human who obeys traffic laws. Ego and the other car are in different, adjacent lanes, moving in the same direction. The cars move uphill in the $y$-$z$ plane of the coordinate frame, towards the positive $y$ and $z$ directions, with no lateral movement in the $x$ direction. The accelerations of both cars are constant, with ego's being the smaller.

Fig. 5: Urban-driving scenario implemented in CARLA [8].

The positions and accelerations of the cars are initialized such that the other car is always ahead of ego. The vehicles are headed towards an intersection without any traffic lights. There is an unmarked crosswalk at the end of the road, before the intersection. When the pedestrian crosses the street, the other car brakes to stop before the intersection. If the pedestrian does not cross, the other car keeps moving without decreasing its velocity.

Ego does not have a clear line-of-sight to the pedestrian crossing at the intersection, because of the other car and the uphill shape of the road. The goal is to develop a method allowing ego to infer whether a pedestrian is crossing the street by observing the behavior (e.g., relative position and velocity over time) of the other car.

The simulation of this scenario ends whenever ego gets closer than 8 m to the intersection. We assume that labeled behaviors (relative distances and velocities) are available, where the labels indicate whether a pedestrian is crossing or not. We collected 300 signals with 500 uniform time-samples per trace, 150 with and 150 without a pedestrian crossing the street (see Fig. 6).

We evaluate our algorithm with 5-fold cross-validation and a maximum tree depth of 2. The results are shown in Table II for different values of $K$.

K    TR-M (%)   TR-S (%)   TE-M (%)   TE-S (%)   R         m
1    0.0        0.0        1          1.33       7m 10s    2
2    0.0        0.0        0.67       0.82       9m 57s    2
3    0.0        0.0        0.33       0.66       14m 52s   1
4    0.0        0.0        0.0        0.0        24m 40s   3
TABLE II: Results of the BCDT algorithm on the urban-driving data set for different values of K.

With $K = 4$, as an example from one of the folds, our algorithm learns a wSTL formula over the relative distance and relative velocity between the cars; by applying the heuristic idea from Alg. 1, the final output is computed as a single simple STL formula. The thresholds of this formula are shown in Fig. 6.

Fig. 6: (a) and (b) show the $y$ components of the relative distance and the relative velocity between ego and the other vehicle, respectively, over time. The green and red signals belong to the cases when there is a pedestrian and when there is no pedestrian, respectively.

To provide a fair comparison, we evaluate the performance of the algorithm from [3] on the same data set and on the same computer used for the algorithm developed in this paper. With first-order primitives, 5-fold cross-validation, and a maximum tree depth of 2, we obtained a mean MCR of 1 with standard deviation 1.5 in the test phase, with a total runtime of 7.72 seconds. The example formulas learned in the folds by the method from [3] are again longer than ours. The results show that, with our algorithm, we get simpler formulas and better classification performance than with the algorithm from [3].

VI Conclusion

In this paper, we propose a method for two-class classification of time-series data. The algorithm, called Boosted Concise Decision Trees (BCDTs), grows an ensemble of Concise Decision Trees (CDTs), which are decision trees empowered by conciseness techniques that improve the interpretability of the learned formulas. We show that boosting helps improve the classification performance. The classification and interpretability advantages of our algorithm are evaluated on naval surveillance and urban-driving case studies, and we compare our method with two recent algorithms from the literature.

References

  • [1] E. Asarin, A. Donzé, O. Maler, and D. Nickovic (2011) Parametric identification of temporal properties. In International Conference on Runtime Verification, pp. 147–160.
  • [2] A. Bakhirkin, T. Ferrère, and O. Maler (2018) Efficient parametric identification for STL. In Proceedings of the 21st International Conference on Hybrid Systems: Computation and Control (part of CPS Week), pp. 177–186.
  • [3] G. Bombara and C. Belta (2021) Offline and online learning of signal temporal logic formulae using decision trees. ACM Transactions on Cyber-Physical Systems 5 (3), pp. 1–23.
  • [4] G. Bombara, C. Vasile, F. Penedo, H. Yasuoka, and C. Belta (2016) A decision tree approach to data classification using signal temporal logic. In Hybrid Systems: Computation and Control, pp. 1–10.
  • [5] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen (1984) Classification and regression trees. CRC Press.
  • [6] E. M. Clarke, E. A. Emerson, and A. P. Sistla (1986) Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems (TOPLAS) 8 (2), pp. 244–263.
  • [7] A. Donzé and O. Maler (2010) Robust satisfaction of temporal logic over real-valued signals. In International Conference on Formal Modeling and Analysis of Timed Systems, pp. 92–106.
  • [8] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun (2017) CARLA: an open urban driving simulator. arXiv preprint arXiv:1711.03938.
  • [9] Y. Freund and R. E. Schapire (1997) A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 (1), pp. 119–139.
  • [10] B. Hoxha, A. Dokhanchi, and G. Fainekos (2018) Mining parametric temporal logic properties in model-based design for cyber-physical systems. International Journal on Software Tools for Technology Transfer 20 (1), pp. 79–93.
  • [11] S. Jha, A. Tiwari, S. A. Seshia, T. Sahai, and N. Shankar (2019) TeLEx: learning signal temporal logic from positive examples using tightness metric. Formal Methods in System Design 54 (3), pp. 364–387.
  • [12] X. Jin, A. Donzé, J. V. Deshmukh, and S. A. Seshia (2015) Mining requirements from closed-loop control models. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34 (11), pp. 1704–1717.
  • [13] J. Kennedy and R. Eberhart (1995) Particle swarm optimization. In International Conference on Neural Networks, Vol. 4, pp. 1942–1948.
  • [14] A. Ketenci and E. A. Gol (2019) Synthesis of monitoring rules via data mining. In American Control Conference, pp. 1684–1689.
  • [15] Z. Kong, A. Jones, and C. Belta (2016) Temporal logics for learning and detection of anomalous behavior. IEEE Transactions on Automatic Control 62 (3), pp. 1210–1222.
  • [16] V. Y. Kulkarni and P. K. Sinha (2012) Pruning of random forest classifiers: a survey and future directions. In 2012 International Conference on Data Science & Engineering (ICDSE), pp. 64–68.
  • [17] A. Linard and J. Tumova (2020) Active learning of signal temporal logic specifications. In 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), pp. 779–785.
  • [18] O. Maler and D. Nickovic (2004) Monitoring temporal properties of continuous signals. In Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems, pp. 152–166.
  • [19] N. Mehdipour, C. Vasile, and C. Belta (2020) Specifying user preferences using weighted signal temporal logic. IEEE Control Systems Letters.
  • [20] S. Mohammadinejad, J. V. Deshmukh, A. G. Puranic, M. Vazquez-Chanlatte, and A. Donzé (2020) Interpretable classification of time-series data using efficient enumerative techniques. In Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control, pp. 1–10.
  • [21] D. Neider and I. Gavran (2018) Learning linear temporal properties. In Formal Methods in Computer Aided Design, pp. 1–10.
  • [22] B. D. Ripley (2007) Pattern recognition and neural networks. Cambridge University Press.
  • [23] L. Rokach and O. Maimon (2005) Top-down induction of decision trees classifiers: a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C 35 (4), pp. 476–487.
  • [24] S. Shalev-Shwartz and S. Ben-David (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press.
  • [25] M. Vazquez-Chanlatte, J. V. Deshmukh, X. Jin, and S. A. Seshia (2017) Logical clustering and learning for time-series data. In International Conference on Computer Aided Verification, pp. 305–325.
  • [26] Z. Xu, M. Ornik, A. A. Julius, and U. Topcu (2019) Information-guided temporal logic inference with prior knowledge. In American Control Conference, pp. 1891–1897.
  • [27] R. Yan and A. Julius (2021) Neural network for weighted signal temporal logic. arXiv preprint arXiv:2104.05435.