A kernel function for Signal Temporal Logic formulae

09/11/2020 ∙ by Luca Bortolussi, et al. ∙ 0

We discuss how to define a kernel for Signal Temporal Logic (STL) formulae. Such a kernel allows us to embed the space of formulae into a Hilbert space, and opens up the use of kernel-based machine learning algorithms in the context of STL. We show an application of this idea to a regression problem in formula space for probabilistic models.




1 Introduction

Signal Temporal Logic (STL) [Maler2004] is gaining momentum as a requirement specification language for complex systems and, in particular, Cyber-Physical Systems [bartocci2018specification]. STL has been applied in several settings, from runtime monitoring [bartocci2018specification] to control synthesis [HaghighiMBB19] and falsification problems [FainekosH019], and recently also within learning algorithms that try to find a maximally discriminating formula between sets of trajectories [bombara_decision_2016, BBS14, NenziSBB18]. In these applications, a central role is played by the real-valued quantitative semantics [donze2013efficient], which measures the robustness of satisfaction.

Most applications of STL have targeted deterministic (hybrid) systems, with less emphasis on non-deterministic or stochastic ones [BartocciBNS15]. Another area in which formal methods are providing interesting tools is that of logic-based distances between models, like bisimulation metrics for Markov models [BacciBLM16], which are typically based on a branching logic. In fact, extending these ideas to linear time logic is hard [jan2016linear], and typically requires statistical approximations. Finally, another relevant problem is how to measure the distance between two logic formulae, thus giving a metric structure to the formula space, a task relevant for learning which has received little attention for STL, with the notable exception of [madsen2018metrics].

In this work, we tackle the metric, learning, and model distance problems from a different perspective than the classical one, which is based on some form of comparison of the languages of formulae. The starting point is to consider an STL formula as a function mapping a real-valued trajectory (signal) into a number or into another trajectory. As signals are functions, STL formulae should be properly considered as functionals, in the sense of Functional Analysis (FA) [brezis2010functional]. This point of view gives us a large bag of FA tools to manipulate formulae. What we explore here is the definition of a suitable inner product in the form of a kernel [shawe2004kernel] between STL formulae, capable of capturing the notion of semantic similarity of two formulae. This will endow the space of formulae with the structure of a Hilbert space, defining a metric from the inner product. Moreover, having a kernel opens the use of kernel-based machine learning techniques [rasmussen:williams:2006].

A crucial aspect is that kernels for functionals are typically defined by integrating over the support space, with respect to a given measure. However, in trajectory space, there is no canonical measure (unless one discretizes time and maps signals to $\mathbb{R}^n$), which introduces a degree of freedom on which measure to use. We decide to work with probability measures on trajectories, i.e. stochastic processes, and we build one that favours “simple” trajectories, with a small total variation. This encodes the idea that two formulae differing on simple signals should have a larger distance than two formulae differing only on complex trajectories. As we will see in the experiments, this choice allows the effective use of this kernel to perform regression on the formula space for approximating the satisfaction probability and the expected robustness of several stochastic processes, different from the one used to build the kernel.

2 Background

Signal Temporal Logic.

(STL) [Maler2004] is a linear time temporal logic suitable to monitor properties of continuous trajectories. A trajectory is a function $\xi: I \to D$ with a time domain $I \subseteq \mathbb{R}_{\geq 0}$ and state space $D \subseteq \mathbb{R}^n$. We define the trajectory space $\mathcal{T}$ as the set of all possible continuous functions from $I$ to $D$. The syntax of STL is:

$$\varphi := \mathit{tt} \mid \pi \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \,\mathcal{U}_{[a,b]}\, \varphi_2$$

where $\mathit{tt}$ is the Boolean true constant, $\pi$ is an atomic predicate, negation $\neg$ and conjunction $\wedge$ are the standard Boolean connectives and $\mathcal{U}_{[a,b]}$ is the until operator, with $a, b \in \mathbb{R}_{\geq 0}$ and $a < b$. As customary, we can derive the disjunction operator $\vee$ and the future eventually $\mathcal{F}_{[a,b]}$ and always $\mathcal{G}_{[a,b]}$ operators from the until temporal modality. The logic has two semantics: a Boolean semantics, $(\xi, t) \models \varphi$, with the meaning that the trajectory $\xi$ satisfies the formula $\varphi$ at time $t$, and a quantitative semantics, $\rho(\varphi, \xi, t)$, that can be used to measure the quantitative level of satisfaction of a formula for a given trajectory. The function $\rho$ is also called the robustness function. The robustness is compatible with the Boolean semantics since it satisfies the soundness property: if $\rho(\varphi, \xi, t) > 0$ then $(\xi, t) \models \varphi$; if $\rho(\varphi, \xi, t) < 0$ then $(\xi, t) \not\models \varphi$. Furthermore, it also satisfies the correctness property, which shows that $\rho$ measures how robust the satisfaction of a trajectory is with respect to perturbations. We refer the reader to [donze2013efficient] for more details.
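To make the quantitative semantics concrete, the following minimal sketch evaluates robustness on a discretely sampled one-dimensional signal. It is not the monitoring algorithm of [donze2013efficient]: the function names, the restriction to predicates of the form x(t) - c >= 0, and the use of sample indices as time are assumptions made purely for illustration.

```python
def atom(signal, threshold):
    """Robustness of the predicate x(t) - threshold >= 0 at every sample."""
    return [x - threshold for x in signal]

def neg(rho):
    """Negation flips the sign of the robustness."""
    return [-r for r in rho]

def conj(rho_a, rho_b):
    """Conjunction takes the pointwise minimum."""
    return [min(a, b) for a, b in zip(rho_a, rho_b)]

def eventually(rho, a, b):
    """F_[a,b]: maximum over the window [t+a, t+b], truncated at the signal end."""
    n = len(rho)
    return [max(rho[t + a:min(t + b, n - 1) + 1], default=float("-inf"))
            for t in range(n)]

def always(rho, a, b):
    """G_[a,b], derived as not F_[a,b] not."""
    return neg(eventually(neg(rho), a, b))

signal = [0.0, 1.0, 3.0, 2.0, -1.0]
rho_f = eventually(atom(signal, 2.5), 0, 2)   # F_[0,2] (x >= 2.5)
rho_g = always(atom(signal, -0.5), 0, 3)      # G_[0,3] (x >= -0.5)
```

In this toy run, a positive entry of `rho_f` or `rho_g` means the formula holds at that time index and a negative entry means it is violated, in line with the soundness property.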

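Given a stochastic process $\mathcal{M} = (\mathcal{T}, \mathcal{A}, \mu)$, where $\mathcal{T}$ is a trajectory space and $\mu$ is a probability measure on a $\sigma$-algebra $\mathcal{A}$ of $\mathcal{T}$, we define the expected robustness as $R_{\mathcal{M}}(\varphi, t) := \mathbb{E}_{\mathcal{M}}[\rho(\varphi, \xi, t)] = \int_{\xi \in \mathcal{T}} \rho(\varphi, \xi, t)\, d\mu(\xi)$. The qualitative counterpart of the expected robustness is the satisfaction probability $S_{\mathcal{M}}(\varphi, t)$, i.e. the probability that a trajectory generated by the stochastic process $\mathcal{M}$ satisfies the formula $\varphi$ at the time $t$: $S_{\mathcal{M}}(\varphi, t) := \mathbb{E}_{\mathcal{M}}[s(\varphi, \xi, t)] = \int_{\xi \in \mathcal{T}} s(\varphi, \xi, t)\, d\mu(\xi)$, where $s(\varphi, \xi, t) = 1$ if $(\xi, t) \models \varphi$ and $0$ otherwise.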
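Both quantities are expectations over trajectories, so they can be estimated by simple Monte Carlo averaging. The sketch below illustrates this under stated assumptions: the random walk process, the formula "always x >= -2" and all function names are hypothetical, and a robustness of exactly zero is counted as not satisfied.

```python
import random

def estimate(robustness_of, sample_trajectory, n_samples, rng):
    """Monte Carlo estimates of expected robustness and satisfaction probability."""
    values = [robustness_of(sample_trajectory(rng)) for _ in range(n_samples)]
    expected_robustness = sum(values) / n_samples
    # Positive robustness implies satisfaction (soundness property).
    satisfaction_probability = sum(v > 0 for v in values) / n_samples
    return expected_robustness, satisfaction_probability

def random_walk(rng, length=20):
    """Hypothetical toy process: a random walk observed at `length` time points."""
    xs = [0.0]
    for _ in range(length - 1):
        xs.append(xs[-1] + rng.uniform(-1.0, 1.0))
    return xs

def rho_always_above(xs):
    """Robustness at time 0 of 'always x >= -2': the minimum of x + 2."""
    return min(x + 2.0 for x in xs)

rng = random.Random(0)
exp_rob, sat_prob = estimate(rho_always_above, random_walk, 2000, rng)
```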

Kernel Functions.

A kernel $k(x, z)$, $x, z \in X$, defines an integral linear operator on functions $X \to \mathbb{R}$, which intuitively can be thought of as a scalar product on a possibly infinite feature space $F$: $k(x, z) = \langle \phi(x), \phi(z) \rangle$, with $\phi$ being the eigenfunctions of the linear operator, spanning a Hilbert space, see [rasmussen:williams:2006]. Knowledge of the kernel allows us to perform approximation and learning tasks over $F$ without explicitly constructing it.

One application is kernel regression, with the goal of estimating the function $f: X \to \mathbb{R}$ from a finite number of observations $x_1, \dots, x_N \in X$, where each observation $x_i$ has an associated response $y_i = f(x_i)$, and $\{(x_i, y_i)\}_{i=1}^{N}$ is the training set. There exist several methods that address this problem by exploiting the kernel function as a similarity measure between a generic $x \in X$ and the observations of the training set. In the experiments, we compare different regression models used to compute the expected robustness and the satisfaction probability.
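As an illustration of how a kernel acts as a similarity measure in regression, here is a minimal sketch of the Nadaraya-Watson estimator on scalar inputs. The Gaussian kernel on real numbers, the bandwidth value and the toy data are assumptions for the example, not the setting used in the experiments.

```python
import math

def gaussian_kernel(x, z, bandwidth=1.0):
    """Similarity between two scalar inputs."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))

def nadaraya_watson(x_new, xs, ys, kernel):
    """Predict as the kernel-weighted average of the training responses."""
    weights = [kernel(x_new, x) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

xs = [0.0, 1.0, 2.0, 3.0]      # training observations
ys = [0.0, 1.0, 4.0, 9.0]      # associated responses
prediction = nadaraya_watson(1.5, xs, ys, gaussian_kernel)
```

The prediction is a convex combination of the training responses, so it always lies between their minimum and maximum; observations more similar to the query receive larger weights.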

3 A kernel for Signal Temporal Logic

If we endow an arbitrary space with a kernel function, we can apply different kinds of regression methods. Even for a non-metric space such as that of STL formulae, a kernel lets us cheaply perform operations that would otherwise be very expensive, such as estimating the satisfaction probability and the expected robustness of any formula $\varphi$ for a stochastic model, without running additional simulations. The idea behind our definition is to exploit the robustness to project an STL formula to a Hilbert space, and then to compute the scalar product in that space. The more similar the two projections are, the higher the resulting scalar product. In addition, the function that we define is a kernel by construction.

3.1 STL kernel

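Let us fix a formula $\varphi \in \mathcal{P}$ in the STL formulae space $\mathcal{P}$. Consider the robustness $\rho(\varphi, \cdot, \cdot): \mathcal{T} \times I \to \mathbb{R}$, where $I \subset \mathbb{R}$ is a bounded interval and $\mathcal{T}$ is the trajectory space of continuous functions. We observe that there is a map $h: \mathcal{P} \to C(\mathcal{T} \times I)$ defined by $h(\varphi)(\xi, t) = \rho(\varphi, \xi, t)$. With $C(X)$ we denote the set of the continuous functions on the topological space $X$. It can be proved that $h(\mathcal{P}) \subseteq L^2(\mathcal{T} \times I)$ and hence we can use the dot product in $L^2$ as a kernel for $\mathcal{P}$. Formally,

Theorem 1.

Given the STL formulae space $\mathcal{P}$, the trajectory space $\mathcal{T}$, a bounded interval $I \subset \mathbb{R}$, let $h: \mathcal{P} \to C(\mathcal{T} \times I)$ be defined by $h(\varphi)(\xi, t) = \rho(\varphi, \xi, t)$, then:

$$h(\mathcal{P}) \subseteq L^2(\mathcal{T} \times I). \qquad (1)$$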

To prove the theorem, we need to recall the definition of $L^2$ and its inner product.

Definition 1.

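Given a measure space $(\Omega, \Sigma, \mu)$, we call Lebesgue space $L^2$ the space defined by

$$L^2(\Omega) = \{ f: \Omega \to \mathbb{R} \text{ measurable} \;:\; \|f\|_{L^2} < \infty \},$$

where $\|\cdot\|_{L^2}$ is a norm defined by

$$\|f\|_{L^2} = \left( \int_{\Omega} |f|^2 \, d\mu \right)^{1/2}.$$

We define the function $\langle \cdot, \cdot \rangle_{L^2}: L^2(\Omega) \times L^2(\Omega) \to \mathbb{R}$ as

$$\langle f, g \rangle_{L^2} = \int_{\Omega} f g \, d\mu.$$

It can be easily proved that $\langle \cdot, \cdot \rangle_{L^2}$ is an inner product.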

Furthermore, we have the following result.

Proposition 1 ([kothe1983topological]).

$L^2(\Omega)$ with the inner product $\langle \cdot, \cdot \rangle_{L^2}$ is a Hilbert space.

Proof Theorem 1.

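In order to satisfy (1) we can make the hypothesis that $\mathcal{T}$ is a bounded (in the supremum norm) subset of $C(I)$, with $I$ a bounded interval, which means that there exists $B \in \mathbb{R}$ such that $\|\xi\|_{\infty} \leq B$ for all $\xi \in \mathcal{T}$. Moreover, the measure on $\mathcal{T}$ is a probability distribution, and so it is a finite measure. Hence

$$\int_{\xi \in \mathcal{T}} \int_{t \in I} |h(\varphi)(\xi, t)|^2 \, dt \, d\mu \leq M^2 |I| < \infty$$

for each $\varphi \in \mathcal{P}$, where $M$ is the maximum absolute value of an atomic proposition of $\varphi$. This implies $h(\varphi) \in L^2(\mathcal{T} \times I)$. ∎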

We can now use the dot product in $L^2$ as a kernel for $\mathcal{P}$. In this way, we obtain a kernel that returns a high positive value for formulae that agree on high-probability trajectories and high negative values for formulae that, on average, disagree.

Definition 2.

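Fixing a probability measure $\mu_0$ on $\mathcal{T}$, we can then define the STL-kernel as:

$$k'(\varphi, \psi) = \langle h(\varphi), h(\psi) \rangle_{L^2(\mathcal{T} \times I)} = \int_{\xi \in \mathcal{T}} \int_{t \in I} \rho(\varphi, \xi, t)\, \rho(\psi, \xi, t) \, dt \, d\mu_0.$$

Since the function $k'$ satisfies the finitely positive semi-definite property, it can be proved that $k'$ is itself a kernel.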
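In practice the double integral defining the STL-kernel has to be approximated. A minimal sketch, assuming the robustness of each formula has already been evaluated on a common set of sampled trajectories and on a uniform time grid (the function name, data layout and toy values are hypothetical):

```python
def stl_kernel_mc(rho_phi, rho_psi, dt=1.0):
    """Approximate the integral of rho_phi * rho_psi over time and trajectories.

    rho_phi[j][i] is the robustness of the first formula on the j-th sampled
    trajectory at the i-th time point; rho_psi has the same layout.
    """
    n_traj = len(rho_phi)
    total = 0.0
    for row_phi, row_psi in zip(rho_phi, rho_psi):
        # Riemann sum over the time grid for one trajectory.
        total += sum(a * b for a, b in zip(row_phi, row_psi)) * dt
    # Averaging over samples approximates the integral w.r.t. the measure.
    return total / n_traj

rho_phi = [[1.0, 0.5], [-1.0, -0.5]]
rho_psi = [[2.0, 1.0], [-2.0, -1.0]]           # same sign as rho_phi everywhere
rho_neg = [[-r for r in row] for row in rho_phi]  # robustness of the negation

k_agree = stl_kernel_mc(rho_phi, rho_psi)
k_disagree = stl_kernel_mc(rho_phi, rho_neg)
```

Formulae whose robustness agrees in sign on the samples yield a positive value, while a formula and its negation yield a negative one.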

Proposition 2.

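Kernel matrices are positive semi-definite.

Proof.

Let us consider the general case, that is $K_{ij} = k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ for $i, j = 1, \dots, n$. For any vector $v \in \mathbb{R}^n$, let us consider

$$v^{\top} K v = \sum_{i,j=1}^{n} v_i v_j K_{ij} = \sum_{i,j=1}^{n} v_i v_j \langle \phi(x_i), \phi(x_j) \rangle = \left\langle \sum_{i=1}^{n} v_i \phi(x_i), \sum_{j=1}^{n} v_j \phi(x_j) \right\rangle = \left\| \sum_{i=1}^{n} v_i \phi(x_i) \right\|^2 \geq 0,$$

which implies that $K$ is positive semi-definite. ∎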
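The statement can be checked numerically on a small example. The sketch below builds the Gram matrix of the ordinary dot product on three hypothetical points in the plane and evaluates the quadratic form on random vectors; it is a sanity check under these assumptions, not a proof.

```python
import random

def dot(x, z):
    """Kernel induced by the identity feature map."""
    return sum(a * b for a, b in zip(x, z))

def gram_matrix(points, kernel):
    """Kernel matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(x, z) for z in points] for x in points]

def quadratic_form(matrix, v):
    """Compute v^T K v."""
    n = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))

points = [(1.0, 0.0), (0.5, 2.0), (-1.0, 1.0)]
K = gram_matrix(points, dot)
rng = random.Random(1)
forms = [quadratic_form(K, [rng.uniform(-1, 1) for _ in points])
         for _ in range(100)]
```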

Theorem 2 (Characterization of kernels).

A function $k: X \times X \to \mathbb{R}$, which is either continuous or has a finite domain, can be written as

$$k(x, z) = \langle \phi(x), \phi(z) \rangle,$$

where $\phi$ is a feature map into a Hilbert space $F_k$, if and only if it satisfies the finitely positive semi-definite property.


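Proof.

Firstly, let us observe that if $k(x, z) = \langle \phi(x), \phi(z) \rangle$, then it satisfies the finitely positive semi-definite property by Proposition 2. The difficult part to prove is the other implication.

Let us suppose that $k$ satisfies the finitely positive semi-definite property. We will construct the Hilbert space $F_k$ as a function space. We recall that $F_k$ is a Hilbert space if it is a vector space with an inner product that induces a norm that makes the space complete.

Let us consider the function space

$$\mathcal{F} = \left\{ \sum_{i=1}^{\ell} \alpha_i k(x_i, \cdot) \;:\; \ell \in \mathbb{N},\ x_i \in X,\ \alpha_i \in \mathbb{R} \right\}.$$

The sum in this space is defined as

$$(f + g)(x) = f(x) + g(x),$$

which is clearly a closed operation. The multiplication by a scalar is a closed operation too. Hence, $\mathcal{F}$ is a vector space.

We define the inner product in $\mathcal{F}$ as follows. Let $f, g \in \mathcal{F}$ be defined by

$$f(x) = \sum_{i=1}^{\ell} \alpha_i k(x_i, x), \qquad g(x) = \sum_{j=1}^{n} \beta_j k(z_j, x),$$

so the inner product is defined as

$$\langle f, g \rangle = \sum_{i=1}^{\ell} \sum_{j=1}^{n} \alpha_i \beta_j k(x_i, z_j) = \sum_{i=1}^{\ell} \alpha_i g(x_i) = \sum_{j=1}^{n} \beta_j f(z_j),$$

where the last two equalities follow from the definitions of $f$ and $g$. This map is clearly symmetric and bilinear. So, in order to be an inner product, it suffices to prove

$$\langle f, f \rangle \geq 0 \quad \text{for all } f \in \mathcal{F},$$

and that

$$\langle f, f \rangle = 0 \iff f \equiv 0.$$

If we define the vector $\alpha = (\alpha_1, \dots, \alpha_{\ell})$ we obtain

$$\langle f, f \rangle = \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \alpha_i \alpha_j k(x_i, x_j) = \alpha^{\top} K \alpha \geq 0,$$

where $K$ is the kernel matrix constructed over $x_1, \dots, x_{\ell}$ and the last inequality holds because $k$ satisfies the finitely positive semi-definite property.

It is worth noticing that this inner product satisfies the property

$$\langle f, k(x, \cdot) \rangle = \sum_{i=1}^{\ell} \alpha_i k(x_i, x) = f(x).$$

This property is called the reproducing property of the kernel.

From this property it also follows that, if $\langle f, f \rangle = 0$, then

$$|f(x)| = |\langle f, k(x, \cdot) \rangle| \leq \|f\| \, \|k(x, \cdot)\| = 0 \quad \text{for all } x \in X,$$

applying the Cauchy-Schwarz inequality and the definition of the norm. The other side of the implication, i.e.

$$f \equiv 0 \implies \langle f, f \rangle = 0,$$

follows directly from the definition of the inner product.

It remains to show the completeness property. Actually, we will not show that $\mathcal{F}$ is complete, but we will use $\mathcal{F}$ to construct the space $F_k$ of the statement. Let us fix $x \in X$ and consider a Cauchy sequence $(f_n)_{n}$ in $\mathcal{F}$. Using the reproducing property we obtain

$$(f_n(x) - f_m(x))^2 = \langle f_n - f_m, k(x, \cdot) \rangle^2 \leq \|f_n - f_m\|^2 \, k(x, x),$$

where we applied the Cauchy-Schwarz inequality. So, by the completeness of $\mathbb{R}$, $f_n(x)$ has a limit, which we call $g(x)$. Hence we define $g$ as the pointwise limit of $f_n$ and we define $F_k$ as the space obtained by the union of $\mathcal{F}$ and the limits of all the Cauchy sequences in $\mathcal{F}$, i.e.

$$F_k = \overline{\mathcal{F}},$$

which is the closure of $\mathcal{F}$. Moreover, the inner product in $\mathcal{F}$ extends naturally to an inner product in $F_k$ which satisfies all the desired properties.

In order to complete the proof we have to define a map $\phi: X \to F_k$ such that

$$\langle \phi(x), \phi(z) \rangle = k(x, z).$$

The map that we are looking for is $\phi(x) = k(x, \cdot)$. In fact

$$\langle \phi(x), \phi(z) \rangle = \langle k(x, \cdot), k(z, \cdot) \rangle = k(x, z). \qquad ∎$$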

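One desirable property of our kernel is that $k(\varphi, \varphi) \geq k(\varphi, \psi)$ for every $\psi \in \mathcal{P}$. In fact, given a formula $\varphi$, no formula should be more similar to $\varphi$ than $\varphi$ itself. This property can be enforced by redefining the kernel as follows:

$$k(\varphi, \psi) = \frac{k'(\varphi, \psi)}{\sqrt{k'(\varphi, \varphi)\, k'(\psi, \psi)}},$$

where $k'$ denotes the unnormalized STL-kernel of Definition 2. By the Cauchy-Schwarz inequality, $k(\varphi, \psi) \leq 1 = k(\varphi, \varphi)$.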
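One standard way to obtain this property is cosine normalisation of the kernel, which we assume here for illustration. The sketch applies it to a small hypothetical Gram matrix in which an off-diagonal entry exceeds a diagonal one, so the raw kernel violates the property; the function name is an assumption and the diagonal entries are required to be strictly positive.

```python
import math

def normalize_gram(gram):
    """Cosine-normalise a Gram matrix: K[i][j] / sqrt(K[i][i] * K[j][j]).

    Assumes every diagonal entry is strictly positive.
    """
    n = len(gram)
    return [[gram[i][j] / math.sqrt(gram[i][i] * gram[j][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical raw kernel values: raw[0][1] = 6 exceeds raw[0][0] = 4.
raw = [[4.0, 6.0], [6.0, 16.0]]
normalised = normalize_gram(raw)
```

After normalisation every formula has similarity one with itself, and no other formula can exceed that value.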

3.2 The base measure

In order to make our kernel meaningful and not too expensive to compute, we endow the trajectory space $\mathcal{T}$ with a probability distribution such that more complex trajectories are less probable. We use the total variation [pallara2000functions] of a trajectory and the number of changes in its monotonicity as indicators of its "complexity". We define the probability measure $\mu_0$ by providing an algorithm sampling from piecewise linear functions, a dense subset of $\mathcal{T}$, that we use for Monte Carlo approximation of the kernel.

Before describing the algorithm, we need the definition of Total Variation of a function [pallara2000functions].

Definition 3 (Total Variation).

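Let $f: \mathbb{R} \to \mathbb{R}$. We call Total Variation of $f$ on a finite interval $[a, b]$ the quantity

$$V_a^b(f) = \sup_{P \in \Pi_{[a,b]}} \sum_{i=0}^{n_P - 1} |f(x_{i+1}) - f(x_i)|,$$

where $\Pi_{[a,b]}$ is the set of all partitions $P = \{x_0, \dots, x_{n_P}\}$ of the interval $[a, b]$.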
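For a piecewise linear function the supremum is attained on the partition given by its breakpoints, so the total variation reduces to a sum of absolute increments. The following sketch computes it, together with the number of monotonicity changes, for a trajectory stored as a list of grid values (the function names and toy data are assumptions):

```python
def total_variation(values):
    """Total variation of the piecewise linear interpolation of `values`."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

def monotonicity_changes(values):
    """Count sign changes between consecutive non-zero increments."""
    steps = [b - a for a, b in zip(values, values[1:]) if b != a]
    return sum(1 for s, t in zip(steps, steps[1:]) if (s > 0) != (t > 0))

smooth = [0.0, 1.0, 2.0, 3.0]    # monotone, low complexity
wiggly = [0.0, 2.0, -1.0, 3.0]   # same endpoints, higher complexity
```

Both example trajectories start at 0 and end at 3, but the second has three times the total variation and two changes of direction, so it would be regarded as more complex.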

We use the total variation of a trajectory as an indicator of its "complexity", together with the number of changes in its monotonicity. The idea is to endow $\mathcal{T}$ with a probability distribution such that more complex trajectories are less likely to be drawn. We describe a sampling algorithm over piecewise linear functions, a dense subset of $\mathcal{T}$, that we use for Monte Carlo approximation.

The sampling algorithm is the following:

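  1. Set a discretization step $\Delta$; define the number of steps $N$ and the time points $t_i = t_0 + i\Delta$;

  2. Sample a starting point $\xi_0$ and set $\xi(t_0) = \xi_0$;

  3. Sample $K \geq 0$, that will be the total variation of $\xi$;

  4. Sample $N-1$ points $y_1, \dots, y_{N-1}$ in $[0, K]$ and set $y_0 = 0$ and $y_N = K$;

  5. Order $y_1, \dots, y_{N-1}$ and rename them such that $y_1 \leq y_2 \leq \dots \leq y_{N-1}$;

  6. Sample the initial direction $s_0 \in \{-1, 1\}$;

  7. Set iteratively $\xi(t_{i+1}) = \xi(t_i) + s_{i+1}(y_{i+1} - y_i)$ with $s_{i+1} = s_i \cdot s$,
    where $s = -1$ with probability $q$ and $s = 1$ otherwise, for $i = 0, \dots, N-1$.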

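Finally, we can linearly interpolate between consecutive points of the discretization and make the trajectory continuous, i.e., for two consecutive discretization points $t_i$ and $t_{i+1}$ and $t \in [t_i, t_{i+1}]$,

$$\xi(t) = \xi(t_i) + \frac{t - t_i}{t_{i+1} - t_i} \left( \xi(t_{i+1}) - \xi(t_i) \right).$$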


Figure 1: Trajectories randomly sampled from $\mu_0$ using the algorithm described above.
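The sampling procedure can be sketched as follows. This is a hedged illustration rather than the implementation used for the experiments: the Gaussian starting point, the squared Gaussian total variation, the uniform cut points, the direction-flip probability `q` and every default value are assumptions introduced here.

```python
import random

def sample_trajectory(rng, n_steps=100, q=0.1, start_std=1.0, tv_std=1.0):
    """Sample the grid values of one piecewise linear trajectory.

    All distributions and default values are illustrative assumptions.
    """
    x0 = rng.gauss(0.0, start_std)            # starting point
    total_var = rng.gauss(0.0, tv_std) ** 2   # non-negative total variation
    # Cut [0, total_var] into n_steps increments using sorted uniform points.
    cuts = sorted(rng.uniform(0.0, total_var) for _ in range(n_steps - 1))
    cuts = [0.0] + cuts + [total_var]
    sign = rng.choice([-1, 1])                # initial direction
    values = [x0]
    for lo, hi in zip(cuts, cuts[1:]):
        if rng.random() < q:                  # change monotonicity with probability q
            sign = -sign
        values.append(values[-1] + sign * (hi - lo))
    return values, total_var

def interpolate(values, t, dt=1.0):
    """Linear interpolation between consecutive grid points."""
    i = min(int(t // dt), len(values) - 2)
    frac = (t - i * dt) / dt
    return values[i] + frac * (values[i + 1] - values[i])

rng = random.Random(42)
values, total_var = sample_trajectory(rng)
realised_tv = sum(abs(b - a) for a, b in zip(values, values[1:]))
```

By construction the increments sum to the sampled total variation, so `realised_tv` matches `total_var` up to rounding, and a small `q` keeps the number of monotonicity changes low.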

For our implementation, we fixed the above parameters as follows:

  • ,

  • ,

  • ,

  • ,

  • ,

  • .

In the next section, we show that using this simple measure still allows us to make predictions with remarkable accuracy for other stochastic processes on $\mathcal{T}$.

4 Experimental Results

4.1 Kernel Regression on $\mu_0$

To assess the quality of our kernel definition, we use it to predict the expected robustness and the satisfaction probability of STL formulae w.r.t. the stochastic process $\mu_0$ defined on $\mathcal{T}$. We use a training set composed of 400 formulae sampled randomly according to a syntax tree random growing scheme as follows:

  1. Sample the number of atomic predicates

  2. Sample

  3. Create a set formulae

    where is the atomic predicate ;

  4. With of probability select the operator ; with the remaining sample an operator

  5. Randomly sample 1 or 2 formulae from (depending if is an unary or a binary operator) and apply to it or them, obtaining the formula ;

  6. Remove the formula/formulae sampled at step (5) from and add to ;

  7. If has more than one element, repeat from step (4), otherwise continue to step (8);

  8. The output formula is, with of probability, the last formula of ; for the other of probability, sample an operator

    and apply it to the last formula of : the resulting formula is the output of the algorithm.
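A simplified sketch of such a bottom-up growing scheme is given below. It is not the exact procedure above: the operator sets, the probabilities `p_binary` and `p_stop`, the encoding of formulae as nested tuples, and the omission of time intervals on temporal operators are all hypothetical simplifications.

```python
import random

UNARY = ["not", "eventually", "always"]
BINARY = ["and", "or", "until"]

def sample_formula(rng, max_atoms=3, p_binary=0.5, p_stop=0.5):
    """Grow a random syntax tree bottom-up from atomic predicates.

    Operator sets and probabilities are hypothetical placeholders,
    not the values used in the experiments.
    """
    n_atoms = rng.randint(1, max_atoms)
    # An atom is ("atom", variable index, threshold).
    pool = [("atom", rng.randrange(3), round(rng.uniform(-1, 1), 2))
            for _ in range(n_atoms)]
    while len(pool) > 1:
        if rng.random() < p_binary:
            op = rng.choice(BINARY)
            left = pool.pop(rng.randrange(len(pool)))
            right = pool.pop(rng.randrange(len(pool)))
            pool.append((op, left, right))      # pool shrinks by one
        else:
            op = rng.choice(UNARY)
            child = pool.pop(rng.randrange(len(pool)))
            pool.append((op, child))            # pool size unchanged
    formula = pool[0]
    if rng.random() >= p_stop:                  # optionally wrap in a final operator
        formula = (rng.choice(UNARY), formula)
    return formula

def count_atoms(formula):
    """Number of atomic predicates in a nested-tuple formula."""
    if formula[0] == "atom":
        return 1
    return sum(count_atoms(child) for child in formula[1:])

formulae = [sample_formula(random.Random(seed)) for seed in range(50)]
```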

Then, we approximate the expected robustness and the satisfaction probability using a set of 100 000 trajectories sampled according to $\mu_0$. We compare the following regression models: Nadaraya-Watson estimator, K-Nearest Neighbors regression, Support Vector Regression (SVR) and Kernel Ridge Regression (KRR) [murphy2012machine]. We obtain the lowest Mean Squared Error (MSE) on expected robustness, equal to , using an SVR with a Gaussian kernel and . On the other hand, the best performance in predicting the satisfaction probability was given by the KRR, with an MSE equal to .
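A Gaussian kernel can be built on top of any kernel by using the distance that the kernel induces in feature space, since the squared distance between two embedded points is $k(x,x) + k(z,z) - 2k(x,z)$. The sketch below shows this construction on a small hypothetical Gram matrix; the scaling by $2\sigma^2$ is one common convention and is an assumption here, as is the function name.

```python
import math

def gaussian_from_gram(gram, sigma):
    """Gaussian kernel on the feature-space distance induced by `gram`."""
    n = len(gram)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared distance between the embeddings of items i and j.
            sq_dist = gram[i][i] + gram[j][j] - 2.0 * gram[i][j]
            out[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return out

gram = [[1.0, 0.75], [0.75, 1.0]]   # hypothetical normalised kernel values
gaussian = gaussian_from_gram(gram, sigma=0.5)
```

The bandwidth `sigma` controls how quickly similarity decays with distance, which is the hyperparameter tuned in the regression experiments.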

4.2 Kernel Regression on other stochastic processes

The last aspect that we investigate is whether the definition of our kernel w.r.t. the fixed measure $\mu_0$ can be used for making predictions of the average robustness also for other stochastic processes, i.e., while taking expectations w.r.t. other probability measures $\mu$ on $\mathcal{T}$. We compare this with a kernel defined w.r.t. $\mu$ itself. We used three different stochastic models: Immigration, Isomerization and Polymerase, simulated using the Python library StochPy [maarleveld2013stochpy] (see Figure 2).

Figure 2: From left to right, trajectories generated by the Immigration model, by the Isomerization model and by the Polymerase model.

As can be seen from Figure 3 (left), our base kernel is the best performing one. This can be explained by the fact that the measure $\mu_0$ is broad in terms of coverage of the trajectory space, meaning that different kinds of behaviours tend to be covered. This allows us to better distinguish among STL formulae, compared to models that tend to focus the probability mass on narrower regions of $\mathcal{T}$, such as the Isomerization model (which has the worst performance). Also in this case, we obtained the best results using SVR and KRR. However, given the sparseness of SVR, it is the more convenient of the two, since fewer kernel evaluations are needed to perform the regression. Interestingly, the minimum MSE is obtained using the Gaussian kernel with exactly the same parameter as for the regression task on $\mu_0$, hinting at some intrinsic robustness to the choice of hyperparameters that has to be investigated in greater detail. In Figure 3 (right) we show the predictions for the expected robustness over the three stochastic models that we took as examples, using the best regression model that we have found so far, which is the SVR with a Gaussian kernel having . Note that to compute the kernel by Monte Carlo approximation, we have to sample the required trajectories for $\mu_0$ only once. We also need to estimate the expected robustness or the satisfaction probability for the formulae comprising the training set. However, kernel regression permits us to avoid further simulations of the model for novel formulae $\varphi$.
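To illustrate why no further simulation is needed for a new formula, the following sketch implements kernel ridge regression with a precomputed kernel in plain Python: once the dual coefficients are fitted on the training formulae, a prediction only requires the kernel values between the new formula and the training ones. The Gram matrix, targets and ridge value are hypothetical toy data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def krr_fit(gram, targets, ridge=1e-3):
    """Dual coefficients alpha = (K + ridge * I)^-1 y."""
    n = len(targets)
    regularised = [[gram[i][j] + (ridge if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
    return solve(regularised, targets)

def krr_predict(alpha, kernel_row):
    """kernel_row[i] is the kernel between the new item and training item i."""
    return sum(a * k for a, k in zip(alpha, kernel_row))

gram = [[1.0, 0.2], [0.2, 1.0]]     # kernel between two training formulae
targets = [0.8, -0.3]               # their estimated expected robustness
alpha = krr_fit(gram, targets, ridge=0.0)
```

With no regularisation the model interpolates the training targets exactly; a positive `ridge` trades some training accuracy for stability.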

Figure 3: (left) MSE as a function of the bandwidth of the Gaussian kernel, for the prediction of the expected robustness. We compare the performance on different stochastic models, using both the kernel evaluated according to the base measure (base kernel), and a custom kernel computed using the trajectories generated by the stochastic model itself. Both axes are in logarithmic scale. (center, right) predictions of the expected robustness for formulae and , over three different trajectory spaces. The predictions are made using SVR on a Gaussian kernel, with the best performing bandwidth , which is .

5 Conclusions

We defined a kernel for STL, fixing a base measure over trajectories, and we showed that we can use exactly the same kernel across different stochastic models for computing a very precise approximation of the expected robustness of new formulae, with only the knowledge of the expected robustness of a fixed set of training formulae. Our STL-kernel, however, can also be used for other tasks, for instance computing STL-based distances among stochastic models, resorting to a dual kernel construction, and building non-linear embeddings of formulae into finite dimensional real spaces with kernel-PCA techniques. Another direction for future work is to refine the quantitative semantics in such a way that equivalent formulae have the same robustness, e.g. using ideas like in [madsen2018metrics].