TeSSLa: Temporal Stream-based Specification Language

08/31/2018 ∙ by Lukas Convent, et al.

Runtime verification is concerned with monitoring program traces. In particular, stream runtime verification (SRV) takes the program trace as input streams and incrementally derives output streams. SRV can check logical properties and compute temporal metrics and statistics from the trace. We present TeSSLa, a temporal stream-based specification language for SRV. TeSSLa supports timestamped events natively and is hence suitable for streams that are both sparse and fine-grained, which often occur in practice. We prove results on TeSSLa's expressiveness and compare different TeSSLa fragments to (timed) automata, thereby inheriting various decidability results. Finally, we present a monitor implementation and prove its correctness.

1 Introduction

The essence of software verification is to check whether a program meets its specification. Runtime verification (RV) is an applied formal technique that has been established as a complement to traditional verification techniques such as model checking [21, 18]. Compared to static verification, RV considers only a single run of a system and checks whether it satisfies a property. Thus, RV can be seen as a lightweight, but formal extension to testing and debugging. RV can be applied offline to previously recorded traces or online to evaluate correctness properties at the runtime of the system under scrutiny. Typically, a property to be checked is specified as a logical formula, e.g. in (past time) LTL, and then synthesized to a monitor which can evaluate a run [19, 5]. Stream runtime verification (SRV) [7], as pioneered by the language LOLA [9, 14], takes a different approach by incrementally relating a set of input streams to a set of output streams. This allows not only the monitoring of correctness properties but also of quantitative measures. In this paper we introduce the novel temporal stream-based specification language TeSSLa which is tailored for SRV of cyber-physical systems, where timing is a critical issue. While traditional SRV approaches process event streams without considering timing information, TeSSLa supports timestamped events natively, which allows efficient processing of streams with sparse and fine-grained event sequences. Preliminary versions of TeSSLa have already been studied with regard to their usability to monitor trace data generated by embedded tracing units of processors [10]; how to implement stream-based monitors on hardware has been studied in theory [22] and practice [11]. These versions share the basic idea of transforming timed event streams but they did not allow for recursive equations and comprised only a set of ad-hoc operators. 
In this paper we define a minimal language with support for recursive definitions that allows us to obtain strong guarantees for evaluation algorithms, expressiveness results and meaningful fragments. While the practical applicability of such a language has been demonstrated by the previous papers, these papers lack a concise and clear theoretical basis and investigation.

As an example of SRV, consider the following specification, which checks whether a measured temperature stays within given boundaries. For every new event (measurement) on the temperature stream, new events on the derived streams low, high and unsafe are computed:

temperature: 6, 2, 1, 5, 9
low: ff, tt, tt, ff, ff
high: ff, ff, ff, ff, tt
unsafe: ff, tt, tt, ff, tt
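The trace above can be reproduced with a minimal Python sketch. The thresholds 3 and 8 are assumptions: they are not given in the text and were chosen only because they are consistent with the values shown. The function name derive is likewise hypothetical.

```python
# Hypothetical thresholds, assumed for illustration; they merely match the trace.
LOW_LIMIT = 3
HIGH_LIMIT = 8


def derive(temperature):
    """Derive low, high and unsafe for every event of the temperature stream."""
    low = [value < LOW_LIMIT for value in temperature]
    high = [value > HIGH_LIMIT for value in temperature]
    # unsafe holds whenever one of the two separate causes holds.
    unsafe = [l or h for l, h in zip(low, high)]
    return low, high, unsafe


low, high, unsafe = derive([6, 2, 1, 5, 9])
print(low)     # [False, True, True, False, False]
print(high)    # [False, False, False, False, True]
print(unsafe)  # [False, True, True, False, True]
```

Every derived stream has exactly one event per temperature event, which is the synchronous setting of traditional SRV.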

SRV is a combination of complex event processing (CEP) and traditional RV approaches: Streams are transformed into streams and there is not only one final verdict but the output is a stream of the property being evaluated at every temperature change. Furthermore, the user gets more detailed information about why an error occurred by being able to distinguish between the two separate causes low and high.

In the rest of this section we introduce the main features of TeSSLa and contrast them with related specification languages. The next section presents the language and its semantics formally. In section 3 we present several results regarding the expressiveness of TeSSLa, and in section 4 we focus on comparing (fragments of) the language to variants of (timed) automata. Finally, in section 5 we discuss different approaches to implement TeSSLa monitors and present our TeSSLa tool suite.

Asynchronous Streams

In the previous example of traditional SRV, every stream has an event for every step of the system. TeSSLa requires the events of all streams to be in a global order, but doesn’t require all streams to have simultaneous events. As a consequence, both sparse and high-frequency streams can be modeled. As cyber-physical systems often give rise to streams at unstable frequencies or continuous signals, this asynchronous setting is especially suitable. Consider as an example a ring buffer where the number of write accesses should not exceed the number of read accesses too much:

read, write: input event streams
numReads: 0, 1, 2, 3
numWrites: 0, 1, 2, 3, 4
safe: tt, ff, tt

Read and write events occur independently at different frequencies. The derived stream numReads (numWrites) counts the number of events of the input stream read (write). While the read and write streams contain only discrete events, the number of events can be seen as a piece-wise constant signal with the initial value of 0. The difference between the two signals is evaluated every time one of the two signals changes its value, using the last known value of both signals. We call this concept signal semantics: TeSSLa handles internally only streams of discrete events, but one can express operators following signal semantics in TeSSLa, and hence these discrete events can be seen as those points in time where the signal changes its value. In these introductory examples operators are automatically lifted to signal semantics, which is formally introduced later as the operator slift.

Recursive Equations

Like existing SRV approaches, TeSSLa relates a set of input streams to a set of output streams via mutually recursive equations, which allows self-references to the past. For example, counting the events of a stream x as in the previous example is expressed in TeSSLa as follows:

count := merge(last(count, x) + 1, 0)

The last operator outputs the last known value of the count stream on every event of the stream x. The base of the recursion is provided by merging with 0, which is a stream with one initial event of value 0. Since last only refers to events strictly lying in the past, the unique solution of such recursive equations can be computed incrementally (see section 2).
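The incremental evaluation can be sketched in Python, assuming the counting equation has the form count := merge(last(count, x) + 1, 0) as the description of the operators suggests. The function name count_events and the list-of-pairs stream representation are assumptions made for this illustration; the input timestamps are assumed to be non-negative and strictly increasing.

```python
def count_events(x_timestamps):
    """Incrementally evaluate count := merge(last(count, x) + 1, 0).

    Streams are lists of (timestamp, value) pairs in increasing timestamp order.
    """
    count = [(0, 0)]  # base case: the stream 0 has one event at timestamp 0
    memory = 0        # the single cell needed by last: latest value of count
    for t in x_timestamps:
        if t == 0:
            # last only sees strictly earlier events, so it yields nothing at
            # timestamp 0 and merge falls back to the initial event of value 0.
            continue
        memory = memory + 1  # last(count, x) + 1
        count.append((t, memory))
    return count


print(count_events([2, 5, 7]))  # [(0, 0), (2, 1), (5, 2), (7, 3)]
```

One memory cell is enough because every new event only depends on the most recent value of count.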

Time as First-Class Citizen

In TeSSLa, every event has a timestamp which can be accessed via the time operator. Since every event has a timestamp which refers to a global clock and is unique for its stream, accessing the timestamps of events serves two purposes: accessing the global order of events by comparing timestamps, and performing calculations with the timestamps. Consider e.g. the following specification, which checks whether the lapse of time between two write events exceeds 5 time units and outputs the overtime if it does:

write: events at timestamps 2, 5, 7, 15, 18
diff: 3, 2, 8, 3
error: 3
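A minimal Python sketch of this check, evaluated on the timestamps of the trace above. The function name overtime and the representation of streams as lists of (timestamp, value) pairs are assumptions made for illustration.

```python
def overtime(write_timestamps, limit=5):
    """Return the diff and error streams as lists of (timestamp, value) pairs."""
    diff, error = [], []
    previous = None  # timestamp of the last write event seen so far
    for t in write_timestamps:
        if previous is not None:
            d = t - previous
            diff.append((t, d))
            if d > limit:
                # The violation is reported at the delayed event itself.
                error.append((t, d - limit))
        previous = t
    return diff, error


diff, error = overtime([2, 5, 7, 15, 18])
print(diff)   # [(5, 3), (7, 2), (15, 8), (18, 3)]
print(error)  # [(15, 3)]
```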

In the example, the stream diff - 5 is filtered by the condition diff > 5. Note that the property violation is only reported when the delayed event happens. To report such errors as soon as possible, TeSSLa has the ability to create events at certain points in time via the delay operator. The following specification checks the same property but raises a unit event on the error stream as soon as we know that there was no write event in time:

write: events at timestamps 2, 5, 7, 15, 18
timeout: 5, 5, 5, 5, 5 (one event per write event)
error: unit event at timestamp 12

The function delay works as a timer, which is set to a timeout value with the first argument and reset with any event on the second argument. In the example, the function const maps the values of events to the constant value of 5, which is then used as timeout value. While in all the other examples the derived streams only contain events with timestamps taken from the input streams, in this example events with additional timestamps are generated. Like last, the delay operator can be used in recursive equations, for example the equation

period := merge(delay(const(5, period), unit), unit)

produces an infinite stream with an event every 5 time units. The merge with unit is used to provide a base case for the recursion and const is used to map the value of the generated events to 5 so that they can be used as the new timeout value.
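The timer behaviour of the timeout example can be sketched in Python. This is an illustration only: the function name timeout_errors and the horizon parameter (how far the input is known) are assumptions, and so is the tie-breaking rule that a write arriving exactly at the expiry time does not cancel the timer.

```python
def timeout_errors(write_timestamps, timeout=5, horizon=None):
    """Timestamps at which a timer set by a write expires without being reset."""
    if not write_timestamps:
        return []
    if horizon is None:
        horizon = write_timestamps[-1]  # assume the input is known up to here
    errors = []
    for i, t in enumerate(write_timestamps):
        expiry = t + timeout
        has_next = i + 1 < len(write_timestamps)
        # Assumption: only a reset strictly before the expiry cancels the timer.
        reset_in_time = has_next and write_timestamps[i + 1] < expiry
        if not reset_in_time and expiry <= horizon:
            errors.append(expiry)
    return errors


print(timeout_errors([2, 5, 7, 15, 18]))  # [12]
```

The error at timestamp 12 is raised three time units earlier than in the previous variant, which has to wait for the write event at 15.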

Efficient Parallel Evaluation

TeSSLa’s design follows two principles to allow efficient evaluation on parallel hardware: Explicit memory usage and local operator composition. If TeSSLa operates only on streams with bounded data-types of constant size, then the operators only need finite memory because every operator only needs to store at most one data value. This allows implementations on systems without random access memory, e.g. FPGAs or embedded systems. TeSSLa consists of a small set of primitive operators which can be flexibly combined. The TeSSLa semantics is defined in a way that allows a local composition of the individual operators, which can be realized via message passing without the need for global synchronization. Because of an explicit notion of progress for every stream describing how far the stream is known, local message passing is also sufficient to compute solutions for the recursive TeSSLa equations. Implementing an efficient evaluation on FPGAs is part of our EU research project COEMS (https://www.coems.eu).

1.0.1 Related Work and Comparison

LOLA [9, 14] is a synchronous stream specification language in the following sense: Events arrive in discrete steps and for every step, all input streams provide an event and all output streams produce an event, which means that it is not suitable for handling events with arbitrary real-time timestamps arriving at variable frequencies. The not yet formally published RTLola [15] is an extension of LOLA which introduces asynchronous streams to perform aggregations over real-time intervals. A major difference between RTLola and TeSSLa is that RTLola focuses on splitting input streams and aggregating over them, whereas TeSSLa provides a more general framework that in particular allows the (recursive) definition of aggregation operators while giving strict memory guarantees at the same time. Focus [8] is a formalism for the specification of stream-based systems. Their timed streams progress by discrete ticks that separate the events in between, thereby allowing multiple events at the same timestamp. The synchronous stream programming languages Lustre [17], Esterel [6] and Signal [16], the stream specification language Copilot [24] as well as the class of functional reactive programming (FRP) languages [13] allow the description of the transformation in a linear style, i.e. an input stream is read chronologically and is thereby evaluated. TeSSLa also supports linear evaluation because there are no future-references and the number of past-references is limited by the specification size. The only complement to linear evaluation is the creation of additional events via the delay operator. Quantitative regular expressions (QREs) [2] and logics like Signal Temporal Logic (STL) [23] and Time-Frequency Logic (TFL) [12] allow the mapping from complete streams to one final verdict/quantity. They cannot generally be evaluated in a linear way.
The idea used in TeSSLa of supporting signals and event streams has also been used for Timed Regular Expressions [4], but those have two explicitly different stream types, where TeSSLa internally represents signals as event streams. Recently, synthesis of hardware-based monitors from stream specifications has become an important field: For LOLA [9] constant memory bounds for an algorithm that evaluates well-formed specifications exist and for LOLA 2.0 [14] future references must be eliminated to gain constant memory bounds. There has been work on synthesis of STL to FPGAs in different ways as well [20, 25].

2 Formal Definition of the TeSSLa Core Language

In this section we introduce syntax and semantics of the minimal core of TeSSLa. In examples we use parametrized definitions on top of the core, which are expanded to their definitions until only core operators remain.

Preliminaries

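Given a partial order $(A, \le)$, a set $D \subseteq A$ is called directed if $\forall a, b \in D\ \exists c \in D\colon a \le c \land b \le c$. $(A, \le)$ is called directed-complete partial order (dcpo) if there exists a supremum $\bigsqcup D$ for every directed subset $D \subseteq A$. Let $f\colon A \to B$ be a function and $(A, \le)$, $(B, \le)$ partial orders. $f$ is called monotonic if it preserves the order, i.e. $\forall a_1, a_2 \in A\colon a_1 \le a_2 \Rightarrow f(a_1) \le f(a_2)$. $f$ is called continuous if it preserves the supremum, i.e. $\bigsqcup f(D) = f(\bigsqcup D)$ for all directed subsets $D \subseteq A$. By the Kleene fixed-point theorem, every monotonic and continuous function $f\colon A \to A$ has a least fixed point $\mu f$ if $(A, \le)$ is a dcpo with a least element $\bot$. $\mu f$ is the least upper bound of the chain iterating $f$ starting with the bottom element: $\mu f = \bigsqcup \{ f^n(\bot) \mid n \in \mathbb{N} \}$.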
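As a small illustration of the Kleene chain, the following Python sketch iterates a monotonic function on the finite lattice of node sets ordered by inclusion, starting from the empty set as bottom element. The graph EDGES, the function step and the helper least_fixed_point are hypothetical and unrelated to TeSSLa itself; on a finite lattice the chain becomes stationary after finitely many steps, which is why the loop terminates.

```python
def least_fixed_point(f, bottom):
    """Iterate bottom, f(bottom), f(f(bottom)), ... until the chain is stationary."""
    current = bottom
    while True:
        following = f(current)
        if following == current:
            return current
        current = following


# Hypothetical example graph: node -> set of successor nodes.
EDGES = {0: {1}, 1: {2}, 2: {0}, 3: {4}}


def step(reached):
    """Monotonic function: the start node 0 plus all successors of reached nodes."""
    successors = {m for n in reached for m in EDGES.get(n, ())}
    return frozenset({0}) | frozenset(successors)


reachable = least_fixed_point(step, frozenset())
print(sorted(reachable))  # [0, 1, 2]
```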

Syntax

A TeSSLa specification $\varphi$ consists of a set of possibly mutually recursive stream definitions defined over a finite set of variables $V$, where an equation has the form $x := e$ with $x \in V$ and

$e ::= \mathbf{nil} \mid \mathbf{unit} \mid x \mid \mathbf{lift}(f)(e, \dots, e) \mid \mathbf{time}(e) \mid \mathbf{last}(e, e) \mid \mathbf{delay}(e, e)$

All variables not occurring on the left-hand side of equations are input variables. All variables on the left-hand side are output variables. We call a TeSSLa specification flat if it does not contain any nested expressions. Every specification can be represented as a flat specification by using additional variables and equations.

Semantics

We define the semantics of TeSSLa in terms of an abstract time domain which only requires a total order and corresponding arithmetic operators:

Definition 1.

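A time domain is a totally ordered semi-ring $(T, 0, 1, +, \cdot, \le)$ that is not negative, i.e. $\forall t \in T\colon 0 \le t$.

We extend the order on time domains to the set $T_\infty = T \cup \{\infty\}$ with $\forall t \in T\colon t < \infty$.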

Conceptually, streams are timed words that are known inclusively or exclusively up to a certain timestamp, its progress, that might be infinite. A stream might contain an infinite number of events even if its progress is finite.

Definition 2.

An event stream over a time domain and a data domain is a finite or infinite sequence where for all with ( is for infinite streams). The prefix relation over is the least relation that satisfies , if and if , , and .

We say a stream has an event with value at time if in its sequence directly follows . We say a stream is known at time if it contains a strictly larger timestamp or a non-strictly larger timestamp followed by a data value or . Where convenient, we also see streams as functions such that if the stream has value at time , if it is known to have no value, and otherwise. We refer to the supremum of all known timestamps of a stream as inclusive or exclusive progress, depending on whether it is itself a known timestamp. The prefix relation realises the intuition of cutting a stream at a certain point in time while keeping or removing the cutting point.

In the following, we present the denotation of a specification as a function between input streams and output streams.

Definition 3 (TeSSLa semantics).

Given a specification of equations , every can be interpreted as a function of input streams and output streams , that is composed of the primitive functions whose denotation is given in the rest of this section. Input variables are mapped to input streams, and output variables to output streams, . Thus for fixed input streams and every , we obtain a function and in combination a function . We now define the denotation of a specification as the least fixed-point of this function.

The function is monotonic and continuous because all primitive TeSSLa functions defined later in this section are monotonic and continuous and both properties are closed under function composition and cartesian products. and by extension are dcpos. By the Kleene fixed-point theorem has a least fixed point, which is the least upper bound of its Kleene chain.

Next we give the semantics of the primitive TeSSLa functions. The dependency of the input and output streams is assumed implicitly.

Definition 4.

Nil is a constant for the completely known stream without any events: .

We use the unit type for streams that can carry only the single value .

Definition 5.

Unit is a constant for the completely known stream with a single unit event at timestamp zero:

The following functions are given by specifying two conditions: the first for positions where an output event occurs, and the second where no output event occurs. Thereby the progress of the stream is defined indirectly as the position where the output can no longer be inferred from these conditions.

Definition 6.

The time operator returns the stream of the timestamps of another stream where is defined as such that

The lift operator lifts an n-ary function from values to streams. The notation denotes the set of functions where all and have been extended by the value .

Definition 7.

Unary lift is defined as where is given by such that

Definition 8.

Binary lift is given as where is given by s.t.

where .

The binary lift can naturally be extended to an n-ary lift by recursively combining two streams into a stream of tuples or partially applied functions until the final result is obtained (see Appendix 0.A.1). Alternatively, the scheme of the binary lift can be easily extended to higher arities.

Example 1.

Merge combines events of two streams, prioritising the first one.

Example 2.

Const maps the values of all events of the input stream to a constant value: with . Using we can lift constants into streams representing a constant signal with this value, e.g. or .
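The lift operators and the two derived operators can be sketched in Python under simplifying assumptions: a stream is a dict from timestamps to values, all streams are finite and completely known (progress is ignored), and None plays the role of the missing value. The names lift1, lift2, merge and const are chosen for this illustration only.

```python
def lift1(f, xs):
    """Unary lift: apply f to the value of every event."""
    return {t: f(v) for t, v in xs.items()}


def lift2(f, xs, ys):
    """Binary lift: call f at every timestamp where at least one stream has an event.

    A stream without an event at that timestamp contributes None, and f may
    return None to signal that no output event is produced.
    """
    out = {}
    for t in sorted(set(xs) | set(ys)):
        value = f(xs.get(t), ys.get(t))
        if value is not None:
            out[t] = value
    return out


def merge(xs, ys):
    """Combine the events of two streams, prioritising the first one."""
    return lift2(lambda a, b: a if a is not None else b, xs, ys)


def const(c, xs):
    """Map the values of all events of xs to the constant c."""
    return lift1(lambda _value: c, xs)


print(merge({1: "a", 3: "b"}, {1: "x", 2: "y"}))  # {1: 'a', 2: 'y', 3: 'b'}
print(const(5, {2: "w", 7: "w"}))                 # {2: 5, 7: 5}
```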

Definition 9.

The last operator takes two streams and returns the previous value of the first stream at the timestamps of the second. It is defined as where is given as such that

where and .

Note that while TeSSLa is defined on event streams, last realizes some essential aspects of the signal semantics: With this operator one can query the last known value of an event stream at a specific time and hence interpret the events on this stream as points where a piece-wise constant signal changes its value.

Example 3.

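By combining the lift and the last operators, we can now realize the signal lift semantics implicitly used in the introduction:
$\mathbf{slift}(f)(x, y) := \mathbf{lift}(g_f)(x', y')$ with $x' := \mathbf{merge}(x, \mathbf{last}(x, y))$, $y' := \mathbf{merge}(y, \mathbf{last}(y, x))$, and $g_f(a, b) := f(a, b)$ if $a \ne \bot$ and $b \ne \bot$, and $\bot$ otherwise.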
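A Python sketch of last and of a signal lift built from it, again on finite, completely known streams represented as dicts from timestamps to values. The construction (extend each stream with its last known value at the events of the other stream, then apply f wherever both are defined) is an assumption derived from the description of signal semantics. The ring-buffer data and the bound of 1 are hypothetical.

```python
import bisect


def last(values, triggers):
    """At each trigger timestamp, emit the latest value strictly before it."""
    times = sorted(values)
    out = {}
    for t in sorted(triggers):
        i = bisect.bisect_left(times, t)  # number of value events before t
        if i > 0:
            out[t] = values[times[i - 1]]
    return out


def merge(xs, ys):
    """Combine two streams, prioritising the first one."""
    out = dict(ys)
    out.update(xs)
    return out


def slift(f, xs, ys):
    """Signal lift: evaluate f whenever one signal changes, using last known values."""
    x_signal = merge(xs, last(xs, ys))
    y_signal = merge(ys, last(ys, xs))
    return {t: f(x_signal[t], y_signal[t]) for t in sorted(x_signal) if t in y_signal}


# Hypothetical ring-buffer counters with initial value 0 at timestamp 0.
num_reads = {0: 0, 4: 1, 9: 2}
num_writes = {0: 0, 2: 1, 3: 2, 5: 3}
safe = slift(lambda r, w: w - r <= 1, num_reads, num_writes)
print(safe)  # {0: True, 2: True, 3: False, 4: True, 5: False, 9: True}
```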

Example 4.

In order to filter an event stream with a dynamic condition, we apply the last known filter condition to the current event:

[Figure: example trace of filter applied to an event stream and a boolean condition stream with values tt and ff.]

Definition 10.

The delay operator takes delays as its first argument. After a delay has passed, a unit event is emitted. A delay can only be set if a reset event is received via the second argument, or if an event is emitted on the output. Formally, where is given as such that

where , , and .

In many applications the delay operator is used in simplified versions: In the first example of the introduction that uses the delay operator, the delay and the reset argument can be the same because the delay is used only in non-recursive equations and every new delay is a reset, too. If a periodical event pattern is generated independently from input events then the second argument can be set to unit because only an initial reset event is needed. The full complexity of the delay operator is only needed if the delay is used in recursive equations with input dependencies and ensures that the fixed-point is unique.
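The periodic case, where the second argument is unit and only the initial reset is needed, can be sketched in Python. The sketch assumes a recursive equation of the form period := merge(delay(const(5, period), unit), unit); the function name periodic and the horizon parameter are hypothetical. Because the solution is an infinite stream, only a prefix up to the horizon is computed.

```python
def periodic(period, horizon):
    """Timestamps of the events generated up to and including the horizon."""
    if period <= 0:
        # A non-positive period would never advance time, so reject it here.
        raise ValueError("period must be positive")
    events = [0]  # base case: the unit event at timestamp 0
    # Every emitted event sets the next delay, which fires one period later.
    while events[-1] + period <= horizon:
        events.append(events[-1] + period)
    return events


print(periodic(5, 20))  # [0, 5, 10, 15, 20]
```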

We can observe that all basic functions are monotonic and continuous. Since these properties are closed under composition and the least fixed-point is determined by the Kleene chain, we can conclude:

Proposition 1.

The semantics of a TeSSLa specification is monotonic and continuous in the input streams.

In other words, the semantics will provide an extended result for an extended input and is therefore suited for online monitoring.

We can further observe that the pre-fixed-points on the Kleene chain have the following property: the progress only increases a finite number of times until a further event has to be appended. This is due to the basic functions that do handle progress in this way. We therefore obtain:

Theorem 2.1.

For a specification every finite prefix of can be computed assuming all lifted functions are computable. Assuming they are computable in steps, the prefix can be computed in steps where is the number of events over all involved streams.

Note that in case the specification contains no delay, output streams cannot contain any timestamps that did not already occur in the inputs. Further note that fixed-points might contain infinitely many positions with data values (in case of delay) and we can thus only compute prefixes. A respective monitor would exhibit infinite outputs even for finite inputs.

Due to Proposition 1 we can reuse a previously computed fixed-point if new input events occur and hence also compute the outputs incrementally.

2.0.1 Well-formedness

While the least fixed-point is unique it does not have to be the only fixed-point. In that case, the least fixed-point is often the stream with progress or some other stream with too little progress and one would be interested in (one of) the maximal fixed-points. Since the largest fixed-points would be more difficult to compute, especially in the setting of online monitoring, we define a fragment for which a unique fixed-point exists.

Definition 11.

We call a TeSSLa specification well-formed if every cycle of the dependency graph (of the flattened specification) contains at least one delayed-labelled edge. The dependency graph of a flat TeSSLa specification of equations is the directed multi-graph of nodes . For every the graph contains the edge iff is used in . We label edges corresponding to the first argument of last or delay with delayed.
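Well-formedness can be checked with a short Python sketch: every cycle contains a delayed-labelled edge exactly if the graph that remains after removing all delayed-labelled edges is acyclic. The edge-triple encoding and the function name is_well_formed are assumptions for illustration, and the two example graphs are hypothetical flattened specifications.

```python
def is_well_formed(edges):
    """edges: iterable of (source, target, delayed) triples of the dependency graph."""
    graph = {}
    for source, target, delayed in edges:
        graph.setdefault(source, set())
        graph.setdefault(target, set())
        if not delayed:
            graph[source].add(target)  # keep only edges without the delayed label

    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(graph, WHITE)

    def has_cycle(node):
        """Depth-first search; reaching a grey node again closes a cycle."""
        colour[node] = GREY
        for successor in graph[node]:
            if colour[successor] == GREY:
                return True
            if colour[successor] == WHITE and has_cycle(successor):
                return True
        colour[node] = BLACK
        return False

    return not any(colour[n] == WHITE and has_cycle(n) for n in graph)


# Hypothetical flattening of a counter: count := merge(a, zero), a := b + 1,
# b := last(count, x). The edge for the first argument of last is delayed.
counter = [("count", "a", False), ("count", "zero", False),
           ("a", "b", False), ("b", "count", True), ("b", "x", False)]
print(is_well_formed(counter))                                     # True
print(is_well_formed([("y", "y", False), ("y", "x", False)]))     # False
```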

Theorem 2.2.

Given a well-formed specification of equations and input streams then is the only fixed-point.

Proof.

From the Kleene fixed-point theorem we know
. Because is well-formed, every is either constant or contains at least one or . The input streams limit progress, i.e. the maximal timestamp produced, of . The progress strictly increases with every step of the iteration of in the Kleene chain until the limit given by the input streams is reached. Every other fixed-point of must be an extension of the least fixed-point, but the least fixed-point has already the maximal progress permitted by the input streams. ∎

3 Expressiveness of TeSSLa

We discuss the expressiveness of four different TeSSLa fragments: TeSSLa specifications without the delay operator can only produce events with timestamps which are already included in the input streams and TeSSLa specifications with the delay operator can produce arbitrary event patterns even without any input event. On the other hand we distinguish between TeSSLa specifications which use only bounded data structures, which can only consider finitely many past events, and those with unbounded data structures which can consider infinitely many past events in the computation of new events.

To characterize functions which can be expressed in TeSSLa we define timestamp conservatism and future independence in addition to monotonicity and continuity. For a stream we denote with the set of timestamps present in the stream and for multiple streams .

Definition 12 (Timestamp Conservatism).

We call a function on streams timestamp conservative iff it does not introduce new timestamps, i.e. for input streams and output streams we have implies .

Note that TeSSLa specifications without delay are timestamp conservative because only delay can introduce new timestamps.

For a stream we denote with the prefix of with progress .

Definition 13 (Future Independence).

We call a function on streams future independent iff output events only depend on current or previous events, i.e. for input streams and output streams we have implies .

Note that every TeSSLa specification is future independent because the operators last and delay are the only operators referring to events with different timestamps and they refer only to previous events. The omitted proofs of the following theorems can be found in Appendix 0.A.2.

Theorem 3.1 (Expressiveness of TeSSLa Without Delay).

Every function on streams can be represented as a TeSSLa specification without delay iff it is a) monotonic and continuous, b) timestamp conservative and c) future independent.

Proof Sketch.

Represent the function as the iterative function taking a memory state , the current input values , and the corresponding current timestamp and returning the new memory state . Output events for all output streams can be derived from . Because is monotonic it is sufficient to compute the output events step by step; because is future independent it is sufficient to allow to store arbitrary information about the past events; and because is timestamp conservative it is sufficient to execute for every timestamp in the input events. Translate into an equivalent TeSSLa specification: .
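The iterative view used in the proof sketch can be illustrated in Python: a step function is executed once for every timestamp occurring in the inputs, and output events are derived from the memory state. The names run_monitor, step and emit, the dict-based stream encoding, and the running-sum example are assumptions for this illustration, not the construction of the proof.

```python
def run_monitor(step, emit, initial_memory, inputs):
    """inputs: dict from stream name to a dict from timestamps to values."""
    timestamps = sorted({t for events in inputs.values() for t in events})
    memory = initial_memory
    outputs = []
    for t in timestamps:  # timestamp conservative: only input timestamps are visited
        current = {name: events.get(t) for name, events in inputs.items()}
        memory = step(memory, current, t)  # future independent: past and present only
        value = emit(memory)
        if value is not None:
            outputs.append((t, value))
    return outputs


def sum_step(memory, current, _timestamp):
    """Hypothetical step function: running sum over the values of stream x."""
    x = current.get("x")
    return memory if x is None else memory + x


result = run_monitor(sum_step, lambda memory: memory, 0,
                     {"x": {1: 4, 3: 6}, "y": {2: 1}})
print(result)  # [(1, 4), (2, 4), (3, 10)]
```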

If all data types in the TeSSLa specification are bounded, uses a finite memory cell , which can only store a constant number of current and previous events. Monotonicity guarantees that we can compute output events incrementally and by future independence we know that knowledge about the previous events is sufficient to derive new events. From the combination of both properties we know that it is not necessary to queue (arbitrarily large) event sequences to compute the output events. Instead one memory cell (capable of storing one element of the data domain) per delay and per last operator in the specification is sufficient. Restricting TeSSLa to bounded data types allows TeSSLa implementations on embedded systems without addressable memory because then finite memory is sufficient. Such a restricted TeSSLa specification can compute new events only based on a finite number of current and previous events.

Theorem 3.2 (Expressiveness of TeSSLa With Delay).

Every function can be represented as a TeSSLa specification with delay iff it is a) monotonic and continuous and b) future independent.

The proof accompanies the step-function with a timeout function which is evaluated on every new memory state. returns the timestamp of the next evaluation of , which allows arbitrary event generation. The effect of can be realized using the delay operator.

We call a stream Zeno if it contains two timestamps $t_1$ and $t_2$ with infinitely many events between $t_1$ and $t_2$. With the delay operator it is possible to construct such Zeno streams because the timeout function is not restricted in any way. By Rice’s theorem it is impossible to check for an arbitrary timeout function whether it only generates non-Zeno timestamp sequences. Hence, one would need to restrict allowed timeout functions more drastically, which would restrict the possible event sequences generated by a TeSSLa specification further than necessary. For that reason we decided to include the capability to generate Zeno streams with TeSSLa. As a consequence of Theorem 0.A.4 we obtain:

Corollary 1.

A TeSSLa specification with multiple delays can be translated into an equivalent specification with only one delay.

TeSSLa with and without delay are closely related because TeSSLa without delay can verify the relation of given input/output streams with respect to a TeSSLa specification that uses delay. The delay is only needed to actively generate the events at specified times. In the following we denote with the boolean function indicating whether the boolean output stream of the TeSSLa specification contains only events with value true for the input streams .

Theorem 3.3 (Delay Elimination).

For every TeSSLa specification with with delay operators there exists a TeSSLa specification without delay operators, which derives a boolean stream , s.t. for any input streams and output streams we have iff .

The above theorem follows from Theorem 0.A.3 and the fact that is timestamp conservative, because the output stream only contain events when any input stream contains an event. See Appendix 0.A.2.1 for a constructive proof of a slightly weaker lemma.

4 TeSSLa Fragments and Transducers

In this section we investigate two TeSSLa fragments related to deterministic Büchi automata and timed automata, resp. We translate TeSSLa specifications to transducers, which can be seen as automata taking the in- and output of the corresponding transducer as input word. Thus by relating TeSSLa fragments to certain transducer classes, we inherit complexity and expressiveness results from the well-known automata models.

4.0.1 Boolean Fragment

The fragment TeSSLa restricts TeSSLa to boolean streams and the operators last, lift and slift with on timestamps. In the syntax expressions are restricted as follows, where is a function :

Note that since one can only compare timestamps, for a TeSSLa-formula and two tuples of input streams we have iff all events in carry the same values in the same order as those in , independent from the exact timestamps of the events.

A deterministic finite state transducer (DFST) is a 5-tuple with input alphabet , output alphabet , state set , initial state and transition function . For an input word we call a sequence a run of a DFST with output iff and for all . To show that TeSSLa and DFSTs have the same expressiveness, we encode DFST words as TeSSLa streams and vice versa. The function encodes a DFST word as a corresponding set of TeSSLa streams: For every a stream exists with . The function encodes TeSSLa streams as a synchronized DFST word over the alphabet with which maps stream names to their current values: Let be the set of all timestamps present in the streams including with . Then if has exclusive progress of , if has inclusive progress of or otherwise.
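A run of a DFST can be sketched in Python, assuming a transition function that maps a state and an input symbol to a successor state and one output symbol. The dict encoding of the transition function, the name run_dfst, and the rising-edge detector used as example are hypothetical.

```python
def run_dfst(delta, initial_state, word):
    """Return the output word produced by the run of the DFST on the input word."""
    state = initial_state
    output = []
    for symbol in word:
        state, out = delta[(state, symbol)]  # one output symbol per input symbol
        output.append(out)
    return output


# Hypothetical boolean transducer: outputs True exactly on a rising edge.
RISING_EDGE = {
    ("low", False): ("low", False),
    ("low", True): ("high", True),
    ("high", True): ("high", False),
    ("high", False): ("low", False),
}

print(run_dfst(RISING_EDGE, "low", [False, True, True, False, True]))
# [False, True, False, False, True]
```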

Theorem 4.1.

For a DFST there is a TeSSLa formula and for a TeSSLa formula there is a DFST s.t.

Note that since the boolean transducers produce one output symbol per input symbol one could reattach the timestamps of the input streams to the output streams to preserve the exact timestamps, too.

Translating DFST to TeSSLa

We represent the states as stream which is true iff the transducer is in it: and the initial state , where . For every transition we add and . The merge of all the output streams is the output: .

Translating TeSSLa to DFST

We translate every equation of the flattened specification into individual DFSTs, which are then composed into one DFST . For every DFST the input symbols are functions from the names of the input streams to and the output symbols are functions from the name of the equation to . As discussed in the previous section, for this finite data domain we only need to consider finitely many different internal states for every equation. The transition function realizes the state changes the current output based on the current state. See Appendix 0.A.3 for details on these transducers.

For the composition of the individual DFSTs every two and are then composed parallel int