2.1 A Static Model of Programs
Consider a function implemented in code. Let be the set of program variables (including functions defined/called) in the code for the function . Further, let a symbolic state, a representation of program state without runtime information, be a total function . Thus, a symbolic state determines whether a program variable has been changed or whether a function has been called, but contains no information about values, since it cannot be assumed that these are computable statically. Now that the notion of a symbolic state has been introduced, the Symbolic Control Flow Graph of the code in the function is given in Definition 2.1.1.
Definition 2.1.1 (Symbolic Control Flow Graph).
A Symbolic Control Flow Graph of a function is a directed graph which allows cycles. and are such that
is a finite sequence of symbolic states induced by state changing instructions in the code for .
Denote by the natural number such that (where is the element of the vector), i.e., the index in at which is found. Also, denote by the length of the sequence .
is a set of edges representing instructions that induce changes in state. In particular, is an edge from to augmented with a branching condition (possibly empty) that asserts that holds between the symbolic states and , and that the computation required to move from symbolic state to is a statement of type for every .
is the position in of the symbolic state from which the control flow starts.
is the set of indices in of final symbolic states where, for every , there is no such that (if a symbolic state is final, it cannot be moved to another final symbolic state).
Given an edge , is a type of the edge. Further, given , a path of length (write ) through is defined by a finite sequence of symbolic states where, for each (), there is an edge . A complete path is a path such that and .
is normally the empty symbolic state, denoted by . For , for every , .
A permuted form of , given the appropriately modified edge set , would define an isomorphic graph, the isomorphism being a permutation of natural numbers (a permutation of a set is a bijective map from the set to itself); the use of a sequence for is simply to have a natural index to refer to each state and, more significantly, to allow multiple occurrences of the same symbolic state.
Symbolic Control Flow Graphs are necessarily finite () because programs are finite representations of algorithms.
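As a rough illustration of Definition 2.1.1, the following Python sketch stores an SCFG as a sequence of symbolic states together with a set of typed, optionally guarded edges, and checks whether a sequence of vertex indices forms a complete path. The names `SCFG`, `Edge` and `is_complete_path`, the status strings and the small example program are assumptions made for this sketch; they are not notation from the definition.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set

# Assumed encoding: a symbolic state maps each name to a status string.
SymbolicState = Dict[str, str]


@dataclass(frozen=True)
class Edge:
    source: int                      # index of the source symbolic state
    target: int                      # index of the target symbolic state
    kind: str                        # type of the inducing statement
    condition: Optional[str] = None  # None models the empty branching condition


@dataclass
class SCFG:
    vertices: List[SymbolicState]  # a sequence, so equal states may occur twice
    edges: Set[Edge]
    start: int                     # index of the start state
    finals: FrozenSet[int]         # indices of the final states

    def has_edge(self, source: int, target: int) -> bool:
        return any(e.source == source and e.target == target for e in self.edges)

    def is_path(self, indices: List[int]) -> bool:
        # Every pair of consecutive states must be joined by an edge.
        return all(self.has_edge(a, b) for a, b in zip(indices, indices[1:]))

    def is_complete_path(self, indices: List[int]) -> bool:
        # A complete path starts at the start state and ends at a final state.
        return (bool(indices) and indices[0] == self.start
                and indices[-1] in self.finals and self.is_path(indices))


# Hypothetical program:   x = 1
#                         if x > 0: g()
#                         else:     y = 0
example = SCFG(
    vertices=[
        {"x": "undefined", "y": "undefined", "g": "undefined"},
        {"x": "changed", "y": "undefined", "g": "undefined"},
        {"x": "changed", "y": "changed", "g": "called"},
        {"x": "unchanged", "y": "changed", "g": "undefined"},
    ],
    edges={
        Edge(0, 1, "assignment"),
        Edge(1, 2, "call", "x > 0"),
        Edge(1, 3, "assignment", "not (x > 0)"),
    },
    start=0,
    finals=frozenset({2, 3}),
)
```

Here `example.is_complete_path([0, 1, 2])` holds, whereas `[0, 1]` is a path that is not complete because it does not end in a final state.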
Construction of a Symbolic Control Flow Graph is outlined in the form of a set of rules. Suppose the current symbolic state is (hence, a vertex in some SCFG). Then:
Assignments x = a for some program variable and some expression induce a new state with (the standard map modification notation, denoting the map that agrees with on all program variables except , whose value is now changed). Assignments therefore also induce edges .
Function calls g(…) for some function (the internals of which are not assumed to be known and are currently of no interest) induce a new state with for all . Notice the inclusion of all program variables being changed; this is because one cannot make any assumptions about the purity of the function (a function is pure if no variables outside the local scope of that function are affected by its execution). Function calls therefore also induce edges .
Conditionals induce multiple new states for blocks by considering each block individually: the change in state induced by the first instruction in block induces the new state . Naturally, the block reached by the conditional’s test condition induces an edge ; the blocks reached by the alternative conditions for induce edges
and the else-block (the ) induces an edge
Each for is the type of the statement that induces the symbolic state .
For-loops and while-loops induce new states by applying the rules above to the code in their body, then taking the first and last symbolic states in the body ( and respectively) and inducing edges, based on the loop condition , and . Here, is the type of the statement that induces the symbolic state .
In the case of the instruction x = e(g(…)) where denotes an arbitrary expression (hence, an assignment and a function call in one instruction), the rules can simply be combined, where all program variables must be considered and is considered as called. Additionally, processing a conditional leads to multiple loose-end states (one for each branch), so the next instruction (if there is one) must be processed with respect to each symbolic state that is still a loose-end. Finally, for a symbolic state that indicates that the variable has changed/been called, the next symbolic state after that which indicates that a program variable has been changed/called that is not must map to unchanged.
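The assignment and function-call rules can be sketched in Python for straight-line code as follows. This is a minimal sketch under stated assumptions: the status strings, the choice of "undefined" for the empty symbolic state, and the helper names `settle`, `after_assignment`, `after_call` and `build_straight_line` are all hypothetical. Conditionals and loops are deliberately left out.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed status labels; "undefined" for the empty state is an assumption.
CHANGED, UNCHANGED, CALLED, UNDEFINED = "changed", "unchanged", "called", "undefined"

State = Dict[str, str]


def settle(state: State) -> State:
    """Marks left by the previous instruction revert to 'unchanged'."""
    return {name: UNCHANGED if status in (CHANGED, CALLED) else status
            for name, status in state.items()}


def after_assignment(state: State, target: str) -> State:
    """Rule for `target = expr`: only the assigned variable is marked changed."""
    new_state = settle(state)
    new_state[target] = CHANGED
    return new_state


def after_call(state: State, callee: str, variables: Sequence[str]) -> State:
    """Rule for `callee(...)`: purity is unknown, so every variable is changed."""
    new_state = settle(state)
    for name in variables:
        new_state[name] = CHANGED
    new_state[callee] = CALLED
    return new_state


def build_straight_line(instructions: Sequence[Tuple[str, str]],
                        variables: Sequence[str],
                        functions: Sequence[str]):
    """Return (vertices, edges) for a list of ("assignment" | "call", name)."""
    empty = {name: UNDEFINED for name in list(variables) + list(functions)}
    vertices: List[State] = [empty]
    edges: List[Tuple[int, int, str]] = []
    for kind, name in instructions:
        current = vertices[-1]
        if kind == "assignment":
            new_state = after_assignment(current, name)
        elif kind == "call":
            new_state = after_call(current, name, variables)
        else:
            raise ValueError(f"unsupported instruction type: {kind}")
        vertices.append(new_state)
        # The edge is labelled with the type of the inducing statement.
        edges.append((len(vertices) - 2, len(vertices) - 1, kind))
    return vertices, edges


# Hypothetical program:  x = 1;  g();  y = 2
vertices, edges = build_straight_line(
    [("assignment", "x"), ("call", "g"), ("assignment", "y")],
    variables=["x", "y"],
    functions=["g"],
)
```

In this run the state after `g()` marks both `x` and `y` as changed and `g` as called, and the state after `y = 2` maps `x` and `g` back to unchanged.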
Figure 2.1.1 shows an example SCFG computed from a simple program. The program, in this case, has branching based on the condition i == j; the edges labelled […] (shortened to save space) denote the conditions that must hold for the control flow to follow the respective paths.
2.1.1 Similarities with Symbolic Execution Trees
The symbolic execution tree [2, 3] used in the symbolic execution literature bears similarities to the SCFG approach in that it models a program without any runtime information. A key difference is that symbolic execution trees associate both symbolic states and the instructions that compute the symbolic states with vertices, using edges only for branching. The SCFG described in Section 2.1, however, only associates symbolic states with vertices, and associates the instructions whose execution computes those symbolic states with the preceding edges (that is, the edges whose destination vertex is the symbolic state computed).
Additionally, despite not using any runtime information, symbolic execution trees encode information about runs of a program and so contain distinct branches that never converge after divergence has been forced by, say, a conditional. In particular, paths through a symbolic execution tree may be uniquely identified by the sequence of conditions that are true along them. The symbolic control flow graph, on the other hand, allows convergence since it can be regarded as an augmented control flow graph.
2.2 A Model of Program Runs
In order to define a representation of a given program run based on a symbolic control flow graph (see Section 2.1), the set of critical symbols is first defined as the set containing names of program variables and functions, hence , for the function implemented in code. Further, a concrete state is a total function
where Val is the finite set of values that can occur in a program (finiteness follows the intuition that physical machines have finite memory). The additional values undefined and not called are added to the codomain to model a variable that is currently undefined, and to model a function that was not called, respectively. Further, a concrete state models a function call result by storing the return value of the call; if a function has not been called, the value mapped to is not called. Denote by the symbolic state that generates at runtime.
Now, consider a run of the function as a complete path (a path starting at the start state and ending at an end state) through , but with the symbolic states replaced by concrete states holding runtime information. Definition 2.2.1 details the structure of such a run (modelled as a discrete-time dynamical system) with respect to .
Definition 2.2.1 (A Run of as a Discrete-time Dynamical System).
A discrete-time dynamical system constructed from using the set of critical symbols is a tuple with:
a finite sequence of the form for concrete states such that
is derived from some in , hence ;
For and , there is a path from to in ; and
For some , either
(if a state changes a value, this can be either because of direct assignment, or as a result of a call to an impure function, hence no restriction is placed on the type of the incident edge); or
if , then there is an edge of type call from to in .
Denote by the length of the vector and denote by the natural number such that (where is the element of the vector ).
, the clock function, giving the time elapsed in the system when the concrete state at position in is reached. In addition, (different concrete states cannot be attained at the same time) and (time must move forward).
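To make the shape of such a system concrete, the Python sketch below pairs a sequence of concrete states with a clock given as a list of timestamps, and checks only the timing conditions (strictly increasing timestamps). The names `DDS` and `is_well_timed` and the example values are assumptions; the path conditions relating concrete states to the SCFG are not checked here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Special values taken from the text: an undefined variable, an uncalled function.
UNDEFINED = "undefined"
NOT_CALLED = "not called"

ConcreteState = Dict[str, Any]  # critical symbol -> runtime value


@dataclass
class DDS:
    states: List[ConcreteState]
    times: List[float]  # times[i] is the time at which states[i] is reached

    def is_well_timed(self) -> bool:
        # Strictly increasing timestamps cover both conditions: no two states
        # share a timestamp, and time always moves forward.
        return (len(self.states) == len(self.times)
                and all(a < b for a, b in zip(self.times, self.times[1:])))


# Hypothetical run over the critical symbols {"x", "f"}.
run = DDS(
    states=[
        {"x": UNDEFINED, "f": NOT_CALLED},
        {"x": 1, "f": NOT_CALLED},
        {"x": 1, "f": 42},  # f has been called and returned 42
    ],
    times=[0.0, 0.1, 0.6],
)
```

A run whose clock repeats or decreases, such as `times=[0.0, 0.1, 0.1]`, would be rejected by `is_well_timed`.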
With a discrete-time dynamical system defined, a natural next step is, for each concrete state in the vector of , to lift the incident edges of so that they become part of the runtime. Edges in a symbolic control flow graph can have no notion of time since they are statically computed; lifting the notion of an edge to a part of the runtime ultimately allows one to consider timing constraints. This is done by Definition 2.2.2.
Definition 2.2.2 (Transition).
A transition represents the computation required to move from the concrete state to in . Since the edges in the path from to in represent instructions, the transition represents the execution of the sequence of instructions in at runtime.
For a transition , denote by the start time of the transition where . Hence, the start time of a transition is the time at which the state immediately before it is attained.
From now on, when or is written, it is done with the understanding that the timestamp for its occurrence is also available, meaning the pairs and are unique (without timestamp information, concrete states are simply total functions, and these can be isomorphic; timestamp information breaks the isomorphism and removes ambiguity). Therefore, and are notational shortcuts for these pairs. Furthermore, the containment relation is extended to sequences in the expected way: means that there is some such that .
Transitions can be the computation performed by multiple edges, one after the other; this is from the condition that there must be a path between symbolic counterparts, and not just a single edge (though a single edge still constitutes a path, so the symbolic counterparts may be adjacent).
When seen as a function, the operator that gives the symbolic state that generates a concrete state holds the following properties:
If there is any branching in and the section of code with branching is traversed only once, then cannot be surjective since not every branch is explored at runtime.
If there are any loops in and a symbolic state lies inside a loop that performs more than one iteration, then cannot be injective since that symbolic state will generate multiple concrete states during runtime.
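Both properties can be checked on a small, hypothetical run. In the Python sketch below, `generated_by[i]` is assumed to hold the index of the symbolic state that generates the i-th concrete state; the SCFG is assumed to have four symbolic states, with state 2 inside a loop that iterates three times and state 3 on a branch that is never taken.

```python
from typing import List


def is_injective(generated_by: List[int]) -> bool:
    """True when no symbolic state generates more than one concrete state."""
    return len(set(generated_by)) == len(generated_by)


def is_surjective(generated_by: List[int], number_of_symbolic_states: int) -> bool:
    """True when every symbolic state generates at least one concrete state."""
    return set(generated_by) == set(range(number_of_symbolic_states))


# Hypothetical run: the loop state (index 2) is visited three times and the
# state on the untaken branch (index 3) is never reached.
generated_by = [0, 1, 2, 2, 2]
injective = is_injective(generated_by)
surjective = is_surjective(generated_by, number_of_symbolic_states=4)
```

For this run neither property holds: the repeated index 2 breaks injectivity, and the missing index 3 breaks surjectivity.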
A system represents a single run of a function . If the inputs are changed, or if there is non-determinism in , this results in a different system .
The set of all transitions in a single system is denoted by (with the subscript omitted), but the notation is maintained since can be derived from the sequence of concrete states and the times at which they are attained. The set has a natural total order . For , . Using , it makes sense to call the minimal element the first element, and build a labelling from there. This gives a way to index sets of transitions. Hence, denote by the element of with respect to the ordering .
This description of a way to model program runs is concluded by defining some properties of transitions.
Definition 2.2.3 (Properties of Transitions).
Let be a transition in a system , with . Then, one can define:
with . is called the duration of .
and to be the concrete states and respectively.
to be .
Note that, if a transition corresponds to a function call (that is, it corresponds to a single edge of type call in a SCFG), the duration of the transition is the time taken by the function call. In particular, by the conditions in Definition 2.2.1, if the transition between two concrete states corresponds to a function call, this must be represented by a single edge in the SCFG. Furthermore, for any transition in a system provides a mechanism to directly talk about time constraints, which are simply predicates on the map .
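A minimal Python sketch of transitions and their timing properties follows. It assumes that the duration of a transition is the timestamp of its target state minus its start time, which agrees with the reading that the duration of a call transition is the time taken by the call. The names `Transition`, `transitions_of` and `within_one_second` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Transition:
    source: Dict[str, Any]  # concrete state immediately before the computation
    target: Dict[str, Any]  # concrete state immediately after the computation
    start_time: float       # time at which the source state is attained
    end_time: float         # time at which the target state is attained

    @property
    def duration(self) -> float:
        # Assumption: duration = timestamp of target - timestamp of source.
        return self.end_time - self.start_time


def transitions_of(states: List[Dict[str, Any]], times: List[float]) -> List[Transition]:
    """One transition per pair of consecutive concrete states, in time order."""
    return [Transition(states[i], states[i + 1], times[i], times[i + 1])
            for i in range(len(states) - 1)]


def within_one_second(transition: Transition) -> bool:
    """An example time constraint: a predicate on the duration."""
    return transition.duration < 1.0


# Hypothetical run: x is assigned, then f is called and returns 42.
states = [
    {"x": "undefined", "f": "not called"},
    {"x": 1, "f": "not called"},
    {"x": 1, "f": 42},
]
times = [0.0, 0.5, 2.0]
all_transitions = transitions_of(states, times)
```

Because the list is built from consecutive states, its positions already follow the ordering by start time, so `all_transitions[0]` is the first transition of the run.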
Consider again the code snippet
with the SCFG in Figure 2.0(b). In the case of this code, there are no parameters and the program is deterministic so, for each set of critical variables , only one DDS can ever be generated by runs of it. Suppose is the DDS with . Then,
and assigns to each concrete state in the sequence a timestamp.
This paves the way to defining the new logic that is the main topic of this report.
2.3 CFTL and its Semantics
Given the machinery developed so far, it is now possible to define the new logic; this involves defining the syntax and semantics. First, some requirements should be given. The logic should:
Describe constraints over state and time. The main point of interest for time constraints is over function calls.
Be efficient to check at runtime, meaning reaching a verdict on a property will not generate too much overhead (what counts as “too much” depends on the system being monitored).
With this in mind, the purpose of developing this logic should be reiterated. Given a function that is implemented in code, a property should be checked with respect to the function . If the function, at any point, violates this property, a ⊥ (false) verdict should be reached. If no violation occurs, the verdict is always either ⊤ (true, so no violation can occur for the section of the code being monitored) or ? (not enough information has been observed yet to reach a true or false verdict).
Formulas in this logic will take the form
where for are domains over which quantification can occur (see Definition 2.3.1) and is an -ary predicate (a predicate with as free variables). The significant structure of a formula in this logic lies in the predicate . It remains to define the ; this is done in Definition 2.3.1, but some preliminary work is required to allow the construction of the sets discussed there.
2.3.1 Quantification Domains
Let be a discrete-time dynamical system (see Definition 2.2.1) based on some with respect to a set of critical symbols . Now, two binary relations must be defined. Let:
with the edge to which corresponds in has type (if for a singleton , then is written);
with . If and is a singleton set, for simplicity, one writes .
Now, let be a predicate on transitions with
If the transition is a function call, the operates on relation gives the name of the function being called as well as every ; if it is an assignment, the relation gives the name of the variable to which it assigns a value.
Now, define a third binary relation by and so is a predicate on concrete states with
With and defined, it now makes sense to write , i.e., “the transition holds the property ”, and similarly for . Hence, .
Finally, let and denote by a set consisting of elements of either or such that, . Recall that, by writing or , the timestamp at which either occurs is understood, allowing sets to contain multiple instances of isomorphic states and transitions (distinguished only by their timestamps).
One can construct a set of transitions that are function calls of some function by writing
One can also construct a set of all states in which has a new value by writing
The necessary definitions are now in place to properly present the notion of a Quantification Domain and, therefore, make clear what the domains are in the formula Equation 2.3.1.
Definition 2.3.1 (Quantification Domain (QD)).
Let be a property over states or transitions, and let be the set of either states or transitions that hold this property. Then is a Quantification Domain (QD).
As an example, let be the set of states that change , so . Then, (a case of one-dimensional quantification) can be interpreted as “For every state that changes the program variable , the predicate on that state should hold”. This is a natural way of applying properties to programs and is the idea followed throughout this report.
To finish this section, the precise definition of the natural ordering with respect to time is needed for quantification domains. Let be a quantification domain of either concrete states or transitions taken from the system . Then the total ordering on induced by is such that, for , .
Using this total order, the minimal element is the first element, and a labelling can be applied from there, thus generating an indexing of the elements of a quantification domain.
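Quantification domains can be pictured as time-ordered, filtered collections. The Python sketch below is one possible encoding: each state records which names received a new value, and each transition records its type and the names it operates on. The record layout and the helper names `changes` and `calls` are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class State:
    time: float              # timestamp at which the state is attained
    changed: FrozenSet[str]  # names given a new value in this state


@dataclass(frozen=True)
class Transition:
    start_time: float
    kind: str                    # e.g. "call" or "assignment"
    operates_on: FrozenSet[str]  # names the transition operates on


def changes(states: List[State], name: str) -> List[State]:
    """Domain of states in which `name` has a new value, ordered by time."""
    return sorted((s for s in states if name in s.changed), key=lambda s: s.time)


def calls(transitions: List[Transition], name: str) -> List[Transition]:
    """Domain of transitions that are calls of `name`, ordered by start time."""
    return sorted((t for t in transitions
                   if t.kind == "call" and name in t.operates_on),
                  key=lambda t: t.start_time)


# Hypothetical observations, deliberately out of time order.
observed_states = [
    State(2.0, frozenset({"x"})),
    State(0.5, frozenset({"x"})),
    State(1.0, frozenset({"y"})),
]
observed_transitions = [
    Transition(1.0, "call", frozenset({"f"})),
    Transition(0.0, "assignment", frozenset({"x"})),
]

x_domain = changes(observed_states, "x")
f_domain = calls(observed_transitions, "f")
```

Sorting by timestamp gives the indexing described above: `x_domain[0]` is the first state that changes `x`.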
2.3.2 Points of Interest and Future Time
Given a formula with one-dimensional quantification, , one can refer to each as a point of interest. In particular, , once defined, will be a predicate on both and other states/transitions in that have some relationship to (e.g., the next transition with respect to that holds some property based on the relations described in Section 2.3.1).
The next step is to define a set of functions that, given a point of interest for some quantification domain , will give either a single element or a set of elements that have some relationship to .
Definition 2.3.2 (Future-time Operators).
Let be a discrete-time dynamical system. Then,
gives the next transition in time with respect to that satisfies :
is similar, but for states.
gives all future transitions in time with respect to that satisfy :
is again similar, but for states.
For any for a quantification domain consisting of concrete states or transitions and a predicate written in the form seen in Section 2.3.1, and similarly for the future-time operators that give states.
Notice that, since or yield single elements with respect to a system , they cannot be quantified over. However, and can be quantified over, since they yield sets.
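The transition-valued operators can be sketched in Python as below, taking the point of interest to be a concrete state identified by its timestamp. The tuple layout and the names `future_transitions` and `next_transition` are hypothetical, and treating a transition that starts exactly at the point of interest as lying in its future is an assumption of this sketch.

```python
from typing import Callable, List, Optional, Tuple

# Assumed encoding: (start time, type, name operated on).
Transition = Tuple[float, str, str]


def future_transitions(point_time: float,
                       transitions: List[Transition],
                       holds: Callable[[Transition], bool]) -> List[Transition]:
    """All transitions at or after the point of interest that satisfy `holds`."""
    ordered = sorted(transitions, key=lambda t: t[0])
    return [t for t in ordered if t[0] >= point_time and holds(t)]


def next_transition(point_time: float,
                    transitions: List[Transition],
                    holds: Callable[[Transition], bool]) -> Optional[Transition]:
    """The earliest such transition, or None if none has been observed."""
    candidates = future_transitions(point_time, transitions, holds)
    return candidates[0] if candidates else None


def is_call_of_f(transition: Transition) -> bool:
    return transition[1] == "call" and transition[2] == "f"


# Hypothetical run with two calls of f.
observed = [
    (0.0, "assignment", "x"),
    (1.0, "call", "f"),
    (2.0, "assignment", "y"),
    (3.0, "call", "f"),
]

first_call = next_transition(0.0, observed, is_call_of_f)
later_calls = future_transitions(1.5, observed, is_call_of_f)
```

The first operator returns a single element (or nothing), while the second returns a collection, which is why only the latter is suitable as a quantification domain.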
Consider the DDS in Example 2.2.1 with the sequence of states
Fix (the first entry in ) and . Then (or when the type of the transition is understood) refers to the transition
where the notation used is consistent with that in Definition 2.2.2.
Now, the structure of is presented in Definition 2.3.3.
Definition 2.3.3 (Form of ).
The form of is given by the grammar
Note that this grammar can only generate formulas in prenex normal form. Note also that implication and conjunction can be expressed by the usual identities (for formulas a and b, a ⟹ b ≡ ¬a ∨ b and a ∧ b ≡ ¬(¬a ∨ ¬b)). Here, follows the context-sensitive grammar below for some ; is the current binding from a quantification domain; and denotes nested quantification with respect to a quantification domain whose computation requires a binding from some .
It is assumed in the grammar that formulas have at most as many free variables as there are bound variables from quantification; there can be no free variables that are not bound by some quantifier.
Now, the semantics can be given. Some notation is introduced, first: , for a quantification domain depending on an existing binding , denotes the instance of when the binding is given. For a formula with quantification sequence , a binding is a map from bind variables to (for and part of some DDS) derived from the quantification sequence. Note that bindings may be partial functions.
Definition 2.3.4 (Definition of for a system ).
Let be a discrete-time dynamical system, let be a property in CFTL and let be a binding taken from the quantification sequence . Then, the relation is defined by:
Now, take a state from a binding (that is, for some ). Then, the semantics for (where has the structure given in Definition 2.3.3) follows.
Suppose now that is a transition taken from a binding (that is, for some ). Then, the semantics for (where has the structure given in Definition 2.3.3) follows.
Suppose that is either a transition or a state. Then the remaining semantics is:
In , the first quantification domain used is necessarily independent of any bindings, that is, it does not require any binding from any other quantification domain to be computed. All other quantification domains are necessarily dependent on some with .
Definition 2.3.4 gives the notion of a binding satisfying a formula . This definition is now extended to say what it means for a system to hold the property , written . Denote by the current observation sequence such that for some DDS . An observation sequence is intuitively the data observed so far from a runtime; some Obs is said to be well-formed when, for every , . Consequently, only well-formed observation sequences are considered as these are the only ones that can be received from the runtime of a monitored program.
Now, consider a formula . The quantification sequence of this formula defines a set of bindings that map the bind variables to elements of of some DDS. Given a current observation sequence Obs, a subset of can be said to be generated. Denote such a subset by (such a subset can contain maps that are partial, since some information required to construct full bindings may not have been observed yet). Now, Obs is a finite prefix of another finite sequence that represents the sequence of observations obtained by observing the entire runtime, hence it is natural to consider extensions of such a prefix, say , and consider the concatenation of sequences, . One can then write .
The 3-valued semantics of a formula based on a finite prefix of an observation sequence can therefore be given in terms of these finite prefixes, as it is in Definition 2.3.5. This definition gives a value to , which denotes the verdict of given the observation sequence (a finite prefix) Obs.
Definition 2.3.5 (3-valued semantics of ).
Let be a formula in CFTL and Obs be the current observation sequence. Then,
The intuition is as follows:
The verdict is ⊤ when no extension of the current sequence of observations can introduce a new binding and, for every existing binding , holds.
The verdict is ⊥ when, given the current sequence of observations, there is already a binding under which does not hold; observation of more data and expansion of cannot change this.
The verdict is ? when either:
There is a binding that does not contain enough information to decide , where denotes with values substituted in from ; or
There is a possibility that the remainder of the runtime of the program under scrutiny will generate more bindings, which could give rise to a violation of . However, there may also be extensions of the current observation sequence that do not violate ; one cannot know which extension will be observed.
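The case analysis above can be sketched as a small Python function. It assumes that each existing binding has already been evaluated to `True`, `False` or `None` (not enough information yet), and that the caller knows whether extensions of the observation sequence could still introduce new bindings. The function name, its parameters and the verdict strings are hypothetical.

```python
from typing import Iterable, Optional

TRUE, FALSE, UNKNOWN = "true", "false", "?"


def verdict(binding_results: Iterable[Optional[bool]],
            more_bindings_possible: bool) -> str:
    """Three-valued verdict from per-binding results.

    binding_results: True/False if the formula is decided under a binding,
                     None if the binding lacks the information to decide it.
    more_bindings_possible: whether some extension of the observations could
                     introduce a new binding.
    """
    results = list(binding_results)
    # A violating binding is final: further observations cannot repair it.
    if any(result is False for result in results):
        return FALSE
    # Undecided bindings, or bindings yet to come, leave the verdict open.
    if more_bindings_possible or any(result is None for result in results):
        return UNKNOWN
    # Every binding is decided, holds, and no new binding can appear.
    return TRUE
```

For instance, `verdict([True, True], more_bindings_possible=False)` yields the true verdict, while the same results with `more_bindings_possible=True` yield `?`.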
Finally, for a DDS , write if and only if there is some finite prefix Obs of the complete observation sequence of , , such that . Notice that this means that all other finite prefixes with Obs as a prefix will also be models for by virtue of Definition 2.3.5.
The space of bindings derived from an observation sequence may contain partial bindings and still contain enough information to decide for each binding.
From now on, for a binding , notation will be reduced to with the understanding that the bind variables in a formula have the same order as the one in which they are written.
Consider again the code snippet:
and verification of the property
Suppose that the runtime has been observed up to a point where the observation sequence is
From this observation sequence, ; the quantification sequence in the formula in Equation 2.3.4 consists of a single bind variable, hence is a full binding. Since the set is the set of all bindings that can be derived based on the quantification sequence in the formula being checked, and further observation will not yield a larger set of bindings, a true verdict is reached.
Consider now the same code as in Example 2.3.4, but with verification of the property
Suppose that the runtime has, again, been observed up to a point where the observation sequence is
Then , where the second bind variable is a transition, since it is obtained from a binding from the set in the quantifier sequence.
Notice that extending the observation sequence:
yields the set of bindings
where the bindings are distinguished by the fact that, when written down, concrete states and transitions are understood to be paired with their timestamps.
To finish this example, notice that observing further calls to would yield further bindings, and so would expand . By Definition 2.3.5, this means cannot be equal to ; rather it must be the case that since extensions of Obs generate larger spaces of bindings .
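The growth of the binding space under extension can be illustrated with a short Python sketch. It assumes a two-variable quantification sequence in which the first variable ranges over states that change a variable and the second ranges over future calls of a function relative to that state. The variable name `x`, the function name `f`, the event encoding and the helper `bindings` are all hypothetical, and states and transitions are identified by their timestamps.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str, str]           # (timestamp, "change" | "call", name)
Binding = Tuple[float, Optional[float]]  # (state time, call time or None)


def bindings(observations: List[Event], variable: str, function: str) -> List[Binding]:
    """Bindings (q, t): q changes `variable`, t is a future call of `function`."""
    result: List[Binding] = []
    for time, kind, name in observations:
        if kind == "change" and name == variable:
            call_times = [t for t, k, n in observations
                          if k == "call" and n == function and t >= time]
            if call_times:
                result.extend((time, t) for t in call_times)
            else:
                # No call observed yet: the binding is partial.
                result.append((time, None))
    return result


obs = [(0.0, "change", "x"), (1.0, "call", "f")]
extension = [(2.0, "call", "f")]

before = bindings(obs, "x", "f")
after = bindings(obs + extension, "x", "f")
```

Here `before` contains one binding and `after` contains two, distinguished only by the timestamp of the call, which mirrors why the verdict must remain `?` while further calls may still be observed.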
3.1 Online Monitors
A monitor constructed for online use should have three states, {⊤, ⊥, ?}, in agreement with the CFTL semantics in Definition 2.3.5. The truth value of a formula with respect to observed data must, in the context of online monitoring, be able to be “I don’t know”.
These three states correspond to three distinct configurations of a formula tree (defined in Definition 3.1.1): the root collapsed to ⊤, meaning the truth value is ⊤; the root collapsed to ⊥, meaning the truth value is ⊥; and the root not collapsed (still a tree), meaning the truth value is ?.
The definitions that follow introduce the inductively-defined formula tree, along with the notion of collapse of subtrees of formula trees. This initial definition of a formula tree will be in terms of propositional logic, and is easy to extend to CFTL.
Definition 3.1.1 (Formula Tree).
Let be a formula in propositional logic. Then the Formula Tree is a directed graph where:
is a set of vertices corresponding to sub-formulas of ;
is a set of edges where is a sub-formula of ;
is the root vertex.
A tree is defined inductively:
generates vertices and edges
generates vertices and edges
If any generated is itself non-atomic (is a conjunction or disjunction; negations are counted as atoms), then the rules for trees generated by conjunctions/disjunctions are applied again.
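The inductive construction can be sketched in Python as follows. Formulas are encoded, purely as an assumption of this sketch, as strings for atoms (negated atoms included) and as tuples `("and", ...)` or `("or", ...)` for conjunctions and disjunctions; the function name `formula_tree` is hypothetical.

```python
from typing import List, Tuple, Union

# Atom: a string such as "a" or "not c" (negations are counted as atoms).
# Compound: ("and" | "or", sub-formula, sub-formula, ...).
Formula = Union[str, tuple]


def formula_tree(phi: Formula) -> Tuple[List[Formula], List[Tuple[Formula, Formula]]]:
    """Return (vertices, edges); the root vertex is vertices[0]."""
    vertices: List[Formula] = [phi]
    edges: List[Tuple[Formula, Formula]] = []
    if isinstance(phi, tuple):
        operator, *subformulas = phi
        if operator not in ("and", "or"):
            raise ValueError(f"unsupported operator: {operator}")
        for sub in subformulas:
            # One edge from the formula to each immediate sub-formula.
            edges.append((phi, sub))
            # Non-atomic sub-formulas are expanded by the same rules.
            sub_vertices, sub_edges = formula_tree(sub)
            vertices.extend(sub_vertices)
            edges.extend(sub_edges)
    return vertices, edges


# Hypothetical formula: a or (b and not c)
phi = ("or", "a", ("and", "b", "not c"))
tree_vertices, tree_edges = formula_tree(phi)
```

For this formula the tree has five vertices (the root, `a`, the conjunction, `b` and `not c`) and four edges.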
Definition 3.1.2 (Sub-tree Collapse).
Let be the formula tree constructed using Definition 3.1.1, and let be a sub-tree of for a sub-formula of . Then, the collapse of to a truth value is performed by replacing all edges with .
A sub-tree can be collapsed provided that one of the following conditions holds:
for some sub-formulas (corresponding to sub-trees) and there is at least one (by observation of an atom, or by collapse of sub-trees). In this case, the collapse is to .
for some sub-formulas (corresponding to sub-trees) and all . In this case, the collapse is to .
for some sub-formulas (corresponding to sub-trees) and there is some . In this case, the collapse is to .