Execution traces provide an exact representation of the system behaviours that are exercised when an instrumented implementation runs. This is leveraged by automated model learning algorithms to generate system abstractions.
Modern passive learning algorithms also infer guards and operations on system variables from the trace data, yielding symbolic models [model_daikon, compute_walkinshaw, Walkinshaw2016, jeppu]. However, the behaviours admitted by these models are limited to those manifest in the traces. Whether the generated models capture all system behaviour therefore depends on devising a software load that exercises all relevant system behaviours.
This can be difficult to achieve in practice, especially when a system comprises multiple components and it is not obvious how each component will behave in conjunction with the others. Random input sampling is a pragmatic choice in this scenario, but it does not guarantee that generated models admit all system behaviour. This is discussed further in Section IV-C.
On the other hand, active learning algorithms can, in principle, generate exact system models [lstar, mat_star]. But when used in practice to learn symbolic abstractions, these algorithms suffer from high query complexity and can only learn models with transitions labelled by simple predicates, such as equality/inequality relations [grey_box_sl, Howar2018ActiveAL, Howar2019].
We present a new active learning approach to derive system abstractions, as finite state automata (FSA), of a system implementation instrumented to observe a set of system variables. The generated abstractions admit all system behaviours over these variables and provide useful insight in the form of invariants that hold on the implementation. As illustrated in Fig. 1, the approach is a grey-box algorithm. It combines a black-box analysis, in the form of model learning from traces, with a white-box analysis, in the form of software model checking [mc-book]. The model learning component can be any algorithm that can generate an FSA that accepts a given set of system execution traces. Model checking is used to evaluate the degree of completeness of the learned automaton and identify any missing behaviours.
The core of this new approach is as follows. Given an instrumented system implementation, a set of observable variables and a model learning algorithm, the structure of a candidate abstraction generated by model learning is used to extract conditions that collectively encode the following completeness hypothesis: For any transition in the system defined over the set of observables, there is a corresponding transition in the generated abstraction. To verify the hypothesis, all conditions are checked against the implementation using model checking. Any violation indicates missing behaviours in the candidate model. This evaluation procedure operates at the level of the abstraction and not individual system traces, unlike query-based active learning algorithms, and therefore can be easily implemented using existing model checkers.
The procedure to evaluate the degree of completeness of the learned model yields a set of new traces that exemplify system behaviours identified as missing from the model. These new traces are used to augment the input trace set for model learning and iteratively refine the abstraction until all conditions are satisfied. The resulting learned model is a system abstraction that has been proven to admit all system behaviours defined over the set of observable variables. Further, the conditions extracted from the final generated abstraction serve as invariants that hold on the implementation. Given a model learning algorithm that can infer symbolic abstractions from trace data, such as [jeppu], the approach can learn models that are more expressive than the abstractions learned using existing active learning algorithms.
II-A Formal Model
The system for which we wish to generate an abstraction is represented as a tuple . is a set of observable system variables over some domain that can be used to collect execution traces. We simplify the presentation by assuming all variables have the same domain. The set contains corresponding primed variables, also over the domain . A primed variable represents an update to the unprimed variable after a discrete time step. The transition relation describes the relationship between and for and is represented using a characteristic function, i.e., a Boolean-valued expression over. The set of initial system states is represented using its characteristic function .
A valuation maps the variables in to values in . An observation at discrete time step is a valuation of the variables at that time, and is denoted by . A trace is a sequence of observations over time; we write a trace with observations as a sequence of valuations . We define an execution trace or positive trace for as a trace that corresponds to a system execution path, i.e., for and there exists a valuation such that . A negative trace is a trace that does not correspond to any system execution path. We represent the set of execution traces by .
The active learning algorithm learns a system model as an FSA. Our abstractions are represented symbolically and feature predicates on the transition edges, such as the abstraction in Fig. 2, and therefore extend finite automata to operate over infinite alphabets. We represent the learned abstraction as a non-deterministic finite automaton (NFA) over an infinite alphabet , where is a finite set of states, are the initial states, is the set of accepting states, and is the transition function. The alphabet corresponds to the set of valuations for variables in , i.e., . The exact alphabet need not be known a priori.
The NFA admits a trace if there exists a sequence of automaton states such that and for . Any finite prefix of a system execution trace is also an execution trace. Thus, if the generated NFA admits , it must also admit all finite prefixes of . In other words, the language of the automaton, , must be prefix-closed.
Definition 1 (Prefix-Closure)
A language is said to be prefix-closed if for all words , . Here, denotes the set of all prefixes of the word .
All states of our automaton are accepting, i.e., the NFA rejects traces by running into a ‘dead end’.
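The acceptance rule above can be illustrated with a minimal sketch. It assumes a simplified representation in which each transition guard is a predicate over a single valuation; the names `admits`, `Guard` and `EXAMPLE` are hypothetical and not part of the approach or of any tool.

```python
from typing import Callable, Dict, List, Set, Tuple

Valuation = Dict[str, int]            # maps each observable variable to a value
Guard = Callable[[Valuation], bool]   # predicate labelling a transition edge
Transition = Tuple[int, Guard, int]   # (source state, guard, target state)


def admits(transitions: List[Transition], initial: Set[int],
           trace: List[Valuation]) -> bool:
    """Return True if some run of the NFA consumes every observation."""
    current = set(initial)
    for valuation in trace:
        # Follow every enabled edge from every state the run may be in.
        current = {dst for (src, guard, dst) in transitions
                   if src in current and guard(valuation)}
        if not current:
            # Dead end: no run can be extended, so the trace is rejected.
            return False
    # Every state is accepting, so surviving to the end means acceptance.
    return True


# Hypothetical two-state abstraction of a counter that saturates at 2.
EXAMPLE: List[Transition] = [
    (0, lambda v: v["x"] < 2, 0),
    (0, lambda v: v["x"] == 2, 1),
    (1, lambda v: v["x"] == 2, 1),
]
```

Because rejection only happens at a dead end, every prefix of an admitted trace is also admitted, which is the prefix-closure property of Definition 1.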
II-B Model Learning from Execution Traces
The approach uses a pluggable model learning component to generate models from traces. Our requirement for this component is simple: given a set of execution traces , the component returns an NFA that accepts (at least) all traces in .
There are several model-learning algorithms that satisfy this requirement [model_daikon, efsm_state_merge, compute_walkinshaw, Walkinshaw2016, jeppu, Biermann:1972:SFM:1638603.1638997]. For our experiments, we use the automata learning algorithm in [jeppu] as the model learning component. The algorithm uses a combination of Boolean Satisfiability (SAT) and program synthesis [gulwani2017program] to generate compact, accurate symbolic models using only execution traces. The algorithm has been implemented as an open-source tool, Trace2Model (T2M) [t2m], which we use for our experiments.
Our choice of algorithm allows us to demonstrate the effectiveness of the active learning algorithm with minimal assumptions. T2M uses only execution traces to generate models, unlike other algorithms that use a priori known LTL properties [model_SAT, exact_fsm, state_merge]. Furthermore, T2M can generalise over multiple variables to generate symbolic models with predicates on the transition edges (Fig. 2). By demonstrating that the active learning algorithm can be used with T2M, we show that it can also be applied to learn models with transition edges labelled with simple Boolean events or letters from the alphabet .
III Active Learning of Abstract System Models
The NFA generated by the model learning component admits all system behaviours captured by the input trace set . However, might not capture all system behaviour. To evaluate the degree of completeness of the set of traces, we use the structure of the NFA to extract conditions that can be checked against the system implementation. The conditions collectively encode the following completeness hypothesis: For any transition available in the system defined by the transition relation , there is a corresponding transition in .
The hypothesis is formulated based on defining a simulation relation between the system and abstraction .
If represents the set of system states for and represent the system state characterised by valuation of variables in , then we define a binary relation to be a simulation if implies that i.e, , such that and .
If such a relation exists, we denote . A consequence of the existence of a simulation relation is trace inclusion, i.e., , as will be proved later in this section.
The extracted conditions are checked against the system implementation using software model checking (see Section III-B). The procedure returns success if all conditions are satisfied and failure, along with a counterexample trace, if there is a violation. The counterexample trace is a sequence of valuations of variables in that corresponds to an execution path in the system representing the missing behaviour. The set of counterexample traces obtained for all violations is used to augment the input trace set i.e., . This is then fed to the model learning component to generate an extended abstraction that covers the missing behaviours.
When all conditions are satisfied, the algorithm returns the generated automaton as the final learned system abstraction . The conditions extracted from serve as invariants that hold on the system implementation.
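The refinement loop described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `learn` stands in for the pluggable model learning component, `check` for condition extraction plus model checking, and both are hypothetical callables. Traces are simplified to tuples of integers, and the iteration budget is an added safeguard.

```python
from typing import Callable, List, Sequence, Tuple

Trace = Tuple[int, ...]   # simplified: one integer observation per time step


def active_learn(traces: Sequence[Trace],
                 learn: Callable[[List[Trace]], object],
                 check: Callable[[object], List[Trace]],
                 max_iterations: int = 100):
    """Refine a model until `check` reports no missing behaviour.

    `check` returns the counterexample traces for all violated conditions,
    or an empty list when every condition holds.
    """
    trace_set = list(dict.fromkeys(traces))   # de-duplicate, keep order
    for iteration in range(1, max_iterations + 1):
        model = learn(trace_set)
        new_traces = check(model)
        if not new_traces:
            # All conditions hold: return the model and the iteration count.
            return model, iteration
        # Augment the input trace set with the missing behaviours.
        for trace in new_traces:
            if trace not in trace_set:
                trace_set.append(trace)
    raise RuntimeError("no complete model within the iteration budget")
```

Keeping `learn` and `check` as separate callables mirrors the independence of model learning and completeness evaluation in the approach.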
III-A Completeness Conditions for a Candidate Abstraction
Given a candidate abstraction for a system , we extract the following conditions:
where is the set of predicates for all outgoing transitions from an automaton state , and for all
where is the set of predicates on the incoming transitions to state and is the set of predicates on outgoing transitions from . A condition of the form (2) is extracted for every state .
We compute the fraction of conditions that hold on the system, denoted by , as a quantitative measure of the degree of completeness of the learned model. If all extracted conditions hold, i.e., , then the generated model admits all system behaviours. A violation indicates missing behaviour in .
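A minimal sketch of how the predicate sets and the completeness fraction could be computed from the structure of a candidate automaton. It assumes condition (1) relates the initial states to the outgoing predicates of the initial automaton state, and condition (2) relates each incoming predicate of a state to the outgoing predicates of that state. Predicates are kept as plain strings, and the names `extract_conditions` and `degree_of_completeness` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]   # (source state, predicate text, target state)


def extract_conditions(edges: List[Edge],
                       initial: str) -> List[Tuple[str, List[str]]]:
    """Pair each assumption with the outgoing predicates that must cover it."""
    incoming: Dict[str, List[str]] = defaultdict(list)
    outgoing: Dict[str, List[str]] = defaultdict(list)
    for src, pred, dst in edges:
        outgoing[src].append(pred)
        incoming[dst].append(pred)
    # Assumed form of (1): an initial state enables some edge out of `initial`.
    conditions = [("Init", sorted(set(outgoing[initial])))]
    # Assumed form of (2): after any incoming edge of a state, one of that
    # state's outgoing edges is enabled.
    for state in sorted(incoming):
        for pred in sorted(set(incoming[state])):
            conditions.append((pred, sorted(set(outgoing[state]))))
    return conditions


def degree_of_completeness(results: List[bool]) -> float:
    """Fraction of extracted conditions that were proved on the system."""
    return sum(results) / len(results) if results else 1.0
```

For example, `degree_of_completeness([True, True, False, True])` evaluates to `0.75`, signalling that one condition is violated and the model is missing behaviour.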
Given a candidate abstraction for a system and the set of conditions extracted from ,
We will prove this by contradiction. Let us assume that all conditions are satisfied and there exists a trace such that . Let be the longest prefix of that is accepted by . Hence, there exists a sequence of states such that and , for .
III-B Verifying Extracted Conditions Against the System
To enable the application of existing software model checkers, we construct source code for functions that encode conditions (1) and (2) of the form as assume/assert pairs, as illustrated in Fig. 3. Here, (line 3 in Fig. 3) represents an implementation of the transition relation . System behaviours are modelled as multiple unwindings of the loop in line 2 in Fig. 3.
To check if the system satisfies a condition, we run model checking using -induction [k-ind, dhkr2011-sas] on the function with , as each condition describes a single system transition. Note that the procedure in Fig. 3 does not start from an initial state, but an arbitrary state that satisfies . A proof with is thus sufficient to assert that the system satisfies the condition for any number of transitions. When all assume/assert pairs are proved valid, this implies that the extracted conditions are always satisfied and therefore can be used as invariants that hold on the system. Other model checking algorithms can be used in place of -induction for the condition check.
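The following sketch shows how a wrapper function of this kind might be generated as C source text. It is not a reproduction of Fig. 3: the helper names `havoc_state` and `system_step` are hypothetical placeholders for code that sets up an arbitrary state and applies the transition relation once, and the loop of Fig. 3 is omitted so that a single step is checked. `__CPROVER_assume` is the assumption primitive provided by CBMC.

```python
from typing import List


def make_harness(name: str, assumption: str, outgoing: List[str]) -> str:
    """Render one assume/assert pair as the C source of a check function."""
    # An empty disjunction is false: a state without outgoing edges
    # admits no further behaviour.
    assertion = " || ".join(f"({p})" for p in outgoing) or "0"
    lines = [
        f"void {name}(void) {{",
        "  /* hypothetical helper: start from an arbitrary state */",
        "  havoc_state();",
        f"  __CPROVER_assume({assumption});",
        "  /* hypothetical helper: one application of the transition relation */",
        "  system_step();",
        f"  assert({assertion});",
        "}",
    ]
    return "\n".join(lines)
```

A call such as `make_harness("check_q1", "x == 2", ["x == 2", "x < 2"])` produces a function whose assertion is the disjunction of the outgoing predicates. Predicates that relate primed and unprimed variables would additionally need the pre-step values to be saved, which this sketch leaves out.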
In case of a failure, the checker returns a sequence of valuations as the counterexample, such that . This can be used to construct a set of new traces as follows. For each trace we find the smallest prefix such that . We then construct a new trace for each prefix . Note that since , the new trace does not change the system behaviour represented by but merely augments it to include the missing behaviour. The set of new traces thus generated is used as an additional input to the model learning component, which in turn generates a refined abstraction that admits the missing behaviour.
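The trace construction above can be sketched as follows, assuming that the relevant prefix of an existing trace is the shortest one ending in the first valuation of the counterexample, and that the remaining counterexample valuations are appended to it. Valuations are simplified to tuples of integers and `extend_traces` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Valuation = Tuple[int, ...]
Trace = Tuple[Valuation, ...]


def extend_traces(traces: Sequence[Trace],
                  cex: Sequence[Valuation]) -> List[Trace]:
    """Splice a counterexample onto every trace that reaches its first valuation."""
    first, rest = cex[0], tuple(cex[1:])
    new_traces: List[Trace] = []
    for trace in traces:
        if first in trace:
            # Smallest prefix that ends in the counterexample's first valuation.
            prefix = trace[: trace.index(first) + 1]
            candidate = prefix + rest
            if candidate not in new_traces:
                new_traces.append(candidate)
    return new_traces
```

Since each new trace starts with a prefix of an existing execution trace, it preserves the behaviour already observed and only adds the missing step.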
For a violation of condition (1), the checker returns a counterexample such that and . is therefore a valid counterexample. However, the counterexample for a violation of condition (2) could be spurious. Let be the corresponding counterexample generated by the model checker. Here, it is not guaranteed that the system state characterised by is reachable from an initial system state. Therefore, the counterexample may not actually correspond to missing system behaviour.
III-C Identifying Spurious Violations
To check if a counterexample is spurious, the valuation is encoded as the following Boolean formula:
and the negation, , is used to assert that never holds at any point in the execution of starting from an initial state, as shown in Fig. 3. We verify this using -induction with . If both the base case and step case for -induction hold, it is guaranteed that the counterexample is spurious, in which case we strengthen the assumption in Fig. 3 to and repeat the condition check. In case of a violation only in the step case, there is no conclusive evidence for the validity of the counterexample. Since we are not interested in generating an exact model of the system but rather an over-approximation that provides useful insight into the system, we treat such a counterexample as valid but record it for future reference.
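The decision logic for a counterexample can be summarised in a short sketch. It assumes the model checker reports separately whether the base case and the step case of the induction hold for the property "the counterexample state never occurs"; the case in which the base case fails is read here as the state being reachable within the bound. The names `Verdict` and `classify` are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    SPURIOUS = "spurious"            # state proved unreachable
    VALID = "valid"                  # state reached within the bound
    ASSUMED_VALID = "assumed valid"  # inconclusive; kept and recorded


def classify(base_holds: bool, step_holds: bool) -> Verdict:
    """Interpret an induction run on 'the counterexample state never occurs'."""
    if not base_holds:
        # A base-case violation is a concrete path from an initial state.
        return Verdict.VALID
    if step_holds:
        # Base and step case both hold: the state is unreachable.
        return Verdict.SPURIOUS
    # Only the step case fails: no conclusive evidence either way, so the
    # counterexample is treated as valid but recorded for later reference.
    return Verdict.ASSUMED_VALID
```

Treating the inconclusive case as valid can add spurious behaviour to the model, but it never removes real behaviour, which is consistent with learning an over-approximation.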
For the bound , a value greater than or equal to the diameter of the system guarantees completeness [k-ind]. In practice, it is often difficult to determine this value without any system domain knowledge. An alternative is to approximate the value of based on available trace information. For instance, if there are observable counters in the system that affect system behaviour when they reach a pre-defined limit, a good approximation for can be twice the maximum counter limit. Note that a poor choice for the bound results in more spurious behaviours being added to the model, resulting in low accuracy. However, the learned models are guaranteed to admit all system traces defined over , irrespective of the value for .
IV Evaluation and Results
IV-A Evaluation Setup
For our experiments we use T2M [t2m] as the model learning component, as discussed in Section II-B. To evaluate the degree of completeness we use the C Bounded Model Checker (CBMC v5.35) [ckl2004]. We implement Python modules for the following: generating the wrapper function to check each condition, processing the CBMC output to return the result of a condition check and translating CBMC counterexamples into a set of trace inputs for model learning. Note that any software model checker can be used in place of CBMC, with relevant modules to process the corresponding outputs.
To evaluate the active learning algorithm, we attempt to reverse-engineer a set of FSAs from their respective C implementations. For this purpose, we use the dataset of Simulink Stateflow example models [stateflow], available as part of the Simulink documentation. The dataset comprises benchmarks that are available in MATLAB 2018b. For each benchmark, we use Embedded Coder [simcoder] to automatically generate a corresponding C code implementation. The generated C implementation is used as the system in our experiments.
Out of the benchmarks, Embedded Coder fails to generate code for ; a total of have no sequential behaviour and implement Recursive State Machines (RSM) [rsm] (footnote 1). We use the remaining benchmarks for our evaluation. The implementation and benchmarks are available online [exp].
Footnote 1: We learn abstractions as FSAs, which are known to represent exactly the class of regular languages. Reverse-engineering an RSM from traces requires a modeling formalism that is more expressive than FSAs, such as Push-Down Automata (PDA) [pda], which is outside the scope of this work. In the future, we wish to look at extensions of this work to generate RSMs.
|Benchmark||Our Algorithm||Random Sampling|
|Superstep||With Super Step||1||10||1||1||1||1||139.7||0.4||1||1||21.8|
|Without Super Step||1||1||3||1||141.4||0.8||3||1||25.5|
IV-B Experiments and Results
For each benchmark, we generate an initial set of traces, each of length , by executing the system with randomly sampled inputs. We assume that the value for counterexample validity check is known a priori and is supplied to the algorithm for each benchmark. Some of the Stateflow models are implemented as multiple parallel and hierarchical FSAs. For a given implementation and a set of observables , we attempt to reproduce each state machine separately using traces defined over all variables in . We therefore generate an abstraction with state transitions at a system level for each FSA in .
The results are summarised in Table I. We quantitatively assess the quality of the final generated model for each FSA by assigning a score , computed as the fraction of state transitions in the Stateflow model that match corresponding transitions in the abstraction. For hierarchical Stateflow models, we flatten the FSAs and compare the learned abstraction with the flattened FSA. We record the number of model learning iterations , the number of states and degree of completeness for the final model, the total runtime and the percentage of total runtime attributed to model learning, denoted by . We set a timeout of h for our experiments. For benchmarks that time out, we present the results for the model generated right before timeout.
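As an illustration of the score, the sketch below computes the fraction of reference transitions that also appear in the learned abstraction. It assumes transitions have already been normalised into comparable `(source, label, target)` triples; how transitions are matched in the actual evaluation is not specified here, and `transition_score` is a hypothetical name.

```python
from typing import Set, Tuple

Transition = Tuple[str, str, str]   # (source state, label, target state)


def transition_score(reference: Set[Transition],
                     learned: Set[Transition]) -> float:
    """Fraction of reference transitions matched by the learned model."""
    if not reference:
        return 1.0
    matched = sum(1 for t in reference if t in learned)
    return matched / len(reference)
```

Extra transitions in the learned model do not lower this score, since the abstraction is allowed to over-approximate the system.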
The active learning algorithm is able to generate abstractions in under h for the majority of the benchmarks. For the benchmarks that time out, we see that the model checker tends to go through a large number of invalid counterexamples before arriving at a valid counterexample for a condition violation. This is because, depending on the size of the domain for the variables , there can be a large number of possible valuations that violate an extracted condition, of which very few may correspond to a valid system state. In such cases, runtime can be improved by strengthening the assumption in Fig. 3 with domain knowledge to guide the model checker towards valid counterexamples. For the FrameSyncController benchmark, CBMC takes a long time to check each condition, even with . This is because the implementation features several operations, such as memory access and array operations, that especially increase proof complexity and proof runtime.
IV-B2 Generated Model Accuracy
The algorithm is guaranteed to generate an abstraction that admits all system behaviours, as is confirmed by in Table I. We also see that for these benchmarks. For two benchmarks, although the Simulink model matched the generated abstraction , the algorithm timed out before it could eliminate all spurious violations .
IV-B3 Number of Learning Iterations
In each learning iteration , as and . Here, and are the generated abstraction and the set of new traces collected in iteration respectively. The algorithm terminates when . The number of learning iterations therefore depends on , where is the abstraction generated from the initial trace set.
IV-C Comparison with Random Sampling
We performed a set of experiments to check if random sampling is sufficient to learn abstractions that admit all behaviours. A million randomly sampled inputs are used to execute each benchmark. The generated traces are fed to T2M to passively learn a model. T2M fails to generate a model for benchmarks, as its predicate synthesis procedure terminates with a segmentation fault. For of the remaining benchmarks, random sampling fails to produce a model admitting all system behaviours ().
IV-D Threats to Validity
The key threat to the validity of our experimental claim is benchmark bias. We have attempted to limit this bias by using a set of benchmarks that was curated by others. Further, we use C implementations of Simulink Stateflow models that are auto-generated using a specific code generator. Although there is diversity among these benchmarks, our algorithm may not generalise to software that is not generated from Simulink models, or software generated using a different code generator.
V Related Work
State-merge [Biermann:1972:SFM:1638603.1638997] is a popular approach for learning finite automata from system traces. The approach is predominantly passive and generated abstractions admit only those system behaviours exemplified by the traces [edsm, Heule_2010, model_daikon, compute_walkinshaw, Walkinshaw2016].
One of the earliest active model learning algorithms using state-merge is Query-Driven State Merging (QSM) [qsm], where model refinement is guided by responses to membership queries posed to an end-user. Other active versions of state-merge use model checking [state_merge, exact_fsm] and model-based testing [state_merge_testing] to identify spurious behaviours in the generated model. However, the learned model is not guaranteed to admit all system behaviour.
Angluin’s L* algorithm [lstar] is a classic active automata learning approach to construct Deterministic Finite Automata (DFA) for regular languages. The approach assumes the presence of a Minimally Adequate Teacher (MAT) that has sufficient information of the system to answer membership and equivalence queries posed by the learning framework. Algorithms based on this MAT framework [lstar, mat_star, ttt, lstar_mealy, active_kearns] can, in principle, generate exact system models. But the absence of an equivalence oracle, in practice, often restricts their ability to generate exact models or even accurate system over-approximations.
In a black-box setting, membership queries are posed as tests on the (unknown) System Under Learning (SUL). The elicited response to a test is used to classify the corresponding query as accepting or rejecting. Equivalence queries are often approximated using techniques such as conformance testing or random testing [sl_star, tlv, ralib, black_box_cegar, dynamic_test], through a finite number of membership queries. An essential prerequisite to enable black-box model learning is that the SUL can be simulated with an input sequence to elicit a response or output. Moreover, obtaining an adequate approximation of an equivalence oracle in a black-box setting may require a large number of membership queries, exponential in the number of states in the SUL. The resulting high query complexity constrains these algorithms to learning only partial models for large systems [Howar2019, Howar2018ActiveAL].
One way to address these challenges is to combine model learning with white-box techniques, such as fuzzing [fuzzing], symbolic execution [component_interface, component_interface_imprv] and model checking [lstar_assume, lstar_assume_prob], to extract system information at a lower cost [Howar2019]. In [fuzzing], model learning is combined with mutation based testing that is guided by code coverage. This proves to be more effective than conformance testing, but the approach does not always produce complete models. In [component_interface, component_interface_imprv], symbolic execution is used to answer membership queries and generate component interface abstractions modeling safe orderings of component method calls. Sequences of method calls in a query are symbolically executed to check if they reach an a priori known unsafe state. However, learned models may be partial as method call orderings that are unsafe but unknown due to insufficient domain knowledge are missed by the approach. In [lstar_assume, lstar_assume_prob], model checking is used in combination with model learning for assume guarantee reasoning. The primary goal of the approach is not to generate an abstract model of a component and may therefore terminate before generating a complete model. Also, learned models are defined over an a priori known finite alphabet consisting of observable actions.
Very closely related to our work are the algorithms that use L* in combination with black-box testing [lstar_Peled1999] and model checking [lstar_model, lstar_model2]. The latter use pre-defined LTL system properties, similar to [state_merge, exact_fsm], and therefore the generated abstractions may not model system behaviours outside the scope of these LTL properties. Black-box testing can be adopted to check the degree of completeness by simulating the learned model with a set of system execution traces to identify missing behaviour. However, generated models are not guaranteed to admit all system behaviour, as this requires a system load that exercises the implementation to cover all behaviours.
An open challenge with query-based active model learning is learning symbolic models. Many practical applications of L* [kroenig-lstar, lstar_assume] and its variants are limited to learning system models defined over a finite alphabet consisting of Boolean events, such as function calls, that need to be known a priori. Sigma* [sigma*] addresses this by extending the L* algorithm to learn symbolic models of software. Dynamic symbolic execution is used to find constraints on inputs and expressions generating output to build a symbolic alphabet. However, behaviours modeled by the generated abstraction are limited to input-output steps of a software.
Another extension of the L* algorithm [mat_star] generates symbolic models using membership and equivalence oracles. Designing and implementing such oracles to answer queries on long system traces comprising sequences of valuations of multiple variables, some of which could have large domains, is not straightforward [Howar2019]. In [mapper], manually constructed mappers abstract concrete valuations into a finite symbolic alphabet. However, this process can be laborious and error-prone. In [abstract_alphabet], this problem is overcome using automated alphabet abstraction refinement. In [symbolic_mealy], an inferred Mealy machine is converted to a symbolic version in a post-processing step. These algorithms are, however, restricted to learning models with simple predicates such as equality/inequality relations.
The SL* algorithm [sl_star] extends MAT model learning to infer register automata that model both control flow and data flow. In addition to equality/inequality relations, automaton transitions feature simple arithmetic expressions such as increment by and sum. Due to the high query complexity, it is not obvious how the approach can be generalised to symbolic models over richer theories. An extension of this algorithm [grey_box_sl] uses taint analysis to boost performance by extracting constraints on input and output parameters. Both algorithms use individual oracles for each type of operation represented in the symbolic model and do not allow analysis of multiple or more involved operations on data values. The SUL is modeled as a register automaton and model learning is performed in a black-box setting, thereby generating partial models.
In our approach, the procedure used to check the degree of completeness of the learned model operates at the level of the abstraction and not system traces, and therefore can be easily implemented using existing model checkers. Further, the model learning procedure and the subsequent evaluation of the degree of completeness are independent of each other. This enables our approach to generate more expressive models when combined with a model learning component that can infer transition guards and data update functions from traces, such as T2M [jeppu].
VI Use-Cases and Future Work
In this paper, we have presented a new active model-learning algorithm for learning FSA abstractions of a system implementation from traces. The generated models are guaranteed to admit all system behaviour over a set of observable variables.
This can be particularly useful when system specifications are incomplete, and so any implementation errors outside the scope of defined requirements cannot be flagged. This is a common risk when essential domain knowledge gets progressively pruned as it is passed on from one team to another during the development life cycle. In such scenarios, manual inspection of the learned models can help identify errors in implementation. With our approach, the conditions extracted from the learned model are invariants that hold on the implementation. These can be used as additional specifications to verify multiple system implementations. The approach can also be used to evaluate test coverage for a given test suite and generate new tests to address coverage holes.
In the future, we intend to explore these potential use-cases further. This will drive improvements to reduce runtime, such as ways to guide the model checker towards valid counterexamples. We intend also to investigate extensions of the approach to model recursive state machines.