1 Introduction
Mutations—small syntactic modifications of programs that mimic typical programming errors—are used to assess the quality of existing test suites. A test kills a mutated program (or mutant), obtained by applying a mutation operator to a program, if its outcome for the mutant deviates from the outcome for the unmodified program. The percentage of mutants killed by a given test suite serves as a metric for test quality. The approach is based on two assumptions: (a) the competent programmer hypothesis [11], which states that implementations are typically close to correct, and (b) the coupling effect [26], which states that a test suite's ability to detect simple errors (and mutations) is indicative of its ability to detect complex errors.
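To make the terminology concrete, consider a toy example (ours, not from the literature cited above): a mutation flips a relational operator, and a test kills the mutant exactly if the two outcomes differ.

```python
# A toy illustration of killing: a mutant obtained by flipping a
# relational operator, and a test (an input vector) that kills it.
def max3(a, b, c):
    m = a if a > b else b
    return m if m > c else c

def max3_mutant(a, b, c):
    m = a if a < b else b  # mutation: `>` replaced by `<`
    return m if m > c else c

def kills(test, original, mutant):
    """A test kills a mutant if the two outcomes differ."""
    return original(*test) != mutant(*test)

# (1, 2, 0) kills the mutant: max3 returns 2, the mutant returns 1.
# (0, 0, 5) does not kill it: both return 5.
assert kills((1, 2, 0), max3, max3_mutant)
assert not kills((0, 0, 5), max3, max3_mutant)
```

The mutation score of a suite is then the fraction of mutants killed by at least one of its tests.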
In the context of model-based testing, mutations are also used to design tests. Model-based test case generation is the process of deriving tests from a reference model (which is assumed to be free of faults) in such a way that they reveal any nonconformance between the reference model and its mutants, i.e., kill the mutants. The tests detect potential errors (modeled by mutation operators) of implementations, treated as a black box in this setting, that conform to a mutant instead of the reference model. A test strongly kills a mutant if it triggers an observable difference in behavior [11], and weakly kills a mutant if the deviation is merely a difference in traversed program states [21].
The aim of our work is to automatically construct tests that strongly kill mutants derived from a reference model. To this end, we present two main contributions:

A formalization of mutation killing in terms of hyperproperties [14], a formalism to relate multiple execution traces of a program which has recently gained popularity due to its ability to express security properties such as noninterference and observational determinism. Notably, our formalization also takes into account potential nondeterminism, which significantly complicates killing of mutants due to the unpredictability of the test outcome.

An approach that enables the automated construction of tests by means of model checking the proposed hyperproperties on a model that aggregates the reference model and a mutant of it. To circumvent limitations of currently available model checking tools for hyperproperties, we present a transformation that enables the control of nondeterminism via additional program inputs. We evaluate our approach using a state-of-the-art model checker on a number of models expressed in two different modeling languages.
Running example.
We illustrate the main concepts of our work in Figure 1. Figure 1(a) shows the SMV [24] model of a beverage machine, which nondeterministically serves coff (coffee) or tea after input req (request), assuming that there is still enough wtr (water) in the tank. Water can be refilled with input fill. The symbol ε represents the absence of input and output, respectively.
The code in Figure 1(a) includes the variable mut (initialized nondeterministically in line 1), which enables the activation of a mutation in line 7. The mutant refills one unit of water only, whereas the original model fills two units.
Figure 1(b) states a hyperproperty over the inputs and outputs of the model, formalizing that the mutant can be killed definitely (i.e., independently of nondeterministic choices). The execution shown in Figure 1(c) is a witness for this claim: the test requests two drinks after filling the tank. For the mutant, the second request will necessarily fail, as indicated in Figure 1(d), which shows all possible output sequences for the given test.
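The behavior of the running example can be sketched in executable form as follows (a simulation under our own assumptions, consistent with the text: fill sets the water level to 2 in the model and to 1 in the mutant, and the tank initially holds 2 units; the function names are ours):

```python
EPS = 'eps'  # absence of input/output

def step(wtr, inp, mut):
    """Possible (output, wtr') successors of the beverage machine."""
    if inp == 'req' and wtr > 0:
        return {('coff', wtr - 1), ('tea', wtr - 1)}  # nondeterministic choice
    if inp == 'fill':
        return {(EPS, 1 if mut else 2)}  # assumed refill amounts (1 vs. 2)
    return {(EPS, wtr)}

def output_sequences(inputs, mut, wtr0=2):
    """All output sequences the (possibly mutated) machine can produce."""
    runs = {((), wtr0)}
    for inp in inputs:
        runs = {(outs + (o,), w2) for outs, w in runs for o, w2 in step(w, inp, mut)}
    return {outs for outs, _ in runs}

test = ['fill', 'req', 'req']
# Every run of the mutant ends with eps (out of water), every run of the
# model ends with a drink: the outputs deviate under every resolution of
# the nondeterminism, i.e., the test kills the mutant definitely.
assert output_sequences(test, mut=False).isdisjoint(output_sequences(test, mut=True))
```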

Outline.
Section 2 introduces our system model and HyperLTL. Section 3 explains the notions of potential and definite killing of mutants, which are then formalized in terms of hyperproperties for deterministic and nondeterministic models in Section 4. Section 5 introduces a transformation to control nondeterminism in models, and Section 6 describes our experimental results. Related work is discussed in Section 7.
2 Preliminaries
This section introduces symbolic transition systems as our formalism for representing discrete reactive systems and provides the syntax and semantics of HyperLTL, a logic for hyperproperties.
2.1 System Model
A symbolic transition system (STS) is a tuple S = ⟨I, O, X, α, δ⟩, where I, O, and X are finite sets of input, output, and state variables, α is a formula over X ∪ O (the initial conditions predicate), and δ is a formula over X ∪ I ∪ O ∪ X′ (the transition relation predicate), where X′ is a set of primed variables representing the successor states. An input i, output o, state x, and successor state x′, respectively, is a mapping of I, O, X, and X′, respectively, to values in a fixed domain that includes the elements ⊤ and ⊥ (representing true and false, respectively). For a mapping s and a set of variables V, s|_V denotes the restriction of the domain of s to the variables V. Given a valuation s and a Boolean variable v, s(v) denotes the value of v in s (if defined), and s[v] and s[¬v] denote s with v set to ⊤ and ⊥, respectively.
We assume that the initial conditions and transition relation predicates are defined in a logic that includes the standard Boolean operators ¬, ∧, ∨, →, and ↔. We omit further details, as our results do not depend on a specific formalism. We write α(x, o) and δ(x, i, o, x′) to denote that α and δ evaluate to true for inputs i, outputs o, states x, and successor states x′. We assume that every STS has a distinct output ε, representing the absence of output.
A state x with output o such that α(x, o) holds are an initial state and initial output. A state x has a transition with input i to its successor state x′ with output o′ iff δ(x, i, o′, x′), denoted by x →_{i,o′} x′. A trace of S is a sequence of tuples of concrete inputs, outputs, and states t = (i₀, o₀, x₀)(i₁, o₁, x₁)… such that α(x₀, o₀) holds and x_j →_{i_{j+1}, o_{j+1}} x_{j+1} for all j ≥ 0. We require that every state has at least one successor; therefore, all traces of S are infinite. We denote by T(S) the set of all traces of S. Given a trace t, we write t(j) for its j-th tuple, t|_I for its projection to inputs, t|_O for its projection to outputs, and t|_X for its projection to states. We lift restriction to sets of traces by defining T|_V = {t|_V | t ∈ T}.
S is deterministic iff there is a unique pair (x, o) of an initial state and initial output, and for each state x and input i, there is at most one state x′ with output o′ such that x →_{i,o′} x′. Otherwise, the model is nondeterministic.
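For finite-state instances, the definitions above can be checked directly on an explicit-state representation; the following sketch (with our own encoding of states and transitions) implements the determinism check:

```python
from collections import defaultdict

def is_deterministic(init, trans):
    """init: set of (state, output) pairs; trans: set of
    (state, input, output, state') tuples (an explicit-state rendering
    of an STS)."""
    if len(init) != 1:  # unique initial state/output pair
        return False
    succ = defaultdict(set)
    for x, i, o, x2 in trans:
        succ[(x, i)].add((o, x2))
    # at most one successor (with output) per state and input
    return all(len(s) <= 1 for s in succ.values())

# The beverage machine is nondeterministic: in a state with water left,
# input req may yield either coff or tea.
init = {(2, 'eps')}
trans = {(1, 'req', 'coff', 0), (1, 'req', 'tea', 0), (0, 'fill', 'eps', 2)}
assert not is_deterministic(init, trans)
assert is_deterministic(init, trans - {(1, 'req', 'tea', 0)})
```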
In the following, we presume the existence of sets of atomic propositions (intentionally kept abstract) serving as labels that characterize inputs, outputs, and states (or properties thereof). (Finite domains can be characterized using binary encodings; infinite domains require an extension of the formalism in Section 2.2 with equality and are omitted for the sake of simplicity.)
For a trace t, the corresponding trace over atomic propositions is the sequence of sets of atomic propositions satisfied by the consecutive tuples of t. We lift this definition to sets of traces pointwise.
Example 1
Figure 1(a) shows the formalization of a beverage machine in SMV [24]. In Figure 1(b), we use atomic propositions to enumerate the possible values of in and out. This SMV model closely corresponds to an STS: the initial conditions predicate α and transition relation δ are formalized using integer arithmetic as follows:

α := (wtr = 2) ∧ (out = ε)
δ := (in = req ∧ wtr > 0 ∧ (out′ = coff ∨ out′ = tea) ∧ wtr′ = wtr − 1) ∨
     (in = fill ∧ out′ = ε ∧ wtr′ = 2) ∨
     ((in = ε ∨ (in = req ∧ wtr = 0)) ∧ out′ = ε ∧ wtr′ = wtr)

The trace t = (ε, ε, 2)(fill, ε, 2)(req, coff, 1)(req, tea, 0)… is one possible execution of the system (for brevity, variable names are omitted).
Examples of atomic propositions for the system are in = req, in = fill, out = coff, and out = tea, and the respective atomic proposition trace of t is ∅ {in = fill} {in = req, out = coff} {in = req, out = tea}…
2.2 HyperLTL
In the following, we provide an overview of HyperLTL, a logic for hyperproperties, sufficient for understanding the formalization in Section 4. For details, we refer the reader to [13]. HyperLTL is defined over the atomic proposition traces (see Section 2.1) of a fixed STS.
Syntax.
Let a be an atomic proposition and let π be a trace variable from a set of trace variables. Formulas of HyperLTL are defined by the following grammar:

ψ ::= ∃π. ψ | ∀π. ψ | φ
φ ::= a_π | ¬φ | φ ∨ φ | X φ | φ U φ
The connectives ∃ and ∀ are existential and universal trace quantifiers, read as "along some trace" and "along all traces", respectively. In our setting, atomic propositions express facts about states or the presence of inputs and outputs. Each atomic proposition is subscripted with a trace variable to indicate the trace it is associated with. The Boolean connectives ∧, →, and ↔ are defined in terms of ¬ and ∨ as usual. Furthermore, we use the standard temporal operators eventually (F φ := ⊤ U φ) and always (G φ := ¬F ¬φ).
Semantics
Π ⊨ ψ states that ψ is valid for a given mapping Π of trace variables to atomic proposition traces. Let Π[π ↦ t] be as Π, except that π is mapped to t. We use Π[j, ∞] to denote the trace assignment that maps every π to the suffix of Π(π) starting at position j. The validity of a formula is defined as follows:
We write S ⊨ ψ if Π ⊨ ψ holds for the empty trace assignment Π. We call a trace t a witness of an existentially quantified formula ∃π. ψ if t ∈ T(S) and Π[π ↦ AP(t)] ⊨ ψ.
3 Killing mutants
In this section, we introduce mutants, tests, and the notions of potential and definite killing. We discuss how to represent an STS and its corresponding mutant as a single STS, which can then be model checked to determine killability.
3.1 Mutants
Mutants are variations of a model S obtained by applying small modifications to the syntactic representation of S. A mutant S^m of an STS S (the original model) is an STS with the same sets of input, output, and state variables as S but a deviating initial conditions predicate and/or transition relation. We assume that S^m is equally input-enabled as S, that is, T(S^m)|_I = T(S)|_I, i.e., the mutant and the model accept the same sequences of inputs. In practice, this can easily be achieved by using self-loops with empty output to ignore unspecified inputs. We use standard mutation operators, such as disabling transitions, replacing operators, etc. Due to space limitations, and since mutation operators are not the primary focus of this work, we do not list them here but refer to Appendix 0.A and [5].
We combine an original model S and a mutant S^m into a conditional mutant c(S, S^m), in order to perform mutation analysis via model checking of the combined model. The conditional mutant is defined as c(S, S^m) := ⟨I, O, X ∪ {mut}, α_c, δ_c⟩, where mut is a fresh Boolean variable used to distinguish states of the original and the mutated STS.
Suppose the mutant replaces a subformula φ of δ by ψ; then the transition relation predicate δ_c of the conditional mutant is obtained by replacing φ in δ by (mut ∧ ψ) ∨ (¬mut ∧ φ). We fix the value of mut in transitions by conjoining δ_c with mut ↔ mut′. The initial conditions predicate of the conditional mutant is defined similarly. Consequently, for a trace of the conditional mutant, it holds that if mut is true initially, then the trace (restricted to I, O, X) is a trace of the mutant, and if mut is false initially, then it is a trace of the original model. Formally, c(S, S^m) is nondeterministic, since mut is chosen nondeterministically in the initial state. However, we only refer to c(S, S^m) as "nondeterministic" if either S or S^m is nondeterministic, as mut is typically fixed in the hyperproperties in Section 4.
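The effect of the construction can be illustrated on explicit transition sets (our own encoding; the actual construction rewrites the transition relation predicate in place rather than duplicating the state space):

```python
def conditional_mutant(init, trans, init_m, trans_m):
    """init/init_m: sets of (state, output); trans/trans_m: sets of
    (state, input, output, state'). States of the result carry a `mut`
    flag that is chosen initially and never changes, mirroring the
    conjunct fixing mut across transitions."""
    init_c = {((x, False), o) for x, o in init} | \
             {((x, True), o) for x, o in init_m}
    trans_c = {((x, False), i, o, (x2, False)) for x, i, o, x2 in trans} | \
              {((x, True), i, o, (x2, True)) for x, i, o, x2 in trans_m}
    return init_c, trans_c

# Restricting to mut = False (resp. True) yields exactly the original
# (resp. mutated) behavior.
init_c, trans_c = conditional_mutant({(0, 'eps')}, {(0, 'a', 'x', 1)},
                                     {(0, 'eps')}, {(0, 'a', 'y', 1)})
assert ((0, False), 'a', 'x', (1, False)) in trans_c
assert ((0, True), 'a', 'y', (1, True)) in trans_c
```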
3.2 Killing
Killing a mutant amounts to finding inputs for which the mutant produces outputs that deviate from those of the original model. In a reactive, model-based setting, killing has been formalized using conformance relations [28], for example in [4, 15], where an implementation conforms to its specification if all of its input/output sequences are part of/allowed by the specification. In model-based testing, the model takes the role of the specification and is assumed to be correct by design. The implementation is treated as a black box, and therefore mutants of the specification serve as its proxy. Tests (i.e., input/output sequences) that demonstrate nonconformance between the model and its mutant can be used to check whether the implementation adheres to the specification or contains the bug reflected in the mutant. The execution of a test on a system under test fails if the sequence of inputs of the test triggers a sequence of outputs that deviates from those predicted by the test. Formally, tests are defined as follows:
Definition 1 (Test)
A test of length n for a model comprises input and output sequences of length n, such that there exists a trace of the model whose inputs and outputs agree with them for the first n steps.
For nondeterministic models, in which a single sequence of inputs can trigger different sequences of outputs, we consider two different notions of killing. We say that a mutant can be potentially killed if there exist inputs for which the mutant's outputs deviate from those of the original model given an appropriate choice of nondeterministic initial states and transitions. In practice, executing a test that potentially kills a mutant on a faulty implementation that exhibits nondeterminism (e.g., a multithreaded program) may fail to demonstrate nonconformance (unless the nondeterminism can be controlled). A mutant can be definitely killed if there exists a sequence of inputs for which the behaviors of the mutant and the original model deviate independently of how nondeterminism is resolved. Note that potential and definite killability are orthogonal to the folklore notions of weak and strong killing, which capture different degrees of observability. Formally, we define potential and definite killability as follows:
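For finite input sequences and explicit-state models, both notions can be checked by brute-force enumeration of output sequences (a sketch with our own helper names, feasible only for tiny models):

```python
def outputs(step, inputs, x0):
    """All output sequences of a nondeterministic system;
    step(x, i) returns the set of (output, state') successors."""
    runs = {((), x0)}
    for i in inputs:
        runs = {(os + (o,), x2) for os, x in runs for o, x2 in step(x, i)}
    return {os for os, _ in runs}

def potentially_kills(inputs, step, step_m, x0):
    # Some outcome of the mutant lies outside everything the model can produce.
    return not outputs(step_m, inputs, x0) <= outputs(step, inputs, x0)

def definitely_kills(inputs, step, step_m, x0):
    # Disjoint outcome sets: the deviation shows regardless of how
    # nondeterminism is resolved on either side.
    return outputs(step, inputs, x0).isdisjoint(outputs(step_m, inputs, x0))

model = lambda x, i: {('a', x), ('b', x)}
assert not potentially_kills(['i'], model, lambda x, i: {('a', x)}, 0)
# Potentially but not definitely killable: only one choice ('c') deviates.
assert potentially_kills(['i'], model, lambda x, i: {('a', x), ('c', x)}, 0)
assert not definitely_kills(['i'], model, lambda x, i: {('a', x), ('c', x)}, 0)
assert definitely_kills(['i'], model, lambda x, i: {('c', x)}, 0)
```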
Definition 2 (Potentially killable)
is potentially killable if
Test for of length potentially kills if
Definition 3 (Definitely killable)
is definitely killable if there is a sequence of inputs , such that
Test for of length definitely kills if
Definition 4 (Equivalent Mutant)
A mutant is equivalent iff it is not potentially killable.
Note that definite killability is stronger than potential killability, though for deterministic systems the two notions coincide.
Proposition 1 ()
If a mutant is definitely killable, then it is also potentially killable.
If model and mutant are deterministic, then the mutant is potentially killable iff it is definitely killable.
The following example shows a definitely killable mutant, a mutant that is only potentially killable, and an equivalent mutant.
Example 2
The mutant in Figure 1(a) is definitely killable, since we can force the system into a state in which both possible outputs of the original system (coff, tea) differ from the only possible output of the mutant (ε).
Consider a mutant that introduces nondeterminism by replacing line 7 with the code if(in=fill):(mut ? {1,2} : 2), indicating that the machine is filled with either 1 or 2 units of water. This mutant is potentially but not definitely killable, as only one of the nondeterministic choices leads to a deviation of the outputs.
Finally, consider a mutant that replaces line 4 with if(in=req&wtr>0):(mut ? coff : {coff,tea}) and removes the mut branch of line 7, yielding a machine that always creates coffee. Every implementation of this mutant is also correct with respect to the original model. Hence, we consider the mutant equivalent, even though the original model, unlike the mutant, can output tea.
4 Killing with hyperproperties
In this section, we provide a formalization of potential and definite killability in terms of HyperLTL, assert the correctness of our formalization with respect to Section 3, and explain how tests can be extracted by model checking the HyperLTL properties. All HyperLTL formulas depend on the inputs and outputs of the model, but are model-agnostic otherwise. The idea of all presented formulas is to discriminate between traces of the original model (¬mut) and traces of the mutant (mut). Furthermore, we quantify over pairs of traces with globally equal inputs and express that such pairs will eventually have different outputs. In the following, we write G(i_π = i_π′) as a shorthand for requiring that the inputs of traces π and π′ agree at all times, and F(o_π ≠ o_π′) for requiring that their outputs eventually differ.
4.1 Deterministic Case
To express killability (potential and definite) of a deterministic model and mutant, we need to find a trace π of the model (¬mut_π) such that the trace π′ of the mutant with the same inputs (mut_π′) eventually diverges in outputs, formalized as follows:

∃π. ∃π′. ¬mut_π ∧ mut_π′ ∧ G(i_π = i_π′) ∧ F(o_π ≠ o_π′)
Proposition 2 ()
For a deterministic model and mutant, the conditional mutant satisfies the formula above if and only if the mutant is killable.
If t is a witness for the formula above, then the test (t|_I, t|_O) of length n kills the mutant (for some n).
4.2 Nondeterministic Case
For potential killability of nondeterministic models and mutants (Appendix 0.A covers deterministic models with nondeterministic mutants and vice versa), we need to find a trace π′ of the mutant (mut_π′) such that all traces π of the model with the same inputs (¬mut_π) eventually diverge in outputs:

∃π′. ∀π. mut_π′ ∧ ((¬mut_π ∧ G(i_π = i_π′)) → F(o_π ≠ o_π′))
Proposition 3 ()
For a nondeterministic model and mutant, the conditional mutant satisfies the formula above if and only if the mutant is potentially killable.
If t is a witness for the formula above, then for any trace t′ of the model with the same inputs as t, the test (t′|_I, t′|_O) of length n potentially kills the mutant (for some n).
To express definite killability, we need to find a sequence of inputs of the model (∃π) and compare all nondeterministic outcomes of the model (∀π′) to all nondeterministic outcomes of the mutant (∀π″) for these inputs:

∃π. ∀π′. ∀π″. ¬mut_π ∧ ((¬mut_π′ ∧ mut_π″ ∧ G(i_π = i_π′) ∧ G(i_π = i_π″)) → F(o_π′ ≠ o_π″))
In Figure 1(b), we present an instance of this formula for our running example.
Proposition 4 ()
For a nondeterministic model and mutant, the conditional mutant satisfies the formula above if and only if the mutant is definitely killable.
If t is a witness for the formula above, then the test (t|_I, t|_O) of length n definitely kills the mutant (for some n).
To generate tests, we use model checking to verify whether the conditional mutant satisfies the appropriate HyperLTL formula presented above and obtain test cases as finite prefixes of witnesses for satisfaction.
5 Nondeterministic models in practice
As stated above, checking the validity of the hyperproperties in Section 4 for a given model and mutant enables test case generation. To the best of our knowledge, MCHyper [17] is the only currently available HyperLTL model checker. Unfortunately, MCHyper is unable to model check formulas with alternating quantifiers (although satisfiability in the presence of quantifier alternation is supported to some extent [16]). Therefore, we are currently limited to checking the alternation-free formula of Section 4.1 for deterministic models, since its witnesses may not correspond to potentially or definitely killing tests in the presence of nondeterminism.
To remedy this issue, we propose a transformation that makes nondeterminism controllable by means of additional inputs and yields a deterministic STS. The transformed model overapproximates killability in the sense that the resulting test cases only kill the original mutant if nondeterminism can also be controlled in the system under test. However, if equivalence can be established for the transformed model, then the original nondeterministic mutant is also equivalent.
5.1 Controlling nondeterminism in STS
The essential idea of our transformation is to introduce a fresh input variable that enables the control of nondeterministic choices in the conditional mutant. The new input is used carefully to ensure that choices are consistent for the model and the mutant encoded in the conditional mutant. W.l.o.g., we introduce an input variable nd with a domain sufficiently large to encode the nondeterministic choices in the model and the mutant, and write nd(x, o) to denote a value of nd that uniquely corresponds to a state x with output o. Moreover, we add a fresh Boolean variable to X, used to encode a fresh initial state.
Let and be valuations of , , , and , and and denote and , respectively. Furthermore, , , and are formulas uniquely satisfied by , , and respectively.
Given the conditional mutant c(S, S^m), we define its controllable counterpart c_nd(S, S^m). We initialize its transition relation predicate with that of the conditional mutant and incrementally add constraints as described below.
Nondeterministic initial conditions:
Let be an arbitrary, fixed state. The unique fresh initial state is , which, together with an empty output, we enforce by the new initial conditions predicate:
We add the conjunct to , in order to force evaluating to in all states other than . In addition, we add transitions from to all pairs of initial states/outputs in . To this end, we first partition the pairs in into pairs shared by and exclusive to the model and the mutant:
For each , we add the following conjunct to :
In addition, for inputs without a corresponding target state in the model or the mutant, we add conjuncts to the transition relation that represent self-loops with empty outputs:
Nondeterministic transitions:
Analogously to initial states, for each state/input pair, we partition the successors into successors shared or exclusive to model or mutant:
A pair causes nondeterminism if
For each pair that causes nondeterminism and each , we add the following conjunct to :
Finally, we add conjuncts representing self-loops with empty output for inputs that have no corresponding transition in the model or the mutant:
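The overall effect of the transformation can be sketched on explicit transition sets (our own simplified encoding: the auxiliary input plays the role of the fresh input variable, and out-of-range choice values yield the padding self-loops described above):

```python
from collections import defaultdict

EPS = 'eps'  # empty output

def determinize(trans):
    """trans: set of (state, input, output, state') tuples, possibly
    nondeterministic. Returns transitions over compound inputs (input, nd),
    where nd deterministically selects one of the successors; nd values
    without a counterpart become self-loops with empty output."""
    succ = defaultdict(list)
    for x, i, o, x2 in trans:
        succ[(x, i)].append((o, x2))
    width = max(len(v) for v in succ.values())
    det = set()
    for (x, i), choices in succ.items():
        for nd in range(width):
            if nd < len(choices):
                o, x2 = sorted(choices)[nd]
                det.add((x, (i, nd), o, x2))
            else:
                det.add((x, (i, nd), EPS, x))  # padding self-loop
    return det

# The req-transition of the beverage machine becomes controllable via nd:
det = determinize({(1, 'req', 'coff', 0), (1, 'req', 'tea', 0)})
assert det == {(1, ('req', 0), 'coff', 0), (1, ('req', 1), 'tea', 0)}
```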
The proposed transformation has the following properties:
Proposition 5 ()
Let S be a model with inputs I, outputs O, and let S^m be a mutant. Then:

1. c_nd(S, S^m) is deterministic (up to the initial choice of mut).

2. Modulo the fresh input, the traces of c(S, S^m) are contained in those of c_nd(S, S^m).

3. If c_nd(S, S^m) is equivalent, then S^m is equivalent.
The transformed model is deterministic, since we enforce unique initial valuations and make nondeterministic transitions controllable through the fresh input. Since we only add transitions or augment existing transitions with the fresh input, every transition of the conditional mutant is still present in its controllable counterpart (when the fresh input is disregarded). The potential additional traces of Item 2 originate from the transitions added for nondeterministic choices present exclusively in the model or the mutant. These transitions enable the detection of discrepancies between model and mutant caused by the introduction or elimination of nondeterminism by the mutation.
For Item 3 (which is a direct consequence of Item 2), assume that the original nondeterministic mutant is not equivalent (i.e., it is potentially killable). Then the transformed model satisfies the killing formula, and the corresponding witness yields a test which kills the mutant, assuming nondeterminism can be controlled in the system under test. Killability purported by the transformed model, however, could be an artifact of the transformation: determinization potentially deprives the model of its ability to match the output of the mutant by deliberately choosing a certain nondeterministic transition. In Example 2, we present an equivalent mutant which is killable after the transformation, since we will detect the deviating output tea of the model and coff of the mutant. Therefore, our transformation merely allows us to provide a lower bound for the number of equivalent nondeterministic mutants.
5.2 Controlling nondeterminism in modeling languages
The exhaustive enumeration of states and transitions outlined in Section 5.1 is purely theoretical and infeasible in practice. However, an analogous result can often be achieved by modifying the syntactic constructs of the underlying modeling language that introduce nondeterminism.

Nondeterministic assignments. Nondeterministic choice over a finite set of elements, as provided by SMV [24], can readily be converted into a case-switch construct over the fresh input variable. More generally, explicit nondeterministic assignments to state variables x [25] can be controlled by assigning the fresh input's value to x.

Nondeterministic schedulers. Nondeterminism introduced by concurrency can be controlled by introducing input variables that control the scheduler (as proposed in [22] for bounded context switches).
In case nondeterminism arises through variables that are underspecified in the transition relation, these variables can be made inputs as suggested in Section 5.1. In general, however, identifying underspecified variables automatically is nontrivial.
Example 3
Consider again the SMV code in Figure 1(a), for which nondeterminism can be made controllable by replacing the line if(in=req&wtr>0):{coff,tea} with the lines if(nd=0&in=req&wtr>0):coff and elif(nd=1&in=req&wtr>0):tea, and adding init(nd):={0,1}.
Similarly, the STS representation of the beverage machine, given in Example 1, can be transformed by replacing the first two rules by the following two rules:
6 Experiments
In this section, we present an experimental evaluation of the presented methods. We start by presenting the deployed toolchain. Thereafter, we validate our method on a case study by comparison with another model-based mutation testing tool. Finally, we present quantitative results on a broad range of generic models.
6.1 Toolchain
Figure 2 shows the toolchain that we use to produce test suites for models encoded in the modeling languages Verilog and SMV. Verilog models are deterministic, while SMV models can be nondeterministic.
Variable annotation.
As a first step, we annotate variables as inputs and outputs. These annotations were added manually for Verilog and heuristically for SMV (partitioning variables into inputs and outputs).
Mutation and transformation. We produce conditional mutants via a mutation engine. For Verilog, we implemented our own mutation engine in the open-source Verilog compiler VL2MV [12]. We use standard mutation operators, replacing arithmetic operators, Boolean relations, Boolean connectives, constants, and assignment operators. The list of mutation operators used for Verilog can be found in Appendix 0.A. For SMV models, we use the NuSeen SMV framework [5, 6], which includes a mutation engine for SMV models. The mutation operators used by NuSeen are documented in [5]. We implemented the transformation presented in Section 5 in NuSeen and applied it to conditional mutants.
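Operator-replacement mutations of the kind listed above can be illustrated generically (a sketch using Python's ast module; it is not the VL2MV or NuSeen mutation engine):

```python
import ast

class FlipAdd(ast.NodeTransformer):
    """Replace the first `+` by `-` (an arithmetic-operator mutation)."""
    def __init__(self):
        self.done = False

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add) and not self.done:
            node.op, self.done = ast.Sub(), True
        return node

src = "def f(a, b):\n    return a + b\n"
mutant_src = ast.unparse(FlipAdd().visit(ast.parse(src)))
assert "a - b" in mutant_src
```

A real engine additionally guards the mutated expression with the mut variable to obtain a conditional mutant.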
Translation. The resulting conditional mutants from both modeling formalisms are translated into AIGER circuits [9]. AIGER circuits are essentially a compact representation of finite models, and the format is widely used by model checkers. For the translation of Verilog models, VL2MV and the ABC model checker are used; for the translation of SMV models, NuSMV is used.
Test suite creation. We obtain a test suite by model checking the conditional mutants. Tests are obtained as counterexamples, which are finite prefixes of witnesses. In case we cannot find a counterexample and a complete model checking method is used, the mutant is provably equivalent.
Case study test suite evaluation
We compare the test suite created with our method for a case study against the model-based mutation testing tool MoMuT [2, 15]. The case study is a timed version of a model of a car alarm system (CAS), which has been used in the model-based test case generation literature before [4, 3, 15].
To this end, we created a test suite for an SMV formulation of the model. We evaluated its strength and correctness on an Action Systems formulation of the model (the native modeling formalism of MoMuT). MoMuT evaluated our test suite by computing its mutation score, i.e., the ratio of killed mutants to the total number of mutants, with respect to Action System mutations, which are described in [15].
This procedure evaluates our test suite in two ways. Firstly, it shows that the tests are well formed, since MoMuT does not reject them. Secondly, it shows that the test suite is able to kill mutants of a different modeling formalism than the one it was created from, which suggests that the test suite is also able to detect faults in implementations.
We created a test suite consisting of 61 tests and mapped it to the test format accepted by MoMuT. MoMuT then measured the mutation score of our translated test suite on the Action Systems model, using Action System mutants. The measured mutation score is 91% on 439 Action System mutants. In comparison, the test suite achieves a mutation score of 61.7% on 3057 SMV mutants. Further characteristics of the resulting test suite are presented in the following paragraphs.
Quantitative Experiments
All experiments presented in this section were run in parallel on a machine with an Intel(R) Xeon(R) CPU at 2.00GHz, 60 cores, and 252GB RAM. We used 16 Verilog models presented in [17], as well as models from opencores.org. Furthermore, we used 76 SMV models that were also used in [5]. Finally, we used the SMV formulation of CAS. All models are available in [1]. The Verilog and SMV experiments were run using property-driven reachability-based model checking with a time limit of 1 hour. Property-driven reachability did not perform well for CAS, for which we therefore switched to bounded model checking with a depth limit of 100.
Characteristics of models. Table 1 present characteristics of the models. For Verilog and SMV, we present average (
, minimum (Min), and maximum (Max) measures per model of the set of models. For some measurements, we additionally present average (Avg.) or maximum (Max) number over the set of mutants per model. We report the size of the circuits in terms of the number of inputs (#Input), outputs (#Output), state (#State) variables as well as And gates (#Gates), which corresponds to the size of the transition relation of the model. Moreover, the row “Avg. # Gates” shows the average size difference (in of Gates) of the conditional mutant and the original model, where the average is over all mutants. The last row of the table shows the number of the mutants that are generated for the models.We can observe that our method is able to handle models of respectable size, reaching thousands of gates. Furthermore, Gates of the conditional mutants is relatively low. Conditional mutants allow us to compactly encode the original and mutated model in one model. Hyperproperties enable us to refer to and juxtapose traces from the original and mutated model, respectively. Classical temporal logic does not enable the comparison of different traces. Therefore, mutation analysis by model checking classical temporal logic necessitates strictly separating traces of the original and the mutated model, resulting in a quadratic blowup in the size of the input to the classical modelchecker, compared to the size of the input to the hyperproperty modelchecker.
Parameters  Verilog  SMV  CAS  
Avg  Std  Min  Max  Avg  Std  Min  Max  
# Models  16  76  1  
# Input  186.19  309.59  4  949  8.99  13.42  0  88  58 
# Output  176.75  298.94  7  912  4.49  4.26  1  28  7 
# State  15.62  15.56  2  40           
# Gates  4206.81  8309.32  98  25193  189.12  209.59  7  1015  1409 
Avg. # Gates  3.98%  14.71%  10.2%  57.55%  8.14%  8.23%  0.22%  35.36%  0.86% 
# Mutants  260.38  235.65  43  774  535.32  1042.11  1  6304  3057 
Model checking results. Table 2 summarizes the quantitative results of our experiments. The metrics we use for evaluating our test generation approach are the mutation score (i.e., the percentage of killed mutants), the percentage of equivalent mutants, the number of generated tests, the time required for generating them, and the average length of the test cases. Furthermore, we show the number of times the resource limit was reached. For Verilog and SMV, this was exclusively the 1-hour timeout; for CAS, it was exclusively the depth limit of 100.
Finally, we show the total test suite creation time, including times when the resource limit was reached. The reported time assumes sequential test suite creation. However, since mutants are model checked independently, the process can easily be parallelized, which drastically reduces the total time needed to create a test suite for a model. The times of the Verilog benchmark suite are dominated by two instances of the secure hashing algorithm (SHA), which are inherently hard cases for model checking.
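Since the mutants are checked independently, a thin driver suffices for parallelization (a sketch with a hypothetical check_mutant placeholder; the real pipeline invokes the model checker once per conditional mutant):

```python
from concurrent.futures import ThreadPoolExecutor

def check_mutant(mutant_id):
    """Hypothetical placeholder for one model checking run on a conditional
    mutant; a real driver would invoke the model checker and return a
    killing test, or None for an equivalent mutant or a resource limit."""
    return mutant_id, None

def run_all(mutant_ids, workers=8):
    # Mutants are model checked independently, so a worker pool
    # parallelizes test suite creation trivially.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_mutant, mutant_ids))

results = run_all(range(5), workers=2)
assert results == {i: None for i in range(5)}
```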
We can see that the test suite creation times are in the realm of a few hours, which collapses to minutes when model checking instances in parallel. However, the timing measures say more about the underlying model checking methods than about our proposed technique of mutation testing via hyperproperties. Furthermore, we want to stress that our method is agnostic to the variant of model checking used (e.g., property-driven reachability or bounded model checking). As discussed above, switching from one method to the other made a big difference for CAS.
The mutation scores average at around 60% for all models. It is interesting to note that the scores of the Verilog and SMV models are similar on average, although we use different mutation schemes for the two types of models. Again, the mutation score says more about the mutation scheme than about our proposed technique. Notice that we can only claim to report the exact mutation score because, apart from CAS, we used a complete model checking method (property-driven reachability). That is, if, for example, 60% of the mutants were killed and no timeouts occurred, then the remaining 40% of the mutants are provably equivalent. In contrast, incomplete methods for mutation analysis can only report lower bounds on the mutation score. Furthermore, as discussed above, the 61.7% for CAS translate into a 91% mutation score on a different set of mutants. This indicates that the failure detection capability of the produced test suites is good, which ultimately can only be measured by deploying the test cases on real systems.
Metrics  Verilog  SMV  CAS  
Avg  Std  Min  Max  Avg  Std  Min  Max  
Mutation Score  56.82%  33.1%  4.7%  99%  64.79%  30.65%  0%  100%  61.7 % 
Avg. Testcase Len.  4.26  1.65  2.21  8.05  15.41  58.23  4  461.52  5.92 
Max Testcase Len.  21.62  49.93  3  207  187.38  1278.56  4  10006  9 
Avg. Runtime  83.08s  267.53s  0.01s  1067.8s  1.2s  5.48s    46.8s  7.8s 
Equivalent Mutants  33.21%  32.47%  0%  95.3%  35.21%  30.65%  0%  100%  0% 
Avg. Runtime  44.77s  119.58s  0s  352.2s  0.7s  2.02s    14.9s   
# Resource Limit  9.96%  27.06%  0%  86.17%  3.8%  19.24%  0%  100%  38.34 % 
Total Runtime  68.58h  168.62h  0h  620.18h  0.4h  1.19h  0h  6.79h  1.15h 
7 Related Work
A number of test case generation techniques are based on model checking; a survey is provided in [18]. Many of these techniques (such as [29, 27, 20]) differ in abstraction levels and/or coverage goals from our approach.
Model-checking-based mutation testing using trap properties is presented in [19]. Trap properties are conditions that, if satisfied, indicate a killed mutant. In contrast, our approach directly targets the input/output behavior of the model and does not require the formulation of model-specific trap properties.
Mutation-based test case generation via module checking is proposed in [10]. The theoretical framework of this work is similar to ours, but builds on module checking instead of hyperproperties. Moreover, no experimental evaluation is given.
The authors of [4] present mutation killing via SMT solving: the model, as well as the killing conditions, are encoded into an SMT formula and solved using specialized algorithms. Similarly, the MuAlloy [30] framework enables model-based mutation testing for Alloy models by encoding the model and the killing conditions into a SAT formula that is solved using the Alloy framework. In contrast to these approaches, we encode only the killing conditions into a formula, which allows us to directly use model checking techniques instead of SAT or SMT solving. Our approach is therefore more flexible and more likely to be applicable in other domains, which we demonstrate by producing test cases for models encoded in two different modeling languages.
Symbolic methods for weak mutation coverage are proposed in [8] and [7]. The former describes the use of dynamic symbolic execution for weakly killing mutants; the latter describes a sound but incomplete method for detecting equivalent weak mutants. The coverage criterion considered in both works is weak mutation, which, unlike the strong mutation criterion considered in this work, can be encoded as a classic safety property. However, both methods could be used in conjunction with ours: dynamic symbolic execution could first weakly kill mutants, which are thereafter strongly killed via hyperproperty model checking, and equivalent weak mutants could be detected with the methods of [7] to prune the candidate space of potentially strongly killable mutants for hyperproperty model checking.
A unified framework for defining multiple coverage criteria, including weak mutation and hyperproperties such as unique-cause MC/DC, is proposed in [23]. While strong mutation is not expressible in this framework, applying hyperproperty model checking to it is interesting future work.
8 Conclusion
Our formalization of mutation testing in terms of hyperproperties enables the automated model-based generation of tests using an off-the-shelf model checker. In particular, we study killing of mutants in the presence of nondeterminism, where test case generation is enabled by a transformation that makes nondeterminism in models explicit and controllable. We evaluated our approach on publicly available SMV and Verilog models, and will extend our evaluation to more modeling languages and models in future work.
References
 [1] Mutation testing with hyperproperties benchmark models. https://gitservice.ait.ac.at/sctdsepublic/mutationtestingwithhyperproperties. Uploaded: 2019-04-25.
 [2] B. Aichernig, H. Brandl, E. Jöbstl, W. Krenn, R. Schlick, and S. Tiran. MoMuT::UML model-based mutation testing for UML. In 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), pages 1–8, April 2015.
 [3] Bernhard K. Aichernig, Harald Brandl, Elisabeth Jöbstl, Willibald Krenn, Rupert Schlick, and Stefan Tiran. Killing strategies for model-based mutation testing. Softw. Test., Verif. Reliab., 25(8):716–748, 2015.
 [4] Bernhard K. Aichernig, Elisabeth Jöbstl, and Stefan Tiran. Model-based mutation testing via symbolic refinement checking. 2014.
 [5] Paolo Arcaini, Angelo Gargantini, and Elvinia Riccobene. Using mutation to assess fault detection capability of model review. Softw. Test., Verif. Reliab., 25(5-7):629–652, 2015.
 [6] Paolo Arcaini, Angelo Gargantini, and Elvinia Riccobene. NuSeen: A tool framework for the NuSMV model checker. In 2017 IEEE International Conference on Software Testing, Verification and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017, pages 476–483. IEEE Computer Society, 2017.
 [7] Sébastien Bardin, Mickaël Delahaye, Robin David, Nikolai Kosmatov, Mike Papadakis, Yves Le Traon, and Jean-Yves Marion. Sound and quasi-complete detection of infeasible test requirements. In 8th IEEE International Conference on Software Testing, Verification and Validation, ICST 2015, Graz, Austria, April 13-17, 2015, pages 1–10, 2015.
 [8] Sébastien Bardin, Nikolai Kosmatov, and François Cheynier. Efficient leveraging of symbolic execution to advanced coverage criteria. In Seventh IEEE International Conference on Software Testing, Verification and Validation, ICST 2014, March 31 - April 4, 2014, Cleveland, Ohio, USA, pages 173–182, 2014.
 [9] Armin Biere, Keijo Heljanko, and Siert Wieringa. AIGER 1.9 and beyond, 2011. Available at fmv.jku.at/hwmcc11/beyond1.pdf.
 [10] Sergiy Boroday, Alexandre Petrenko, and Roland Groz. Can a model checker generate tests for nondeterministic systems? Electronic Notes in Theoretical Computer Science, 190(2):3–19, 2007.
 [11] Timothy A. Budd, Richard J. Lipton, Richard A. DeMillo, and Frederick G. Sayward. Mutation analysis. Technical report, DTIC Document, 1979.
 [12] Szu-Tsung Cheng, Gary York, and Robert K. Brayton. VL2MV: A compiler from Verilog to BLIF-MV. HSIS Distribution, 1993.
 [13] Michael R. Clarkson, Bernd Finkbeiner, Masoud Koleini, Kristopher K. Micinski, Markus N. Rabe, and César Sánchez. Temporal logics for hyperproperties. In Principles of Security and Trust (POST), pages 265–284. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.
 [14] Michael R. Clarkson and Fred B. Schneider. Hyperproperties. Journal of Computer Security, 18(6):1157–1210, 2010.
 [15] Andreas Fellner, Willibald Krenn, Rupert Schlick, Thorsten Tarrach, and Georg Weissenbacher. Model-based, mutation-driven test case generation via heuristic-guided branching search. In Jean-Pierre Talpin, Patricia Derler, and Klaus Schneider, editors, Formal Methods and Models for System Design (MEMOCODE), pages 56–66. ACM, 2017.
 [16] Bernd Finkbeiner, Christopher Hahn, and Tobias Hans. MGHyper: Checking satisfiability of HyperLTL formulas beyond the ∃*∀* fragment. In Shuvendu K. Lahiri and Chao Wang, editors, Automated Technology for Verification and Analysis (ATVA), volume 11138 of Lecture Notes in Computer Science, pages 521–527. Springer, 2018.
 [17] Bernd Finkbeiner, Markus N. Rabe, and César Sánchez. Algorithms for model checking HyperLTL and HyperCTL*. In Daniel Kroening and Corina S. Păsăreanu, editors, Computer Aided Verification (CAV), Lecture Notes in Computer Science, pages 30–48. Springer, 2015.
 [18] Gordon Fraser, Franz Wotawa, and Paul E. Ammann. Testing with model checkers: a survey. Software Testing, Verification and Reliability, 19(3):215–261, 2009.
 [19] Angelo Gargantini and Constance Heitmeyer. Using model checking to generate tests from requirements specifications. In ACM SIGSOFT Software Engineering Notes, volume 24, pages 146–162. Springer-Verlag, 1999.
 [20] Hyoung Seok Hong, Insup Lee, Oleg Sokolsky, and Hasan Ural. A temporal logic based theory of test coverage and generation. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 327–341. Springer, 2002.
 [21] William E. Howden. Weak mutation testing and completeness of test sets. IEEE Trans. Software Eng., 8(4):371–379, 1982.
 [22] Akash Lal and Thomas Reps. Reducing concurrent analysis under a context bound to sequential analysis. Formal Methods in System Design, 35(1):73–97, 2009.
 [23] Michaël Marcozzi, Mickaël Delahaye, Sébastien Bardin, Nikolai Kosmatov, and Virgile Prevosto. Generic and effective specification of structural test objectives. In 2017 IEEE International Conference on Software Testing, Verification and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017, pages 436–441, 2017.
 [24] Kenneth L. McMillan. The SMV system. Technical Report CMU-CS-92-131, Carnegie Mellon University, 1992.
 [25] Greg Nelson. A generalization of Dijkstra's calculus. ACM Transactions on Programming Languages and Systems (TOPLAS), 11(4):517–561, October 1989.
 [26] A. Jefferson Offutt. Investigations of the software testing coupling effect. ACM Trans. Softw. Eng. Methodol., 1(1):5–20, 1992.
 [27] Sanjai Rayadurgam and Mats Per Erik Heimdahl. Coverage based test-case generation using model checkers. In Engineering of Computer Based Systems (ECBS), pages 83–91. IEEE, 2001.
 [28] Jan Tretmans. Test generation with inputs, outputs and repetitive quiescence. Software - Concepts and Tools, 17(3):103–120, 1996.
 [29] Willem Visser, Corina S. Pǎsǎreanu, and Sarfraz Khurshid. Test input generation with Java PathFinder. ACM SIGSOFT Software Engineering Notes, 29(4):97–107, 2004.
 [30] Kaiyuan Wang, Allison Sullivan, and Sarfraz Khurshid. MuAlloy: a mutation testing framework for Alloy. In International Conference on Software Engineering: Companion (ICSE-Companion), pages 29–32. IEEE, 2018.
Appendix 0.A Appendix
0.A.1 Verilog mutation operators
Type  Mutation 

Arithmetic  Exchange binary and 
Exchange unary and  
Relations  Exchange and 
Exchange , , ,  
Boolean  Exchange and 
Drop and  
Exchange ,,,, and  
Assignments  Exchange and 
(Blocking & NonBlocking Assignment)  
Constants  Replace Integer Constant by , and 
Replace BitVector Constant by , and 

0.A.2 Mixed determinism cases
Lemma 1
Let be a trace assignment with , , and a conditional mutant.

then

then

then

then
Proof
The first two statements follow directly from the definition of conditional mutants. The latter two statements follow directly from the fact that uniquely characterize inputs and outputs.
Proposition 6 ()
Let the model with inputs and outputs be deterministic and the mutant be nondeterministic.
If is a witness for , then there is , such that the test potentially kills .
Proof
Assume is potentially killable. Let , such that . Since is equally input-enabled, there exists a trace , such that . Clearly, . Therefore, and are satisfying assignments for and , respectively.
Assume . Let be a witness of and let be a witness of without the first existential quantifier. From Lemma 1, we immediately get , and . This shows .
Since , there exists a smallest such that and . Clearly, potentially kills .
The following hyperproperty expresses definite killing for deterministic models and nondeterministic mutants:
(1) 
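The exact formula (1) was lost in extraction. As a hedged sketch only, a hyperproperty of this kind typically takes the following shape, assuming π ranges over traces of the deterministic model, π' over traces of the nondeterministic mutant, and I and O denote the input and output variables; this is not necessarily the paper's exact formula:

```latex
% Sketch: some model trace pi (fixed by its inputs, since the model is
% deterministic) such that every mutant trace pi' with the same inputs
% eventually deviates in its outputs.
\exists \pi.\, \forall \pi'.\;
  \Box\, (I_{\pi} = I_{\pi'}) \;\rightarrow\; \Diamond\, (O_{\pi} \neq O_{\pi'})
```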
Proposition 7 ()
Let the model with inputs and outputs be deterministic and the mutant be nondeterministic.
If is a witness for , then definitely kills (for some ).
Proof
Assume that is definitely killable. Since is deterministic, for every input sequence, there is at most one trace with in with this input sequence. Therefore, there is an input sequence and a unique trace with