1. Introduction
Test data generation is one of the most important and yet most challenging aspects of software test automation (Korel:90; McMinn:04; AliBHP10; Myers:2011). To be useful, test data often needs to satisfy logical constraints. This makes constraint solving an integral part of test data generation. For example, in white-box testing, one may be interested in achieving path coverage. Doing so requires solving the path constraints for the different control paths in a program (McMinn:04).
Test data tends to get progressively more complex as one moves up the testing ladder, from unit to integration to system testing. In particular, system testing, whose goal is ensuring that a system is in compliance with its requirements (Ammann:16), involves exercising end-to-end system behaviors. This, in turn, typically requires numerous interdependent data structures, and thus the ability to account for the well-formedness and semantic correctness constraints of these data structures.
For example, system testing of an application that calculates citizens’ taxes would require meaningful assemblies of many interrelated entities, including taxpayers, incomes, dependent family members, and so on. When available, real data, e.g., real tax records in the above example, may be utilized for system testing. In many practical settings, however, real data is (1) incomplete, meaning that the data is insufficient for exercising every system behavior that needs to be tested, (2) incompatible, meaning that, due to changes in the system, the real data no longer matches the system’s data structures, or (3) inaccessible, meaning that the real data cannot be used for testing due to reasons such as data protection and privacy.
Because of these factors, system testing is done, by and large, using synthetic data. Generating synthetic data for system testing necessitates constraint solving over system concepts, including concepts related to a system’s environment (Iqbal:15). The languages used for expressing system-level constraints are feature-rich. Languages based on first-order logic are particularly common, noting that quantification, predicates and functions are usually inevitable when one attempts to specify constraints at a system level. This high degree of expressiveness often comes at the cost of making constraint solving undecidable in general, and computationally expensive even when the size of the data to generate is bounded (Libkin:2004; JacksonBook2012).
Our work in this article is prompted by the need to expand the feasibility and scalability of constraint solving to more diverse and complex test data generation problems in industrial settings. To this end, we propose a novel constraint solving approach. We ground our approach on UML (UML) and its constraint language, OCL (OCL). This choice is motivated by UML’s widespread use in industry. OCL supports, among other features, quantification and a variety of arithmetic, string and collection operations.
Our approach leverages earlier strands of work where (metaheuristic) search is applied for solving OCL constraints (Shaukat:2013; Shaukat:2016; Soltana:2017). In the context of testing, these earlier strands were motivated by the limitations, in terms of scalability and the UML/OCL constructs covered, of commonly used exhaustive approaches based on SAT solving, e.g., Alloy (JacksonBook2012), and traditional constraint programming, e.g., UMLtoCSP (UMLtoCSP).
The core idea behind our approach is to further enhance the performance of search-based OCL solving by combining it with Satisfiability Modulo Theories (SMT) (SMT). Specifically, we observe that SMT solvers, e.g., Z3 (z3), have efficient decision procedures for several background theories, e.g., linear arithmetic (dechter2003constraint). While these background theories per se are not sufficient for solving complex OCL constraints, the following prospect exists: if a subformula in a constraint is solvable by a combination of background theories, then the associated decision procedures are likely to handle it more efficiently than search would. Based on this prospect, what we set out to do in this article is to provide a fully fledged combination of search and SMT for OCL solving.
Contributions. The contributions of this article are threefold:
1) We develop a hybrid approach for solving OCL constraints using search and SMT. The approach first transforms OCL constraints into a normal form. It then distributes the responsibility of solving the different subformulas in this normal form across search and SMT, in an attempt to best exploit the strengths of each technique. Although our focus in this article remains on OCL, the principles and practical considerations underlying our solving strategy are general, and can be applied to other constraint languages.
2) We evaluate our approach on three industrial case studies from distinct application domains. The case studies are all concerned with the generation of data for system testing. In each case, we automatically produce synthetic test data of varying sizes. Our results indicate that our approach is able to solve complex OCL constraints while generating, in practical time, fairly large data samples. We further compare against alternative solvers, and demonstrate that our approach leads to significant improvements in applicability and scalability. It is important to note that our evaluation is not aimed at assessing the effectiveness of different test strategies, but rather at assessing the feasibility and efficiency of automatically generating system test data regardless of test strategy.
3) We develop a tool in support of our approach. The tool is publicly available at https://sites.google.com/view/hybridoclsolver/. To facilitate the replication of our empirical results and also to support future research on test data generation, we provide on the above website our (sanitized) case study material as well.
Structure. Section 2 presents a running example. Section 3 provides background. Section 4 outlines our approach. Sections 5 and 6 elaborate the technical aspects of the approach. Section 7 describes our evaluation. Section 8 discusses limitations and threats to validity. Section 9 compares with related work. Section 10 concludes the article.
2. Running Example
In this section, we present an example which we use throughout the article for illustration. For several practical reasons, system test generation is usually black-box and requires an explicit test data model (Utting:07). In UML, test data models can be expressed with relative ease using Class Diagrams (CD) augmented with OCL constraints (Warmer:03). The example, shown in Fig. 1, is a small and simplified excerpt of the system test data model for the Government of Luxembourg’s personal income tax management application. The full data model is the subject of one of our case studies in Section 7.
The graphical notation of a CD already places certain constraints on how the classes and associations in a CD can be instantiated (UML). One basic constraint is that abstract classes, e.g., Income in the CD of Fig. 1(a), cannot be instantiated. A second basic constraint is that the objects on the two ends of a link (association instance) must conform to the CD. For example, a supports link should have a TaxPayer object on one end and a Child object on the other. A CD may additionally have multiplicity constraints. For example, the multiplicities attached to the two ends of the earns association impose the following constraints: (1) each TaxPayer object must be linked to at least one object of kind Income; and (2) each object of kind Income must be linked to exactly one TaxPayer object.
The constraints that cannot be captured graphically are defined in OCL, as we illustrate in Fig. 1(b). Constraint C1 states that the age of individuals falls within a prescribed range. C2 states that the disability rate is greater than zero and at most one when the individual is disabled, and zero otherwise. C3 states that taxpayers who have an address in Luxembourg are considered residents. C4 states that taxpayers with a local income but no local address are considered non-residents. C5 states that a tax card applies only to employment and pension incomes.
The constraint solving solution we propose in this article is aimed at generating valid instantiations of CDs, meaning that the generated instance models should satisfy both the diagrammatic and OCL constraints.
3. Background
In this section, we provide background on OCL, searchbased OCL solving, and SMT.
OCL. OCL is the de facto constraint and query language for UML (OCL). One can use OCL for defining: (1) invariants (conditions that must hold over all instances of a given class), e.g., C1 to C5 in Fig. 1(b), (2) operations, e.g., getAge in Fig. 1(a), (3) derived attributes (attributes whose values are calculated from other elements), (4) guard conditions for transitions in behavioral diagrams such as UML State Machines, and (5) pre- and postconditions for operations.
We concentrate on handling OCL invariants and the user-defined operations needed by them. In our context, user-defined operations provide a convenient mechanism for factoring out queries which are used multiple times, or which are too complex to spell out within the invariants. Derived attributes are treated as user-defined operations. For example, the getAge operation in Fig. 1(a) would be interchangeable with a derived attribute that calculates the age of an individual (not shown).
Guard conditions fall outside the scope of our work, since we deal exclusively with CDs and their instantiations. Pre- and postconditions are mainly an apparatus for verifying the executability and determinism of system behaviors (CabotCR09). Although uncommon, they can also be used for constraining the instantiations of CDs. Despite such use being plausible, we leave pre- and postconditions out of the scope of our work, and assume that all the relevant data restrictions are provided as invariants.
OCL constraint solving via (metaheuristic) search. Search has been employed for addressing a wide range of software engineering problems (HarmanMZ12). Related to our objectives in this article, Ali et al. (Shaukat:2013; Shaukat:2016) develop a search-based method for solving OCL constraints. In our previous work, we enhanced this method with new heuristics aimed at improving the exploration of the search space (Soltana:2017). The work we present in this article builds on our enhanced implementation of search-based OCL solving (SoltanaTool).

The main idea behind search-based OCL solving is to use fitness functions for assessing how far an instance model is from satisfying a given set of OCL constraints. We briefly illustrate this idea. Suppose that we want to satisfy the following: TaxPayer.allInstances()->forAll(x | x.incomes->size() >= 1 and x.incomes->size() < 4). Now, suppose that the instance model has two taxpayers, one with a single income and the other with four incomes. The distance for the first taxpayer is 0, as she already satisfies the constraint; whereas the distance for the second taxpayer is 1, since she has one more income than what the constraint allows. The distance for forAll, and thus for the whole constraint, is computed as the average of the distances for all (here, the two) taxpayers: (0 + 1) / 2 = 0.5. An overall distance of 0 indicates that a solution has been found; otherwise, search will iteratively attempt to minimize the (overall) distance by tweaking the instance model.
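To make the distance computation concrete, the following is a minimal Python sketch (our own illustration, not the solver's actual implementation), reading the constraint as requiring each taxpayer to have between one and three incomes, consistent with the distances reported above:

```python
# A sketch (ours, not the solver's code) of the branch distance for the
# forAll constraint above: each taxpayer must have 1 to 3 incomes.

def taxpayer_distance(num_incomes):
    """Distance of one taxpayer from satisfying 1 <= #incomes < 4."""
    if 1 <= num_incomes < 4:
        return 0.0                        # constraint already satisfied
    if num_incomes < 1:
        return float(1 - num_incomes)     # incomes missing below the bound
    return float(num_incomes - 3)         # incomes in excess of the bound

def forall_distance(income_counts):
    """forAll distance: average over all taxpayers; 0 means solved."""
    return sum(taxpayer_distance(n) for n in income_counts) / len(income_counts)

forall_distance([1, 4])   # the scenario above: taxpayers with 1 and 4 incomes -> 0.5
```

Search would then tweak the instance model (e.g., remove one of the second taxpayer's incomes) to drive this distance toward zero.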
SMT. SMT is the problem of determining whether a constraint expressed in first-order logic is satisfiable, when certain symbols have specific interpretations in a combination of background theories (MouraB11). State-of-the-art SMT solvers, e.g., Z3 (z3) and CVC4 (CVC4), accept as input a standard language, called SMT-LIB (SMTLIB). For example, C2 of Fig. 1(b) can be expressed in SMT-LIB as follows: (ite (= x None) (= y 0) (and (> y 0) (<= y 1))), where x and y respectively represent the disability type and the disability rate for a given person.
Most SMT solvers employ a SAT solver alongside the decision procedures for the background theories they support (KatzBTRH16). The SAT solver operates on the Boolean structure of formulas, and the decision procedures operate on the theory symbols within the formulas. SMT solvers are highly efficient at solving quantifier-free constraints when suitable background theories exist. Nevertheless, when constructs such as quantification, collections, or symbols with no background theories are added to the mix, these solvers often become inefficient.
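The semantics of the SMT-LIB encoding of C2 shown above can be mirrored in plain Python for illustration (a hedged sketch: `disability_ok` is our own helper name, and the string "Vision" below is a hypothetical disability-type value, not taken from the case study):

```python
# Plain-Python mirror of the SMT-LIB ite encoding of C2:
# (ite (= x None) (= y 0) (and (> y 0) (<= y 1)))

def disability_ok(disability_type, disability_rate):
    """If the person is not disabled, the rate must be 0;
    otherwise it must lie in the interval (0, 1]."""
    if disability_type is None:           # (= x None)
        return disability_rate == 0       # (= y 0)
    return 0 < disability_rate <= 1      # (and (> y 0) (<= y 1))
```

An SMT solver, given the ite formula, would search for values of x and y making it true; the function above merely checks a candidate assignment.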
Since SMT solvers alone are not able to efficiently handle all types of OCL constraints that are commonly used for specifying software systems, we propose to exploit SMT only for solving OCL subformulas where SMT is very likely to deliver efficiency gains.
4. Approach
Fig. 2 presents an overview of our approach. The approach takes as input a CD and a set of OCL constraints. The output is a valid instance model if one can be found.
The approach works in two phases: First, it constructs a single OCL constraint in Negation Normal Form (NNF) that is logically equivalent to the conjunction of the constraints that need to be solved. A constraint is in NNF if it uses only the primary Boolean operators, i.e., conjunction, disjunction and negation, and further has all the negations pushed inward to the atomic level (NNFBook). We use NNF because it facilitates defining and distributing the solving tasks over search and SMT, while also simplifying the translation from OCL to SMT-LIB. Specifically, the single constraint is a conjunction of the following: (1) the input OCL constraints, and (2) OCL representations of the multiplicity constraints in the input CD. For example, the 1..* cardinality attached to the target end of the earns association in Fig. 1(a) is represented in OCL as: self.incomes->size() >= 1, where self refers to any instance of TaxPayer. We leave out of this conjunction the basic constraints imposed by the CD’s diagrammatic notation, e.g., type conformance for the association ends and the non-instantiability of abstract classes, as illustrated in Section 2. Our approach implicitly enforces these basic constraints when creating or tweaking instance models.
The second phase of the approach is to solve the resulting NNF constraint. To do so, we utilize a combination of search and SMT, with the solving tasks distributed in the following manner: We have search handle subformulas whose satisfaction involves structural tweaks to the instance model, i.e., additions and deletions of objects and links. An example constraint (subformula) handled by search is C5 of Fig. 1(b). If the instance model happens to violate C5, any successful attempt at satisfying the constraint will necessarily involve adding / removing objects and links.
We have SMT handle subformulas which exclusively constrain attributes with primitive types. Many such subformulas, e.g., those whose symbols are within linear arithmetic, can be efficiently handled by background SMT theories. For example, we use SMT to assign a value to the birthYear attribute of an instance of PhysicalPerson in such a way that C1 of Fig. 1(b) is satisfied. Finally, we have both search and SMT handle subformulas whose satisfaction may require a combination of structural tweaks and value assignments to primitive attributes. For example, satisfying C3 of Fig. 1(b) may involve both adding instances of Address and setting the country and isResident attributes of Address and TaxPayer instances.
We note that our examples above refer directly to the original constraints in Fig. 1, rather than to the corresponding fragments in the derived NNF constraint. This is only to ease illustration; the whole solving process, including making decisions about which subformulas to delegate to search, SMT or both, is with respect to the NNF constraint resulting from the first phase of the approach.
5. Transformation to Negation Normal Form
In this section, we elaborate the first phase of the approach outlined in Section 4. The transformation to NNF is performed using the pipeline of Fig. 3.
As noted in Section 3, our focus is on constraints expressing OCL invariants. Invariants are implicitly universally quantified. In Step 1 of the pipeline of Fig. 3, we make this implicit quantification explicit. Doing so enables us to later merge different constraints (Step 6), even when the constraints do not share the same context.
In Fig. 4, we show the result of applying Step 1 to C1 and C3 of Fig. 1(b). For efficiency reasons, Step 1 binds only one universal quantifier to a given OCL context. This is achieved by taking the conjunction of all the invariants that have the same context before universal quantification is made explicit. For example, C1 and C2 of Fig. 1(b) are both defined over PhysicalPerson. Step 1 would thus consider C1 and C2 as one constraint. Due to space limitations, we do not illustrate this treatment in Fig. 4.
Step 2 adds a user-defined non-emptiness constraint to the set of constraints resulting from Step 1. The non-emptiness constraint excludes the empty solution and thus avoids vacuous truth, noting that the constraints from Step 1 are invariants and thus always satisfied by the empty instance model. The non-emptiness constraint is, in contrast, not universally quantified. In the simplest case, this constraint states that a certain class of the input CD must have an instance. A natural non-emptiness constraint for our running example would be TaxPayer.allInstances()->size() >= 1, stating that the instance model must contain at least one instance of TaxPayer. In practice, the non-emptiness constraint can be more elaborate, e.g., bounding the maximum number of instances to generate from each class. Regardless of its complexity, the non-emptiness constraint must always be violated by the empty instance model, and its satisfaction has to involve the creation of certain objects within the instance model.
Step 3 of the pipeline performs an inline expansion of any let expressions within the constraints. These expressions are commonly used for avoiding repetition and improving readability. For our OCL solving process to be sound, we need to expand all let expressions, including nested ones. Fig. 4 illustrates how the let expression in C1 is expanded inline.
Step 4 re-expresses OCL’s secondary Boolean operators, i.e., implies, xor and if-then-else, in terms of the primary ones, i.e., and, or and not. Step 5 pushes the negation operators inward by applying De Morgan’s laws (Hurley:2014). Both Steps 4 and 5 are illustrated in Fig. 4 over C3.
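Steps 4 and 5 can be sketched as a single recursive rewrite over a formula AST. The tuple-based encoding below is ours, for illustration only; it treats implies as the representative secondary operator and strings as atomic formulas:

```python
# Sketch (our own encoding) of Steps 4-5: rewrite secondary operators
# and push negations to the atoms. Formulas are nested tuples such as
# ("implies", p, q); atoms are plain strings.

def to_nnf(f, negate=False):
    """Return an NNF-equivalent of f; negate tracks a pending negation."""
    if isinstance(f, str):                       # atomic formula
        return ("not", f) if negate else f
    op, *args = f
    if op == "not":
        return to_nnf(args[0], not negate)       # fold double negations
    if op == "implies":                          # p implies q == (not p) or q
        return to_nnf(("or", ("not", args[0]), args[1]), negate)
    if op in ("and", "or"):
        # De Morgan: a pending negation flips the connective and recurses
        flipped = {"and": "or", "or": "and"}[op] if negate else op
        return (flipped, *(to_nnf(a, negate) for a in args))
    raise ValueError(f"unknown operator: {op}")
```

For instance, not(p implies q) rewrites to p and not q, with the negation now sitting at the atomic level.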
Finally, Step 6 takes the conjunction of the constraints obtained from Step 5. The resulting NNF constraint for our running example is shown in Fig. 5. For easier reference, we divide this constraint into five subformulas, labeled S1 to S5.
We need to point out two subtleties about the pipeline of Fig. 3: First, Steps 1 and 3 have to be done in such a way that no name clashes arise. To avoid name clashes, we give all variables distinct (automatically generated) names. Since no name clashes occur in our running example, we elected, for readability reasons, not to rename the variables in Figs. 4 and 5. Second, Steps 3 through 5 of the pipeline further apply to the user-defined operations needed by the constraints. In our example, the only operation needed, getAge, remains unaltered since it has no let expressions, secondary logical operators, or negations.
The NNF constraint generated by the pipeline of Fig. 3 is solved through the process that we present next.
6. Hybrid OCL Solving Using Search and SMT
In this section, we elaborate our hybrid OCL solving process, i.e., the second phase of the approach in Section 4. The solving process, depicted in Fig. 6, has four steps. The first step is performed once; the remaining three steps are iterative. These three steps are repeated until either an instance model satisfying the NNF constraint is found, or the maximum number of search iterations is reached. We discuss each step of the solving process below.
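Schematically, the four-step process of Fig. 6 can be rendered as follows (our own pseudostructure, not the tool's code; all solver internals are stubbed out as function parameters):

```python
# Schematic rendering (ours) of the solving loop in Fig. 6: Step 1 labels
# the AST once; Steps 2-4 repeat until a solution is found or the search
# iteration budget is exhausted.

def hybrid_solve(nnf_ast, max_iterations, label, search_step,
                 smt_worth_invoking, smt_step, satisfied):
    labeled_ast = label(nnf_ast)                    # Step 1 (performed once)
    model = {}                                      # empty instance model
    for _ in range(max_iterations):
        model = search_step(labeled_ast, model)     # Step 2: structural tweaks
        if smt_worth_invoking(labeled_ast, model):  # Step 3: futility check
            model = smt_step(labeled_ast, model)    # Step 4: attribute values
        if satisfied(nnf_ast, model):
            return model                            # valid instance model
    return None                                     # no solution within budget
```

The stubs make the division of labor explicit: search mutates structure, the futility check gates the SMT call, and SMT fills in primitive attribute values.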
6.1. Delegating Solving Tasks to Search & SMT (Step 1, Fig. 6)
Step 1 of the process of Fig. 6 decides how to delegate to search and SMT the solving of different subformulas of the NNF constraint derived in Section 5.
Specifically, Step 1 applies a static and deterministic procedure, whereby the nodes in the Abstract Syntax Tree (AST) of the derived NNF constraint are given one of the following labels: (1) search, (2) SMT, or (3) both. These labels operationalize our strategy, described in Section 4, for having search and SMT collaborate for solving the NNF constraint.
The labeling procedure of Step 1 is realized by a depth-first traversal of the AST, with visited nodes labeled according to a set of predefined rules. Our rules cover all the AST node types defined by the OCL 2.4 metamodel (OCL). Since the rules are numerous, we do not list them here; our complete rule set is provided in Appendix A. Below, we provide a general description of the rules and illustrate them over S5 of the NNF constraint in Fig. 5.
The AST for S5 is shown in Fig. 7. To avoid clutter, we have collapsed some nodes. The collapsed nodes, marked by dashed borders, expand similarly to the node with the “=” symbol (bottom left of the figure). The label shown for each collapsed node is that of its root. Our rules can be broadly classified into three categories. The rules in the first category label an AST node based only on information contained within the node. Notably, (i) nodes representing quantifiers or object references are labeled search, e.g., the root node of the AST of Fig. 7; and (ii) nodes representing calls to user-defined operations are labeled SMT if the operations are not recursive and have primitive return types (e.g., the getAge nodes in Fig. 7); otherwise, these nodes are labeled search.

The rules in the second category infer a label for an AST node based on the type of the node itself and those of its ancestors. Notably, a node representing a primitive attribute is labeled SMT, unless the node has an ancestor of a certain type, e.g., exists, in which case the attribute is labeled both. In Fig. 7, all primitive attributes are labeled SMT. To provide an example of a primitive attribute that is labeled both, consider S4 in the constraint of Fig. 5. In the AST for S4 (not shown), the node representing isLocal would be labeled both.
The rules in the third category infer a label for an AST node based on the node’s children. Notable constructs handled by this category of rules are logical and numerical operators. For example, the and, or, not, =, >=, <=, > nodes in Fig. 7 are all labeled based on their (immediate) children. Specifically, if all children have the same label, that label is propagated to the parent node. If the children have different labels, then the parent node is labeled both. AST nodes representing constants (e.g., 100 in Fig. 7) are left unlabeled; these nodes do not play a role in deciding about the delegation of solving tasks.
Although not illustrated in Fig. 7, when a node representing a call to a user-defined operation is labeled SMT, the AST of the operation is subject to the same labeling procedure discussed above. This treatment does not apply when these nodes are labeled search; in such cases, all the nodes within the AST of the referenced operation are labeled search. This is because, as we elaborate in Section 6.4, our approach hides from SMT any user-defined operation called from a node that is labeled search. Consequently, we need to handle any such operation call via search only.
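The third category of rules, which propagates labels from children to parents, can be sketched as follows (our own encoding, for illustration only; None stands for an unlabeled constant node):

```python
# Sketch of the third category of labeling rules: a node whose children
# all carry the same label inherits it; mixed child labels yield "both";
# unlabeled constant nodes (None) are ignored.

def propagate_label(child_labels):
    """Derive a parent node's label from its children's labels."""
    labels = {l for l in child_labels if l is not None}  # drop constants
    if not labels:
        return None                  # e.g., a node over constants only
    if len(labels) == 1:
        return labels.pop()          # unanimous label is propagated
    return "both"                    # mixed labels: delegate to both
```

For example, the ">=" node in Fig. 7 comparing an SMT-labeled attribute against the constant 100 would inherit the label SMT, since the constant does not vote.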
6.2. The Search Step (Step 2, Fig. 6)
Step 2 runs one iteration of search. This step utilizes the search-based OCL solver we developed previously (Soltana:2017) and outlined in Section 3. We refer to this solver as the baseline hereafter. We modify the baseline by limiting what search is allowed to do. Precisely, we allow search to manipulate only element types that are referenced by AST nodes labeled search or both. For example, if we were to satisfy S5 of the constraint in Fig. 5, search would be allowed to add / remove instances of only the following types: the PhysicalPerson and Address classes and the resides at association. Suppose that search creates an instance p of PhysicalPerson. In doing so, p’s primitive attributes necessarily receive initial values (e.g., default or random values). However, once p is added to the instance model, search is prohibited from tweaking p’s primitive attributes. This is because the primitive attributes of PhysicalPerson are referenced only within nodes that are labeled SMT in the AST of Fig. 7.
6.3. Avoiding Futile SMT Invocations (Step 3, Fig. 6)
Step 3 is aimed at avoiding SMT invocations that, given the current structure of the instance model, have no chance of solving the NNF constraint. We recall from Section 4 that the only means we make available to SMT for altering an instance model is by manipulating primitive attribute values. If a violation is unfixable this way, invoking SMT will be futile. For example, if the current instance model violates some multiplicity constraint, the violation cannot be fixed by SMT in our proposed solving process.
To decide whether to invoke SMT, we do as follows: First, we clone the (labeled) AST of the underlying NNF constraint. Next, in this (cloned) AST, we replace with true any node labeled SMT and representing a Boolean expression. Naturally, the branches below the replaced nodes are pruned. In Fig. 8, we show the result of this treatment applied to the AST of Fig. 7. The rationale behind the treatment is simple: We assume that SMT, if invoked, will be able to solve all subformulas delegated to it. If the constraint represented by the reduced AST evaluates to false on the current instance model, SMT cannot conceivably satisfy the whole constraint. Otherwise, we give SMT a chance to solve the subformulas delegated to it.
We make two remarks about Step 3. First, for the step to be sound, the original AST must represent an OCL expression in NNF, where negations have been pushed all the way inward, as discussed in Section 5. Second, the construction of the reduced AST is a one-off activity. It is only the evaluation of this AST that needs to be repeated in each iteration of Step 3.
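The reduction described above can be sketched as follows (our own tuple encoding, not the tool's data structures; leaves are assumed to have already been evaluated to Booleans over the current instance model):

```python
# Sketch of the Step 3 futility check: any Boolean node labeled SMT is
# optimistically replaced with True (its subtree is pruned), and the
# reduced tree is evaluated. Nodes are (label, op, *children) tuples;
# leaves are Booleans evaluated over the instance model.

def reduce_and_eval(node):
    """Evaluate the reduced AST; False means invoking SMT is futile."""
    if isinstance(node, bool):
        return node
    label, op, *children = node
    if label == "SMT":
        return True                       # assume SMT can satisfy this part
    results = [reduce_and_eval(c) for c in children]
    return all(results) if op == "and" else any(results)
```

If the result is False, no assignment of primitive attribute values can repair the violation, so the SMT invocation is skipped.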
6.4. The SMT Step (Step 4, Fig. 6)
For a given instance model, Step 4 does the following: It first encodes into an SMTLIB formula the solving task that needs to be handled by SMT. Next, it invokes an SMT solver on the resulting formula. If the solver finds a solution, the solution is “lifted back” to the instance model. By lifting back, we mean that the values of variables in the solution are assigned to the instance model’s primitive attributes corresponding to the variables. For example, suppose that an instance model passes the check in Step 3 (Section 6.3), but still violates S5 of the NNF constraint in Fig. 5. In such a case, an SMT solver may be able to satisfy S5 by assigning appropriate values to the birthYear, disabilityType and disabilityRate of the PhysicalPerson instances already in the instance model.
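As an illustration of lifting back, the sketch below (our own data layout, not the tool's; variable names are assumed to encode an object and an attribute, as in "T1.birthYear") writes a solver assignment into an instance model represented as nested dictionaries:

```python
# Sketch of "lifting back" an SMT solution: each variable name encodes an
# object and a primitive attribute; the solver's values are written into
# the corresponding slots of the instance model.

def lift_back(instance_model, solution):
    """solution maps variable names like 'T1.birthYear' to solved values."""
    for var, value in solution.items():
        obj_name, attr = var.split(".")
        instance_model[obj_name][attr] = value
    return instance_model

# Hypothetical instance with placeholder attribute values set by search:
model = {"T1": {"birthYear": 0, "disabilityRate": 0.0}}
lift_back(model, {"T1.birthYear": 1980, "T1.disabilityRate": 0.5})
```

After lifting back, the instance model reflects the attribute values chosen by the SMT solver, while its structure remains as search left it.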
6.4.1. Deriving the SMT-LIB Formula
Alg. 1, named deriveFormula, presents our algorithm for constructing an SMT-LIB formula. The algorithm takes the following as input: (1) the AST of an NNF constraint already processed by the labeling procedure of Section 6.1, and (2) an instance model passing the check discussed in Section 6.3. Initially, deriveFormula expands the primitive-attribute-carrying quantifiers of the NNF constraint over the instance model (L. 2). The expansion hides quantification, thus allowing SMT to be invoked on a quantifier-free formula; this is where, as we noted in Section 3, existing SMT solvers are most efficient. After expansion, deriveFormula substitutes any subformula that is exclusive to search with the concrete evaluation of that subformula over the instance model (L. 3). The substitution hides from SMT those subformulas within the expansion whose evaluation will not be affected by SMT. Subsequently, the algorithm applies expansion and substitution to user-defined operations, taking into account the peculiarities posed by these operations (L. 4-5). Finally, deriveFormula translates the now processed NNF constraint and user-defined operations into an SMT-LIB formula (L. 6).
In the remainder of this section, we elaborate the four algorithms used by Alg. 1 for (1) expansion, (2) substitution, (3) user-operation handling, and (4) SMT-LIB translation, respectively.
(1) Expanding the NNF constraint: Alg. 2, named expand, re-expresses quantifiers in terms of their quantified objects. Expansion is warranted only for nodes whose body contains primitive attributes (L. 1-3). This is because SMT, the way we use it in our approach, cannot affect the evaluation of subformulas that have no primitive attributes. For example, consider S2 and S3 in the NNF constraint of Fig. 5. The quantifiers in these subformulas do not need to be expanded. A universal quantifier (forAll) is expanded by creating a copy of its body for each quantified object (L. 1-11), and then taking the conjunction of the copies (L. 12-13). As an example, Fig. 9(b) shows S5 of the NNF constraint in Fig. 5 expanded over the instance model of Fig. 9(a). An existential quantifier (exists) is expanded similarly, but with a disjunction applied to the copies (L. 14-15). If the collection of quantified objects is empty, the quantifier in question is replaced with true in case of forAll (L. 19-20), and with false in case of exists (L. 21).
In addition to the explicit universal and existential quantifiers, expansion applies to certain other OCL operations which, logically speaking, are shortcuts for expressions with quantification (L. 24-26 in Alg. 2). To illustrate, the operation c->includes(e), where c is a collection and e is an object, is equivalent to: c->exists(i | i = e). We expand such shortcuts by making quantification explicit (L. 25-26 in Alg. 2). In Appendix B, we provide a complete list of these shortcuts alongside their equivalent expressions with explicit quantification. In the final segment of the algorithm (L. 28-31), expansion is applied recursively to the children of a visited node.
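The quantifier-expansion cases of Alg. 2 can be sketched as follows (our own encoding, not the algorithm's actual pseudocode; body is a callback that instantiates the quantifier body for one quantified object):

```python
# Sketch of quantifier expansion: forAll becomes a conjunction of body
# copies over the quantified objects, exists becomes a disjunction, and
# empty collections reduce to True / False respectively.

def expand_quantifier(kind, objects, body):
    """kind is 'forAll' or 'exists'; body(o) yields one instantiated copy."""
    if not objects:
        # vacuous truth for forAll; unsatisfiable for exists
        return True if kind == "forAll" else False
    copies = [body(o) for o in objects]
    return ("and", *copies) if kind == "forAll" else ("or", *copies)
```

A shortcut such as c->includes(e) would first be rewritten to an explicit exists and then expanded by the same routine.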
(2) Substituting subformulas that are exclusive to search: After expansion, Alg. 3, named substitute, hides from SMT the subformulas that are exclusive to search. Specifically, Alg. 3 substitutes any subformula whose root AST node has been labeled search by the procedure of Section 6.1 with its concrete evaluation over the instance model (L. 1-3). For example, consider the already expanded formula in Fig. 9(b). The two subformulas delimited by boxes have their root AST nodes labeled search (not shown in the figure). The substitute algorithm replaces these two subformulas with true, noting that T1 and C1 in Fig. 9(a) are each associated with one instance of Address. Similarly to the expand algorithm (Alg. 2) discussed earlier, the final segment of the substitute algorithm (L. 5-8) recursively applies substitution to the children of the visited node.
Going back to the NNF constraint of Fig. 5, the expansion and substitution processes would unwind as follows: S1 to S3 are substituted by their concrete evaluation over the instance model without being expanded. The primitive-attribute-carrying quantifiers in S4 and S5 are first expanded over the instance model of Fig. 9(a) in the manner illustrated for S5 in Fig. 9(b). Subformulas whose root AST nodes are labeled search are then substituted by their concrete evaluation over the same instance model.
(3) Handling user-defined operations: User-defined operations may also contain quantification, as well as subformulas that should be hidden from SMT. Since user-defined operations are expressed in OCL, we process them in the same way as the NNF constraint, i.e., by invoking expand and substitute (Algs. 2 and 3). Nevertheless, before expansion and substitution can be applied to user-defined operations, some preprocessing is required. Alg. 4, named processOperations and discussed next, tailors expansion and substitution to user-defined operations.
Initially, Alg. 4 determines which user-defined operations require expansion and substitution (L. 1–5). Specifically, these are the operations that are either directly or indirectly used by the (processed) NNF constraint via an AST node labeled SMT (L. 4–5). Indirect usage means that some operation op is not called directly within the NNF constraint; however, op appears in some chain of user-defined operation calls originating from the NNF constraint.
Due to OCL’s object-oriented nature, user-defined operations have access to properties that are encapsulated within the operations’ calling objects. For example, the getAge operation in Fig. 1(a) refers to the birthYear attribute of its calling object. In SMT-LIB, however, only the input parameters and the built-in SMT-LIB functions are accessible within the body of a given operation (function in SMT-LIB). To get around this difference between OCL and SMT-LIB, processOperations externalizes as a parameter any mutable primitive attribute within the user-defined operations that need to be processed (L. 6–8 in Alg. 4). For example, the operation call T1.getAge() in the constraint of Fig. 9(b) becomes T1.getAge(T1.birthYear); the OCL query that defines the operation is also updated accordingly. The getAge operation after externalization is provided in Fig. 10.
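The effect of externalization can be sketched as a small rewrite (illustrative Python; the operation and attribute names follow the getAge example, but the string-based representation is hypothetical and much simpler than the actual AST rewrite):

```python
# Illustrative sketch: lift the primitive attributes read by an operation's
# body into extra parameters, mirroring how T1.getAge() is rewritten into
# T1.getAge(T1.birthYear) and how the defining query is updated accordingly.

def externalize(op_name, params, body, attrs):
    """Return the externalized parameter list and body, plus a call-site
    rewriter that threads each attribute of the calling object through."""
    new_params = list(params) + list(attrs)
    new_body = body
    for a in attrs:                       # self.birthYear -> parameter birthYear
        new_body = new_body.replace("self." + a, a)

    def rewrite_call(obj, args):          # T1.getAge() -> T1.getAge(T1.birthYear)
        all_args = list(args) + ["%s.%s" % (obj, a) for a in attrs]
        return "%s.%s(%s)" % (obj, op_name, ", ".join(all_args))
    return new_params, new_body, rewrite_call
```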
The expand and substitute algorithms need to be able to dynamically evaluate OCL subformulas. In the case of expand, this ability is required for extracting the quantified objects, and in the case of substitute, for calculating the concrete results to replace the subformulas that should be hidden from SMT. To enable dynamic (on-the-fly) evaluations, we need knowledge of the calling object and parameter values of a given operation at the time the operation is processed. This knowledge is available a priori for the operations that are called directly by the (processed) NNF constraint. However, when an operation is called only indirectly, the knowledge needs to be obtained from the underlying chain of operation calls. To ensure that we have the required knowledge at the time we need it (during expansion or substitution), the dependencies of each user-defined operation need to be processed before the operation itself. In processOperations, this is achieved by ordering the user-defined operations to process according to their dependencies. Specifically, an operation in position i should not depend on an operation in a later position j > i (L. 9 in Alg. 4).
To illustrate, let us hypothetically assume that the AST in Fig. 5 is that of a user-defined operation op, instead of that of an NNF constraint. In this case, op must be processed first to enable the identification of the calling objects and parameters of the calls to getAge. The reverse order, i.e., processing getAge before op, would not be an option, because we would know neither the calling object nor the parameter value to pass to the (externalized version of) getAge shown in Fig. 10.
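Under the assumption that this caller-before-callee ordering amounts to a topological sort of the call graph among the user-defined operations, it can be sketched as follows (Python sketch; the operation names are hypothetical):

```python
# Illustrative sketch: order user-defined operations so that every caller is
# processed before its callees, since an operation's calling objects and
# parameter values only become known once its callers have been processed.
from graphlib import TopologicalSorter

def processing_order(calls):
    """calls maps each operation to the set of operations it calls directly."""
    callers = {op: set() for op in calls}          # callee -> its direct callers
    for op, callees in calls.items():
        for c in callees:
            callers.setdefault(c, set()).add(op)
    # TopologicalSorter emits predecessors (here: the callers) first
    return list(TopologicalSorter(callers).static_order())
```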
The final consideration regarding user-defined operations has to do with the fact that the same user-defined operation may be called by different objects. This means that (the AST of) a user-defined operation may expand differently for different calling objects. Further, during the substitution process, the concrete evaluation of the same subformula might yield different results for different calling objects. To handle this, processOperations replaces, for each calling object, the original operation with a new (cloned) operation that is specific to the calling object (L. 10–16). This treatment does not apply to the (externalized) getAge operation in Fig. 10, since this operation has no quantification and no subformulas that are exclusive to search (L. 11). For the sake of argument, let us hypothetically assume that the query defining getAge includes some quantification. Given the instance model of Fig. 9(a), the calls to getAge in the constraint of Fig. 9(b) would then be replaced by getAge_for_T1 for T1 and by getAge_for_C1 for C1, with getAge_for_T1 and getAge_for_C1 being clones of the getAge operation (Fig. 10). After this final treatment, the user-defined operations that need processing are expanded and substituted in exactly the same manner as the NNF constraint (L. 17–19).
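The per-caller cloning can be sketched as follows (illustrative Python; the AST encoding and the naming scheme `<op>_for_<object>` follow the getAge_for_T1 example, but the data layout is hypothetical):

```python
# Illustrative sketch: create one specialized copy of an operation's AST per
# calling object, so that each clone can be expanded and substituted
# independently of the other callers.
import copy

def clone_per_caller(op_name, op_ast, callers):
    """Return a dict of clones, e.g. getAge -> {'getAge_for_T1': ...,
    'getAge_for_C1': ...}, one deep copy per calling object."""
    return {"%s_for_%s" % (op_name, obj): copy.deepcopy(op_ast)
            for obj in callers}
```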
(4) Translation to SMT-LIB: Alg. 5, named OCL2SMTLIB, translates the now processed NNF constraint and user-defined operations to SMT-LIB. Fig. 11 presents the SMT-LIB formula obtained for the NNF constraint of Fig. 5 and the instance model of Fig. 9(a). L. 1 of the formula of Fig. 11, produced by L. 1 of Alg. 5, indicates to the SMT solver that we would like to obtain a satisfying assignment if the formula is satisfiable. L. 2–4 of the formula, produced by L. 2–10 of Alg. 5, declare the UML enumeration types involved. L. 5–8 of the formula, produced by L. 11–12 of Alg. 5, declare the variables corresponding to the primitive attributes of the instance model of Fig. 9(a). Each declared SMT-LIB variable is bound to its corresponding primitive attribute in the instance model, as shown by the annotations in Fig. 9(a). L. 9–10 of the formula, produced by L. 13–17 of Alg. 5, define the required built-in and user-defined OCL operations in SMT-LIB. For the NNF constraint in Fig. 5, only getAge needs to be defined. The generation of the SMT-LIB header for getAge is straightforward (L. 17 of Alg. 5). The body of the corresponding SMT-LIB function is built from the AST of getAge (shown in Fig. 10) by applying a set of predefined OCL-to-SMT-LIB translation rules (L. 17 of Alg. 5). Our complete list of OCL-to-SMT-LIB translation rules is provided in Appendix C.
Several built-in OCL operations such as abs, div, and mod are available as default functions in SMT-LIB. Such operations do not require explicit SMT-LIB function definitions. Built-in OCL operations without a counterpart SMT-LIB function, e.g., max and round, are defined, when needed, using the rules listed in Appendix C (L. 13–17 of Alg. 5). For example, max in OCL is translated into the following SMT-LIB function: (define-fun max ((a Real) (b Real)) Real (ite (>= a b) a b)).
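This distinction between natively supported and explicitly defined built-ins can be sketched as a small rule lookup (illustrative Python; the min entry is our own addition following the same pattern as max and is not taken from the paper's rule set):

```python
# Illustrative sketch: OCL built-ins with a native SMT-LIB counterpart need no
# definition; the others are emitted as define-fun declarations on demand.

NATIVE = {"abs", "div", "mod"}                     # already SMT-LIB functions

DEFINITIONS = {
    "max": "(define-fun max ((a Real) (b Real)) Real (ite (>= a b) a b))",
    # assumed analogue of max, not from the paper's rule set:
    "min": "(define-fun min ((a Real) (b Real)) Real (ite (<= a b) a b))",
}

def emit_definitions(ops_used):
    """Return the define-fun lines needed for the given OCL operations."""
    return [DEFINITIONS[op] for op in sorted(ops_used)
            if op not in NATIVE and op in DEFINITIONS]
```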
L. 11–16 of the formula in Fig. 11, produced by L. 18 of Alg. 5, represent the core assertion derived from the processed NNF constraint. This assertion is produced by applying the OCL-to-SMT-LIB translation rules listed in Appendix C. Note that these rules are the same as those used for translating user-defined operations into SMT-LIB.
6.4.2. Invoking the SMT Solver and Updating the Instance Model
Given the binding established between the variables in the constructed SMT-LIB formula and the primitive attributes in the instance model (see Fig. 9(a)), updating the instance model is straightforward. For example, when invoked over the SMT-LIB formula of Fig. 11, the solver may return the following satisfying assignment: X1 = false, X2 = FR, X3 = false, X4 = 1918, X5 = A, X6 = 1, X7 = 1975, X8 = 1, and X9 = A. Using the binding shown in Fig. 9(a), the instance model is updated by lifting this satisfying assignment back. In our example, the updated instance model is already a valid solution and is thus returned to the user.
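The lifting step can be sketched as follows (illustrative Python; the binding dictionary and the object/attribute layout are hypothetical, not the actual binding of Fig. 9(a)):

```python
# Illustrative sketch: write the solver's satisfying assignment back onto the
# instance model through the variable-to-attribute binding.

def lift(instance_model, binding, assignment):
    """binding maps each SMT variable to an (object, attribute) slot in the
    instance model; assignment maps each variable to its solver value."""
    for var, value in assignment.items():
        obj, attr = binding[var]
        instance_model[obj][attr] = value
    return instance_model
```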
7. Evaluation
In this section, we evaluate our hybrid solving approach over three industrial case studies.
7.1. Research Questions (RQs)
RQ1. How does our approach fare against the state of the art? We examine in RQ1 whether our hybrid approach presents practical improvements, compared to Alloy (AlloyTool, ), UMLtoCSP (UMLtoCSPTool, ), and pure search as implemented by the baseline we build on (SoltanaTool, ) (see Section 6.2).
RQ2. Can our approach generate large instance models in practical time? The need to test a large variety of system scenarios and system robustness and performance, e.g., load and stress testing (Zhang:11, ), entails large test inputs. RQ2 investigates how our approach scales as we have it produce increasingly larger instance models.
7.2. Tool Support
Fig. 12 provides an overview of our tool. NNF Transformer turns the input OCL constraints into an NNF constraint (Section 5). Constraint Labeler spreads the solving tasks over search and SMT (Section 6.1). Solver implements the three iterative steps depicted in Fig. 6 (Sections 6.2 through 6.4). Our tool uses Eclipse OCL (EclipseOCL, ) for building ASTs and evaluating OCL expressions, the OCL solver from our previous work (Soltana:2017, ) for search-based solving, and Z3 (Z3Tool, ) for SMT solving. Excluding comments and third-party libraries, our tool consists of 16K lines of Java code. The tool is available online at https://sites.google.com/view/hybridoclsolver/.
Table 1

                                                       Case A        Case B       Case C
Class Diagram
 1  # of classes                                       –             –            –
 2  # of generalizations                               –             –            –
 3  # of associations                                  –             –            –
 4  # of enumerations (Avg # of literals per enum.)    17 (6.11)     4 (3.25)     8 (4.87)
 5  # of attributes (primitive + non-primitive)        208 (158+50)  116 (72+44)  255 (219+36)
NNF Constraint
 6  # of nodes in the AST of the NNF constraint        2004          1844         4014
 7  # of AST nodes labeled search                      –             –            –
 8  # of AST nodes labeled SMT                         –             –            –
 9  # of AST nodes labeled both                        –             –            –
10  # of unlabeled AST nodes (representing constants)  128           55           39
7.3. Description of Case Studies
Our evaluation is based on three industrial case studies from three distinct domains. The case studies are denoted Case A, Case B, and Case C. The source material for each case study is a test data model expressed using UML Class Diagrams (CDs) and OCL. Our case study material is borrowed from previous work. Specifically, Case A was built for system testing of an eGovernment application concerned with calculating personal income taxes (Soltana:2017, ), Case B for system testing of an occupant detection system in cars (Wang, ), and Case C for system testing of a satellite communication system (Dan, ). Our case study material (in a sanitized form) is available from our tool’s website. The example described in Section 2 and used throughout the article is a simplified excerpt from Case A.
The existing strands of work from which our case study material originates employ different test strategies. In particular, Case A uses statistical testing (Runeson:95, ), Case B uses coverage-driven model-based testing (AliBHP10, ), and Case C uses data mutation testing (Wong, ). Despite the different test strategies, the case studies all share a common technical challenge, which is to find a practical way of producing system test data under OCL constraints. Our evaluation in this article is not meant to assess the effectiveness of the test strategies in the prior work from which our case study material comes. Instead, we evaluate the applicability and scalability of our hybrid OCL solving approach for system test data generation, irrespective of the test strategy.
Table 1 provides overall statistics about the CDs and OCL constraints in our case studies. We note that the constraint statistics (rows 6 to 10 of Table 1) are for the single NNF constraint derived in each case study (see Section 5), rather than the original constraints. This provides a more convenient basis for comparing the complexity of the solving tasks across the case studies when we discuss the evaluation results next.
7.4. Results and Discussion
In this section, we answer the RQs of Section 7.1 via experimentation. Our experiments were performed on a computer with a 3GHz dual-core processor and 16GB of memory. The maximum number of iterations for metaheuristic search was set to 1000.
RQ1. We ran our hybrid solver, the baseline search-based solver, UMLtoCSP (version from 2009 with its default ECLiPSe solver V5.0 (ECLiPSe, )), and Alloy (V5.0) on the case studies to which they are applicable. Alloy uses SAT solving as its back-end, and comes pre-packaged with several alternative SAT solvers. We considered two alternatives: SAT4J (SAT4J, ) – Alloy’s default SAT solver – and miniSat (miniSat, ) – the SAT solver recommended in Alloy’s documentation for better performance (AlloyTool, ). The results we report below on Alloy are the ones obtained using miniSat, noting that Alloy used with miniSat consistently outperformed Alloy used with SAT4J in all our experiments.
The hybrid and baseline solvers are applicable to all three case studies. Alloy and UMLtoCSP are applicable to Case B only. Case A and Case C have features which are common in practice, but which Alloy and UMLtoCSP do not support. In the case of Alloy, the main issues for which we could not find a workaround had to do with the absence of real/floating-point arithmetic and string operations. In the case of UMLtoCSP, the main limitation was the lack of support for
null and OclInvalid. This meant that several OCL operations used in our case studies, e.g., oclIsUndefined (illustrated in Fig. 1(b)) and oclIsInvalid, could not be handled by UMLtoCSP. We note that for Case B, UML2Alloy (UML2AlloyTool, ) (V0.52) turned out to be adequate for translating that case study to Alloy’s input language. The results we report here for Alloy are based directly on the Alloy model derived by UML2Alloy.

For each solver considered, we measured the execution time. Since the baseline and hybrid solvers are randomized, we ran these solvers 100 times each to account for random variation. For these two solvers, we additionally computed the success rate, i.e., how often each solver succeeded in finding a solution. The CDs in our case studies happen to have a root class, i.e., a class from whose instances one can reach all other objects in an instance model. In Case A, this class is Household – a unit of taxation (not shown in our example of Fig. 1). In Case B, the class is BodySenseSystem – a container for all system components. In Case C, the class is Vcdu – a container for data packages transmitted over a satellite channel. Through the notion of non-emptiness constraint discussed in Section 5, we instructed each applicable solver to generate exactly one instance of the root class. This is a natural choice for comparison purposes, since a single instance of the root class represents the smallest meaningful (system) test input in each of our case studies. Had the case studies not had a root class, we would have chosen multiple classes to instantiate in a way that would match system testing needs. We further note that one can seed the hybrid and baseline solvers with a partial (and potentially invalid) instance model as the starting point. We did not use this feature; both solvers used the default option, which is to start from an empty instance model.
Table 2

                          Case A                           Case B                                                       Case C
                          Baseline        Hybrid           Alloy      UMLtoCSP   Baseline         Hybrid               Baseline        Hybrid
1  Succeeded?             In 28/100       In 100/100       Yes        Yes        In 100/100       In 100/100           In 0/100        In 100/100
                          runs            runs                                   runs             runs                 runs            runs
2  Execution time         Avg = 587.2 sec Avg = 27.8 sec   0.091 sec  1.68 sec   Avg = 4.5 sec    Avg = 3.6 sec        Avg = 1771 sec  Avg = 44.3 sec
   (one run)              SD = 383.2      SD = 18.25                             SD = 1.7         SD = 0.7             SD = 538.9      SD = 7.2
3  Instance model size    Avg = 28.8 obj. Avg = 36.6 obj.  21 obj.    19 obj.    Avg = 19.7 obj.  Avg = 23.3 obj.      –               Avg = 121.4 obj.
   (in a successful run)  SD = 5.8        SD = 7.1                               SD = 2.1         SD = 3.1                             SD = 6.1

Table 2 shows the results of RQ1, alongside the size of the generated instance models. Note that for the baseline and hybrid solvers, we report averages (Avg) and standard deviations (SD), since each solver was executed 100 times.
As seen from the table, the hybrid solver was applicable to all three case studies and able to produce results in all its runs. In Case B, all four solvers were equally successful and very quick, with Alloy being the quickest. How the execution time trends observed over Case B evolve as one attempts to build larger instance models is assessed in RQ2.
With regard to the performance of the hybrid solver compared to the baseline, the results in Table 2 suggest that the hybrid solver brings no practical benefit in Case B. This can be explained as follows: The primitive attributes in Case B are less constrained than those in Case A and Case C (rows 8 and 9 of Table 1). We further observed that most operations on the primitive attributes in Case B are simple and of the form x op constant, where x is a primitive attribute and op is an equality or inequality operator. Consequently, in Case B, search was as effective as SMT in handling primitive attributes.
In Case A and Case C, our hybrid solver presents compelling improvements over the baseline. This implies that the way we employ SMT in our approach is beneficial. Specifically, in Case A, the hybrid solver outperformed the baseline by a factor of 3.5 in terms of success rate and a factor of 21.7 in terms of execution time. In Case C, only the hybrid solver was able to generate data.
Although Case A and Case B are similar with respect to the size of the NNF constraint (row 6 of Table 1) and the size of the generated data (row 3 of Table 2), the performance difference between the hybrid solver and the baseline is much larger in Case A than in Case B. The better performance of the hybrid solver in Case A is due to SMT being more efficient than search at handling linear arithmetic, which is not only more prevalent in Case A, but also more complex. Notably, Case A has numerous inequalities involving more than one primitive attribute and OCL operations such as max and round.
Case C is inherently more complex than Case A and Case B, as suggested by Tables 1 and 2. The additional complexity has a detrimental effect on the baseline solver, resulting in all runs of the baseline solver on Case C being unsuccessful. This outcome does not imply that the baseline solver is theoretically bound to fail on Case C. Rather, the outcome is a strong indication that the baseline solver is inefficient for Case C.
The answer to RQ1 is that, for one of our case studies, Case B, which has comparatively simple constraints, all the techniques considered were equally effective and produced results within seconds. For the other two case studies, Case A and Case C, Alloy and UMLtoCSP were not applicable, thus reducing the set of alternatives to pure search and our hybrid approach. How our approach fares against pure search depends on the complexity of the constraints over primitive attributes. If such constraints are few and simple, both approaches have similar performance. But, when primitive attributes are subject to many complex restrictions, our approach performs substantially better. In Case A, our approach was 3.5 times more successful and 21.7 times faster than pure search. In Case C, only our approach yielded results.
RQ2. We measured the execution time of our hybrid solver for building progressively larger instance models – a necessity for system testing, as noted earlier. Whereas in RQ1 we set to one the number of instances of the root class, in RQ2, we initially set this number to 20 and increased it up to 100 in increments of 20. Due to the nature of the root class (see RQ1), increasing its number of instances has a knock-on effect, leading to an across-the-board increase in the size of the instance model.
This knock-on effect comes primarily from the fact that most NNF-constraint fragments are of the shape Context.allInstances()->forAll(condition), potentially with several nested layers of quantification. The more instances of Context there are, the more computationally intensive solving becomes, particularly the expansion and fitness function calculations.
Despite the generated data having multiple roots, it is important to note that the data is not necessarily the union of independently constructed single-rooted instance models. This is because the roots are often logically connected. For a simple example from Case A, taxpayers are required to have distinct identifiers; this induces a global relationship between all taxpayers in all households (roots). For another example from the same case, consider child support, which plays a role in several of the constraints. Case A allows children to be supported by taxpayers from different households; this is necessary, for example, when the parents of a child are divorced, with the child being in one household and the (financially) supporting parent in another. To build a realistic instance model, we need to cover such situations and thus be able to create links between objects falling under different roots. In short, building an instance model with multiple roots is not always decomposable into simpler solving tasks.
As a final remark, we note that, although this has no bearing on RQ2, we did not attempt to maximize diversity when creating multiple class instances in the instance model. If necessitated by the test strategy, diversity can be effectively enforced by dynamically updating the OCL constraints with new inequalities to guide data generation toward diversity (Soltana:2017, ).
Fig. 13 shows for each case study the execution time of the hybrid solver over instance models containing 20 to 100 instances of the root class. To account for random variation, we calculated average execution times based on ten runs. For example, in Case C, the average execution time for generating an instance model with 60 instances of the root class is 143.6 minutes (SD = 29.4). In the figure, we further report the average number of objects in the generated instance models.
In addition, Fig. 13 provides a breakdown of the execution times over the iterative steps of our hybrid solving process, i.e., Steps 2 through 4 of Fig. 6. The breakdown indicates that the time required for Step 3 (checking whether SMT should be invoked) is negligible. The proportions of the execution times for Steps 2 and 4 reflect how the solving load is spread over search and SMT, taking into account both the characteristics of the NNF constraint to solve and the size of the instance model. Across all the runs underlying the results of Fig. 13, 79.5% of the execution time in Step 4 was spent on expansion and substitution, 0.7% on SMT solving by Z3, and the remaining 19.8% on lifting the output of Z3 back to the instance model. From these percentages, we observe that the time spent by Z3 is proportionally small. This suggests that the subformulas delegated to SMT have been handled by efficient decision procedures.
As expected, Fig. 13 suggests an exponential trend in the execution times with the size of the instance models increasing. Despite this, all runs of our solver were successful, meaning that the solver maintained its 100% success rate (reported in RQ1) over substantially larger instance models. In Case A and Case B, which, as we argued in RQ1, are less complex than Case C, the execution times increase slowly and within reasonable range. In Case C, one needs to take additional measures for scalability, should one need instance models that are much larger than those in our experiments. Particularly, one may utilize existing mechanisms for enhancing the performance of constraint solving, e.g., model slicing (Slicing, ; Slicing1, ), parallelization (paraAlloy, ; Ranger, ), and bound reduction (JordiBound, ).
As shown in Table 2 and discussed in RQ1, for Case B, Alloy and UMLtoCSP are faster than our hybrid solver, when they are tasked with generating one instance of the root class. A natural question that arises here is whether this performance advantage carries over to instance models of larger sizes.
To answer the above question, we ran Alloy and UMLtoCSP on Case B, while iteratively requesting larger instance models. Specifically, in each iteration, we incremented by one the number of instances of the root class, and kept doing so until the execution time exceeded a predefined time budget. We set the budget at 6 minutes; this approximately corresponds to how long the hybrid solver took in Case B to produce an instance model with 100 instances of the root class (see Fig. 13).
Within the allowed time budget, Alloy and UMLtoCSP produced instance models with 54 and 7 instances of the root class, respectively. This suggests that, when the size of the data to generate grows, our hybrid solver increasingly becomes a better alternative, even for simple data generation problems.
The answer to RQ2 is that, for our most complex case study, Case C, our approach generated in less than 13 hours data samples with over 10K objects. This level of performance is practical for testing, where one can start the generation of data well before the data is needed, and as soon as one has a stable test data model. Alloy and UMLtoCSP did not maintain the performance edge they had over Case B as the size of the data sample to generate was increased.
8. Limitations and Threats to Validity
Limitations. Our approach is non-exhaustive. This means that, given a size bound on the instance model to build (the universe), our approach can neither guarantee that it will find a solution within this bound when one exists, nor guarantee the absence of such a solution when none exists. In other words, our approach cannot prove bounded (un)satisfiability. Note that OCL satisfiability checking is in general undecidable (simp1, ). Generally speaking, in any situation where both search and SMT are inefficient, our hybrid solver will be inefficient too. One such situation (not encountered in our case studies) is when the constraints contain recursive user-defined operations. We inherit some limitations from the search-based solver we build on. The solver, despite covering the entire OCL, provides only coarse-grained fitness functions for certain constructs, e.g., subOrderedSet (Soltana:2017, ). This does not pose a practical problem, since the constructs in question are rarely used.
Threats to validity. Internal and external validity are the validity aspects most pertinent to our empirical evaluation.
With regard to internal validity, we need to note the following: Alloy requires upfront bounds on signature (class) instantiations, and UMLtoCSP requires bounds on both class and association instantiations. Selecting these bounds is critical and yet non-trivial, noting that large bounds often lead to time and/or memory blow-up. In our experiments, we were as conservative as possible when comparing to Alloy and UMLtoCSP. Specifically, we minimized the bounds with hindsight from the data generated by the hybrid and baseline solvers. By giving Alloy and UMLtoCSP this advantage, we mitigate as much as possible the confounding effects posed by the tuning of the bounds.
As for external validity, we note that we applied our approach to three industrial case studies from different domains. The evaluation results are easy to interpret, and clearly suggest that our approach is widely applicable and beneficial, particularly for complex data generation problems that are common in system testing practice. This said, further case studies are necessary for improving external validity.
9. Related Work
Numerous approaches exist for automated test data generation. Several of these approaches work by deriving test data from source code (surevyCode, ). For example, CAUT (CAUT, ) uses dynamic symbolic execution and linear programming to generate unit test data for C programs. Given a small size bound, Korat (Korat, ) generates all non-isomorphic test inputs for Java programs annotated using the Java Modeling Language (JML) (JML, ). SUSHI (SUSHI1, ; SUSHI2, ) combines symbolic execution and metaheuristic search to build input data that can exercise a given path in Java code.

White-box approaches like the above are usually inapplicable to system testing, which, as we noted earlier, is a primarily black-box activity (Utting:07, ). In particular, white-box approaches cannot be easily used when the source code is composed of several heterogeneous languages or when the code is inaccessible/unavailable due to reasons such as confidentiality and the presence of third-party components. These situations are common when testing at the system level. In contrast to white-box test data generation approaches, our approach requires no knowledge about the source code of the system under test, and can be applied as soon as one has a data model characterizing the system inputs.
Our data generation approach relates more closely to those that work based on a conceptual specification of data. In the context of UML, most such approaches utilize either only Class Diagrams (Dania:2016, ; Shaukat:2013, ; Shaukat:2016, ), or a combination of Class Diagrams and behavioral models such as State Machines (behave1, ) and Sequence Diagrams (behave2, ). Regardless of the exact formulation of UML-based data generation, a core challenge in this line of work is the satisfaction of OCL constraints (Gonzalez:2014, ). The most commonly used strategy for OCL solving is through translation to SAT, SMT, and other formalisms equipped with solvers. Alloy (JacksonBook2012, ) and UMLtoCSP (UMLtoCSP, ) are two notable approaches in this class. Alloy is a general-purpose SAT-based analyzer for first-order logic. Alloy can act as an OCL solver when complemented with a tool such as UML2Alloy (UML2Alloy, ), which adapts UML class diagrams and OCL constraints to Alloy’s input language. UMLtoCSP, which is specifically geared toward OCL solving, has as its translation target a Prolog-based constraint programming language. Since Alloy and UMLtoCSP are currently the main technologies used for OCL solving, we empirically compared our approach against them in our evaluation (Section 7). In addition to Alloy and UMLtoCSP, there are several threads of work where OCL constraints are solved through translation, with the main translation targets being SAT (SoekenWD11, ; KuhlmannG12, ; PrzigodaSWD16, ) and SMT (CantenotAB14, ; Przigoda:2016, ; Dania:2016, ).
The above approaches have two main limitations: First, they all have to translate class diagrams and OCL constraints into other formalisms. Since the formalisms considered are not sufficiently expressive, compromises have had to be made in terms of supported features. For example, we faced applicability problems with Alloy and UMLtoCSP in two out of our three case studies. Another limitation is that these approaches do not scale well for test data generation; we empirically demonstrated this for Alloy and UMLtoCSP.
To mitigate the challenges posed by undecidability and the lack of scalability, there has been work on identifying decidable fragments of OCL (OCLLite, ; simp1, ; simp2, ). In practice, narrowing OCL to decidable fragments is problematic, since doing so often makes certain constraints inexpressible, and further poses a usability challenge to users, even when the constraints can be written in a decidable fragment.
An alternative strategy for OCL solving is metaheuristic search (Shaukat:2013, ; Shaukat:2016, ; Soltana:2017, ). Due to their non-exhaustive nature, search-based approaches cannot provide guarantees about either satisfiability or unsatisfiability. Nevertheless, for activities such as simulation and testing, such guarantees are not necessary, noting that the objective of OCL solving here is not to assess the quality of a model, but rather to generate synthetic data from a model that has been validated a priori. Search-based OCL solving approaches are parallelizable, and have been shown to scale well for large and complex data generation problems. Further, these approaches cover nearly the entire OCL language – a direct benefit of not requiring a translation to another formalism – and thus present a major applicability advantage. Despite these advantages, search is a randomized process and not as effective as a decision procedure for handling the decidable fragments of OCL.
The approach that we proposed in this article is, to our knowledge, the first attempt at combining the best traits of search-based and translation-based approaches to OCL solving. Our approach makes no compromises in terms of expressiveness (applicability), and yet provides better scalability than the alternatives. Applicability and scalability are both paramount for system test generation driven by data models.
There is work outside UML on specification-based data generation. A recent strand exploits a combination of exhaustive search and graph transformation for generating well-formed graph structures (VarroICSE, ). Here, the underlying constraints are defined through graph-logic expressions. While this approach has the advantage of providing guarantees in terms of satisfiability, the underlying constraint language has limited expressiveness and covers only a small fragment of the OCL metamodel. Furthermore, the approach does not provide adequate support for integer, real, and string variables beyond treating them as enumerations. Our approach covers the entire OCL and does not suffer from the above limitations.
10. Conclusion
We proposed a tool-supported approach based on UML for generating system test data under constraints. Our approach hybridizes metaheuristic search and SMT for solving complex constraints expressed in OCL. To our knowledge, we are the first to develop such a hybrid approach for constraint solving. We evaluated our approach on three industrial case studies, showing that the approach brings about important benefits in terms of applicability and scalability when used on complex real-world problems.
Our current approach uses SMT only for solving constraints over primitive attributes. In the future, we would like to look into ways to exploit SMT more broadly, and thus further tap into the potential of background SMT theories. Another topic for future work has to do with how we delegate solving tasks to search and SMT. Currently, our delegation process is static. We plan to investigate whether a dynamic delegation strategy, e.g., one using a machine-learning-based feedback loop, can provide advantages.
References
 [1] Shaukat Ali, Lionel C. Briand, Hadi Hemmati, and Rajwinder Kaur Panesar-Walawege. A systematic review of the application and empirical investigation of search-based test case generation. IEEE Transactions on Software Engineering, 36(6):742–762, 2010.
 [2] Shaukat Ali, Muhammad Zohaib Z. Iqbal, Andrea Arcuri, and Lionel C. Briand. Generating test data from OCL with search techniques. IEEE Transactions on Software Engineering, 39(10):1376–1402, 2013.
 [3] Shaukat Ali, Muhammad Zohaib Z. Iqbal, Maham Khalid, and Andrea Arcuri. Improving the performance of OCL constraint solving with novel heuristics for logical operations: A search-based approach. Empirical Software Engineering, 21(6):2459–2502, 2016.
 [4] Paul Ammann and Jeff Offutt. Introduction to Software Testing. Cambridge University Press, 2nd edition, 2016.
 [5] Kyriakos Anastasakis. UML2Alloy (Version 0.52), 2009. http://www.cs.bham.ac.uk/~bxb/UML2Alloy/, last accessed: January 2019.
 [6] Kyriakos Anastasakis, Behzad Bordbar, Geri Georg, and Indrakshi Ray. UML2Alloy: A challenging model transformation. In Proceedings of the 10th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems (MoDELS’07), pages 436–450, 2007.
 [7] Clark W. Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. In Proceedings of the 23rd International Conference on Computer Aided Verification (CAV’11), pages 171–177, 2011.
 [8] Clark W. Barrett, Pascal Fontaine, and Cesare Tinelli. The SMT-LIB standard: Version 2.6. Technical report, The University of Iowa, 2017. http://smtlib.cs.uiowa.edu/language.shtml, last accessed: January 2019.
 [9] Clark W. Barrett and Cesare Tinelli. Satisfiability Modulo Theories. In Handbook of Model Checking, pages 305–343. Springer, 2018.
 [10] Daniel Le Berre and Anne Parrain. The SAT4J library, release 2.2. Journal on Satisfiability, Boolean Modeling and Computation, 7(2–3):59–64, 2010.
 [11] Chandrasekhar Boyapati, Sarfraz Khurshid, and Darko Marinov. Korat: automated testing based on Java predicates. In Proceedings of the 11th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA’02), pages 123–133, 2002.
 [12] Pietro Braione, Giovanni Denaro, Andrea Mattavelli, and Mauro Pezzè. Combining symbolic execution and search-based testing for programs with complex heap inputs. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA’17), pages 90–101, 2017.
 [13] Pietro Braione, Giovanni Denaro, Andrea Mattavelli, and Mauro Pezzè. SUSHI: a test generator for programs with complex structured inputs. In Proceedings of the 40th IEEE/ACM International Conference on Software Engineering: Companion Proceedings (ICSE’18), pages 21–24, 2018.
 [14] Lilian Burdy, Yoonsik Cheon, David R. Cok, Michael D. Ernst, Joseph R. Kiniry, Gary T. Leavens, K. Rustan M. Leino, and Erik Poll. An overview of JML tools and applications. International Journal on Software Tools for Technology Transfer, 7(3):212–232, 2005.
 [15] Jordi Cabot, Robert Clarisó, and Daniel Riera. UMLtoCSP: A tool for the formal verification of UML/OCL models using constraint programming. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE’07), pages 547–548, 2007.
 [16] Jordi Cabot, Robert Clarisó, and Daniel Riera. Verifying UML/OCL operation contracts. In Proceedings of the 7th International Conference on Integrated Formal Methods (IFM’09), pages 40–55, 2009.
 [17] Jérôme Cantenot, Fabrice Ambert, and Fabrice Bouquet. Test generation with Satisfiability Modulo Theories solvers in model-based testing. Software Testing, Verification and Reliability, 24(7):499–531, 2014.
 [18] Felix Chang, Emina Torlak, Julie Pelaez, and Daniel Jackson. Alloy Analyzer (Version 5), 2018. http://alloytools.org/, last accessed: January 2019.
 [19] Robert Clarisó, Carlos A. González, and Jordi Cabot. UMLtoCSP, 2009. http://gres.uoc.edu/UMLtoCSP/, last accessed: January 2019.
 [20] Robert Clarisó, Carlos A. González, and Jordi Cabot. Smart bound selection for the verification of UML/OCL class diagrams. IEEE Transactions on Software Engineering, in press.
 [21] Coninfer Ltd. The ECLiPSe constraint programming system (version 5.0), 2006. http://eclipseclp.org/, last accessed: January 2019.
 [22] Carolina Dania and Manuel Clavel. OCL2MSFOL: A mapping to many-sorted first-order logic for efficiently checking the satisfiability of OCL constraints. In Proceedings of the 19th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems (MoDELS’16), pages 65–75, 2016.
 [23] Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08), pages 337–340. Springer, 2008.
 [24] Leonardo Mendonça de Moura and Nikolaj Bjørner. Satisfiability Modulo Theories: Introduction and applications. Communications of the ACM, 54(9):69–77, 2011.
 [25] Leonardo Mendonça de Moura and Nikolaj Bjørner (Microsoft Research). Z3 Solver, 2012. https://github.com/Z3Prover/z3, last accessed: January 2019.
 [26] Rina Dechter and David Cohen. Constraint processing. Morgan Kaufmann, 2003.
 [27] Daniel Di Nardo, Fabrizio Pastore, and Lionel C. Briand. Augmenting field data for testing systems subject to incremental requirements changes. ACM Transactions on Software Engineering and Methodology, 26(1):1–40, 2017.
 [28] Eclipse Foundation. Eclipse OCL (version 6.4.0), 2018. http://www.eclipse.org/ocl, last accessed: January 2019.
 [29] Niklas Eén and Niklas Sörensson. An extensible SAT-solver. In Proceedings of the 6th International Conference on Theory and Applications of Satisfiability Testing (SAT’03), pages 502–518, 2003.
 [30] Carlos A. González and Jordi Cabot. Formal verification of static software models in MDE: A systematic review. Information & Software Technology, 56(8):821–838, 2014.
 [31] Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search-based software engineering: Trends, techniques and applications. ACM Computing Surveys, 45(1):11:1–11:61, 2012.
 [32] Patrick Hurley. A concise introduction to logic. Nelson Education, 2014.
 [33] Muhammad Zohaib Z. Iqbal, Andrea Arcuri, and Lionel C. Briand. Combining search-based and adaptive random testing strategies for environment model-based testing of real-time embedded systems. In Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE’12), pages 136–151, 2012.
 [34] Muhammad Zohaib Z. Iqbal, Andrea Arcuri, and Lionel C. Briand. Environment modeling and simulation for automated testing of soft real-time embedded software. Software & System Modeling, 14(1):483–524, 2015.
 [35] Stefan J. Galler and Bernhard K. Aichernig. Survey on test data generation tools. International Journal on Software Tools for Technology Transfer, 16:727–751, 2014.
 [36] Daniel Jackson. Software Abstractions: Logic, language, and analysis. Massachusetts Institute of Technology Press, 2012.
 [37] Guy Katz, Clark W. Barrett, Cesare Tinelli, Andrew Reynolds, and Liana Hadarean. Lazy proofs for DPLL(T)-based SMT solvers. In Proceedings of the 2016 International Conference on Formal Methods in Computer-Aided Design (FMCAD’16), pages 93–100, 2016.
 [38] Bogdan Korel. Automated software test data generation. IEEE Transactions on Software Engineering, 16(8):870–879, 1990.
 [39] Mirco Kuhlmann and Martin Gogolla. Strengthening SAT-based validation of UML/OCL models by representing collections as relations. In Proceedings of the 8th European Conference on Modelling Foundations and Applications (ECMFA’12), pages 32–48, 2012.
 [40] BaoLin Li, ZhiShu Li, Li Qing, and YanHong Chen. Test case automate generation from UML sequence diagram and OCL expression. In Proceedings of the 2007 International conference on Computational Intelligence and Security (CIS’07), pages 1048–1052, 2007.
 [41] Leonid Libkin. Elements Of Finite Model Theory (Texts in Theoretical Computer Science. An EATCS Series). Springer Verlag, 2004.
 [42] Phil McMinn. Search-based software test data generation: A survey. Software Testing, Verification and Reliability, 14(2):105–156, 2004.
 [43] Glenford J. Myers, Corey Sandler, and Tom Badgett. The Art of Software Testing. Wiley Publishing, 3rd edition, 2011.
 [44] Object Management Group. Object Constraint Language 2.4 Specification, 2004. http://www.omg.org/spec/OCL/2.4/, last accessed: January 2019.
 [45] Object Management Group. OMG Unified Modeling Language (UML), 2015. http://www.omg.org/spec/UML/2.5, last accessed: January 2019.
 [46] Xavier Oriol and Ernest Teniente. Ocl: Expressive UML/OCL conceptual schemas for finite reasoning. In Proceedings of the 36th International Conference on Conceptual Modeling (ER’17), pages 354–369, 2017.
 [47] Xavier Oriol and Ernest Teniente. Simplification of UML/OCL schemas for efficient reasoning. Journal of Systems and Software, 128:130–149, 2017.
 [48] Nils Przigoda, Mathias Soeken, Robert Wille, and Rolf Drechsler. Verifying the structure and behavior in UML/OCL models using satisfiability solvers. IET Cyber-Physical Systems: Theory & Applications, 1(1):49–59, 2016.
 [49] Nils Przigoda, Robert Wille, and Rolf Drechsler. Ground setting properties for an efficient translation of OCL in SMT-based model finding. In Proceedings of the 19th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems (MoDELS’16), pages 261–271, 2016.
 [50] Anna Queralt, Alessandro Artale, Diego Calvanese, and Ernest Teniente. OCL-Lite: Finite reasoning on UML/OCL conceptual schemas. Data & Knowledge Engineering, 73:1–22, 2012.
 [51] Alan Robinson and Andrei Voronkov. Handbook of Automated Reasoning. Elsevier, 2001.
 [52] Nicolás Rosner, Juan P. Galeotti, Carlos López Pombo, and Marcelo F. Frias. ParAlloy: Towards a framework for efficient parallel analysis of Alloy models. In Proceedings of the 2nd International Conference on Abstract State Machines, Alloy, B and Z (ABZ’10), pages 396–397, 2010.
 [53] Nicolás Rosner, Junaid Haroon Siddiqui, Nazareno Aguirre, Sarfraz Khurshid, and Marcelo F. Frias. Ranger: Parallel analysis of Alloy models by range partitioning. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE’13), pages 147–157, 2013.
 [54] Per Runeson and Claes Wohlin. Statistical usage testing for software reliability control. Informatica, 19(2):195–207, 1995.
 [55] Julia Seiter, Robert Wille, Mathias Soeken, and Rolf Drechsler. Determining relevant model elements for the verification of UML/OCL specifications. In Design, Automation and Test in Europe (DATE’13), pages 1189–1192, 2013.
 [56] Oszkár Semeráth, András Szabolcs Nagy, and Dániel Varró. A graph solver for the automated generation of consistent domain-specific models. In Proceedings of the 40th IEEE/ACM International Conference on Software Engineering (ICSE’18), pages 969–980, 2018.
 [57] Asadullah Shaikh, Robert Clarisó, Uffe Kock Wiil, and Nasrullah Memon. Verification-driven slicing of UML/OCL models. In Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering (ASE’10), pages 185–194, 2010.
 [58] Mathias Soeken, Robert Wille, and Rolf Drechsler. Encoding OCL data types for SAT-based verification of UML/OCL models. In Proceedings of the 5th International Conference on Tests and Proofs (TAP’11), pages 152–170, 2011.
 [59] Ghanem Soltana, Mehrdad Sabetzadeh, and Lionel C. Briand. Search-based synthetic data generator, 2017. http://people.svv.lu/tools/SDG/, last accessed: January 2019.
 [60] Ghanem Soltana, Mehrdad Sabetzadeh, and Lionel C. Briand. Synthetic data generation for statistical testing. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE’17), pages 872–882, 2017.
 [61] Ting Su, Zhoulai Fu, Geguang Pu, Jifeng He, and Zhendong Su. Combining symbolic execution and model checking for data flow testing. In Proceedings of the 37th IEEE/ACM International Conference on Software Engineering (ICSE’15), pages 654–665, 2015.
 [62] Mark Utting and Bruno Legeard. Practical Model-Based Testing: A Tools Approach. Elsevier, 2010.
 [63] Chunhui Wang, Fabrizio Pastore, Arda Goknil, Lionel C. Briand, and Muhammad Zohaib Z. Iqbal. Automatic generation of system test cases from use case specifications. In Proceedings of the 24th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA’15), pages 385–396, 2015.
 [64] Jos B. Warmer and Anneke G. Kleppe. The Object Constraint Language: Getting your models ready for MDA. Addison-Wesley Professional, 2003.
 [65] W. Eric Wong. Mutation Testing for the New Century. Kluwer Academic Publishers, Norwell, MA, USA, 2001.
 [66] Pingyu Zhang, Sebastian G. Elbaum, and Matthew B. Dwyer. Automatic generation of load tests. In Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE’11), pages 43–52, 2011.
Appendix A Rules for Delegating Solving Tasks to Search and SMT
Table 3 provides our rule set for labeling the AST nodes of OCL expressions. The rules cover all the AST node types in the OCL 2.4 metamodel (https://www.omg.org/spec/OCL/2.4/). Fig. 14 depicts a simplified version of this metamodel, with only the inheritance (and not the composition) relationships shown. In the figure, abstract AST node types are shaded grey.
Each row in Table 3 defines one labeling rule. The first column of the table indicates the node type (from the metamodel of Fig. 14). When the different subtypes of the node type in the first column require different treatments (i.e., need to be assigned different labels), the second column enumerates the subtypes separately. The fourth column shows the label to attach to a node type (or subtype) when the conditions specified in the third column hold.
| Node type | Subtypes | Conditions | Label |
| --- | --- | --- | --- |
| PrimitiveLiteralExp’s subtypes, EnumLiteralExp | | | |
| TupleLiteralExp, InvalidLiteralExp, CollectionLiteralExp, TypeExp, LoopExp’s subtypes | | | search |
| NavigationCallExp’s subtypes | | | search |
| VariableExp | | If the variable is non-primitive | search |
| | | If the variable is primitive, but is contained (directly or indirectly) within the body of an exists, select, reject, any, isUnique, or one | both |
| | | If neither of the above two conditions holds | SMT |
| | | If the defined variable is primitive | SMT |
| OperationCallExp | and, or, not, <, >, <=, >=, +, -, *, / | If all children are unlabeled | |
| | | If all labeled children have search as label | search |
| | | If all labeled children have SMT as label | SMT |
| | | If none of the above three conditions holds | both |
| | =, <> | If the operation compares collections | search |
| | | If the operation compares numbers and all children are unlabeled | |
| | | If the operation compares numbers and all labeled children have search as label | search |
| | | If the operation compares numbers and all labeled children have SMT as label | SMT |
| | | If none of the above four conditions holds | both |
| | size, max, min | If the operation is called over a collection | search |
| | | If the operation is called over a primitive type, and at the same time, the operation is (directly or indirectly) contained within the body of an exists, select, reject, any, isUnique, or one | both |
| | | If neither of the above two conditions holds | SMT |
| | concat, substring, toInteger, toReal, round, ceil, abs, floor, div, mod | If the operation is (directly or indirectly) contained within the body of an exists, select, reject, any, isUnique, or one | both |
| | | Otherwise | SMT |
| | toLower, toUpper, sum, union, allInstances, oclIsInvalid, oclIsKindOf, oclIsTypeOf, oclIsUndefined, oclAsType, any, asBag, collect, asOrderedSet, asSequence, asSet, collectNested, count, excludes, excludesAll, excluding, flatten, includes, includesAll, including, isEmpty, isUnique, notEmpty, one, product, reject, select, sortedBy, append, at, first, last, indexOf, insertAt, prepend, subsequence, intersection, subOrderedSet, symmetricDifference | | search |
| | user-defined | If the operation is non-recursive with a primitive return type | SMT |
| | | Otherwise | search |
| IfExp | | If the body of the if-then-else has a non-primitive return type | search |
| | | If the body of the if-then-else has a primitive return type, but is (directly or indirectly) contained within an exists, select, reject, any, isUnique, or one | both |
| | | If neither of the above two conditions holds | SMT |
| AttributeCallExp | | If the node represents an access to a non-primitive attribute | search |
| | | If the node represents an access to a primitive attribute that is (directly or indirectly) contained within the body of an exists, select, reject, any, isUnique, or one | both |
| | | If neither of the above two conditions holds | SMT |
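The labeling rules above can be read as a bottom-up pass over the OCL AST. The following sketch illustrates this for a small subset of the rules (VariableExp and the arithmetic/logical OperationCallExp rules); the Node class, the tuple-free encoding, and the treatment of quantifier-like iterators as operation names are simplifying assumptions of this sketch, not the tool's actual implementation.

```python
# Illustrative sketch of the bottom-up labeling pass of Table 3.
# Only a subset of the rules is shown; in the real rule set, exists/select/...
# are LoopExp subtypes rather than operation names.
from dataclasses import dataclass, field
from typing import List, Optional

QUANTIFIER_LIKE = {"exists", "select", "reject", "any", "isUnique", "one"}

@dataclass
class Node:
    kind: str                                   # e.g. "VariableExp"
    op: str = ""                                # operation name, if any
    primitive: bool = True                      # is the node's type primitive?
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None                 # "search", "SMT", "both", or None

def label_node(node: Node, inside_quantifier: bool = False) -> Optional[str]:
    # Children of a quantifier-like iterator are "contained within its body".
    inside = inside_quantifier or node.op in QUANTIFIER_LIKE
    for child in node.children:
        label_node(child, inside)
    if node.kind == "VariableExp":
        if not node.primitive:
            node.label = "search"
        elif inside_quantifier:
            node.label = "both"
        else:
            node.label = "SMT"
    elif node.kind == "OperationCallExp" and node.op in {
            "and", "or", "not", "<", ">", "<=", ">=", "+", "-", "*", "/"}:
        labels = {c.label for c in node.children if c.label is not None}
        if not labels:
            node.label = None                   # all children unlabeled
        elif labels == {"search"}:
            node.label = "search"
        elif labels == {"SMT"}:
            node.label = "SMT"
        else:
            node.label = "both"                 # mixed labels
    return node.label
```

For example, a comparison over a primitive variable outside any iterator body (such as `self.age > 3`) ends up labeled SMT, whereas the same comparison inside an `exists` body is labeled both.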
Appendix B Expanding OCL’s Built-in Quantification Shortcuts during the Construction of an SMT-LIB Formula
Table 4 lists all the OCL constructs that involve implicit quantification and thus need to be expanded over a given instance model when an SMT-LIB formula is being built. The first column of the table presents the OCL construct to be expanded. The second column describes how the expansion is performed when the construct is applied to non-empty collections. The third column provides the literal that replaces the construct (1) when the construct is called over an empty collection, or (2) in cases where the construct compares two collections and at least one of the collections is empty.
| OCL construct | Description of the expansion process | Expansion involving empty collections |
| --- | --- | --- |
| excludes | Expanding c->excludes(x), where c is a collection and x is a given element, is equivalent to expanding: (not c->exists(i \| i = x)). | true |
| excludesAll | Expanding c1->excludesAll(c2), where c1 and c2 are two collections, is equivalent to expanding: (c2->forAll(i \| not c1->exists(j \| i = j))). | true |
| includes | Expanding c->includes(x), where c is a collection and x is a given element, is equivalent to expanding: (c->exists(i \| i = x)). | false |
| includesAll | Expanding c1->includesAll(c2), where c1 and c2 are two collections, is equivalent to expanding: (c2->forAll(j \| c1->exists(i \| i = j))). | false |
| isUnique | Expanding c->isUnique(condition over c), where c is a collection, is equivalent to expanding: (c->forAll(i \| c->forAll(j \| i = j or condition over i <> condition over j))). | true |
| one | Expanding c->one(condition over c), where c is a collection, is equivalent to expanding: (c->exists(i \| condition over i and c->forAll(j \| i = j or not condition over j))). | false |
| = (for sequences) | Expanding s1 = s2, where s1 and s2 are two sequences, is equivalent to expanding: (s1->size() = s2->size() and s1->asSet()->asSequence()->forAll(i \| s2->asSet()->asSequence()->at(s1->asSet()->asSequence()->indexOf(i)) = i)). | If both collections are empty then true, otherwise false |
| <> (for sequences) | Expanding s1 <> s2, where s1 and s2 are two sequences, is equivalent to expanding: (s1->size() <> s2->size() or s1->asSet()->asSequence()->exists(i \| s2->asSet()->asSequence()->at(s1->asSet()->asSequence()->indexOf(i)) <> i)). | If both collections are empty then false, otherwise true |
| = (for bags) | Expanding b1 = b2, where b1 and b2 are two bags, is equivalent to expanding: (b1->forAll(i \| b2->exists(j \| i = j and b1->count(i) = b2->count(j)))). | If both collections are empty then true, otherwise false |
| <> (for bags) | Expanding b1 <> b2, where b1 and b2 are two bags, is equivalent to expanding: (b1->exists(i \| b2->forAll(j \| i <> j or b1->count(i) <> b2->count(j)))). | If both collections are empty then false, otherwise true |
| = (for sets) | Expanding s1 = s2, where s1 and s2 are two sets, is equivalent to expanding: (s1->forAll(i \| s2->exists(j \| i = j)) and s2->forAll(i \| s1->exists(j \| i = j))). | If both collections are empty then true, otherwise false |
| <> (for sets) | Expanding s1 <> s2, where s1 and s2 are two sets, is equivalent to expanding: (s1->exists(i \| s2->forAll(j \| i <> j)) or s2->exists(i \| s1->forAll(j \| i <> j))). | If both collections are empty then false, otherwise true |
| = (for ordered sets) | Expanding os1 = os2, where os1 and os2 are two ordered sets, is equivalent to expanding: (os1->size() = os2->size() and os1->forAll(i \| os2->exists(j \| i = j))). | If both collections are empty then true, otherwise false |
| <> (for ordered sets) | Expanding os1 <> os2, where os1 and os2 are two ordered sets, is equivalent to expanding: (os1->size() <> os2->size() or os1->exists(i \| os2->forAll(j \| i <> j))). | If both collections are empty then false, otherwise true |
| = (between an ordered set and a set) | Expanding os = s, where os is an ordered set and s is a set, is equivalent to expanding: (os->asSet()->forAll(i \| s->exists(j \| i = j)) and s->forAll(i \| os->asSet()->exists(j \| i = j))). | If both collections are empty then true, otherwise false |
| <> (between an ordered set and a set) | Expanding os <> s, where os is an ordered set and s is a set, is equivalent to expanding: (os->asSet()->exists(i \| s->forAll(j \| i <> j)) or s->exists(i \| os->asSet()->forAll(j \| i <> j))). | If both collections are empty then false, otherwise true |
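Because the instance model is finite and known at expansion time, each implicit quantifier unrolls into a finite disjunction or conjunction over the collection's elements. The sketch below illustrates this for excludes and includes; the string-based representation and the helper names are assumptions of this sketch, not the tool's API.

```python
# Illustrative unrolling of OCL's implicit quantification (Table 4) over a
# known, finite collection. Expressions are represented as plain strings.

def expand_exists(elements, condition):
    """c->exists(i | condition(i)) over a finite collection becomes a
    disjunction; over an empty collection it is the literal false."""
    if not elements:
        return "false"
    return "(" + " or ".join(condition(e) for e in elements) + ")"

def expand_includes(elements, x):
    """c->includes(x) is c->exists(i | i = x); false for an empty c."""
    return expand_exists(elements, lambda e: f"{e} = {x}")

def expand_excludes(elements, x):
    """c->excludes(x) is (not c->exists(i | i = x)); true for an empty c."""
    if not elements:
        return "true"
    return "(not " + expand_includes(elements, x) + ")"
```

For instance, expanding includes over a two-element collection of taxpayers t1 and t2 against an element self yields the disjunction (t1 = self or t2 = self), while the same expansion over an empty collection yields the literal false, as prescribed by the table's third column.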
Appendix C Rules for Translating OCL to SMT-LIB
Table 5 provides the rules that we use for translating NNF OCL expressions (after the execution of the expansion and substitution processes) to SMT-LIB. The first column of the table indicates the type of the traversed node (from the metamodel of Fig. 14). The second column shows the general shape of the OCL expression represented by the node in the first column. The third column shows the SMT-LIB fragment resulting from the translation.
| Node type possible within a processed NNF expression | Shape of the OCL expression | Corresponding SMT-LIB formula |
| --- | --- | --- |
| PrimitiveLiteralExp’s subtypes, EnumLiteralExp | A constant literal X | X |
| IfExp | if (condition) then body1 else body2 endif | (ite toSMTLIB(condition) toSMTLIB(body1) toSMTLIB(body2)) |
| VariableExp | A variable named X | The name of the SMT-LIB variable that corresponds to X |
| AttributeCallExp | An access to an attribute named X | The name of the SMT-LIB variable that corresponds to X |
| OperationCallExp | operand1 op operand2; binary infix operations, where op can be: and, or, <, >, <=, >=, +, -, *, /, =, or <> | (op toSMTLIB(operand1) toSMTLIB(operand2)) |
| | not (expression) | (not toSMTLIB(expression)) |
| | X.op(); calls to OCL built-in operations that admit no parameters. Specifically, op can be: size, toInteger, toReal, round, ceil, abs, or floor | (op toSMTLIB(X)). Except for the abs operation, which has a corresponding built-in function in SMT-LIB with the same name, the remaining operations need to be explicitly defined in SMT-LIB. |
| | X.op(Y); calls to OCL built-in operations that admit one parameter. Specifically, op can be: max, min, div, mod, or concat | (op toSMTLIB(X) toSMTLIB(Y)). Except for div and mod, which have corresponding built-in functions in SMT-LIB (with the same names), the remaining operations need to be explicitly defined in SMT-LIB. |
| | X.substring(Y, Z) (this is the only OCL built-in operation on primitive variables that has two or more parameters) | (str.substr toSMTLIB(X) toSMTLIB(Y) toSMTLIB(Z)) |
| | A call to a user-defined operation O with parameters | (O toSMTLIB(parameter1) ... toSMTLIB(parameterN)) |
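The translation rules of Table 5 amount to a simple recursive descent over the NNF expression tree. The following sketch covers only literals, variables, binary infix operations, not, and if-then-else; the tuple-based node encoding is an assumption made for illustration, not the tool's actual data structures.

```python
# Minimal sketch of the recursive OCL-to-SMT-LIB translation of Table 5.
# Nodes are encoded as tuples: ("if", cond, then, else), ("not", expr), or
# (op, lhs, rhs) for binary infix operations; anything else is a literal or
# an (already-resolved) SMT-LIB variable name.

def to_smtlib(node):
    if not isinstance(node, tuple):                # literal or variable name
        return str(node)
    head = node[0]
    if head == "if":                               # if c then b1 else b2 endif
        _, cond, then_b, else_b = node
        return f"(ite {to_smtlib(cond)} {to_smtlib(then_b)} {to_smtlib(else_b)})"
    if head == "not":
        return f"(not {to_smtlib(node[1])})"
    op, lhs, rhs = node                            # binary infix operation
    smt_op = {"<>": "distinct"}.get(op, op)        # OCL <> maps to distinct
    return f"({smt_op} {to_smtlib(lhs)} {to_smtlib(rhs)})"
```

For example, the OCL expression x > 3 and y <> z would be encoded as ("and", (">", "x", 3), ("<>", "y", "z")) and translated to (and (> x 3) (distinct y z)).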