# Quantifier Elimination for Database Driven Verification

Running verification tasks in database-driven systems requires solving quantifier elimination problems (not involving arithmetic) of a new kind. In this paper, we supply quantifier elimination algorithms based on Knuth-Bendix completion and begin studying the complexity of these problems, arguing that they are much better behaved than their arithmetic counterparts. This observation is confirmed by analyzing the preliminary results obtained using the MCMT tool on the verification of data-aware process benchmarks, which can be found in the latest version of the tool distribution. The content of this manuscript is very preliminary; its role is simply to expand the documentation available from the MCMT v. 2.8 distribution.



## 1 Introduction

During the last two decades, the problem of studying the integrated management of business processes and master data has received great attention in academia and industry [24, 16, 23]. At its core, the problem requires changing the entrenched control-flow perspective adopted within the business process community into a more holistic approach that considers how data are manipulated and evolved by the process, and how the flow of activities is affected by the presence of data and by the evaluation of data-driven decisions.

In the light of this development, two main lines of research emerged: one on the development of integrated models for processes and data [22], and the other on their static analysis and verification [8]. Many concrete languages for data-aware processes (as well as software platforms for their modeling and enactment) spawned from the first line of research. The main unifying theme of such approaches is a shift from standard activity-centric models to data-centric ones, where the focus is put on the key business entities of the organization, integrating their structural and behavioral (lifecycle) aspects. This resulted in the creation of various languages and frameworks [20, 19] for modeling and execution, such as IBM's declarative rule-based Guard-Stage-Milestone (GSM) notation [13], OMG's modeling standard CMMN (Case Management Model and Notation), and the object-aware PHILharmonic Flows [20].

In turn, the second line of research produced a series of results on the boundaries of decidability and complexity for the static analysis of data-aware processes [26, 8]. It is worth noting that the formal models adopted along this line of research can be divided into two main classes. The first one considers very general data-aware processes that evolve a full-fledged relational database with integrity constraints by means of atomic create-read-update-delete operations that may introduce new values, taken from an infinite data domain [6, 5, 1, 9]. Here, verification tasks take an initial database instance as input and prove desired properties by constructing an infinite-state transition system (whose states are labeled with database instances) that considers all possible process evolutions. Conversely, the second class adopts artifact-centric processes [14, 12] whose underlying formal model is based on: (i) a read-only relational database that stores fixed, background information; (ii) a working memory that stores the evolving state of artifacts; and (iii) actions that update the working memory. Different variants of this model have been considered towards decidability of verification, by carefully tuning the relative expressive power of these three components. The most interesting settings consider pure relational structures with a single-tuple working memory [7], and artifact systems operating over a read-only database equipped with constraints and tracking the co-evolution of multiple, unboundedly many artifacts [15]. Even though in these works the working memory can be updated only with values from the read-only database (i.e., no fresh values can be injected), verification is extremely challenging, as it is studied parametrically in the read-only database itself, thus requiring to check infinitely many finite transition systems. This is done to assess whether the system behaves well irrespective of the read-only data it operates on.

In [10], we propose a generalized model for artifact-centric systems and focus on the (parameterized) safety problem, which amounts to determining whether there exists an instance of the read-only database that allows the system to evolve from its initial configuration to an undesired configuration falsifying a given state property. We study this problem by establishing for the first time a bridge between verification of artifact-centric systems and model checking based on Satisfiability-Modulo-Theories (SMT). Specifically, our approach is grounded in array-based systems – a declarative formalism originally introduced in [17, 18] to handle the verification of distributed systems (parameterized on the number of interacting processes), and afterwards successfully employed also to attack the static analysis of other types of systems [3, 2]. The overall state of the system is typically described by means of arrays indexed by process identifiers, and used to store the content of process variables like locations and clocks. These arrays are genuine second order variables. In addition, quantified formulae are used to represent sets of system states. These formulae together with second order function variables form the core of the model checking methodologies developed in [17, 18] and following papers. The declarative formalism of array-based systems is exploited as the model-theoretic framework of the tool mcmt. This tool manages the verification of infinite-state systems by implementing a symbolic version of the backward reachability algorithm.

In the work [10] we encode artifact systems into array-based systems by providing a "functional view" of relational theories endowed with primary and foreign key dependencies, where the read-only database and the artifact relations forming the working memory are represented with sorted unary function symbols. The resulting framework, however, requires novel and non-trivial extensions of the array-based technology to make it operational. In fact, quantifiers are handled in array-based systems both by instantiation and by elimination. While the former can be transposed to the new framework leveraging the Herbrand Theorem, the latter becomes problematic for the following reason: quantified data variables do not range over simple data types (e.g., integers, reals, or enumerated sets) as in standard array-based systems, but instead refer to the content of a full-fledged (read-only) relational database. To overcome this problem, we employ classic model-theoretic machinery, namely model completions [25], by which we prove that the runs of the systems we are interested in can be lifted w.l.o.g. to richer contexts (so-called random-like structures), where quantifier elimination is indeed available, even though it was not available in the original setting. This allows us to recast the original safety problem into an equivalent safety problem in the richer setting where quantifier elimination is available. Specifically, quantifier elimination makes it possible to symbolically represent sets of reachable states without using quantifiers over data taken from the read-only database.

The described quantifier elimination is the central topic of this paper. In order to eliminate quantifiers over the data variables, we need algorithms that perform this task correctly and efficiently. Specifically, we aim at developing formal procedures that eliminate quantifiers in the model completions of the theories of sorted unary functions mentioned before. To realize these procedures, we employ techniques based on Knuth-Bendix completion that not only show the correctness of the proposed approach, but also guarantee its computational efficiency. These procedures have already been partially implemented in mcmt version 2.8.

## 2 Preliminaries

We adopt the usual first-order syntactic notions of signature, term, atom, (ground) formula, and so on; our signatures are multi-sorted and include equality for every sort. This implies that variables are sorted as well. For simplicity, most basic definitions in this section will be supplied for single-sorted languages only (the adaptation to multi-sorted languages is straightforward). We compactly represent a tuple of variables x₁, …, xₙ as x̲. The notation t(x̲) (resp., φ(x̲)) means that the term t (resp., the formula φ) has free variables included in the tuple x̲.

We assume that function arities can be deduced from the context. Whenever we build terms and formulae, we always assume that they are well-typed, in the sense that the sorts of variables, constants, and function sources/targets match. A formula is said to be universal (resp., existential) if it has the form ∀x̲ φ(x̲) (resp., ∃x̲ φ(x̲)), where φ is a quantifier-free formula. Formulae with no free variables are called sentences.

From the semantic side, we use the standard notion of a Σ-structure M and of truth of a formula in a Σ-structure under a free variables assignment.

A Σ-theory T is a set of Σ-sentences; a model of T is a Σ-structure M where all sentences of T are true. We use the standard notation T ⊨ φ to say that φ is true in all models of T for every assignment to the variables occurring free in φ. We say that φ is T-satisfiable iff there is a model M of T and an assignment to the variables occurring free in φ making φ true in M.

We now give the definitions of constraint satisfiability problem and quantifier elimination for a theory T.

A Σ-formula φ is a T-constraint (or just a constraint) iff it is a conjunction of literals. The constraint satisfiability problem for T is the following: we are given an existential formula ∃y̲ φ(x̲, y̲) (for the purposes of this definition, we may equivalently take φ to be quantifier-free) and we are asking whether there exist a model M of T and an assignment a to the free variables x̲ such that M, a ⊨ ∃y̲ φ(x̲, y̲).

A theory T has quantifier elimination iff for every formula φ(x̲) in the signature of T there is a quantifier-free formula φ′(x̲) such that T ⊨ φ(x̲) ↔ φ′(x̲). It is well-known (and easily seen) that quantifier elimination holds in case we can eliminate quantifiers from primitive formulae, i.e., formulae of the kind ∃y φ(x̲, y), where φ is a conjunction of literals (i.e., of atomic formulae and their negations). Since we are interested in effective computability, we assume that when we talk about quantifier elimination, an effective procedure for eliminating quantifiers is given.
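As a tiny illustration (ours, not from the original development): when a primitive formula happens to contain an equation defining the quantified variable, the quantifier can be eliminated by substitution.

```latex
% If the primitive formula contains a defining equation x = f(y)
% (with x not occurring in f(y)), eliminating x is a substitution:
\exists x\, \bigl( x = f(y) \wedge g(x) = z \bigr)
\;\leftrightarrow\;
g(f(y)) = z
```

The interesting case is when no such defining equation is available; this is where completion-based procedures come into play in Section 5.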

We recall also some basic definitions and notions from logic and model theory. We focus on the definitions of diagram, embedding, substructure and amalgamation.

### 2.1 Substructures and embeddings

Let Σ be a first-order signature. The signature obtained from Σ by adding to it a set a̲ of new constants (i.e., 0-ary function symbols) is denoted by Σ^a̲. Analogously, given a Σ-structure M, the signature Σ can be expanded to a new signature Σ^|M| by adding a set of new constants ā (the name for a), one for each element a in the support |M| of M, with the convention that two distinct elements are denoted by different "name" constants. M can be expanded to a Σ^|M|-structure just by interpreting the additional constants over the corresponding elements. From now on, when the meaning is clear from the context, we will freely use the two notations interchangeably: in particular, given a Σ-structure M and a Σ-formula φ(x̲) with free variables that are all in x̲, we will write, by abuse of notation, M ⊨ φ(a̲) instead of M ⊨ φ(ā).

A Σ-homomorphism (or, simply, a homomorphism) between two Σ-structures M and N is any mapping μ : |M| → |N| among the support sets |M| of M and |N| of N satisfying the condition

 M ⊨ φ ⇒ N ⊨ φ (1)

for all Σ^|M|-atoms φ (here M is regarded as a Σ^|M|-structure, by interpreting each additional constant ā into a itself, and N is regarded as a Σ^|M|-structure by interpreting each additional constant ā into μ(a)). In case condition (1) holds for all Σ^|M|-literals, the homomorphism μ is said to be an embedding, and if it holds for all first-order formulae, the embedding is said to be elementary. Notice the following facts:

(a) since we have equality in the signature, an embedding is an injective function;

(b) an embedding μ must be an algebraic homomorphism, that is, for every n-ary function symbol f and for every a₁, …, aₙ in |M|, we must have f^N(μ(a₁), …, μ(aₙ)) = μ(f^M(a₁, …, aₙ));

(c) for an n-ary predicate symbol P we must have (a₁, …, aₙ) ∈ P^M iff (μ(a₁), …, μ(aₙ)) ∈ P^N.

It is easily seen that an embedding can be equivalently defined as a map satisfying the conditions (a)-(b)-(c) above. If μ : M → N is an embedding which is just the identity inclusion |M| ⊆ |N|, we say that M is a substructure of N or that N is an extension of M. A Σ-structure M is said to be generated by a set X included in its support |M| iff there are no proper substructures of M including X.

The notion of substructure can be equivalently defined as follows: given a Σ-structure N and a Σ-structure M such that |M| ⊆ |N|, we say that M is a Σ-substructure of N if:

• for every function symbol f in Σ, the interpretation f^M of f in M is the restriction of the interpretation f^N of f in N to |M| (i.e., f^M(a) = f^N(a) for every a in |M|); this fact implies that a substructure must be a subset of |N| which is closed under the application of f^N;

• for every relation symbol P in Σ and every tuple (a₁, …, aₙ) ∈ |M|ⁿ, we have (a₁, …, aₙ) ∈ P^M iff (a₁, …, aₙ) ∈ P^N, which means that the relation P^M is the restriction of P^N to the support of M.

We recall that a substructure preserves and reflects validity of ground formulae, in the following sense: given a Σ-substructure M of a Σ-structure N, a ground Σ^|M|-sentence θ is true in M iff θ is true in N.

### 2.2 Robinson Diagrams and Amalgamation

Let M be a Σ-structure. The diagram of M, denoted by Δ_Σ(M), is defined as the set of ground Σ^|M|-literals (i.e., atomic formulae and negations of atomic formulae) that are true in M. For the sake of simplicity, once again by abuse of notation, we will freely say that Δ_Σ(M) is the set of Σ^|M|-literals which are true in M.

An easy but nevertheless important basic result, called the Robinson Diagram Lemma [11], says that, given any Σ-structure N, the embeddings μ : M → N are in bijective correspondence with the expansions of N to Σ^|M|-structures which are models of Δ_Σ(M). The expansions and the embeddings are related in the obvious way: ā is interpreted as μ(a).

Amalgamation is a classical algebraic concept, whose formal definition we now recall.

###### Definition 1 (Amalgamation)

A theory T has the amalgamation property if for every pair of embeddings μ₁ : M₀ → M₁, μ₂ : M₀ → M₂ among models of T, there exists a model M of T endowed with embeddings ν₁ : M₁ → M and ν₂ : M₂ → M such that ν₁ ∘ μ₁ = ν₂ ∘ μ₂. The triple (M, ν₁, ν₂) (or, by abuse, M itself) is said to be a T-amalgam of M₁, M₂ over M₀.

## 3 Read-only Database Schemas

In this section, we provide a formal definition of (read-only) DB-schemas by relying on an algebraic, functional characterization.

###### Definition 2

A DB schema is a pair ⟨Σ, T⟩, where: (i) Σ is a DB signature, that is, a finite multi-sorted signature whose only symbols are equality, unary functions, and constants; (ii) T is a DB theory, that is, a set of universal Σ-sentences.

Given a DB signature Σ, we respectively denote by Σ_srt and Σ_fun the set of sorts and the set of functions in Σ. In the following, we sometimes omit the explicit mention of the DB schema, and refer directly to a (DB) theory T with a (DB) signature Σ.

We assume that in a DB signature all function and constant symbols are typed. Thus, every function symbol has a source and a target: given a function symbol f in Σ, we write f : S → S′ to say that S is the source of f and S′ is the target of f, where S and S′ are sorts from Σ_srt. Constant symbols are also sorted. Whenever we build terms and formulae, we always assume that they are well-typed, in the sense that the sorts of variables, constants, and function sources/targets match. Consequently, sorts are implicitly determined by the context: if we write f(g(c)), we implicitly get that the sort of the constant c is the source of g, and that the target sort of g is the source sort of f. Since only unary function symbols and equality are allowed in Σ, all atomic Σ-formulae are of the form t₁ = t₂, where t₁ and t₂ are possibly complex terms whose innermost subterms are either variables or constants.

We associate to a DB signature Σ a characteristic graph G(Σ) capturing the dependencies induced by functions over sorts. Specifically, G(Σ) is an edge-labeled graph whose nodes are the sorts in Σ_srt, and which contains a labeled edge S →f S′ if and only if Σ contains a function symbol f : S → S′. We say that Σ is acyclic if G(Σ) is so. The leaves of G(Σ) are the nodes without outgoing edges. From a pragmatic point of view, these terminal sorts are divided into two subsets, respectively representing unary relations and value sorts. Non-value sorts (i.e., unary relations and non-leaf sorts) are called id sorts, and are conceptually used to represent (identifiers of) different kinds of objects. Value sorts, instead, represent datatypes such as strings, numbers, clock values, etc. Whenever needed, we denote the set of id sorts in Σ by Σ_ids, and that of value sorts by Σ_val (recall that Σ_srt = Σ_ids ⊎ Σ_val).
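The characteristic graph is straightforward to compute. The following sketch (with a hypothetical employee/department signature of our own invention, not taken from the paper) builds G(Σ) from a table of function symbols and checks acyclicity by depth-first search:

```python
# Characteristic graph G(Sigma) of a DB signature: nodes are sorts, and every
# unary function symbol f : S -> S' contributes an edge S --f--> S'.
# The signature below is a made-up example.
FUNCS = {                       # f : (source sort, target sort)
    "emplDept": ("Empl", "Dept"),
    "emplName": ("Empl", "String"),
    "deptName": ("Dept", "String"),
}

def char_graph(funcs):
    """Build the adjacency structure of G(Sigma)."""
    edges = {}
    for _f, (src, tgt) in funcs.items():
        edges.setdefault(src, []).append(tgt)
        edges.setdefault(tgt, [])
    return edges

def is_acyclic(edges):
    """Sigma is acyclic iff G(Sigma) has no directed cycle (colored DFS)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = dict.fromkeys(edges, WHITE)
    def dfs(v):
        color[v] = GREY
        for w in edges[v]:
            if color[w] == GREY or (color[w] == WHITE and not dfs(w)):
                return False            # a grey node reached again: cycle
        color[v] = BLACK
        return True
    return all(color[v] != WHITE or dfs(v) for v in list(edges))

def leaf_sorts(edges):
    """Leaves of G(Sigma): sorts with no outgoing edge."""
    return {v for v, out in edges.items() if not out}
```

On this toy signature the only leaf is `String`; in the paper's terminology the leaves would then be split further into unary-relation sorts and value sorts.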

We now focus on extensional data conforming to a given DB schema.

###### Definition 3

A DB instance of a DB schema ⟨Σ, T⟩ is a Σ-structure M such that: (i) M is a model of T, and (ii) every id sort of Σ is interpreted by M on a finite set.

As usual, a DB instance has to be distinguished from an arbitrary model of T, where no finiteness assumption is imposed on the interpretation of id sorts. What may appear as not customary in Definition 3 is the fact that value sorts can be interpreted on infinite sets. This allows us, at once, to reconstruct the classical notion of DB instance as a finite model (since only finitely many values can be pointed to from id sorts using functions), while at the same time supplying a potentially infinite set of fresh values to be dynamically introduced in the working memory during the evolution of the artifact system. We respectively denote by S^M, f^M, and c^M the interpretation in M of the sort S (this is a set), of the function symbol f (this is a set-theoretic function), and of the constant c (this is an element of the interpretation of the corresponding sort). Obviously, f^M and c^M must match the sorts declared in Σ. For instance, if the source and the target of f are, respectively, S and S′, then the function f^M has domain S^M and range S′^M.

We close our discussion on the formalization of DB schemas by discussing DB theories. The role of a DB theory is to encode background axioms expressing constraints on the different elements of the corresponding signature. We illustrate a typical background axiom, required to handle the possible presence of undefined identifiers/values in the different sorts. This, in turn, is essential to capture AAS whose working memory is initially undefined, in the style of [15, 21]. To accommodate this, we add to every sort S of Σ a constant undef_S (written, by abuse of notation, just undef from now on), used to specify an undefined value. Then, for each function symbol f of Σ, we add the following axiom to the DB theory:

 ∀x (x=undef↔f(x)=undef) (2)

This axiom states that the application of f to the undefined value produces an undefined value, and that this is the only situation for which f is undefined.
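On a finite interpretation, axiom (2) is a simple pointwise check. The interpretation below is invented for illustration (a single function symbol given as a dictionary):

```python
# Axiom (2): for every x, x = undef  <->  f(x) = undef.
# Hypothetical finite interpretation of one function symbol f : Empl -> Dept.
UNDEF = "undef"

def satisfies_axiom_2(f):
    """Check x = undef <-> f(x) = undef over a finite domain given as a dict."""
    return all((x == UNDEF) == (fx == UNDEF) for x, fx in f.items())
```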

###### Remark 1

In the remainder of the paper, we always implicitly assume that the DB theory consists of axiom (2), but our technical results are not bound to this specific choice. The specific conditions we require on the DB theory for our results will be explicitly stated later.

As shown in [10], the algebraic, functional characterization of DB schemas and instances can actually be reinterpreted in the classical relational model. Definition 2 naturally corresponds to the definition of a relational database schema equipped with single-attribute primary keys and foreign keys (plus a reformulation of constraint (2)). To see this, we adopt the named perspective, where each relation schema is defined by a signature containing a relation name and a set of typed attribute names. Let ⟨Σ, T⟩ be a DB schema. Each id sort S corresponds to a dedicated relation R_S with the following attributes: (i) one identifier attribute id_S with type S; (ii) one dedicated attribute a_f with type S′ for every function symbol f : S → S′ of Σ.

The fact that R_S is constructed starting from the functions in Σ naturally induces corresponding functional dependencies within R_S, and inclusion dependencies from R_S to other relation schemas. In particular, we obtain the following constraints for R_S:

• For each non-id attribute a_f of R_S, we get a functional dependency from id_S to a_f. Altogether, such dependencies witness that id_S is the (primary) key of R_S.

• For each non-id attribute a_f of R_S whose corresponding function symbol f has an id sort S′ as image, we get an inclusion dependency from a_f to the id attribute id_{S′} of R_{S′}. This captures that a_f is a foreign key referencing R_{S′}.

Given a DB instance M of ⟨Σ, T⟩, its corresponding relational instance is the minimal set of facts satisfying the following property: for every id sort S, let f₁, …, fₙ be all the functions in Σ with domain S; then, for every identifier o ∈ S^M, the instance contains a labeled fact of the form R_S(o, f₁^M(o), …, fₙ^M(o)). With this interpretation, the active domain of M is the finite set

 ⋃_{S ∈ Σ_ids} (S^M ∖ {undef^M}) ∪ { v ∈ ⋃_{V ∈ Σ_val} V^M | there exist f ∈ Σ_fun and o ∈ dom(f^M) s.t. f^M(o) = v }

consisting of all (proper) identifiers assigned by M to id sorts, as well as all values obtained in M via the application of some function. Since such values are necessarily finitely many, one may wonder why in Definition 3 we allow value sorts to be interpreted over infinite sets. The reason is that, in our framework, an evolving artifact system may use such an infinite provision to inject and manipulate new values in the working memory.
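The displayed set can be computed directly on a finite functional instance. The toy instance below is our own, not from the paper:

```python
# Active domain of a DB instance: proper identifiers of the id sorts plus all
# values attained by some function (undef is excluded on both sides).
UNDEF = "undef"

ID_SORTS = {"Empl": {"undef", "e1", "e2"}, "Dept": {"undef", "d1"}}
FUNCS = {  # interpretation of the unary function symbols as dicts
    "emplDept": {"undef": "undef", "e1": "d1", "e2": "d1"},
    "emplName": {"undef": "undef", "e1": "Ada", "e2": "Bob"},
}

def active_domain(id_sorts, funcs):
    dom = set()
    for interp in id_sorts.values():          # proper identifiers
        dom |= interp - {UNDEF}
    for f in funcs.values():                  # values reached via functions
        dom |= set(f.values()) - {UNDEF}
    return dom
```

Note that the (infinite) value sort of strings is never enumerated: only the finitely many strings actually reached by some function end up in the active domain.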

## 4 Quantifier Elimination and Model Completion for DB schemata

We fix a DB signature Σ and a DB theory T as in Definition 2.

A DB theory T (in the sense of Definition 2) need not eliminate quantifiers; it is however often possible to strengthen T in a conservative way (with respect to constraint satisfiability) and get quantifier elimination. We say that T has a model completion iff there is a stronger theory T* ⊇ T (still within the same signature of T) such that (i) every Σ-constraint that is satisfiable in a model of T is satisfiable in a model of T*; (ii) T* eliminates quantifiers.

The following Lemma gives a useful folklore technique for finding model completions:

###### Lemma 1

Suppose that for every primitive Σ-formula ∃x φ(x, y̲) it is possible to find a quantifier-free formula ψ(y̲) such that

(i)

T ⊨ ∀x (φ(x, y̲) → ψ(y̲));

(ii)

for every model M of T and for every tuple a̲ of elements from the support of M such that M ⊨ ψ(a̲), it is possible to find another model N of T such that M embeds into N and N ⊨ ∃x φ(x, a̲).

Then T has a model completion T* axiomatized by the infinitely many sentences (notice that our T is assumed to be universal according to Definition 2, whereas T* turns out to be universal-existential)

 ∀y̲ (ψ(y̲) → ∃x φ(x, y̲)) . (3)
###### Proof

From (i) and (3) we clearly get that T* admits quantifier elimination: in fact, in order to prove that a theory enjoys quantifier elimination, it is sufficient to eliminate quantifiers from primitive formulae (the quantifier elimination for all formulae can then be easily shown by induction over their complexity). This is exactly what is guaranteed by (i) and (3).

Let M be a model of T. We show (by using a chain argument) that there exists a model M′ of T* such that M embeds into M′. Consider the set of pairs (a̲, ∃x φ(x, a̲)) such that ∃x φ(x, y̲) is a primitive formula, a̲ is a tuple from the support of M, and M ⊨ ψ(a̲) (where ψ is related to φ as in (i)). By Zermelo's Theorem, this set can be well-ordered: let {(a̲ᵢ, ∃x φᵢ(x, a̲ᵢ))}_{i<λ} be such a well-ordering (where λ is an ordinal). By transfinite induction on this well-order, we define M₀ := M and, for each i, M_{i+1} as an extension of Mᵢ such that M_{i+1} ⊨ ∃x φᵢ(x, a̲ᵢ), which exists by (ii) since Mᵢ ⊨ ψᵢ(a̲ᵢ) (remember that validity of ground formulae is preserved when passing through substructures and superstructures, and M ⊨ ψᵢ(a̲ᵢ)).

Now we take the chain union M¹ := ⋃_{i<λ} Mᵢ: since T is universal, M¹ is again a model of T, and it is possible to construct an analogous chain as done above, starting from M¹ instead of M. Clearly, M ⊆ M¹ by construction. At this point, we iterate the same argument countably many times, so as to define a new chain of models of T:

 M⁰ := M ⊆ M¹ ⊆ … ⊆ Mⁿ ⊆ …

Defining M′ := ⋃_{n<ω} Mⁿ, we trivially get that M′ is a model of T such that M ⊆ M′ and M′ satisfies all the sentences of type (3). The last fact can be shown using the following finiteness argument.

Fix ψ(y̲) and ∃x φ(x, y̲) as in (3). For every tuple a̲ in M′ such that M′ ⊨ ψ(a̲), by definition of M′ there exists a natural number n such that a̲ is in Mⁿ: since ψ(a̲) is a ground formula, we get that also Mⁿ ⊨ ψ(a̲). Therefore, consider step n of the countable chain: there, the pair (a̲, ∃x φ(x, a̲)) appears in the enumeration given by the corresponding well-ordered set. Hence, by construction and since ψ(a̲) is a ground formula, there exists an extension of Mⁿ within the chain that satisfies ∃x φ(x, a̲). In conclusion, since existential formulae are preserved when passing to extensions, we obtain M′ ⊨ ∃x φ(x, a̲), as wanted.

Observe that if Σ is acyclic, there are only finitely many terms involving a single variable x: in fact, there are as many terms as paths in G(Σ) starting from the sort of x. If k is the maximum number of terms involving a single variable, then (since all function symbols are unary) there are at most k·n terms involving n variables.
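This path count is easy to compute on the characteristic graph; the adjacency map below is a made-up example, not from the paper:

```python
# Number of terms built over a single variable x of sort S = number of
# (possibly empty) paths of the acyclic graph G(Sigma) starting from S;
# each unary function symbol contributes one edge.
EDGES = {"Empl": ["Dept", "String"], "Dept": ["String"], "String": []}

def terms_from(edges, sort):
    # 1 counts the term x itself; each outgoing edge extends a shorter term
    # with one more function symbol applied on top.
    return 1 + sum(terms_from(edges, tgt) for tgt in edges[sort])
```

For a variable x of sort `Empl` the four terms would be x, emplDept(x), emplName(x), and deptName(emplDept(x)).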

The following theorem shows an interesting family of theories that admit a model completion, and its proof gives an explicit algorithm for quantifier elimination in their model completions T*.

###### Theorem 4.1

T has a model completion in case it is axiomatized by universal one-variable formulae and Σ is acyclic.

###### Proof

We freely take inspiration from an analogous result in [27]. We preliminarily show that T is amalgamable. Then, for a suitable choice of ψ suggested by the acyclicity assumption, the amalgamation property will be used to prove the validity of condition (ii) of Lemma 1: this fact (together with condition (i)) yields that T has a model completion T* axiomatized by the infinitely many sentences (3).

Let M₁ and M₂ be two models of T with a submodel M₀ in common (we suppose for simplicity that |M₁| ∩ |M₂| = |M₀|). We define a T-amalgam M of M₁, M₂ over M₀ as follows (we use in an essential way the fact that Σ contains only unary function symbols). Let the support of M be the set-theoretic union of the supports of M₁ and M₂, i.e., |M| := |M₁| ∪ |M₂|. M has a natural Σ-structure inherited from the Σ-structures M₁ and M₂: for every function symbol f in Σ and for k = 1, 2, we define f^M(a) := f^{Mₖ}(a) for each a ∈ |Mₖ|, i.e., the interpretation of f in M restricted to |Mₖ| is the interpretation of f in Mₖ. This is well-defined since, for every a ∈ |M₁| ∩ |M₂| = |M₀|, we have f^{M₁}(a) = f^{M₀}(a) = f^{M₂}(a). It is clear that M₁ and M₂ are substructures of M, and their inclusions agree on |M₀|.

We show that the Σ-structure M, as defined above, is a model of T. By hypothesis, T is axiomatized by universal one-variable formulae: so, we can consider T as a theory formed by axioms φ which are universal closures of clauses with just one variable, i.e., φ = ∀x (A₁(x) ∧ ⋯ ∧ Aₙ(x) → B₁(x) ∨ ⋯ ∨ Bₘ(x)), where the Aᵢ and the Bⱼ are atoms.

We show that M satisfies all such formulae φ. To this aim, take a ∈ |M| and suppose that M ⊨ Aᵢ(a) for all i = 1, …, n. If a ∈ |Mₖ| (for k = 1 or k = 2), then M ⊨ Aᵢ(a) implies Mₖ ⊨ Aᵢ(a), since Aᵢ(a) is a ground formula. Since Mₖ is a model of T and hence Mₖ ⊨ φ, we get that Mₖ ⊨ Bⱼ(a) for some j, which means that M ⊨ Bⱼ(a), since Bⱼ(a) is a ground formula. Thus, M ⊨ φ for every axiom φ of T, i.e., M ⊨ T and, hence, M is a T-amalgam of M₁, M₂ over M₀, as wanted.

Now, given a primitive formula ∃x φ(x, y̲), we find a suitable ψ(y̲) such that the hypotheses of Lemma 1 hold. We define ψ(y̲) as the conjunction of the set of all quantifier-free formulae χ(y̲) such that φ(x, y̲) → χ(y̲) is a logical consequence of T (they are finitely many, up to T-equivalence, because Σ is acyclic). By definition, condition (i) of Lemma 1 clearly holds.

We show that condition (ii) is satisfied as well. Let M be a model of T such that M ⊨ ψ(a̲) for some tuple a̲ of elements from the support of M. Consider the Σ-substructure M[a̲] of M generated by the elements a̲: this substructure is finite (since Σ is acyclic), it is a model of T, and we trivially have M[a̲] ⊨ ψ(a̲), since ψ(a̲) is a ground formula. In order to prove that there exists an extension N of M[a̲] such that N ⊨ ∃x φ(x, a̲), it is sufficient to prove (by the Robinson Diagram Lemma) that the theory T ∪ Δ_Σ(M[a̲]) ∪ {φ(x, a̲)} is consistent. For the sake of contradiction, suppose that this theory is inconsistent. Then, there are finitely many literals ℓ₁(a̲), …, ℓₘ(a̲) from Δ_Σ(M[a̲]) (remember that Δ_Σ(M[a̲]) is a finite set of literals, since M[a̲] is a finite structure) such that φ(x, a̲) entails, modulo T, the disjunction ¬ℓ₁(a̲) ∨ ⋯ ∨ ¬ℓₘ(a̲). Therefore, defining χ(y̲) := ¬ℓ₁(y̲) ∨ ⋯ ∨ ¬ℓₘ(y̲), we get that T ⊨ φ(x, y̲) → χ(y̲), which implies that χ(y̲) is one of the formulae appearing in the conjunction ψ(y̲). Since M[a̲] ⊨ ψ(a̲), we also have M[a̲] ⊨ χ(a̲), which is a contradiction: in fact, by definition of diagram, ℓ₁(a̲) ∧ ⋯ ∧ ℓₘ(a̲) must hold in M[a̲]. Hence, there exists an extension N of M[a̲] such that N ⊨ ∃x φ(x, a̲). Now, by the amalgamation property, there exists a T-amalgam N′ of M and N over M[a̲]: clearly, N′ is an extension of M and, since N ⊨ ∃x φ(x, a̲) and existential formulae are preserved by extensions, N′ ⊨ ∃x φ(x, a̲) holds as well, as required.

The proof of Theorem 4.1 gives an algorithm for quantifier elimination in the model completion. The algorithm works as follows (see formula (3)): to eliminate the quantifier ∃x from ∃x φ(x, y̲), take the conjunction of the quantifier-free clauses implied by φ. Note that this algorithm is not practically efficient. In fact, better algorithms can be obtained by using the Knuth-Bendix procedure, which we are going to study in detail in the following section.

## 5 Algorithms for quantifier elimination

The algorithm for quantifier elimination suggested by the proof of Theorem 4.1 is highly impractical: it relies on formula (3), where ψ is in fact obtained by conjoining all the clauses implied by φ.

In this section, we introduce better algorithms for the special theories we are interested in and discuss their complexities. The content of this section gives some details about our implementation in mcmt.

We take as complexity of a quantifier-elimination procedure the time/space cost of applying it to a primitive formula: this reflects the needs of our applications and separates the cost of the procedure itself from other costs related to disjunctive normal form conversions. Notice that array-based model checkers, in order to represent sets of states (in particular, sets of states which are backward reachable), use lists of primitive formulae (conjunctions of literals, i.e., the matrices of primitive formulae, are often called 'cubes', whence the name 'Cubicle' for the tool developed at LRI-Intel for backward reachability in array-based systems), and it is precisely to these formulae that quantifier elimination in T* is applied in our tool mcmt.

One of the reasons for the high complexity of quantifier elimination in linear arithmetic is that eliminating quantifiers from a primitive formula does not in general yield a primitive formula: we shall see that in our context the situation is different. Another problem in quantifier elimination for linear arithmetic (even in real linear arithmetic, which is handled e.g. by the Fourier-Motzkin algorithm) is that the size of terms might grow after eliminating quantified variables, since terms there are arbitrary linear polynomials. Again, this is not the case for us: if we show that eliminating quantifiers from a primitive formula yields a conjunction of literals (and not a conjunction of clauses), then it is clear that the size of the output is polynomially bounded in the length of the tuple of variables (keeping the signature as a constant). This suggests that also the time for the computation might be polynomial in relevant cases. In other words, quantifier elimination in our context is computationally much better behaved than in the arithmetic case, so that the more sophisticated machinery (predicate abstraction, interpolants, etc.) used in infinite-state model checking to circumvent quantifier elimination might not be needed here.

In all the algorithms below, we make reference to the Knuth-Bendix completion procedure, applied to a set of ground literals. Such a procedure always terminates in the ground case; we refer the reader to [4] for the necessary background.

### 5.1 The Basic Algorithm

We first give an algorithm for the case in which is empty (notice that the algorithm applies also to signatures which may not be acyclic). The steps of the algorithm are the following.

Input: , with a conjunction of literals (we write for the tuple ).

1. Replace variables by free constants; we keep the names for these constants.
2. Choose a reduction ordering, total for ground terms, giving higher precedence to with respect to all the other symbols (thus equations are always oriented as ).
3. Run the Knuth-Bendix completion procedure (with simplification) on the literals in , considered as ground literals; let be the conjunction of the literals resulting from the completion.
4. Delete from the literals in which occurs and terminate.

Output: let be the output.
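The steps above can be sketched as runnable code. The following Python sketch is not mcmt's implementation: the term representation (nested tuples, constants as plain strings), the choice of a ground lexicographic path ordering (LPO) as the reduction ordering, and all names are our assumptions. In the ground case inter-reduction suffices for completion, since every overlap between left-hand sides is an instance of simplification.

```python
def head(t):
    return t if isinstance(t, str) else t[0]

def args(t):
    return () if isinstance(t, str) else t[1:]

def lpo_gt(s, t, prec):
    """Ground LPO: a reduction ordering, total on ground terms
    whenever the precedence `prec` is total on the symbols."""
    if s == t:
        return False
    if any(a == t or lpo_gt(a, t, prec) for a in args(s)):
        return True  # some argument of s already dominates t
    if prec(head(s)) > prec(head(t)):
        return all(lpo_gt(s, b, prec) for b in args(t))
    if head(s) == head(t) and all(lpo_gt(s, b, prec) for b in args(t)):
        for a, b in zip(args(s), args(t)):
            if a != b:
                return lpo_gt(a, b, prec)  # lexicographic tie-break
    return False

def replace(t, l, r):
    """One parallel rewrite pass of the ground rule l -> r over t."""
    if t == l:
        return r
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(replace(a, l, r) for a in args(t))

def normalize(t, rules):
    while True:
        t2 = t
        for l, r in rules:
            t2 = replace(t2, l, r)
        if t2 == t:
            return t
        t = t2

def complete(eqs, prec):
    """Knuth-Bendix completion with simplification, ground case."""
    rules, todo = [], list(eqs)
    while todo:
        l, r = todo.pop()
        l, r = normalize(l, rules), normalize(r, rules)
        if l == r:
            continue
        if lpo_gt(r, l, prec):
            l, r = r, l  # orient the larger side to the smaller one
        kept = []
        for l2, r2 in rules:  # simplify older rules, re-queue changed ones
            if normalize(l2, [(l, r)]) != l2 or normalize(r2, [(l, r)]) != r2:
                todo.append((l2, r2))
            else:
                kept.append((l2, r2))
        rules = kept + [(l, r)]
    return rules

def occurs(t, c):
    return t == c or any(occurs(a, c) for a in args(t))

def eliminate(eqs, diseqs, e, prec):
    """Steps 1-4: complete, normalize disequalities, then drop the
    literals mentioning the eliminated constant e; None means bottom."""
    rules = complete(eqs, prec)
    out_eqs = [(l, r) for l, r in rules
               if not (occurs(l, e) or occurs(r, e))]
    out_dis = []
    for s, t in diseqs:
        s, t = normalize(s, rules), normalize(t, rules)
        if s == t:
            return None  # a literal s != s was produced: output bottom
        if not (occurs(s, e) or occurs(t, e)):
            out_dis.append((s, t))
    return out_eqs, out_dis
```

For example, eliminating e from f(e)=a, e=g(b), c≠e with e greatest in the precedence orients e → g(b) and f(g(b)) → a during completion; after deleting the e-literals, the output is f(g(b))=a together with c≠g(b).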

We assume that, in case a literal like is produced (while normalizing a negative literal in Step 3 above), the procedure stops with output .

We want to prove that the algorithm is correct, in the sense of the following proposition.

###### Proposition 1

Let be empty; then the set of axioms (varying among the conjunctions of finite sets of literals) axiomatizes the model completion of .

###### Proof

In order to reach our goal, we apply Lemma 1; condition (i) of the Lemma follows from the fact that Knuth-Bendix completion manipulates a set of literals only up to logical equivalence. As a consequence, it is sufficient to show the validity of the following Claim, corresponding to condition (ii) of the Lemma.

Claim: given a model of and elements from the support of such that (where is the output formula), can be embedded in a model of such that (where is the input formula).

To prove the Claim, we define a -structure which extends in the following way (we let , where is the interpretation function, extended to an assignment mapping the to the ):

- Let be the set of the normal forms of the terms of the kind and let be the set of such normal forms which contain at least an occurrence of (notice that can be empty in case the completion procedure produces an equation like - recall that such an equation is oriented as ).

- We define ; we extend to as follows: (i) is the normal form of if it belongs to , otherwise it is where is the normal form of ; (ii) is the normal form of if it belongs to , otherwise it is where is the normal form of .

An easy induction now shows that for every term normalizing to some , we have ; moreover, if occurs in , then .

It remains to check that ; this is the same as saying that , because Knuth-Bendix completion operates up to logical equivalence.

Now, literals from not involving are true in and so also in ; we need to analyze equalities and disequalities from where occurs. These can be of four kinds:

(i) equalities of the kind : since the Knuth-Bendix procedure removes trivial equalities and the order is total on ground terms, we must have e.g. and that is the normal form of , so that the claim is obvious;

(ii) inequalities of the kind : here and must both be in normal form (and different, otherwise the procedure would have output ), so that once again the claim is immediate;

(iii) equalities of the kind : here normalizes to , so that the claim holds;

(iv) inequalities of the kind : here and are both in normal form and as a consequence .

This concludes the proof of the above Claim.

Notice that the above algorithm maps a primitive formula to a conjunction of literals (not to a conjunction of clauses). In the case of an acyclic signature , the algorithm is easily seen to run in polynomial time: indeed, a step of Knuth-Bendix completion (with simplification, in the ground case) always replaces an equation by smaller ones, and we already observed that, keeping constant, there can be only polynomially many terms and equations in a given finite number of variables.

### 5.2 Extensions

We consider two extensions of the above basic algorithm, both of which have been implemented in our tool mcmt.

In the first extension, we consider the axiom

 t(x)=undef↔x=undef (4)

for every term (here we assume to have many constants undef, one for every sort). One side of the above axiom is equivalent to the ground literal and as such it does not interfere with the completion process and the quantifier elimination procedure (we just add it to our constraint from the beginning).

To accommodate the other side, it is sufficient to do the following. We split the initial constraint into a disjunction , where contains the literal and contains the literal . Then is handled in the trivial way (replacing everywhere with undef); as for , we check whether, at the end of the completion, the current constraint contains an equality like : in that case, we add to the completion the literal . (This is sound because implies , so follows.) The above correctness proof can be adjusted as follows to cover this modification. Suppose toward a contradiction that there is a term (in which occurs) such that ; pick a minimal (wrt the ordering) such term . Since , cannot be in normal form by the definition of . Since it is minimal, there is an equality in the completion that rewrites itself (not a subterm!) to its normal form . Hence and as a consequence , which is the same as ; but the latter is absurd, because was a model of (such a literal having been added to the completion).

Thus, axioms (4) break our desired property that quantifier elimination applied to a primitive formula produces a conjunction of literals. However, in the implementation, it is possible to assume that always occurs in the matrix of a primitive formula we want to eliminate from. In fact, according to the backward search algorithm implemented in array-based systems tools, the variable to be eliminated always comes from the guard of a transition, and we can assume that such a guard contains the literal (if we need a transition with - for an existentially quantified variable - it is possible to write this condition trivially without using a quantified variable).
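The post-completion check for the second branch can be sketched as follows; this is a hedged illustration, not mcmt's code, and the representation (terms as nested tuples or constant strings, rules as oriented pairs) is our assumption. If the completed system rewrites some term containing the eliminated constant to undef, then by axiom (4) that constant must equal undef, which contradicts the disequality literal assumed in this branch.

```python
def occurs(t, c):
    """t is a nested tuple ("f", t1, ..., tn) or a constant string."""
    return t == c or (not isinstance(t, str)
                      and any(occurs(a, c) for a in t[1:]))

def undef_branch_is_bottom(rules, e="e", undef="undef"):
    """rules: oriented ground equations (lhs, rhs) from the completion.
    Returns True when the branch containing the literal e != undef is
    inconsistent: some term containing e rewrites to undef, so axiom (4)
    forces e = undef and bottom follows."""
    return any(rhs == undef and occurs(lhs, e) for lhs, rhs in rules)
```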

In a second extension, we consider the possibility of enriching with unary and binary (sorted) relation symbols. These symbols are not used in our formal framework (they would represent relations without a key), but the extension is easy, so we decided to cover it in our implementation too. The modification to the above quantifier elimination algorithm is straightforward. Of course, terms occurring in relational literals are also subject to normalization during the completion phase. For unary relations this observation is sufficient (remember that complementary literals nevertheless produce , and that this applies to relational atoms too), whereas for binary relations the following further operation is needed: if in Step 3 the constraint contains (resp. ), then must be added to . (Notice that ternary relations would generate disjunctions: should produce the disjunction .)
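The normalization of relational literals mentioned above can be sketched as follows; the representation (atoms as tuples headed by the relation name, terms as nested tuples or constant strings) is illustrative only, and the sketch covers just the normalization step, not the extra operation for binary relations.

```python
def replace(t, l, r):
    """One parallel rewrite pass of the ground rule l -> r over t."""
    if t == l:
        return r
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(replace(a, l, r) for a in t[1:])

def normalize(t, rules):
    while True:
        t2 = t
        for l, r in rules:
            t2 = replace(t2, l, r)
        if t2 == t:
            return t
        t = t2

def normalize_atom(atom, rules):
    """atom = (R, t1, ..., tn): rewrite every argument of the relational
    atom to normal form with respect to the current rewrite rules."""
    return (atom[0],) + tuple(normalize(a, rules) for a in atom[1:])
```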

## References

• [1] P. A. Abdulla, C. Aiswarya, M. F. Atig, and O. Rezine. Recency-bounded verification of dynamic database-driven systems. In Proc. of PODS. ACM Press, 2016.
• [2] F. Alberti, R. Bruttomesso, S. Ghilardi, S. Ranise, and N. Sharygina. An extension of lazy abstraction with interpolation for programs with arrays. Formal Methods in System Design, 45(1):63–109, 2014.
• [3] F. Alberti, S. Ghilardi, and N. Sharygina. A framework for the verification of parameterized infinite-state systems. Fundam. Inform., 150(1):1–24, 2017.
• [4] F. Baader and T. Nipkow. Term Rewriting and All That. Cambridge University Press, United Kingdom, 1998.
• [5] B. Bagheri Hariri, D. Calvanese, G. De Giacomo, A. Deutsch, and M. Montali. Verification of relational data-centric dynamic systems with external services. In Proc. of PODS, 2013.
• [6] F. Belardinelli, A. Lomuscio, and F. Patrizi. An abstraction technique for the verification of artifact-centric systems. In Proc. of KR, 2012.
• [7] M. Bojańczyk, L. Segoufin, and S. Toruńczyk. Verification of database-driven systems via amalgamation. In Proc. of PODS, pages 63–74, 2013.
• [8] D. Calvanese, G. De Giacomo, and M. Montali. Foundations of data aware process analysis: A database theory perspective. In Proc. of PODS, 2013.
• [9] D. Calvanese, G. De Giacomo, M. Montali, and F. Patrizi. First-order mu-calculus over generic transition systems and applications to the situation calculus. Inf. and Comp., 2017.
• [10] D. Calvanese, S. Ghilardi, A. Gianola, M. Montali, and A. Rivkin. Verification of data-aware processes via array-based systems. Preprint submitted to PODS 2019, pages 1–12.
• [11] C. C. Chang and H. J. Keisler. Model Theory. North-Holland Publishing Co., Amsterdam-London, third edition, 1990.
• [12] E. Damaggio, A. Deutsch, and V. Vianu. Artifact systems with data dependencies and arithmetic. ACM TODS, 37(3), 2012.
• [13] E. Damaggio, R. Hull, and R. Vaculín. On the equivalence of incremental and fixpoint semantics for business artifacts with Guard-Stage-Milestone lifecycles. In Proc. of BPM, 2011.
• [14] A. Deutsch, R. Hull, F. Patrizi, and V. Vianu. Automatic verification of data-centric business processes. In Proc. of ICDT, pages 252–267, 2009.
• [15] A. Deutsch, Y. Li, and V. Vianu. Verification of hierarchical artifact systems. In Proc. of PODS, pages 179–194. ACM Press, 2016.
• [16] M. Dumas. On the convergence of data and process engineering. In Proc. of ADBIS, volume 6909 of LNCS. Springer, 2011.
• [17] S. Ghilardi, E. Nicolini, S. Ranise, and D. Zucchelli. Towards SMT model checking of array-based systems. In Proc. of IJCAR, pages 67–82, 2008.
• [18] S. Ghilardi and S. Ranise. Backward reachability of array-based systems by SMT solving: Termination and invariant synthesis. Logical Methods in Computer Science, 6(4), 2010.
• [19] R. Hull. Artifact-centric business process models: Brief survey of research results and challenges. In Proc. of OTM, volume 5332 of LNCS. Springer, 2008.
• [20] V. Künzle, B. Weber, and M. Reichert. Object-aware business processes: Fundamental requirements and their support in existing approaches. Int. J. of Information System Modeling and Design, 2(2), 2011.
• [21] Y. Li, A. Deutsch, and V. Vianu. VERIFAS: A practical verifier for artifact systems. PVLDB, 11(3):283–296, 2017.
• [22] A. Meyer, S. Smirnov, and M. Weske. Data in business processes. Technical Report 50, Hasso-Plattner-Institut for IT Systems Engineering, Universität Potsdam, 2011.
• [23] M. Reichert. Process and data: Two sides of the same coin? In Proc. of the On the Move Confederated Int. Conf. (OTM 2012), volume 7565 of LNCS. Springer, 2012.
• [24] C. Richardson. Warning: Don’t assume your business processes use master data. In Proc. of BPM, volume 6336 of LNCS. Springer, 2010.
• [25] A. Robinson. On the metamathematics of algebra. Studies in Logic and the Foundations of Mathematics. North-Holland Publishing Co., Amsterdam, 1951.
• [26] V. Vianu. Automatic verification of database-driven systems: a new frontier. In Proc. of ICDT, pages 1–13, 2009.
• [27] W. H. Wheeler. Model-companions and definability in existentially complete structures. Israel J. Math., 25(3-4):305–330, 1976.