During the last two decades, a huge body of research has been dedicated to the challenging problem of reconciling data and process management within contemporary organizations [39, 28, 38]. This requires moving from a purely control-flow understanding of business processes to a more holistic approach that also considers how data are manipulated and evolved by the process. Striving for this integration, new models were devised, with two prominent representatives: object-centric processes, and business artifacts [34, 24].
In parallel, a flourishing series of results has been dedicated to the formalization of such integrated models, and to the boundaries of decidability and complexity for their static analysis and verification. Such results are quite fragmented, since they consider a variety of different assumptions on the model and on the static analysis tasks [43, 16]. Two main trends can be identified within this line. A recent series of results focuses on very general data-aware processes that evolve a full-fledged, relational database (DB) with arbitrary first-order constraints [11, 10, 1, 17]. Actions amount to full bulk updates that may operate on multiple tuples at once, possibly injecting fresh values taken from an infinite data domain. Verification is studied by fixing the initial instance of the DB, and by considering all possible evolutions induced by the process over the initial data.
A second trend of research is instead focused on the formalization and verification of artifact-centric processes. These systems are traditionally formalized using three components [26, 23]: (i) a read-only DB that stores fixed, background information, (ii) a working memory that stores the evolving state of artifacts, and (iii) actions that update the working memory. Different variants of this model, obtained via a careful tuning of the relative expressive power of its three components, have been studied towards decidability of verification problems parameterized over the read-only DB (see, e.g., [26, 23, 12, 27]). These are verification problems where a property is checked for every possible configuration of the read-only DB.
The overarching goal of this work is to connect, for the first time, such formal models and their corresponding verification problems on the one hand, with the models and techniques of model checking via Satisfiability-Modulo-Theories (SMT) on the other hand. This is concretized through four technical contributions.
Our first contribution is the definition of a general framework of so-called Relational Artifact Systems (RASs), in which artifacts are formalized in the spirit of array-based systems, one of the most sophisticated settings within the SMT tradition. In this setting, SASs are a particular class of RASs, where only artifact variables are allowed. “Array-based systems” is an umbrella term generically referring to infinite-state transition systems implicitly specified using a declarative, logic-based formalism. The formalism captures transitions manipulating arrays via logical formulae, and its precise definition depends on the specific application of interest. The first declarative formalism for array-based systems was introduced in [31, 32] to handle the verification of distributed systems, and was afterwards successfully employed to verify a wide range of infinite-state systems [8, 4]. Distributed systems are parameterized in their essence: the number of interacting processes within a distributed system is unbounded, and the challenge is that of supplying certifications that are valid for all possible values of the parameter. The overall state of the system is typically described by means of arrays indexed by process identifiers, and used to store the content of process variables like locations and clocks. These arrays are genuine second-order function variables: they map indexes to elements, in a way that changes as the system evolves. Quantifiers are then used to represent sets of system states. RASs employ arrays to capture a very rich working memory that simultaneously accounts for artifact variables storing single data elements, and full-fledged artifact relations storing unboundedly many tuples. Each artifact relation is captured using a collection of arrays, so that a tuple in the relation can be retrieved by inspecting the content of the arrays with a given index.
The elements stored therein may be fresh values injected into the RAS, or data elements extracted from the read-only DB, whose relations are subject to key and foreign key constraints. This constitutes a big leap from the usual applications of array-based systems, because the nature of such constraints is quite different and requires completely new techniques for handling them (for instance, for quantifier elimination, see below). To attack this complexity, by relying on array-based systems, RASs encode the read-only DB using a functional, algebraic view, where relations and constraints are captured using multiple sorts and unary functions. The resulting model captures the essential aspects of the model in , which in turn is tightly related (though incomparable) to the sophisticated formal model for artifact-centric systems of .
Our second contribution is the development of algorithmic techniques for the verification of (parameterized) safety properties over RASs, which amounts to determining whether there exists an instance of the read-only DB that allows the RAS to evolve from its initial configuration to an undesired one that falsifies a given state property. To attack this problem, we build on backward reachability [31, 32], one of the most well-established techniques for safety verification in array-based systems. This is a correct, possibly non-terminating technique that regresses the system from the undesired configuration to those configurations that reach the undesired one. This is done by iteratively computing symbolic pre-images, until they either intersect the initial configuration of the system (witnessing unsafety), or they form a fixpoint that does not contain the initial state (witnessing safety).
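The pre-image loop described above can be illustrated on an explicit, finite-state toy system. The following is only a minimal sketch under strong assumptions: states are enumerated explicitly and the function name `backward_reachability` is hypothetical, whereas the technique discussed in this paper works symbolically on formulae over infinite-state systems.

```python
from typing import Callable, Hashable, Iterable, Set

State = Hashable


def backward_reachability(
    states: Iterable[State],
    initial: Set[State],
    unsafe: Set[State],
    successors: Callable[[State], Set[State]],
) -> bool:
    """Return True iff no initial state can reach an unsafe state.

    Explicit-state analogue of backward reachability: pre-images of the
    unsafe set are accumulated until they hit an initial state (unsafe)
    or no new state is added (fixpoint, hence safe).
    """
    universe = set(states)
    reached = set(unsafe)   # states known to reach an unsafe state
    frontier = set(unsafe)  # states added in the last iteration
    while frontier:
        if reached & initial:
            return False    # pre-images intersect the initial states
        # Pre-image: states with at least one successor in the frontier.
        pre = {s for s in universe if successors(s) & frontier}
        frontier = pre - reached
        reached |= frontier
    return not (reached & initial)
```

On a finite universe the loop always terminates; in the symbolic, infinite-state setting termination is not guaranteed, which is why specific classes of RASs are singled out later.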
Adapting backward reachability to the case of RASs, by retaining soundness and completeness, requires genuinely novel research so as to eliminate new (existentially quantified) “data” variables introduced during regression. Traditionally, this is done by quantifier instantiation or elimination. However, while quantifier instantiation can be transposed to RASs, quantifier elimination cannot, since the data elements contained in the arrays point to the content of a full-fledged DB with constraints. To reconstruct quantifier elimination in this setting, which is the main technical contribution of this work, we employ the classic model-theoretic machinery of model completions: via model completions, we prove that the runs of a RAS can be faithfully lifted to richer contexts where quantifier elimination is indeed available, despite the fact that it was not available in the original structures. This allows us to recast safety problems over RASs into equivalent safety problems in this richer setting.
Our third contribution is the identification of three notable classes of RASs for which backward reachability terminates, in turn witnessing decidability of safety. The first class restricts the working memory to variables only, i.e., focuses on SAS. The second class focuses on RAS operating under the restrictions imposed in : it requires acyclicity of foreign keys and ensures a sort of locality principle where different artifact tuples are not compared. Consequently, it reconstructs the decidability result exploited in  if one restricts the verification logic used there to safety properties only. In addition, our second class supports full-fledged bulk updates, which greatly increase the expressive power of dynamic systems  and, in our setting, witness the incomparability of our results and those in . The third class is genuinely novel, and while it further restricts foreign keys to form a tree-shaped structure, it does not impose any restriction on the shape of updates, and consequently supports not only bulk updates, but also comparisons between artifact tuples.
Our fourth contribution concerns the implementation of backward reachability techniques for RASs. Specifically, we have extended the well-known mcmt model checker for array-based systems, obtaining a fully operational counterpart to all the foundational results presented in the paper. Even though implementation and experimental evaluation are not central in this paper, we note that our model checker correctly handles the examples produced to test verifas, as well as additional examples that go beyond the verification capabilities of verifas, and we report some interesting cases here. The performance of mcmt in verifying these examples is very encouraging, and indeed provides the first stepping stone towards effective, SMT-based verification techniques for artifact-centric systems.
We adopt the usual first-order syntactic notions of signature, term, atom, (ground) formula, and so on. We use $\underline{u}$ to represent a tuple $\langle u_1, \ldots, u_n \rangle$. Our signatures $\Sigma$ are multi-sorted and include equality for every sort, which implies that variables are sorted as well. Depending on the context, we keep the sort of a variable implicit, or we indicate explicitly in a formula that variable $x$ has sort $S$ by employing notation $x : S$. The notation $t(\underline{x})$, $\phi(\underline{x})$ means that the term $t$, the formula $\phi$ has free variables included in the tuple $\underline{x}$. Constants and function symbols $f$ have sources $\underline{S}$ and a target $S'$, denoted as $f : \underline{S} \longrightarrow S'$ (relation symbols only have sources $\underline{S}$). We assume that terms and formulae are well-typed, in the sense that the sorts of variables, constants, and relation/function sources and targets match. A formula is said to be universal (resp., existential) if it has the form $\forall \underline{x}\, \phi(\underline{x})$ (resp., $\exists \underline{x}\, \phi(\underline{x})$), where $\phi$ is a quantifier-free formula. Formulae with no free variables are called sentences.
From the semantic side, we use the standard notions of a $\Sigma$-structure $\mathcal{M}$ and of truth of a formula in a $\Sigma$-structure under an assignment to the free variables. A $\Sigma$-theory $T$ is a set of $\Sigma$-sentences; a model of $T$ is a $\Sigma$-structure $\mathcal{M}$ where all sentences in $T$ are true. We use the standard notation $T \models \phi$ to say that $\phi$ is true in all models of $T$ for every assignment to the free variables of $\phi$. We say that $\phi$ is $T$-satisfiable iff there is a model $\mathcal{M}$ of $T$ and an assignment to the free variables of $\phi$ that make $\phi$ true in $\mathcal{M}$.
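In the following (cf. Section 4) we specify transitions of an artifact-centric system using first-order formulae. To obtain a more compact representation, we make use there of definable extensions as a means for introducing so-called case-defined functions. We fix a signature $\Sigma$ and a $\Sigma$-theory $T$; a $T$-partition is a finite set $\kappa_1(\underline{x}), \ldots, \kappa_n(\underline{x})$ of quantifier-free formulae such that $T \models \forall \underline{x} \bigvee_{i=1}^{n} \kappa_i(\underline{x})$ and $T \models \bigwedge_{i \neq j} \forall \underline{x}\, \neg(\kappa_i(\underline{x}) \wedge \kappa_j(\underline{x}))$. Given such a $T$-partition $\kappa_1(\underline{x}), \ldots, \kappa_n(\underline{x})$ together with $\Sigma$-terms $t_1(\underline{x}), \ldots, t_n(\underline{x})$ (all of the same target sort), a case-definable extension is the $\Sigma'$-theory $T'$, where $\Sigma' = \Sigma \cup \{F\}$, with $F$ a “fresh” function symbol (i.e., $F \notin \Sigma$; arity and source/target sorts for $F$ can be deduced from the context, considering that everything is well-typed), and $T' = T \cup \bigcup_{i=1}^{n} \{\forall \underline{x}\, (\kappa_i(\underline{x}) \to F(\underline{x}) = t_i(\underline{x}))\}$. Intuitively, $F$ represents a case-defined function, which can be reformulated using nested if-then-else expressions and can be written as

$$F(\underline{x}) := \texttt{case of}\ \{\kappa_1(\underline{x}) : t_1;\ \cdots;\ \kappa_n(\underline{x}) : t_n\}.$$

By abuse of notation, we identify $T$ with any of its case-definable extensions $T'$. In fact, it is easy to produce from a $\Sigma'$-formula $\phi'$ a $\Sigma$-formula $\phi$ equivalent to $\phi'$ in all models of $T'$: just remove (in the appropriate order) every occurrence $F(\underline{v})$ of the new symbol $F$ in an atomic formula $A$, by replacing $A$ with $\bigvee_{i=1}^{n} (\kappa_i(\underline{v}) \wedge A(t_i(\underline{v})))$. We also exploit $\lambda$-abstractions (see, e.g., formula (6) below) for a more compact (still first-order) representation of some complex expressions, and always use them in atoms like $b = \lambda y. F(y, \underline{z})$ as abbreviations of $\forall y.\ b(y) = F(y, \underline{z})$ (where, typically, $F$ is a symbol introduced in a case-defined extension as above).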
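To make the notion of a case-defined function concrete, here is a minimal sketch in which guards and terms are ordinary Python callables. The helper names `is_partition` and `case_defined` are hypothetical, and the partition check is only a sanity test over a finite sample domain: it does not establish the two entailments modulo the theory that the definition requires.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Guard = Callable[..., bool]
Term = Callable[..., object]


def is_partition(guards: List[Guard], domain: Iterable, arity: int) -> bool:
    """Check that exactly one guard holds for every tuple over `domain`.

    Finite sanity check only; the definition asks for exhaustiveness and
    mutual exclusion in every model of the theory.
    """
    values = list(domain)
    return all(
        sum(1 for g in guards if g(*xs)) == 1
        for xs in product(values, repeat=arity)
    )


def case_defined(cases: List[Tuple[Guard, Term]]) -> Term:
    """Build F with F(xs) = t_i(xs) for the unique guard kappa_i true on xs."""
    def F(*xs):
        for guard, term in cases:
            if guard(*xs):
                return term(*xs)
        raise ValueError("guards are not exhaustive on this input")
    return F


# Example: the maximum of two numbers as a three-way case distinction.
max_cases = [
    (lambda x, y: x > y, lambda x, y: x),
    (lambda x, y: x < y, lambda x, y: y),
    (lambda x, y: x == y, lambda x, y: x),
]
maximum = case_defined(max_cases)
```

Because the guards are mutually exclusive, the order in which the cases are listed does not affect the result.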
3 Read-only Database Schemas
We now provide a formal definition of (read-only) DB-schemas by relying on an algebraic, functional characterization, and derive some key model-theoretic properties.
Definition 3.1. A DB schema is a pair $\langle \Sigma, T \rangle$, where: (i) $\Sigma$ is a DB signature, that is, a finite multi-sorted signature whose only symbols are equality, unary functions, and constants; (ii) $T$ is a DB theory, that is, a set of universal $\Sigma$-sentences.
Next, we refer to a DB schema simply through its (DB) signature $\Sigma$ and (DB) theory $T$, and denote by $\Sigma_{srt}$ the set of sorts and by $\Sigma_{fun}$ the set of functions in $\Sigma$. Since $\Sigma$ contains only unary function symbols and equality, all atomic $\Sigma$-formulae are of the form $t_1(v_1) = t_2(v_2)$, where $t_1$, $t_2$ are possibly complex terms, and $v_1$, $v_2$ are either variables or constants.
If desired, we can freely extend DB schemas by adding arbitrary $n$-ary relation symbols to the signature $\Sigma$. For this purpose, we give the following definition.
Definition 3.2. A DB extended-schema is a pair $\langle \Sigma, T \rangle$, where: (i) $\Sigma$ is a DB extended-signature, that is, a finite multi-sorted signature whose only symbols are equality, $n$-ary relations, unary functions, and constants; (ii) $T$ is a DB extended-theory, that is, a set of universal $\Sigma$-sentences.
Since for our application we are only interested in relations with primary and foreign key dependencies (even if our implementation also takes into account the case of “free” relations, i.e., relations without key dependencies), we restrict our focus to DB schemas, which are sufficient to capture those constraints (as explained in the following subsection). We notice that, in case Assumption 3.4 discussed below holds for DB extended-theories, all the results presented in Section 4 (and Theorem 5.1) still hold when considering DB extended-schemas instead of DB schemas.
We associate to a DB signature $\Sigma$ a characteristic graph $G(\Sigma)$ capturing the dependencies induced by functions over sorts. (The same definition can be adopted also for extended DB signatures; relation symbols do not play a role in it.) Specifically, $G(\Sigma)$ is an edge-labeled graph whose set of nodes is $\Sigma_{srt}$, and with a labeled edge $S \xrightarrow{f} S'$ for each $f : S \longrightarrow S'$ in $\Sigma_{fun}$. We say that $\Sigma$ is acyclic if $G(\Sigma)$ is so. The leaves of $\Sigma$ are the nodes of $G(\Sigma)$ without outgoing edges. These terminal sorts are divided into two subsets, respectively representing unary relations and value sorts. Non-value sorts (i.e., unary relations and non-leaf sorts) are called id sorts, and are conceptually used to represent (identifiers of) different kinds of objects. Value sorts, instead, represent datatypes such as strings, numbers, clock values, etc. We denote the set of id sorts in $\Sigma$ by $\Sigma_{ids}$, and that of value sorts by $\Sigma_{val}$, hence $\Sigma_{srt} = \Sigma_{ids} \uplus \Sigma_{val}$.
We now consider extensional data.
Definition 3.3. A DB instance of DB schema $\langle \Sigma, T \rangle$ is a $\Sigma$-structure $\mathcal{M}$ that is a model of $T$ and such that every id sort of $\Sigma$ is interpreted in $\mathcal{M}$ on a finite set.
Contrast this with arbitrary models of $T$, where no finiteness assumption is made. What may appear as not customary in Definition 3.3 is the fact that value sorts can be interpreted on infinite sets. This allows us, at once, to reconstruct the classical notion of DB instance as a finite model (since only finitely many values can be pointed to from id sorts using functions), while at the same time supplying a potentially infinite set of fresh values to be dynamically introduced in the working memory during the evolution of the artifact system. More details on this will be given in Section 3.1.
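We respectively denote by $S^{\mathcal{M}}$, $f^{\mathcal{M}}$, and $c^{\mathcal{M}}$ the interpretation in $\mathcal{M}$ of the sort $S$ (this is a set), of the function symbol $f$ (this is a set-theoretic function), and of the constant $c$ (this is an element of the interpretation of the corresponding sort). Obviously, $f^{\mathcal{M}}$ and $c^{\mathcal{M}}$ must match the sorts in $\Sigma$. E.g., if $f$ has source $S$ and target $U$, then $f^{\mathcal{M}}$ has domain $S^{\mathcal{M}}$ and range $U^{\mathcal{M}}$.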
The human resource (HR) branch of a company stores the following information inside a relational database: (i) users registered to the company website, who are potential job applicants; (ii) the different, available job categories; (iii) employees belonging to HR, together with the job categories they are competent in. To formalize these different aspects, we make use of a DB signature consisting of: (i) four id sorts, used to respectively identify users, employees, job categories, and the competence relationship connecting employees to job categories; (ii) one value sort containing strings used to name users and employees, and describe job categories. In addition, contains five function symbols mapping: (i) user identifiers to their corresponding names; (ii) employee identifiers to their corresponding names; (iii) job category identifiers to their corresponding descriptions; (iv) competence identifiers to their corresponding employees and job categories. The characteristic graph of is shown in Figure 1 (left part).
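The characteristic graph of this example can be computed mechanically from the signature. The sketch below encodes the HR signature as plain dictionaries and checks acyclicity with a depth-first search. All sort and function names (`UserId`, `who`, `what`, and so on) are hypothetical labels chosen for illustration, since the text above describes the symbols only in words.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical names: function symbol -> (source sort, target sort).
HR_FUNCTIONS: Dict[str, Tuple[str, str]] = {
    "userName": ("UserId", "String"),
    "empName": ("EmpId", "String"),
    "jobCatDescr": ("JobCatId", "String"),
    "who": ("CompInId", "EmpId"),
    "what": ("CompInId", "JobCatId"),
}
HR_SORTS: Set[str] = {"UserId", "EmpId", "JobCatId", "CompInId", "String"}

Graph = Dict[str, List[Tuple[str, str]]]


def characteristic_graph(sorts: Set[str], functions: Dict[str, Tuple[str, str]]) -> Graph:
    """Nodes are sorts; each f : S -> S' yields an edge S --f--> S'."""
    graph: Graph = {s: [] for s in sorts}
    for name, (source, target) in functions.items():
        graph[source].append((name, target))
    return graph


def is_acyclic(graph: Graph) -> bool:
    """Depth-first search that reports a cycle when it meets a node on the stack."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {s: WHITE for s in graph}

    def visit(s: str) -> bool:
        colour[s] = GREY
        for _, t in graph[s]:
            if colour[t] == GREY:
                return False
            if colour[t] == WHITE and not visit(t):
                return False
        colour[s] = BLACK
        return True

    return all(colour[s] != WHITE or visit(s) for s in graph)


def leaves(graph: Graph) -> Set[str]:
    """Sorts without outgoing edges."""
    return {s for s, out in graph.items() if not out}
```

With these names, the only leaf is the value sort holding strings, and every other sort is an id sort.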
We close the formalization of DB schemas by discussing DB theories, whose role is to encode background axioms. We illustrate a typical background axiom, required to handle the possible presence of undefined identifiers/values in the different sorts. This axiom is essential to capture artifact systems whose working memory is initially undefined, in the style of [27, 37]. To specify an undefined value we add to every sort $S$ of $\Sigma$ a constant $\texttt{undef}_S$ (written from now on, by abuse of notation, just as undef, used also to indicate a tuple). Then, for each function symbol $f$ of $\Sigma$, we add the following axiom to the DB theory:

$$\forall x\, (x = \texttt{undef} \leftrightarrow f(x) = \texttt{undef}) \qquad (1)$$

This axiom states that the application of $f$ to the undefined value produces an undefined value, and that this is the only situation in which $f$ is undefined.
In the artifact-centric model in the style of [27, 37] that we intend to capture, the DB theory consists of Axioms (1) only. However, our technical results do not require this specific choice, and more general sufficient conditions will be discussed later. These conditions apply to natural variants of Axiom (1) (such variants might be used to model situations where we would like to have for instance many undefined values).
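As a small illustration of Axiom (1), read as "an argument is undefined exactly when its image is undefined", the following sketch checks the condition on a finite interpretation of a single function symbol. Modelling undef as Python's `None` and functions as dictionaries is an assumption made here purely for illustration.

```python
from typing import Dict, Hashable, Optional

UNDEF = None  # assumption: the undefined value is modelled as None


def satisfies_undef_axiom(f: Dict[Optional[Hashable], Optional[Hashable]]) -> bool:
    """Check  x = undef  <->  f(x) = undef  for every x in the domain of f."""
    return all((x is UNDEF) == (fx is UNDEF) for x, fx in f.items())


# A function mapping user identifiers to names (hypothetical data).
user_name = {UNDEF: UNDEF, "u1": "Alice", "u2": "Bob"}
```

An interpretation that maps a proper identifier to undef, or undef to a proper value, violates the axiom.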
3.1 Relational View of DB Schemas
We now clarify how the algebraic, functional characterization of DB schema and instance can be actually reinterpreted in the classical, relational model. Definition 3.1 naturally corresponds to the definition of relational database schema equipped with single-attribute primary keys and foreign keys (plus a reformulation of constraint (1)). To technically explain the correspondence, we adopt the named perspective, where each relation schema is defined by a signature containing a relation name and a set of typed attribute names. Let $\langle \Sigma, T \rangle$ be a DB schema. Each id sort $S \in \Sigma_{ids}$ corresponds to a dedicated relation $R_S$ with the following attributes: (i) one identifier attribute $id_S$ with type $S$; (ii) one dedicated attribute $a_f$ with type $S'$ for every function symbol $f \in \Sigma_{fun}$ of the form $f : S \longrightarrow S'$.
The fact that $R_S$ is built starting from functions in $\Sigma$ naturally induces different database dependencies in $R_S$. In particular, for each non-id attribute $a_f$ of $R_S$, we get a functional dependency from $id_S$ to $a_f$; altogether, such dependencies in turn witness that $id_S$ is the (primary) key of $R_S$. In addition, for each non-id attribute $a_f$ of $R_S$ whose corresponding function symbol $f$ has id sort $S'$ as image, we get an inclusion dependency from $a_f$ to the id attribute $id_{S'}$ of $R_{S'}$; this captures that $a_f$ is a foreign key referencing $R_{S'}$.
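Given a DB instance $\mathcal{M}$ of $\langle \Sigma, T \rangle$, its corresponding relational instance $\mathcal{I}$ is the minimal set satisfying the following property: for every id sort $S \in \Sigma_{ids}$, let $f_1, \ldots, f_n$ be all functions in $\Sigma$ with domain $S$; then, for every identifier $o \in S^{\mathcal{M}}$, $\mathcal{I}$ contains a labeled fact of the form $R_S(id_S : o,\ a_{f_1} : f_1^{\mathcal{M}}(o),\ \ldots,\ a_{f_n} : f_n^{\mathcal{M}}(o))$. With this interpretation, the active domain of $\mathcal{I}$ is the set

$$\bigcup_{S \in \Sigma_{ids}} \bigl(S^{\mathcal{M}} \setminus \{\texttt{undef}^{\mathcal{M}}\}\bigr) \;\cup\; \Bigl\{ v \in \bigcup_{V \in \Sigma_{val}} V^{\mathcal{M}} \;\Big|\; v \neq \texttt{undef}^{\mathcal{M}} \text{ and } v = f^{\mathcal{M}}(o) \text{ for some } f \in \Sigma_{fun} \text{ and } o \in \mathrm{dom}(f^{\mathcal{M}}) \Bigr\}$$

consisting of all (proper) identifiers assigned by $\mathcal{M}$ to id sorts, as well as all values obtained in $\mathcal{M}$ via the application of some function. Since such values are necessarily finitely many, one may wonder why in Definition 3.3 we allow for interpreting value sorts over infinite sets. The reason is that, in our framework, an evolving artifact system may use such infinite provision to inject and manipulate new values into the working memory. From the definition of active domain above, exploiting Axioms (1) we get that the membership of a tuple $\langle x_0, \ldots, x_n \rangle$ to a generic $(n+1)$-ary relation $R$ with key dependencies (corresponding to an id sort $S$) can be expressed in our setting by using just unary function symbols and equality:

$$R(x_0, \ldots, x_n) \iff x_0 \neq \texttt{undef} \wedge x_1 = f_1(x_0) \wedge \cdots \wedge x_n = f_n(x_0) \qquad (2)$$

Hence, the representation of negated atoms is the one that directly follows from negating (2):

$$\neg R(x_0, \ldots, x_n) \iff x_0 = \texttt{undef} \vee x_1 \neq f_1(x_0) \vee \cdots \vee x_n \neq f_n(x_0) \qquad (3)$$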
This relational interpretation of DB schemas exactly reconstructs the requirements posed by [27, 37] on the schema of the read-only database: (i) each relation schema has a single-attribute primary key; (ii) attributes are typed; (iii) attributes may be foreign keys referencing other relation schemas; (iv) the primary keys of different relation schemas are pairwise disjoint.
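The correspondence between the functional and the relational view can be sketched as a small conversion routine. The following is an illustrative sketch only: the names `relational_instance` and `in_relation`, the dictionary encoding of function interpretations, the use of `None` for undef, and the choice to skip the undefined identifier when emitting facts are all assumptions of this example.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

UNDEF = None  # assumption: the undefined value is modelled as None
Interp = Dict[Optional[Hashable], Optional[Hashable]]


def relational_instance(
    id_sorts: Dict[str, Set[Optional[Hashable]]],
    functions: Dict[str, Tuple[str, str]],
    interp: Dict[str, Interp],
) -> Set[tuple]:
    """Emit one fact (sort, id, f_1(id), ..., f_n(id)) per proper identifier.

    Attributes are ordered by function name so that facts are deterministic.
    """
    facts = set()
    for sort, ids in id_sorts.items():
        attrs = sorted(n for n, (src, _) in functions.items() if src == sort)
        for ident in ids:
            if ident is UNDEF:
                continue  # the undefined identifier does not denote a tuple
            facts.add((sort, ident) + tuple(interp[a][ident] for a in attrs))
    return facts


def in_relation(fs: List[Interp], key, values) -> bool:
    """Membership test: key is defined and each value equals f_i(key)."""
    return key is not UNDEF and all(
        f.get(key, UNDEF) == v for f, v in zip(fs, values)
    )
```

The membership test needs no stored relation at all: it only evaluates the unary functions on the candidate key, which is the point of the functional encoding.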
We stress that all such requirements are natively captured in our functional definition of a DB signature, and do not need to be formulated as axioms in the DB theory. The DB theory is used to express additional constraints, like that in Axiom (1). In the following subsection, we thoroughly discuss which properties must be respected by signatures and theories to guarantee that our verification machinery is well-behaved.
One may wonder why we have not directly adopted a relational view for DB schemas. This will become clear during the technical development. We anticipate the main, intuitive reasons. First, our functional view allows us to reconstruct in a single, homogeneous framework, some important results on verification of artifact systems, achieved on different models that have been unrelated so far [12, 27]. Second, our functional view makes the dependencies among different types explicit. In fact, our notion of characteristic graph, which is readily computed from a DB signature, exactly reconstructs the central notion of foreign key graph used in  towards the main decidability results. Finally, we underline, once again, that free -ary relation symbols can be added to our signatures (see Remark 3.1 and Definition 3.2 above) without compromising the results underlying our techniques.
In some situations, it is useful to have many undefined keys and possibly also incomplete relations with some undefined values. In such cases, one can only assume the left-to-right direction of (1), which is equivalent to the ground axiom

$$f(\texttt{undef}) = \texttt{undef} \qquad (4)$$
In order to preserve the condition of being a foreign key (i.e., the requirement that, for each non-id attribute of a relation whose corresponding function symbol has id sort as image, we want an inclusion dependency from to the id attribute of the relation ), the axioms
are also needed.
3.2 Formal Properties of DB Schemas
The theory $T$ from Definition 3.1 must satisfy a few crucial requirements for our approach to work. In this section, we define such requirements and show that they are matched, e.g., when the signature $\Sigma$ is acyclic (as in ) and $T$ consists of Axioms (1) only. Actually, acyclicity is a stronger requirement than needed, which, however, simplifies our exposition.
Finite Model Property. A $\Sigma$-formula $\phi$ is a $\Sigma$-constraint (or just a constraint) iff it is a conjunction of literals. The constraint satisfiability problem for $T$ asks: given an existential formula $\exists \underline{y}\, \phi(\underline{x}, \underline{y})$ (with $\phi$ a constraint; for the purposes of this definition, we may equivalently take $\phi$ to be quantifier-free), are there a model $\mathcal{M}$ of $T$ and an assignment $\alpha$ to the free variables $\underline{x}$ such that $\mathcal{M}, \alpha \models \exists \underline{y}\, \phi(\underline{x}, \underline{y})$?
We say that $T$ has the finite model property (for constraint satisfiability) iff every constraint $\phi$ that is satisfiable in a model of $T$ is satisfiable in a DB instance of $\langle \Sigma, T \rangle$. (This directly implies that $\phi$ is satisfiable also in a DB instance that interprets value sorts into finite sets.) The finite model property implies decidability of the constraint satisfiability problem in case $T$ is recursively axiomatized. The following is proved in Appendix B:
Proposition 3.1. $T$ has the finite model property in case $\Sigma$ is acyclic.
Quantifier Elimination. A $\Sigma$-theory $T$ has quantifier elimination iff for every $\Sigma$-formula $\phi(\underline{x})$ there is a quantifier-free formula $\phi'(\underline{x})$ such that $T \models \phi(\underline{x}) \leftrightarrow \phi'(\underline{x})$. It is known that quantifier elimination holds if quantifiers can be eliminated from primitive formulae, i.e., formulae of the kind $\exists \underline{y}\, \phi(\underline{x}, \underline{y})$, with $\phi$ a constraint. We assume that when quantifier elimination is considered, there is an effective procedure that eliminates quantifiers.
A DB theory $T$ does not necessarily have quantifier elimination; it is however often possible to strengthen $T$ in a conservative way (with respect to constraint satisfiability) and get quantifier elimination. We say that $T$ has a model completion iff there is a stronger theory $T^* \supseteq T$ (still within the same signature $\Sigma$ of $T$) such that (i) every $\Sigma$-constraint satisfiable in a model of $T$ is also so in a model of $T^*$; (ii) $T^*$ has quantifier elimination. $T^*$ is called a model completion of $T$.
Proposition 3.2. $T$ has a model completion in case it is axiomatized by universal one-variable formulae and $\Sigma$ is acyclic.
In Appendix B we prove the above proposition and give an algorithm for quantifier elimination. This algorithm can be improved (and behaves much better than its linear arithmetic counterparts) using a suitable version of the Knuth-Bendix procedure (studied in a dedicated paper, even if our mcmt implementation already partially takes into account such future development). Moreover, acyclicity is not needed in general: when, for instance, $T$ is empty or when $T$ contains only Axioms (1), a model completion can be proved to exist, even if $\Sigma$ is not acyclic, by using the Knuth-Bendix version of the quantifier elimination algorithm.
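As a small worked illustration of our own (it is not taken from the appendix), consider a signature with two distinct sorts $S$, $S'$, a single function symbol $f : S \longrightarrow S'$, and the constants undef, and let $T$ state only that an argument of $f$ is undefined exactly when its image is. Write $T^*$ for a model completion of $T$. The primitive formula below has no quantifier-free equivalent modulo $T$: a model of $T$ may or may not contain a proper preimage of a given proper element $x$, and no quantifier-free formula over $x$ alone can tell the two situations apart. In models of $T^*$, instead, such a preimage always exists whenever $x$ is proper, while the axiom forbids it when $x$ is undefined, so the quantifier can be eliminated:

```latex
T^{*} \models\;
\exists y\, \bigl( y \neq \texttt{undef} \,\wedge\, f(y) = x \bigr)
\;\leftrightarrow\;
x \neq \texttt{undef}
```

The argument relies on the assumption that any model of $T$ can be extended with a fresh element of sort $S$ mapped by $f$ to $x$, which is unproblematic here because the signature is acyclic.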
Proposition 3.2 holds also for DB extended-schemas, in case the universal one-variable formulae do not involve the relation symbols (so, the relations are “free”): as explained in , our implementation of the quantifier elimination algorithm takes into account also this case. More generally, the model completion exists whenever we consider an acyclic DB extended-schema with a DB extended-theory that enjoys the amalgamation property.
Hereafter, we make the following assumption:
Assumption 3.4. The DB theories we consider have decidable constraint satisfiability problem, finite model property, and admit a model completion.
This assumption is matched, for instance, in the following three cases: (i) when $T$ is empty; (ii) when $T$ is axiomatized by Axioms (1); (iii) when $\Sigma$ is acyclic and $T$ is axiomatized by finitely many universal one-variable formulae (such as Axioms (1), (4), (5), etc.).
Notice that the DB extended-schemas obtained by adding “free” relations to the DB schemas of (i), (ii), (iii) above match Assumption 3.4.
4 Relational Artifact Systems
We are now in the position to define our formal model of Relational Artifact Systems (RASs), and to study parameterized safety problems over RASs. Since RASs are array-based systems, we start by recalling the intuition behind them.
In general terms, an array-based system is described using a multi-sorted theory that contains two types of sorts, one accounting for the indexes of arrays, and the other for the elements stored therein. Since the content of an array changes over time, it is referred to using a second-order function variable, whose interpretation in a state is that of a total function mapping indexes to elements (so that applying the function to an index denotes the classical read operation for arrays). The definition of an array-based system with array state variable $a$ always requires: a formula $I(a)$ describing the initial configuration of the array $a$, and a formula $\tau(a, a')$ describing a transition that transforms the content of the array from $a$ to $a'$. In such a setting, verifying whether the system can reach unsafe configurations described by a formula $K(a)$ amounts to checking whether the formula $I(a_0) \wedge \tau(a_0, a_1) \wedge \cdots \wedge \tau(a_{n-1}, a_n) \wedge K(a_n)$ is satisfiable for some $n$. Next, we make these ideas formally precise by grounding array-based systems in the artifact-centric setting.
The RAS Formal Model. Following the tradition of artifact-centric systems [26, 23, 12, 27], a RAS consists of a read-only DB, a read-write working memory for artifacts, and a finite set of actions (also called services) that inspect the relational database and the working memory, and determine the new configuration of the working memory. In a RAS, the working memory consists of individual and higher-order variables. These variables (usually called arrays) are supposed to model evolving relations, so-called artifact relations in [27, 37]. The idea is to treat artifact relations in the same uniform way as the read-only DB: we need extra sort symbols (recall that each sort symbol corresponds to a database relation symbol) and extra unary function symbols, the latter being treated as second-order variables.
Given a DB schema $\langle \Sigma, T \rangle$, an artifact extension of $\Sigma$ is a signature $\Sigma_{ext}$ obtained from $\Sigma$ by adding to it some extra sort symbols (by ‘signature’ we always mean ‘signature with equality’, so as soon as new sorts are added, the corresponding equality predicates are added too). These new sorts (usually indicated with letters $E, F, \ldots$) are called artifact sorts (or artifact relations by some abuse of terminology), while the old sorts from $\Sigma$ are called basic sorts. In RASs, artifact and basic sorts correspond, respectively, to the index and the element sorts mentioned in the literature on array-based systems. Below, given $\langle \Sigma, T \rangle$ and an artifact extension $\Sigma_{ext}$ of $\Sigma$, when we speak of a $\Sigma_{ext}$-model of $T$, a DB instance of $\langle \Sigma_{ext}, T \rangle$, or a $\Sigma_{ext}$-model of $T^*$, we mean a $\Sigma_{ext}$-structure $\mathcal{M}$ whose reduct to $\Sigma$ respectively is a model of $T$, a DB instance of $\langle \Sigma, T \rangle$, or a model of $T^*$.
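An artifact setting over $\Sigma_{ext}$ is a pair $(\underline{x}, \underline{a})$ given by a finite set $\underline{x}$ of individual variables and a finite set $\underline{a}$ of unary function variables: the latter are required to have an artifact sort as source sort and a basic sort as target sort. Variables in $\underline{x}$ are called artifact variables, and variables in $\underline{a}$ artifact components. Given a DB instance $\mathcal{M}$ of $\Sigma_{ext}$, an assignment to an artifact setting $(\underline{x}, \underline{a})$ over $\Sigma_{ext}$ is a map $\alpha$ assigning to every artifact variable $x_i \in \underline{x}$ of sort $S_i$ an element $x_i^{\alpha} \in S_i^{\mathcal{M}}$ and to every artifact component $a_j : E_j \longrightarrow U_j$ (with $a_j \in \underline{a}$) a set-theoretic function $a_j^{\alpha} : E_j^{\mathcal{M}} \longrightarrow U_j^{\mathcal{M}}$. In RASs, artifact components and artifact variables correspond, respectively, to arrays and constant arrays (i.e., arrays with all equal elements) mentioned in the literature on array-based systems.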
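We can view an assignment to an artifact setting $(\underline{x}, \underline{a})$ as a DB instance extending the DB instance $\mathcal{M}$ as follows. Let all the artifact components in $(\underline{x}, \underline{a})$ having source $E$ be $a_{i_1} : E \longrightarrow S_1, \ldots, a_{i_n} : E \longrightarrow S_n$. Viewed as a relation in the artifact assignment $(\mathcal{M}, \alpha)$, the artifact relation $E$ “consists” of the set of tuples $\{\langle e, a_{i_1}^{\alpha}(e), \ldots, a_{i_n}^{\alpha}(e) \rangle \mid e \in E^{\mathcal{M}}\}$. Thus each element of $E$ is formed by an “entry” $e \in E^{\mathcal{M}}$ (uniquely identifying the tuple) and by “data” $\underline{a}_i^{\alpha}(e)$ taken from the read-only database $\mathcal{M}$. When the system evolves, the set $E^{\mathcal{M}}$ of entries remains fixed, whereas the components $\underline{a}_i^{\alpha}(e)$ may change: typically, we initially have $\underline{a}_i^{\alpha}(e) = \texttt{undef}$, but these values are changed when some defined values are inserted into the relation modeled by $E$; the values are then repeatedly modified (and possibly also reset to undef, if the tuple is removed and $e$ is re-set to point to undefined values). In accordance with mcmt conventions, we denote the application of an artifact component $a$ to a term $t$ (i.e., constant or variable) also as $a[t]$ (standard notation for arrays), instead of $a(t)$.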
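The reading of an artifact relation as a bundle of arrays over a fixed set of entries can be sketched as follows. This is a toy model under explicit assumptions: the class name `ArtifactRelation`, the component names used in the usage note, and the policy of inserting into the first free entry are all hypothetical, and the set of entries is given up front, whereas verification ranges over every finite size of it.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Set

UNDEF = None  # assumption: the undefined value is modelled as None


class ArtifactRelation:
    """An artifact sort E with one array a_i : E -> basic sort per component."""

    def __init__(self, entries: Iterable[Hashable], components: Iterable[str]):
        self.entries: List[Hashable] = list(entries)  # fixed during evolution
        # Initially every component maps every entry to undef.
        self.arrays: Dict[str, Dict[Hashable, Optional[Hashable]]] = {
            c: {e: UNDEF for e in self.entries} for c in components
        }

    def is_free(self, e: Hashable) -> bool:
        return all(array[e] is UNDEF for array in self.arrays.values())

    def insert(self, values: Dict[str, Hashable]) -> Hashable:
        """Write the given values at the first entry that is entirely undef."""
        for e in self.entries:
            if self.is_free(e):
                for component, value in values.items():
                    self.arrays[component][e] = value
                return e
        raise RuntimeError("no free entry available")

    def remove(self, e: Hashable) -> None:
        """Removing a tuple resets all components of its entry to undef."""
        for array in self.arrays.values():
            array[e] = UNDEF

    def tuples(self) -> Set[tuple]:
        """The relation seen as the set of tuples <e, a_1[e], ..., a_n[e]>."""
        return {
            (e,) + tuple(array[e] for array in self.arrays.values())
            for e in self.entries
        }
```

For instance, `ArtifactRelation(["e1", "e2"], ["applicant", "score"])` starts with two entries pointing to undefined data, and `insert` and `remove` only rewrite array contents while the entries themselves never change.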
In order to introduce verification problems in the symbolic setting of array-based systems, one first has to specify which formulae are used to represent (i) sets of states, (ii) the system initializations, and (iii) system evolution. To introduce RASs we discuss the kind of formulae we use. In such formulae, we use notations like $\phi(\underline{z}, \underline{a})$ to mean that $\phi$ is a formula whose free individual variables are among the $\underline{z}$ and whose free unary function variables are among the $\underline{a}$. Let $(\underline{x}, \underline{a})$ be an artifact setting over $\Sigma_{ext}$, where $\underline{x} = x_1, \ldots, x_n$ are the artifact variables and $\underline{a} = a_1, \ldots, a_m$ are the artifact components (their source and target sorts are left implicit).
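An initial formula is a formula $\iota(\underline{x}, \underline{a})$ of the form $(\bigwedge_{i=1}^{n} x_i = c_i) \wedge (\bigwedge_{j=1}^{m} a_j = \lambda y. d_j)$ (recall that $a_j = \lambda y. d_j$ abbreviates $\forall y\, a_j(y) = d_j$), where $c_i$, $d_j$ are constants from $\Sigma$ (typically, $c_i$ and $d_j$ are undef). A state formula has the form $\exists \underline{e}\, \phi(\underline{e}, \underline{x}, \underline{a})$, where $\phi$ is quantifier-free and the $\underline{e}$ are individual variables of artifact sorts. A transition formula $\hat{\tau}$ has the form

$$\exists \underline{e}\, \Bigl( \gamma(\underline{e}, \underline{x}, \underline{a}) \wedge \bigwedge_{i} x_i' = F_i(\underline{e}, \underline{x}, \underline{a}) \wedge \bigwedge_{j} a_j' = \lambda y. G_j(y, \underline{e}, \underline{x}, \underline{a}) \Bigr) \qquad (6)$$

where the $\underline{e}$ are individual variables (of both basic and artifact sorts), $\gamma$ (the ‘guard’) is quantifier-free, $\underline{x}'$, $\underline{a}'$ are renamed copies of $\underline{x}$, $\underline{a}$, and the $F_i$, $G_j$ (the ‘updates’) are case-defined functions. Transition formulae as above can express, e.g., (i) insertion (with/without duplicates) of a tuple in an artifact relation, (ii) removal of a tuple from an artifact relation, (iii) transfer of a tuple from an artifact relation to artifact variables (and vice-versa), and (iv) bulk removal/update of all the tuples satisfying a certain condition from an artifact relation. All the above operations can also be constrained: their formalization as transition formulae is straightforward (the reader can find all the details in Appendix F).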
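To give a feel for how a guard and case-defined updates fit together, the sketch below executes one bulk update on a concrete state. It is a hand-written illustration, not the encoding used in the paper or in mcmt: the variable `pState`, the components `score` and `result`, the threshold, and the function name `bulk_evaluate` are all hypothetical, and this particular transition uses no existentially quantified variables.

```python
from typing import Dict, Hashable, Optional

UNDEF = None  # assumption: the undefined value is modelled as None
State = Dict[str, dict]


def bulk_evaluate(state: State, threshold: int = 80) -> Optional[State]:
    """Apply one transition; return the new state, or None if the guard fails.

    Guard:   pState = "enabled".
    Updates: pState' = "notified"; result' is defined pointwise for every
             entry j by a case distinction on score[j] (a bulk update).
    """
    variables, arrays = state["vars"], state["arrays"]
    if variables["pState"] != "enabled":
        return None  # guard not satisfied: the transition is not applicable

    def result_update(j: Hashable):
        # Case-defined function playing the role of the body of a lambda term.
        if arrays["score"][j] is UNDEF:
            return UNDEF
        return "winner" if arrays["score"][j] > threshold else "loser"

    new_arrays = dict(arrays)  # unchanged components are carried over as-is
    new_arrays["result"] = {j: result_update(j) for j in arrays["result"]}
    return {"vars": {**variables, "pState": "notified"}, "arrays": new_arrays}
```

The update of `result` touches every entry at once, which is what distinguishes a bulk update from a transition that rewrites a single existentially chosen entry.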
A Relational Artifact System (RAS) is
where: (i) is a (read-only) DB schema, (ii) is an artifact extension of , (iii) is an artifact setting over , (iv) is an initial formula, and (v) is a disjunction of transition formulae.
We present here a RAS containing a multi-instance artifact accounting for the evolution of job applications. Each job category may receive multiple applications from registered users. Such applications are then evaluated, finally deciding which to accept or reject. The example is inspired by the job hiring process presented in  to show the intrinsic difficulties of capturing real-life processes with many-to-many interacting business entities using conventional process modeling notations (e.g., BPMN). An extended version of this example is presented in Appendix A.1.
As for the read-only DB, works over the DB schema of Example 3.1, extended with a further value sort used to score job applications. contains values in the range , where -1 denotes the non-eligibility of the application, and a score from 0 to 100 indicates the actual one assigned after evaluating the application. For readability, we use as syntactic sugar usual predicates , , and to compare variables of type .
As for the working memory, consists of two artifacts. The first single-instance job hiring artifact employs a dedicated variable to capture the main phases that the running process goes through: initially, hiring is disabled (), and, if there is at least one registered user in the HR DB, becomes enabled. The second multi-instance artifact accounts for the evolution of user applications. To model applications, we take the DB signature of the read-only HR DB, and enrich it with an artifact extension containing an artifact sort used to index (i.e., “internally” identify) job applications. The management of job applications is then modeled by an artifact setting with: (i) artifact components with domain capturing the artifact relation storing different job applications; (ii) additional individual variables as temporary memory to manipulate the artifact relation. Specifically, each application consists of a job category, the identifier of the applicant user and that of an HR employee responsible for the application, the application score, and the final result (indicating whether the application is accepted or not). These information slots are encapsulated into dedicated artifact components, i.e., function variables with domain that collectively realize the application artifact relation:
We now discuss the relevant transitions for inserting and evaluating job applications. When writing transition formulae, we make the following assumption: if an artifact variable/component is not mentioned at all, it is meant to be updated identically; otherwise, the relevant update function will specify how it is updated. (Footnote 8: Non-deterministic updates can be formalized using existentially quantified variables in the transition.) The insertion of an application into the system can be executed when the hiring process is enabled, and consists of two consecutive steps. To indicate when a step can be applied, also ensuring that the insertion of an application is not interrupted by the insertion of another one, we manipulate a string artifact variable . The first step is executable when is undef, and aims at loading the application data into dedicated artifact variables through the following simultaneous effects: (i) the identifier of the user who wants to submit the application, and that of the targeted job category, are selected and respectively stored into variables and ; (ii) the identifier of an HR employee who becomes responsible for the application is selected and stored into variable , with the requirement that such an employee must be competent in the job category targeted by the application; (iii) evolves into state received. Formally:
The second step transfers the application data into the application artifact relation (using its corresponding function variables), and resets all application-related artifact variables to undef (including , so that new applications can be inserted). For the insertion, a “free” index (i.e., an index pointing to an undefined applicant) is picked. The newly inserted application gets a default score of -1 (“not eligible”), and an undef final result:
Notice that such a transition does not prevent the possibility of inserting exactly the same application twice, at different indexes. If this is not wanted, the transition can be suitably changed so as to guarantee that no two identical applications can coexist in the same artifact relation (see Appendix A.1 for an example).
Each application currently considered as not eligible can be made eligible by assigning a proper score to it:
Finally, application results are computed when the process moves to state notified. This is handled by the bulk transition:
which declares applications with a score above 80 as winning, and the others as losing.
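The four transitions of the example can be mimicked operationally. The following is a minimal sketch, assuming a fixed number of entries and explicit Python state instead of formulae; all identifiers (`HiringState`, `load_application`, the labels `"winner"` and `"loser"`, and so on) are hypothetical names introduced here, not the paper's. Two simplifications are assumed: the free index is chosen deterministically, and the bulk update skips undefined entries.

```python
UNDEF = None  # stands in for the constant undef


class HiringState:
    """Working memory: artifact variables plus the application relation."""

    def __init__(self, size):
        self.p_state = "enabled"      # phase of the hiring artifact
        self.a_state = UNDEF          # guards the two-step insertion
        self.u_id = self.j_id = self.e_id = UNDEF
        # One array per artifact component; the index plays the role of the entry.
        self.app_job = [UNDEF] * size
        self.applicant = [UNDEF] * size
        self.app_resp = [UNDEF] * size
        self.app_score = [UNDEF] * size
        self.app_result = [UNDEF] * size


def load_application(s, user, job, employee, competent_in):
    """Step 1: load the application data into artifact variables."""
    if s.p_state != "enabled" or s.a_state is not UNDEF:
        return False
    if (employee, job) not in competent_in:  # employee must be competent
        return False
    s.u_id, s.j_id, s.e_id, s.a_state = user, job, employee, "received"
    return True


def insert_application(s):
    """Step 2: move the loaded data to a free index and reset the variables."""
    if s.a_state != "received":
        return False
    free = [i for i, a in enumerate(s.applicant) if a is UNDEF]
    if not free:
        return False
    i = free[0]  # assumption: pick the first free index
    s.app_job[i], s.applicant[i], s.app_resp[i] = s.j_id, s.u_id, s.e_id
    s.app_score[i], s.app_result[i] = -1, UNDEF  # default: not eligible
    s.u_id = s.j_id = s.e_id = s.a_state = UNDEF
    return True


def evaluate(s, i, score):
    """Make a non-eligible application eligible by assigning a score."""
    if s.applicant[i] is UNDEF or s.app_score[i] != -1 or not 0 <= score <= 100:
        return False
    s.app_score[i] = score
    return True


def notify(s):
    """Bulk update: scores above 80 win, all other stored applications lose."""
    if s.p_state != "enabled":
        return False
    s.p_state = "notified"
    for i, a in enumerate(s.applicant):
        if a is not UNDEF:  # assumption: undefined entries are left untouched
            s.app_result[i] = "winner" if s.app_score[i] > 80 else "loser"
    return True


s = HiringState(size=2)
load_application(s, "u1", "dev", "e1", {("e1", "dev")})
insert_application(s)
evaluate(s, 0, 85)
notify(s)
print(s.app_result)  # ['winner', None]
```

Each function returns `False` when its guard fails, which corresponds to the transition not being enabled in the current state.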
Parameterized Safety via Backward Reachability. A safety formula for is a state formula describing undesired states of . As usual in array-based systems, we say that is safe with respect to if intuitively the system has no finite run leading from to . Formally, there is no DB-instance of , no , and no assignment in to the variables such that the formula
is true in (here , are renamed copies of , ). The safety problem for is the following: given a safety formula decide whether is safe with respect to .
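Spelled out, the formula whose satisfiability witnesses unsafety has the usual bounded-run shape. The display below is a hedged reconstruction from the prose: the names $\iota$ (initial formula), $\tau$ (transition formula), $\upsilon$ (safety formula), and the superscripted copies of the variables are assumed notation.

```latex
% A run of length k from an initial state to an undesired state (assumed notation).
\iota(\underline{x}^0,\underline{a}^0)
\;\wedge\; \tau(\underline{x}^0,\underline{a}^0,\underline{x}^1,\underline{a}^1)
\;\wedge\; \cdots
\;\wedge\; \tau(\underline{x}^{k-1},\underline{a}^{k-1},\underline{x}^{k},\underline{a}^{k})
\;\wedge\; \upsilon(\underline{x}^{k},\underline{a}^{k})
```

For $k = 0$ no transition conjunct occurs, so the formula reduces to an initial state that is itself undesired.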
Algorithm 1 describes the backward reachability algorithm (or, backward search) for handling the safety problem for . An integral part of the algorithm is to compute symbolic preimages. For that purpose, we define for any and , as the formula . The preimage of the set of states described by a state formula is the set of states described by . (Footnote 9: Notice that, when , then .) in Line 6 is a subprocedure that extends the quantifier elimination algorithm of so as to convert the preimage of a state formula into a state formula (equivalent to it modulo the axioms of ), witnessing its regressability: this is possible since eliminates from primitive formulae the existentially quantified variables over the basic sorts, whereas elimination of quantified variables over artifact sorts is not possible, because these variables occur as arguments of artifact components (see Lemma D.1 and Lemma D.2 in Appendix D for more details). Algorithm 1 computes iterated preimages of and applies to them the above explained quantifier elimination over basic sorts, until a fixpoint is reached or until a set intersecting the initial states (i.e., satisfying ) is found. (Footnote 10: Inclusion (Line 2) and disjointness (Line 3) tests can be discharged via proof obligations to be handled by SMT solvers.) The fixpoint is reached when the test in Line 2 returns unsat, which means that the preimage of the set of the current states is included in the set of states reached by the backward search so far. We obtain the following theorem, proved in Appendix D:
Theorem 4.2. Backward search (cf. Algorithm 1) is effective and partially correct for solving safety problems for RASs. (Footnote 11: Partial correctness means that, when the algorithm terminates, it gives a correct answer. Effectiveness means that all subprocedures in the algorithm can be effectively executed.)
Algorithm 1, to be effective, requires the availability of decision procedures for discharging the satisfiability tests in Lines 2-3. Thanks to the subprocedure , the only formulae we need to test in these lines have a specific form (i.e., -formulae). (Footnote 12: As defined in Appendix D, we call -formulae the ones of the kind , where are variables whose sort is an artifact sort and is quantifier-free.) By our hypotheses in Assumption 3.4, we can freely assume that all the runs we are interested in take place inside models of (where we can eliminate quantifiers binding variables of basic sorts): in fact, a technical lemma (Lemma D.3) shows that formulae of the kind are satisfiable in a model of iff they are satisfiable in a DB instance iff they are satisfiable in a model of . The fact that a preimage of a state formula is a state formula is exploited to make both safety and fixpoint tests effective (in fact, we prove that the entailment between state formulae - and more generally satisfiability of sentences - can be decided via finite instantiation techniques). ∎
Theorem 4.2 shows that backward search is a semi-decision procedure: if the system is unsafe, backward search always terminates and discovers it; if the system is safe, the procedure can diverge (but it is still correct). Notice that the role of quantifier elimination (Line 6 of Algorithm 1) is twofold: (i) It allows one to discharge the fixpoint test of Line 2 (see Lemma D.3). (ii) It ensures termination in significant cases, namely those where (strongly) local formulae, introduced in the next section, are involved.
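The control structure of backward search can be illustrated on a finite, explicit-state system. The sketch below is only an analogy: it assumes states are plain Python values, replaces formulae by sets, and replaces the SMT-discharged tests and quantifier elimination by set operations. The function name and the string results are hypothetical.

```python
def backward_search(initial, transitions, unsafe):
    """Explicit-state analogue of backward reachability.

    initial, unsafe: sets of states; transitions: set of (source, target) pairs.
    """

    def pre(states):
        # Preimage: all states with a one-step successor in `states`.
        return {src for (src, dst) in transitions if dst in states}

    reached = set()         # states already known to lead to `unsafe`
    frontier = set(unsafe)  # current iterated preimage
    while not frontier <= reached:   # fixpoint (inclusion) test
        if frontier & initial:       # safety (disjointness) test
            return "UNSAFE"
        reached |= frontier
        frontier = pre(frontier)
    return "SAFE"


print(backward_search({0}, {(0, 1), (1, 2)}, {2}))  # UNSAFE
print(backward_search({0}, {(0, 1), (3, 2)}, {2}))  # SAFE
```

On a finite state space the loop always terminates; in the symbolic setting the same loop may diverge on safe systems, which is why the termination results of the next section are needed.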
5 Termination Results for RASs
We now present three termination results, two relating RASs to fundamental previous results, and one genuinely novel. All the proofs are given in the appendix.
Termination for “Simple” Artifact Systems. An interesting class of RASs is the one where the working memory consists only of artifact variables (without artifact relations). We call systems of this type SASs (Simple Artifact Systems). For SASs, the following termination result holds.
Theorem 5.1. Let be a DB schema with acyclic. Then, for every SAS , backward search terminates and decides safety problems for in Pspace in the combined size of , , and .
We remark that Theorem 5.1 holds also for DB extended-schemas (so, even adding “free relations” to the DB signatures). Moreover, notice that it can be shown that every existential formula can be turned into the form of Formula (12). Furthermore, we highlight that the proof of the decidability result of Theorem 5.1 requires that the considered background theory : (i) admits a model completion; (ii) is locally finite, i.e., up to -equivalence, there are only finitely many atoms involving a fixed finite number of variables (this condition is implied by acyclicity); (iii) is universal; and (iv) enjoys decidability of constraint satisfiability. Conditions (iii) and (iv) imply that one can decide whether a finite structure is a model of . If (ii) and (iii) hold, it is well-known that (i) is equivalent to amalgamation . Moreover, (ii) alone always holds for relational signatures and (iii) is equivalent to being closed under substructures (this is a standard preservation theorem in model theory ). It follows that arbitrary relational signatures (or locally finite theories in general, even allowing -ary relation and -ary function symbols) require only amalgamability and closure under substructures. Thanks to these observations, Theorem 5.1 is reminiscent of an analogous result in , i.e., Theorem 5, the crucial hypotheses of which are exactly amalgamability and closure under substructures, although the setting in that paper is different (there, key dependencies are not discussed, whereas we are interested only in DB (extended-)theories).
In our first-order setting, we can perform verification in a purely symbolic way, using (semi-)decision procedures provided by SMT-solvers, even when local finiteness fails. As mentioned before, local finiteness is guaranteed in the relational context, but it no longer holds when arithmetic operations are introduced. Note that the theory of a single uninterpreted binary relation (i.e., the theory of directed graphs) has a model completion, whereas it can be easily seen that the theory of one binary relation endowed with primary key dependencies (i.e., the theory of a binary relation which is a partial function) does not, since it is not amalgamable. So, the second distinctive feature of our setting naturally follows from this observation: thanks to our functional representation of DB schemas (with keys), the amalgamation property, required by Theorem 5.1, holds, witnessing that our framework remains well-behaved even in the presence of key dependencies.
Termination with Local Updates. Consider an acyclic signature , a DB theory (satisfying our Assumption 3.4), and an artifact setting over an artifact extension of . We call a state formula local if it is a disjunction of the formulae
and strongly local if it is a disjunction of the formulae
In (8) and (9), is a conjunction of variable equalities and inequalities, , are quantifier-free, and are individual variables varying over artifact sorts. The key limitation of local state formulae is that they cannot compare entries from different tuples of artifact relations: each in (8) and (9) can contain only the existentially quantified variable .
A transition formula is local (resp., strongly local) if whenever a formula is local (resp., strongly local), so is (modulo the axioms of ). Examples of (strongly) local are discussed in Appendix F.
Theorem 5.2. If is acyclic, backward search (cf. Algorithm 1) terminates when applied to a local safety formula in a RAS whose is a disjunction of local transition formulae.
Let be , i.e., expanded with function symbols and constants ( and are treated as symbols of , but not as variables anymore). We call a -structure cyclic if it is generated by one element belonging to the interpretation of an artifact sort. (Footnote 13: This is unrelated to cyclicity of defined in Section 3, and comes from universal algebra terminology.) Since is acyclic, so is , and then one can show that there are only finitely many cyclic -structures up to isomorphism. With a -structure we associate the tuple of numbers counting the numbers of elements generating (as singletons) the cyclic substructures isomorphic to , respectively. Then we show that, if the tuple associated with is componentwise bigger than the one associated with , then satisfies all the local formulae satisfied by . Finally, we apply Dickson's Lemma . ∎
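The last step of this argument relies on the componentwise order on tuples of natural numbers. As a rough illustration (not the authors' proof), the sketch below represents an upward-closed set of tuples by its minimal elements. Dickson's Lemma guarantees that such a basis is always finite and that, as the set grows, the basis can change only finitely many times. The function names are hypothetical.

```python
def leq(u, v):
    """Componentwise order on equal-length tuples of natural numbers."""
    return all(a <= b for a, b in zip(u, v))


def add_to_basis(basis, t):
    """Add tuple t to the upward-closed set generated by `basis`.

    Returns (new_basis, changed). `changed` is False when t is already
    covered, i.e. some basis element is componentwise below t.
    """
    if any(leq(m, t) for m in basis):
        return basis, False
    # t is new: drop the basis elements that t now covers, then add t.
    return [m for m in basis if not leq(t, m)] + [t], True


basis = []
for t in [(2, 1), (3, 1), (1, 3), (1, 1)]:
    basis, changed = add_to_basis(basis, t)
    print(t, changed, basis)
# (2, 1) True [(2, 1)]
# (3, 1) False [(2, 1)]
# (1, 3) True [(2, 1), (1, 3)]
# (1, 1) True [(1, 1)]
```

In the setting of the proof, a tuple that is already covered contributes no new local formula, which is the intuition behind the stabilization of the iterated preimages.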
Note that Theorem 5.2 can be used to reconstruct the decidability results of  concerning safety problems. Specifically, one needs to show that transitions in  are strongly local which, in turn, can be shown using quantifier elimination (see Appendix F for more details). Interestingly, Theorem 5.2 can be applied to more cases not covered in . For example, one can provide transitions enforcing updates over unboundedly many tuples (bulk updates) that are strongly local (cf. Appendix F). One can also see that the safety problem for our running example is decidable since all its transitions are strongly local. Another case considers coverability problems for broadcast protocols [30, 25], which can be encoded using local formulae over the trivial one-sorted signature containing just one basic sort, finitely many constants and one artifact sort with one artifact component. These problems can be decided with a non-primitive recursive lower bound  (whereas the problems in  have an ExpSpace upper bound). Recalling that  handles verification of LTL-FO, thus going beyond safety problems, this shows that the two settings are incomparable. Notice that Theorem 5.2 implies also the decidability of the safety problem for SASs, in case of acyclic.
Termination for Tree-like Signatures. is tree-like if it is acyclic and all non-leaf nodes have outdegree 1. An artifact setting over is tree-like if is tree-like. In tree-like artifact settings, artifact relations have a single “data” component, and basic relations are unary or binary.
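The tree-like condition is a purely graph-theoretic check. The sketch below assumes the graph associated with the signature is given as a dictionary from each sort to the list of target sorts of the functions leaving it, with one list entry per function symbol; this encoding, the function name, and the sort names in the usage lines are assumptions made for illustration.

```python
def is_tree_like(edges):
    """Check acyclicity and that every non-leaf node has outdegree 1.

    edges: dict mapping a node to the list of its successors
    (nodes appearing only as successors are leaves).
    """
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    # Outdegree condition: a node is either a leaf or has exactly one edge.
    if any(len(edges.get(n, [])) > 1 for n in nodes):
        return False
    # With outdegree at most 1, a cycle exists iff following the unique
    # successor chain from some node revisits a node.
    for start in nodes:
        seen = {start}
        current = start
        while edges.get(current):
            current = edges[current][0]
            if current in seen:
                return False
            seen.add(current)
    return True


print(is_tree_like({"appIndex": ["UserId"], "UserId": ["String"]}))  # True
print(is_tree_like({"appIndex": ["UserId", "JobCatId"]}))            # False
print(is_tree_like({"A": ["B"], "B": ["A"]}))                        # False
```

The second call fails because one sort is the source of two functions, which would correspond to an artifact relation with more than one “data” component.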
Theorem 5.3. Backward search (cf. Algorithm 1) terminates when applied to a safety problem in a RAS with a tree-like artifact setting.
The crux is to show, using Kruskal’s Tree Theorem , that the finitely generated -structures are a well-quasi-order w.r.t. the embeddability partial order. ∎
While tree-like RASs restrict artifact relations to be unary, their transitions are not subject to any locality restriction. This allows for expressing rich forms of updates, including general bulk updates (which allow us to capture non-primitive recursive verification problems) and transitions comparing at once different tuples in artifact relations. Notice that tree-like RASs are incomparable with the “tree” classes of , since the former use artifact relations, whereas the latter only individual variables. In Appendix A we show the power of such advanced features in a flight management process example.
6 First experiments
We implemented a prototype of the backward reachability algorithm for RASs on top of the mcmt model checker for array-based systems. Starting from its first version , mcmt was successfully applied to a variety of settings: cache coherence and mutual exclusion protocols , timed  and fault-tolerant [6, 5] distributed systems, and imperative programs [7, 8]. Interesting case studies concerned waiting time bounds synthesis in parameterized timed networks  and internet protocols . Further related tools include safari , asasp , and Cubicle . The latter relies on a parallel architecture with further powerful extensions. The working principle of mcmt is rather simple: the tool generates the proof obligations arising from the safety and fixpoint tests in backward search (Lines 2-3 of Algorithm 1) and passes them to the background SMT-solver (currently Yices ). In practice, the situation is more complicated because SMT-solvers are quite efficient in handling satisfiability problems in combined theories at the quantifier-free level, but may encounter difficulties with quantifiers. For this reason, mcmt implements modules for quantifier elimination and quantifier instantiation. A specific module for the quantifier elimination problems mentioned in Line 6 of Algorithm 1 has been added to Version 2.8 of mcmt.
We produced a benchmark consisting of eight realistic business process examples and ran it in mcmt (detailed explanations and results are given in Appendix G). The examples are partially made by hand and partially obtained from those supplied in . A thorough comparison with Verifas  is a matter of future work, and is non-trivial for a variety of reasons. In particular, the two systems tackle incomparable verification problems: on the one hand, we deal with safety problems, whereas Verifas handles more general LTL-FO properties. On the other hand, we tackle features not available in Verifas, like bulk updates and comparisons between artifact tuples. Moreover, the two verifiers implement completely different state space construction strategies: mcmt is based on backward reachability and makes use of declarative techniques that rely on decision procedures, while Verifas employs forward search via VASS encoding.
The benchmark is available as part of the latest distribution (2.8) of mcmt (footnote 14: http://users.mat.unimi.it/users/ghilardi/mcmt/, subdirectory /examples/dbdriven of the distribution). The user manual contains a new section (pages 36–39) on how to encode RASs in mcmt specifications. Table 1 shows the very encouraging results (the first row tackles Example 4.2). While a systematic evaluation is out of scope, mcmt effectively handles the benchmark with performance comparable to that shown in other, well-studied systems, with verification times below 1s in most cases.
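| Exp. | #AC | #AV | #T | Prop. | Res. | Time (sec) |
|---|---|---|---|---|---|---|
| E1 | 9 | 18 | 15 | E1P1 | SAFE | 0.06 |
| | | | | E1P2 | UNSAFE | 0.36 |
| | | | | E1P3 | UNSAFE | 0.50 |
| | | | | E1P4 | UNSAFE | 0.35 |
| E2 | 6 | 13 | 28 | E2P1 | SAFE | 0.72 |
| | | | | E2P2 | UNSAFE | 0.88 |
| | | | | E2P3 | UNSAFE | 1.01 |
| | | | | E2P4 | UNSAFE | 0.83 |
| E3 | 4 | 14 | 13 | E3P1 | SAFE | 0.05 |
| | | | | E3P2 | UNSAFE | 0.06 |
| E4 | 9 | 11 | 21 | E4P1 | SAFE | 0.12 |
| | | | | E4P2 | UNSAFE | 0.13 |
| E5 | 6 | 17 | 34 | E5P1 | SAFE | 4.11 |
| | | | | E5P2 | UNSAFE | 0.17 |
| E6 | 2 | 7 | 15 | E6P1 | SAFE | 0.04 |
| | | | | E6P2 | UNSAFE | 0.08 |
| E7 | 2 | 28 | 38 | E7P1 | SAFE | 1.00 |
| | | | | E7P2 | UNSAFE | 0.20 |
| E8 | 3 | 20 | 19 | E8P1 | SAFE | 0.70 |
| | | | | E8P2 | UNSAFE | 0.15 |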
We have laid the foundations of SMT-based verification for artifact systems, focusing on safety problems and relying on array-based systems as underlying formal model. We have exploited the model-theoretic machinery of model completion to overcome the main technical difficulty arising from this approach, i.e., showing how to reconstruct quantifier elimination in the rich setting of artifact systems. On top of this framework, we have identified three classes of systems for which safety is decidable, which impose different combinations of restrictions on the form of actions and the shape of DB constraints. The presented techniques have been implemented on top of the well-established mcmt model checker, making our approach fully operational.
We consider the present work as the starting point for a full line of research dedicated to SMT-based techniques for the effective verification of data-aware processes, addressing richer forms of verification beyond safety (such as liveness, fairness, or full LTL-FO) and richer classes of artifact systems (e.g., with concrete data types and arithmetic), while identifying novel decidable classes (e.g., by restricting the structure of the DB and of transition and state formulae). Implementation-wise, we want to build on the reported encouraging results and benchmark our approach using the Verifas system as a baseline, while incorporating the plethora of optimizations available in SMT-based model checking. Finally, we plan to tackle more conventional process modeling notations, in particular data-aware extensions of the de-facto standard BPMN.
-  P. A. Abdulla, C. Aiswarya, M. F. Atig, M. Montali, and O. Rezine. Recency-bounded verification of dynamic database-driven systems. In Proc. PODS, 2016.
-  F. Alberti, A. Armando, and S. Ranise. ASASP: Automated symbolic analysis of security policies. In Proc. CADE, pages 26–33, 2011.
-  F. Alberti, R. Bruttomesso, S. Ghilardi, S. Ranise, and N. Sharygina. SAFARI: SMT-based abstraction for arrays with interpolants. In Proc. CAV, pages 679–685, 2012.
-  F. Alberti, R. Bruttomesso, S. Ghilardi, S. Ranise, and N. Sharygina. An extension of lazy abstraction with interpolation for programs with arrays. Formal Methods in System Design, 45(1):63–109, 2014.
-  F. Alberti, S. Ghilardi, E. Pagani, S. Ranise, and G. P. Rossi. Brief announcement: Automated support for the design and validation of fault tolerant parameterized systems - A case study. In Proc. DISC, pages 392–394, 2010.
-  F. Alberti, S. Ghilardi, E. Pagani, S. Ranise, and G. P. Rossi. Universal guards, relativization of quantifiers, and failure models in model checking modulo theories. JSAT, 8(1/2):29–61, 2012.
-  F. Alberti, S. Ghilardi, and N. Sharygina. Booster: An acceleration-based verification framework for array programs. In Proc. ATVA, pages 18–23, 2014.
-  F. Alberti, S. Ghilardi, and N. Sharygina. A framework for the verification of parameterized infinite-state systems. Fundamenta Informaticae, 150(1):1–24, 2017.
-  F. Baader and T. Nipkow. Term Rewriting and All That. Cambridge University Press, 1998.
-  B. Bagheri Hariri, D. Calvanese, G. De Giacomo, A. Deutsch, and M. Montali. Verification of relational data-centric dynamic systems with external services. In Proc. PODS, pages 163–174, 2013.
-  F. Belardinelli, A. Lomuscio, and F. Patrizi. An abstraction technique for the verification of artifact-centric systems. In Proc. KR, 2012.
-  M. Bojańczyk, L. Segoufin, and S. Toruńczyk. Verification of database-driven systems via amalgamation. In Proc. PODS, pages 63–74, 2013.
-  A. R. Bradley and Z. Manna. The Calculus of Computation - Decision Procedures with Applications to Verification. Springer, 2007.
-  D. Bruschi, A. Di Pasquale, S. Ghilardi, A. Lanzi, and E. Pagani. Formal verification of ARP (address resolution protocol) through SMT-based model checking - A case study. In Proc. IFM, pages 391–406, 2017.
-  R. Bruttomesso, A. Carioni, S. Ghilardi, and S. Ranise. Automated analysis of parametric timing-based mutual exclusion algorithms. In Proc. NFM, pages 279–294, 2012.
-  D. Calvanese, G. De Giacomo, and M. Montali. Foundations of data aware process analysis: A database theory perspective. In Proc. PODS, pages 1–12, 2013.
-  D. Calvanese, G. De Giacomo, M. Montali, and F. Patrizi. First-order mu-calculus over generic transition systems and applications to the situation calculus. Information and Computation, 2017.
-  D. Calvanese, S. Ghilardi, A. Gianola, M. Montali, and A. Rivkin. Quantifier elimination for database driven verification. Technical Report arXiv:1806.09686, arXiv.org, 2018.
-  A. Carioni, S. Ghilardi, and S. Ranise. MCMT in the land of parametrized timed automata. In Proc. VERIFY, pages 47–64, 2010.
-  A. Carioni, S. Ghilardi, and S. Ranise. Automated termination in model-checking modulo theories. Int. J. Found. Comput. Sci., 24(2):211–232, 2013.
-  C.-C. Chang and J. H. Keisler. Model Theory. North-Holland Publishing Co., 1990.
-  S. Conchon, A. Goel, S. Krstic, A. Mebsout, and F. Zaïdi. Cubicle: A parallel SMT-based model checker for parameterized systems - Tool paper. In Proc. CAV, pages 718–724, 2012.
-  E. Damaggio, A. Deutsch, and V. Vianu. Artifact systems with data dependencies and arithmetic. ACM TODS, 37(3):22, 2012.
-  E. Damaggio, R. Hull, and R. Vaculín. On the equivalence of incremental and fixpoint semantics for business artifacts with Guard-Stage-Milestone lifecycles. In Proc. BPM, 2011.
-  G. Delzanno, J. Esparza, and A. Podelski. Constraint-based analysis of broadcast protocols. In Proc. CSL, pages 50–66, 1999.
-  A. Deutsch, R. Hull, F. Patrizi, and V. Vianu. Automatic verification of data-centric business processes. In Proc. ICDT, pages 252–267, 2009.
-  A. Deutsch, Y. Li, and V. Vianu. Verification of hierarchical artifact systems. In Proc. PODS, pages 179–194, 2016.
-  M. Dumas. On the convergence of data and process engineering. In Proc. ADBIS, pages 19–26, 2011.
-  B. Dutertre and L. De Moura. The YICES SMT solver. Technical report, SRI International, 2006.
-  J. Esparza, A. Finkel, and R. Mayr. On the verification of broadcast protocols. In Proc. LICS, pages 352–359, 1999.
-  S. Ghilardi, E. Nicolini, S. Ranise, and D. Zucchelli. Towards SMT model checking of array-based systems. In Proc. IJCAR, pages 67–82, 2008.
-  S. Ghilardi and S. Ranise. Backward reachability of array-based systems by SMT solving: Termination and invariant synthesis. Logical Methods in Computer Science, 6(4), 2010.
-  S. Ghilardi and S. Ranise. MCMT: A model checker modulo theories. In Proc. IJCAR, pages 22–29, 2010.
-  R. Hull. Artifact-centric business process models: Brief survey of research results and challenges. In Proc. OTM, pages 1152–1163, 2008.
-  J. B. Kruskal. Well-quasi-ordering, the Tree Theorem, and Vazsonyi’s conjecture. Trans. Amer. Math. Soc., 95:210–225, 1960.
-  V. Künzle, B. Weber, and M. Reichert. Object-aware business processes: Fundamental requirements and their support in existing approaches. Int. J. of Information System Modeling and Design, 2(2):19–46, 2011.
-  Y. Li, A. Deutsch, and V. Vianu. VERIFAS: A practical verifier for artifact systems. PVLDB, 11(3):283–296, 2017.
-  M. Reichert. Process and data: Two sides of the same coin? In Proc. OTM, pages 2–19, 2012.
-  C. Richardson. Warning: Don’t assume your business processes use master data. In Proc. BPM, pages 11–12, 2010.
-  A. Robinson. On the Metamathematics of Algebra. North-Holland Publishing Co., 1951.
-  S. Schmitz and P. Schnoebelen. The power of well-structured systems. In Proc. CONCUR, pages 5–24, 2013.
-  B. Silver. BPMN Method and Style. Cody-Cassidy, 2nd edition, 2011.
-  V. Vianu. Automatic verification of database-driven systems: a new frontier. In Proc. ICDT, pages 1–13, 2009.
-  W. H. Wheeler. Model-companions and definability in existentially complete structures. Israel J. Math., 25(3-4):305–330, 1976.
Appendix A Examples
In this section, we present two full examples of RASs for which our backward reachability technique terminates. In particular, they are meant to highlight the expressiveness of our approach, even in the presence of the restrictions imposed by Theorems 5.2 and 5.3 towards decidability of reachability. When writing transition formulae in the examples, we make the following assumption: when an artifact variable or component is not mentioned at all in a transition, it is meant to be updated identically; if it is mentioned, the relevant update function in the transition will specify how it is updated. (Footnote 15: Notice that non-deterministic updates can be formalized using existentially quantified variables in the transition.)