The fundamental problem of verifying the correctness of business process models has been traditionally tackled by exclusively considering the control flow perspective. This means that correctness is assessed by only considering the ordering relations among activities present in the model. In this setting, one of the most investigated formal notions of correctness is that of soundness, originally introduced by van der Aalst in the context of workflow nets (a special class of Petri nets that is suitable to capture the control flow of business processes) . Intuitively, soundness guarantees the two good properties of “possibility of clean termination” and of “absence of deadlocks”. On the one hand, it ensures that whenever a process instance is being executed, it always has the possibility of reaching the completion of the process, and if it does so, then no running concurrent thread is still active in the process. On the other hand, it captures that all parts of the process can be executed in some scenario, that is, the process does not contain dead activities that are impossible to enact.
The control-flow perspective is certainly of high importance as it can be considered the main process backbone; however, many other perspectives should also be taken into account. In fact, the last decade has witnessed an increasing transformation in the design, engineering, and mining of processes, moving from a pure control-flow perspective to more integrated models where also data and decisions are explicitly considered. The fact that the incorporation of decisions within process models is gaining momentum is also testified by the recent introduction and development of the Decision Model and Notation (DMN), an OMG standard . This calls for methods and techniques able to ascertain the correctness of such integrated models, which is important not only during the design phase of the business process lifecycle, but also when it comes to decision and guard mining , as well as compliance checking .
Previous approaches to analyze correctness of decision-aware processes have typically focused on single decisions  or on the local interplay between decisions and their corresponding outgoing branches . More recent efforts have tackled this locality problem by holistically considering soundness of the overall, end-to-end process in the presence of data and decisions, but have mainly stayed at the foundational level [16, 2]. In particular, they do not come with actual techniques to effectively carry out the verification of soundness. In this work, we overcome this limitation by introducing a holistic, formal and operational approach to verify the end-to-end soundness of Data Petri nets (DPNs) . DPNs combine workflow nets with case data, decisions and conditional data updates, achieving a suitable balance between expressiveness and simplicity. Thanks to their solid formal foundation, DPNs come with a clear execution semantics, and consequently allow us to unambiguously extend the notion of soundness to incorporate the decision perspective. In addition, they combine the main ingredients that are needed to formally capture conventional process modelling notations, such as the combination of BPMN control- and data-flow with DMN decisions.
In the general case, verifying soundness of DPNs is undecidable, due to the presence of case data and the possibility of manipulating them so as to reconstruct Turing-powerful computational devices. This applies, in particular, when case data can be updated using arithmetical operators. We isolate here a decidable class of DPNs that employs both non-numerical and numerical domains, and is expressive enough to capture data-aware process models equipped with S-FEEL DMN decisions , such as those recently proposed in [3, 2]. Importantly, such DPNs cannot be directly analyzed algorithmically: due to the presence of data and corresponding updates, they in fact induce a state space that may have infinitely many states even when the control-flow is expressed by a bounded workflow net. To tame this infinity, we take inspiration from the technique of predicate abstraction , and in particular the approach adopted in , and present an effective technique that verifies soundness by translating the input net into a colored Petri net (CPN) with bounded color domains, which induces a finite state space and can be consequently analyzed using conventional tools. This technique has been implemented as a plug-in of the well-established ProM process mining framework.
The paper is organized as follows. In Section 2 we discuss related work; in Section 3 we provide the necessary background on DPNs and a precise formalisation of their execution semantics; in Section 4 we discuss the relation between DPNs and the DMN S-FEEL language; in Section 5 we illustrate an effective technique for translating a given DPN into a special colored Petri net with bounded color domains, which can thus be studied using standard tools, and then prove that analyzing this net allows us to assess the properties of the original DPN, including soundness. Section 6 discusses the ProM implementation and reports on a number of experiments based on models of real-life processes, some of which were designed by hand, while others combine hand design and process discovery. The experiments show that the technique is operationally effective and can be applied to real-life case studies. Finally, Section 7 concludes the paper and delineates avenues of future research.
2 Related Work
Within the field of database theory, many approaches have been proposed to formalize and verify very sophisticated variants of data-aware processes , also considering data-aware extensions of soundness . However, such works are mainly foundational, and do not currently come with effective verification algorithms and implementations. Within the field of business process management and information systems engineering, a plethora of techniques and tools exists for verifying soundness of process models that only capture the control-flow perspective, but not much research has been carried out to incorporate the data and decision perspective in the analysis. Sadiq et al.  were among the first to acknowledge the importance of incorporating the data perspective within soundness analysis, but they did not propose any technique to carry out the verification. Sidorova et al. proposed a conceptual extension of workflow nets, equipping them with an abstract, high-level data model [19, 20]. In this approach, data are captured abstractly, and it is assumed that activities read and write entire guards, instead of reading and writing data variables that affect the satisfaction of guards. This abstract approach certainly simplifies the analysis because reading and writing guards is equivalent to reading and writing boolean values, which corresponds to a sort of a-priori propositionalization of the data. This is, however, not realistic: as testified by modern process modeling notations, such as BPMN and DMN, the data perspective requires (at least) to have data variables and full-fledged guards and updates over them.  focuses on single DMN decision, in particular verifying whether a DMN table is correct, or contains instead inconsistencies, missing and overlapping rules. 
This certainly fits in the context of data-aware soundness, but it is only a minor portion of it, since the analysis is only conducted locally to decision points in the process, and local forms of analysis do not suffice to guarantee good behavioral properties of the entire process . A similar drawback is also present in , where the contribution of decisions in the verification of soundness is also local, and limits itself to the interaction between decisions and their immediate outgoing sequence flows. As mentioned in the introduction, soundness verification plays a key role in decision and guard mining . In this setting, an initial process model is discovered by solely considering the control-flow perspective. In a second phase, decision points present in the model are enriched with decisions and conditions again inferred from the event data present in the log. This “local enrichment” does not guarantee that the overall model is indeed sound, so soundness verification techniques should be inserted in the loop to discard incorrect results and guide discovery towards the extraction of models that are correct by design.
The two closest works to our contribution are [2, 12]. In , the authors consider the interplay between BPMN and DMN, providing different notions of data-aware soundness on top of such process models (once the BPMN component is encoded into a Petri net, which can be seamlessly tackled by known techniques [9, 11]). As shown in Section 4, our approach is expressive enough to capture the process models studied in . In addition, our verification technique based on an encoding into CPNs does not only preserve the notion of soundness we introduce, but it actually guarantees that the obtained CPN is behaviorally equivalent to the input DPN. This, in turn, implies that all variants of soundness defined in  can be actually verified using this approach.
In , the authors introduce an abstraction approach that shares the same spirit of our technique: it is faithful (i.e., it preserves properties), and it is based on the idea of considering only boundedly many representative values in place of the entire data domains. There are however four fundamental differences between our setting and that of . First of all, in  abstractions are used to shrink the state space of the analysis, while in our case they are employed to tame the infinity brought by the presence of data and the possibility of updating them. Second,  defines abstract process graphs that do not come with a formal execution semantics, and consequently do not allow one to formally prove that the abstraction technique is indeed correct. Since our approach is expressive enough to capture the model of  (see Section 4.2), our correctness result captured in Section 5.3 can be actually lifted to  as well. Third,  focuses on compliance checking against LTL-based compliance rules, which are unable to capture soundness (in particular, the “possibility of termination”, which has an intrinsic branching nature); on the other hand, since our encoding produces a CPN that is behaviorally equivalent to the original DPN, it also preserves all the runs and, in turn, all LTL properties. Finally, while  translates the problem of compliance checking into a temporal model checking problem, we resort to Petri net-based techniques.
3 Syntax and Semantics of DPNs
We provide the necessary background on the DPN model, precisely defining its execution semantics and introducing a running example. We then lift the standard notion of soundness to the more sophisticated setting of DPNs. Assume an infinite universe of possible values $\mathcal{U}$.
Definition 1 (Domain)
A domain $\mathcal{D} = \langle \Delta_{\mathcal{D}}, \Sigma_{\mathcal{D}} \rangle$ is a couple where $\Delta_{\mathcal{D}} \subseteq \mathcal{U}$ is a set of possible values and $\Sigma_{\mathcal{D}}$ is the set of binary predicates on $\Delta_{\mathcal{D}}$.
We consider a set of domains $\mathfrak{D}$, and in particular the notable domains $\mathcal{D}_{\mathbb{R}}$, $\mathcal{D}_{\mathbb{Z}}$, $\mathcal{D}_{bool}$ and $\mathcal{D}_{string}$, which, respectively, account for real numbers, integers, booleans, and strings ($\mathbb{S}$ denotes here the infinite set of all strings).
Consider a set $V$ of variables. Given a variable $v \in V$, we write $v^r$ or $v^w$ to denote that the variable $v$ is read or written; hence we consider two distinct sets $V^r$ and $V^w$ defined as $V^r = \{v^r \mid v \in V\}$ and $V^w = \{v^w \mid v \in V\}$. When we do not need to distinguish, we still use the symbol $v$ to denote any member of $V^r \cup V^w$. To talk about the possible values variables may assume, we need to associate domains to variables. If a variable $v$ is assigned a domain $\mathcal{D} = \langle \Delta_{\mathcal{D}}, \Sigma_{\mathcal{D}} \rangle$, for brevity we denote by $v_{\mathcal{D}}$ the corresponding typed variable, which is a shorthand to specify that $v$ can only assume values in $\Delta_{\mathcal{D}}$.
Variables provide the basic building block to define logical conditions (formally, guards) on data.
Definition 2 (Guards)
Given a set of typed variables for a set , the set of possible guards is the largest set containing the following:
iff and ;
and iff and are guards in .
A variable assignment is a function , which assigns a value to read and written variables, with the restriction that is a possible value for , that is if is the corresponding typed variable then . The symbol is used to denote an undefined value, i.e., that the variable is not set. Given a variable assignment and a guard , we say that evaluates to true when variables are substituted as per , written , iff:
if then ;
if , then for ;
if then and ;
if then or .
In words, a guard is satisfied by evaluating it after assigning values to read and written variables, as specified by the variable assignment.
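The evaluation of a guard under a variable assignment can be illustrated with a minimal sketch. It assumes guards are built from atoms comparing one variable with a constant, combined by conjunction and disjunction; the class names, the `_r`/`_w` naming of read and written variables, the predicate table, the sample values and the choice of treating an undefined value as falsifying an atom are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Binary predicates usable in atoms (an assumed, illustrative set).
PREDICATES = {"<": operator.lt, ">": operator.gt,
              "=": operator.eq, "!=": operator.ne}

@dataclass(frozen=True)
class Atom:
    var: str       # read or written variable, e.g. "amount_r" (hypothetical naming)
    pred: str      # key of PREDICATES
    const: object  # constant the variable is compared with

@dataclass(frozen=True)
class And:
    left: "Guard"
    right: "Guard"

@dataclass(frozen=True)
class Or:
    left: "Guard"
    right: "Guard"

Guard = Union[Atom, And, Or]

def holds(guard, beta):
    """Return True iff `guard` evaluates to true under assignment `beta`."""
    if isinstance(guard, Atom):
        value = beta.get(guard.var)  # None plays the role of the undefined value
        if value is None:
            return False             # assumption: undefined values falsify atoms
        return PREDICATES[guard.pred](value, guard.const)
    if isinstance(guard, And):
        return holds(guard.left, beta) and holds(guard.right, beta)
    if isinstance(guard, Or):
        return holds(guard.left, beta) or holds(guard.right, beta)
    raise TypeError(f"not a guard: {guard!r}")

# Hypothetical guard: the request is accepted and the amount exceeds 5000.
g = And(Atom("ok_r", "=", True), Atom("amount_r", ">", 5000))
accepted = holds(g, {"ok_r": True, "amount_r": 7000})
rejected = holds(g, {"ok_r": False, "amount_r": 7000})
```

The recursion mirrors the inductive structure of guards: atoms are checked directly against the assignment, while conjunctions and disjunctions delegate to their sub-guards.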
A state variable assignment, abbreviated hereafter as SV assignment, is instead a function $\alpha$ defined over $V$, which assigns a value $\alpha(v)$ to each variable $v \in V$, with the restriction that $\alpha(v)$ is a possible value for $v$. Note that this is different from variable assignments, which are defined over $V^r \cup V^w$. We can now define DPNs.
Definition 3 (Data Petri Net)
Let $V$ be the set of process variables. A Data Petri Net (DPN) $N = (P, T, F, V, dom, \alpha_I, read, write, guard)$ is a Petri net $(P, T, F)$ with additional components, used to describe the additional perspectives of the process model:
$V$ is a finite set of process variables;
$dom$ is a function assigning a domain to each $v \in V$;
$\alpha_I$ is the initial SV assignment;
$read : T \to 2^V$ returns the set of variables read by a transition;
$write : T \to 2^V$ returns the set of variables written by a transition;
$guard$ returns a guard associated with the transition, so that $v^r$ appears in $guard(t)$ only if $v \in read(t)$, and $v^w$ appears in $guard(t)$ only if $v \in write(t)$, for every $t \in T$.
3.1 Execution Semantics
By considering the usual semantics for the underlying Petri net together with the guards associated to each of its transitions, we define the resulting execution semantics for DPNs. First, let $N$ as above be a DPN. Then the set of possible states of $N$ is formed by all pairs $(M, \alpha)$ where:
$M \in \mathbb{B}(P)$ is the marking of the Petri net $(P, T, F)$, where $\mathbb{B}(P)$ indicates the set of all multisets of elements of $P$, and
$\alpha$ is an SV assignment, defined as in the previous section.
In any state, zero or more transitions of a DPN may be able to fire. Firing a transition $t$ updates the marking, reads the variables specified in $read(t)$ and selects a new, suitable value for those in $write(t)$. We model this through a variable assignment $\beta$ for the transition, which assigns a value to all and only those variables that are read or written. A pair $(t, \beta)$ is called a transition firing.
Definition 4 (Legal transition firing)
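A DPN $N$ evolves from state $(M, \alpha)$ to state $(M', \alpha')$ via the transition firing $(t, \beta)$ with $guard(t) = \varphi$ iff:
$\beta(v^r) = \alpha(v)$ if $v \in read(t)$: $\beta$ assigns values as $\alpha$ for read variables;
the new SV assignment $\alpha'$ is s.t. $\alpha'(v) = \beta(v^w)$ if $v \in write(t)$, and $\alpha'(v) = \alpha(v)$ otherwise,
namely the new SV assignment is as $\alpha$ but updated as per $\beta$;
$\beta$ is valid, namely the guard $\varphi$ is satisfied under $\beta$;
each input place of $t$ contains at least one token: $M(p) > 0$ for any $p$ with $(p, t) \in F$;
the new marking $M'$ is calculated as usual, namely by removing one token from each input place of $t$ and adding one token to each output place of $t$.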
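We denote this by writing $(M, \alpha) \xrightarrow{(t, \beta)} (M', \alpha')$. We extend this to sequences $\sigma = \langle (t_1, \beta_1), \ldots, (t_n, \beta_n) \rangle$ of legal transition firings, called traces, and denote the corresponding run by $(M_0, \alpha_0) \xrightarrow{(t_1, \beta_1)} (M_1, \alpha_1) \cdots \xrightarrow{(t_n, \beta_n)} (M_n, \alpha_n)$ or equivalently by $(M_0, \alpha_0) \xrightarrow{\sigma} (M_n, \alpha_n)$. By restricting to the initial marking $M_I$ of a DPN $N$ together with the initial SV assignment $\alpha_I$, we define the process traces of $N$ as the set of sequences $\sigma$ as above, of any length, such that $(M_I, \alpha_I) \xrightarrow{\sigma} (M, \alpha)$ for some $M$ and $\alpha$, and the trace set of $N$ as the set of process traces $\sigma$ such that $(M_I, \alpha_I) \xrightarrow{\sigma} (M_F, \alpha)$ for some $\alpha$, where $M_F$ is the final marking of $N$.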
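The conditions for a legal transition firing can be sketched as a small function. This is only an illustration under stated assumptions: the dictionary layout of a transition, the place names `p0` and `p1`, the encoding of read and written variables as `(name, "r")` and `(name, "w")` pairs, and the sample values are hypothetical, and every arc is assumed to have weight one.

```python
from collections import Counter

def fire(marking, alpha, t, beta):
    """Return the successor state (marking', alpha'), or None if (t, beta) is not legal.

    `t` is a dict with keys "pre" and "post" (lists of places), "read" and
    "write" (sets of variable names) and "guard" (a callable over beta).
    """
    marking = Counter(marking)
    # beta must agree with the current SV assignment on every read variable
    if any(beta.get((v, "r")) != alpha[v] for v in t["read"]):
        return None
    # every written variable must receive a new value
    if any((v, "w") not in beta for v in t["write"]):
        return None
    # the guard must be satisfied under beta
    if not t["guard"](beta):
        return None
    # each input place must hold enough tokens
    pre = Counter(t["pre"])
    if any(marking[p] < n for p, n in pre.items()):
        return None
    # usual Petri net update of the marking
    new_marking = Counter(marking)
    new_marking.subtract(pre)
    new_marking.update(t["post"])
    new_marking = +new_marking  # drop places left with zero tokens
    # the new SV assignment is alpha updated on the written variables
    new_alpha = dict(alpha)
    for v in t["write"]:
        new_alpha[v] = beta[(v, "w")]
    return new_marking, new_alpha

# Hypothetical first step of the credit process: write a positive amount.
credit_request = {
    "pre": ["p0"], "post": ["p1"],
    "read": set(), "write": {"amount"},
    "guard": lambda b: b[("amount", "w")] > 0,
}
start = (Counter({"p0": 1}), {"amount": None, "ok": None})
state = fire(*start, credit_request, {("amount", "w"): 1000})
blocked = fire(*start, credit_request, {("amount", "w"): -5})
```

For a fixed binding the function returns at most one successor, which matches the observation that a DPN is deterministic once the transition and its variable assignment are chosen.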
Figure 1 shows a DPN representing a process for managing credit requests and corresponding loans. The DPN employs two case variables, respectively used to capture whether the credit request is accepted and what the requested amount is. The process starts by acquiring the amount of the credit request (thus writing the amount variable), which must be positive. Then a verification step is performed, determining whether to accept or reject the request (thus writing the acceptance variable). In the rejection case, a new verification may be performed provided that the requested amount exceeds a given threshold in euros (skip assessment followed by renegotiate request). In the acceptance case, depending on the requested amount, a simple or advanced assessment is performed. The second phase of the process then deals, concurrently, with the opening of a loan (which can only be executed if the request is accepted), and with a communication sent to the customer, which depends again on the combination of data held by the case variables. In Figure 2 we compactly represent some run fragments. As shown in the figure, the number of legal runs is infinite (e.g., the requested amount can take infinitely many values) and their length may also be unbounded (due to cycles in the process).
We are interested in characterising properties of DPNs. For this reason, it is useful to compare these nets by looking at their behaviour, i.e., their trace set.
This is achieved in two steps. We first define the notion of trace-equivalence, which will also be helpful for proving our results.
Definition 5 (Trace-equivalence between DPNs)
Given two runs and of two DPNs and , respectively, these runs are trace-equivalent iff and for any we have that , namely the transitions are the same.
Similarly, two DPNs and are trace-equivalent iff for every legal run of there exists a trace-equivalent run of and vice-versa.
Note that for any DPN, given a state and a legal transition firing from that state, there exists exactly one successor state such that , namely the DPN is transition-deterministic (for a given binding). As a consequence, two runs that are trace-equivalent also traverse the same markings, namely , .
3.2 Data-aware Soundness
We now lift the standard notion of soundness to the case of DPNs. This requires quantifying not only over the markings of the net, but also over the assignments of its case variables, thus making soundness data-aware (we use ‘data-aware’ to distinguish our notion from the one of decision-aware soundness in the literature; see Section 5.4). In what follows, we write $(M, \alpha) \xrightarrow{*} (M', \alpha')$ to implicitly quantify existentially on sequences $\sigma$.
Definition 6 (Data-aware soundness)
A DPN is data-aware sound iff the following properties hold:
The first condition checks the reachability of the output state, that is, whether it is always possible to reach the final marking of by suitably choosing a continuation of the current run (i.e., transitions and variable assignments). The second condition captures that the output state is reached in a clean way, i.e., that cannot reach the final marking while in addition having other tokens in other places. The third condition verifies the absence of dead transitions, where a transition is considered dead if there is no way of assigning the case variables so as to enable it.
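The three conditions described above can be written symbolically along the following lines. This is a sketch derived from the prose, not a verbatim statement of the definition: $M_I$ and $M_F$ are assumed to denote the initial and final markings, $\alpha_I$ the initial SV assignment, and $\xrightarrow{*}$ reachability through some sequence of legal transition firings.

```latex
\begin{align*}
\textbf{P1:}\quad & \forall (M,\alpha).\;
  (M_I,\alpha_I) \xrightarrow{*} (M,\alpha)
  \implies \exists \alpha'.\; (M,\alpha) \xrightarrow{*} (M_F,\alpha') \\
\textbf{P2:}\quad & \forall (M,\alpha).\;
  \bigl( (M_I,\alpha_I) \xrightarrow{*} (M,\alpha) \wedge M \geq M_F \bigr)
  \implies M = M_F \\
\textbf{P3:}\quad & \forall t \in T.\; \exists M_1, M_2, \alpha_1, \alpha_2, \beta.\;
  (M_I,\alpha_I) \xrightarrow{*} (M_1,\alpha_1) \wedge
  (M_1,\alpha_1) \xrightarrow{(t,\beta)} (M_2,\alpha_2)
\end{align*}
```

Under this reading, P1 is the reachability of the output state, P2 is clean termination, and P3 is the absence of dead transitions.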
Consider again the DPN in Figure 1. Such a DPN is unsound for a number of reasons, related to the concurrent section in the second phase of the process. Suppose that the verification step rejects the request, i.e., sets the acceptance variable to false. Once the execution reaches the AND-split transition and fires it, two tokens are produced, one in each of its output places. Since the guard of open credit loan is then false, the token in its input place cannot be consumed, and thus it is not possible to properly complete the execution. In addition, if the requested amount is below a given threshold, the same also occurs for the token in the other output place.
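Once a finite state space is available, the three soundness conditions can be checked by plain graph search. The sketch below assumes such a finite graph is given explicitly; the state identifiers, place names and the small graph are hypothetical and only loosely mimic the rejected-request scenario above, where a token remains stuck after the AND-split.

```python
from collections import Counter

def reachable(start, succ):
    """States reachable from `start`; succ maps a state to (transition, state) pairs."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for _, nxt in succ.get(s, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_sound(initial, succ, marking_of, final_marking, transitions):
    """Return the truth values of the three conditions (P1, P2, P3)."""
    final = Counter(final_marking)
    states = reachable(initial, succ)
    # P1: from every reachable state the final marking can still be reached
    p1 = all(any(Counter(marking_of[r]) == final for r in reachable(s, succ))
             for s in states)
    # P2: a reachable marking covering the final marking must equal it
    def covers(m):
        return all(m[p] >= n for p, n in final.items())
    p2 = all(Counter(marking_of[s]) == final
             for s in states if covers(Counter(marking_of[s])))
    # P3: every transition labels some edge of the reachable graph
    fired = {t for s in states for t, _ in succ.get(s, [])}
    p3 = set(transitions) <= fired
    return p1, p2, p3

# Hypothetical graph: s1 is the accepted branch, s2 the rejected one.
succ = {
    "s0": [("verify", "s1"), ("verify", "s2")],
    "s1": [("split", "s3")], "s3": [("open_loan", "s5")],
    "s5": [("inform", "s6")], "s6": [("join", "s7")],
    "s2": [("split", "s4")], "s4": [("inform", "s8")],
}
marking_of = {
    "s0": {"in": 1}, "s1": {"p": 1}, "s2": {"p": 1},
    "s3": {"q1": 1, "q2": 1}, "s4": {"q1": 1, "q2": 1},
    "s5": {"r1": 1, "q2": 1}, "s6": {"r1": 1, "r2": 1},
    "s7": {"out": 1}, "s8": {"q1": 1, "r2": 1},
}
result = is_sound("s0", succ, marking_of, {"out": 1},
                  ["verify", "split", "open_loan", "inform", "join"])
```

In this toy graph only the first condition fails, because the rejected branch ends in a state from which the final marking is unreachable.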
4 Modeling with DPNs
From now on, we always consider DPNs working over the notable set of domains introduced at the beginning of Section 3. We show that this class of DPNs is expressive enough to directly incorporate in the model decisions expressed using the OMG standard DMN S-FEEL language [1, 5]. Specifically, we first discuss how DPNs can be enriched with such decision constructs, arguing that the so-obtained extended model captures those studied in the literature [3, 2]. We then show that such an extension is syntactic sugar, as it can be encoded back into standard DPNs. This implies that the results presented in this paper can be seamlessly used to formalize the interesting decision-aware process models studied in [3, 2], and check their soundness considering the different variants of soundness as defined in , as we will show in Section 5.4.
4.1 DPNs with DMN Decisions
The integration of DMN decisions with models capturing the control flow of a process, such as workflow nets, has been recently studied in [3, 2]. As argued in [3, 2], using Petri nets to capture the process control flow does not incur a loss of generality: the integration can in fact be conceptually captured at a higher level of abstraction, such as that of the combination of DMN with BPMN, and standard control-flow translation mechanisms can then be applied to encode the control flow of the input BPMN model into a corresponding Petri net.
The standard way of incorporating a DMN decision into a BPMN process is to introduce a business rule task in the process. This task, in turn, is linked to the DMN decision. Whenever the business rule task is reached during the execution of a process instance, the inputs of the decision are bound to specific values, and the corresponding output result is calculated and incorporated into the state of the process instance for further use. This also corresponds to the notion of decision fragment in . In the context of DPNs, the natural incorporation of a DMN decision consequently amounts to introduce a special decision transition that is linked to a DMN decision. Since DPNs are natively equipped with case data, we assume that the inputs and outputs of the decision coincide with (some of) the case variables of the DPN.
Consider a variant of the DPN shown in Figure 1, where we want to explicitly track the type of assessment that must be conducted on a given credit request, from place . Therefore, we can transform the three transitions from into the rows of a decision table, and use an additional case variable , of type string, as output of the table and consequently in the conditions of the branches of the split-gateway, as shown in Figure 4. Such a variable can be assigned to one among the strings , , , respectively indicating no assessment, normal assessment, and advanced assessment. To do so, we extract the decision logic distributed over the outgoing arcs from place in Figure 1, and combine the conditions therein into a single DMN decision, which indicates how the value is computed depending on the values of the two input variables and . Then, we attach this DMN decision to a dedicated decision transition, which is in turn inserted in the net between the verification and assessment steps. Finally, we update the three assessment transitions, associating each of them to its corresponding value for . The resulting decision fragment is shown in Figure 4.
This extension of DPNs with DMN-based decision transitions captures the decision-aware models recently studied in [3, 2]. On the one hand, we reconstruct the decision transitions defined there. On the other hand, we explicitly account for case variables and for (guarded) updates of their values, introducing a source of nondeterminism that depends on picking a new value for the updated variable among a possibly infinite set of potential values.
When considering BPMN as an input specification language, we produce a corresponding DPN as follows:
For each data object name in the BPMN model, we introduce a case variable with the same name. We only deal with data object collections whose (largest) size is known a-priori, so that a dedicated case variable is produced for each element of the collection.
Whenever a BPMN activity connects to a data object with name , we ensure that the corresponding DPN transition writes the variable mirroring that data object, i.e., we set its guard to be the formula ;
If the BPMN diagram predicates over the states of an object with name , we introduce in the DPN a “state” variable of type string, to keep track of the current state of .
If a BPMN activity requires an object with name to be in a given state prior execution, we guard the corresponding DPN transition with condition .
If a BPMN activity updates an object with name to state upon completion, we guard the corresponding DPN transition with condition .
4.2 Encoding DPNs with DMN Decisions to Normal DPNs
We now show that the DMN S-FEEL extension proposed in Section 4.1 is actually syntactic sugar, in the sense that its induced decision logic can be mimicked by a normal DPN. In what follows, we restrict the attention to decision tables with unique hit policies, although other policies can be considered as well by introducing a case variable for each subset of possible outputs of the decision table. This however generates a combinatorial explosion.
We describe here the transformation intuitively, because a formal description would be too cumbersome. Consider a DPN extended with DMN decision transitions. Intuitively, we need to transform the application of each rule in the decision table, together with the successive branch in the split-gateway which covers it, into a simple transition whose guard encodes all the conditions of the rule on both input variables (read variables) and output variables (written variables). In Figure 4 we show an intuitive example. Notice that, whenever more than one decision task is possible from the same place, we need to introduce internal transitions to correctly preserve the independence of these tasks (and that of their decision tables).
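The idea of turning each rule of a unique-hit decision table into one guarded transition can be sketched as follows. The table contents, the variable names `ok`, `amount` and `assessment`, the output strings and the threshold are hypothetical, and the sketch ignores the internal transitions needed when several decision tasks share a place.

```python
def encode_decision_table(rules, output):
    """Turn each rule of a unique-hit table into one guarded transition.

    `rules` is a list of (conditions, result) pairs, where `conditions`
    maps an input variable to a predicate over its value.
    """
    transitions = []
    for i, (conditions, result) in enumerate(rules, start=1):
        # default arguments freeze the current rule inside the closure
        def guard(beta, conditions=conditions, result=result):
            inputs_ok = all(cond(beta[(v, "r")]) for v, cond in conditions.items())
            # the written output variable must take the value of the rule
            return inputs_ok and beta[(output, "w")] == result
        transitions.append({
            "name": f"rule_{i}",
            "read": set(conditions),   # input variables are read
            "write": {output},         # the output variable is written
            "guard": guard,
        })
    return transitions

# Hypothetical table deciding the kind of assessment to perform.
rules = [
    ({"ok": lambda x: x is False}, "none"),
    ({"ok": lambda x: x is True, "amount": lambda x: x < 5000}, "simple"),
    ({"ok": lambda x: x is True, "amount": lambda x: x >= 5000}, "advanced"),
]
transitions = encode_decision_table(rules, "assessment")
beta = {("ok", "r"): True, ("amount", "r"): 7000, ("assessment", "w"): "advanced"}
enabled = [t["name"] for t in transitions if t["guard"](beta)]
```

Because the hit policy is unique, at most one of the generated guards is satisfied for any given input values.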
5 Soundness Verification
Colored Petri Nets (CPNs) extend Petri nets with data and, compared with Data Petri Nets, offer better support for time and resources. Furthermore, CPNs can be simulated through CPN Tools, which makes it possible to build on existing techniques to verify soundness. Differently from Data Petri Nets, where variables are global, CPNs encode the data aspects in the tokens, allowing each token to carry a data value, called color. Each place in a CPN usually contains tokens of one type, and this type is called the color set of the place.
Figure 5 illustrates a CPN. Differently from Petri nets and DPNs, tokens are associated with values (e.g. low or high in our example). When a transition, e.g. check_low, attempts to fire, it tries to consume one of the tokens, e.g. the token with value high, and to assign the token value to the variable on the arc, which would thus take on value high. This variable assignment (a.k.a. binding) is valid only if it does not violate the guard of the transition, if any. In the example, the guard states that the variable must be given value low. This means that tokens with value high cannot be consumed by transition check_low. Conversely, tokens with value high can be consumed by transition check_high. All places in this example CPN are allowed to contain tokens of one enumerated type, which is the color set associated with every place of this CPN.
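The role of bindings and guards in this example can be illustrated with a minimal sketch. It assumes a transition with a single input place, a single output place and one arc variable; the function names, the token multiset and the assumption that the output arc carries the same value are illustrative choices, not taken from the figure.

```python
from collections import Counter

def enabled_bindings(place_tokens, guard):
    """Distinct token values of the input place that yield a legal binding."""
    return sorted(v for v in set(place_tokens.elements()) if guard(v))

def fire(place_in, place_out, value):
    """Move one token with the given value from the input to the output place."""
    place_in, place_out = Counter(place_in), Counter(place_out)
    if place_in[value] == 0:
        raise ValueError(f"no token with value {value!r} to consume")
    place_in[value] -= 1
    place_out[value] += 1
    return +place_in, place_out  # unary + drops exhausted values

# Guards of the two transitions of the example.
check_low = lambda x: x == "low"
check_high = lambda x: x == "high"

tokens = Counter({"low": 1, "high": 2})  # hypothetical content of the input place
low_bindings = enabled_bindings(tokens, check_low)
high_bindings = enabled_bindings(tokens, check_high)
after = fire(tokens, Counter(), "high")
```

Here `check_low` can only bind its variable to `low`, so the tokens with value `high` are left for `check_high`.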
Definition 7 provides a definition of a CPN, which is a simplified version of the original definition to keep the explanation simple. Yet, it covers all the cases necessary in this paper. It is worth highlighting that tokens can also be associated with no values. To cover this case, we introduce a dedicated color set, whose tokens correspond to the black tokens of normal Petri nets.
Definition 7 (CPN)
A CPN is a tuple $(P, T, A, \Sigma, V, C, N, E, G, I)$ where:
$P$, $T$ and $A$ are sets of places, transitions and directed arcs, respectively;
$\Sigma$ is a set of color sets defined within the CPN model and $V$ is a set of variables;
$C : P \to \Sigma$ is a color function from places to a color set in $\Sigma$;
$N$ is a node function that maps each arc $a \in A$ to either a pair $(p, t)$, indicating that the arc goes from a place $p$ to a transition $t$, or a pair $(t, p)$, indicating that the arc connects $t$ to $p$;
$E$ is an arc expression function, assigning variables to arcs;
$G$ is a guard function that maps each transition $t$ to an expression $G(t)$, with the additional constraint that $G(t)$ can only employ variables with which arcs entering $t$ are annotated;
$I$ is an initialisation function assigning color values to places: for a place $p$, $I(p)$ indicates the colors of the tokens in $p$ at the initial marking, each of which belongs to $C(p)$.
The CPNs we consider include a special variable that is intended to take on only one value. In general, for any arc, the arc expression can be more complex than a single variable. However, this simplification covers all the arc expressions we consider here. The concept of a marking can be easily extended to CPNs: a marking associates each place with a multiset of elements, each of which is the data (a.k.a. color in CPN) associated with a different token in the place.
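A CPN run is of the form $M_0 \xrightarrow{(t_1, b_1)} M_1 \xrightarrow{(t_2, b_2)} \cdots \xrightarrow{(t_n, b_n)} M_n$ where, for all $1 \le i \le n$, $b_i$ is the so-called binding function. Function $b_i$ is defined over the set of variables of the arcs entering transition $t_i$. When firing transition $t$ in marking $M$, only legal bindings are possible. (Footnote 2: In the remainder, given a transition $t$, we denote by ${}^\bullet t$ and $t^\bullet$ the sets of its input and output places, respectively.) A binding $b$ is legal for a transition $t$ if: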
Each variable associated with an arc s.t. for some is in the domain of :
takes on a value that is associated with one of the tokens in every place that has an arc to that is annotated with : , s.t. with and , .
The guard of evaluates to true when variables are substituted as per :
Firing $t$ with $b$ in marking $M$ leads to a marking $M'$, denoted as $M \xrightarrow{(t, b)} M'$, that is constructed as follows:
(Footnote 3: The arc notation used in this construction denotes the arc between the given place and transition, and cannot be employed if such an arc does not exist. The set-difference operator is overridden for multisets: given two multisets $X$ and $Y$, for each element $x$ with cardinality $m$ in $X$ and cardinality $n$ in $Y$, the cardinality of $x$ in $X \setminus Y$ is $\max(0, m - n)$.)
A firing $M \xrightarrow{(t, b)} M'$ is legal if $b$ is a legal binding of $t$. A CPN run is legal if it is a sequence of legal firings.
5.1 Translating DPNs into Colored Petri Nets
This section illustrates how a DPN can be converted into a CPN. Intuitively, as exemplified in Figure 6, the transitions and places of the DPN become transitions and places of the CPN. Each variable $v$ of the DPN becomes one variable place that is associated with the same color set as the type of $v$ (see the variable place in the example in Figure 6 (right)). These places always contain exactly one token, holding the current value of the variable. The guards of the CPN are exactly the guards of the DPN, and if a transition writes a variable $v$, the token in the variable place for $v$ is consumed and a new token is generated, to model that the value of $v$ is updated. For instance, the fact that the transition of the DPN in Figure 6 (left) writes a new value for its variable is modelled in Figure 6 (right) through the two red arcs, annotated with the read and written versions of the variable, that respectively enter and exit the transition: this allows the token holding the value of the variable to change value when returned to the place. Read operations can be modelled as the blue arcs in Figure 6 (right), which carry the same annotation, so that the token from the variable place is consumed and then put back unchanged. The initial marking of the DPN becomes part of the initial marking of the CPN: each variable place is initialized with a token that holds the initial value of the variable. In Figure 6 (right), the variable place accordingly contains a token with the initial value of the variable. The following formalizes this intuition.
Places. The places of the CPN consist of all places of the DPN, plus one dedicated extra place $p_v$, hereafter called variable place, for each DPN variable $v \in V$. A variable place always has one token, namely the one holding the current value of variable $v$ at each step of the simulation of the CPN.
Transitions. The transitions of the CPN and DPN are the same.
Arcs. Each arc of the DPN is preserved, and for any transition $t$ and variable $v$ read and/or written by $t$, we add two extra arcs, one from the variable place $p_v$ to $t$ and one from $t$ to $p_v$; the node function is defined accordingly.
Color sets. The CPN supports the same variable types as the DPN, and we consider the color sets corresponding to the domains defined at the beginning of Section 3 for integers, reals, booleans and strings, respectively.
Variables. For each DPN variable $v$ the CPN considers the variables $v^r$ and $v^w$, together with the special dummy variable, which has only one possible value.
Color functions. Recalling the shorthand notation for typed variables in , each place is associated with a color set as follows. If then , otherwise:
Guards. Guards are not changed: for each .
Arc expressions. The expression associated with any arc between a source node and a target with is as follows. If then , otherwise:
The first case refers to arcs of the CPN that are also present in the original DPN; the places involved in these arcs contain tokens with no associated value, and thus the arcs are annotated with the dummy variable. The remaining cases refer to arcs connecting the variable place of each variable $v$ to a transition $t$. If $v$ is written by $t$ then the incoming arc and the outgoing arc are annotated with $v^r$ and $v^w$, respectively. This allows the token holding the value of $v$ to change value when returned to the variable place.
If instead $v$ is not written by $t$ then both arcs are annotated with the same inscription $v^r$, guaranteeing that the value of the token does not change.
Initialization. Let be the initial marking of the DPN. Places that are also in the DPN take on the same number of tokens as in the DPN, whereas each variable place is initialized with a token holding the value specified by the initial SV assignment of the DPN. Namely, if , i.e., is a place in the original net, otherwise where
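The structural part of the translation described in this section can be sketched as a single function. The dictionary layout of the DPN and CPN, the `p_` prefix for variable places, the `_r`/`_w` suffixes for arc inscriptions, the `"unit"` value standing for valueless tokens and the sample net are all assumptions; the sketch does not cover the bounding of color domains.

```python
def dpn_to_cpn(dpn):
    """Build a CPN (as a plain dict) from a DPN given as a plain dict.

    Expected keys of `dpn`: places, transitions, arcs (list of (src, dst)),
    variables (name -> type), initial_marking (place -> token count),
    initial_values (name -> value), read / write (transition -> set of
    variable names), guard (transition -> guard).
    """
    var_place = {v: f"p_{v}" for v in dpn["variables"]}
    cpn = {
        # all DPN places plus one variable place per variable
        "places": list(dpn["places"]) + list(var_place.values()),
        "transitions": list(dpn["transitions"]),
        "color": {p: "UNIT" for p in dpn["places"]},
        "arcs": {},                    # (source, target) -> inscription
        "guard": dict(dpn["guard"]),   # guards are not changed
        "init": {p: ["unit"] * dpn["initial_marking"].get(p, 0)
                 for p in dpn["places"]},
    }
    for v, typ in dpn["variables"].items():
        cpn["color"][var_place[v]] = typ
        # each variable place starts with one token holding the initial value
        cpn["init"][var_place[v]] = [dpn["initial_values"][v]]
    for src, dst in dpn["arcs"]:
        cpn["arcs"][(src, dst)] = "unit"   # original arcs carry valueless tokens
    for t in dpn["transitions"]:
        for v in dpn["read"][t] | dpn["write"][t]:
            written = v in dpn["write"][t]
            cpn["arcs"][(var_place[v], t)] = f"{v}_r"
            # written variables come back with a new value, read ones unchanged
            cpn["arcs"][(t, var_place[v])] = f"{v}_w" if written else f"{v}_r"
    return cpn

# Hypothetical DPN with one transition writing one variable.
dpn = {
    "places": ["p0", "p1"], "transitions": ["t"],
    "arcs": [("p0", "t"), ("t", "p1")],
    "variables": {"amount": "real"},
    "initial_marking": {"p0": 1}, "initial_values": {"amount": 0},
    "read": {"t": set()}, "write": {"t": {"amount"}},
    "guard": {"t": "amount_w > 0"},
}
cpn = dpn_to_cpn(dpn)
```

The pair of arcs between a transition and a variable place is what lets the single token of that place be consumed and returned, possibly with a new value.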