Enhancing workflow-nets with data for trace completion

The growing adoption of IT systems for modeling and executing (business) processes or services has pushed research towards techniques and tools that support more complex forms of process analysis. Many of them, such as conformance checking, process alignment, mining and enhancement, rely on complete observation of past (tracked and logged) executions. In many real cases, however, the lack of human or IT support on all the steps of process execution, as well as information hiding and abstraction of model and data, result in incomplete log information about both data and activities. This paper tackles the issue of automatically repairing traces with missing information by considering not only activities but also the data manipulated by them. Our technique recasts this problem as a reachability problem and provides an encoding in an action language, which allows virtually any state-of-the-art planner to be used to return solutions.

1 Introduction

The use of IT systems for supporting business activities has led to a wide diffusion of process mining techniques and tools that offer business analysts the possibility to observe the current process execution, identify deviations from the model, and perform individual and aggregated analyses on current and past executions.

Figure 1: The three types of process mining.

According to the process mining manifesto, all these techniques and tools can be grouped in three basic types: process discovery, conformance checking and process enhancement (see Figure 1), and require as input an event log and, for conformance checking and enhancement, a (process) model. A log, usually described in the IEEE standard XES format (http://www.xes-standard.org/), is a set of execution traces (or cases), each of which is an ordered sequence of events carrying a payload as a set of attribute-value pairs. Process models instead provide a description of the scenario at hand and can be constructed using one of the available Business Process Modeling Languages, such as BPMN, YAWL and Declare.

Event logs are therefore a crucial ingredient for process mining. Unfortunately, a number of difficulties may hamper the availability of event logs. Among these are partial event logs, where the execution traces may carry only partial information about which process activities have been executed and what data or artefacts they produced. Thus repairing incomplete execution traces by reconstructing the missing entries becomes an important task to enable process mining in full, as noted in recent works such as [17, 8]. While these works deserve praise for having motivated the importance of trace repair and for having provided some basic techniques for reconstructing missing entries using the knowledge captured in process models, they all focus on event logs (and process models) of limited expressiveness. In fact, they all provide techniques for the reconstruction of control flows, thus completely ignoring the data flow component. This is a serious limitation, given the growing practical and theoretical efforts to extend business process languages with the capability to model complex data objects, along with the traditional control flow perspective [6].

In this paper we show how to exploit state-of-the-art planning techniques to deal with the repair of data-aware event logs in the presence of imperative process models. Specifically, we focus on the well-established Workflow Nets [20], a particular class of Petri nets that provides the formal foundations of several process models and of the YAWL language, and that has become one of the standard ways to model and analyze workflows. In particular we provide:

  1. a modeling language, DAW-net, which extends the workflow nets with data formalism introduced in [18] so as to deal with even more expressive data (Section 3);

  2. a recast of data-aware trace repair as a reachability problem in DAW-net (Section 4);

  3. a sound and complete encoding of reachability in DAW-net as a planning problem, so as to deal with trace repair using planning (Section 5).

The solutions of the problem are all and only the repairs of the partial trace compliant with the DAW-net model. The advantage of using automated planning techniques is that we can exploit the underlying logic language to ensure that generated plans conform to the observed traces without resorting to ad hoc algorithms for the specific repair problem. The theoretical investigation presented in this work is an important step towards the exploitation of mature planning techniques for trace repair w.r.t. data-aware processes.

2 Preliminaries

2.1 The Workflow Nets modeling language

Petri Nets (PN) are a modeling language for the description of distributed systems that has been widely applied to the description and analysis of business processes [1]. The classical PN is a directed bipartite graph with two node types, called places and transitions, connected via directed arcs. Connections between two nodes of the same type are not allowed.

Definition 1 (Petri Net)

A Petri Net is a triple $\langle P, T, F \rangle$ where $P$ is a set of places; $T$ is a set of transitions; $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation describing the arcs between places and transitions (and between transitions and places).

Figure 2: A Petri Net.

The preset of a transition $t$ is the set of its input places: ${}^\bullet t = \{p \in P \mid (p, t) \in F\}$. The postset of $t$ is the set of its output places: $t^\bullet = \{p \in P \mid (t, p) \in F\}$. Definitions of pre- and postsets of places are analogous.

Places in a PN may contain a discrete number of marks called tokens. Any distribution of tokens over the places, formally represented by a total mapping $M : P \to \mathbb{N}$, represents a configuration of the net called a marking. PNs come with a graphical notation where places are represented by circles, transitions by rectangles and tokens by full dots within places. Figure 2 depicts a PN with a marking , , . The preset and postset of are and , respectively.

Figure 3: A process as a Petri Net. Transition labels: T1: ask application documents; T2: send student application; T3: send worker application; T4: fill student request; T5: fill worker request; T6: local credit officer approval; T7: senior credit officer approval; T8: bank credit committee approval; T9 (no label); T10: send approval to customer; T11: store approval in branch; T12: issue loan. Arc annotations: loanType=s, loanType=w, two conditions on request, else.

Process tasks are modeled in PNs as transitions, while arcs and places constrain their ordering. For instance, the process in Figure 3 exemplifies how PNs can be used to model parallel and mutually exclusive choices, typical of business processes (for the sake of simplicity we only focus here on the so-called happy path, that is, the successful granting of the loan): sequences T2;T4-T3;T5 and transitions T6-T7-T8 are indeed placed on mutually exclusive paths. Transitions T10 and T11 are instead placed on parallel paths. Finally, T9 is needed to prevent connections between nodes of the same type.

The expressivity of PNs exceeds, in the general case, what is needed to model business processes, which typically have a well-defined starting point and a well-defined ending point. This imposes syntactic restrictions on PNs, that result in the following definition of a workflow net (WF-net) [1].

Definition 2 (WF-net)

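A PN $\langle P, T, F \rangle$ is a WF-net if it has a single source place $start$, a single sink place $end$, and every place and every transition is on a path from $start$ to $end$, i.e., for all $n \in P \cup T$, $(start, n) \in F^*$ and $(n, end) \in F^*$, where $F^*$ is the reflexive transitive closure of $F$.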

A marking in a WF-net represents the workflow state of a single case. The semantics of a PN/WF-net, and in particular the notion of valid firing, defines how transitions route tokens through the net so that they correspond to a process execution.

Definition 3 (Valid Firing)

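A firing of a transition $t \in T$ from $M$ to $M'$ is valid, in symbols $M \xrightarrow{t} M'$, iff

  1. $t$ is enabled in $M$, i.e., $\{p \in P \mid M(p) > 0\} \supseteq {}^\bullet t$; and

  2. the marking $M'$ is such that for every $p \in P$:

     $$M'(p) = \begin{cases} M(p) - 1 & \text{if } p \in {}^\bullet t \setminus t^\bullet \\ M(p) + 1 & \text{if } p \in t^\bullet \setminus {}^\bullet t \\ M(p) & \text{otherwise} \end{cases}$$

Condition 1. states that a transition is enabled if all its input places contain at least one token; 2. states that when $t$ fires it consumes one token from each of its input places and produces one token in each of its output places.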
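As an illustrative sketch only (it is not part of the original formalization), the enabling and firing conditions can be rendered in a few lines of Python. The dictionary-based marking, the function names and the two-arc net fragment are assumptions made for this example.

```python
# Minimal sketch of Petri net firing; all names here are hypothetical.

def preset(node, flow):
    """Input nodes of `node`: every x with an arc (x, node) in the flow relation."""
    return {x for (x, y) in flow if y == node}

def postset(node, flow):
    """Output nodes of `node`: every y with an arc (node, y) in the flow relation."""
    return {y for (x, y) in flow if x == node}

def is_enabled(t, marking, flow):
    """Condition 1: every input place of t holds at least one token."""
    return all(marking.get(p, 0) > 0 for p in preset(t, flow))

def fire(t, marking, flow):
    """Condition 2: return the marking obtained by firing t (input is not mutated)."""
    if not is_enabled(t, marking, flow):
        raise ValueError(f"transition {t} is not enabled")
    pre, post = preset(t, flow), postset(t, flow)
    new_marking = dict(marking)
    for p in pre - post:          # consumed and not produced again
        new_marking[p] = new_marking.get(p, 0) - 1
    for p in post - pre:          # produced and not consumed
        new_marking[p] = new_marking.get(p, 0) + 1
    return new_marking

# Hypothetical fragment: start -> T1 -> p1
FLOW = {("start", "T1"), ("T1", "p1")}
M0 = {"start": 1, "p1": 0}
M1 = fire("T1", M0, FLOW)
```

Places that are both input and output of the fired transition keep their token count, which matches the "otherwise" case of the definition.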

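A case of a WF-net is a sequence of valid firings $M_0 \xrightarrow{t_1} M_1, M_1 \xrightarrow{t_2} M_2, \ldots, M_{k-1} \xrightarrow{t_k} M_k$ where $M_0$ is the marking indicating that there is a single token in $start$.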

Definition 4 ($k$-safeness)

A marking of a PN is $k$-safe if the number of tokens in all places is at most $k$. A PN is $k$-safe if the initial marking is $k$-safe and the marking of all cases is $k$-safe.

From now on we concentrate on 1-safe nets, which generalize the class of structured workflows and are the basis for best practices in process modeling [11]. We also use safeness as a synonym of 1-safeness. It is important to notice that our approach can be seamlessly generalized to other classes of PNs, as long as it is guaranteed that they are $k$-safe. This reflects the fact that the process control-flow is well-defined (see [10]).

Reachability on Petri Nets. The behavior of a PN can be described as a transition system where states are markings and directed edges represent firings. Intuitively, there is an edge from $M_i$ to $M_{i+1}$ labeled by $t_i$ if $M_i \xrightarrow{t_i} M_{i+1}$ is a valid firing. Given a “goal” marking $M_g$, the reachability problem amounts to checking whether there is a path from the initial marking $M_0$ to $M_g$. Reachability on PNs (WF-nets) is of enormous importance in process verification as it allows natural behavioral properties, such as satisfiability and soundness, to be checked in a natural manner [2].
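For a 1-safe net the transition system is finite, so reachability can be decided by plain graph search. The following is a minimal sketch under that assumption, not the technique used in the paper (which relies on planning). Markings are represented as frozensets of marked places; the net `NET` and all function names are hypothetical.

```python
from collections import deque

def successors(marking, transitions):
    """Yield (transition, next_marking) for every transition enabled in `marking`.

    `transitions` maps a name to a (preset, postset) pair of frozensets.
    Assumes 1-safeness: a marking is just the set of places holding a token.
    """
    for name, (pre, post) in transitions.items():
        if pre <= marking:
            yield name, (marking - pre) | post

def find_path(initial, goal, transitions):
    """Breadth-first search; returns a firing sequence reaching `goal`, or None."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        marking, path = queue.popleft()
        if marking == goal:
            return path
        for name, nxt in successors(marking, transitions):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None

# Hypothetical net: A splits into two parallel branches (B and C), D joins them.
NET = {
    "A": (frozenset({"start"}), frozenset({"p1", "p2"})),
    "B": (frozenset({"p1"}), frozenset({"p3"})),
    "C": (frozenset({"p2"}), frozenset({"p4"})),
    "D": (frozenset({"p3", "p4"}), frozenset({"end"})),
}
PATH = find_path(frozenset({"start"}), frozenset({"end"}), NET)
```

The search returns one shortest firing sequence; enumerating all sequences, as needed to list every repair, would require a different traversal.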

2.2 Trace repair

One of the goals of process mining is to capture the as-is processes as accurately as possible: this is done by examining event logs that can then be exploited to perform the tasks in Figure 1. In many cases, however, event logs are subject to data quality problems, resulting in incorrect or missing events in the log. In this paper we focus on the latter issue, addressing the problem of repairing execution traces that contain missing entries (hereafter shortened to trace repair).

The need for trace repair is motivated in depth in [17], where missing entities are described as a frequent cause of low data quality in event logs, especially when the definition of the business processes integrates activities that are not supported by IT systems due either to their nature (e.g. they consist of human interactions) or to the high level of abstraction of the description, detached from the implementation. A further cause of missing events are special activities (such as transition T9 in Figure 3) that are introduced in the model to guarantee properties concerning e.g., the structure of the workflow or syntactic constraints, but are never executed in practice.

The starting points of trace repair are execution traces and the knowledge captured in process models. Consider for instance the model in Figure 3 and the (partial) execution trace {T3, T7}. By aligning the trace to the model using a replay-based approach or a planning-based approach, the techniques presented in [17] and [8] are able to exploit the events stored in the trace and the control flow specified in the model to reconstruct two possible repairs, which differ only in the interleaving of the parallel transitions T10 and T11:

    ⟨T1, T3, T5, T7, T9, T10, T11, T12⟩
    ⟨T1, T3, T5, T7, T9, T11, T10, T12⟩

Consider now a different scenario in which the partial trace reduces to {T7}. In this case, by using the control flow in Figure 3 we are not able to reconstruct whether the loan is a student loan or a worker loan. This increases the number of possible repairs and therefore lowers the usefulness of trace repair. Assume nonetheless that the event log conforms to the XES standard and stores some observed data attached to T7 (enclosed in square brackets):

If the process model is able to specify how transitions can read and write variables, and furthermore some constraints on how they do it, the scenario changes completely. Indeed, assume that transition T4 can write the variable request only with a value smaller than or equal to 30k (this being the maximum amount of a student loan). Using this fact, and the fact that the request examined by T7 is greater than 30k, we can conclude that the execution trace has followed the path of the worker loan. Moreover, if the model specifies that variable loanType is written during the execution of T1, when the applicant chooses the type of loan she is interested in, we are able to infer that T1 sets variable loanType to w. This example, besides illustrating the idea of trace repair, also motivates why data are important to accomplish this task, and therefore why extending repair techniques beyond the mere control flow is a significant contribution to addressing data quality problems in event logs.

2.3 The planning language K

The main elements of action languages are fluents and actions. The former represent the state of the system, which may change by means of actions. Causation statements describe the possible evolution of the states, and preconditions associated with actions describe which actions can be executed in the current state. A planning problem in K [9] is specified using a Datalog-like language where fluents and actions are represented by literals (not necessarily ground). The specification includes the list of fluents, actions, initial state and goal conditions; in addition, a set of statements specifies the dynamics of the planning domain using causation rules and executability conditions. The semantics of K borrows heavily from the Answer Set Programming (ASP) paradigm. In fact, the system enables reasoning with partial knowledge and provides both weak and strong negation.

A causation rule is a statement of the form

    caused f if b_1, ..., b_k, not b_{k+1}, ..., not b_l
               after a_1, ..., a_m, not a_{m+1}, ..., not a_n.

The rule states that f is true in the new state reached by executing (simultaneously) some actions, provided that a_1, ..., a_m are known to hold while a_{m+1}, ..., a_n are not known to hold in the previous state (some of the a_i might be actions executed on it), and b_1, ..., b_k are known to hold while b_{k+1}, ..., b_l are not known to hold in the new state. Rules without the after part are called static.

An executability condition is a statement of the form

    executable a if l_1, ..., l_m, not l_{m+1}, ..., not l_n.

Informally, such a condition says that the action a is eligible for execution in a state if l_1, ..., l_m are known to hold while l_{m+1}, ..., l_n are not known to hold in that state.

Terms in both kinds of statements may include variables (starting with a capital letter), and the statements must be safe in the usual Datalog meaning w.r.t. the first fluent or action of the statements.

A planning domain PD is a tuple $\langle D, R \rangle$ where $D$ is a finite set of action and fluent declarations and $R$ is a finite set of rules, initial state constraints, and executability conditions.

The semantics of the language is provided in terms of a transition system where the states are ASP models (sets of atoms) and actions transform the state according to the rules. A state transition is a tuple $\langle s, A, s' \rangle$ where $s, s'$ are states and $A$ is a set of action instances. The transition is said to be legal if the actions are executable in the first state and both states are the minimal ones that satisfy all causation rules. The semantics of plans including default negation is defined by means of a Gelfond-Lifschitz type reduction to a positive planning domain. A sequence of state transitions $\langle s_0, A_1, s_1 \rangle, \ldots, \langle s_{n-1}, A_n, s_n \rangle$, $n \geq 0$, is a trajectory for PD if $s_0$ is a legal initial state of PD and all $\langle s_{i-1}, A_i, s_i \rangle$ are legal state transitions of PD.

A planning problem is a pair of a planning domain PD and a ground goal
g_1, ..., g_m, not g_{m+1}, ..., not g_n that is required to be satisfied at the end of the execution.

3 Framework

In this section we suitably extend WF-nets to represent data and their evolution as transitions are performed. In order for such an extension to be meaningful, i.e., allowing reasoning on data, it has to provide: (i) a model for representing data; (ii) a way to make decisions on actual data values; and (iii) a mechanism to express modifications to data. Therefore, we enhance WF-nets with the following elements:

  • a set of variables taking values from possibly different domains (addressing (i));

  • queries on such variables used as transition preconditions (addressing (ii));

  • variable updates and deletions in the specification of net transitions (addressing (iii)).

Our framework follows the approach of state-of-the-art WF-nets with data [18, 12], from which it borrows the above concepts, extending them by allowing reasoning on actual data values as better explained in Section 6.

Throughout the section we use the WF-net in Figure 3 extended with data as a running example.

3.1 Data Model

As our focus is on trace repair, we follow the data model of the IEEE XES standard for describing logs, which represents data as a set of variables. Variables take values from specific sets on which a partial order can be defined. As customary, we distinguish between the data model, namely the intensional level, from a specific instance of data, i.e., the extensional level.

Definition 5 (Data model)

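A data model is a tuple $\mathcal{D} = \langle \mathcal{V}, \Delta, dm, ord \rangle$ where:

  • $\mathcal{V}$ is a possibly infinite set of variables;

  • $\Delta = \{\Delta_1, \Delta_2, \ldots\}$ is a possibly infinite set of domains (not necessarily disjoint);

  • $dm : \mathcal{V} \to \Delta$ is a total and surjective function which associates to each variable $v$ its domain $\Delta_i$;

  • ord is a partial function that, given a domain $\Delta_i$, if $ord(\Delta_i)$ is defined, returns a partial order (reflexive, antisymmetric and transitive) $\leq_{\Delta_i} \subseteq \Delta_i \times \Delta_i$.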

A data model for the loan example is , , , , , with and being total ordered by the natural ordering in .

An actual instance of a data model is simply a partial function associating values to variables.

Definition 6 (Assignment)

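Let $\mathcal{D} = \langle \mathcal{V}, \Delta, dm, ord \rangle$ be a data model. An assignment for variables in $\mathcal{V}$ is a partial function $\eta : \mathcal{V} \to \bigcup_i \Delta_i$ such that for each $v \in \mathcal{V}$, if $\eta(v)$ is defined, i.e., $v \in dom(\eta)$ where $dom(\eta)$ is the set of variables on which $\eta$ is defined, then we have $\eta(v) \in dm(v)$.

We now define our boolean query language, which notably allows for equality and comparison. As will become clearer in Section 3.2, queries are used as guards, i.e., preconditions for the execution of transitions.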

Definition 7 (Query language - syntax)

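Given a data model $\mathcal{D}$, the language $\mathcal{L}(\mathcal{D})$ is the set of formulas $\Phi$ inductively defined according to the following grammar:

    $\Phi := true \mid def(v) \mid t_1 = t_2 \mid t_1 \leq t_2 \mid \neg \Phi_1 \mid \Phi_1 \wedge \Phi_2$

where $v \in \mathcal{V}$ and $t_1, t_2 \in \mathcal{V} \cup \bigcup_i \Delta_i$.

Examples of queries of the loan scenarios are or . Given a formula $\Phi$ and an assignment $\eta$, we write $\Phi[\eta]$ for the formula $\Phi$ where each occurrence of a variable $v$ for which $\eta(v)$ is defined is replaced by $\eta(v)$.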

Definition 8 (Query language - semantics)

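Given a data model $\mathcal{D}$, an assignment $\eta$ and a query $\Phi \in \mathcal{L}(\mathcal{D})$, we say that $\mathcal{D}, \eta$ satisfies $\Phi$, written $\mathcal{D}, \eta \models \Phi$, inductively on the structure of $\Phi$ as follows:

  • $\mathcal{D}, \eta \models true$;

  • $\mathcal{D}, \eta \models def(v)$ iff $\eta(v)$ is defined;

  • $\mathcal{D}, \eta \models t_1 = t_2$ iff $t_1[\eta], t_2[\eta] \notin \mathcal{V}$ and $t_1[\eta] = t_2[\eta]$;

  • $\mathcal{D}, \eta \models t_1 \leq t_2$ iff $t_1[\eta], t_2[\eta] \in \Delta_i$ for some $\Delta_i$ and $ord(\Delta_i)$ is defined and $t_1[\eta] \leq_{\Delta_i} t_2[\eta]$;

  • $\mathcal{D}, \eta \models \neg \Phi$ iff it is not the case that $\mathcal{D}, \eta \models \Phi$;

  • $\mathcal{D}, \eta \models \Phi_1 \wedge \Phi_2$ iff $\mathcal{D}, \eta \models \Phi_1$ and $\mathcal{D}, \eta \models \Phi_2$.

Intuitively, def can be used to check whether a variable has an associated value or not (recall that an assignment is a partial function); equality has the intended meaning, and $t_1 \leq t_2$ evaluates to true iff $t_1$ and $t_2$ are values belonging to the same domain $\Delta_i$, such a domain is ordered by a partial order $\leq_{\Delta_i}$, and $t_1$ is actually less than or equal to $t_2$ according to $\leq_{\Delta_i}$.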
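The inductive semantics lends itself to a direct recursive evaluator. The sketch below is only illustrative: the tuple encoding of queries, the function names, the restriction of the ordering to integers, and the concrete value 60000 are all assumptions made for the example and do not come from the paper.

```python
# Queries are nested tuples (hypothetical encoding):
#   ("true",), ("def", v), ("eq", t1, t2), ("le", t1, t2), ("not", q), ("and", q1, q2)

def resolve(term, eta, variables):
    """Return (has_value, value): a variable is replaced by its assigned value, if any."""
    if term in variables:
        return (term in eta, eta.get(term))
    return (True, term)  # constants stand for themselves

def holds(query, eta, variables):
    """Evaluate `query` under the partial assignment `eta` (a dict)."""
    op = query[0]
    if op == "true":
        return True
    if op == "def":
        return query[1] in eta
    if op in ("eq", "le"):
        ok1, a = resolve(query[1], eta, variables)
        ok2, b = resolve(query[2], eta, variables)
        if not (ok1 and ok2):
            return False  # an undefined variable makes the atom false
        if op == "eq":
            return a == b
        # Assumption: only integer domains carry an ordering in this sketch.
        return isinstance(a, int) and isinstance(b, int) and a <= b
    if op == "not":
        return not holds(query[1], eta, variables)
    if op == "and":
        return holds(query[1], eta, variables) and holds(query[2], eta, variables)
    raise ValueError(f"unknown operator {op!r}")

VARIABLES = {"loanType", "request"}
ETA = {"request": 60000}  # loanType is left undefined on purpose
```

Note that negation is evaluated classically over the result of the sub-query, so `("not", ("def", "loanType"))` holds under `ETA`.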

3.2 Data-aware net

We now combine the data model with a WF-net and formally define how transitions are guarded by queries and how they update/delete data. The result is a Data-AWare net (DAW-net) that incorporates aspects (i)–(iii) described at the beginning of Section 3.

Definition 9 (DAW-net)

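A DAW-net is a tuple $\langle \mathcal{D}, \mathcal{N}, wr, gd \rangle$ where:

  • $\mathcal{N} = \langle P, T, F \rangle$ is a WF-net;

  • $\mathcal{D} = \langle \mathcal{V}, \Delta, dm, ord \rangle$ is a data model;

  • $wr : T \to (\mathcal{V}' \to 2^{dm(\mathcal{V})})$, where $\mathcal{V}' \subseteq \mathcal{V}$, $dm(\mathcal{V}) = \bigcup_{v \in \mathcal{V}} dm(v)$ and $wr(t)(v) \subseteq dm(v)$ for each $v \in \mathcal{V}'$, is a function that associates each transition to a (partial) function mapping variables to a finite subset of their domain;

  • $gd : T \to \mathcal{L}(\mathcal{D})$ is a function that associates a guard to each transition.

Function gd associates a guard, namely a query, to each transition. The intuitive semantics is that a transition $t$ can fire if its guard $gd(t)$ evaluates to true (given the current assignment of values to data). Examples are and . Function wr is instead used to express how a transition $t$ modifies data: after the firing of $t$, each variable $v \in \mathcal{V}'$ can take any value among a specific finite subset of $dm(v)$. We have three different cases:

  • $\emptyset \neq wr(t)(v) \subseteq dm(v)$: $t$ nondeterministically assigns a value from $wr(t)(v)$ to $v$;

  • $wr(t)(v) = \emptyset$: $t$ deletes the value of $v$ (hence making $v$ undefined);

  • $v \notin dom(wr(t))$: the value of $v$ is not modified by $t$.

Notice that by allowing $wr(t)(v) \subseteq dm(v)$ in the first bullet above we enable the specification of restrictions for specific tasks. E.g., $wr(T4)(request) = \{0, \ldots, 30k\}$ says that T4 writes the request variable and, intuitively, that students can request a maximum loan of 30k, while $wr(T5)(request) = \{0, \ldots, 500k\}$ says that workers can request up to 500k.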

The intuitive semantics of gd and wr is formalized next. We start from the definition of DAW-net state, which includes both the state of the WF-net, namely its marking, and the state of data, namely the assignment. We then extend the notions of state transition and valid firing.

Definition 10 (DAW-net state)

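A state of a DAW-net $\langle \mathcal{D}, \mathcal{N}, wr, gd \rangle$ is a pair $(M, \eta)$ where $M$ is a marking for $\langle P, T, F \rangle$ and $\eta$ is an assignment for $\mathcal{D}$.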

Definition 11 (DAW-net Valid Firing)

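Given a DAW-net $\langle \mathcal{D}, \mathcal{N}, wr, gd \rangle$, a firing of a transition $t \in T$ is a valid firing from $(M, \eta)$ to $(M', \eta')$, written as $(M, \eta) \xrightarrow{t} (M', \eta')$, iff conditions 1. and 2. of Def. 3 hold for $M$ and $M'$, i.e., it is a WF-net valid firing, and

  1. $\mathcal{D}, \eta \models gd(t)$,

  2. assignment $\eta'$ is such that, if $wr = \{v \mid wr(t)(v) \neq \emptyset\}$, $del = \{v \mid wr(t)(v) = \emptyset\}$:

    • its domain $dom(\eta') = (dom(\eta) \cup wr) \setminus del$;

    • for each $v \in dom(\eta')$:

      $$\eta'(v) = \begin{cases} d \in wr(t)(v) & \text{if } v \in wr \\ \eta(v) & \text{otherwise} \end{cases}$$

Conditions 1. and 2. extend the notion of valid firing of WF-nets by imposing additional pre- and postconditions on data, i.e., preconditions on $\eta$ and postconditions on $\eta'$. Specifically, 1. says that for a transition $t$ to be fired its guard $gd(t)$ must be satisfied by the current assignment $\eta$. Condition 2. constrains the new state of data: the domain of $\eta'$ is defined as the union of the domain of $\eta$ with variables that are written (wr), minus the set of variables that must be deleted (del). Variables in $dom(\eta')$ can indeed be grouped in three sets depending on the effects of $t$: (i) $old = dom(\eta) \setminus wr$: variables whose value is unchanged after $t$; (ii) $new = wr \setminus dom(\eta)$: variables that were undefined but have a value after $t$; and (iii) $overwr = wr \cap dom(\eta)$: variables that did have a value and are updated with a new one after $t$. The final part of condition 2. says that each variable $v$ in $new \cup overwr$ takes a value in $wr(t)(v)$, while variables in old maintain the old value $\eta(v)$.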
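The data part of a valid firing can be sketched as a function that enumerates every assignment reachable by firing one transition. This is an illustrative sketch, not the paper's encoding: the dictionary representation of assignments and write specifications, the guard given as a Python predicate, and the concrete guard and values are assumptions made for the example.

```python
from itertools import product

def data_successors(eta, guard, wr_t):
    """All assignments obtainable by firing a transition on assignment `eta`.

    `guard` is a predicate over assignments; `wr_t` maps each written variable
    to its finite set of admissible values (an empty set means "delete").
    """
    if not guard(eta):
        return []  # condition 1: the guard must hold in the current state
    deleted = {v for v, values in wr_t.items() if not values}
    written = sorted(v for v, values in wr_t.items() if values)
    # Variables neither deleted nor written keep their old value.
    base = {v: d for v, d in eta.items() if v not in deleted and v not in written}
    result = []
    for combo in product(*(sorted(wr_t[v]) for v in written)):
        new_eta = dict(base)
        new_eta.update(zip(written, combo))
        result.append(new_eta)
    return result

# Hypothetical example: a transition guarded by loanType = w that writes request.
def is_worker(eta):
    return eta.get("loanType") == "w"

WR = {"request": {40000, 60000}}
NEXT = data_successors({"loanType": "w"}, is_worker, WR)
```

Each element of `NEXT` corresponds to one nondeterministic choice of the written values; a transition that writes nothing yields exactly one successor, the unchanged assignment.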

A case of a DAW-net is defined as a case of a WF-net, with the only difference that the assignment $\eta_0$ of the initial state is empty, i.e., $dom(\eta_0) = \emptyset$.

4 Trace repair as reachability

In this section we provide the intuition behind our technique for solving the trace repair problem via reachability. Full details and proofs are contained in Appendices A to D.

A trace is a sequence of observed events, each with a payload including the transition it refers to and its effects on the data, i.e., the variables updated by its execution. Intuitively, a DAW-net case is compliant w.r.t. a trace if it contains all the occurrences of the transitions observed in the trace (with the corresponding variable updates) in the right order.

As a first step, we assume without loss of generality that DAW-net models start with a special transition $start_t$ and terminate with a special transition $end_t$. Every process can be reduced to such a structure, as informally illustrated in the left-hand side of Figure 4 by arrows labeled with (1). Note that this change does not modify the behavior of the net: any sequence of firings valid for the original net can be extended by the firing of the additional transitions, and vice versa.

Figure 4: Outline of the trace “injection”

Next, we illustrate the main idea behind our approach by means of the right-hand side of Figure 4: we consider the observed events as transitions (in red) and we suitably “inject” them into the original DAW-net. By doing so, we obtain a new model where, intuitively, tokens are forced to activate the red transitions of the DAW-net when events are observed in the trace. When, instead, there is no red counterpart, i.e., there is missing information in the trace, the tokens move in the black part of the model. The objective is then to perform reachability for the final marking (i.e., to have one token in the $end$ place and all other places empty) over such a new model in order to obtain all and only the possible repairs for the partial trace.

More precisely, for each event $e$ with a payload including transition $t$ and some effect on variables we introduce a new transition $t_e$ in the model such that:

  • $t_e$ is placed in parallel with the original transition $t$;

  • $t_e$ includes an additional input place connected to the preceding event and an additional output place which connects it to the next event;

  • $gd(t_e) = gd(t)$; and

  • $wr(t_e)$ specifies exactly the variables and the corresponding values updated by the event, i.e., if the event sets the value of $v$ to $d$, then $wr(t_e)(v) = \{d\}$; if the event deletes the variable $v$, then $wr(t_e)(v) = \emptyset$.
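The construction above can be sketched as a function that copies a net and adds one transition per observed event. This is a hedged illustration only: the dictionary layout of the net, the naming of the new transitions and chaining places (`A@e1`, `tr0`, `tr1`, ...), the reuse of the original guard, and the way the chain is attached to the special start and end transitions are assumptions of this sketch, since the exact wiring is given in a figure that is not reproduced here.

```python
import copy

def inject_trace(net, trace, start_t, end_t):
    """Return a copy of `net` extended with one transition per observed event.

    `net` maps a transition name to {"pre": set, "post": set, "guard": str, "wr": dict}.
    `trace` is a list of (transition name, {variable: value or None}) pairs,
    where None stands for a deleted variable.
    """
    new_net = copy.deepcopy(net)
    # Assumption: the chain of trace places starts at the special start transition.
    new_net[start_t]["post"].add("tr0")
    for i, (t, writes) in enumerate(trace, start=1):
        original = net[t]
        new_net[f"{t}@e{i}"] = {
            # same places as the original, plus the links to neighbouring events
            "pre": set(original["pre"]) | {f"tr{i - 1}"},
            "post": set(original["post"]) | {f"tr{i}"},
            "guard": original["guard"],
            # exactly the observed updates: singleton sets, or empty set for deletion
            "wr": {v: (set() if d is None else {d}) for v, d in writes.items()},
        }
    # Assumption: the end transition can only fire after the last observed event.
    new_net[end_t]["pre"].add(f"tr{len(trace)}")
    return new_net

# Hypothetical three-transition net: start -> start_t -> p1 -> A -> p2 -> end_t -> end
NET = {
    "start_t": {"pre": {"start"}, "post": {"p1"}, "guard": "true", "wr": {}},
    "A": {"pre": {"p1"}, "post": {"p2"}, "guard": "true", "wr": {"x": {1, 2, 3}}},
    "end_t": {"pre": {"p2"}, "post": {"end"}, "guard": "true", "wr": {}},
}
INJECTED = inject_trace(NET, [("A", {"x": 1})], "start_t", "end_t")
```

In `INJECTED`, the original transition `A` is still present, but firing it never produces a token in `tr1`, so `end_t` can only fire after the observed copy `A@e1` has fired.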

Given a trace $\tau$ and a DAW-net $W$, it is easy to see that the resulting trace workflow (indicated as $W^\tau$) is a strict extension of $W$ (only new nodes are introduced) and, since all newly introduced nodes are on a path connecting the start and sink places, it is a DAW-net whenever the original one is a DAW-net.

We now prove the soundness and completeness of the approach by showing that: (1) all cases of $W^\tau$ are compliant with $\tau$; (2) each case of $W^\tau$ is also a case of $W$; and (3) if there is a case of $W$ compliant with $\tau$, then that is also a case for $W^\tau$.

Property (1) is ensured by construction. For (2) and (3) we need to relate cases from $W^\tau$ to the original DAW-net $W$. We indeed introduce a projection function $\Pi_\tau$ that maps elements from cases of the enriched DAW-net to elements of cases of the original DAW-net. Essentially, $\Pi_\tau$ maps each newly introduced transition $t_e$ to the corresponding transition $t$ in event $e$, i.e., $\Pi_\tau(t_e) = t$, and also projects away the new places in the markings. Given that the structure of $W^\tau$ is essentially the same as that of $W$ with additional copies of transitions that are already in $W$, it is not surprising that any case for $W^\tau$ can be replayed on $W$ by mapping the new transitions $t_e$ into the original ones $t$, as shown by the following:

Lemma 1

If $C$ is a case of $W^\tau$ then $\Pi_\tau(C)$ is a case of $W$.

This lemma proves that whenever we find a case on $W^\tau$, then it is an example of a case on $W$ that is compliant with $\tau$, i.e., (2). However, to reduce the original problem to reachability on DAW-net, we need to prove that all the cases of $W$ compliant with $\tau$ can be replayed on $W^\tau$, that is, (3). In order to do that, we can build a case for $W^\tau$ starting from the compliant case for $W$, by substituting the occurrences of firings corresponding to events in $\tau$ with the newly introduced transitions. The above results pave the way to the following:

Theorem 4.1

Let $W$ be a DAW-net and $\tau$ a trace; then the trace workflow $W^\tau$ characterises all and only the cases of $W$ compatible with $\tau$. That is

  • if $C$ is a case of $W^\tau$ containing the transition $t_{e_n}$ introduced for the last event $e_n$ of $\tau$, then $\Pi_\tau(C)$ is compatible with $\tau$; and

  • if $C$ is a case of $W$ compatible with $\tau$, then there is a case $C'$ of $W^\tau$ s.t. $\Pi_\tau(C') = C$.

Theorem 4.1 provides the main result of this section and is the basis for the reduction of the trace repair for $W$ and $\tau$ to the reachability problem for $W^\tau$. In fact, by enumerating all the cases of $W^\tau$ reaching the final marking (i.e., a token in $end$) we can provide all possible repairs for the partial observed trace. Moreover, the transformation generating $W^\tau$ preserves the safeness properties of the original workflow:

Lemma 2

Let $W$ be a DAW-net and $\tau$ a trace of $W$. If $W$ is $k$-safe then the trace workflow $W^\tau$ is $k$-safe as well.

This is essential to guarantee the decidability of the reasoning techniques described in the next section.

5 Reachability as a planning problem

In this section we exploit the similarity between workflows and planning domains in order to describe the evolution of a DAW-net by means of a planning language. Once the original workflow behaviour has been encoded into an equivalent planning domain, we can use the automatic derivation of plans with specific properties to solve the reachability problem. In our approach we introduce a new action for each transition (to ease the description we will use the same names) and represent the status of the workflow – marking and variable assignments – by means of fluents. Although their representation as dynamic rules is conceptually similar we will separate the description of the encoding by considering first the behavioural part (the WF-net) and then the encoding of data (variable assignments and guards).

5.1 Encoding DAW-net behaviour

Since we focus on 1-safe WF-nets, the representation of markings is simplified by the fact that each place can either contain 1 token or no tokens at all. This information can be represented by introducing a propositional fluent for each place, true iff the corresponding place holds a token. Let us consider the safe WF-net component $\langle P, T, F \rangle$ of a DAW-net system. The declaration part of the planning domain will include:

  • a fluent declaration p for each place $p \in P$;

  • an action declaration t for each task $t \in T$.

Since each transition can be fired only if each input place contains a token (guards will be introduced in the next section), the corresponding action can be executed when the place fluents are true: for each task $t \in T$, given ${}^\bullet t = \{i_1, \ldots, i_n\}$, we include the executability condition:

    executable t if i_1, ..., i_n.

As valid firings are sequential, namely only one transition can be fired at each step, we disable concurrency in the planning domain by introducing the following rule for each pair of tasks $t_1, t_2 \in T$ (for efficiency reasons we can relax this constraint by disabling concurrency only for transitions sharing places or updating the same variables; this would provide shorter plans):

    caused false after t_1, t_2.

Transitions transfer tokens from input to output places. Thus the corresponding actions must clear the input places and set the output places to true. This is enforced by including

    caused -i_1 after t.   ...   caused -i_n after t.
    caused o_1 after t.    ...   caused o_k after t.

for each task $t \in T$ and ${}^\bullet t \setminus t^\bullet = \{i_1, \ldots, i_n\}$, $t^\bullet = \{o_1, \ldots, o_k\}$. Finally, place fluents should be inertial since they preserve their value unless modified by an action. This is enforced by adding for each $p \in P$

    caused p if not -p after p.

Planning problem. Besides the domain described above, a planning problem includes an initial state and a goal. In the initial state the only place with a token is the source:

    initially: start.

The formulation of the goal depends on the actual instance of the reachability problem we need to solve. The goal corresponding to the state in which the only place with a token is $end$ is written as:

    goal: end, not p_1, ..., not p_k ?

where $\{p_1, \ldots, p_k\} = P \setminus \{end\}$.
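To make the shape of the behavioural encoding concrete, the sketch below generates the rule strings described in this section for a small net. It is only an illustration under stated assumptions: the function name, the two-transition net and the lowercase identifiers are hypothetical, declarations of fluents and actions are omitted, and the output is not claimed to be a complete input file for any specific planner.

```python
from itertools import combinations

def encode_behaviour(places, transitions, source="start", sink="end"):
    """Return (rules, initially, goal) as strings for a 1-safe WF-net.

    `transitions` maps a lowercase task name to a (preset, postset) pair of sets.
    """
    rules = []
    for t, (pre, post) in sorted(transitions.items()):
        # executability: every input place holds a token
        rules.append(f"executable {t} if {', '.join(sorted(pre))}.")
        # token transfer: clear inputs (unless also outputs), set outputs
        for p in sorted(pre - post):
            rules.append(f"caused -{p} after {t}.")
        for p in sorted(post):
            rules.append(f"caused {p} after {t}.")
    # no two tasks in the same step
    for t1, t2 in combinations(sorted(transitions), 2):
        rules.append(f"caused false after {t1}, {t2}.")
    # inertia of place fluents
    for p in sorted(places):
        rules.append(f"caused {p} if not -{p} after {p}.")
    initially = f"initially: {source}."
    goal_literals = [sink] + [f"not {p}" for p in sorted(places - {sink})]
    goal = f"goal: {', '.join(goal_literals)} ?"
    return rules, initially, goal

# Hypothetical net: start -> t1 -> p1 -> t2 -> end
PLACES = {"start", "p1", "end"}
TRANSITIONS = {"t1": ({"start"}, {"p1"}), "t2": ({"p1"}, {"end"})}
RULES, INITIALLY, GOAL = encode_behaviour(PLACES, TRANSITIONS)
```

Identifiers are kept lowercase because, in the Datalog-like syntax described earlier, names starting with a capital letter denote variables.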

5.2 Encoding data

For each variable we introduce a fluent unary predicate holding the value of that variable. Clearly, predicates must be functional and have no positive instantiation for undefined variables.

We also introduce auxiliary fluents to facilitate the writing of the rules. Fluent indicates whether the variable is not undefined – it is used both in tests and to enforce models where the variable is assigned/unassigned. The fluent is used to inhibit inertia for the variable when its value is updated because of the execution of an action.

DAW-net includes the specification of the set of values that each transition can write on a variable. This information is static, therefore it is included in the background knowledge by means of a set of unary predicates as a set of facts:

    (e).

for each , , and .

Constraints on variables. For each variable :

  • we impose functionality

        caused false if (X), (Y), X != Y.
  • we force its value to propagate to the next state unless it is modified by an action ()

        caused (X) if not -(X), not 
                             after (X).
  • the defined fluent is the projection of the argument

        caused  if (X).

Variable updates. The value of a variable is updated by means of causation rules that depend on the transition operating on the variable and on the value of . For each in the domain of :

  • : delete (undefine) a variable

        caused false if  after t.
        caused  after t.
  • : set with a value nondeterministically chosen among a set of elements from its domain

        caused (V) if (V), not -(V) after t.
        caused -(V) if (V), not (V) after t.
        caused false if not  after t.
        caused  after t.

    If contains a single element , then the assignment is deterministic and the first three rules above can be substituted with the following rule (the deterministic version is a special case of the nondeterministic one, and the two are equivalent when there is a single fact):

        caused (d) after t.
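The update cases can be read as a successor function on variable assignments. The sketch below is an illustrative Python rendering under stated assumptions: `WR` is a hypothetical table giving, for each transition and variable, the set of writable values, with the empty set standing for deletion; the variable names and values are invented for the example. Variables not mentioned for a transition keep their value, mirroring inertia.

```python
from itertools import product

# Hypothetical write specification: transition -> variable -> allowed values.
# An empty set means the transition deletes (undefines) the variable.
WR = {
    "t1": {"loan": {"student", "worker"}},  # nondeterministic write
    "t2": {"loan": set()},                   # delete
    "t3": {"amount": {"5k"}},                # deterministic write (single value)
}

def successors(assignment, t):
    """List all assignments that may result from firing t in the given assignment."""
    updates = WR.get(t, {})
    # Variables untouched by t keep their value (inertia).
    kept = {v: d for v, d in assignment.items() if v not in updates}
    written = sorted(v for v, vals in updates.items() if vals)
    choices = [sorted(updates[v]) for v in written]
    results = []
    for combo in product(*choices):
        new = dict(kept)
        new.update(zip(written, combo))
        results.append(new)
    return results
```

A nondeterministic write yields one successor per admissible value, a deterministic write yields exactly one, and a deletion yields one successor in which the variable is absent.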

Guards. Each subformula of a transition guard is associated with a fluent that is true when the corresponding formula is satisfied. To simplify the notation, for any transition , we will use to indicate the fluent . Executability of transitions is conditional on the satisfaction of their guards; instead of modifying the executability rule by including the among the preconditions, we use a constraint rule preventing the execution of the action whenever its guard is not satisfied:

    caused false after t, not .

Translation of atoms () is defined in terms of predicates. For instance corresponds to (V), (W), V == W. That is for , and for . For each subformula of transition guards a static rule is included to “define” the fluent :

: caused  if true .
: caused  if  .
: caused  if (,T1), (,T2), T1 == T2 .
: caused  if (,T1), (,T2), ord(T1,T2) .
: caused  if not  .
: caused  if  .
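The inductive definition of guard fluents can be paralleled by a recursive evaluator. The Python sketch below reads the six rules as the cases true, definedness test, equality, order, negation and conjunction; this reading, the tuple representation of guards, and the names `holds` and `term_value` are assumptions made for illustration. The order relation is taken to be `<=` on the example values.

```python
def holds(guard, assignment):
    """Evaluate a guard, given as a nested tuple, against a variable assignment."""
    op = guard[0]
    if op == "true":
        return True
    if op == "def":  # the variable currently has a value
        return guard[1] in assignment
    if op in ("eq", "le"):
        left, right = (term_value(t, assignment) for t in guard[1:])
        if left is None or right is None:
            return False  # an undefined variable makes the atom false
        return left == right if op == "eq" else left <= right
    if op == "not":
        return not holds(guard[1], assignment)
    if op == "and":
        return holds(guard[1], assignment) and holds(guard[2], assignment)
    raise ValueError(f"unknown guard operator: {op}")

def term_value(term, assignment):
    """Terms are ('var', name) or ('const', value); undefined variables yield None."""
    kind, payload = term
    return assignment.get(payload) if kind == "var" else payload
```

As in the static rules, an atom over an undefined variable is false because no value fact exists for it, so its negation is true.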

5.3 Correctness and completeness

We provide a sketch of the correctness and completeness of the encoding. Proofs can be found in [4].

Planning states include all the information to reconstruct the original DAW-net states. In fact, we can define a function mapping consistent planning states into DAW-net states as follows: with

is well defined because it cannot be the case that with , otherwise the static rule

    caused false if (X), (Y), X != Y.

would not be satisfied. Moreover, 1-safeness implies that we can restrict ourselves to markings with range in . By looking at the static rules we can observe that those defining the predicates and are stratified. Therefore their truth assignment depends only on the extension of predicates. This implies that fluents are satisfied iff the variable assignment satisfies the corresponding guard . Based on these observations, the correctness of the encoding is relatively straightforward since we only need to show that a legal transition in the planning domain can be mapped to a valid firing. This is proved by inspecting the dynamic rules.

Lemma 3 (Correctness)

Let be a DAW-net and the corresponding planning problem. If is a legal transition in , then is a valid firing of .

The proof of completeness is more complex because – given a valid firing – we need to build a new planning state and show that it is minimal w.r.t. the transition. Since the starting state of does not require minimality we just need to show its existence, while must be carefully defined on the basis of the rules in the planning domain.

Lemma 4 (Completeness)

Let be a DAW-net, the corresponding planning problem and be a valid firing of . Then for each consistent state s.t.  there is a consistent state s.t.  and is a legal transition in .

Lemmata 3 and 4 provide the basis for the inductive proof of the following theorem:

Theorem 5.1

Let be a safe WF-net and the corresponding planning problem. Let be the initial state of – i.e. with a single token in the source and no assignments – and the planning state satisfying the initial condition.

For any case in

there is a trajectory in

such that for each and vice versa.

For each trajectory

in , the following sequence of firings is a case of

Theorem 5.1 above enables the exploitation of planning techniques to solve the reachability problem in DAW-net. Indeed, to verify whether the final marking is reachable it is sufficient to encode it as a condition for the final state and verify the existence of a trajectory terminating in a state where the condition is satisfied. Decidability of the planning problem is guaranteed by the fact that domains are effectively finite, as in Definition 9 the wr functions range over a finite subset of the domain.
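To make the reachability reading concrete, the following Python sketch searches the finite space of (marking, assignment) states of a small data-aware net. It is an illustration under stated assumptions, not the planning encoding: the net `NET`, its variables `loan` and `amount`, the use of Python predicates as guards, and the function `reachable` are all hypothetical.

```python
from collections import deque

# Hypothetical DAW-net fragment: transition -> (preset, postset, guard, writes).
# guard is a predicate on the assignment; writes maps a variable to its allowed values.
NET = {
    "t1": ({"start"}, {"p1"}, lambda a: True, {"loan": ["student", "worker"]}),
    "t2": ({"p1"}, {"end"}, lambda a: a.get("loan") == "student", {}),
    "t3": ({"p1"}, {"p2"}, lambda a: a.get("loan") == "worker", {}),
    "t4": ({"p2"}, {"end"}, lambda a: True, {"amount": ["5k"]}),
}

def reachable(goal_marking, goal_condition):
    """Breadth-first search over (marking, assignment) states; returns a firing sequence or None."""
    init = (frozenset({"start"}), ())  # single token in the source, no assignments
    queue, seen = deque([(init, [])]), {init}
    while queue:
        (marking, frozen), plan = queue.popleft()
        assignment = dict(frozen)
        if marking == frozenset(goal_marking) and goal_condition(assignment):
            return plan
        for t, (pre, post, guard, writes) in NET.items():
            if not (pre <= marking and guard(assignment)):
                continue
            # Enumerate every combination of values the transition may write.
            options = [assignment]
            for var, values in writes.items():
                options = [{**a, var: v} for a in options for v in values]
            for new in options:
                state = (frozenset((marking - pre) | post), tuple(sorted(new.items())))
                if state not in seen:
                    seen.add(state)
                    queue.append((state, plan + [t]))
    return None
```

Because every write ranges over a finite set of values, the set of states is finite and the search terminates, which is the same observation that underlies decidability of the planning problem.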

6 Related Work and Conclusions

The key role of data in the context of business processes has been recently recognized. A number of variants of PNs have been enriched so as to make tokens able to carry data and transitions aware of the data, as in the case of Workflow nets enriched with data [18, 12], the model adopted by the business process community. In detail, Workflow Net transitions are enriched with information about data (e.g., a variable ) and about how it is used by the activity (for reading or writing purposes). Nevertheless, these nets do not consider data values (e.g., in the example of Section 2.2 we would not be aware of the values of the variable that T4 is enabled to write). They only allow for the identification of whether the value of the data element is defined or undefined, thus limiting the reasoning capabilities that can be provided on top of them. For instance, in the example of Section 2.2, we would not be able to discriminate between the worker and the student loan for the trace in (2.2), as we would only be aware that is defined after T4.

The problem of incomplete traces has been investigated in a number of works of trace alignment in the field of process mining, where it still represents one of the challenges. Several works have addressed the problem of aligning event logs and procedural models, without [3] and with [13, 12] data. All these works, however, explore the search space of possible moves in order to find the best one aligning the log and the model. Differently from them, in this work (i) we assume that the model is correct and we focus on the repair of incomplete execution traces; (ii) we want to exploit state-of-the-art planning techniques to reason on control and data flow rather than solving an optimisation problem.

We can broadly divide the approaches that tackle the problem of reconstructing flows of model activities from partial information into two groups: quantitative and qualitative. The former rely on the availability of a probabilistic model of execution and knowledge. For example, in [17], the authors exploit stochastic PNs and Bayesian Networks to recover missing information (activities and their durations). The latter stand on the idea of describing “possible outcomes” regardless of likelihood; hence, knowledge about the world will consist of equally likely “alternative worlds” given the available observations in time, as in this work. For example, in [5] the same issue of reconstructing missing information has been tackled by reformulating it in terms of a Satisfiability (SAT) problem rather than as a planning problem.

Planning techniques have already been used in the context of business processes, e.g., for verifying process constraints [16] or for the construction and adaptation of autonomous process models [19, 15]. In [7] automated planning techniques have been applied for aligning execution traces and declarative models. As in this work, in [8], planning techniques have been used for addressing the problem of incomplete execution traces with respect to procedural models. However, differently from the two approaches above, this work uses for the first time planning techniques to target the problem of completing incomplete execution traces with respect to a procedural model that also takes into account data and the values they can assume.

Although this work mainly focuses on the problem of trace completion, the proposed automated planning approach can easily exploit reachability for model satisfiability and trace compliance, and can also be easily extended to align data-aware procedural models and execution traces. Moreover, the presented encoding in the planning language , can be directly adapted to other action languages with an expressiveness comparable to  [14]. In the future, we would like to explore these extensions and implement the proposed approach and its variants in a prototype.

Appendix 0.A Preliminaries

0.A.1 Workflow Nets

Definition 12 (Petri Net [12])

A Petri Net is a triple where

  • is a set of places;

  • is a set of transitions;

  • is the flow relation describing the “arcs” between places and transitions (and between transitions and places).

The preset of a transition t is the set of its input places: . The postset of is the set of its output places: . Definitions of pre- and postsets of places are analogous.

The marking of a Petri net is a total mapping .

Definition 13 (WF-net [18])

A Petri net is a workflow net (WF-net) if it has a single source place start, a single sink place end, and every place and every transition is on a path from start to end; i.e. for all , and , where is the reflexive transitive closure of .
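The path condition of this definition can be checked with two graph traversals, one forward from the source and one backward from the sink. The Python sketch below is illustrative only: the function `is_wf_net`, its parameters and the example nets are hypothetical, and it checks just the path condition, not the uniqueness of the source and sink places.

```python
def is_wf_net(places, transitions, flow, source="start", sink="end"):
    """Check that every place and transition lies on a path from source to sink."""
    nodes = set(places) | set(transitions)

    def closure(root, edges):
        # Nodes reachable from root (root included, so the closure is reflexive).
        reached, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for a, b in edges:
                if a == node and b not in reached:
                    reached.add(b)
                    stack.append(b)
        return reached

    forward = closure(source, flow)
    backward = closure(sink, {(b, a) for a, b in flow})
    return nodes <= forward and nodes <= backward
```

A node that is reachable from the source but cannot reach the sink (a dead end) makes the check fail, as does a node that is never reached from the source.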

The semantics of a PN is defined in terms of its markings and valid firing of transitions which change the marking. A firing of a transition from to is valid – denoted by – iff:

  • is enabled in , i.e., ; and

  • the marking satisfies the property that for every :

A case of PN is a sequence of valid firings

where is the marking where there is a single token in the start place.

Definition 14 (safeness)

A marking of a Petri Net is -safe if the number of tokens in all places is at most . A Petri Net is -safe if the initial marking is -safe and the marking of all cases is -safe.

In this document we focus on 1-safeness, which is equivalent to the original safeness property as defined in [1]; in the following we use safeness as a synonym of 1-safeness. Note that for safe nets the range of markings is restricted to .
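For small nets, safeness can be tested by exhaustive exploration of the reachable markings. The following Python sketch is a hypothetical illustration (the function `is_k_safe` and the example nets are not part of the formalisation): markings are multisets of tokens, and the search stops as soon as some place exceeds the bound, which also guarantees termination because only markings within the bound are ever expanded.

```python
def is_k_safe(net, initial, k=1):
    """Explore reachable markings; fail as soon as some place holds more than k tokens."""
    start = tuple(sorted(initial.items()))
    seen, stack = {start}, [start]
    while stack:
        marking = dict(stack.pop())
        if any(n > k for n in marking.values()):
            return False
        for pre, post in net.values():
            if all(marking.get(p, 0) >= 1 for p in pre):  # transition enabled
                nxt = dict(marking)
                for p in pre:
                    nxt[p] -= 1
                for p in post:
                    nxt[p] = nxt.get(p, 0) + 1
                # Canonical key: drop empty places so equal markings compare equal.
                key = tuple(sorted((p, n) for p, n in nxt.items() if n > 0))
                if key not in seen:
                    seen.add(key)
                    stack.append(key)
    return True
```

In the second test net below, two parallel branches both deposit a token in the sink, so the net is 2-safe but not 1-safe.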

0.A.2 Action Language

The formal definition of can be found in Appendix A of [9]; here, as reference, we include the main concepts.

We assume disjoint sets of action, fluent and type names, i.e., predicate symbols of arity , and disjoint sets of constant and variable symbols. Literals can be positive or negative atoms; denoted by . Given a set of literals , (respectively, ) is the set of positive (respectively, negative) literals in . A set of literals is consistent if no atom appears both positive and negated.

The set of all action (respectively, fluent, type) literals is denoted as (respectively, , ).

Furthermore, , , and .

Definition 15 (Causation rule)

A (causation) rule is an expression of the form

caused  if , not , , not 
         after , not , , not .

where , , , and .

If the rule is called static.

We define , , , ,

Definition 16 (Initial state constraints)

An initial state constraint is a static rule preceded by the keyword initially.

Definition 17 (Executability condition)

An executability condition e is an expression of the form

executable  if , not , , not .

where , , and .

We define , , and

Since in this document we are dealing with ground plans, for the definition of typed instantiation the reader is referred to the original paper.

Definition 18 (Planning domain, [9] Def. A.5)

An action description consists of a finite set of action and fluent declarations and a finite set of safe causation rules, safe initial state constraints, and safe executability conditions. A planning domain is a pair , where is a stratified Datalog program (the background knowledge) which is safe, and is an action description. We call positive, if no default negation occurs in AD.

The set contains all the literals appearing in PD.

Definition 19 (State, State transition)

A state w.r.t. a planning domain PD is any consistent set of legal fluent instances and their negations. A state transition is any tuple where are states and is a set of legal action instances in PD.

Semantics of plans including default negation is defined by means of a Gelfond–Lifschitz type reduction to a positive planning domain.

Definition 20

Let PD be a ground and well-typed planning domain, and let be a state transition. Then, the reduction of PD by is the planning domain where the set of rules of PD is substituted by obtained by deleting

  1. each , where either or , and

  2. all default literals not  () from the remaining .
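The two deletion steps can be illustrated on ground rules. The Python sketch below assumes the usual Gelfond–Lifschitz reading, in which a rule is dropped when one of its default literals in the after-part is contradicted by the source state or the actions, or one in the if-part is contradicted by the target state; this reading, the `Rule` record and the function `reduct` are assumptions introduced for the example.

```python
from typing import FrozenSet, NamedTuple

class Rule(NamedTuple):
    head: str
    if_pos: FrozenSet[str] = frozenset()
    if_naf: FrozenSet[str] = frozenset()     # literals under "not" in the if-part
    after_pos: FrozenSet[str] = frozenset()
    after_naf: FrozenSet[str] = frozenset()  # literals under "not" in the after-part

def reduct(rules, s, actions, s_next):
    """Drop rules blocked by a default literal, then strip default literals from the rest."""
    before = set(s) | set(actions)
    kept = []
    for r in rules:
        if r.after_naf & before or r.if_naf & set(s_next):
            continue  # some "not l" is false in this state transition
        kept.append(r._replace(if_naf=frozenset(), after_naf=frozenset()))
    return kept
```

Applied to an inertia rule for a place `p`, the rule survives (without its default literal) when the target state does not contain `-p`, and is deleted otherwise.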

Definition 21 (Legal initial state, executable action set, legal state transition)

For any planning domain

  • a state is a legal initial state, if is the least set s.t. for all static and initial rules implies ;

  • a set is an executable action set w.r.t. a state , if for each there is an executability condition s.t. , , and ;

  • a state transition is legal if is an executable action set w.r.t. , and is the minimal consistent set that satisfies all causation rules in w.r.t. . A causation rule , is satisfied if the three conditions

    all hold, then and .

Definition 22 (Trajectory)

A sequence of state transitions

, , is a trajectory for PD, if is a legal initial state of PD and all , , are legal state transitions of PD.

If , then the trajectory is empty.

Definition 23 (Planning problem)

A planning problem is a pair of planning domain PD and a ground goal

, not , , not .

where and .

A state satisfies the goal if and .
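Goal satisfaction is a simple set-theoretic test on the final state. The Python sketch below represents literals as strings, with a leading `-` for classical negation; the functions `is_consistent` and `satisfies` and the place names are hypothetical, introduced only to illustrate the definition.

```python
def is_consistent(literals):
    """A set of literals is consistent if no atom occurs both as a and as -a."""
    positive = {l for l in literals if not l.startswith("-")}
    negated = {l[1:] for l in literals if l.startswith("-")}
    return positive.isdisjoint(negated)

def satisfies(state, goal_pos, goal_naf):
    """Literals in goal_pos must hold in the state; literals in goal_naf must be absent."""
    return set(goal_pos) <= set(state) and set(goal_naf).isdisjoint(state)
```

For instance, the goal "only the sink is marked" lists the sink place positively and every other place under default negation.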

Definition 24 (Optimistic plan)

A sequence of action sets is an optimistic plan for a planning problem if there is a trajectory establishing the goal , i.e.  satisfies .

Definition 25 (Secure plan)

An optimistic plan is secure if for every legal initial state <