
ASP-Based Declarative Process Mining

We put forward Answer Set Programming (ASP) as a solution approach for three classical problems in Declarative Process Mining: Log Generation, Query Checking, and Conformance Checking. These problems correspond to different ways of analyzing business processes under execution, starting from sequences of recorded events, a.k.a. event logs. We tackle them in their data-aware variant, i.e., by considering events that carry a payload (set of attribute-value pairs), in addition to the performed activity, specifying processes declaratively with an extension of linear-time temporal logic over finite traces (LTLf). The data-aware setting is significantly more challenging than the control-flow one: Query Checking is still open, while the existing approaches for the other two problems do not scale well. The contributions of the work include an ASP encoding schema for the three problems, their solution, and experiments showing the feasibility of the approach.


Introduction

Process Mining (PM) for Business Process Management (BPM) is a research area aimed at discovering common patterns in Business Processes (BP) van der Aalst (2016). The analysis starts from event logs, i.e., sets of traces that record the events associated with process instance executions, and typically assumes a process model, which may be taken as input, manipulated as an intermediate structure, or produced in output. Events describe process activities at different levels of detail. In the simplest perspective, here referred to as control-flow-only, events model atomic activities performed by a process instance at some time point; in the most complex scenario, typically referred to as multi-perspective, events also carry a payload including a timestamp and activity data.

Processes can be specified prescriptively, i.e., through models, such as Petri Nets, that generate traces, or declaratively, i.e., through logical formulas representing the constraints that traces must satisfy in order to comply with the process. The latter is the approach we adopt here. Specifically, we take an (untimed) data-aware perspective where events include the activity and a set of attribute-value pairs, the payload.

In Declarative PM, the de-facto standard for expressing process properties is Declare van der Aalst et al. (2009), a temporal logic consisting of a set of template formulas of Linear-time Temporal Logic over finite traces (ltl) De Giacomo and Vardi (2013); here, we use a strictly more expressive extension, which we call local ltl, i.e., l-ltl. This logic features a simple automata-based machinery that facilitates its manipulation, while retaining the ability to express virtually all the properties of interest in declarative PM. Specifically, l-ltl subsumes Declare and even its multi-perspective variant MP-Declare Burattin et al. (2016) without timestamps and correlation conditions (i.e., conditions that relate the attributes of some event to those of other events), but does not subsume full MP-Declare. Observe that since MP-Declare does not subsume l-ltl either, the two logics are incomparable.

Our goal is to devise techniques for three classical problems in Declarative PM: Event Log Generation Skydanienko et al. (2018), i.e., generating an event log consistent with a declarative model; Query Checking Räim et al. (2014), i.e., discovering hidden temporal properties in an event log; and Conformance Checking Burattin et al. (2016), i.e., checking whether the traces of an event log conform to a process model. The main challenge is dealing with data: in the data-aware framework, Query Checking is still open, while existing tools for the other two problems do not scale well.

We put forward Answer Set Programming (ASP Niemelä (1999)) as an effective solution approach. ASP natively provides data-manipulation features that allow for formalizing data-aware constraints and has seen, over the last years, dramatic improvements in solver performance; this makes it a natural and promising candidate for addressing the problems of our interest. We show how such problems can be conveniently modeled as ASP programs, thus solved with any solver. Using the state-of-the-art solver Clingo (potassco.org) Gebser et al. (2019), we experimentally compare our approach against existing ones for Log Generation and Conformance Checking, and show the effectiveness of the approach for Query Checking in a data-aware setting. Besides providing an actual solution technique, ASP facilitates the reuse of specifications: the ASP encodings we propose here, indeed, differ in very few, although important, details.

Previous related work includes Wieczorek et al. (2020), where ASP is used to infer a finite-state automaton that accepts (resp. rejects) traces from a positive (resp. negative) input set. This can be seen as a form of Declarative Process Discovery, i.e., the problem of obtaining a (declarative) process specification, which is complementary to the problems we address here. Our approach is similar, in that we use automata to model temporal properties; however, we propose a different automata encoding and show the effectiveness of the approach on three different problems. Another related paper is Heljanko and Niemelä (2003), which shows how ASP can be used to check a Petri Net against an LTL specification, up to a bounded time horizon. Differently from our work, it: (i) deals with LTL over infinite, as opposed to finite, runs; (ii) adopts a prescriptive, as opposed to declarative, approach; and (iii) does not deal with data in events.

From a broader perspective, we finally observe that while we deal with a set of specific problems, the work paves the way for ASP to become a general effective approach to Declarative PM.

The Framework

An activity (signature) is an expression of the form a(x1, …, xn), where a is the activity name and each xi is an attribute name. We call n the arity of a. The attribute names of an activity are all distinct, but different activities may contain attributes with matching names. We assume a finite set Σ of activities, all with distinct names; thus, activities can be identified by their name, instead of by the whole tuple. Every attribute (name) x of an activity is associated with a type type(x), i.e., the set of values that can be assigned to x when the activity is executed. For simplicity, we assume that all domains are equipped with the standard relations {<, ≤, =, ≠, ≥, >}. All results can be immediately adapted if some relations are absent in some domain.

An event is the execution of an activity (at some time) and is formally captured by an expression of the form a(v1, …, vn), where a is an activity name and each vi ∈ type(xi). The properties of interest in this work concern (log) traces, formally defined as finite sequences of events τ = e1 ⋯ em. Traces model process executions, i.e., the sequences of activities performed by a process instance. A finite collection of executions recorded as a set of traces is called an event log.

l-ltl

We adopt a declarative approach to process modeling, meaning that processes are specified through a set of constraints over their executions, i.e., over the traces they produce. The formal language we use to express properties of traces is a variant of Linear-time Temporal Logic over finite traces, ltl De Giacomo and Vardi (2013), adapted to deal with the data attributes of activities. We call this variant Linear-time Temporal Logic with local conditions over finite traces, or local ltl for short, and denote it as l-ltl.

Given a finite set of activities Σ, the formulas φ of l-ltl over Σ are inductively defined as follows:

  φ ::= true | a | x ⊙ y | x ⊙ v | ¬φ | φ1 ∧ φ2 | ○φ | φ1 U φ2

where: x and y are attribute names from some activity in Σ, v ∈ type(x), ⊙ is an operator from {<, ≤, =, ≠, ≥, >}, and a is an activity name from Σ. Formulas of the form true, a, x ⊙ y, and x ⊙ v are called atomic; formulas not containing the operators ○ and U are called event formulas.

The logic is interpreted over positions of finite traces. Formula true holds at every position; a checks whether activity a occurs in the trace at the given position; x ⊙ y (resp. x ⊙ v) compares the value assigned to attribute x with that assigned to attribute y (resp. with the value v), at the given position; boolean operators combine formulas as usual; the next operator ○φ checks whether φ holds in the suffix starting at the next position; finally, the until operator φ1 U φ2 checks whether φ2 is satisfied at some position j not before the given one, and whether φ1 holds in all positions that precede j, starting from the given position.

Formally, we define by induction when a trace τ = e1 ⋯ em satisfies an l-ltl formula φ at position i (1 ≤ i ≤ m), written τ, i ⊨ φ, as follows:

  • τ, i ⊨ true;

  • τ, i ⊨ a iff e_i = a(v1, …, vn);

  • τ, i ⊨ x ⊙ y iff, for e_i = a(v1, …, vn) and a(x1, …, xn) the signature of a, it is the case that, for some j and k, x = xj, y = xk, and vj ⊙ vk (notice that this requires compatibility between the domains type(x) and type(y) wrt relation ⊙);

  • τ, i ⊨ x ⊙ v iff, for e_i = a(v1, …, vn) and a(x1, …, xn) the signature of a, it is the case that, for some j, x = xj and vj ⊙ v;

  • τ, i ⊨ φ1 ∧ φ2 iff τ, i ⊨ φ1 and τ, i ⊨ φ2;

  • τ, i ⊨ ¬φ iff τ, i ⊭ φ;

  • τ, i ⊨ ○φ iff i < m and τ, i+1 ⊨ φ;

  • τ, i ⊨ φ1 U φ2 iff there exists j, with i ≤ j ≤ m, s.t. τ, j ⊨ φ2 and, for every k with i ≤ k < j, it is the case that τ, k ⊨ φ1.

Notice that while, in general, the satisfaction of an l-ltl formula at some position i of τ depends on the whole trace, and precisely on the suffix of τ starting at position i, event formulas depend only on the event at position i.

As in standard ltl, ○ denotes the strong next operator (which requires the existence of a next event where the inner formula is evaluated), while U denotes the strong until operator (which requires the right-hand formula to eventually hold, forcing the left-hand formula to hold in all intermediate events). The following are standard abbreviations: false ≡ ¬true; φ1 ∨ φ2 ≡ ¬(¬φ1 ∧ ¬φ2); φ1 → φ2 ≡ ¬φ1 ∨ φ2; ◇φ ≡ true U φ (eventually φ); and □φ ≡ ¬◇¬φ (globally, or always, φ).

Through l-ltl one can express properties of process traces that involve not only the process control-flow but also the manipulated data.

Example 1

The l-ltl formula φ = □(a → ◇b), a so-called Response constraint, says that whenever activity a occurs, it must eventually be followed by activity b. A possible data-aware variant of φ is the formula φ' = □((a ∧ x < v) → ◇b), which says that whenever activity a occurs with attribute x less than v, it must eventually be followed by activity b.
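To make these semantics concrete, the clauses above can be sketched as a small recursive evaluator in Python. All names and the tuple-based formula representation here are illustrative, not the paper's implementation (which compiles formulas to automata instead):

```python
import operator

# Direct recursive evaluation of l-ltl formulas on a finite trace.
# Events are (activity, payload) pairs; formulas are nested tuples.
def holds(trace, i, f):
    op = f[0]
    if op == "true":
        return True
    if op == "act":                      # activity f[1] occurs at position i
        return trace[i][0] == f[1]
    if op == "cmp":                      # payload attribute compared to a value
        _, attr, rel, val = f
        payload = trace[i][1]
        return attr in payload and rel(payload[attr], val)
    if op == "not":
        return not holds(trace, i, f[1])
    if op == "and":
        return holds(trace, i, f[1]) and holds(trace, i, f[2])
    if op == "next":                     # strong next: a next position must exist
        return i + 1 < len(trace) and holds(trace, i + 1, f[1])
    if op == "until":                    # strong until
        return any(holds(trace, j, f[2]) and
                   all(holds(trace, k, f[1]) for k in range(i, j))
                   for j in range(i, len(trace)))
    raise ValueError(op)

# Derived operators: eventually and globally.
def eventually(f):
    return ("until", ("true",), f)

def globally(f):
    return ("not", eventually(("not", f)))

# Response constraint: globally(a implies eventually b),
# with "a implies b" written as not(a and not b).
response = globally(("not", ("and", ("act", "a"),
                             ("not", eventually(("act", "b"))))))
```

A data-aware variant is obtained by strengthening the activating condition with an attribute comparison, e.g. conjoining `("cmp", "x", operator.lt, 5)` to `("act", "a")`.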

Formulas of ltl, thus of l-ltl, have the useful property of being fully characterized by finite-state, possibly nondeterministic, automata. Specifically, for every l-ltl formula φ there exists a finite-state automaton (FSA) A_φ that accepts all and only the traces that satisfy φ De Giacomo and Vardi (2013). Such automata are standard FSA with transitions labelled by event formulas. For a fixed set of activities Σ, let EF(Σ) be the set of event formulas over Σ. An FSA over a set of activities Σ is a tuple A = (S, s0, δ, F), where:

  • S is a finite set of states;

  • s0 ∈ S is the automaton initial state;

  • δ ⊆ S × EF(Σ) × S is the automaton transition relation;

  • F ⊆ S is the set of automaton final states.

Without loss of generality, we assume that the formulas labeling transitions are conjunctions of literals. It is immediate to see that every FSA can be rewritten in this way.

A run of an FSA A on a trace τ = e1 ⋯ em (over Σ) is a sequence of states ρ = s0 s1 ⋯ sm s.t. for all i ∈ {1, …, m} there exists a transition (s_{i−1}, ψ, s_i) ∈ δ s.t. e_i ⊨ ψ. A trace over Σ is accepted by A iff it induces a run of A that ends in a final state. Notice that satisfaction of ψ, this being an event formula, can be established by looking at one event at a time, while disregarding the rest of the trace; thus, in order to construct the induced run ρ, one can proceed in an online fashion, as the next event arrives, by simply triggering, at every step, a transition outgoing from the current state whose label is satisfied by the event.
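This online construction amounts to a standard nondeterministic-automaton run over the set of reachable states. A minimal Python sketch (illustrative names; guards play the role of the event formulas labeling transitions):

```python
# Online run of a nondeterministic FSA whose transitions are guarded by
# event formulas. Events are (activity, payload) pairs; guards are
# predicates over single events, mirroring event-formula satisfaction.
def accepts(fsa, trace):
    init, trans, finals = fsa            # trans: list of (src, guard, dst)
    states = {init}                      # set of states reachable so far
    for event in trace:
        states = {dst for (src, guard, dst) in trans
                  if src in states and guard(event)}
    return bool(states & finals)         # accepted iff a final state is reached

# Automaton for the Response constraint of Fig. 1: state 0 means "no
# pending obligation", state 1 means "a seen, b still owed".
response_fsa = (
    0,
    [(0, lambda e: e[0] != "a", 0),
     (0, lambda e: e[0] == "a", 1),
     (1, lambda e: e[0] != "b", 1),
     (1, lambda e: e[0] == "b", 0)],
    {0},
)
```

Since only the set of currently reachable states is kept, the trace can be consumed event by event, exactly as described above.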

Example 2

Consider again the formulas φ and φ' shown above, and the (parametric) automaton depicted in Fig. 1.

Figure 1: Automaton for the Response constraint.

It is easy to see that, under the instantiation of the transition labels corresponding to φ, the resulting automaton accepts all and only the traces that satisfy φ, and that, under the instantiation corresponding to the data-aware variant φ', the obtained automaton accepts all and only the traces that satisfy φ'.

The details of the construction of A_φ from φ are beyond the scope of this work, and we refer the interested reader to De Giacomo and Vardi (2013) for more information; we rely on the results therein. We observe that while the automaton construction is, in the worst case, time-exponential wrt the size of the input formula φ, tools exist, such as Lydia (github.com/whitemech/lydia/releases/tag/v0.1.1) De Giacomo and Favorito (2021), which exhibit efficient performance in practice; this, combined with the fact that the specifications of practical interest are typically small, makes the approaches based on automata construction usually feasible in practice. We can now formalize the problems addressed in this paper.

Event Log Generation.

Given a set Φ of l-ltl formulas over a set of activities Σ and a positive integer n, return a trace τ over Σ of length n s.t., for every formula φ ∈ Φ, it is the case that τ, 1 ⊨ φ (written τ ⊨ φ for short). In words, the problem amounts to producing a trace of length n over Σ that satisfies all the input constraints in Φ. A more general version of the problem requires generating a log of traces of fixed length satisfying the constraints. For simplicity, we consider the first formulation.

Query Checking.

Query Checking takes as input formulas from the extension of l-ltl with activity variables, defined by the grammar of l-ltl where, in the base case for activity names, an activity variable may occur in place of an activity name; symbols starting with "?" are activity variables and the other symbols are as in l-ltl (given Σ).

Given an l-ltl formula with activity variables, by assigning an activity (from Σ) to every variable, we obtain a "regular" l-ltl formula. Formally, for an l-ltl formula φ (over Σ) containing a (possibly empty) set of activity variables V, an assignment to V is a total function θ : V → Σ. Given φ and an assignment θ to its activity variables, φθ denotes the (regular) l-ltl formula obtained by replacing, in φ, every variable symbol with the activity name assigned to it by θ. Observe that if V = ∅ there exists only one assignment, the empty one, and φθ = φ. Given a trace τ, since φθ is a regular l-ltl formula, we can check whether τ ⊨ φθ.

An instance of Query Checking consists of a log L and an l-ltl formula φ with activity variables V; a solution is a set Θ of assignments to V s.t., for every assignment θ ∈ Θ and every trace τ ∈ L, it holds that τ ⊨ φθ.

In words, query checking requires finding a set of assignments Θ, each transforming the input formula into an l-ltl formula satisfied by all the traces of the input log L. Observe that variables can only range over activities.

Conformance Checking.

Given a trace τ and a set Φ of l-ltl formulas, both over the same set of activities Σ, check whether, for all formulas φ ∈ Φ, τ ⊨ φ. The problem can also be defined in a more general form, where τ is replaced by a log L of traces over Σ and the task requires checking whether, for all the traces τ of L and all φ ∈ Φ, it holds that τ ⊨ φ.

Answer Set Programming (ASP)

An ASP program consists of a set of rules which define predicates and impose relationships among them. The task of an ASP solver is that of finding a finite model of the program, i.e., an interpretation of the predicates that satisfies the program rules. ASP rules are written in a fragment of (function-free) First-order Logic (FOL) extended with a special negation-as-failure (NAF) operator (in addition to classical negation) which allows for distinguishing facts that are false from facts that are unknown. The presence of this operator, combined with the classical FOL negation, has a huge impact on the programs one can write and the way models are found. Here, we do not discuss these details, referring the interested reader to Gelfond and Lifschitz (1988); Niemelä (1999). For our purposes, it suffices to restrict to the class of rules with the NAF operator as the only available negation operator (that is, disallowing classical negation).

Syntax

The basic constructs of ASP programs are: 1. constants, identified by strings starting with a lower-case letter; 2. variables, identified by strings starting with an upper-case letter; 3. terms, i.e., constants or variables; 4. atoms, i.e., expressions of the form p(t1, …, tn), where p is a predicate, identified by a string, and each ti is a term. A predicate p is said to have arity n if it occurs in an expression of the form p(t1, …, tn). An atom containing only constant terms is said to be ground.

ASP rules are obtained by combining the basic elements through boolean operators and the NAF operator. In this work, we use rules of the following form:

  h :- b1, …, bk, not b_{k+1}, …, not bn

where h and each bi are atoms, not denotes the NAF operator, and every variable occurring in the rule also occurs in some non-negated body atom bi (with i ≤ k). The left-hand side h is called the rule's head and is optional. When the head is absent, the rule is called an integrity constraint. The right-hand side is called the body and can be left empty, in which case the symbol :- is omitted and the rule is called a fact.

Semantics

Intuitively, a model of an ASP program P is a set of ground atoms that satisfies all program rules. In general, many models exist. Among these, only those that are minimal wrt set inclusion and that contain a ground atom only "if needed", i.e., if it occurs as the head of a ground rule whose body is satisfied, are taken as solutions, called the answer sets of P in the ASP terminology. The task of an ASP solver is to compute such sets.

Given an ASP program P and a rule r, the set ground(r) of ground instantiations of r is the set of rules obtained by replacing all the variables in r with the constants mentioned in P (the so-called Herbrand universe of P), in all possible ways, so that all rules in ground(r) contain only ground atoms. Then, the ground instantiation of a program is the union of the ground instantiations of its rules, i.e., ground(P) = ⋃ (over r ∈ P) ground(r).

An interpretation of a program P is a set I of ground atoms p(c1, …, cn), where p is a predicate of arity n occurring in P and c1, …, cn are constants from the Herbrand universe of P. Given a positive (i.e., without occurrences of not) program P, an interpretation I is a model of P if, for every ground rule h :- b1, …, bn in ground(P), whenever bi ∈ I for i = 1, …, n, it holds that h ∈ I. An answer set of P is a model of P that is minimal wrt set inclusion.

The semantics of general programs is obtained as a reduction to positive programs. Namely, the reduct P^I of a ground program P wrt an interpretation I is the positive ground program obtained by:

  • deleting all the rules

    h :- b1, …, bk, not b_{k+1}, …, not bn

    of P s.t. bj ∈ I for some j ∈ {k+1, …, n};

  • replacing all the remaining rules

    h :- b1, …, bk, not b_{k+1}, …, not bn

    with h :- b1, …, bk.

Intuitively, the first transformation removes a rule because it is already satisfied by I (its negative body is falsified); the second transformation removes the so-called negative body of the rule, because it is satisfied. As can be easily seen, the resulting program does not mention the not operator. The interpretation I is an answer set of P if it is an answer set of P^I.
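The reduct-based definition yields a direct, if naive, check: given a ground normal program and a candidate interpretation, build the reduct, compute the least model of the resulting positive program by fixpoint, and compare. The following Python sketch (illustrative names, in no way how ASP solvers actually work) does exactly this:

```python
# Naive answer-set check for a ground normal program. A rule is a triple
# (head, positive_body, negative_body); head None encodes an integrity
# constraint.
def reduct(rules, interp):
    out = []
    for head, pos, neg in rules:
        if any(a in interp for a in neg):   # some "not a" falsified: drop rule
            continue
        out.append((head, pos))             # otherwise drop the negative body
    return out

def least_model(pos_rules):
    model, changed = set(), True
    while changed:                          # fixpoint of rule application
        changed = False
        for head, pos in pos_rules:
            if head is not None and head not in model \
               and all(a in model for a in pos):
                model.add(head)
                changed = True
    return model

def is_answer_set(rules, interp):
    red = reduct(rules, interp)
    # integrity constraints must have an unsatisfied body
    if any(h is None and all(a in interp for a in pos) for h, pos in red):
        return False
    return least_model(red) == interp
```

On the classic program p :- not q. q :- not p., both {p} and {q} pass the check, while {p, q} and the empty interpretation do not.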

In this work, we do not discuss the algorithms to compute the answer sets of a program, but focus on how the problems of our interest can be encoded in ASP and then solved by an ASP solver, in such a way that the returned answer sets represent the solutions to our problems. This is the focus of the next section. For the experiments, we use the state-of-the-art solver Clingo.

ASP for Declarative Process Mining

We encode Log Generation, Conformance Checking, and Query Checking into ASP programs. For every l-ltl formula we deal with, we assume available the corresponding automaton . The three programs share some common parts, such as the automata and the traces, which are modeled through suitable predicates and ASP rules. Each encoding re-uses some of these parts, possibly customized, together with additional fragments used to model problem-specific features.

Activities are captured by a unary predicate whose argument is the activity name. In the presence of data, activity signatures are modeled by a binary predicate relating the activity name to each of its attribute names. Attributes may be typed by stating the set of values they can take, through a binary predicate relating the attribute name to each of its possible values. A trace is modeled by a binary predicate relating the performed activity to the time point where it occurs. Time points come from a dedicated unary predicate, whose extension ranges from 0 up to the trace length; the trace is defined on these time points. In the presence of data, activity attributes are paired with values through a ternary predicate relating the attribute name, the assigned value, and the time point. Notice that the association is based on the time point (exactly one activity is performed at each time point). Simple integrity constraints are used to ensure that the mentioned attributes belong in fact to the activity and that the assigned value comes from the corresponding type.

Automata are encoded through four predicates: the first two model the initial state and the accepting states of the automaton; the third models the existence of a transition from a state to another under the event formula represented by an integer identifier; and the fourth models satisfaction of an (event) formula at a time point. In the presence of multiple l-ltl formulas, each automaton is identified by a unique integer value and an additional parameter is added to the above predicates to refer to the various automata.

Example 3

Consider the automaton of Fig. 1 instantiated for the Response formula φ. Its ASP encoding consists of facts modeling the initial state, the final states, and the labelled transitions of the automaton, where a and b are activities and each event formula labeling a transition is identified by an integer index in the encoding.

In a data-aware setting, conditions on data can simply be added to the rules defining satisfaction of event formulas. For example, one such rule can express the fact that an event formula holds at time point T if activity a occurs at time T in the trace, with a value less than v assigned to its attribute x.

To capture satisfaction of an l-ltl formula φ by a trace τ, we model the execution of the automaton A_φ on τ. To this end, we introduce a predicate expressing the fact that the automaton with a given index is in a given state at a given time point. Since the automaton is in general nondeterministic, it can be in many states at a time point (except for the initial one). The predicate is defined by two rules. The first one says that at time point 0 every automaton is in its respective initial state. The second one says that an automaton is in state s' at the current time point whenever it was in state s at the previous time point, the automaton contains a transition from s to s' under some event formula, and that formula holds at the current time point in the trace.

Finally, the fact that a trace is accepted by all automata, i.e., that the trace satisfies the corresponding formulas, is stated by requiring that, for each automaton, at least one of the states reached at the last time point of the trace be final.

Next, we use these fragments to describe the ASP encodings for the problems of interest. For lack of space, we discuss only the main rules.

Event Log Generation

The encoding schema of Event Log Generation is as follows:

  1. Activities, attributes, attribute types, and trace length are provided as input and formalized as discussed above.

  2. For each input l-ltl constraint , the corresponding automaton is generated and modeled as discussed, using a unique integer value to identify it.

  3. Suitable integrity constraints are defined to ensure that: each time point in the trace has exactly one activity; every attribute is assigned exactly one value; and the attributes assigned at a given time point actually belong to the activity occurring at that time point.

  4. Finally, the automaton-state predicate is defined as above and it is required that every automaton ends up in at least one final state at the last time point.

  5. The predicates modeling the trace and the attribute-value assignments contain the solution, i.e., they model a sequence of activities whose attributes have an assigned value, which satisfies all the input constraints.
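The guess-and-check scheme above can be mimicked, for small instances, by brute force: enumerate every candidate trace of the given length and keep those accepted by all constraint automata. A Python sketch with illustrative names (the ASP solver performs this search symbolically and far more efficiently):

```python
from itertools import product

# Brute-force analogue of the guess-and-check Log Generation encoding
# (control-flow only, for brevity).
def accepts(fsa, trace):
    init, trans, finals = fsa
    states = {init}
    for event in trace:
        states = {dst for (src, guard, dst) in trans
                  if src in states and guard(event)}
    return bool(states & finals)

def generate(activities, n, automata):
    for names in product(activities, repeat=n):
        trace = [(a, {}) for a in names]       # events with empty payloads
        if all(accepts(fsa, trace) for fsa in automata):
            yield list(names)

# Response(a, b) automaton, cf. Fig. 1.
response_fsa = (0,
                [(0, lambda e: e[0] != "a", 0), (0, lambda e: e[0] == "a", 1),
                 (1, lambda e: e[0] != "b", 1), (1, lambda e: e[0] == "b", 0)],
                {0})
```

For instance, over activities {a, b} and length 2, only the traces where every a is eventually followed by a b survive the filter.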

Query Checking

The ASP specification of query checking is analogous to that of Log Generation except for the following. Firstly, the problem takes as input a set of fully specified traces. This is dealt with in a simple way, by adding a parameter representing the (unique) identifier of the trace to the trace predicate and, consequently, to all the predicates that depend on the trace (e.g., those modeling attribute values, automaton states, and event-formula satisfaction). Secondly, the input l-ltl formulas contain activity variables. To deal with them, two additional predicates are introduced to account for, respectively, the activity variables and the assignment of an activity to each variable. Besides this, the automata associated with the formulas are obtained by treating activity variables as if they were activity symbols (without affecting the construction, which does not consider the semantics of such objects), thus obtaining automata whose transitions are labelled by event formulas, possibly containing activity variables instead of activity symbols. Such formulas become regular event formulas once values are assigned to variables and can thus be evaluated on the (events of the) input trace. Formally, this requires a slightly different definition of the event-formula satisfaction predicate, which must now take the assignment into account. To see how this is done, consider a Response formula whose activating activity is replaced by an activity variable: the corresponding automaton is the same as that of Fig. 1, with the variable in place of the activity, and the event-formula satisfaction predicate is defined by a rule that first instantiates the variable through the assignment predicate.

In this rule, an additional parameter stands for the trace identifier, as discussed above. The rule generalizes the corresponding one in Log Generation in the presence of an activity variable: in order to evaluate the event formula of the automaton, the variable must be instantiated first, through the assignment predicate. Observe that once all variables are assigned a value, the whole formula becomes variable-free, and the corresponding automaton is a regular automaton. The returned extensions of the variable and assignment predicates represent, together, the problem solution.
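Conceptually, the encoding explores the space of assignments and keeps those under which every trace of the log satisfies the instantiated formula. A brute-force Python sketch of this search (illustrative names; `holds_under` stands for any hypothetical checker of the instantiated formula, e.g. an automaton run):

```python
from itertools import product

# Brute-force Query Checking: enumerate assignments of activities to
# activity variables; keep those under which every trace in the log
# satisfies the instantiated formula.
def query_check(log, variables, activities, holds_under):
    solutions = []
    for values in product(activities, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        if all(holds_under(sigma, trace) for trace in log):
            solutions.append(sigma)
    return solutions

# Query "eventually ?x": which activities occur in every trace of the log?
def f_var_holds(sigma, trace):
    return any(act == sigma["?x"] for act in trace)

log = [["a", "b", "c"], ["b", "a"]]
```

The ASP encoding replaces this explicit enumeration with a guess on the assignment predicate, pruned by the solver.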

Conformance Checking

Conformance Checking can be seen as a special case of Query Checking with a single input trace and where all input formulas are variable-free. In this case, the problem amounts to simply checking whether the whole specification is consistent, which is the case if and only if the input trace, together with the assignments to the respective activity attributes, satisfies the input formulas.

We close the section by observing how these problems provide a clear example of how the declarative approach allows for specification reuse. All the specifications, indeed, share the main rules (for trace, automaton, etc.) and are easily obtained as slight variants of each other, possibly varying the (guessed) predicates representing the solution.

Experiments

In this section, we provide both a comparison with state-of-the-art tools for Log Generation and Conformance Checking, based on multi-perspective declarative models, and an estimate of the scalability of our query checking tool, for which, instead, no competitors exist. The state-of-the-art tool used for Log Generation is the one presented in Skydanienko et al. (2018), which is based on Alloy (https://alloytools.org/) and tailored for MP-Declare; our results show that our ASP implementation for Log Generation scales much better and, at the same time, supports a more expressive data-aware rule language. As to Conformance Checking, we considered the state-of-the-art tool Declare Analyzer Burattin et al. (2016); we obtained comparable execution times, but Declare Analyzer is specifically tailored for Declare and optimized to check conformance wrt Declare rules only, while our tool is more general in this respect. The experiments have been carried out on a standard Dell XPS 15 laptop with an Intel i7 processor and 16GB of RAM. All execution times have been averaged over 3 runs. Source code, declarative models and event logs used in the experiments are available at https://github.com/fracchiariello/process-mining-ASP.

Log Generation

# constr.              3       5       7      10
Trace len

ASP-based, with data:
10                   595     614     622     654
15                   876     894     904     956
20                  1132    1155    1178    1250
25                  1364    1413    1444    1543
30                  1642    1701    1746    1874

ASP-based, without data:
10                   249     270     289     340
15                   349     390     408     457
20                   436     496     538     601
25                   519     568     611     712
30                   622     666     726     837

Alloy-based, with data:
10                 35975   35786   36464   37688
15                 50649   51534   54402   54749
20                 69608   70342   73122   73222
25                 85127   85598   87065   89210
30                101518  101882  106062  107520

Alloy-based, without data:
10                 18733   18947   19539   20007
15                 25700   25723   27344   26897
20                 32047   33837   33107   33615
25                 39114   38666   40556   41055
30                 46207   46706   47613   49410

Table 1: Log Generation (times in ms)
Model (supp. 80)  BPI2012      DD      ID      PL     PTC      RP     RT  Sepsis
Trace len

ASP-based generator:
10                    656    100*    726*    3901    1183    119*    319     460
15                    817     887    2865    4538    1820    1069    353     564
20                    846     832    3160    4102    2194     813    860     640
25                   1061     930    4129    6169    2889    1063    483     780
30                   1433    1026    5226    9231    2370    1220    630     923

Alloy-based generator:
10                  31935   2364*  30762*   59468   65783   2703*  24909   38241
15                  43337   58572  152188   85942   97098   66641  34408   57178
20                  57596   80665  237777  122511  146420   95005  44608   85808
25                  72383  118975  359665  174596  221434  134851  54808  120110
30                  86910  181027  563794  236697  330753  187972  63379  174838

Table 2: Log Generation, real life (times in ms)

For testing the Log Generation tools, we used 8 synthetic models and 8 models derived from real-life logs. The experiments with synthetic models allowed us to test the scalability of the tools in a controlled environment and over models with specific characteristics; the experiments with real models served to test the tools in realistic settings. For the experiments with synthetic models, we built 8 reference models containing 3, 5, 7, and 10 constraints, with and without data conditions. Each model was obtained from the previous one by adding new constraints and preserving those already present. Times are in ms.

The first and second blocks of Table 1 show the execution times for the ASP-based log generator, respectively with and without data conditions; the third and fourth blocks show the results obtained with the Alloy log generator, with and without data. Times refer to the generation of logs with 10000 traces (of length from 10 to 30). Consistent results are obtained also on additional experiments for logs of size between 100 and 5000, not reported here for space reasons.

The results obtained with models containing data conditions show that the ASP-based tool scales very well, requiring less than 2 sec in the worst case. This occurs when a model with 10 constraints is used to generate 10000 traces of length 30. As expected, the execution time increases linearly when the length of the traces in the generated logs increases. The number of constraints in the declarative model also affects the tool performance but with a lower impact.

Without data conditions the results are similar but, as expected, the execution time is lower and increases less quickly when the complexity of the model and of the generated log increases. In the worst case (a model with 10 constraints used to generate 10000 traces of length 30), the execution time is lower than 1 sec.

The results obtained with the Alloy-based tool show similar trends but with execution times almost 60 times higher than those obtained with the ASP-based tool.

The real-life logs used in the experiments are taken from the collection available at https://data.4tu.nl/. We used the declarative process model discovery tool presented in Maggi et al. (2018) to extract a process model with the default settings. The models in the real cases are much more complex and include between 10 and 49 constraints for a minimum support of 80. The execution times needed for the Log Generation task with the ASP-based log generator and with the Alloy-based tool are shown, respectively, in the first and second block of Table 2. An asterisk indicates that for the specific model it was not possible to generate 10000 unique traces. The complexity of real-life models makes the significant advantage of the ASP-based tool over the Alloy-based one even more evident. In particular, in the worst case, the ASP-based tool requires around 9 sec (to generate 10000 traces of length 30 for log PL), while the Alloy-based generator takes almost 4 min.

Conformance Checking

Trace len   ASP (data)   ASP (no data)   Declare Analyzer (data)   Declare Analyzer (no data)
10               665           635                598                       110
15              1100          1035                805                       145
20              1456          1354               1092                       155
25              2071          1896               1273                       177
30              2407          2219               1337                       215
Table 3: Conformance Checking (times in ms)
Support   BPI2012      DD      ID      PL     PTC     RP      RT   Sepsis
ASP-based tool:
60          33426   13084   49969   78625    8412   9354   49501     7116
70          33242   13245   46388   55475    5596   9359   35537     3796
80          24482   10176   29969   33775    4699   6836   35483     1778
90           8445    4568   17576   26590    2787   5608   35483      731
Declare Analyzer:
60           2882    2771    9800    8521    1549   2122   15262     1194
70           2852    3249    7416    5358     959   2102   10351      705
80           2291    2103    3993    2677     755   1532   11285      318
90           1691    1525    1946    1595     404   1091   10628      250
Table 4: Conformance Checking (times in ms)

Also for Conformance Checking we used synthetic and real-life datasets. The former include the same declarative models as those used for Log Generation, plus synthetic logs of 1000 traces of lengths from 10 to 30. Table 3 shows the execution times for the ASP-based tool, with and without data conditions, and for the Declare Analyzer tool on the synthetic datasets (times in ms). The results show that in all cases the execution times increase when the model becomes larger and the traces in the log become longer. The execution times obtained with the ASP-based tool and the Declare Analyzer are comparable for data-aware constraints, while, when the model constraints do not contain data conditions, the Declare Analyzer is around 5 times faster. This might be due to the use of the #max aggregate to compute a trace's length, which degrades performance. A possible solution could be computing the trace length in advance and then providing it as a fact in the ASP encoding.
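The optimization suggested above can be sketched as follows: when translating a trace into ASP facts, the trace length is emitted as a precomputed fact, so the encoding no longer needs to derive it with the #max aggregate. The predicate names below (trace/2, has/3, length/1) are illustrative, not necessarily those of the actual encoding.

```python
# Sketch: serialize a trace as ASP facts, precomputing the length fact
# instead of deriving it in the encoding via #max. Predicate names are
# illustrative, not the paper's actual encoding.
def trace_to_facts(trace):
    """trace: list of (activity, payload-dict) pairs."""
    facts = []
    for pos, (activity, payload) in enumerate(trace, start=1):
        facts.append(f"trace({pos},{activity}).")
        for attr, value in payload.items():
            facts.append(f"has({pos},{attr},{value}).")
    facts.append(f"length({len(trace)}).")  # precomputed, no #max needed
    return facts

facts = trace_to_facts([("register", {"amount": 50}), ("approve", {})])
```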

In the real-life experiments, we tested the Conformance Checking tools using models obtained with the discovery tool by varying the minimum support between 60 and 90. The minimum support indicates the minimum percentage of traces in which a constraint must be fulfilled for it to be added to the discovered model; clearly, a higher minimum support implies that the discovered models contain fewer constraints. As expected (see Table 4), the execution times decrease when the minimum support used to discover the reference models increases. Also in this case, the Declare Analyzer (second block in Table 4) is faster. However, the ASP-based tool (first block in Table 4) also scales well, requiring around 1 min in the worst case.
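A minimal sketch of the minimum-support filter just described: a constraint enters the discovered model only if it is fulfilled in at least min_support percent of the log's traces. The checker interface (constraints as predicates over traces) is hypothetical; in the actual pipeline the fulfilment check is itself a Conformance Checking task.

```python
# Sketch of the minimum-support filter used during model discovery.
# The predicate-over-trace interface is a hypothetical simplification.
def filter_by_support(constraints, log, min_support):
    kept = []
    for holds in constraints:
        fulfilled = sum(1 for trace in log if holds(trace))
        if 100.0 * fulfilled / len(log) >= min_support:
            kept.append(holds)
    return kept

# "a occurs" is fulfilled in 2 of 3 traces (~66.7% support):
log = [["a", "b"], ["a", "c"], ["b"]]
exists_a = lambda t: "a" in t
kept_60 = filter_by_support([exists_a], log, 60)  # constraint kept
kept_80 = filter_by_support([exists_a], log, 80)  # constraint dropped
```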

Query Checking

Columns: Trace len, Existence, Responded Existence, Response, Chain Response, Absence, Not Responded Existence, Not Response, Not Chain Response
With data conditions:
10    521    736    534    503    566    783    602    385
15    704   1113    801    788    784   1180    879    606
20   1321   1675   1143   1128   1373   1821   1304    865
25   1397   3218   1528   1561   1562   2823   1807   1104
30   1674   2878   1824   1906   1905   2784   2028   1301
Without data conditions:
10    399    658    541    632    441    799    806    772
15    616   1183    824   1057    595   1319   1121   1182
20    903   1778   1339   1550    874   1887   2127   2062
25   1188   2381   1724   2036   1101   3246   3200   2486
30   1461   3278   2066   2632   1333   3391   2766   2846
Table 5: ASP Query Checking (times in ms)

Since for Query Checking no competitor exists in the PM literature, we ran a set of controlled experiments to check how the execution times vary under different conditions. We used the same synthetic logs as for Conformance Checking and tested 8 queries corresponding to 8 standard Declare templates, with and without data conditions. The results are shown in Table 5 (with and without data in the first and second block, respectively). The execution times are comparable across the different types of queries, and the presence of data does not affect performance. In addition, as expected, the execution times increase when the traces in the log become longer.
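For reference, the intended data-aware semantics of one of the tested templates, response(a, b), can be stated in a few lines of Python: every occurrence of a whose payload satisfies a condition must eventually be followed by an occurrence of b whose payload satisfies another condition. This states only the declarative meaning such a query encodes, not the ASP encoding itself; the example trace and conditions are invented.

```python
# Reference semantics (plain Python) for data-aware response(a, b).
# Illustrative only; not the paper's ASP encoding.
def response(trace, a, b, cond_a=lambda p: True, cond_b=lambda p: True):
    """trace: list of (activity, payload-dict) pairs."""
    for i, (act, payload) in enumerate(trace):
        if act == a and cond_a(payload):
            # Look for a qualifying b strictly after position i.
            if not any(act2 == b and cond_b(p2) for act2, p2 in trace[i + 1:]):
                return False
    return True

t = [("order", {"amount": 120}), ("check", {}), ("ship", {"fast": 1})]
satisfied = response(t, "order", "ship", cond_a=lambda p: p["amount"] > 100)
violated = response(t[:1], "order", "ship", cond_a=lambda p: p["amount"] > 100)
```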

Conclusions

We have devised an ASP-based approach to solve three classical problems from Declarative PM, namely Log Generation, Query Checking and Conformance Checking, in a data-aware setting. Our results include correct ASP encoding schemata and an experimental evaluation against other approaches. The experimental results show that, for Log Generation, our approach drastically outperforms the state-of-the-art tool from PM. Time performance is slightly worse than that of the existing ad-hoc conformance checker Declare Analyzer, which is optimized for Declare. As to Query Checking, our approach provides the first solution in a data-aware setting, a problem that was still open. We believe that, by showing how the selected problems can be encoded and solved in ASP, we are not only offering a solution technique but, more generally, putting forward ASP as an effective modeling paradigm for Declarative PM in a data-aware setting. For future work, we plan to extend the approach to deal with actual, non-integer timestamps in events, and to go beyond local LTLf by investigating the introduction of across-state quantification to relate the values assigned to attributes at a given time point to those assigned at a different time point.

Acknowledgments

Work partly supported by the ERC Advanced Grant WhiteMech (No. 834228), the EU ICT-48 2020 project TAILOR (No. 952215), the Sapienza Project DRAPE, and the UNIBZ project CAT.

References

  • A. Burattin, F. M. Maggi, and A. Sperduti (2016) Conformance checking based on multi-perspective declarative process models. Expert Syst. Appl. 65, pp. 194–211.
  • G. De Giacomo and M. Favorito (2021) Compositional approach to translate LTLf/LDLf into deterministic finite automata. In Proc. of the 31st International Conference on Automated Planning and Scheduling (ICAPS 2021), pp. 122–130.
  • G. De Giacomo and M. Y. Vardi (2013) Linear temporal logic and linear dynamic logic on finite traces. In Proc. of the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013), pp. 854–860.
  • M. Gebser, R. Kaminski, B. Kaufmann, and T. Schaub (2019) Multi-shot ASP solving with clingo. Theory Pract. Log. Program. 19 (1), pp. 27–82.
  • M. Gelfond and V. Lifschitz (1988) The stable model semantics for logic programming. In Proc. of the International Logic Programming Conference and Symposium, pp. 1070–1080.
  • K. Heljanko and I. Niemelä (2003) Bounded LTL model checking with stable models. Theory Pract. Log. Program. 3 (4-5), pp. 519–550.
  • F. M. Maggi, C. D. Ciccio, C. D. Francescomarino, and T. Kala (2018) Parallel algorithms for the automated discovery of declarative process models. Inf. Syst. 74, pp. 136–152.
  • I. Niemelä (1999) Logic programs with stable model semantics as a constraint programming paradigm. Ann. Math. Artif. Intell. 25 (3-4), pp. 241–273.
  • M. Räim, C. D. Ciccio, F. M. Maggi, M. Mecella, and J. Mendling (2014) Log-based understanding of business processes through temporal logic query checking. In Proc. of the OTM 2014 Confederated International Conferences (CoopIS and ODBASE 2014), pp. 75–92.
  • V. Skydanienko, C. D. Francescomarino, C. Ghidini, and F. M. Maggi (2018) A tool for generating event logs from multi-perspective Declare models. In Proc. of the Dissertation Award, Demonstration, and Industrial Track at BPM 2018, CEUR Workshop Proceedings, Vol. 2196, pp. 111–115.
  • W. M. P. van der Aalst, M. Pesic, and H. Schonenberg (2009) Declarative workflows: balancing between flexibility and support. Comput. Sci. Res. Dev. 23 (2), pp. 99–113.
  • W. M. P. van der Aalst (2016) Process Mining - Data Science in Action, Second Edition. Springer.
  • W. Wieczorek, T. Jastrzab, and O. Unold (2020) Answer set programming for regular inference. Applied Sciences 10 (21), pp. 7700.