
A Monitoring and Discovery Approach for Declarative Processes Based on Streams

08/10/2022
by   Andrea Burattin, et al.

Process discovery is a family of techniques that helps to comprehend processes from their data footprints. Yet, as processes change over time so should their corresponding models, and failure to do so will lead to models that under- or over-approximate behavior. We present a discovery algorithm that extracts declarative processes as Dynamic Condition Response (DCR) graphs from event streams. Streams are monitored to generate temporal representations of the process, later processed to generate declarative models. We validated the technique via quantitative and qualitative evaluations. For the quantitative evaluation, we adopted an extended Jaccard similarity measure to account for process change in a declarative setting. For the qualitative evaluation, we showcase how changes identified by the technique correspond to real changes in an existing process. The technique and the data used for testing are available online.


1 Introduction

The only constant aspect of processes is their change. Either because of internal restructuring or because of variables external to the organization, processes must adapt quickly to achieve the required outcomes. The COVID-19 pandemic showed how organizations needed to move from physical work to hybrid or remote production, forcing them to abandon optimized routines in favor of new flows. In administrative processes, each regulation change requires municipal governments to adapt their processes to preserve compliance. In Denmark, the laws determining the guidelines for case management in the social sector had 4,686 changes between 2009 and 2020 [14].

Process mining approaches promise that, given enough data, a control-flow discovery technique will generate a model that is as close to reality as possible. This evidence-based approach has a caveat: one must assume that the observations used as inputs belong to the same process. Not taking change into consideration may result in under- or over-constrained models that do not represent the reality of the process. A second assumption is that it is possible to identify complete traces in the event log. This requirement presents considerable obstacles in organizations whose processes are constantly running and evolving, either because the starting events are located in legacy systems no longer in use, or because current traces have not finished yet.

Accounting for change is particularly important in declarative processes. Based on an "outside-in" approach, declarative processes describe the minimal set of rules that generate accepting traces. To achieve this goal, declarative models place constraints between activities that restrict or enforce behavior so that only compliant executions are possible. For process mining, the simplicity of declarative processes has been shown to fit well with real process executions, and declarative miners are at the moment the most precise miners in use (see https://icpmconference.org/2021/process-discovery-contest/). However, little research exists regarding how sensitive declarative miners are to process change.

The objective of this paper is to study how declarative miners can give accurate and timely views of incomplete traces (so-called event streams). We integrate streaming process mining techniques into declarative modeling notations, in particular DCR graphs [15]. While previous streaming conformance checking techniques have been applied to other declarative languages (e.g., Declare [26]), these languages are fundamentally different: Declare provides a predefined set of 18 constraint templates, inspired by [13], with an underlying semantics based on LTL formulae over finite traces [9], whereas DCR is based on a minimal set of 5 relations and is able to capture regular and omega-regular languages [11]. The choice of DCR is not fortuitous: DCR is an industrial process language integrated into KMD Workzone, a case management solution used by 70% of central government institutions in Denmark [25]. For the metrics, we present a fast, syntax-driven model-to-model metric and show its suitability in quantifying model changes.

Event streams present challenges for process discovery. Streams are potentially infinite, making memory and time complexity major concerns. Our technique optimizes these aspects by relying on intermediate representations that are updated at runtime. Another aspect considered is extensibility: our technique not only relies on the minimal set of DCR constraints, but can also be extended to more complex workflow patterns that combine atomic constraints.

Figure 1: Contribution of the paper

Fig. 1 shows the paper's contribution: a streaming mining component capable of continuously generating DCR graphs from an event stream (we use the plural "graphs" to indicate that the DCR model may evolve over time, accommodating drifts in the underlying process). Towards the long-term goal of a system capable of spotting changes in a detailed fashion, we also sketch a simple model-to-model metric for DCR, which can be used to compare the results of the stream mining with a catalog or repository of processes. An implementation of our techniques is available in Java and can be downloaded, together with all tests and datasets (see https://github.com/beamline/discovery-dcr).

The rest of the paper is structured as follows: related work is presented in Section 2; background on streaming process discovery and DCR graphs is given in Section 3; the streaming discovery approach is presented in Section 4 and validated in Section 5; Section 6 concludes the paper.

2 Related Work

This paper is the first work aiming at the discovery of DCR graphs from an event stream. The literature contains work on either offline discovery of DCR graphs or online discovery of Declare models. In the rest of the section, we also discuss streaming process mining in general terms.

Offline process discovery techniques. The state of the art in the offline discovery of DCR graphs is the DisCoveR algorithm [3], whose authors claim an accuracy of 96.2% with linear time complexity. The algorithm extends the ParNek algorithm [24] but uses a highly efficient bit-vector implementation of DCR, where each activity corresponds to a particular index of the vector. A more recent approach [27] presents the Rejection miner, which exploits both positive and negative examples to produce a better process model.

Related to what we present in this paper are conformance checking [7] and process repair [30] techniques. Both fields aim at understanding whether executions can be replayed on top of an existing process model. However, in our case we wanted to separate the identification of the process (i.e., control-flow discovery) from the calculation of its similarity (i.e., the model-to-model metric), so that these two contributions can be used independently of each other. Conformance checking and process repair, on the other hand, embed the evaluation and the improvement into one activity.

Online Discovery for Declarative Models. In [4], a framework for the discovery of Declare models from streaming event data was proposed. The framework processes events online, as they occur, in order to deal with large and complex collections of data that are impossible to store and process altogether. In [23], the work was generalized to also mine data constraints, leveraging the MP-Declare notation [5].

Streaming Process Mining in General. Another important line of research within online process mining is van Zelst's Ph.D. thesis [33], which proposes process mining techniques applicable to process discovery, conformance checking, and process enhancement from event streams. An important conclusion from his research is the idea of building intermediate models that capture the knowledge observed in the stream before creating the final process model.

In [6], the author presents a taxonomy for the classification of streaming process mining techniques. Our technique constitutes a hybrid approach in the categories of [6], mixing a smart window-based model, used to construct and keep updated an intermediate structure, with a problem reduction technique, used to transform that structure into a DCR graph.

3 Background

In this section, we recall basic notions of Directly Follows Graphs (DFGs) [32] and Dynamic Condition Response (DCR) graphs [15]. While DCR is, in general, expressive enough to capture multi-perspective constraints such as time [16], data [29], sub-processes [10] and message-passing constraints [18], in this paper we use the classical set-based formulation first presented in [15], which contains only the four most basic behavioural relations: conditions, responses, inclusions and exclusions.

Definition 1 (Sets, Events and Sequences)

Let $\mathcal{C}$ denote the set of possible case identifiers and let $\mathcal{A}$ denote the set of possible activity names. The event universe $\mathcal{E} = \mathcal{C} \times \mathcal{A}$ is the set of all possible events and an event is an element $e = (c, a) \in \mathcal{E}$. Given a set of indexes $I \subseteq \mathbb{N}$ and a target set $T$, a sequence $\sigma : I \to T$ maps index values to elements in $T$. For simplicity we can consider sequences using a string interpretation: $\sigma = \langle t_1, t_2, \ldots \rangle$ where $t_i = \sigma(i)$, $t_i \in T$.

With this definition, we can now formally characterize an event stream:

Definition 2 (Event stream)

An event stream is an unbounded sequence mapping indexes to events: $S : \mathbb{N} \to \mathcal{E}$.
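For example, assuming hypothetical case identifiers $c_1, c_2$ and activities $a, b$, a possible stream prefix is $S(1) = (c_1, a)$, $S(2) = (c_2, a)$, $S(3) = (c_1, b)$: events belonging to different cases arrive interleaved, and no case is ever known to be complete.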

Our approach for extracting DCR graphs leverages the notion of an Extended Directly Follows Graph, an extension of the DFG:

Definition 3 (Directly Follows Graph (DFG))

A directly follows graph is a graph $(V, E)$ where nodes represent activities (i.e., $V \subseteq \mathcal{A}$), and edges indicate directly-follows relations from source to target activities (i.e., $E \subseteq V \times V$, so $(a_s, a_t) \in E$ when activity $a_t$ is observed immediately after $a_s$ in the same case).

Definition 4 (Extended DFG)

An extended DFG is a graph $G = (V, E, M)$ where $(V, E)$ is a DFG and $M$ contains additional numerical attributes referring to the nodes: $M : V \times \mathit{Attrs} \to \mathbb{R}$, where $\mathit{Attrs}$ is the set of all attribute names. To access attribute $a$ for node $v$ we use the notation $M(v)_a$.

In the rest of the paper we will consider the following attributes: avgFO: average first occurrence of the activity among the traces seen so far; noTraceApp: number of traces that the activity has appeared in so far; avgIdx: average occurrence index of the activity; and noOccur: number of occurrences of the activity. A DCR graph consists of a multi-directed graph and a marking.
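To make these structures concrete, the following is a minimal sketch, in Java (the language of the implementation), of how an event and an extended DFG with per-node attributes could be represented. The class and method names (Event, ExtendedDfg, attr, setAttr) are illustrative, not the types used in the actual implementation.

```java
import java.util.*;

// An event pairs a case identifier with an activity name (Def. 1).
record Event(String caseId, String activity) {}

// A minimal extended DFG (Def. 4): nodes, directly-follows edges, and
// per-node numerical attributes such as avgFO, noTraceApp, avgIdx, noOccur.
class ExtendedDfg {
    final Set<String> nodes = new HashSet<>();
    final Set<Map.Entry<String, String>> edges = new HashSet<>();
    final Map<String, Map<String, Double>> attributes = new HashMap<>();

    void addNode(String activity) {
        nodes.add(activity);
        attributes.putIfAbsent(activity, new HashMap<>());
    }

    void addEdge(String from, String to) {
        edges.add(Map.entry(from, to));
    }

    // Access attribute a of node v, i.e. the notation M(v)_a in Def. 4.
    double attr(String v, String a) {
        return attributes.getOrDefault(v, Map.of()).getOrDefault(a, 0.0);
    }

    void setAttr(String v, String a, double value) {
        attributes.computeIfAbsent(v, k -> new HashMap<>()).put(a, value);
    }
}
```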

Definition 5 (DCR Graph)

A DCR graph is a tuple $(A, M, \rightarrow\!\bullet, \bullet\!\rightarrow, \rightarrow\!+, \rightarrow\!\%)$, where $A$ is a set of activities, $M \in \mathcal{P}(A) \times \mathcal{P}(A) \times \mathcal{P}(A)$ is a marking, and $\phi \subseteq A \times A$ for $\phi \in \{\rightarrow\!\bullet, \bullet\!\rightarrow, \rightarrow\!+, \rightarrow\!\%\}$ are relations between activities.

A DCR graph defines processes whose executions are finite and infinite sequences of activities. An activity may be executed several times. The three sets of activities in the marking $M = (\mathsf{Ex}, \mathsf{Re}, \mathsf{In})$ define the state of a process, and they are referred to as the executed activities ($\mathsf{Ex}$), the pending responses ($\mathsf{Re}$; we may simply say pending when clear from the context) and the included activities ($\mathsf{In}$). DCR relations define the effect of executing one activity in the graph on its context. Briefly:

  • Condition relations $a \rightarrow\!\bullet\, b$ say that the execution of $a$ is a prerequisite for $b$, i.e. if $a$ is included, then $a$ must have been executed for $b$ to be enabled for execution.

  • Response relations $a \bullet\!\rightarrow b$ say that whenever $a$ is executed, $b$ becomes pending. In a run, a pending activity must eventually be executed or be excluded. We refer to $b$ as a response to $a$.

  • An inclusion (respectively exclusion) relation $a \rightarrow\!+\, b$ (respectively $a \rightarrow\!\%\, b$) means that if $a$ is executed, then $b$ is included (respectively excluded).

For a DCR graph $P$ (we will use DCR graph and DCR model interchangeably in this paper) with activities $A$ and marking $M$, we write $P_\phi$ for the set of pairs in relation $\phi$ (similarly for any of the relations in $\{\rightarrow\!\bullet, \bullet\!\rightarrow, \rightarrow\!+, \rightarrow\!\%\}$) and we write $P_A$ for the set of activities. Notice that Definition 5 omits the set of labels and the labelling function present in [15]. This has a consequence on the set of observable traces: assume a DCR graph with two events $e_1$ and $e_2$, a set of labels $\{a\}$, and a labelling function $\ell$ with $\ell(e_1) = \ell(e_2) = a$. A possible run of this graph has the shape $\langle a, a \rangle$, which can be generated from 1) two executions of $e_1$, 2) two executions of $e_2$, or 3) an interleaved execution of $e_1$ and $e_2$. By removing the labels from the events (or, alternatively, assuming an injective and surjective labelling function in [15]), we assume that if two events share the same activity name, it is because there was a repetition of the event in the stream.
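As a companion to Def. 5, a DCR graph with its four relation sets and three marking sets could be sketched in Java as below. This is an illustrative representation only, not the bit-vector encoding used by DisCoveR [3].

```java
import java.util.*;

// A relation is a set of (source, target) activity pairs.
record Pair(String src, String tgt) {}

// A DCR graph (Def. 5): activities, a marking (Ex, Re, In), and the four
// basic relations: conditions, responses, inclusions and exclusions.
class DcrGraph {
    final Set<String> activities = new HashSet<>();
    final Set<String> executed = new HashSet<>();   // Ex
    final Set<String> pending = new HashSet<>();    // Re
    final Set<String> included = new HashSet<>();   // In
    final Set<Pair> conditions = new HashSet<>();   // a ->* b
    final Set<Pair> responses = new HashSet<>();    // a *-> b
    final Set<Pair> inclusions = new HashSet<>();   // a ->+ b
    final Set<Pair> exclusions = new HashSet<>();   // a ->% b

    // Initial marking used by the miner: all activities included,
    // none executed, none pending.
    static DcrGraph withActivities(Collection<String> acts) {
        DcrGraph g = new DcrGraph();
        g.activities.addAll(acts);
        g.included.addAll(acts);
        return g;
    }
}
```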

4 Streaming DCR Miner

This section presents the general structure of the stream mining algorithm for DCR graphs. The general idea, depicted in Fig. 2, is to construct and maintain an extended DFG structure (cf. Def. 4) from the stream and then, periodically, extract a new DCR graph from the most recent version of the extended DFG. The extraction of the different DCR rules starts from the same extended DFG instance.

Figure 2: Conceptual representation of the discovery strategy in this paper.

For readability purposes, we split the approach into two phases: the former (Alg. LABEL:alg:meta) is in charge of extracting the extended DFG; the latter (Algs. LABEL:alg:miner:pattern, LABEL:alg:miner:patternAtomic, LABEL:alg:miner:patternComposite) focuses on the extraction of DCR rules from the extended DFG.


Algorithm LABEL:alg:meta takes as input a stream of events $S$, two parameters $m_t$ and $m_e$ referring to the maximum number of traces and of events per trace to store, and a set of DCR patterns to mine (see Def. 6). The algorithm starts by initializing two supporting map data structures, obs and deps, as well as an empty extended DFG $G$ (lines 1-3). obs is a map associating case ids to sequences of partial traces; deps is a map associating case ids to activity names. After initialization, the algorithm starts consuming the actual events in a never-ending loop (line 4). The initial step consists of receiving a new event (line 5). Then, two major steps take place: the first updates the extended DFG; the second transforms the extended DFG into a DCR model.

To update the extended DFG, the algorithm first updates the set of nodes and the extra attributes. If the case id of the new event has been seen before (line 6), the algorithm refreshes the update time of the case id (line 7, useful to keep track of which cases are the most recent ones) and checks whether the maximum length $m_e$ of the partial trace for that case id has been reached (line 8). If that is the case, the oldest event is removed and $G$ is updated to incorporate the removal. If this is the first time the case id is seen (line 11), it is first necessary to verify that the new case can be accommodated (line 12): if there is no room, some space is created by removing the oldest cases and propagating the corresponding changes (lines 13-14), and then a new empty list is created to host the partial trace (line 15). In either situation, the new event is added to the partial trace (line 16) and, if needed, a new node is added to the set of vertices (line 17). The attributes in $G$ can then be refreshed by leveraging the properties of the partial trace seen so far (line 18). To update the relations in the extended DFG (i.e., the $E$ component of $G$), the algorithm checks whether an activity was seen previously for the given case id and, if so, the relation from that activity (i.e., the one stored in deps) to the activity just seen is added (lines 19-20). In any case, the activity just observed becomes the latest activity for the case id (line 21). Finally, according to some periodicity (line 22), the algorithm refreshes the DCR model by calling the procedure that transforms the extended DFG into a DCR model (lines 23-24, cf. Alg. LABEL:alg:miner:pattern).
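A minimal sketch of this update step follows, building on the Event and ExtendedDfg sketches above; maxTraces and maxEventsPerTrace play the role of $m_t$ and $m_e$. Names are illustrative, and the full propagation of evictions into the DFG attributes is omitted; the actual implementation is in the repository linked above.

```java
import java.util.*;

class StreamingDfgUpdater {
    final int maxTraces, maxEventsPerTrace;
    final ExtendedDfg dfg = new ExtendedDfg();
    // obs: case id -> partial trace. Access-ordered LinkedHashMap doubles
    // as the "most recently updated case" bookkeeping (line 7 of the alg.).
    final LinkedHashMap<String, Deque<String>> obs =
            new LinkedHashMap<>(16, 0.75f, true);
    final Map<String, String> deps = new HashMap<>(); // case id -> last activity

    StreamingDfgUpdater(int maxTraces, int maxEventsPerTrace) {
        this.maxTraces = maxTraces;
        this.maxEventsPerTrace = maxEventsPerTrace;
    }

    void observe(Event e) {
        Deque<String> trace = obs.get(e.caseId()); // refreshes access order
        if (trace != null) {
            if (trace.size() >= maxEventsPerTrace) trace.removeFirst(); // drop oldest event
        } else {
            if (obs.size() >= maxTraces) {            // no room: evict oldest case;
                String oldest = obs.keySet().iterator().next();
                obs.remove(oldest);                   // (propagating the removal
                deps.remove(oldest);                  //  into dfg is omitted here)
            }
            trace = new ArrayDeque<>();
            obs.put(e.caseId(), trace);
        }
        trace.addLast(e.activity());
        dfg.addNode(e.activity());
        // refresh node attributes (noOccur shown; avgFO, avgIdx, noTraceApp analogous)
        dfg.setAttr(e.activity(), "noOccur", dfg.attr(e.activity(), "noOccur") + 1);
        String prev = deps.get(e.caseId());
        if (prev != null) dfg.addEdge(prev, e.activity()); // directly-follows relation
        deps.put(e.caseId(), e.activity());
    }
}
```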

Algorithm LABEL:alg:miner:pattern generates a DCR graph from an extended DFG. We do so by (1) defining patterns that describe occurrences of atomic DCR constraints in the extended DFG, and (2) defining composite patterns, built from those in (1), that capture the most common behavior.

Definition 6 (Pattern Poset)

Given a set of patterns $P$, $\langle P, \leq \rangle$ is the pattern dependency poset, where $\leq$ is a binary relation over $P$ for which reflexivity, transitivity and antisymmetry hold.

Patterns as posets allow us to reuse and simplify the outputs of the discovery algorithm. Consider the inclusion of a DCR pattern describing a sequential composition from $a$ to $b$ (similar to the flow construct in BPMN). A DCR model that captures such sequential behaviour requires four constraints between $a$ and $b$; the pattern poset declares the sequence pattern as depending on the atomic patterns that mine those constraints, thereby defining the dependency relations for a miner capable of mining sequential patterns. Additional patterns (e.g., exclusive choices, escalation patterns, etc.) can be modelled in the same way. Pattern posets are finite, thus minimal elements exist. The generation of a DCR model from an extended DFG is described in Algorithm LABEL:alg:miner:pattern. We illustrate the mining of DCR conditions, responses and self-responses; more patterns are available in [28]. The algorithm takes as input an extended DFG and a pattern poset. It starts by creating an empty DCR graph with activities equal to the nodes in $G$ and initial marking $(\emptyset, \emptyset, A)$, that is, all activities are included, not pending and not executed. We then split the processing between atomic patterns (those with no dependencies) and composite patterns. The map Rel stores the relations mined from atomic patterns, which will be used by the composite miner. We use the merge notation $\oplus$ to denote the result of creating a DCR graph whose activities and marking are the same as the input's, and whose relations are the pairwise union of the range of Rel and the corresponding relational structure in the graph. Line 8 applies a transitive reduction strategy [3], reducing the number of relations while maintaining identical reachability properties.
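The overall generation step can be sketched as follows, reusing the types above. The pattern interfaces are illustrative stand-ins for the pattern poset, and the transitive reduction is only named, not implemented.

```java
import java.util.*;

interface AtomicPattern {
    String name();
    Set<Pair> mine(ExtendedDfg dfg);
}

interface CompositePattern {
    // Composite patterns combine the relations mined by their dependencies.
    void mine(ExtendedDfg dfg, Map<String, Set<Pair>> rels, DcrGraph g);
}

class DcrGenerator {
    // Sketch of Alg. LABEL:alg:miner:pattern: atomic patterns first (no
    // dependencies), then composite ones, then merge and reduce.
    static DcrGraph generate(ExtendedDfg dfg,
                             List<AtomicPattern> atomics,
                             List<CompositePattern> composites) {
        DcrGraph g = DcrGraph.withActivities(dfg.nodes);
        Map<String, Set<Pair>> rels = new HashMap<>();      // the map "Rel"
        for (AtomicPattern p : atomics) rels.put(p.name(), p.mine(dfg));
        for (CompositePattern p : composites) p.mine(dfg, rels, g);
        // merge (the notation "+" in the text): union the atomic relations
        // into the corresponding relational structures of g.
        g.responses.addAll(rels.getOrDefault("response", Set.of()));
        g.conditions.addAll(rels.getOrDefault("condition", Set.of()));
        // transitiveReduction(g); // line 8: fewer relations, same reachability
        return g;
    }
}
```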

The atomic and composite miners are described in Algorithms LABEL:alg:miner:patternAtomic and LABEL:alg:miner:patternComposite. The atomic miner in Algorithm LABEL:alg:miner:patternAtomic iterates over all node dependencies in the DFG and pattern-matches them against the existing set of implemented patterns. Take the case of response: we identify a response from $a$ to $b$ if the average occurrence index of $a$ is before that of $b$ (line 6). This condition, together with the dependency between $a$ and $b$ in the DFG, is sufficient to infer a response constraint from $a$ to $b$. To detect conditions, the algorithm verifies a different set of properties: given a DFG dependency between $a$ and $b$, it checks that the first occurrence of $a$ precedes that of $b$ and that $a$ and $b$ appeared in the same traces (approximated by counting the number of traces containing both activities, line 9). The self-exclusion atomic pattern is identified via the number of occurrences of each activity in the stream. Other atomic patterns include precedence and the absence of chain successions, referred to in Algorithm LABEL:alg:miner:patternComposite.
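The response and condition checks can be illustrated with the attribute names from Section 3. The exact comparisons below are simplifications of the actual rules, shown only to make the use of avgIdx, avgFO and noTraceApp concrete.

```java
import java.util.*;

class AtomicChecks {
    // Response a *-> b: a DFG dependency (a, b) exists and, on average,
    // a occurs earlier in a trace than b (cf. line 6, using avgIdx).
    static boolean isResponse(ExtendedDfg dfg, String a, String b) {
        return dfg.edges.contains(Map.entry(a, b))
                && dfg.attr(a, "avgIdx") < dfg.attr(b, "avgIdx");
    }

    // Condition a ->* b: the first occurrence of a precedes that of b and
    // both activities appear in (approximately) the same traces
    // (cf. line 9, using avgFO and noTraceApp).
    static boolean isCondition(ExtendedDfg dfg, String a, String b) {
        return dfg.edges.contains(Map.entry(a, b))
                && dfg.attr(a, "avgFO") < dfg.attr(b, "avgFO")
                && dfg.attr(a, "noTraceApp") == dfg.attr(b, "noTraceApp");
    }
}
```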

The composite miner receives the DFG, a pattern, and the list of relations mined from atomic patterns. We provide an example for the case of include and exclude relations: this pattern is assembled as the combination of self-exclusions, precedences, and not chain successions. As each of these atomic patterns generates a set of include/exclude relations, the composite pattern simply takes their set union.

Suitability of the Algorithms for Streaming Settings.

Whenever discussing algorithms that tackle the streaming process mining problem [6], it is important to keep in mind that, while a stream is assumed to be infinite, only a finite amount of memory can be used to store information, and the time complexity for processing each event must be constant. Concerning memory, an upper bound on the number of events stored by Alg. LABEL:alg:meta is given by $m_t \cdot m_e$, which is reached when each trace contains more than $m_e$ events and at least $m_t$ traces are seen in parallel. Additionally, the extended DFG is also finite, since there is a node for each activity contained in memory. Concerning time complexity, Alg. LABEL:alg:meta does not perform any unbounded backtracking; instead, for each event, it operates only on maps with amortized constant access complexity or on the extended DFG (which has finite, controlled size). The same observation holds for Alg. LABEL:alg:miner:pattern, as it iterates on the extended DFG, whose size is bounded by the provided parameters (and hence can be considered constant).

5 Experimental Evaluation

To validate the approach presented in this paper, we executed several tests: first, a quantitative validation of the streaming discovery on synthetic data; then, a qualitative evaluation of the whole approach on a real dataset.

5.1 Quantitative Evaluation of Streaming Discovery

(a) Performance comparison on a simple stream.
(b) Performance comparison on a complex stream.
Figure 3: Performance comparison between the offline DisCoveR miner and the streaming DCR Miner with the same amount of storage available (with a capacity of up to 100, 200, 400 and 500 events). A vertical black bar indicates a drift in the model generating the stream.

Our goal with the first quantitative evaluation is to compare the stability of the streaming DCR miner against sudden changes. We compare with other process discovery algorithms for DCR graphs, in this case the DisCoveR miner [3]. The tests are performed against a publicly available dataset of event streams [8]. This dataset includes (1) a synthetic stream inspired by a loan application process, and (2) perturbations of the original stream using change patterns [35]. Recall that DisCoveR is an offline miner, thus it assumes an infinite memory model; to provide a fairer evaluation, we parameterize DisCoveR with the same amount of available memory. We divided the experiment into two parts: a simple stream, where the observations of each process instance arrive in an ordered manner (i.e., one complete process instance at a time), and a complex stream, where observations from many instances arrive intertwined. As no initial DCR graph exists for this process, we used the DisCoveR miner in its original (offline) setting to generate a baseline graph from the entire dataset. This model was then used to calculate the model-to-model similarity between the DCR stream miner and the memory-limited DisCoveR miner. For the sake of simplicity, in this paper we considered only the case of sudden drifts, leaving other types of drift for future work.

In order to quantify the extent of the similarity between the baseline model and the discovered one, we developed a model-to-model metric capable of quantifying to which extent two DCR graphs are similar. This metric can be used, for example, to identify which process is currently being executed with respect to a repository of candidate processes, or to quantify the change rate of the same process over time. The metric takes as input two DCR graphs, as well as a weight relation that associates each DCR relation (cf. Def. 5) with a weight, plus one additional weight for the activities. It then computes the weighted Jaccard similarity [20] of the sets of relations and of the set of activities, similarly to what happens in [1] for imperative models:

Definition 7 (DCR Model-to-Model metric)

Given two DCR graphs $P_1$ and $P_2$, and a weight function $W$ assigning to the activity set and to each relation $\phi \in \{\rightarrow\!\bullet, \bullet\!\rightarrow, \rightarrow\!+, \rightarrow\!\%\}$ a weight in $[0, 1]$, such that all weights sum to 1, the model-to-model similarity metric is defined as:

$$\mathit{sim}(P_1, P_2) = W_A \cdot \frac{|{P_1}_A \cap {P_2}_A|}{|{P_1}_A \cup {P_2}_A|} + \sum_{\phi \in \{\rightarrow\bullet,\ \bullet\rightarrow,\ \rightarrow+,\ \rightarrow\%\}} W_\phi \cdot \frac{|{P_1}_\phi \cap {P_2}_\phi|}{|{P_1}_\phi \cup {P_2}_\phi|} \qquad (1)$$

The model-to-model metric compares the relations of the two DCR graphs, returning a value between 0 and 1, where 1 indicates a perfect match and 0 no match at all. A brief evaluation of the metric is reported in Appendix 0.A.
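A sketch of the metric under Def. 7, with a caller-supplied weight assignment over activities and the four relation sets, could look as follows; the uniform weighting in the usage comment is purely illustrative (the weights actually used are discussed in Appendix 0.A).

```java
import java.util.*;

class ModelToModel {
    // Jaccard similarity between two sets (1.0 for two empty sets).
    static <T> double jaccard(Set<T> s1, Set<T> s2) {
        if (s1.isEmpty() && s2.isEmpty()) return 1.0;
        Set<T> inter = new HashSet<>(s1); inter.retainAll(s2);
        Set<T> union = new HashSet<>(s1); union.addAll(s2);
        return (double) inter.size() / union.size();
    }

    // Def. 7: weighted sum of Jaccard similarities over activities and
    // the four relation sets; the weights w must sum to 1.
    static double similarity(DcrGraph p1, DcrGraph p2, double[] w) {
        return w[0] * jaccard(p1.activities, p2.activities)
             + w[1] * jaccard(p1.conditions, p2.conditions)
             + w[2] * jaccard(p1.responses, p2.responses)
             + w[3] * jaccard(p1.inclusions, p2.inclusions)
             + w[4] * jaccard(p1.exclusions, p2.exclusions);
    }
}
// Usage (uniform weights, for illustration only):
// double sim = ModelToModel.similarity(g1, g2,
//         new double[]{0.2, 0.2, 0.2, 0.2, 0.2});
```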

The results of the quantitative evaluation are reported in Fig. 3(a) and 3(b) (for the simple and complex stream, respectively). Each figure shows the performance of the incremental version of DisCoveR and of the streaming DCR miner, in four different configurations, over time. The vertical black bars indicate where a sudden drift occurred in the stream. While the performance on the simple stream is very good for both DisCoveR and the streaming DCR miner, when the stream becomes more complicated (i.e., Fig. 3(b)) DisCoveR becomes less effective: though its average performance increases over time, the drifts completely disrupt its accuracy. In contrast, our approach is much more robust to drift and much more stable over time, proving its ability to manage the available memory in a much more effective way.

5.2 Qualitative Evaluation of the Entire Approach

For the final evaluation, we ran a qualitative analysis on a real stream. We compared the results of the streaming miner against a real process model from one of our partner companies: the Dreyers Fond case [12]. The stream contains activities for a grant application system from December 2013 until May 2015, with 6,470 events, some of which belong to partial (never finished) traces. The data was generated from a DCR graph (the latest version is available at https://www.dcrgraphs.net/tool/main/Graph?id=59a932f8-1011-4232-bbc5-9b39efb1fc18). We used this data to explore whether the streaming miner is capable of identifying actual drifts, validating such drifts with the process owner. We also studied the computational overhead on data from a real scenario and the number of constraints identified in a realistic case.

(a) Model-to-model similarity calculated between the model mined at each point in time and the normative model, with different memory configurations. The vertical black bars are positioned according to actual drifts in the process. All lines overlap.
(b) Time required to process each event (expressed in nanoseconds, log scale) over time on the real stream data, with four different memory configurations.
(c) Number of constraints extracted over time on the real stream data, with four different memory configurations.
Figure 4: Analyses on the real stream referring to the Dreyers Fond.

Fig. 4(a) presents the model-to-model similarity against the normative model. It is important to point out that the normative model includes language features that decrease the number of constraints in the model (e.g., nesting [17] and data events), resulting in a low absolute similarity score. However, two important sudden drifts in the model were detected: one before March 2014 and another in May 2015 (highlighted in the figure). We asked the process owner about these changes, and both were confirmed. The first corresponds to a testing phase before the process entered production (cit. "The Dreyers Fond went live in December 2013 but in reality, did not process any applications until March 2014"). The second drift uncovered a system malfunction (cit. "In May 2015 I recall we had a server crash where we manually had to fix things"). In addition, we asked about the smaller changes during the lifetime of the process (resulting in faint model-to-model similarity changes in Fig. 4(a)); according to the process owner, such changes represent seasonal drifts: the process starts by receiving applications, and gradually shifts towards events regarding the reviews of those applications.

In the second analysis, depicted in Fig. 4(b), we report the time required to process each event. As the graph shows, processing each event on a consumer laptop (a MacBook) requires about 1 millisecond, showing the applicability of our technique in real settings.

Finally, Fig. 4(c) shows the number of discovered constraints over time. It is worth noticing that initially the number of constraints grows consistently for all configurations; after a while, however, the configurations with larger memory extract a smaller number of constraints. This is because more available memory means more observations, and thus more potential counterexamples to the requirements for having a constraint (cf. Alg. LABEL:alg:miner:pattern).

5.3 Discussion

One limitation of the approach regards precision with respect to offline miners, and a limiting aspect is the choice of the intermediate structure. As recently pointed out in [31, 34], a DFG representation may report confusing model behavior, as it simplifies the observations using a purely frequency-based threshold. A DFG is, in essence, an imperative data structure that captures the most common flows appearing in a stream; in a sense, this works against the declarative paradigm, relegating declarative constraints to second-class citizens of the intermediate representation. We hypothesize that the choice of the DFG as an intermediate data structure entails a loss of precision with respect to the DisCoveR miner in offline settings. However, in an online setting, the DFG still provides a valid approximation of observations in streams where complete traces are unavailable. This is far from an abnormal situation: IoT communication protocols such as MQTT [19] assume that subscriber nodes might connect to the network after communications have started, thus being unable to observe the start of every trace. Specifically, in a streaming setting it is impossible to know exactly when a certain execution is complete and, especially in declarative settings, certain constraints describe liveness behaviors that can only be verified after a whole trace has been completely inspected. While watermarking techniques [2] could be employed to cope with lateness issues, we decided to favor self-contained approaches in this paper, leaving the exploration of watermarking techniques for future work.

6 Conclusion and Future Work

This paper presented a novel streaming discovery technique capable of extracting declarative models, expressed in the DCR language, from event streams, together with a model-to-model metric that quantifies if and to what extent two DCR models are the same. A thorough experimental evaluation, comprising both synthetic and real data, validated the two contributions separately as well as in combination, in a qualitative fashion that included interviews with the process owner.

We plan to explore several directions in future work. Regarding the miner, we plan to extend its capabilities to the identification of sub-processes, nesting, and data constraints. Regarding the model-to-model similarity, we would like to embed more semantic aspects, as mentioned in [21]. A possible limitation of the streaming miner relates to its updating mechanism: currently, lines 22-24 of Algorithm LABEL:alg:meta perform periodic updates triggered purely by time, generating notifications even when no potential changes in the model have been identified. A possible extension is to integrate the model-to-model similarity as a parameter of the discovery algorithm, so that models only get updated after a given change threshold (a similarity value specified by the user) is reached.

Acknowledgments

We would like to thank Morten Marquard from DCR Solutions for providing valuable information regarding the Dreyers Fond case.

References

  • [1] F. Aiolli, A. Burattin, and A. Sperduti (2011) A business process metric based on the alpha algorithm relations. In BPM Workshops, pp. 141–146.
  • [2] T. Akidau, E. Begoli, S. Chernyak, F. Hueske, K. Knight, K. Knowles, D. Mills, and D. Sotolongo (2021) Watermarks in stream processing systems: semantics and comparative analysis of Apache Flink and Google Cloud Dataflow. Proceedings of the VLDB Endowment.
  • [3] C. O. Back, T. Slaats, T. T. Hildebrandt, and M. Marquard (2021) DisCoveR: accurate and efficient discovery of declarative process models. International Journal on Software Tools for Technology Transfer.
  • [4] A. Burattin, M. Cimitile, F. M. Maggi, and A. Sperduti (2015) Online discovery of declarative process models from event streams. IEEE Transactions on Services Computing 8(6).
  • [5] A. Burattin, F. M. Maggi, and A. Sperduti (2016) Conformance checking based on multi-perspective declarative process models. Expert Systems with Applications 65, pp. 194–211.
  • [6] A. Burattin (2019) Streaming process discovery and conformance checking. In Encyclopedia of Big Data Technologies.
  • [7] J. Carmona, B. van Dongen, A. Solti, and M. Weidlich (2018) Conformance Checking. Springer International Publishing.
  • [8] P. Ceravolo, G. M. Tavares, S. B. Junior, and E. Damiani (2020) Evaluation goals for online process mining: a concept drift perspective. IEEE Transactions on Services Computing.
  • [9] G. De Giacomo, R. De Masellis, and M. Montali (2014) Reasoning on LTL on finite traces: insensitivity to infiniteness. In AAAI Conference on Artificial Intelligence.
  • [10] S. Debois, T. Hildebrandt, and T. Slaats (2015) Safety, liveness and run-time refinement for modular process-aware information systems with dynamic sub processes. In Proceedings of FM 2015, pp. 143–160.
  • [11] S. Debois, T. T. Hildebrandt, and T. Slaats (2018) Replication, refinement & reachability: complexity in dynamic condition-response graphs. Acta Informatica 55(6).
  • [12] S. Debois and T. Slaats (2015) The analysis of a real life declarative process. In IEEE Symposium Series on Computational Intelligence, pp. 1374–1382.
  • [13] M. B. Dwyer, G. S. Avrunin, and J. C. Corbett (1999) Patterns in property specifications for finite-state verification. In Proceedings of ICSE, pp. 411–420.
  • [14] B. A. Eiriksson and I. Nordland (2020) Analyse: Regelkompleksitet på det Socialretlige Område. Technical report.
  • [15] T. Hildebrandt and R. R. Mukkamala (2010) Declarative event-based workflow as distributed dynamic condition response graphs. In PLACES, Vol. 69.
  • [16] T. T. Hildebrandt, R. R. Mukkamala, T. Slaats, and F. Zanitti (2013) Contracts for cross-organizational workflows as timed dynamic condition response graphs. JLAMP 82(5-7), pp. 164–185.
  • [17] T. T. Hildebrandt, R. R. Mukkamala, and T. Slaats (2011) Nested dynamic condition response graphs. In FSEN, Vol. 7141, pp. 343–350.
  • [18] T. T. Hildebrandt, T. Slaats, H. A. López, S. Debois, and M. Carbone (2019) Declarative choreographies and liveness. In FORTE, LNCS.
  • [19] U. Hunkeler, H. L. Truong, and A. Stanford-Clark (2008) MQTT-S: a publish/subscribe protocol for wireless sensor networks. In Proceedings of COMSWARE.
  • [20] P. Jaccard (1912) The distribution of the flora of the alpine zone. New Phytologist 11(2), pp. 37–50.
  • [21] H. A. López, S. Debois, T. Slaats, and T. T. Hildebrandt (2020) Business process compliance using reference models of law. In FASE, pp. 378–399.
  • [22] H. A. López, R. Strømsted, J. Niyodusenga, and M. Marquard (2021) Declarative process discovery: linking process and textual views. In CAiSE Forum, LNBIP, Vol. 424, pp. 109–117.
  • [23] N. Navarin, M. Cambiaso, A. Burattin, F. M. Maggi, L. Oneto, and A. Sperduti (2020) Towards online discovery of data-aware declarative process models from event streams. In Proceedings of IJCNN.
  • [24] V. Nekrasaite, A. T. Parli, C. O. Back, and T. Slaats (2019) Discovering responsibilities with dynamic condition response graphs. In CAiSE, pp. 595–610.
  • [25] L. H. Norgaard, J. B. Andreasen, M. Marquard, S. Debois, F. S. Larsen, and V. Jeppesen (2017) Declarative process models in government centric case and document management. In BPM (Industry Track), CEUR Workshop Proceedings, Vol. 1985, pp. 38–51.
  • [26] M. Pesic and W. M. P. van der Aalst (2006) A declarative approach for flexible business processes management. In Proceedings of BPM, pp. 169–180.
  • [27] T. Slaats, S. Debois, and C. O. Back (2021) Weighing the pros and cons: process discovery with negative examples. In Proceedings of BPM, pp. 47–64.
  • [28] L. Starklit (2021) Online Discovery and Comparison of DCR Models from Event Streams Using Beamline. Master's thesis, DTU.
  • [29] R. Strømsted, H. A. López, S. Debois, and M. Marquard (2018) Dynamic evaluation forms using declarative modeling. In BPM (Demos/Industry), pp. 172–179.
  • [30] W. M. P. van der Aalst (2016) Process Mining. Second edition, Springer Berlin Heidelberg.
  • [31] W. M. P. van der Aalst (2019) A practitioner's guide to process mining: limitations of the directly-follows graph. Procedia Computer Science 164, pp. 321–328.
  • [32] W. van der Aalst (2016) Process Mining. Springer Berlin Heidelberg.
  • [33] S. J. van Zelst (2019) Process Mining with Streaming Data. Ph.D. thesis, Technische Universiteit Eindhoven.
  • [34] P. Waibel, L. Pfahlsberger, K. Revoredo, and J. Mendling (2022) Causal process mining from relational databases with domain knowledge. CoRR abs/2202.08314.
  • [35] B. Weber, M. Reichert, and S. Rinderle-Ma (2008) Change patterns and change support features: enhancing flexibility in process-aware information systems. Data & Knowledge Engineering 66(3), pp. 438–466.

Appendix 0.A Quantitative Evaluation of Model-to-model Metric

Figure 5: Scatter plot showing the correlation between the model-to-model metric and the number of changes introduced in the model. The colour indicates the density of observations.

To validate the quality of the model-to-model metric, we used a dataset of 28 DCR process models collected from previous mapping efforts [22] and, for each model, we randomly introduced variations: adding new activities connected to existing fragments, adding disconnected activities, deleting existing activities (with their constraints), adding constraints, removing constraints, and swapping activity labels in the process. By systematically applying all possible combinations of variations in different amounts (e.g., adding 1/2/3 activities and nothing else; adding 1/2/3 activities and removing 1/2/3 constraints), we ended up with a total of 455,826 process models with a quantifiable amount of variation from the 28 starting processes.

Fig. 5 shows each variation on a scatter plot, where the x axis refers to the number of variations introduced in the model and the y axis refers to the model-to-model similarity. The color indicates the number of models in the proximity of each point (since multiple processes have very close similarity scores). To identify the optimal weights, we solved an optimization problem aiming at the highest correlation between the points; the resulting weights lead to a Pearson's correlation of -0.56 and a Spearman's correlation of -0.55. These values indicate that our metric is indeed capable of capturing the changes. As the metric is very compact (a value in $[0, 1]$) and operates only on the topological structure of the model, it cannot identify all details; however, it benefits from a fast computation.