A User Evaluation of Automated Process Discovery Algorithms

06/08/2018 ∙ by Fabrizio Maria Maggi, et al. ∙ University of Tartu, Sapienza University of Rome

Process mining methods allow analysts to use logs of historical executions of business processes to gain knowledge about the actual behavior of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes an event log as input and produces as output a business process model that captures the control-flow relations between the tasks described by the event log. In this setting, this paper provides a systematic comparative evaluation of existing implementations of automated process discovery methods with domain experts, using a real-life event log extracted from an international software engineering company and four quality metrics. The evaluation results highlight gaps and unexplored trade-offs in the field and allow researchers to address the shortcomings of automated process discovery methods in terms of their usability in industry.

1 Introduction

Today’s competitive business environment, combined with digital technologies, poses a choice to companies: improve or fade away. To improve, they need to increase the efficiency of their business processes. For decades, analysts have relied on manually modeling the processes and analyzing them so as to identify improvement opportunities. Nowadays, by combining business process thinking and data analytics into process mining techniques, companies are in a position to take process analysis and improvement to new levels.

According to [23], process mining can be divided into three main branches: process discovery [2, 24, 3], conformance checking [1, 10, 18, 11] and process enhancement [19, 20, 13]. Process discovery has been and remains the most common and widely studied use case [3]. With an event log (capturing unique case ids, activities, and timestamps) as input, every process discovery method produces a business process model. Nevertheless, there are some challenges in this field. Indeed, the generated models must balance a trade-off among four metrics [8]: fitness, generalization, precision, and simplicity. Over the past decade, impressive advancements have been made in this field [3]. Despite this, automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs [12]: (i) they produce large and spaghetti-like models; and (ii) they produce models that do not find the right trade-off among the four metrics mentioned above. If such models are difficult to understand or perceived as imprecise, they fail to become the valuable tool they are designed to be.

Discovery algorithms are usually evaluated on logs (most commonly real-life industry logs), and the generated models are assessed using different metrics. However, the models are rarely evaluated by the process participants or domain experts. In light of this, we investigate how domain experts view and perceive process mining algorithms for process discovery. The research question of this paper is therefore “which discovery algorithm is perceived as the best one by domain experts?”. To answer it, we conduct a systematic literature review aimed at identifying the main discovery algorithms available today. The last overview of this field was conducted by De Weerdt et al. [12] in 2012. We take their results as input and focus on research published after 2012. We then apply the selected algorithms to a log from an issue tracking system used by the software development department of an ERP company. The models produced are then evaluated using surveys and interviews with the domain experts of that company. As such, the contribution of this paper is (i) an updated overview of discovery algorithms, and (ii) a comparative evaluation of discovery algorithms from the perspective of domain experts.

The rest of the paper is structured as follows. Section 2 presents the systematic literature review methodology and classifies the approaches identified. Section 3 discusses the setup and method of evaluation, while Section 4 discusses the evaluation and its results. Finally, Section 5 concludes the paper and outlines future work directions.

2 Systematic Literature Review

In order to identify and analyze relevant studies (and underlying methods) related to automated (business) process discovery from event logs, we conducted a Systematic Literature Review (SLR) through a scientific, rigorous and replicable approach as specified by Kitchenham [16]. For the sake of space, in this paper we will briefly present the major steps of our SLR and the most interesting results derived from its enactment. Interested readers can refer to [5] for a detailed discussion of the SLR.

(1) Formulation of the research questions. Since the goal of the SLR was to select methods that produce process models from event logs, we scoped the search by formulating five research questions aimed at: (i) identifying existing studies proposing methods to perform automated process discovery; (ii) categorizing the output of a method on the basis of the type of process model discovered (i.e., imperative, declarative or hybrid), and the specific language employed (e.g., Petri nets, BPMN, Declare); (iii) delving into the specific language constructs supported by a method (e.g., exclusive choice, parallelism, loops); (iv) exploring what tool support the different methods have; and (v) investigating how the methods have been evaluated and in which application domains.

(2) Search strings development and data sources selection. Next, we developed four search strings by building combinations of keywords derived from our knowledge of the subject matter. The keywords employed to perform the search were: (i) “process discovery”; (ii) “workflow discovery”; (iii) “process learning”; and (iv) “workflow learning”. We intentionally excluded the term “automated” from the search strings because it is often not used explicitly; as a consequence, the search retrieved many more studies than those that actually focus on automated process discovery. We applied each of the search strings to seven popular academic databases: Scopus, Web of Science, IEEE Xplore, ACM Digital Library, SpringerLink, ScienceDirect and Google Scholar, and retrieved studies based on the occurrence of one of the search strings in the title, the keywords or the abstract of a paper. If a query on a specific data source returned more than one thousand results, we refined it by combining the search string with the term “business” or “process mining” to obtain more focused results, e.g., “process discovery AND process mining”. The search was completed in December 2017.

(3) Definition of inclusion criteria. Then, we defined inclusion criteria to draw the borders of our search scope and ensure an unbiased selection of relevant studies. To be selected, a study must satisfy all the inclusion criteria, which allowed us to retain studies that: (i) propose a method for automated process discovery from event logs; (ii) propose a method that has been implemented and evaluated; (iii) are peer-reviewed, written in English and published in 2011 or later (earlier studies have been evaluated by De Weerdt et al. in [6]).

(4) Study selection. Finally, we used the search strings to conduct a search on the selected data sources. By applying only the last inclusion criterion, the search queries initially yielded a total of 2820 potentially relevant studies. Then, we analyzed the title, abstract, introduction, conclusion and evaluation of these studies to exclude those that were clearly not compliant with the other inclusion criteria. As a result of this screening, we found 86 studies matching all the inclusion criteria. However, many of these studies refer to the same automated process discovery method, i.e., some studies are extensions, optimizations, preliminary versions or generalizations of another study. For this reason, we decided to group the studies under either the latest version or the most general one. At the end of this process, 35 main groups of discovery methods were identified.

(5) Study classification. Driven by the research questions, we also classified the methods underlying these studies on the basis of the following dimensions: (i) model type (procedural, declarative, hybrid) and language (e.g., Petri nets, BPMN, Declare) supported; (ii) semantics captured in procedural models, e.g., parallelism (AND), exclusive choice (XOR), inclusive choice (OR) and loop; (iii) type of implementation (standalone or plug-in, and tool accessibility); (iv) type of evaluation data (real-life, synthetic or artificial log); and (v) domain of application (e.g., insurance, banking, healthcare). Collectively, this information is summarized in Table 2 of [5], where each entry refers to the main study of the 35 groups found.

3 Evaluation

In this section, we describe the setup and method of our evaluation. In section 3.1, we give a general description of the log used and specify the list of miners used. In section 3.2, we describe the preprocessing applied to the original log to create a refined dataset to be used for model discovery. In section 3.3, we specify the setup for the user evaluation. In section 3.4, we present the methods for conducting our statistical analysis.

3.1 Experimental Setting

In our evaluation, we use a log from a software company pertaining to a software development process for developing an ERP system. The log contains data spanning over one year. It has events and cases. The log is extracted from the company’s own software used for tracking functional enhancements and bugs. The input log does not contain explicit information about the activities performed. Therefore, in section 3.2, we describe how this information was extracted from the log so as to make it suitable for process discovery. After applying this procedure, 29 unique activities were identified.

In the evaluation, we used a selection of the methods surveyed in [5]. Assessing all the methods that resulted from the literature review would not be possible due to the heterogeneous nature of the inputs required and the outputs produced. Hence, we decided to focus on the largest subset of comparable methods. The methods considered were the ones satisfying the following criteria:

  • an implementation of the method is publicly accessible;

  • the output of the method is a BPMN model or a model seamlessly convertible into BPMN (i.e., process trees and Petri nets).

The second criterion is dictated by the fact that the evaluation was performed with business users from industry, who are often not experts in the technical formalisms used in the BPM field. Using these criteria, the following miners were identified:

  • BPMNMiner [9], which is a method for the automated discovery of BPMN models containing sub-processes, activity markers such as multi-instance and loops, and interrupting and non-interrupting boundary events (to model exception handling). The method is robust to noise in event logs.

  • Causal Net Mining [14], which encodes causal relations gathered from an event log and, if available, background knowledge in terms of precedence constraints over the topology of the resulting process models. The discovery algorithm is formulated in terms of reasoning problems over precedence constraints.

  • alpha$-algorithm [15], which can discover invisible tasks involved in non-free-choice constructs. The algorithm is an extension of the well-known α-algorithm, one of the very first algorithms for automated process discovery, originally presented in [2].

  • Heuristics Miner [25], which is a method that can discover process models containing non-trivial constructs, but with a low degree of block-structuredness. At the same time, the method can cope well with noise in event logs.

  • Hybrid ILP Miner [26], which is based on hybrid variable-based regions. Through hybrid variable-based regions, it is possible to vary the number of variables used within the ILP (Integer Linear Programming) problems being solved. Using a different number of variables has an impact on the average computation time for solving the ILP problem.

  • Structured Miner [4], which is an improvement of the Heuristics Miner algorithm to separate the objective of producing accurate models and that of ensuring their structuredness and soundness. Instead of directly discovering a structured process model, the approach first discovers accurate, possibly unstructured (and unsound) process models, and then transforms the resulting process model into a structured (and sound) one.

  • Inductive Miner [17], which is based on the extraction of process trees from an event log. It efficiently drops infrequent behavior, still ensuring that the discovered model is behaviorally correct (sound) and highly fitting.

  • Evolutionary Tree Miner (ETMd) [7], which is based on a genetic algorithm that allows the user to drive the discovery process based on preferences with respect to the four quality dimensions of the discovered model: fitness, precision, generalization and complexity.

3.2 From the event log to the process models

As mentioned in section 3.1, the log was preprocessed for the evaluation. The activities were not explicitly recorded in the log, but the information about the actors/process participants was. The company from which the log comes has an organizational structure in which, for instance, one department performs quality assurance and another documentation. Hence, by using the role of the actors, such as EE Senior Coder, the activity could be deduced. This step was conducted together with the company to ensure that the activities were correctly captured.
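As a rough illustration of this preprocessing step, the following Python sketch derives an activity label from the role of the actor recorded in each event. The field names (`case_id`, `role`, `timestamp`), the helper `add_activities` and the whole role-to-activity mapping are hypothetical; only the role "EE Senior Coder" appears in the text, and the activity assigned to it here is an assumption, since the actual mapping was established together with the company.

```python
from typing import Dict, List

# Hypothetical mapping from actor role to activity label (an assumption,
# not the mapping actually agreed upon with the company).
ROLE_TO_ACTIVITY: Dict[str, str] = {
    "EE Senior Coder": "Development",
    "QA Engineer": "Quality assurance",
    "Technical Writer": "Documentation",
}


def add_activities(events: List[dict], mapping: Dict[str, str]) -> List[dict]:
    """Return copies of the events with an 'activity' field deduced from 'role'.

    Events whose role is missing from the mapping are labeled 'Unknown' so
    that they can be reviewed manually instead of being silently dropped.
    """
    labeled = []
    for event in events:
        enriched = dict(event)  # do not mutate the input events
        enriched["activity"] = mapping.get(event["role"], "Unknown")
        labeled.append(enriched)
    return labeled


# Toy events, invented for illustration only.
raw_events = [
    {"case_id": "c1", "role": "EE Senior Coder", "timestamp": "2017-01-02T09:00"},
    {"case_id": "c1", "role": "QA Engineer", "timestamp": "2017-01-03T10:30"},
    {"case_id": "c1", "role": "Release Manager", "timestamp": "2017-01-04T08:15"},
]
labeled_events = add_activities(raw_events, ROLE_TO_ACTIVITY)
```

Labeling unmapped roles explicitly, rather than discarding the events, mirrors the idea of validating the mapping with the company before discovery.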

Miner             Size  CNC    Density
alpha$            145   1.490  0.010
BPMN Miner        25    1.760  0.073
CNM               122   2.155  0.017
ETMd              124   1.411  0.011
HILP              65    1.600  0.025
HM                97    1.990  0.021
IM                42    1.571  0.038
Structured Miner  54    1.982  0.037
Table 1: Complexity of the discovered models before filtering

Miner             Size  CNC    Density
alpha$            71    1.408  0.020
BPMN Miner        15    1.600  0.114
CNM               52    1.411  0.020
ETMd              84    1.274  0.015
HILP              34    1.471  0.045
HM                52    1.865  0.037
IM                28    1.429  0.053
Structured Miner  33    1.454  0.045
Table 2: Complexity of the discovered models after filtering

When applying the identified methods to the original log, all the BPMN models discovered were highly complex and spaghetti-like. In Table 1, we show the metrics measuring the complexity of the models discovered from the original log. In particular, size is the number of nodes; CNC is the Coefficient of Network Connectivity, i.e., the ratio between the number of arcs and the number of nodes; and density is the ratio between the actual number of arcs and the maximum possible number of arcs in any model with the same number of nodes.
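The two ratio metrics can be sketched in a few lines of Python. The sketch assumes that a model is treated as a directed graph without self-loops, so that the maximum number of arcs over n nodes is n(n - 1); this is an assumption, although it appears consistent with the values in Table 1. The arc count used in the example is not reported in the paper: it is inferred from the BPMN Miner row as size × CNC.

```python
def cnc(num_nodes: int, num_arcs: int) -> float:
    """Coefficient of Network Connectivity: arcs per node."""
    return num_arcs / num_nodes


def density(num_nodes: int, num_arcs: int) -> float:
    """Actual arcs divided by the maximum possible number of arcs.

    Assumes a directed graph without self-loops, i.e. at most
    n * (n - 1) arcs over n nodes.
    """
    max_arcs = num_nodes * (num_nodes - 1)
    return num_arcs / max_arcs


# BPMN Miner before filtering: 25 nodes and CNC 1.760 (Table 1).
size = 25
arcs = round(size * 1.760)          # inferred arc count, not from the paper
bpmn_miner_cnc = cnc(size, arcs)
bpmn_miner_density = round(density(size, arcs), 3)
```

Under this assumption, density equals CNC divided by (n - 1), which explains why larger models tend to have a lower density even when their CNC is higher.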

To improve the understandability of the models, we decided to filter the original log to isolate frequent behaviors. In particular, we created nine separate logs, ranging from a log containing all behavior, through a log containing the behavior shared by at least 2 cases, up to a log containing the behavior shared by at least 9 cases. Each of these logs was then used to discover BPMN models. In Table 2, we show the metrics measuring the complexity of the models discovered from the log containing the behavior shared by at least 9 cases. For the user evaluation, we selected the BPMN model mined by Evolutionary Tree Miner, represented in Figure 1, since this model has the lowest CNC and density. (The Evolutionary Tree Miner produced many redundant elements, which we manually removed to make the model more readable for the domain experts.) The second model chosen was the one obtained by the Structured Miner, shown in Figure 2, due to its small size. Finally, we also included the model generated by the Inductive Miner, shown in Figure 3. For anonymization purposes during the evaluation, we referred to the model of the Evolutionary Tree Miner as model A, that of the Structured Miner as model B, and that of the Inductive Miner as model C. We discarded the model discovered with BPMN Miner since, even if very simple, this model was very imprecise (a “flower” model allowing any behavior).
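One possible reading of this filtering step is sketched below in Python. It interprets "behavior shared by at least k cases" as a trace variant, i.e., an identical sequence of activities, that occurs in at least k cases; this interpretation, the function name `filter_by_variant_frequency` and the toy log are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

Trace = Tuple[str, ...]


def filter_by_variant_frequency(log: Dict[str, Trace], min_cases: int) -> Dict[str, Trace]:
    """Keep the cases whose activity sequence is shared by at least min_cases cases."""
    variant_counts = Counter(log.values())
    return {
        case_id: trace
        for case_id, trace in log.items()
        if variant_counts[trace] >= min_cases
    }


# Toy log (case id -> ordered activities), invented for illustration only.
toy_log: Dict[str, Trace] = {
    "c1": ("Analyze", "Develop", "Test"),
    "c2": ("Analyze", "Develop", "Test"),
    "c3": ("Analyze", "Develop", "Document", "Test"),
}

# Nine logs: all behavior (threshold 1) up to behavior shared by at least 9 cases.
filtered_logs = {k: filter_by_variant_frequency(toy_log, k) for k in range(1, 10)}
```

With a threshold of 1 the log is unchanged, and each higher threshold yields a subset of the previous log, so the discovered models can only become simpler or stay the same in terms of the behavior they need to cover.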

Figure 1: BPMN model mined by Evolutionary Tree Miner (model A)
Figure 2: BPMN model mined by Structured Miner (model B)
Figure 3: BPMN model mined by Inductive Miner (model C)

3.3 Evaluation set-up

The evaluation was conducted in two steps, both of which involved the domain experts of the company. Having produced the models, we set up several meetings to which employees from development teams, product managers, testers, documenters, and all team leads were invited. In total, 18 domain experts participated in the survey, which constitutes 72% of the process participants at the company (excluding administrative staff). In the first step, the participants were given printed copies of the models and asked to fill out a questionnaire (via Google Forms). The questions concerned their familiarity with process models and their work with such models over the past 12 months. Next, they were asked to compare the models with respect to different process model quality metrics. The questions asked were as follows.

  1. Rate how easy it is for you to understand the process models (1 means very difficult, 7 means very easy).

  2. Take one path and follow it from the beginning to the end. Rate how easy it is for you to follow your chosen path (1 means very difficult, 7 means very easy).

  3. Rate how easy it is for you to distinguish the paths in models (1 means very difficult, 7 means very easy).

  4. Can you recognize any processes you work with in the models? (1 means not at all, 7 means yes, clearly, everything is there).

  5. In your estimation, rate how well the models describe your processes (1 means that the model is too specific so to exclude some paths that are possible in reality, 7 means that the model is too general so to allow process paths that are not possible in reality).

  6. If you were to improve your business processes, which model would you find most useful for this purpose? (1 means useless, 7 means very useful).

The above questions correspond to the following process model quality metrics:

  • understandability - Questions 1, 2 and 3;

  • correctness - Question 4;

  • precision - Question 5;

  • usefulness - Question 6.

For each question we had three variants, A, B and C, representing the specific miner used to create the model. We restricted ourselves to three variants to avoid getting meaningless results due to the relatively low number of participants. Had we included more than three models, the answers could have been distributed in a way that would not allow any statistically strong conclusions.

In the second step, which followed the questionnaire, we carried out a workshop. It was held in an open form, allowing participants to discuss and express their perceptions and offer qualitative feedback about the models compared. The discussions did not follow a strict structure, but we used the following targeted questions to moderate the workshop.

  • Which of the models was the best? Why?

  • What did the models look like in general?

  • What could be developed?

  • Did the models fulfill your expectations?

  • Would you consider using these algorithms in your company? And process discovery?

  • What is lacking in the models?

3.4 Description of statistical analysis methods used

With the statistical analysis, we wanted to discover whether there are any differences in the ratings between the models. Before the analysis, the data was formatted to be suitable for analysis with the free software R. The answers to the questionnaire were extracted from Google Forms as a .CSV file. First, the names of the questions in the header of the .CSV file were renamed to correspond to the respective metrics. If there was more than one question for a metric, the ratings of the subquestions were divided by the number of subquestions and then summed (i.e., averaged) to reflect the general rating of the respective process model quality metric.
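The aggregation of subquestions can be illustrated with a short Python sketch (the paper's own analysis was done in R). The question-to-metric mapping follows the list given in section 3.3; the labels `Q1` to `Q6`, the function name `metric_ratings` and the example answers are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Mapping of survey questions to quality metrics, as listed in section 3.3.
QUESTION_TO_METRIC: Dict[str, str] = {
    "Q1": "understandability",
    "Q2": "understandability",
    "Q3": "understandability",
    "Q4": "correctness",
    "Q5": "precision",
    "Q6": "usefulness",
}


def metric_ratings(answers: Dict[str, int]) -> Dict[str, float]:
    """Average one respondent's ratings for one model per quality metric.

    Dividing each subquestion rating by the number of subquestions and
    summing the results is the same as taking the arithmetic mean.
    """
    grouped: Dict[str, List[int]] = {}
    for question, rating in answers.items():
        grouped.setdefault(QUESTION_TO_METRIC[question], []).append(rating)
    return {metric: mean(ratings) for metric, ratings in grouped.items()}


# Invented answers of a single respondent for a single model (scale 1 to 7).
example_answers = {"Q1": 3, "Q2": 4, "Q3": 5, "Q4": 5, "Q5": 4, "Q6": 6}
example_ratings = metric_ratings(example_answers)
```

Metrics backed by a single question keep their original rating, so all four metrics end up on the same 1 to 7 scale.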

For the evaluation of the domain experts’ perception of the models, we formulated the following hypothesis pair:

  • The null hypothesis: There is no difference in the mean rating of the models.

  • The alternative hypothesis: There is at least one model that is different from the others.

The hypotheses were tested using a two-way ANOVA. Assuming independence of the observations, the ANOVA model additionally assumes that the residuals are normally distributed for each combination of the groups and that the residuals have the same variance (homogeneity of variances) for each combination of the groups. The normality assumption was assessed with the Shapiro-Wilk test and QQ-plots; homogeneity of variances was assessed with Levene’s test. A significant ANOVA test was followed by the Tukey HSD test to perform multiple pairwise comparisons between the means and determine the statistically significant pairs of groups. All the analysis was done in R by using RStudio, and the figures were made with ggplot2. Violin plots were used because they express the median value, the interquartile range, and kernel density estimations.
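To make the idea behind the test more concrete, the following Python sketch computes the F statistic of a one-way ANOVA with the model as the only factor. This is a deliberate simplification and not the analysis performed in the paper, which used a two-way ANOVA in R; the sketch also omits the p-value, the assumption checks and the Tukey HSD follow-up. The ratings are invented toy data.

```python
from statistics import mean
from typing import Dict, List


def one_way_anova_f(groups: Dict[str, List[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group over within-group variance."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (models)
    n = len(all_values)    # total number of ratings

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of the ratings around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Invented toy ratings (scale 1 to 7), four respondents per model.
toy_ratings = {
    "A": [3, 4, 3, 4],
    "B": [5, 5, 4, 6],
    "C": [5, 6, 5, 6],
}
f_statistic = one_way_anova_f(toy_ratings)
```

A large F value indicates that the differences between the group means are large relative to the spread within the groups, which is the evidence used to reject the null hypothesis of equal mean ratings.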

4 Evaluation results

The research question of this paper is about how discovery algorithms are perceived by domain experts. In answering this question, we investigate the perceived understandability, correctness, precision, and usefulness of the models. Size and complexity together with perceived understandability correspond to “simplicity”. Correctness aims at assessing perceived “fitness”, whereas precision captures perceived “generalization” and “precision”. In this section, we describe the results of the evaluation from two perspectives. First, we present the results obtained from the surveys filled in by the domain experts. Second, we summarize the main results gathered from discussions with the domain experts. The models generated by the algorithms are referred to as model A (Evolutionary Tree Miner), model B (Structured Miner), and model C (Inductive Miner).

4.1 Domain Expert Survey Results

The domain experts were given a link to a survey where they saw the three models accompanied by 6 questions. These questions aimed at measuring the perceived understandability, precision, correctness, and usefulness. The first three questions aimed at assessing perceived understandability. These questions asked how easy it was to understand the model, to follow one path from the beginning to the end, and to distinguish between the different paths. The respondents gave a rating from 1 (very difficult) to 7 (very easy) for each of the selected models (model A, B, and C). In regard to understanding the models, A had a mean of 3.28 whereas B and C both had 4.72. As to the ease with which a path could be followed, A had the lowest mean value of 3.61, B the second lowest with 5.11 and C the highest with 5.39. The experts also found model A to be the most difficult when distinguishing between different paths. The mean values for this question are 3.33 for A, 5.39 for B, and 5.22 for C. The results are also depicted in Figure 4 (the bold line denotes the median). As can be seen, model A is perceived as the least understandable for all three questions, whereas models B and C are comparable. The difference between model A and the others is most pronounced in regard to the ease with which different paths can be distinguished in the model. It is interesting to note that the respondents found it slightly easier to follow a path from beginning to end in model C than in model B.

Figure 4: Results for Understandability

The results for understandability seem to show a distinct preference for model B and C over model A, in particular for distinguishing the different paths in the models.

The fourth question of the survey aimed at capturing the perceived correctness of the models. This was achieved by asking the respondents if they could recognize processes they work with in the models. For each model, they gave a rating from 1 (not at all) to 7 (all of them).

Figure 5: Results for Correctness

The results in Figure 5 show that the models are quite similar in regard to perceived correctness. However, there is a slight preference for model C over B and A. The interquartile range and density for model C indicate that it is the model perceived to be most correct by the domain experts. This is also reflected in the mean values, where model A has 4.11, B has 4.89, and C has 5.28.

The fifth question concerned precision. The question asked for the respondents’ estimation of how well the models described their processes. The rating options ranged from 1 (too specific, meaning excluding paths that are possible) to 7 (too general, meaning allowing paths not possible in reality).

Figure 6: Results for Precision

As can be seen from Figure 6, models A and B are very similar in regard to perceived precision. The interquartile ranges for models A and B (both have a mean value of 4.44) are very similar, whereas the results for model C show a wider distribution of responses. However, considering the distribution of the respondents’ results for model C (mean value of 5.22), it seems that model C is perceived to be more general than models A and B.

The final question aimed at assessing perceived usefulness. As processes are often discovered for process enhancement, the respondents were asked about the usefulness of the models for improving the processes. The rating scale was from 1 (useless) to 7 (very useful).

Figure 7: Results for Usefulness

The results show that model C was perceived as more useful, as can be seen in Figure 7. The difference between model A and models B and C is significant. Model A has a mean value of 3.44 whereas model B has 4.17. Model C has a value of 5.22, but the difference between models B and C is less distinct. Even so, it seems that model C is perceived to be slightly more useful if it were to be used for process improvement.

Our null hypothesis was defined as “there is no difference in the mean rating of the models” and the alternative was that there is at least one model that is different from the others. We reject the null hypothesis (p-value of 9.297e-07) and therefore confirm that at least one model is different. Model A is clearly different from B and C. However, we could not conclude that there is a statistically significant difference between models B and C.

4.2 Domain Expert Discussion Results

Following the survey, we engaged in discussions with the domain experts about the different models. The discussions were semi-structured: a set of topical questions was asked, but the discussion did not follow a strict order. The questions raised were about how the experts perceived the models in regard to quality and understandability, how the models could be improved, and whether they could see any use for the models in regard to process improvement. It should be noted that the usefulness of process mining techniques for discovery was not in question. Rather, the aim of the discussions was to better understand how the experts perceived the generated models.

The most common observation presented by the domain experts concerned overlaying the models with additional data. Adding data about path frequency, frequency of activities, and performance data (such as different time metrics) would improve the usability and understandability of the models considerably. The reasons presented were that such data would allow distinguishing the most commonly executed paths and deviations, and would clarify which paths were not taken at all (but were possible in the model). They also noted that in model C, it is possible to reach the end of the process without passing through any activity (two possible paths allow this). As such, it is necessary to see which paths are actually present in the log. Such information would, when using these models for process improvement, aid in focusing efforts for further analysis.

The team leads expressed more positive sentiment about using the models for improving the processes. While they saw opportunities, the developers found the models interesting but of limited value for improving their own work. This point was quite expected, as developers’ work is mostly confined to one or a few activities. The domain experts also expressed interest in seeing loops in the process more clearly. The models, as they currently are, do not show where in the process loops occur. It was also noted that infrequent paths should be represented but the models could be made simpler. Understandability of the models would be improved if sub-processes were introduced. Furthermore, the domain experts shared that the models contained more gateways and nodes than perceived necessary. In regard to the gateways, they also saw the need for annotating the gateways with numerical data such as frequency or at least the default path.

The gradually growing consensus was that the models were fairly accurate, that they did capture most of the processes existing in the company, and that model C best reflected the company’s everyday work. The domain experts also expressed the usefulness of such techniques for understanding the current state but wished to see simpler models (by use of sub-processes and fewer gateways) enhanced with data. It should be noted that commercial products for process discovery such as Disco (https://fluxicon.com/disco/) or Celonis (https://www.celonis.com) do provide simpler models, overlay the models with data, and allow for filtering based on activities and paths. However, open source plug-ins to ProM do not currently offer such functionality.

4.3 Summary

Taking all three aspects considered in the comparative evaluation, we note that models B and C are perceived as better than model A. The metrics for size show that model A is clearly larger than both B and C. Its number of elements (84) is clearly above the threshold suggested by research [21] [22]. It is therefore perhaps not surprising that the domain experts found this model to be the least understandable.

                      Model A  Model B  Model C
Size                  84       33       28
CNC                   1.28     1.45     1.43
Density               0.015    0.045    0.053
Understandability Q1  3.28     4.72     4.72
Understandability Q2  3.61     5.11     5.39
Understandability Q3  3.22     5.39     5.22
Correctness           4.11     4.89     5.28
Precision             4.44     4.44     5.22
Usefulness            3.44     4.17     5.22
Table 3: Rating table

In regards to correctness, precision and usefulness, the domain experts clearly favor model B and C over model A. While the values for model C are slightly higher than for B, the difference is not significant enough to draw conclusions as can be seen from Figure 8.

Figure 8: Comparison of Expert Opinions

One might reason that the difference may be attributed to individual preference. Finally, the domain experts emphasized the benefits of having additional data captured along the models (frequency and performance metrics).

5 Conclusion

This paper has presented a comparative evaluation of existing implementations of automated process discovery methods using a real-life event log from an international software engineering company. From the statistical analysis, we found that one model, model A, is statistically different from the others. Domain experts found the Inductive Miner (model C) to be the best one, closely followed by the Structured Miner (model B), but the difference between these two is not statistically significant.

Finally, the domain experts found that automated process discovery methods in their current state are not as useful as they could be for their goals. Commercial packages have come further in making automatically discovered process models useful for industry. Despite the accumulated amount of research in this field, there is room for further improvement. This opens up new directions for research, such as adding frequency information to the models, splitting the models, adding tracking scales, and adding time information. In the domain experts’ opinion, these additions would make the models more usable and more valuable to them.

References

  • [1] van der Aalst, W.M.P., Adriansyah, A., van Dongen, B.F.: Replaying history on process models for conformance checking and performance analysis. Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery 2(2), 182–192 (2012)
  • [2] van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9) (2004)
  • [3] Van der Aalst, W.M.: Process mining: data science in action. Springer (2016)
  • [4] Augusto, A., Conforti, R., Dumas, M., La Rosa, M.: Automated Discovery of Structured Process Models From Event Logs: The Discover-and-Structure Approach. Data and Knowledge Engineering (to appear) (2017)
  • [5] Augusto, A., Conforti, R., Dumas, M., Rosa, M.L., Maggi, F.M., Marrella, A., Mecella, M., Soo, A.: Automated Discovery of Process Models from Event Logs: Review and Benchmark. CoRR abs/1705.02288 (2017), http://arxiv.org/abs/1705.02288
  • [6] vanden Broucke, S.K.L.M., Weerdt, J.D., Vanthienen, J., Baesens, B.: A comprehensive benchmarking framework (CoBeFra) for conformance analysis between procedural process models and event logs in ProM. In: IEEE Symposium on Computational Intelligence and Data Mining, CIDM. pp. 254–261. IEEE (2013)
  • [7] Buijs, J.C., van Dongen, B.F., van der Aalst, W.M.: Quality dimensions in process discovery: The importance of fitness, precision, generalization and simplicity. International Journal of Cooperative Information Systems 23(01), 1440001 (2014)
  • [8] Buijs, J.C., Van Dongen, B.F., van Der Aalst, W.M., et al.: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery. In: OTM Conferences (1). vol. 7565, pp. 305–322 (2012)
  • [9] Conforti, R., Dumas, M., García-Bañuelos, L., La Rosa, M.: BPMN Miner: Automated discovery of BPMN process models with hierarchical structure. Information Systems 56, 284–303 (2016)
  • [10] De Giacomo, G., Maggi, F.M., Marrella, A., Patrizi, F.: On the Disruptive Effectiveness of Automated Planning for LTLf-Based Trace Alignment. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. pp. 3555–3561 (2017)
  • [11] De Giacomo, G., Maggi, F.M., Marrella, A., Sardiña, S.: Computing Trace Alignment against Declarative Process Models through Planning. In: Proceedings of the Twenty-Sixth International Conference on Automated Planning and Scheduling, ICAPS 2016, London, UK, June 12-17, 2016. pp. 367–375 (2016)
  • [12] De Weerdt, J., De Backer, M., Vanthienen, J., Baesens, B.: A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs. Information Systems 37(7), 654–676 (2012)
  • [13] Fahland, D., van der Aalst, W.M.: Repairing process models to reflect reality. In: International Conference on Business Process Management. pp. 229–245. Springer (2012)
  • [14] Greco, G., Guzzo, A., Lupia, F., Pontieri, L.: Process discovery under precedence constraints. ACM Trans. on Know. Discovery from Data (TKDD) 9(4), 32 (2015)
  • [15] Guo, Q., Wen, L., Wang, J., Yan, Z., Philip, S.Y.: Mining invisible tasks in non-free-choice constructs. In: International Conference on Business Process Management. pp. 109–125. Springer (2015)
  • [16] Kitchenham, B.: Procedures for performing systematic reviews. Keele, UK, Keele University 33(2004), 1–26 (2004)
  • [17] Leemans, S.J.J., Fahland, D., van der Aalst, W.M.P.: Discovering Block-Structured Process Models from Incomplete Event Logs. In: Application and Theory of Petri Nets and Concurrency: 35th International Conference, PETRI NETS 2014. Springer (2014)
  • [18] de Leoni, M., Marrella, A.: Aligning Real Process Executions and Prescriptive Process Models through Automated Planning. Expert Syst. Appl. 82, 162–183 (2017)
  • [19] Maggi, F.M., Corapi, D., Russo, A., Lupu, E., Visaggio, G.: Revising process models through inductive learning. In: zur Muehlen, M., Su, J. (eds.) Business Process Management Workshops. pp. 182–193. Springer Berlin Heidelberg, Berlin, Heidelberg (2011)
  • [20] Maggi, F.M., Di Francescomarino, C., Dumas, M., Ghidini, C.: Predictive monitoring of business processes. In: Advanced Information Systems Engineering. pp. 457–472. Springer International Publishing, Cham (2014)
  • [21] Mendling, J., Neumann, G., Van Der Aalst, W.: Understanding the occurrence of errors in process models based on metrics. In: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems". Springer (2007)
  • [22] Mendling, J., Sánchez-González, L., García, F., La Rosa, M.: Thresholds for error probability measures of business process models. Journal of Systems and Software 85(5), 1188–1197 (2012)
  • [23] Van Der Aalst, W., Adriansyah, A., De Medeiros, A.K.A., Arcieri, F., Baier, T., Blickle, T., Bose, J.C., van den Brand, P., Brandtjen, R., Buijs, J., et al.: Process mining manifesto. In: International Conference on Business Process Management. pp. 169–194. Springer (2011)
  • [24] Van Dongen, B.F., De Medeiros, A.A., Wen, L.: Process mining: Overview and outlook of petri net discovery algorithms. In: Transactions on Petri Nets and Other Models of Concurrency II, pp. 225–242. Springer (2009)
  • [25] Weijters, A., Ribeiro, J.: Flexible heuristics miner (FHM). In: 2011 IEEE Symp. on Computational Intelligence and Data Mining (CIDM). IEEE (2011)
  • [26] van Zelst, S.J., van Dongen, B.F., van der Aalst, W.M.P.: ILP-Based Process Discovery Using Hybrid Regions. In: International Workshop on Algorithms & Theories for the Analysis of Event Data, ATAED 2015. CEUR Workshop Proceedings, vol. 1371, pp. 47–61. CEUR-WS.org (2015)