
Control Flow Information Analysis in Process Model Matching Techniques

by Christopher Klinkmüller, et al.

Online Appendix to: "Analyzing Control Flow Information to Improve the Effectiveness of Process Model Matching Techniques" by the same authors.


1 Introduction

Process model matchers automate the identification of correspondences between process models, i.e., activities that represent similar functionality in different models. This way they support a variety of tasks in business process management, e.g., the management of process model collections, the consolidation of processes, or the re-use of process fragments at design time. To detect correspondences, matchers generally compare activity labels which provide brief natural language descriptions. Additionally, some matchers attach importance to the control flow in the process models and analyze the temporal dependencies between the activities. Yet, the process model matching contests in 2013 and 2015 [Cayoglu et al., 2013, Antunes et al., 2015] revealed that the effectiveness of state-of-the-art matchers is low, i.e., their results only contain a few existing and many irrelevant correspondences.

With that in mind, we aim to provide guidance for the development of more effective matchers. In particular, we here examine the question: how can process model matchers benefit from the analysis of control flow information?

To answer this question, we review the process model matching literature in order to identify options for using control flow information in the matching process. As we are interested in improving the matcher effectiveness, we further assess the evidence that was given towards the successful contribution of control flow information to the matching process. We limit our focus to control flow information, because the comparison of labels has been extensively studied in natural language processing [Manning and Schütze, 1999], information retrieval [Manning et al., 2008], as well as schema and ontology matching [Euzenat and Shvaiko, 2013, Rahm and Bernstein, 2001].

The remainder of this report is organized as follows. We first outline how we identified process model matching literature in Section 2. Next, we provide a brief overview of the identified literature in Section 3. Then, we present a classification of analysis options that we derived from the literature and discuss the evidence given towards them in Section 4. Finally, Section 5 concludes the report.

2 Identification of Literature

In this section, we outline the literature search process that was applied to identify relevant publications. This search process was carried out by the first author in the context of his PhD thesis [Klinkmüller, 2017] where he reviewed the process model matching literature with regard to the general applicability of matchers, their effectiveness, and the applied research approaches. In this work, we extend this review by analyzing the use of control flow information in those matchers.

To identify relevant publications, guidelines for literature reviews [vom Brocke et al., 2009] were applied. In the first step, the matching contests [Cayoglu et al., 2013, Antunes et al., 2015] which were considered as a source of relevant works were analyzed. Here, six additional publications that introduced process model matching techniques were identified and an initial set of eight publications was obtained. This set served as a basis for the verification of the search strategy in a later step. Then, the database search was prepared. As the overall goal was the identification of papers that introduce matching techniques, “process model matching” was chosen as the basic search string. Yet, this term is quite specific and might not cover all relevant publications. Hence, it was split into the terms “process model” and “matching” for each of which synonyms were considered. That is, works that deal with process models or workflows, and refer to matching, mapping, or alignment were considered, too. Therefore, the combined search string is [(“process model” OR “workflow”) AND (“match*” OR “align*” OR “map*”)]. Moreover, the search was restricted to scientific, peer-reviewed, English literature that was published between January 2000 and May 2016, where the former date is associated with the rise of modern BPM and the latter marks the time at which the review was completed.
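The boolean structure of the combined search string can be sketched as a simple text filter. This is only an illustration: the regular expressions approximate the wildcard syntax, each database uses its own query format, and the second and third example titles are hypothetical.

```python
import re

# Term groups taken from the search string in the text. The regexes are an
# assumed approximation of the "*" wildcards, not any database's syntax.
SUBJECT = re.compile(r"\b(process model|workflow)", re.IGNORECASE)
ACTION = re.compile(r"\b(match|align|map)\w*", re.IGNORECASE)


def matches_search_string(text: str) -> bool:
    """Return True if text contains a subject term AND an action term."""
    return bool(SUBJECT.search(text)) and bool(ACTION.search(text))


titles = [
    "Aligning Business Process Models",  # title of [Dijkman et al., 2009]
    "Workflow mapping in practice",      # hypothetical title
    "Ontology matching",                 # hypothetical title, no subject term
]
hits = [t for t in titles if matches_search_string(t)]
```

Here `hits` keeps the first two titles and drops the third, which refers to matching but not to process models or workflows.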

Next, the database search was carried out. It relied on the following databases, which comprise publications from the information systems domain: Springer Link, ACM Digital, IEEE Xplore, Science Direct, Emerald Insight, and Google Scholar. (Each database uses a proprietary format for search criteria, to which we adapted the format of the search string and cut-off dates.) Querying the databases resulted in lists that comprised hundreds or thousands of papers, where papers with a low rank were highly likely to be irrelevant. Given the specific scope of the search, each list was scanned stepwise starting from the highest rank and, as per common practice, scanning stopped when the distance to the last relevant paper exceeded 20. At a minimum, the first 50 entries were considered. In total, 2261 entries were reviewed and 47 papers were marked as relevant based on a title and abstract scan.
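The stopping rule for scanning a ranked result list can be sketched as follows. How the two conditions combine (the minimum of 50 entries takes precedence over the distance of 20) is our reading of the description; the function and parameter names are illustrative.

```python
def entries_reviewed(relevant_flags, max_gap=20, min_entries=50):
    """Count how many ranked entries are scanned before stopping.

    relevant_flags: booleans in rank order (True = marked as relevant).
    Scanning stops once at least min_entries entries have been read and
    more than max_gap entries have passed since the last relevant one.
    """
    reviewed = 0
    gap = 0  # entries seen since the last relevant paper
    for is_relevant in relevant_flags:
        reviewed += 1
        gap = 0 if is_relevant else gap + 1
        if reviewed >= min_entries and gap > max_gap:
            break
    return reviewed


# Relevant papers at ranks 46 and 60, followed by irrelevant entries only.
flags = [False] * 45 + [True] + [False] * 13 + [True] + [False] * 100
count = entries_reviewed(flags)
```

In this example the scan ends at rank 81, the first position at which more than 20 entries separate the current entry from the last relevant paper.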

At this point, the search strategy was evaluated by checking if publications known to be relevant were included. For this check, the eight initially identified works were considered. When checking whether these eight papers were part of the 47 results, it was found that, except for [Antunes et al., 2015], all publications were included. This paper was not found since it was published in the Lecture Notes in Informatics through the German society for computer science (Gesellschaft für Informatik), which were not indexed by any of the databases at this time. This result was considered to verify the strategy, but its completeness is still limited by the completeness of the employed databases. To mitigate the risk of overlooking papers, the search was complemented by a backward search over the references in the identified papers. Here, two more relevant publications were identified.

Finally, the identified papers were filtered and only those that were relevant with respect to the scope were selected. For this step, the inclusion criterion was that the papers introduced a process model matcher. On the contrary, papers were excluded if they (i) discussed process model matching; (ii) addressed aspects of model collection management; (iii) discussed support for process model design; or (iv) referred to Business-IT alignment. Additionally, the publications [Gerth et al., 2011, Gerth, 2014] were identified as duplicates. The remaining 19 papers all propose a matcher. Tables 1 and 2 summarize the classification and source of all 49 publications that were considered during the search process.

Reference | Topic | First Source
[Zhuge, 2002] | Collection Management | Springer Link
[Wombacher et al., 2003] | Collection Management | IEEE Xplore
[Wombacher et al., 2004] | Collection Management | IEEE Xplore
[Brockmans et al., 2006] | Model Matching | [Dijkman et al., 2009]
[Suwannopas and Senivongse, 2006] | Collection Management | Google Scholar
[Lei et al., 2007] | Collection Management | Springer Link
[Nejati et al., 2007] | Model Matching | [Dijkman et al., 2009]
[Deutch and Milo, 2009] | Collection Management | IEEE Xplore
[Dijkman et al., 2009] | Model Matching | Matching Contest
[Gacitua-Decar and Pahl, 2009] | Collection Management | IEEE Xplore
[Gao and Zhang, 2009] | Collection Management | ACM Digital
[Jung, 2009] | Collection Management | ACM Digital
[Zhu and Pung, 2009] | Collection Management | IEEE Xplore
[Akkiraju and Ivan, 2010] | Collection Management | Springer Link
[Gacitua-Decar and Pahl, 2010] | Collection Management | IEEE Xplore
[Gater et al., 2010b] | Collection Management | IEEE Xplore
[Gater et al., 2010a] | Model Matching | ACM Digital
[Kim and Suhh, 2010] | Design | ACM Digital
[Niedermann et al., 2010] | Design | IEEE Xplore
[Sakr and Awad, 2010] | Collection Management | ACM Digital
[Tonella and Di Francescomarino, 2010] | Design | ACM Digital
[Weidlich et al., 2010] | Model Matching | Matching Contest
[Dijkman et al., 2011] | Collection Management | Google Scholar
[Gater et al., 2011] | Model Matching | IEEE Xplore
[Gerth et al., 2011, Gerth, 2014] | Model Matching | IEEE Xplore
[Abbas and Seba, 2012] | Collection Management | IEEE Xplore
[Belhoul et al., 2012] | Collection Management | IEEE Xplore
[Branco et al., 2012] | Model Matching | Matching Contest
[Chan et al., 2012] | Design | Google Scholar
[Leopold et al., 2012] | Model Matching | Springer Link
[Belhoul et al., 2013] | Collection Management | IEEE Xplore
[Dahman et al., 2013] | Business-IT Alignment | ACM Digital
[Klinkmüller et al., 2013] | Model Matching | Matching Contest
[Weidlich et al., 2013a] | Model Matching | Springer Link
[Weidlich et al., 2013b] | Model Matching | Matching Contest
Table 1: Publications considered during the literature search (part I)

Reference | Topic | First Source
[Baumann et al., 2014a] | Model Matching | Springer Link
[Cayoglu et al., 2013] | Model Matching | Matching Contest
[Fengel, 2014] | Model Matching | Emerald Insight
[Kacimi and Tari, 2014] | Similarity Search | IEEE Xplore
[Klinkmüller et al., 2014] | Model Matching | Matching Contest
[Ling et al., 2014] | Model Matching | Springer Link
[Baumann et al., 2014b] | Model Matching | Springer Link
[Belhoul et al., 2015] | Collection Management | IEEE Xplore
[La Rosa et al., 2015] | Collection Management | ACM Digital
[Sebu and Ciocârlie, 2015] | Collection Management | IEEE Xplore
[Ternai et al., 2015] | Design | Springer Link
[Tsagkani, 2014] | Discussion | Springer Link
[Antunes et al., 2015] | Model Matching | Matching Contest
[Beheshti et al., 2016] | Discussion | Springer Link
Table 2: Publications considered during the literature search (part II)

3 Overview of Identified Literature

Next, we briefly summarize the identified publications. To ensure that relations between these techniques are comprehensible, they are presented in historical order.

Semantic Alignment of Business Processes [Brockmans et al., 2006]. A generic approach to identify elementary correspondences between elements of PrT nets is proposed in [Brockmans et al., 2006]. The approach identifies correspondences between transitions as well as other elements. First, the user determines the types of elements and properties that the approach should consider. Next, similarity scores for all possible property pairs are calculated based on manually provided ontologies. These property scores are then aggregated into similarity scores per element pair. Finally, corresponding element pairs are selected and, if necessary, another run can be triggered to refine the results. The approach is demonstrated with regard to an example, but it is not evaluated.

Matching Statecharts Specifications [Nejati et al., 2007]. Nejati et al. [2007] present a matcher that is tailored to statecharts. Rather than matching transitions, the matcher computes correspondences between states. There are two types of basic matchers: static matchers rely on state labels and positions; and behavioral matchers examine whether two states depend on or transition into similar states. The evaluation rests on three model pairs for which the matcher yields recall values between and and precision values between and .

Aligning Business Process Models [Dijkman et al., 2009]. A configurable matching technique is introduced in [Dijkman et al., 2009]. It calculates the similarity of activities based on a syntactic measure where stemming can be optionally applied to unify words. The first matcher variant proposes all activity pairs with a label similarity score higher than a threshold as a correspondence. The second variant determines alignments by optimizing their overall similarity score. In particular, a graph edit distance is defined that relies on label similarity scores as well as the number of matched activities and edges. The detection of alignments is based on the idea of minimizing the graph edit distance. Here, a greedy search or an A-star search can be used to add activities to the alignment step by step. In an optional post-processing step, the matcher tries to extend elementary correspondences into complex correspondences. The variants are assessed with regard to 17 model pairs from Dutch municipalities. The corresponding macro f-measures vary from for the first variant with stemming to for the second variant with a greedy search, but no post-processing.
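The first, threshold-based variant can be illustrated with a minimal sketch. It is not the implementation from [Dijkman et al., 2009]: the ratio from Python's `difflib.SequenceMatcher` stands in for the syntactic measure, and the threshold of 0.8 and the example labels are assumptions.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Stand-in syntactic similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def threshold_match(labels_a, labels_b, threshold=0.8):
    """Propose every activity pair whose label similarity exceeds threshold."""
    return [
        (a, b)
        for a in labels_a
        for b in labels_b
        if label_similarity(a, b) > threshold
    ]


# Hypothetical activity labels of two process models.
model_a = ["Check application", "Send invoice"]
model_b = ["Check applications", "Archive documents"]
correspondences = threshold_match(model_a, model_b)
```

With these labels only the pair ("Check application", "Check applications") exceeds the threshold. The second variant would instead search for the alignment that minimizes the graph edit distance.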

Complex Mapping Discovery [Gater et al., 2010a]. Similar to [Dijkman et al., 2009], the approach in [Gater et al., 2010a] relies on a graph edit distance and is tailored to block-structured models. It is assumed that activities do not only possess a label, but that they are also annotated with input and output objects. Then, the set of potential elementary correspondences comprises all activity pairs with compatible input and output objects. Further, only pairs with a weighted similarity score that is sufficiently high are considered. This score takes labels as well as the input and output objects into account. Next, a set of 1:n-correspondence candidates is derived by composing elementary correspondences where the activities from one of the models need to occur in the same sequence, parallel, or exclusive block. Finally, the alignment comprises those elementary and 1:n-correspondence candidates that together optimize a graph edit distance. There are no evaluation results.

The ICoP framework [Weidlich et al., 2010]. The ICoP framework [Weidlich et al., 2010] is a configurable and extendable matching process. First, searchers which rely on similarity measures and heuristics are used to identify correspondence candidates. Then, boosters are employed to narrow down the candidate set and to unify similarity scores. Next, selectors construct the final set of correspondences by evaluating the similarity scores or by consulting an evaluator. Evaluators calculate overall scores for potential alignments and might rely on properties derived from the process models. Different configurations of the framework that comprise specific components for each of the four types are evaluated. In this regard, the 17 model pairs from [Dijkman et al., 2009] are reused and three additional model pairs are introduced. The matcher from [Dijkman et al., 2009] is used as a baseline. The f-measures for all matchers differ only marginally and are located close to a value of .

Summary-Based Process Model Matching [Gater et al., 2011]. Gater et al. [2011] extend their matcher from [Gater et al., 2010a]. Like the original matcher, the extension is tailored to block-structured models. The matcher starts with composing parallel and alternative blocks as well as sequences into single activities. In this step, it also composes the labels as well as the input and output annotations. The alignment is the sub-set of these summarized candidates that maximizes a graph edit distance. Finally, the matcher aggregates unmatched activities into adjacent correspondences and decomposes m:n-correspondences into elementary and 1:n-correspondences. Two variants are compared based on 1,200 model pairs and achieve an overall f-measure of about .

Precise Mappings in Versioning Scenarios [Gerth et al., 2011]. An approach to match an original process model with two updated versions of this model is presented in [Gerth et al., 2011, Gerth, 2014]. When a version is introduced by copying the original, the alignment is automatically established and all updates of the version result in respective alignment updates. Yet, it is possible that the alignment between the original and the version does not contain all correspondences. Thus, it is completed by matching activities with similar labels and edges with corresponding sources and targets. Finally, the models are structurally decomposed into fragments whose descriptions are a combination of the labels of their activities. By comparing these descriptions corresponding fragments are detected. Alignments between the two versions are initially inferred from the alignments between the versions and the original. Then, they are completed following the outlined procedure. The effectiveness of this approach was not assessed.

Matching Processes Across Abstraction Layers [Branco et al., 2012]. The matcher from [Branco et al., 2012] first identifies all nodes with identical types and labels as elementary correspondences. Then, the models are decomposed into fragment hierarchies from which complex correspondences are derived through a top-down traversal. In this regard, for two fragments a syntactic similarity score over the union of the activity labels in the fragments is calculated. The technique achieves a macro f-measure of on 110 model pairs stemming from the Bank of Northeast Brazil. However, while the approach detects 400 of the 416 elementary correspondences, it only identifies 38 out of the 222 complex correspondences.
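The gap between elementary and complex correspondences becomes clearer when the reported counts are turned into recall values. The following arithmetic uses only the counts stated above; the pooled value is derived here and is not a figure reported in [Branco et al., 2012].

```python
def recall(found: int, total: int) -> float:
    """Share of the existing correspondences that the matcher detected."""
    return found / total


elementary = recall(400, 416)             # elementary correspondences
complex_ = recall(38, 222)                # complex correspondences
pooled = recall(400 + 38, 416 + 222)      # both kinds taken together

print(round(elementary, 3), round(complex_, 3), round(pooled, 3))
# prints: 0.962 0.171 0.687
```

The matcher thus finds nearly all elementary correspondences, but fewer than one in five complex ones.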

Semantic Process Model Matching [Leopold et al., 2012]. Labels typically comprise an action, a business object, and additional information. Thus, Leopold et al. [2012] decompose labels into these components and determine a weighted component similarity score per activity pair. To construct an alignment based on these scores, Markov logic networks are applied. In this regard, the construction process can be configured via different constraints that allow for including or excluding complex correspondences and for ensuring alignment consistency based on the execution semantics. The effectiveness is assessed with regard to the publicly available university admission dataset and compared to a configuration of the ICoP framework. Here, the ICoP framework performs slightly worse ( vs. ) than the best configuration of the proposed approach.

The Bag-of-Words Technique [Klinkmüller et al., 2013]. We introduced our purely label-based bag-of-words technique in [Klinkmüller et al., 2013]. For each possible activity pair, it determines a similarity score by aggregating the similarity scores for the words in the activities’ labels. To account for differences in label specificity, pruning can be used to eliminate words from the larger label before determining the similarity score. Finally, all activity pairs with a similarity score higher than or equal to a threshold are suggested as corresponding. We evaluated the technique on the university admission dataset where it achieved a maximum f-measure of which is an improvement over the values from [Leopold et al., 2012]. Additionally, we analyzed the false positives and negatives to identify matching challenges. Here, we found that the correct interpretation of labels is primarily challenged by different levels of detail or abstraction, compound words as well as implicit objects and roles.
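The effect of pruning can be illustrated with a simplified sketch. The exact aggregation in [Klinkmüller et al., 2013] is not reproduced here: the exact-match word similarity, the symmetric averaging of best word matches, and the pruning order are all assumptions made for illustration.

```python
def word_sim(w1: str, w2: str) -> float:
    """Placeholder word similarity (exact match only); an assumption."""
    return 1.0 if w1 == w2 else 0.0


def best_scores(words, others):
    """For each word, the similarity to its best match among others."""
    return [max(word_sim(w, o) for o in others) for w in words]


def bag_similarity(label_a: str, label_b: str, prune: bool = False) -> float:
    """Aggregate word similarities of two labels into one score."""
    bag_a, bag_b = label_a.lower().split(), label_b.lower().split()
    if not bag_a or not bag_b:
        return 0.0
    if prune and len(bag_a) != len(bag_b):
        small, large = sorted((bag_a, bag_b), key=len)
        # Keep only the words of the larger label that best match the
        # smaller label, so that both bags have the same size.
        ranked = sorted(
            large,
            key=lambda w: max(word_sim(w, o) for o in small),
            reverse=True,
        )
        bag_a, bag_b = small, ranked[: len(small)]
    scores = best_scores(bag_a, bag_b) + best_scores(bag_b, bag_a)
    return sum(scores) / len(scores)


# Hypothetical labels that differ in their level of detail.
plain = bag_similarity("check documents", "check application documents")
pruned = bag_similarity("check documents", "check application documents", prune=True)
```

Without pruning, the extra word "application" lowers the score to 0.8; with pruning it is removed from the larger label and the score becomes 1.0.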

The Prediction of Matching Quality [Weidlich et al., 2013a]. The automatic selection of matchers is discussed in [Weidlich et al., 2013a]. Here, the idea is to correlate the matchers' effectiveness to process model and activity properties based on a set of model pairs for which the ground truth is known. The derived prediction model can then be used to select matchers for unmatched model pairs. The authors suggest a set of properties for models and activities based on the labels and control flow information, but do not present an evaluation.

Matching Based on Positional Language Models [Weidlich et al., 2013b]. The matcher in [Weidlich et al., 2013b] compares activities based on their labels and, if available, accompanying documentation. First, each model is transformed into a text document by traversing a structural decomposition of the model. Whenever an activity is reached during the traversal, a passage that consists of the label and the documentation is added to the text document. Next, for each term in the text documents the probability of occurring in a passage is determined. The probabilities are then used to compute similarity scores for the activity pairs and the most similar pairs are added to the alignment. Here, different selection strategies can be applied. The evaluation comprises four different sets of model pairs including those from [Branco et al., 2012] and [Weidlich et al., 2010]. The f-measures for different variants vary from to on all sets.

The Process Model Matching Contest 2013 [Cayoglu et al., 2013]. In the first matching contest, the approaches from [Dijkman et al., 2009, Klinkmüller et al., 2013, Weidlich et al., 2010, 2013a] as well as three additional approaches participated. The Triple-S technique adds all activity pairs with a sufficiently high similarity score to the alignment. The similarity scores rely on a comparison of the labels as well as the number of incoming and outgoing edges. The RefMod-Mine/NSCM (RMM/NSCM) technique filters activities that potentially represent states or gateways through label analysis and calculates label-based similarity scores for all remaining activity pairs. It then determines correspondences by clustering all activities in a model collection. The RefMod-Mine/ESGM (RMM/ESGM) technique adapts the graph edit distance approach from [Dijkman et al., 2009]. It additionally incorporates dictionary lookups to compare labels and completes alignments by adding activity pairs with a similarity score higher than a predefined threshold to the alignment. The university admission dataset [Leopold et al., 2012] and the birth registration dataset, which is also publicly available, are used in the evaluation. Here, all approaches yielded a low effectiveness with the highest f-measures at about .

Multi-Perspective Matching [Baumann et al., 2014a]. An extension of the matcher from [Dijkman et al., 2009] is presented in [Baumann et al., 2014a]. It is limited to process models that represent sequences and differs from the original matcher in that the activity similarity measure does not only consider labels. It also is based on the activities’ position in relation to all other correspondences, the ratio of data objects shared by the activities, and the roles responsible for the execution of the activities. The approach is demonstrated based on one exemplary model pair.

Resource-Aware Process Matching [Baumann et al., 2014b]. An extension of [Baumann et al., 2014a] is presented in [Baumann et al., 2014b]. Here, a more fine-grained comparison of the organizational perspective, i.e., the roles that are responsible for activity execution, is considered. The authors discuss practical limitations, but do not evaluate their matching technique.

Semantic Model Alignment [Fengel, 2014]. [Fengel, 2014] introduces an approach that solely relies on activity labels. It comprises checks for label equality, shared words, synonyms, and negation words as well as a label-based similarity. Based on the determined similarity scores, the activity pairs are classified as an exact, a close, a loose, or a low correspondence. Eight model pairs are used to evaluate the approach and a macro f-measure of is reported.

Adaptive Label-based Matching based on Expert Feedback [Klinkmüller et al., 2014]. In [Klinkmüller et al., 2014] we studied the adaptation of the matching process based on the analysis of expert feedback in terms of (in-)validated correspondences. First, we pursued the idea of learning a classifier from expert feedback that correlates activity similarity scores to the classes of corresponding and non-corresponding activity pairs. However, an empirical analysis based on the university admission and birth registration datasets revealed that similarity scores which are based on control flow properties do not possess a discriminative power that is high enough to separate corresponding from non-corresponding activity pairs. Thus, we examined the idea of improving the effectiveness of our bag-of-words technique by adjusting the word similarities based on expert feedback. We showed that this strategy leads to effectiveness improvements on both datasets of up to . In particular, we achieved an f-measure of on the birth registration and of on the university admission dataset.

Fast Discovery of Complex Matches [Ling et al., 2014]. Another variant of the matcher by Dijkman et al. [2009] is introduced in [Ling et al., 2014]. To identify correspondences the matcher considers all activities as well as activity sets that it derives from structural decompositions of the process models as correspondence candidates. Then, the greedy strategy from [Dijkman et al., 2009] in combination with a modified graph edit distance is used to determine the correspondences. The authors report an f-measure of achieved on 20 model pairs.

The Process Model Matching Contest 2015 [Antunes et al., 2015]. In the second matching contest, the matchers from [Weidlich et al., 2013b, Cayoglu et al., 2013] participated. Additionally, nine new techniques were submitted. The first matcher is an adapted version of the AML ontology matcher from [Faria et al., 2013]. It determines correspondences based on three label similarities. The KnoMa-Proc matcher first joins adjacent activities to determine complex correspondence candidates. From the union of the activities and the candidates an alignment is constructed based on a label-based confidence measure. The Match-SSS and the Know-Match-SSS compare the words in the labels to compute an overall label similarity score. The two approaches differ with regard to the word similarities they rely on. The RefMod-Mine/VM2 (RMM/VM2) matcher suggests equally-labeled activity pairs, pairs whose labels contain similar words in a different order, and pairs with a high label similarity score that is based on word co-occurrences in the model pair. The RefMod-Mine/NCHM (RMM/NCHM) is an updated version of the RMM/NSCM technique from [Cayoglu et al., 2013] which contains an additional post-processing step to filter activity pairs with different roles. The RefMod-Mine/NLM (RMM/NLM) computes label similarities based on word relations in a dictionary and proposes activity pairs with a sufficiently high similarity score as correspondences. The RefMod-Mine/SMSL (RMM/SMSL) also investigates such word relations, but additionally optimizes the respective similarity scores based on the gold standard alignments. Lastly, the pPalm-DS matcher considers word occurrences in Wikipedia to assess the label similarity and to suggest correspondences. In addition to an updated version of the university admission and the original birth registration dataset, the contest evaluation comprised the asset management dataset which we made available to the contest. The best f-measure scores on each dataset range between and .

4 Analysis Results

Table 3 summarizes the results of our literature review as a matrix, where each publication is represented by a row. By following guidelines for inductive category formation [Mayring, 2000] we classified the analysis options for incorporating control flow information into the matching process. In total, we derived eight abstract analysis options which comprise most columns in the matrix. They are grouped into three use cases, and refer to a specific encoding. The first use case is to compare activities (Comp. Act.), either directly with regard to control flow properties or by incorporating structural context into the label similarity. Second, control flow information can be used to detect fragments (Det. Frag.), where activity sets are derived from the model structure and considered as candidates for complex correspondences. Third, matchers check the consistency (Check Consis.) to assess if control flow dependencies between activities in one model resemble those of the corresponding activities in another model. Per use case, there are up to three types of encoding: Matchers might analyze path relations in the process graph (G), properties of nested fragment hierarchies (H), or execution semantics (ES). The cells in the columns of Table 3 capture which abstract analysis options occur in which publications (occurring/not as “+”/“-”). Note that our identification did not require matchers to rely on control flow information, hence four of the papers contain none.

Source | Comp. Act. (G) | Comp. Act. (H) | Comp. Act. (ES) | Det. Frag. (G) | Det. Frag. (H) | Check Consis. (G) | Check Consis. (H) | Check Consis. (ES) | Paper Class
[Brockmans et al., 2006] | - | - | - | - | - | - | - | - | Illustrated
[Nejati et al., 2007] | + | - | - | - | - | - | - | + | Compared
[Dijkman et al., 2009] | - | - | - | + | - | + | - | - | Compared
[Gater et al., 2010a] | - | - | - | - | - | + | - | - | Proposed
[Weidlich et al., 2010] | + | + | - | + | + | + | + | - | Compared
[Gater et al., 2011] | - | - | - | + | + | + | - | - | Compared
[Gerth et al., 2011, Gerth, 2014] | - | - | - | + | + | - | - | - | Proposed
[Branco et al., 2012] | - | - | - | - | + | - | - | - | Compared
[Leopold et al., 2012] | - | - | - | - | - | - | - | + | Compared
[Cayoglu et al., 2013] | + | + | - | + | + | + | + | + | Evaluated
[Klinkmüller et al., 2013] | - | - | - | - | - | - | - | - | Compared
[Weidlich et al., 2013a] | - | - | - | - | - | - | - | - | Proposed
[Weidlich et al., 2013b] | + | + | - | - | - | - | - | - | Compared
[Baumann et al., 2014a] | + | - | - | + | - | + | - | - | Illustrated
[Baumann et al., 2014b] | - | - | - | - | - | + | - | - | Proposed
[Fengel, 2014] | - | - | - | - | - | - | - | - | Evaluated
[Klinkmüller et al., 2014] | + | + | + | - | - | - | - | - | Analyzed
[Ling et al., 2014] | + | - | - | - | + | + | - | - | Evaluated
[Antunes et al., 2015] | + | - | - | - | - | + | - | - | Evaluated
Table 3: Overview of options to integrate control flow information into the matching process and the empirical evidence

The last column characterizes the empirical evidence provided for the proposed matchers and thus for the successful use of control flow information. As we are interested in the use of control flow information, we discard the four publications that solely exploit labels and, in the following, focus on the 15 publications that consider control flow information. Three papers only propose matchers but provide no empirical evidence, and another paper uses a single synthetic example to illustrate how the matcher is supposed to work. Among the remaining 11 publications, three papers [Cayoglu et al., 2013, Antunes et al., 2015, Ling et al., 2014] evaluate matchers as black boxes. Such an evaluation assesses the effectiveness of entire matchers, but does not study the influence that the matchers’ components have on the effectiveness. The matcher in [Ling et al., 2014] comprises components that compute label similarity scores, investigate the graph neighborhood, detect fragments, and check the consistency. Clearly, the reported overall effectiveness allows no insights into the contribution of each component. Similarly, the contests [Cayoglu et al., 2013, Antunes et al., 2015] compare the effectiveness of various matchers, but they do not provide any insights into how the results of individual matchers are influenced by their components.

Another seven papers compare the effectiveness of different matcher variants. However, as discussed, e.g., in [Salzberg, 1997, Demšar, 2006], such results need to be interpreted with care and typically have limited validity. That is because, without further statistical analyses, differences might have been observed simply by chance, especially as the reported differences are rather small, e.g., those between the f-measures in [Dijkman et al., 2009] and in [Weidlich et al., 2010]. Moreover, the results of all variants typically depend on a basic variant. This entails the risk that the relative performance of the variants, and thus the contribution of the propositions, changes if the basic variant is modified. For example, consistency checks improve the effectiveness in [Leopold et al., 2012], whereas they reduce it in [Weidlich et al., 2010]. Finally, our prior work [Klinkmüller et al., 2014] is the only paper that explicitly analyzes the validity of control flow propositions, but only for the first use case.
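To illustrate why small f-measure differences between matcher variants call for a statistical check, the following minimal sketch computes the f-measure of a set of correspondences and runs an exact paired sign-flip randomization test on per-model-pair scores. It is not taken from any of the reviewed publications: the function names, the choice of this particular test, and all scores are assumptions made purely for illustration.

```python
from itertools import product


def f_measure(found, gold):
    """Harmonic mean of precision and recall for a set of correspondences."""
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(found)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided paired randomization test on the summed difference.

    Under the null hypothesis that both variants are equally effective,
    the sign of each per-pair difference is arbitrary, so all 2^n sign
    assignments are enumerated (feasible only for small n).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total


# Hypothetical f-measures per model pair (made-up numbers, not from any paper).
base_variant = [0.40, 0.55, 0.35, 0.60, 0.45, 0.50, 0.38, 0.52]
control_flow_variant = [0.42, 0.53, 0.37, 0.61, 0.44, 0.52, 0.37, 0.53]

p_value = paired_sign_flip_p_value(control_flow_variant, base_variant)
print(f"p-value of the observed difference: {p_value:.3f}")
```

In this made-up example the control flow variant has a slightly higher average f-measure, yet the test yields a large p-value, so the difference could plausibly be due to chance. [Demšar, 2006] discusses suitable tests for such comparisons in more detail.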

Overall, the review revealed that, despite the pervasive use of control flow information in the literature, the evidence for a positive impact of exploiting such information is limited. Thus, to comprehensively assess the validity of the analysis options, additional analysis is warranted.

5 Summary

In this report, we examined publications that introduced process model matching techniques. In particular, we focused on understanding if and how control flow information contributes to the identification of corresponding activities between process models. Our analysis revealed that there are three dominant use cases for the analysis of control flow information and that three different types of encodings are used. However, while many matchers rely on control flow information, the evidence that this information successfully contributes to the matching process is limited. That is because the majority of the publications focused on evaluating the overall effectiveness, but did not carefully study how the analysis of control flow information contributes to it. There is one notable exception: in our own work [Klinkmüller et al., 2014], we studied the comparison of activities with regard to control flow properties and revealed that such a comparison is not suited for informing matching techniques. Overall, this result shows that more research is needed in order to understand the effects of relying on control flow information.

References

  • Abbas and Seba [2012] Sonia Abbas and Hamida Seba. A module-based approach for structural matching of process models. In IEEE 5th International Conference on Service-Oriented Computing and Applications, pages 1–8, Taipei, Taiwan, 2012.
  • Akkiraju and Ivan [2010] Rama Akkiraju and Anca Ivan. Discovering business process similarities: An empirical study with SAP best practice business processes. In 8th International Conference on Service-Oriented Computing, pages 515–526, San Francisco, CA, USA, 2010.
  • Antunes et al. [2015] Goncalo Antunes, Marzieh Bakhshandeh, Jose Borbinha, Joao Cardoso, Sharam Dadashnia, Chiara Di Francescomarino, Mauro Dragoni, Peter Fettke, Avigdor Gal, Chiara Ghidini, Philip Hake, Abderrahmane Khiat, Christopher Klinkmüller, Elena Kuss, Henrik Leopold, Peter Loos, Christian Meilicke, Tim Niesen, Catia Pesquita, Timo Peus, Andreas Schoknecht, Eitam Sheetrit, Andreas Sonntag, Heiner Stuckenschmidt, Tom Thaler, Ingo Weber, and Matthias Weidlich. The process model matching contest 2015. In Proceedings of the 6th Int. Workshop on Enterprise Modelling and Information Systems Architectures, pages 127–155. GI, 2015.
  • Baumann et al. [2014a] Michael Heinrich Baumann, Michaela Baumann, Stefan Schönig, and Stefan Jablonski. Towards multi-perspective process model similarity matching. In Enterprise and Organizational Modeling and Simulation: 10th International Workshop, EOMAS 2014, Held at CAiSE 2014, Thessaloniki, Greece, June 16-17, 2014, Selected Papers, pages 21–37. Springer, 2014a.
  • Baumann et al. [2014b] Michaela Baumann, Michael Heinrich Baumann, Stefan Schönig, and Stefan Jablonski. Resource-aware process model similarity matching. In Service-Oriented Computing - ICSOC 2014 Workshops: WESOA; SeMaPS, RMSOC, KASA, ISC, FOR-MOVES, CCSA and Satellite Events, Paris, France, November 3-6, 2014, Revised Selected Papers, pages 96–107. Springer, 2014b.
  • Beheshti et al. [2016] Seyed-Mehdi-Reza Beheshti, Boualem Benatallah, Sherif Sakr, Daniela Grigori, Hamid Reza Motahari-Nezhad, Moshe Chai Barukh, Ahmed Gater, and Seung Hwan Ryu. Process Analytics: Concepts and Techniques for Querying and Analyzing Process Data. Springer, Cham, Switzerland, 2016.
  • Belhoul et al. [2012] Yacine Belhoul, Mohammed Haddad, Eric Duchene, and Hamamache Kheddouci. String comparators based algorithms for process model matchmaking. In IEEE 9th International Conference on Services Computing, pages 649–656, Honolulu, Hawaii, USA, 2012.
  • Belhoul et al. [2013] Yacine Belhoul, Mohammed Haddad, Ahmed Gater, Daniela Grigori, Hamamache Kheddouci, and Mokrane Bouzeghoub. Spectral graph approach for process model matchmaking. In IEEE 10th International Conference on Services Computing, pages 408–415, Santa Clara, California, 2013.
  • Belhoul et al. [2015] Yacine Belhoul, Said Yahiaoui, Mohammed Haddad, Ahmed Gater, Hamamache Kheddouci, and Mokrane Bouzeghoub. A graph approach for enhancing process models matchmaking. In IEEE 12th International Conference on Services Computing, pages 773–776, New York, NY, USA, 2015.
  • Branco et al. [2012] Moisés Castelo Branco, Javier Troya, Krzysztof Czarnecki, Jochen Küster, and Hagen Völzer. Matching business process workflows across abstraction levels. In Proceedings of the 15th International Conference on Model Driven Engineering Languages and Systems, pages 626–641. Springer, 2012.
  • Brockmans et al. [2006] Saartje Brockmans, Marc Ehrig, Agnes Koschmider, Andreas Oberweis, and Rudi Studer. Semantic alignment of business processes. In Proceedings of the 8th Int. Conference on Enterprise Information Systems, pages 191–196. INSTICC Press, 2006.
  • Cayoglu et al. [2013] Ugur Cayoglu, Remco Dijkman, Marlon Dumas, Peter Fettke, Luciano García-Bañuelos, Philip Hake, Christopher Klinkmüller, Henrik Leopold, André Ludwig, Peter Loos, Jan Mendling, Andreas Oberweis, Andreas Schoknecht, Eitam Sheetrit, Tom Thaler, Meike Ullrich, Ingo Weber, and Matthias Weidlich. Report: The process model matching contest 2013. In Business Process Management Workshops, Beijing, China, August 26, 2013, Revised Paper, pages 442–463. Springer, 2013.
  • Chan et al. [2012] Nguyen Ngoc Chan, Walid Gaaloul, and Samir Tata. Assisting business process design by activity neighborhood context matching. In 10th International Conference on Service-Oriented Computing, pages 541–549, Shanghai, China, 2012.
  • Dahman et al. [2013] Karim Dahman, Francois Charoy, and Claude Godart. Alignment and change propagation between business processes and service-oriented architectures. In IEEE 10th International Conference on Services Computing, pages 168–175, Santa Clara Marriott, CA, USA, 2013.
  • Demšar [2006] Janez Demšar. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7:1–30, 2006.
  • Deutch and Milo [2009] Daniel Deutch and Tova Milo. Evaluating top-k queries over business processes. In 25th International Conference on Data Engineering, pages 1195–1198, Shanghai, China, 2009.
  • Dijkman et al. [2009] Remco Dijkman, Marlon Dumas, Luciano García-Bañuelos, and Reina Käärik. Aligning business process models. In 2009 IEEE International Enterprise Distributed Object Computing Conference, pages 45–53. IEEE, 2009.
  • Dijkman et al. [2011] Remco M. Dijkman, Beat Gfeller, Jochen Küster, and Hagen Völzer. Identifying refactoring opportunities in process model repositories. Information and Software Technology, 53(9):937–948, 2011.
  • Euzenat and Shvaiko [2013] Jérôme Euzenat and Pavel Shvaiko. Ontology Matching. Springer, Berlin, 2013.
  • Faria et al. [2013] Daniel Faria, Catia Pesquita, Emanuel Santos, Isabel F. Cruz, and Francisco M. Couto. AgreementMakerLight results for OAEI 2013. In 8th International Conference on Ontology Matching, pages 101–108, 2013.
  • Fengel [2014] Janina Fengel. Semantic technologies for aligning heterogeneous business process models. Business Process Management Journal, 20(4):549–570, 2014.
  • Gacitua-Decar and Pahl [2009] Veronica Gacitua-Decar and Claus Pahl. Automatic business process pattern matching for enterprise services design. In 4th International Workshop on Service- and Process-Oriented Software Engineering, pages 111–118, Bangalore, India, 2009.
  • Gacitua-Decar and Pahl [2010] Veronica Gacitua-Decar and Claus Pahl. Towards reuse of business processes patterns to design services. In Walter Binder and Schahram Dustdar, editors, Emerging Web Services Technology Volume III, pages 15–36. Birkhäuser Basel, Basel, Switzerland, 2010.
  • Gao and Zhang [2009] Juntao Gao and Li Zhang. On measuring semantic similarity of business process models. In 5th International Conference on Interoperability for Enterprise Software and Applications, pages 289–293, Beijing, China, 2009.
  • Gater et al. [2011] A. Gater, D. Grigori, M. Haddad, M. Bouzeghoub, and H. Kheddouci. A summary-based approach for enhancing process model matchmaking. In 2011 IEEE International Conference on Service-Oriented Computing and Applications (SOCA), pages 1–8. IEEE, 2011.
  • Gater et al. [2010a] Ahmed Gater, Daniela Grigori, and Mokrane Bouzeghoub. Complex mapping discovery for semantic process model alignment. In Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services, pages 317–324. ACM, 2010a.
  • Gater et al. [2010b] Ahmed Gater, Daniela Grigori, and Mokrane Bouzeghoub. OWL-S process model matchmaking. In 2010 IEEE International Conference on Web Services, pages 640–641, Miami, FL, USA, 2010b.
  • Gerth [2014] Christian Gerth. Business Process Models. Change Management. Springer, Heidelberg, Germany, 2014.
  • Gerth et al. [2011] Christian Gerth, Markus Luckey, Jochen M. Küster, and Gregor Engels. Precise mappings between business process models in versioning scenarios. In 2011 IEEE International Conference on Services Computing, pages 218–225. IEEE, 2011.
  • Jung [2009] Jason J. Jung. Semantic business process integration based on ontology alignment. Expert Systems with Applications, 36(8):11013–11020, 2009.
  • Kacimi and Tari [2014] Farid Kacimi and Abdelkamel Tari. Vectorial signature for matching business process graphs. In International Conference on Advanced Networking Distributed Systems and Applications, pages 93–98, Bejaia, Algeria, 2014.
  • Kim and Suhh [2010] Gunwoo Kim and Yongmoo Suhh. Ontology-based semantic matching for business process management. SIGMIS Database, 41(4):98–118, 2010.
  • Klinkmüller [2017] Christopher Klinkmüller. Adaptive Process Model Matching – Improving the Effectiveness of Label-based Matching Through Automated Configuration and Expert Feedback. PhD thesis, Leipzig University, Leipzig, April 2017.
  • Klinkmüller et al. [2013] Christopher Klinkmüller, Ingo Weber, Jan Mendling, Henrik Leopold, and André Ludwig. Increasing recall of process model matching by improved activity label matching. In Business Process Management (BPM): 11th International Conference, Beijing, China, August 26-30, 2013. Proceedings, pages 211–218. Springer, 2013.
  • Klinkmüller et al. [2014] Christopher Klinkmüller, Henrik Leopold, Ingo Weber, Jan Mendling, and André Ludwig. Listen to me: Improving process model matching through user feedback. In Business Process Management (BPM): 12th International Conference, Haifa, Israel, September 7-11, 2014. Proceedings, pages 84–100. Springer, 2014.
  • La Rosa et al. [2015] Marcello La Rosa, Marlon Dumas, Chathura C. Ekanayake, Luciano García-Bañuelos, Jan Recker, and Arthur H. M. ter Hofstede. Detecting approximate clones in business process model repositories. Information Systems, 49:102–125, 2015.
  • Lei et al. [2007] Lihui Lei, Zhunhua Duan, and Bin Yu. Semantic matching of web services for collaborative business processes. In 10th International Conference on Computer Supported Cooperative Work in Design, pages 479–488, Nanjing, China, 2007.
  • Leopold et al. [2012] Henrik Leopold, Mathias Niepert, Matthias Weidlich, Jan Mendling, Remco M. Dijkman, and Heiner Stuckenschmidt. Probabilistic optimization of semantic process model matching. In International Conference on Business Process Management, pages 319–334. Springer, 2012.
  • Ling et al. [2014] Jimin Ling, Li Zhang, and Qi Feng. Business process model alignment: An approach to support fast discovering complex matches. In Kai Mertins, Frédérick Bénaben, Raúl Poler, and Jean-Paul Bourrières, editors, Enterprise Interoperability VI, pages 41–51. Springer International, 2014.
  • Manning and Schütze [1999] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA, 1999.
  • Manning et al. [2008] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, 2008.
  • Mayring [2000] Philipp Mayring. Qualitative content analysis. Forum Qual. Soc. Res., 1(2), 2000.
  • Nejati et al. [2007] S. Nejati, M. Sabetzadeh, M. Chechik, S. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In 29th International Conference on Software Engineering, pages 54–64. IEEE, 2007.
  • Niedermann et al. [2010] F. Niedermann, S. Radeschutz, and B. Mitschang. Design-time process optimization through optimization patterns and process model matching. In 12th IEEE Conference on Commerce and Enterprise Computing, pages 48–55, Shanghai, China, 2010.
  • Rahm and Bernstein [2001] Erhard Rahm and Philip A. Bernstein. A survey of approaches to automatic schema matching. VLDB J., 10(4):334–350, 2001.
  • Sakr and Awad [2010] Sherif Sakr and Ahmed Awad. A framework for querying graph-based business process models. In 19th International Conference on World Wide Web, pages 1297–1300, Raleigh, NC, USA, 2010.
  • Salzberg [1997] Steven L. Salzberg. On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1(3):317–328, 1997.
  • Sebu and Ciocârlie [2015] Maria Laura Sebu and Horia Ciocârlie. Business process similarity metric supporting one-to-many relationship. In IEEE 10th Jubilee International Symposium on Applied Computational Intelligence and Informatics, pages 429–435, Timisoara, Romania, 2015.
  • Suwannopas and Senivongse [2006] Piya Suwannopas and Twittie Senivongse. Discovering semantic web services with process specifications. In 6th International IFIP Conference on Distributed Applications and Interoperable Systems, pages 113–127, Bologna, Italy, 2006.
  • Ternai et al. [2015] Katalin Ternai, Marjan Khobreh, and Fazel Ansari. An ontology matching approach for improvement of business process management. In Madjid Fathi, editor, Integrated Systems: Innovations and Applications, pages 111–130. Springer, Cham, Switzerland, 2015.
  • Tonella and Di Francescomarino [2010] Paolo Tonella and Chiara Di Francescomarino. Supporting ontology-based semantic annotation of business processes with automated suggestions. International Journal of Information System Modeling and Design, 1(2):59–84, 2010.
  • Tsagkani [2014] Christina Tsagkani. Graph-based process model matching. In Doctoral Consortium at the 12th International Conference on Business Process Management, pages 573–577, Eindhoven, The Netherlands, 2014.
  • vom Brocke et al. [2009] Jan vom Brocke, Alexander Simons, Björn Niehaves, Kai Riemer, Ralf Plattfaut, and Anne Cleven. Reconstructing the giant: On the importance of rigour in documenting the literature search process. In 17th European Conference on Information Systems, ECIS 2009, Verona, Italy, 2009, pages 2206–2217. AISeL, 2009.
  • Weidlich et al. [2010] Matthias Weidlich, Remco Dijkman, and Jan Mendling. The ICoP framework: Identification of correspondences between process models. In Advanced Information Systems Engineering: 22nd International Conference, CAiSE 2010, Hammamet, Tunisia, June 7-9, 2010. Proceedings, pages 483–498. Springer, 2010.
  • Weidlich et al. [2013a] Matthias Weidlich, Tomer Sagi, Henrik Leopold, Avigdor Gal, and Jan Mendling. Predicting the quality of process model matching. In Business Process Management (BPM): 11th International Conference, Beijing, China, August 26-30, 2013. Proceedings, pages 203–210. Springer, 2013a.
  • Weidlich et al. [2013b] Matthias Weidlich, Eitam Sheetrit, Moisés C. Branco, and Avigdor Gal. Matching business process models using positional passage-based language models. In Conceptual Modeling: 32th International Conference, ER 2013, Hong-Kong, China, November 11-13, 2013. Proceedings, pages 130–137. Springer, 2013b.
  • Wombacher et al. [2003] A. Wombacher, P. Fankhauser, B. Mahleko, and E. Neuhold. Matchmaking for business processes. In IEEE International Conference on E-Commerce, pages 7–11, Newport Beach, CA, USA, 2003.
  • Wombacher et al. [2004] A. Wombacher, P. Fankhauser, B. Mahleko, and E. Neuhold. Matchmaking for business processes based on choreographies. In IEEE International Conference on e-Technology, e-Commerce and e-Service, pages 359–368, Taipei, Taiwan, 2004.
  • Zhu and Pung [2009] J. Zhu and H. K. Pung. Process matching: A structural approach for business process search. In Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns, pages 227–232, Athens, Greece, 2009.
  • Zhuge [2002] Hai Zhuge. A process matching approach for flexible workflow process reuse. Information and Software Technology, 44(8):445–450, 2002.