Learning Domain-Specific Edit Operations from Model Repositories with Frequent Subgraph Mining

08/02/2021 ∙ by Christof Tinnes, et al. ∙ Siemens AG ∙ Universität Saarland ∙ Humboldt-Universität zu Berlin

Model transformations play a fundamental role in model-driven software development. They can be used to solve or support central tasks, such as creating models, handling model co-evolution, and model merging. In the past, various (semi-)automatic approaches have been proposed to derive model transformations from meta-models or from examples. These approaches require time-consuming handcrafting or recording of concrete examples, or they are unable to derive complex transformations. We propose a novel unsupervised approach, called Ockham, which is able to learn edit operations from model histories in model repositories. Ockham is based on the idea that meaningful edit operations will be the ones that compress the model differences. We evaluate our approach in two controlled experiments and one real-world case study of a large-scale industrial model-driven architecture project in the railway domain. We find that our approach is able to discover frequent edit operations that have actually been applied. Furthermore, Ockham is able to extract edit operations in an industrial setting that are meaningful to practitioners.




I Introduction

Software and systems are becoming increasingly complex. Various languages, methodologies, and paradigms have been developed to tackle this complexity. One widely used methodology is Model-Driven Engineering (MDE) [52], which uses models as first-class entities and facilitates generating documentation and (parts of the) source code from these models. Usually, Domain-Specific Modeling Languages are used and tailored to the specific needs of a domain. This reduces the cognitive distance between the domain and the used language. Model transformations are a key ingredient of many tasks and activities in MDE [57].

In this paper, we focus on edit operations as an important subclass of model transformations. An edit operation is an in-place model transformation and usually represents regular evolution [63] of the models. For example, when moving a method from one class to another in a class diagram, a sequence diagram that uses the method in message calls between object lifelines also needs to be adjusted (e.g., by changing the receiver of a message accordingly). To perform this in a single edit step, one can create an edit operation that executes the entire change, including the class and sequence diagram changes. Some tasks can even be completely automated and reduced to the definition of edit operations. Edit operations are used for model repair, quick-fix generation, auto completion [48, 24, 39], model editors [62, 19], operation-based merging [38], model refactoring [17, 4], model optimization [12], meta-model evolution and model co-evolution [53, 3, 25], artifact co-evolution in general [21, 41], semantic lifting of model differences [33, 32, 8, 42, 37], model generation [50], and many more.

In general, there are two main problems involved in the specification of model transformations that can be used as edit operations. Firstly, creating the necessary transformations for the task and the Domain-Specific Modeling Languages at hand using a dedicated transformation language requires a deep knowledge of the Domain-Specific Modeling Language’s meta-model and the underlying paradigm of the transformation language. It might even be necessary to define project-specific edit operations, which causes a large overhead for many projects or tool providers [31, 17, 30]. Secondly, for some tasks, the domain-specific transformations are only a form of tacit knowledge [51], and it will be hard for domain experts to externalize this knowledge.

Model transformations play a central role in MDE, yet they are not easy to specify. Attempts have therefore been made to support their manual creation or even their (semi-)automated generation. As for manual support, visual assistance tools [5] and transformation languages derived from a modeling language’s concrete syntax [1, 26] have been proposed to relieve domain experts of the need to delve into the details of meta-models and model transformation languages. However, they still need to deal with the syntax and semantics of certain change annotations, and edit operations must be specified manually. To that end, generating edit operations automatically from a given meta-model has been proposed [34, 44]. However, besides elementary consistency constraints and basic well-formedness rules, meta-models do not convey any domain-specific information on how models are edited. Thus, the generation of edit operations from meta-models is limited to rather primitive operations. Following the idea of model transformation by-example (MTBE) [11, 61, 30], initial sketches of more complex and domain-specific edit operations can be specified using standard model editors as a macro recorder. However, these sketches require manual post-processing to be turned into general specifications, mainly because an initial specification is derived from only a single transformation example. Some MTBE approaches [31, 17] aim at getting rid of this limitation by using a set of transformation examples as input, which are then generalized into a model transformation rule. Still, this is a supervised approach which requires sets of dedicated transformation examples that need to be defined manually by domain experts. As discussed by Kehrer et al. [31], a particular challenge is that domain experts need to have at least some basic knowledge of the internal processing of the MTBE tool in order to come up with a reasonable set of examples. Moreover, if only a few examples are used as input for learning, Mokaddem et al. [17] discuss how critical it is to carefully select and design these examples.

To address these limitations of existing approaches, we propose a novel unsupervised approach, Ockham, for mining edit operations from existing models in a model repository, which is typically available in large-scale modeling projects (cf. Section II). Ockham is based on an Occam’s razor argument, that is, the “useful” edit operations are the ones that “compress” the model repository. In a first step, Ockham discovers frequent change patterns using frequent subgraph mining on a labeled graph representation of model differences. It then uses a compression metric to filter and rank these patterns. We evaluate our approach using two experiments with simulated data and one real-world large-scale industrial case study from the railway domain. In the simulated cases, we can show that Ockham is able to discover the edit operations that have been actually applied in the simulation, even when we apply some “perturbation”. In the real-world case study, we find that our approach is able to scale to real-world model repositories and to derive edit operations. We evaluated Ockham by comparing the results to randomly generated edit operations in five interviews with practitioners of the product line. We find that the edit operations represent typical edit scenarios and are meaningful to the practitioners.

In summary, we make the following contributions:

  • We propose an unsupervised approach based on frequent subgraph mining to derive edit operations out of model repositories, without requiring any further information (e.g., labeling).

  • We evaluate our approach empirically based on two controlled simulated experiments and show that the approach is able to discover the actually applied edit operations.

  • We evaluate the approach using an interview with five experienced system engineers and architects from a real-world industrial setting in the railway domain with more than 200 engineers, 300GB of artifacts and more than 6 years of modeling history. We show that our approach is able to detect meaningful edit operations in this industrial setting and to scale to real-world repositories.

II Motivation: An Industrial Scenario

Our initial motivation to automatically mine edit operations from model repositories arose from a long-term collaboration with practitioners from a large-scale industrial model-driven software product line in the railway domain. The modeling is done in MagicDraw [27] using SysML, and there is an export to the Eclipse Modeling Framework (EMF) which focuses on the SysML parts required for subsequent MDE activities (e.g., code generation). Modeling tools such as MagicDraw come with support for model versioning. In our case, the models are versioned in the MagicDraw Teamwork Server. We therefore have access to a large number of models and change scenarios.

While discussing major challenges with the engineers of the product line, we observed that some model changes very often appear together in this repository. For example, when the architect creates an interface between two components, s/he will usually add some Ports to Components and connect them via the ConnectorEnds of a Connector. Expressed in terms of the meta-model, there are 17 changes to add this interface. We are therefore interested in whether we can automatically detect these patterns in the model repository. More generally, our approach, Ockham, is based on the assumption that it should be possible to derive “meaningful” patterns from the repositories.

These patterns could then be used for many applications [48, 39, 62, 4, 21, 32, 8, 42, 37]. In our case study, the models have become huge over time (approx. 1.2 million elements split into 100 submodels) and the model differences between different products have also become huge (up to 190,000 changes in a single submodel). The analysis of these differences, e.g., for quality assurance of the models or for domain analysis, has become time-consuming. To speed up the analysis of the model differences, it would be desirable to reduce the “perceived” size of the model difference by grouping fine-grained differences into higher-level, more coarse-grained and more meaningful changes. For this semantic lifting of model differences, the approach by Kehrer et al. [33], which uses a set of edit operations as configuration input, can be used. These large model differences have actually been our main motivation to investigate how we can derive the required edit operations (semi-)automatically.

We will use the data from this real-world project to evaluate Ockham in Section V.

III Background

In this section, we provide basic definitions that are important to understand our approach presented in Section IV.

III-A Graph Theory

As usual in MDE, we assume that a meta-model specifies the abstract syntax and static semantics of a modeling language. We conceptually consider a model as a typed graph (aka. abstract syntax graph), in which the types of nodes and edges are drawn from the meta-model. Figure 1 illustrates how a simplified excerpt from an architectural model of our case study from Section II in concrete syntax is represented in its abstract syntax, typed over the given meta-model.

Fig. 1: We consider models as labeled graphs, where labels represent types of nodes and edges defined by a meta-model. For the sake of brevity, types of edges are omitted in the figure.

Since we further assume models to be correctly typed, in our notion of a graph used throughout the paper, we abstain from a formal definition of typing using type graphs and type morphisms [10]. Instead, to keep our basic definitions as simple as possible, we work with a variant of labeled graphs where a fixed label alphabet represents node and edge type definitions of a meta-model. Given a label alphabet $L$, a labeled directed graph $G$ is a tuple $(V, E, \lambda)$, where $V$ is a finite set of nodes, $E$ is a subset of $V \times V$, called the edge set, and $\lambda: V \cup E \to L$ is the labeling function, which assigns a label to nodes and edges. If we are only interested in the structure of a graph and typing is irrelevant, we will omit the labeling and only refer to the graph as $G = (V, E)$.

Given two graphs $G = (V, E, \lambda)$ and $G' = (V', E', \lambda')$, $G'$ is called a subgraph of $G$, written $G' \subseteq G$, if $V' \subseteq V$, $E' \subseteq E$, and $\lambda(x) = \lambda'(x)$ for each $x \in V' \cup E'$. A (weakly) connected component (component, for short) $C = (V_C, E_C) \subseteq G$ is an induced subgraph of $G$ in which every two vertices are connected by a path, that is, $\forall u, v \in V_C: \exists n \in \mathbb{N}$ such that $\{(v, v_1), (v_1, v_2), \dots, (v_n, u)\} \subseteq E_C \cup \tilde{E}_C$, where $\tilde{E}_C$ is the set of all reversed edges, that is, $(u, v) \in E_C$ becomes $(v, u) \in \tilde{E}_C$.
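The definitions above can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions, not the representation used by Ockham: graphs are stored in plain dictionaries, at most one edge exists per ordered node pair, and the class name, function names, and the edge label "ports" are hypothetical.

```python
from collections import defaultdict


class LabeledGraph:
    """Labeled directed graph (V, E, lambda) stored as two dictionaries."""

    def __init__(self, node_labels, edge_labels):
        self.node_labels = dict(node_labels)   # node id -> label
        self.edge_labels = dict(edge_labels)   # (source, target) -> label
        for (u, v) in self.edge_labels:
            if u not in self.node_labels or v not in self.node_labels:
                raise ValueError("edge endpoint is not a node")


def is_subgraph(sub, graph):
    """True if all nodes and edges of sub occur in graph with equal labels."""
    nodes_ok = all(n in graph.node_labels and graph.node_labels[n] == label
                   for n, label in sub.node_labels.items())
    edges_ok = all(e in graph.edge_labels and graph.edge_labels[e] == label
                   for e, label in sub.edge_labels.items())
    return nodes_ok and edges_ok


def weakly_connected_components(graph):
    """Return the node sets of the weakly connected components."""
    neighbours = defaultdict(set)
    for (u, v) in graph.edge_labels:
        neighbours[u].add(v)
        neighbours[v].add(u)  # also follow the reversed edge
    seen, components = set(), []
    for start in graph.node_labels:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical example: a component with one port and an unconnected requirement.
g = LabeledGraph(
    {"c1": "Component", "p1": "Port", "r1": "Requirement"},
    {("c1", "p1"): "ports"},
)
sub = LabeledGraph({"c1": "Component", "p1": "Port"}, {("c1", "p1"): "ports"})
```

Here `is_subgraph(sub, g)` holds, and `g` has two components, because `r1` is not connected to any other node even when edge directions are ignored.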

III-B Frequent Subgraph Mining

We will use frequent subgraph mining as the main ingredient for Ockham. We distinguish between graph-transaction-based frequent subgraph mining and single-graph-based frequent subgraph mining. Graph-transaction-based frequent subgraph mining uses a collection (aka. database) of graphs, while single-graph-based frequent subgraph mining looks for subgraphs of a single graph. We are considering graph-transaction-based frequent subgraph mining in this work. A subgraph mining algorithm typically takes a database of graphs and a threshold $t$ as input. It then outputs all the subgraphs with at least $t$ occurrences in the database. An overview of the frequent subgraph mining algorithms can be found in the literature [28]. A general introduction to graph mining is given by Cook and Holder [13], who also proposed a compression-based subgraph miner called Subdue [36]. Subdue has also been one of our main inspirations for a compression-based approach. Ockham is based on Gaston [47], which mines frequent subgraphs by first focusing on frequent paths, then extending to frequent trees, and finally extending the trees to cyclic graphs.
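To make the notion of transaction-based support concrete, the following Python sketch counts, for every labeled single-edge pattern, the number of database graphs that contain it and keeps the patterns that reach a threshold. This is only the smallest possible pattern size and not the Gaston algorithm; the function name, the data layout, and all labels are assumptions made for illustration.

```python
from collections import Counter


def frequent_edge_patterns(database, threshold):
    """Return single-edge patterns contained in at least `threshold` graphs.

    database: list of (node_labels, edge_labels) pairs, where node_labels maps
    node ids to labels and edge_labels maps (source, target) to a label.
    A pattern is the triple (source label, edge label, target label).
    """
    support = Counter()
    for node_labels, edge_labels in database:
        patterns = {
            (node_labels[u], label, node_labels[v])
            for (u, v), label in edge_labels.items()
        }
        # Transaction-based support: each graph counts at most once per pattern.
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= threshold}


# Hypothetical database of three small graphs.
database = [
    ({"a": "create_Port", "b": "preserved_Component"},
     {("b", "a"): "create_ports"}),
    ({"x": "preserved_Component", "y": "create_Port", "z": "create_Connector"},
     {("x", "y"): "create_ports", ("z", "y"): "create_end"}),
    ({"m": "create_Connector", "n": "create_Port"},
     {("m", "n"): "create_end"}),
]
frequent = frequent_edge_patterns(database, threshold=2)
```

A full miner grows such patterns into larger paths, trees, and cyclic graphs and needs subgraph isomorphism tests, which this sketch deliberately avoids.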

III-C Model Transformations and Edit Operations

The goal of Ockham is to learn domain-specific edit operations from model histories. In general, edit operations can be informally understood as editing commands which can be applied to modify a given model. In turn, a difference between two model versions can be described as a (partially) ordered set of applications of edit operations, transforming one model version into the other. Comparing two models can thus be understood as determining the edit operation applications that transform one model into the other. A major class of edit operations are model refactorings, which induce syntactical changes without changing a model’s semantics. Other classes of edit operations are recurring bug fixes and evolutionary changes (i.e., adding new functionality).

In the classification given by Visser et al. [63], edit operations can describe regular evolution [63], that is, “the modeling language is used to make changes”, but are not meant to describe meta-model evolution, platform evolution or abstraction evolution. More technically, in Mens et al.’s taxonomy [45], edit operations can be classified as endogenous (i.e., source and target meta-model are equal), in-place (i.e., source and target model are equal) model transformations. For the purpose of this paper, we define an edit operation as an in-place model transformation which represents regular model evolution.

The model transformation tool Henshin [3] supports the specification of in-place model transformations in a declarative manner. It is based on graph transformation concepts [18], and provides a visual language for the definition of transformation rules, which is used, e.g., in the last step of Figure 2. Roughly speaking, transformation rules specify graph patterns to be found and created or deleted.

IV Approach

We address the problem of automatically identifying edit operations from a graph mining perspective. As already discussed in Section III, we will work with labeled graphs instead of typed graphs. There are some limitations related to this decision, which are discussed in Section VI-A.

Ockham consists of the five steps illustrated with a running example in Figure 2 and outlined below. Our main technical contributions are Step 2 and Step 4. For Step 1, Step 3, and Step 5 we apply existing tooling: SiDiff, Gaston, and Henshin.

Fig. 2: The five-step process for mining edit operations.

Step 1: Compute Structural Model Differences: To learn a set of edit operations in an unsupervised manner, Ockham analyzes model changes which can be observed in a model’s development history. In this first step, for every pair of successive model versions $n$ and $n+1$ in a given model history, we calculate a structural model difference $\Delta(n, n+1)$ to capture these changes. Since we do not assume any information (e.g., persistent change logs) to be maintained by a model repository, we use a state-based approach to calculate a structural difference $\Delta(n, n+1)$, which proceeds in two steps [35]. First, the corresponding model elements in the model graphs $G_n$ and $G_{n+1}$ are determined using a model matcher [40]. Second, the structural changes are derived from these correspondences: All the elements in $G_n$ that do not have a corresponding partner in $G_{n+1}$ are considered to be deleted, whereas, vice versa, all the elements in $G_{n+1}$ that do not have a corresponding partner in $G_n$ are considered to be created.

For further processing in subsequent steps, we represent a structural difference $\Delta(n, n+1)$ in a graph-based manner, referred to as difference graph [48]. A difference graph $G_{\Delta(n, n+1)}$ is constructed as a unified graph over $G_n$ and $G_{n+1}$. That is, corresponding elements being preserved by an evolution step from version $n$ to $n+1$ appear only once in $G_{\Delta(n, n+1)}$ (indicated by the label prefix “preserved_”), while all other elements that are unique to model $G_n$ and $G_{n+1}$ are marked as deleted and created, respectively (indicated by the label prefixes “delete_” and “create_”).
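The construction of a difference graph can be sketched in a few lines of Python. The sketch assumes an ID-based matching (elements with the same identifier in both versions correspond), at most one edge per ordered node pair, and unchanged types for matched elements. It is not the SiDiff implementation, and the function name and example labels are hypothetical.

```python
def difference_graph(old, new):
    """Build a difference graph from two model versions.

    old, new: (node_labels, edge_labels) pairs; node_labels maps node ids to
    type labels, edge_labels maps (source, target) to a type label.
    Assumption: equal ids (or equal endpoint pairs) denote corresponding
    elements, and corresponding elements carry the same label.
    """
    old_nodes, old_edges = old
    new_nodes, new_edges = new
    nodes, edges = {}, {}

    for n, label in old_nodes.items():
        prefix = "preserved_" if n in new_nodes else "delete_"
        nodes[n] = prefix + label
    for n, label in new_nodes.items():
        if n not in old_nodes:
            nodes[n] = "create_" + label

    for e, label in old_edges.items():
        prefix = "preserved_" if e in new_edges else "delete_"
        edges[e] = prefix + label
    for e, label in new_edges.items():
        if e not in old_edges:
            edges[e] = "create_" + label

    return nodes, edges


# Hypothetical evolution step: a requirement is removed and a port is added.
old = ({"c1": "Component", "c2": "Component", "r1": "Requirement"},
       {("c1", "r1"): "requirement"})
new = ({"c1": "Component", "c2": "Component", "p1": "Port"},
       {("c1", "p1"): "ports"})
diff_nodes, diff_edges = difference_graph(old, new)
```

In this example both components appear once with the prefix “preserved_”, while `r1` and `p1` are marked as deleted and created, respectively.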

To give an illustration, assume that the architectural model shown in Figure 1 is the revised version $n+1$ of a version $n$ by adding the ports along with the connector and its associated requirement. Figure 2 illustrates a matching of the abstract syntax graphs of the model versions $G_n$ and $G_{n+1}$. For the sake of brevity, only correspondences between nodes in $G_n$ and $G_{n+1}$ are shown in the figure, while two edges are corresponding when their source and target nodes are in a correspondence relationship. The derived difference graph $G_{\Delta(n, n+1)}$ is illustrated in Figure 2. For example, the corresponding nodes of type Component occur only once in $G_{\Delta(n, n+1)}$, and the nodes of type Port are indicated as being created in version $n+1$. Our implementation is based on the Eclipse Modeling Framework. We use the tool SiDiff [55] to compute structural model differences. Our requirements on the model differencing tool are support for EMF, the option to implement a custom matcher, and an approach to semantically lift model differences based on a set of given edit operations. Modeling tools such as MagicDraw usually provide IDs for every model element, which can be used by a custom matcher to calculate matches based on existing IDs. We intend to use the semantic lifting approach for the compression of differences in the project from Section II. Other tools such as EMFCompare could also be used for the computation of model differences and there are no other criteria to favour one over the other. An overview of the different matching options is given by Kolovos et al. [40]; a survey of model comparison approaches is given by Stephan and Cordy [60].

Step 2: Derive Simple Change Graphs: Real-world models maintained in a model repository, such as the architectural models in our case study, can get huge. It is certainly fair to say that, compared to a model’s overall size, only a small number of model elements is actually subject to change in a typical evolution step. Thus, in the difference graphs obtained in the first step, the majority of difference graph elements represent model elements that are simply preserved. Therefore, before we continue with the frequent subgraph mining in step 3, we reduce the difference graphs to simple change graphs (SCGs) in step 2. The reduction is based on the principle of locality relaxation: only changes that are “close” to each other can result from the application of a single edit operation. We discuss the implications of this principle in Section VI-A. By “close”, we mean that the respective difference graph elements representing a change must be directly connected (i.e., not only through a path of preserved elements). Conversely, this means that changes being represented by elements that are part of different connected components of a simple change graph are independent of each other (i.e., they are assumed to result from different edit operation applications).

More formally, given a difference graph $G_{\Delta(n, n+1)}$, a simple change graph $SCG_{\Delta(n, n+1)} \subseteq G_{\Delta(n, n+1)}$ is derived from $G_{\Delta(n, n+1)}$ in two steps. First, we select all the elements in $G_{\Delta(n, n+1)}$ representing a change (i.e., nodes and edges that are labeled as “delete_*” and “create_*”, respectively). In general, this selection does not yield a graph, but just a graph fragment $F \subseteq G_{\Delta(n, n+1)}$, which may contain dangling edges when the source or target node of a changed edge is a preserved node not included in $F$. In a second step, these preserved nodes are also selected to be included in the simple change graph. Formally, the simple change graph is constructed as the boundary graph of $F$, which is the smallest graph $SCG_{\Delta(n, n+1)}$ with $F \subseteq SCG_{\Delta(n, n+1)} \subseteq G_{\Delta(n, n+1)}$ completing $F$ to a graph [35]. The derivation of a simple change graph from a given difference graph is illustrated in the second step of Figure 2. In this example, the simple change graph comprises only a single connected component. In a realistic setting, however, a simple change graph typically comprises a larger set of connected components, like the one illustrated in step 3 of Figure 2.
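The two-step derivation can be sketched as follows in Python. The sketch assumes that a difference graph is given as two dictionaries whose labels carry the prefixes “preserved_”, “delete_”, and “create_”. The function name and the example labels are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
CHANGE_PREFIXES = ("delete_", "create_")


def simple_change_graph(nodes, edges):
    """Derive a simple change graph from a difference graph.

    nodes: node id -> prefixed label; edges: (source, target) -> prefixed label.
    """
    # Step 1: select all changed nodes and edges (the graph fragment F).
    scg_nodes = {n: l for n, l in nodes.items() if l.startswith(CHANGE_PREFIXES)}
    scg_edges = {e: l for e, l in edges.items() if l.startswith(CHANGE_PREFIXES)}

    # Step 2: complete F to a graph by adding the preserved endpoints of
    # changed edges (the boundary nodes). Preserved edges are never added.
    for (u, v) in scg_edges:
        scg_nodes.setdefault(u, nodes[u])
        scg_nodes.setdefault(v, nodes[v])
    return scg_nodes, scg_edges


# Hypothetical difference graph: two preserved components in a preserved
# package receive new ports that are linked by a new connector.
nodes = {
    "pkg": "preserved_Package",
    "c1": "preserved_Component", "c2": "preserved_Component",
    "p1": "create_Port", "p2": "create_Port",
    "con": "create_Connector",
}
edges = {
    ("pkg", "c1"): "preserved_components", ("pkg", "c2"): "preserved_components",
    ("c1", "p1"): "create_ports", ("c2", "p2"): "create_ports",
    ("con", "p1"): "create_end", ("con", "p2"): "create_end",
}
scg_nodes, scg_edges = simple_change_graph(nodes, edges)
```

The package and its preserved containment edges are dropped, whereas the two preserved components remain because changed edges are attached to them.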

Step 3: Apply Frequent Connected Subgraph Mining: When we apply the first two steps to a model history, we obtain a set of simple change graphs $\{SCG_{\Delta(n, n+1)} \mid n \in \{1, \dots, N-1\}\}$, where $N$ is the number of revisions in the repository. In this set, we want to identify recurring patterns and therefore find some frequent connected subgraphs. A small support threshold might lead to a huge number of frequent subgraphs. This not only causes large computational effort but also makes it difficult to find the relevant subgraphs. As it would be infeasible to recompute the threshold manually for every dataset, we pre-compute it by running an approximate frequent subtree miner for different thresholds up to some fixed size of the frequent subtrees. We fix the range of frequent trees and adjust the threshold accordingly. Alternatively, a relative threshold could be used, but we found in a pilot study that our pre-computation works better in terms of average precision. We discuss the effect of the support threshold in Section V-C. Then, we run the frequent subgraph miner for the threshold found via the approximate tree miner. Step 3 of Figure 2 shows this for our running example. We start with a set of connected components and the graph miner returns a set of frequent subgraphs, namely $\{g_1, g_2, g_3\}$ with $g_1 \subset g_2 \subset g_3$. We use the Gaston [47] graph miner, since it performed best (in terms of runtime) among the miners that we experimented with (gSpan, Gaston and DIMSpan) in a pilot study. In our pilot study, we ran the miners on a small selection of our datasets and experimented with the parameters of the miners. For many datasets, gSpan and DIMSpan did not terminate at all (we canceled the execution after 48 h). Gaston (with embedding lists) was able to terminate in less than 10 s on most of our datasets but consumes a lot of memory, typically between 10 GB and 25 GB, which was not a problem for our 32 GB machine in the pilot study. To rule out any effects due to approximate mining, we considered only exact miners. Therefore, we also could not use Subdue [36], which directly tries to optimize compression. Furthermore, Subdue was not able to discover both edit operations in the second experiment (see Section V) without iterative mining and allowing for overlaps. With these two options enabled, Subdue did not terminate on more than 75% of the pilot study datasets. For frequent subtree mining, we use Hops [65] because it provides low error rates and good runtime guarantees.

Step 4: Select the most relevant subgraphs: Motivated by the minimum description length principle, which has been successfully applied to many different kinds of data [23], the most relevant patterns should not be the most frequent ones but the ones that give us a maximum compression for our original data [15]. That is, we want to express the given SCGs by a set of subgraphs with the property that the description length for the subgraphs together with the length of the description of the SCGs in terms of the subgraphs becomes minimal. This can be understood by looking at the corner cases. A single change has a large frequency but is typically not interesting. The entire model difference is large in terms of changes but has a frequency of only one and is typically also not an interesting edit operation. “Typical edit operations” are therefore somewhere in the middle. We will use our experiments in Section V to validate whether this assumption holds. We define the compression value by $\operatorname{compr}(g) = (\operatorname{supp}(g) - 1) \cdot (|V_g| + |E_g|)$, where $\operatorname{supp}(g)$ is the support of $g$ in our set of input graphs (i.e., the number of components in which the subgraph is contained). The “$-1$” in the definition of the compression value comes from the intuition that we need to store the definition of the subgraph, in order to decompress the data again. The goal of this step is to detect the subgraphs from the previous step with a high compression value. The subgraphs are organized in a subgraph lattice, where each graph has pointers to its direct subgraphs. Most of the subgraph miners already compute a subgraph lattice, so we do not need a subgraph isomorphism test here. Due to the downward closure property of the support, all subgraphs of a given (sub-)graph have at least the same frequency (in transaction-based mining). When sorting the output, we need to take this into account, since we are only interested in the largest possible subgraphs for some frequency. We therefore prune the subgraph lattice. The resulting list of recommendations is then sorted according to the compression value. Other outputs are conceivable, but in terms of evaluation, a sorted list is typical for a recommender system [56].

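More technically, let $SG$ be the set of subgraphs obtained from step 3. We then remove all the graphs in the set

$SG^{-} = \{ g \in SG \mid \exists \tilde{g} \in SG \text{ with } g \subseteq \tilde{g} \wedge \operatorname{supp}(g) = \operatorname{supp}(\tilde{g}) \wedge \operatorname{compr}(g) \le \operatorname{compr}(\tilde{g}) \}.$

Our list of recommendations is then $SG \setminus SG^{-}$, sorted according to the compression metric.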

For our running example in step 4 of Figure 2, assume that the largest subgraph $g_3$ occurs 15 times (without overlaps). Even though the smaller subgraph $g_1$ occurs twice as often, we find that $g_3$ provides the best compression value and is therefore ranked first. The subgraph $g_2$ will be pruned, since it has the same support as its supergraph $g_3$ but a lower compression value. We implement the compression computation and pruning using the NetworkX Python library.
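The ranking and pruning of the running example can be reproduced with a short, library-free Python sketch. It assumes that the compression value of a subgraph is its support minus one, multiplied by its number of nodes and edges, and that the supergraph relation among candidates is already known from the miner's lattice. The class, the function names, and the graph sizes are hypothetical; only the supports 15 and 30 are taken from the example.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    support: int
    num_nodes: int
    num_edges: int
    # Names of candidates that contain this one (taken from the lattice).
    supergraphs: set = field(default_factory=set)


def compression(c):
    """Assumed compression value: (support - 1) * (number of nodes + edges)."""
    return (c.support - 1) * (c.num_nodes + c.num_edges)


def rank(candidates):
    """Prune dominated candidates and sort the rest by compression value."""
    by_name = {c.name: c for c in candidates}
    kept = []
    for c in candidates:
        # A candidate is pruned if some supergraph has the same support and
        # at least the same compression value.
        dominated = any(
            by_name[s].support == c.support
            and compression(by_name[s]) >= compression(c)
            for s in c.supergraphs
        )
        if not dominated:
            kept.append(c)
    return sorted(kept, key=compression, reverse=True)


# Supports follow the running example; the sizes are made up for illustration.
candidates = [
    Candidate("g1", support=30, num_nodes=2, num_edges=1, supergraphs={"g2", "g3"}),
    Candidate("g2", support=15, num_nodes=5, num_edges=4, supergraphs={"g3"}),
    Candidate("g3", support=15, num_nodes=7, num_edges=7),
]
ranking = [c.name for c in rank(candidates)]
```

With these assumed sizes, the compression values are 87 for `g1`, 126 for `g2`, and 196 for `g3`, so `g3` is ranked first and `g2` is pruned because its supergraph `g3` has the same support.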

Step 5: Generate edit operations: As a result of step 4, we have an ordered list of “relevant” subgraphs of the SCGs. We need to transform these subgraphs into model transformations that specify our learned edit operations. As shown in step 5 of Figure 2, the subgraphs can be transformed to Henshin transformation rules in a straightforward manner. We use Henshin because it is used for the semantic lifting approach in our case study from Sec. II. In principle, any transformation language that allows us to express endogenous, in-place model transformations could be used instead. A survey of model transformation tools is given by Kahani et al. [29].

V Evaluation

V-A Research Questions

We evaluate Ockham w.r.t. the following research questions:

  • RQ 1: Is the approach able to identify edit operations that have actually been applied in model repositories? If we apply some operations to models, the approach should be able to discover these from the data. Furthermore, when different edit operations are applied and overlap, it should be possible to discover them.

  • RQ 2: Is the approach able to find typical edit operations or editing scenarios in a real-world setting? Compared to the first research question, the approach should also be able to find typical scenarios in practice when we do not know which operations have been actually applied to the data. Furthermore, it should be possible to derive these edit operations in a real-world setting with large models and complex meta-models.

  • RQ 3: What are the main drivers for the approach to work or fail? We want to identify the characteristics of the input data or parameters having a major influence on the approach.

  • RQ 4: What are the main parameters for the performance of the frequent subgraph mining? Frequent subgraph mining has a very high computational complexity for general cyclic graphs. We want to identify the characteristics of the data that influence the mining time.

For RQ 1, we want to rediscover the edit operations from our ground truth, whereas in RQ 2, the discovered operations could also be some changes that are not applied in “only one step” but appear to be typical for a domain expert. We refer to the actually applied edit operations and the ones considered as typical by a domain expert as “meaningful”.

V-B Experiments

We conduct three experiments to evaluate our approach. In the first two experiments, we run the algorithm on synthetic model repositories. We know the “relevant edit operations”, since we define them, and apply them to sample models. We can therefore use these experiments to answer RQ 1. Furthermore, since we can control many properties of our input data for these simulated repositories, we can also use them to answer RQ 3 and RQ 4. In the third experiment, we apply Ockham to the dataset from our case study presented in Section II to answer RQ 2. The first two experiments help us to find the model properties and the parameters the approach is sensitive to. Their purpose is to increase the internal validity of our evaluation. In addition, to increase external validity, we apply the approach in a real-world setting. None of the experiments alone can provide sufficient internal or external validity [59], but the combination of all experiments is suitable to assess whether Ockham can discover relevant edit operations.

Experiment 1: As a first experiment, we simulate the application of edit operations on a simple component model. The meta-model is shown in Figure 1.

Setup: For this experiment, we only apply one kind of edit operation (the one from our running example in Figure 2) to a random model instance. The Henshin rule specifying the operation consists of a graph pattern comprising 7 nodes and 7 edges. We create the model differences as follows: We start with an instance $m_0$ of the simple component meta-model with 87 Packages, 85 Components, 85 SwImplementations, 172 Ports, 86 Connectors and 171 Requirements. Then, the edit operation is randomly applied $e$ times to the model to obtain a new model revision $m_1$. This procedure is then applied iteratively $d$ times to obtain the “model history” $m_0 \to m_1 \to \dots \to m_d$. Each evolution step $m_i \to m_{i+1}$ yields a difference $\Delta(m_i, m_{i+1})$. To each application of the edit operation, we apply a random perturbation. More concretely, a perturbation is another edit operation that we apply with a certain probability $p$. This perturbation is applied such that it overlaps with the application of the main edit operation. We use the tool Henshin [10] to apply model transformations to one model revision. We then build the difference of two successive models as outlined in Section IV. In our experiment, we control the following parameters for the generated data.

  • $d$: The number of differences in each simulated model repository. For this experiment, $d$ takes 2 different values.

  • $e$: The number of edit operations to be applied per model revision in the repository, that is, how often the edit operation will be applied to the model. For this experiment, $e$ takes 100 different values.

  • $p$: The probability that the operation will be perturbed. For this experiment, $p$ takes 10 different values.

This gives us 2000 (= 2 × 100 × 10) datasets for this experiment. A characteristic of our datasets is that, as $e$ increases, the probability of changes overlapping increases. Eventually, adding more changes even decreases the number of components in the SCG while increasing the average size of the components.

Our algorithm suggests a ranking of the top subgraphs (which eventually yield the learned edit operations). In the ranked suggestions of the algorithm, we then look for the position of the “relevant edit operation” by using a graph isomorphism test. To evaluate the ranking, we use the “mean average precision at k” (MAP@k), which is commonly used as an accuracy metric for recommender systems [56]:

$$\text{MAP@}k := \frac{1}{|\mathcal{D}|} \sum_{D \in \mathcal{D}} \text{AP@}k(D),$$

where $\mathcal{D}$ is the family of all datasets (one dataset represents one repository) and AP@k is defined by

$$\text{AP@}k := \frac{\sum_{i=1}^{k} \mathrm{P}(i) \cdot \mathrm{rel}(i)}{|\text{total set of relevant subgraphs}|},$$

where P($i$) is the precision at $i$, and rel($i$) indicates if the graph at rank $i$ is relevant. For this experiment, the number of relevant edit operations (or subgraphs, to be more precise) is always one. Therefore, we are interested in the rank of the correct edit operation. Except for the case that the relevant edit operation does not show up at all, MAP@∞ gives us the mean reciprocal rank and therefore serves as a good metric for that purpose.
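As a minimal sketch of this metric, the following code computes AP@k as the sum of P(i)·rel(i) over the first k ranks divided by the number of relevant subgraphs, and MAP@k as its mean over all datasets. The function names and the example rankings are hypothetical; `k=None` is used here to stand for an unbounded cut-off.

```python
def average_precision_at_k(relevance, k, num_relevant):
    """AP@k for one ranked list.

    relevance[i] is True if the subgraph at rank i + 1 is relevant;
    k=None means no cut-off (the whole ranking is considered).
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # P(rank) * rel(rank)
    return total / num_relevant


def mean_average_precision_at_k(rankings, k, num_relevant):
    """MAP@k: mean of AP@k over all datasets (one ranking per dataset)."""
    scores = [average_precision_at_k(r, k, num_relevant) for r in rankings]
    return sum(scores) / len(scores)


# Hypothetical rankings with exactly one relevant subgraph per dataset:
rankings = [
    [True, False, False, False],   # found at rank 1
    [False, False, False, True],   # found at rank 4
    [False, False, False, False],  # not found at all
]
map_at_1 = mean_average_precision_at_k(rankings, 1, num_relevant=1)
map_unbounded = mean_average_precision_at_k(rankings, None, num_relevant=1)
```

With a single relevant subgraph per dataset, the unbounded score is the mean of the reciprocal ranks, where a dataset in which the subgraph is not found contributes zero.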

For comparison only, we also compute the MAP@k scores for the rank of the correct edit operations according to the frequency of the subgraphs. Furthermore, we investigate how the performance of the subgraph mining depends on other parameters of Ockham. We are also interested in how average precision (AP), that is, AP@∞, depends on the characteristics of the datasets. Note that for the first two experiments, we do not execute the last canonical step of our approach (i.e., deriving the edit operation from a SCG), but we directly evaluate the resulting subgraph from step 4 against the SCG corresponding to the edit operation. We run the experiments on an Intel® Core™ i7-5820K CPU @ 3.30GHz × 12, from which we use 3 cores per dataset, and 31.3 GiB RAM.

To evaluate the performance of the frequent subgraph miner on our datasets, we fixed the relative threshold (i.e., the support threshold divided by the number of components in the graph database). We then re-ran the algorithm with this fixed relative support threshold.

| | MAP@1 | MAP@5 | MAP@10 | MAP@∞ |
|---|---|---|---|---|
| Compression | 0.967 | 0.974 | 0.975 | 0.975 |
| Frequency | 0.016 | 0.353 | 0.368 | 0.368 |

TABLE I: The MAP@k scores for the results using compression and frequency for the first experiment.

| | MAP@2 | MAP@5 | MAP@10 | MAP@∞ |
|---|---|---|---|---|
| Compression | 0.955 | 0.969 | 0.969 | 0.969 |
| Frequency | 0.013 | 0.127 | 0.152 | 0.190 |

TABLE II: The MAP@k scores for the results using compression and frequency for the second experiment.

Results: See Table I for the MAP@k scores for all datasets in the experiment. Table III shows the Spearman correlation of the independent and dependent variables. If we look only at datasets with a large number of applied edit operations $e$, the Spearman correlation for average precision vs. $d$ and average precision vs. $p$ becomes 0.25 (instead of 0.12) and -0.14 (instead of -0.07), respectively. The mean time for running Gaston for our datasets was 1.17s per dataset.

Observations: We observe that increasing the number of edit operations has a negative effect on the average precision. Increasing the perturbation has a slightly negative effect, which becomes stronger for a high number of applied edit operations, that is, when huge connected components start to form. The number of differences (i.e., having more examples) has a positive effect on the rank, which is rather intuitive. We also observe a strong Spearman correlation of the mining time with the number of applied edit operations (0.89) and, implicitly, also with the average number of nodes per component (0.83). If we only look at the edit operations with a low rank, we can also see a strong negative correlation of the mining time with the average precision (not shown in Table III). This means that large mining times usually come with a bad ranking.
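For readers who want to reproduce this kind of analysis, the following is a minimal, dependency-free sketch of the Spearman rank correlation: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. The function names and the per-dataset measurements are hypothetical and are not taken from the experiments.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for idx in order[start:end + 1]:
            ranks[idx] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measurements for four datasets:
num_edit_ops = [10, 20, 40, 80]
mining_time = [0.2, 0.3, 1.1, 3.5]
avg_precision = [1.0, 1.0, 0.5, 0.125]

rho_time = spearman(num_edit_ops, mining_time)
rho_ap = spearman(num_edit_ops, avg_precision)
```

In this toy data, the mining time grows monotonically with the number of edit operations, so `rho_time` is 1, while `rho_ap` is negative because the precision drops as more operations are applied.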

| | p | Mining Time | e | d | #Nodes per Comp |
|---|---|---|---|---|---|
| AP | -0.07 | -0.24 | -0.23 | 0.12 | -0.21 |
| AP (for large e) | -0.14 | -0.19 | -0.19 | 0.25 | -0.03 |
| Mining Time | 0.12 | - | 0.89 | 0.26 | 0.83 |

TABLE III: Spearman correlations for the first experiment.

| | p | Size at Threshold | Mining Time | e | #Nodes per Comp |
|---|---|---|---|---|---|
| AP | -0.31 | -0.05 | -0.25 | -0.07 | -0.19 |
| p | - | 0.20 | 0.27 | 0 | 0.30 |
| Size at Threshold | - | - | 0.53 | 0.51 | 0.58 |
| Mining Time | - | - | - | 0.87 | 0.92 |
| e | - | - | - | - | 0.92 |

TABLE IV: The Spearman correlation matrix for the second experiment.

Experiment 2: In contrast to the first experiment, we want to identify more than one edit operation in a model repository. We therefore extend the first experiment by adding another edit operation and apply each of the operations with the same probability. In order to test whether Ockham also detects edit operations with smaller compression than the dominant (in terms of compression) edit operation, we choose the second operation to be smaller; its Henshin rule graph pattern comprises 4 nodes and 5 edges. It corresponds to adding a new Component with its SwImplementation and a Requirement to a Package.

Setup: Since the simulation of model revisions currently consumes a lot of compute resources, we fixed $d$ and considered only a restricted range of $e$ for this experiment. The rest of the experiment is analogous to the first experiment.

Results: In Table II we give the MAP@k scores for this experiment. Table IV shows the correlation matrix for the second experiment.

Observations: We can see that our compression-based approach clearly outperforms the frequency-based approach used as a baseline. From Table IV, we can observe a strong dependency of the average precision on the perturbation parameter and the mining time.

Experiment 3: Of course, the power of the simulation to mimic a real-world model evolution is limited. In particular, the assumption of random and independent applications of edit operations is questionable. Therefore, for the third experiment, we use a real-world model repository from the railway software development domain (see Section II). Here, we do not know the operations that have actually been applied. We therefore compare the mined edit operations with edit operations randomly generated from the meta-model, and want to show that the mined edit operations are significantly more “meaningful” than the random ones. We will use the results of interviews with domain experts to answer RQ2.

Setup: For this experiment, we mined 546 pairwise differences, with 4109 changes on average, which also contain changed attribute values (one reason for that many changes is that the engineering language has changed from German to English). The typical model size in terms of their abstract syntax graphs is 12081 nodes and, on average, 50 out of 83 meta-model classes are used as node types.

To evaluate the quality of our recommendations, we conducted a semi-structured interview with five domain experts of our industry partner: 2 system engineers working with one of the models, 1 system engineer working cross-cutting, 1 chief system architect responsible for the product line approach, and the head of the tool development team. We presented them with 25 of our mined edit operations together with 25 edit operations that were randomly generated out of the meta-model. The edit operations were presented in the visual transformation language of Henshin, which we introduced to our participants. Using a 5-point Likert scale, we asked whether the edit operation represents a typical edit scenario (5), can make sense but is not typical (3), or does not make sense at all (1). We compare the means of the Likert scores for the population of random edit operations and mined edit operations to determine whether the mined operations are typical or meaningful.

Null hypothesis $H_0$: The mined edit operations do not present a more typical edit scenario than random edit operations on average.

We fix the significance level $\alpha$ in advance. If we can reject the null hypothesis, we conclude that the mined edit operations more likely present typical edit scenarios than the random ones. In addition, we discussed with the engineers those mined edit operations that had not been considered typical.
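To illustrate how such a one-sided comparison of mean Likert scores can be tested, the sketch below uses a simple permutation test on the difference of means. This is a stand-in chosen because it needs no external libraries; it is not the statistical test used in the study. The scores, the function name, and the value of `alpha` are hypothetical.

```python
import random


def permutation_test_mean_diff(mined, random_ops, n_resamples=10000, seed=0):
    """One-sided permutation test for mean(mined) > mean(random_ops).

    Returns an estimated p-value: the share of random relabelings whose
    difference of means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n = len(mined)
    observed = sum(mined) / n - sum(random_ops) / len(random_ops)
    pooled = list(mined) + list(random_ops)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical Likert scores of one participant (not data from the study):
mined_scores = [5, 4, 5, 4, 3, 5, 4, 5]
random_scores = [1, 2, 1, 3, 2, 1, 1, 2]

alpha = 0.01  # hypothetical significance level
p_value = permutation_test_mean_diff(mined_scores, random_scores)
reject_null = p_value < alpha
```

If `reject_null` is true, the hypothetical mined operations are rated as significantly more typical than the random ones at the chosen level.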

Results: We found some operations that are typical to the modeling language SysML, for example, one which is similar to the simplified operation in Figure 2. We also found more interesting operations, for example, the addition of ports with domain-specific port properties. Furthermore, we were able to detect some rather trivial changes. For example, we can see that typically more than just one swimlane is added to an activity, if any. We also found simple refactorings, such as renaming a package (which also leads to changing the fully qualified name of all contained elements), as well as refactorings that correspond to conventions that have been changed; for example, activities were owned by so-called “system use cases” before but have been moved into “packages”. Table V shows the results for the Likert values for the mined and random edit operations for the five participants of our study. We can see that for all participants, the mean Likert score for the mined operations is significantly higher than the mean for the random operations. After their rating, when we confronted the engineers with the true results, they stated that the edit operations obtained by Ockham represent typical edit scenarios. According to one of the engineers, some of the edit operations “can be slightly extended” (see also Section V-C). Some of the edit operations found by Ockham but not recognized by the participants were identified “to be a one-off refactoring that has been performed some time ago”.

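| Participant | Mean (mined) | Mean (random) | p-value | p-value (Wilcoxon) |
|---|---|---|---|---|
| P1 | 3.20 | 1.68 | | |
| P2 | 4.04 | 2.76 | | |
| P3 | 4.32 | 2.60 | | |
| P4 | 4.32 | 1.08 | | |
| P5 | 4.48 | 1.60 | | |
| Total | 4.072 | 1.944 | | |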

TABLE V: Statistics for the Likert values of the mined and random edit operations.

Observations: The edit operations found by Ockham obtained significantly higher (mean) Likert scores than the random edit operations. We can therefore reject the null hypothesis and conclude that, compared to random ones, our mined edit operations can be considered as typical edit scenarios on average. Furthermore, a mean Likert score of almost 4.1 shows that the edit operations are considered as typical on average. In Section V-C, we take a closer look at the edit operations that were not considered as typical edit scenarios by the participants.

V-C Discussion

In the first two experiments, we can see from the high MAP@k values that Ockham is able to recover the edit operations that have been applied. Furthermore, the third experiment shows that Ockham provides meaningful edit operations in a real-world setting. The observations from the first experiment suggest that the main drivers for the performance of the frequent subgraph mining are the average number of nodes of our SCGs and the number of edit operations applied in the evolution steps yielding our model differences. To answer RQ3, we have to take a closer look at the datasets for which our approach gives non-optimal results.

V-C1 Reasons for non-optimal results

We have to distinguish between the two cases that (1) the correct edit operation is not detected at all and (2) the correct edit operation has a low rank, i.e., appears later in the ranked list.

Edit operation has not been detected: For the second experiment, in 22 out of 800 examples, Ockham was not able to detect both edit operations. In 10 of these cases, the threshold was set too high. To mitigate this problem in the real-world setting, the threshold parameters could be manually adjusted until the results are more plausible. In the automatic approach, further metrics have to be integrated. Other factors that cause the detection of the correct edit operations to fail are the perturbation, the average size of a component, and the size of the component “at threshold”, as can be seen from Table VI.

| | p | Average Size of a Component (# of Nodes) | Size at Threshold | Mining Time |
|---|---|---|---|---|
| Overall Mean | 0.55 | 57.6 | 8.20 | 1.26 |
| Mean for undetected operation | 0.79 | 109.0 | 10.03 | 2.55 |

TABLE VI: The main drivers for Ockham to fail in detecting the correct subgraph in experiment 1.

Given a support threshold $t$, the size at threshold is the number of nodes of the $t$-largest component. The intuition behind this metric is the following: For the frequent subgraph miner, in order to prune the search space, a subgraph is only allowed to appear in a bounded number of components (determined by $t$). Therefore, the subgraph miner needs to search for a subgraph in at least one component with size greater than the size at threshold. Usually, the size of a component plays a major role in the complexity of the subgraph mining. When the $t$-largest component is small, we could always use this component (or smaller ones) to guide the search through the search space, and therefore we will not have a large search space. So, a large size of the component at threshold could be an indicator for a complicated dataset.
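The definition of the size at threshold can be made concrete with a short sketch. It assumes that the connected components of the difference graphs are already available as a list of node counts and that the support threshold is an absolute count; the symbol `t`, the function name, and the example sizes are hypothetical.

```python
def size_at_threshold(component_sizes, t):
    """Return the number of nodes of the t-largest component (t is 1-based)."""
    if not 1 <= t <= len(component_sizes):
        raise ValueError("t must be between 1 and the number of components")
    return sorted(component_sizes, reverse=True)[t - 1]


# Hypothetical node counts of the connected components in one dataset:
sizes = [5, 40, 12, 7, 9]
third_largest = size_at_threshold(sizes, 3)  # sorted sizes: 40, 12, 9, 7, 5
```

A single huge component does not raise this value on its own; it only becomes large when at least `t` components are large, which matches the intuition that such datasets are harder to mine.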

We clearly see that perturbation, average size of a component, and the size at threshold are increased for the datasets for which our approach does not perform well. We looked deeper into the results of the datasets from the first experiment for which the correct subgraph has not been identified. We can see that, for some of these subgraphs, there is a supergraph in our recommendations that is top ranked. Usually, this supergraph contains one or two additional nodes. Since we have a rather small meta-model and we only use four other edit operations for the perturbation, it can occasionally happen that these larger graphs occur with the same frequency as the actual subgraph. The correct subgraphs are then pruned.

Edit operation has a low rank: First, note that we observe a low rank only very rarely. For the first experiment, it happened in 7 out of 2000 datasets, while for the second experiment, it did not happen at all. In Table VII, we list the corresponding datasets and the values for drivers of a low rank.

| d | e | p | Average #Nodes per Component | Size at Threshold | Average Precision | Rank |
|---|---|---|---|---|---|---|
| 10 | 92 | 0.3 | 142.2 | 13 | 0.13 | 8 |
| 10 | 67 | 0.4 | 91.0 | 16 | 0.14 | 7 |
| 10 | 78 | 0.8 | 87.3 | 14 | 0.14 | 7 |
| 10 | 98 | 0.8 | 127.7 | 14 | 0.067 | 15 |
| 20 | 81 | 0.1 | 227.0 | 16 | 0.13 | 8 |
| 20 | 99 | 0.1 | 272.2 | 19 | 0.010 | 99 |
| 20 | 100 | 0.1 | 272.7 | 17 | 0.013 | 78 |

TABLE VII: Possible drivers for a low rank.

One interesting observation is that, for some of the datasets with a low-ranked correct subgraph, the correct graph appears very early in the subgraph lattice, for example, as the first child of the best compressing subgraph but at rank 99 in the output, or as the first child of the second-best subgraph but at rank 15 in the output. This suggests that this is more of a presentation issue, which is due to the fact that we have to select a linear order of all subgraph candidates for the experiment.

V-C2 Qualitative results

We only found two mined edit operations that received an average Likert score below 3 from the five practitioners in the interviews. The first one was a refactoring that was actually performed but that targeted only a minority of all models. Only two of the participants were aware of this refactoring, and one of them did not directly recognize it due to the abstract presentation of the refactoring. The other edit operation that was not considered as a typical edit scenario was adding a kind of document to another document. This edit operation was even considered as illegal by 3 out of the 5 participants. The reason for this is the internal modeling of the relationship between the documents, which the participants were not aware of. So, it can also be attributed to the presentation of the results in terms of Henshin rules, which require an understanding of the underlying modeling language’s meta-model.

For four of the edit operations, some of the participants mentioned that the edit operation can be extended slightly. We took a closer look at why Ockham was not able to detect the extended edit operation, and it turned out that it was due to our simplifications of the locality relaxation and also due to the missing type hierarchies in our graphs. For example, in one edit operation, one could see that the fully qualified name (name + location in the containment hierarchy) of some nodes has been changed, but the actual change causing this name change was not visible, because it was a renaming of a package a few levels higher in the containment hierarchy that was not directly linked to our change. Another example was a “cut off” referenced element in an edit operation. The reason why this has been cut off was that the element appeared as several different sub-classes in the model differences and each single change alone was not frequent.

When looking at the mined edit operations, it became clear that the approach was able to implicitly identify constraints which were not made explicit in the meta-model.

V-C3 Result summary

RQ 1: Is this approach able to identify relevant edit operations in model repositories? We can answer this question with a “yes”. Experiments 1 and 2 show high MAP scores. Only for a large number of applied operations and a large size of the input graphs does the approach fail to find the applied edit operations.

RQ 2: Is this approach able to find typical edit operations or editing scenarios in a real-world setting? We could show that the approach is able to detect typical edit scenarios. The approach is therefore sound to a large extent, and incomplete edit operations can be adjusted manually. We cannot state yet that the approach is also complete (i.e., is able to find all relevant edit scenarios), though.

RQ 3: What are the main drivers for the approach to work or fail? The main drivers for the approach to fail are a large average size of a component and the size of the component at threshold (see definition in Section V-C1). The average size is related to the number of edit operations applied per model difference. In a practical scenario, huge differences can be excluded when running edit operation detection. The size of the component at threshold can, of course, be reduced by increasing the support threshold parameters of the frequent subgraph mining. With higher threshold, we increase the risk of missing some less frequent edit operations but the reliability for detecting the correct (more frequent) operations is increased. Having more examples improves the results of our approach.

RQ 4: What are the main parameters for the performance of the frequent subgraph mining? The main driver for the performance of the frequent subgraph mining is the number of applied edit operations per difference, which is related to the average number of nodes per component. Furthermore, we have a strong dependence between the average precision and the time spent for the frequent subgraph mining.

VI Limitations and Threats to Validity

VI-A Limitations

Locality relaxation: One limitation of our approach is the locality relaxation, which limits our ability to find patterns that are scattered across more than one connected component of the SCG. As we have seen in our railway case study, this usually leads to incomplete edit operations. Another typical example for violating the relaxation are naming conventions. In the future, we plan to use natural language processing techniques like semantic matching to augment the models by further references.

No attribute information: For this study, we did not take attribute information into account. Attributes (e.g., the name of a component) could also be integrated into the edit operation as preconditions or to extract the parameters of an edit operation. For the purpose of summarizing a model difference or identifying violations in a model difference, preconditions and parameters are not important, though, but only the presence of structural patterns.

Application to simplified graphs: An edit operation generally is a model transformation. Model transformation engines such as Henshin also provide features to deal with class inheritance or multi-object structures (roughly speaking, for-each expressions in model transformations). In our approach, we do not handle these features yet. They can be integrated into the approach in a post-processing step. For example, one possibility would be to feed the example instances of patterns discovered by Ockham into a traditional MTBE approach [31].

Transient effects: We also do not take so-called transient effects into account yet. One applied edit operation can invalidate the pre- or post-conditions of another edit operation. However, we have seen in our experiments that it only causes problems in cases where we apply only a few “correct” edit operations with high perturbation. In the practical scenario, the “perturbations” will more likely cancel each other out. When a transient effect occurs very frequently, a new pattern will be discovered. That is, when two (or more) operations are always applied together, we want to find the composite pattern and not the constituent ones.

Focus on single subgraphs instead of sets: Another limitation is the fact that we focused the optimization on single edit operations but not a complete set of edit operations. One could detect only the most-compressing edit operation and then substitute this in the model differences and re-run the mining to discover the second most-compressing edit operation and so on. Another solution would be to detect a set of candidate edit operations using Ockham and then select an optimal set using a meta-heuristic search algorithm and optimizing the total compression. We leave this for further research.

VI-B Threats to validity

Internal validity: The first two experiments were designed so that we can control input parameters of interest and observe their effect on the outcome. Ockham makes assumptions such as the locality relaxation, which could risk the real-world applicability. Because of this, and since we cannot claim that the results from the first two experiments also hold true in a real-world setting, we additionally applied our approach to an industrial case study. We can therefore be confident that Ockham also gives reasonable results in a practical scenario. In our simulations, we applied the edit operation randomly to a meta-model. To reduce the risk of observations that are only a result of this sampling, we created many example models. In the real-world setting, we compared the mined edit operations to random ones to rule out “patternicity” [58] as an explanation for high Likert rankings. None of our participants reported problems in understanding Henshin’s visual notation, which gives us a high confidence regarding their judgements. The participants of the interviews in the third experiment were also involved in the project where the model history was taken from. There might be the risk that the interviewees have only discovered operations they have “invented”. In any case, because of the huge project size and because 22 out of 25 of the edit operations were recognized as typical by more than one of the participants, this is unlikely.

External validity: Some of the observations in our experiments could be due to the concrete set of edit operations in the example or even due to something in the meta-models. In the future, Ockham has to be tested for further meta-models to increase the external validity of our results. We have validated our approach in a real-world setting, which increases our confidence in its practicality, though. Since we have used an exact subgraph miner, we can be sure that the discovered edit operations are independent of the subgraph mining algorithm.

VII Related Work

Several approaches have been proposed to (semi-)automatically learn model transformations in the field of Model Transformation By Example (MTBE). In the first systematic approach of MTBE, Varró [64] proposes an iterative procedure which tries to derive exogenous (i.e., source and target meta-model are different) model transformations by examples. Appropriate examples need to be provided for the algorithm to work. Many approaches to learning exogenous model transformations have been proposed until now. For example, Berramla et al. [9] use statistical machine translation and language models to derive the transformations, and Baki and Sahraoui [6] apply simulated annealing to learn the operations. Regarding exogenous transformations, there is also an approach by Saada et al. [54], which uses graph mining techniques to learn concepts which are then used to identify new transformation patterns.

However, as already mentioned in the introduction, most closely related to our approach is the area of MTBE for endogenous model transformations. Compared to exogenous MTBE, there are only a few studies available for endogenous MTBE. Brosch et al. [11] present a tool called the Operation Recorder, which is a semi-automatic approach to derive model transformations by recording all transformation steps. A similar approach is presented by Yun et al. [61], who also infer complex model transformations from a demonstration. Alshanqiti et al. [2] learn transformation rules from a set of examples by generalizing over pre- and postcondition graphs. Their approach has been applied to the derivation of edit operations, including negative application conditions and multi-object patterns [31]. Instead of learning a single operation, Mokaddem et al. [17] use a genetic algorithm to learn a set of refactoring rules from pairs of examples before and after the application of refactoring. The creation of candidate transformations that conform to the meta-model is done by the use of a “fragment type graph”, which allows them to grow candidate patterns which conform to the meta-model. Their algorithm optimizes a model modification and preservation score. Ghannem et al. [22] also use a genetic algorithm (i.e., NSGA-II) to learn model refactorings from a set of “bad designed” and “good designed” models. Their approach distinguishes between structural similarity and semantic similarity and tries to minimize structural and semantic similarity between the initial model and the bad designed models and to maximize the similarity between the initial and the well designed models.

All of these approaches for learning endogenous model transformations are (semi-)supervised. Either a concrete example is given (which only contains the transformation to be learned) or a set of positive and negative examples is given. In the case of Mokaddem et al.’s genetic approach, it is assumed that all transformations that can be applied are actually applied to the source models. For the meta-model used in our real-world case study, we do not have any labeled data. In general, we are not aware of any completely unsupervised approach to learn endogenous model transformations. To reduce the search space, we make use of the evolution of the models in the model repository, though. We do not directly work on the models as in the approaches above but work on structural model differences.

Furthermore, there is related work in the source code domain. Regarding one of our motivations for mining edit operations, namely to simplify differences, there are several approaches in the source code domain [66, 43]. These approaches are more comparable to the approach of semantic lifting [33], in that they aggregate or filter differences according to given patterns, but they do not learn the patterns themselves. There are also approaches to mine change patterns in source code. For example, Dagit et al. propose an approach based on the abstract syntax tree (AST) [14], and Nguyen et al. mine patterns based on a so-called fine-grained program dependence graph [46]. There is also some work that focuses on mining design patterns from source code [49, 7, 20, 16]. The idea behind these approaches, that is, learning (change) patterns from a version history, is comparable to ours. Unlike these approaches, Ockham works on a kind of abstract syntax graph which already includes domain knowledge given by the meta-model. Furthermore, we do not use a similarity metric to detect change groups or frequent changes but use an (exact) subgraph mining approach instead. In model-driven engineering, one often has some kind of identifiers for the model elements, which makes the differencing more reliable and removes the need for similarity-based differencing methods.

VIII Conclusion and Outlook

We proposed an approach, Ockham, for automatically deriving edit operations specified as in-place model transformations from model repositories, based on the idea that a meaningful edit operation will be one which provides a good compression for the model differences. Ockham uses frequent subgraph mining on a labeled graph representation of model differences to discover frequent patterns in the model differences. The patterns are then filtered and ranked based on a compression metric to get a list of recommendations for meaningful edit operations. To the best of our knowledge, Ockham is the first approach for learning domain-specific edit operations in a fully unsupervised manner, i.e., without relying on any manual intervention or input from a developer or domain expert.

We have successfully evaluated Ockham on two case studies using synthetic ground-truth EMF models and on a large-scale real-world case study in the railway domain. We find that our approach is able to extract edit operations that have actually been applied from the model differences and also discovers meaningful edit operations in a real-world setting. Overly large connected components in the differences are the main driver for the approach to fail in discovering actually applied edit operations. Performance mostly depends on the number of applied edit operations in a model difference. Our approach can be applied to models of any Domain-Specific Modeling Language for which model histories are available. New effective edit operations that are performed by the users can be learned at runtime and recommendations can be made.

For our future research, we plan to extend Ockham by a meta-heuristic search to identify the optimal set of operations. Another alternative approach which we want to study in the future is to use a clustering algorithm and then feed the clusters into the frequent subgraph mining step of our approach. This will allow us also to deal with examples where the connected components of the difference graph are huge.


  • [1] V. Acreţoaie, H. Störrle, and D. Strüber (2018) VMTL: a language for end-user model transformation. Software & Systems Modeling 17 (4), pp. 1139–1167. Cited by: §I.
  • [2] A. M. Alshanqiti, R. Heckel, and T. A. Khan (2012) Learning minimal and maximal rules from observations of graph transformations. Electronic Communication of the European Association of Software Science and Technology 47. Cited by: §VII.
  • [3] T. Arendt, E. Biermann, S. Jurack, C. Krause, and G. Taentzer (2010) Henshin: Advanced concepts and tools for in-place EMF model transformations. In International Conference on Model Driven Engineering Languages and Systems (MODELS), pp. 121–135. Cited by: §I, §III-C.
  • [4] T. Arendt and G. Taentzer (2013) A tool environment for quality assurance based on the Eclipse Modeling Framework. In International Conference on Automated Software Engineering (ASE), Vol. 20, pp. 141–184. External Links: Document, ISSN 09288910 Cited by: §I, §II.
  • [5] I. Avazpour, J. Grundy, and L. Grunske (2015) Specifying model transformations by direct manipulation using concrete visual notations and interactive recommendations. Journal of Visual Languages and Computing 28, pp. 195–211. Note: Outplace trafos…! External Links: Document, ISSN 1045926X Cited by: §I.
  • [6] I. Baki and H. Sahraoui (2016) Multi-step learning and adaptive search for learning complex model transformations from examples. ACM Transactions on Software Engineering and Methodology 25 (3), pp. 1–36. Note: Exogenous transformations. External Links: Document, ISSN 15577392 Cited by: §VII.
  • [7] Z. Balanyi and R. Ferenc (2003) Mining design patterns from C++ source code. In International Conference on Software Maintenance (ICSM), pp. 305–314. Cited by: §VII.
  • [8] A. ben Fadhel, M. Kessentini, P. Langer, and M. Wimmer (2012) Search-based detection of high-level model changes. In International Conference on Software Maintenance (ICSM), pp. 212–221. Cited by: §I, §II.
  • [9] K. Berramla, E. A. Deba, J. Wu, H. Sahraoui, and A. Benyamina (2020) Model transformation by example with statistical machine translation. In International Conference on Model-Driven Engineering and Software Development (MODELSWARD), pp. 76–83. External Links: Document, ISBN 978-989-758-400-8, ISSN 2184-4348 Cited by: §VII.
  • [10] E. Biermann, C. Ermel, and G. Taentzer (2012) Formal foundation of consistent EMF model transformations by algebraic graph transformation. Software and Systems Modeling 11 (2), pp. 227–250. External Links: Document, ISSN 16191366 Cited by: §III-A, §V-B.
  • [11] P. Brosch, P. Langer, M. Seidl, K. Wieland, M. Wimmer, G. Kappel, W. Retschitzegger, and W. Schwinger (2009) An example is worth a thousand words: composite operation modeling by-example. In International Conference on Model Driven Engineering Languages and Systems (MODELS), Vol. 5795, pp. 271–285. External Links: Document, ISBN 9783642044243, ISSN 03029743, Link Cited by: §I, §VII.
  • [12] A. Burdusel, S. Zschaler, and D. Strüber (2018) MDEOptimiser: A search based model engineering tool. In International Conference on Model Driven Engineering Languages and Systems (MODELS): Companion Proceedings, pp. 12–16. Cited by: §I.
  • [13] D. J. Cook and L. B. Holder (2006) Mining graph data. John Wiley & Sons. Cited by: §III-B.
  • [14] J. Dagit and M. J. Sottile (2013) Identifying change patterns in software history. CoRR abs/1307.1719. External Links: Link, 1307.1719 Cited by: §VII.
  • [15] S. Djoko (1994) Substructure discovery using minimum description length principle and background knowledge. Proceedings of the National Conference on Artificial Intelligence 2, pp. 1442. Cited by: §IV.
  • [16] J. Dong, Y. Zhao, and T. Peng (2009) A review of design pattern mining techniques. International Journal of Software Engineering and Knowledge Engineering 19 (06), pp. 823–855. Cited by: §VII.
  • [17] C. eddine Mokaddem, H. Sahraoui, and E. Syriani (2018) Recommending model refactoring rules from refactoring examples. International Conference on Model Driven Engineering Languages and Systems (MODELS), pp. 257–266. Note: Goes exactly in our direction but focusing on refactorings and more or less random generation of the mutation. We could use the existing diff to make the mutations. External Links: Document, ISBN 9781450349499 Cited by: §I, §I, §I, §VII.
  • [18] H. Ehrig, U. Prange, and G. Taentzer (2004) Fundamental theory for typed attributed graph transformation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3256, pp. 161–177. External Links: Document, ISSN 03029743 Cited by: §III-C.
  • [19] K. Ehrig, C. Ermel, S. Hänsgen, and G. Taentzer (2005) Generation of visual editors as Eclipse plug-ins. In International Conference on Automated Software Engineering (ASE), pp. 134–143. Cited by: §I.
  • [20] R. Ferenc, A. Beszedes, L. Fulop, and J. Lele (2005) Design pattern mining enhanced by machine learning. In International Conference on Software Maintenance (ICSM), pp. 295–304. Cited by: §VII.
  • [21] S. Getir, L. Grunske, A. van Hoorn, T. Kehrer, Y. Noller, and M. Tichy (2018) Supporting semi-automatic co-evolution of architecture and fault tree models. Journal of Systems and Software 142, pp. 115–135. Cited by: §I, §II.
  • [22] A. Ghannem, M. Kessentini, M. S. Hamdi, and G. El Boussaidi (2018) Model refactoring by example: A multi-objective search based software engineering approach. Journal of Software: Evolution and Process 30 (4), pp. 1–20. External Links: Document, ISSN 20477481 Cited by: §VII.
  • [23] P. D. Grünwald (2007) The minimum description length principle. MIT Press, Cambridge, MA, USA. Cited by: §IV.
  • [24] Á. Hegedüs, Á. Horváth, I. Ráth, M. C. Branco, and D. Varró (2011) Quick fix generation for DSMLs. In Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 17–24. Cited by: §I.
  • [25] M. Herrmannsdoerfer, S. Vermolen, and G. Wachsmuth (2010) An extensive catalog of operators for the coupled evolution of metamodels and models. In Software Language Engineering, pp. 163–182. Cited by: §I.
  • [26] K. Hölldobler, B. Rumpe, and I. Weisemöller (2015) Systematically deriving domain-specific transformation languages. In International Conference on Model Driven Engineering Languages and Systems (MODELS), pp. 136–145. Cited by: §I.
  • [27] No Magic, Inc. MagicDraw homepage. Note: https://www.nomagic.com/products/magicdraw Cited by: §II.
  • [28] C. Jiang, F. Coenen, and M. Zito (2013) A survey of frequent subgraph mining algorithms. Knowledge Engineering Review 28 (1), pp. 75–105. External Links: Document Cited by: §III-B.
  • [29] N. Kahani, M. Bagherzadeh, J. R. Cordy, J. Dingel, and D. Varró (2019) Survey and classification of model transformation tools. Software & Systems Modeling 18 (4), pp. 2361–2397. Cited by: §IV.
  • [30] G. Kappel, P. Langer, W. Retschitzegger, W. Schwinger, and M. Wimmer (2012) Model transformation by-example: a survey of the first wave. In Conceptual Modelling and Its Theoretical Foundations - Essays Dedicated to Bernhard Thalheim on the Occasion of His 60th Birthday, Vol. 7260 LNCS, pp. 197–215. Note: Good overview of MTBE. External Links: Document, ISBN 9783642282782, ISSN 03029743 Cited by: §I, §I.
  • [31] T. Kehrer, A. M. Alshanqiti, and R. Heckel (2017) Automatic inference of rule-based specifications of complex in-place model transformations. In International Conference on Model Transformations (ICMT), pp. 92–107. Note: Interesting approach that can learn even NACs and PACs but concrete isolated positive and negative examples need to be provided. External Links: Document, ISBN 9783319614724, ISSN 16113349 Cited by: §I, §I, §VI-A, §VII.
  • [32] T. Kehrer, U. Kelter, M. Ohrndorf, and T. Sollbach (2012) Understanding model evolution through semantically lifting model differences with SiLift. In International Conference on Software Maintenance (ICSM), pp. 638–641. Cited by: §I, §II.
  • [33] T. Kehrer, U. Kelter, and G. Taentzer (2011) A rule-based approach to the semantic lifting of model differences in the context of model versioning. In International Conference on Automated Software Engineering (ASE), pp. 163–172. Cited by: §I, §II, §VII.
  • [34] T. Kehrer, G. Taentzer, M. Rindt, and U. Kelter (2016) Automatically deriving the specification of model editing operations from meta-models. In International Conference on Model Transformations (ICMT), Vol. 9765, pp. 173–188. Note: Derives initial set of edit operations from the Meta-Model. External Links: Document, ISBN 9783319420639, ISSN 16113349 Cited by: §I.
  • [35] T. Kehrer (2015) Calculation and propagation of model changes based on user-level edit operations: a foundation for version and variant management in model-driven engineering. Ph.D. Thesis, University of Siegen. External Links: Link Cited by: §IV, §IV.
  • [36] N. S. Ketkar, L. B. Holder, and D. J. Cook (2005) Subdue. pp. 71–76. External Links: Document Cited by: §III-B, §IV.
  • [37] D. E. Khelladi, R. Hebig, R. Bendraou, J. Robin, and M. Gervais (2016) Detecting complex changes and refactorings during (meta)model evolution. Information Systems 62, pp. 220–241. Cited by: §I, §II.
  • [38] M. Koegel, J. Helming, and S. Seyboth (2009) Operation-based conflict detection and resolution. In ICSE Workshop on Comparison and Versioning of Software Models, pp. 43–48. Cited by: §I.
  • [39] S. Kögel, R. Groner, and M. Tichy (2016) Automatic change recommendation of models and meta models based on change histories. In Proceedings of the 10th Workshop on Models and Evolution co-located with International Conference on Model Driven Engineering Languages and Systems (MODELS), Vol. 1706, pp. 14–19. External Links: ISSN 16130073 Cited by: §I, §II.
  • [40] D. S. Kolovos, D. Di Ruscio, A. Pierantonio, and R. Paige (2009) Different Models for Model Matching: An analysis of approaches to support model differencing. In ICSE Workshop on Comparison and Versioning of Software Models, pp. 1–6. Cited by: §IV, §IV.
  • [41] D. S. Kolovos, L. M. Rose, S. B. Abid, R. F. Paige, F. A. Polack, and G. Botterweck (2010) Taming EMF and GMF using model transformation. In International Conference on Model Driven Engineering Languages and Systems (MODELS), pp. 211–225. Cited by: §I.
  • [42] P. Langer, M. Wimmer, P. Brosch, M. Herrmannsdörfer, M. Seidl, K. Wieland, and G. Kappel (2013) A posteriori operation detection in evolving software models. Journal of Systems and Software 86 (2), pp. 551–566. Cited by: §I, §II.
  • [43] M. Martinez, L. Duchien, and M. Monperrus (2013) Automatically extracting instances of code change patterns with AST analysis. In International Conference on Software Maintenance (ICSM), pp. 388–391. External Links: Link, Document Cited by: §VII.
  • [44] S. Mazanek and M. Minas (2009) Generating correctness-preserving editing operations for diagram editors. Electronic Communication of the European Association of Software Science and Technology 18. Cited by: §I.
  • [45] T. Mens and P. Van Gorp (2006) A taxonomy of model transformation. Electronic Notes in Theoretical Computer Science 152 (1-2), pp. 125–142. External Links: Document, ISSN 15710661, Link Cited by: §III-C.
  • [46] H. A. Nguyen, T. N. Nguyen, D. Dig, S. Nguyen, H. Tran, and M. Hilton (2019) Graph-based mining of in-the-wild, fine-grained, semantic code change patterns. In International Conference on Software Engineering (ICSE), J. M. Atlee, T. Bultan, and J. Whittle (Eds.), pp. 819–830. External Links: Link, Document Cited by: §VII.
  • [47] S. Nijssen and J. N. Kok (2005) The Gaston tool for frequent subgraph mining. Electronic Notes in Theoretical Computer Science 127 (1), pp. 77–87. External Links: Document, ISSN 15710661, Link Cited by: §III-B, §IV.
  • [48] M. Ohrndorf, C. Pietsch, U. Kelter, and T. Kehrer (2018) ReVision: a tool for history-based model repair recommendations. In International Conference on Software Engineering (ICSE): Companion Proceedings, Vol. 30, pp. 105–108. Cited by: §I, §II, §IV.
  • [49] M. Oruc, F. Akal, and H. Sever (2016) Detecting design patterns in object-oriented design models by using a graph mining approach. In International Conference in Software Engineering Research and Innovation (CONISOFT), pp. 115–121. External Links: Document Cited by: §VII.
  • [50] P. Pietsch, H. S. Yazdi, and U. Kelter (2011) Generating realistic test models for model processing tools. In International Conference on Automated Software Engineering (ASE), pp. 620–623. Cited by: §I.
  • [51] M. Polanyi (1958) Personal knowledge: towards a post critical philosophy. University of Chicago Press. Cited by: §I.
  • [52] A. Rodrigues Da Silva (2015) Model-driven engineering: A survey supported by the unified conceptual model. Computer Languages, Systems and Structures 43, pp. 139–155. External Links: Document, ISSN 14778424 Cited by: §I.
  • [53] L. M. Rose, M. Herrmannsdoerfer, S. Mazanek, P. V. Gorp, S. Buchwald, T. Horn, E. Kalnina, A. Koch, K. Lano, B. Schätz, and M. Wimmer (2014) Graph and model transformation tools for model migration - Empirical results from the transformation tool contest. Software & Systems Modeling 13 (1), pp. 323–359. Cited by: §I.
  • [54] H. Saada, M. Huchard, M. Liquiere, and C. Nebut (2014) Learning model transformation patterns using graph generalization. In International Conference on Concept Lattices and Their Applications, Vol. 1252, pp. 11–22. Note: They are learning graph transformations (exogenous) using frequent subgraph mining for concept extraction. I am not so sure whether they only learn concepts that are already pretty explicit in the meta-model. Mindset anyway similar. We want to discover patterns in models, which directly leads to graph pattern mining and this to FSM. External Links: ISSN 16130073 Cited by: §VII.
  • [55] M. Schmidt and T. Gloetzner (2008) Constructing difference tools for models using the SiDiff framework. In International Conference on Software Engineering (ICSE): Companion Proceedings, pp. 947–948. Cited by: §IV.
  • [56] G. Schröder, M. Thiele, and W. Lehner (2011) Setting goals and choosing metrics for recommender system evaluations. In UCERSTI2@Conference on Recommender Systems (RecSys), Vol. 23, pp. 53. Cited by: §IV, §V-B.
  • [57] S. Sendall and W. Kozaczynski (2003) Model transformation: The heart and soul of model-driven software development. IEEE Software 20 (5), pp. 42–45. External Links: Document, ISSN 07407459 Cited by: §I.
  • [58] M. Shermer (2008) Patternicity: finding meaningful patterns in meaningless noise. Scientific American 299 (5), pp. 48. Cited by: §VI-B.
  • [59] J. Siegmund, N. Siegmund, and S. Apel (2015) Views on internal and external validity in empirical software engineering. In International Conference on Software Engineering (ICSE), Vol. 1, pp. 9–19. Cited by: §V-B.
  • [60] M. Stephan and J. R. Cordy (2013) A survey of model comparison approaches and applications. In International Conference on Model-Driven Engineering and Software Development (MODELSWARD), pp. 265–277. Cited by: §IV.
  • [61] Y. Sun, J. Gray, and J. White (2011) MT-Scribe: An end-user approach to automate software model evolution. In International Conference on Software Engineering (ICSE), Vol. 1, pp. 980–982. External Links: ISBN 9781450304450 Cited by: §I, §VII.
  • [62] G. Taentzer, A. Crema, R. Schmutzler, and C. Ermel (2007) Generating domain-specific model editors with complex editing commands. In Applications of Graph Transformations with Industrial Relevance (AGTIVE), pp. 98–103. Cited by: §I, §II.
  • [63] A. Van Deursen, E. Visser, and J. Warmer (2007) Model-driven software evolution: A research agenda. Technical Report Series TUD-SERG-2007-006. 7, pp. 33. Cited by: §I, §III-C.
  • [64] D. Varró (2006) Model transformation by example. In International Conference on Model Driven Engineering Languages and Systems (MODELS), Vol. 4199 LNCS, pp. 410–424. External Links: Document, ISBN 3540457720, ISSN 16113349, Link Cited by: §VII.
  • [65] P. Welke, F. Seiffarth, M. Kamp, and S. Wrobel (2020) HOPS: Probabilistic subtree mining for small and large graphs. In Conference on Knowledge Discovery (KDD), pp. 1275–1284. External Links: Document, ISBN 9781450379984 Cited by: §IV.
  • [66] Y. Yu, T. T. Tun, and B. Nuseibeh (2011) Specifying and detecting meaningful changes in programs. In International Conference on Automated Software Engineering (ASE), pp. 273–282. External Links: Document Cited by: §VII.