1 Introduction
Coupling [14, 31]—the number of intermodule connections in software systems—has long been identified as a software architecture quality metric for modularity [29]. Taking coupling metrics into account during development of a software system can help to increase the system’s maintainability and understandability [7], in particular for microservice architectures [24]. As a consequence, aiming for high cohesion and low coupling is accepted as a design guideline in software engineering [11].
In the literature, there exists a wide range of different approaches to measuring coupling. Usually, the coupling degree of a module (class or package) indicates the number of “connections” it has to different system modules. A “connection” between modules $A$ and $B$ can be, among others, a method call from $A$ to $B$ or an exception of type $B$ thrown by $A$. Many notions of coupling can be measured statically, based on either source code or compiled code.
Static analysis is attractive since it can be performed immediately on source code or on a compiled program. However, it has been observed [5, 12, 15] that for object-oriented software, static analysis does not suffice, as it often fails to account for effects of inheritance with polymorphism and dynamic binding. This is addressed by dynamic analysis, where monitoring logs are generated while running the software.
The results obtained by dynamic analysis depend on the workload used for the run of the system yielding the monitoring data. Hence the availability of representative workload for the system under test is crucial for dynamic analysis. As a consequence, dynamic analysis is more expensive than static analysis.
Dynamic analysis is often used to improve upon the accuracy of static coupling analysis [16]. Dynamic analysis uses monitoring data to find, e.g., all classes $B$ whose methods are called by the class $A$. In this case, the individual relationship between two classes $A$ and $B$ is qualitative: The analysis only determines whether there is a connection between $A$ and $B$, and does not take its strength (e.g., number of calls during a system’s run) into account. In contrast, a quantitative coupling measurement quantifies the strength of the connection between $A$ and $B$ by assigning it a concrete number.
The coupling metrics we consider in this paper are defined using a dependency graph. The nodes of such a graph are program modules (classes or packages). Edges between modules express call relationships. They can be labelled with weights, which are integers denoting the number of occurrences of the call represented by the edge. Depending on whether coupling metrics take these weights into account or not, we call the metrics weighted or unweighted. The main two metrics we consider are the following:

Unweighted static coupling, where an edge from $A$ to $B$ is present in the dependency graph if some method from $B$ is called from $A$ in the (source or compiled) program code,

Weighted dynamic coupling, where an edge from $A$ to $B$ is present in the graph if such a call actually occurs during the monitored run of the system, and is attributed with the number of such calls observed.
Dynamic weighted coupling measures cannot replace their static counterparts in their role of indicating, e.g., the maintainability of software projects. However, we expect dynamic weighted coupling measures to be highly relevant for software restructuring: In contrast to static coupling measures, weighted dynamic measures can reflect the runtime communication “hot spots” within a system, and therefore may be helpful in establishing performance predictions of restructuring steps. For example, method calls that happen infrequently may be replaced by a sequence of nested calls or with a network query without relevant performance impacts. Since static coupling measures are often used as a basis for restructuring decisions [11, 26], dynamic weighted coupling measures can potentially complement their static counterparts in the restructuring process. This possible application leads to the following question: Do dynamic coupling measures yield additional information beyond what we can obtain from static analysis?
Initially, we expected static and dynamic coupling degrees to be almost unrelated: A module $A$ has a high static coupling degree if there are many method calls from $A$ to methods outside of $A$ or vice versa in the program code. On the other hand, $A$ has a high dynamic weighted coupling degree if during the observed run of the system, there are many runtime method calls between $A$ and other parts of the system. Since a single occurrence of a method call in the code can be executed millions of times, or not at all, during a run of the program, static and weighted dynamic coupling degrees do not need to correlate. Thus, our initial hypothesis was that we would not observe a high correlation between static and weighted dynamic metrics.
Our main research question is: Are static coupling degrees and dynamic weighted coupling degrees statistically independent? If we observe correlation, can we quantify the correlation?
To answer these questions, we compare the two coupling measures. We use dynamically collected data to compute weighted metrics that take into account the number of function calls during the system’s run. We obtained the data from a series of four experiments. Each experiment consists of monitoring real production usage of a commercial software system (Atlassian Jira [6]) over a period of four weeks each. Our monitoring data contains more than three billion method calls. We compare the results from our dynamic analysis to computations of static coupling degrees.
Directly comparing static and weighted dynamic coupling degrees is of little value, as these are fundamentally different measurements: For instance, the absolute value of dynamic weighted degrees depends on the duration of the monitored program run, which clearly is not the case for the static measures. We therefore instead compare coupling orders, i.e., the rankings obtained by ordering all program modules by their coupling degree, using the Kendall-Tau metric [21] (see [19] for a discussion of the relationship between this metric and Spearman’s correlation). This also allows us to quantify the difference between such orders.
Our answer to the above stated research questions is that static and (weighted) dynamic coupling degrees are not statistically independent. A possible interpretation of this result is that dynamic weighted coupling degrees give additional, but related information compared to the static case. In addition to this result, we observe insightful differences between class-level and package-level analyses.
Contributions
The results and contributions of this paper are listed below. A replication package including the collected data of our experiments will soon be published on Zenodo, to allow other researchers to repeat and extend our work.

Using a unified framework, we introduce precise definitions of static and dynamic coupling measures.

To investigate our main research question, we performed four experiments involving real users of a commercial software product (the Atlassian Jira project and issue tracking tool [6]) over a period of four weeks each. The software was instrumented via the dynamic monitoring framework Kieker [20] based on AspectJ [22]. From the collected data, we computed our dynamic coupling measures. We compared the obtained results, using the Kendall-Tau metric [8], to coupling measures we obtained by static analysis.

The results show that all coupling metrics we investigate are correlated, but there are also significant differences. In particular, when considering package-level coupling, the correlation is significantly stronger than for class-level coupling. We assume the reason is that effects like polymorphism and dynamic binding often do not cross package boundaries.
Finally, we note that this paper is an extension of a previous short poster paper [30] in which a high-level overview of the research approach and the first data set are presented. The current paper extends the previous short poster paper (2 pages in length) as follows:

This paper contains an in-depth explanation of the research approach, including a precise definition of our coupling metrics.

We report on the statistical properties of the data collected during the experiments.

We report on the findings of four experiments whereas the short paper only discusses the first of our four data sets.
Paper Organization
The remainder of the paper is organized as follows: In Section 2, we discuss related work. Section 3 provides our definition of weighted dynamic coupling. In Section 4, we explain our approach to static and dynamic analysis. Section 5 then describes the setting of our experiment. The results are presented and discussed in Section 6. In Section 7, we discuss threats to validity and conclude in Section 8 with a discussion of possible future work.
2 Related Work
There is extensive literature on using coupling metrics to analyse software quality, see, e.g., Fregnan et al. [17] for an overview. Briand et al. [10] propose a repeatable analysis procedure to investigate coupling relationships. Nagappan et al. [27] show correlation between metrics and external code quality (failure prediction). They argue that no single metric provides enough information (see also Voas and Kuhn [32]), but that for each project a specific set of metrics can be found that can then be used in this project to predict failures for new or changed classes. Misra et al. [25] propose a framework for the evaluation and validation of software complexity measures. Briand and Wüst [9] study the relationship between software quality models and external qualities like reliability and maintainability. They conclude that, among others, import and export coupling appear to be useful predictors of fault-proneness. Static weighted coupling measures have been considered by Offutt et al. [28]. Allier et al. [2] compare static and unweighted dynamic metrics.
Our approach is different: We do not study correlation between software metrics and software quality, but correlation between different software metrics.
Dynamic (unweighted) metrics have been investigated in numerous papers (see, e.g., Arisholm et al. [5] as a starting point, also the surveys by Chhabra and Gupta [13] and Geetika and Singh [18]). None of these approaches considers dynamic weighted metrics, as we do.
Dynamic analysis is often used to complement static analysis. As a notable exception, Yacoub et al. [33] use weighted metrics. However, to obtain the data, they do not use runtime instrumentation, as we do, but “early-stage executable models.” They also assume a fixed number of objects during the software’s runtime.
Arisholm et al. [5] study dynamic metrics for object-oriented software. Our dynamic coupling metrics are based on their dynamic messages metric. The difference is as follows: Their metric counts only distinct messages, i.e., each method call is only counted once, even if it appears many times in a concrete run of the system. The main feature of our weighted metrics is that the number of occurrences of each call during the run of a system is counted. The dynamic messages metric from [5] corresponds to our unweighted dynamic coupling metrics (see below).
3 Dynamic, Weighted Coupling
3.1 Dependency Graphs
We performed our analyses with two different levels of granularity: on the (Java) class and package levels. In the following we use the term module for either class or package, depending on the granularity of the analysis. The output of either type of analysis (dynamic or static) is a labeled, directed graph $G$, where the nodes represent program modules (i.e., classes or packages), and the labels are integers which we refer to as weights of the edges. An edge from $A$ to $B$ with label (weight) $n_{A,B}$ denotes that the number of directed interactions between $A$ and $B$ occurring in the analysis is $n_{A,B}$.
In the case of a static analysis, this means that there are $n_{A,B}$ places in the code of $A$ where some method from $B$ is called. For dynamic analysis, this means that during the monitored run of the system, there were $n_{A,B}$ runtime invocations of methods from $B$ by methods from $A$.
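The construction of such a weighted dependency graph can be sketched in a few lines of Python. The input format (one caller/callee pair per observed call) and the name `build_dependency_graph` are assumptions made for illustration and are not taken from the tooling used in the experiments. Calls inside a single module are skipped, in line with the notion of coupling as calls between different modules.

```python
from collections import Counter

def build_dependency_graph(calls):
    """Return a weighted dependency graph as a Counter keyed by (caller, callee).

    `calls` is an iterable of (caller_module, callee_module) pairs, one pair
    per observed call (hypothetical input format).
    """
    graph = Counter()
    for caller, callee in calls:
        if caller != callee:  # assumption: calls inside one module are not coupling
            graph[(caller, callee)] += 1
    return graph

# Toy call log: A calls B twice and C once, B calls A once, C calls itself.
calls = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("C", "C")]
graph = build_dependency_graph(calls)

# Disregarding the weights leaves the plain (unweighted) dependency graph.
unweighted_edges = set(graph)
```

For a static analysis the same structure applies, with one pair per call site in the code instead of one pair per runtime invocation.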
Our graph is a weighted dependency graph, hence we call the coupling metrics we define below weighted metrics. When we disregard the weights, the graph is a plain dependency graph, i.e., a directed graph where the edges reflect function calls between the modules. We refer to metrics defined on the unweighted dependency graph (i.e., metrics that do not take the weights into account) as unweighted metrics. We study the following three conceptually different approaches to measure coupling dependency between program modules:

The first approach is static analysis, which identifies method calls by analyzing the compiled code (we used BCEL to analyze Java .class and .jar files). Here we do not take weights into account. We therefore compute our static coupling measures from an unweighted dependency graph.

Our second approach is unweighted dynamic analysis. This analysis identifies method calls between modules as they appear in an actual run of the system (the data is obtained by monitoring), but does not take the weights into account. It therefore does not distinguish between cases where a module calls another module a million times or just once. This metric is essentially the dynamic messages metric from [5].

Our third approach is weighted dynamic analysis, which differs from its unweighted counterpart only by taking the weights into account.
The distinctions between static/dynamic analyses and unweighted/weighted analyses are orthogonal choices. In particular, we omit in the present paper a weighted, static analysis, since our main motivation is the comparison of dynamic, weighted metrics with unweighted, static metrics.
3.2 Definition of Coupling Metrics
We now define the coupling measures we study. Our measures assign a coupling degree to a program module (i.e., a class or a package). We consider different ways to measure coupling, resulting from the following three orthogonal choices:

The first choice is between class-level and package-level granularity. Depending on this choice, a module is either a (Java) class or a (Java) package.

The second choice is between one of our three basic measurement approaches: static, dynamic unweighted, or dynamic weighted analysis.

The third choice is to measure import, export, or combined coupling.
To distinguish these 18 types of measurement, we use triples $\alpha = (\alpha_1, \alpha_2, \alpha_3)$, where $\alpha_1$ is c or p and indicates the granularity, $\alpha_2$ is s, u, or w and indicates the basic measurement approach, and $\alpha_3$ is i, e, or c, indicating the direction of couplings taken into account. Figure 1 illustrates these three orthogonal choices: The example triple $(p, u, i)$ denotes an analysis with package-level granularity that uses dynamic unweighted analysis and considers coupling in the import direction.
Our coupling measures can be computed from the two dependency graphs resulting from our two analyses (static and dynamic). For a module $M$ and a choice of measure $\alpha$, the coupling degree of $M$, denoted with $\alpha(M)$, is computed as follows:

We compute $G_\alpha$. This is the weighted dependency graph between classes (if $\alpha_1 = c$) or packages (if $\alpha_1 = p$) obtained by static analysis (if $\alpha_2 = s$) or dynamic analysis (if $\alpha_2 = u$ or $\alpha_2 = w$), where each weight is replaced with $1$ if the analysis is static or (dynamic) unweighted (i.e., if $\alpha_2 \in \{s, u\}$).

Then, $\alpha(M)$ is the outdegree of $M$, indegree of $M$, or sum of these, depending on whether $\alpha_3 = i$, $\alpha_3 = e$, or $\alpha_3 = c$. The in (out) degree of $M$ is the sum of the weights of its incoming (outgoing) edges in $G_\alpha$.
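The two steps above can be sketched as a small Python function. The function name, the dictionary representation of the graph, and the toy data are hypothetical. The granularity choice is assumed to be made by the caller, who passes in either a class-level or a package-level graph.

```python
def coupling_degree(graph, module, approach, direction):
    """Coupling degree of `module` in a weighted dependency graph.

    graph: dict mapping (caller, callee) -> positive integer weight
    approach: "s" (static), "u" (dynamic unweighted) or "w" (dynamic weighted)
    direction: "i" (import), "e" (export) or "c" (combined)
    """
    def weight(n):
        # Static and unweighted analyses replace every weight with 1.
        return n if approach == "w" else 1

    out_degree = sum(weight(n) for (src, _), n in graph.items() if src == module)
    in_degree = sum(weight(n) for (_, dst), n in graph.items() if dst == module)
    if direction == "i":
        return out_degree
    if direction == "e":
        return in_degree
    if direction == "c":
        return out_degree + in_degree
    raise ValueError(f"unknown direction: {direction!r}")

# Toy dynamic graph: A called B 1000 times and C once; C called A 5 times.
dynamic_graph = {("A", "B"): 1000, ("A", "C"): 1, ("C", "A"): 5}

weighted_import = coupling_degree(dynamic_graph, "A", "w", "i")    # 1000 + 1
unweighted_import = coupling_degree(dynamic_graph, "A", "u", "i")  # two outgoing edges
```

The toy graph shows why the two dynamic variants can diverge: module A has a weighted import degree of 1001 but an unweighted import degree of 2.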
4 Static and Dynamic Analysis
We perform our static analysis (using Apache BCEL [4]) on the compiled code. This also implies that some optimizations, such as the removal of dead code, have already been performed by the compiler. Therefore, our static and dynamic analyses are performed on the exact same code, without differences introduced in the compilation process. For the dynamic analysis, we use the Kieker framework [20], which allows us to register every method call. Kieker uses AspectJ’s [22] load-time weaver to instrument the analyzed software automatically at load time. In order to reduce the performance impact of monitoring, we restricted the monitoring to a subset of the system, and adjusted the static analysis accordingly.
5 Experiment Design
We analyzed the software Atlassian Jira, versions 7.3.0, 7.4.3, and 7.7.1 [6]. The system was instrumented using AspectJ technology. For each method call, we recorded the time stamp and the class names of the caller and of the callee.
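A monitoring record of this shape, together with the step from class names to package names needed for the package-level analysis, might look as follows in Python. The record layout, the field names, the time unit, and the `com.example` class names are all assumptions for illustration; this is not the record format written by Kieker.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallRecord:
    timestamp: int  # assumed unit: nanoseconds since the epoch
    caller: str     # fully qualified class name of the caller
    callee: str     # fully qualified class name of the callee

def package_of(class_name):
    """Return the package part of a fully qualified Java class name."""
    package, _, _ = class_name.rpartition(".")
    return package  # empty string for classes in the default package

record = CallRecord(
    timestamp=1486000000000000000,
    caller="com.example.web.IssueAction",
    callee="com.example.core.IssueManager",
)

# The package-level edge derived from this class-level call.
package_edge = (package_of(record.caller), package_of(record.callee))
```

Splitting at the last dot also handles nested classes such as `com.example.Outer$Inner`, since the `$` separator appears only after the package part.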
To perform our analysis with a realistic workload, we conducted four experiments with real users using a software system (Atlassian Jira [6]) in production. Jira was used by students participating in a mandatory programming course of our computer science curriculum. In the course, the students develop software using the Kanban process management method [1]. The time span of the project is four weeks, with full-time participation by the students.
We report on four experiment runs, from February and September of 2017 and 2018. Each time, the software ran for a four-week period. The collected monitoring data from each run includes the startup sequence, basic configuration such as database access, initial tasks such as user registration and setup of the Kanban boards, and day-to-day usage. No person-related data is used for our analysis. In Table 1, we list the number of method calls recorded as well as the number of users of our Jira installation in each of the four experiment runs.
Obviously, there are differences between the four runs of the software that we analyze. For example, different students took part in the course each time, the focus of the project required using different features of the Jira software in each iteration, and we also instructed them to use more features of the tool in the later iterations (this is one reason why the number of method calls per student is higher in the later runs of the experiment). Therefore, our four experiments, even though they are conducted using the same software system, give us slightly more variation in the data than running the exact same software with the exact same group of users. However, our main analysis results do not vary significantly between the different runs of the experiment, indicating that our findings are invariant under small changes of the experiment setup.
6 Experiment Results
6.0.1 Compared Measures
We compare the coupling degrees computed by these different approaches. Comparing the actual “raw” coupling degrees of a class or package across different measures does not make much sense: The weighted values depend on the length of the measurement run of the system, whereas the static values do not.
However, the absolute coupling values are usually not the most interesting results of such an analysis. For a developer, the identification of the modules with the highest coupling degree is among the most interesting results of applying a software metric. Therefore, a useful approach is to study the relationship between the orders among the modules in the different analyses: Each analysis yields an ordering of the classes or packages from the ones with the highest coupling degree to the ones with the lowest one; we call these orders coupling orders. These orders can be compared between different analyses of varying measurement durations.
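A coupling order can be derived from a table of coupling degrees with a single sort. In the sketch below, the function name and the toy degrees are hypothetical, and ties are broken by module name, which is an assumption since the text does not fix a tie-breaking rule.

```python
def coupling_order(degrees):
    """Order modules from highest to lowest coupling degree.

    `degrees` maps module name -> coupling degree. Ties are broken by module
    name (assumption made for this sketch).
    """
    return sorted(degrees, key=lambda module: (-degrees[module], module))

static_degrees = {"A": 3, "B": 1, "C": 2}
weighted_degrees = {"A": 1006, "B": 1000, "C": 6}

static_order = coupling_order(static_degrees)
weighted_order = coupling_order(weighted_degrees)
```

Although the raw values differ by orders of magnitude, both analyses rank module A first; only the positions of B and C are swapped.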
Given our coupling measure definitions, we have the following choices for a left-hand-side (LHS) and a right-hand-side (RHS) analysis:

The first choice is whether to consider class or package analyses (both the LHS and the RHS should consider the same type of module).

The second choice is which two of our three basic measurement approaches (see Section 3.1) we intend to compare: static analysis, (dynamic) unweighted analysis, and (dynamic) weighted analysis. There are three possible choices: s vs. u, s vs. w, and u vs. w.

For each combination, we consider import, export, and combined coupling.
Hence, there are $2 \cdot 3 \cdot 3 = 18$ comparisons we can perform in each of our four data sets, leading to $4 \cdot 18 = 72$ different comparisons.
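The number of comparisons can be checked by enumerating the three choices listed above. The variable names in this short Python sketch are illustrative only.

```python
from itertools import product

granularities = ["c", "p"]                             # class or package
approach_pairs = [("s", "u"), ("s", "w"), ("u", "w")]  # LHS vs. RHS approach
directions = ["i", "e", "c"]                           # import, export, combined

comparisons = list(product(granularities, approach_pairs, directions))
per_data_set = len(comparisons)
total = 4 * per_data_set  # four data sets
```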
6.0.2 Kendall-Tau distance
To study the difference between our basic measurement approaches, we compare the coupling orders of the analyses using the Kendall-Tau distance [8]: For a finite base set $S$ of size $n$, the metric compares two linear orders $<_1$ and $<_2$ on $S$. The Kendall-Tau distance $\tau(<_1, <_2)$ is the number of swaps of adjacent elements needed to obtain the order $<_1$ from $<_2$, normalized by dividing by the number of possible swaps, $n(n-1)/2$. Hence $\tau(<_1, <_2)$ is always between $0$ (if $<_1$ and $<_2$ are identical) and $1$ (if $<_1$ is the “reverse” of $<_2$). Values smaller than $0.5$ indicate that the orders are closer together than expected from two random orders, while values larger than $0.5$ indicate the opposite.
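The following Python sketch is one direct way to compute this distance. It uses the fact that the minimal number of adjacent swaps between two orders equals the number of item pairs that appear in opposite relative order. The function name is hypothetical, and the quadratic loop is chosen for readability rather than for the module counts of a large system.

```python
from itertools import combinations

def kendall_tau_distance(order_a, order_b):
    """Normalized Kendall-Tau distance between two orders of the same items.

    Counts the pairs of items that appear in opposite relative order and
    divides by the number of pairs, n * (n - 1) / 2.
    """
    if (set(order_a) != set(order_b)
            or len(order_a) != len(set(order_a))
            or len(order_b) != len(order_a)):
        raise ValueError("both orders must rank the same distinct items")
    n = len(order_a)
    if n < 2:
        return 0.0
    rank_b = {item: position for position, item in enumerate(order_b)}
    discordant = sum(
        1
        for x, y in combinations(order_a, 2)  # x precedes y in order_a
        if rank_b[x] > rank_b[y]              # but follows y in order_b
    )
    return discordant / (n * (n - 1) / 2)

identical = kendall_tau_distance(["A", "B", "C"], ["A", "B", "C"])
reverse = kendall_tau_distance(["A", "B", "C"], ["C", "B", "A"])
one_swap = kendall_tau_distance(["A", "C", "B"], ["A", "B", "C"])
```

With three modules there are three pairs, so a single swapped pair yields a distance of one third.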
6.0.3 Distance Values
To present our results, we use the following notation to specify the LHS and RHS analyses: We use a triple $(\beta_1, \beta_2, \beta_3)$, where

$\beta_1$ is c or p, expressing class or package coupling,

$\beta_2$ is s or u, expressing whether the LHS analysis is static or (dynamic) unweighted,

$\beta_3$ is u or w, expressing whether the RHS analysis is (dynamic) unweighted or (dynamic) weighted.
6.0.4 Statistical Significance
To measure statistical significance, we computed the absolute z-scores of our experiments. The smallest observed absolute z-score among all our experiments is , and all but two absolute values are above 10. As a point of reference, the likelihood corresponding to a z-score of 10 is on the order of $10^{-23}$; this is the probability of observing the amount of correlation seen in our dataset under the assumption that the compared orders are in fact independent. This indicates a huge degree of statistical significance, which is due to the large number of program units appearing in our analysis.
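For orientation, the likelihood attached to a z-score can be computed from the standard normal distribution. The sketch below assumes a two-sided test and, for the z-score itself, the usual normal approximation of Kendall's tau for independent orders without ties, with variance $2(2n+5)/(9n(n-1))$. The text does not state which variant was used, so both choices are assumptions, and the function names are hypothetical.

```python
import math

def two_sided_p_value(z):
    """Probability that a standard normal variable has absolute value >= |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_score_from_distance(distance, n):
    """z-score of a normalized Kendall-Tau distance between two orders of n items.

    Converts the distance d to the tau coefficient 1 - 2d and divides by the
    standard deviation of tau for independent orders without ties.
    """
    tau = 1.0 - 2.0 * distance
    sigma = math.sqrt(2.0 * (2 * n + 5) / (9.0 * n * (n - 1)))
    return tau / sigma

p_at_10 = two_sided_p_value(10.0)
z_example = z_score_from_distance(0.4, 1000)  # hypothetical distance and module count
```

Because the standard deviation shrinks roughly like $1/\sqrt{n}$, even a distance only moderately below $0.5$ yields a large z-score once a thousand or more modules are ranked.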
6.0.5 Discussion
The first obvious takeaway from the distance values presented in our tables is that all reported distances (and of course also the average values) are below $0.5$, many of them significantly so. This indicates that there is a significant similarity between the coupling orders of the static and the two dynamic analyses. This was not to be expected: While in small runs of a system, one could possibly conjecture that there might not be a large difference between the static and dynamic notions of coupling, this changes when we analyze longer system runs: In our longest experiment, we analyzed more than 2.4 billion method calls. The dynamic, weighted coupling degree of a class $C$ is the number of calls from or to methods from $C$ among these 2.4 billion calls, while its static, unweighted coupling degree is the number of classes $D$ such that the compiled code of the software contains a call from $C$ to $D$ or vice versa. A single method call in the code is only counted once in an unweighted analysis, but this call can be executed millions of times during the experiment, and each of these executions is counted in the weighted, dynamic coupling analysis. Therefore, it was not necessarily to be expected that we observe correlation between unweighted static and weighted dynamic coupling degrees.
However, our results suggest that all of the three types of analyses that we performed are correlated, with different degrees of significance. In particular, dynamic weighted coupling degrees seem to give additional, but not unrelated information compared to the static case.
The static coupling order is closer to the dynamic unweighted than to the dynamic weighted order in almost all cases. This was expected: In a hypothetical “complete run” of a system, and in the absence of issues resulting from object-oriented features, these measures would coincide. On the other hand, the dynamic weighted analysis is very different from the static one by design.
A very interesting observation is that in all cases except for cases involving import coupling in our first two data sets, comparing the class-level analysis to the package-level analysis for the same pair of LHS and RHS measurement approaches shows that the distance in the package case is smaller than the corresponding distance in the class case, sometimes significantly so. A possible explanation is that in the package case, the object-oriented effects that are often cited as the main reasons for performing dynamic analysis are less present, as, e.g., inheritance relationships are often between classes residing in the same package.
7 Threats to Validity
Concerning external validity, our analysis is limited by the fact that we covered only four runs, each lasting four weeks, of only one software system (Atlassian Jira). To address this threat, we plan to monitor additional software tools such as Jenkins and Tomcat (which are also used in the course). Concerning internal validity, our dynamic analysis omits some of Jira’s classes in order to maintain sufficient performance of the system. To ensure that our comparisons in Section 6.0.3 are conclusive, we only considered the classes and packages covered by both the static and dynamic analysis in the computation of the Kendall-Tau distances. Additionally, different interpretations of what is considered as coupling in the static and in the dynamic analyses are always possible. However, since our notion of coupling is rather simple (method calls between different classes), we are confident that our static and dynamic analyses in fact use the same notion of coupling. Finally, as discussed in Section 4, we examine compiled code, not source code. When performing a similar analysis on source code, the differences between the static and the dynamic analyses would likely increase, as the dynamic analysis of course also uses compiled code. However, this can also be seen as an advantage, since this allows us to focus on the differences between static code and a running system, which is the goal of this study.
8 Conclusions and Future Work
We studied three different basic measurement approaches: static coupling, unweighted dynamic coupling, and weighted dynamic coupling. We performed four runs of an experiment that allows us to compare the dynamic metrics to static coupling measurements. Our results, as discussed in Section 6.0.5, suggest that dynamic coupling metrics complement their static counterparts: Despite the large (and expected) difference, there is also a statistically significant correlation. This suggests that further study of dynamic weighted coupling and its relationship with other coupling metrics is an interesting line of research.
A key question is how the additional information given by weighted dynamic coupling measurements can be used to evaluate the architectural quality of software systems, or more generally, to assist a software engineer in her design decisions. Coupling metrics can be used as recommenders for restructuring [11], and for static coupling measures, correlation between coupling and external quality has been observed [23]. A study of the relationship between static coupling measures and changeability and code comprehension has been performed in [33]. In [3], it is argued that unweighted dynamic metrics can be used for maintenance prediction. Since dynamic weighted metrics contain additional information compared to their unweighted counterparts, it will be interesting to study whether and how this additional information can be used in these contexts.
References
 [1] Ahmad, M.O., Markkula, J., Oivo, M.: Kanban in software development: A systematic literature review. In: 2013 39th Euromicro Conference on Software Engineering and Advanced Applications. pp. 9–16 (Sep 2013). https://doi.org/10.1109/SEAA.2013.28
 [2] Allier, S., Vaucher, S., Dufour, B., Sahraoui, H.A.: Deriving coupling metrics from call graphs. In: Tenth IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2010, Timisoara, Romania, 12-13 September 2010. pp. 43–52. IEEE Computer Society (2010). https://doi.org/10.1109/SCAM.2010.25
 [3] Anuradha Chug, H.S.: Dynamic metrics are superior than static metrics in maintainability prediction : An empirical case study. In: Reliability, Infocom Technologies and Optimization (ICRITO)(Trends and Future Directions), 2015 4th International Conference on. pp. 1–6. IEEE (2015)
 [4] Apache Software Foundation: Commons BCEL: Byte code engineering library (2016), https://commons.apache.org/proper/commons-bcel/
 [5] Arisholm, E., Briand, L.C., Føyen, A.: Dynamic coupling measurement for object-oriented software. IEEE Trans. Software Eng. 30(8), 491–506 (2004). https://doi.org/10.1109/TSE.2004.41
 [6] Atlassian: JIRA project and issue tracking (2017), https://www.atlassian.com/software/jira/
 [7] Bogner, J., Wagner, S., Zimmermann, A.: Automatically measuring the maintainability of service- and microservice-based systems: a literature review. In: Proceedings of the 27th International Workshop on Software Measurement and 12th International Conference on Software Process and Product Measurement. pp. 107–115. ACM (2017)
 [8] Briand, L., Emam, K.E., Morasca, S.: On the application of measurement theory in software engineering. Empirical Software Engineering 1(1), 61–88 (Jan 1996). https://doi.org/10.1007/BF00125812
 [9] Briand, L.C., Wüst, J.: Empirical studies of quality models in object-oriented systems. Advances in Computers 56, 97–166 (2002). https://doi.org/10.1016/S0065-2458(02)80005-5
 [10] Briand, L.C., Wüst, J., Daly, J.W., Porter, D.V.: Exploring the relationships between design measures and software quality in object-oriented systems. Journal of Systems and Software 51(3), 245–273 (2000). https://doi.org/10.1016/S0164-1212(99)00102-8
 [11] Candela, I., Bavota, G., Russo, B., Oliveto, R.: Using cohesion and coupling for software remodularization: Is it enough? ACM Trans. Softw. Eng. Methodol. 25(3), 24:1–24:28 (Jun 2016). https://doi.org/10.1145/2928268
 [12] Carver, R.H., Counsell, S., Nithi, R.V.: An evaluation of the MOOD set of object-oriented software metrics. IEEE Trans. Software Eng. 24(6), 491–496 (1998). https://doi.org/10.1109/32.689404
 [13] Chhabra, J.K., Gupta, V.: A survey of dynamic software metrics. J. Comput. Sci. Technol. 25(5), 1016–1029 (2010). https://doi.org/10.1007/s11390-010-9384-3
 [14] Chidamber, S.R., Kemerer, C.F.: Towards a metrics suite for object oriented design. In: OOPSLA. pp. 197–211. ACM (1991)
 [15] Chidamber, S.R., Kemerer, C.F.: A metrics suite for object oriented design. IEEE Trans. Software Eng. 20(6), 476–493 (1994). https://doi.org/10.1109/32.295895
 [16] Cornelissen, B., Zaidman, A., van Deursen, A., Moonen, L., Koschke, R.: A systematic survey of program comprehension through dynamic analysis. IEEE Transactions on Software Engineering 35(5), 684–702 (2009)
 [17] Fregnan, E., Baum, T., Palomba, F., Bacchelli, A.: A survey on software coupling relations and tools. Information & Software Technology 107, 159–178 (2019). https://doi.org/10.1016/j.infsof.2018.11.008
 [18] Geetika, R., Singh, P.: Dynamic coupling metrics for object oriented software systems: A survey. SIGSOFT Softw. Eng. Notes 39(2), 1–8 (Mar 2014). https://doi.org/10.1145/2579281.2579296
 [19] Gilpin, A.R.: Table for conversion of Kendall’s tau to Spearman’s rho within the context of measures of magnitude of effect for meta-analysis. Educational and Psychological Measurement 53, 87–92 (03 1993). https://doi.org/10.1177/0013164493053001007
 [20] van Hoorn, A., Waller, J., Hasselbring, W.: Kieker: A framework for application performance monitoring and dynamic software analysis. In: Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering (ICPE 2012). pp. 247–248. ACM (Apr 2012). https://doi.org/10.1145/2188286.2188326
 [21] Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1/2), pp. 81–93 (1938)
 [22] Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm, J., Griswold, W.G.: An overview of AspectJ. In: ECOOP 2001, Object-Oriented Programming: 15th European Conference Budapest, Hungary, June 18–22, 2001 Proceedings. pp. 327–354. Springer, Berlin, Heidelberg (2001). https://doi.org/10.1007/3-540-45337-7_18
 [23] Kirbas, S., Caglayan, B., Hall, T., Counsell, S., Bowes, D., Sen, A., Bener, A.: The relationship between evolutionary coupling and defects in large industrial software. Journal of Software: Evolution and Process 29(4), e1842 (2017). https://doi.org/10.1002/smr.1842
 [24] Knoche, H., Hasselbring, W.: Drivers and barriers for microservice adoption – a survey among professionals in Germany. Enterprise Modelling and Information Systems Architectures (EMISAJ) – International Journal of Conceptual Modeling 14(1), 1–35 (2019). https://doi.org/10.18417/emisa.14.1
 [25] Misra, S., Akman, I., Palacios, R.C.: Framework for evaluation and validation of software complexity measures. IET Software 6(4), 323–334 (2012). https://doi.org/10.1049/iet-sen.2011.0206
 [26] Mitchell, B.S., Mancoridis, S.: Comparing the decompositions produced by software clustering algorithms using similarity measurements. In: 2001 International Conference on Software Maintenance, ICSM 2001, Florence, Italy, November 6-10, 2001. pp. 744–753. IEEE Computer Society (2001). https://doi.org/10.1109/ICSM.2001.972795
 [27] Nagappan, N., Ball, T., Zeller, A.: Mining metrics to predict component failures. In: Proceedings of the 28th international conference on Software engineering (ICSE 2006). pp. 452–461. ACM (2006). https://doi.org/10.1145/1134285.1134349
 [28] Offutt, J., Abdurazik, A., Schach, S.R.: Quantitatively measuring object-oriented couplings. Software Quality Journal 16(4), 489–512 (2008). https://doi.org/10.1007/s11219-008-9051-x
 [29] Parnas, D.L.: On the criteria to be used in decomposing systems into modules. Commun. ACM 15(12), 1053–1058 (Dec 1972). https://doi.org/10.1145/361598.361623
 [30] Schnoor, H., Hasselbring, W.: Toward measuring software coupling via weighted dynamic metrics. In: Chaudron, M., Crnkovic, I., Chechik, M., Harman, M. (eds.) Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018. pp. 342–343. ACM (2018). https://doi.org/10.1145/3183440.3195000
 [31] Stevens, W., Myers, G., Constantine, L.: Structured design. In: Yourdon, E.N. (ed.) Classics in Software Engineering, pp. 205–232. Yourdon Press, Upper Saddle River, NJ, USA (1979)
 [32] Voas, J.M., Kuhn, R.: What happened to software metrics? IEEE Computer 50(5), 88–98 (2017). https://doi.org/10.1109/MC.2017.144
 [33] Yacoub, S.M., Ammar, H.H., Robinson, T.: Dynamic metrics for object oriented designs. In: 6th IEEE International Software Metrics Symposium (METRICS 1999), 4-6 November 1999, Boca Raton, FL, USA. pp. 50–61. IEEE Computer Society (1999). https://doi.org/10.1109/METRIC.1999.809725