Interactive, Effort-Aware Library Version Harmonization

02/25/2020 ∙ by Kaifeng Huang, et al.

As a combined result of intensive dependency on third-party libraries, flexible mechanisms to declare dependencies, and an increased number of modules in a project, multiple versions of the same third-party library are directly depended on by different modules of a project. Such library version inconsistencies can increase dependency maintenance cost, or even lead to dependency conflicts when modules are inter-dependent. Although automated build tools (e.g., Maven's enforcer plugin) provide partial support to detect library version inconsistencies, they do not provide any support to harmonize inconsistent library versions. We first conduct a survey with 131 Java developers from GitHub to retrieve first-hand information about the root causes, detection methods, reasons for fixing or not fixing, fixing strategies, fixing efforts, and tool expectations on library version inconsistencies. Then, based on the insights from our survey, we propose LibHarmo, an interactive, effort-aware library version harmonization technique, to detect library version inconsistencies, interactively suggest a harmonized version with the least harmonization efforts based on library API usage analysis, and refactor build configuration files. LibHarmo is currently developed for Java Maven projects. Our experimental study on 443 highly-starred Java Maven projects from GitHub indicates that i) LibHarmo identifies 621 library version inconsistencies covering 152 (34.3%) of the projects, and ii) on average, 1 and 2 of the 24 called library APIs are respectively deleted and changed in the harmonized version, so that 1 and 12 library API calls are affected, respectively due to the deleted and changed library APIs in the harmonized version. 5 library version inconsistencies have been confirmed, and 1 of them has already been harmonized by developers.

1. Introduction

With the increased diversity and complexity of modern systems, modular development (Schlosser and Wagner, 2004) has become a common practice to encourage reuse, improve maintainability, and provide efficient ways for large teams of developers to collaborate (Humble and Farley, 2010). Accordingly, automated build tools (e.g., Maven) provide mechanisms (e.g., the aggregation mechanism in Maven (35)) to support multi-module projects for ease of management and build. Despite the benefits that multi-module projects bring to software development, one of their drawbacks is sophisticated dependency management (colloquially termed “dependency hell” (Jang, 2006)), exacerbated by the increased number of modules and the intensive dependency on third-party libraries. In this paper, we focus on dependency management in Maven projects, as Maven has dominated the build tool market for many years (Paraschiv, 2018).

Problem. It is quite common that different modules of a project directly depend on the same third-party libraries. Maven provides flexible mechanisms for child modules to either inherit third-party library dependencies from parent modules (e.g., the inheritance mechanism (35)) or declare their own third-party library dependencies. Besides, Maven allows the version of a third-party library dependency to be explicitly hard-coded or implicitly referenced from a property, which can be declared in parent modules. Therefore, library version inconsistency can easily arise in practice; i.e., multiple versions of the same third-party library are directly depended on by different modules of a project. Even if the same version of a third-party library is directly depended on by different modules, the versions can be declared separately instead of referencing a common property. We refer to this as library version false consistency, as it is likely to turn into library version inconsistency after an incomplete library version update (e.g., a developer updates the version in only one of the modules). Intuitively, library version inconsistency can increase dependency maintenance cost in the long run, or even lead to dependency conflicts (Wang et al., 2018) when modules are inter-dependent.

For example, issue HADOOP-6800 (29), reported to Apache Hadoop, stated that “multiple versions of the same library JAR are being pulled in …. Dependent subprojects use different versions. E.g. Common depends on Avro 1.3.2 while MapReduce depends on 1.3.0. Since MapReduce depends on Common, this has the potential to cause a problem at runtime”. This issue was prioritized as a blocker and was resolved in 30 days. Developers found other library version inconsistencies, and finally harmonized the inconsistent versions of the libraries avro, commons-logging, commons-logging-api and jets3t across the modules Common, MapReduce and HDFS.

Maven’s enforcer plugin uses a dependency convergence rule to detect multiple versions of the same third-party library along the transitive dependency graph; i.e., if a module has two dependencies, A and B, and both depend on the same dependency, C, this rule fails the build if A depends on a different version of C than B does. Hence, this rule cannot detect library version inconsistencies across modules that are not inter-dependent, and it does not provide any support to harmonize inconsistent library versions. As project developers have no direct control over the inconsistent library versions in transitive dependencies, we only consider direct dependencies across modules.
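To make the difference in scope concrete, the following is a minimal Python sketch, not the enforcer plugin's actual implementation. It assumes each module's transitive dependencies are already resolved into (library, version) pairs; the function names and the module names `web` and `batch` are hypothetical. The same duplicate-version test flags C within one module, but only a cross-module check sees avro differing between two modules that do not depend on each other.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (library, version)


def libs_with_multiple_versions(pairs: Iterable[Pair]) -> Set[str]:
    """Return the libraries that occur with more than one version in `pairs`."""
    versions: Dict[str, Set[str]] = defaultdict(set)
    for lib, ver in pairs:
        versions[lib].add(ver)
    return {lib for lib, vs in versions.items() if len(vs) > 1}


def convergence_check(module_graph: Dict[str, List[Pair]]) -> Set[str]:
    """Within ONE module: each direct dependency maps to the pairs it pulls in transitively."""
    return libs_with_multiple_versions(
        pair for pulled in module_graph.values() for pair in pulled
    )


def cross_module_check(direct_deps: Dict[str, Dict[str, str]]) -> Set[str]:
    """ACROSS modules: each module maps to its direct {library: version} dependencies."""
    return libs_with_multiple_versions(
        pair for deps in direct_deps.values() for pair in deps.items()
    )


# A module depending on A and B, which pull in different versions of C.
print(convergence_check({"A": [("C", "1.0")], "B": [("C", "2.0")]}))  # {'C'}
# Module "web" on its own converges ...
print(convergence_check({"avro": [("avro", "1.3.2")]}))  # set()
# ... but compared with the unrelated module "batch", avro is inconsistent.
print(cross_module_check({"web": {"avro": "1.3.2"}, "batch": {"avro": "1.3.0"}}))  # {'avro'}
```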

Approach. To better address the problem, e.g., by realizing practical solutions that are acceptable to developers, it is important to first understand developers’ practices on library version inconsistencies. Therefore, we conduct a survey with 131 Java developers from GitHub to retrieve first-hand information about the root causes, detection methods, reasons for fixing or not fixing, fixing strategies, fixing efforts, and tool expectations on library version inconsistencies. 90.8% of participants experienced library version inconsistency, and 69.4% consider it a problem in project maintenance. Our survey suggests several insights, e.g., tools are needed to proactively locate and harmonize inconsistent library versions, and such tools need to interact with developers and provide API-level harmonization efforts.

Then, inspired by the insights from our developer survey, we propose LibHarmo, the first interactive, effort-aware technique to harmonize inconsistent library versions in Java Maven projects. LibHarmo works in three steps. First, it identifies library version inconsistencies by analyzing build configuration files (i.e., POM files). Second, for each library version inconsistency, it suggests a harmonized version with the least harmonization efforts (e.g., the number of calls to library APIs that are deleted or changed in the harmonized version) based on library API usage analysis and interaction with developers. Finally, if developers decide to harmonize, it refactors POM files and suggests replacement library APIs for some deleted library APIs based on API documentation.

We have run LibHarmo against 443 highly-starred Java Maven projects from GitHub. Our experimental results indicate that i) LibHarmo detects 621 library version inconsistencies, which cover 152 (34.3%) of the projects, and ii) on average, 1 and 2 of the 24 called library APIs are respectively deleted and changed in the harmonized version, affecting 1 and 12 library API calls in total. Moreover, 5 library version inconsistencies have been confirmed, and 1 of them has been harmonized by developers.

Contributions. This paper makes the following contributions.

  • We conducted the first survey with 131 Java developers from GitHub to retrieve first-hand information about the practices and tool expectations on library version inconsistencies.

  • We proposed the first interactive, effort-aware library version harmonization technique, LibHarmo, based on our survey insights.

  • We evaluated LibHarmo on 443 highly-starred Java Maven projects from GitHub, and found 621 library version inconsistencies. 5 of them have been confirmed with 1 being harmonized.

2. Developer Survey

Our online survey targets developers who participated in the development of Java Maven multi-module projects. Therefore, we selected Java Maven multi-module projects from GitHub and required them to have more than 200 stars to ensure project popularity, which yielded 443 projects. From these projects, we collected 5,316 developers whose profile pages listed a valid email address. We sent an email to each of the 5,316 developers to explain the library version inconsistency problem and kindly asked them to participate in our online questionnaire survey (the questions are shown in Table 1, and the complete questionnaire with options is available at (1)). We promised that their participation would remain confidential, and that all analysis and reporting would be based on aggregated responses.

Our survey consists of 14 questions, covering the following seven aspects, to learn about their professional background, practices and tool expectations on library version inconsistencies.

Q1 How many years of Java programming experience do you have?
Q2 How many modules in a Java project did you participate in?
Q3 Have you ever encountered library version inconsistency?
Q4 Is library version inconsistency a problem during project maintenance?
Q5 What are the root causes of library version inconsistencies?
Q6 How did you detect library version inconsistencies?
Q7 What are the reasons of not fixing library version inconsistencies?
Q8 What are the reasons of fixing library version inconsistencies?
Q9 Which version do you use as the harmonized version to fix library version inconsistencies?
Q10 How do you fix library version inconsistencies?
Q11 How much time do you spend in fixing library version inconsistencies?
Q12 Which part of it is most time-consuming in fixing library version inconsistencies?
Q13 Is an automatic library version harmonization tool useful for library management?
Q14 Which features would be useful for an automatic library version harmonization tool?
Table 1. Survey Questions

Professional Background (Q1–Q4). In response to the invitation emails, 131 developers finished the questionnaire within seven days (i.e., a participation rate of 2.5%). Of all participants, 44.3% have more than 10 years of Java programming experience, 25.2% have 5 to 10 years, and 30.5% have less than 5 years. 47.3% participated in the development of more than 10 modules in one project, 23.7% participated in 5 to 10 modules, and 29.0% participated in fewer than 5 modules. 90.8% of participants experienced library version inconsistency, and 69.4% consider it a problem in project maintenance. Overall, the participants have relatively good experience in modular development as well as in handling library version inconsistencies.

Root Causes (Q5). Unawareness of the same library in other modules (67.1%) and backward incompatibility issues in library versions (65.8%) were named as the major root causes of library version inconsistencies. Different development schedules across modules (46.1%), unawareness of the library version inconsistency problem (31.6%), and not regarding library version inconsistency as a problem (23.7%) are further root causes. Other minor root causes (14.5%) include bad dependency management hygiene, unawareness of new library versions, difficulty in using Maven, etc.

Detection Methods (Q6). When asked how library version inconsistencies were detected or manifested, participants most often reported bugs due to conflicting library versions (Wang et al., 2018) (72.4%) as the manifestation, followed by bugs due to library API behavior changes (47.4%). Manual investigation of module POM files (46.1%) is the main detection method, followed by communication with developers of other modules (14.5%) and adoption of Maven’s enforcer plugin (10.5%).

Reasons for Fixing or not Fixing (Q7–Q8). The participants reported four main reasons for not fixing: heavy fixing efforts due to backward incompatibility issues (45.3%), heavy fixing efforts due to intensive library API dependency (38.7%), fixing difficulty due to different development schedules in different modules (36.0%), and no serious consequence having occurred (30.7%). 6.6% emphasized that they always chose to fix. On the other hand, there are three main reasons for fixing: avoiding great maintenance efforts in the long run (68.4%), ensuring consistent library API behaviors across modules (63.2%), and serious consequences having occurred (e.g., bugs) (55.3%).

Fixing Strategies (Q9–Q10). When harmonizing inconsistent library versions, 77.6% chose, among the versions newer than all currently declared versions, the one with the least harmonization efforts, while 29.0% chose the currently declared version with the least harmonization efforts. Besides, 61.8% harmonized the versions in all of the affected modules, while 38.2% only harmonized the versions in some of the affected modules.

Fixing Efforts (Q11–Q12). 50.0% spent hours in fixing library version inconsistencies, 32.9% even spent days, and only 11.8% spent minutes. Besides, locating all inconsistent library versions (56.7%), determining the harmonized version (49.3%), and refactoring the source code (48.0%) are the most time-consuming steps in fixing. Other time-consuming steps include refactoring the POM files (32.0%) and verifying the fix through regression testing (6.7%).

Tool Expectations (Q13–Q14). 45.6% thought an automated library version harmonization tool would be useful, while 14.0% thought it would not be useful, mostly because they had already adopted Maven’s enforcer plugin. 46.5% thought it would depend on how well the tool is integrated into the build process, how automated it is, etc. With respect to the most useful features of such a tool, detecting all library version inconsistencies (75.9%) and suggesting the harmonized version (71.4%) rank highest, followed by reporting detailed API-level fixing efforts (49.1%) and refactoring the POM files (42.0%). Surprisingly, refactoring the source code (25.0%) is considered less useful than all the previous features.

Figure 1. An Overview of LibHarmo

Insights. From our survey results, we derive several insights. I1: tools are needed to help developers proactively locate and harmonize inconsistent library versions, as library version inconsistencies are mostly detected manually, or found passively after serious consequences. I2: developers should interact with such tools to determine where and whether to harmonize, as library version inconsistencies span multiple modules that have different development schedules, and might not be fixed due to heavy harmonization efforts. I3: such tools need to provide developers with API-level harmonization efforts, as API backward incompatibility, API dependency intensity, and API behavior consistency are key factors for developers to determine whether to harmonize. I4: such tools need to be integrated into the build process for ease of adoption.

3. Methodology

Based on the insights I1, I2 and I3 from our developer survey, we propose the first interactive, effort-aware technique, named LibHarmo, to assist developers in harmonizing inconsistent library versions (and falsely consistent library versions). As shown in Fig. 1, it takes a Java Maven project repository as input and interactively works with developers in three steps, i.e., detecting inconsistency (Sec. 3.1), suggesting harmonized version (Sec. 3.2), and refactoring POMs and suggesting APIs (Sec. 3.3). LibHarmo also relies on a library database (Sec. 3.4) to provide JAR files and documentation. Currently, LibHarmo is a prototype; it is not yet integrated into the build process and thus does not satisfy insight I4.

3.1. Detecting Inconsistency

The first step of LibHarmo is composed of three sub-steps: it first generates the POM inheritance graph, then analyzes the inheritance relations to resolve library dependencies in each POM, and finally identifies library version inconsistencies and false consistencies.

Generating POM Inheritance Graph. Maven provides the inheritance mechanism (35) to inherit elements (e.g., dependency) from a parent POM. It does not support multiple inheritance; however, it indirectly supports the concept through the import scope (34). Maven also does not allow cyclic inheritance. Therefore, the inheritance relations among the POMs in a project form a directed acyclic graph. We define such a POM inheritance graph as a 2-tuple $G = \langle \mathcal{P}, \mathcal{I} \rangle$, where $\mathcal{P}$ denotes all the POMs in a project, and $\mathcal{I} \subseteq \mathcal{P} \times \mathcal{P}$ denotes the inheritance relations among the POMs in $\mathcal{P}$. Each inheritance relation is denoted as a 2-tuple $\langle p_i, p_j \rangle \in \mathcal{I}$, where $p_i, p_j \in \mathcal{P}$, and $p_i$ inherits $p_j$ (i.e., $p_j$ is the parent POM of $p_i$).

To construct $G$ of a project, LibHarmo recursively scans its repository to collect all the local POMs and puts them into $\mathcal{P}$. Then, for each POM $p_i \in \mathcal{P}$, LibHarmo parses it to locate its parent POMs based on the inheritance mechanism and the import scope; i.e., LibHarmo parses the parent section and the dependencyManagement section. For each located parent POM $p_j$, an inheritance relation $\langle p_i, p_j \rangle$ is generated and put into $\mathcal{I}$. As $p_j$ can be a remote POM, LibHarmo crawls it from the Maven repository and puts it into $\mathcal{P}$. $G$ is constructed after all the local and remote POMs in $\mathcal{P}$ are parsed.
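The construction above can be sketched as a worklist over parsed POMs. This is a minimal Python sketch under assumptions, not LibHarmo's code: POMs are pre-parsed into plain dicts with hypothetical keys `parent` and `dependencyManagement` (entries carrying `coords`, `type` and `scope`), POMs are identified by short names instead of groupId:artifactId:version, and the hypothetical `fetch_remote` callback stands in for crawling a remote POM from the Maven repository. The usage encodes the graph of Fig. 2 as described in Examples 3.1 to 3.3.

```python
from typing import Callable, Dict, List, Set, Tuple

Pom = dict  # hypothetical parsed-POM shape (see the lead-in)


def parent_poms(pom: Pom) -> List[str]:
    """Parents via <parent> and via pom-typed, import-scoped <dependencyManagement> entries."""
    parents = [pom["parent"]] if pom.get("parent") else []
    for dep in pom.get("dependencyManagement", []):
        if dep.get("type") == "pom" and dep.get("scope") == "import":
            parents.append(dep["coords"])
    return parents


def build_pom_graph(local_poms: Dict[str, Pom],
                    fetch_remote: Callable[[str], Pom]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (P, I); each edge (child, parent) means that child inherits parent."""
    poms: Dict[str, Pom] = dict(local_poms)  # P, grows as remote POMs are crawled
    edges: Set[Tuple[str, str]] = set()      # I
    worklist = list(local_poms)
    done: Set[str] = set()
    while worklist:
        name = worklist.pop()
        if name in done:
            continue
        done.add(name)
        for parent in parent_poms(poms[name]):
            edges.add((name, parent))
            if parent not in poms:  # remote POM: crawl it and parse it as well
                poms[parent] = fetch_remote(parent)
                worklist.append(parent)
    return set(poms), edges


# Fig. 2: B inherits A (parent section) and R (import scope); C -> A, D -> C, E -> D.
local = {
    "A": {},
    "B": {"parent": "A",
          "dependencyManagement": [{"coords": "R", "type": "pom", "scope": "import"}]},
    "C": {"parent": "A"},
    "D": {"parent": "C"},
    "E": {"parent": "D"},
}
P, I = build_pom_graph(local, fetch_remote=lambda name: {})  # stub: R has no parents
print(sorted(I))
```

Tracking parsed POMs in `done` also keeps the loop finite on malformed input, even though Maven itself rejects cyclic inheritance.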

Figure 2. An Example of POM Inheritance Graph
Example 3.1.

Fig. 2 presents a generated POM inheritance graph, where the nodes represent POMs, the arrows represent inheritance relations, and the dotted lines link to excerpts from POMs. Here A, B, C, D and E are local POMs, and R is a remote POM. B has two parent POMs, A and R. In particular, B inherits A by declaring the groupId, artifactId and version of A in the parent section (Line 1–5 in B). B inherits R by declaring the groupId, artifactId and version of R in a dependency whose type is pom and whose scope is import in the dependencyManagement section (Line 7–17 in B).

Resolving Library Dependencies. We first introduce Maven’s dependency declaration mechanisms before diving into the details. The dependencies section contains the library dependencies that a POM declares to use, and such library dependencies are automatically inherited by child POMs, whereas the dependencyManagement section contains the library dependencies that a POM declares to manage, and such library dependencies are used/inherited only when they are explicitly declared, without a version, in the dependencies section of that POM or of a descendant POM. Moreover, the version of a library dependency can be explicitly declared by a hard-coded value or implicitly declared by referencing a property. A property can be overridden by declaring the same property with a different value.

Example 3.2.

In Fig. 2, B declares two library dependencies B wants to use, and the versions are hard-coded (Line 20–29 in B). C declares one library dependency C wants to use (Line 10–16 in C); and C also declares one library dependency C wants to manage (Line 1–9 in C), and the version references a property, guava.version, which is declared in Line 5–7 in A. D automatically inherits the library dependency in Line 10–16 in C; and D also inherits the managed library dependency in Line 1–9 in C by explicitly declaring it in Line 1–6 in D. E inherits from D the two library dependencies D inherits from C.

Based on the dependency declaration mechanisms, all the library dependencies of a POM can be resolved based on the resolved library dependencies of its ancestor POMs. To ease the detection and harmonization of inconsistencies and false consistencies, we first define a library dependency as a 6-tuple $d = \langle l, v, r, p, p_v, p_r \rangle$, where $l$ denotes a library, uniquely identified by its groupId (i.e., the organization $l$ belongs to) and artifactId (i.e., the name of $l$); $v$ denotes the resolved version number of $l$; $r$ denotes the property that the version of $l$ references, which is null when the version of $l$ is hard-coded; $p$ denotes the POM that owns $d$ either by declaration or inheritance; $p_v$ denotes the POM that declares the version of $l$; and $p_r$ denotes the POM that declares $r$, which is null when $r$ is null.

For each POM $p \in \mathcal{P}$, we resolve $p$’s library dependencies that are declared in $p$ or inherited from ancestors of $p$. To this end, LibHarmo performs a breadth-first search on $G$ to visit $p$ and $p$’s ancestors while following Maven’s “nearest definition wins” and “first declaration wins” strategies (34). For each visited POM, we parse each library dependency in the dependencies section to create a $d$ and add it to $\mathcal{D}_p$, and analyze the properties and the dependencyManagement section to resolve the unresolved versions of library dependencies in $\mathcal{D}_p$. Finally, we obtain all library dependencies $\mathcal{D} = \bigcup_{p \in \mathcal{P}} \mathcal{D}_p$.

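Visited POM | $d_1$ (guava) | $d_2$ (commons-io)
E | (none) | (none)
D | ⟨guava, _, _, E, _, _⟩ | (none)
C | ⟨guava, _, guava.version, E, C, _⟩ | ⟨commons-io, 2.5, null, E, C, null⟩
A | ⟨guava, 16.0.1, guava.version, E, C, A⟩ | ⟨commons-io, 2.5, null, E, C, null⟩
(_ marks a field that is not yet resolved)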
Table 2. An Example of Resolving Library Dependencies
Example 3.3.

Table 2 presents the process of resolving library dependencies for E in Fig. 2 along its inheritance hierarchy. At E, as E does not declare any library dependency, no library dependency is created. Next, at E’s parent D, guava is declared but its version is not. Hence, $d_1$ is created with $l$ and $p$ set to guava and E. Next, at D’s parent C, $d_1$’s version is declared by referencing a property. Thus, $d_1$’s $r$ and $p_v$ are set to guava.version and C. Meanwhile, C declares commons-io and hard-codes its version. Thus, $d_2$ is created as ⟨commons-io, 2.5, null, E, C, null⟩. Finally, at C’s parent A, the property guava.version is declared, and thus $d_1$’s $v$ and $p_r$ are set to 16.0.1 and A.
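The resolution procedure and Example 3.3 can be reproduced with a short Python sketch. It is an illustration under assumptions rather than LibHarmo's code: POMs are dicts with hypothetical keys `dependencies` and `dependencyManagement` (lists of (library, version-or-None) pairs) and `properties`; each section keeps the nearest declaration found during the breadth-first walk; and versions are resolved after the walk rather than during it, so that a property redefined in a nearer POM overrides one declared further up. The usage encodes the POMs on E's inheritance path in Fig. 2 and yields the final row of Table 2.

```python
import re
from collections import deque
from typing import Dict, List, NamedTuple, Optional, Tuple

PROP_REF = re.compile(r"^\$\{([^}]+)\}$")  # a version written as ${name}


class LibDep(NamedTuple):
    l: str              # library (groupId:artifactId in practice)
    v: Optional[str]    # resolved version, None if unresolved
    r: Optional[str]    # referenced property, None if hard-coded
    p: str              # POM that owns the dependency
    p_v: Optional[str]  # POM that declares the version
    p_r: Optional[str]  # POM that declares r, None if r is None


def bfs_order(pom: str, parents: Dict[str, List[str]]) -> List[str]:
    """`pom` first, then its ancestors in breadth-first order (nearest first)."""
    order, seen, queue = [], {pom}, deque([pom])
    while queue:
        cur = queue.popleft()
        order.append(cur)
        for par in parents.get(cur, []):
            if par not in seen:
                seen.add(par)
                queue.append(par)
    return order


def resolve(pom: str, parents: Dict[str, List[str]], poms: Dict[str, dict]) -> List[LibDep]:
    used: Dict[str, Tuple[Optional[str], str]] = {}
    managed: Dict[str, Tuple[str, str]] = {}
    props: Dict[str, Tuple[str, str]] = {}
    for cur in bfs_order(pom, parents):
        body = poms.get(cur, {})
        # setdefault keeps the first (nearest, then first-declared) definition
        for lib, ver in body.get("dependencies", []):
            used.setdefault(lib, (ver, cur))
        for lib, ver in body.get("dependencyManagement", []):
            managed.setdefault(lib, (ver, cur))
        for name, value in body.get("properties", {}).items():
            props.setdefault(name, (value, cur))

    result = []
    for lib, (ver, p_v) in used.items():
        if ver is None:  # version left to dependencyManagement
            ver, p_v = managed.get(lib, (None, None))
        r = p_r = None
        match = PROP_REF.match(ver) if ver else None
        if match:  # version references a property
            r = match.group(1)
            ver, p_r = props.get(r, (None, None))
        result.append(LibDep(lib, ver, r, pom, p_v, p_r))
    return result


poms = {
    "A": {"properties": {"guava.version": "16.0.1"}},
    "C": {"dependencyManagement": [("guava", "${guava.version}")],
          "dependencies": [("commons-io", "2.5")]},
    "D": {"dependencies": [("guava", None)]},
    "E": {},
}
parents = {"E": ["D"], "D": ["C"], "C": ["A"]}
for dep in resolve("E", parents, poms):
    print(dep)
```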

Identifying Inconsistencies and False Consistencies. As we do not have direct control over remote POMs, we remove from $\mathcal{D}$ the library dependencies whose $p$ is a remote POM. Note, however, that the library dependencies of local POMs may still be inherited from remote POMs. To detect library version inconsistencies and false consistencies, we first identify the libraries in $\mathcal{D}$, i.e., $\mathcal{L} = \{d.l \mid d \in \mathcal{D}\}$. Then, for each $l \in \mathcal{L}$, we find all its library dependencies $\mathcal{D}_l = \{d \in \mathcal{D} \mid d.l = l\}$. Finally, we determine the consistency of $\mathcal{D}_l$ by classifying it into the following four types.

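  • Inconsistency (IC). $\mathcal{D}_l$ belongs to the type of inconsistency if the library dependencies in $\mathcal{D}_l$ do not have the same version; i.e., $\mathcal{D}_l$ satisfies $\exists\, d_i, d_j \in \mathcal{D}_l : d_i.v \neq d_j.v$.

  • True Consistency (TC). $\mathcal{D}_l$ belongs to the type of true consistency if all the library dependencies in $\mathcal{D}_l$ have the same version by referencing one property; i.e., $|\mathcal{D}_l| > 1$ and $\mathcal{D}_l$ satisfies $\forall\, d_i, d_j \in \mathcal{D}_l : d_i.r = d_j.r \neq \text{null} \wedge d_i.p_r = d_j.p_r$.

  • False Consistency (FC). $\mathcal{D}_l$ belongs to the type of false consistency if all the library dependencies in $\mathcal{D}_l$ have the same version but do not reference one property (i.e., the version is resolved by referencing different properties or by hard-coding); i.e., $|\mathcal{D}_l| > 1$, $\forall\, d_i, d_j \in \mathcal{D}_l : d_i.v = d_j.v$, and $\exists\, d_i, d_j \in \mathcal{D}_l : \neg(d_i.r = d_j.r \neq \text{null} \wedge d_i.p_r = d_j.p_r)$.

  • Single Library (SL). $\mathcal{D}_l$ belongs to the type of single library if there is only one library dependency in $\mathcal{D}_l$ (i.e., $|\mathcal{D}_l| = 1$).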
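The four types can be checked mechanically once the resolved dependencies are grouped by library. The following minimal Python sketch follows the definitions above; the field names mirror the 6-tuple ⟨l, v, r, p, p_v, p_r⟩ introduced earlier. Because Fig. 3 is not reproduced here, the guava version hard-coded in B (19.0) is a made-up value chosen only to differ from 16.0.1.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Dep(NamedTuple):
    l: str              # library
    v: str              # resolved version
    r: Optional[str]    # referenced property, None if hard-coded
    p: str              # owning POM
    p_v: str            # POM declaring the version
    p_r: Optional[str]  # POM declaring r


def classify(deps: List[Dep]) -> str:
    """Classify the dependencies of ONE library as SL, IC, TC or FC."""
    if len(deps) == 1:
        return "SL"
    if len({d.v for d in deps}) > 1:
        return "IC"
    refs = {(d.r, d.p_r) for d in deps}
    if len(refs) == 1 and next(iter(refs))[0] is not None:
        return "TC"  # every version references the same declared property
    return "FC"      # same version, but hard-coded or via different properties


def classify_all(all_deps: Iterable[Dep]) -> Dict[str, str]:
    by_lib: Dict[str, List[Dep]] = defaultdict(list)
    for d in all_deps:
        by_lib[d.l].append(d)
    return {lib: classify(ds) for lib, ds in by_lib.items()}


deps = [
    Dep("guava", "19.0", None, "B", "B", None),  # hypothetical hard-coded version
    Dep("guava", "16.0.1", "guava.version", "D", "C", "A"),
    Dep("guava", "16.0.1", "guava.version", "E", "C", "A"),
    Dep("commons-io", "2.5", None, "B", "B", None),
    Dep("commons-io", "2.5", None, "C", "C", None),
    Dep("commons-io", "2.5", None, "D", "C", None),
    Dep("commons-io", "2.5", None, "E", "C", None),
]
print(classify_all(deps))  # {'guava': 'IC', 'commons-io': 'FC'}
```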

Figure 3. The Resolved Library Dependencies of Fig. 2
Example 3.4.

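Fig. 3 presents all the resolved library dependencies of Fig. 2, which involve two libraries, guava and commons-io. Hence, we have $\mathcal{D}_{\text{guava}}$ and $\mathcal{D}_{\text{commons-io}}$. $\mathcal{D}_{\text{guava}}$ belongs to IC, and $\mathcal{D}_{\text{commons-io}}$ belongs to FC.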

3.2. Suggesting Harmonized Version

For a false consistency, the same version is adopted across different library dependencies, and it is likely to turn into an inconsistency after an incomplete library version update (e.g., a developer only updates the version of some of the library dependencies). Hence, it also needs to be harmonized into a true consistency (see Sec. 3.3). Here we directly suggest the currently used version as the harmonized version to reduce the harmonization efforts. On the other hand, for an inconsistency $\mathcal{D}_l$, we first analyze the harmonization efforts at the library API level, and then interactively suggest a harmonized version with the least efforts.

Analyzing Harmonization Efforts. Basically, we measure the harmonization efforts in terms of the number of calls to library APIs that are deleted or changed in the harmonized version, because deleted library APIs may cause program crashes, while changed library APIs may introduce breaking behavior changes. Hence, for each $d \in \mathcal{D}_l$, LibHarmo first applies JavaParser (Smith et al., 2017) to the src folder that shares its path prefix with $d.p$, together with the JAR files from our library database (see Sec. 3.4), to locate API calls to $l$. Thus, we obtain a set of called library APIs $\mathcal{A}_d$ and a set of library API calls $\mathcal{C}_d$.

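Then, LibHarmo determines the candidate library versions $\mathcal{V}_l$ for harmonization from our library database, which contains all the released versions of $l$. Here, we compute $\mathcal{V}_l$ as the versions that are no older than the highest version in $\mathcal{D}_l$, i.e., $\mathcal{V}_l = \{v_c \mid v_c \geq \max_{d \in \mathcal{D}_l} d.v\}$, as our survey suggests that developers tend to use newer versions. Next, for each candidate version $v_c \in \mathcal{V}_l$, LibHarmo locates the called library APIs in $\mathcal{A}_d$ that are deleted or changed in $v_c$. Here, a library API is deleted in $v_c$ if there is no library API with the same fully qualified name in $v_c$. A library API is changed in $v_c$ if its fully qualified name is unchanged but the body code of the library API, or the code of the methods in its static call graph, is changed. LibHarmo uses java-callgraph (Gousios) to extract the static call graph. Thus, we decompose $\mathcal{A}_d$ into three sets $\mathcal{A}^{del}_{d,v_c}$, $\mathcal{A}^{chg}_{d,v_c}$ and $\mathcal{A}^{unc}_{d,v_c}$, respectively representing the called library APIs in $\mathcal{A}_d$ that are deleted, changed and unchanged in $v_c$. Correspondingly, we decompose $\mathcal{C}_d$ into three sets $\mathcal{C}^{del}_{d,v_c}$, $\mathcal{C}^{chg}_{d,v_c}$ and $\mathcal{C}^{unc}_{d,v_c}$, respectively representing the calls to the library APIs in $\mathcal{A}^{del}_{d,v_c}$, $\mathcal{A}^{chg}_{d,v_c}$ and $\mathcal{A}^{unc}_{d,v_c}$ (i.e., the calls to the deleted, changed and unchanged library APIs). Therefore, the efforts to harmonize $d$ to the version $v_c$ can be characterized as a 6-tuple, i.e., $e_{d,v_c} = \langle \mathcal{A}^{del}_{d,v_c}, \mathcal{A}^{chg}_{d,v_c}, \mathcal{A}^{unc}_{d,v_c}, \mathcal{C}^{del}_{d,v_c}, \mathcal{C}^{chg}_{d,v_c}, \mathcal{C}^{unc}_{d,v_c} \rangle$.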
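The candidate selection and the deleted/changed/unchanged split can be sketched in a few lines of Python. This is a sketch under stated assumptions: versions are compared as purely numeric dotted tuples (Maven's real ordering also handles qualifiers such as `-rc1` or `-jre`); the hypothetical `old_fp`/`new_fp` maps assign each API's fully qualified name a precomputed fingerprint of its body plus the bodies in its static call graph, standing in for the JavaParser and java-callgraph analysis; and call sites are plain strings. All API names and version lists in the usage are made up.

```python
from typing import Dict, List, Set, Tuple

Call = Tuple[str, str]  # (called API's fully qualified name, call site)


def version_key(version: str) -> Tuple[int, ...]:
    # Assumption: purely numeric dotted versions such as "16.0.1".
    return tuple(int(part) for part in version.split("."))


def candidate_versions(declared: List[str], released: List[str]) -> List[str]:
    """Released versions no older than the highest currently declared version."""
    highest = version_key(max(declared, key=version_key))
    return sorted((v for v in released if version_key(v) >= highest), key=version_key)


def decompose(called: Set[str], calls: List[Call],
              old_fp: Dict[str, str], new_fp: Dict[str, str]):
    """Split the called APIs and their calls into deleted / changed / unchanged."""
    deleted = {a for a in called if a not in new_fp}
    changed = {a for a in called - deleted if new_fp[a] != old_fp.get(a)}
    unchanged = called - deleted - changed

    def calls_to(apis: Set[str]) -> List[Call]:
        return [c for c in calls if c[0] in apis]

    # The 6-tuple of efforts: three API sets, then the three matching call lists.
    return (deleted, changed, unchanged,
            calls_to(deleted), calls_to(changed), calls_to(unchanged))


print(candidate_versions(["16.0.1", "19.0"], ["15.0", "16.0.1", "19.0", "20.0", "23.0"]))
# ['19.0', '20.0', '23.0']
old = {"x.Y.f()": "h1", "x.Y.g()": "h2", "x.Y.h()": "h3"}
new = {"x.Y.g()": "h2-changed", "x.Y.h()": "h3"}  # f() deleted, g() changed
calls = [("x.Y.f()", "Main.java:10"), ("x.Y.g()", "Main.java:12"),
         ("x.Y.g()", "Util.java:5"), ("x.Y.h()", "Util.java:9")]
effort = decompose(set(old), calls, old, new)
print([len(part) for part in effort])  # [1, 1, 1, 1, 2, 1]
```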

Interactively Recommending Harmonized Version. As revealed by our survey (see Sec. 2), developers may choose not to harmonize all inconsistent library dependencies for various reasons (e.g., different development schedules, or heavy efforts due to API dependency intensity or backward incompatibility). Thus, LibHarmo is designed to interact with developers such that 1) developers are provided with detailed library API-level harmonization efforts $e_{d,v_c}$ for each library dependency $d \in \mathcal{D}_l$ to be harmonized into each candidate version $v_c \in \mathcal{V}_l$; 2) developers have the flexibility to decide which of the library dependencies $\mathcal{D}' \subseteq \mathcal{D}_l$ need to be harmonized; and 3) developers are provided with a ranked list of candidate versions based on flexible combinations of the components of $e_{d,v_c}$ (e.g., the default ranking is based on the summation of $|\mathcal{C}^{del}_{d,v_c}|$ and $|\mathcal{C}^{chg}_{d,v_c}|$ over all library dependencies in $\mathcal{D}'$), so that they can choose the harmonized version with the least harmonization efforts they consider acceptable.

To ease the determination of $\mathcal{D}'$, we first decompose $\mathcal{D}_l$ according to $p_v$; i.e., the library dependencies whose version is declared in the same POM $q$ are grouped into $\mathcal{D}_l^{q} = \{d \in \mathcal{D}_l \mid d.p_v = q\}$, which actually belongs to the type of true consistency (or single library) and should be harmonized together to keep the consistency. For example, $\mathcal{D}_{\text{guava}}$ in Example 3.4 can be decomposed into $\mathcal{D}_{\text{guava}}^{B}$ and $\mathcal{D}_{\text{guava}}^{C}$. Based on the decomposition, we allow developers to determine which groups need to be harmonized.
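The grouping by version-declaring POM and the default ranking described above can be sketched as follows. This minimal Python sketch assumes the per-dependency efforts have already been reduced to (deleted calls, changed calls) counts keyed by (owning POM, candidate version); the function names are hypothetical and the effort numbers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_version_pom(deps: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """deps: (owning POM p, version-declaring POM p_v) pairs of one library."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for owner, p_v in deps:
        groups[p_v].append(owner)
    return dict(groups)


def rank_versions(selected: List[str], candidates: List[str],
                  efforts: Dict[Tuple[str, str], Tuple[int, int]]) -> List[str]:
    """Default ranking: total deleted + changed API calls over the selected dependencies."""
    def total(version: str) -> int:
        return sum(sum(efforts[(owner, version)]) for owner in selected)
    return sorted(candidates, key=total)  # stable sort: ties keep candidate order


# guava in Example 3.4: B hard-codes its version; D and E get theirs from C.
groups = group_by_version_pom([("B", "B"), ("D", "C"), ("E", "C")])
print(groups)  # {'B': ['B'], 'C': ['D', 'E']}

efforts = {("B", "19.0"): (0, 0), ("B", "23.0"): (1, 3),   # made-up numbers
           ("D", "19.0"): (0, 2), ("D", "23.0"): (1, 1),
           ("E", "19.0"): (0, 2), ("E", "23.0"): (0, 0)}
print(rank_versions(["B", "D", "E"], ["19.0", "23.0"], efforts))  # ['19.0', '23.0']
print(rank_versions(groups["C"], ["19.0", "23.0"], efforts))      # ['23.0', '19.0']
```

The second ranking shows why letting developers pick the groups matters: harmonizing only the group declared in C favors a different version than harmonizing all three dependencies.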

3.3. Refactoring POMs and Suggesting APIs

The last step of LibHarmo is to provide support to carry out the harmonization on POMs and source code. LibHarmo can automatically refactor POMs based on the library dependencies $\mathcal{D}' \subseteq \mathcal{D}_l$ that developers choose from an inconsistency $\mathcal{D}_l$ and the harmonized version $v^*$ that developers choose. The POM refactoring is exactly the same for false consistencies. Besides, LibHarmo provides conservative support for library API adaptation; i.e., it only suggests replacement library APIs for some of the deleted library APIs based on information extracted from API documentation.

Refactoring POMs. The goal of our harmonization is to make $\mathcal{D}'$ a true consistency; i.e., all the library dependencies in $\mathcal{D}'$ need to have their version reference a property of value $v^*$. To this end, LibHarmo first locates the POMs that declare the version of the library dependencies in $\mathcal{D}'$; i.e., $\mathcal{P}_v = \{d.p_v \mid d \in \mathcal{D}'\}$. On one hand, the lowest common ancestor of the POMs in $\mathcal{P}_v$ on the POM inheritance graph $G$ is the POM where LibHarmo newly declares a property of value $v^*$. On the other hand, $\mathcal{P}_v$ contains the POMs where LibHarmo changes the (implicit or explicit) version declaration of $l$ to a reference to the newly declared property. Occasionally, the lowest common ancestor could be a remote POM over which we have no direct control, or $G$ contains several sub-graphs that are not connected. In such cases, LibHarmo finds several lowest common ancestors, each of which is the lowest common ancestor of some POMs in $\mathcal{P}_v$, and then applies the same refactoring process to each of them.
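Locating where the new property goes amounts to a lowest-common-ancestor query on a DAG. The following minimal Python sketch treats every POM as its own ancestor, keeps only common ancestors accepted by a hypothetical `is_local` predicate, and returns an empty set when no local common ancestor exists; the fallback of splitting the POMs into several groups, each with its own lowest common ancestor, is not implemented here. The usage encodes Fig. 2.

```python
from typing import Callable, Dict, Iterable, List, Set


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """`node` together with every POM it (transitively) inherits from."""
    seen, stack = {node}, [node]
    while stack:
        cur = stack.pop()
        for par in parents.get(cur, []):
            if par not in seen:
                seen.add(par)
                stack.append(par)
    return seen


def lowest_common_ancestors(nodes: Iterable[str], parents: Dict[str, List[str]],
                            is_local: Callable[[str], bool]) -> Set[str]:
    """Local common ancestors of `nodes` that are not a proper ancestor of another one."""
    anc = [ancestors(n, parents) for n in nodes]
    common = {c for c in set.intersection(*anc) if is_local(c)}
    return {c for c in common
            if not any(o != c and c in ancestors(o, parents) for o in common)}


parents = {"B": ["A", "R"], "C": ["A"], "D": ["C"], "E": ["D"]}  # Fig. 2
local = lambda name: name != "R"                                  # R is remote
print(lowest_common_ancestors({"B", "C"}, parents, local))  # {'A'}
print(lowest_common_ancestors({"D", "E"}, parents, local))  # {'D'}
```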

Finally, LibHarmo checks whether the properties referenced in $\mathcal{D}'$ before refactoring are still referenced by other library dependencies in $\mathcal{D}$. Specifically, for each library dependency $d \in \mathcal{D}'$ that declares its version by referencing a property, LibHarmo extracts the 2-tuple $\langle d.r, d.p_r \rangle$ and checks whether there exists a library dependency $d_o \in \mathcal{D} \setminus \mathcal{D}'$ such that $\langle d_o.r, d_o.p_r \rangle = \langle d.r, d.p_r \rangle$. If so, $d.r$ is still referenced by other library dependencies and is kept; otherwise, LibHarmo deletes $d.r$ from $d.p_r$.
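The property clean-up check reduces to a set difference over (property, declaring POM) pairs. In this minimal Python sketch, each library dependency is represented only by its (r, p_r) pair, with (None, None) for a hard-coded version; the function name is hypothetical. The usage mirrors Example 3.5, where guava.version in A is no longer referenced once guava is harmonized.

```python
from typing import Iterable, Optional, Set, Tuple

PropRef = Tuple[Optional[str], Optional[str]]  # (r, p_r) of one library dependency


def properties_to_delete(harmonized: Iterable[PropRef],
                         others: Iterable[PropRef]) -> Set[Tuple[str, str]]:
    """Properties referenced by the harmonized dependencies but by no other dependency."""
    still_used = {ref for ref in others if ref[0] is not None}
    return {ref for ref in harmonized if ref[0] is not None and ref not in still_used}


# Harmonized guava dependencies (B hard-coded; D and E via guava.version in A);
# the remaining commons-io dependencies are all hard-coded.
guava_refs = [(None, None), ("guava.version", "A"), ("guava.version", "A")]
print(properties_to_delete(guava_refs, [(None, None)]))  # {('guava.version', 'A')}
```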

Figure 4. An Example of Refactoring POMs
Example 3.5.

Given $\mathcal{D}_{\text{guava}}$ in Example 3.4 and the harmonized version 23.0, $\mathcal{P}_v$ is computed as $\{B, C\}$, and their lowest common ancestor is A. Hence, a property guava.new.version is declared at Line 6 in A in Fig. 4, and B and C respectively change the version declaration at Line 28 in B and at Line 6 in C to reference the property guava.new.version. Moreover, as the previous property guava.version is no longer referenced by other library dependencies, it is deleted from A. Similarly, for the false consistency $\mathcal{D}_{\text{commons-io}}$ in Example 3.4, $\mathcal{P}_v$ is computed as $\{B, C\}$. Hence, a property commons-io.version is declared in A, the lowest common ancestor of B and C; and the version declarations at Line 23 in B and at Line 14 in C are changed to reference commons-io.version.

Suggesting APIs. After POMs are refactored, the source code also needs to be adapted to the harmonized version. Library API adaptation has been widely investigated (Chow and Notkin, 1996; Balaban et al., 2005; Henkel and Diwan, 2005; Xing and Stroulia, 2007; Dagenais and Robillard, 2009, 2011; Schäfer et al., 2008; Nguyen et al., 2010; Fazzini et al., 2019; Wu et al., 2010), and empirical studies (Cossette and Walker, 2012; Wu et al., 2015) have shown that such approaches achieve an average accuracy of 20%. Besides, our survey indicates that refactoring the source code is, surprisingly, a less useful feature. Based on these two observations, LibHarmo takes a conservative strategy: it only considers the library APIs that are deleted in the harmonized version and attempts to suggest their replacement library APIs. The reason is that some library APIs are deleted after being deprecated, and their replacement library APIs might be clearly documented in the deprecated page of the Javadoc in the form of “use xxx”, where xxx is the fully qualified name of a replacement library API.

Specifically, for each deleted library API $a$, LibHarmo first obtains from our library database the Javadocs of all the library releases released between the release date of the currently declared library version $d.v$ and the release date of the harmonized version $v^*$, since these releases could document the deprecation of $a$. Then, for each Javadoc, LibHarmo checks in the deprecated page whether $a$ is deprecated; if so, LibHarmo uses pattern matching to search for an occurrence of “use xxx”. If it exists, the replacement library API for $a$ is found and suggested to developers.
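The “use xxx” lookup can be approximated with a regular expression. This Python sketch is an assumption-laden illustration, not LibHarmo's actual pattern: each release's deprecated page is assumed to be already extracted into a hypothetical dict from deprecated API name to its plain-text deprecation comment, ordered from oldest to newest release, and the regex only accepts dotted names with an optional `#member` and parameter list. The deprecation text in the usage is illustrative.

```python
import re
from typing import Dict, List, Optional

# "use"/"Use" followed by a dotted (qualified) name, optional #member and parameter list
USE_PATTERN = re.compile(
    r"\b[Uu]se\s+([A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)+(?:#[\w$]+)?(?:\([^)]*\))?)"
)


def suggest_replacement(deleted_api: str,
                        deprecated_pages: List[Dict[str, str]]) -> Optional[str]:
    """Return the first "use xxx" target documented for `deleted_api`, if any."""
    for page in deprecated_pages:  # one page per release, oldest first
        comment = page.get(deleted_api)
        if comment:
            match = USE_PATTERN.search(comment)
            if match:
                return match.group(1)
    return None


pages = [
    {},  # an older release that does not deprecate the API yet
    {"com.google.common.base.Objects.toStringHelper(Object)":
     "Use com.google.common.base.MoreObjects.toStringHelper(Object) instead."},
]
print(suggest_replacement("com.google.common.base.Objects.toStringHelper(Object)", pages))
# com.google.common.base.MoreObjects.toStringHelper(Object)
```

The word boundary before “use” keeps words such as “because” from matching, and an API without a documented replacement simply yields `None`, matching the conservative strategy described above.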

3.4. Library Database

Recall that our harmonization effort analysis (see Sec. 3.2) requests from the library database the JAR files of a library version and some newer releases of the same library, and our replacement library API suggestion (see Sec. 3.3) requests the Javadocs of some library releases from the library database. Therefore, LibHarmo crawls the JAR files and Javadocs of all releases of a library from the Maven repository in a demand-driven way. Besides, LibHarmo regularly fetches new releases of the libraries in our library database.
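A demand-driven library database can be as simple as a cache in front of the Maven repository. This Python sketch relies on the standard Maven 2 repository layout used by Maven Central (groupId with dots turned into slashes, then artifactId, version, and `artifactId-version[-classifier].jar`); the `LibraryDatabase` class and its injected `fetch` function are hypothetical, no network access happens in the sketch, and LibHarmo's actual crawler may differ.

```python
from typing import Callable, Dict, Optional

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def artifact_url(group_id: str, artifact_id: str, version: str,
                 classifier: Optional[str] = None) -> str:
    """URL of a JAR (or e.g. its -javadoc JAR) in the Maven 2 repository layout."""
    path = group_id.replace(".", "/")
    suffix = f"-{classifier}" if classifier else ""
    return f"{MAVEN_CENTRAL}/{path}/{artifact_id}/{version}/{artifact_id}-{version}{suffix}.jar"


class LibraryDatabase:
    """Hypothetical demand-driven store: an artifact is fetched on its first request only."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch  # e.g. an HTTP GET; injected so the sketch stays offline
        self._cache: Dict[str, bytes] = {}

    def jar(self, group_id: str, artifact_id: str, version: str) -> bytes:
        return self._get(artifact_url(group_id, artifact_id, version))

    def javadoc(self, group_id: str, artifact_id: str, version: str) -> bytes:
        return self._get(artifact_url(group_id, artifact_id, version, "javadoc"))

    def _get(self, url: str) -> bytes:
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


print(artifact_url("com.google.guava", "guava", "23.0"))
# https://repo1.maven.org/maven2/com/google/guava/guava/23.0/guava-23.0.jar
```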

4. Evaluation

We have implemented a prototype of LibHarmo in Java and Python, totaling 14.6K lines of code, and have released the source code at our website [1]. In this section, we report our experimental results of LibHarmo on GitHub projects.

Figure 5. Overall Distribution of IC, FC, TC and SL: (a) Distribution; (b) Intersection
Figure 6. Fine-Grained Distribution of IC, FC, TC and SL: (a) Distribution; (b) Projects Having IC, FC, TC and SL

4.1. Evaluation Design

We used the same set of 443 Java Maven multi-module projects used in our survey as the dataset for our evaluation. We designed our evaluation to answer the following four research questions.


  • RQ1: What is the distribution of inconsistency, false consistency, true consistency, and single library? (Sec. 4.2)

  • RQ2: What is the severity of the detected inconsistency and false consistency? (Sec. 4.3)

  • RQ3: What are the efforts for harmonizing inconsistency? (Sec. 4.4)

  • RQ4: What is developers’ feedback about LibHarmo? (Sec. 4.5)

Specifically, we ran LibHarmo against each project to 1) detect all the inconsistencies and false consistencies, used to answer RQ1 and RQ2; 2) analyze the harmonization efforts of each inconsistency for each candidate harmonized version, used to answer RQ3; and 3) generate a report containing the previous two sets of information and send it to developers, used to answer RQ4.

4.2. Distribution Evaluation (RQ1)

We first measured the overall distribution of inconsistency, false consistency, true consistency and single library, and then measured their fine-grained distribution with respect to the modular complexity of projects (approximated as the number of POMs).

Overall Distribution. Fig. 5(a) shows the overall distribution of inconsistency (IC), false consistency (FC), true consistency (TC), and single library (SL) (see Sec. 3.1). SL accounts for 61.0%, and IC, FC and TC together account for 39.0%, which means that more than one-third of the libraries are used across multiple modules. More specifically, TC accounts for 22.6%, which is much higher than IC and FC, and covers 318 projects. This indicates that library version harmonization (via referencing a property) is already a practice adopted by many projects. Nevertheless, there are still 2,576 cases of FC, which account for 13.2% and cover 346 projects. FCs are very likely to turn into ICs if not carefully maintained, and thus increase the burden of library maintenance. There are 621 cases of IC, which account for 3.2% and cover 152 projects. These results indicate that library version inconsistencies and false consistencies are common in real-world projects.
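The four categories can be illustrated with a small classifier over one library's declarations in one project. This sketch follows our reading of the categories, since the definitions of Sec. 3.1 are not reproduced here. A declaration's property is identified by the POM that defines it plus its name, so two same-named properties defined in different POMs count as different properties. The commons-cli case study in Sec. 4.4 is an example of this. The `Declaration` type and the `classify` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Declaration:
    pom: str                                # POM file declaring the dependency
    version: str                            # resolved version string
    prop: Optional[Tuple[str, str]] = None  # (defining POM, property name) if ${...}


def classify(decls: List[Declaration]) -> str:
    """Label one library's declarations in a project as SL, IC, TC or FC."""
    if len({d.pom for d in decls}) <= 1:
        return "SL"  # the library is used in a single module only
    if len({d.version for d in decls}) > 1:
        return "IC"  # modules declare different versions
    props = {d.prop for d in decls}
    if len(props) == 1 and None not in props:
        return "TC"  # one version, shared through one property
    return "FC"      # same version, but hard-coded or spread over several properties
```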

Moreover, Fig. 5(b) reports the intersections among the projects that are affected by IC, FC and TC. Noticeably, there is a high overlap (i.e., 251 projects) between TC and FC. This indicates that while many projects adopt consistent library versions, they still leave many libraries not truly consistent. Meanwhile, the libraries in 51 projects are all truly consistent, while the libraries in 70 projects are all falsely consistent. Moreover, the overlap between FC and IC is also high (i.e., 134 projects), and most of the projects that have IC also have FC. This is potentially because FC has a high chance of turning into IC. Furthermore, 109 projects have IC, FC and TC at the same time, which indicates that the practice of using consistent library versions is not consistently followed across the whole development team of a project.
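As a quick consistency check, the project counts above combine by inclusion-exclusion into the 364 projects affected by IC or FC that are reported in the RQ1 finding. The IC and FC shares likewise add up to its 16.4%. Here $P_{\mathrm{IC}}$ and $P_{\mathrm{FC}}$ denote the sets of projects having IC and FC, notation introduced only for this check:

$$
\lvert P_{\mathrm{IC}} \cup P_{\mathrm{FC}} \rvert = \lvert P_{\mathrm{IC}} \rvert + \lvert P_{\mathrm{FC}} \rvert - \lvert P_{\mathrm{IC}} \cap P_{\mathrm{FC}} \rvert = 152 + 346 - 134 = 364,
\qquad 3.2\% + 13.2\% = 16.4\%.
$$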

Figure 7. Distribution of IC and FC across Affected POMs: (a) Distribution across the Number of Affected POMs; (b) Distribution across the Ratio of Affected POMs
Figure 8. Distribution of Distinct Versions in IC: (a) Distinct Versions; (b) Distinct Versions w.r.t. Affected POMs

Fine-Grained Distribution. The bars in Fig. 6(a) report the total number of projects whose number of POMs falls in a specific range. As we can see, nearly half (47.6%) of the projects have fewer than 10 POMs, and only 22.3% of projects have more than 30 POMs. These results indicate that most projects have moderate modular complexity. The four curves in Fig. 6(a) report the distribution of IC, FC, TC and SL as projects' modular complexity increases, while the four curves in Fig. 6(b) present the corresponding ratio of projects that have IC, FC, TC and SL. We can see that the ratio of IC slightly increases and the ratio of projects having IC greatly increases. This indicates that as projects grow more complex in modules, library maintenance becomes more complicated, and hence there is a higher chance of introducing inconsistencies. Besides, the ratio of FC and TC does not decrease, and the ratio of projects having FC and TC even increases. This indicates that although projects become more complex in modules, developers may still be willing to keep library versions consistent. In that sense, LibHarmo can help developers systematically detect inconsistencies or false consistencies as early as possible.

LibHarmo detected 621 library version inconsistencies and 2,576 false consistencies, which together account for 16.4% of the libraries and affect 364 projects. As projects grow more complex in modules, inconsistencies become more likely to be introduced.

4.3. Severity Evaluation (RQ2)

We analyzed the severity of each detected inconsistency or false consistency in terms of four indicators: 1) the number of affected POMs, 2) the ratio of affected POMs to all POMs in the project, 3) the number of distinct versions declared in the affected POMs, and 4) whether the versions of the library dependencies in the affected POMs are all explicitly declared (i.e., hard-coded), all implicitly declared (i.e., via referencing a property), or declared in a mixed way. The third indicator is only applicable to inconsistencies, as false consistencies only have one version. The higher the first three indicators, the more versions are simultaneously adopted in more POMs, and thus the more severe the inconsistency or false consistency. For the fourth indicator, we regard explicit declaration as more severe than mixed and implicit declaration, because it indicates that developers seem unaware of harmonizing library versions via a property. We report the aggregated result over all inconsistencies or false consistencies for each of the four indicators.
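The four indicators can be computed per IC or FC as in the following sketch. It assumes each declaration records its POM, its resolved version, and whether the version is hard-coded. The `Decl` type, the `severity` function, and the project size in any usage are hypothetical.

```python
from typing import Dict, List, NamedTuple, Union


class Decl(NamedTuple):
    pom: str        # affected POM
    version: str    # resolved version
    explicit: bool  # True if hard-coded, False if via ${property}


def severity(decls: List[Decl], total_poms: int) -> Dict[str, Union[int, float, str]]:
    """Compute the four severity indicators for one IC or FC."""
    affected = {d.pom for d in decls}
    styles = {d.explicit for d in decls}
    if styles == {True}:
        declaration = "EX"  # all versions hard-coded
    elif styles == {False}:
        declaration = "IM"  # all versions reference a property
    else:
        declaration = "MX"  # mixed
    return {
        "affected_poms": len(affected),
        "affected_ratio": len(affected) / total_poms,
        "distinct_versions": len({d.version for d in decls}),  # meaningful for IC only
        "declaration": declaration,
    }
```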

Affected POMs. Fig. 7(a) presents the distribution of IC and FC with respect to the number of affected POMs. 67.8% of ICs and 70.8% of FCs affect fewer than five POMs, while 21.3% of ICs and 12.9% of FCs affect more than ten POMs. On the other hand, Fig. 7(b) reports the distribution of IC and FC with respect to the ratio of affected POMs. 70.9% of ICs and 53.5% of FCs affect less than 20% of POMs, while 9.7% of ICs and 22.9% of FCs affect more than 50% of POMs. These results indicate that most ICs and FCs affect a relatively small number of POMs, but around one-tenth of ICs and one-fifth of FCs still involve a relatively large proportion of POMs.

Distinct Versions. Fig. 8(a) reports the distribution of distinct versions in inconsistencies. We can see that 81.8% of ICs only have two distinct versions, and only 3.7% of ICs have more than five distinct versions. Moreover, we generated a box plot of the affected POMs for each bar in Fig. 8(a). The result is reported in Fig. 8(b), where the arrows indicate higher outliers that we hide to make the box plots easier to read. As the number of distinct versions in ICs increases, the number of affected POMs increases. Regarding the ICs that have two distinct versions, the median number of affected POMs is around three. This indicates that most ICs are still manageable if developers want to harmonize them. Still, there are 80 outliers in the first box plot, and some of them affect around 400 POMs. We looked into these 80 outliers and found that in 72 (90.0%) of them, more than 80% of the POMs use one version while less than 20% of the POMs use the other version. More interestingly, in 58 (72.5%) outliers, one of the distinct versions is used in only one POM. One potential reason is that developers have to use a specific version in a minority of POMs to avoid heavy API backward incompatibility in them. Another potential reason is that developers are unaware of the minority of POMs that use a distinct version due to the complex POM inheritance graph.
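The outlier analysis above comes down to two checks on how the affected POMs split across versions. The sketch below assumes each IC is given as a mapping from affected POM to declared version. The function names are illustrative, and the 80% threshold is taken from the text.

```python
from collections import Counter
from typing import Dict


def version_shares(pom_versions: Dict[str, str]) -> Dict[str, float]:
    """Fraction of the affected POMs that declare each distinct version."""
    counts = Counter(pom_versions.values())
    return {v: n / len(pom_versions) for v, n in counts.items()}


def has_dominant_version(pom_versions: Dict[str, str], threshold: float = 0.8) -> bool:
    """True if one version is declared in more than `threshold` of the affected POMs."""
    return max(version_shares(pom_versions).values()) > threshold


def has_singleton_version(pom_versions: Dict[str, str]) -> bool:
    """True if some distinct version is declared in exactly one POM."""
    return 1 in Counter(pom_versions.values()).values()
```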

Figure 9. Version Declaration Distribution in IC and FC: (a) IC; (b) FC

Version Declaration. Fig. 9 shows the distribution of version declarations (i.e., explicit declaration (EX), implicit declaration (IM) and mixed declaration (MX)) for IC and FC. It turns out that 94.1% of FCs declare versions by hard-coding. This means that developers need to change all the affected POMs at the same time to keep these FCs consistent rather than turning them into ICs, which is a huge but avoidable maintenance cost. In addition, 36.1% of ICs declare versions by hard-coding. This shows that hard-coding version numbers is probably not a good practice, as it tends to introduce inconsistencies.

The remaining 63.9% of ICs include implicitly declared versions. This means that developers already have the sense to declare versions by referencing a property to reduce library maintenance cost, but ICs still exist. As revealed by our survey (see Sec. 2), there are multiple reasons for intentionally not harmonizing inconsistencies. We attempted to manually look into these cases to determine whether our detected inconsistencies were intentionally kept. However, it is very challenging to confirm the underlying reasons, as they are often related to business logic. This is also one of the reasons that we allow developers to interact with LibHarmo. Nevertheless, we did find some cases that conformed to our survey, as well as one case that was not reported in our survey: developers intentionally declare multiple properties for different versions of the same library to support different runtime environments. For example, project memcached-session-manager is a Tomcat session manager that keeps sessions in memcached or Redis for highly available, scalable and fault-tolerant web applications. It declares four properties for versions 6.0.45, 7.0.85, 8.5.29 and 9.0.6 of various Tomcat dependencies, as it works with Tomcat 6.x, 7.x, 8.x and 9.x (as explained in its README file).

67.8% of ICs and 70.8% of FCs affect fewer than five POMs. 81.8% of ICs only have two distinct versions, affecting a median of three POMs. 36.1% of ICs and 94.1% of FCs declare all versions by hard-coding. Overall, the severity of ICs and FCs is relatively low.

4.4. Efforts Evaluation (RQ3)

Figure 10. Harmonization Efforts

We analyzed the harmonization efforts of each of the 621 inconsistencies for each candidate harmonized version. We report the results for the candidate version with the least harmonization efforts, selected based on our default ranking (see Sec. 3.2). Fig. 10 shows two box plots (one for the number of APIs and the other for the number of API calls) for the deleted, changed, unchanged, and total library APIs that are called. Overall, 190 (30.6%) ICs require no harmonization effort; i.e., none of the invoked library APIs is changed in the suggested harmonized version. In the remaining 431 ICs, on average, 1 and 2 of the 24 called library APIs are respectively deleted and changed in the suggested harmonized version, affecting 1 and 12 of the 63 library API calls, as indicated by the green diamonds. These results indicate that the harmonization efforts, in terms of the number of deleted and changed APIs, seem small. However, the actual harmonization efforts depend on how the deleted or changed APIs affect the business logic. Therefore, we choose to provide developers with a detailed API-level report to help them determine where and whether to harmonize.
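The per-candidate effort counts can be sketched together with a ranking that favors the candidate with the fewest calls to deleted and changed APIs. That is the criterion cited in the Apache Tika case study of this section. The sketch is a simplified stand-in for the default ranking of Sec. 3.2, which is not reproduced here. It treats a single module. The tie-breaking rule, the input shapes, and the function names are assumptions.

```python
from typing import Dict, List, Set, Tuple


def effort(calls: Dict[str, int], deleted: Set[str], changed: Set[str]) -> Dict[str, int]:
    """API-level effort of moving one module to one candidate version.

    `calls` maps each called library API to its number of call sites.
    """
    return {
        "deleted_apis": len(calls.keys() & deleted),
        "changed_apis": len(calls.keys() & changed),
        "deleted_calls": sum(n for api, n in calls.items() if api in deleted),
        "changed_calls": sum(n for api, n in calls.items() if api in changed),
    }


def rank_candidates(calls: Dict[str, int],
                    candidates: Dict[str, Tuple[Set[str], Set[str]]]) -> List[str]:
    """Order candidate versions from least to most effort.

    `candidates` maps a version to its (deleted, changed) API sets relative to
    the version the module currently uses. Primary key: calls to deleted and
    changed APIs; tie-break (assumed): number of deleted and changed APIs called.
    """
    def key(item: Tuple[str, Tuple[Set[str], Set[str]]]) -> Tuple[int, int]:
        e = effort(calls, *item[1])
        return (e["deleted_calls"] + e["changed_calls"],
                e["deleted_apis"] + e["changed_apis"])

    return [version for version, _ in sorted(candidates.items(), key=key)]
```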

Replacement APIs for Deleted APIs. In the suggested harmonized versions for the 621 inconsistencies, a total of 1,798 library APIs are deleted. Our documentation-based replacement API suggestion method (see Sec. 3.3) successfully finds the replacement APIs for 207 (11.5%) of them. Such low coverage indicates the potential effort of refactoring source code. On the other hand, our survey indicates that automatic source code refactoring is a less useful feature. This is also why we decided to design an API suggestion method that only suggests replacement APIs with 100% accuracy.

Case Studies. Here we choose a popular project, Apache Tika, to demonstrate what our report tells developers. Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. It is a project of the Apache Software Foundation. LibHarmo identifies four inconsistencies and six false consistencies in it.

One of the inconsistencies is about library commons-cli. It involves three modules: tika-server, tika-batch and tika-eval. The first module explicitly declares commons-cli with version 1.2. The latter two reference two properties with the same name, cli.version, each defined in its own POM file, and both of them declare version 1.4. Therefore, an inconsistency occurs. Version 1.4 is suggested as the harmonized version as it leads to the smallest number of calls to deleted and changed library APIs, indicating that updating commons-cli from 1.2 to 1.4 in tika-server resolves the inconsistency. It turns out that tika-server calls 5 library APIs, with 33 library API calls in total, and all 5 of these APIs are changed in version 1.4. Finally, the lowest common ancestor POM file, tika-parent/pom.xml, is located, and a new property specifying version 1.4 is declared there. The two old properties declared separately in tika-batch and tika-eval can be removed safely because no other library dependency references them. In fact, according to a comment in the POM file, developers had already noticed this inconsistency and considered moving the property declaration into tika-parent/pom.xml (as suggested by LibHarmo) to harmonize it. However, the harmonization was postponed due to migration efforts.
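Locating the lowest common ancestor POM is simple because Maven's parent inheritance is single-parent, which makes the POM inheritance graph a forest. A minimal sketch follows. It assumes the graph is given as a child-to-parent mapping, and the function names are hypothetical. The module layout in the checks is a simplified illustration, not Tika's actual structure.

```python
from typing import Dict, List, Optional


def ancestors(pom: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return `pom` followed by its parent chain, nearest first, up to the root.

    Maven rejects cyclic parent references, so the loop terminates.
    """
    chain = []
    current: Optional[str] = pom
    while current is not None:
        chain.append(current)
        current = parent.get(current)  # a missing entry means no parent
    return chain


def lowest_common_ancestor(poms: List[str], parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Nearest POM that every POM in `poms` inherits from (or is)."""
    first, *rest = poms
    others = [set(ancestors(p, parent)) for p in rest]
    for candidate in ancestors(first, parent):  # walk upward from the first POM
        if all(candidate in chain for chain in others):
            return candidate
    return None  # the POMs live in unrelated hierarchies
```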

One of the false consistencies is about json-simple. It involves two modules: tika-parsers and tika-translate. In these two modules, json-simple is explicitly declared with version 1.1.1. Similar to the previous inconsistency example, LibHarmo searches the POM inheritance graph and locates tika-parent/pom.xml as a common parent POM file. It then declares a property of version 1.1.1 there and refactors the json-simple dependency declarations in tika-parsers and tika-translate to implicitly reference the new property.
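The refactoring in this example can be sketched with Python's standard `xml.etree.ElementTree`: declare a property in the common parent POM, then make each child reference it. For brevity, the sketch assumes POMs without the usual `http://maven.apache.org/POM/4.0.0` namespace and ignores formatting preservation. The property name `json-simple.version` and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def declare_property(parent_pom: ET.Element, name: str, value: str) -> None:
    """Add (or overwrite) <properties><name>value</name></properties> in a POM."""
    props = parent_pom.find("properties")
    if props is None:
        props = ET.SubElement(parent_pom, "properties")
    # Match children by tag directly, since property names usually contain dots.
    prop = next((p for p in props if p.tag == name), None)
    if prop is None:
        prop = ET.SubElement(props, name)
    prop.text = value


def reference_property(pom: ET.Element, group_id: str, artifact_id: str, name: str) -> int:
    """Make matching <version> elements reference ${name}; return how many changed.

    iter() visits every <dependency> in the tree, including those under
    <dependencyManagement> or plugin declarations.
    """
    rewritten = 0
    for dep in pom.iter("dependency"):
        if dep.findtext("groupId") == group_id and dep.findtext("artifactId") == artifact_id:
            version = dep.find("version")
            if version is not None:  # no <version>: managed elsewhere, leave it alone
                version.text = "${" + name + "}"
                rewritten += 1
    return rewritten
```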

In 190 (30.6%) ICs, none of the called library APIs is changed in the suggested harmonized version. In the remaining 431 ICs, on average, 1 and 2 of the 24 called library APIs are respectively deleted and changed, affecting 1 and 12 library API calls. Overall, the harmonization efforts seem relatively small, but the true efforts are often application-specific.

4.5. Developer Feedback (RQ4)

To understand developers’ feedback about LibHarmo, we targeted the 621 inconsistencies in 152 projects and sent our generated reports to the developers of these projects. Within one week, 16 developers replied. Eight of them explicitly commented that version inconsistency is certainly a problem for library maintainers and that our tool and report are useful; e.g., “the problem you’re describing is very real, and I have encountered it myself in my day-to-day job several times”, “keep up the good work with your harmonization tool. It definitely sounds interesting!”, and “the cool reports here helped me find one real issue, thanks!”. Five of them did not comment on our tool but only discussed the inconsistencies, and three of them were no longer Java developers or no longer in charge of the projects.

Moreover, 4 developers confirmed the inconsistencies but explained that they were intentionally kept due to API compatibility, and 1 developer confirmed and quickly fixed the inconsistency. As we had crawled the project repositories several months before sending our reports, 4 developers asked us to re-generate the report for their current repositories, and we are still waiting for their further feedback. The others are still under discussion.

Interestingly, a developer from Hadoop confirmed that adopting consistent library versions is one of their common practices, but “people neglect to do this; when that’s found we will pull the explicit version declaration out and reference from hadoop-project; adding the import there if not already present. Therefore any duplicate declaration of a dependency with its own <version> field in any module other than hadoop-project is an error. Your dependency graphs are helpful here”. It is also worth mentioning that 4 developers commented that they also cared about inconsistencies in transitive dependencies, but also said that “it is also very hard to fix, since the source code is not owned by me”. This is why we only focus on direct dependencies.

Half of the developers who responded thought that our tool and report are useful. Five inconsistencies have been confirmed, and one of them has been harmonized.

4.6. Discussions

Threats to Survey. First, we chose an online survey with GitHub developers instead of a face-to-face interview study with industrial developers, because it is difficult to recruit industrial developers for interviews at a reasonable cost, and an online study allows us to recruit a relatively large number of developers. Second, we decided not to offer compensation and instead kindly asked participants to take the survey voluntarily. As a result, we expected that the GitHub developers who participated were genuinely interested in library version inconsistencies and well motivated. This could in turn improve the quality of our survey by avoiding cases where participants only want the compensation and answer haphazardly.

Threats to Evaluation. First, as we have not integrated our tool into the build process, it is not feasible for us to empirically evaluate the soundness of our POM refactoring on a large set of diverse projects. However, we refactor POMs in a non-invasive way that does not change the inheritance relationships among POMs, so we believe our POM refactoring is sound. Second, for the same reason, we generated reports about inconsistencies and sent them to the relevant developers to obtain their feedback, instead of letting developers directly use our tool on their projects. While this may not yield first-hand information from developers, it relieves developers of the burden of installing our tool and lets them focus only on the results. We believe this helps us obtain more feedback.

Limitations. First, due to the well-known limitations of static analysis, our generated API call graphs could be unsound (e.g., due to reflection), which affects the precision of our API-level harmonization efforts analysis. We will investigate a combination of static and dynamic analysis to make call graph generation more precise. Second, we currently only target Java Maven projects, but there are several other automated build tools such as Gradle and Ant. We currently choose Maven because 1) Maven is the most widely used build tool for Java projects, and 2) many Maven projects have been developed for many years, and these long-history projects may benefit most from our tool. Given the positive feedback from developers, we plan to develop corresponding tools for other automated build tools.

5. Related Work

In this section, we review the related work in four areas: library analysis, API evolution, API adaptation, and library empirical studies.

5.1. Library Analysis

Patra et al. (Patra et al., 2018) analyzed JavaScript library conflicts caused by the lack of namespaces in JavaScript, and proposed ConflictJS to first use dynamic analysis to identify potential conflicts and then use targeted test synthesis to validate them. Wang et al. (Wang et al., 2018) investigated the manifestation and fixing patterns of dependency conflicts in Java, and designed Decca to detect dependency conflicts and assess their severity via static analysis. Wang et al. (Wang et al., 2019) also proposed Riddle to generate crashing stack traces for detected dependency conflicts. Such dependency conflicts are one of the bad consequences of inconsistent library versions.

Cadariu et al. (Cadariu et al., 2015) proposed an alerting tool to notify developers about Java library dependencies with security vulnerabilities. Mirhosseini and Parnin (Mirhosseini and Parnin, 2017) compared the use of pull requests and badges to notify developers of outdated npm packages. These approaches only detect the inclusion of vulnerable libraries. To further determine whether the vulnerable library code is in the execution path of a project, Plate et al. (Plate et al., 2015) applied dynamic analysis to check whether the vulnerable methods were executed by a project, and Ponta et al. (Ponta et al., 2018) extended it by combining dynamic analysis with static analysis. It would be interesting to also consider security vulnerabilities as another factor when recommending harmonized versions in LibHarmo.

Bloemen et al. (Bloemen et al., 2014) analyzed the evolution of the Gentoo package dependency graph, while Kikas et al. (Kikas et al., 2017) and Decan et al. (Decan et al., 2019) compared the evolution of dependency graphs in different ecosystems. Kikas et al. (Kikas et al., 2017) and Decan et al. (Decan et al., 2018b) also investigated the impact of security vulnerabilities on the dependency graph. Zimmermann et al. (Zimmermann et al., 2019) further modeled maintainers and vulnerabilities into the dependency graph of the npm ecosystem, and systematically analyzed the security risks that maintainers and vulnerabilities pose to packages. LibHarmo can be extended to support library version inconsistency analysis on the ecosystem-level dependency graph.

To the best of our knowledge, no previous work has systematically investigated library version inconsistency.

5.2. API Evolution

A large body of studies has focused on API evolution, analyzing how developers react to API evolution (Robbes et al., 2012; Sawant et al., 2016; Hora et al., 2015; McDonnell et al., 2013; Bogart et al., 2016), how APIs are changed and used (Wu et al., 2016), how API stability is measured (Raemaekers et al., 2012), how API stability affects Android apps’ success (Linares-Vásquez et al., 2013), how refactoring influences API breaking (Dig and Johnson, 2006; Kim et al., 2011; Kula et al., 2018b), how and why developers break APIs (Jezek et al., 2015; Xavier et al., 2017b; Brito et al., 2018b), how API breaking impacts client programs (Xavier et al., 2017a; Raemaekers et al., 2017), etc. Moreover, several advances have been made to detect API breaking. Previous work mostly uses theorem proving (McCamant and Ernst, 2003, 2004; Lahiri et al., 2012; Godlin and Strichman, 2013; Felsing et al., 2014) or symbolic execution (Trostanetski et al., 2017; Mora et al., 2018), but has scalability issues when detecting breaking APIs in real-life programs. Recently, testing techniques have been used to detect breaking APIs. Gyori et al. (Gyori et al., 2018) relied on regression tests, while Soares et al. (Soares et al., 2010) generated new tests to detect behavior changes in refactored APIs. Mezzetti et al. (Mezzetti et al., 2018) and Møller and Torp (Møller and Torp, 2019) targeted Node.js libraries, and used model-based testing to detect type-related breaking (i.e., changes to API signatures). Similarly, Brito et al. (Brito et al., 2018a) used heuristics to statically detect type-related changes in Java libraries. However, detecting behavior changes when API signatures are unchanged but API bodies are changed remains an open problem. We will extend such approaches to improve the precision of our effort analysis.

5.3. API Adaptation

A large number of advances have been made to adapt client programs to API evolution according to change rules. Change rules can be manually written by developers (Chow and Notkin, 1996; Balaban et al., 2005), automatically recorded from developers (Henkel and Diwan, 2005), derived through API similarity matching (Xing and Stroulia, 2007), mined from API usage changes in libraries themselves (Dagenais and Robillard, 2009, 2011) as well as in client programs (Schäfer et al., 2008; Nguyen et al., 2010; Fazzini et al., 2019), and extracted by a combination of some of these methods (Wu et al., 2010). Several empirical studies have also been conducted to investigate the effectiveness of these methods (Cossette and Walker, 2012; Wu et al., 2015). They found that these methods only achieved an average accuracy of 20%, but could still help developers. Currently, we only recommend change rules that are extracted from documentation and hence are absolutely correct, and we will integrate these methods to recommend undocumented change rules.

5.4. Library Empirical Studies

A large body of studies has focused on characterizing the usage and update practices of libraries in different ecosystems, e.g., the usage trend and popularity of libraries and APIs (Bauer et al., 2012; Bauer and Heinemann, 2012; Mileva et al., 2009; Kula et al., 2017; Mileva et al., 2010; Hora and Valente, 2015; Lämmel et al., 2011; De Roover et al., 2013; Qiu et al., 2016; Wittern et al., 2016; Abdalkareem et al., 2017; Li et al., 2016), the practice of updating library versions (Kula et al., 2018a; Bavota et al., 2013), the latency of updating library versions (Kula et al., 2015; Cox et al., 2015; Lauinger et al., 2017; Decan et al., 2018a), and the reasons for updating or not updating library versions (Bavota et al., 2013, 2015; Derr et al., 2017; Kula et al., 2018a). To the best of our knowledge, we are the first to systematically study library version inconsistency in real-life projects through a survey with GitHub developers.

6. Conclusions

In this paper, we have conducted a survey with 131 Java developers from GitHub to collect first-hand information about the root causes, detection methods, reasons for fixing or not fixing, fixing strategies, fixing efforts, and tool expectations on library version inconsistencies. Our survey suggests several insights, e.g., tools are needed to proactively locate and harmonize inconsistent library versions, and such tools need to interact with developers and provide API-level harmonization efforts. Based on these insights, we have developed LibHarmo, the first interactive, effort-aware technique to harmonize inconsistent library versions in Java Maven projects. We have evaluated LibHarmo on 443 Java Maven projects from GitHub. LibHarmo successfully detects 621 library version inconsistencies, covering 152 projects, as well as 2,576 false consistencies, covering 346 projects. For the inconsistencies that require harmonization effort, on average 1 and 2 of the 24 called library APIs are respectively deleted and changed in the harmonized version, affecting 1 and 12 library API calls. Moreover, 5 library version inconsistencies have been confirmed, and 1 of them has been harmonized by developers. In the future, we plan to integrate LibHarmo into the build process so that developers can interact with LibHarmo more intensively and naturally. We also hope to extend LibHarmo to support other automated build tools (e.g., Gradle), and develop advanced API breaking analysis techniques to improve the accuracy of our API-level effort analysis.

References

  • [1] (Website) Cited by: §2, §4.
  • R. Abdalkareem, O. Nourry, S. Wehaibi, S. Mujahid, and E. Shihab (2017) Why do developers use trivial packages? an empirical case study on npm. In FSE, pp. 385–395. Cited by: §5.4.
  • I. Balaban, F. Tip, and R. Fuhrer (2005) Refactoring support for class library migration. In OOPSLA, pp. 265–279. Cited by: §3.3, §5.3.
  • V. Bauer, L. Heinemann, and F. Deissenboeck (2012) A structured approach to assess third-party library usage. In ICSM, pp. 483–492. Cited by: §5.4.
  • V. Bauer and L. Heinemann (2012) Understanding api usage to support informed decision making in software maintenance. In CSMR, pp. 435–440. Cited by: §5.4.
  • G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella (2013) The evolution of project inter-dependencies in a software ecosystem: the case of apache. In ICSM, pp. 280–289. Cited by: §5.4.
  • G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella (2015) How the apache community upgrades dependencies: an evolutionary study. Empirical Software Engineering 20 (5), pp. 1275–1317. Cited by: §5.4.
  • R. Bloemen, C. Amrit, S. Kuhlmann, and G. Ordóñez–Matamoros (2014) Gentoo package dependencies over time. In MSR, pp. 404–407. Cited by: §5.1.
  • C. Bogart, C. Kästner, J. Herbsleb, and F. Thung (2016) How to break an api: cost negotiation and community values in three software ecosystems. In FSE, pp. 109–120. Cited by: §5.2.
  • A. Brito, L. Xavier, A. Hora, and M. T. Valente (2018a) APIDiff: detecting api breaking changes. In SANER, pp. 507–511. Cited by: §5.2.
  • A. Brito, L. Xavier, A. Hora, and M. T. Valente (2018b) Why and how java developers break apis. In SANER, pp. 255–265. Cited by: §5.2.
  • M. Cadariu, E. Bouwers, J. Visser, and A. van Deursen (2015) Tracking known security vulnerabilities in proprietary software systems. In SANER, pp. 516–519. Cited by: §5.1.
  • K. Chow and D. Notkin (1996) Semi-automatic update of applications in response to library changes. In ICSM, pp. 359–368. Cited by: §3.3, §5.3.
  • B. E. Cossette and R. J. Walker (2012) Seeking the ground truth: a retroactive study on the evolution and migration of software libraries. In FSE, pp. 55. Cited by: §3.3, §5.3.
  • J. Cox, E. Bouwers, M. van Eekelen, and J. Visser (2015) Measuring dependency freshness in software systems. In ICSE, Vol. 2, pp. 109–118. Cited by: §5.4.
  • B. Dagenais and M. P. Robillard (2009) SemDiff: analysis and recommendation support for api evolution. In ICSE, pp. 599–602. Cited by: §3.3, §5.3.
  • B. Dagenais and M. P. Robillard (2011) Recommending adaptive changes for framework evolution. ACM Transactions on Software Engineering and Methodology 20 (4), pp. 19. Cited by: §3.3, §5.3.
  • C. De Roover, R. Lammel, and E. Pek (2013) Multi-dimensional exploration of api usage. In ICPC, pp. 152–161. Cited by: §5.4.
  • A. Decan, T. Mens, and E. Constantinou (2018a) On the evolution of technical lag in the npm package dependency network. In ICSME, pp. 404–414. Cited by: §5.4.
  • A. Decan, T. Mens, and E. Constantinou (2018b) On the impact of security vulnerabilities in the npm package dependency network. In MSR, pp. 181–191. Cited by: §5.1.
  • A. Decan, T. Mens, and P. Grosjean (2019) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empirical Software Engineering 24 (1), pp. 381–416. Cited by: §5.1.
  • E. Derr, S. Bugiel, S. Fahl, Y. Acar, and M. Backes (2017) Keep me updated: an empirical study of third-party library updatability on android. In CCS, pp. 2187–2200. Cited by: §5.4.
  • D. Dig and R. Johnson (2006) How do apis evolve? a story of refactoring: research articles. J. Softw. Maint. Evol. 18 (2), pp. 83–107. Cited by: §5.2.
  • M. Fazzini, Q. Xin, and A. Orso (2019) Automated api-usage update for android apps. In ISSTA, pp. 204–215. Cited by: §3.3, §5.3.
  • D. Felsing, S. Grebing, V. Klebanov, P. Rümmer, and M. Ulbrich (2014) Automating regression verification. In ASE, pp. 349–360. Cited by: §5.2.
  • B. Godlin and O. Strichman (2013) Regression verification: proving the equivalence of similar programs. Software Testing, Verification and Reliability 23 (3), pp. 241–258. Cited by: §5.2.
  • [27] G. Gousios (Website). Cited by: §3.2.
  • A. Gyori, O. Legunsen, F. Hariri, and D. Marinov (2018) Evaluating regression test selection opportunities in a very large open-source ecosystem. In ISSRE, pp. 112–122. Cited by: §5.2.
  • [29] (Website). Cited by: §1.
  • J. Henkel and A. Diwan (2005) CatchUp! capturing and replaying refactorings to support api evolution. In ICSE, pp. 274–283. Cited by: §3.3, §5.3.
  • A. Hora, R. Robbes, N. Anquetil, A. Etien, S. Ducasse, and M. T. Valente (2015) How do developers react to api evolution? the pharo ecosystem case. In ICSME, pp. 251–260. Cited by: §5.2.
  • A. Hora and M. T. Valente (2015) Apiwave: keeping track of api popularity and migration. In ICSME, pp. 321–323. Cited by: §5.4.
  • J. Humble and D. Farley (2010) Continuous delivery: reliable software releases through build, test, and deployment automation (adobe reader). Pearson Education. Cited by: §1.
  • [34] (Website). Cited by: §3.1, §3.1.
  • [35] (Website). Cited by: §1, §1, §3.1.
  • M. Jang (2006) Linux annoyances for geeks: getting the most flexible system in the world just the way you want it. O’Reilly Media, Inc. Cited by: §1.
  • K. Jezek, J. Dietrich, and P. Brada (2015) How java apis break–an empirical study. Information and Software Technology 65, pp. 129–146. Cited by: §5.2.
  • R. Kikas, G. Gousios, M. Dumas, and D. Pfahl (2017) Structure and evolution of package dependency networks. In MSR, pp. 102–112. Cited by: §5.1.
  • M. Kim, D. Cai, and S. Kim (2011) An empirical investigation into the role of api-level refactorings during software evolution. In ICSE, pp. 151–160. Cited by: §5.2.
  • R. G. Kula, D. M. German, T. Ishio, and K. Inoue (2015) Trusting a library: a study of the latency to adopt the latest maven release. In SANER, pp. 520–524. Cited by: §5.4.
  • R. G. Kula, D. M. German, T. Ishio, A. Ouni, and K. Inoue (2017) An exploratory study on library aging by monitoring client usage in a software ecosystem. In SANER, pp. 407–411. Cited by: §5.4.
  • R. G. Kula, D. M. German, A. Ouni, T. Ishio, and K. Inoue (2018a) Do developers update their library dependencies? Empirical Software Engineering 23 (1), pp. 384–417. Cited by: §5.4.
  • R. G. Kula, A. Ouni, D. M. German, and K. Inoue (2018b) An empirical study on the impact of refactoring activities on evolving client-used apis. Inf. Softw. Technol. 93 (C), pp. 186–199. Cited by: §5.2.
  • S. K. Lahiri, C. Hawblitzel, M. Kawaguchi, and H. Rebêlo (2012) Symdiff: a language-agnostic semantic diff tool for imperative programs. In CAV, pp. 712–717. Cited by: §5.2.
  • R. Lämmel, E. Pek, and J. Starek (2011) Large-scale, ast-based api-usage analysis of open-source java projects. In SAC, pp. 1317–1324. Cited by: §5.4.
  • T. Lauinger, A. Chaabane, S. Arshad, W. Robertson, C. Wilson, and E. Kirda (2017) Thou shalt not depend on me: analysing the use of outdated javascript libraries on the web. In NDSS. Cited by: §5.4.
  • L. Li, T. F. Bissyandé, J. Klein, and Y. Le Traon (2016) An investigation into the use of common libraries in android apps. In SANER, pp. 403–414. Cited by: §5.4.
  • M. Linares-Vásquez, G. Bavota, C. Bernal-Cárdenas, M. Di Penta, R. Oliveto, and D. Poshyvanyk (2013) API change and fault proneness: a threat to the success of android apps. In ESEC/FSE, pp. 477–487. Cited by: §5.2.
  • S. McCamant and M. D. Ernst (2003) Predicting problems caused by component upgrades. In ESEC/FSE, pp. 287–296. Cited by: §5.2.
  • S. McCamant and M. D. Ernst (2004) Early identification of incompatibilities in multi-component upgrades. In ECOOP, pp. 440–464. Cited by: §5.2.
  • T. McDonnell, B. Ray, and M. Kim (2013) An empirical study of api stability and adoption in the android ecosystem. In ICSM, pp. 70–79. Cited by: §5.2.
  • G. Mezzetti, A. Møller, and M. T. Torp (2018) Type regression testing to detect breaking changes in node.js libraries. In ECOOP. Cited by: §5.2.
  • Y. M. Mileva, V. Dallmeier, M. Burger, and A. Zeller (2009) Mining trends of library usage. In IWPSE-Evol, pp. 57–62. Cited by: §5.4.
  • Y. M. Mileva, V. Dallmeier, and A. Zeller (2010) Mining api popularity. In Testing – Practice and Research Techniques, pp. 173–180. Cited by: §5.4.
  • S. Mirhosseini and C. Parnin (2017) Can automated pull requests encourage software developers to upgrade out-of-date dependencies? In ASE, pp. 84–94. Cited by: §5.1.
  • A. Møller and M. T. Torp (2019) Model-based testing of breaking changes in node.js libraries. Cited by: §5.2.
  • F. Mora, Y. Li, J. Rubin, and M. Chechik (2018) Client-specific equivalence checking. In ASE, pp. 441–451. Cited by: §5.2.
  • H. A. Nguyen, T. T. Nguyen, G. Wilson Jr., A. T. Nguyen, M. Kim, and T. N. Nguyen (2010) A graph-based approach to api usage adaptation. In OOPSLA, pp. 302–321. Cited by: §3.3, §5.3.
  • E. Paraschiv (2018). Cited by: §1.
  • J. Patra, P. N. Dixit, and M. Pradel (2018) ConflictJS: finding and understanding conflicts between javascript libraries. In ICSE, pp. 741–751. Cited by: §5.1.
  • H. Plate, S. E. Ponta, and A. Sabetta (2015) Impact assessment for vulnerabilities in open-source software libraries. In ICSME, pp. 411–420. Cited by: §5.1.
  • S. E. Ponta, H. Plate, and A. Sabetta (2018) Beyond metadata: code-centric and usage-based analysis of known vulnerabilities in open-source software. In ICSME, pp. 449–460. Cited by: §5.1.
  • D. Qiu, B. Li, and H. Leung (2016) Understanding the api usage in java. Information and Software Technology 73, pp. 81–100. Cited by: §5.4.
  • S. Raemaekers, A. van Deursen, and J. Visser (2012) Measuring software library stability through historical version analysis. In ICSM, pp. 378–387. Cited by: §5.2.
  • S. Raemaekers, A. van Deursen, and J. Visser (2017) Semantic versioning and impact of breaking changes in the maven repository. Journal of Systems and Software 129, pp. 140–158. Cited by: §5.2.
  • R. Robbes, M. Lungu, and D. Röthlisberger (2012) How do developers react to api deprecation?: the case of a smalltalk ecosystem. In FSE, pp. 56:1–56:11. Cited by: §5.2.
  • A. A. Sawant, R. Robbes, and A. Bacchelli (2016) On the reaction to deprecation of 25,357 clients of 4+1 popular java apis. In ICSME, pp. 400–410. Cited by: §5.2.
  • T. Schäfer, J. Jonas, and M. Mezini (2008) Mining framework usage changes from instantiation code. In ICSE, pp. 471–480. Cited by: §3.3, §5.3.
  • G. Schlosser and G. P. Wagner (2004) Modularity in development and evolution. University of Chicago Press. Cited by: §1.
  • N. Smith, D. van Bruggen, and F. Tomassetti (2017) JavaParser: visited. Leanpub. Cited by: §3.2.
  • G. Soares, R. Gheyi, D. Serey, and T. Massoni (2010) Making program refactoring safer. IEEE Software 27 (4), pp. 52–57. Cited by: §5.2.
  • A. Trostanetski, O. Grumberg, and D. Kroening (2017) Modular demand-driven analysis of semantic difference for program versions. In SAS, pp. 405–427. Cited by: §5.2.
  • Y. Wang, M. Wen, Z. Liu, R. Wu, R. Wang, B. Yang, H. Yu, Z. Zhu, and S. Cheung (2018) Do the dependency conflicts in my project matter? In ESEC/FSE, pp. 319–330. Cited by: §1, §2, §5.1.
  • Y. Wang, M. Wen, R. Wu, Z. Liu, S. H. Tan, Z. Zhu, H. Yu, and S. Cheung (2019) In ICSE, pp. 572–583. Cited by: §5.1.
  • E. Wittern, P. Suter, and S. Rajagopalan (2016) A look at the dynamics of the javascript package ecosystem. In MSR, pp. 351–361. Cited by: §5.4.
  • W. Wu, Y. Guéhéneuc, G. Antoniol, and M. Kim (2010) Aura: a hybrid approach to identify framework evolution. In ICSE, pp. 325–334. Cited by: §3.3, §5.3.
  • W. Wu, F. Khomh, B. Adams, Y. Guéhéneuc, and G. Antoniol (2016) An exploratory study of api changes and usages based on apache and eclipse ecosystems. Empirical Software Engineering 21 (6), pp. 2366–2412. Cited by: §5.2.
  • W. Wu, A. Serveaux, Y. Guéhéneuc, and G. Antoniol (2015) The impact of imperfect change rules on framework api evolution identification: an empirical study. Empirical Software Engineering 20 (4), pp. 1126–1158. Cited by: §3.3, §5.3.
  • L. Xavier, A. Brito, A. Hora, and M. T. Valente (2017a) Historical and impact analysis of api breaking changes: a large-scale study. In SANER, pp. 138–147. Cited by: §5.2.
  • L. Xavier, A. Hora, and M. T. Valente (2017b) Why do we break apis? first answers from developers. In SANER, pp. 392–396. Cited by: §5.2.
  • Z. Xing and E. Stroulia (2007) API-evolution support with diff-catchup. IEEE Transactions on Software Engineering 33 (12), pp. 818–836. Cited by: §3.3, §5.3.
  • M. Zimmermann, C. Staicu, C. Tenny, and M. Pradel (2019) Small world with high risks: A study of security threats in the npm ecosystem. In USENIX Security. Cited by: §5.1.