Learning Quick Fixes from Code Repositories
Code analyzers such as ErrorProne and FindBugs detect code patterns symptomatic of bugs, performance issues, or bad style. These tools express patterns as quick fixes that detect and rewrite unwanted code patterns. However, it is hard to come up with new quick fixes and decide which ones are useful and frequently appear in real code. We propose to rely on the collective wisdom of programmers and learn quick fixes from revision histories in software repositories. We propose Revisar, a tool for discovering common Java edit patterns in code repositories. Given code repositories and their revision histories, Revisar (i) identifies code edits from revisions, (ii) clusters edits into sets that can be described using an edit pattern, (iii) compiles, when possible, the edit patterns into executable plugins for the Google's ErrorProne tool. The patterns can then be inspected by the designers of code analyzers. We ran Revisar on nine popular GitHub projects, and it discovered 920 edit patterns across projects. We then acted as designers of code analyzers, inspected the most common 381 patterns and classified 32 as quick fixes. To assess the quality of the quick fixes, we performed a survey with 164 programmers from 124 projects, showing the 10 edit patterns that appeared in most projects. Programmers supported 9 (90 requests applying our patterns to 9 projects and, at the time of the writing, programmers supported 17 (85READ FULL TEXT VIEW PDF
Static analysis tools, or linters, detect violation of source code
With an ever-increasing amount of open source software, the popularity o...
Our modern world produces an increasing quantity of data, and especially...
The Internet has enabled the emergence of massive online collaborative
Background: Some developer activity traditionally performed manually, su...
During the life span of large software projects, developers often apply ...
Software cost estimation based on multivariate data from completed proje...
Learning Quick Fixes from Code Repositories
Programmers often detect code patterns that may lead to undesired behaviors (e.g., inefficiencies) and apply simple code edits to “fix” these patterns. These patterns are often hard to spot because they depend on the style and properties of the programming language in use. Tools such as Error Prone , ReSharper , Coverity , FindBugs , PMD , and Checkstyle  help programmers by automatically detecting and sometimes removing several suspicious code patterns, potential bugs, or instances of bad code style. For example, PMD can detect instances where the method size is used to check whether a list is empty and proposes to replace such instance with the method isEmpty. For the majority of collections, these two ways to check emptiness are equivalent, but for some collections—e.g., ConcurrentSkipListSet—computing the size of a list is not a constant-time operation . We refer to these kinds of edit patterns as quick fixes.
All the aforementioned tools rely on a predefined catalog of quick fixes (usually expressed as rules), each used to detect and potentially fix a pattern. These catalogs have to be updated often due to the addition of new language features (e.g., new constructs in new versions of Java), new style guidelines, or simply due to the discovery of new patterns. However, coming up with what edit patterns are useful and common is a challenging and time-consuming task that is currently performed in an ad-hoc fashion—i.e., new rules for quick fixes are added on a as-needed basis. 7pt The lack of a systematic way of discovering new quick fixes makes it hard for code analyzers to stay up-to-date with the latest code practices and language features.
For example, consider the edit pattern applied to the code in Figure 0(a). The edit was performed in the Apache Ant source code. In the presented pattern, the original code contains three expressions of the form x.equals("str") that compare a variable x of type string to a string literal "str". Since the variable x may contain a null value, evaluating this expression may cause a NullPointerException. In this particular revision, a programmer from this project addresses the issue by exchanging the order of the arguments of the equals method—i.e., by calling the method on the string literal. This edit fixes the issue since the equals method checks whether the parameter is null. This edit is common and we discovered it occurs in three industrial open source projects across GitHub repositories: Apache Ant, Apache Hive, and Google ExoPlayer. Given that the pattern appears in such large repositories, it makes sense to assume that it could also be useful to other programmers who may not know about it. Despite its usefulness, a quick fix rule for this edit pattern is not included in the catalog of largely used code analysis tools, such as FindBugs, and PMD. Remarkably, even though the edit is applied in Google repositories, this pattern does not appear in Error Prone, a code analyzer developed by Google that is internally run on the Google’s code base.
Key insight Our key insight is that we can “discover” useful patterns and quick fixes by observing how programmers modify code in real repositories with large user bases. In particular, we postulate that an edit pattern that is performed by many programmers across many projects is likely to reveal a good quick fix.
Our technique In this work, we propose Revisar, a technique for automatically discovering common Java code edit patterns in online code repositories. Given code repositories as input, Revisar identifies simple edit patterns by comparing consecutive revisions in the revision histories. The most common edit patterns—i.e., those performed across multiple projects—can then be inspected to detect useful ones and add the corresponding quick fixes to code analyzers. For example, Revisar was able to automatically analyze the concrete edits in Figure 0(a) and generate the quick fix in Figure 0(b). We also sent pull requests applying this quick fix to other parts of the code in the Apache Ant and Google ExoPlayer projects, and these pull requests were accepted.
Since we want to detect patterns appearing across projects and revisions, Revisar has to analyze large amounts of code, a task that requires efficient and precise algorithms. Revisar focuses on edits performed to individual code locations and uses GumTree , a tree edit distance algorithm, to efficiently extract concrete abstract-syntax-tree edits from pairs of revisions—i.e., sequences of tree operations such as insert, delete, and update. Revisar then uses a greedy algorithm to detect subsets of concrete edits that can be described using the same edit pattern. To perform this last task efficiently, Revisar uses a variant of a technique called anti-unification 
, which is commonly used in inductive logic programming. Given a set of concrete edits, the anti-unification algorithm finds the least general generalization of the two edits—i.e., the largest pattern shared by the edits.
Contributions This paper makes the following contributions:
Revisar, an automatic technique for discovering common edit patterns in large online repositories. Revisar uses concepts such as AST edits and -caps, and it applies the technique of anti-unification from inductive logic programming to the problem of mining edit patterns (§ II).
A mixed-methods evaluation of the effectiveness of Revisar at discovering quick fixes and the quality of the discovered quick fixes (§ III and IV). When ran on nine popular GitHub projects, Revisar discovered 89 edit patterns that appeared in 3 or more projects. Moreover, 64% of the discovered patterns did not appear in existing tools. Through a survey on a subset of the discovered patterns, we showed that programmers significantly preferred 8/9 (89%) of our patterns. Finally, programmers accepted 6/10 (60%) pull requests applying our patterns to their projects, showing that developers are willing to apply our patterns to their code.
We now describe Revisar, our technique for automatically discovering common edit patterns in code repositories. Given repositories as input, Revisar: (i) identifies concrete code edits by comparing pairs of consecutive revisions (§ II-A), (ii) clusters edits into sets that can be described using the same edit pattern and learns an edit pattern for each cluster (§ II-B and II-C). Figure 2 shows the work-flow of Revisar.
The initial input of Revisar is a set where each is a revision history from a different project (i.e., a sequence of revisions). For each pair of consecutive revisions, Revisar analyzes the differences between the Abstract Syntax Trees (ASTs)111Revisar uses Eclipse JDT  to extract partial type annotations of the ASTs. In our implementation, we use these type annotations to create a richer AST with type information—i.e., every node has a child describing its type. For simplicity, we omit this detail in the rest of the section. of and and uses a Tree Edit Distance (TED) algorithm to identify a set of tree edits that when applied to the AST corresponding to yields the AST corresponding to . In our setting, an edit is one of the following:
insert a leaf node as child of parent node . The current children at positions are shifted one position to the right.
delete a leaf node which is the child of parent node . The deletion may cause new nodes to become leaves when all their children are deleted. Therefore we can delete a whole tree through repeated bottom-up deletions.
replace a leaf node by a leaf node .
move tree to be the child of parent node . The current children at positions are shifted one position to the right. Given a tree , let be the set of nodes in the tree . Intuitively, solving the TED problem amounts to identifying a partial mapping between source tree and target tree nodes. The mapping can then be used to detect which nodes are preserved by the edit and in which positions they appear. When a node is not mapped to any , then was deleted.
Of the many existing tools that are available for computing tree edits over Java source code, Revisar builds on GumTree , a tool that focuses on finding edits that are representative of those intended by programmers instead of just finding the smallest possible set of tree edits. To give an example of edits computed by GumTree, let’s look at the first two lines in Figure 0(a). Figure 3 illustrates the ASTs corresponding to args[i].equals("--launchdiag") and "--launchdiag".equals(args[i]), respectively.
To produce the modified version of the code in line 2 at Figure 0(a), GumTree learned four edits:
insert("--launchdiag", MethodInvo, 0)
insert(equals, MethodInvo, 1)
delete(equals, MethodInvo, 3)
delete("--launchdiag", MethodInvo, 3) These edits move the string literal "--launchdiag" and the equals node so that they appear in front of args[i].
For our purposes, the edits computed by GumTree are at too low granularity since they modify nodes instead of expressions. In particular, we are interested in detecting edits to entire subtrees—e.g., an edit to a method invocation args[i].equals("--launchdiag") instead of to individual nodes inside it. Revisar identifies subtree-level edits by grouping edits that belong to the same connected components. Concretely, Revisar identifies connected components by analyzing the parent-child and sibling relationships between the nodes appearing in the tree edits. Two edits and belong to the same component if (i) the two nodes and modified by and have the same parent, or (ii) the (resp. ) is the parent of (resp. ). For instance, the previously shown edits are associated to two nodes "--launchdiag" and equals. These nodes are connected since they have the same parent node—i.e., the method invocation.
Once the connected components are identified, Revisar can use them to identify tree-to-tree mappings between subtrees inside the original and modified trees like the one showed in Figure 3. We call this mapping a concrete edit. A concrete edit is a pair consisting of two components (i) the tree in the original version of the program, and (ii) the tree in the modified version of the program. This last step completes the first phase of our algorithm, which, given a set of revisions , outputs a set of concrete edits .
Once Revisar has identified concrete edits—i.e., pairs of trees —it tries to group “similar” concrete edits and to generate an edit pattern consistent with all the edits in each group. An edit pattern is a rule with two components: (i) the template , which is used to decide whether a subtree in the code can be transformed using the rule , (ii) the template , which describes how the tree matching should be transformed by . In this section, we focus on computing from a set of concrete edits —i.e., we assume we are given a group of concrete edits that can be described by the same edit pattern. We will discuss our clustering algorithm for creating the groups in the next section.
A template is an AST where leaves can also be holes (variables) and a tree matches the template if there exists a way to assign concrete values to the holes in and obtain —denoted . Given a template over a set of holes , we use to denote a substitution from to concrete trees and to denote the application of the substitution to the holes in . Figure 4 shows the first two concrete edits from Figure 0(a) and the templates and describing the edit pattern obtained from these examples. Here, the template matches any expression calling the method equals with first argument args[i] and any possible second argument. The two substitutions =="--launchdiag" and =="--noclasspath" yield the expressions args[i].equals("--launchdiag") and args[i].equals("--noclasspath"), respectively. The template is similar to and note that the hole appearing in is the same as the one appearing in .
Given a set of concrete edits , an edit pattern , is consistent with if: (i) the set of holes in is a subset of the set the holes appearing in (ii) for every there exists a substitution such that and .
In the rest of the section, we describe how we obtain the edit pattern from the concrete edits. We start by describing how the input template is computed. Our goal is to compute a template such that every AST can match the template —i.e., . In general, the same set of ASTs can be matched by multiple different templates, which could contain different numbers of nodes and holes. Typically, a template with more concrete nodes and fewer holes is more precise—i.e., will match fewer concrete ASTs—whereas a template with few concrete nodes will be more general—e.g., .equals() is more general than arg[i].equals(). Among the possible templates, we want the least general template, which preserves the maximum common nodes for a given set of trees. The idea is to preserve the maximum amount of shared information between the concrete edits. Even when an edit is too specific, we will obtained the desired template when provided with appropriate concrete edits. In our running example, when encountering an expression of the form x.equals("abc"), we will obtain the desired, more general template .equals().
Remarkably, the problem we just described is tightly related to the notion of anti-unification used in logic programming [11, 12]. Given two trees and , the anti-unification algorithm produces the least general template for which there exist substitutions and such that = and = . In our tool we use the implementation of anti-unification from Baumgartner et al. , which runs in linear time. Using this algorithm, we can generate the least general templates and that are consistent with the input and output trees in the concrete edits. For now, the two templates will have distinct sets of holes, but each template can contain the same hole in multiple locations.
At this point, we have the template for the inputs and the template for the output . However, Revisar needs to analyze whether these templates describe an edit pattern —i.e., whether there is a way to map the holes of to the ones of . This mapping can be computed by finding, for every hole in , a hole in that applies the same substitution with respect to all the concrete edits. To illustrate a case where finding a mapping is not possible, let’s look at Figure 5. Although we can learn templates and , these templates cannot describe an edit pattern since it is impossible to come up with a mapping between the holes of the two templates that is consistent with all the substitutions. In this case, the substitution for in is incompatible with the substitution for in because "--noclasspath" is mapped to "-main", but the content of these substituting trees differs. In addition, in our implementation, we avoid to group concrete edits that are compatible, accordingly to our definition but apply to different methods. For instance, the concrete edits (args[i].equals("--launchdiag"), "--launchdiag".equals(args[i])) and (args[i].equalsIgnoreCase("--launchdiag"), "--launchdiag".equalsIgnoreCase(args[i])) should not be in the same cluster since they are applied to the equals and equalsIgnoreCase methods, respectively.
Since the templates are the least general, if no mapping between the holes exists, there exists no edit pattern consistent with the concrete edits—i.e., our algorithm, given a set of edits, finds a rule in our format consistent with the edits if and only if one exists. Therefore, Revisar finds all correct rules in our format and does not miss potential ones.
Given a set of concrete edits , Revisar returns an edit pattern consistent with if and only if some edit pattern consistent with exists.
In this section, we show how Revisar groups concrete edits into clusters that share the same edit pattern. Revisar’s clustering algorithm receives a set of concrete edits and uses a greedy approach. The clustering algorithm starts with an empty set of clusters. Then, for each concrete edit and for each cluster , Revisar checks, using the algorithm from Section II-B, if adding to the concrete edits of cluster gives an edit pattern. When this happens, the cluster is added to a set of cluster candidates and the cost of adding to is computed. Revisar then adds to candidate cluster of minimum cost, or it creates a new cluster with just if no candidate exists. The complexity of this algorithm is where is the number of edits since, for each edit, we have to search which cluster the edit should be included in.
When multiple clusters can receive a new concrete edit, we use the following cost function to decide which cluster to add the edit to. The cost of adding an edit to cluster with corresponding template is computed as follows. First, Revisar anti-unifies and the tree in the concrete edit we are trying to cluster. Let be the substitution for the result of the anti-unification and let be the tree substituting hole . We define the size of a tree as the number of leaf nodes inside it, which intuitively captures the number of names and constants in the AST. We denote the cost of an anti-unification as the sum of the sizes of each minus the total number of holes (the same metric proposed by Bulychev et al.  in the context of clone detection). Intuitively, we want the sizes of substitutions to be small—i.e., we prefer more specific templates. For instance, assume we have a cluster consisting of a single tree args[i].equals("--launchdiag"). Upon receiving a new tree args[i].equals("--noclasspath"), anti-unifying the two trees yields the template args[i].equals() with substitutions =="--launchdiag" and =="--noclasspath". We have concrete nodes "--launchdiag" and "--noclasspath" each of size one, and a single hole . The cost will be . The final cost of adding an edit to a cluster is the sum of the cost to anti-unify and and the cost to anti-unify and .
Predicting promising clusters For large repositories, the total number of concrete edits may be huge and it will be unfeasible to compare all edits to compute the clusters. To address this problem, Revisar only clusters concrete edits which are “likely” to produce an edit pattern. In particular, Revisar uses d-caps [14, 11, 15], a technique for identifying repetitive edits. Given a number and a template , a -cap is a tree-like structure obtained by replacing all subtrees of depth and left nodes in the template with holes. The -cap works as a hash-index for sets of potential clusters. For instance, let’s look at the left-hand side of Figure 3, which is the tree representation of the node args[i].equals("--launchdiag"). A 1-cap replaces all the nodes at depth one with holes. For our example, args[i], equals, and "--launchdiag" will be replaced with holes, outputting the -cap .(). Revisar uses the -caps as a pre-stepin the clustering algorithm. For all concrete edits with the same -cap for the input tree and for the output tree , Revisar uses the clustering algorithm described in Section II-C to compute the clusters for all concrete edits in these
-cap. This heuristic makes our clustering algorithm practical as it avoids considering all example combinations. However, it also comes at the cost of sacrificing completeness—i.e., two concrete edits for which there is a common edit pattern might be placed in different clusters.
Section III-A states our research questions. Section III-B describes the evaluation for the Revisar technique. Section III-C describes a survey study in which programmers evaluate quick fixes discovered by Revisar. Section III-D describes a validation study in which we submit these quick fixes as pull requests to GitHub projects.
We investigate the following three research questions:
How effective is Revisar in identifying quick fixes?
Do developers prefer quick fixes discovered by Revisar?
Do developers adopt quick fixes discovered by Revisar?
The answer to the first research question characterizes the edit patterns that Revisar is able to discover, and whether these edit patterns can be framed in terms of existing code analysis tool rulesets. The answers to the second and third research questions address whether these identified edit patterns are useful for developers, through different perspectives: preference (RQ2) and adoption (RQ3).
Data collection We selected 9 popular GitHub Java projects (Table I) from a list of previously studied projects . The project selection influences quantity and quality of the discovered patterns. We select mature popular projects that have long history of edits, experienced developers who detect problems during code reviews, and several collaborators (avg. 89.11) with different levels of expertise (low expertise collaborators likely submit pull-requests that need quick fixes). Analyzing many projects is prohibitive given our resources, thus we selected a subset of projects with various sizes/domains. We favored projects containing between 1,000 and 15,000 revisions, to have a sample large enough to identify many patterns but not too big due to the time required to evaluate all revisions. The projects have size ranging from 24,753 to 1,119,579 lines of code. In total, the sample contains 43,113 revisions, which were input into Revisar.
Benchmarks Revisar found 288,899 single-location edits which were clustered in 110,384 clusters. Of these clusters, 39,104 contained more than one edit—i.e., Revisar could generalize multiple examples to a single edit pattern. The 39,104 edit patterns covered 205,934/288,899 single-location edits (71%). The distribution of these edit patterns is reminiscent of a long-tail one: the most-common edit pattern having 2,706 concrete edits, of the edit patterns cover 10% of the concrete edits, and 5.3% of the edit patterns cover 20% of the concrete edits.
We performed the experiments on a PC running Windows 10 with a processor Core i7 and 16GB of RAM. We obtained the revision histories of each repository using JGit . Revisar took 5 days to analyze the 9 projects (approximately 10 seconds per commit). Most of the time is spent checking out revisions, a process that can be done incrementally for future projects.
Edit pattern sampling To facilitate manual investigation of these edit patterns, we empirically identified a “Goldilocks” cut-offs for edit patterns found in or more projects that would allow us to practically inspect each patterns (at : 110,384 edit patterns, 2: 1,759, 3: 493, 4: 196; 5: 89, 6: 47, 7: 19, 8: 6, 9: 1). From this distribution, we choose the 493 patterns found in 3 or more projects as a reasonable cut-off for subsequent analysis.
Analysis Phase I—Spurious pattern elimination To assess the effectiveness of Revisar, we conducted a filtering exercise to discard spurious edit patterns. Specifically, we discarded edit patterns involving renaming operations—e.g., renaming the variable obj to object—and edit patterns in which none of the authors could identify a logical rationale behind the edit, typically because the patterns were part of some broader edit sequence—e.g., changing return true to return null. We eliminated 295 spurious patterns, where 61 of them were renaming operation and, for the rest of them, we could not identify a logically meaningful pattern.
Phase II—Merging duplicate edit patterns Next, we merged edit patterns that represented the same logical quick fix. To do so, we employed a technique of negotiated agreement  in which the second and fourth authors collaboratively identified and discarded logically duplicate edit patterns within the sample. When there was disagreement about whether two edit patterns were logical duplicates, the authors opted to merge these duplicates, thus penalizing the effectiveness of Revisar. In other words, this measure is an upper bound of the number of duplicates within the edit patterns. We merged 109 duplicated patterns into 17 other patterns.
Phase III—Cataloging edit patterns Finally, we classified each of the remaining 493-295-109=89 patterns against the eight Java rule-set from the PMD code analyzer tool (for example, “Performance” and “Code Style”). The full rule-set is shown in Table II. We targeted the PMD rule-set because the PMD developers employed a principled process for designing this rule-set. In particular, a goal of the rule-set was to make the rule-set useful for reporting by third-party tools and techniques.222https://github.com/pmd/pmd/wiki/Rule-Categories The first and third authors independently mapped the edit patterns to one of the PMD rule-sets. We computed Cohen’s Kappa to assess the measure of agreement, and deferred to the first author’s judgment to reconcile disagreement. Finally, the first author used online resources, such as Stack Overflow and documentation from different code analysis tools to tag each quick fix as being available or not available in the catalog of an existing tool.
|Best Practices||Rules which enforce generally accepted best practices.||19|
|Code Style||Rules which enforce a specific coding style.||16|
|Design||Rules which help you discover design issues.||22|
|Documentation||Rules which are related to code documentation.||5|
|Error Prone||Rules which detect constructs that are either broken, extremely confusing or prone to runtime errors.||12|
|Multithreading||Rules which flag issues when dealing with multiple threads of execution.||2|
|Performance||Rules which flag suboptimal code.||13|
|Security||Rules which flag potential security flaws.||0|
We conducted a survey to assess programmers’ judgments about a subset of our discovered edit patterns.
Participants We randomly invited 2,000 programmers to participate in our survey through e-mail. We obtained these e-mail addresses from author metadata in the commit histories from 124 popular and well-known GitHub Java-based projects , such as those from Apache, Google, Facebook, Netflix, and JetBrains. We received 164 responses (response rate 8.2%). Through demographic questions in the survey, 118 participants (72%) self-reported having more than five years of experience with Java. Participants also self-reported using tools to flag code patterns, including IntelliJ (72%), Checkstyle (50%), Sonar (50%), FindBugs (43%), PMD (31%), Eclipse (8%), Error Prone (7%), and others (8%). Participants did not receive compensation for their responses.
Survey protocol To constrain the survey response time to 5-10 minutes, we presented programmers with 9 out of the 89 edit patterns using purposive sampling. In other words, we deliberately selected from a design space of candidate edit patterns to balance patterns found in existing analysis tools versus new edit patterns identified in our study. This selection allowed us to verify whether programmers chose patterns found in existing tools as well as to assess their preferences for edit patterns not found in current tools.
We presented edit patterns as side-by-side panes (adjacent left and right panes), with one pane having the baseline code pattern (“expected bad”) and the other with the quick fix version of the code pattern (“expected good”). Each pair was randomized and labeled simply as pattern A and pattern B, so that programmers could not obviously identify the quick fix version of the pattern. To assess if developers preferred the quick fix version of the pattern, we presented a five-point Likert-type item scale: strongly prefer (A), prefer (A), it does not matter, prefer (B), and strongly prefer (B). For each pair, programmers were allowed to provide an open-ended comment for why they chose the particular code pattern.
Presented edit patterns We presented programmers with the following nine edit patterns:
(Performance) Use characters instead of single-character strings. In Java, we can represent a character both as a String or a character. For operations such as appending a value to a StringBuffer, representing the value as a character improves performance—e.g., change sb.append("a") to sb.append(’a’). This edit improved the performance of some operations in the Guava project by 10-25% .
(Error Prone) Prefer string literal in equals method. Invoking equals on a null variable causes a NullPointerException. When comparing the value in a string variable to a string literal, programmers can overcome this exception by invoking the equals method on the string literal since the String equals method checks whether the parameter is null—for example, using "str".equals(s) instead of s.equals("str").
(Performance) Avoid FileInputStream and FileOutputStream. These classes override the finalize method. As a result, their objects are only cleaned when the garbage collector performs a sweep . Since Java 7, programmers can use Files.newInputStream and Files.newOutputStream instead of FileInputStream and FileOutputStream to improve performance as recommended in this Java JDK bug-report .
(Best practices) Use the collection isEmpty method rather than checking the size. Using the method isEmpty to check whether a collection is empty is preferred to checking that the size of the collection is zero. For most collections these constructions are equivalent, but for others computing the size could be expensive. For instance, in the class ConcurrentSkipListSet, the size method is not constant-time . Thus, prefer list.isEmpty() to list.size() == 0. This edit pattern is included in the PMD catalog of rules.
(Code Style) Infer type in generic instance creation. Since Java 7, programmers can replace type parameters to invoke the constructor of a generic class with an empty set (<>), diamond operator  and allow inference of type parameters by the context. This edit ensures the use of generic instead of the deprecated raw types . The benefit of the diamond operator is clarity since it is more concise.
(Design) Remove raw types. Raw types are generic types without type parameters and were used in versions of Java prior to 5.0. They ensure compatibility with pre-generics code. Since type parameters of raw types are unchecked, unsafe code is caught at runtime  and the Java compiler issues warnings for them . Thus, prefer List<String> a = new ArrayList<>() to List<String> a = new ArrayList().
(Error Prone) Field, parameter, and variable could be final. The final modifier can be used in fields, parameters, and local variables to indicate they cannot be re-assigned . This edit improves clarity and it helps with debugging since it shows what values will change at runtime. In addition, it allows the compiler and virtual machine to optimize the code . The edit pattern that adds the final modifier is included in PMD catalog of rules . IDEs such as Eclipse  and NetBeans  can be configured to perform this edit automatically on saving.
(Error Prone) Avoid using strings to represent paths. Programmers sometimes use String to represent a file system path even though some classes are specifically designed for this task—e.g., java.nio.Path. In these cases, it is useful to change the type of the variable to Path. First, strings can be combined in an undisciplined way, which can lead to invalid paths. Second, different operating systems use different file separators, which can cause bugs. Since detecting this pattern requires a non-trivial analysis, code analyzers do not include it as a rule. Thus, use Path path over String path.
Analysis We treated the Likert-type responses as ordinal data and applied a one-sample Wilcoxon signed-rank test to identify statistical differences for each of the nine edit patterns (
). Specifically, the null hypothesis is that the responses are not statistically different and symmetric around the default value (“it does not matter”). Rejecting the null hypothesis implies that programmers have a non-default preference for one code pattern. Because multiple comparisons can inflate the false discovery rate, we compute adjusted p-values using a Benjamini-Hochberg correction. To ease interpretation, we present the results for each pair as a net stacked distribution.
To further assess the perceived usefulness of our edit patterns, and validate their acceptance, we submitted pull requests to GitHub projects containing the nine quick fixes from our survey.
Project selection From the nine GitHub projects (Table I), we selected five projects that actively considered pull requests (Ant, Error Prone, ExoPlayer, Giraph, and Gson). We supplement these projects those of four popular code analyzer tools (Checkstyle, PMD, SonarQube, and Spotbugs) with the expectation that reviewers of these pull requests could capably assess the usefulness of the proposed quick fixes. Thus, we selected a total of nine projects to submit pull requests.
Pull requests We deliberately submitted pull requests that applied locally to a single region of code within a single file to minimize confounds that would be otherwise introduced in large pull requests. In total, we submitted 16 pull requests across all projects.
We recorded the status of the pull requests as either open (not yet accepted into the project code), merged (accepted into the project code), or rejected (declined to accept into the project code). We describe these pull-request submissions through basic descriptive statistics.
Table II characterizes the identified edit patterns and labels them according to the PMD rule-sets. The discovered patterns covered seven of the eight PMD categories, and only “Security” was not represented. The most common rule-sets—with roughly equal frequencies—were “Design” (22), “Best Practices” (19), “Performance” (13), and “Error Prone” (12). The results suggest that Revisar is effective at discovering quick fixes across a spectrum of rule-sets.
Cohen’s found “very good”  agreement between the raters for these rule-sets (, ), with disagreement being primarily attributable to whether an edit pattern is “Best Practice” or “Error Prone.”
Finally, 57/89 patterns were classified as new ones—i.e., they were not implemented as quick fixes in existing tools.
7pt Revisar could automatically discover 89 edit patterns that covered 7/8 PMD categories. 64% of the discovered patterns did not appear in existing tools.
|Likert Resp. Pct1|
Likert-type item responses: Strongly prefer or prefer baseline (B), Neutral (N), Strongly prefer or prefer quick fix (QF).
Adjusted -value after Benjamini-Hochberg correction.
Net stacked distribution removes the Neutral option and shows the skew between baseline and quick fix preferences.Strongly prefer baseline, Prefer baseline, Prefer quick fix, Strongly prefer quick fix.
A summary of the survey results is presented in Table III. The Wilcoxon signed-rank test identified a significant difference in preference—after Benjamini-Hochberg adjustment—for all nine edit patterns (at ). With the exception of EP1, “String to character,” programmers preferred the quick fix version of the code from Revisar—the strength of this preference is visible through the presentation of the net stacked distribution.
To understand why programmers rejected EP1, we examined the optional programmer feedback for this edit pattern. We found that although programmers recognized that passing a character would have better performance, “slightly more efficient,” and requiring “less overhead,” these benefits were not significant enough to outweigh readability or consistency. For example, five programmers reported that since the name of the class is StringBuffer, it’s more consistent to always pass in a String, even if a character would be more efficient. Other programmers reported that always passing in a String is just “easier mentally” and requires “less cognitive load.”
7pt Programmers preferred the quick fixes suggested by Revisar for eight of the nine edit patterns.
|EP1||1||(33%)||PMD Ant SonarQube|
|Total2||6 / 10||(60%)|
Accepted pull request, Rejected pull request, Open pull request.
Acceptance rate is calculated as accepted pull requests against accepted and rejected pull requests. Open pull requests are not included in this calculation.
Of the 16 pull requests we submitted to GitHub projects, six of these were accepted, four were rejected, and the remaining are open as of the time of this writing (Table IV).333Links to GitHub pull requests temporarily removed for blind review.
SonarQube rejected a pull request for EP1, suggesting that changing a String to a Character is purely pedantic. However, they welcomed additional evidence of the performance benefits and would be willing to reconsider given such evidence. Error Prone programmers indicated that EP2—using the equals method of a string literal—was generally useful but not for the particular use case to which we submitted the pull request. SpotBugs rejected the pull request for EP5—using StringBuilder instead of StringBuffer—because a maintainer did not want to unnecessarily make the commit history noisy unless the change was in a performance critical path. Finally, Gson rejected a EP8 pull request for adding final to a parameter: namely, because their IntelliJ already highlights locals and parameters differently depending on whether they are assigned to. In other words, the programmers already use an alternative means to communicate information about effectively final parameters.
7pt Projects accepted 60% of the pull requests for the Revisar quick fixes.
Each of the three studies have limitations, which we describe in this section.
Evaluation of Revisar We considered only single-location edit patterns, which are representative of most quick fixes in code analyzers. However, a current limitation of the technique is that it cannot identify dependent patterns—for example, when both return type of the method and the return statement must change together. Another limitation of our approach is that we only evaluated Java-based GitHub projects; both the choice of language and the choice of projects influence the quick fixes we identified. A threat to construct validity is that it is difficult to exhaustively determine if a discovered quick fix is actually novel. To mitigate this threat, we catalogued popular code analysis tools and conducted searches to find quick fixes. Similarly, given that the categorization of quick fixes involves human judgments, our results (Table II
) should be interpreted as useful estimators forRevisar.
Survey study. The survey employed purposive—that is, non-random—sampling and evaluated only a limited number of quick fixes that do not exhaustively cover the entire design space of quick fixes. Thus, a threat to external validity is that we should be careful and avoid generalizing the results from this survey to all quick fixes. Moreover, participants in the survey self-reported their experience and may not necessarily have been experts. A construct threat within this study is that programmers are not directly evaluating quick fixes: rather, they are being asked to evaluate two different code snippets—essentially, the input and output to an editor pattern. Responses and explanations for their preferences may have been different had they been explicitly told to evaluate the quick fixes directly.
Pull request validation. The choice of projects we submitted pull requests to also influences the acceptance or rejection of the pull requests. As discussed in Section VI, a construct validity threat is that pull requests are not the typical environment through which programmers apply quick fixes. Consequently, the acceptance and rejection of pull requests are not representative of how programmers would actually apply quick fixes within their development environment. Despite this limitation, the study validates that discovered quick fixes are adopted by projects, and provides explanation in cases for when they are not.
Generating executable rules (RQ1) Revisar generates AST patterns, but ideally one wants to generate executable quick fixes that can be added to code analyzers. When possible, Revisar compiles the generated patterns to executable Refaster rules. Refaster  is a rule-language used in the code analyzer Error Prone . A Refaster edit pattern is described using (i
) a before template to pattern-match target locations, and (ii) an after template to specify how these locations are transformed, which are similar to the before and after templates and used by our rules. In general, our rules cannot be always expressed as Refaster ones. In particular, Refaster cannot describe edit patterns that require AST node types. For instance, the edit pattern in Figure 4 requires knowing that an AST node is a StringLiteral and this AST type cannot be inspected in Refaster. In addition, Refaster can only modify expressions that appear inside a method body—e.g., Refaster cannot modify global field declarations. In the future, we plan to implement an extension of Refaster that can execute all rules in our format.
Programmers consider trade-offs when applying quick fixes (RQ2) The feedback from programmers within our survey study suggests trade-offs that programmers consider when making judgments about applying quick fixes. For example, we saw in Section IV-B that for EP1 some programmers preferred the version of the code with worse performance primarily because they valued consistency and reduction in cognitive load over what they felt was relatively small performance improvements.
But even when programmers significantly preferred the quick fixes from Revisar, they carefully evaluated the trade-offs for their decision. For example, consider EP2, in which the quick fix suggests using the equals method on string literals to prevent a NullPointerException. Programmers recognized this benefit but also argued that the baseline version had better readability. As one programmer notes, when given a variable, they felt it more natural to say, “if variable equals value” than “if value equals variable.”
Programmers also indicated unfamiliarity with new language features as a reason to avoid the quick fix version of the code, for example, in EP3—which uses the newer API of Files.newInputStream rather than FileInputuStream. One programmer noted that newer APIs embed “experience about the shortcomings of the old API” but they were also hesitant to use this version of the code without understanding what the shortcomings actually were.
Finally, using the diamond operator (<>) in EP6 makes the code simpler, concise, and more readable. Nevertheless, 19% of participants still preferred or strongly preferred the less concise baseline version of the code. One programmer suggested compatibility with old versions as a reason for this decision: although the diamond operator has several benefits, it cannot be supported if there is a need to target older versions of the Java specification.
Thus, even when automated techniques such as Revisar discover useful quick fixes, the feedback from our survey suggests that it is also important to provide programmers with rationale for why and when the quick fix should be applied. A first-step towards providing an initial rationale can be to situate quick fixes within an existing taxonomy, as we did with our discovered quick fixes in Table II.
Barriers to accepting pull requests (RQ3) Although our survey indicated that programmers significantly preferred the quick fixes identified by Revisar, maintainers of projects in GitHub did not always accept the corresponding pull requests.
In our pull request study in Section IV-C, the comments from project maintainers suggested reasons for declining a quick fix, even when they recognized that the fixes would be generally useful. For instance, the maintainers of SonarQube did not want to incorporate the quick fixes because it would make the commit history more noisy. Spotbugs was concerned about adopting quick fixes without sufficient testing because the fixes might introduce regressions, or behave unexpectedly in different JVM implementations. Other projects like Error Prone have adopted conventions across their entire code base. Unless these quick fixes are applied universally across the project, such inertia makes it unlikely that these projects would adopt a one-off fix—for example, EP2.
Our analysis suggests that when and how a quick fix is surfaced to the developer is important to its acceptance. It is possible these maintainers would have applied these rejected quick fixes had they been revealed as they were writing code, rather than after the fact.
Systems for mining edit patterns from code Molderez et al.  learn AST-level tree transformations as sets of tree edits and used them to automate repetitive code edits. Since the same pattern can be described with different sets of tree edits, two patterns that are deemed equivalent by Revisar can be deemed nonequivalent in . Finally, the edits learned by it are not publicly available and were not evaluated through a survey. Brown et al.  learn token-level syntactic transformations—e.g., delete a variable or insert an integer literal—from online code commits to generate mutations for mutation testing. Unlike Revisar, they can only mine token level transformations over a predefined set of syntactic constructs and cannot unify across multiple concrete edits. Negara et al.  mine code interactions directly from the IDE to detect repetitive edit patterns and find 10 new refactoring patterns. Revisar does not use continuous interaction data from an IDE and only makes use of public data available in online repositories. Other tools [34, 35] mine fine-grained repair templates from StackOverflow and the Defect4j bug data-set and therefore differ from Revisar. In summary, our paper differs from prior work in that (i) Revisar mines new edit patterns in a sound and complete fashion using a rich syntax of edit patterns, (ii) we assessed the quality of the learned patterns and the corresponding quick fixes through a formal evaluation.
Learning transformations from examples Several techniques use user-given examples to learn repetitive code edits for refactoring [36, 37, 38], for removing code clones , for removing defects from code , for learning ways to fix command-line errors , and for performing code completion [42, 43]. All these techniques rely on user-given examples that describe the same intended transformation or on curated labeled data. This extra information allows the tools to perform more informed types of rule extraction. Instead, Revisar
uses fully unsupervised learning and receives concrete edits as input that may or may not describe useful transformations.
Program repair Some program repair tools learn useful fixing strategies by mining curated sets of bug fixes , user interactions with a debugger , human-written patches [46, 47], and bug reports . All these tools either rely on a predefined set of patches or learn patches from supervised data—e.g., learn how to fix a NullPointerException by mining all concrete edits that were performed to fix that type of exception. Unlike these techniques, Revisar analyzes unsupervised sets of concrete code edits and uses a sound and complete technique for mining a well-defined family of edit patterns. Moreover, Revisar learns arbitrary quick fixes that can improve code quality not just ones used to repair buggy code. Since we do not have a notion of correct edit pattern—i.e., there is no bug to fix—we also analyze the usefulness of the learned quick fixes through a comprehensive evaluation and user study, a component that is not necessary for the code transformations used in program repair.
We presented Revisar, a technique for automatically discovering common Java code edit patterns in online code repositories. Revisar (i) identifies edits by comparing consecutive revisions in code repositories, (ii) clusters edits into sets that can be abstracted into the same edit pattern, and we used Revisar to mine quick fixes from nine popular GitHub Java projects and Revisar successfully learned 89 edit patterns that appeared in more than three projects. To assess whether programmers would like to apply these quick fixes to their code, we performed an online survey with 164 programmers showing 9 of our quick fixes. Overall, programmers supported 89% of our quick fixes. We also issued pull requests in various repositories and 50% were accepted so far.
The results of this work have several implications for toolsmiths. First, Revisar can be used to efficiently collect patterns and their usages in actual repositories and enable toolsmiths to make informed decisions about which quick fixes to prioritize based on patterns programmers actually apply in practice. Second, Revisar allows toolsmiths to discover new quick fixes, without needing their users to explicitly submit quick fix suggestions. Third and finally, the results of this work suggest several logical and useful extensions to further aid toolsmiths—e.g., supporting more complex patterns that appear in code analyzers but are currently beyond the capabilities of Revisar and designing techniques for automatically extracting executable quick fixes from mined patterns.
Journal of Automated Reasoning, vol. 52, no. 2, pp. 155–190, Feb. 2014.