Software developers are known to repeatedly apply code changes within and across code bases kim2009discovering ; kim2006memories . Building on this finding, researchers have shown that mining repair models from real-world bug fixes can improve over the random search of fix ingredients for automated repair martinez2015mining . In recent literature, the software engineering community has mostly proposed automated program repair techniques that generate patches from templates built on manually identified common repair patterns (e.g., add/modify if-conditions, alter method parameters, etc. yue2017characterization ). PAR dongsun2013Automatic and Relifix tan2015relifix are examples of such state-of-the-art techniques, which leverage manually extracted templates either to help build human-readable patches or to fix common errors.
Automatically discovering bug fix patterns is, however, a challenging endeavour. Genesis long2017automatic limits the search space to previously successful patches from only three classes of defects of Java programs: null pointer (NP), out of bounds (OOB), and class cast (CC) defects. To discover unknown change types, Fluri et al. fluri2008discovering have used hierarchical clustering focusing on 41 basic change types. We further note that despite a large body of empirical studies discussing the repetition of code changes campos2017common ; ray2012case ; molderez2017mining ; nguyen2010recurring ; park2012empirical , virtually none provide actionable findings and insights for the Automated Program Repair (APR) field.
Recently, with the development of code differencing tools based on abstract syntax trees (AST), such as GumTree falleri2014Fine and ChangeDistiller fluri2007change , researchers have taken the opportunity to mine repair actions defined as modifications of AST nodes (UPD/MOV/INS/DEL statements, expressions, modifiers, etc.). The mining process (cf. liu2017mining ; rolim2018learning ; martinez2015mining ) generally consists in applying, as-is, common unsupervised learning algorithms that take as input flat representations of the AST differences. Unfortunately, flattening the tree representation leads to a loss of information with regard to the context of the code change, the AST node types involved, and the order of change actions. Furthermore, state-of-the-art approaches yield highly generic change patterns (e.g., INSERT IF-Statement) which are not immediately actionable, in contrast with the targeted templates that are required by APR techniques. We refer to such patterns as high-level patterns.
We argue that the main challenge in discovering semantically-relevant fix patterns (i.e., patterns that make sense with regard to a bug type, beyond simple syntactical categorisation of change operations) is to keep track of any information that may be relevant to describing the change context, the change operations, and the programming tokens impacted. To that end, we propose in this work a three-step approach to iteratively refine the patterns that can be discovered by clustering different representations of code changes. The contributions of this paper are as follows:
We propose a fix pattern mining approach that explores the hierarchical information of the AST to include more context related to the actual code change. This approach is implemented with an augmented version of the GumTree AST differencing tool. Similarly to our approach, Huang et al. have concurrently proposed CLDIFF huang2018cldiff , an augmented version of GumTree, which however mainly focuses on summarizing and visualizing code changes.
The FixMiner approach builds on a three-fold clustering strategy where we iteratively consider the shape of the code to be changed, then the sequence of repair actions performed, before checking the similarity among the program token changes. Similar refinement changes were concurrently proposed by Jiang et al. for shaping the repair space with SimFix jiang2018shaping .
We assess the capability of FixMiner to discover patterns by mining UPDATE fix patterns among 8 000 patches addressing user-reported bugs in 44 open source projects. We further relate the discovered patterns to those that can be found in current datasets (i.e., Defects4J just2014defects4j and Bugs.jar sahaMSR18 ) used by the program repair community.
Finally, we investigate the relevance of the mined fix patterns by implementing them as part of an Automated Program Repair system. Experimental results on the Defects4J benchmark show that our mined patterns are effective for fixing 25 bugs. We further find that these patterns are relevant, as they lead to the generation of plausible patches that are mostly correct. Contrary to concurrently proposed and related work, we propose the pattern mining tool as a separate and reusable component that can be leveraged in other patch generation systems.
The remainder of this paper is structured as follows. In Section 2, we provide background information on abstract syntax trees and code differencing, as well as the intuition behind our approach. Section 3 details the process of FixMiner, while experiments are presented in Section 4. We discuss insights and threats to validity in Section 5 and related work in Section 6, before concluding in Section 7.
2 Background and Intuition
We now provide background information for understanding the execution as well as the information processed by FixMiner.
2.1 Abstract Syntax Trees
Code representation is an essential step in the analysis and verification of programs. To date, Abstract syntax trees (AST), generally produced for program analysis and transformations, are data structures that provide an efficient form of representing program structures to reason about syntax and even semantics. An AST indeed represents all of the syntactical elements of the programming languages and focuses on the rules rather than elements like braces or semicolons that terminate statements in some popular languages like Java or C. The AST is a hierarchical representation where the elements of each programming statement are broken down recursively into their parts. Each node in the tree thus denotes a construct occurring in the programming language.
Consider the AST representation in Fig. 1 of the Java code in Listing 1. Labels of nodes correspond to the name of their production rule in the grammar. Labels indeed suggest the structure, while values of the nodes correspond to the raw tokens in the code. Formally, let $T$ be a set of nodes constituting an AST. $T$ has a unique root, which is a node referred to as $root(T) \in T$. Each node $t \in T$ other than the root has a parent $p \in T$ with $parent(t) = p$. Note that the root has no parent: $parent(root(T)) = \emptyset$. Furthermore, each node $t$ has a sequence of children (denoted as $children(t)$) and a label $l$ from a given alphabet ($l \in \Sigma$). Finally, each node has a string value $v$ that is possibly empty ($v \in String \cup \{\epsilon\}$). From the running example of Fig. 1, we note that the illustrated AST has nodes which are labelled with labels matching structural elements of the Java language (e.g., MethodDeclaration, IfStatement or StringLiteral) and can be associated with values representing the raw tokens in the code (e.g., a node labelled StringLiteral from our AST is associated with the value "Hi!").
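The node structure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `Node`, its fields, and the method name `greet` are hypothetical and are not taken from GumTree or FixMiner.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through parent links
class Node:
    """One AST node: a grammar label, a possibly empty raw-token value, children."""
    label: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` as the last child of this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


# Hypothetical fragment: a method containing an if-statement with a string literal.
root = Node("MethodDeclaration", "greet")
if_stmt = root.add(Node("IfStatement"))          # structural node, empty value
literal = if_stmt.add(Node("StringLiteral", '"Hi!"'))
```

The root is the only node whose `parent` is `None`, which mirrors the remark that the root has no parent.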
2.2 Code Differencing
Differencing between two versions of a program is the key pre-processing step of all studies on software evolution. The evolved parts must be captured in a way that makes it easy for developers to understand the changes, or for other programs to analyse them. Humans generally deal well with text-based differencing tools such as GNU Diff, which represents changes as additions and removals of source code lines, as in Listing 2.
The main issue with this text-based differencing is that it does not provide a fine-grained representation of the change (i.e., string replacement) and thus it is poorly suited for systematically analysing the changes: e.g., the - and + lines would look the same if the change was on other code elements such as the boolean expression, or even if whitespace was added at the end of the line.
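The limitation can be observed with any line-based differ. The sketch below uses Python's standard `difflib` module on a hypothetical one-line Java statement (the statement is invented for illustration and is not the content of Listing 2): a token change and a trailing-whitespace change both surface as one removed and one added line.

```python
import difflib


def change_markers(old_lines, new_lines):
    """Return the '+'/'-' markers of a unified diff, ignoring headers and context."""
    diff = list(difflib.unified_diff(old_lines, new_lines, lineterm="", n=0))
    body = diff[2:]  # skip the two file-header lines ("--- " and "+++ ")
    return [line[0] for line in body if line[0] in "+-"]


old = ['System.out.println("Hi!");']
token_change = ['System.out.println("Hello!");']   # string literal replaced
space_change = ['System.out.println("Hi!"); ']     # only a trailing space added

markers_token = change_markers(old, token_change)
markers_space = change_markers(old, space_change)
```

Both calls report the same shape of change, so the line-level view alone cannot tell which code element was modified.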
To address the challenges of code differencing, recent algorithms have been proposed based on tree structures (such as the AST). GumTree and ChangeDistiller are examples of such algorithms which produce edit scripts that detail the operations to be performed on the nodes of a given AST to yield another AST corresponding to the new version of the code. In particular, in this work, we build on GumTree’s core algorithms for preparing an edit script as a sequence of edit actions for transforming an AST. The following edit actions are listed:
UPD, where an $upd(n, v)$ action transforms the AST by replacing the old value of an AST node $n$ with the new value $v$.
ADD, where an $add(n, n_p, i, l, v)$ action adds a new node $n$ with $v$ as value and $l$ as label. If the parent $n_p$ is specified, $n$ is inserted as the $i$-th child of $n_p$; otherwise $n$ is the root node.
DEL, where a $del(n)$ action removes the leaf node $n$ from the tree.
MOV, where a $mov(n, n_p, i)$ action moves the subtree having node $n$ as root to make it the $i$-th child of a parent node $n_p$.
An edit action embeds information about the node (i.e., the relevant node in the whole AST of the parsed program), the operator (i.e., UPD, ADD, DEL, or MOV) which describes the action performed, and the raw tokens involved in the change.
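To make the four edit actions concrete, the following is a minimal sketch on a toy tree. The function names, the `Node` class and the child-index argument are assumptions chosen for illustration; they do not reproduce GumTree's actual API.

```python
class Node:
    """Toy AST node with a label, a value, ordered children and a parent link."""

    def __init__(self, label, value="", children=None):
        self.label, self.value = label, value
        self.children = list(children or [])
        self.parent = None
        for child in self.children:
            child.parent = self


def upd(node, new_value):
    """UPD: replace the old value of `node` with `new_value`."""
    node.value = new_value


def add(parent, index, label, value=""):
    """ADD: create a node; insert it under `parent` at `index`, or make it a root."""
    node = Node(label, value)
    if parent is not None:
        parent.children.insert(index, node)
        node.parent = parent
    return node


def delete(node):
    """DEL: remove a leaf node from the tree."""
    if node.children:
        raise ValueError("DEL only applies to leaf nodes")
    if node.parent is not None:
        node.parent.children.remove(node)
        node.parent = None


def mov(node, new_parent, index):
    """MOV: detach the subtree rooted at `node` and re-attach it under `new_parent`."""
    if node.parent is not None:
        node.parent.children.remove(node)
    new_parent.children.insert(index, node)
    node.parent = new_parent


# Hypothetical method invocation whose name is updated.
call = Node("MethodInvocation", children=[Node("SimpleName", "namespace"),
                                          Node("SimpleName", "indexOf")])
upd(call.children[1], "lastIndexOf")
```

An edit script is then simply an ordered list of such calls that turns the tree of the old version into the tree of the new version.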
2.3 Motivation and Intuition
Mining, enumerating and understanding code changes has been a topic of sustained interest in software engineering in recent years. Ten years ago, Pan et al. contributed a manually-compiled catalog of 27 code change patterns related to bug fixing pan2009toward . Such "bug fix patterns", however, are generic patterns (e.g., IF-RMV: removal of an If Predicate) which represent the types of changes that often fix bugs. More recently, thanks to the availability of AST differencing tools, researchers have proposed to automatically mine change patterns martinez2013automatically ; osman2014mining ; oumarou2015identifying ; lin2016empirical .
Eventually, mining of fix patterns yields clusters as a sequence of edit actions (edit scripts) that are recurrent. Unfortunately, to date, such “patterns” have been of little use to improve automated repair studies. Following their fix pattern mining exercise, Xuan et al. have developed NOPOL xuan2017Nopol focused on repairing buggy conditional statements (i.e., applying changes to IF-then-Else statements). However, even in this case, insights from their study were not leveraged in the fixing process.
We argue that mining fix patterns can have two distinct uses:
Guiding mutation operations in generate-and-validate automated repair approaches. In this case, there is a need to mine truly recurrent change patterns to which repair semantics can be attached, and to provide accurate, fine-grained patterns that can be actionable in practice.
Automating the generation of repair templates for template-driven automated program repair. In this case, there is a need to provide patterns that go beyond high-level syntactic operations, but also include contextual information (e.g., qualified token names), which can be specific to projects, to ensure the derivation of actionable templates.
Our intuition is that bug fix pattern mining, to be relevant and informative, must consider several layers of source code information: the shape/type of the code being modified (e.g., a modifier change pattern in declaration statements should not be generalized to other types of statements); the actual change patterns (i.e., a "remove then add" sequence should not be confused with "add then remove", as it may have a distinct meaning in a hierarchical model such as the AST); and the context of the change (i.e., token information that can help specialize the patterns into templates).
Figure 2 illustrates the different steps of the FixMiner approach. As an initial step, we collect code changes from software versioning systems (e.g., Git commits). Using heuristics, we identify the relevant bug fixing patches (see details in Section 4). Then, for each patch, we compute an enhanced AST difference (Enhanced AST Diff) representation between the two versions of the program (cf. Section 3.1).
In the FixMiner approach, we focus on three specialized tree representations of the Enhanced AST Diff carrying respectively information about the impacted AST node types, or the repair actions performed hierarchically on the nodes, or the code tokens that are affected in the subtree. FixMiner works in an iterative manner for three different specialized representations of the changes. For each pattern mining iteration:
We build a search index to identify the Enhanced AST Diffs to be compared.
We compute the similarity by detecting the distance between two Enhanced AST Diffs.
We construct the clusters of the Enhanced AST Diff that are similar.
Each pattern mining iteration considers a different tree representation, going from the most generic representation (i.e., the shape of the changed code) to the most specific (i.e., the raw tokens changed).
3.1 Step 1 - Enhanced AST diff computation
We consider the AST a suitable data structure for accessing different kinds of information related to the code being changed, and AST diff edit scripts can be computed to carry essential details on (1) the shape of the code being changed, (2) the sequence of repair actions performed by the patch, and (3) the raw tokens involved in the code. To that end, after a literature review, we selected the state-of-the-art GumTree falleri2014Fine AST differencing tool for computing the inputs to FixMiner: given a buggy version and a fixed version of a program, GumTree is claimed to build, in a fast, scalable and accurate way, the sequence of AST edit actions (a.k.a. edit script) between the two associated AST representations of the program.
Consider the GNU Diff of the fix patch for Defects4J bug Closure 93 illustrated in Listing 3.
The intended behaviour is to check the parent namespace, by using the position index of dot (’.’) in the string object namespace. However, the method indexOf returns the very first index and the method lastIndexOf, on the other hand, returns the last index of the input element (’.’). This may lead to different position indices of ’.’ in case the string contains several ’.’ elements. The patch fixes the bug which is due to a wrong method reference.
Unfortunately, our experiments show that GumTree focuses only on producing the edit script that details the operations to be performed to transform one version of the code into another, but loses much contextual information (e.g., qualified token names). Listing 4 provides the corresponding edit script computed by GumTree.
This edit script describes the change action performed (UPD), embeds the information about the node (MethodInvocation / SimpleName) and highlights the change in program tokens (indexOf to lastIndexOf). Although this script is correct in describing the patch operation, it represents the code changes at a fine-grained granularity. These fine-grained code changes are often related to high-level AST elements but are scattered across the edit script. Moreover, the edit script representation flattens the AST Diff representation into a sequence of edit actions. This leads to a loss of contextual information about the code change, such as the high-level AST elements involved, the relationships among code changes, the order of change actions, or the qualified token names. Since such contextual information is lost, the yielded edit script fails to convey the full syntactic and semantic meaning of the code change. Thus, the representation of the code change becomes the first challenge in fix pattern mining.
To address this issue, we first propose to devise and implement an Enhanced AST Diff representation: we represent the patch changes not only as the sequence of edit actions (a.k.a. edit script) for transforming one AST into another, but also by reconstructing the hierarchical structure of the edit actions in order to capture the context of the code change.
Algorithm 1 provides the steps for building an Enhanced AST Diff. Given a patch, we start by computing the set of edit actions (edit script) using GumTree, where the set contains an edit action for each contiguous group of code lines (hunks) that are changed by a patch.
In our enhanced version, we enrich the fine-grained representation of GumTree by reconstructing the hierarchical structure of the edit actions. We construct AST subtrees for the edit actions by regrouping the edit actions under high-level AST nodes, called root nodes. For each edit action, we traverse the AST of the parsed programs produced by GumTree until we reach a high-level AST node (root node). Then, when a root node is reached for the given edit action, we construct an AST subtree starting from the discovered root node down to the edit action node itself, adding all the intermediate nodes in between. This allows us to yield the changes in each hunk of the patches in the form of a tree, referred to as the Enhanced AST Diff.
Listing 5 provides an example Enhanced AST Diff computed for the patch of Closure 93 from Listing 3. For simplicity of presentation, we illustrate this Enhanced AST Diff in a textual structure in the paper. Each line indicates a node in the tree, where each node carries 5 different types of information: the repair action, the AST node type, the raw tokens in the patch, the position of the token in the actual source code, and the length of the token. The first line indicates the root node, and lines prefixed with three dashes (- - -) indicate child nodes. Immediate children of the root carry three dashes, while their own children add another three dashes (- - - - - -), and so on, to reflect the hierarchical structure between the nodes.
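The regrouping step and the dashed textual layout can be sketched as follows. Everything here is an assumption made for illustration: the set `ROOT_LABELS`, the `Node` class, the example statement, and the line format (which omits the position and length fields) are hypothetical and do not reproduce FixMiner's Algorithm 1 or the exact content of Listing 5.

```python
# Hypothetical set of "high-level" node types at which the upward walk stops.
ROOT_LABELS = {"VariableDeclarationStatement", "ReturnStatement",
               "IfStatement", "ExpressionStatement"}


class Node:
    def __init__(self, label, value, parent=None):
        self.label, self.value, self.parent = label, value, parent


def path_from_root(changed):
    """Walk up from the changed node to the nearest root node; return root-first."""
    path = [changed]
    while path[-1].label not in ROOT_LABELS and path[-1].parent is not None:
        path.append(path[-1].parent)
    path.reverse()
    return path


def render(path, action):
    """One line per node; each nesting level adds three dashes."""
    return ["---" * depth + f"{action} {node.label} {node.value}"
            for depth, node in enumerate(path)]


# Assumed shape of the changed statement (illustrative only).
method = Node("MethodDeclaration", "...")
stmt = Node("VariableDeclarationStatement",
            "int indexOfDot = namespace.indexOf('.');", method)
call = Node("MethodInvocation", "namespace.indexOf('.')", stmt)
name = Node("SimpleName", "indexOf", call)

lines = render(path_from_root(name), "UPD")
```

The walk stops at the statement level, so the enclosing method declaration is not part of the resulting subtree.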
Pattern mining iteration. Fix pattern mining requires the computation of similarity among patches in order to group together those patches that may reflect the same changes. Given that the Enhanced AST Diff embeds three specialized representations (Shapes, Actions and Tokens), we create a separate pattern mining iteration for each of them. In each pattern mining iteration, we first build a search index to identify the Enhanced AST Diffs to be compared, then detect the similarity by computing the tree similarity between two Enhanced AST Diffs, and finally construct the clusters of the Enhanced AST Diffs that are similar.
3.2 Step 2 - Search index construction
We build the search index to enable a fast identification of all possible combinations of Enhanced AST Diffs that need to be compared. The Enhanced AST Diffs are spread within and across the patches, since they are computed for each hunk in a patch. It is thus necessary to label each Enhanced AST Diff uniquely in order to identify all possible combinations for the comparison. We create the unique label as the concatenation of the project name, the buggy commit hash, the fix commit hash, the full file name in the commit, and the position of the hunk in the file (e.g., 111ROO/629827_4e7fed_addon-finder#src#main#java#org#springframework#roo#addon#finder#FinderMetadata.java_16), and assign an integer value to this label to create the index.
The search indices are defined as follows:
“Shape” search index.
The “Shape” search index is used in the first pattern mining iteration, which compares the similarity of the Enhanced AST Diffs based only on the structure of the tree in terms of AST node types (cf. Section 3.4). The search index is constructed by creating the pairwise combinations of all Enhanced AST Diff labels, where each label identifies the project name, buggy commit hash, fix commit hash, full file name in the commit, and position of the hunk in the file that needs to be compared. The “Shape” search index is formulated in Equation 1.
“Action” search index.
The completion of the first pattern mining iteration produces clusters of Enhanced AST Diffs, where each cluster represents a common Shape-based fix pattern. However, this does not necessarily imply that the members of a cluster share a common Action-driven fix pattern. Thus, we create a search index for each cluster, in order to check whether the Enhanced AST Diff trees in that cluster have a common Action-driven fix pattern. Formally, the “Action” search index construction is formulated in Equation 2.
“Token” search index.
The third pattern mining iteration attempts to form clusters of Enhanced AST Diff trees having common tokens among the trees grouped in the second pattern mining iteration, i.e., trees that already share common Shape-based and Action-driven fix patterns. Therefore, we construct a “Token” search index for each cluster formed in the second pattern mining iteration to identify the labels of the Enhanced AST Diff trees to be compared. The “Token” search index is formulated in Equation 3.
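The three search indices can be pictured as sets of unordered pairs of integer labels. The sketch below is an assumption about their general shape based on the descriptions above, not a transcription of Equations 1 to 3; the function names are hypothetical.

```python
from itertools import combinations


def search_index(labels):
    """All unordered pairs (i, j) with i < j among the given integer labels."""
    return list(combinations(sorted(labels), 2))


def refined_indices(clusters):
    """One search index per cluster produced by the previous iteration."""
    return [search_index(cluster) for cluster in clusters]


# First iteration: every Enhanced AST Diff is paired with every other one.
shape_index = search_index(range(5))

# Later iterations: pairs are only formed inside each existing cluster,
# e.g. inside two hypothetical Shape-based clusters {0, 1, 2} and {3, 4}.
action_indices = refined_indices([[0, 1, 2], [3, 4]])
```

Restricting the second and third iterations to pairs within a cluster is what keeps their indices much smaller than the initial one.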
3.3 Step 3 - Enhanced AST Diff Similarity
The goal of our comparisons is to find Enhanced AST Diffs that are similar. To measure the similarity between Enhanced AST Diffs, we use the tree edit distance, which is derived from the sequence of edit actions that transforms one tree into another. Algorithm 2 defines the steps to compute the Enhanced AST Diff similarity.
The algorithm starts by collecting the pairs of non-repetitive Enhanced AST Diff labels from the search index. As mentioned earlier, each iteration of FixMiner focuses solely on a single type of information available in the Enhanced AST Diffs. Instead of using heuristics, as in typical hierarchical clustering scenarios where granularity is constructed by the extent of similarity among sets of change operations, we leverage the variety of information types to build levels of abstraction with regard to the code shape, the actions and the specificity of tokens; each level is referred to as a fold in the algorithm. Concretely, at each iteration, we restore the Enhanced AST Diffs for the given pair of labels from the cache, then build a simplified version of each Enhanced AST Diff tree: in the first iteration we consider only trees denoted ASTNodeTrees, in the second iteration we focus on ActionTrees, and in the third iteration on TokenTrees. At this point, we use GumTree gumtree to obtain the mapping and actions for each pair of simplified Enhanced AST Diffs. In order to compute the similarity between trees, we check the size of the actions set (a.k.a. edit distance), where the actions set lists the operations necessary to transform one tree into the other. The similarity computation in the third fold is slightly different from the other folds. Since the relevant information in the tree is textual, we simply flatten the trees into sequences of tokens and use the Jaro-Winkler jaro1989advances ; winkler1990string algorithm to compute the edit distance between two sequences of token change actions. Jaro-Winkler calculates the distance (a measure of similarity) between strings. It is composed of two parts: Jaro's original algorithm (the Jaro similarity) and Winkler's extension. The Jaro similarity is the weighted sum of the percentage of matched characters $m$ from each string and of transposed characters $t$. Winkler increased this measure for matching initial characters by using a prefix scale $p$, which gives more favorable ratings to strings that match from the beginning for a set prefix length $l$. Formally, given two strings $s_1$ and $s_2$, their Jaro-Winkler similarity score $sim_w$ is defined in Equation 4:
$$sim_j = \begin{cases} 0 & \text{if } m = 0 \\ \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right) & \text{otherwise} \end{cases} \qquad sim_w = sim_j + l\,p\,(1 - sim_j) \qquad (4)$$
The similarity score ranges from 0.0 to 1.0, where 0.0 is the least likely match and 1.0 is an exact match. For our purposes, any pair scoring below 0.8 is not considered similar. We remove the tokens that are similar from the sequences of token change actions, in order to produce action sets as in the other folds, i.e., sets containing only the token change actions that differ between the two sequences. Finally, we check the size of the actions set and tag the pairs whose actions set has size zero: a size of zero implies that no operation is necessary to transform one tree into the other, and thus that the trees are similar.
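For reference, the Jaro-Winkler score can be computed as in the sketch below. It follows the textbook definition; the prefix scale `p = 0.1` and the maximum prefix length of 4 are the conventional defaults and are assumptions here, as is the helper `is_similar`, since the source only states the 0.8 cut-off.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1] between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)   # how far apart matches may be
    matched1, matched2 = [False] * len1, [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (m / len1 + m / len2 + (m - t) / m) / 3.0


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of at most `max_prefix` chars."""
    sim_j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim_j + prefix * p * (1.0 - sim_j)


def is_similar(s1, s2, threshold=0.8):
    """Scores below the threshold are not considered similar."""
    return jaro_winkler(s1, s2) >= threshold
```

The classic pair "MARTHA" / "MARHTA" has six matches, one transposition and a shared prefix of three characters, which gives a score of about 0.961.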
3.4 Step 4 - Fix pattern clustering
For each fold of the iteration, Step 3 produces a list of pairs (a.k.a. taggedPairs), which is the subset of the search index containing the indices of the hunks that are similar. In each pattern mining iteration, we form clusters of fix patterns that target a specific specialized representation, as follows:
Shape-based fix patterns.
The first iteration attempts to find patterns in the ASTNodeTrees being changed. We refer to them as Shape-based fix patterns, since they represent the shape of the changed code in a structure of the tree in terms of node types (cf. Listing 6).
Action-driven fix patterns.
The second iteration considers each shape-based fix pattern cluster and attempts to extract hierarchical patterns of repair actions from the ActionTrees that are recurrent in each kind of shape. This step produces patterns (cf. Listing 7) that are already relevant and can be matched to dissection studies already performed in the literature defects4J-dissection . Our patterns, however, remain mapped to a code shape and can thus be used in the context of such shapes to drive precise mutations of automated repair.
Token-specific fix patterns.
The third iteration finally considers each action-driven cluster and attempts to extract patterns that are of a lower-level nature as it takes into account specific recurring tokens (cf. Listing 8). Such patterns are most useful as they can be immediately used to derive ready-to-use templates of bug fixes.
In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph skiena1997stony . We implemented an unsupervised learning process based on the identification of connected components (subgraphs) in a graph, as detailed in Algorithm 3. We collect the indices of the tagged pairs and use these indices as the vertices of a graph. The graph is then formed by adding an edge between the two vertices of each tagged pair. From this graph, we identify the connected subgraphs: each subgraph connects all the Enhanced AST Diff labels that are similar, and is a candidate cluster.
In order to form clusters, we check the number of vertices in each subgraph against a threshold value. We selected the threshold value by investigating the distribution of the number of Shape-based clusters for different threshold values (cf. Table 1). Based on the median number of clusters across threshold values, we select the threshold value of 10, which yields a number of clusters in line with the median. Thus, a subgraph must have at least 10 members before it is recognized as a cluster of changes.
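The clustering step amounts to finding connected components and discarding the small ones. The following is a minimal sketch of that idea using a depth-first traversal; the function name and signature are hypothetical and this is not a transcription of Algorithm 3.

```python
from collections import defaultdict


def clusters_from_pairs(tagged_pairs, min_size=10):
    """Group indices linked by tagged (similar) pairs; keep components of at
    least `min_size` vertices."""
    adjacency = defaultdict(set)
    for a, b in tagged_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    visited, clusters = set(), []
    for start in adjacency:
        if start in visited:
            continue
        component, stack = set(), [start]
        while stack:                       # iterative depth-first traversal
            vertex = stack.pop()
            if vertex in component:
                continue
            component.add(vertex)
            stack.extend(adjacency[vertex] - component)
        visited |= component
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


# Toy input: {1, 2, 3} are transitively similar, {7, 8} form a smaller group.
toy_pairs = [(1, 2), (2, 3), (7, 8)]
toy_clusters = clusters_from_pairs(toy_pairs, min_size=3)
```

Note that membership is transitive: 1 and 3 end up in the same cluster even though they were never tagged as a pair directly.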
We now detail the experiments for assessing FixMiner. After enumerating the research questions, we present the experimental setup and describe our findings.
4.1 Research Questions
We assess the FixMiner approach through the following research questions:
What are the clusters that FixMiner can generate for each considered level of abstraction?
Do the clusters generated by FixMiner match community-provided dissection labels of fix patterns?
Do the patterns yielded by FixMiner hold any semantics with regards to the associated bug reports?
Are the mined patterns relevant ingredients for generating correct patches as part of an automated program repair pipeline?
4.2 Experimental Setup
To perform our experiments, we collect code changes from about 50 large and popular open-source projects from the Apache, Commons, JBoss, Spring and Wildfly communities with the following selection criteria: we focused on projects (1) written in Java, (2) with publicly available bug reports, and (3) having at least 20 source code files in at least one of their versions; finally, to reduce selection bias, we choose projects (4) from a wide range of categories: middleware, databases, data warehouses, utilities, infrastructure. Table 2 details the number of bug-fix patches written by human developers that we considered in each project.
In order to identify the bug-fixing patches in these projects, we leverage the bug linking strategies enforced when developers use the JIRA bug tracking system. We crawl and extract bug report data from the bug tracking systems, then perform two verifications for each bug link: (i) we check for explicit commit ids (i.e., git hashes) and file paths associated with the bug in the bug tracking database: for each file impacted by an identified commit, we consider the corresponding change as a bug fix change; (ii) similarly, we also check commit logs to identify bug report IDs and associate the corresponding changes as bug fix changes.
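The second verification, scanning commit logs for bug report IDs, could look like the sketch below. The regular expression assumes the usual JIRA key shape (an upper-case project key, a hyphen, a number); the pattern, the function name and the example messages are hypothetical, and the source does not describe the exact heuristic used.

```python
import re

# Assumed JIRA key shape, e.g. "ROO-1234". The pattern can yield false positives
# (e.g. "UTF-8"), so candidates should be checked against the bug tracker.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def linked_issues(commit_message):
    """Return candidate bug report IDs mentioned in a commit message."""
    return ISSUE_KEY.findall(commit_message)


candidates = linked_issues("ROO-1234: fix finder metadata generation")
```

Only commits whose candidate IDs resolve to actual bug reports would then be kept as bug fix changes.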
We curate the dataset by selecting only bug reports that are indeed considered as such and are thus resolved and tagged as RESOLVED or FIXED, and whose status is CLOSED. Furthermore, we ensure that the changes are consistent by checking the existence of the full list of files in the code repository which are associated with the bug report. We discard any bug report and its linked patch when a file is not available in the repository. Eventually, our dataset includes 5 044 bug reports with 8 009 patches.
FixMiner leverages the GumTree AST differencing tool for computing its inputs. Since GumTree expects the buggy version, the fixed version, and the patch as its input, we reconstruct the suitable forms for each patch in the dataset. We calculate the Enhanced AST Diffs, keeping only those that perform a single repair action (i.e., only UPD, only ADD, only DEL, or only MOV in all nodes, rather than a mix of them). We then store the resulting 6 130 different Enhanced AST Diffs in a memory cache in order to speed up retrieval and avoid re-computation in the next phases. We use the Enhanced AST Diff as the unit of count since it represents an atomic change in a patch, i.e., a change that is independent of other changes in a patch. Some patches may indeed implement several changes simultaneously. The literature often refers to them as change hunks when discussing line-based diff representations koyuncu2017impact .
Domain of validity
In practice, the generation of an Enhanced AST Diff is computationally expensive. Indeed, besides the computation cost for producing the GumTree edit script, regrouping the edit actions by traversing the AST incurs an additional computation overhead. Moreover, since a single patch may include several hunks (i.e., contiguous groups of code lines), where each hunk may describe changes that are unrelated to other hunks, it is necessary to compute a separate Enhanced AST Diff for each hunk in the patch. Overall, in our study datasets, we identified 21 212 hunks that would need to be compared against each other, i.e., which would have to be imported in a huge search index of 224 963 866 elements. In order to limit this search index and ensure search efficiency, we focused on the Enhanced AST Diffs that perform a single repair action (i.e., only UPD, only ADD, only DEL, or only MOV in all nodes, rather than a mix of them). This constraint leads to 6 130 hunks, and eventually a search index of 18 785 385 elements.
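The size of a pairwise search index follows directly from the number of hunks: n hunks yield n(n-1)/2 unordered pairs. The short check below reproduces the reported figure for the filtered dataset; the helper names are hypothetical, and the single-action filter is a sketch of the constraint described above.

```python
import math


def index_size(n_hunks):
    """Number of unordered pairs among n hunks, i.e. n * (n - 1) / 2."""
    return math.comb(n_hunks, 2)


def is_single_action(actions):
    """True when all edit actions of a diff use the same operator."""
    return len(set(actions)) == 1


filtered_size = index_size(6130)   # 6 130 single-action hunks
```

The quadratic growth of this count is why restricting the input to single-action diffs has such a large effect on the index.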
In order to efficiently compute the Enhanced AST Diffs, we apply the actor model agha1988concurrent to leverage concurrent computation in distributed systems. The fundamental idea of the actor model is to use actors as concurrent primitives that act upon receiving messages. For communication, the actor model uses asynchronous message passing. Each actor has its own mailbox and isolated state, which makes it entirely independent of any other instance. Queuing and dequeuing of messages in a mailbox are atomic operations, so there cannot be a race condition. An actor processes incoming messages from its mailbox sequentially and performs the designated behavior. In our work, the similarity computation of each Enhanced AST Diff pair is handled by actors, where the messages are the indices of the Enhanced AST Diff pairs that need to be compared. Additionally, to prevent the re-computation of the Enhanced AST Diffs, we store them in cached memory in the form of tree data structures (instead of dumping and storing them to disk, which would require some re-computation). This in turn significantly reduces the computation for subsequent steps in the approach (i.e., the time to build clusters).
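The mailbox-and-message idea can be illustrated with standard-library threads and queues. This is only a single-process sketch of the pattern, not the actor framework used by the authors (which the text does not name); the class `SimilarityActor`, the function `run_actors` and the toy comparison function are all hypothetical.

```python
import queue
import threading
from itertools import combinations


class SimilarityActor(threading.Thread):
    """An actor with a private mailbox; messages are pairs of diff indices."""

    def __init__(self, compare, results):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()      # thread-safe enqueue/dequeue
        self._compare = compare
        self._results = results

    def send(self, message):
        self.mailbox.put(message)         # asynchronous: the sender does not wait

    def run(self):
        while True:
            message = self.mailbox.get()  # messages are processed one at a time
            if message is None:           # sentinel: stop the actor
                break
            i, j = message
            if self._compare(i, j):
                self._results.put((i, j))


def run_actors(pairs, compare, n_actors=3):
    """Distribute the pairs round-robin over actors; return the tagged pairs."""
    results = queue.Queue()
    actors = [SimilarityActor(compare, results) for _ in range(n_actors)]
    for actor in actors:
        actor.start()
    for k, pair in enumerate(pairs):
        actors[k % n_actors].send(pair)
    for actor in actors:
        actor.send(None)
    for actor in actors:
        actor.join()
    tagged = []
    while not results.empty():
        tagged.append(results.get())
    return sorted(tagged)


# Toy stand-in for the tree comparison: two indices are "similar" if they
# have the same parity.
tagged_pairs = run_actors(combinations(range(6), 2),
                          lambda i, j: i % 2 == j % 2)
```

Because each actor only touches its own mailbox and the shared result queue, no explicit locking is needed in the comparison logic itself.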
4.3 Statistics on the yielded clusters
We construct the search index for Shape-based patterns, using the formula in Equation 1: the index includes 18 785 385 pairs to be considered to find trees that are similar, using tree edit distance concept. The edit distance threshold is conservatively set to 0, in order to have pure clusters. Overall, given the constraint of minimum 10 members to form a cluster, we were able to form 43 clusters at the first level. These are considered as the highest-level of patterns that FixMiner can yield.
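To make the role of the two thresholds concrete, here is a small sketch under stated assumptions: clusters are taken to be the connected components of the graph that links two diffs whenever their distance is at most the threshold, and components below the minimum size are discarded. The text does not spell out the exact grouping procedure, so this is one plausible reading; `cluster` and the `distance` argument are hypothetical names.

```python
from collections import defaultdict


def cluster(items, distance, threshold=0, min_size=10):
    """Group items whose pairwise distance is <= threshold (union-find),
    then keep only the groups with at least min_size members."""
    parent = list(range(len(items)))

    def find(x):
        # Find the representative of x, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(items)):
        groups[find(i)].append(i)
    return [members for members in groups.values() if len(members) >= min_size]
```

With a threshold of 0, only diffs whose trees are identical end up together, which is what makes the resulting clusters pure; lowering `min_size` (as done later for the third iteration) admits rarer patterns.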
Figure 3 presents the distribution of Shape-based clusters formed by FixMiner for our dataset of bug fixes. Eighteen clusters each contain more than 20 Enhanced AST Diffs (each associated with a change hunk); together they represent 72% of all changes (i.e., change hunks in patches).
The largest cluster (cluster 5 in our results) corresponds to a Method reference modification shape-based pattern, and includes 103 instances. Table 3 enumerates the top patterns (in terms of recurrence in patches) discovered in the first iteration. We note that while some clusters (e.g., cluster 13) may appear to be refined versions of others (e.g., cluster 3), FixMiner distinguishes them because their ASTs have different shapes.
|Cluster ID||Shape-based pattern label||# instances|
|5||Method reference modification||103|
|3||Variable declaration statement modification||88|
|2||String value modification in Method call||64|
|8||Method call parameter modification||56|
|13||Constant modification in declaration statement||56|
At the end of the refinement on recurrent repair actions at the second iteration, FixMiner yields 37 clusters. Note that this number is smaller because we again set a threshold constraint on cluster size, and thus several similar changes are not deemed to form a pattern. Due to space limitations, Figure 4 shows only the distribution of clusters representing Action-driven patterns found in the top-10 shape-based clusters. These clusters already include 52% of the changes.
We illustrate the refinement of patterns between the first and second iterations of FixMiner by providing examples of changes in Table 4. As for Table 3, the labels given to the clusters are based on our manual analysis of the changes within each cluster.
|Shape-based pattern label||Action-driven pattern label||Example change in developer patch|
|Variable declaration statement: modifier change||Insert method modifier|
|||Delete method modifier|
|||Update method modifier|
|Type change||Variable type change|
|Wrong Variable Reference||Method call parameter value modification Variable replacement by another variable|
|Wrong Method Reference||Assignment expression modification Method call replacement|
|Wrong Method Reference||Method call replacement|
Finally, the third iteration produces 89 clusters representing token-specific patterns. To obtain this large number of clusters from the instances of the 37 action-driven patterns, we have lowered the threshold of minimum cluster size to 3. Figure 5 shows the distribution of the clusters corresponding to instances found in the top clusters from the first iteration of FixMiner.
4.4 Manual (but systematic) assessment of cluster consistency
Protocol. In the absence of a ground truth of clusters for our dataset, in order to evaluate the correctness (in terms of consistency) of the clusters yielded by FixMiner, we manually examine the generated clusters. We focus on clusters generated after the second iteration, i.e., those that correspond to Action-driven patterns. Shape-based patterns are indeed too high-level (i.e., broad in the kinds of instances they include) and are thus trivially consistent. Similarly, Token-specific patterns are very constrained by the matching of tokens. Their consistency is mainly derived from the consistency of their parent clusters (i.e., Action-driven clusters).
Results. For each cluster, we analyzed the patches to determine the purity of the cluster. Out of the 37 clusters, we readily approved 30, providing them with distinct labels that are semantically consistent with the changes. Of the remaining 7 clusters, 4 were found to be impure as they contained a mix of several patterns, while 3 included consistent changes that form a clear pattern, although we were not able to clearly identify what semantics these changes carry.
Given the subjective nature of our manual assessment, we further opt for a more objective criterion based on the notion of generic/semantic patch padioleau2008documenting ; andersen2010generic . Thus, given a cluster and its change instances, we try to manually derive an SmPL (Semantic Patch Language) patch which would be a single generic representation of all the changes in the cluster. Due to space limitations, we only include in Figure 6 a couple of examples of SmPL patches corresponding to Action-driven patterns. Overall, we managed to specify the SmPL patches for 34 (i.e., 92%) clusters.
Token-specific patterns are prime candidates for building repair templates for generating likely correct patches in projects. Listing 9 shows the case of a human patch representing the type of changes that are captured by a token-specific pattern for a wrong method reference change.
Starting with version 2.4, the Servlet specification indicates that sendError() and setStatus() on the HttpServletResponse are treated differently. The former redirects the caller to the configured error page, while the latter still assumes that the caller is going to prepare its own error message page. This is an evolution from the Servlet 2.3 specification, where both methods worked similarly. Several fixes to client code implement this change pattern. Such changes are related to the notion of collateral evolutions padioleau2008documenting , which was investigated for Linux device driver APIs and for which the Coccinelle matching and transformation engine was proposed to systematically apply change templates across all projects.
4.5 Comparison with Bug datasets dissection categories
Protocol. Defects4J just2014defects4j is a popular dataset which includes a large and manually reviewed set of real-world Java bugs. It was proposed to enable reproducible studies in the software testing community, but has recently become a de facto benchmark for repair approaches targeting Java programs. All 395 real bugs from our snapshot of the Defects4J dataset are provided with the associated fixes collected from the change history of the associated projects, namely JFreeChart, Google Closure Compiler, Commons Lang, Commons Math, Mockito and Joda-Time. Sobreira et al. defects4J-dissection later presented a study of the anatomy of Defects4J patches, enumerating nine (9) repair patterns at a first level. Some of the higher level patterns such as Expression Fix are further split into sub-patterns, leading to a total of 2 repair patterns.
Results. To check the consistency of FixMiner patterns with Defects4J manual dissection categories by other researchers defects4J-dissection , we build a new dataset which now also includes all Defects4J bug fix changes. Table 5 provides statistics on the overlap between our clusters and Defects4J change categories.
While we identify 42 distinct shape-based patterns in our dataset, these only matched 8 categories of Defects4J. The same discrepancies appear for action-driven and token-specific patterns. This is mainly due to the threshold that we have set as a constraint on the recurrence of changes to derive a pattern. Overall, a maximum of 15 change instances of Defects4J are associated with our derived patterns. This is due to the fact that most of the Defects4J changes implement repair actions which are not considered in this version of FixMiner. Indeed, we have focused on mining patterns from changes that only UPDATE existing statements. However, a large number of Defects4J patches are about INSERTING new conditional blocks (79 patches), new return statements (77 patches), etc. We have performed a similar study with Bugs.jar sahaMSR18 , a more comprehensive dataset of 1082 bugs and associated fixes. We find many more cases of UPDATE change patterns overlapping with our clusters, as detailed in Table 5. Figure 7 details the differences in the distribution of patches for Bugs.jar and Defects4J across all FixMiner update change patterns.
4.6 Assessment of fix patterns based on defect description
Protocol. We propose to investigate the semantic relevance of the clusters derived by FixMiner based on the description of associated bug reports. Our hypothesis is that a pattern is semantically relevant if the changes of the corresponding cluster are linked to bug reports which are, on average, closer to each other than any pair of bug reports in the dataset.
Results. Figure 8 provides the boxplot distribution of the similarity of bug reports within each cluster, and for the whole dataset, after the second and third iterations. We use TF-IDF to represent each bug report as a vector, and leverage cosine similarity to compute similarity metrics among vectors. First, we note that the average similarity within a cluster is significantly higher than the average similarity in the dataset: most changes in a cluster are associated with similar bug reports. The median similarity is also relatively higher (except for clusters 8_1 and 17_1). The difference is substantially higher for the third iteration, confirming that our token-specific patterns group patches which have a stronger correlation w.r.t. the semantics of the bugs that they address.
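The measurement described above can be sketched as follows. This is an illustrative version only: it assumes whitespace tokenization, relative term frequency, and a plain logarithmic IDF, none of which are specified in the text, and all function names are ours.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Represent each document as a sparse {term: tf * idf} dictionary."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(vectors, indices):
    """Average cosine similarity over all pairs among the given indices."""
    pairs = list(combinations(indices, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
```

Under the hypothesis stated in the protocol, `mean_pairwise_similarity(vectors, cluster_indices)` for a relevant cluster should exceed the same quantity computed over all report indices in the dataset.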
4.7 Evaluation of fix patterns’ relevance for APR
Protocol. We evaluate the performance of FixMiner by investigating the relevance of the mined patterns in a scenario of automated program repair. To that end, we propose to implement an APR system where patches are generated based on templates. Our prototype system, which we also refer to as FixMiner in the remainder of this paper and for the sake of simplicity, is developed following the principles of the PAR kim2013automatic state-of-the-art approach. In contrast with PAR where the templates were engineered by a manual investigation of example bug fixes, in FixMiner, the templates for repair are engineered based on automatically inferred fix patterns such as the fix pattern in Listing 10 of updating method names in a return statement.
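As a rough illustration of how such a template can drive patch generation, the sketch below instantiates the pattern of updating a method name in a return statement. It is a deliberately simplified stand-in: it works on source text with a regular expression rather than on AST nodes, and the function name, the regular expression, and the candidate method names are all hypothetical.

```python
import re

# Matches statements of the form: return [receiver.]method(args);
RETURN_CALL = re.compile(r"^(\s*return\s+(?:[\w.]+\.)?)(\w+)(\(.*\)\s*;\s*)$")


def candidate_patches(statement, method_names):
    """Generate patch candidates by swapping the method name that is
    called in a return statement for each alternative name."""
    match = RETURN_CALL.match(statement)
    if match is None:
        return []  # the template does not apply to this statement
    prefix, current, suffix = match.groups()
    return [prefix + name + suffix for name in method_names if name != current]
```

For the statement `return this.getMin(a, b);` and the names `getMin`, `getMax`, this yields the single candidate `return this.getMax(a, b);`. In a test-based repair loop, each candidate would then be compiled and run against the test suite, and only candidates that pass all tests would be reported as plausible.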
Results. Overall, we implemented the 28 Action-driven patterns inferred with FixMiner from the full dataset of patches excluding all Defects4J patches: 21 patterns are from UPDATE changes, 6 are from INSERT changes and 1 is from DELETE changes.
Our repair pipeline leverages the GZoltar campos2012gzoltar framework (version 0.1.1) for spectrum-based fault localization. This framework is widely used in the repair community martinez2016astor ; xiong2017Precise ; xin2017leveraging ; wen2018context , allowing for comparable assessment. In this study, we evaluated the performance of FixMiner against the Defects4J just2014defects4j benchmark (version 1.2.0, https://github.com/rjust/defects4j/releases/tag/v1.2.0), which is also becoming a standard benchmark for Java-targeted APR research martinez2016astor ; Xuan2016History ; chen2017contract ; martinez2017automatic . Table 6 details statistics on the experimental benchmark.
|Project||Bugs||LOC||Tests|
|Apache commons-lang (Lang)||65||22K||2,245|
|Apache commons-math (Math)||106||85K||3,602|
|Closure compiler (Closure)||133||90K||7,927|
In the table, column “Bugs” denotes the total number of bugs in Defects4J benchmark, column “LOC” denotes the number of thousands of lines of code, and column “Tests” denotes the total number of test cases for each project.
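For readers unfamiliar with spectrum-based fault localization, the sketch below ranks statements by suspiciousness from per-test coverage and test outcomes. It uses the Ochiai formula as an assumption, since the text does not state which ranking formula was configured, and it does not reproduce any fault localization framework's API; `ochiai` and its input format are hypothetical.

```python
import math


def ochiai(coverage, outcomes):
    """Rank statements by Ochiai suspiciousness.

    coverage: dict mapping a test name to the set of statements it executes.
    outcomes: dict mapping a test name to True if the test passed.
    Returns (statement, score) pairs, most suspicious first.
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        # ef / ep: failing / passing tests that execute the statement.
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        denominator = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denominator if denominator else 0.0
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A repair pipeline would then try its fix templates on the statements in ranking order, so that locations executed mostly by failing tests are attempted first.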
We compare the performance of FixMiner against 12 state-of-the-art APR tools whose evaluation results are directly comparable since they were obtained on Defects4J. Table 7 provides the comparative results, where we indicate the number of correctly fixed bugs and the number of plausible patches that were generated by each tool. A plausible patch is a patch that leads the program to pass all test cases. A correct patch is a plausible patch that is syntactically and semantically similar to the developer-provided patch.
Overall, FixMiner successfully repaired 25 bugs from the Defects4J benchmark by generating correct patches. This performance is only surpassed to date by ELIXIR saha2017elixir and SimFix jiang2018shaping . The latter was concurrently developed with FixMiner.
In each column, we provide two numbers: the number of correctly fixed bugs and the number of bugs for which a plausible patch is generated by the APR tool (i.e., a patch that makes the program pass all test cases). Precision (P) is the ratio of correctly fixed bugs to the bugs for which each APR tool generates a plausible patch. The data about jGenProg, jKali and Nopol are extracted from the experimental results reported by Martinez et al. martinez2017automatic . The results of other tools are obtained from their papers in the literature (jMutRepair martinez2016astor , HDRepair Xuan2016History , ACS xiong2017Precise , ssFix xin2017leveraging , ELIXIR saha2017elixir , JAID chen2017contract , SketchFix(SF) hua2018towards , CapGen wen2018context and SimFix jiang2018shaping ).
Nevertheless, while these tools generate more correct patches than FixMiner, they also generate many more plausible patches that are not correct. For example, 80% of FixMiner’s plausible patches are actually correct, whereas this is the case for only 63% and 70% of the plausible patches of ELIXIR and SimFix, respectively. To date, only CapGen wen2018context achieves similar performance, yielding patches with a slightly higher probability (84%) of being correct. The high performance of CapGen confirms our intuition that context-awareness, which we provide with the Enhanced AST Diff, is essential for improving patch correctness.
Among the bugs in the used version of the Defects4J benchmark, 267 bugs have not yet been fixed by any tool in the literature. Table 8 enumerates the 128 bugs that are currently fixed (with either correct or plausible patches) in the literature. 89 of them can be correctly fixed by at least one APR tool. FixMiner generates correct patches for 25 bugs. Finally, we find that, thanks to its automatically mined patterns, FixMiner was able to fix six (6) bugs which had not been fixed so far by any state-of-the-art APR tool (cf. Figure 9).
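The percentages discussed here follow directly from the two counts reported per tool. A minimal sketch of the computation, with a hypothetical function name and purely illustrative counts (not values taken from Table 7):

```python
def precision(correct: int, plausible: int) -> float:
    """Share of bugs with a plausible patch whose patch is also correct."""
    if plausible == 0:
        return 0.0
    return correct / plausible


# Illustrative counts only: 8 correct fixes among 10 plausible ones.
print(f"{precision(8, 10):.0%}")  # 80%
```

A tool can therefore fix more bugs correctly in absolute terms and still have lower precision if it also emits many plausible-but-incorrect patches.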
“✓” indicates that the bug is correctly fixed; “✗” indicates that the produced patch is plausible but not correct; “(✓)” indicates that a correct patch is generated by JAID, but is not the first plausible patch to be generated.
|Bug||Marks, in tool-column order (FixMiner, SimFix, CapGen, SketchFix, JAID, ssFix, ACS, ELIXIR, HDRepair, jGenProg, jKali, jMutRepair, Nopol), with empty cells omitted|
|Chart-1||✓ ✓ ✓ ✓ (✓) ✓ ✓ ✗ ✗ ✗ ✓|
|Chart-3||✓ ✗ ✗ ✗|
|Chart-4||✓|
|Chart-5||✗ ✗ ✓|
|Chart-7||✓ ✗ ✗|
|Chart-8||✓ ✓ ✓ ✗|
|Chart-9||✓ (✓) ✓|
|Chart-11||✓ ✓ ✓ ✓|
|Chart-12||✗ ✗|
|Chart-13||✗ ✗ ✗ ✗ ✗ ✗|
|Chart-14||✗ ✓|
|Chart-15||✗ ✗|
|Chart-17||✗|
|Chart-18||✗|
|Chart-19||✓|
|Chart-20||✓ ✓ ✓|
|Chart-21||✗|
|Chart-22||✗|
|Chart-24||✓ ✓ ✓ ✓ ✓|
|Chart-25||✗ ✗ ✗ ✗ ✗|
|Chart-26||✓ ✗ ✓ ✗ ✗ ✗|
|Closure-5||✗|
|Closure-10||✓ ✗|
|Closure-14||✓ ✓ ✓ ✗|
|Closure-18||✓|
|Closure-31||(✓)|
|Closure-33||✓|
|Closure-38||✓|
|Closure-40||✓|
|Closure-51||✗|
|Closure-57||✓|
|Closure-62||✓ ✓ ✓ (✓) ✗|
|Closure-63||✓ ✓ (✓)|
|Closure-70||✗ ✓ ✗|
|Closure-73||✓ ✓ ✗ ✓ ✗|
|Closure-79||✗|
|Closure-106||✗|
|Closure-115||✓ ✓|
|Closure-125||✗|
|Closure-126||✓ (✓) ✗|
|Lang-6||✓ ✓ ✓ ✓ ✓|
|Lang-7||✓|
|Lang-10||✗ ✗|
|Lang-16||✓|
|Lang-21||✓|
|Lang-24||✗ ✓ ✓|
|Lang-26||✓ ✓|
|Lang-27||✓ ✗|
|Lang-33||✓ ✓ ✓ ✓|
|Lang-35||✓|
|Lang-38||(✓) ✓|
|Lang-39||✓ ✗ ✗ ✗ ✗|
|Lang-41||✓|
|Lang-43||✗ ✓ ✓ ✓ ✓ ✗|
|Lang-44||✗ ✗ ✓|
|Lang-45||✗ (✓)|
|Lang-46||✗|
|Lang-50||✓|
|Lang-51||✗ (✓) ✗ ✓ ✗|
|Lang-53||✗|
|Lang-55||✓ (✓) ✓|
|Lang-57||✓ ✓ ✓ ✗|
|Lang-58||✓ ✗ ✓|
|Lang-59||✓ ✓ ✓ ✓ ✓ ✗|
|Lang-60||✓|
|Lang-61||✗|
|Lang-63||✗|
|Math-1||✗|
|Math-2||✗ ✗ ✗ ✗|
|Math-3||✓|
|Math-4||✓|
|Math-5||✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓|
|Math-6||✗|
|Math-8||✗ ✗ ✗|
|Math-10||✓|
|Math-20||✗ ✗|
|Math-22||✓ ✓|
|Math-25||✓|
|Math-28||✗ ✗ ✗ ✗ ✗|
|Math-30||✓ ✓ ✓ ✓|
|Math-32||(✓) ✗ ✗ ✗|
|Math-33||✓ ✓ ✓ ✓ ✓ ✓ ✗|
|Math-34||✓ ✓ ✗|
|Math-35||✓ ✓|
|Math-40||✗ ✗ ✗ ✗|
|Math-41||✓ ✓|
|Math-42||✗|
|Math-49||✗ ✗ ✗|
|Math-50||✓ ✓ (✓) ✓ ✓ ✓ ✓ ✓ ✗ ✓|
|Math-53||✓ ✓ (✓) ✓ ✓ ✓|
|Math-57||✓ ✓ ✓ ✓ ✓ ✗ ✗|
|Math-58||✓ ✓ ✓ ✗ ✗|
|Math-59||✓ ✓ ✓ ✓ ✓|
|Math-61||✓|
|Math-63||✓ ✓ ✗|
|Math-65||✓|
|Math-69||✗|
|Math-70||✓ ✓ ✓ ✓ ✓ ✓ ✗ ✓|
|Math-71||✓ ✗ ✗|
|Math-72||✗|
|Math-73||✗ ✗ ✗ ✗ ✓ ✗|
|Math-75||✓ ✓ ✓ ✓|
|Math-78||✗ ✗ ✗|
|Math-79||✓ ✓ ✗ ✓|
|Math-80||✗ ✗ (✓) ✓ ✗ ✗ ✗ ✗|
|Math-81||✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗|
|Math-82||✓ ✗ ✗ ✓ (✓) ✓ ✓ ✗ ✗ ✗ ✓ ✗|
|Math-84||✗ ✗ ✗ ✗|
|Math-85||✓ ✗ ✓ ✓ (✓) ✓ ✓ ✗ ✗ ✓ ✗|
|Math-87||✗|
|Math-88||✗ ✗ ✗|
|Math-89||✓|
|Math-90||✓|
|Math-93||✓|
|Math-95||✗ ✗|
|Math-97||✗ ✗|
|Math-98||✓|
|Math-99||✓|
|Math-104||✗ ✗|
|Math-105||✗ ✗|
|Time-4||✗ ✓ ✗ ✗|
|Time-7||✓|
|Time-11||✗ ✗ ✗ ✗ ✗|
|Time-15||✓ ✓|
|Time-19||✓ ✗|
5 Discussions and Threats to Validity
To run the experiments with FixMiner, we leveraged a computing system with 24 Intel Xeon E5-2680 v3 cores at 2.5 GHz per core and 3 TB of RAM. The construction of the Enhanced AST Diff trees took about 17 minutes. Since these trees are later cached in memory, the bottleneck in computation lies in the edit distance computation among trees. Nevertheless, we recorded that comparing 1 108 060 pairs of trees took about 18 minutes.
The main limitation of FixMiner currently stems from our support for Enhanced AST Diffs that are performing a single repair action (either UPD, ADD, DEL, MOV in all nodes rather than the mix of them). This decision was taken to focus on a specific and pure set of changes so as to ensure a reliable manual assessment. In future work, FixMiner will investigate more complex changes combining several repair action types.
Threats to validity.
The selection of our bug-fix datasets carries some threats to validity that we have limited by considering known projects, and heuristics used in previous works. We also make our best effort to link commits with bug reports as tagged by developers. Some false positives may be included if one considers a strict and formal definition of what constitutes a bug.
6 Related Work
Patch generation is one of the key tasks in software maintenance since it is time-consuming and tedious. If this task is automated, the cost and time that developers spend on maintenance will be dramatically reduced. To address the issue, many automated techniques have been proposed for program repair monperrus2018automatic . GenProg claire2012GenProg , which leverages genetic programming, is a pioneering work on program repair. It relies on mutation operators that insert, replace, or delete code elements. Although these mutations can create only a limited number of variants, GenProg could fix several bugs automatically (in its evaluation, test cases were passed for 55 out of 105 real program bugs), although most of the generated patches were later found to be incorrect. PACHIKA dallmeier2009generating leverages object behavior models. SYDIT meng2011systematic and LASE meng2013lase automatically extract an edit script from a program change. While several techniques have focused on fixability, Kim et al. dongsun2013Automatic pointed out that patch acceptability should be considered as well in program repair. Automatically generated patches often have nonsensical structures and logic even though those patches can fix program bugs with respect to program behavior (i.e., w.r.t. test cases). To address this issue, they proposed PAR, which leverages manually-crafted fix patterns. Similarly, Long and Rinard proposed Prophet long2016automatic and Genesis long2017automatic , which generate patches by leveraging fix patterns extracted from the history of changes in repositories. Recently, several approaches bhatia2016automated ; gupta2017deepfix leveraging deep learning have been proposed for learning to fix bugs. Overall, we note that the community is going in the direction of implementing repair strategies based on fix patterns or templates. Our work is thus essential in this direction as it provides a scalable, accurate and actionable tool for mining such relevant patterns.
Code differencing is an important research and practice concern in software engineering. Although commonly used by human developers in manual tasks, differencing at the text line level of granularity myers1986ano is generally unsuitable for automated analysis of changes and the associated semantics. AST differencing work has benefited in the last decade from the extensive investigations that the research community has performed on general tree differencing bille2005survey ; chawathe1996change ; chilowicz2009syntax ; al2005diffx . ChangeDistiller fluri2007change and GumTree falleri2014Fine constitute the current state-of-the-art for AST differencing in Java. In this work, we have selected GumTree as the base tool for the computation of edit scripts as its results have been validated by humans, and it has been shown to produce more accurate and fine-grained edit scripts. Nevertheless, we have further enhanced the edit script, yielding an algorithm that keeps track of contextual information. Our approach echoes a recently published work by Huang et al. huang2018cldiff : their CLDIFF tool similarly enriches the AST produced by GumTree to enable the generation of concise code differences. The tool, however, was not available at the time of our experiments.
The literature includes a large body of work on mining change patterns.
Studies on code change redundancies.
A number of empirical studies have confirmed that code changes are repeatedly performed in software code bases kim2009discovering ; kim2006memories ; molderez2017mining ; yue2017characterization . Identical changes are prevalent because multiple occurrences of the same bug require the same change. Similarly, when an API evolves, or when migrating to a new library/framework, all calling code must be adapted by the same collateral changes padioleau2008documenting . Finally, code refactoring or routine code cleaning can lead to similar changes. In a manual investigation, Pan et al. pan2009toward have identified 27 extractable repair templates for Java software. Among other findings, they observed that if-condition changes are the most frequently applied to fix bugs. Their study, however, does not discuss whether most bugs are related to if-conditions or not. This is important as it clarifies the context in which to perform if-related changes. Recently, Nguyen et al. nguyen2010recurring have empirically found that 17-45% of bug fixes are recurring. Our focus in this paper is not to perform another empirical study, although such studies are obviously valuable to the community, but to provide a tool-supported automated approach to infer change patterns in a dataset so as to drive APR mutation, to learn from change examples as done by Rolim et al. rolim2017learning , or to systematically build repair templates. Our patterns are less generic than the ones in previous work (e.g., as in pan2009toward ; nguyen2010recurring ): we further provide three levels of abstraction to target different APR scenarios. Concurrently to our work, Jiang et al. have proposed SimFix jiang2018shaping , which implements a similar idea of leveraging code redundancies for shaping the program repair space. In FixMiner, however, the pattern mining phase is independent from the patch generation phase, and the resulting toolset can be used by other researchers.
Generic and semantic patch inference.
Ultimately, FixMiner aims at finding a generic patch that can be leveraged by automated program repair to correctly update a collection of buggy code fragments. This problem has been recently studied by approaches such as spdiff andersen2010generic ; andersen2012semantic , which work on the inference of generic and semantic patches. This approach, however, is known to scale poorly and is constrained by the need to produce ready-to-use semantic patches that can be used by the Coccinelle matching and transformation engine brunel2009foundation . There are, however, a number of prior works that try to detect and summarize program changes. A seminal work by Chawathe et al. describes a method to detect changes to structured information based on an ordered tree and its updated version chawathe1996change . The goal was to derive a compact description of the changes with the notion of minimum cost edit script, which has been used in the recent ChangeDistiller and GumTree tools. The representations of edit operations, however, are often either overfitted to a particular code change or abstract the change so loosely that it cannot be easily instantiated. Neamtiu et al. neamtiu2005understanding proposed an approach for identifying changes, additions and deletions of C program elements based on structural matching of syntax trees. Two trees that are structurally identical but have differences in their nodes are considered to represent matching program fragments. Kim et al. kim2007automatic have later proposed a method to infer “change-rules” that capture many changes. They generally express changes related to program headers (method headers, class names, package names, etc.). Weißgerber et al. weissgerber2006identifying have also proposed a technique to identify likely refactorings in the changes that have been performed in Java programs.
Overall, these generic patch inference approaches address the challenge of how the patterns will be leveraged in practice. Our work goes in that direction by providing different levels of patterns.
7 Conclusion
We have presented FixMiner, a tool-supported approach to mine relevant fix patterns for automated program repair. The approach builds on an enhanced representation of a change as an AST Difference tree which includes several types of information. We then proceed to form clusters at different abstraction levels, refining from generic patterns representing the shape of the changes to specific patterns associated with tokens (e.g., method names). We manually validate our patterns and show, by linking to bug descriptions, that the mined patterns capture the semantics of changes that are associated with similar bugs.
We further demonstrate with the implementation of an automated repair pipeline that the patterns mined by our approach are relevant for generating correct patches for 25 bugs in the Defects4J benchmark. These correct patches correspond to 80% of all plausible patches generated by the tool.
Availability All the data and tool support is available on the anonymous site at https://github.com/fixminer
- (1) Agha, G., Hewitt, C.: Concurrent programming using actors: Exploiting In: Readings in Distributed Artificial Intelligence, pp. 398–407. Elsevier (1988)
- (2) Al-Ekram, R., Adma, A., Baysal, O.: diffx: an algorithm to detect changes in multi-version xml documents. In: Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research, pp. 1–11. IBM Press (2005)
- (3) Andersen, J., Lawall, J.L.: Generic patch inference. Automated software engineering 17(2), 119–148 (2010)
- (4) Andersen, J., Nguyen, A.C., Lo, D., Lawall, J.L., Khoo, S.C.: Semantic patch inference. In: Automated Software Engineering (ASE), 2012 Proceedings of the 27th IEEE/ACM International Conference on, pp. 382–385. IEEE (2012)
- (5) Bhatia, S., Singh, R.: Automated correction for syntax errors in programming assignments using recurrent neural networks. arXiv preprint arXiv:1603.06129 (2016)
- (6) Bille, P.: A survey on tree edit distance and related problems. Theoretical computer science 337(1-3), 217–239 (2005)
- (7) Brunel, J., Doligez, D., Hansen, R.R., Lawall, J.L., Muller, G.: A foundation for flow-based program matching: using temporal logic and model checking. In: Acm Sigplan Notices, vol. 44, pp. 114–126. ACM (2009)
- (8) Campos, E.C., Maia, M.A.: Common bug-fix patterns: a large-scale observational study. In: Proceedings of the 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 404–413. IEEE Press (2017)
- (9) Campos, J., Riboira, A., Perez, A., Abreu, R.: Gzoltar: an eclipse plug-in for testing and debugging. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pp. 378–381. ACM (2012)
- (10) Chawathe, S.S., Rajaraman, A., Garcia-Molina, H., Widom, J.: Change detection in hierarchically structured information. In: ACM SIGMOD Record, vol. 25, pp. 493–504. ACM (1996)
- (11) Chen, L., Pei, Y., Furia, C.A.: Contract-based program repair without the contracts. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 637–647. IEEE, Urbana, IL, USA (2017)
- (12) Chilowicz, M., Duris, E., Roussel, G.: Syntax tree fingerprinting for source code similarity detection. In: Program Comprehension, 2009. ICPC’09. IEEE 17th International Conference on, pp. 243–247. IEEE (2009)
- (13) Dallmeier, V., Zeller, A., Meyer, B.: Generating fixes from object behavior anomalies. In: Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, pp. 550–554. IEEE Computer Society (2009)
- (14) Falleri, J.R.: Gumtree. https://github.com/GumTreeDiff/gumtree (Last Access: Mar. 2018.)
- (15) Falleri, J.R., Morandat, F., Blanc, X., Martinez, M., Monperrus, M.: Fine-grained and accurate source code differencing. In: Proceedings of ACM/IEEE International Conference on Automated Software Engineering, pp. 313–324. ACM, Vasteras, Sweden (2014)
- (16) Fluri, B., Gall, H.C.: Classifying change types for qualifying change couplings. In: Program Comprehension, 2006. ICPC 2006. 14th IEEE International Conference on, pp. 35–45. IEEE (2006)
- (17) Fluri, B., Giger, E., Gall, H.C.: Discovering patterns of change types. In: Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering, pp. 463–466. IEEE, L’Aquila, Italy (2008)
- (18) Fluri, B., Wuersch, M., Pinzger, M., Gall, H.: Change distilling: Tree differencing for fine-grained source code change extraction. IEEE Transactions on software engineering 33(11) (2007)
- (19) Gupta, R., Pal, S., Kanade, A., Shevade, S.: Deepfix: Fixing common c language errors by deep learning. In: AAAI, pp. 1345–1351 (2017)
- (21) Hua, J., Zhang, M., Wang, K., Khurshid, S.: Towards practical program repair with on-demand candidate generation. In: Proceedings of the 40th International Conference on Software Engineering, pp. 12–23. ACM (2018)
- (22) Huang, K., Chen, B., Peng, X., Zhou, D., Wang, Y., Liu, Y., Zhao, W.: Cldiff: generating concise linked code differences. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 679–690. ACM (2018)
- (23) Jaro, M.A.: Advances in record-linkage methodology as applied to matching the 1985 census of tampa, florida. Journal of the American Statistical Association 84(406), 414–420 (1989)
- (24) Jiang, J., Xiong, Y., Zhang, H., Gao, Q., Chen, X.: Shaping program repair space with existing patches and similar code. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 298–309. ACM (2018)
- (25) Just, R., Jalali, D., Ernst, M.D.: Defects4j: A database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis, pp. 437–440. ACM, San Jose, CA, USA (2014)
- (26) Kim, D., Nam, J., Song, J., Kim, S.: Automatic patch generation learned from human-written patches. In: Proceedings of the 35th International Conference on Software Engineering, pp. 802–811. IEEE, San Francisco, CA, USA (2013)
- (27) Kim, D., Nam, J., Song, J., Kim, S.: Automatic patch generation learned from human-written patches. In: Proceedings of the 2013 International Conference on Software Engineering, pp. 802–811. IEEE Press (2013)
- (28) Kim, M., Notkin, D.: Discovering and representing systematic code changes. In: Proceedings of the 31st International Conference on Software Engineering, pp. 309–319. IEEE Computer Society (2009)
- (29) Kim, M., Notkin, D., Grossman, D.: Automatic inference of structural changes for matching across program versions. In: ICSE, vol. 7, pp. 333–343. Citeseer (2007)
- (30) Kim, S., Pan, K., Whitehead Jr, E.: Memories of bug fixes. In: Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering, pp. 35–45. ACM (2006)
- (31) Koyuncu, A., Bissyandé, T., Kim, D., Klein, J., Monperrus, M., Le Traon, Y.: Impact of Tool Support in Patch Construction. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 237–248. ACM, New York, NY, USA (2017)
- (32) Le, X.D., Lo, D., Le Goues, C.: History driven program repair. In: Proceedings of the IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER, vol. 1, pp. 213–224. IEEE, Suita, Osaka, Japan (2016)
- (33) Le Goues, C., Nguyen, T., Forrest, S., Weimer, W.: GenProg: A generic method for automatic software repair. IEEE Trans. Software Eng. 38(1), 54–72 (2012)
- (34) Lin, W., Chen, Z., Ma, W., Chen, L., Xu, L., Xu, B.: An empirical study on the characteristics of Python fine-grained source code change types. In: Software Maintenance and Evolution (ICSME), 2016 IEEE International Conference on, pp. 188–199. IEEE (2016)
- (35) Liu, K., Kim, D., Bissyandé, T.F., Yoo, S., Traon, Y.L.: Mining fix patterns for FindBugs violations. arXiv preprint arXiv:1712.03201 (2017)
- (36) Livshits, B., Zimmermann, T.: DynaMine: finding common error patterns by mining software revision histories. In: ACM SIGSOFT Software Engineering Notes, vol. 30, pp. 296–305. ACM (2005)
- (37) Long, F., Amidon, P., Rinard, M.: Automatic inference of code transforms for patch generation. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, pp. 727–739. ACM, Paderborn, Germany (2017)
- (38) Long, F., Rinard, M.: Automatic patch generation by learning correct code. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 298–312. ACM, St. Petersburg, FL, USA (2016)
- (39) Martinez, M., Duchien, L., Monperrus, M.: Automatically extracting instances of code change patterns with AST analysis. In: Software Maintenance (ICSM), 2013 29th IEEE International Conference on, pp. 388–391. IEEE (2013)
- (40) Martinez, M., Durieux, T., Sommerard, R., Xuan, J., Monperrus, M.: Automatic repair of real bugs in Java: A large-scale experiment on the Defects4J dataset. Empirical Software Engineering 22(4), 1936–1964 (2017)
- (41) Martinez, M., Monperrus, M.: Mining software repair models for reasoning on the search space of automated program fixing. Empirical Software Engineering 20(1), 176–205 (2015)
- (42) Martinez, M., Monperrus, M.: Astor: A program repair library for Java. In: Proceedings of the 25th International Symposium on Software Testing and Analysis, pp. 441–444. ACM, Saarbrücken, Germany (2016)
- (43) Meng, N., Kim, M., McKinley, K.S.: Systematic editing: generating program transformations from an example. ACM SIGPLAN Notices 46(6), 329–342 (2011)
- (44) Meng, N., Kim, M., McKinley, K.S.: LASE: locating and applying systematic edits by learning from examples. In: Proceedings of the 2013 International Conference on Software Engineering, pp. 502–511. IEEE Press (2013)
- (45) Molderez, T., Stevens, R., De Roover, C.: Mining change histories for unknown systematic edits. In: Proceedings of the 14th International Conference on Mining Software Repositories, pp. 248–256. IEEE Press (2017)
- (46) Monperrus, M.: Automatic software repair: a bibliography. ACM Computing Surveys (CSUR) 51(1), 17 (2018)
- (47) Myers, E.W.: An O(ND) difference algorithm and its variations. Algorithmica 1(1-4), 251–266 (1986)
- (48) Neamtiu, I., Foster, J.S., Hicks, M.: Understanding source code evolution using abstract syntax tree matching. ACM SIGSOFT Software Engineering Notes 30(4), 1–5 (2005)
- (49) Nguyen, T.T., Nguyen, H.A., Pham, N.H., Al-Kofahi, J., Nguyen, T.N.: Recurring bug fixes in object-oriented programs. In: Software Engineering, 2010 ACM/IEEE 32nd International Conference on, vol. 1, pp. 315–324. IEEE (2010)
- (50) Osman, H., Lungu, M., Nierstrasz, O.: Mining frequent bug-fix code changes. In: Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), 2014 Software Evolution Week-IEEE Conference on, pp. 343–347. IEEE (2014)
- (51) Oumarou, H., Anquetil, N., Etien, A., Ducasse, S., Taiwe, K.D.: Identifying the exact fixing actions of static rule violation. In: Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on, pp. 371–379. IEEE (2015)
- (52) Padioleau, Y., Lawall, J., Hansen, R.R., Muller, G.: Documenting and automating collateral evolutions in Linux device drivers. In: ACM SIGOPS Operating Systems Review, vol. 42, pp. 247–260. ACM (2008)
- (53) Pan, K., Kim, S., Whitehead, E.J.: Toward an understanding of bug fix patterns. Empirical Software Engineering 14(3), 286–315 (2009)
- (54) Park, J., Kim, M., Ray, B., Bae, D.H.: An empirical study of supplementary bug fixes. In: Proceedings of the 9th IEEE Working Conference on Mining Software Repositories, pp. 40–49. IEEE Press (2012)
- (55) Ray, B., Kim, M.: A case study of cross-system porting in forked projects. In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, p. 53. ACM (2012)
- (56) Rolim, R., Soares, G., D’Antoni, L., Polozov, O., Gulwani, S., Gheyi, R., Suzuki, R., Hartmann, B.: Learning syntactic program transformations from examples. In: Proceedings of the 39th International Conference on Software Engineering, pp. 404–415. IEEE Press (2017)
- (57) Rolim, R., Soares, G., Gheyi, R., D’Antoni, L.: Learning quick fixes from code repositories. arXiv preprint arXiv:1803.03806 (2018)
- (58) Saha, R., Lyu, Y., Lam, W., Yoshida, H., Prasad, M.: Bugs.jar: A large-scale, diverse dataset of real-world Java bugs. In: Proceedings of the 15th Working Conference on Mining Software Repositories, MSR ’18. IEEE (2018)
- (59) Saha, R.K., Lyu, Y., Yoshida, H., Prasad, M.R.: Elixir: Effective object-oriented program repair. In: Automated Software Engineering (ASE), 2017 32nd IEEE/ACM International Conference on, pp. 648–659. IEEE (2017)
- (60) Skiena, S.S.: The Stony Brook Algorithm Repository. URL http://www.cs.sunysb.edu/algorith/implement/nauty/implement.shtml (1997)
- (61) Sobreira, V., Durieux, T., Madeiral, F., Monperrus, M., Maia, M.A.: Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J. In: Proceedings of SANER (2018)
- (62) Tan, S.H., Roychoudhury, A.: Relifix: Automated repair of software regressions. In: Proceedings of the 37th International Conference on Software Engineering-Volume 1, pp. 471–482. IEEE Press (2015)
- (63) Weissgerber, P., Diehl, S.: Identifying refactorings from source-code changes. In: Automated Software Engineering, 2006. ASE’06. 21st IEEE/ACM International Conference on, pp. 231–240. IEEE (2006)
- (64) Wen, M., Chen, J., Wu, R., Hao, D., Cheung, S.C.: Context-aware patch generation for better automated program repair. In: Proceedings of the 40th International Conference on Software Engineering, pp. 1–11. ACM (2018)
- (65) Winkler, W.E.: String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. (1990)
- (66) Xin, Q., Reiss, S.P.: Leveraging syntax-related code for automated program repair. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 660–670. IEEE (2017)
- (67) Xiong, Y., Wang, J., Yan, R., Zhang, J., Han, S., Huang, G., Zhang, L.: Precise condition synthesis for program repair. In: Proceedings of the 39th International Conference on Software Engineering, pp. 416–426. IEEE, Buenos Aires, Argentina (2017)
- (68) Xuan, J., Martinez, M., DeMarco, F., Clement, M., Marcote, S.L., Durieux, T., Le Berre, D., Monperrus, M.: Nopol: Automatic repair of conditional statement bugs in Java programs. IEEE Transactions on Software Engineering 43(1), 34–55 (2017)
- (69) Yue, R., Meng, N., Wang, Q.: A characterization study of repeated bug fixes. In: Software Maintenance and Evolution (ICSME), 2017 IEEE International Conference on, pp. 422–432. IEEE (2017)