TBar: Revisiting Template-based Automated Program Repair

03/20/2019 ∙ by Kui Liu, et al. ∙ University of Luxembourg

Fix patterns (a.k.a fix templates) are the main ingredients that drive a significant portion of automated program repair (APR) studies in the literature. As fix patterns become widely adopted in various approaches, it becomes critical to thoroughly assess the effectiveness of existing templates to establish a clear baseline for APR. In this paper, we revisit the performance of template-based APR to build comprehensive knowledge about the effectiveness of fix patterns, and to highlight the importance of complementary steps such as fault localization or donor code retrieval. To that end, we first investigate the literature to collect, summarize and label recurrently-used fix patterns. Based on the investigation, we build TBar, a straightforward APR tool that systematically attempts to apply these fix patterns to program bugs. We thoroughly evaluate TBar on the Defects4J benchmark. In particular, we assess the actual qualitative and quantitative diversity of fix patterns, as well as their effectiveness in yielding plausible or correct patches. Eventually, we find that, assuming a perfect fault localization, TBar is able to correctly/plausibly fix 74/102 bugs, while a previous baseline (i.e., kPAR) fixes only 36/55 bugs. Replicating a standard and practical pipeline of APR assessment, we demonstrate that TBar can correctly fix 43 bugs from Defects4J, an unprecedented performance in the literature (including all approaches, i.e., template-based, stochastic mutation-based or synthesis-based APR).


1. Introduction

Automated Program Repair (APR) has progressively become an essential research field in software maintenance. APR research indeed promises to improve modern software development by reducing the time and costs associated with program debugging tasks. In particular, given that faults in software cause substantial financial losses to the software industry (NIST, 2019; Britton et al., 2013), there is momentum for minimizing time-to-fix intervals through APR. Recently, various APR approaches (Nguyen et al., 2013; Weimer et al., 2009; Le Goues et al., 2012b; Kim et al., 2013; Coker and Hafiz, 2013; Ke et al., 2015; Mechtaev et al., 2015; Long and Rinard, 2015; Le et al., 2016a, b; Long and Rinard, 2016b; Chen et al., 2017; Le et al., 2017; Long et al., 2017; Xuan et al., 2017; Xiong et al., 2017; Jiang et al., 2018; Wen et al., 2018; Hua et al., 2018; Liu et al., 2019b; Liu et al., 2019a) have been proposed, aiming at reducing manual debugging effort through automatically generated patches.

An early strategy of APR is to generate concrete patches based on fix patterns (Kim et al., 2013) (also referred to as fix templates (Liu and Zhong, 2018) or program transformation schemas (Hua et al., 2018)). This strategy is now common in the literature and has been implemented in several APR systems (Kim et al., 2013; Saha et al., 2017; Durieux et al., 2017; Liu and Zhong, 2018; Hua et al., 2018; Koyuncu et al., 2018; Martinez and Monperrus, 2018; Liu et al., 2019b; Liu et al., 2019a). Kim et al. (Kim et al., 2013) pioneered the use of fix patterns and proposed PAR, an APR tool with 10 fix templates. Saha et al. (Saha et al., 2017) later proposed ELIXIR, adding three new templates on top of PAR (Kim et al., 2013). Durieux et al. (Durieux et al., 2017) proposed NPEfix to repair bugs throwing null pointer exceptions by using nine pre-defined fix patterns. Long et al. designed Genesis (Long et al., 2017) to automatically infer fix patterns for three specific classes of defects. Liu and Zhong (Liu and Zhong, 2018) explored posts from Stack Overflow (https://stackoverflow.com/) to mine fix patterns for program repair. Hua et al. proposed SketchFix (Hua et al., 2018), a runtime on-demand APR tool with six pre-designed fix schemas. Recently, Liu et al. (Liu et al., 2019b) leveraged the fix patterns of FindBugs static violations (Liu et al., 2018b) to fix semantic bugs.

Although the literature has reported promising results with fix pattern-based APR, to the best of our knowledge, no extensive assessment of the effectiveness of the various patterns has been performed. A few recent approaches (Liu and Zhong, 2018; Hua et al., 2018; Liu et al., 2019b) have reported which benchmark bugs are fixed by each (or some) of their fix patterns. Nevertheless, many relevant questions on the effectiveness of fix patterns remain unanswered.

This paper. Our work thoroughly investigates to what extent fix patterns are effective for program repair. In particular, emphasizing the recurrence of some patterns in APR, we dissect their actual contribution to repair performance. Eventually, we explore three aspects of fix patterns:


  • Diversity: How diverse are the fix patterns used by the state-of-the-art? We survey the literature to identify and summarize the available patterns with a clear taxonomy.

  • Repair performance: How effective are the different patterns? In particular, we investigate the variety of real-world bugs that can be fixed, the dissection of repair results, and their tendency to yield plausible or correct patches.

  • Sensitivity to fault localization noise: Are all fix patterns similarly sensitive to the false positives yielded by (currently imperfect) fault localization tools? We investigate sensitivity by assessing plausible patches as well as the suspiciousness rank of correctly-fixed bug locations.

Towards realizing this study, we implement an automated patch generation system, TBar (Template-Based automated program repair), with a super-set of fix patterns that are collected, summarized, curated and labeled from the literature data. We evaluate TBar on the Defects4J (Just et al., 2014) benchmark, and provide the replication package in a public repository:

https://github.com/SerVal-DTF/TBar

Overall, our investigations have yielded the following findings:


  1. Record performance: TBar sets a new, higher baseline of repair performance: 74/102 bugs are correctly/plausibly fixed with perfect fault localization information, and 43/81 bugs with realistic fault localization output.

  2. Fix pattern selection: Most bugs are correctly fixed by only a single fix pattern, while other patterns generate merely plausible patches. This implies that appropriate pattern prioritization can prevent plausible but incorrect patches; otherwise, APR tools may overfit to plausible but incorrect patches.

  3. Fix ingredient retrieval: It is challenging for template-based APR to select appropriate donor code, i.e., the ingredient of patch generation when applying fix patterns. Inappropriate donor code may lead to plausible but incorrect patches. This motivates a new research direction: donor code prioritization.

  4. Fault localization noise: Fault localization accuracy has a large impact on repair performance when using fix patterns in APR (e.g., applying a fix pattern to an incorrect location yields plausible but incorrect patches).

2. Fix Patterns

For this study, we systematically review the APR literature to identify program repair approaches that leverage fix patterns. (For conference proceedings and journals, we consider ICSE, FSE, ASE, ISSTA, ICSME, SANER, TSE, TOSEM, and EMSE; the search keywords are ‘program’ + ‘repair’ and ‘bug’ + ‘fix’.) Concretely, we consider the program repair website (http://program-repair.org), a bibliography survey of automated program repair (Monperrus, 2018), and the proceedings of software engineering conference venues and journals as sources of relevant literature. We focus on approaches dealing with Java program bugs, and manually collect, from the paper descriptions as well as the associated artefacts, all pattern instances that are explicitly mentioned. Table 1 summarizes the relevant literature that we enumerated and the number of identified fix patterns targeting Java programs. Note that the techniques described in the last four rows (i.e., the HDRepair, ssFix, CapGen and SimFix papers) do not directly use fix patterns: they leverage code change operators or rules, which we consider similar to using fix patterns.

Authors APR tool name # of fix patterns Publication Venue Publication Year
Pan et al. (Pan et al., 2009) - 27 EMSE 2009
Kim et al. (Kim et al., 2013) PAR 10 (16) ICSE 2013
Martinez et al. (Martinez and Monperrus, 2016) jMutRepair 2 ISSTA 2016
Durieux et al. (Durieux et al., 2017) NPEfix 9 SANER 2017
Long et al. (Long et al., 2017) Genesis 3 (108) FSE 2017
D. Le et al. (Le et al., 2017) S3 4 FSE 2017
Saha et al. (Saha et al., 2017) ELIXIR 8 (11) ASE 2017
Hua et al. (Hua et al., 2018) SketchFix 6 ICSE 2018
Liu and Zhong (Liu and Zhong, 2018) SOFix 12 SANER 2018
Koyuncu et al. (Koyuncu et al., 2018) FixMiner 28 UL Tech Report 2018
Liu et al. (Liu et al., 2018b) - 174 TSE 2018
Rolim et al. (Rolim et al., 2018) REVISAR 9 UFERSA Tech Report 2018
Liu et al. (Liu et al., 2019b) AVATAR 13 SANER 2019
D. Le et al. (Le et al., 2016b) HDRepair 11 SANER 2016
Xin and Reiss (Xin and Reiss, 2017) ssFix 34 ASE 2017
Wen et al. (Wen et al., 2018) CapGen 30 ICSE 2018
Jiang et al. (Jiang et al., 2018) SimFix 16 ISSTA 2018

In the PAR paper (Kim et al., 2013), 10 fix patterns are presented, but 16 fix patterns are released online (https://sites.google.com/site/autofixhkust/home/fix-templates). In Genesis, 108 code transformation schemas are inferred for three kinds of defects. In ELIXIR, one fix pattern consists of four sub-fix patterns. The last four tools do not explicitly leverage fix patterns but code change operators or rules, which are similar to fix patterns.

Table 1. Literature review on fix patterns for Java programs.

2.1. Fix Patterns Inference

Fix patterns have been explored in the following four ways:


  1. Manual Summarization: Pan et al. (Pan et al., 2009) identified 27 fix patterns from the patches of five Java projects to characterize the fix ingredients of patches. They do not, however, apply the identified patterns to fix actual bugs. Motivated by this work, Kim et al. (Kim et al., 2013) summarized 10 fix templates manually extracted from 62,656 human-written patches collected from Eclipse JDT.

  2. Mining: Long et al. (Long et al., 2017) proposed Genesis to automatically infer fix patterns for three kinds of defects from existing patches. Liu and Zhong (Liu and Zhong, 2018) explored fix patterns for Java programs from Q&A posts on Stack Overflow. Koyuncu et al. (Koyuncu et al., 2018) proposed to mine fix patterns at the abstract syntax tree level from bug fixes by leveraging a code change differencing tool (Falleri et al., 2014). Furthermore, Liu et al. (Liu et al., 2018b) and Rolim et al. (Rolim et al., 2018) proposed to mine fix patterns from static analysis violations. In general, mining approaches yield a large number of fix patterns, which are not always about addressing deviations in program behavior. For example, many patterns are about code style (Liu et al., 2019b). Recently, AVATAR was proposed as a program repair tool that only selects static analysis violation fix patterns which are likely related to program behaviour (Liu et al., 2019b). Its authors enumerate a number of heuristics to identify patterns that change data, delete code, change data type, etc. In our study, we follow their approach to focus on patterns which are relevant to APR.

  3. Pre-definition: Durieux et al. (Durieux et al., 2017) pre-defined nine repair strategies for null pointer exceptions by unifying the related repair templates proposed in previous studies (Dobolyi and Weimer, 2008; Kent, 2008; Long et al., 2014). On top of PAR (Kim et al., 2013), Saha et al. (Saha et al., 2017) further defined three new fix templates to improve repair performance. Hua et al. (Hua et al., 2018) proposed an APR tool with six pre-defined code transformation schemas. We also consider operator mutations (Martinez and Monperrus, 2016) as pre-defined fix patterns, since the number of operators and mutation possibilities is limited and pre-set. Xin and Reiss (Xin and Reiss, 2017) proposed an approach leveraging syntax-related code to fix bugs with 34 predefined code change rules at the AST level. Ten of these rules do not transform the buggy code but simply replace multi-statement code fragments; we discard them from our study to limit bias.

  4. Statistics: Besides formalized fix patterns, researchers (Wen et al., 2018; Jiang et al., 2018) have also explored automating program repair with code change instructions (at the abstract syntax tree level) that are statistically recurrent in existing patches (Zhong and Su, 2015; Martinez and Monperrus, 2015; Liu et al., 2018c; Wen et al., 2017; Jiang et al., 2018). The strategy is then to select the top-n most frequent code change instructions as fix ingredients to synthesize patches.

2.2. Fix Patterns Taxonomy

After manually assessing all fix patterns presented in the literature (cf. Table 1), we identified 15 categories of patterns, labeled based on the code context (e.g., a cast expression), the code change actions (e.g., insert an “if” statement with an “instanceof” check), and the targets (e.g., ensure the program will not throw a ClassCastException). A given category may include one or several specialized sub-categories. Below, we present the labeled categories and provide the associated 35 code change schemas, described in a simplified GNU diff format for easy understanding.

FP1. Insert Cast Checker. Inserting an instanceof check before one buggy statement if this statement contains at least one unchecked cast expression. Implemented in: PAR, Genesis, AVATAR, HDRepair, SOFix, SketchFix, CapGen, and SimFix.
+  if (exp instanceof T) {
        var = (T) exp; ......
+  }

where exp is an expression (e.g., a variable expression) and T is the casting type, while “……” denotes the subsequent statements that depend on the variable var. Note that some of the tools listed as implementing a pattern do not illustrate it explicitly: they provide more abstract fix patterns that cover it. The same applies to the following descriptions.
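To make the schema concrete, here is a hypothetical, self-contained Java instantiation of FP1 (the example method and its values are invented for illustration and are not taken from Defects4J):

// Hypothetical illustration of FP1 (names are invented for this sketch).
class Fp1Example {
    static int describe(Object value) {
        // Buggy version: unchecked cast throws ClassCastException for non-String input.
        //   return ((String) value).length();
        if (value instanceof String) {          // FP1: inserted cast checker
            return ((String) value).length();
        }
        return 0;                               // fall-back when the cast does not apply
    }
}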

FP2. Insert Null Pointer Checker. Inserting a null check before a buggy statement if, in this statement, a field or an expression (of non-primitive data type) is accessed without a null pointer check. Implemented in: PAR, ELIXIR, NPEfix, Genesis, FixMiner, AVATAR, HDRepair, SOFix, SketchFix, CapGen, and SimFix.
FP2.1:  +  if (exp != null) {
               ...exp...; ......
        +  }
FP2.2:  +  if (exp == null) return DEFAULT_VALUE;
           ...exp...;
FP2.3:  +  if (exp == null) exp = exp1;
           ...exp...;
FP2.4:  +  if (exp == null) continue;
           ...exp...;
FP2.5:  +  if (exp == null)
        +      throw new IllegalArgumentException(...);
           ...exp...;

where DEFAULT_VALUE is set based on the return type (RT) of the encompassing method as below:

DEFAULT_VALUE = { null, if RT is a reference type; 0, if RT is a primitive numeric type; false, if RT is boolean; a plain “return;” (no value), if RT is void }    (1)

exp1 is a compatible expression in the buggy program (i.e., that has the same data type as exp). FP2.4 is specific to the case of a buggy statement within a loop (i.e., for or while).
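For illustration, a minimal sketch of how FP2.2 instantiates DEFAULT_VALUE for different return types (the example methods are hypothetical, and the type-to-default mapping reflects our reading of Eq. (1)):

// Hypothetical illustration of FP2.2; the methods are invented for this sketch.
class Fp2Example {
    static String ownerName(java.util.Map<String, String> owners, String key) {
        if (owners == null) return null;   // reference return type -> DEFAULT_VALUE is null
        return owners.get(key);
    }
    static int ownerCount(java.util.Map<String, String> owners) {
        if (owners == null) return 0;      // primitive numeric return type -> DEFAULT_VALUE is 0
        return owners.size();
    }
}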

FP3. Insert Range Checker. Inserting a range check before an unchecked access to an element of an array or collection. Implemented in: PAR, ELIXIR, Genesis, FixMiner, SketchFix, AVATAR, SOFix and SimFix.
+  if (index < exp.length) {
       ...exp[index]...; ......
+  }
OR
+  if (index < exp.size()) {
       ...exp.get(index)...; ......
+  }

where exp is an expression representing an array or collection.

FP4. Insert Missing Statement. Inserting a missing statement before, after, or around a buggy statement. The missing statement is either an expression statement with a method invocation, a return statement, a try-catch statement, or an if statement. Implemented in: ELIXIR, HDRepair, SOFix, SketchFix, CapGen, FixMiner, and SimFix.
FP4.1:  +  method(exp);
FP4.2:  +  return DEFAULT_VALUE;
FP4.3:  +  try {
               statement; ......
        +  } catch (Exception e) { ... }
FP4.4:  +  if (conditional_exp) {
               statement; ......
        +  }

where exp is an expression from the buggy statement; it may be empty if the method does not take any argument. FP4.4 excludes the specific contexts already covered by fix patterns FP1, FP2, and FP3.

FP5. Mutate Class Instance Creation. Replacing a class instance creation expression with a cast of the super.clone() method invocation, if the class instance creation occurs in an overridden clone method. Implemented in: AVATAR.
   public Object clone() {
-       ... new T();
+       ... (T) super.clone();
   }

where T is the class name of the current class containing the buggy statement.

FP6. Mutate Conditional Expression. Mutating a conditional expression that returns a boolean value (i.e., true or false) by updating it, removing a sub-conditional expression, or inserting a new conditional expression into it. Implemented in: PAR, ssFix, S3, HDRepair, ELIXIR, SketchFix, CapGen, SimFix, and AVATAR.
FP6.1:  -  ...condExp1...
        +  ...condExp2...
FP6.2:  -  ...condExp1 Op condExp2...
        +  ...condExp1...
FP6.3:  -  ...condExp1...
        +  ...condExp1 Op condExp2...

where condExp1 and condExp2 are conditional expressions, and Op is the logical operator ‘||’ or ‘&&’. The mutation of operators in conditional expressions is not summarized in this fix pattern but in FP11.
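As a concrete (hypothetical) illustration of how FP6 mutates a condition, consider a bounds check; the buggy and mutated variants are shown as comments:

// Hypothetical illustration of FP6 on an index bounds check.
class Fp6Example {
    static boolean inRange(int i, int[] a) {
        // buggy:          return i <= a.length;            (off-by-one)
        // FP6.1 (update): return i <  a.length;
        // FP6.3 (insert): return i >= 0 && i < a.length;   (adds a sub-condition)
        return i >= 0 && i < a.length;
    }
}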

FP7. Mutate Data Type. Replacing the data type in a variable declaration or a cast expression with another data type. Implemented in: PAR, ELIXIR, FixMiner, SOFix, CapGen, SimFix, AVATAR, and HDRepair.
FP7.1:  -  T1 var ...;
        +  T2 var ...;
FP7.2:  -  ...(T1) exp...;
        +  ...(T2) exp...;

where T1 and T2 denote two different data types, and exp denotes the expression (including variables) being cast.

FP8. Mutate Integer Division Operation. Mutating an integer division expression to return a floating-point value, by casting its dividend or divisor to a floating-point type (or rewriting the expression). Released by Liu et al. (Liu et al., 2018b), it is not implemented in any APR tool yet.
FP8.1:  -  ...dividend / divisor...
        +  ...dividend / (double or float) divisor...
FP8.2:  -  ...dividend / divisor...
        +  ...(double or float) dividend / divisor...
FP8.3:  -  ...dividend / divisor...
        +  ...(1.0 / divisor) * dividend...

where dividend and divisor are integer literals or expressions (including variables) that return integer values.
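The following self-contained snippet illustrates the defect FP8 targets and its three sub-patterns (the values are purely illustrative):

// Hypothetical illustration of FP8: integer division silently truncates.
class Fp8Example {
    public static void main(String[] args) {
        int passed = 3, total = 4;
        double buggy = passed / total;             // 0.0: "3 / 4" is evaluated in int arithmetic
        double fp8_1 = passed / (double) total;    // 0.75: cast the divisor
        double fp8_2 = (double) passed / total;    // 0.75: cast the dividend
        double fp8_3 = (1.0 / total) * passed;     // 0.75: rewrite the whole expression
        System.out.println(buggy + " " + fp8_1 + " " + fp8_2 + " " + fp8_3);
    }
}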

FP9. Mutate Literal Expression. Mutating boolean, number, or String literals in a buggy statement with other relevant literals, or correspondingly-typed expressions. Implemented in: HDRepair, S3, FixMiner, SketchFix, CapGen, SimFix and ssFix.
FP9.1:  -  ...literal1...
        +  ...literal2...
FP9.2:  -  ...literal1...
        +  ...exp...

where literal1 and literal2 are literals of the same type but with different values (e.g., literal1 is true and literal2 is false). exp denotes any expression of the same type as literal1.

FP10. Mutate Method Invocation Expression. Mutating a buggy method invocation expression by adapting its method name or arguments. This pattern consists of four sub-fix patterns:

  1. Replacing the method name with another one which has a compatible return type and same parameter type(s) as the buggy method that was invoked.

  2. Replacing at least one argument with another expression which has a compatible data type. Replacing a literal or variable is not included in this fix pattern, but rather in FP9 and FP13 respectively.

  3. Removing argument(s) if the invoked method has suitable overloaded versions.

  4. Inserting argument(s) if the invoked method has suitable overloaded versions.

Implemented in: PAR, HDRepair, ssFix, ELIXIR, FixMiner, SOFix, SketchFix, CapGen, and SimFix.
FP10.1:  -  ...method1(args)...
         +  ...method2(args)...
FP10.2:  -  ...method1(arg1, arg2, ...)...
         +  ...method1(arg1, arg3, ...)...
FP10.3:  -  ...method1(arg1, arg2, ...)...
         +  ...method1(arg1, ...)...
FP10.4:  -  ...method1(arg1, ...)...
         +  ...method1(arg1, arg2, ...)...

where method1 and method2 are the names of invoked methods, and args, arg1, arg2 and arg3 denote the argument expressions of the method invocation. Note that code changes on class instance creation, constructor, and super constructor expressions are also covered by these four sub-fix patterns.

FP11. Mutate Operators. Mutating an operation expression by mutating its operator(s). We divide this fix pattern into three sub-fix patterns following the operator types and mutation actions.

  1. Replacing one operator with another operator from the same operator class (e.g., relational or arithmetic).

  2. Changing the priority of arithmetic operators.

  3. Replacing instanceof operator with (in)equality operators.

Implemented in: HDRepair, ssFix, ELIXIR, S3, jMutRepair, SOFix, FixMiner, SketchFix, CapGen, SimFix, AVATAR, and PAR.
FP11.1:  -  ...exp1 Op1 exp2...
         +  ...exp1 Op2 exp2...
FP11.2:  -  ...(exp1 Op1 exp2) Op2 exp3...
         +  ...exp1 Op1 (exp2 Op2 exp3)...
FP11.3:  -  ...exp instanceof T...
         +  ...exp != null...

where exp, exp1, exp2 and exp3 denote the expressions in the operation, and Op, Op1 and Op2 are the associated operators.

FP12. Mutate Return Statement. Replacing the expression (excluding literals, variables, and conditional expressions) in a return statement with a compatible expression. Implemented in: ELIXIR, SketchFix, and HDRepair.
-   return exp1;
+   return exp2;

where exp1 and exp2 represent the returned expressions.

FP13. Mutate Variable. Replacing a variable in a buggy statement with a compatible expression (including variables and literals). Implemented in: S3, SOFix, FixMiner, SketchFix, CapGen, SimFix, AVATAR, and ssFix.
FP13.1:  -  ...var1...
         +  ...var2...
FP13.2:  -  ...var1...
         +  ...exp...

where var1 denotes a variable in the buggy statement. var2 and exp represent respectively a compatible variable and expression of the same type as var1.

FP14. Move Statement. Moving a buggy statement to a new position. Implemented in: PAR.
-       statement;
        ......
+       statement;

where statement represents the buggy statement.

FP15. Remove Buggy Statement. Deleting entirely the buggy statement from the program. Implemented in: HDRepair, SOFix, FixMiner, CapGen, and AVATAR.
FP15.1:     ......
         -  statement;
            ......
FP15.2:  -  methodDeclaration(Arguments) {
         -      ......; statement; ......
         -  }

where statement denotes any identified buggy statement, and methodDeclaration represents the encompassing method.

2.3. Analysis of Collected Patterns

We provide a study of the collected fix patterns following quantitative (overall set) and qualitative (per fix pattern) aspects. Table 2 assesses the fix patterns in terms of four qualitative dimensions:


  1. Change Action: what high-level operations are applied on a buggy code entity in the program? On the one hand, Update operations replace the buggy code entity with another donor code, while Delete operations just remove the buggy code entity from the program. On the other hand, Insert operations insert an otherwise missing code entity into the program, and Move operations change the position of the buggy code entity to a more suitable location in the program.

  2. Change Granularity: what kinds of code entities are directly impacted by the change actions? This entity can be an entire Method, a whole Statement or, more specifically, an Expression within a statement.

  3. Bug Context: what specific AST nodes of code entities are used to match fix patterns?

  4. Change Spread: the number of statements impacted by each fix pattern.

Fix Pattern    | Change Action | Change Granularity      | Bug Context                                                                    | Change Spread
FP1            | Insert        | statement               | cast expression                                                                | single
FP2.1          | Insert        | statement               | a variable or an expression returning non-primitive-type data                  | single
FP2.(2,3,4,5)  | Insert        | statement               | a variable or an expression returning non-primitive-type data                  | dual
FP3            | Insert        | statement               | element access of array or collection variable                                 | single
FP4.(1,2,3,4)  | Insert        | statement               | any statement                                                                  | single
FP5            | Update        | expression              | class instance creation expression and clone method                            | single
FP6.1          | Update        | expression              | conditional expression                                                         | single
FP6.2          | Delete        | expression              | conditional expression                                                         | single
FP6.3          | Insert        | expression              | conditional expression                                                         | single
FP7.1          | Update        | expression              | variable declaration expression                                                | single
FP7.2          | Update        | expression              | cast expression                                                                | single
FP8.(1,2,3)    | Update        | expression              | integer division expression                                                    | single
FP9.(1,2)      | Update        | expression              | literal expression                                                             | single
FP10.1         | Update        | expression or statement | method invocation, class instance creation, constructor, or super constructor  | single
FP10.2         | Update        | expression or statement | (same as FP10.1)                                                               | single
FP10.3         | Delete        | expression or statement | (same as FP10.1)                                                               | single
FP10.4         | Insert        | expression or statement | (same as FP10.1)                                                               | single
FP11.1         | Update        | expression              | assignment or infix-expression                                                 | single
FP11.2         | Update        | expression              | arithmetic infix-expression                                                    | single
FP11.3         | Update        | expression              | instanceof expression                                                          | single
FP12           | Update        | expression              | return statement                                                               | single
FP13.(1,2)     | Update        | expression              | variable expression                                                            | single
FP14           | Move          | statement               | any statement                                                                  | single or multiple
FP15.1         | Delete        | statement               | any statement                                                                  | single or multiple
FP15.2         | Delete        | method                  | any statement                                                                  | multiple
Table 2. Change properties of fix patterns.
Action Type | # fix patterns     Granularity | # fix patterns     Change Spread       | # fix patterns
Update      | 17                 Expression  | 21                 Single statement    | 30
Delete      | 4                  Statement   | 17                 Multiple statements | 7
Insert      | 13                 Method      | 1
Move        | 1
Table 3. Diversity of fix patterns w.r.t change properties.

Quantitatively, as summarized in Table 3, 17 fix patterns are related to Update change actions, 4 fix patterns implement Delete actions, 13 fix patterns Insert extra code, and only 1 fix pattern is associated with the Move change action.

In terms of change granularity, 21 and 17 fix patterns are applied at the expression and statement code entity levels, respectively (among these, the four sub-fix patterns of FP10 can be applied to either expressions or statements, given that constructor and super constructor code entities are grouped at the statement level in the Eclipse JDT abstract syntax tree). Only 1 fix pattern applies at the method level.

Overall, we note that 30 fix patterns are applicable to a single statement, while 7 fix patterns can mutate multiple statements at the same time. Among these, FP14 and FP15.1 can mutate both single and multiple statements.

Figure 1. The overall workflow of TBar.

3. Setup for Repair Experiments

In order to assess the effectiveness of fix patterns in the taxonomy presented in Section 2, we design program repair experiments using the fix patterns as the main ingredients. The produced APR system is then assessed on a widely-used benchmark in the repair community to allow reliable comparison against the state-of-the-art.

3.1. TBar: a Baseline APR System

Building on the investigations of recurrently-used fix patterns, we build TBar, a template-based APR system which integrates the 35 fix patterns presented in Section 2. We expect the research community to consider TBar as a baseline APR system: new approaches must come up with novel techniques for solving auxiliary issues (e.g., repair precision, search space optimization, fault locations re-prioritization, etc.) to boost automated program repair beyond the performance that a straightforward application of common fix patterns can offer. Figure 1 overviews the workflow that we have implemented in TBar. We describe in the following subsections the role and operation of each process as well as all necessary implementation details.

3.1.1. Fault Localization

Fault localization is necessary for template-based APR as it identifies a list of suspicious code locations (i.e., buggy statements) on which to apply the fix patterns. TBar leverages the GZoltar framework (http://www.gzoltar.com) (Campos et al., 2012) to automate the execution of test cases for each buggy program. In this framework, we use the Ochiai (Abreu et al., 2007) ranking metric to compute the suspiciousness scores of statements that are likely to be the faulty code locations. This ranking metric has been demonstrated in several empirical studies (Steimann et al., 2013; Xie et al., 2013; Xuan and Monperrus, 2014a; Pearson et al., 2017) to be effective for localizing faults in object-oriented programs. The GZoltar framework is also widely used in the APR literature (Martinez and Monperrus, 2016; Xiong et al., 2017; Xuan et al., 2017; Xin and Reiss, 2017; Wen et al., 2018; Koyuncu et al., 2018; Liu et al., 2018a; Jiang et al., 2018; Liu et al., 2019a; Liu et al., 2019b), allowing for a fair assessment of TBar’s performance against the state-of-the-art.
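As a reminder (this is the standard definition of the metric, not restated in the paper), Ochiai assigns to each statement s the suspiciousness score

\[ \mathit{Ochiai}(s) = \frac{n_{ef}(s)}{\sqrt{n_f \cdot \left(n_{ef}(s) + n_{ep}(s)\right)}} \]

where n_ef(s) and n_ep(s) are the numbers of failing and passing test cases that execute s, and n_f is the total number of failing test cases; statements are then ranked by decreasing score.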

3.1.2. Fix Pattern Selection

In the execution of the repair pipeline, once the fault localization process yields a list of suspicious code locations, TBar iteratively attempts to select fix patterns from its database for each statement in the list. The selection of fix patterns is conducted in a naïve way based on the context information of each suspicious statement (i.e., all nodes in its AST). Specifically, TBar parses the program and traverses each node of the suspicious statement’s AST from its first child node to its last leaf node. If a node matches any bug context presented in Table 2, the related fix pattern is selected to generate patch candidates with the corresponding code change schema. If the node is not a leaf node, TBar keeps traversing its children. For example, if the first child node of a suspicious statement is a method invocation expression, it is first matched with the FP10 (Mutate Method Invocation Expression) fix pattern. If the children of the method invocation start with a variable reference, that node is matched with the FP13 (Mutate Variable) fix pattern as well. Other fix patterns are matched in the same manner. After all expression nodes of a suspicious statement are matched with fix patterns, TBar further matches fix patterns at the statement and method levels, respectively.
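A minimal sketch of this matching step is shown below, assuming the Eclipse JDT DOM API (which the paper mentions for AST processing); the mapping from node kinds to pattern labels is purely illustrative and far simpler than TBar’s actual rules:

import java.util.ArrayList;
import java.util.List;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CastExpression;
import org.eclipse.jdt.core.dom.InfixExpression;
import org.eclipse.jdt.core.dom.MethodInvocation;
import org.eclipse.jdt.core.dom.SimpleName;

// Collects candidate fix patterns for one suspicious statement by visiting its AST nodes.
class PatternMatcher extends ASTVisitor {
    final List<String> candidates = new ArrayList<>();

    @Override public boolean visit(CastExpression node) {
        candidates.add("FP1 Insert Cast Checker");
        candidates.add("FP7.2 Mutate Data Type");
        return true;                      // keep traversing child nodes
    }
    @Override public boolean visit(MethodInvocation node) {
        candidates.add("FP10 Mutate Method Invocation Expression");
        return true;
    }
    @Override public boolean visit(InfixExpression node) {
        candidates.add("FP11 Mutate Operators");
        return true;
    }
    @Override public boolean visit(SimpleName node) {
        candidates.add("FP13 Mutate Variable");
        return true;
    }
}
// Usage (sketch): suspiciousStatement.accept(new PatternMatcher());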

3.1.3. Patch Generation and Validation

After a fix pattern is selected for a suspicious statement, the statement is mutated following the code change schema of the corresponding fix pattern to generate patch candidates. Each generated patch candidate is validated against all test cases of the buggy program. If it makes the buggy program pass all test cases, the patch candidate is considered a plausible patch. Once such a plausible patch is identified, TBar stops trying other patch candidates for this bug.

Considering that some buggy programs have several buggy locations, if a patch candidate makes a buggy program pass a subset of the previously failing test cases without failing any previously passing test case, this patch is considered a plausible sub-patch of the buggy program. TBar then continues to validate other patch candidates until a plausible patch is generated, all patch candidates have been validated, or TBar exhausts the time limit set for repair attempts.
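The overall generate-and-validate loop can be summarized by the following sketch (all types and method names are illustrative, not TBar’s actual API; the handling of partial fixes/sub-patches is omitted for brevity):

import java.util.List;

// Minimal sketch of the patch generation and validation loop described above.
final class RepairLoop {
    interface Patch { boolean compilesAndPassesAllTests(); }
    interface FixPattern { List<Patch> generateCandidates(Object suspiciousStatement); }

    static Patch repair(List<Object> suspiciousStatements,   // ranked by suspiciousness
                        List<FixPattern> matchedPatterns,
                        long timeBudgetMillis) {
        long deadline = System.currentTimeMillis() + timeBudgetMillis;
        for (Object statement : suspiciousStatements) {
            for (FixPattern pattern : matchedPatterns) {
                for (Patch candidate : pattern.generateCandidates(statement)) {
                    if (System.currentTimeMillis() > deadline) return null;  // time budget exhausted
                    if (candidate.compilesAndPassesAllTests()) {
                        return candidate;    // first plausible patch stops the search
                    }
                }
            }
        }
        return null;                         // no plausible patch found
    }
}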

If a plausible patch is generated, we further manually check the equivalence between this patch and the ground-truth patch provided by developers and available in the Defects4J benchmark. If the plausible patch is equivalent (syntactically or semantically) to the ground-truth patch, the plausible patch is considered as correct. Otherwise, it is only considered as plausible.

We offer a replication package with extensive details on pattern implementation within TBar. Source code is also made publicly available with a GPL licence.

3.2. Assessment Benchmark

For our empirical assessments, we selected the Defects4J (Just et al., 2014) dataset as the evaluation benchmark of TBar. This benchmark includes test cases for buggy Java programs together with the associated developer fixes. Defects4J is an ideal benchmark for the objective of this study, since it has been widely used by most recent state-of-the-art APR systems targeting Java program bugs. Table 4 provides summary statistics on the bugs and test cases available in version 1.2.0 (https://github.com/rjust/defects4j/releases/tag/v1.2.0) of Defects4J, which we use in this study.

Project Chart (C) Closure (Cl) Lang (L) Math (M) Mockito (Mc) Time (T) Total
# bugs 26 133 65 106 38 27 395
# test cases 2,205 7,927 2,245 3,602 1,457 4,130 21,566
# fixed bugs by all APR tools (cf. (Liu et al., 2019a; Liu et al., 2019b)) 13 16 28 37 3 4 101
Table 4. Defects4J dataset information.

Overall, we note that, to date, 101 Defects4J bugs have been correctly fixed by at least one APR tool published in the literature. Nevertheless, we recall that SimFix (Jiang et al., 2018) currently holds the record number of bugs fixed by a single tool, which is 34.

4. Assessment

This section presents and discusses the results of repair experiments with TBar. In particular, we conduct two experiments for:

  • Experiment #1: Assessing the effectiveness of the various fix patterns implemented in TBar. To avoid the bias that fault localization can introduce with its false positives (cf. (Liu et al., 2019a)), we directly provide perfect localization information to TBar.

  • Experiment #2: Evaluating the TBar baseline APR system in a normal program repair scenario. We investigate in particular the tendency of fix patterns to produce more or less incorrect patches.

4.1. Repair Suitability of Fix Patterns

Our first experiment focuses on assessing the patch generation performance of fix patterns for real bugs. In particular, we investigate three research questions in Experiment #1.

Research Questions for Experiment #1:

  • How many real bugs from Defects4J can be correctly fixed by fix patterns from our taxonomy?

  • Can each Defects4J bug be fixed by different fix patterns?

  • What are the properties of fix patterns that are successfully used to fix real bugs?

In a recent study, Liu et al. (Liu et al., 2019a) reported that fault localization techniques substantially affect the repair performance of APR tools. Given that, in this experiment, the APR tool (namely TBar) is only used as a means to apply the fix patterns in order to assess their effectiveness, we must eliminate the fault localization bias. Therefore, we assume that the bug positions are known at the statement level, and we directly provide them to the patch generation step of TBar, without running any fault localization tool (which is part of the normal APR workflow, see Figure 1). To ensure readability across our experiments, we denote this version of the APR system as TBar_p (where the subscript p stands for perfect localization). Table 5 summarizes the experimental results of TBar_p.

Fixed Bugs C Cl L M Mc T Total
# of Fully Fixed Bugs 12/13 20/26 13/18 22/35 3/3 3/6 73/101
# of Partially Fixed Bugs 2/4 3/6 1/4 1/5 0/0 1/1 8/20

We provide x/y numbers: x is the number of correctly fixed bugs; y is the number of bugs fixed with plausible patches. The same notation applies to Table 7.

Table 5. Number of bugs fixed by fix patterns with TBar_p.

Among the 395 bugs in the Defects4J benchmark, TBar_p can generate plausible patches for 101 bugs. Among these bugs, 73 are fixed with correct patches. We also note that TBar_p can partially fix 20 bugs with plausible patches (a partial fix is a patch that makes the buggy program pass a part of the previously failing test cases without causing any new test failures (Liu et al., 2019a)), and 8 of them are correct. In a previous study, the kPAR (Liu et al., 2019a) baseline tool (i.e., a Java implementation of the seminal template-based APR tool PAR (Kim et al., 2013)) correctly/plausibly fixed 36/55 Defects4J bugs when assuming perfect localization.

   public String generateToolTipFragment(String toolTipText) {
-     return " title=\"" + toolTipText
+     return " title=\"" + ImageMapUtilities.htmlEscape(toolTipText)
             + "\" alt=\"\"";
   }
   Code Change Action:
   Replace variable "toolTipText" with a method invocation expression "ImageMapUtilities.htmlEscape(toolTipText)".
   Matchable fix pattern: FP9.2.
Figure 2. Patch and code change action of fixing bug C-10.
Bug ID | per-sub-pattern marks for FP1.1–FP15.2 (see note below) | x/y
C-1 1/4
C-4 2/3
C-7 1/2
C-8 1/1
C-9 1/2
C-11 1/1
C-12 1/1
C-14 2/3
C-18 1/5
C-19 1/1
C-20 1/1
C-24 1/1
C-25 1/3
C-26 2/3
Cl-2 1/3
Cl-4 1/1
Cl-6 1/6
Cl-10 1/1
Cl-11 1/5
Cl-13 1/1
Cl-18 1/2
Cl-21 1/5
Cl-22 1/5
Cl-31 1/2
Cl-38 1/3
Cl-40 1/1
Cl-46 1/1
Cl-62 1/5
Cl-63 1/5
Cl-70 1/1
Cl-73 1/1
Cl-85 1/1
Cl-86 1/1
Cl-102 2/2
Cl-106 1/2
Cl-115 1/5
Cl-126 1/6
L-6 1/1
L-7 1/4
L-10 2/2
L-15 1/5
L-22 1/5
L-24 1/1
L-26 1/1
L-33 1/1
L-39 1/3
L-47 1/1
L-51 1/1
L-57 2/5
L-59 1/1
L-63 1/7
M-4 1/1
M-5 1/2
M-11 4/4
M-15 1/1
M-22 1/2
M-30 1/1
M-33 1/3
M-34 1/1
M-35 1/1
M-50 1/9
M-57 1/1
M-58 1/1
M-59 1/2
M-65 1/1
M-70 1/1
M-75 1/1
M-77 2/4
M-79 1/1
M-80 1/4
M-82 1/5
M-85 3/8
M-89 1/1
M-98 1/1
Mc-26 1/1
Mc-29 2/2
Mc-38 2/2
T-3 1/1
T-7 1/2
T-19 1/2
T-26 1/1
# 1 1 6 5 4 1 1 0 3 1 0 1 0 1 3 5 3 0 1 1 1 6 0 3 1 1 3 11 1 0 0 12 2 2 13 2
# 2 1 7 10 6 1 1 0 4 1 0 14 0 15 12 32 3 0 1 1 1 6 7 4 2 2 3 24 2 0 1 43 19 5 25 4

⚫ indicates that the bug is correctly fixed and ❍ indicates that the generated patch is plausible but not correct. ◐ means that the fix pattern can generate both a correct patch and a plausible patch for a bug; dedicated variants of these symbols denote that the bug can only be partially fixed by the corresponding fix pattern. In the last column, we provide x/y numbers: x is the number of fix patterns that can generate correct patches for a bug, and y is the number of fix patterns that can generate plausible patches for the bug. Note that bugs that are only plausibly (but incorrectly) fixed by fix patterns are not shown in this table. # 1: number of bugs correctly fixed by each fix pattern. # 2: number of bugs plausibly fixed by each fix pattern.

Table 6. Defects4J bugs fixed by fix patterns.

While the results of TBar_p are promising, a large portion (79%, i.e., 314 = 395 − 73 − 8 bugs) of Defects4J’s real bugs cannot be correctly fixed with the available fix patterns. We manually investigated these unfixed bugs and make the following observations, which point to research directions for improving fix rates:


  1. Insufficient fix patterns. Many bugs are not fixed by TBar_p simply due to the absence of matching fix patterns. This suggests that the fix patterns collected in the literature are far from representative of real-world bugs. The community must thus keep contributing effective techniques for mining fix patterns from existing patches.

  2. Ineffective search of fix ingredients. Template-based program repair is a kind of search-based program repair (Wen et al., 2018): some fix patterns require donor code (i.e., fix ingredients) to generate actual patches. For example, as shown in Figure 2, to apply the relevant fix pattern FP9.2, one needs to identify the fix ingredient “ImageMapUtilities.htmlEscape” that is necessary for generating the patch. In the current naïve implementation of TBar_p, donor code fragments are searched for within the code available around the buggy code location (i.e., in the same file), as sketched below. Therefore, some bugs cannot be fixed by TBar_p even though one of its fix patterns matches the required code change actions. With more effective search strategies (e.g., a larger search space including fix ingredients from other projects, as in (Liu et al., 2018a)), there might be more chances to fix more bugs.
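For illustration, a naïve same-file donor search could look like the following sketch (assuming the Eclipse JDT DOM; TBar’s actual retrieval and ranking logic is not specified at this level of detail, so the code is only indicative):

import java.util.ArrayList;
import java.util.List;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.Expression;
import org.eclipse.jdt.core.dom.ITypeBinding;
import org.eclipse.jdt.core.dom.MethodInvocation;
import org.eclipse.jdt.core.dom.SimpleName;

// Collects, from the enclosing file, expressions whose resolved type is compatible
// with the type expected at the fix location (illustrative sketch only).
class DonorCollector extends ASTVisitor {
    private final ITypeBinding expectedType;
    final List<Expression> donors = new ArrayList<>();

    DonorCollector(ITypeBinding expectedType) { this.expectedType = expectedType; }

    private void keepIfCompatible(Expression candidate) {
        ITypeBinding type = candidate.resolveTypeBinding();   // requires bindings to be resolved
        if (type != null && type.isAssignmentCompatible(expectedType)) {
            donors.add(candidate);
        }
    }

    @Override public boolean visit(MethodInvocation node) { keepIfCompatible(node); return true; }
    @Override public boolean visit(SimpleName node)       { keepIfCompatible(node); return true; }
}
// Usage (sketch): compilationUnit.accept(new DonorCollector(expectedTypeAtFixLocation));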

RQ1: The collected fix patterns can be used to correctly fix 74 real bugs from the Defects4J dataset. A larger portion of the dataset remains, however, unfixed by TBar_p, notably due to (1) the limitations of the fix pattern set and (2) the naïve search strategy for finding relevant fix ingredients to build concrete patches from patterns.

Figure 3 summarizes statistics on the number of bugs that can be fixed by one or several fix patterns. The Y-axis denotes the number of fix patterns (i.e., 1, 2, 3, 4, 5, and >5) that can generate plausible patches for a number of bugs (X-axis). In the legend, “P” represents the number of plausible patches generated by TBar_p (i.e., those that are not found to be correct), and “#n” indicates that a bug can be correctly fixed by exactly n fix patterns (although it may be plausibly fixed by more fix patterns).

Figure 3. The number of bugs plausibly and correctly fixed by single or multiple fix patterns.

Consider the bottom-most bar in Figure 3: 66 (= 28 + 38) bugs can be plausibly fixed by a single pattern (Y-axis value of 1); it turns out that only 38 of them are correctly fixed. Note that several patterns can generate (plausible) patches for a bug, but not all patches are necessarily correct. For example, in the case of the top-most bar in Figure 3, 5 bugs are each plausibly fixed by more than 5 fix patterns, yet only 1 of these bugs is correctly fixed (by 3 fix patterns).

In summary, 86% of the correctly fixed bugs (74 fully and 7 partially fixed bugs) are exclusively fixed correctly by a single pattern. In other words, while several fix patterns can often generate patches that pass all test cases, in most cases the bug is correctly fixed by only one pattern. This finding suggests that it is necessary to carefully select an appropriate fix pattern when attempting to fix a bug, in order to avoid plausible patches which may prevent the discovery of correct patches by halting the repair process (given that all tests pass on the plausible patch).

RQ2: Some bugs can be plausibly fixed by different fix patterns. However, in most cases, only one fix pattern is adequate for generating a correct patch. This finding suggests a need for new research on fix pattern prioritization.

Table 6 details which bug is fixed by which fix pattern(s). We note that five fix patterns (i.e., FP3, FP4.3, FP5, FP7.2 and FP11.3) cannot be used to generate a plausible patch for any Defects4J bug. Two fix patterns (i.e., FP9.2 and FP12) lead to plausible patches for some bugs, but none of those patches is correct. These results do not necessarily suggest that the aforementioned fix patterns are useless (or ineffective) in APR. Instead, two reasons can explain their performance:


  • The search for donor code may be inefficient at finding relevant ingredients for applying these patterns.

  • The Defects4J dataset does not contain the types of bugs that can be addressed by these fix patterns.

In addition, twenty (20) fix patterns lead to the generation of correct patches for some bugs. Most of these fix patterns are also involved in the generation of plausible patches (which turn out to be incorrect). Interestingly, we found six (6) fix patterns that can generate several patch candidates for the same bug, some being correct and others only plausible; this is the case for 10 bugs (indicated in Table 6 with ‘◐’). We remind the reader that, in this experiment, TBar_p generates and assesses all possible patch candidates for a given pair “bug location – fix pattern” with varying ingredients. This observation further highlights the importance of selecting relevant donor code for synthesizing patches: selecting inappropriate donor code can lead to the generation of a plausible (but incorrect) patch, which will impede the generation of correct patches in a typical repair pipeline.

Aside from fix patterns, the fix ingredients collected as donor code must be properly selected to avoid patches that are plausible but incorrect.

We further inspect properties of fix patterns, such as change actions, granularity, and the number of changed statements in patches. The statistics are shown in Figure 4, highlighting the number of plausible (but incorrect) and correct patches for the different property dimensions through which fix patterns can be categorized.

Figure 4. # fixed bugs in terms of the qualitative properties of fix patterns.

More bugs are fixed by Update change actions than by any other action. Similarly, fix patterns targeting expressions correctly fix more bugs than patterns targeting the statement and method granularity levels. However, fix patterns mutating whole statements have a higher rate of correct patches among their plausible generated patches. Finally, fix patterns changing only single statements can correctly fix more bugs than those touching multiple statements; fix patterns targeting multiple statements have, however, a higher correctness rate.

RQ3: There are noticeable differences in repair success among fix patterns, depending on their properties in terms of implemented change actions, change granularity, and change spread.

Proj. | jGenProg | jKali | jMutRepair | HDRepair | Nopol | ACS | ELIXIR | JAID | ssFix | CapGen | SketchFix | FixMiner | LSRepair | SimFix | kPAR | AVATAR | TBar (fully fixed) | TBar (partially fixed)
C 0/7 0/6 1/4 0/2 1/6 2/2 4/7 2/4 3/7 4/4 6/8 5/8 3/8 4/8 3/10 5/12 9/14 0/4
Cl 0/0 0/0 0/0 0/7 0/0 0/0 0/0 5/11 2/11 0/0 3/5 5/5 0/0 6/8 5/9 8/12 8/12 1/5
L 0/0 0/0 0/1 2/6 3/7 3/4 8/12 1/8 5/12 5/5 3/4 2/3 8/14 9/13 1/8 5/11 5/14 0/3
M 5/18 1/14 2/11 4/7 1/21 12/16 12/19 1/8 10/26 12/16 7/8 12/14 7/14 14/26 7/18 6/13 19/36 0/4
Mc 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 1/1 0/0 1/2 2/2 1/2 0/0
T 0/2 0/2 0/1 0/1 0/1 1/1 2/3 0/0 0/4 0/0 0/1 1/1 0/0 1/1 1/2 1/3 1/3 1/2
Total 5/27 1/22 3/17 6/23 5/35 18/23 26/41 9/31 20/60 21/25 19/26 25/31 19/37 34/56 18/49 27/53 43/81 2/18
P(%) 18.5 4.5 17.6 26.1 14.3 78.3 63.4 29.0 33.3 84.0 73.1 80.6 51.4 60.7 36.7 50.9 53.1 11.1

“P” is the probability of a generated plausible patch being correct. The data for the other APR tools are excerpted from the corresponding papers. kPAR (Liu et al., 2019a) is an open-source implementation of PAR (Kim et al., 2013).

Table 7. Comparing TBar against the state-of-the-art APR tools.
FP1 FP2 FP3 FP4 FP5 FP6 FP7 FP8 FP9 FP10 FP11 FP12 FP13 FP14 FP15
1 2 3 4 5 1 2 3 4 1 2 3 1 2 1 2 3 1 2 1 2 3 4 1 2 3 1 2 1 2
Correct 1 4 2 1 0 1 0 1 0 0 0 0 0 0 3 3 0 0 0 1 2 0 1 1 1 1 7 1 0 0 9 1 0 2 2
Avg position* (1) (16) (1) (5) - (5) - (5) - - - - - - (23) (16) - - - (9) (1) - (2) (62) (6) (1) (12) (18) - - (5) (1) - (2) (1)
Plausible (all) 1 7 4 1 0 1 0 3 0 0 0 0 1 0 11 4 0 0 0 1 4 0 2 2 1 1 12 1 0 0 25 4 1 7 5
Avg position* (1) (12) (191) (5) - (5) - (20) - - - - (8) - (27) (15) - - - (9) (18) - (4) (49) (6) (1) (15) (18) - - (8) (20) (15) (26) (16)

*Average position of the exact buggy location in the list of suspicious statements yielded by the fault localization tool. The exact buggy positions of some bugs are not yielded by the fault localization tool.

Table 8. Per-pattern repair performance.

4.2. Repair Performance Comparison: TBar vs State-of-the-art APR tools

Our second experiment evaluates TBar in a realistic setting for patch generation, allowing for a reliable comparison against the state-of-the-art in the literature. Concretely, we investigate two research questions in Experiment #2.

Research Questions for Experiment #2:

  • What performance can be achieved by TBar in a standard and practical repair scenario?

  • To what extent are the different fix patterns sensitive to noise in fault localization (i.e., spotting buggy code locations)?

In this experiment, we implement a realistic scenario, using normal fault localization (i.e., without the perfect localization assumption made for TBar_p) on Defects4J bugs. To enable a fair comparison with performance results recorded in the literature, TBar leverages the standard configuration from the literature (Liu et al., 2019a) with GZoltar (Campos et al., 2012) and Ochiai (Abreu et al., 2007). Furthermore, TBar does not use any additional technique to improve the accuracy of fault localization, such as crashed stack traces (used by ssFix (Xin and Reiss, 2017)), predicate switching (Zhang et al., 2006) (used by ACS (Xiong et al., 2017)), or test case purification (Xuan and Monperrus, 2014b) (used by SimFix (Jiang et al., 2018)).

With respect to the patch generation step, contrary to the experiment with TBar_p where all positions of multi-location bugs were known (cf. Section 4.1), TBar adopts a “first-generated, first-selected” strategy to progressively apply fix patterns, one at a time, to the various suspicious code locations: TBar generates a patch p1 using a fix pattern that matches a given bug. If p1 passes a subset of previously-failing test cases without failing any previously-passing test case, TBar selects p1 as a plausible patch for the bug. Then, TBar continues to validate another patch p2 (which can be generated by the same fix pattern on the same code entity with other ingredients, or on another code location). If p2 passes the same subset of test cases as p1 and is generated for the same buggy code entity as p1, p2 is abandoned; otherwise, TBar keeps p2 as another plausible patch as well. Through this process, TBar creates a set P = {p1, p2, ...} of plausible patches. As soon as any patch passes all the given test cases for a given bug, TBar takes it as a plausible patch for that bug, which is then regarded as fully fixed, and all other candidates are abandoned. Otherwise, our tool yields P as the set of plausible patches that can each partially fix the given bug.

We run the TBar APR system against the buggy programs of the Defects4J dataset. Table 7 presents the performance of TBar in comparison with recent state-of-the-art APR tools from the literature. TBar can fix 81 bugs with plausible patches, 43 of which are correctly fixed. No other APR tool has reached this number of fixed bugs. Nevertheless, its precision (ratio of correct to plausible patches) is lower than that of some recent tools, such as CapGen and SimFix, which employ sophisticated techniques to select fix ingredients. Nonetheless, it is noteworthy that, despite using fix patterns catalogued in the literature, we can fix three bugs (namely Cl-86, L-47, M-11) that had never been fixed by any APR system.

RQ4: TBar outperforms all recent state-of-the-art APR tools that were evaluated on the Defects4J dataset. It correctly fixes 43 bugs, while the runner-up (SimFix) is reported to correctly fix 34 bugs.

It is noteworthy that TBar performs significantly worse than TBar_p (43 vs. 74 correctly fixed bugs). This result is in line with a recent study (Liu et al., 2019a), which demonstrated that fault localization imprecision is detrimental to APR repair performance. Table 6 summarizes the number of bugs each fix pattern contributed to fixing with TBar_p: only 4 fix patterns did not lead to the generation of any plausible patch when assuming perfect localization, whereas with TBar this is the case for 13 fix patterns (see Table 8). This observation further confirms the impact of fault localization noise.

We examine the locations where TBar applied fix patterns to generate its plausible but incorrect patches. As shown in Figure 5, TBar made changes at incorrect positions (i.e., non-buggy locations) for 24 out of the 38 fully-fixed and 15 out of the 16 partially-fixed bugs.

Figure 5. The mutated code positions of plausibly but incorrectly fixed bugs.

Even when TBar applies a fix pattern to the precise buggy location, the generated patch may be incorrect. As shown in Figure 5, 14 of these patches (fully, but incorrectly, fixing Defects4J bugs) mutate the correct locations: in 3 cases, the selected fix patterns were inappropriate; in 2 other cases, TBar failed to locate relevant donor code; for the remaining cases, TBar does not support the required fix patterns.

Finally, Figure 6 illustrates the impact of fault localization performance: unfixed bugs (that are nevertheless correctly fixed by TBar_p) are generally more poorly localized than correctly fixed bugs. Similarly, we note that many plausible but incorrect patches are generated for bugs that are not well localized (i.e., several false-positive buggy locations are mutated, leading to plausible but incorrect patches).

Figure 6. Distribution of the positions of buggy code locations in fault localization list of suspicious statements. C and P denote Correctly- and Plausibly- (but incorrectly) fixed bugs, respectively. F and U denote Fixed and Unfixed bugs.

The average positions of bugs (in the fault localization list of suspicious statements) are also provided in Table 8. It appears that some fix patterns (e.g., FP2.1, FP6.3, FP10.2) can correctly fix bugs that are poorly localized, showing less sensitivity to fault localization noise than others.

RQ5: Fault localization noise has a significant impact on the performance of TBar. Fix patterns are diversely sensitive to the false-positive locations that are recommended as buggy positions.

5. Discussion

Overall, our investigations reveal that a large catalogue of fix patterns can help improve APR performance. However, at the same time, there are other challenges that must be dealt with: more accurate fault localization, effective search of relevant donor code, fix pattern prioritization. While we will work on some of these research directions in future work, we discuss in this section some threats to validity of the study and practical limitations of TBar.

Threats to Validity

Threats to external validity include the target language of this study, i.e., Java. The fix patterns studied in this paper only cover patterns targeting Java program bugs, as released by state-of-the-art pattern-based APR systems. However, we believe that most fix patterns presented in this study could be applied to other languages, since they are expressed at the abstract syntax tree level. Another threat to external validity is fix pattern diversity: our study may not consider all fix patterns available in the literature. To reduce this threat, we systematically reviewed research work on pattern-based program repair.

Our strategy for fix pattern selection can be a threat to internal validity: it naïvely matches patterns based on the context information around buggy locations. More advanced strategies would have a higher probability of selecting appropriate patterns and thus fixing more bugs.
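For illustration only, the sketch below (node kinds and pattern names are hypothetical, not TBar's actual implementation) shows what such a naïve, context-based selection can look like: candidate patterns are chosen purely from the syntactic kind of the suspicious AST node, without any ranking among the matching patterns.

// Hypothetical sketch of naïve, context-based fix pattern selection.
// Candidate patterns are derived only from the AST node kind of the
// suspicious statement; no prioritization is applied among them.
import java.util.List;

enum NodeKind { IF_CONDITION, METHOD_INVOCATION, RETURN_STATEMENT }

final class NaivePatternSelector {
    static List<String> candidatePatterns(NodeKind kind) {
        switch (kind) {
            case IF_CONDITION:      return List.of("MutateConditionalExpr", "InsertNullCheck");
            case METHOD_INVOCATION: return List.of("MutateMethodInvocation", "InsertNullCheck");
            case RETURN_STATEMENT:  return List.of("MutateReturnStatement");
            default:                return List.of();
        }
    }
}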

Limitations

TBar applies a single fix pattern to a single buggy location at a time. Our assumption is that it is not necessary to iteratively apply fix patterns to the same location, which might be a limitation of our tool. Applying multiple fix patterns to a single location iteratively would expand the search space, but could also cause a search space explosion.
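As an illustration of this design choice, the following hypothetical sketch (interface and method names are ours, not TBar's actual API) shows a single-pass strategy in which each fix pattern is applied at most once to each suspicious location before validation, keeping the search space bounded.

// Hypothetical sketch: single, non-iterative application of fix patterns.
// Patterns are never chained on the same location.
import java.util.List;
import java.util.Optional;

interface Location { }
interface Candidate { boolean passesAllTests(); }
interface FixPattern { Optional<Candidate> apply(Location loc); }

final class SinglePassRepair {
    static Optional<Candidate> repair(List<Location> rankedLocations, List<FixPattern> patterns) {
        for (Location loc : rankedLocations) {          // most suspicious locations first
            for (FixPattern pattern : patterns) {       // each pattern tried once per location
                Optional<Candidate> candidate = pattern.apply(loc);
                if (candidate.isPresent() && candidate.get().passesAllTests()) {
                    return candidate;                   // first plausible patch found
                }
            }
        }
        return Optional.empty();
    }
}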

6. Related Work

Fault Localization. In general, most APR pipelines start with fault localization (FL), as shown in Figure 1. Once the buggy position is localized, APR tools can mutate the buggy code entity to generate patches, which might fix a given bug. To identify defect locations in a program, several automated FL techniques have been proposed (Wong et al., 2016): slice-based (Wong et al., 2010; Mao et al., 2014), spectrum-based (Abreu et al., 2009a; Perez et al., 2017), statistics-based (Liblit et al., 2005; Liu et al., 2006), etc.

Spectrum-based FL (SBFL) techniques are widely adopted in APR pipelines since they identify bug positions at the statement level. SBFL techniques rely on ranking metrics (e.g., Tarantula (Jones and Harrold, 2005), Ochiai (Abreu et al., 2009b), Op2 (Naish et al., 2011), Barinel (Abreu et al., 2009a), DStar (Wong et al., 2014)) to calculate the suspiciousness of each statement. GZoltar (Campos et al., 2012) and Ochiai have been widely integrated into APR systems since their effectiveness has been demonstrated in several empirical studies (Steimann et al., 2013; Xie et al., 2013; Xuan and Monperrus, 2014a; Pearson et al., 2017). As reported by Liu et al. (Liu et al., 2019a) and confirmed in this paper, this FL configuration still has limitations in localizing buggy locations. Therefore, researchers have tried to enhance FL with complementary techniques such as predicate switching (Zhang et al., 2006; Xiong et al., 2017) and test case purification (Xuan and Monperrus, 2014b; Jiang et al., 2018).
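For illustration, the following is a minimal sketch of the Ochiai metric, one of the ranking formulas mentioned above; the class and method names are ours and are not drawn from any of the cited tools. For a statement s, Ochiai computes ef / sqrt((ef + nf) * (ef + ep)), where ef and ep are the numbers of failing and passing tests that execute s, and nf is the number of failing tests that do not.

// Minimal sketch of the Ochiai suspiciousness metric used in SBFL.
public final class Ochiai {
    // ef: failing tests covering the statement, ep: passing tests covering it,
    // nf: failing tests not covering it.
    public static double suspiciousness(int ef, int ep, int nf) {
        double denominator = Math.sqrt((double) (ef + nf) * (ef + ep));
        // A statement never executed by a failing test gets suspiciousness 0.
        return denominator == 0 ? 0.0 : ef / denominator;
    }

    public static void main(String[] args) {
        // Example: covered by 3 of 4 failing tests and 2 passing tests -> ~0.671
        System.out.println(suspiciousness(3, 2, 1));
    }
}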

Patch Generation. Patch generation is another key step of the APR pipeline; it consists, in other words, in searching for another shape of a program (i.e., a patch) in the space of all possible programs (Le Goues et al., 2012a; Long and Rinard, 2016a). If the search space is small, it might not include the correct patches (Wen et al., 2018). A straightforward strategy to reduce this threat is to expand the search space, which, however, can lead to two other problems: 1) at worst, the correct patch is still not included; and 2) a larger search space contains more plausible patches, increasing the likelihood of generating plausible patches before correct ones (Wen et al., 2018; Liu et al., 2018a).

To improve repair performance, many APR systems have been designed to address the search space problem. GenProg (Weimer et al., 2009; Le Goues et al., 2012b) leverages a stochastic method to search for patches. Synthesis-based APR systems (Long and Rinard, 2015; Xuan et al., 2017; Xiong et al., 2017) limit the search space to conditional bug fixes by synthesizing new conditional expressions with variables identified from the buggy code. Pattern-based APR tools (Kim et al., 2013; Le et al., 2016b; Saha et al., 2017; Long et al., 2017; Durieux et al., 2017; Le et al., 2017; Liu and Zhong, 2018; Hua et al., 2018; Jiang et al., 2018; Liu et al., 2019b) narrow the search space by following code change schemas that mutate buggy code entities with retrieved donor code. Other APR pipelines focus on specific search methods for donor code or on patch synthesis strategies to address the search space problem, such as contract-based (Wei et al., 2010; Chen et al., 2017), symbolic-execution-based (Nguyen et al., 2013), learning-based (Long and Rinard, 2016b; Gupta et al., 2017; Rolim et al., 2017; Soto and Le Goues, 2018; Bhatia et al., 2018; White et al., 2019), and donor-code-searching (Mechtaev et al., 2015; Ke et al., 2015) APR techniques. Various existing APR tools have achieved promising results on fixing real bugs, but there remains room for improvement, for example by mining more fix patterns, improving pattern selection and donor code retrieval strategies, exploring new strategies for patch generation, and prioritizing bug positions.

Patch Correctness. The ultimate goal of APR systems is to automatically generate a correct patch that resolves the program defect. Initially, patch correctness was assessed by checking whether the patched program passes all test cases (Weimer et al., 2009; Kim et al., 2013; Le et al., 2016b). However, such patches can be overfitting (Qi et al., 2015; Le et al., 2018) and even worse than the bug (Smith et al., 2015). Since then, APR systems have been evaluated on the precision of generating correct patches (Xiong et al., 2017; Wen et al., 2018; Jiang et al., 2018; Liu et al., 2019b). Recently, researchers have started to explore automated frameworks that can assess patch correctness for APR systems (Xiong et al., 2018; Le et al., 2019).
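To make the distinction between plausibility and correctness concrete, the following hypothetical sketch (interface names are ours, not from any cited system) shows the test-suite-based validation step: a candidate patch that passes every test is only plausible, and may still be incorrect or overfitting with respect to the developer's intended behavior.

// Hypothetical sketch of test-suite-based patch validation.
// A patch is "plausible" if the patched program passes every test;
// plausibility does not imply correctness (the patch may overfit the tests).
import java.util.List;

interface Program { Program apply(Patch candidate); }
interface Patch { }
interface TestCase { boolean passesOn(Program patchedProgram); }

final class PatchValidator {
    static boolean isPlausible(Program buggy, Patch candidate, List<TestCase> suite) {
        Program patched = buggy.apply(candidate);
        return suite.stream().allMatch(t -> t.passesOn(patched));
    }
}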

7. Conclusion

In the software engineering literature, fix patterns (a.k.a. fix templates) have been studied in various scenarios to understand bug fixes in the wild. They have further been implemented in different program repair pipelines to generate patches automatically. Although template-based program repair tools have achieved promising results, no extensive investigation of the effectiveness of fix patterns had been conducted. We fill this gap in this work by revisiting the repair performance of fix patterns via a systematic study assessing the effectiveness of a variety of fix patterns summarized from the literature. In particular, we build a straightforward template-based APR tool, TBar, which we evaluate on the Defects4J benchmark. On the one hand, assuming a perfect fault localization, TBar correctly/plausibly fixes 74/102 bugs. On the other hand, in a normal/practical APR pipeline, TBar can correctly fix 43 bugs despite the noise of fault localization false positives. This constitutes a record performance in the literature on Java program repair. We expect TBar to be established as the new baseline APR system, leading researchers to propose better techniques that substantially improve the state of the art.

References

  • Abreu et al. (2007) Rui Abreu, Arjan JC Van Gemund, and Peter Zoeteweij. 2007. On the accuracy of spectrum-based fault localization. In Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION. IEEE, 89–98.
  • Abreu et al. (2009b) Rui Abreu, Peter Zoeteweij, Rob Golsteijn, and Arjan JC Van Gemund. 2009b. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software 82, 11 (2009), 1780–1792.
  • Abreu et al. (2009a) Rui Abreu, Peter Zoeteweij, and Arjan JC Van Gemund. 2009a. Spectrum-based multiple fault localization. In Proceedings of the 24th International Conference on Automated Software Engineering. IEEE, 88–99.
  • Bhatia et al. (2018) Sahil Bhatia, Pushmeet Kohli, and Rishabh Singh. 2018. Neuro-symbolic program corrector for introductory programming assignments. In Proceedings of the 40th International Conference on Software Engineering. ACM, 60–70.
  • Britton et al. (2013) Tom Britton, Lisa Jeng, Graham Carver, Paul Cheak, and Tomer Katzenellenbogen. 2013. Reversible debugging software. Judge Bus. School, Univ. Cambridge, Cambridge, UK, Tech. Rep (2013).
  • Campos et al. (2012) José Campos, André Riboira, Alexandre Perez, and Rui Abreu. 2012. Gzoltar: an eclipse plug-in for testing and debugging. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. IEEE/ACM, 378–381.
  • Chen et al. (2017) Liushan Chen, Yu Pei, and Carlo A Furia. 2017. Contract-based program repair without the contracts. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE, 637–647.
  • Coker and Hafiz (2013) Zack Coker and Munawar Hafiz. 2013. Program transformations to fix C integers. In Proceedings of the 35th IEEE/ACM International Conference on Software Engineering. IEEE/ACM, 792–801.
  • Dobolyi and Weimer (2008) Kinga Dobolyi and Westley Weimer. 2008. Changing java’s semantics for handling null pointer exceptions. In Proceedings of the 19th International Symposium on Software Reliability Engineering. IEEE, 47–56.
  • Durieux et al. (2017) Thomas Durieux, Benoit Cornu, Lionel Seinturier, and Martin Monperrus. 2017. Dynamic patch generation for null pointer exceptions using metaprogramming. In Proceedings of the 24th International Conference on Software Analysis, Evolution and Reengineering. IEEE, 349–358.
  • Falleri et al. (2014) Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering. ACM, 313–324.
  • Gupta et al. (2017) Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence. AAAI Press, 1345–1351.
  • Hua et al. (2018) Jinru Hua, Mengshi Zhang, Kaiyuan Wang, and Sarfraz Khurshid. 2018. Towards practical program repair with on-demand candidate generation. In Proceedings of the 40th International Conference on Software Engineering. ACM, 12–23.
  • Jiang et al. (2018) Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, and Xiangqun Chen. 2018. Shaping Program Repair Space with Existing Patches and Similar Code. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 298–309.
  • Jones and Harrold (2005) James A Jones and Mary Jean Harrold. 2005. Empirical evaluation of the tarantula automatic fault-localization technique. In Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering. ACM, 273–282.
  • Just et al. (2014) René Just, Darioush Jalali, and Michael D Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the 23rd ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 437–440.
  • Ke et al. (2015) Yalin Ke, Kathryn T Stolee, Claire Le Goues, and Yuriy Brun. 2015. Repairing programs with semantic code search (t). In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 295–306.
  • Kent (2008) Stephen W Kent. 2008. Dynamic error remediation: A case study with null pointer exceptions. University of Texas Master’s Thesis (2008).
  • Kim et al. (2013) Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. 2013. Automatic patch generation learned from human-written patches. In Proceedings of the 35th International Conference on Software Engineering. IEEE, 802–811.
  • Koyuncu et al. (2018) Anil Koyuncu, Kui Liu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, and Yves Le Traon. 2018. FixMiner: Mining Relevant Fix Patterns for Automated Program Repair. arXiv preprint arXiv:1810.01791 (2018).
  • Le et al. (2017) Xuan-Bach D Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. 2017. S3: syntax-and semantic-guided repair synthesis via programming by examples. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering. ACM, 593–604.
  • Le et al. (2016a) Xuan-Bach D Le, Quang Loc Le, David Lo, and Claire Le Goues. 2016a. Enhancing automated program repair with deductive verification. In Proceedings of the 32nd International Conference on Software Maintenance and Evolution. IEEE, 428–432.
  • Le et al. (2018) Xuan Bach D Le, Ferdian Thung, David Lo, and Claire Le Goues. 2018. Overfitting in semantics-based automated program repair. Empirical Software Engineering (2018), 1–27.
  • Le et al. (2019) Xuan-Bach D. Le, Lingfeng Bao, David Lo, Xin Xia, and Shanping Li. 2019. On Reliability of Patch Correctness Assessment. In Proceedings of the 41th International Conference on Software Engineering.
  • Le et al. (2016b) Xuan-Bach D. Le, David Lo, and Claire Le Goues. 2016b. History Driven Program Repair. In Proceedings of the 23rd International Conference on Software Analysis, Evolution, and Reengineering, Vol. 1. IEEE, 213–224.
  • Le Goues et al. (2012a) Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012a. A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Proceedings of the 34th International Conference on Software Engineering. IEEE, 3–13.
  • Le Goues et al. (2012b) Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. 2012b. GenProg: A generic method for automatic software repair. IEEE Transactions on Software Engineering 38, 1 (2012), 54–72.
  • Liblit et al. (2005) Ben Liblit, Mayur Naik, Alice X Zheng, Alex Aiken, and Michael I Jordan. 2005. Scalable statistical bug isolation. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation. ACM, 15–26.
  • Liu et al. (2006) Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, and Samuel P Midkiff. 2006. Statistical debugging: A hypothesis testing-based approach. IEEE Transactions on Software Engineering 32, 10 (2006), 831–848.
  • Liu et al. (2018a) Kui Liu, Koyuncu Anil, Kisub Kim, Dongsun Kim, and Tegawendé F. Bissyandé. 2018a. LSRepair: Live Search of Fix Ingredients for Automated Program Repair. In Proceedings of the 25th Asia-Pacific Software Engineering Conference. 658–662.
  • Liu et al. (2018b) Kui Liu, Dongsun Kim, Tegawendé F Bissyandé, Shin Yoo, and Yves Le Traon. 2018b. Mining fix patterns for findbugs violations. IEEE Transactions on Software Engineering (2018).
  • Liu et al. (2018c) Kui Liu, Dongsun Kim, Anil Koyuncu, Li Li, Tegawendé F Bissyandé, and Yves Le Traon. 2018c. A closer look at real-world patches. In Proceedings of the 34th International Conference on Software Maintenance and Evolution. IEEE, 275–286.
  • Liu et al. (2019a) Kui Liu, Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, and Yves Le Traon. 2019a. You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems. In Proceedings of the 12th IEEE International Conference on Software Testing, Verification and Validation. IEEE.
  • Liu et al. (2019b) Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. 2019b. AVATAR : Fixing Semantic Bugs with Fix Patterns of Static Analysis Violations. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE.
  • Liu and Zhong (2018) Xuliang Liu and Hao Zhong. 2018. Mining stackoverflow for program repair. In Proceedings of the 25th International Conference on Software Analysis, Evolution and Reengineering. IEEE, 118–129.
  • Long et al. (2017) Fan Long, Peter Amidon, and Martin Rinard. 2017. Automatic inference of code transforms for patch generation. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering. ACM, 727–739.
  • Long and Rinard (2015) Fan Long and Martin Rinard. 2015. Staged program repair with condition synthesis. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. ACM, 166–178.
  • Long and Rinard (2016a) Fan Long and Martin Rinard. 2016a. An analysis of the search spaces for generate and validate patch generation systems. In Proceedings of the 38th International Conference on Software Engineering. ACM, 702–713.
  • Long and Rinard (2016b) Fan Long and Martin Rinard. 2016b. Automatic patch generation by learning correct code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, 298–312.
  • Long et al. (2014) Fan Long, Stelios Sidiroglou-Douskos, and Martin Rinard. 2014. Automatic runtime error repair and containment via recovery shepherding. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, Vol. 49. ACM, 227–238.
  • Mao et al. (2014) Xiaoguang Mao, Yan Lei, Ziying Dai, Yuhua Qi, and Chengsong Wang. 2014. Slice-based statistical fault localization. Journal of Systems and Software 89 (2014), 51–62.
  • Martinez and Monperrus (2015) Matias Martinez and Martin Monperrus. 2015. Mining software repair models for reasoning on the search space of automated program fixing. Empirical Software Engineering 20, 1 (2015), 176–205.
  • Martinez and Monperrus (2016) Matias Martinez and Martin Monperrus. 2016. Astor: A program repair library for java. In Proceedings of the 25th International Symposium on Software Testing and Analysis. ACM, 441–444.
  • Martinez and Monperrus (2018) Matias Martinez and Martin Monperrus. 2018. Ultra-Large Repair Search Space with Automatically Mined Templates: The Cardumen Mode of Astor. In Proceedings of the International Symposium on Search Based Software Engineering. Springer, 65–86.
  • Mechtaev et al. (2015) Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. Directfix: Looking for simple program repairs. In Proceedings of the 37th International Conference on Software Engineering-Volume 1. IEEE Press, 448–458.
  • Monperrus (2018) Martin Monperrus. 2018. Automatic software repair: a bibliography. Comput. Surveys 51, 1 (2018), 17:1–17:24.
  • Naish et al. (2011) Lee Naish, Hua Jie Lee, and Kotagiri Ramamohanarao. 2011. A model for spectra-based software diagnosis. ACM Transactions on Software Engineering and Methodology 20, 3 (2011), 11:1–11:32.
  • Nguyen et al. (2013) Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. 2013. SemFix: program repair via semantic analysis. In Proceedings of the 35th International Conference on Software Engineering. IEEE, 772–781.
  • NIST (2019) NIST. Last Accessed: Jan. 2019.. Software Errors Cost U.S. Economy $59.5 Billion Annually. http://www.abeacha.com/NIST_press_release_bugs_cost.htm.
  • Pan et al. (2009) Kai Pan, Sunghun Kim, and E James Whitehead. 2009. Toward an understanding of bug fix patterns. Empirical Software Engineering 14, 3 (2009), 286–315.
  • Pearson et al. (2017) Spencer Pearson, José Campos, René Just, Gordon Fraser, Rui Abreu, Michael D Ernst, Deric Pang, and Benjamin Keller. 2017. Evaluating and improving fault localization. In Proceedings of the 39th International Conference on Software Engineering. IEEE/ACM, 609–620.
  • Perez et al. (2017) Alexandre Perez, Rui Abreu, and Arie van Deursen. 2017. A test-suite diagnosability metric for spectrum-based fault localization approaches. In Proceedings of the 39th International Conference on Software Engineering. IEEE/ACM, 654–664.
  • Qi et al. (2015) Zichao Qi, Fan Long, Sara Achour, and Martin Rinard. 2015. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In Proceedings of the 24th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 24–36.
  • Rolim et al. (2017) Reudismam Rolim, Gustavo Soares, Loris D’Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, and Björn Hartmann. 2017. Learning syntactic program transformations from examples. In Proceedings of the 39th IEEE/ACM International Conference on Software Engineering. IEEE/ACM, 404–415.
  • Rolim et al. (2018) Reudismam Rolim, Gustavo Soares, Rohit Gheyi, and Loris D’Antoni. 2018. Learning Quick Fixes from Code Repositories. arXiv preprint arXiv:1803.03806 (2018).
  • Saha et al. (2017) Ripon K Saha, Yingjun Lyu, Hiroaki Yoshida, and Mukul R Prasad. 2017. ELIXIR: Effective object-oriented program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE, 648–659.
  • Smith et al. (2015) Edward K Smith, Earl T Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the cure worse than the disease? Overfitting in automated program repair. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. ACM, 532–543.
  • Soto and Le Goues (2018) Mauricio Soto and Claire Le Goues. 2018. Using a probabilistic model to predict bug fixes. In Proceedings of the 25th International Conference on Software Analysis, Evolution and Reengineering. IEEE, 221–231.
  • Steimann et al. (2013) Friedrich Steimann, Marcus Frenkel, and Rui Abreu. 2013. Threats to the validity and value of empirical assessments of the accuracy of coverage-based fault locators. In Proceedings of the 2013 International Symposium on Software Testing and Analysis. ACM, 314–324.
  • Wei et al. (2010) Yi Wei, Yu Pei, Carlo A Furia, Lucas S Silva, Stefan Buchholz, Bertrand Meyer, and Andreas Zeller. 2010. Automated fixing of programs with contracts. In Proceedings of the 19th international symposium on Software testing and analysis. ACM, 61–72.
  • Weimer et al. (2009) Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009. Automatically finding patches using genetic programming. In Proceedings of the 31st International Conference on Software Engineering. IEEE, 364–374.
  • Wen et al. (2017) Ming Wen, Junjie Chen, Rongxin Wu, Dan Hao, and Shing-Chi Cheung. 2017. An empirical analysis of the influence of fault space on search-based automated program repair. arXiv preprint arXiv:1707.05172 (2017).
  • Wen et al. (2018) Ming Wen, Junjie Chen, Rongxin Wu, Dan Hao, and Shing-Chi Cheung. 2018. Context-Aware Patch Generation for Better Automated Program Repair. In Proceedings of the 40th IEEE/ACM International Conference on Software Engineering. IEEE/ACM, 1–11.
  • White et al. (2019) Martin White, Michele Tufano, Matias Martinez, Martin Monperrus, and Denys Poshyvanyk. 2019. Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE.
  • Wong et al. (2010) W Eric Wong, Vidroha Debroy, and Byoungju Choi. 2010. A family of code coverage-based heuristics for effective fault localization. Journal of Systems and Software 83, 2 (2010), 188–208.
  • Wong et al. (2014) W Eric Wong, Vidroha Debroy, Ruizhi Gao, and Yihao Li. 2014. The DStar method for effective software fault localization. IEEE Transactions on Reliability 63, 1 (2014), 290–308.
  • Wong et al. (2016) W Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A survey on software fault localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707–740.
  • Xie et al. (2013) Xiaoyuan Xie, Tsong Yueh Chen, Fei-Ching Kuo, and Baowen Xu. 2013. A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization. ACM Transactions on Software Engineering and Methodology 22, 4 (2013), 31:1–31:40.
  • Xin and Reiss (2017) Qi Xin and Steven P Reiss. 2017. Leveraging syntax-related code for automated program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE/ACM, 660–670.
  • Xiong et al. (2018) Yingfei Xiong, Xinyuan Liu, Muhan Zeng, Lu Zhang, and Gang Huang. 2018. Identifying patch correctness in test-based program repair. In Proceedings of the 40th International Conference on Software Engineering. ACM, 789–799.
  • Xiong et al. (2017) Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. 2017. Precise condition synthesis for program repair. In Proceedings of the 39th IEEE/ACM International Conference on Software Engineering. IEEE/ACM, 416–426.
  • Xuan et al. (2017) Jifeng Xuan, Matias Martinez, Favio DeMarco, Maxime Clement, Sebastian Lamelas Marcote, Thomas Durieux, Daniel Le Berre, and Martin Monperrus. 2017. Nopol: Automatic repair of conditional statement bugs in java programs. IEEE Transactions on Software Engineering 43, 1 (2017), 34–55.
  • Xuan and Monperrus (2014a) Jifeng Xuan and Martin Monperrus. 2014a. Learning to combine multiple ranking metrics for fault localization. In Proceedings of the 30th International Conference on Software Maintenance and Evolution. IEEE, 191–200.
  • Xuan and Monperrus (2014b) Jifeng Xuan and Martin Monperrus. 2014b. Test case purification for improving fault localization. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 52–63.
  • Zhang et al. (2006) Xiangyu Zhang, Neelam Gupta, and Rajiv Gupta. 2006. Locating faults through automated predicate switching. In Proceedings of the 28th International Conference on Software Engineering. ACM, 272–281.
  • Zhong and Su (2015) Hao Zhong and Zhendong Su. 2015. An empirical study on real bug fixes. In Proceedings of the 37th IEEE/ACM International Conference on Software Engineering-Volume 1. IEEE/ACM, 913–923.