Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset

11/04/2018
by Matias Martinez, et al.

Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore the effectiveness of automatic test-suite-based repair on Defects4J. The results show that the considered state-of-the-art repair methods can generate patches for 47 out of 224 bugs. However, those patches are only test-suite adequate: they pass the test suite, but they may still be incorrect with respect to behavior that the test suite does not check. We manually analyzed 84 different patches to assess their real correctness. In total, 9 real Java bugs can be correctly repaired with test-suite-based repair. This analysis shows that test-suite-based repair suffers from under-specified bugs, for which trivial or incorrect patches still pass the test suite. With respect to practical applicability, it takes 14.8 minutes on average to find a patch. The experiment was run on a scientific grid and totaled 17.6 days of computation time. All the repair systems and experimental results are publicly available on GitHub to facilitate future research on automatic repair.
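
To make the distinction between a test-suite adequate patch and a correct patch concrete, here is a minimal sketch in Java. Everything in it is a hypothetical illustration and is not taken from Defects4J or from any of the evaluated repair systems: the toy absolute-value function, the three-case test suite, and the names `buggy`, `overfittingPatch`, `correctPatch`, `passesTestSuite`, and `agreesWithReference` are all assumptions made for the example.

```java
import java.util.function.IntUnaryOperator;

public class TestSuiteAdequacyDemo {

    // Hypothetical intended behavior: the absolute value of x.
    static int reference(int x) { return x < 0 ? -x : x; }

    // Hypothetical buggy version: forgets to negate negative inputs.
    static int buggy(int x) { return x; }

    // Hypothetical generated patch that special-cases the one failing input.
    static int overfittingPatch(int x) { return x == -3 ? 3 : x; }

    // Hypothetical patch that implements the intended behavior.
    static int correctPatch(int x) { return x < 0 ? -x : x; }

    // An under-specified test suite: {input, expected output}.
    // Only the last case exercises a negative input.
    static final int[][] TEST_SUITE = { {0, 0}, {5, 5}, {-3, 3} };

    // A candidate is test-suite adequate if it passes every test case.
    static boolean passesTestSuite(IntUnaryOperator candidate) {
        for (int[] testCase : TEST_SUITE) {
            if (candidate.applyAsInt(testCase[0]) != testCase[1]) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for manual analysis: compare against the intended
    // behavior on every input in [lo, hi].
    static boolean agreesWithReference(IntUnaryOperator candidate, int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            if (candidate.applyAsInt(x) != reference(x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("buggy passes suite:         "
                + passesTestSuite(TestSuiteAdequacyDemo::buggy));
        System.out.println("overfitting passes suite:   "
                + passesTestSuite(TestSuiteAdequacyDemo::overfittingPatch));
        System.out.println("overfitting is correct:     "
                + agreesWithReference(TestSuiteAdequacyDemo::overfittingPatch, -10, 10));
        System.out.println("correct patch passes suite: "
                + passesTestSuite(TestSuiteAdequacyDemo::correctPatch));
        System.out.println("correct patch is correct:   "
                + agreesWithReference(TestSuiteAdequacyDemo::correctPatch, -10, 10));
    }
}
```

In this sketch the overfitting patch passes all three tests yet still returns the wrong result for every negative input other than -3. This is the kind of gap that running the test suite cannot reveal and that the manual patch analysis is meant to catch.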

research
05/09/2018

A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark

Automatic program repair papers tend to repeatedly use the same benchmar...
research
05/28/2019

Empirical Review of Java Program Repair Tools: A Large-Scale Experiment on 2,141 Bugs and 23,551 Repair Attempts

In the past decade, research on test-suite-based automatic program repai...
research
11/10/2018

Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs

We propose NOPOL, an approach to automatic repair of buggy conditional s...
research
12/21/2017

ARJA: Automated Repair of Java Programs via Multi-Objective Genetic Programming

Recent empirical studies show that the performance of GenProg is not sat...
research
04/06/2021

A large-scale study on human-cloned changes for automated program repair

Research in automatic program repair has shown that real bugs can be aut...
research
12/11/2017

Open-ended Exploration of the Program Repair Search Space with Mined Templates: the Next 8935 Patches for Defects4J

In this paper our goal is to perform an open-ended exploration of the pr...
research
07/29/2023

Neural-Based Test Oracle Generation: A Large-scale Evaluation and Lessons Learned

Defining test oracles is crucial and central to test development, but ma...
