How Often Do Single-Statement Bugs Occur? The ManySStuBs4J Dataset

Program repair is an important but difficult software engineering problem. One way to reach a "sweet spot" of low false positive rates with recall high enough to be usable is to focus on repairing classes of simple bugs, such as bugs with single-statement fixes or bugs that match a small set of bug templates (Long and Rinard, 2016; Pradel and Sen, 2018). However, it is very difficult to estimate the recall of repair techniques based on templates or on repairing simple bugs, because there are no datasets on how often the associated bugs occur in code. To fill this gap, we provide two versions of a dataset, containing 24,412 and 153,751 single-statement bug-fix changes mined from 100 popular open-source Java Maven projects and from 1,000 popular open-source Java projects, respectively. Each change is annotated by whether it matches any of a set of 16 bug templates inspired by state-of-the-art program repair techniques. We also maintain a repository of Maven dependencies for the 100-project dataset to facilitate tools that require building the projects. We hope that this dataset will prove a useful resource both for future work in automatic program repair and for future studies in empirical software engineering. In an initial analysis, we find that for both datasets about 33% of the bug fixes match the templates, indicating that a remarkable number of single-statement bugs can be repaired with a relatively small set of templates. Further, we find that SStuBs ("simple stupid bugs") appear with a frequency of about one bug per 1,600 to 2,500 lines of code (as measured by the size of the project's latest version), allowing researchers to make an informed case about the potential impact of improved program repair methods.
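The two summary statistics reported above (the share of fixes that match a template, and the number of lines of code per bug) are simple ratios. The following is a minimal sketch of that arithmetic only; the function names `match_rate` and `loc_per_bug` are hypothetical, and the input numbers are illustrative placeholders rather than values taken from the dataset.

```python
def match_rate(matching_fixes: int, total_fixes: int) -> float:
    """Fraction of single-statement bug fixes that match at least one template."""
    if total_fixes <= 0:
        raise ValueError("total_fixes must be positive")
    return matching_fixes / total_fixes


def loc_per_bug(total_loc: int, bug_count: int) -> float:
    """Average number of lines of code per bug (higher means bugs are rarer)."""
    if bug_count <= 0:
        raise ValueError("bug_count must be positive")
    return total_loc / bug_count


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
# They are NOT counts from the ManySStuBs4J dataset.
example_rate = match_rate(matching_fixes=330, total_fixes=1000)
example_density = loc_per_bug(total_loc=2_000_000, bug_count=1_000)

print(f"template match rate: {example_rate:.0%}")
print(f"one bug per {example_density:.0f} lines of code")
```

In an actual analysis, the line count would presumably come from each project's latest version, as the abstract states, and the bug count from the mined template-matching fixes for that project.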

