TSSB-3M: Mining single statement bugs at massive scale

01/28/2022
by Cedric Richter, et al.

Single statement bugs are one of the most important ingredients in the evaluation of modern bug detection and automatic program repair methods. Because they affect only a single statement, they represent a type of bug often overlooked by developers, while still being small enough to be detected and fixed by automatic methods. With the rise of data-driven automatic repair, the availability of single statement bugs at the scale of millions of examples is more important than ever, not only for testing these methods but also for providing sufficient real-world examples for training. To provide access to bug fix datasets of this scale, we are releasing two datasets called SSB-9M and TSSB-3M. While SSB-9M provides access to a collection of over 9M general single statement bug fixes from over 500K open source Python projects, TSSB-3M focuses on over 3M single statement bugs which can be fixed solely by a single statement change. To facilitate future research and empirical investigations, we annotated each bug fix with one of 20 single statement bug (SStuB) patterns typical for Python, together with a characterization of the code change as a sequence of AST modifications. Our initial investigation shows that at least 40% of all single statement bug fixes mined fit at least one SStuB pattern, and that the majority (72%) of all bugs can be fixed with the same syntactic modifications as needed for fixing SStuBs.
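To make the idea of a "single statement change characterized by AST modifications" concrete, here is a minimal sketch using Python's standard `ast` module. It is not the authors' mining pipeline: the function names (`changed_statements`, `is_single_statement_fix`, `node_type_changes`) are hypothetical, and it only handles the simplified case where both versions have the same number of top-level statements and the changed statement keeps its tree shape.

```python
import ast


def changed_statements(before: str, after: str):
    """Return (old_stmt, new_stmt) pairs of top-level statements that differ.

    Assumption for this sketch: statements are only modified, never added
    or removed. If the statement counts differ, return None.
    """
    old_body = ast.parse(before).body
    new_body = ast.parse(after).body
    if len(old_body) != len(new_body):
        return None
    # ast.dump ignores line/column attributes by default, so only
    # structural differences are reported.
    return [
        (old, new)
        for old, new in zip(old_body, new_body)
        if ast.dump(old) != ast.dump(new)
    ]


def is_single_statement_fix(before: str, after: str) -> bool:
    """True if exactly one top-level statement differs between the versions."""
    pairs = changed_statements(before, after)
    return pairs is not None and len(pairs) == 1


def node_type_changes(old: ast.AST, new: ast.AST):
    """List (old_type, new_type) for nodes whose type differs.

    Nodes are paired by position in a breadth-first walk, which is only
    meaningful when both trees have the same shape (an assumption here).
    """
    return [
        (type(a).__name__, type(b).__name__)
        for a, b in zip(ast.walk(old), ast.walk(new))
        if type(a) is not type(b)
    ]


buggy = "x = a + b\nprint(x)\n"
fixed = "x = a - b\nprint(x)\n"

if is_single_statement_fix(buggy, fixed):
    old_stmt, new_stmt = changed_statements(buggy, fixed)[0]
    print(node_type_changes(old_stmt, new_stmt))
```

In this toy case the only reported modification is the replacement of an `Add` operator node by a `Sub` node, the kind of edit a pattern for changed binary operators would capture. A real miner would additionally need to diff commits, descend into nested statements, and handle edits that alter the tree shape.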
