PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection

09/12/2023
by Ye He, et al.

Bug datasets are vital for enabling deep learning techniques to address software maintenance tasks related to bugs. However, existing bug datasets are limited in either precision or scale: they are either small but precise, thanks to manual validation, or large but imprecise, because they rely on simple commit-message processing. In this paper, we introduce PreciseBugCollector, a precise, multi-language bug collection approach that overcomes both limitations. PreciseBugCollector is based on two novel components: (a) a bug tracker that maps codebase repositories to external bug repositories in order to trace bug type information, and (b) a bug injector that generates project-specific bugs by injecting noise into correct codebases and then executing them against their test suites to obtain test failure messages. We implement PreciseBugCollector against three sources: (1) a bug tracker that links to the National Vulnerability Database (NVD) to collect general vulnerabilities, (2) a bug tracker that links to OSS-Fuzz to collect general bugs, and (3) a bug injector based on 16 injection rules to generate project-specific bugs. To date, PreciseBugCollector comprises 1,057,818 bugs extracted from 2,968 open-source projects. Of these, 12,602 bugs are sourced from bug repositories (NVD and OSS-Fuzz), while the remaining 1,045,216 project-specific bugs are generated by the bug injector. Considering the challenge objectives, we argue that a bug injection approach is highly valuable in industrial settings, since project-specific bugs align with domain knowledge, share the same codebase, and adhere to the coding style employed in industrial projects.
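
The bug injector idea described in the abstract (inject noise into correct code, run the tests, keep the failure message) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not list the 16 injection rules, so the `FlipLessThan` rule, the `count_below` function, the inline test, and all helper names below are hypothetical. The sketch assumes Python 3.9 or later for `ast.unparse`.

```python
import ast

# Hypothetical "correct codebase": a single function that passes its test.
CORRECT_SOURCE = '''
def count_below(values, limit):
    """Count the values strictly below limit."""
    return sum(1 for v in values if v < limit)
'''


class FlipLessThan(ast.NodeTransformer):
    """Hypothetical injection rule: rewrite every `<` as `<=`."""

    def __init__(self):
        self.injected = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        new_ops = []
        for op in node.ops:
            if isinstance(op, ast.Lt):
                new_ops.append(ast.LtE())
                self.injected += 1
            else:
                new_ops.append(op)
        node.ops = new_ops
        return node


def inject_bug(source):
    """Apply the rule and return (mutated source, number of injections)."""
    rule = FlipLessThan()
    tree = rule.visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree), rule.injected


def run_test_suite(source):
    """Stand-in for a project test suite: None on pass, else the failure message."""
    namespace = {}
    exec(compile(source, "<candidate>", "exec"), namespace)
    try:
        result = namespace["count_below"]([1, 2, 3], 3)
        assert result == 2, f"expected 2, got {result}"
    except AssertionError as exc:
        return f"AssertionError: {exc}"
    return None


def collect_bug_fix_pair(source):
    """Keep a mutant only if the original passes and the mutant fails."""
    if run_test_suite(source) is not None:
        return None  # the "correct" code must pass first
    buggy, injected = inject_bug(source)
    if injected == 0:
        return None  # the rule did not apply to this code
    failure = run_test_suite(buggy)
    if failure is None:
        return None  # the mutant survived the tests, so it is not a confirmed bug
    return {"buggy": buggy, "fixed": source, "failure": failure}


if __name__ == "__main__":
    pair = collect_bug_fix_pair(CORRECT_SOURCE)
    print(pair["failure"])
```

Requiring that the original passes and the mutant fails is what makes each collected pair executable evidence of a bug; a real pipeline would presumably run the project's own test runner rather than an inline assertion.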

research
01/24/2020

Advaita: Bug Duplicity Detection System

Bugs are prevalent in software development. To improve software quality,...
research
12/15/2020

A Quantitative Study of Security Bug Fixes of GitHub Repositories

Software is prone to bugs and failures. Security bugs are those that exp...
research
01/28/2022

TSSB-3M: Mining single statement bugs at massive scale

Single statement bugs are one of the most important ingredients in the e...
research
07/08/2018

Automated labeling of bugs and tickets using attention-based mechanisms in recurrent neural networks

We explore solutions for automated labeling of content in bug trackers a...
research
12/15/2021

00

What is the funniest number in cryptography (Episode 2)? 0 [1]. The reas...
research
11/10/2020

Wayback Machine: Capturing the evolutionary behaviour of the bug dependency graph in open-source software systems

The issue tracking system (ITS) is a rich data source for data-driven de...
research
04/21/2022

On Distribution Shift in Learning-based Bug Detectors

Deep learning has recently achieved initial success in program analysis ...