An Automatically Created Novel Bug Dataset and its Validation in Bug Prediction

06/17/2020
by Rudolf Ferenc, et al.

Bugs are inescapable during software development due to frequent code changes, tight deadlines, etc.; therefore, it is important to have tools to find these errors. One way of identifying bugs is to analyze the characteristics of buggy source code elements from the past and predict present ones based on the same characteristics, using, e.g., machine learning models. To support model building, code elements and their characteristics are collected in so-called bug datasets, which serve as the input for learning. We present the BugHunter Dataset: a novel kind of automatically constructed, freely available bug dataset containing code elements (files, classes, methods) with a wide set of code metrics and bug information. Other available bug datasets follow the traditional approach of gathering the characteristics of all source code elements (buggy and non-buggy) only at one or more pre-selected release versions of the code. Our approach, on the other hand, captures the buggy and the fixed states of the same source code elements from the narrowest timeframe we can identify for a bug's presence, regardless of release versions. To show the usefulness of the new dataset, we built and evaluated bug prediction models and achieved F-measure values over 0.74.
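The abstract evaluates its bug prediction models by F-measure, the harmonic mean of precision and recall. As a minimal sketch, assuming each dataset row is a code element state with a dictionary of metric values and a buggy/fixed label, the code below shows how the buggy and the fixed state of the same element could become two labelled rows, and how F-measure is computed for the buggy class. The `CodeElementState` class, the `paired_rows` helper, the `complexity` metric name, and the threshold "model" are hypothetical illustrations, not the BugHunter Dataset's actual schema or the authors' prediction models.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CodeElementState:
    """Hypothetical dataset row: one state of a code element (file, class,
    or method) with its metric values and a bug label."""
    element_id: str
    metrics: Dict[str, float]
    buggy: bool


def paired_rows(element_id: str,
                buggy_metrics: Dict[str, float],
                fixed_metrics: Dict[str, float]) -> List[CodeElementState]:
    """Emit the buggy and the fixed state of the same element as two rows
    (hypothetical helper illustrating the buggy/fixed pairing)."""
    return [
        CodeElementState(element_id, buggy_metrics, True),
        CodeElementState(element_id, fixed_metrics, False),
    ]


def f_measure(actual: List[bool], predicted: List[bool]) -> float:
    """F-measure (F1) of the buggy class: 2 * P * R / (P + R)."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    if tp == 0:
        return 0.0  # precision or recall is zero, so F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up metric values (not data from the paper).
rows = (paired_rows("Foo.bar()", {"complexity": 12.0}, {"complexity": 8.0})
        + paired_rows("Baz.qux()", {"complexity": 5.0}, {"complexity": 4.0}))
actual = [r.buggy for r in rows]                          # [True, False, True, False]
predicted = [r.metrics["complexity"] > 7 for r in rows]   # [True, True, False, False]
print(f_measure(actual, predicted))  # 1 TP, 1 FP, 1 FN -> 0.5
```

In a real setting the threshold rule would be replaced by a trained classifier, and F-measure would be computed on held-out rows rather than on the rows used for fitting.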
