Evaluating SZZ Implementations Through a Developer-informed Oracle

02/05/2021
by Giovanni Rosa, et al.

The SZZ algorithm for identifying bug-inducing changes has been widely used to evaluate defect prediction techniques and to empirically investigate when, how, and by whom bugs are introduced. Over the years, researchers have proposed several heuristics to improve the accuracy of SZZ, resulting in various implementations of the algorithm. However, fairly evaluating those implementations on a reliable oracle remains an open problem: SZZ evaluations usually rely on (i) manual analysis of the SZZ output to classify the identified bug-inducing commits as true or false positives; or (ii) a golden set linking bug-fixing and bug-inducing commits. In both cases, the manual evaluation is performed by researchers with limited knowledge of the studied subject systems. Ideally, the golden set would be created by the original developers of those systems. We propose a methodology to build a "developer-informed" oracle for the evaluation of SZZ variants. We use Natural Language Processing (NLP) to identify bug-fixing commits in which developers explicitly reference the commit(s) that introduced the fixed bug, and then apply a manual filtering step aimed at ensuring the quality and accuracy of the oracle. Once the oracle is built, we use it to evaluate the accuracy of several variants of the SZZ algorithm. From this evaluation we distill a set of lessons learned to further improve the SZZ algorithm.
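The idea of mining commit messages for explicit references to a bug-inducing commit, and then scoring an SZZ variant against the resulting oracle, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract describes an NLP-based approach, whereas this sketch substitutes a simple regular expression, and the trigger words, the function names `referenced_inducing_commits` and `precision_recall`, and the choice of precision and recall as accuracy measures are all assumptions made for illustration. A pattern this crude would produce false positives, which is one reason a manual filtering step like the one described above would still be needed.

```python
import re

# Assumed heuristic (not the paper's NLP approach): a trigger word such as
# "introduced" or "regression", followed within the same sentence by a
# 7-40 character hexadecimal string that looks like a commit hash.
REFERENCE = re.compile(
    r"\b(?:introduced|caused|broken|regression)\b"
    r"[^.\n]{0,40}?\b([0-9a-f]{7,40})\b",
    re.IGNORECASE,
)


def referenced_inducing_commits(message: str) -> list[str]:
    """Return hashes that a commit message names as the origin of a bug."""
    return [commit_hash.lower() for commit_hash in REFERENCE.findall(message)]


def precision_recall(predicted: set[str], oracle: set[str]) -> tuple[float, float]:
    """Score predicted bug-inducing commits against the oracle's commits."""
    true_positives = len(predicted & oracle)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(oracle) if oracle else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical commit message and SZZ output, for illustration only.
    message = "Fix NPE in parser, introduced by commit 3f2a9bc41d"
    oracle = set(referenced_inducing_commits(message))
    szz_output = {"3f2a9bc41d", "0badc0ffee"}
    print(oracle)
    print(precision_recall(szz_output, oracle))
```

In this sketch, candidate oracle entries come only from messages in which a developer names the offending commit, so each entry reflects a developer's own judgment rather than a researcher's reading of the code.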
