Leveraging the Defects Life Cycle to Label Affected Versions and Defective Classes

11/11/2020
by Bailey Vandehei, et al.

Two recent studies explicitly recommend labeling defective classes in releases using the affected versions (AVs) available in issue trackers. The aim of our study is threefold: 1) to measure the proportion of defects for which this realistic method is usable, 2) to propose a method for retrieving the AVs of a defect, thus making the realistic approach usable when AVs are unavailable, and 3) to compare the accuracy of the proposed method against three SZZ implementations. Our proposed method assumes that defects have a stable life cycle in terms of the proportion of the number of versions affected by a defect before it is discovered and fixed. Results related to 212 open-source projects from the Apache ecosystem, featuring a total of about 125,000 defects, reveal that the realistic method cannot be used in the majority (51%) of defects, which calls for automated methods to retrieve AVs. Results related to 76 open-source projects from the Apache ecosystem, featuring a total of about 6,250,000 classes affected by 60,000 defects and spread over 4,000 versions and 760,000 commits, reveal that the proportion of the number of versions between defect discovery and fix is quite stable (STDV < 2) across the defects of the same project. Moreover, the proposed method was significantly more accurate than all three SZZ implementations in (i) retrieving AVs, (ii) labeling classes as defective, and (iii) building defect repositories to perform feature selection. Thus, when the realistic method is unusable, the proposed method is a valid automated alternative to SZZ for retrieving the origin of a defect. Finally, given the low accuracy of SZZ, researchers should consider re-executing the studies that have used SZZ as an oracle and, in general, should prefer selecting projects with a high proportion of available and consistent AVs.
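The proportion-based idea summarized above can be illustrated with a minimal sketch. It assumes that each defect's life cycle is described by three releases: the injected version (IV), the opening version (OV, when the defect is reported), and the fixed version (FV). It also assumes that the stable quantity is P = (FV - IV) / (FV - OV), averaged over defects whose AVs are known and then reused to estimate IV for defects without AVs. These definitions, the names `Defect`, `estimate_iv`, `label_avs`, and `defective_classes`, the FV = OV guard, the clamping, and the class-labeling rule are all hypothetical choices of this sketch, not the paper's exact procedure.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Defect:
    # Versions are integer release indexes, oldest release = 1 (assumption of this sketch).
    ov: int                   # opening version: release in which the defect was reported
    fv: int                   # fixed version: release in which the defect was fixed
    iv: Optional[int] = None  # injected version: earliest affected version, if AVs are known


def _span(d: Defect) -> int:
    # Hypothetical guard: treat FV - OV as 1 when the defect is reported and fixed in the same release.
    return max(d.fv - d.ov, 1)


def proportion(d: Defect) -> float:
    """P = (FV - IV) / (FV - OV) for a defect whose IV is known."""
    if d.iv is None:
        raise ValueError("proportion needs a known injected version")
    return (d.fv - d.iv) / _span(d)


def estimate_iv(d: Defect, p: float) -> int:
    """Estimate IV = FV - (FV - OV) * P, rounded and clamped to [1, OV]."""
    iv = round(d.fv - _span(d) * p)  # Python's round() uses round-half-to-even
    return max(1, min(iv, d.ov))


def affected_versions(iv: int, fv: int) -> List[int]:
    # The defect affects every release from IV up to, but excluding, FV.
    return list(range(iv, fv))


def label_avs(defects: List[Defect]) -> Tuple[float, float, List[List[int]]]:
    """Return mean P, its population standard deviation, and the AVs of every defect."""
    ps = [proportion(d) for d in defects if d.iv is not None]
    p = mean(ps)  # average proportion over defects with known AVs
    avs = []
    for d in defects:
        iv = d.iv if d.iv is not None else estimate_iv(d, p)
        avs.append(affected_versions(iv, d.fv))
    return p, pstdev(ps), avs


def defective_classes(avs: Iterable[int], fixed_classes: Iterable[str]) -> Set[Tuple[int, str]]:
    # Hypothetical labeling rule: every class touched by the fix is defective in each affected version.
    classes = list(fixed_classes)
    return {(v, c) for v in avs for c in classes}


if __name__ == "__main__":
    defects = [
        Defect(ov=5, fv=7, iv=3),  # P = (7 - 3) / (7 - 5) = 2.0
        Defect(ov=4, fv=6, iv=2),  # P = (6 - 2) / (6 - 4) = 2.0
        Defect(ov=8, fv=10),       # AVs unknown: IV is estimated from the mean P
    ]
    p, stdv, avs = label_avs(defects)
    print(p, stdv, avs[2])  # 2.0 0.0 [6, 7, 8, 9]
```

Averaging P over a project's defects with known AVs is what makes the approach depend on the stability reported above: the lower the spread of P within a project, the closer the estimated IV should be to the real one.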

10/31/2017

A Prediction Model of the Project Life-span in Open Source Software Ecosystem

In nature ecosystems, animal life-spans are determined by genes and some...
05/15/2019

A Preliminary Theory for Open Source Ecosystem Micro-economics

While there has been substantial empirical work identifying factors that...
03/31/2020

On the Need of Removing Last Releases of Data When Using or Validating Defect Prediction Models

To develop and train defect prediction models, researchers rely on datas...
08/06/2020

Newcomer Candidate: Characterizing Contributions of a Novice Developer to GitHub

Context: To attract, onboard, and retain any new-comer in Open Source So...
04/06/2023

Tag that issue: Applying API-domain labels in issue tracking systems

Labeling issues with the skills required to complete them can help contr...
10/06/2022

Trust in Motion: Capturing Trust Ascendancy in Open-Source Projects using Hybrid AI

Open-source is frequently described as a driver for unprecedented commun...
