Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Second Replication Study (GPT2SP Replication Report)

09/01/2022
by   Vali Tawosi, et al.
0

Fu and Tantithamthavorn have recently proposed GPT2SP, a Transformer-based deep learning model for SP estimation of user stories. They empirically evaluated the performance of GPT2SP on a dataset shared by Choetkiertikul et al including 16 projects with a total of 23,313 issues. They benchmarked GPT2SP against two baselines (namely the naive Mean and Median estimators) and the method previously proposed by Choetkiertikul et al. (which we will refer to as DL2SP from now on) for both within- and cross-project estimation scenarios, and evaluated the extent to which each components of GPT2SP contribute towards the accuracy of the SP estimates. Their results show that GPT2SP outperforms DL2SP with a 6 improvement for the cross-project scenarios. However, when we attempted to use the GPT2SP source code made available by Fu and Tantithamthavorn to reproduce their experiments, we found a bug in the computation of the Mean Absolute Error (MAE), which may have inflated the GPT2SP's accuracy reported in their work. Therefore, we had issued a pull request to fix such a bug, which has been accepted and merged into their repository at https://github.com/awsm-research/gpt2sp/pull/2. In this report, we describe the results we achieved by using the fixed version of GPT2SP to replicate the experiments conducted in the original paper for RQ1 and RQ2. Following the original study, we analyse the results considering the Medan Absolute Error (MAE) of the estimation methods over all issues in each project, but we also report the Median Absolute Error (MdAE) and the Standard accuracy (SA) for completeness.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/14/2022

Deep Learning for Agile Effort Estimation Have We Solved the Problem Yet?

In the last decade, several studies have proposed the use of automated t...
research
01/26/2020

Reproducibility Challenge NeurIPS 2019 Report on "Competitive Gradient Descent"

This is a report for reproducibility challenge of NeurlIPS 2019 on the p...
research
04/28/2021

[Re] Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

Singh et al. (2020) point out the dangers of contextual bias in visual r...
research
09/11/2019

Iterative versus Exhaustive Data Selection for Cross Project Defect Prediction: An Extended Replication Study

Context: The effectiveness of data selection approaches in improving the...
research
03/12/2018

Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs

We have replicated some experiments in 'Development and validation of a ...
research
06/22/2022

Heterogeneous Graph Neural Networks for Software Effort Estimation

Software effort can be measured by story point [35]. Current approaches ...
research
12/01/2022

Duplicate Bug Report Detection: How Far Are We?

Many Duplicate Bug Report Detection (DBRD) techniques have been proposed...

Please sign up or login with your details

Forgot password? Click here to reset