Automatic Detection and Analysis of Technical Debts in Peer-Review Documentation of R Packages

01/11/2022
by   Junaed Younus Khan, et al.

Technical debt (TD) is a metaphor for code-related problems that arise when speedy delivery is prioritized over perfect code. Because reducing TD can have a long-term positive impact on the software development life cycle (SDLC), TD has been studied extensively in the literature. However, very little existing research has focused on technical debt in the R programming language, despite its popularity and usage. Recent research by Codabux et al. [21] finds, by analyzing peer-review documentation, that R packages can have 10 diverse TD types. However, those findings are based on the manual analysis of a small sample of R package review comments. In this paper, we develop a suite of Machine Learning (ML) classifiers to detect the 10 TD types automatically. The best-performing classifier is based on the deep ML model BERT, which achieves F1-scores between 0.71 and 0.91. We then apply the trained BERT models to all available peer-review issue comments from two platforms, rOpenSci and BioConductor (13.5K review comments from a total of 1297 R packages). We conduct an empirical study on the prevalence and evolution of the 10 TD types in the two R platforms. We find that documentation debt is the most prevalent of all TD types, and that it is also expanding rapidly. We also find that R packages on the generic platform (i.e., rOpenSci) are more prone to TD than those on the domain-specific platform (i.e., BioConductor). Our empirical findings can guide future improvement opportunities in R package documentation. Our ML models can be used to automatically monitor the prevalence and evolution of TD in R package documentation.
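The abstract reports F1-scores per classifier without showing how such a score is obtained. The following is a minimal sketch of a per-TD-type F1 computation, not the authors' evaluation code. It assumes each review comment may carry a set of TD labels; the function names, the toy data, and the "test" label are hypothetical, and only "documentation" debt is named in the abstract.

```python
def f1_score(gold, predicted):
    """F1 for one binary label: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_type_f1(gold_labels, predicted_labels, td_types):
    """Compute one F1 score per TD type.

    gold_labels and predicted_labels each hold one set of TD types
    per review comment (an assumed multi-label setup).
    """
    return {
        td: f1_score(
            [td in g for g in gold_labels],
            [td in p for p in predicted_labels],
        )
        for td in td_types
    }


# Hypothetical toy data: three review comments, two TD types.
gold = [{"documentation"}, {"documentation", "test"}, set()]
pred = [{"documentation"}, {"test"}, {"documentation"}]
scores = per_type_f1(gold, pred, ["documentation", "test"])
```

Scoring each TD type separately, as here, is one way to arrive at a range of scores such as the 0.71 to 0.91 reported above, with one value per type.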


research
03/16/2021

Technical Debt in the Peer-Review Documentation of R Packages: a rOpenSci Case Study

Context: Technical Debt is a metaphor used to describe code that is "not...
research
12/02/2020

Empirical Study on the Software Engineering Practices in Open Source ML Package Repositories

Recent advances in Artificial Intelligence (AI), especially in Machine L...
research
10/08/2021

ALL-IN-ONE: Multi-Task Learning BERT models for Evaluating Peer Assessments

Peer assessment has been widely applied across diverse academic fields o...
research
02/07/2022

Exploratory analysis of text duplication in peer-review reveals peer-review fraud and paper mills

Comments received from referees during peer-review were analysed to dete...
research
01/27/2022

An Empirical Study of Yanked Releases in the Rust Package Registry

Cargo, the software packaging manager of Rust, provides a yank mechanism...
research
07/07/2023

ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments

Background: The existence of toxic conversations in open-source platform...
research
01/22/2020

CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking

We present CodeReef - an open platform to share all the components neces...
