Prioritizing documentation effort: Can we do better?

06/18/2020
by Shiran Liu, et al.
Code documentation is essential for software quality assurance, but under time or economic pressure developers often cannot document every module in a project. A supervised artificial neural network (ANN) approach was recently proposed to prioritize important modules for documentation effort. Being supervised, however, it requires labeled training data to build the prediction model, and such data may be hard to obtain in practice. It is also unclear whether the ANN approach generalizes, since it was evaluated only on several small data sets. In this paper, we propose an unsupervised approach based on PageRank to prioritize documentation effort. The approach identifies "important" modules based only on the dependence relationships between the modules in a project, so it needs no training data to build a prediction model. To evaluate its effectiveness, we conduct experiments on six additional large data sets as well as the open-source project data sets used in prior studies. The experimental results show that the PageRank approach is superior to the state-of-the-art ANN approach in prioritizing important modules for documentation effort. Given its simplicity and effectiveness, we advocate using the PageRank approach as an easy-to-implement baseline in future research on documentation effort prioritization: any new approach should be compared against it to demonstrate its effectiveness.
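
To make the idea concrete, the following is a minimal sketch of ranking modules by PageRank over a dependency graph. It is not the authors' implementation: the function names `pagerank` and `prioritize`, the damping factor of 0.85, the edge direction (an edge from A to B means "A depends on B", so heavily depended-upon modules score higher), and the toy module names are all assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(deps: Dict[str, List[str]], damping: float = 0.85,
             max_iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank over a module dependency graph.

    deps maps each module to the modules it depends on (assumed edge
    direction: dependent -> dependency).
    """
    nodes = sorted(set(deps) | {t for targets in deps.values() for t in targets})
    if not nodes:
        return {}
    n = len(nodes)
    # Deduplicate targets and ignore self-dependencies.
    out = {m: [t for t in dict.fromkeys(deps.get(m, [])) if t != m] for m in nodes}
    rank = {m: 1.0 / n for m in nodes}

    for _ in range(max_iters):
        # Modules with no outgoing edges spread their score evenly.
        dangling = sum(rank[m] for m in nodes if not out[m])
        new = {m: (1.0 - damping) / n + damping * dangling / n for m in nodes}
        for m in nodes:
            if out[m]:
                share = damping * rank[m] / len(out[m])
                for t in out[m]:
                    new[t] += share
        delta = sum(abs(new[m] - rank[m]) for m in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def prioritize(deps: Dict[str, List[str]]) -> List[str]:
    """Return modules ordered from highest to lowest PageRank score."""
    scores = pagerank(deps)
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical toy project: every other module depends on Util.
deps = {
    "App": ["Service", "Util"],
    "Service": ["Repo", "Util"],
    "Repo": ["Util"],
    "Util": [],
}
ranking = prioritize(deps)
print(ranking)
```

No labeled data is involved: the ordering comes entirely from the dependency structure, which is what makes an approach of this kind usable as a baseline on projects without prior documentation labels.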
