Prevalence, Contents and Automatic Detection of KL-SATD

by Leevi Rantala, et al.

When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD in 33 software repositories containing 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe and probably. This distinguishes them from other comments. KL-SATD comment contents are similar to the manually labeled SATD comments of prior work. Our machine learning classifier, which uses logistic Lasso regression, performs well in detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that machine learning can identify comments that currently lack a SATD keyword but should have one. Automating SATD identification for comments that lack SATD keywords can save time and effort by replacing manual identification. Using KL-SATD offers the potential to bootstrap a complete SATD detector.
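To make the keyword-labeling idea and the "median percentage" statistic concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a keyword set of only TODO and FIXME (the two named above; the full list used in the study is not given here), case-sensitive whole-word matching, and comments that have already been extracted as strings. The function names `is_kl_satd`, `kl_satd_percentage`, and `median_percentage` are hypothetical.

```python
import re
from statistics import median

# Assumption: only the two keywords named in the abstract are used here.
KEYWORDS = ("TODO", "FIXME")

# Case-sensitive, whole-word match (an assumption of this sketch).
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b")


def is_kl_satd(comment: str) -> bool:
    """Return True if the comment contains one of the assumed SATD keywords."""
    return KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments: list[str]) -> float:
    """Percentage of comments in one repository that are keyword-labeled."""
    if not comments:
        return 0.0
    flagged = sum(is_kl_satd(c) for c in comments)
    return 100.0 * flagged / len(comments)


def median_percentage(repos: dict[str, list[str]]) -> float:
    """Median of the per-repository KL-SATD percentages."""
    return median(kl_satd_percentage(c) for c in repos.values())
```

Taking the median of per-repository percentages, rather than pooling all comments, keeps one very large repository from dominating the figure. Whether the study computes it exactly this way is an assumption of the sketch.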
