Using Source Code Density to Improve the Accuracy of Automatic Commit Classification into Maintenance Activities

05/28/2020
by Sebastian Hönel, et al.

Source code is changed for a reason, e.g., to adapt, correct, or perfect it. This reason can provide valuable insight into the development process but is rarely documented explicitly when the change is committed to a source code repository. Automatic commit classification uses features extracted from commits to estimate this reason. We introduce source code density, a measure of the net size of a commit, and show how it improves the accuracy of automatic commit classification compared to previous size-based classifications. We also investigate how preceding generations of commits affect the class of a commit, and whether taking the code density of previous commits into account can improve the accuracy further. We achieve up to 89% accuracy in cross-project commit classification, where the model is trained on one project and applied to other projects. Models trained on single projects yield accuracies of up to 93% with a Kappa approaching 0.90. The accuracy of automatic commit classification directly affects software (process) quality analyses that exploit the classification, so our improvements to the accuracy will also improve confidence in such analyses.
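The abstract reports both accuracy and Kappa for the classifiers. As a reading aid, the following is a minimal sketch of how these two figures are commonly computed from true and predicted commit labels, assuming Kappa refers to Cohen's kappa (the abstract does not say which variant is used). The function names and the single-letter labels (a = adaptive, c = corrective, p = perfective) are hypothetical and not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of commits whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance.

    Assumption: the Kappa in the abstract is Cohen's kappa.
    """
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed per class.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one class occurs")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for ten commits (illustrative data, not from the paper).
y_true = ["a", "a", "c", "c", "c", "p", "p", "p", "p", "p"]
y_pred = ["a", "c", "c", "c", "c", "p", "p", "p", "p", "a"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"kappa    = {cohens_kappa(y_true, y_pred):.4f}")
```

Kappa is reported alongside accuracy because class frequencies in commit data are typically unbalanced: a classifier that always predicts the most frequent class can reach a high accuracy while its kappa stays at zero.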


