Using Source Code Density to Improve the Accuracy of Automatic Commit Classification into Maintenance Activities

by Sebastian Hönel, et al.

Source code is changed for a reason, e.g., to adapt, correct, or perfect it. This reason can provide valuable insight into the development process but is rarely explicitly documented when the change is committed to a source code repository. Automatic commit classification uses features extracted from commits to estimate this reason. We introduce source code density, a measure of the net size of a commit, and show how it improves the accuracy of automatic commit classification compared to previous size-based classifications. We also investigate how preceding generations of commits affect the class of a commit, and whether taking the code density of previous commits into account can improve the accuracy further. We achieve up to 89% accuracy for cross-project commit classification, where the model is trained on one project and applied to other projects. Models trained on single projects yield accuracies of up to 93% with a Kappa approaching 0.90. The accuracy of the automatic commit classification has a direct impact on software (process) quality analyses that exploit the classification, so our improvements to the accuracy will also improve the confidence in such analyses.
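To make the two quantities in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract only describes source code density as a measure of the net size of a commit, so the ratio of net to gross changed lines used here is an assumption, as is the simple filter that treats blank lines and `//` or `#` comment lines as non-net. The function names `net_size`, `code_density`, and `cohen_kappa` are hypothetical. The Kappa function follows the standard Cohen's kappa formula, which corrects raw accuracy for agreement expected by chance.

```python
from collections import Counter


def net_size(changed_lines):
    """Count changed lines that carry code.

    Assumption: a line is 'net' if it is neither blank nor a pure
    comment line starting with '//' or '#'. A real measure would
    likely use a language-aware filter.
    """
    count = 0
    for line in changed_lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(("//", "#")):
            count += 1
    return count


def code_density(changed_lines):
    """Assumed definition: net size divided by gross size of a commit."""
    gross = len(changed_lines)
    if gross == 0:
        return 0.0
    return net_size(changed_lines) / gross


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical commit with four changed lines, two of which carry code.
commit = ["x = 1", "", "// explain x", "y = 2"]
density = code_density(commit)

# Hypothetical labels: 'a' = adaptive, 'c' = corrective.
kappa = cohen_kappa(["a", "a", "c", "c"], ["a", "a", "c", "a"])
```

In this toy example the density is 0.5 (two of four changed lines are code). The Kappa is 0.5 even though three of four labels agree, because half of that agreement would be expected by chance given the label frequencies.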




