Leveraging Data Mining Algorithms to Recommend Source Code Changes

04/29/2023
by AmirHossein Naghshzan, et al.

Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and/or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. We not only use these algorithms to recommend source code changes, but also evaluate them empirically. Methods: Our investigation covers seven open-source projects, from which we extracted the source change history at the file level. We applied four widely used data mining algorithms (Apriori, FP-Growth, Eclat, and Relim) and compared them in terms of performance (precision, recall, and F-measure) and execution time. Results: Our findings provide empirical evidence that, while some frequent pattern mining algorithms, such as Apriori, may outperform the others in some cases, the results are not consistent across all the software projects. This is most likely due to the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems to be an efficient approach in terms of execution time.
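To make the idea of file-level change recommendation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each commit is a list of changed file names, counts only co-changing file pairs (a simplification of the first two levels of Apriori-style mining), and scores a recommendation with precision, recall, and F-measure. The function names, the toy `history`, and the `min_support` value are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return file pairs that changed together in at least min_support commits."""
    counts = Counter()
    for files in transactions:
        # Sort so each pair has one canonical (a, b) ordering.
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def recommend(changed_file, pairs):
    """List files that frequently co-changed with changed_file, most frequent first."""
    scored = []
    for (a, b), n in pairs.items():
        if a == changed_file:
            scored.append((n, b))
        elif b == changed_file:
            scored.append((n, a))
    # Ties are broken alphabetically to keep the output deterministic.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


def precision_recall_f(recommended, actual):
    """Compare a recommendation with the files that actually changed."""
    rec, act = set(recommended), set(actual)
    hits = len(rec & act)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(act) if act else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical change history: one list of changed files per commit.
history = [
    ["a.py", "b.py"],
    ["a.py", "b.py", "c.py"],
    ["a.py", "c.py"],
    ["b.py", "d.py"],
]

pairs = frequent_pairs(history, min_support=2)
suggested = recommend("a.py", pairs)
scores = precision_recall_f(suggested, actual=["b.py"])
```

With this toy history, a developer who edits `a.py` would be pointed to `b.py` and `c.py`; if only `b.py` actually changed alongside it, the recommendation is fully complete (recall 1.0) but only half correct (precision 0.5). Full frequent pattern miners such as Apriori, FP-Growth, Eclat, and Relim extend this idea to itemsets larger than pairs and differ mainly in how they search for them.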


research
01/05/2021

Why Developers Refactor Source Code: A Mining-based Study

Refactoring aims at improving code non-functional attributes without mod...
research
02/07/2019

How Different Are Different diff Algorithms in Git? Use --histogram for Code Changes

Automatic identification of the differences between two versions of a fi...
research
08/10/2017

More Accurate Recommendations for Method-Level Changes

During the life span of large software projects, developers often apply ...
research
09/01/2018

Test Prioritization in Continuous Integration Environments

Two heuristics namely diversity-based (DBTP) and history-based test prio...
research
04/20/2019

Interviewing the Most Successful Bot on GitHub: Dr Travis CI on 35+ Million of its Jobs

Travis CI handles automatically thousands of builds every day to, amongs...
research
02/20/2017

Kharita: Robust Map Inference using Graph Spanners

The widespread availability of GPS information in everyday devices such ...
research
12/23/2018

A Multi-Objective Anytime Rule Mining System to Ease Iterative Feedback from Domain Experts

Data extracted from software repositories is used intensively in Softwar...
