How Different Are Different diff Algorithms in Git? Use --histogram for Code Changes

02/07/2019
by Yusuf Sulistyo Nugroho, et al.

Automatically identifying the differences between two versions of a file is a common and basic task in many applications that mine code repositories. Git, a version control system, provides a diff utility, and users can choose among several diff algorithms, from the default Myers algorithm to the more advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. Regarding the impact on code churn metrics in 14 Java projects, we obtained different values in 1.7% to 8.2% of commits depending on the diff algorithm. Regarding bug-introducing change identification, we found that 6.0% and 13.3% of the identified bug-fix commits had different results of bug-introducing changes across 10 Java projects. For patch application, our manual analysis found that Histogram is more suitable than Myers for presenting code changes. Thus, we strongly recommend using the Histogram algorithm when mining Git repositories to consider differences in source code.
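
To make the algorithm choice concrete, here is a minimal sketch of how one might compare code churn under the two algorithms for a pair of revisions. It is not the authors' tooling. It relies on Git's `--numstat` and `--diff-algorithm=<name>` options for `git diff`; the helper names (`parse_numstat`, `churn`, `compare`) and the repository path and revision arguments are hypothetical.

```python
import subprocess
from typing import Dict, Tuple

# Algorithms compared in this sketch; Git also accepts "minimal" and "patience".
ALGORITHMS = ("myers", "histogram")


def parse_numstat(output: str) -> Tuple[int, int]:
    """Sum added and deleted line counts from `git diff --numstat` output."""
    added = deleted = 0
    for line in output.splitlines():
        parts = line.split("\t")
        # Each row is "<added>\t<deleted>\t<path>"; binary files report "-".
        if len(parts) < 3 or parts[0] == "-" or parts[1] == "-":
            continue
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def churn(repo: str, old: str, new: str, algorithm: str) -> Tuple[int, int]:
    """Return (added, deleted) between two revisions for one diff algorithm."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--numstat",
         f"--diff-algorithm={algorithm}", old, new],
        check=True, capture_output=True, text=True,
    )
    return parse_numstat(result.stdout)


def compare(repo: str, old: str, new: str) -> Dict[str, Tuple[int, int]]:
    """Collect churn per algorithm so differing values are easy to spot."""
    return {alg: churn(repo, old, new, alg) for alg in ALGORITHMS}
```

Calling `compare("path/to/repo", "HEAD~1", "HEAD")` on a local clone returns one `(added, deleted)` pair per algorithm; revisions where the pairs differ are the cases in which the algorithm choice affects churn metrics. For interactive use, `git diff --histogram` selects the same algorithm as `--diff-algorithm=histogram`.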
