On the Nature of Code Cloning in Open-Source Java Projects

07/09/2021
by Yaroslav Golubev, et al.

Code cloning plays an important role in open-source software engineering. The presence of clones within a project may indicate a need for refactoring, and clones between projects are even more interesting, since they imply that code migrates between projects and that violations are possible. But how is code being copied? How prevalent is the process, and at what level does it happen? In this general study, we attempt to shed some light on these questions by searching for clones at both the file level and the method level in a large dataset of over 23 thousand Java projects, and by studying the code fragments themselves and their clone pairs. We study the size and the age of code fragments, the prevalence of their clones, and the relationships between exact and non-exact clones, as well as between method-level and file-level clones. We also discover and describe various anomalies in the code clones detected in the dataset. Our research shows that copying occurs throughout the years of Java's existence and that method-level copying is much more prevalent than file-level copying, with only 35.4 […] Additionally, some of the discovered anomalies can be useful for future large-scale cloning research, as they can be used to remove auto-generated code.
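To make the exact versus non-exact distinction concrete, here is a minimal sketch. It is not the clone detector, tokenizer, or similarity threshold used in the study. It assumes that two fragments are exact clones when their token sequences match once comments and whitespace are dropped. It treats them as non-exact clones when a token-bag overlap (shared tokens divided by the size of the larger bag, similar to the measure used by token-based tools such as SourcererCC) reaches a threshold. The regex tokenizer, the helper names, and the 0.7 threshold are all hypothetical.

```python
import hashlib
import re
from collections import Counter

# Hypothetical, simplified lexer: identifiers, integer literals, or any single
# non-space character. A real Java tokenizer would handle strings, operators, etc.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def strip_comments(src: str) -> str:
    """Drop /* ... */ block comments and // line comments (naive: ignores string literals)."""
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", src)


def tokens(src: str) -> list[str]:
    """Token sequence of a fragment, so whitespace and comments no longer matter."""
    return TOKEN_RE.findall(strip_comments(src))


def exact_key(src: str) -> str:
    """Hash of the token sequence: fragments that differ only in layout or comments collide."""
    return hashlib.sha256(" ".join(tokens(src)).encode("utf-8")).hexdigest()


def overlap_similarity(a: str, b: str) -> float:
    """Multiset intersection of token bags divided by the size of the larger bag."""
    bag_a, bag_b = Counter(tokens(a)), Counter(tokens(b))
    shared = sum((bag_a & bag_b).values())
    larger = max(sum(bag_a.values()), sum(bag_b.values()))
    return shared / larger if larger else 1.0


def classify(a: str, b: str, threshold: float = 0.7) -> str:
    """Label a fragment pair; the 0.7 threshold is an assumed value for illustration."""
    if exact_key(a) == exact_key(b):
        return "exact"
    if overlap_similarity(a, b) >= threshold:
        return "non-exact"
    return "distinct"


# Usage: three small method-level fragments.
m1 = "int add(int a, int b) { return a + b; }"
m2 = "int add(int a,int b){\n  // sum\n  return a + b;\n}"
m3 = "int add(int x, int y) { return x + y; }"

print(classify(m1, m2))            # exact: only layout and a comment differ
print(classify(m1, m3))            # non-exact: renamed parameters
print(overlap_similarity(m1, m3))  # 0.75 (12 of 16 tokens shared)
```

Hashing the normalized token sequence makes exact-clone grouping a simple dictionary lookup. A bag-of-tokens overlap tolerates renamed identifiers but ignores token order, which is one common trade-off for scalable non-exact detection.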

Related research

A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub (02/12/2020)
With an ever-increasing amount of open source software, the popularity o...

Multi-Objective Optimization for Token-Based Clone Detection (02/12/2020)
Clone detection plays an important role in software engineering. Finding...

Towards Informative Tagging of Code Fragments to Support the Investigation of Code Clones (10/03/2021)
Investigating the code fragments of code clones detected by code clone d...

On the diversity and frequency of code related to mathematical formulas in real-world Java projects (11/26/2020)
In this paper, the term formula code refers to fragments of source code ...

Design Smell Analysis for Developing and Established Open Source Java Software (10/11/2019)
Software design smells are design attributes which violate the fundament...

CodeLabeller: A Web-based Code Annotation Tool for Java Design Patterns and Summaries (06/14/2021)
The appropriate use of design patterns in code is a vital measurement of...

On Tracking Java Methods with Git Mechanisms (03/11/2020)
Method-level historical information is useful in research on mining soft...