Multi-Objective Optimization for Token-Based Clone Detection

02/12/2020
by Yaroslav Golubev, et al.

Clone detection plays an important role in software engineering. Finding clones within a single project reveals possible refactoring opportunities, while finding them across different projects can be used to detect code reuse or possible licensing violations. In this paper, we propose a modification to token-based clone detection that detects more clone pairs of greater diversity without losing precision. It does so through multi-parameter search, i.e. conducting the search several times, with each run aimed at a different group of clones. To combat the increase in operation time that this approach brings about, we propose an optimization that significantly decreases the overlap in detected clones between the searches. The method is applicable to any clone detection tool that uses tokens and similarity measures, is highly configurable, and can be run in parallel, making it well suited for large-scale analysis. We describe the method and its optimization and evaluate them with two popular clone detection tools on two datasets of different sizes, consisting of four prominent open source projects. The technique increases the number of detected clones by 41.9-52.7%, and we consider further research possibilities.
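The idea of multi-parameter search can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bag-of-tokens similarity, the length bands, the thresholds, and all names (`similarity`, `multi_parameter_search`, `fragments`, `configs`) are assumptions made for illustration. Each search targets fragments whose size falls into its own band, so no pair is examined by more than one search, which is one simple way to keep the overlap between searches low.

```python
from collections import Counter
from itertools import combinations


def similarity(a, b):
    """Bag-of-tokens overlap: shared tokens divided by the larger fragment's size."""
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b))


def multi_parameter_search(fragments, configs):
    """Run one search per (min_tokens, max_tokens, threshold) configuration.

    fragments: dict mapping a fragment id to its list of tokens.
    A pair is compared in a search only when its smaller fragment falls
    into that search's length band (an assumed overlap-reduction scheme).
    """
    clones = set()
    pairs = list(combinations(sorted(fragments.items()), 2))
    for min_tokens, max_tokens, threshold in configs:
        for (id_a, tok_a), (id_b, tok_b) in pairs:
            size = min(len(tok_a), len(tok_b))
            if not (min_tokens <= size < max_tokens):
                continue  # this pair belongs to another search
            if similarity(tok_a, tok_b) >= threshold:
                clones.add((id_a, id_b))
    return clones


# Hypothetical fragments, tokenized by splitting on whitespace.
fragments = {
    "a": "int x = 0 ;".split(),
    "b": "int y = 0 ;".split(),
    "c": "for ( int i = 0 ; i < n ; i ++ ) sum += a [ i ] ;".split(),
    "d": "for ( int j = 0 ; j < n ; j ++ ) total += a [ j ] ;".split(),
}

# Short fragments need a strict threshold; long ones tolerate a looser one.
configs = [(1, 10, 0.9), (10, 1000, 0.7)]

result = multi_parameter_search(fragments, configs)
```

With these assumed settings, the long loops `c` and `d` are reported as a clone pair, while the trivially similar short declarations `a` and `b` are not. A single search with one global threshold would either miss the first pair or admit the second. Because the searches are independent, they could also be dispatched in parallel.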


research
07/09/2021

On the Nature of Code Cloning in Open-Source Java Projects

Code cloning plays a very important role in open-source software enginee...
research
03/18/2021

Please Don't Go – Increasing Women's Participation in Open Source Software

Women represent less than 24 from various types of prejudice and biases....
research
08/29/2018

Use of Source Code Similarity Metrics in Software Defect Prediction

In recent years, defect prediction has received a great deal of attentio...
research
02/15/2020

Recommendation of Move Method Refactoring Using Path-Based Representation of Code

Software refactoring plays an important role in increasing code quality....
research
02/08/2022

The Weights can be Harmful: Pareto Search versus Weighted Search in Multi-Objective Search-Based Software Engineering

In presence of multiple objectives to be optimized in Search-Based Softw...
research
01/20/2022

Evaluating the Performance of Clone Detection Tools in Detecting Cloned Co-change Candidates

Co-change candidates are the group of code fragments that require a chan...
research
11/13/2017

Detecting Near Duplicates in Software Documentation

Contemporary software documentation is as complicated as the software it...
