LVMapper: A Large-variance Clone Detector Using Sequencing Alignment Approach

09/10/2019
by Ming Wu, et al.

Detecting large-variance code clones (i.e., clones with relatively many differences) in large-scale code repositories is difficult, because most current tools can only detect almost identical or very similar clones. Better detection of such clones would benefit software applications such as bug detection, code completion, and software analysis. Recently, CCAligner attempted to detect clones with relatively concentrated modifications, called large-gap clones. Our contribution is a novel and effective approach that extends large-variance clone detection to the more general case, covering not only concentrated code modifications but also scattered ones. We propose a detector named LVMapper, which borrows and adapts the sequence-alignment approach from bioinformatics, a technique that can find two similar sequences even when they contain many differences. We evaluated LVMapper on both self-synthesized datasets and real cases, and the results show a substantial improvement in detecting large-variance clones compared with other state-of-the-art tools, including CCAligner. Furthermore, our new tool also achieves good recall and precision for general Type-1, Type-2, and Type-3 clones on BigCloneBench, a widely used benchmark dataset.
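As a rough illustration of why alignment tolerates scattered modifications, the sketch below scores two code fragments with Smith-Waterman local alignment, treating each normalized source line as one symbol. This is a generic textbook technique chosen for illustration, not LVMapper's actual algorithm. The function name `smith_waterman`, the scoring parameters (match = 2, mismatch = -1, gap = -1), and the example fragments are all hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (Smith-Waterman).

    Illustrative sketch only; scoring parameters are hypothetical choices.
    """
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1];
    # row 0 and column 0 stay 0, so an alignment may start anywhere.
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # skip a[i-1] (line deleted in b)
            left = H[i][j - 1] + gap  # skip b[j-1] (line inserted in b)
            # Clamping at 0 lets a poorly matching prefix be discarded.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# Hypothetical fragments; each (already normalized) line is one symbol.
original = [
    "int sum = 0;",
    "for (int i = 0; i < n; i++) {",
    "sum += a[i];",
    "}",
    "return sum;",
]
clone = [
    "int sum = 0;",
    'log("start");',                  # inserted line
    "for (int i = 0; i < n; i++) {",
    "sum += a[i] * w[i];",            # modified line
    "}",
    'log("end");',                    # inserted line
    "return sum;",
]

print(smith_waterman(original, clone))     # 5
print(smith_waterman(original, original))  # 10
```

The two inserted logging lines and the edited loop body each cost a small penalty instead of breaking the match, so the clone still scores 5 against a self-alignment score of 10. A tool that required long runs of identical lines would see only isolated one-line matches here.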
