Oreo: Detection of Clones in the Twilight Zone

06/15/2018
by Vaibhav Saini, et al.

Source code clones are categorized into four types of increasing detection difficulty, ranging from purely textual (Type-1) to purely semantic (Type-4). Most clone detectors reported in the literature work well up to Type-3, which accounts for syntactic differences. Between Type-3 and Type-4, however, lies a spectrum of clones that still exhibit some syntactic similarity yet are extremely hard to detect: the Twilight Zone. Most reported clone detectors fail to operate in this zone. We present Oreo, a novel approach to source code clone detection that not only detects Type-1 to Type-3 clones accurately but can also detect the harder clones in the Twilight Zone. Oreo combines machine learning, information retrieval, and software metrics. We evaluate Oreo's recall on BigCloneBench and evaluate its precision manually. Oreo achieves both high recall and high precision. More importantly, it pushes the boundary of detecting clones with moderate to weak syntactic similarity in a scalable manner.
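To make the clone taxonomy concrete, the following minimal sketch compares a small Java method against a Type-1, a Type-3, and a Type-4 variant using plain token-set overlap (Jaccard similarity). This is an illustration only and not Oreo's pipeline: the Java snippets, the helper names `tokens` and `jaccard`, and the choice of Jaccard similarity are all assumptions introduced here.

```python
import re

# Hypothetical Java snippets, written for this illustration only.
ORIGINAL = "int sum(int[] a) { int s = 0; for (int i = 0; i < a.length; i++) { s += a[i]; } return s; }"

# Type-1: same code, differing only in whitespace and comments.
TYPE1 = """int sum(int[] a) {
    int s = 0;   // running total
    for (int i = 0; i < a.length; i++) { s += a[i]; }
    return s;
}"""

# Type-3: same structure with a statement added.
TYPE3 = ("int sum(int[] a) { int s = 0; for (int i = 0; i < a.length; i++) "
         "{ if (a[i] < 0) continue; s += a[i]; } return s; }")

# Type-4: same behavior, different implementation.
TYPE4 = "int sum(int[] a) { return java.util.Arrays.stream(a).sum(); }"


def tokens(code):
    """Drop // comments, then return the set of identifiers, numbers, and symbols."""
    code = re.sub(r"//[^\n]*", "", code)
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def jaccard(a, b):
    """Token-set overlap: |A & B| / |A | B|, a crude proxy for syntactic similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


for name, variant in [("Type-1", TYPE1), ("Type-3", TYPE3), ("Type-4", TYPE4)]:
    print(f"{name}: {jaccard(ORIGINAL, variant):.2f}")
```

Under these assumptions, the Type-1 pair has identical token sets, the Type-3 pair overlaps heavily, and the Type-4 pair overlaps least. Clones with overlap this weak are the ones the abstract places toward the Twilight Zone, where a purely token-based measure like this one stops being reliable.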

research
10/09/2020

An ensemble learning approach for software semantic clone detection

Code clone is a serious problem in software and has the potential to sof...
research
01/21/2020

Towards Semantic Clone Detection via Probabilistic Software Modeling

Semantic clones are program components with similar behavior, but differ...
research
02/06/2019

A Comparison of Information Retrieval Techniques for Detecting Source Code Plagiarism

Plagiarism is a commonly encountered problem in academia. While ther...
research
09/10/2019

LVMapper: A Large-variance Clone Detector Using Sequencing Alignment Approach

To detect large-variance code clones (i.e. clones with relatively more d...
research
02/08/2019

Code Smell Detection using Multilabel Classification Approach

Code smells are characteristics of the software that indicate a code or...
research
07/18/2019

Logical Segmentation of Source Code

Many software analysis methods have come to rely on machine learning app...
research
10/28/2018

Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

To solve time inefficiency issue, only potential pairs are compared in s...