An ensemble learning approach for software semantic clone detection

10/09/2020
by Min Fu, et al.

Code cloning is a serious problem in software and can lead to software defects, maintenance overhead, and licensing violations. Clone detection is therefore important for reducing maintenance effort and improving code quality during software evolution. A variety of clone detection techniques have been proposed to identify similar code in software. However, few of them can efficiently detect semantic clones (functionally similar code without any syntactic resemblance). Recently, several deep learning based clone detectors have been proposed to detect semantic clones, but these approaches incur high costs in data labelling and model training. In this paper, we propose a novel approach that leverages word embedding and ensemble learning techniques to detect semantic clones. Our evaluation on a commonly used clone benchmark, BigCloneBench, shows that our approach significantly improves the precision and recall of semantic clone detection compared with a token-based clone detector, SourcererCC, and another deep learning based clone detector, CDLH.
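
To make the notion of a semantic clone concrete, the following minimal sketch shows two hypothetical functions that compute the same result with little shared syntax. The function names, the regex tokenizer, and the Jaccard overlap measure are illustrative assumptions; they are not taken from the paper and are not SourcererCC's actual implementation. The sketch only illustrates why a purely token-based comparison may score such a pair as dissimilar.

```python
import re

# Two hypothetical code fragments with the same behavior but different syntax.
SRC_A = '''
def total_a(values):
    acc = 0
    for v in values:
        acc += v
    return acc
'''

SRC_B = '''
def total_b(items):
    return sum(items)
'''


def tokens(source):
    """Split source text into a set of identifiers, numbers, and punctuation."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source))


def jaccard(a, b):
    """Jaccard overlap of two token sets (a stand-in for token-based similarity)."""
    return len(a & b) / len(a | b)


# Load both fragments so their behavior can be compared directly.
namespace = {}
exec(SRC_A, namespace)
exec(SRC_B, namespace)

sample = [3, 5, 8]
same_behavior = namespace["total_a"](sample) == namespace["total_b"](sample)
token_similarity = jaccard(tokens(SRC_A), tokens(SRC_B))

print("same behavior on sample:", same_behavior)
print("token similarity:", round(token_similarity, 3))
```

Both fragments return the same sum for the sample input, yet their token overlap is low, which is the kind of pair a semantic clone detector is expected to catch.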

research
11/23/2020

Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks

Code clones are duplicate code fragments that share (nearly) similar syn...
research
05/03/2020

A Machine Learning Based Framework for Code Clone Validation

A code clone is a pair of code fragments, within or between software sys...
research
06/15/2018

Oreo: Detection of Clones in the Twilight Zone

Source code clones are categorized into four types of increasing difficu...
research
09/10/2019

LVMapper: A Large-variance Clone Detector Using Sequencing Alignment Approach

To detect large-variance code clones (i.e. clones with relatively more d...
research
11/18/2019

Fine-Grained Static Detection of Obfuscation Transforms Using Ensemble-Learning and Semantic Reasoning

The ability to efficiently detect the software protections used is at a ...
research
11/21/2021

Challenging Machine Learning-based Clone Detectors via Semantic-preserving Code Transformations

Software clone detection identifies similar code snippets. It has been a...
research
12/23/2022

RMove: Recommending Move Method Refactoring Opportunities using Structural and Semantic Representations of Code

Incorrect placement of methods within classes is a typical code smell ca...
