Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks

11/23/2020
by Nikita Mehrotra, et al.

Code clones are duplicate code fragments that share similar syntax or semantics. Code clone detection plays an important role in software maintenance, code refactoring, and reuse. Substantial research has been conducted on clone detection, but most approaches rely on lexical and syntactic information, and only a few target semantic clones. Recently, motivated by the success of deep learning in other fields, including natural language processing and computer vision, researchers have begun applying deep learning techniques to code clone detection. These approaches use lexical information (tokens) and/or syntactic structures such as abstract syntax trees (ASTs). However, they do not make sufficient use of the available structural and semantic information, which limits their capabilities. This paper addresses semantic code clone detection using program dependency graphs and geometric neural networks, leveraging structured syntactic and semantic information. We have developed HOLMES, a prototype tool based on our novel approach, and empirically evaluated it on popular code clone benchmarks. Our results show that HOLMES performs considerably better than the state-of-the-art tool TBCCD. We also evaluated HOLMES on unseen projects and performed cross-dataset experiments to assess its generalizability. These results confirm that HOLMES outperforms TBCCD: most of the pairs that HOLMES detected were either undetected or suboptimally reported by TBCCD.
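The abstract describes comparing code fragments through their program dependency graphs (PDGs) with a graph-based Siamese network. The sketch below illustrates that general idea in plain Python. It is a minimal, hypothetical example and not HOLMES's actual architecture.

- **Shared encoder.** Both graphs pass through one encoder instance, which is the Siamese weight sharing.
- **Message passing.** The encoder runs a few rounds of mean-neighbor aggregation.
- **Readout.** It mean-pools the node states into a single graph embedding.
- **Scoring.** Cosine similarity of the two embeddings scores the pair.

Everything else here is an assumption made for illustration: the names `SharedGraphEncoder` and `siamese_similarity`, the one-hot node-type features, treating control and data dependence edges as one undirected edge type, the random untrained weights, and the 0.9 threshold.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# A graph is (node id -> feature vector, list of dependency edges between node ids).
Graph = Tuple[Dict[int, Vector], List[Tuple[int, int]]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedGraphEncoder:
    """Hypothetical message-passing encoder; one instance serves both Siamese branches."""

    def __init__(self, dim: int, layers: int = 2, seed: int = 0):
        rng = random.Random(seed)
        # Each layer has a self-transform and a neighbor-transform (random, untrained).
        self.layers = [(random_matrix(dim, dim, rng), random_matrix(dim, dim, rng))
                       for _ in range(layers)]

    def encode(self, features: Dict[int, Vector], edges: List[Tuple[int, int]]) -> Vector:
        # Simplifying assumption: ignore edge direction and edge type (control vs. data).
        neighbors: Dict[int, List[int]] = {v: [] for v in features}
        for src, dst in edges:
            neighbors[src].append(dst)
            neighbors[dst].append(src)
        h = dict(features)
        for w_self, w_nbr in self.layers:
            new_h: Dict[int, Vector] = {}
            for v, hv in h.items():
                nbrs = neighbors[v]
                # Mean of neighbor states (zero vector for isolated nodes).
                agg = ([sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(len(hv))]
                       if nbrs else [0.0] * len(hv))
                combined = zip(matvec(w_self, hv), matvec(w_nbr, agg))
                new_h[v] = [math.tanh(a + b) for a, b in combined]
            h = new_h
        # Mean-pooling readout: average node states into one graph embedding.
        dim = len(next(iter(h.values())))
        return [sum(vec[i] for vec in h.values()) / len(h) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def siamese_similarity(encoder: SharedGraphEncoder, g1: Graph, g2: Graph) -> float:
    """Encode both graphs with the same weights, then compare the embeddings."""
    return cosine(encoder.encode(*g1), encoder.encode(*g2))


encoder = SharedGraphEncoder(dim=3, layers=2, seed=42)
# Toy PDGs with one-hot node types [assign, loop, call] (hypothetical feature scheme).
pdg_a: Graph = ({0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [0.0, 0.0, 1.0]},
                [(0, 1), (1, 2)])
pdg_b: Graph = ({10: [1.0, 0.0, 0.0], 11: [0.0, 1.0, 0.0], 12: [0.0, 0.0, 1.0]},
                [(10, 11), (11, 12)])
pdg_c: Graph = ({0: [1.0, 0.0, 0.0], 1: [1.0, 0.0, 0.0]}, [(0, 1)])

sim_ab = siamese_similarity(encoder, pdg_a, pdg_b)  # same structure, renamed nodes: close to 1.0
sim_ac = siamese_similarity(encoder, pdg_a, pdg_c)
is_clone = sim_ab >= 0.9  # hypothetical decision threshold
print(round(sim_ab, 6), round(sim_ac, 6), is_clone)
```

In a trained system, the encoder weights would be learned from labeled clone and non-clone pairs. Here they are random, so only structural invariances carry meaning. For example, renaming node ids yields an identical embedding.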
