ASTRO: An AST-Assisted Approach for Generalizable Neural Clone Detection

08/17/2022
by Yifan Zhang, et al.

Neural clone detection has attracted the attention of software engineering researchers and practitioners. However, most neural clone detection methods do not generalize beyond the clones that appear in the training dataset, which results in poor model performance, especially in recall. In this paper, we present ASTRO, an Abstract Syntax Tree (AST) assisted approach for generalizable neural clone detection and a framework for finding clones in codebases that reflect industry practices. ASTRO has three main components: (1) an AST-inspired representation for source code that leverages program structure and semantics, (2) a global graph representation that captures the context of an AST among a corpus of programs, and (3) a graph embedding for programs that, in combination with extant large-scale language models, improves state-of-the-art code clone detection. Our experimental results show that ASTRO improves on state-of-the-art neural clone detection approaches in both recall and F1 score.
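
To make the idea of an AST-derived representation concrete, here is a minimal sketch. It is not ASTRO's representation, which the abstract does not specify. It assumes Python source and the standard-library `ast` module purely for illustration, and the helper name `ast_node_types` is hypothetical. It shows how dropping identifier names and keeping only syntactic structure lets two renamed copies of the same function look identical, while a structurally different function does not.

```python
import ast
from typing import List


def ast_node_types(source: str) -> List[str]:
    """Return the node type names of the parsed source, in ast.walk order.

    Identifier names and literal values are discarded, so only the
    syntactic shape of the program remains. (Illustrative helper, not
    part of ASTRO.)
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


# Two functions that differ only in identifier names (a renamed clone).
original = "def add(x, y):\n    return x + y\n"
renamed = "def total(first, second):\n    return first + second\n"

# A function with a different structure (one parameter, multiplication).
different = "def scale(x):\n    return x * 2\n"

same_shape = ast_node_types(original) == ast_node_types(renamed)
other_shape = ast_node_types(original) == ast_node_types(different)
```

An exact structural match like this only catches renamed copies. The abstract's point is that a learned graph embedding combined with a language model is meant to generalize to clones that such exact matching would miss.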


