SLACC: Simion-based Language Agnostic Code Clones

02/07/2020
by George Mathew, et al.

Successful cross-language clone detection could enable researchers and developers to create robust language migration tools, facilitate learning additional programming languages once one is mastered, and promote reuse of code snippets over a broader codebase. However, identifying cross-language clones poses particular challenges for clone detection. Because arbitrary languages lack a common underlying representation, detecting clones requires one of the following solutions: 1) a static analysis framework replicated across each targeted language, with annotations matching language features across all languages, or 2) a dynamic analysis framework that detects clones based on runtime behavior. In this work, we demonstrate the feasibility of the latter solution with SLACC, a dynamic analysis approach for cross-language clone detection. Like prior clone detection techniques, we use input/output behavior to match clones, but we overcome limitations of prior work by amplifying the number of inputs and covering more data types, and as a result achieve better clusters than prior attempts. Since clusters are generated based on input/output behavior, SLACC supports cross-language clone detection. As an added challenge, we target a statically typed language, Java, and a dynamically typed language, Python. Compared to HitoshiIO, a recent clone detection tool for Java, SLACC retrieves 6 times as many clusters and has higher precision (86.7%). This is the first work to perform clone detection for dynamically typed languages (precision = 87.3%) and across languages that lack a common underlying representation (precision = 94.1%). It is a first step towards the larger goal of scalable language migration tools.
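To make the idea of matching clones by input/output behavior concrete, here is a minimal sketch in Python. It is not SLACC's implementation: the function names (`generate_inputs`, `behavior_signature`, `cluster_by_behavior`), the integer-only input generator, and the three candidate snippets are all hypothetical, chosen only for illustration. The sketch runs every candidate on the same set of generated inputs and groups candidates whose outputs agree on all of them. It stays within a single language; comparing Java and Python snippets would additionally require executing each in its own runtime and comparing outputs in a language-neutral form, which is assumed away here.

```python
import random
from collections import defaultdict


def generate_inputs(n_args, n_samples=64, seed=0):
    """Generate a fixed, reproducible set of integer argument tuples.

    Assumption: integers only. The abstract mentions covering more data
    types; this sketch does not attempt that.
    """
    rng = random.Random(seed)
    return [
        tuple(rng.randint(-100, 100) for _ in range(n_args))
        for _ in range(n_samples)
    ]


def behavior_signature(func, inputs):
    """Run func on every input and record what it returned or raised."""
    outputs = []
    for args in inputs:
        try:
            outputs.append(repr(func(*args)))
        except Exception as exc:
            # Treat the exception type as part of the observable behavior.
            outputs.append("raises " + type(exc).__name__)
    return tuple(outputs)


def cluster_by_behavior(funcs, n_args, n_samples=64, seed=0):
    """Group function names whose outputs agree on all generated inputs."""
    inputs = generate_inputs(n_args, n_samples, seed)
    clusters = defaultdict(list)
    for name, func in funcs.items():
        clusters[behavior_signature(func, inputs)].append(name)
    return sorted(sorted(names) for names in clusters.values())


# Hypothetical candidate snippets: two compute a maximum, one adds.
def max_branch(a, b):
    return a if a > b else b


def max_builtin(a, b):
    return max(a, b)


def add(a, b):
    return a + b


candidates = {"max_branch": max_branch, "max_builtin": max_builtin, "add": add}
clusters = cluster_by_behavior(candidates, n_args=2)
print(clusters)
```

In this sketch the two maximum functions land in one cluster and `add` in another. Agreement on a finite input sample is only evidence of equivalent behavior, not proof, which is one reason a larger and more varied input set matters.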


