Cross-Language Code Search using Static and Dynamic Analyses

06/16/2021
by George Mathew, et al.

As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches, such as the comparison of tokens and abstract syntax trees (ASTs), to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do rely on machine learning models that require labeled training data. We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using non-dominated sorting based on code token similarity, structural similarity, and behavioral similarity. We empirically evaluate COSAL on two datasets, one of 43,146 Java and Python files and one of 55,499 Java files, and find that 1) code search based on non-dominated ranking of static and dynamic similarity measures is more effective than search based on single or weighted measures; and 2) COSAL achieves better precision and recall than state-of-the-art within-language and cross-language code-to-code search tools. We explore the potential for using COSAL on large open-source repositories and discuss scalability to more languages and similarity metrics, providing a gateway for practical, multi-language code-to-code search.
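As a rough illustration of the non-dominated ranking idea mentioned in the abstract, the sketch below groups candidate snippets into Pareto fronts over three similarity scores. It is not COSAL's implementation: the file names, the score values, the assumption that each score lies in [0, 1] with higher meaning more similar, and the helper names `dominates` and `non_dominated_sort` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical score triple: (token, structural, behavioral) similarity.
# Assumption: higher means more similar to the query.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every measure and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(candidates: Dict[str, Scores]) -> List[List[str]]:
    """Peel off successive Pareto fronts; earlier fronts rank higher."""
    remaining = dict(candidates)
    fronts: List[List[str]] = []
    while remaining:
        # A candidate belongs to the current front if no other remaining
        # candidate dominates it.
        front = sorted(
            name
            for name, scores in remaining.items()
            if not any(
                dominates(other_scores, scores)
                for other, other_scores in remaining.items()
                if other != name
            )
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# Made-up scores for four candidate results against one query.
candidates = {
    "A.java": (0.9, 0.8, 0.9),
    "B.py": (0.4, 0.9, 0.7),
    "C.java": (0.3, 0.5, 0.6),
    "D.py": (0.2, 0.4, 0.95),
}
fronts = non_dominated_sort(candidates)
print(fronts)
```

In this made-up data, `C.java` is the only candidate that another candidate beats on all three measures, so it falls to the second front. No weights are needed to combine the measures, which is the property the abstract contrasts with weighted ranking.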

research
02/07/2020

SLACC: Simion-based Language Agnostic Code Clones

Successful cross-language clone detection could enable researchers and d...
research
03/13/2018

Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code

Translating a program written in one programming language to another can...
research
05/05/2023

On Contrastive Learning of Semantic Similarity for Code to Code Search

This paper introduces a novel code-to-code search technique that enhance...
research
05/10/2018

Ariadne: Analysis for Machine Learning Program

Machine learning has transformed domains like vision and translation, an...
research
08/15/2019

Towards usable automated detection of CPU architecture and endianness for arbitrary binary files and object code sequences

Static and dynamic binary analysis techniques are actively used to rever...
research
05/25/2023

Beryllium: Neural Search for Algorithm Implementations

In this paper, we explore the feasibility of finding algorithm implement...
research
05/09/2023

Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation

Detecting parallelizable code regions is a challenging task, even for ex...
