Scalable Program Clone Search Through Spectral Analysis

10/24/2022
by Tristan Benoit, et al.

We consider the problem of program clone search: given a target program and a repository of known programs (all in executable format), the goal is to find the program in the repository most similar to the target. Potential applications include reverse engineering, program clustering, malware lineage and software theft detection. Recent years have seen a surge in code similarity techniques, yet most of them focus on function-level similarity, whereas we are interested in program-level similarity. Consequently, these recent approaches are not directly suited to program clone search: they are either too slow to handle large code bases, not precise enough, or not robust against the slight variations introduced by compilation or by different source code versions. We introduce Programs Spectral Similarity (PSS), the first spectral analysis dedicated to program-level similarity. PSS reaches a sweet spot in terms of precision, speed and robustness. In particular, its spectral feature extraction is performed only once per program, which suits large repositories and makes PSS a good fit for program clone search.

research
02/08/2021

Academic Source Code Plagiarism Detection by Measuring Program Behavioural Similarity

Source code plagiarism is a long-standing issue in tertiary computer sci...
research
07/26/2020

funcGNN: A Graph Neural Network Approach to Program Similarity

Program similarity is a fundamental concept, central to the solution of ...
research
11/18/2019

Invariant Diffs

Software development is inherently incremental. Nowadays, many software ...
research
01/10/2020

Searching a Database of Source Codes Using Contextualized Code Search

We assume a database containing a large set of program source codes and ...
research
07/11/2021

Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity

How can we identify similar repositories and clusters among a large onli...
research
04/02/2022

Differential Cost Analysis with Simultaneous Potentials and Anti-potentials

We present a novel approach to differential cost analysis that, given a ...
research
06/12/2019

SPoC: Search-based Pseudocode to Code

We consider the task of mapping pseudocode to long programs that are fun...
