Scalable Program Clone Search Through Spectral Analysis

10/24/2022
by Tristan Benoit, et al.
CEA
loria.fr

We consider the problem of program clone search: given a target program and a repository of known programs (all in executable format), the goal is to find the program in the repository most similar to the target. Potential applications include reverse engineering, program clustering, malware lineage, and software theft detection. Recent years have seen a surge in code similarity techniques, yet most of them focus on function-level similarity, whereas we are interested in program-level similarity. Consequently, these recent approaches are not directly suited to program clone search: they are either too slow to handle large code bases, not precise enough, or not robust against the slight variations introduced by compilation or by different source code versions. We introduce Programs Spectral Similarity (PSS), the first spectral analysis dedicated to program-level similarity. PSS reaches a sweet spot in terms of precision, speed, and robustness. In particular, its one-time spectral feature extraction is tailored to large repositories of programs, making it a good fit for program clone search.
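
The abstract does not say which graphs or which spectral features PSS uses, so the following is only a minimal sketch of the general idea of a one-time spectral feature extraction followed by nearest-neighbour search. It assumes each program is summarised by an undirected graph (for instance a call graph) and that the features are the largest eigenvalues of its graph Laplacian. All names here (`laplacian`, `symmetric_eigenvalues`, `spectral_features`, `clone_search`) and the toy repository are hypothetical and are not taken from the paper.

```python
import math

def laplacian(n, edges):
    """Dense Laplacian L = D - A of an undirected simple graph on n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        if u == v or L[u][v] != 0.0:  # ignore self-loops and duplicate edges
            continue
        L[u][v] = L[v][u] = -1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, ascending."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle in [-pi/4, pi/4] that zeroes a[p][q].
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def spectral_features(n, edges, k=4):
    """Fixed-length feature vector: the k largest Laplacian eigenvalues, zero-padded."""
    top = sorted(symmetric_eigenvalues(laplacian(n, edges)), reverse=True)[:k]
    return top + [0.0] * (k - len(top))

def clone_search(target_features, index):
    """Name of the indexed program whose features are closest (Euclidean distance)."""
    return min(index, key=lambda name: math.dist(target_features, index[name]))

# Toy repository: each "program" is (node count, edge list). Purely illustrative.
repository = {
    "path": (3, [(0, 1), (1, 2)]),
    "triangle": (3, [(0, 1), (1, 2), (0, 2)]),
    "star": (4, [(0, 1), (0, 2), (0, 3)]),
}
# Features are extracted once per repository entry and reused for every query.
index = {name: spectral_features(n, e) for name, (n, e) in repository.items()}

# The target is the triangle with its edges listed in a different order.
best = clone_search(spectral_features(3, [(2, 0), (0, 1), (1, 2)]), index)
print(best)
```

In this sketch each query costs one eigenvalue computation for the target plus a scan over precomputed vectors, which is the property that makes a one-time extraction attractive for large repositories. Laplacian eigenvalues are also unchanged by node relabeling, which hints at why spectral features can tolerate superficial variation. Whether PSS uses this particular spectrum is an assumption here, not something the abstract states.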

02/08/2021

Academic Source Code Plagiarism Detection by Measuring Program Behavioural Similarity

Source code plagiarism is a long-standing issue in tertiary computer sci...
07/26/2020

funcGNN: A Graph Neural Network Approach to Program Similarity

Program similarity is a fundamental concept, central to the solution of ...
11/18/2019

Invariant Diffs

Software development is inherently incremental. Nowadays, many software ...
01/10/2020

Searching a Database of Source Codes Using Contextualized Code Search

We assume a database containing a large set of program source codes and ...
07/11/2021

Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity

How can we identify similar repositories and clusters among a large onli...
10/29/2007

Code Similarity on High Level Programs

This paper presents a new approach for code similarity on High Level pro...
03/12/2020

Control-flow Flattening Preserves the Constant-Time Policy (Extended Version)

Obfuscating compilers protect a software by obscuring its meaning and im...