
Scalable Program Clone Search Through Spectral Analysis

by Tristan Benoit, et al.

We consider the problem of program clone search: given a target program and a repository of known programs (all in executable format), find the program in the repository that is most similar to the target. The problem has potential applications in reverse engineering, program clustering, malware lineage, and software theft detection. Recent years have seen a surge of code similarity techniques, yet most of them focus on function-level similarity, whereas we are interested in program-level similarity. Consequently, these approaches are not directly suited to program clone search: they are either too slow to handle large code bases, not precise enough, or not robust to slight variations introduced by compilation or by different source code versions. We introduce Programs Spectral Similarity (PSS), the first spectral analysis dedicated to program-level similarity. PSS strikes a balance between precision, speed, and robustness. In particular, its spectral features are extracted only once per program, which suits large program repositories and makes PSS a natural fit for program clone search.
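The abstract does not detail which spectral features PSS extracts, so the following Python sketch is only a generic illustration of spectral program similarity, not PSS itself. It assumes each executable has already been reduced to a call graph (the `CallGraph` dict format is hypothetical) and summarizes each graph by the k smallest eigenvalues of its normalized Laplacian. All function names (`normalized_laplacian`, `jacobi_eigenvalues`, `spectral_signature`, `build_index`, `clone_search`) and the default `k = 8` are illustrative choices. Repository signatures are computed once by `build_index`, mirroring the one-time feature extraction described above; each query then costs one eigendecomposition plus a linear scan.

```python
import math
from typing import Dict, List, Set

# Hypothetical input format: caller name -> set of callee names.
CallGraph = Dict[str, Set[str]]


def normalized_laplacian(graph: CallGraph) -> List[List[float]]:
    """Build I - D^{-1/2} A D^{-1/2} for the call graph, treated as undirected."""
    nodes = sorted(set(graph) | {c for callees in graph.values() for c in callees})
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for caller, callees in graph.items():
        for callee in callees:
            i, j = index[caller], index[callee]
            if i != j:  # ignore direct recursion (self-loops)
                adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if deg[i] > 0:
            lap[i][i] = 1.0  # isolated nodes keep an all-zero row
        for j in range(n):
            if adj[i][j]:
                lap[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
    return lap


def jacobi_eigenvalues(a: List[List[float]], tol: float = 1e-12,
                       max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, ascending."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_signature(graph: CallGraph, k: int = 8) -> List[float]:
    """The k smallest Laplacian eigenvalues, zero-padded to a fixed length k."""
    eig = jacobi_eigenvalues(normalized_laplacian(graph))[:k]
    return eig + [0.0] * (k - len(eig))


def build_index(repo: Dict[str, CallGraph], k: int = 8) -> Dict[str, List[float]]:
    """One-time extraction: compute every repository signature once."""
    return {name: spectral_signature(g, k) for name, g in repo.items()}


def clone_search(target: CallGraph, index: Dict[str, List[float]], k: int = 8) -> str:
    """Return the repository program whose signature is closest to the target's."""
    sig = spectral_signature(target, k)
    return min(index, key=lambda name: math.dist(sig, index[name]))


if __name__ == "__main__":
    repo = {
        "prog_a": {"main": {"f"}, "f": {"g"}, "g": {"h"}},  # chain of calls
        "prog_b": {"main": {"f", "g", "h"}},                # star of calls
    }
    index = build_index(repo)
    # Same call structure as prog_a, but with renamed functions.
    target = {"start": {"x"}, "x": {"y"}, "y": {"z"}}
    print(clone_search(target, index))  # prints "prog_a"
```

Because a graph's spectrum does not depend on how its nodes are labeled, renaming functions leaves the signature unchanged, which is why the renamed chain matches `prog_a`. Recovering call graphs from real binaries would require a disassembler, and non-isomorphic (cospectral) graphs can share a spectrum, so this sketch only conveys the general idea.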

Academic Source Code Plagiarism Detection by Measuring Program Behavioural Similarity

Source code plagiarism is a long-standing issue in tertiary computer sci...

funcGNN: A Graph Neural Network Approach to Program Similarity

Program similarity is a fundamental concept, central to the solution of ...

Invariant Diffs

Software development is inherently incremental. Nowadays, many software ...

Searching a Database of Source Codes Using Contextualized Code Search

We assume a database containing a large set of program source codes and ...

Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity

How can we identify similar repositories and clusters among a large onli...

Code Similarity on High Level Programs

This paper presents a new approach for code similarity on High Level pro...

Control-flow Flattening Preserves the Constant-Time Policy (Extended Version)

Obfuscating compilers protect a software by obscuring its meaning and im...