Scalable Program Clone Search Through Spectral Analysis

10/24/2022
by Tristan Benoit, et al.

We consider the problem of program clone search, i.e., given a target program and a repository of known programs (all in executable format), finding the program in the repository most similar to the target program. Potential applications include reverse engineering, program clustering, malware lineage and software theft detection. Recent years have witnessed a boom in code similarity techniques, yet most of them focus on function-level similarity, whereas we are interested in program-level similarity. Consequently, these recent approaches are not directly suited to program clone search: they are either too slow to handle large code bases, not precise enough, or not robust against slight variations introduced by compilation or source code versions. We introduce Programs Spectral Similarity (PSS), the first spectral analysis dedicated to program-level similarity. PSS reaches a sweet spot in terms of precision, speed and robustness. In particular, its one-time spectral feature extraction is tailored for large repositories of programs, making it a perfect fit for program clone search.
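The abstract does not say which graph PSS builds or which spectral features it extracts, so what follows is only a minimal sketch of the general idea under loudly stated assumptions, not the PSS method itself. Each program is assumed to be reduced to an undirected call graph (nodes are functions, edges are calls). Its spectral signature is taken to be a histogram of the eigenvalues of the symmetric normalized Laplacian, and clone search is a nearest-neighbour query under L1 distance. All function names (`normalized_laplacian`, `jacobi_eigenvalues`, `spectral_signature`, `clone_search`) and the choice of 7 histogram bins are hypothetical. Signatures are computed once per repository program, loosely mirroring the one-time feature extraction mentioned above; a query then needs only the target's signature and a scan over the stored vectors.

```python
import math

# Hypothetical sketch, NOT the PSS algorithm: assumes each program has already
# been reduced to an undirected call graph given as (node_count, edge_list).


def normalized_laplacian(n, edges):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} of an undirected graph."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops; duplicate edges collapse to one
            adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if deg[i] > 0:
            lap[i][i] = 1.0  # isolated nodes keep an all-zero row
        for j in range(n):
            if adj[i][j]:
                lap[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
    return lap


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_signature(n, edges, bins=7):
    """Fixed-length feature: fraction of Laplacian eigenvalues per bin over [0, 2]."""
    hist = [0.0] * bins
    if n == 0:
        return hist
    for ev in jacobi_eigenvalues(normalized_laplacian(n, edges)):
        idx = min(int(max(ev, 0.0) / 2.0 * bins), bins - 1)  # clamp rounding noise
        hist[idx] += 1.0 / n
    return hist


def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def clone_search(target_signature, repository):
    """Name of the repository entry whose precomputed signature is closest to the target."""
    return min(repository, key=lambda name: l1_distance(target_signature, repository[name]))


if __name__ == "__main__":
    path = lambda n: [(i, i + 1) for i in range(n - 1)]
    star = lambda n: [(0, i) for i in range(1, n)]
    complete = lambda n: [(i, j) for i in range(n) for j in range(i + 1, n)]
    # One-time extraction over a toy "repository" of hypothetical call graphs.
    repo = {
        "path6": spectral_signature(6, path(6)),
        "star6": spectral_signature(6, star(6)),
        "complete6": spectral_signature(6, complete(6)),
    }
    print(clone_search(spectral_signature(7, path(7)), repo))  # -> path6
```

The normalized Laplacian keeps every eigenvalue in [0, 2] regardless of graph size, which is why a fixed binning over that interval yields comparable vectors for programs with different numbers of functions. The pure-Python Jacobi solver is there only to keep the example self-contained; a practical system would more likely call a library eigensolver such as `numpy.linalg.eigvalsh`.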
