BinPro: A Tool for Binary Source Code Provenance

11/02/2017
by Dhaval Miyani, et al.

Enforcing open source licenses such as the GNU General Public License (GPL), analyzing a binary for possible vulnerabilities, and code maintenance are all situations where it is useful to be able to determine the source code provenance of a binary. While previous work has focused on computing either binary-to-binary similarity or source-to-source similarity, BinPro is the first work we are aware of to tackle the problem of source-to-binary similarity. BinPro can match binaries with their source code even without knowing which compiler was used to produce the binary, or what optimization level was used with the compiler. To do this, BinPro utilizes machine learning to compute optimal code features for determining binary-to-source similarity and a static analysis pipeline to extract and compute similarity based on those features. Our experiments show that on average BinPro computes a similarity of 81% for matching binaries and source code of the same applications, and an average similarity of 25% for binaries and source code of similar but different applications. This shows that BinPro's similarity score is useful for determining if a binary was derived from a particular source code.

research
04/10/2023

GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching

Matching binary to source code and vice versa has various applications i...
research
09/25/2019

A Survey of Binary Code Similarity

Binary code similarity approaches compare two or more pieces of binary c...
research
11/21/2020

Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned

Binary code similarity analysis (BCSA) is widely used for diverse securi...
research
05/06/2023

Revisiting Lightweight Compiler Provenance Recovery on ARM Binaries

A binary's behavior is greatly influenced by how the compiler builds its...
research
10/30/2021

Trojan Source: Invisible Vulnerabilities

We present a new type of attack in which source code is maliciously enco...
research
07/16/2012

MARFCAT: Transitioning to Binary and Larger Data Sets of SATE IV

We present a second iteration of a machine learning approach to static c...
research
08/08/2018

Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

Binary code analysis allows analyzing binary code without having access ...
