BCD: A Cross-Architecture Binary Comparison Database Experiment Using Locality Sensitive Hashing Algorithms

12/10/2021
by   Haoxi Tan, et al.
0

Given a binary executable without source code, it is difficult to determine what each function in the binary does by reverse engineering it, and even harder without prior experience and context. In this paper, we performed a comparison of different hashing functions' effectiveness at detecting similar lifted snippets of LLVM IR code, and present the design and implementation of a framework for cross-architecture binary code similarity search database using MinHash as the chosen hashing algorithm, over SimHash, SSDEEP and TLSH. The motivation is to help reverse engineers to quickly gain context of functions in an unknown binary by comparing it against a database of known functions. The code for this project is open source and can be found at https://github.com/h4sh5/bcddb

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/10/2023

GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching

Matching binary to source code and vice versa has various applications i...
research
11/25/2019

Online Hashing with Efficient Updating of Binary Codes

Online hashing methods are efficient in learning the hash functions from...
research
04/30/2019

Effective and Efficient Indexing in Cross-Modal Hashing-Based Datasets

To overcome the barrier of storage and computation, the hashing techniqu...
research
11/10/2022

Semantic Learning and Emulation Based Cross-platform Binary Vulnerability Seeker

Clone detection is widely exploited for software vulnerability search. T...
research
01/04/2023

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

Reverse engineering binaries is required to understand and analyse progr...
research
04/21/2022

LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries

Third-party libraries (TPLs) are reused frequently in software applicati...
research
08/15/2019

Towards usable automated detection of CPU architecture and endianness for arbitrary binary files and object code sequences

Static and dynamic binary analysis techniques are actively used to rever...

Please sign up or login with your details

Forgot password? Click here to reset