A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison

07/01/2019
by Yikun Hu, et al.

Binary code similarity comparison is a methodology for identifying similar or identical code fragments in binary programs. It is indispensable in software engineering and security, with many important applications (e.g., plagiarism detection, bug detection). With the spread of smart and IoT (Internet of Things) devices, an increasing number of programs are ported to multiple architectures (e.g., ARM, MIPS), so it has become necessary to detect similar binary code across architectures as well. The main challenge lies in semantics-equivalent code transformations resulting from different compilation settings, code obfuscation, and varied instruction set architectures. Another challenge is the trade-off between comparison accuracy and coverage. Unfortunately, existing methods still rely heavily on semantics-less code features, which are susceptible to such transformations. Additionally, they perform the comparison either purely statically or purely dynamically, and so cannot achieve high accuracy and coverage simultaneously. In this paper, we propose a semantics-based hybrid method to compare binary function similarity. We execute the reference function with test cases, then emulate the execution of every target function with the runtime information migrated from the reference function. Semantic signatures are extracted during both the execution and the emulation. Lastly, similarity scores are calculated from the signatures to measure the likeness of functions. We have implemented the method in a prototype system named BinMatch and evaluated it with nine real-world projects compiled with different compilation settings, on various architectures, and with commonly used obfuscation methods, performing over 100 million pairwise function comparisons in total.
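
The final step of the pipeline, scoring target functions against a reference from their signatures, can be illustrated with a minimal sketch. The abstract does not say how signatures are represented or which similarity measure is used, so everything here is an assumption for illustration: signatures are modeled as lists of observed values, the score is a Jaccard index over those values, and the names `jaccard`, `rank_targets`, and the sample data are hypothetical rather than part of BinMatch.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def jaccard(ref_sig: Iterable[Hashable], tgt_sig: Iterable[Hashable]) -> float:
    """Jaccard index of two signatures, each treated as a set of observed values.

    Assumption: the abstract does not name the similarity measure; Jaccard is
    used here only as a simple stand-in.
    """
    ref_set, tgt_set = set(ref_sig), set(tgt_sig)
    if not ref_set and not tgt_set:
        return 1.0  # two empty signatures are treated as identical
    return len(ref_set & tgt_set) / len(ref_set | tgt_set)


def rank_targets(
    ref_sig: Iterable[Hashable],
    target_sigs: Dict[str, Iterable[Hashable]],
) -> List[Tuple[str, float]]:
    """Score every target function against the reference, best match first."""
    ref_list = list(ref_sig)  # materialize once so it can be reused per target
    scored = [(name, jaccard(ref_list, sig)) for name, sig in target_sigs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up signature values standing in for data recorded during the reference
# execution and the target emulations.
reference = [0x10, 0x20, 0x30, 0x40]
targets = {
    "target_a": [0x10, 0x20, 0x30, 0x40],
    "target_b": [0x10, 0x20, 0x99],
    "target_c": [0x77],
}

ranking = rank_targets(reference, targets)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data the target whose signature matches the reference exactly ranks first and the one sharing no values ranks last. A set-based score ignores the order in which values were observed, which an order-sensitive measure would capture.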

research
08/19/2018

BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis

Binary code clone analysis is an important technique which has a wide ra...
research
06/01/2022

Inter-BIN: Interaction-based Cross-architecture IoT Binary Similarity Comparison

The big wave of Internet of Things (IoT) malware reflects the fragility ...
research
08/13/2021

Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection

Binary code similarity detection is a fundamental technique for many sec...
research
08/29/2023

PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

Binary similarity analysis determines if two binary executables are from...
research
04/07/2021

Towards Optimal Use of Exception Handling Information for Function Detection

Function entry detection is critical for security of binary code. Conven...
research
12/16/2020

Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity

Detecting semantically similar functions – a crucial analysis capability...
research
01/02/2023

Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge

The widespread code reuse allows vulnerabilities to proliferate among a ...
