Binary Code Similarity Detection

08/06/2023
by   Zian Liu, et al.
0

Binary code similarity detection is to detect the similarity of code at binary (assembly) level without source code. Existing works have their limitations when dealing with mutated binary code generated by different compiling options. In this paper, we propose a novel approach to addressing this problem. By inspecting the binary code, we found that generally, within a function, some instructions aim to calculate (prepare) values for other instructions. The latter instructions are defined by us as key instructions. Currently, we define four categories of key instructions: calling subfunctions, comparing instruction, returning instruction, and memory-store instruction. Thus if we symbolically execute similar binary codes, symbolic values at these key instructions are expected to be similar. As such, we implement a prototype tool, which has three steps. First, it symbolically executes binary code; Second, it extracts symbolic values at defined key instructions into a graph; Last, it compares the symbolic graph similarity. In our implementation, we also address some problems, including path explosion and loop handling.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/23/2018

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Given a closed-source program, such as most of proprietary software and ...
research
08/13/2022

BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer

A recent trend in binary code analysis promotes the use of neural soluti...
research
10/29/2007

Code Similarity on High Level Programs

This paper presents a new approach for code similarity on High Level pro...
research
11/17/2017

Decanting the Contribution of Instruction Types and Loop Structures in the Reuse of Traces

Reuse has been proposed as a microarchitecture-level mechanism to reduce...
research
11/11/2020

Efficient global register allocation

In a compiler, an essential component is the register allocator. Two mai...
research
06/07/2019

Datalog Disassembly

Disassembly is fundamental to binary analysis and rewriting. We present ...
research
08/08/2018

Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

Binary code analysis allows analyzing binary code without having access ...

Please sign up or login with your details

Forgot password? Click here to reset