FastBCSD: Fast and Efficient Neural Network for Binary Code Similarity Detection

06/25/2023
by   Chensen Huang, et al.
0

Binary code similarity detection (BCSD) has various applications, including but not limited to vulnerability detection, plagiarism detection, and malware detection. Previous research efforts mainly focus on transforming binary code to assembly code strings using reverse compilation and then using pre-trained deep learning models with large parameters to obtain feature representation vector of binary code. While these models have proven to be effective in representing binary code, their large parameter size leads to considerable computational expenses during both training and inference. In this paper, we present a lightweight neural network, called FastBCSD, that employs a dynamic instruction vector encoding method and takes only assembly code as input feature to achieve comparable accuracy to the pre-training models while reducing the computational resources and time cost. On the BinaryCorp dataset, our method achieves a similar average MRR score to the state-of-the-art pre-training-based method (jTrans), while on the BinaryCorp 3M dataset, our method even outperforms the latest technology by 0.01. Notably, FastBCSD has a much smaller parameter size (13.4M) compared to jTrans (87.88M), and its latency time is 1/5 of jTrans on NVIDIA GTX 1080Ti.

READ FULL TEXT

page 3

page 4

research
08/13/2022

BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer

A recent trend in binary code analysis promotes the use of neural soluti...
research
05/25/2022

jTrans: Jump-Aware Transformer for Binary Code Similarity

Binary code similarity detection (BCSD) has important applications in va...
research
08/22/2017

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

The problem of cross-platform binary code similarity detection aims at d...
research
01/21/2021

PalmTree: Learning an Assembly Language Model for Instruction Embedding

Deep learning has demonstrated its strengths in numerous binary analysis...
research
06/24/2022

Multi-relational Instruction Association Graph for Cross-architecture Binary Similarity Comparison

Cross-architecture binary similarity comparison is essential in many sec...
research
12/13/2021

ROMEO: Exploring Juliet through the Lens of Assembly Language

Automatic vulnerability detection on C/C++ source code has benefitted fr...
research
06/28/2019

A Neural-based Program Decompiler

Reverse engineering of binary executables is a critical problem in the c...

Please sign up or login with your details

Forgot password? Click here to reset