Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection

08/13/2021
by   Shouguo Yang, et al.

Binary code similarity detection is a fundamental technique for many security applications such as vulnerability search, patch analysis, and malware detection. As critical vulnerabilities in IoT devices multiply, there is a growing need to detect similar code across architectures for vulnerability search. The variety of IoT hardware architectures and software platforms requires similarity detection to capture the semantic equivalence of code fragments. However, existing approaches fall short in capturing semantic similarity. We observe that the abstract syntax tree (AST) of a function contains rich semantic information. Inspired by successful applications of natural language processing techniques to sentence semantic understanding, we propose a deep learning-based AST-encoding method, named ASTERIA, to measure the semantic equivalence of functions across different platforms. Our method leverages the Tree-LSTM network to learn the semantic representation of a function from its AST. Similarity detection can then be conducted efficiently and accurately by measuring the similarity between two representation vectors. We have implemented an open-source prototype of ASTERIA. The Tree-LSTM model is trained on a dataset of 1,022,616 function pairs and evaluated on a dataset of 95,078 function pairs. Evaluation results show that our method outperforms the AST-based tool Diaphora and the state-of-the-art method Gemini by large margins in binary similarity detection. Our method is also several orders of magnitude faster than Diaphora and Gemini in the similarity calculation. In the application of vulnerability search, our tool successfully identified 75 vulnerable functions in 5,979 IoT firmware images.
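
The encode-then-compare idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which Tree-LSTM variant or similarity measure ASTERIA uses, so the code below assumes a Child-Sum Tree-LSTM cell (Tai et al.) with untrained random weights and uses cosine similarity as a stand-in. The node vocabulary `NODE_TYPES`, the hidden size `DIM`, the tuple-based AST format, and all function names are hypothetical.

```python
import math
import random

DIM = 8  # hidden size; assumed, kept small for illustration
# Hypothetical AST node vocabulary; a real decompiler AST has many more types.
NODE_TYPES = ["block", "if", "call", "asg", "var", "num", "return"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained, randomly initialised parameters (assumption: a real model learns these).
EMBED = {t: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for t in NODE_TYPES}
W = {g: rand_matrix(DIM, DIM) for g in "ifou"}  # input weights per gate
U = {g: rand_matrix(DIM, DIM) for g in "ifou"}  # hidden-state weights per gate
B = {g: [0.0] * DIM for g in "ifou"}            # biases per gate


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(name, x, h, act):
    """Compute act(W x + U h + b) for one gate."""
    wx, uh = matvec(W[name], x), matvec(U[name], h)
    return [act(a + b + c) for a, b, c in zip(wx, uh, B[name])]


def encode(node):
    """Bottom-up Child-Sum Tree-LSTM; node is (type, [children]). Returns (h, c)."""
    node_type, children = node
    x = EMBED[node_type]
    states = [encode(child) for child in children]
    # Sum of the children's hidden states (all zeros for a leaf).
    h_sum = [sum(h[j] for h, _ in states) for j in range(DIM)]
    i = gate("i", x, h_sum, sigmoid)
    o = gate("o", x, h_sum, sigmoid)
    u = gate("u", x, h_sum, math.tanh)
    c = [i_j * u_j for i_j, u_j in zip(i, u)]
    # One forget gate per child, applied to that child's memory cell.
    for h_k, c_k in states:
        f_k = gate("f", x, h_k, sigmoid)
        c = [c_j + f_j * ck_j for c_j, f_j, ck_j in zip(c, f_k, c_k)]
    h = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]
    return h, c


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Two structurally identical ASTs and one different AST (toy examples).
ast_a = ("block", [("asg", [("var", []), ("num", [])]), ("return", [("var", [])])])
ast_b = ("block", [("asg", [("var", []), ("num", [])]), ("return", [("var", [])])])
ast_c = ("block", [("if", [("call", [("var", [])]), ("return", [("num", [])])])])

vec_a, _ = encode(ast_a)
vec_b, _ = encode(ast_b)
vec_c, _ = encode(ast_c)

print("identical ASTs:", cosine(vec_a, vec_b))
print("different ASTs:", cosine(vec_a, vec_c))
```

Because each function is reduced to one fixed-length vector, the vectors can be computed once and cached, and each comparison is then a cheap vector operation. With untrained weights the scores carry no semantic meaning; the sketch only shows the data flow from AST to vector to similarity score.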

research
08/22/2017

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

The problem of cross-platform binary code similarity detection aims at d...
research
01/02/2023

Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge

The widespread code reuse allows vulnerabilities to proliferate among a ...
research
11/10/2022

Semantic Learning and Emulation Based Cross-platform Binary Vulnerability Seeker

Clone detection is widely exploited for software vulnerability search. T...
research
07/01/2019

A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison

Binary code similarity comparison is a methodology for identifying simil...
research
06/24/2022

Multi-relational Instruction Association Graph for Cross-architecture Binary Similarity Comparison

Cross-architecture binary similarity comparison is essential in many sec...
research
08/22/2023

Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation

Though many deep learning (DL)-based vulnerability detection approaches ...
research
10/23/2018

Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks

In this paper we consider the binary similarity problem that consists in...
