Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

08/08/2018
by   Fei Zuo, et al.
0

Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share a lot of analogical topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems. (I) Given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics is similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system INNEREYE and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency and scalability. And the case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis.

READ FULL TEXT
research
12/23/2018

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Given a closed-source program, such as most of proprietary software and ...
research
11/02/2017

BinPro: A Tool for Binary Source Code Provenance

Enforcing open source licenses such as the GNU General Public License (G...
research
11/02/2021

iCallee: Recovering Call Graphs for Binaries

Recovering programs' call graphs is crucial for inter-procedural analysi...
research
08/06/2023

Binary Code Similarity Detection

Binary code similarity detection is to detect the similarity of code at ...
research
04/13/2022

A Natural Language Processing Approach for Instruction Set Architecture Identification

Binary analysis of software is a critical step in cyber forensics applic...
research
08/19/2018

BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis

Binary code clone analysis is an important technique which has a wide ra...
research
09/13/2019

That's C, baby. C!

Hardly a week goes by at BUGSENG without having to explain to someone th...

Please sign up or login with your details

Forgot password? Click here to reset