Multi-relational Instruction Association Graph for Cross-architecture Binary Similarity Comparison

06/24/2022
by Qige Song, et al.

Cross-architecture binary similarity comparison is essential in many security applications. Recently, researchers have proposed learning-based approaches to improve comparison performance. These approaches adopt a paradigm of instruction pre-training, individual binary encoding, and distance-based similarity comparison. However, instruction embeddings pre-trained on an external code corpus are not universally applicable to diverse real-world applications. Moreover, encoding cross-architecture binaries separately accumulates the semantic gap between instruction sets, which limits comparison accuracy. This paper proposes a novel cross-architecture binary similarity comparison approach based on a multi-relational instruction association graph. We associate mono-architecture instruction tokens through context relevance, and cross-architecture tokens through potential semantic correlations from different perspectives. We then use a relational graph convolutional network (R-GCN) to perform type-specific propagation of graph information. Our approach can bridge the gap between the cross-architecture instruction representation spaces while avoiding the workload of external pre-training. Extensive experiments on basic-block-level and function-level datasets demonstrate the superiority of our approach. Furthermore, evaluations on a large-scale collection of real-world IoT malware reuse functions show that our approach is valuable for identifying malware propagated across IoT devices of various architectures.
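
The abstract does not give the model details, so what follows is only a minimal, generic sketch of type-specific propagation over a multi-relational token graph. It uses the standard R-GCN layer rule (Schlichtkrull et al.), h_i' = ReLU(W_0 h_i + sum over relations r of sum over neighbours j in N_i^r of W_r h_j / |N_i^r|). It is not the authors' implementation. The token names, the two relation types `context` (mono-architecture) and `cross_arch` (cross-architecture), the feature sizes and weights, and the mean-pooling plus cosine comparison at the end are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[str, str, str]  # (source token, relation type, target token)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by an in_dim vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def undirected(pairs: List[Tuple[str, str]], rel: str) -> List[Edge]:
    """Turn token pairs into edges in both directions, tagged with one relation type."""
    return [(a, rel, b) for a, b in pairs] + [(b, rel, a) for a, b in pairs]


def rgcn_layer(h: Dict[str, Vector], edges: List[Edge],
               w_rel: Dict[str, Matrix], w_self: Matrix) -> Dict[str, Vector]:
    """One R-GCN layer: h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_i^r} W_r h_j / |N_i^r|)."""
    # Group incoming neighbours by (target, relation) so each relation is normalised separately.
    neigh: Dict[Tuple[str, str], List[str]] = {}
    for src, rel, dst in edges:
        neigh.setdefault((dst, rel), []).append(src)
    out: Dict[str, Vector] = {}
    for node, vec in h.items():
        acc = matvec(w_self, vec)  # self-connection term W_0 h_i
        for (dst, rel), srcs in neigh.items():
            if dst != node:
                continue
            for src in srcs:
                msg = matvec(w_rel[rel], h[src])  # relation-specific transform W_r h_j
                acc = [a + m / len(srcs) for a, m in zip(acc, msg)]
        out[node] = [max(0.0, a) for a in acc]  # ReLU activation
    return out


def mean_pool(h: Dict[str, Vector], tokens: List[str]) -> Vector:
    """Average token embeddings into one block embedding (an illustrative assumption)."""
    dim = len(h[tokens[0]])
    return [sum(h[t][k] for t in tokens) / len(tokens) for k in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy graph: two x86 tokens and two ARM tokens with 2-d features.
h0 = {"x86:mov": [1.0, 0.0], "x86:add": [0.0, 1.0],
      "arm:ldr": [0.5, 0.5], "arm:add": [0.0, 0.5]}
edges = (undirected([("x86:mov", "x86:add"), ("arm:ldr", "arm:add")], "context")
         + undirected([("x86:mov", "arm:ldr"), ("x86:add", "arm:add")], "cross_arch"))
# Hypothetical per-relation weights; in a trained model these would be learned.
w_rel = {"context": [[0.5, 0.0], [0.0, 0.5]], "cross_arch": [[0.0, 1.0], [1.0, 0.0]]}
w_self = [[1.0, 0.0], [0.0, 1.0]]

h1 = rgcn_layer(h0, edges, w_rel, w_self)
sim = cosine(mean_pool(h1, ["x86:mov", "x86:add"]), mean_pool(h1, ["arm:ldr", "arm:add"]))
print(h1["x86:mov"], round(sim, 3))
```

Each relation type has its own weight matrix and its own normalisation constant. This is what makes the propagation type-specific: messages arriving over `cross_arch` edges are transformed differently from those arriving over `context` edges. As a result, information from one architecture's tokens can flow into the other's representations within the same graph.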

research
06/01/2022

Inter-BIN: Interaction-based Cross-architecture IoT Binary Similarity Comparison

The big wave of Internet of Things (IoT) malware reflects the fragility ...
research
12/23/2018

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Given a closed-source program, such as most of proprietary software and ...
research
06/25/2023

FastBCSD: Fast and Efficient Neural Network for Binary Code Similarity Detection

Binary code similarity detection (BCSD) has various applications, includ...
research
08/13/2021

Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection

Binary code similarity detection is a fundamental technique for many sec...
research
03/18/2021

Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark Model

Software reuse, especially partial reuse, poses legal and security threa...
research
03/09/2022

BinMLM: Binary Authorship Verification with Flow-aware Mixture-of-Shared Language Model

Binary authorship analysis is a significant problem in many software eng...
research
01/21/2021

PalmTree: Learning an Assembly Language Model for Instruction Embedding

Deep learning has demonstrated its strengths in numerous binary analysis...
