Multi-relational Instruction Association Graph for Cross-architecture Binary Similarity Comparison

06/24/2022
by   Qige Song, et al.
0

Cross-architecture binary similarity comparison is essential in many security applications. Recently, researchers have proposed learning-based approaches to improve comparison performance. They adopted a paradigm of instruction pre-training, individual binary encoding, and distance-based similarity comparison. However, instruction embeddings pre-trained on external code corpus are not universal in diverse real-world applications. And separately encoding cross-architecture binaries will accumulate the semantic gap of instruction sets, limiting the comparison accuracy. This paper proposes a novel cross-architecture binary similarity comparison approach with multi-relational instruction association graph. We associate mono-architecture instruction tokens with context relevance and cross-architecture tokens with potential semantic correlations from different perspectives. Then we exploit the relational graph convolutional network (R-GCN) to perform type-specific graph information propagation. Our approach can bridge the gap in the cross-architecture instruction representation spaces while avoiding the external pre-training workload. We conduct extensive experiments on basic block-level and function-level datasets to prove the superiority of our approach. Furthermore, evaluations on a large-scale real-world IoT malware reuse function collection show that our approach is valuable for identifying malware propagated on IoT devices of various architectures.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset