Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity

12/16/2020
by Kexin Pei, et al.

Detecting semantically similar functions (a crucial analysis capability with broad real-world security uses, including vulnerability detection, malware lineage, and forensics) requires understanding function behaviors and intentions. This task is challenging because semantically similar functions can be implemented differently, run on different architectures, and be compiled with diverse compiler optimizations or obfuscations. Most existing approaches match functions based on syntactic features without understanding the functions' execution semantics. We present Trex, a transfer-learning-based framework, to automate learning execution semantics explicitly from functions' micro-traces and transfer the learned knowledge to match semantically similar functions. Our key insight is that these traces can be used to teach an ML model the execution semantics of different sequences of instructions. We thus train the model to learn execution semantics from the functions' micro-traces, without any manual labeling effort. We then develop a novel neural architecture to learn execution semantics from micro-traces, and we fine-tune the pretrained model to match semantically similar functions. We evaluate Trex on 1,472,066 function binaries from 13 popular software projects. These functions are from different architectures and compiled with various optimizations and obfuscations. Trex outperforms the state-of-the-art systems by 7.8%, 7.2%, and 14.3% in cross-architecture, optimization, and obfuscation function matching, respectively. Ablation studies show that the pretraining significantly boosts the function matching performance, underscoring the importance of learning execution semantics.
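The abstract does not say how two functions are scored once the fine-tuned model has processed them. As a minimal sketch, assuming each function is mapped to a fixed-length embedding vector and that similarity is measured with cosine similarity (both assumptions here, not details taken from the abstract), the matching step could look like the following. The names `cosine_similarity` and `rank_candidates` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Rank candidate functions by similarity to the query embedding.

    `candidates` maps a (hypothetical) function name to its embedding.
    Returns (name, score) pairs, most similar first.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy embeddings standing in for model outputs (assumed, not real data).
    query = [1.0, 0.0]
    candidates = {"memcpy_arm_O2": [2.0, 0.1], "strlen_x86_O0": [0.0, 1.0]}
    for name, score in rank_candidates(query, candidates):
        print(name, round(score, 3))
```

In a real pipeline the embeddings would come from the fine-tuned model; the ranking logic stays the same regardless of the architecture or optimization level the binaries were compiled for.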


