Comparing One with Many – Solving Binary2source Function Matching Under Function Inlining

10/27/2022
by Ang Jia, et al.

Binary2source function matching is a fundamental task for many security applications, including Software Component Analysis (SCA). Existing binary2source matching works apply a "1-to-1" mechanism, in which one binary function is matched against one source function. However, we discovered that the mapping can be "1-to-n" (one query binary function maps to multiple source functions) because of function inlining. To support binary2source function matching under function inlining, we propose a method named O2NMatcher that generates Source Function Sets (SFSs) as the matching targets for binary functions with inlining. We first propose a model named ECOCCJ48 for inlined call site prediction. To train this model, we use compilable OSS projects to generate a dataset of call sites labeled as inlined or not, extract several features from the call sites, and design a compiler-opt-based multi-label classifier by inspecting the inlining correlations between different compilations. We then use this model to predict the call site labels in uncompilable OSS projects, without compiling them, and obtain the labeled function call graphs of these projects. Next, we treat the construction of SFSs as a sub-tree generation problem and design root node selection and edge extension rules to construct SFSs automatically. Finally, these SFSs are added to the corpus of source functions and compared with binary functions with inlining. We conduct several experiments to evaluate the effectiveness of O2NMatcher, and the results show that our method increases the performance of existing works by 6
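The SFS construction step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the root node selection or edge extension rules, so the rules below are assumptions made for illustration only: a function is treated as a root if it has no callers or has at least one call site predicted as not inlined, and a set is extended only along call edges predicted as inlined. The function name `build_sfs` and the edge format `(caller, callee, inlined)` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_sfs(call_edges):
    """Build Source Function Sets from a labeled call graph.

    call_edges: iterable of (caller, callee, inlined) tuples, where
    `inlined` is the predicted label of that call site.
    Returns a dict mapping each root function to its set of functions.
    """
    inlined_callees = defaultdict(list)
    functions = set()
    has_callers = set()
    kept_as_callee = set()  # callees with at least one non-inlined call site

    for caller, callee, inlined in call_edges:
        functions.update((caller, callee))
        has_callers.add(callee)
        if inlined:
            inlined_callees[caller].append(callee)
        else:
            kept_as_callee.add(callee)

    # Assumed root rule: a function still exists as its own binary function
    # if nothing calls it or if at least one call to it is not inlined.
    roots = {f for f in functions if f not in has_callers or f in kept_as_callee}

    sfs = {}
    for root in sorted(roots):
        members, stack = {root}, [root]
        # Assumed extension rule: follow only edges predicted as inlined.
        while stack:
            current = stack.pop()
            for callee in inlined_callees[current]:
                if callee not in members:  # also guards against call cycles
                    members.add(callee)
                    stack.append(callee)
        if len(members) > 1:  # single functions are already in the corpus
            sfs[root] = members
    return sfs


edges = [
    ("main", "parse", True),
    ("parse", "read_token", True),
    ("main", "log", False),
    ("log", "fmt", True),
]
print(build_sfs(edges))
```

In this toy graph, `main` and `log` are selected as roots, and each resulting set would be added to the source corpus as one matching target alongside the individual source functions.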

research
12/24/2021

One-to-One or One-to-many? What function inlining brings to binary2source similarity analysis

Binary2source code matching is critical to many code-reuse-related tasks...
research
04/30/2021

GM-MLIC: Graph Matching based Multi-Label Image Classification

Multi-Label Image Classification (MLIC) aims to predict a set of labels ...
research
12/16/2020

Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity

Detecting semantically similar functions – a crucial analysis capability...
research
07/28/2021

XFL: eXtreme Function Labeling

Reverse engineers would benefit from identifiers like function names, bu...
research
11/06/2019

Seq2Emo for Multi-label Emotion Classification Based on Latent Variable Chains Transformation

Emotion detection in text is an important task in NLP and is essential i...
research
01/18/2021

Modeling Heterogeneous Relations across Multiple Modes for Potential Crowd Flow Prediction

Potential crowd flow prediction for new planned transportation sites is ...
