One-to-One or One-to-many? What function inlining brings to binary2source similarity analysis

12/24/2021
by Ang Jia, et al.

Binary2source code matching is critical to many code-reuse-related tasks, including code clone detection, software license violation detection, and reverse engineering assistance. Existing binary2source approaches all apply a "1-to-1" (one-to-one) mechanism, i.e., one function in a binary file is matched against one function in a source file. However, we hypothesize that this mapping is often a more complex "1-to-n" (one-to-many) problem because of function inlining: when the compiler inlines callees into a caller, a single binary function can contain code from several source functions. To the best of our knowledge, few existing works have systematically studied the effect of function inlining on binary2source matching tasks, and this paper addresses that gap. To support our study, we first construct two datasets containing 61,179 binaries and 19,976,067 functions. We also propose an automated approach to label the dataset with line-level and function-level mappings. Based on the labeled dataset, we then investigate the extent of function inlining, the factors that affect it, and its impact on existing binary2source similarity methods. Finally, we discuss our findings and offer suggestions for designing more effective methodologies.
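To make the "1-to-1" versus "1-to-n" distinction concrete, here is a minimal sketch of how a function-level label could be derived from a line-level mapping. It assumes each binary function already comes with the (source file, line) pairs attributed to its instructions; such pairs can be obtained from DWARF debug information, for example with `addr2line -i`, which also prints the enclosing scopes of inlined code. The names `SourceFunction`, `source_functions_for`, and `classify`, and the toy data, are hypothetical illustrations, not the authors' labeling tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# A (source file, line number) pair attributed to one binary instruction.
LineRef = Tuple[str, int]


@dataclass(frozen=True)
class SourceFunction:
    """Hypothetical record of a source function's location (lines inclusive)."""
    name: str
    file: str
    start_line: int
    end_line: int


def source_functions_for(lines: List[LineRef],
                         functions: List[SourceFunction]) -> Set[str]:
    """Names of all source functions whose line range contains any referenced line."""
    hits: Set[str] = set()
    for file, line in lines:
        for fn in functions:
            if fn.file == file and fn.start_line <= line <= fn.end_line:
                hits.add(fn.name)
    return hits


def classify(binary_functions: Dict[str, List[LineRef]],
             functions: List[SourceFunction]) -> Dict[str, Tuple[str, Set[str]]]:
    """Label each binary function as 1-to-1, 1-to-n, or unmapped."""
    labels: Dict[str, Tuple[str, Set[str]]] = {}
    for bin_name, lines in binary_functions.items():
        sources = source_functions_for(lines, functions)
        if not sources:
            kind = "unmapped"
        elif len(sources) == 1:
            kind = "1-to-1"
        else:
            kind = "1-to-n"  # code from several source functions, e.g. via inlining
        labels[bin_name] = (kind, sources)
    return labels


if __name__ == "__main__":
    # Toy source layout for a single file, util.c (hypothetical).
    funcs = [
        SourceFunction("helper", "util.c", 3, 6),
        SourceFunction("main", "util.c", 8, 15),
        SourceFunction("parse", "util.c", 17, 30),
    ]
    # Toy line-level mapping: helper was inlined into main, so main's
    # instructions also reference helper's lines (4, 5).
    binary = {
        "main": [("util.c", 9), ("util.c", 4), ("util.c", 5), ("util.c", 12)],
        "parse": [("util.c", 18), ("util.c", 25)],
    }
    for name, (kind, sources) in classify(binary, funcs).items():
        print(name, kind, sorted(sources))
```

In this toy example the binary function `main` covers lines from both `main` and `helper`, so it is labeled 1-to-n, while `parse` maps to a single source function. A 1-to-1 matcher would have to pick only one of `main`'s two source counterparts, which is the limitation the study examines.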

research
10/27/2022

Comparing One with Many – Solving Binary2source Function Matching Under Function Inlining

Binary2source function matching is a fundamental task for many security ...
research
05/06/2023

LibAM: An Area Matching Framework for Detecting Third-party Libraries in Binaries

Third-party libraries (TPLs) are extensively utilized by developers to e...
research
04/21/2022

LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries

Third-party libraries (TPLs) are reused frequently in software applicati...
research
12/20/2022

CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context

While pre-trained language models (LM) for code have achieved great succ...
research
11/28/2020

Rewrite to Reinforce: Rewriting the Binary to Apply Countermeasures against Fault Injection

Fault injection attacks can cause errors in software for malicious purpo...
research
11/13/2018

SAFE: Self-Attentive Function Embeddings for Binary Similarity

The binary similarity problem consists in determining if two functions a...
