Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark Model

03/18/2021
by   Xi Xu, et al.
0

Software reuse, especially partial reuse, poses legal and security threats to software development. Since its source codes are usually unavailable, software reuse is hard to be detected with interpretation. On the other hand, current approaches suffer from poor detection accuracy and efficiency, far from satisfying practical demands. To tackle these problems, in this paper, we propose ISRD, an interpretation-enabled software reuse detection approach based on a multi-level birthmark model that contains function level, basic block level, and instruction level. To overcome obfuscation caused by cross-compilation, we represent function semantics with Minimum Branch Path (MBP) and perform normalization to extract core semantics of instructions. For efficiently detecting reused functions, a process for "intent search based on anchor recognition" is designed to speed up reuse detection. It uses strict instruction match and identical library call invocation check to find anchor functions (in short anchors) and then traverses neighbors of the anchors to explore potentially matched function pairs. Extensive experiments based on two real-world binary datasets reveal that ISRD is interpretable, effective, and efficient, which achieves 97.2% precision and 94.8% recall. Moreover, it is resilient to cross-compilation, outperforming state-of-the-art approaches.

READ FULL TEXT
research
02/11/2021

CENTRIS: A Precise and Scalable Approach for Identifying Modified Open-Source Software Reuse

Open-source software (OSS) is widely reused as it provides convenience a...
research
11/17/2017

Decanting the Contribution of Instruction Types and Loop Structures in the Reuse of Traces

Reuse has been proposed as a microarchitecture-level mechanism to reduce...
research
06/01/2022

Inter-BIN: Interaction-based Cross-architecture IoT Binary Similarity Comparison

The big wave of Internet of Things (IoT) malware reflects the fragility ...
research
06/24/2022

Multi-relational Instruction Association Graph for Cross-architecture Binary Similarity Comparison

Cross-architecture binary similarity comparison is essential in many sec...
research
04/21/2022

LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries

Third-party libraries (TPLs) are reused frequently in software applicati...
research
05/06/2023

LibAM: An Area Matching Framework for Detecting Third-party Libraries in Binaries

Third-party libraries (TPLs) are extensively utilized by developers to e...
research
09/10/2020

Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms

All-pairs compute problems apply a user-defined function to each combina...

Please sign up or login with your details

Forgot password? Click here to reset