Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark Model

by   Xi Xu, et al.

Software reuse, especially partial reuse, poses legal and security threats to software development. Since its source codes are usually unavailable, software reuse is hard to be detected with interpretation. On the other hand, current approaches suffer from poor detection accuracy and efficiency, far from satisfying practical demands. To tackle these problems, in this paper, we propose ISRD, an interpretation-enabled software reuse detection approach based on a multi-level birthmark model that contains function level, basic block level, and instruction level. To overcome obfuscation caused by cross-compilation, we represent function semantics with Minimum Branch Path (MBP) and perform normalization to extract core semantics of instructions. For efficiently detecting reused functions, a process for "intent search based on anchor recognition" is designed to speed up reuse detection. It uses strict instruction match and identical library call invocation check to find anchor functions (in short anchors) and then traverses neighbors of the anchors to explore potentially matched function pairs. Extensive experiments based on two real-world binary datasets reveal that ISRD is interpretable, effective, and efficient, which achieves 97.2% precision and 94.8% recall. Moreover, it is resilient to cross-compilation, outperforming state-of-the-art approaches.



There are no comments yet.


page 1


CENTRIS: A Precise and Scalable Approach for Identifying Modified Open-Source Software Reuse

Open-source software (OSS) is widely reused as it provides convenience a...

Decanting the Contribution of Instruction Types and Loop Structures in the Reuse of Traces

Reuse has been proposed as a microarchitecture-level mechanism to reduce...

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Given a closed-source program, such as most of proprietary software and ...

The demise of the filesystem and multi level service architecture

Many astronomy data centres still work on filesystems. Industry has move...

PAIRS: Control Flow Protection using Phantom Addressed Instructions

Code-reuse attacks continue to pose a significant threat to systems secu...

Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms

All-pairs compute problems apply a user-defined function to each combina...

One-to-One or One-to-many? What function inlining brings to binary2source similarity analysis

Binary2source code matching is critical to many code-reuse-related tasks...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.