Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark Model

03/18/2021
by Xi Xu, et al.

Software reuse, especially partial reuse, poses legal and security threats to software development. Because the source code of the software under inspection is usually unavailable, reuse is hard to detect in an interpretable way. Moreover, current approaches suffer from poor detection accuracy and efficiency and fall far short of practical demands. To tackle these problems, we propose ISRD, an interpretation-enabled software reuse detection approach based on a multi-level birthmark model that covers the function level, the basic block level, and the instruction level. To overcome the obfuscation caused by cross-compilation, we represent function semantics with the Minimum Branch Path (MBP) and apply normalization to extract the core semantics of instructions. To detect reused functions efficiently, we design a process called "intent search based on anchor recognition". It uses strict instruction matching and an identical library call invocation check to find anchor functions (anchors for short), and then traverses the neighbors of the anchors to explore potentially matched function pairs. Extensive experiments on two real-world binary datasets show that ISRD is interpretable, effective, and efficient, achieving 97.2% precision and 94.8% recall. It is also resilient to cross-compilation and outperforms state-of-the-art approaches.
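The anchor-then-neighbors idea described in the abstract can be illustrated with a small sketch. This is not ISRD's implementation: the abstract does not specify the data structures, so the sketch assumes that each function is a pair of (normalized instruction sequence, set of library calls), that "neighbors" means adjacent functions in a call graph, and that a plain Jaccard similarity with a hypothetical threshold of 0.6 stands in for the multi-level birthmark comparison. All names (`find_anchors`, `expand_from_anchors`, `jaccard`) are invented for illustration.

```python
from collections import deque


def find_anchors(funcs_a, funcs_b):
    """Pair functions whose instructions and library calls match exactly.

    funcs_*: dict mapping name -> (instruction sequence, set of library calls).
    Only unambiguous (one-to-one) exact matches are kept as anchors.
    """
    index = {}
    for name, (insns, libcalls) in funcs_b.items():
        index.setdefault((tuple(insns), frozenset(libcalls)), []).append(name)
    anchors = []
    for name, (insns, libcalls) in funcs_a.items():
        candidates = index.get((tuple(insns), frozenset(libcalls)), [])
        if len(candidates) == 1:
            anchors.append((name, candidates[0]))
    return anchors


def jaccard(insns_a, insns_b):
    """Stand-in similarity (assumption): Jaccard index over instruction sets."""
    sa, sb = set(insns_a), set(insns_b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def expand_from_anchors(funcs_a, funcs_b, graph_a, graph_b, threshold=0.6):
    """Breadth-first search outward from anchors, matching similar neighbors.

    graph_*: dict mapping name -> set of neighboring function names
    (assumed here to come from a call graph).
    """
    anchors = find_anchors(funcs_a, funcs_b)
    matched = dict(anchors)
    used_b = set(matched.values())
    queue = deque(anchors)
    while queue:
        fa, fb = queue.popleft()
        for na in sorted(graph_a.get(fa, ())):
            if na in matched:
                continue
            scored = [
                (jaccard(funcs_a[na][0], funcs_b[nb][0]), nb)
                for nb in sorted(graph_b.get(fb, ()))
                if nb not in used_b
            ]
            if not scored:
                continue
            score, nb = max(scored)
            if score >= threshold:
                matched[na] = nb
                used_b.add(nb)
                queue.append((na, nb))  # newly matched pairs seed further search
    return matched
```

Restricting candidate comparisons to the neighbors of already matched pairs is what keeps the search cheap in this sketch: exact matching finds a few reliable starting points, and the more expensive similarity check runs only on a small local neighborhood instead of on all function pairs.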

02/11/2021

CENTRIS: A Precise and Scalable Approach for Identifying Modified Open-Source Software Reuse

Open-source software (OSS) is widely reused as it provides convenience a...
11/17/2017

Decanting the Contribution of Instruction Types and Loop Structures in the Reuse of Traces

Reuse has been proposed as a microarchitecture-level mechanism to reduce...
12/23/2018

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

Given a closed-source program, such as most of proprietary software and ...
07/26/2019

The demise of the filesystem and multi level service architecture

Many astronomy data centres still work on filesystems. Industry has move...
11/05/2019

PAIRS: Control Flow Protection using Phantom Addressed Instructions

Code-reuse attacks continue to pose a significant threat to systems secu...
09/10/2020

Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms

All-pairs compute problems apply a user-defined function to each combina...
12/24/2021

One-to-One or One-to-many? What function inlining brings to binary2source similarity analysis

Binary2source code matching is critical to many code-reuse-related tasks...