PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

08/29/2023
by   Xiangzhe Xu, et al.

Binary similarity analysis determines whether two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. Our evaluation on 9 real-world projects with 35k functions and a comparison with 6 state-of-the-art techniques show that PEM can achieve a precision of 96% with common settings, outperforming the baselines by 10-20%.
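The requirement that samples be comparable across binaries can be illustrated with a toy example. The sketch below is not PEM's execution engine: it works on plain Python functions rather than binaries, it samples only the input space (no path sampling), and every name in it (`sample_observations`, `jaccard`, the three example functions) is hypothetical. It shows only that feeding identical seeded inputs to two implementations makes their observed values directly comparable.

```python
import random


def sample_observations(func, seeds, n_inputs=4):
    """Run func once per seed on seed-derived inputs; return the set of results."""
    observed = set()
    for seed in seeds:
        rng = random.Random(seed)  # same seed -> same inputs for every function
        args = [rng.randint(-100, 100) for _ in range(n_inputs)]
        observed.add(func(*args))
    return observed


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two differently written versions of the same logic (illustrative only).
def clamp_sum_v1(a, b, c, d):
    return max(0, a + b + c + d)


def clamp_sum_v2(a, b, c, d):
    total = 0
    for x in (d, c, b, a):
        total += x
    if total < 0:
        return 0
    return total


# An unrelated function used as a negative example.
def product(a, b, c, d):
    return a * b * c * d


seeds = range(200)
obs_v1 = sample_observations(clamp_sum_v1, seeds)
obs_v2 = sample_observations(clamp_sum_v2, seeds)
obs_other = sample_observations(product, seeds)

same_score = jaccard(obs_v1, obs_v2)
diff_score = jaccard(obs_v1, obs_other)
print(f"same logic: {same_score:.2f}, different logic: {diff_score:.2f}")
```

Because both versions of the clamped sum see exactly the same inputs, their observation sets coincide, while the unrelated function yields a lower score. A real system would also have to handle functions whose input formats differ, which is the variation in input specifications that the abstract mentions.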


