Fun2Vec: a Contrastive Learning Framework of Function-level Representation for Binary

09/06/2022
by Sun RuiJin, et al.

Function-level binary code similarity detection is essential in the field of cyberspace security. It helps find bugs and detect patent infringements in released software, and it plays a key role in preventing supply chain attacks. A practical embedding learning framework relies on the robustness of the vector representation of assembly code and on the accuracy of the annotation of function pairs. Supervised learning methods are traditionally employed, but annotating function pairs with accurate labels is very difficult. These supervised methods are prone to overfitting and suffer from vector robustness issues. To mitigate these problems, we propose Fun2Vec: a contrastive learning framework of function-level representation for binary. We take an unsupervised learning approach and formulate binary code similarity detection as instance discrimination. Fun2Vec works directly on disassembled binary functions and can be implemented with any encoder. It does not require manually labeled information about which functions are similar or dissimilar. We use compiler optimization options and code obfuscation techniques to generate augmented data. Our experimental results demonstrate that our method surpasses the state of the art in accuracy and has a clear advantage in few-shot settings.
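The instance-discrimination idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which loss Fun2Vec uses, so the InfoNCE-style loss below is an assumption borrowed from common contrastive learning practice, not the authors' method. The names `cosine`, `info_nce`, `view_a`, `view_b`, and the temperature value are hypothetical, and the toy vectors stand in for encoder outputs of the same functions under two augmentations (for example, two compiler optimization levels).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss (assumed, not taken from the paper).

    anchors[i] and positives[i] are embeddings of the same function under
    two augmentations; every other row of positives acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching index i as the correct "class".
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings standing in for encoder outputs (hypothetical values).
view_a = [[1.0, 0.0], [0.0, 1.0]]   # functions f0, f1 under augmentation A
view_b = [[0.9, 0.1], [0.1, 0.9]]   # the same functions under augmentation B

aligned = info_nce(view_a, view_b)          # each function matches its own variant
shuffled = info_nce(view_a, view_b[::-1])   # variants deliberately mismatched
print(aligned < shuffled)
```

In this sketch the loss is small when each function's two views are close to each other and far from the views of other functions, which is the behavior an instance-discrimination objective rewards. No labeled similar or dissimilar pairs are involved; the pairing comes only from the augmentation step.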


research
05/18/2023

GraphMoco:a Graph Momentum Contrast Model that Using Multimodel Structure Information for Large-scale Binary Function Representation Learning

The ability to compute similarity scores of binary code at the function ...
research
03/15/2022

InfoDCL: A Distantly Supervised Contrastive Learning Framework for Social Meaning

Existing supervised contrastive learning frameworks suffer from two majo...
research
10/10/2021

Weakly Supervised Contrastive Learning

Unsupervised visual representation learning has gained much attention fr...
research
06/13/2020

Adversarial Self-Supervised Contrastive Learning

Existing adversarial learning approaches mostly use class labels to gene...
research
07/13/2022

Unsupervised Visual Representation Learning by Synchronous Momentum Grouping

In this paper, we propose a genuine group-level contrastive visual repre...
research
06/07/2019

Software Ethology: An Accurate and Resilient Semantic Binary Analysis Framework

When reverse engineering a binary, the analyst must first understand the...
research
11/13/2018

SAFE: Self-Attentive Function Embeddings for Binary Similarity

The binary similarity problem consists in determining if two functions a...
