GraphMoco:a Graph Momentum Contrast Model that Using Multimodel Structure Information for Large-scale Binary Function Representation Learning

05/18/2023
by Sun RuiJin, et al.

The ability to compute similarity scores for binary code at the function level is essential for cybersecurity. A single binary file can contain tens of thousands of functions, so a deployable learning framework for cybersecurity applications must be not only accurate but also efficient on large amounts of data. Traditional methods suffer from two drawbacks. First, it is very difficult to annotate pairs of functions with accurate labels, and supervised learning methods can easily overfit to inaccurate labels. Second, they rely on either a pre-trained encoder or fine-grained graph comparison, both of which have shortcomings in time or memory consumption. We focus on large-scale Binary Code Similarity Detection (BCSD). To mitigate these problems, we propose GraphMoco: a graph momentum contrast model that uses multimodal structure information for large-scale binary function representation learning. We take an unsupervised learning approach and make full use of the structural information in the binary code, so the method does not require manually labelled similar or dissimilar function pairs. Our models perform efficiently on large amounts of training data. Our experimental results show that our method outperforms the state of the art in terms of accuracy.
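The abstract does not describe the model's internals, so the following is only a minimal sketch of the generic momentum-contrast idea the name refers to, not the authors' implementation. It assumes function embeddings are plain vectors compared by cosine similarity, uses an InfoNCE-style loss, and updates a key encoder as an exponential moving average of a query encoder. The names `cosine`, `info_nce`, and `momentum_update`, the temperature of 0.07, and the momentum of 0.999 are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE-style contrastive loss (hypothetical helper).

    The loss is small when the query is more similar to its positive key
    than to any negative key. No manual similar/dissimilar labels are
    needed: the positive is another view of the same function.
    """
    logits = [cosine(query, positive_key) / temperature]
    logits += [cosine(query, k) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Move key-encoder parameters slowly toward the query encoder (EMA)."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Toy usage: two views of the same function versus unrelated functions.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce(query, positive, negatives)
```

In this sketch the negatives stand in for a queue of embeddings of other functions; a slowly updated key encoder keeps those queued embeddings consistent across training steps, which is what lets momentum-contrast methods use many negatives without recomputing them.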

research
09/06/2022

Fun2Vec:a Contrastive Learning Framework of Function-level Representation for Binary

Function-level binary code similarity detection is essential in the fiel...
research
07/25/2023

Speech representation learning: Learning bidirectional encoders with single-view, multi-view, and multi-task methods

This thesis focuses on representation learning for sequence data over ti...
research
08/22/2017

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

The problem of cross-platform binary code similarity detection aims at d...
research
06/10/2021

Semantic-aware Binary Code Representation with BERT

A wide range of binary analysis applications, such as bug discovery, mal...
research
11/27/2018

Learning State Representations in Complex Systems with Multimodal Data

Representation learning becomes especially important for complex systems...
research
08/26/2019

Embarrassingly Simple Binary Representation Learning

Recent binary representation learning models usually require sophisticat...
research
06/20/2018

Injecting Relational Structural Representation in Neural Networks for Question Similarity

Effectively using full syntactic parsing information in Neural Networks ...
