GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching

04/10/2023
by Ali TehraniJamsaz, et al.

Matching binary code to source code, and vice versa, has applications in computer security, software engineering, and reverse engineering. Although methods exist that match source code with binary code to accelerate reverse engineering, most of them target a single programming language. In practice, however, programs are developed in different programming languages depending on their requirements, so cross-language binary-to-source code matching has recently gained more attention. Existing approaches still struggle to make precise predictions because of the inherent difficulty of matching binary code and source code across programming languages. In this paper, we address the problem of cross-language binary-source code matching. We propose GraphBinMatch, an approach based on a graph neural network that learns the similarity between binary and source code. We evaluate GraphBinMatch on several tasks, including cross-language binary-to-source code matching and cross-language source-to-source matching. We also evaluate its performance on single-language binary-to-source code matching. Experimental results show that GraphBinMatch significantly outperforms the state of the art, with score improvements as high as 15.


research
01/19/2022

Cross-Language Binary-Source Code Matching with Intermediate Representations

Binary-source code matching plays an important role in many security and...
research
11/02/2017

BinPro: A Tool for Binary Source Code Provenance

Enforcing open source licenses such as the GNU General Public License (G...
research
11/21/2012

Scaling Genetic Programming for Source Code Modification

In Search Based Software Engineering, Genetic Programming has been used ...
research
01/19/2021

Improving type information inferred by decompilers with supervised machine learning

In software reverse engineering, decompilation is the process of recover...
research
12/10/2021

BCD: A Cross-Architecture Binary Comparison Database Experiment Using Locality Sensitive Hashing Algorithms

Given a binary executable without source code, it is difficult to determ...
research
04/28/2019

A Feature Based Methodology for Variable Requirements Reverse Engineering

In the past years, software reverse engineering dealt with source code u...
