Boosting Neural Networks to Decompile Optimized Binaries

01/03/2023
by Ying Cao, et al.

Decompilation aims to transform a low-level program language (LPL) (e.g., a binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.

research
05/20/2019

Towards Neural Decompilation

We address the problem of automatic decompilation, converting a program ...
research
08/09/2023

Evaluating and Optimizing the Effectiveness of Neural Machine Translation in Supporting Code Retrieval Models: A Study on the CAT Benchmark

Neural Machine Translation (NMT) is widely applied in software engineeri...
research
06/30/2022

Code Translation with Compiler Representations

In this paper, we leverage low-level compiler intermediate representatio...
research
05/24/2019

Compiler Design for Legal Document Translation in Digital Government

One of the main purposes of a computer is automation. In fact, automatio...
research
02/18/2020

A Survey of Deep Learning Techniques for Neural Machine Translation

In recent years, natural language processing (NLP) has got great develop...
research
02/05/2022

Source Matching and Rewriting

A typical compiler flow relies on a uni-directional sequence of translat...
research
02/22/2022

Learning to Combine Instructions in LLVM Compiler

Instruction combiner (IC) is a critical compiler optimization pass, whic...
