Semantics-Recovering Decompilation through Neural Machine Translation

12/22/2021
by Ruigang Liang, et al.

Decompilation transforms low-level programming languages (PLs), such as binary code, into high-level PLs, such as C/C++. It is widely used when analysts perform security analysis, such as vulnerability search and malware analysis, on software whose source code is unavailable. However, current decompilation tools usually require substantial expert effort, sometimes spanning years, to write the decompilation rules, and those rules need long-term maintenance as the syntax of the high-level or low-level PL changes. Moreover, an ideal decompiler should concisely generate high-level code that is functionally similar to the low-level input and that carries semantic information (e.g., meaningful variable names), just like human-written code. Unfortunately, existing techniques based on manually defined rules only restore the low-level PL to a functionally similar high-level PL and cannot recover semantic information. In this paper, we propose a novel neural decompilation approach that translates low-level PL into accurate and user-friendly high-level PL, effectively improving its readability and understandability. We implement the proposed approach as SEAM. Evaluations on four real-world applications show that SEAM has an average accuracy of 94.41%, outperforming prior neural machine translation (NMT) models. Finally, we evaluate the effectiveness of semantic information recovery through a questionnaire survey; the average accuracy is 92.64%, which is comparable or superior to state-of-the-art compilers.

research
05/20/2019

Towards Neural Decompilation

We address the problem of automatic decompilation, converting a program ...
research
12/07/2022

Systematic review of automatic translation of high-level security policy into firewall rules

Firewalls are security devices that perform network traffic filtering. T...
research
06/07/2019

Software Ethology: An Accurate and Resilient Semantic Binary Analysis Framework

When reverse engineering a binary, the analyst must first understand the...
research
06/28/2019

A Neural-based Program Decompiler

Reverse engineering of binary executables is a critical problem in the c...
research
05/03/2018

The Effectiveness of Low-Level Structure-based Approach Toward Source Code Plagiarism Level Taxonomy

Low-level approach is a novel way to detect source code plagiarism. Such...
research
04/07/2023

Revisiting Deep Learning for Variable Type Recovery

Compiled binary executables are often the only available artifact in rev...
research
08/31/2018

Wasabi: A Framework for Dynamically Analyzing WebAssembly

WebAssembly is the new low-level language for the web and has now been i...
