Towards Neural Decompilation

05/20/2019
by Omer Katz, et al.

We address the problem of automatic decompilation: converting a program in a low-level representation back to a higher-level, human-readable programming language. Decompilation is extremely important for security researchers, because finding vulnerabilities and understanding how malware operates is much easier when done over source code. This importance has motivated the construction of hand-crafted, rule-based decompilers, designed by experts to detect specific control-flow structures and idioms in low-level code and lift them to source level. The cost of supporting additional languages or new language features in these decompilers is very high. We present a novel approach to decompilation based on neural machine translation. The main idea is to automatically learn a decompiler from a given compiler. Given a compiler from a source language S to a target language T, our approach automatically trains a decompiler that can translate (decompile) T back to S. We used our framework to decompile both LLVM IR and x86 assembly to C code with high success rates. Using our LLVM and x86 instantiations, we were able to successfully decompile over 97
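The idea of learning a decompiler from a given compiler can be illustrated with a small data-generation sketch: sample source programs, run them through the compiler, and keep the (compiled, source) pairs as training data for a translation model. Everything named below is an assumption made for illustration and is not taken from the paper: `toy_compile` is a hypothetical stand-in for a real compiler (it turns flat expressions into stack-machine code, left to right, ignoring operator precedence), and `random_source` and `make_pairs` are hypothetical helpers.

```python
import random

# Hypothetical stand-in for a real compiler from S to T (assumption, not the
# paper's setup): flat expressions such as "x + y * z" become stack-machine
# code, evaluated strictly left to right with no operator precedence.
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def toy_compile(src: str) -> str:
    tokens = src.split()
    out = [f"push {tokens[0]}"]
    # Remaining tokens come in (operator, operand) pairs.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        out.append(f"push {operand}")
        out.append(OPCODES[op])
    return " ; ".join(out)


def random_source(rng: random.Random) -> str:
    """Sample a small random program in the toy source language S."""
    parts = [rng.choice("xyz")]
    for _ in range(rng.randint(0, 2)):
        parts += [rng.choice("+-*"), rng.choice("xyz")]
    return " ".join(parts)


def make_pairs(compiler, n: int, seed: int = 0):
    """Build (T, S) pairs: the model input is compiled code, the label is source."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        src = random_source(rng)
        pairs.append((compiler(src), src))
    return pairs


if __name__ == "__main__":
    for low_level, source in make_pairs(toy_compile, 3):
        print(f"{low_level!r} -> {source!r}")
```

Because the pairs come from running the compiler itself, swapping in a different compiler changes the training data without any hand-written lifting rules, which is the property the abstract emphasizes. Training the translation model on these pairs is outside the scope of this sketch.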


