Augmenting Decompiler Output with Learned Variable Names and Types

08/13/2021
by Qibin Chen, et al.

A common tool used by security professionals for reverse-engineering binaries found in the wild is the decompiler. A decompiler attempts to reverse compilation, transforming a binary to a higher-level language such as C. High-level languages ease reasoning about programs by providing useful abstractions such as loops, typed variables, and comments, but these abstractions are lost during compilation. Decompilers are able to deterministically reconstruct structural properties of code, but comments, variable names, and custom variable types are technically impossible to recover. In this paper we present DIRTY (DecompIled variable ReTYper), a novel technique for improving the quality of decompiler output that automatically generates meaningful variable names and types. Empirical evaluation on a novel dataset of C code mined from GitHub shows that DIRTY outperforms prior approaches by a sizable margin, recovering the original names written by developers 66.4
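To make concrete what is lost during compilation, the sketch below pairs a function as a developer might write it with a hand-written imitation of decompiler-style output for the same logic. It is an illustration only, not output from any real decompiler and not an example from the paper: the `player_t` type, both function names, and the placeholder identifiers `a1`, `a2`, `v2`, `v3` are all assumptions, and the second function assumes `player_t` occupies 8 bytes with `score` at offset 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, invented for this illustration. */
typedef struct {
    int32_t id;
    int32_t score;
} player_t;

/* What a developer might write: descriptive names and a custom struct type. */
int total_score(const player_t *players, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += players[i].score;
    }
    return total;
}

/* Hand-written imitation of decompiler-style output for the same logic.
 * The loop structure survives, but the names are placeholders and the
 * struct is flattened to raw pointer arithmetic.
 * Assumes sizeof(player_t) == 8 and that score sits at byte offset 4. */
int sub_401000(const void *a1, size_t a2) {
    int v2 = 0;
    for (size_t v3 = 0; v3 < a2; v3++) {
        v2 += *(const int32_t *)((const char *)a1 + 8 * v3 + 4);
    }
    return v2;
}
```

Both functions compute the same sum, but only the first tells the reader what the data means. Recovering names like `players` and `count` and a type like `player_t` from code shaped like the second function is the kind of task the abstract describes.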

