Can We Generate Shellcodes via Natural Language? An Empirical Study

02/08/2022
by Pietro Liguori, et al.

Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. Shellcodes in particular are time-consuming and technically challenging to develop, because they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from natural language with high accuracy and that, in many cases, it can generate entire shellcodes with no errors.
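
The abstract distinguishes between accuracy on individual assembly snippets and generating an entire program with no errors. The Python sketch below illustrates that distinction only. It is not the set of metrics proposed in the paper, which the abstract does not define; the function names, the exact-match criterion, and the three sample instructions are assumptions made for illustration.

```python
from typing import List


def normalize(line: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(line.lower().split())


def snippet_accuracy(predicted: List[str], reference: List[str]) -> float:
    """Fraction of lines whose prediction matches the reference exactly (hypothetical metric)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of lines")
    if not reference:
        return 0.0
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return matches / len(reference)


def whole_program_correct(predicted: List[str], reference: List[str]) -> bool:
    """True only if every line matches, i.e. the whole program has no errors."""
    return len(predicted) == len(reference) and all(
        normalize(p) == normalize(r) for p, r in zip(predicted, reference)
    )


# Illustrative data only: three generic IA-32 instructions, not taken from the dataset.
reference = ["xor eax, eax", "push eax", "mov ebx, esp"]
predicted = ["xor eax, eax", "push  EAX", "mov ecx, esp"]  # last line uses the wrong register

line_score = snippet_accuracy(predicted, reference)
program_ok = whole_program_correct(predicted, reference)
```

In this example two of the three lines match after normalization, yet the program as a whole counts as incorrect, which is why line-level and program-level results are worth reporting separately.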


