Neural Fuzzing: A Neural Approach to Generate Test Data for File Format Fuzzing

12/24/2018
by Morteza Zakeri Nasrabadi, et al.

This article presents the design and implementation of a file format fuzzer. Files are significant inputs to most real-world applications. A substantial difficulty in generating input files as test data is recognizing the underlying structure and format of the files. To distinguish the pure data stored in a file from the meta-data describing the file format, this article proposes a deep learning method based on a neural language model. The resulting learned model can be applied as a hybrid test data generator that generates and fuzzes both the textual and non-textual sections of the input file. Moreover, the model can be applied to generate test data that fuzzes both the meta-data and the ordinary data stored in the file. Our experiments with two well-known fuzzing tools, AFL and Learn&Fuzz, demonstrate that the proposed method achieves relatively high code coverage. The experiments also indicate that simple neural language models provide a more accurate learning model than complicated encoder-decoder models.
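The general idea of generating and fuzzing test data from a learned language model can be sketched as follows. This is only an illustration under loud assumptions: it substitutes a simple character n-gram frequency model for the neural language model, it handles textual content only, and the names `train_char_model`, `generate`, `order`, and `fuzz_rate` are hypothetical and do not come from the article. The fuzz step (occasionally choosing the least likely next character) is one plausible strategy, not necessarily the one the authors use.

```python
import random
from collections import Counter, defaultdict

START, END = "\x02", "\x03"  # assumed start/end markers, not part of any file format


def train_char_model(samples, order=2):
    """Count next-character frequencies for every `order`-character context."""
    counts = defaultdict(Counter)
    for text in samples:
        padded = START * order + text + END
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts


def generate(counts, order=2, fuzz_rate=0.1, max_len=200, rng=None):
    """Sample one string; with probability `fuzz_rate` take the rarest continuation."""
    rng = rng or random.Random()
    context = START * order
    out = []
    while len(out) < max_len:
        dist = counts.get(context)
        if not dist:  # unseen context: nothing learned to continue from
            break
        chars = list(dist)
        if rng.random() < fuzz_rate:
            # Fuzz step: deliberately pick the least frequent next character.
            ch = min(chars, key=lambda c: dist[c])
        else:
            # Normal step: sample in proportion to the observed frequencies.
            ch = rng.choices(chars, weights=[dist[c] for c in chars])[0]
        if ch == END:
            break
        out.append(ch)
        context = (context + ch)[-order:]
    return "".join(out)


if __name__ == "__main__":
    corpus = ["<a><b></b></a>", "<a></a>", "<b><a></a></b>"]
    model = train_char_model(corpus)
    for seed in range(3):
        print(repr(generate(model, fuzz_rate=0.2, rng=random.Random(seed))))
```

With `fuzz_rate=0.0` the sketch only reproduces patterns it has seen, which tends to yield well-formed inputs. Raising `fuzz_rate` injects unlikely characters at random positions, trading well-formedness for a better chance of reaching error-handling code.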


research
12/15/2020

Looking for non-compliant documents using error messages from multiple parsers

Whether a file is accepted by a single parser is not a reliable indicati...
research
11/27/2018

Wrangling Messy CSV Files by Detecting Row and Type Patterns

It is well known that data scientists spend the majority of their time o...
research
10/11/2021

Integrating Structural Description of Data Format Information into Programming to Auto-generate File Reading Programs

File reading is the basis for data sharing and scientific computing. How...
research
08/12/2017

SigViewer: Visualizing Multimodal Signals Stored in XDF (Extensible Data Format) Files

Multimodal biosignal acquisition is facilitated by recently introduced s...
research
01/20/2022

Statistical detection of format dialects using the weighted Dowker complex

This paper provides an experimentally validated, probabilistic model of ...
research
01/21/2021

Content-Based Textual File Type Detection at Scale

Programming language detection is a common need in the analysis of large...
research
07/13/2023

scda: A Minimal, Serial-Equivalent Format for Parallel I/O

We specify a file-oriented data format suitable for parallel, partition-...
