Augmenting Greybox Fuzzing with Generative AI

06/11/2023
by Jie Hu, et al.

Real-world programs expecting structured inputs often have a format-parsing stage gating the deeper program space. Neither a mutation-based approach nor a generative approach can provide a solution that is both effective and scalable. Large language models (LLMs) pre-trained on enormous natural-language corpora have proved effective at understanding implicit format syntax and generating format-conforming inputs. In this paper, we propose ChatFuzz, a greybox fuzzer augmented by generative AI. More specifically, we pick a seed from the fuzzer's seed pool and prompt ChatGPT generative models to produce variations, which are more likely to be format-conforming and thus of high quality. We conduct extensive experiments to explore the best practices for harnessing the power of generative LLMs. The experimental results show that our approach improves edge coverage by 12.77% over the SOTA greybox fuzzer (AFL++) on 12 target programs from three well-tested benchmarks. As for vulnerability detection, ChatFuzz performs similarly to or better than AFL++ on programs with explicit syntax rules, but not on programs with non-trivial syntax.
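The seed-pool workflow described above can be sketched as a toy coverage-guided loop. This is a minimal illustration under stated assumptions, not ChatFuzz's implementation: `toy_target` is a made-up parser with a format gate, `stub_llm_variations` is a hypothetical offline stand-in for the ChatGPT call (no real API is invoked), and both the three-variations-per-query count and the `llm_every` schedule for mixing LLM queries with byte mutations are illustrative choices, not parameters from the paper.

```python
import random
from typing import Callable, List, Set, Tuple

Edge = Tuple[int, int]  # (previous basic block, current basic block)


def toy_target(data: bytes) -> Set[Edge]:
    """Hypothetical program under test with a format-parsing gate.

    Deeper blocks are reached only when the input looks like a
    brace-delimited record containing "key". Returns the set of
    control-flow edges the input executed (AFL-style edge coverage).
    """
    trace = [0]
    if data.startswith(b"{"):
        trace.append(1)
        if data.endswith(b"}"):
            trace.append(2)
            if b'"key"' in data:
                trace.append(3)  # the "deep" program space
    else:
        trace.append(4)  # rejected by the parser
    return set(zip(trace, trace[1:]))


def byte_mutate(seed: bytes, rng: random.Random) -> bytes:
    """Classic mutation: overwrite one random byte (length is preserved)."""
    if not seed:
        return bytes([rng.randrange(256)])
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def stub_llm_variations(seed: bytes, n: int, rng: random.Random) -> List[bytes]:
    """Hypothetical offline stand-in for prompting a chat model.

    A real setup would send the seed inside a prompt such as "generate n
    inputs in the same format as this one" and parse the reply. Here we
    fake format-aware output so the sketch runs without network access.
    """
    variations = []
    for _ in range(n):
        if seed.startswith(b"{") and seed.endswith(b"}"):
            variations.append(b'{"key": %d}' % rng.randint(0, 99))
        else:
            variations.append(b"{" + seed + b"}")  # wrap in the expected delimiters
    return variations


def fuzz(target: Callable[[bytes], Set[Edge]], seeds: List[bytes],
         iterations: int, llm_every: int, rng: random.Random):
    """Coverage-guided loop; every `llm_every`-th iteration (0 = never)
    asks the LLM stand-in for variations instead of mutating bytes."""
    pool = list(seeds)
    covered: Set[Edge] = set()
    for s in pool:
        covered |= target(s)
    for it in range(iterations):
        seed = rng.choice(pool)
        if llm_every and it % llm_every == 0:
            candidates = stub_llm_variations(seed, 3, rng)
        else:
            candidates = [byte_mutate(seed, rng)]
        for cand in candidates:
            new_edges = target(cand) - covered
            if new_edges:  # keep only inputs that reach unseen edges
                covered |= new_edges
                pool.append(cand)
    return pool, covered


if __name__ == "__main__":
    for label, every in (("byte mutation only", 0), ("with LLM variations", 4)):
        _, edges = fuzz(toy_target, [b"hello"], 200, every, random.Random(1))
        print(f"{label}: {len(edges)} edges, deep branch reached: {(2, 3) in edges}")
```

Because `byte_mutate` never changes input length, starting from `b"hello"` it can never produce a brace-delimited record containing `"key"` (that needs at least seven bytes), whereas the LLM stand-in emits format-conforming inputs that pass the gate. The toy only mirrors the abstract's argument in miniature and says nothing about real-world performance.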


