Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT

by   Yinlin Deng, et al.

Deep Learning (DL) library bugs affect downstream DL applications, emphasizing the need for reliable systems. Generating valid input programs for fuzzing DL libraries is challenging due to the need for satisfying both language syntax/semantics and constraints for constructing valid computational graphs. Recently, the TitanFuzz work demonstrates that modern Large Language Models (LLMs) can be directly leveraged to implicitly learn all the constraints to generate valid DL programs for fuzzing. However, LLMs tend to generate ordinary programs following similar patterns seen in their massive training corpora, while fuzzing favors unusual inputs that cover edge cases or are unlikely to be manually produced. To fill this gap, this paper proposes FuzzGPT, the first technique to prime LLMs to synthesize unusual programs for fuzzing. FuzzGPT is built on the well-known hypothesis that historical bug-triggering programs may include rare/valuable code ingredients important for bug finding. Traditional techniques leveraging such historical information require intensive human efforts to design dedicated generators and ensure the validity of generated programs. FuzzGPT demonstrates that this process can be fully automated via the intrinsic capabilities of LLMs (including fine-tuning and in-context learning), while being generalizable and applicable to challenging domains. While FuzzGPT can be applied with different LLMs, this paper focuses on the powerful GPT-style models: Codex and CodeGen. Moreover, FuzzGPT also shows the potential of directly leveraging the instruct-following capability of the recent ChatGPT for effective fuzzing. Evaluation on two popular DL libraries (PyTorch and TensorFlow) shows that FuzzGPT can substantially outperform TitanFuzz, detecting 76 bugs, with 49 already confirmed as previously unknown bugs, including 11 high-priority bugs or security vulnerabilities.


Fuzzing Deep-Learning Libraries via Large Language Models

Detecting bugs in Deep Learning (DL) libraries is critical for almost al...

Fuzzing Deep-Learning Libraries via Automated Relational API Inference

A growing body of research has been dedicated to DL model testing. Howev...

Toward Understanding Deep Learning Framework Bugs

DL frameworks are the basis of constructing all DL programs and models, ...

ACETest: Automated Constraint Extraction for Testing Deep Learning Operators

Deep learning (DL) applications are prevalent nowadays as they can help ...

When and Why Test Generators for Deep Learning Produce Invalid Inputs: an Empirical Study

Testing Deep Learning (DL) based systems inherently requires large and r...

Fuzzing Automatic Differentiation in Deep-Learning Libraries

Deep learning (DL) has attracted wide attention and has been widely depl...

Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer

JavaScript (JS) engine vulnerabilities pose significant security threats...

Please sign up or login with your details

Forgot password? Click here to reset