Fuzzing Deep-Learning Libraries via Large Language Models

by Yinlin Deng, et al.

Detecting bugs in Deep Learning (DL) libraries is critical for almost all downstream DL systems, since their effectiveness and safety for end users depend on it. As such, researchers have started developing various fuzzing and testing techniques targeting DL libraries. Previous work can be broadly classified into API-level fuzzing and model-level fuzzing. However, neither type of technique can detect bugs that are exposed only by complex API sequences: API-level fuzzers cannot cover API sequences, while model-level fuzzers can cover only specific API sequence patterns and a small subset of APIs, due to the complicated input/shape constraints of tensor computations. To address these limitations, we propose LLMFuzz, the first automated approach that directly leverages Large Pre-trained Language Models (LLMs) to generate input programs for fuzzing DL libraries. LLMs are trained on billions of code snippets and can autoregressively generate human-like code snippets. Our key insight is that the training corpora of modern LLMs also include numerous code snippets invoking DL library APIs, so the models can implicitly learn the intricate DL API constraints and directly generate/mutate valid DL programs for fuzzing DL libraries. More specifically, we first use a generative LLM (e.g., Codex) to generate high-quality seed programs based on input prompts. Then, we leverage an evolutionary fuzzing loop that applies an infilling LLM (e.g., InCoder) to perform small mutations on the seed programs, generating more diverse API sequences for fuzzing DL libraries. Our experimental results on popular DL libraries demonstrate that LLMFuzz is able to cover 91.11% / 24.09% more APIs than state-of-the-art fuzzers on TensorFlow / PyTorch. Furthermore, LLMFuzz is able to detect 65 bugs, with 41 already confirmed as previously unknown bugs.
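The two-stage workflow described in the abstract (seed programs from a generative model, then an evolutionary loop that masks part of a program and asks an infilling model to fill it back in) can be illustrated with a minimal sketch. This is not the authors' implementation: `MASK`, `mask_random_line`, `toy_infill`, `is_valid`, `fitness`, and `evolutionary_fuzz` are hypothetical names, the infilling model is replaced by a random stub so the example runs without any LLM, validity is approximated by a syntax check rather than by executing the program against a DL library, and the fitness score (number of distinct `torch.*` names) is an assumption chosen only for illustration.

```python
import random
import re
from typing import Callable, List

MASK = "<SPAN>"  # hypothetical mask token handed to the infilling model


def mask_random_line(program: str, rng: random.Random) -> str:
    """Replace one randomly chosen line of a non-empty program with MASK."""
    lines = program.splitlines()
    lines[rng.randrange(len(lines))] = MASK
    return "\n".join(lines)


def toy_infill(masked: str, rng: random.Random) -> str:
    """Stand-in for an infilling LLM: fills MASK with a canned line.

    The last candidate is deliberately malformed, to mimic a model
    that sometimes produces invalid code.
    """
    candidates = [
        "y = torch.relu(x)",
        "y = torch.tanh(x)",
        "y = torch.abs(x) +",
    ]
    return masked.replace(MASK, rng.choice(candidates))


def is_valid(program: str) -> bool:
    """Assumption: a syntax check stands in for actually running the program."""
    try:
        compile(program, "<fuzz>", "exec")
        return True
    except SyntaxError:
        return False


def fitness(program: str) -> int:
    """Assumption: score a program by how many distinct torch.* names it uses."""
    return len(set(re.findall(r"torch\.\w+", program)))


def evolutionary_fuzz(
    seeds: List[str],
    infill: Callable[[str], str],
    iterations: int,
    rng: random.Random,
    pool_size: int = 8,
) -> List[str]:
    """Mutate seeds by mask-and-infill, keeping the fittest programs as parents."""
    pool = list(seeds)
    generated: List[str] = []
    for _ in range(iterations):
        parent = rng.choice(pool)
        child = infill(mask_random_line(parent, rng))
        if not is_valid(child):
            continue  # discard mutants that are not even syntactically valid
        generated.append(child)
        pool.append(child)
        pool.sort(key=fitness, reverse=True)
        del pool[pool_size:]  # keep only the top-scoring programs
    return generated


def demo(seed: int = 0) -> List[str]:
    """Run the loop on one hand-written seed program (stored as text only)."""
    rng = random.Random(seed)
    seed_program = (
        "import torch\n"
        "x = torch.randn(2, 3)\n"
        "y = torch.relu(x)\n"
        "print(y.sum())"
    )
    return evolutionary_fuzz(
        [seed_program], lambda masked: toy_infill(masked, rng), 20, rng
    )


if __name__ == "__main__":
    print(len(demo()), "syntactically valid mutants generated")
```

In a real setting, the stub would be replaced by a call to an infilling model, and each surviving mutant would be executed against the library under test, with crashes or inconsistent results flagged as candidate bugs.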




Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT

Deep Learning (DL) library bugs affect downstream DL applications, empha...

RULF: Rust Library Fuzzing via API Dependency Graph Traversal

Robustness is a key concern for Rust library development because Rust pr...

TorchBench: Benchmarking PyTorch with High API Surface Coverage

Deep learning (DL) has been a revolutionary technique in various domains...

Muffin: Testing Deep Learning Libraries via Neural Architecture Fuzzing

Deep learning (DL) techniques are proven effective in many challenging t...

ADELT: Transpilation Between Deep Learning Frameworks

We propose Adversarial DEep Learning Transpiler (ADELT) for source-to-so...

Fuzzing Automatic Differentiation in Deep-Learning Libraries

Deep learning (DL) has attracted wide attention and has been widely depl...

HOPPER: Interpretative Fuzzing for Libraries

Despite the fact that the state-of-the-art fuzzers can generate inputs e...
