FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models

09/11/2023
by   Dongyu Yao, et al.
0

Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit meticulously crafted prompts to elicit content that violates service guidelines, have captured the attention of research communities. While model owners can defend against individual jailbreak prompts through safety training strategies, this relatively passive approach struggles to handle the broader category of similar jailbreaks. To tackle this issue, we introduce FuzzLLM, an automated fuzzing framework designed to proactively test and discover jailbreak vulnerabilities in LLMs. We utilize templates to capture the structural integrity of a prompt and isolate key features of a jailbreak class as constraints. By integrating different base classes into powerful combo attacks and varying the elements of constraints and prohibited questions, FuzzLLM enables efficient testing with reduced manual effort. Extensive experiments demonstrate FuzzLLM's effectiveness and comprehensiveness in vulnerability discovery across various LLMs.

READ FULL TEXT
research
03/07/2023

Vulnerability Mimicking Mutants

With the increasing release of powerful language models trained on large...
research
09/19/2023

GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Large language models (LLMs) have recently experienced tremendous popula...
research
04/10/2022

Is GitHub's Copilot as Bad As Humans at Introducing Vulnerabilities in Code?

Several advances in deep learning have been successfully applied to the ...
research
08/21/2023

Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions

The assessment of cybersecurity Capture-The-Flag (CTF) exercises involve...
research
08/24/2023

Prompt-Enhanced Software Vulnerability Detection Using ChatGPT

With the increase in software vulnerabilities that cause significant eco...
research
06/09/2023

COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models

Prompt-based learning has been proved to be an effective way in pre-trai...
research
09/04/2023

Open Sesame! Universal Black Box Jailbreaking of Large Language Models

Large language models (LLMs), designed to provide helpful and safe respo...

Please sign up or login with your details

Forgot password? Click here to reset