A Chinese Prompt Attack Dataset for LLMs with Evil Content

09/21/2023
by Chengyuan Liu, et al.

Large Language Models (LLMs) demonstrate impressive capabilities in text understanding and generation. However, LLMs risk generating harmful content, especially when deployed in real-world applications. Several black-box attack methods, such as prompt attacks, can change the behaviour of LLMs and induce them to produce unexpected answers containing harmful content. Researchers are interested in prompt attack and defense for LLMs, but no publicly available dataset exists to evaluate a model's ability to defend against prompt attacks. In this paper, we introduce a Chinese Prompt Attack Dataset for LLMs, called CPAD. Our prompts aim to induce LLMs to generate unexpected outputs through several carefully designed prompt attack approaches covering widely discussed attack content. Unlike previous safety-evaluation datasets, we construct the prompts along three dimensions: contents, attacking methods, and goals, so that the responses can be easily evaluated and analysed. We run several well-known Chinese LLMs on our dataset, and the results show that our prompts are significantly harmful to LLMs, with an attack success rate of around 70%. We release CPAD to encourage further studies on prompt attack and defense.
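The abstract describes measuring attack success rate across three dimensions (contents, attacking methods, goals). The sketch below shows one hypothetical way such an evaluation loop might look; the JSONL schema, the field names ("prompt", "content", "method", "goal"), the is_harmful judge, and the query_llm callable are all illustrative assumptions, not the paper's actual pipeline.

import json
from collections import defaultdict

def is_harmful(response: str, goal: str) -> bool:
    # Placeholder judge: a real evaluation would use a trained
    # classifier or human annotation to decide whether the response
    # fulfils the attack goal, not a substring check.
    return goal.lower() in response.lower()

def evaluate(dataset_path: str, query_llm) -> dict:
    # Query the target LLM with each attack prompt and aggregate
    # attack success rate per (dimension, value) pair.
    totals = defaultdict(int)
    successes = defaultdict(int)
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)  # assumed: one attack prompt per JSON line
            response = query_llm(ex["prompt"])
            hit = is_harmful(response, ex["goal"])
            for dim in ("content", "method", "goal"):
                key = (dim, ex[dim])
                totals[key] += 1
                successes[key] += hit
    return {k: successes[k] / totals[k] for k in totals}

Grouping successes by each dimension separately is what lets per-category results (e.g., which attacking method is most effective) fall out of a single pass over the dataset.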

