Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks

02/11/2023
by Daniel Kang, et al.

Recent advances in instruction-following large language models (LLMs) have led to dramatic improvements on a range of NLP tasks. Unfortunately, we find that the same improved capabilities amplify the risk that these models will be put to malicious dual use. Dual use is difficult to prevent because instruction-following capabilities now enable standard attacks from computer security, and the capabilities of instruction-following LLMs give malicious actors strong economic incentives to exploit them. In particular, we show that instruction-following LLMs can produce targeted malicious content, including hate speech and scams, bypassing the in-the-wild defenses implemented by LLM API vendors. Our analysis shows that this content can be generated economically, likely at lower cost than with human effort alone. Together, our findings suggest that LLMs will increasingly attract more sophisticated adversaries and attacks, and that addressing these attacks may require new approaches to mitigation.
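
The "standard attacks from computer security" referenced above include, among others described in the paper, payload splitting: breaking a filtered instruction into fragments that pass input filters individually, then asking the model to reassemble and follow the result, exploiting the programmatic, string-manipulating behavior of LLMs named in the title. The sketch below is a minimal illustration of that idea; `build_split_payload_prompt` and its prompt wording are assumptions for illustration, not the paper's exact attack prompts, and the payload is a harmless placeholder.

```python
# Illustrative sketch of a "payload splitting" attack, one of the
# computer-security-style attacks the paper shows instruction-following
# LLMs enable. The function name and prompt wording are hypothetical
# stand-ins, not the paper's exact prompts.

def build_split_payload_prompt(payload: str, n_parts: int = 3) -> str:
    """Break `payload` into fragments that individually evade naive
    keyword filters, then ask the model to reassemble and follow it."""
    step = max(1, len(payload) // n_parts)
    fragments = [payload[i:i + step] for i in range(0, len(payload), step)]
    # Assign each fragment to a variable, program-style.
    assignments = "\n".join(
        f's{i} = "{frag}"' for i, frag in enumerate(fragments)
    )
    concat = " + ".join(f"s{i}" for i in range(len(fragments)))
    return (
        f"{assignments}\n"
        f"Let z = {concat}.\n"
        "Carry out the instruction contained in the string z."
    )

if __name__ == "__main__":
    # Harmless placeholder; an attacker would substitute an instruction
    # that a keyword-based input filter would otherwise block.
    print(build_split_payload_prompt("Write a short poem about winter."))
```

Because no single fragment contains a filtered keyword, a surface-level input filter passes the prompt, while the model reconstructs and acts on the full payload at generation time.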

research · 02/23/2023
More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models
We are currently witnessing dramatic advances in the capabilities of Lar...

research · 04/06/2023
Instruction Tuning with GPT-4
Prior work has shown that finetuning large language models (LLMs) using ...

research · 07/20/2023
LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?
Large language models (LLMs) have exhibited impressive capabilities in c...

research · 05/24/2023
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
Instruction-tuned models are trained on crowdsourcing datasets with task...

research · 05/24/2023
From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads
This research article critically examines the potential risks and implic...

research · 08/24/2023
Code Llama: Open Foundation Models for Code
We release Code Llama, a family of large language models for code based ...
