Ignore Previous Prompt: Attack Techniques For Language Models

11/17/2022
by   Fábio Perez, et al.
0

Transformer-based large language models (LLMs) provide a powerful foundation for natural language tasks in large-scale customer-facing applications. However, studies that explore their vulnerabilities emerging from malicious user interaction are scarce. By proposing PromptInject, a prosaic alignment framework for mask-based iterative adversarial prompt composition, we examine how GPT-3, the most widely deployed language model in production, can be easily misaligned by simple handcrafted inputs. In particular, we investigate two types of attacks – goal hijacking and prompt leaking – and demonstrate that even low-aptitude, but sufficiently ill-intentioned agents, can easily exploit GPT-3's stochastic nature, creating long-tail risks. The code for PromptInject is available at https://github.com/agencyenterprise/PromptInject.

READ FULL TEXT
research
02/08/2023

Training-free Lexical Backdoor Attacks on Language Models

Large-scale language models have achieved tremendous success across vari...
research
04/29/2020

TextAttack: A Framework for Adversarial Attacks in Natural Language Processing

TextAttack is a library for running adversarial attacks against natural ...
research
06/21/2023

Mass-Producing Failures of Multimodal Systems with Language Models

Deployed multimodal systems can fail in ways that evaluators did not ant...
research
06/03/2021

Defending against Backdoor Attacks in Natural Language Generation

The frustratingly fragile nature of neural network models make current n...
research
03/20/2023

Large Language Models and Simple, Stupid Bugs

With the advent of powerful neural language models, AI-based systems to ...
research
05/01/2023

Evaluating statistical language models as pragmatic reasoners

The relationship between communicated language and intended meaning is o...
research
05/15/2023

Memorization for Good: Encryption with Autoregressive Language Models

Over-parameterized neural language models (LMs) can memorize and recite ...

Please sign up or login with your details

Forgot password? Click here to reset