Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success

07/13/2023
by   Yiming Zhang, et al.
0

The generations of large language models are commonly controlled through prompting techniques, where a user's query to the model is prefixed with a prompt that aims to guide the model's behaviour on the query. The prompts used by companies to guide their models are often treated as secrets, to be hidden from the user making the query. They have even been treated as commodities to be bought and sold. However, there has been anecdotal evidence showing that the prompts can be extracted by a user even when they are kept secret. In this paper, we present a framework for systematically measuring the success of prompt extraction attacks. In experiments with multiple sources of prompts and multiple underlying language models, we find that simple text-based attacks can in fact reveal prompts with high probability.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/14/2020

Extracting Training Data from Large Language Models

It has become common to publish large (billion parameter) language model...
research
08/07/2023

CAESURA: Language Models as Multi-Modal Query Planners

Traditional query planners translate SQL queries into query plans to be ...
research
11/05/2018

Model Extraction and Active Learning

Machine learning is being increasingly used by individuals, research ins...
research
09/13/2022

PINCH: An Adversarial Extraction Attack Framework for Deep Learning Models

Deep Learning (DL) models increasingly power a diversity of applications...
research
05/19/2023

Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

Large Language Models (LLMs) are known to memorize significant portions ...
research
08/09/2021

IntenT5: Search Result Diversification using Causal Language Models

Search result diversification is a beneficial approach to overcome under...
research
09/19/2023

Model Leeching: An Extraction Attack Targeting LLMs

Model Leeching is a novel extraction attack targeting Large Language Mod...

Please sign up or login with your details

Forgot password? Click here to reset