Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection

09/06/2023
by Yu Chen, et al.

Numerous companies have begun offering services based on large language models (LLMs), such as ChatGPT, which inevitably raises privacy concerns because users' prompts are exposed to the model provider. Previous research on secure inference using multi-party computation (MPC) has proven impractical for LLM applications because it is time-consuming and communication-intensive. While lightweight anonymization techniques can protect private information in prompts through substitution or masking, they cannot restore the sensitive information that was replaced once the LLM returns its results. In this paper, we expand the application scenarios of anonymization techniques by training a small local model to de-anonymize the LLM's returned results with minimal computational overhead. We introduce the HaS framework, where "H(ide)" and "S(eek)" denote its two core processes: hiding private entities for anonymization and seeking them for de-anonymization. To quantitatively assess the privacy protection offered by HaS, we propose both black-box and white-box adversarial models. Furthermore, we conduct experiments to evaluate HaS's usability on translation and classification tasks. The experimental findings demonstrate that the HaS framework achieves an optimal balance between privacy protection and utility.
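
The hide/seek round trip described above can be pictured with a minimal sketch. Note this is an illustration, not the paper's implementation: the entity table and the seek step are simplified placeholders (the paper trains a small local model for de-anonymization, whereas the sketch below uses exact reverse substitution).

```python
# Minimal sketch of a HaS-style hide/seek pipeline.
# Assumptions (not from the paper): private entities are given as a fixed
# mapping, and "seek" is a reverse-dictionary lookup; the actual framework
# trains a small local model to de-anonymize the LLM output.

def hide(prompt: str, entity_map: dict[str, str]) -> str:
    """Replace each private entity with a surrogate before the prompt leaves the device."""
    for original, surrogate in entity_map.items():
        prompt = prompt.replace(original, surrogate)
    return prompt

def seek(llm_output: str, entity_map: dict[str, str]) -> str:
    """Restore the original entities in the LLM's returned result, locally."""
    for original, surrogate in entity_map.items():
        llm_output = llm_output.replace(surrogate, original)
    return llm_output

def call_llm(anonymized_prompt: str) -> str:
    """Stand-in for the remote LLM call; only anonymized text is sent out."""
    return f"Result for: {anonymized_prompt}"

if __name__ == "__main__":
    # Mapping from real entities to surrogates of the same type.
    entities = {"Alice Zhang": "Bella Wu", "Acme Corp": "Orbit Ltd"}
    prompt = "Translate: Alice Zhang joined Acme Corp in 2021."

    hidden = hide(prompt, entities)      # what the model provider sees
    answer = call_llm(hidden)            # remote inference on anonymized text
    restored = seek(answer, entities)    # local de-anonymization

    print(hidden)    # Translate: Bella Wu joined Orbit Ltd in 2021.
    print(restored)  # original entities restored in the final output
```

In the real framework, the seek step must handle outputs in which the surrogate entities have been transformed by the task itself (for example, translated or reordered), which is why a trained small local model is used rather than exact string matching.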
