The Limits of Word Level Differential Privacy

05/02/2022
by   Justus Mattern, et al.

As the issues of privacy and trust receive increasing attention within the research community, various attempts have been made to anonymize textual data. A significant subset of these approaches incorporates differentially private mechanisms that perturb word embeddings and thereby replace individual words in a sentence. While these methods are important contributions, offer various advantages over other techniques, and do demonstrate anonymization capabilities, they have several shortcomings. In this paper, we investigate these weaknesses and demonstrate significant mathematical constraints that diminish the theoretical privacy guarantee, as well as major practical shortcomings concerning protection against deanonymization attacks, preservation of the content of the original sentences, and the quality of the language output. Finally, we propose a new method for text anonymization based on transformer-based language models fine-tuned for paraphrasing that circumvents most of the identified weaknesses and also offers a formal privacy guarantee. We evaluate the performance of our method in thorough experiments and demonstrate superior performance over the discussed mechanisms.
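The word-level mechanisms the abstract critiques typically work by adding calibrated noise to a word's embedding and then snapping the noisy vector to the nearest word in the vocabulary. The following is a minimal, hypothetical sketch of that idea; the toy vocabulary, the 2-d embeddings, and the function names are illustrative assumptions, not code or parameters from the paper.

```python
# Hedged sketch of word-level privatization via noisy embeddings:
# 1) add noise (density proportional to exp(-epsilon * ||z||)) to the word's
#    embedding, 2) replace the word with the nearest vocabulary word.
# The vocabulary and embedding values below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with 2-d embeddings (hypothetical values).
vocab = ["doctor", "nurse", "patient", "hospital", "car"]
emb = np.array([
    [1.0, 1.0],
    [1.1, 0.9],
    [0.9, 1.2],
    [1.3, 1.1],
    [5.0, 5.0],
])

def sample_multivariate_laplace(dim: int, epsilon: float) -> np.ndarray:
    """Sample noise with density proportional to exp(-epsilon * ||z||):
    a uniformly random direction scaled by a Gamma-distributed radius."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return radius * direction

def privatize_word(word: str, epsilon: float) -> str:
    """Noise the word's embedding, then snap to the nearest word."""
    noisy = emb[vocab.index(word)] + sample_multivariate_laplace(emb.shape[1], epsilon)
    dists = np.linalg.norm(emb - noisy, axis=1)
    return vocab[int(np.argmin(dists))]

# Large epsilon (weak privacy): the word is almost always unchanged.
# Small epsilon (strong privacy): the output spreads over the vocabulary,
# which is exactly the utility loss the paper analyzes.
print(privatize_word("doctor", epsilon=100.0))
print(privatize_word("doctor", epsilon=0.1))
```

This sketch also makes the paper's point visible: because the mechanism acts on isolated words, sentence structure is left untouched and the privacy/utility trade-off is governed entirely by how far the noisy vector drifts from its source embedding.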

