DP-BART for Privatized Text Rewriting under Local Differential Privacy

02/15/2023
by Timour Igamberdiev, et al.

Privatized text rewriting with local differential privacy (LDP) is a recent approach that enables sharing of sensitive textual documents while formally guaranteeing privacy protection to individuals. However, existing systems face several issues, such as formal mathematical flaws, unrealistic privacy guarantees, privatization of only individual words, and a lack of transparency and reproducibility. In this paper, we propose a new system, 'DP-BART', that substantially outperforms existing LDP systems. Our approach uses a novel clipping method, iterative pruning, and further training of internal representations, which together drastically reduce the amount of noise required for DP guarantees. We run experiments on five textual datasets of varying sizes, rewriting them at different privacy levels and evaluating the rewritten texts on downstream text classification tasks. Finally, we thoroughly discuss the privatized text rewriting approach and its limitations, including the strict text adjacency constraint of the LDP paradigm, which leads to the high noise requirement.
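
The abstract does not spell out the mechanism, so the following is only a rough sketch of why clipping and pruning internal representations can lower the noise needed under LDP. It assumes a generic setup that is not necessarily DP-BART's exact method: each coordinate of an encoder output is clipped into [-c, c]; because under the strict LDP adjacency constraint any two texts count as neighbors, the worst-case L2 sensitivity is 2c times the square root of the dimension; and noise comes from the classical Gaussian mechanism (valid for 0 < epsilon < 1). Pruning is modeled as keeping a fixed, data-independent subset of coordinates. All function names (clip_by_value, l2_sensitivity, gaussian_sigma, prune, privatize) and all numbers are hypothetical.

```python
import math
import random
from typing import List


def clip_by_value(z: List[float], c: float) -> List[float]:
    """Clip every coordinate of a representation into [-c, c] (assumed clipping scheme)."""
    return [max(-c, min(c, x)) for x in z]


def l2_sensitivity(dim: int, c: float) -> float:
    """Worst-case L2 distance between two clipped vectors of length `dim`.

    Under LDP any two input texts are treated as adjacent, so each coordinate
    can move from -c to +c, i.e. by up to 2c.
    """
    return 2.0 * c * math.sqrt(dim)


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Noise scale of the classical Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("classical Gaussian bound requires 0 < epsilon < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def prune(z: List[float], keep: List[int]) -> List[float]:
    """Hypothetical stand-in for pruning: keep a fixed, data-independent set of coordinates."""
    return [z[i] for i in keep]


def privatize(z: List[float], c: float, epsilon: float, delta: float,
              rng: random.Random) -> List[float]:
    """Clip, then add isotropic Gaussian noise calibrated to the clipped range."""
    clipped = clip_by_value(z, c)
    sigma = gaussian_sigma(l2_sensitivity(len(clipped), c), epsilon, delta)
    return [x + rng.gauss(0.0, sigma) for x in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 768-dimensional encoder output.
    z = [rng.uniform(-3.0, 3.0) for _ in range(768)]
    keep = list(range(128))  # hypothetical fixed pruning mask

    noisy_full = privatize(z, c=0.1, epsilon=0.5, delta=1e-5, rng=rng)
    noisy_pruned = privatize(prune(z, keep), c=0.1, epsilon=0.5, delta=1e-5, rng=rng)

    s_full = gaussian_sigma(l2_sensitivity(768, 0.1), 0.5, 1e-5)
    s_pruned = gaussian_sigma(l2_sensitivity(128, 0.1), 0.5, 1e-5)
    print(f"sigma full={s_full:.3f}, pruned={s_pruned:.3f}, ratio={s_full / s_pruned:.3f}")
```

In this sketch the noise scale grows linearly with the clipping bound c and with the square root of the dimension, so shrinking c or pruning coordinates reduces the noise for the same (epsilon, delta). Because every pair of texts is adjacent, the sensitivity always spans the full clipping range, which illustrates why the LDP adjacency constraint keeps the noise requirement high.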

research
08/22/2022

DP-Rewrite: Towards Reproducibility and Transparency in Differentially Private Text Rewriting

Text rewriting with differential privacy (DP) provides concrete theoreti...
research
06/17/2021

Accuracy, Interpretability, and Differential Privacy via Explainable Boosting

We show that adding differential privacy to Explainable Boosting Machine...
research
01/11/2022

Achieving Differential Privacy with Matrix Masking in Big Data

Differential privacy schemes have been widely adopted in recent years to...
research
02/24/2022

Bounding Membership Inference

Differential Privacy (DP) is the de facto standard for reasoning about t...
research
03/06/2023

Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting

Most tasks in NLP require labeled data. Data labeling is often done on c...
research
06/02/2021

Differential Privacy for Text Analytics via Natural Text Sanitization

Texts convey sophisticated knowledge. However, texts also convey sensiti...
research
11/26/2018

Generalised Differential Privacy for Text Document Processing

We address the problem of how to "obfuscate" texts by removing stylistic...