Differential Privacy for Text Analytics via Natural Text Sanitization

06/02/2021
by Xiang Yue, et al.

Texts convey sophisticated knowledge, but they also convey sensitive information. Despite the success of general-purpose language models and domain-specific mechanisms with differential privacy (DP), existing text sanitization mechanisms still provide low utility, a consequence of the curse of dimensionality in high-dimensional text representations. The companion issue of using sanitized texts for downstream analytics is also under-explored. This paper takes a direct approach to text sanitization. Our insight is to consider both sensitivity and similarity via a new local DP notion. The sanitized texts also feed our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility. Surprisingly, the high utility does not increase the success rate of inference attacks.
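
To make the idea of a similarity-aware local DP notion more concrete, here is a minimal sketch of one common construction: each token is replaced by a token sampled with probability proportional to exp(-epsilon * d / 2), where d is the distance between embeddings. This is an assumption for illustration only. The abstract does not specify the mechanism, and the toy vocabulary, the 2-D vectors, and the names `EMBEDDINGS`, `replacement_probs`, and `sanitize` are all hypothetical. The sketch covers only the similarity side and does not model how sensitive tokens might be treated differently.

```python
import math
import random

# Hypothetical toy vocabulary with made-up 2-D embeddings (illustration only).
EMBEDDINGS = {
    "doctor": (0.9, 0.1),
    "nurse": (0.8, 0.2),
    "banana": (0.0, 1.0),
}


def replacement_probs(token, epsilon, embeddings=EMBEDDINGS):
    """Return P(y | token), proportional to exp(-epsilon * d(token, y) / 2).

    d is the Euclidean distance between embeddings, so nearby tokens are
    more likely replacements. Assumed mechanism, not taken from the abstract.
    """
    x = embeddings[token]
    weights = {
        y: math.exp(-0.5 * epsilon * math.dist(x, v))
        for y, v in embeddings.items()
    }
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def sanitize(tokens, epsilon, rng=None):
    """Independently replace each token by sampling from replacement_probs."""
    rng = rng or random.Random()
    sanitized = []
    for token in tokens:
        probs = replacement_probs(token, epsilon)
        candidates = list(probs)
        weights = [probs[c] for c in candidates]
        sanitized.append(rng.choices(candidates, weights=weights, k=1)[0])
    return sanitized
```

With this sampling rule, the output distributions for two input tokens x and x' differ by a factor of at most exp(epsilon * d(x, x')), a distance-scaled relaxation of local DP. A smaller epsilon pushes the replacement distribution toward uniform (more privacy, less utility), while a larger epsilon keeps most tokens close to the original.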

research
09/13/2020

Differentially Private Language Models Benefit from Public Pre-training

Language modeling is a keystone task in natural language processing. Whe...
research
04/15/2021

Privacy-Adaptive BERT for Natural Language Understanding

When trying to apply the recent advance of Natural Language Understandin...
research
10/25/2022

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Privacy concerns have attracted increasing attention in data-driven prod...
research
03/12/2021

Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

Neural language models are known to have a high capacity for memorizatio...
research
02/15/2023

DP-BART for Privatized Text Rewriting under Local Differential Privacy

Privatized text rewriting with local differential privacy (LDP) is a rec...
research
06/02/2023

Guiding Text-to-Text Privatization by Syntax

Metric Differential Privacy is a generalization of differential privacy ...
research
08/07/2020

Privacy Guarantees for De-identifying Text Transformations

Machine Learning approaches to Natural Language Processing tasks benefit...
