TEM: High Utility Metric Differential Privacy on Text

07/16/2021
by Ricardo Silva Carvalho, et al.

Ensuring the privacy of users whose data are used to train Natural Language Processing (NLP) models is necessary to build and maintain customer trust. Differential Privacy (DP) has emerged as the most successful method to protect the privacy of individuals. However, applying DP to the NLP domain comes with unique challenges. The most successful previous methods use a generalization of DP for metric spaces, and apply the privatization by adding noise to inputs in the metric space of word embeddings. However, these methods assume that one specific distance measure is being used, ignore the density of the space around the input, and assume the embeddings used have been trained on non-sensitive data. In this work we propose Truncated Exponential Mechanism (TEM), a general method that allows the privatization of words using any distance metric, on embeddings that can be trained on sensitive data. Our method makes use of the exponential mechanism to turn the privatization step into a selection problem. This allows the noise applied to be calibrated to the density of the embedding space around the input, and makes domain adaptation possible for the embeddings. In our experiments, we demonstrate that our method significantly outperforms the state-of-the-art in terms of utility for the same level of privacy, while providing more flexibility in the metric selection.
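To make the "privatization as selection" idea concrete, below is a minimal, illustrative sketch of an exponential-mechanism word-selection step in Python. It is not the paper's exact TEM construction (which additionally truncates the candidate set and calibrates the score scaling); the function name `exponential_mechanism_word`, the epsilon/2 factor, the default Euclidean metric, and the toy vocabulary are assumptions made purely for illustration.

```python
import numpy as np

def exponential_mechanism_word(word, vocab, embeddings, epsilon, metric=None):
    """Sample a privatized replacement for `word` with probability proportional
    to exp(epsilon * utility / 2), where a candidate's utility is the negative
    distance between its embedding and the input word's embedding."""
    if metric is None:
        # Euclidean distance by default; any metric on the embedding space can be plugged in.
        metric = lambda u, v: float(np.linalg.norm(u - v))

    x = embeddings[word]
    scores = np.array([-metric(x, embeddings[w]) for w in vocab])

    # Exponentiate with a max-shift for numerical stability, then normalize to probabilities.
    logits = (epsilon / 2.0) * scores
    logits -= logits.max()
    probs = np.exp(logits)
    probs /= probs.sum()
    return np.random.choice(vocab, p=probs)


# Toy usage with random embeddings; in practice these would come from a trained (possibly
# sensitive) embedding model, as the abstract describes.
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car", "truck"]
embeddings = {w: rng.normal(size=50) for w in vocab}
print(exponential_mechanism_word("cat", vocab, embeddings, epsilon=2.0))
```

Scoring candidates by negative distance is what lets the output distribution adapt to the local density of the embedding space: in a dense neighbourhood many close candidates share most of the probability mass, while in a sparse region the input word itself dominates.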


Related research

05/03/2018
Metric-based local differential privacy for statistical applications
Local differential privacy (LDP) is a distributed variant of differentia...

10/19/2020
Locality Sensitive Hashing with Extended Differential Privacy
Extended differential privacy, a generalization of standard differential...

09/19/2023
A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings
We propose a Neighbourhood-Aware Differential Privacy (NADP) mechanism c...

07/16/2021
BRR: Preserving Privacy of Text Data Efficiently on Device
With the use of personal devices connected to the Internet for tasks suc...

06/02/2023
Guiding Text-to-Text Privatization by Syntax
Metric Differential Privacy is a generalization of differential privacy ...

10/22/2020
A Differentially Private Text Perturbation Method Using a Regularized Mahalanobis Metric
Balancing the privacy-utility tradeoff is a crucial requirement of many ...

10/20/2019
Privacy- and Utility-Preserving Textual Analysis via Calibrated Multivariate Perturbations
Accurately learning from user data while providing quantifiable privacy ...
