N-grams Bayesian Differential Privacy

01/29/2021
by Osman Ramadan, et al.

Differential privacy has gained popularity in machine learning as a strong privacy guarantee, in contrast to privacy mitigation techniques such as k-anonymity. However, applying differential privacy to n-gram counts significantly degrades the utility of derived language models due to their large vocabularies. We propose a differential privacy mechanism that uses public data as a prior in a Bayesian setup to provide tighter bounds on the privacy loss metric ε, and thus better privacy-utility trade-offs. It first transforms the counts to log space, approximating the distributions of the public and private data as Gaussian. The posterior distribution is then evaluated, and softmax is applied to produce a probability distribution. This technique achieves improvements of up to 85% over known mechanisms at ε = 0.1. We compare our mechanism to k-anonymity in an n-gram language modelling task and show that it offers competitive performance at large vocabulary sizes while also providing superior privacy protection.
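
The abstract gives only a high-level recipe (log-space counts, Gaussian approximation, Bayesian posterior, softmax), so the sketch below illustrates that pipeline in Python. Everything beyond those four named steps is an assumption: the function name bayesian_dp_ngram_probs, the noise_scale parameter, the Gaussian noise model on the private log-counts, and the unit prior variance are illustrative, and the sketch does not reproduce the paper's calibration of noise to a target ε.

```python
import numpy as np

def bayesian_dp_ngram_probs(private_counts, public_counts, noise_scale=1.0, rng=None):
    """Sketch of the described mechanism: log-transform the n-gram counts,
    treat the public log-counts as a Gaussian prior and the noisy private
    log-counts as a Gaussian observation, take the conjugate posterior
    mean, and map it through softmax.

    noise_scale and the unit prior variance are assumptions, not the
    paper's epsilon calibration.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = 1e-8  # small offset so zero counts survive the log transform

    log_private = np.log(np.asarray(private_counts, dtype=float) + eps)
    log_public = np.log(np.asarray(public_counts, dtype=float) + eps)

    # Assumed noise model: Gaussian noise added to the private statistics.
    noisy_private = log_private + rng.normal(0.0, noise_scale, size=log_private.shape)

    # Conjugate Gaussian update: precision-weighted average of the public
    # prior mean and the noisy private observation.
    prior_var = 1.0               # assumed prior variance
    like_var = noise_scale ** 2   # observation variance from the added noise
    posterior_mean = (like_var * log_public + prior_var * noisy_private) / (prior_var + like_var)

    # Softmax turns the posterior log-scores into an n-gram distribution.
    z = posterior_mean - posterior_mean.max()
    p = np.exp(z)
    return p / p.sum()

# Example: a small private corpus smoothed toward larger public counts.
private = [12, 0, 3, 7]
public = [900, 150, 400, 550]
print(bayesian_dp_ngram_probs(private, public, noise_scale=0.5))
```

The precision-weighted average is the standard conjugate-Gaussian posterior mean; it matches the abstract's description of a posterior that shrinks noisy private estimates toward the public prior, with noisier observations pulled more strongly toward the public counts.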


