N-grams Bayesian Differential Privacy

by Osman Ramadan, et al.

Differential privacy has gained popularity in machine learning as a strong privacy guarantee, in contrast to privacy mitigation techniques such as k-anonymity. However, applying differential privacy to n-gram counts significantly degrades the utility of derived language models due to their large vocabularies. We propose a differential privacy mechanism that uses public data as a prior in a Bayesian setup to provide tighter bounds on the privacy loss metric epsilon, and thus better privacy-utility trade-offs. It first transforms the counts to log space, approximating the distribution of the public and private data as Gaussian. The posterior distribution is then evaluated and softmax is applied to produce a probability distribution. This technique achieves up to 85% reduction in KL divergence compared to previously known mechanisms at epsilon equals 0.1. We compare our mechanism to k-anonymity in an n-gram language modelling task and show that it offers competitive performance at large vocabulary sizes, while also providing superior privacy protection.
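The pipeline described in the abstract (log-space counts, a Gaussian prior from public data, a posterior, then softmax) can be illustrated with a minimal sketch. This is not the authors' mechanism: the function names, the additive smoothing, the fixed prior variance, and the Gaussian noise scale `noise_std` are all assumptions made for illustration, and the noise is not calibrated to any particular epsilon.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_posterior_mean(prior_mean, prior_var, obs, obs_var):
    """Conjugate Gaussian update: precision-weighted average of prior and observation."""
    precision = 1.0 / prior_var + 1.0 / obs_var
    return (prior_mean / prior_var + obs / obs_var) / precision


def private_ngram_distribution(public_counts, private_counts,
                               prior_var=1.0, noise_std=1.0,
                               smoothing=1.0, rng=None):
    """Hypothetical sketch: combine public and noised private log-counts.

    noise_std is an illustrative value only; it is NOT derived from a
    target epsilon, so this function makes no formal privacy claim.
    """
    rng = rng or random.Random()
    # Step 1: move counts to log space (additive smoothing avoids log(0)).
    log_public = [math.log(c + smoothing) for c in public_counts]
    log_private = [math.log(c + smoothing) for c in private_counts]
    # Step 2: perturb the private log-counts with Gaussian noise.
    noisy_private = [x + rng.gauss(0.0, noise_std) for x in log_private]
    # Step 3: posterior mean per n-gram, using the public data as the prior.
    posterior = [
        gaussian_posterior_mean(p, prior_var, y, noise_std ** 2)
        for p, y in zip(log_public, noisy_private)
    ]
    # Step 4: softmax turns the posterior log-scores into probabilities.
    return softmax(posterior)


if __name__ == "__main__":
    probs = private_ngram_distribution(
        public_counts=[120, 30, 5, 0],
        private_counts=[80, 45, 2, 1],
        rng=random.Random(0),
    )
    print(probs)
```

In this sketch, a larger `noise_std` lowers the weight of the private observation in the posterior, so the output leans more heavily on the public prior. That is one intuition for how a public prior could soften the utility loss from noise.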




