Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

Neural language models are known to have a high capacity for memorizing training samples, which can have serious privacy implications when models are trained on user content such as email correspondence. Differential privacy (DP), a popular choice for training models with privacy guarantees, comes with significant costs in utility degradation and disparate impact on subgroups of users. In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a triplet-loss term. We compare our methods with DP through extensive evaluation. We show the advantages of our regularizers: a favorable utility-privacy trade-off, faster training with the ability to tap into existing optimization approaches, and more uniform treatment of under-represented subgroups.
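To illustrate the triplet-loss idea mentioned in (2), here is a minimal sketch of a standard triplet-loss term combined with a task loss as a weighted regularizer. This is a generic formulation from representation learning, not the paper's exact objective; the function names, the squared-Euclidean distance, and the weighting parameter `lam` are all illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on embedding vectors:
    pull the anchor toward the positive, push it away
    from the negative by at least `margin`."""
    d_pos = float(np.sum((anchor - positive) ** 2))  # squared distance to positive
    d_neg = float(np.sum((anchor - negative) ** 2))  # squared distance to negative
    return max(0.0, d_pos - d_neg + margin)

def joint_loss(task_loss, anchor, positive, negative, lam=0.1, margin=1.0):
    """Hypothetical joint objective: task loss plus a weighted
    triplet regularizer, so utility and the regularization term
    are optimized together."""
    return task_loss + lam * triplet_loss(anchor, positive, negative, margin)
```

In this sketch, a higher `lam` trades task utility for a stronger regularization signal, mirroring the utility-privacy trade-off discussed in the abstract.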

Related research

- Federated Learning of Gboard Language Models with Differential Privacy (05/29/2023): We train language models (LMs) with federated learning (FL) and differen...
- On the utility and protection of optimization with differential privacy and classic regularization techniques (09/07/2022): Nowadays, owners and developers of deep learning models must consider st...
- User-Entity Differential Privacy in Learning Natural Language Models (11/01/2022): In this paper, we introduce a novel concept of user-entity differential ...
- OptimShare: A Unified Framework for Privacy Preserving Data Sharing – Towards the Practical Utility of Data with Privacy (06/06/2023): Tabular data sharing serves as a common method for data exchange. Howeve...
- Differential Privacy for Text Analytics via Natural Text Sanitization (06/02/2021): Texts convey sophisticated knowledge. However, texts also convey sensiti...
- Analyzing Privacy Loss in Updates of Natural Language Models (12/17/2019): To continuously improve quality and reflect changes in data, machine lea...
- Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-purpose Regularizer and Privacy Budget Retrieval and Recycling (10/16/2021): We propose Noise-Augmented Privacy-Preserving Empirical Risk Minimizatio...
