Large Language Models Can Be Strong Differentially Private Learners

10/12/2021
by   Xuechen Li, et al.

Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and straightforward attempts to apply Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead. We show that this performance drop can be mitigated with (1) the use of large pretrained models; (2) hyperparameters that suit DP optimization; and (3) fine-tuning objectives aligned with the pretraining procedure. With these factors set right, we obtain private NLP models that outperform state-of-the-art private training approaches and strong non-private baselines by directly fine-tuning pretrained models with DP optimization on moderately sized corpora. To address the computational challenge of running DP-SGD with large Transformers, we propose a memory-saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients for any layer in the model. The technique enables privately training Transformers with almost the same memory cost as non-private training, at a modest run-time overhead. Contrary to the conventional wisdom that DP optimization fails at learning high-dimensional models (due to noise that scales with dimension), our empirical results reveal that private learning with pretrained models tends not to suffer from dimension-dependent performance degradation.
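To make the memory-saving idea concrete, the sketch below illustrates (for a single linear layer, with made-up tensor shapes) the kind of identity that lets per-example gradient norms be computed from layer activations and output gradients alone, so DP-SGD clipping never needs to materialize per-example weight gradients. This is a minimal, hedged illustration, not the authors' implementation or their library's API; for a layer y = x @ W.T with per-example activations A_i and output gradients G_i, the per-example weight gradient is G_i.T @ A_i and its squared Frobenius norm equals trace((A_i @ A_i.T) @ (G_i @ G_i.T)).

```python
# Minimal sketch (assumed shapes, not the paper's code) of computing
# per-example gradient norms for a linear layer without materializing
# per-example gradients, followed by a standard DP-SGD clip-and-noise step.
import torch

B, T, d_in, d_out = 4, 16, 32, 8           # hypothetical batch/sequence/feature sizes
A = torch.randn(B, T, d_in)                # layer inputs (activations)
G = torch.randn(B, T, d_out)               # gradients w.r.t. layer outputs

# Naive route: materialize per-example weight gradients, then take norms.
per_example_grads = torch.einsum("bto,bti->boi", G, A)   # (B, d_out, d_in)
naive_norms = per_example_grads.flatten(1).norm(dim=1)

# Memory-efficient route: norms from two (T x T) Gram matrices per example.
AAt = torch.einsum("bti,bsi->bts", A, A)                 # activation inner products
GGt = torch.einsum("bto,bso->bts", G, G)                 # output-gradient inner products
ghost_norms = (AAt * GGt).sum(dim=(1, 2)).clamp_min(0).sqrt()

assert torch.allclose(naive_norms, ghost_norms, atol=1e-4)

# With per-example norms available, DP-SGD clips each example's gradient
# to norm C, sums, adds Gaussian noise of scale sigma * C, and averages.
C, sigma = 1.0, 0.5
clip_factors = (C / (naive_norms + 1e-6)).clamp(max=1.0)            # (B,)
clipped_sum = torch.einsum("b,boi->oi", clip_factors, per_example_grads)
noisy_grad = (clipped_sum + sigma * C * torch.randn_like(clipped_sum)) / B
```

In a full training loop, the clipping factors would be obtained from the norms computed the memory-efficient way and applied during a second backward pass or via reweighted losses; the sketch keeps the naive gradients only to verify that the two norm computations agree.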

Related research

10/05/2022  Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search
Models need to be trained with privacy-preserving learning algorithms to...

12/01/2021  Differentially Private SGD with Sparse Gradients
To protect sensitive training data, differentially private stochastic gr...

07/01/2022  When Does Differentially Private Learning Not Suffer in High Dimensions?
Large pretrained models can be privately fine-tuned to achieve performan...

09/13/2023  DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass
Differentially private stochastic gradient descent (DP-SGD) adds noise t...

07/14/2021  An Efficient DP-SGD Mechanism for Large Scale NLP Models
Recent advances in deep learning have drastically improved performance o...

10/15/2022  A Closer Look at the Calibration of Differentially Private Learners
We systematically study the calibration of classifiers trained with diff...

09/30/2022  Differentially Private Optimization on Large Model at Small Cost
Differentially private (DP) optimization is the standard paradigm to lea...
