DP-FP: Differentially Private Forward Propagation for Large Models

by   Jian Du, et al.

When applied to large-scale learning problems, the conventional wisdom on privacy-preserving deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD), has met with limited success due to significant performance degradation and high memory overhead compared to its non-private counterpart. We show how to mitigate the performance drop by replacing DP-SGD with a novel DP Forward-Propagation (DP-FP) followed by an off-the-shelf non-DP optimizer. Our DP-FP employs novel (1) representation clipping followed by noise addition in the forward-propagation stage, as well as (2) micro-batch construction via subsampling to achieve DP amplification and reduce the noise power to 1/M, where M is the number of micro-batches in a step. When training a classification model, our DP-FP, with all privacy-preserving operations applied to the representation, is innately free of the gradient bias, model-size-proportional total noise, and memory issues of DP-SGD. As a result, DP-FP approaches non-private baselines while retaining the same level of privacy and significantly outperforms state-of-the-art DP-SGD variants. When applied to RoBERTa-large on four downstream tasks, for example, DP-FP achieves an average accuracy of 91.34% with privacy budgets less than 3, representing a 3.81% performance improvement over the state-of-the-art DP-SGD and only a 0.9% loss compared to the non-private baseline, but with a significantly lower privacy-leakage risk.




1 Introduction

Large pretrained language models such as BERT (devlin2019bert; roberta) have gained a lot of traction in a variety of downstream tasks due to their powerful representation learning from massive data. Simultaneously, machine learning models are notorious for leaking information about their (potentially sensitive) training data. For example, it is known that adversaries can use membership inference attacks (shokri2017membership) to predict whether or not a particular data record was in a model's training data. carlini2021extracting; shokri2017membership; carlini2019secret all demonstrate how specific and sensitive training records can be extracted from pretrained models. As a result, Differential Privacy (DP) (dp_dwork; dp_dwork2), a gold-standard framework for privacy-preserving computation, is becoming increasingly important and must be guaranteed for large pretrained language models in order to protect privacy.

Unfortunately, when applied to large models, DP has typically struggled to produce useful models, resulting in models with either weak privacy guarantees (dupuy2021efficient; qu2021privacy) or performance far below non-private baselines. hoory2021learning; yu2021large; li2021large have recently demonstrated, separately, that the performance drop under DP can be reduced by fine-tuning large pretrained models with Differentially Private Stochastic Gradient Descent (DP-SGD). During backward propagation, DP-SGD clips the gradient computed from each data record and adds random noise. However, it is widely acknowledged that DP-SGD invariably degrades model performance, raising a privacy-utility trade-off, and that the degradation is especially severe for large-scale models like BERT. Aside from lowering model accuracy, per-example gradient clipping significantly increases memory cost, making DP-SGD unsuitable for large-model training.

li2021large propose a ghost clipping method to help alleviate the memory cost. Nonetheless, gradient clipping in DP-SGD remains a significant source of accuracy degradation because it introduces gradient bias into the learning process, lowering both the convergence rate and model accuracy. Furthermore, adding noise to the clipped gradient, whose total power scales with the huge model dimension (yu2021large), slows DP-SGD convergence even further. As a result, there is a nearly 8% performance drop compared to the non-DP model on the SST-2 dataset under moderate privacy guarantees (li2021large; yu2021large).

To alleviate the issues above, we leverage the post-processing property of DP (dwork2014algorithmic, Proposition 2.1) to step back from DP in the backward-propagation stage and instead enforce DP in the forward computation, at the price of leaving labels unprotected. We consider this acceptable because most classification labels, such as "True" and "False," are trivial; only the data sample without its label requires protection, as long as it cannot be linked to the training data, which holds if the training data is protected under DP. We leave DP for generation tasks as future work. In contrast to DP-SGD for large-scale models (li2021large; yu2021large), we propose Differentially Private Forward Propagation (DP-FP) followed by an off-the-shelf optimizer such as SGD or Adam, which yields classification models with much higher accuracy while imposing no extra memory or computation burden on training. In this paper, we make the following main contributions:

  1. Unlike DP-SGD, which pays a high memory cost for clipping per-example gradients, DP-FP only clips the latent representation for noise calibration during the forward-propagation stage. Because an off-the-shelf optimizer can be used, DP-FP is naturally free of gradient bias and has no additional memory cost over its non-DP counterpart.

  2. The amount of noise required no longer scales with the model size and is only related to the representation dimension. More importantly, we propose a novel micro-batch structure that is compatible with the DP-FP process and further reduces the noise power to 1/M, where M is the number of micro-batches in each step. The maximum noise-power reduction occurs when M equals the mini-batch size.

  3. We show that DP-FP significantly improves model performance with a strong privacy guarantee. For example, compared to the cutting-edge DP-SGD method under the same privacy budgets on RoBERTa-large, DP-FP achieves an average accuracy of 91.34% on four downstream tasks, which is within 0.9% of the non-private baseline and 3.81% better than the cutting-edge DP-SGD method.

2 ML Model under DP

2.1 DP Preliminary

DP Definition: DP stabilizes the output by introducing "noise," i.e., random information, into the process of computing the output of a function, such as an ML model, making it nearly impossible for an adversary to determine whether or not a specific data record, such as a sentence in the training data, was used. Thanks to this assurance, individual data records face nearly the same privacy risk whether or not they are included in the computation.

The rigorous DP definition is as follows. Let $D$ and $D'$ be neighboring datasets, i.e., $D \simeq D'$, such that one can be obtained from the other by either adding or removing a data record (dwork2006calibrating). In an NLP task, for example, $D$ and $D'$ could be two datasets of sentences that differ by only one sentence. A randomized algorithm $\mathcal{M}$ is $(\epsilon, \delta)$-DP if for any subset $S$ of outputs and all neighboring datasets $D \simeq D'$, we have

$$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta. \quad (1)$$

The probability distributions of the outputs $\mathcal{M}(D)$ and $\mathcal{M}(D')$ thus differ only by a multiplicative factor $e^{\epsilon}$ and a very small additive value $\delta$. The value of $\epsilon$ determines the strength of the privacy guarantee: the lower the value of $\epsilon$, the more privacy is guaranteed, while $\delta$ can be interpreted as the likelihood of failing to achieve DP. Note that $\mathcal{M}$ can be any complicated random function of $D$: for instance, it could be a gradient estimate in DP-SGD as shown later in Eq. (5), or a token/latent representation computed by neural networks in DP-FP as shown later in Eq. (9). To quantify privacy protection, data administrators impose a privacy-loss cap, also referred to as a privacy budget $(\epsilon, \delta)$, to calibrate the amount of random noise required to design $\mathcal{M}$.
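As a concrete toy illustration of the definition in Eq. (1) — not a mechanism used in this paper — randomized response satisfies $(\epsilon, 0)$-DP, and its worst-case output-probability ratio can be checked directly. The function names below are our own:

```python
import math
import random

def randomized_response(bit, eps, rng):
    # Report the true bit with probability e^eps / (1 + e^eps),
    # otherwise flip it; this is (eps, 0)-DP.
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p_truth else 1 - bit

def max_privacy_loss(eps):
    # Worst-case ratio Pr[M(D) = s] / Pr[M(D') = s] over outputs s and
    # neighboring inputs; for randomized response it equals e^eps.
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return p / (1.0 - p)
```

The ratio bound in Eq. (1) is tight here: the mechanism's privacy loss is exactly $e^{\epsilon}$, with $\delta = 0$.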

DP Mechanism: To achieve the $(\epsilon, \delta)$-DP protection defined in Eq. (1), the random function $\mathcal{M}$ must be specified. A simple but effective method for achieving DP protection of a function $f(D)$ is obtained by first constraining the range of $f$ with a clipping operator $\mathrm{Clip}_C(\cdot)$, and then randomizing the result with calibrated Gaussian noise (abadi2016deep; li2021large). We introduce details of $\mathrm{Clip}_C(\cdot)$ in Eq. (6) and Eq. (8) with concrete examples. The clipping threshold $C$ is also known as the sensitivity, which bounds the greatest output variation for any pair $D \simeq D'$ in terms of the $\ell_2$-norm. Mathematically, the DP mechanism is defined by

$$\mathcal{M}(D) = \mathrm{Clip}_C(f(D)) + z, \quad z \sim \mathcal{N}(0, \sigma^2 C^2 \mathbf{I}), \quad (2)$$

with $\sigma$ being calibrated according to the DP profile given by (balle2018improving):

$$\delta(\epsilon) = \Phi\!\left(-\frac{\epsilon}{\mu} + \frac{\mu}{2}\right) - e^{\epsilon}\,\Phi\!\left(-\frac{\epsilon}{\mu} - \frac{\mu}{2}\right), \quad (3)$$

$$\mu = 1/\sigma, \quad (4)$$

where $\Phi$ is the c.d.f. of the standard Gaussian distribution.
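A minimal pure-Python sketch of the clip-then-noise mechanism and the profile in Eqs. (3)-(4), assuming $\mu = 1/\sigma$; the function names are illustrative, not from the paper:

```python
import math
import random

def std_normal_cdf(x):
    # Phi(x) for the standard Gaussian.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def delta_profile(eps, mu):
    # Privacy profile of the Gaussian mechanism: the smallest delta for
    # which (eps, delta)-DP holds, with mu = 1 / sigma.
    return (std_normal_cdf(mu / 2.0 - eps / mu)
            - math.exp(eps) * std_normal_cdf(-mu / 2.0 - eps / mu))

def gaussian_mechanism(vec, C, sigma, rng):
    # Clip vec to L2 norm at most C, then add N(0, sigma^2 C^2)
    # noise to every coordinate.
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [v * scale + rng.gauss(0.0, sigma * C) for v in vec]
```

For a fixed $\mu$, $\delta(\epsilon)$ decreases rapidly in $\epsilon$; calibration inverts this relation numerically to find the $\sigma$ matching a target $(\epsilon, \delta)$.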

Post-Processing Property: A key property of DP is its immunity to post-processing (dwork2014algorithmic), which states that a differentially private output $\mathcal{M}(D)$ can be arbitrarily transformed using any data-independent function without compromising its privacy guarantees.

DP Accounting: When each DP mechanism individually meets certain DP guarantees, privacy degrades as these mechanisms are composed. Subsampling applied before a private mechanism, on the other hand, amplifies the privacy guarantee. Because differentially private model training entails a series of compositions of subsampling and updates, DP accounting for such a series of operations is required. There are several widely used privacy-accounting methodologies, such as the moments accountant (abadi2016deep), Rényi DP (RDP) (mironov2017renyi), Gaussian DP with the central limit theorem (dong2021gaussian; bu2020deep), and numerical composition of tradeoff functions (gopi2021numerical; koskela2020computing; zhu2021optimal). RDP, a generalization of the moments accountant, provides strict upper bounds on the actual privacy cost, whereas Gaussian DP with the central limit theorem (CLT), while asymptotically exact, tends to underestimate the privacy loss; composing tradeoff functions (gopi2021numerical) provides a more precise estimate. We use Gaussian DP with CLT for DP analysis due to its simplicity, and also report the total DP cost computed with the various methods listed above in the experimental analysis. To make the paper self-contained, we include more information about Gaussian DP with CLT in the appendix.

2.2 DP-SGD Challenges in Large Models

DP assurance has been incorporated into deep learning by appropriately randomizing conventional model updates so as to limit what can be inferred about the training data when the model is revealed (abadi2016deep). DP-SGD, which randomizes SGD-based optimizers such as SGD and Adam, is commonly used to achieve such a sanitized release of model parameters while concealing individual training data records.

Unlike traditional off-the-shelf optimizers, DP-SGD clips the $\ell_2$-norm of each per-sample gradient before adding calibrated noise, to protect the input data at every step. In more concrete terms, the DP mechanism for estimating the gradient in each $B$-sized batch is

$$\tilde{g} = \frac{1}{B}\left(\sum_{i=1}^{B} \mathrm{Clip}_C\!\left(\nabla_{\theta}\,\mathcal{L}(x_i)\right) + z\right), \quad z \sim \mathcal{N}(0, \sigma^2 C^2 \mathbf{I}_d), \quad (5)$$

with $d$ the model dimension and $\mathcal{L}(x_i)$ the loss function of data record $x_i$. The clipping operator, which shrinks the gradient of an individual example whenever its norm exceeds the threshold $C$, is computed by

$$\mathrm{Clip}_C(g) = g \cdot \min\!\left(1, \frac{C}{\lVert g \rVert_2}\right). \quad (6)$$

The model is then updated based on $\tilde{g}$, which is known as DP-SGD (footnote: "DP-SGD" here stands for the DP-enhanced version of most off-the-shelf optimizers, such as DP-Adam, which, like regular Adam, performs updates and moment accumulation with the privatized gradients $\tilde{g}$ of Eq. (5); in this paper we simply refer to all such DP-version optimizers as DP-SGD):

$$\theta_{t+1} = \theta_t - \eta\,\tilde{g}_t. \quad (7)$$

Since $\tilde{g}$ is DP-guaranteed with a per-step privacy cost by Eq. (5), the updated model on the left-hand side of Eq. (7) is DP-protected at the same cost, as DP is immune to post-processing. Furthermore, for $T$-step updates, the total privacy cost can be calculated by composing the costs of all steps using the methods discussed in the DP Accounting paragraph of Section 2.1. We denote the total privacy budget by $(\epsilon, \delta)$, with $\epsilon$ the privacy-loss bound and $\delta$ the failure probability. Applying the above DP-SGD to large-scale models raises the following challenges:

  • DP-SGD requires the computation and storage of individual gradients in each step due to the clipping operation in Eq. (5). Because each individual gradient requires the same amount of memory as the model itself, storing per-example gradients in DP-SGD consumes a significant amount of memory (li2021large).

  • The clipping operation in Eq. (6) inescapably introduces a significant bias (chen2020understanding) in the update direction, hindering convergence and significantly degrading model performance in general.

  • The total noise power scales with the model size $d$, as shown in Eq. (5), and degrades model performance significantly (yu2021large). Even with small per-coordinate noise, the total noise power is still huge for large models such as BERT, where $d$ scales to 110M.
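The per-example clipping and noisy averaging of Eqs. (5)-(6) can be sketched as follows (a simplified pure-Python illustration; the names are ours):

```python
import math
import random

def clip_l2(vec, C):
    # Eq. (6): shrink vec whenever its L2 norm exceeds C.
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

def dp_sgd_gradient(per_example_grads, C, sigma, rng):
    # Eq. (5): clip each example's gradient, sum, add calibrated
    # Gaussian noise, and average over the batch of size B.
    B = len(per_example_grads)
    d = len(per_example_grads[0])
    total = [0.0] * d
    for g in per_example_grads:
        for i, v in enumerate(clip_l2(g, C)):
            total[i] += v
    return [(t + rng.gauss(0.0, sigma * C)) / B for t in total]
```

Materializing `per_example_grads` is exactly the memory bottleneck described above: each entry is as large as the model itself.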

3 DP Forward-Propagation (DP-FP)

To address the DP-SGD issues mentioned above, we propose Differentially Private Forward Propagation, or DP-FP, which protects data privacy during forward propagation so that any off-the-shelf optimizer can be used in the back-propagation stage without DP. DP-FP is therefore free of DP-SGD's flaws: massive memory consumption, gradient bias, and a large-dimension noise vector. We also propose a novel micro-batch construction method for DP amplification that further reduces the noise power in each coordinate of the added noise vector.

3.1 Forward Propagation under DP

Let $D$ denote the entire training dataset and $\mathcal{S}$ be the subsampling scheme used to build a batch for each step of the model update, e.g., shuffling, sampling with/without replacement, or Poisson sampling. The latent representation is then denoted in composition form as $f(\mathcal{S}(D))$, with $f$ being the ML model that computes the latent representation, e.g., the hidden state or pooler output of [CLS]. (Footnote: in this paper, we simply use the pooler output of [CLS] as our choice of $f$, since it is straightforward to integrate our DP outside the encoder for classification tasks without changing any code inside the encoder; furthermore, it is much smaller in size than the transformer layers preceding the pooler.) Note that subsampling schemes provide DP amplification (wang2019subsampled; dong2021gaussian), which reduces the amount of noise required under the same privacy budget.

To achieve forward propagation under DP, we first stabilize the latent representation. Because the training data is random, the output of $f$ can vary significantly, implying that the model will vary significantly depending on which data records are used for training; data privacy is thus at risk of being compromised by a membership attack. We therefore constrain $f$'s output range by clipping the representation corresponding to each data record, such as a sentence in a language model. Similarly to DP-SGD, we clip the output of $f$, shrinking the latent representation whenever its norm exceeds a threshold $C$. More formally, the clipped latent representation of a record $x$ is given by

$$r = \mathrm{Clip}_C(f(x)) = f(x)\cdot\min\!\left(1, \frac{C}{\lVert f(x)\rVert_2}\right). \quad (8)$$

The clipping operation also implies that the greatest output variation for a pair of neighboring datasets in terms of the $\ell_2$-norm is bounded by $C$. Following clipping, Gaussian noise is added to ensure DP, with details on noise-power calibration provided later in this subsection. Before the downstream classification layer, the latent representation is computed as

$$\tilde{r} = \mathrm{Clip}_C(f(x)) + z, \quad z \sim \mathcal{N}(0, \sigma^2 C^2 \mathbf{I}_K), \quad (9)$$

where $K$ is the dimension of the latent representation. Since the input data is under DP according to Eq. (9), a model updated based on the result of Eq. (9) enjoys the same DP assurance by the post-processing property of DP (dwork2014algorithmic, Proposition 2.1). As a result, after Eq. (9), a conventional SGD-based optimizer such as Adam can be used, and the privacy of $D$ is still guaranteed.
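The per-record clipping and noising before the classifier can be sketched as follows (our illustrative code, assuming, as Algorithm 1 suggests, that the operations are applied to each record's representation):

```python
import math
import random

def dp_fp_batch(reps, C, sigma, rng):
    # Clip each record's latent representation to L2 norm at most C,
    # then add N(0, sigma^2 C^2) noise to every coordinate (Eq. (9)).
    out = []
    for rep in reps:
        norm = math.sqrt(sum(v * v for v in rep))
        scale = min(1.0, C / norm) if norm > 0 else 1.0
        out.append([v * scale + rng.gauss(0.0, sigma * C) for v in rep])
    return out
```

Note that the noise dimension here is `len(rep)` — e.g., 768 for a BERT-base [CLS] pooler output — not the model size, and everything downstream (classifier, loss, optimizer) runs unmodified on the privatized representations.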

So far, we have identified the following advantages of DP-FP over DP-SGD:

Proposition 1

Small noise dimension in DP-FP: Note that $\mathbf{I}_K$ in Eq. (9) is a $K$-dimensional identity matrix, so the noise-vector dimension $K$ in DP-FP is much smaller than $d$, the noise-vector dimension in Eq. (5), which equals the model dimension in DP-SGD. Take BERT for example: $d \approx$ 110M, whereas $K$ is only 768 when the hidden state of [CLS] is used for the downstream task. Thus, DP-FP saves significant total noise power relative to DP-SGD due to this dramatic dimension reduction.

Proposition 2

Unbiased gradient in DP-FP: The result of Eq. (9) is fed to the classifier, which predicts the label distribution, computes the loss, and then performs standard backpropagation via an off-the-shelf optimizer such as Adam for the model update. As a result, DP-FP backpropagation inherits all the advantages of the non-DP optimizer and produces an unbiased estimate of the true gradient.

3.2 Micro-batch Construction for DP Amplification in DP-FP

(Footnote: The micro-batch construction in our DP-FP is used to achieve DP amplification in order to reduce the calibrated noise variance without sacrificing privacy; this is distinct from the micro-batch functionality in TensorFlow Privacy, which reduces the memory cost of the DP-SGD implementation.)

Intuitively, privacy amplification by subsampling stems from the fact that an individual data record enjoys complete privacy if it is not included in the subsample. Based on this intuition, DP-SGD benefits from batch subsampling for DP amplification (li2021large; yu2021large), which significantly reduces the calibrated noise power per coordinate.

In contrast to DP-SGD, we show that additional DP amplification can be achieved by subsampling the micro-batches that comprise a batch, resulting in lower noise power for each coordinate of the $K$-dimensional noise vector in Eq. (9). Because of the unique structure of the DP operations in forward propagation, this DP amplification is specific to DP-FP and does not exist in DP-SGD, as discussed further below.

More concretely, an independent Bernoulli trial over all data records, i.e., sentences in a dataset, is performed to construct each micro-batch with subsampling probability $p$. Clipping is applied to each latent representation corresponding to an input data record, i.e., the hidden state of [CLS] in the BERT model is clipped. In the following, we evaluate the privacy cost using the Gaussian DP (GDP) framework (dong2021gaussian), which measures the privacy profile in terms of $\mu$ using Eqs. (3) and (4). To make our paper self-contained, we include a preliminary on GDP calculation in Appendix A.1, as well as the more detailed privacy-accounting procedure described below.

Table 1: Comparison of memory cost, computational cost, the number of coordinates to which noise is added, and the noise power at each coordinate. For all methods, $B$ is the batch size and $d$ is the model size. In our DP-FP, $K$ and $M$ are the latent-representation dimension used for downstream tasks and the micro-batch number, respectively. Note that $K \ll d$: e.g., $K = 768$ in the experiments, while $d \approx$ 110 million for the BERT model. Specifically for RGP in yu2021large, $w$ is the model width, $r$ is the reparametrization rank, and $k$ is the number of power iterations. (Footnote: we use slightly different symbol notation for the complexity analysis from that in yu2021large, without assuming the weight matrix is square.)

According to the subsampling DP amplification in the Gaussian DP framework (bu2020deep), the privacy cost corresponding to each micro-batch is given by

$$\mu_{\text{micro}} = p\sqrt{e^{1/\sigma^2} - 1}, \quad (10)$$

which is a function of the subsampling probability $p$ and the noise multiplier $\sigma$; the details of its derivation are given in the appendix. In each step, the training stage executes $M$ micro-batches, and $T$ steps of updates are performed over the training dataset. Even if each micro-step is DP-protected with a privacy cost of $\mu_{\text{micro}}$, the question is whether all micro-batches remain private when combined and, if so, how privacy degrades as the number of steps increases, a process known as DP composition. According to the central limit theorem for composing the $MT$ rounds of Gaussian DP mechanisms, the total privacy cost is

$$\mu_{\text{DP-FP}} \approx p\sqrt{MT\left(e^{1/\sigma^2} - 1\right)}, \quad (11)$$

with details provided in the appendix. It is evident that the smaller $p$ is, the smaller the total privacy cost $\mu_{\text{DP-FP}}$.

Similarly, in DP-SGD (li2021large) with subsampling probability $p_{\text{batch}}$ used to construct the mini-batch, the total privacy cost is given by

$$\mu_{\text{DP-SGD}} \approx p_{\text{batch}}\sqrt{T\left(e^{1/\sigma_{\text{SGD}}^2} - 1\right)}, \quad (12)$$

where $\sigma_{\text{SGD}}$ is DP-SGD's noise multiplier. To make a fair comparison, DP-FP is set to have the same expected batch size as DP-SGD by setting $p_{\text{batch}} = Mp$. In the strong-DP regime, $1/\sigma^2$ and $1/\sigma_{\text{SGD}}^2$ are very small positive values close to zero. Thus, by taking the Taylor-series expansion of the exponential functions in (11) and (12), respectively, and using the fact that $\mu_{\text{DP-FP}} = \mu_{\text{DP-SGD}}$ for the same privacy budget, we obtain

$$\sigma^2 \approx \frac{\sigma_{\text{SGD}}^2}{M}. \quad (13)$$
As a result, we have the third advantage of DP-FP over DP-SGD:

Proposition 3

Low noise power in DP-FP: Compared with DP-SGD, DP-FP requires less than $1/M$ of the noise power per coordinate. Note that this comparison holds under the same batch size, privacy budget, and clipping value. However, as demonstrated in li2021large, larger batch sizes improve performance in DP-SGD, whereas, as demonstrated later in our experiments, DP-FP prefers small batch sizes. Because each batch is constructed by randomly sampling each data record with probability $p$, the batch size decreases as $p$ decreases, and a smaller $p$ allows an even smaller $\sigma$ according to Eq. (11).
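Proposition 3's Taylor argument can be checked numerically, assuming the CLT composition formulas $\mu_{\text{DP-FP}} = p\sqrt{MT(e^{1/\sigma^2}-1)}$ and $\mu_{\text{DP-SGD}} = Mp\sqrt{T(e^{1/\sigma_{\text{SGD}}^2}-1)}$ as stated above (a sketch under our reading of the accounting, not the paper's exact code):

```python
import math

def mu_dp_fp(p, sigma, M, T):
    # Total budget after composing M * T subsampled Gaussian micro-steps.
    return p * math.sqrt(M * T * (math.exp(1.0 / sigma ** 2) - 1.0))

def mu_dp_sgd(p_batch, sigma, T):
    # Total budget for T subsampled Gaussian steps at rate p_batch.
    return p_batch * math.sqrt(T * (math.exp(1.0 / sigma ** 2) - 1.0))
```

For example, with p = 0.001, M = 32, T = 1000, and sigma_SGD = 20, running DP-FP at sigma = sigma_SGD / sqrt(M) spends a budget within about 2% of DP-SGD's: the per-coordinate noise power is cut by roughly 1/M at essentially the same privacy, and the gap vanishes as sigma grows (the strong-DP regime).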

Finally, we summarize the DP-FP procedure in Algorithm 1, including the micro-batch construction for DP amplification. Since the noise power in each step is calibrated according to the DP budget $(\epsilon, \delta)$ and the total number of steps $T$, the budget is fully spent after $T$ steps.

0:  DP budget $(\epsilon, \delta)$, sampling rate $p$, clipping threshold $C$, and representation function $f$.
1:  Put $(\epsilon, \delta)$ into (3) and compute the total privacy parameter $\mu$.
2:  Calibrate the noise power $\sigma$ by substituting $\mu$, $p$, $M$, and $T$ into (11).
3:  for $t = 1, \ldots, T$ do
4:     for $m = 1, \ldots, M$ in parallel do { $M$ micro-batches in each step:}
5:        Sample a micro-batch with an independent Bernoulli trial for each data record
6:        Clip each sampled record's representation and add noise via Eq. (9)
7:        Update the model { SGD-based off-the-shelf optimizer for model update}
8:     end for
9:  end for
Algorithm 1 DP-FP Training
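A toy end-to-end sketch of the training loop in Algorithm 1, on a linear model with squared loss (everything here — the identity "encoder," the loss, and the update rule — is our simplification; the paper fine-tunes transformer encoders with AdamW):

```python
import math
import random

def l2_clip(vec, C):
    # Shrink vec so its L2 norm is at most C (Eq. (8)).
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

def dp_fp_train(data, w, C, sigma, p, M, T, lr, rng):
    # data: list of (feature_vector, target) pairs; w: linear weights.
    for _ in range(T):                       # line 3: T steps
        for _ in range(M):                   # line 4: M micro-batches
            for x, y in data:
                if rng.random() >= p:        # line 5: Bernoulli trial
                    continue
                r = l2_clip(x, C)            # line 6: clip representation
                r = [v + rng.gauss(0.0, sigma * C) for v in r]  # add noise
                err = sum(wi * ri for wi, ri in zip(w, r)) - y
                for i in range(len(w)):      # line 7: plain (non-DP) SGD
                    w[i] -= lr * err * r[i]
    return w
```

With sigma = 0 and p = 1 this reduces to ordinary per-example SGD, illustrating that the only DP machinery sits in the forward path.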

3.3 Comparison to existing methods

In contrast to SGD, DP-SGD necessitates the computation and storage of individual gradients, resulting in batch- and model-size-dependent memory costs. Furthermore, the total noise power in DP-SGD scales linearly with model size. As a result, applying DP-SGD to large-scale models is difficult.

li2021large propose ghost clipping, a memory-saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients. Extending the method of lee2020scaling, ghost clipping consumes nearly the same memory as non-private SGD training, at roughly double the computation cost, as validated numerically. We therefore use the memory and computational cost of SGD as a reference for ghost clipping. We also compare against the costs of the most recent memory-efficient method, reparameterized gradient perturbation (RGP) (yu2021large). While RGP is faster per update, it requires more than three times as many epochs as the ghost clipping method, as pointed out by li2021large.

4 Experiments

Model Method | ε = 3 (RDP): MNLI-(m/mm) QQP QNLI SST-2 Avg. | ε = 8 (RDP): MNLI-(m/mm) QQP QNLI SST-2 Avg.
BERT-base Non-DP | 83.97/84.46 90.99 90.09 92.55 88.41 | 83.97/84.46 90.99 90.09 92.55 88.41
BERT-base DP-SGD | 54.6/53.4 74.5 63.6 82.3 65.68
BERT-base RGP | 79.1/78.0 84.8 86.2 91.5 83.92
BERT-base DP-FP (ours) | 81.93/82.11 89.47 89.25 90.14 86.58 | 82.54/82.39 89.68 88.82 91.51 86.99
RoBERTa-base Non-DP | 87.33/87.08 91.07 91.61 94.27 90.27 | 87.33/87.08 91.07 91.61 94.27 90.27
RoBERTa-base DP-SGD | 82.47/82.10 85.41 84.62 86.12 84.14 | 83.30/83.13 86.15 84.81 85.89 84.66
RoBERTa-base RGP | – | 80.5/79.6 85.5 87.2 91.6 84.88
RoBERTa-base DP-FP (ours) | 85.84/86.18 89.92 90.35 93.23 89.10 | 86.00/85.81 89.84 90.96 93.23 89.17
RoBERTa-large Non-DP | 90.12/89.72 91.60 93.56 96.22 92.24 | 90.12/89.72 91.60 93.56 96.22 92.24
RoBERTa-large DP-SGD | 85.53/85.81 86.65 88.94 90.71 87.53 | 86.28/86.54 87.49 89.42 90.94 88.13
RoBERTa-large RGP | 86.1/86.0 86.7 90.0 93.0 88.36
RoBERTa-large DP-FP (ours) | 89.27/89.08 90.68 92.59 95.07 91.34 | 89.70/89.05 90.81 93.11 95.18 91.57
ε (Gaussian DP + CLT) | 2.52 2.52 2.00 1.73 | 5.83 5.85 4.75 4.33
ε (Compose tradeoff func.) | 2.75 2.75 2.57 2.41 | 7.15 7.16 6.87 6.69
Table 2: Accuracy scores of different methods on development sets. The DP-SGD and RGP results are documented numbers from yu2021large and li2021large. Best accuracy scores for each privacy level are in bold. "Avg." shows average scores over the four tasks. (Footnote: the DP-SGD and RGP results taken from yu2021large are reported under DP guarantees that are strictly weaker than those of li2021large's DP-SGD and our DP-FP; note that the smallest dataset among these tasks contains more than 60k records.)

4.1 Data and Settings

Following li2021large, we mainly focus on fine-tuning large pretrained models on classification tasks, including MNLI, QQP, QNLI, and SST-2 from the GLUE benchmark (wang2018glue), each of which has more than 60k training instances. We provide the data statistics for the four tasks in Appendix A.2.

Our non-differentially-private (Non-DP) baselines are fine-tuned BERT-base, RoBERTa-base, and RoBERTa-large. For classification, following common settings, we add a linear layer over the output of the encoder's pooler layer; the pooler layer simply takes the hidden state of the [CLS] token as input and applies a linear dense layer with a tanh activation. We train our baselines on the four datasets for 5 epochs with a fixed learning rate, batch size, and maximum input length. We save and validate each model every epoch and report the best accuracy scores as our baseline systems.

During the fine-tuning stage, our DP-FP method adds the noise and clipping operations (as in Algorithm 1) before the linear classification layer, based on the following two facts. First, this approach treats the pretrained encoder classes as black boxes and does not require any code change inside them. Second, for large pretrained models the noise dimension, i.e., 768 for BERT-base and RoBERTa-base and 1024 for RoBERTa-large, is fixed and much smaller than the model size, i.e., 110M for BERT-base. We then apply the standard AdamW (adamw) optimizer in the back-propagation stage. Note that we do not add any noise or clipping at inference time, as we only need to protect the fine-tuning training data. In the following experiments, our default DP-FP settings are three total fine-tuning epochs, with the clipping threshold, learning rate, maximum input length, micro-batch subsampling rate, and expected batch size held fixed across runs. We consider a practical scenario in which the total privacy budget is constrained. For both DP-FP and DP-SGD, this budget constraint corresponds to a constraint on the number of data samples used during tuning. We therefore report accuracy scores on the development sets once the privacy budget has been depleted.

To ensure a fair comparison with the DP-SGD results for large-scale models reported in the literature, we set the same privacy budget for Gaussian DP with CLT as in li2021large for each experiment, using ε values from the set considered there and δ determined by the number of records in the training set. We then numerically calibrate σ as in Line 2 of Algorithm 1. We also report the corresponding total privacy cost documented in li2021large as calculated by RDP and by composing tradeoff functions. Please see Section 2.1 and the references therein for further information.

4.2 Main Results

Table 2 shows the main results on four tasks. The larger the pretrained models, the higher the accuracies for the non-DP baselines, ranging from BERT-base (110M parameters) to RoBERTa-large (355M parameters).

We compare against full fine-tuning with reparameterized gradient perturbation (RGP) (yu2021large) and memory-efficient DP-SGD for large language models (li2021large), the state-of-the-art methods for DP fine-tuning on sentence classification at the time of writing. The DP-SGD scores are documented in li2021large; DP-SGD hurts the baseline performance by 4-6 points on average for RoBERTa-base and RoBERTa-large, even with ε as large as 8 (RDP). In particular, the RoBERTa-base results show that DP-SGD still degrades performance by up to 8 points on the SST-2 dataset. By contrast, RGP (yu2021large) reduces the gap significantly, to about 1 point on SST-2, but still trails the Non-DP BERT-base by at least 4 points on the other three tasks.

We run DP-FP at each privacy-budget level for three epochs of steps and report the final scores once the model reaches the privacy budget. Note that we use the default settings and the same hyper-parameters described in the previous subsection for every DP-FP experiment in Table 2. As shown in Table 2, DP-FP significantly improves accuracy over existing methods and closes the gap to the Non-DP counterpart: in particular, only a 0.58-point performance drop for SST-2 on RoBERTa-large with ε set to 1.73 under GDP with CLT. Averaging the DP-FP performance drops across datasets for each model, when ε = 3 (RDP) the average drops relative to the Non-DP models are within 0.83-1.78 points, and when ε = 8 they are within 0.68-1.43 points. Moreover, DP-FP is clearly better than the DP-SGD and RGP approaches. This significant advantage stems in part from the fact that DP-FP overcomes the biased-gradient problem of DP-SGD, and also has lower noise power thanks to the micro-batch number M as well as a lower noise dimension.

More interestingly, the larger the pretrained model, the smaller DP-FP's gap to the Non-DP model. For example, on the SST-2 task with ε = 3 (RDP), the gap between DP-FP and Non-DP shrinks from 2.41 points (BERT-base) to 0.58 points (RoBERTa-large). The main reason, we believe, is that because DP-FP adds noise to the latent representation before the linear classification layer, the total noise power does not scale with model size.

Figure 1: Accuracy scores for different batch sizes B on the SST-2 development set.

4.3 Hyperparameter Tuning

As suggested in li2021large, DP optimization is sensitive to hyper-parameter selection. We therefore examine the performance of DP-FP under various hyper-parameters, as DP-FP differs significantly from DP-SGD in architecture. In this section, we report all results for RoBERTa-base with DP-FP on the SST-2 dataset, and we fix the total training epochs to 3, at which point the privacy budget is depleted.

Figure 1 shows the heat map of accuracy scores for different batch sizes B and learning rates. The results suggest that 1) large batch sizes, such as 128 and 256, hurt performance, and 2) DP-FP requires a small learning rate to achieve better performance. Both observations are the opposite of the findings for DP-SGD in li2021large. The intuitive explanation is as follows. Substituting $p = B/(MN)$, where $N$ is the number of training records, into Eq. (11), we have the privacy parameter

$$\mu_{\text{DP-FP}} \approx \frac{B}{MN}\sqrt{MT\left(e^{1/\sigma^2} - 1\right)}. \quad (14)$$

Intuitively, this shows that, given a fixed privacy budget (a fixed $\mu_{\text{DP-FP}}$), the noise power decreases as the batch size $B$ decreases. However, we cannot reduce $B$ too much, because the model will then fail to converge.
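This trade-off can be made concrete by inverting the budget formula $\mu = p\sqrt{MT(e^{1/\sigma^2}-1)}$ for $\sigma$, holding $M$ and $T$ fixed for illustration (the helper below and its name are our own):

```python
import math

def sigma_for_budget(mu, p, M, T):
    # Solve mu = p * sqrt(M * T * (exp(1/sigma^2) - 1)) for sigma.
    # A smaller sampling rate p (smaller expected batch) tolerates a
    # smaller sigma, i.e., less noise, under the same budget mu.
    return 1.0 / math.sqrt(math.log(1.0 + (mu / p) ** 2 / (M * T)))
```

For instance, with mu = 1, M = 32, and T = 1000, the sigma required at p = 0.001 is roughly a quarter of that required at p = 0.01, matching the observation that DP-FP prefers small batches.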

Figure 2: Accuracy scores for different clipping thresholds C on the SST-2 development set.
Figure 3: Accuracy curves for different micro-batch numbers M on the SST-2 development set.

Figure 2 shows the heat map of accuracy scores for different clipping thresholds C, with the other hyper-parameters fixed as above. li2021large show that, for DP-SGD, small clipping thresholds lead to the best performance. In contrast, we find a different trend: clipping thresholds smaller than 0.4 actually hurt performance, while there is no significant change for larger thresholds (from 0.6 to 1.0). Clipping away too much of the latent representation with a small C in DP-FP discards too much semantic information and leads to a significant performance drop.

Finally, Figure 3 shows accuracy curves of DP-FP with different micro-batch numbers M on the SST-2 development set. In this experiment, we set the privacy budget to 0.02 under Gaussian DP with CLT (a very strong privacy level), the batch size to 32, and the total number of epochs to 3. Larger micro-batch numbers clearly lead to better performance: the larger M is, the more the noise power is reduced (by 1/M), as shown in Proposition 3.
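As a minimal sketch of the forward-stage operation (our own NumPy illustration under stated assumptions, not the authors' implementation): each example's latent representation is clipped to L2 norm at most C, then Gaussian noise with standard deviation σC is added; averaging M independent micro-batch contributions reduces the per-coordinate noise power by 1/M:

```python
import numpy as np

def clip_and_noise(h, C, sigma, rng):
    """Clip each row of h to L2 norm <= C, then add N(0, (sigma*C)^2) noise."""
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    h_clipped = h * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return h_clipped + rng.normal(0.0, sigma * C, size=h.shape)

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))                         # a micro-batch of 8 latent vectors
out = clip_and_noise(h, C=1.0, sigma=0.0, rng=rng)   # sigma=0 isolates the clipping step
assert np.all(np.linalg.norm(out, axis=1) <= 1.0 + 1e-9)
# With M independent micro-batches per step, the averaged noise has
# per-coordinate variance (sigma*C)^2 / M, i.e., noise power reduced by 1/M.
```

Because the classifier head only sees the noised representations, everything downstream is post-processing and incurs no extra privacy cost.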

5 Related Work

In the research line of sharing models while protecting the privacy of the corresponding training data, previous work mainly uses DP-SGD to train privacy-preserving models (shokri2015privacy; yu2019differentially). abadi2016deep propose a tight privacy accounting that yields a reasonable privacy cost for DP-SGD. Following that, variants of DP-SGD have been proposed to improve model accuracy, such as quantile-based clipping (andrew2019differentially), dynamic noise power and clipping values (du2021dynamic), and dynamic learning rates (wu2021adaptive). DP-SGD and its variants, however, have had limited success on large deep learning models due to their high computation cost, huge memory overhead, and significant performance drops.

To reduce the huge memory cost of each DP-SGD step, subramani2020enabling; anil2021large study an effective use of JAX primitives in conjunction with the XLA compiler. Later, li2021large mitigate the memory cost with a ghost clipping method, which still incurs roughly twice the computation cost due to per-example gradient clipping. yu2021large reduce the memory cost via rank decomposition.

To improve model accuracy, anil2021large successfully use a mega batch size (e.g., 2M examples) for DP-BERT, but still observe a 10% accuracy drop compared to the non-private BERT model. dupuy2021efficient investigate private BERT fine-tuning, but report results with ε of at least 100. Very recently, hoory2021learning; yu2021large; li2021large show that DP-SGD's performance drop can be mitigated by using large pretrained models. Even so, an obvious performance gap to the non-DP counterpart remains due to gradient bias and additive noise.

6 Discussion

It is instructive to compare the types of information that DP-SGD and DP-FP protect under DP. As illustrated in Eq. (6), in DP-SGD the label information is embedded in the gradient, so the data record under DP protection is each pair of a training sample and its label. In DP-FP, as illustrated in Eq. (8), the DP operation is performed in the forward stage, so the data record under DP protection is simply the training sample without its label. Interestingly, for the majority of classification tasks one only needs to protect the privacy of the data sample: the labels are drawn from a finite set and do not constitute private information as long as they cannot be linked to the training samples. Consider the SST-2 sentence classification task as an illustration, which contains 67,349 sentences that require protection under DP, each labeled "positive" or "negative". Because DP-FP ensures that an adversary can almost never recover any sentence from the fine-tuned model, labels cannot be associated with the training sentences. For generation tasks, however, protecting the label information in DP-FP is both required and difficult, and further model-architecture design is needed. We plan to investigate this as part of our future work.

7 Conclusion

In this paper, we have introduced the differentially private forward propagation (DP-FP) method for applying differential privacy to large pretrained models on classification tasks. The key design of DP-FP exploits differential privacy's post-processing property, ensuring privacy by protecting the latent representation in the forward stage, rather than following the conventional wisdom of DP stochastic gradient descent (DP-SGD), which protects the gradient in the backward stage. DP-FP has the same memory cost as an off-the-shelf SGD-based optimizer, an unbiased gradient, and significantly lower noise power that scales only with the latent-representation dimension; DP-SGD, by contrast, has a large memory cost, a biased gradient, and total noise power that scales with the huge model size. We have also designed micro-batches unique to DP-FP to further reduce the noise power on each coordinate. As a result, on a large model such as RoBERTa-large, DP-FP achieves an average accuracy of 91.34% on four downstream tasks with ε less than 3, which is only 0.9% lower than the non-private baseline and 3.81% better than the state-of-the-art DP-SGD method.


Appendix A Supplementary Formalism Details

Dataset Task Type Classes Average Length Training Sample Size (D)
MNLI Natural Language Inference 3 32.96 240,942
QQP Semantic Matching 2 24.77 384,348
QNLI Question Answering 2 39.74 104,374
SST-2 Sentiment Analysis 2 9.94 67,349
Table 3: Statistics of four GLUE data sets.

A.1 Privacy Accounting for DP-FP

Based on hypothesis testing between two Gaussian distributions, Gaussian DP (GDP) defines a canonical single-parameter family of privacy notions. It provides a computationally efficient tool for analyzing the exact composition of private algorithms in a tractable way. We present some key results below; for more details, please see dong2021gaussian.

The hypothesis-testing view of the privacy notion: Let $P$ and $Q$ denote the distributions of the random function outputs $\mathcal{M}(S)$ and $\mathcal{M}(S')$ with neighboring datasets $S$ and $S'$, and let $\phi$ be any rejection rule for testing the hypothesis $H_0: P$ against $H_1: Q$. With these in place, we can define the trade-off function between $P$ and $Q$ as follows:

$$T(P, Q)(\alpha) = \inf_{\phi}\{\beta_\phi : \alpha_\phi \le \alpha\},$$

where $\alpha_\phi = \mathbb{E}_P[\phi]$ and $\beta_\phi = 1 - \mathbb{E}_Q[\phi]$ are the type I and type II errors of the rejection rule $\phi$, respectively. When $P = \mathcal{N}(0,1)$ and $Q = \mathcal{N}(\mu,1)$, the trade-off function is a single-parameter function denoted by $G_\mu := T(\mathcal{N}(0,1), \mathcal{N}(\mu,1))$, and a mechanism whose trade-off function is lower-bounded by $G_\mu$ is referred to as $\mu$-GDP.
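For illustration (our own stdlib-only sketch, not part of the paper), the Gaussian trade-off function has the closed form $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$; below, $\Phi^{-1}$ is computed by bisection to avoid external dependencies:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(q):
    """Inverse standard normal CDF via bisection on [-10, 10]."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def G(mu, alpha):
    """Trade-off function of N(0,1) vs N(mu,1): the mu-GDP curve."""
    return Phi(Phi_inv(1.0 - alpha) - mu)

# mu = 0 is perfect privacy: the curve equals Id(alpha) = 1 - alpha.
assert abs(G(0.0, 0.3) - 0.7) < 1e-9
# Larger mu means weaker privacy: lower type II error for the adversary.
assert G(1.0, 0.05) < G(0.5, 0.05)
```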

In each micro-batch of the DP-FP in (9) with the Gaussian mechanism, the mechanism achieves $\mu$-GDP, with $\mu$ given by the ratio of the sensitivity (set by the clipping threshold $C$) to the standard deviation of the added Gaussian noise. Consider the sampling scheme in which each individual data record is subsampled independently with probability $p$ from the training set to construct a micro-batch. bu2020deep demonstrate that, given two neighboring datasets $S$ and $S'$, if a random mechanism $\mathcal{M}$ is $f$-DP, then the subsampled mechanism, denoted $\mathcal{M} \circ \mathrm{Sample}_p$, satisfies

$$T\big(\mathcal{M} \circ \mathrm{Sample}_p(S),\ \mathcal{M} \circ \mathrm{Sample}_p(S')\big) \ge \min\{f_p, f_p^{-1}\}^{**},$$

where $f_p = p f + (1 - p)\,\mathrm{Id}$ with $\mathrm{Id}(\alpha) = 1 - \alpha$, and ${}^{**}$ denotes the double convex conjugate. Then, after $M$ micro-batches in each step and a total of $T$ steps, a Berry-Esseen-style CLT result of bu2020deep shows that, as $T \to \infty$ with $p\sqrt{MT}$ held constant, the composition of the r.h.s. of (15) converges to $\mu_{\mathrm{tot}}$-GDP with

$$\mu_{\mathrm{tot}} = p\,\sqrt{M T \left(e^{1/\sigma^2} - 1\right)}.$$
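The privacy-amplification effect of subsampling can be seen directly from the operator $f_p = p f + (1-p)\,\mathrm{Id}$; the tiny sketch below (our own illustration, with an arbitrary valid trade-off curve for concreteness) shows that subsampling can only raise the trade-off curve:

```python
def subsample_tradeoff(f, p):
    """Subsampling operator on trade-off functions (bu2020deep):
    f_p(alpha) = p * f(alpha) + (1 - p) * Id(alpha), with Id(alpha) = 1 - alpha."""
    return lambda alpha: p * f(alpha) + (1.0 - p) * (1.0 - alpha)

# Any valid trade-off function satisfies f <= Id on [0, 1], so the mixture
# with Id dominates f pointwise -- i.e., privacy is amplified.
f = lambda a: (1.0 - a) ** 2          # illustrative trade-off curve, f <= Id
f_p = subsample_tradeoff(f, p=0.01)
assert f_p(0.3) >= f(0.3)
```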
Then the total privacy cost in terms of $(\epsilon, \delta)$ after $T$ steps can be computed according to (3), which is repeated below:

$$\delta(\epsilon) = \Phi\!\left(-\frac{\epsilon}{\mu_{\mathrm{tot}}} + \frac{\mu_{\mathrm{tot}}}{2}\right) - e^{\epsilon}\,\Phi\!\left(-\frac{\epsilon}{\mu_{\mathrm{tot}}} - \frac{\mu_{\mathrm{tot}}}{2}\right),$$

where $\Phi$ is the standard normal CDF.
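This $\mu$-GDP to $(\epsilon, \delta)$ conversion from dong2021gaussian can be sketched in a few lines (our own stdlib implementation):

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def delta_of_eps(eps, mu):
    """delta(eps) for a mu-GDP mechanism (Dong et al., 2021)."""
    return Phi(-eps / mu + mu / 2.0) - math.exp(eps) * Phi(-eps / mu - mu / 2.0)

# The privacy curve behaves as expected: delta shrinks as eps grows,
# and a tighter mu (stronger GDP guarantee) also shrinks delta.
assert delta_of_eps(3.0, 1.0) < delta_of_eps(1.0, 1.0)
assert delta_of_eps(1.0, 0.5) < delta_of_eps(1.0, 1.0)
```

In practice one fixes $\delta$ (e.g., $\delta = 1/D$) and reports the smallest $\epsilon$ satisfying this equation.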

A.2 Dataset Statistics

Please refer to Table 3.