Considerations for Differentially Private Learning with Large-Scale Public Pretraining

12/13/2022
by   Florian Tramèr, et al.

The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets. We critically review this approach. We primarily question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving. We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy. Beyond the privacy considerations of using public data, we further question the utility of this paradigm. We scrutinize whether existing machine learning benchmarks are appropriate for measuring the ability of pretrained models to generalize to sensitive domains, which may be poorly represented in public Web data. Finally, we observe that pretraining has been especially impactful for the largest available models – models large enough that end users cannot run them on their own devices. Thus, deploying such models today could be a net loss for privacy, as it would require (private) data to be outsourced to a third party with more compute. We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.


research
06/14/2022

Self-Supervised Pretraining for Differentially Private Learning

We demonstrate self-supervised pretraining (SSP) is a scalable solution ...
research
06/01/2023

TMI! Finetuned Models Leak Private Information from their Pretraining Data

Transfer learning has become an increasingly popular technique in machin...
research
02/19/2023

Why Is Public Pretraining Necessary for Private Model Training?

In the privacy-utility tradeoff of a model trained on benchmark language...
research
05/25/2023

Differentially Private Latent Diffusion Models

Diffusion models (DMs) are widely used for generating high-quality image...
research
08/23/2018

Privacy-Preserving Synthetic Datasets Over Weakly Constrained Domains

Techniques to deliver privacy-preserving synthetic datasets take a sensi...
research
10/20/2022

Private Algorithms with Private Predictions

When applying differential privacy to sensitive data, a common way of ge...
research
06/04/2019

A Differentially Private Incentive Design for Traffic Offload to Public Transportation

Increasingly large trip demands have strained urban transportation capac...
