Arbitrary Decisions are a Hidden Cost of Differentially-Private Training

02/28/2023
by Bogdan Kulynych, et al.

Mechanisms used in privacy-preserving machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that such randomization incurs predictive multiplicity: for a given input example, the output predicted by equally-private models depends on the randomness used in training. Thus, for a given input, the predicted output can vary drastically if a model is re-trained, even if the same training dataset is used. The predictive-multiplicity cost of DP training has not been studied, and is currently neither audited for nor communicated to model designers and stakeholders. We derive a bound on the number of re-trainings required to estimate predictive multiplicity reliably. We analyze – both theoretically and through extensive experiments – the predictive-multiplicity cost of three DP-ensuring algorithms: output perturbation, objective perturbation, and DP-SGD. We demonstrate that the degree of predictive multiplicity rises as the level of privacy increases, and is unevenly distributed across individuals and demographic groups in the data. Because randomness used to ensure DP during training explains predictions for some examples, our results highlight a fundamental challenge to the justifiability of decisions supported by differentially-private models in high-stakes settings. We conclude that practitioners should audit the predictive multiplicity of their DP-ensuring algorithms before deploying them in applications of individual-level consequence.
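To make the effect concrete, below is a minimal sketch of how one might estimate per-example predictive multiplicity by re-training a DP-SGD-style logistic regression with different random seeds. This is an illustration only: the synthetic data, the clip norm, the noise multiplier, and the number of re-trainings K are assumptions for the example, not values or code from the paper.

```python
# Sketch: per-example predictive multiplicity induced by DP-SGD-style noise.
# Assumes a simple numpy logistic-regression setup; hyperparameters are illustrative.
import numpy as np

def train_dp_sgd(X, y, rng, epochs=30, lr=0.5, clip=1.0, noise_multiplier=1.0):
    """One DP-SGD-style run: per-example gradient clipping + Gaussian noise."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted probabilities
        per_ex_grad = (p - y)[:, None] * X            # per-example gradients, shape (n, d)
        norms = np.linalg.norm(per_ex_grad, axis=1, keepdims=True)
        clipped = per_ex_grad / np.maximum(1.0, norms / clip)
        noise = rng.normal(0.0, noise_multiplier * clip, size=d)
        w -= lr * (clipped.sum(axis=0) + noise) / n
    return w

def disagreement(X, models):
    """Per-example fraction of re-trained models predicting the minority label."""
    preds = np.stack([(X @ w > 0).astype(int) for w in models])  # shape (K, n)
    vote = preds.mean(axis=0)
    return np.minimum(vote, 1.0 - vote)

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(int)

K = 50  # number of re-trainings; the paper bounds how large K must be for reliable estimates
models = [train_dp_sgd(X, y, np.random.default_rng(seed)) for seed in range(K)]
amb = disagreement(X, models)
print(f"examples with any disagreement: {(amb > 0).mean():.2%}")
print(f"max per-example disagreement:   {amb.max():.2f}")
```

Here the disagreement score is simply the fraction of re-trained models that predict the minority label for each example, which is a rough proxy for the multiplicity metrics studied in the paper; the paper's bound on the number of re-trainings indicates how large K must be before such estimates can be trusted.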

