Privacy-Utility Tradeoff of OLS with Random Projections

09/03/2023
by Yun Lu, et al.

We study the differential privacy (DP) of a core ML problem, linear ordinary least squares (OLS), a.k.a. ℓ_2-regression. Our key result is that the approximate LS algorithm (ALS) (Sarlos, 2006), a randomized solution to the OLS problem primarily used to improve performance on large datasets, also preserves privacy. Without modification or further noising, ALS achieves a better privacy/utility tradeoff than alternative private OLS algorithms, which modify and/or noise OLS. We give the first tight DP analysis for the ALS algorithm and for the standard Gaussian mechanism (Dwork et al., 2014) applied to OLS. Our methodology directly improves the privacy analysis of (Blocki et al., 2012) and (Sheffet, 2019) and introduces new tools which may be of independent interest: (1) the exact spectrum of (ϵ, δ)-DP parameters (“DP spectrum”) for mechanisms whose output is a d-dimensional Gaussian, and (2) an improved DP spectrum for random projection (compared to (Blocki et al., 2012) and (Sheffet, 2019)). All methods for private OLS (including ours) assume, often implicitly, restrictions on the input database, such as bounds on leverage and residuals. We prove that such restrictions are necessary. Hence, computing the privacy of mechanisms such as ALS requires estimating these database parameters, which can be infeasible for large datasets. For more complex ML models, DP bounds may not even be tractable. This creates a need for blackbox DP-estimators (Lu et al., 2022), which empirically estimate data-dependent privacy. We demonstrate the effectiveness of such a DP-estimator by empirically recovering a DP spectrum that matches our theory for OLS. This validates the DP-estimator in a nontrivial ML application, opening the door to its use in more complex nonlinear ML settings where theory is unavailable.
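As a rough illustration of the sketch-and-solve idea behind ALS, the snippet below compresses the rows of a regression problem with a random projection and then solves ordinary least squares on the compressed data. It is a minimal sketch under stated assumptions, not the authors' implementation: the dense Gaussian projection matrix `S`, its 1/sqrt(r) scaling, the projection dimension `r`, the synthetic data, and the helper names `ols` and `als` are all illustrative choices not taken from the abstract. It shows only the utility side (how close the ALS solution is to the exact OLS solution) and does not compute any (ϵ, δ) privacy parameters.

```python
import numpy as np


def ols(X, y):
    """Exact least-squares solution of min_theta ||X theta - y||_2."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def als(X, y, r, rng):
    """Approximate least squares: project the n rows down to r, then solve OLS.

    Assumption: S is a dense r x n Gaussian matrix scaled by 1/sqrt(r).
    Other projection families (e.g. fast Johnson-Lindenstrauss transforms)
    could be used instead.
    """
    n = X.shape[0]
    S = rng.standard_normal((r, n)) / np.sqrt(r)
    return ols(S @ X, S @ y)


# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, d, r = 2000, 5, 400
X = rng.standard_normal((n, d))
theta_true = np.arange(1.0, d + 1.0)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

theta_ols = ols(X, y)
theta_als = als(X, y, r, rng)

# Relative distance between the projected and the exact solution.
rel_err = np.linalg.norm(theta_als - theta_ols) / np.linalg.norm(theta_ols)
print(f"relative error of ALS vs. OLS: {rel_err:.4f}")
```

Increasing `r` generally moves the ALS solution closer to the exact OLS solution, at the price of a larger projected problem; how `r` and the database's leverage and residual bounds translate into privacy parameters is what the paper's analysis addresses.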

