A One-Pass Private Sketch for Most Machine Learning Tasks

06/16/2020
by Benjamin Coleman, et al.

Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees. Inspired by recent progress toward general-purpose data release algorithms, we propose a private sketch, or small summary of the dataset, that supports a multitude of machine learning tasks including regression, classification, density estimation, near-neighbor search, and more. Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm. We prove competitive error bounds for DP kernel density estimation. Existing methods for DP kernel density estimation scale poorly, often becoming exponentially slower as the number of dimensions increases. In contrast, our sketch runs quickly on large, high-dimensional datasets in a single pass. Exhaustive experiments show that our generic sketch delivers a privacy-utility tradeoff similar to that of existing DP methods at a fraction of the computation cost. We expect that our sketch will enable differential privacy in distributed, large-scale machine learning settings.
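To make the construction described above more concrete, here is a minimal Python sketch of one way such a summary could be built. It is an illustration under stated assumptions, not the authors' implementation: the class name `PrivateLSHSketch`, the choice of signed random projections as the locality-sensitive hash, the parameters `rows` and `bits`, and the Laplace noise scale `rows / epsilon` (which assumes one record changes exactly one counter per row) are all choices made here for illustration. The dataset size `n` is also treated as public.

```python
import numpy as np


class PrivateLSHSketch:
    """Illustrative sketch: rows of counters indexed by LSH, built in one pass.

    Assumptions (not taken from the source): signed random projections as the
    hash family, Laplace noise on every counter, and a public dataset size.
    """

    def __init__(self, dim, rows=50, bits=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.rows = rows
        # One independent set of `bits` random hyperplanes per row.
        self.planes = self.rng.standard_normal((rows, bits, dim))
        # Powers of two turn a row's sign pattern into a bucket index.
        self.weights = 1 << np.arange(bits)
        self.counts = np.zeros((rows, 1 << bits))
        self.n = 0

    def _hash(self, x):
        signs = (self.planes @ x) > 0      # shape (rows, bits)
        return signs @ self.weights        # one bucket index per row

    def add(self, x):
        # Single pass: each record increments exactly one counter in each row.
        self.counts[np.arange(self.rows), self._hash(x)] += 1
        self.n += 1

    def release(self, epsilon):
        # Adding or removing one record changes `rows` counters by 1 each,
        # so the L1 sensitivity is `rows` and the Laplace scale is rows/epsilon.
        noise = self.rng.laplace(scale=self.rows / epsilon, size=self.counts.shape)
        self.counts = self.counts + noise

    def query(self, q):
        # Average the counters that q hashes to, normalised by dataset size.
        return self.counts[np.arange(self.rows), self._hash(q)].mean() / self.n


x = np.array([1.0, 2.0, -0.5, 0.3, 1.5])
sketch = PrivateLSHSketch(dim=5)
for _ in range(200):
    sketch.add(x)
density_at_x = sketch.query(x)          # every row hits the populated bucket
density_far_away = sketch.query(-x)     # every sign flips, so no row collides
sketch.release(epsilon=1.0)             # after this, only noisy counters remain
```

After `release`, every later query reads only the noised counters, so any number of queries can be answered without spending additional privacy budget. The collision-based estimate returned by `query` is a density estimate with respect to the kernel induced by the hash family, which is why the choice of hash matters for downstream tasks.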


