Learning from Mixtures of Private and Public Populations

08/01/2020
by Raef Bassily, et al.

We initiate the study of a new model of supervised learning under privacy constraints. Imagine a medical study where a dataset is sampled from a population of both healthy and unhealthy individuals. Suppose healthy individuals have no privacy concerns (in that case, we call their data "public") while the unhealthy individuals desire stringent privacy protection for their data. In this example, the population (data distribution) is a mixture of private (unhealthy) and public (healthy) sub-populations that could be very different. Inspired by the above example, we consider a model in which the population 𝒟 is a mixture of two sub-populations: a private sub-population 𝒟_priv of private and sensitive data, and a public sub-population 𝒟_pub of data with no privacy concerns. Each example drawn from 𝒟 is assumed to contain a privacy-status bit that indicates whether the example is private or public. The goal is to design a learning algorithm that satisfies differential privacy only with respect to the private examples. Prior works in this context assumed a homogeneous population where private and public data arise from the same distribution, and in particular designed solutions that exploit this assumption. We demonstrate how to circumvent this assumption by considering, as a case study, the problem of learning linear classifiers in ℝ^d. We show that in the case where the privacy status is correlated with the target label (as in the above example), linear classifiers in ℝ^d can be learned, in the agnostic as well as the realizable setting, with sample complexity comparable to that of classical (non-private) PAC learning. It is known that this task is impossible if all the data is considered private.
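To make the data model concrete, here is a minimal sketch of how examples in this setting might be generated and partitioned. The mixture weight `alpha`, the Gaussian feature distributions, and the convention that the privacy-status bit coincides with the label (healthy = public, unhealthy = private) are illustrative assumptions, not specifications from the paper; the sketch only shows the shape of the data a learner receives, not the paper's algorithm.

```python
import random

def sample_example(alpha=0.3, d=5, rng=random):
    """Draw one example (x, y, s) from a mixture D of a private
    sub-population D_priv (unhealthy, label -1, privacy bit s=1) and a
    public sub-population D_pub (healthy, label +1, privacy bit s=0).
    The Gaussian features are hypothetical; the model only requires
    that each example carry a privacy-status bit."""
    is_private = rng.random() < alpha  # assumed mixture weight of D_priv
    if is_private:
        x = [rng.gauss(-1.0, 1.0) for _ in range(d)]
        y, s = -1, 1
    else:
        x = [rng.gauss(+1.0, 1.0) for _ in range(d)]
        y, s = +1, 0
    return x, y, s

def split_by_privacy(samples):
    """Separate a dataset into its private and public parts.  A learner
    in this model must satisfy differential privacy only with respect
    to the private part; public examples may be used unrestricted."""
    priv = [(x, y) for x, y, s in samples if s == 1]
    pub = [(x, y) for x, y, s in samples if s == 0]
    return priv, pub
```

Note that when the privacy status is perfectly correlated with the label, as assumed here, every public example carries the same label, which is precisely the regime the paper's case study exploits.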


