Passing Tests without Memorizing: Two Models for Fooling Discriminators

02/09/2019
by Olivier Bousquet, et al.

We introduce two mathematical frameworks for foolability in the context of generative distribution learning. In a nutshell, fooling is an algorithmic task in which the input sample is drawn from some target distribution and the goal is to output a synthetic distribution that is indistinguishable from the target with respect to some fixed class of tests. This framework has received considerable attention in the context of Generative Adversarial Networks (GANs), a recently proposed approach that achieves impressive empirical results. From a theoretical viewpoint, this problem is difficult to model: in its basic form, the notion of foolability is susceptible to a type of overfitting called memorizing. This raises the challenge of devising notions and definitions that distinguish fooling algorithms that generate genuinely new synthetic data from algorithms that merely memorize or copy the training set. The first model we consider, called GAM--Foolability, is inspired by GANs: here the learner has only indirect access to the target distribution, via a discriminator. The second model, called DP--Foolability, exploits the notion of differential privacy as a candidate criterion for non-memorization. We proceed to characterize foolability within these two models and study their interrelations. We show that DP--Foolability implies GAM--Foolability and prove partial results regarding the converse; whether GAM--Foolability implies DP--Foolability remains an open question. We also present an application to differentially private PAC learning: from a statistical perspective, for any class H, learnability by a private proper learner is equivalent to the existence of a private sanitizer for H. This can be seen as an analogue of the equivalence between uniform convergence and learnability in classical PAC learning.
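For concreteness, the two notions the abstract refers to can be stated as follows. This is a minimal sketch following standard conventions; the paper's exact quantifiers and parameters may differ.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Sketch of the two definitions referred to in the abstract; the paper's
% exact quantifiers and parameters may differ.
A distribution $q$ $\varepsilon$-fools a target $p$ with respect to a class
$\mathcal{D}$ of tests $d \colon X \to [0,1]$ if no test can tell them apart:
\[
  \sup_{d \in \mathcal{D}}
  \bigl|\, \mathbb{E}_{x \sim p}[d(x)] - \mathbb{E}_{x \sim q}[d(x)] \,\bigr|
  \le \varepsilon .
\]
The DP--Foolability model additionally requires the algorithm $A$ mapping the
input sample to an output distribution to be $(\alpha,\beta)$-differentially
private: for any two samples $S, S'$ differing in one example and any event $E$,
\[
  \Pr[A(S) \in E] \le e^{\alpha} \Pr[A(S') \in E] + \beta .
\]
\end{document}
```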
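A toy experiment also makes the memorization issue visible: an algorithm that simply outputs a copy of its training sample passes every test that a genuine generator passes. The sketch below is a hypothetical illustration, not code from the paper; the `fools` function and the threshold tests are assumptions made for the example.

```python
import numpy as np

# Hypothetical illustration (not from the paper): empirically check whether
# a synthetic sample "fools" a finite class of tests, i.e. whether every
# test assigns similar average value to real and synthetic data.

def fools(real, synthetic, tests, eps=0.1):
    """Return True if the maximum test discrepancy is at most eps."""
    return all(
        abs(np.mean([d(x) for x in real]) - np.mean([d(x) for x in synthetic])) <= eps
        for d in tests
    )

# Example: tests are threshold indicators on the real line.
rng = np.random.default_rng(0)
real = rng.normal(size=1000)
synthetic = rng.normal(size=1000)      # a "good" generator: fresh samples
memorized = real.copy()                # a generator that copies the sample
tests = [lambda x, t=t: float(x <= t) for t in np.linspace(-2, 2, 9)]

print(fools(real, synthetic, tests))   # True: fresh samples fool the tests
print(fools(real, memorized, tests))   # also True: fooling alone cannot
                                       # detect memorization, which motivates
                                       # the paper's two models
```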
