Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy

We show that Entropy-SGD (Chaudhari et al., 2016), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by optimizing the bound's prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen independently of the data. Indeed, available implementations of Entropy-SGD rapidly obtain zero training error on random labels, and the same holds for the Gibbs posterior. In order to obtain a valid generalization bound, we show that an ϵ-differentially private prior yields a valid PAC-Bayes bound, a straightforward consequence of results connecting generalization with differential privacy. Using stochastic gradient Langevin dynamics (SGLD) to approximate the well-known exponential release mechanism, we observe that the generalization error on MNIST (measured on held-out data) falls within the (empirically nonvacuous) bounds computed under the assumption that SGLD produces perfect samples. In particular, the resulting algorithm, Entropy-SGLD, can be configured to yield relatively tight generalization bounds while still fitting the real labels, although these same settings do not obtain state-of-the-art performance.
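
For context, the kind of bound being optimized can be written as follows. This is the standard Langford-Seeger/Maurer form of the PAC-Bayes theorem, supplied here as background rather than taken from this page; the paper's exact ϵ-dependent correction is not reproduced, only described qualitatively in the trailing comment.

% Standard PAC-Bayes bound: for a prior P chosen independently of the
% i.i.d. sample S of size m, with probability at least 1 - \delta over S,
% simultaneously for all posteriors Q,
\[
  \mathrm{kl}\big(\hat{R}_S(Q) \,\|\, R(Q)\big)
    \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln(2\sqrt{m}/\delta)}{m},
\]
% where \hat{R}_S(Q) and R(Q) are the empirical and true risks of the
% Gibbs classifier Q, and kl(q || p) denotes the KL divergence between
% Bernoulli(q) and Bernoulli(p). The paper's contribution is to show that
% the bound survives, with an additional \epsilon-dependent penalty, when
% the fixed P is replaced by a data-dependent prior P(S) released by an
% \epsilon-differentially private mechanism.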

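As a concrete illustration of the inner SGLD loop that Entropy-SGD (and the paper's Entropy-SGLD) relies on, here is a minimal numpy sketch based on the update rule in Chaudhari et al. All names (entropy_sgd_step, gamma, eta_inner, thermal, grad_loss) are illustrative choices, not identifiers from the paper's code, and the noise scaling follows the paper's "thermal noise" heuristic.

import numpy as np

def entropy_sgd_step(w, grad_loss, gamma=1e-3, eta_inner=0.1,
                     eta_outer=1.0, n_inner=20, thermal=1e-4, rng=None):
    """One outer step of Entropy-SGD (illustrative sketch).

    The inner SGLD chain approximately samples w' from the local Gibbs
    measure p(w') ~ exp(-loss(w') - (gamma / 2) * ||w - w'||^2); its
    running mean mu estimates <w'>, and the outer step ascends the local
    entropy, whose gradient is gamma * (<w'> - w).
    """
    rng = np.random.default_rng() if rng is None else rng
    w_prime = w.copy()
    mu = w.copy()
    for i in range(1, n_inner + 1):
        g = grad_loss(w_prime) + gamma * (w_prime - w)  # inner-objective gradient
        noise = np.sqrt(eta_inner) * thermal * rng.standard_normal(w.shape)
        w_prime = w_prime - eta_inner * g + noise       # SGLD step
        mu += (w_prime - mu) / i                        # running mean of samples
    return w - eta_outer * gamma * (w - mu)             # outer step on -F

# Toy usage: quadratic loss with grad_loss(w) = w.
w = np.ones(5)
for _ in range(100):
    w = entropy_sgd_step(w, grad_loss=lambda v: v)

In the paper's setting, a chain of this kind also serves to approximate the exponential release mechanism for the data-dependent prior, which is where the differential-privacy guarantee enters; the bounds are then computed as if the chain mixed perfectly, matching the assumption flagged in the abstract.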

Related research

02/26/2018: Data-dependent PAC-Bayes priors via differential privacy
The Probably Approximately Correct (PAC) Bayes framework (McAllester, 19...

10/22/2021: Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning
We study an approach to learning pruning masks by optimizing the expecte...

05/04/2021: Information Complexity and Generalization Bounds
We present a unifying picture of PAC-Bayesian and mutual information-bas...

09/12/2022: A Note on the Efficient Evaluation of PAC-Bayes Bounds
When utilising PAC-Bayes theory for risk certification, it is usually ne...

05/25/2023: Exponential Smoothing for Off-Policy Learning
Off-policy learning (OPL) aims at finding improved policies from logged ...

11/19/2022: Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Stochastic differential equations (SDEs) have been shown recently to wel...

10/14/2017: Learners that Leak Little Information
We study learning algorithms that are restricted to revealing little inf...
