Learning from i.i.d. data under model misspecification

by Andrés R. Masegosa, et al.

This paper introduces a new approach to learning from i.i.d. data under model misspecification. The approach casts learning as minimizing the expected code-length of a Bayesian mixture code, building on PAC-Bayes bounds, information theory, and a new family of second-order Jensen bounds. The key insight is that the standard (first-order) Jensen bound is suboptimal for learning when the model class is misspecified (i.e., it does not contain the data-generating distribution). This insight yields strong theoretical arguments for why the Bayesian posterior is not optimal for making predictions under model misspecification: the Bayesian posterior is directly tied to the use of first-order Jensen bounds. The paper therefore argues for second-order Jensen bounds, which lead to new families of learning algorithms. Concretely, it introduces novel variational and ensemble learning methods based on minimizing a new family of second-order PAC-Bayes bounds over the expected code-length of a Bayesian mixture code. Within this framework, the paper also offers a novel hypothesis for why parameters in a flat minimum generalize better than parameters in a sharp minimum.
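The first- versus second-order Jensen distinction can be made concrete. Standard Bayesian learning effectively optimizes the first-order Jensen lower bound on the log-likelihood of the mixture; a second-order bound tightens it with a variance term that rewards diversity among the mixture components. A sketch of the two bounds (notation is assumed here; the exact form of the variance correction in the paper may differ in its constants):

```latex
% First-order (standard) Jensen bound: concavity of ln gives
%   the expected log-likelihood as a lower bound on the
%   log-likelihood of the Bayesian mixture.
\ln \mathbb{E}_{\rho}\!\left[p(y \mid x, \theta)\right]
  \;\ge\; \mathbb{E}_{\rho}\!\left[\ln p(y \mid x, \theta)\right]

% Second-order Jensen bound (sketch): adds a variance term,
% where b is an upper bound on p(y | x, \theta) and
% \mathbb{V}_{\rho} denotes the variance under \rho.
\ln \mathbb{E}_{\rho}\!\left[p(y \mid x, \theta)\right]
  \;\ge\; \mathbb{E}_{\rho}\!\left[\ln p(y \mid x, \theta)\right]
  \;+\; \frac{1}{2 b^{2}}\,
        \mathbb{V}_{\rho}\!\left[p(y \mid x, \theta)\right]
```

Under misspecification the gap between the two sides of the first-order bound can stay large, so optimizing only the expected log-likelihood (as the Bayesian posterior does) leaves performance on the table; the variance term explicitly accounts for part of that gap.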


Learning under Model Misspecification: Applications to Variational and Ensemble methods

This paper provides a novel theoretical analysis of the problem of learn...

Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote

We present a new second-order oracle bound for the expected risk of a we...

Information Complexity and Generalization Bounds

We present a unifying picture of PAC-Bayesian and mutual information-bas...

Novel Change of Measure Inequalities and PAC-Bayesian Bounds

PAC-Bayesian theory has received a growing attention in the machine lear...

Robust PAC^m: Training Ensemble Models Under Model Misspecification and Outliers

Standard Bayesian learning is known to have suboptimal generalization ca...

PAC-Bayesian Learning of Optimization Algorithms

We apply the PAC-Bayes theory to the setting of learning-to-optimize. To...

Decoupling Learning Rates Using Empirical Bayes Priors

In this work, we propose an Empirical Bayes approach to decouple the lea...
