Learning from i.i.d. data under model misspecification
This paper introduces a new approach to learning from i.i.d. data under model misspecification. The approach casts learning as minimizing the expected code-length of a Bayesian mixture code. To solve this problem, we build on PAC-Bayes bounds, information theory, and a new family of second-order Jensen bounds. The key insight is that the standard (first-order) Jensen bounds used in learning are suboptimal when the model class is misspecified (i.e., it does not contain the data-generating distribution). Because the Bayesian posterior is directly related to first-order Jensen bounds, this insight yields strong theoretical arguments for why the Bayesian posterior is not optimal for making predictions under model misspecification. We then argue for the use of second-order Jensen bounds, which lead to new families of learning algorithms. In particular, we introduce novel variational and ensemble learning methods based on minimizing a novel family of second-order PAC-Bayes bounds on the expected code-length of a Bayesian mixture code. Using this framework, we also offer new hypotheses for why parameters in a flat minimum generalize better than parameters in a sharp minimum.
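For orientation, the first-order Jensen bound mentioned in the abstract can be written out as a minimal sketch. The notation is an assumption made here for illustration and is not taken from the abstract: \nu(x) denotes the data-generating distribution, p(x \mid \theta) a member of the model class, and \rho(\theta) a distribution over parameters.

```latex
% First-order Jensen bound (convexity of -ln); notation is assumed, see lead-in.
% Left:  expected code-length of the Bayesian mixture code under \rho.
% Right: \rho-average of the expected code-length of each individual model.
\mathbb{E}_{\nu(x)}\!\left[-\ln \mathbb{E}_{\rho(\theta)}\big[p(x \mid \theta)\big]\right]
\;\le\;
\mathbb{E}_{\rho(\theta)}\!\left[\mathbb{E}_{\nu(x)}\big[-\ln p(x \mid \theta)\big]\right]
```

Equality holds when p(x \mid \theta) does not vary with \theta on the support of \rho (for \nu-almost every x). On our reading of the abstract, the slack in this inequality is what the second-order bounds are meant to reduce.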