Learning under Model Misspecification: Applications to Variational and Ensemble Methods
This paper provides a novel theoretical analysis of learning from i.i.d. data under model misspecification, building on PAC-Bayes and second-order Jensen bounds. We show that Bayesian model averaging is suboptimal for learning in this setting because it does not properly optimize the generalization performance of the posterior predictive distribution on unseen data samples. Based on these insights, we introduce variational and ensemble learning methods based on the (approximate) minimization of a new family of second-order PAC-Bayes bounds on the generalization performance of the posterior predictive. This analysis also offers a new explanation of why diversity is key to the performance of model averaging methods. Experiments on toy and real data sets with Bayesian neural networks illustrate these learning algorithms.
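To make the diversity intuition concrete, below is a minimal illustrative sketch of an ensemble objective in the spirit of the second-order Jensen bound: each member's average log-likelihood is augmented with a variance term over the members' predictive densities, so the objective rewards ensembles whose members disagree. The function name `ensemble_objective`, the normalization by the squared maximum density, and the omission of the PAC-Bayes complexity (KL) term are all simplifying assumptions for illustration, not the paper's exact objective.

```python
import numpy as np

def ensemble_objective(member_log_liks):
    """Illustrative second-order-Jensen-style ensemble objective (to maximize).

    member_log_liks: array of shape (E, N) holding log p(y_n | x_n, theta_e)
    for each of E ensemble members on N data points.
    """
    liks = np.exp(member_log_liks)               # per-member predictive densities
    mean_log_lik = member_log_liks.mean(axis=0)  # first-order (Jensen) term
    # Diversity bonus: variance of the densities across members, normalized by
    # the squared maximum density; the constants here are illustrative only.
    var_term = liks.var(axis=0) / (2.0 * liks.max(axis=0) ** 2 + 1e-12)
    return (mean_log_lik + var_term).mean()

# Toy usage: two ensemble members evaluated on three data points.
rng = np.random.default_rng(0)
log_liks = rng.normal(loc=-1.0, scale=0.3, size=(2, 3))
print(ensemble_objective(log_liks))
```

Note that identical members make the variance term vanish, so under this kind of bound the objective strictly prefers diverse ensembles with the same average fit.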