References
Berger et al. (2006) Berger, J. et al. (2006). The case for objective Bayesian analysis. Bayesian Analysis, 1(3):385–402.
Berger and Pericchi (1996) Berger, J. O. and Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association, 91(433):109–122.
Clyde and George (2004) Clyde, M. and George, E. I. (2004). Model uncertainty. Statistical Science, pages 81–94.
Dinh et al. (2017) Dinh, L., Pascanu, R., Bengio, S., and Bengio, Y. (2017). Sharp minima can generalize for deep nets. arXiv preprint arXiv:1703.04933.
Gal and Ghahramani (2016) Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059.
Garipov et al. (2018) Garipov, T., Izmailov, P., Podoprikhin, D., Vetrov, D. P., and Wilson, A. G. (2018). Loss surfaces, mode connectivity, and fast ensembling of DNNs. In Neural Information Processing Systems.
Gelman et al. (2013) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian data analysis. Chapman and Hall/CRC.
Guo et al. (2017) Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1321–1330. JMLR.org.
Gustafsson et al. (2019) Gustafsson, F. K., Danelljan, M., and Schön, T. B. (2019). Evaluating scalable Bayesian deep learning methods for robust computer vision. arXiv preprint arXiv:1906.01620.
Hafner et al. (2018) Hafner, D., Tran, D., Irpan, A., Lillicrap, T., and Davidson, J. (2018). Reliable uncertainty estimates in deep neural networks using noise contrastive priors. arXiv preprint arXiv:1807.09289.
Hochreiter and Schmidhuber (1997) Hochreiter, S. and Schmidhuber, J. (1997). Flat minima. Neural Computation, 9(1):1–42.
Huang et al. (2019) Huang, W. R., Emam, Z., Goldblum, M., Fowl, L., Terry, J. K., Huang, F., and Goldstein, T. (2019). Understanding generalization through visualizations. arXiv preprint arXiv:1906.03291.
Izmailov et al. (2019) Izmailov, P., Maddox, W. J., Kirichenko, P., Garipov, T., Vetrov, D., and Wilson, A. G. (2019). Subspace inference for Bayesian deep learning. In Uncertainty in Artificial Intelligence.
Izmailov et al. (2018) Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., and Wilson, A. G. (2018). Averaging weights leads to wider optima and better generalization. In Uncertainty in Artificial Intelligence (UAI).
Kendall and Gal (2017) Kendall, A. and Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems, pages 5574–5584.
Keskar et al. (2016) Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. (2016). On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836.
Khan et al. (2018) Khan, M. E., Nielsen, D., Tangkaratt, V., Lin, W., Gal, Y., and Srivastava, A. (2018). Fast and scalable Bayesian deep learning by weight-perturbation in Adam. arXiv preprint arXiv:1806.04854.
Lakshminarayanan et al. (2017) Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pages 6402–6413.
Louizos et al. (2019) Louizos, C., Shi, X., Schutte, K., and Welling, M. (2019). The functional neural process. In Advances in Neural Information Processing Systems.
MacKay (1992a) MacKay, D. J. (1992a). Bayesian interpolation. Neural Computation, 4(3):415–447.
MacKay (1992b) MacKay, D. J. (1992b). Bayesian methods for adaptive models. PhD thesis, California Institute of Technology.
MacKay (1995) MacKay, D. J. (1995). Probable networks and plausible predictions – a review of practical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems, 6(3):469–505.
MacKay (2003) MacKay, D. J. (2003). Information theory, inference and learning algorithms. Cambridge University Press.
Maddox et al. (2019) Maddox, W., Garipov, T., Izmailov, P., Vetrov, D., and Wilson, A. G. (2019). A simple baseline for Bayesian uncertainty in deep learning. In Advances in Neural Information Processing Systems.
Minka (2000) Minka, T. P. (2000). Bayesian model averaging is not model combination.
Neal (1996) Neal, R. (1996). Bayesian Learning for Neural Networks. Springer-Verlag.
O’Hagan (1995) O’Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):99–118.
Ovadia et al. (2019) Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J. V., Lakshminarayanan, B., and Snoek, J. (2019). Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. arXiv preprint arXiv:1906.02530.
Pradier et al. (2018) Pradier, M. F., Pan, W., Yao, J., Ghosh, S., and Doshi-Velez, F. (2018). Latent projection BNNs: Avoiding weight-space pathologies by learning latent representations of neural network weights. arXiv preprint arXiv:1811.07006.
Ritter et al. (2018) Ritter, H., Botev, A., and Barber, D. (2018). A scalable Laplace approximation for neural networks. In International Conference on Learning Representations (ICLR).
Saatci and Wilson (2017) Saatci, Y. and Wilson, A. G. (2017). Bayesian GAN. In Advances in Neural Information Processing Systems, pages 3622–3631.
Seeger (2006) Seeger, M. (2006). Bayesian modelling in machine learning: A tutorial review. Technical report.
Sun et al. (2019) Sun, S., Zhang, G., Shi, J., and Grosse, R. (2019). Functional variational Bayesian neural networks. arXiv preprint arXiv:1903.05779.
Williams and Rasmussen (2006) Williams, C. K. and Rasmussen, C. E. (2006). Gaussian processes for machine learning. The MIT Press.
Wilson (2014) Wilson, A. G. (2014). Covariance kernels for fast automatic pattern discovery and extrapolation with Gaussian processes. PhD thesis, University of Cambridge.
Wilson et al. (2016) Wilson, A. G., Hu, Z., Salakhutdinov, R., and Xing, E. P. (2016). Deep kernel learning. In Artificial Intelligence and Statistics, pages 370–378.
Yang et al. (2019) Yang, W., Lorch, L., Graule, M. A., Srinivasan, S., Suresh, A., Yao, J., Pradier, M. F., and Doshi-Velez, F. (2019). Output-constrained Bayesian neural networks. arXiv preprint arXiv:1905.06287.
Zhang et al. (2018) Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2018). Understanding deep learning requires rethinking generalization.
Zhang et al. (2020) Zhang, R., Li, C., Zhang, J., Chen, C., and Wilson, A. G. (2020). Cyclical stochastic gradient MCMC for Bayesian deep learning. In International Conference on Learning Representations.
Zołna et al. (2019) Zołna, K., Geras, K. J., and Cho, K. (2019). Classifier-agnostic saliency map extraction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33.