Richer priors for infinitely wide multi-layer perceptrons

11/29/2019
by Russell Tsuchida et al.

It is well known that the distribution over functions induced by a zero-mean iid prior over the parameters of a multi-layer perceptron (MLP) converges to a Gaussian process (GP) under mild conditions. We extend this result, first to independent priors with general (zero or non-zero) means, and second to a family of partially exchangeable priors that generalise iid priors. We discuss how the second class of priors arises naturally when considering an equivalence class of functions in an MLP, and through training procedures such as stochastic gradient descent. The model resulting from partially exchangeable priors is a GP with an additional level of inference, in the sense that the prior and posterior predictive distributions require marginalisation over hyperparameters. We derive the kernels of the limiting GP in deep MLPs and show empirically that these kernels avoid certain pathologies present in previously studied priors. We evaluate our convergence claims empirically by measuring the maximum mean discrepancy between finite-width models and their limiting counterparts, and we compare the performance of the new limiting model to previously discussed models on synthetic regression problems. We observe increasing ill-conditioning of the marginal likelihood and hyper-posterior as the depth of the model increases, drawing parallels with finite-width networks, which notoriously require involved optimisation tricks.
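For orientation, the iid-prior result that the paper generalises can be stated as a kernel recursion; the display below is the standard zero-mean iid Gaussian formulation with the well-known arc-cosine closed form for ReLU (Cho and Saul; see also "Deep Neural Networks as Gaussian Processes" listed under related research), not one of the paper's new non-zero-mean or partially exchangeable kernels.

```latex
% Pre-activation covariance recursion for an MLP with activation $\phi$,
% weights $W^{(l)}_{ij} \sim \mathcal{N}(0, \sigma_w^2 / n_l)$ and biases
% $b^{(l)}_i \sim \mathcal{N}(0, \sigma_b^2)$, all independent:
\[
  K^{(l+1)}(x, x') \;=\; \sigma_b^2 + \sigma_w^2\,
  \mathbb{E}_{(u, v) \sim \mathcal{N}\left(0,\,
    \begin{pmatrix} K^{(l)}(x,x) & K^{(l)}(x,x') \\
                    K^{(l)}(x,x') & K^{(l)}(x',x') \end{pmatrix}\right)}
  \big[\phi(u)\,\phi(v)\big].
\]
% For $\phi = \mathrm{ReLU}$ the expectation has the arc-cosine closed form:
\[
  \mathbb{E}[\phi(u)\phi(v)]
  = \frac{\sqrt{K^{(l)}(x,x)\,K^{(l)}(x',x')}}{2\pi}
    \left(\sin\theta^{(l)} + (\pi - \theta^{(l)})\cos\theta^{(l)}\right),
  \quad
  \theta^{(l)} = \arccos\!\left(
    \frac{K^{(l)}(x,x')}{\sqrt{K^{(l)}(x,x)\,K^{(l)}(x',x')}}\right).
\]
```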
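And a minimal numerical sketch of the kind of MMD-based convergence check the abstract describes: sample outputs of random finite-width ReLU MLPs under the iid prior, sample from the limiting GP via the recursion above, and compare the two with a maximum mean discrepancy estimate. The widths, depth, variances, and RBF bandwidth are illustrative choices, not the paper's settings, and the biased RBF MMD estimator stands in for whatever estimator the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def nngp_kernel(X, depth, sigma_w=1.4, sigma_b=0.1):
    """Limiting GP kernel of a deep ReLU MLP under zero-mean iid Gaussian priors."""
    K = sigma_b**2 + sigma_w**2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        d = np.sqrt(np.diag(K))
        c = np.clip(K / np.outer(d, d), -1.0, 1.0)
        theta = np.arccos(c)
        # Arc-cosine kernel of degree 1 (ReLU activation).
        J = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        K = sigma_b**2 + sigma_w**2 * np.outer(d, d) * J
    return K

def finite_mlp_outputs(X, depth, width, n_samples, sigma_w=1.4, sigma_b=0.1):
    """Scalar outputs of random finite-width ReLU MLPs drawn from the iid prior."""
    outs = np.empty((n_samples, X.shape[0]))
    for s in range(n_samples):
        H = X
        for _ in range(depth):
            W = rng.normal(0, sigma_w / np.sqrt(H.shape[1]), (H.shape[1], width))
            b = rng.normal(0, sigma_b, width)
            H = np.maximum(H @ W + b, 0.0)
        w = rng.normal(0, sigma_w / np.sqrt(width), width)
        outs[s] = H @ w + rng.normal(0, sigma_b)
    return outs

def mmd2(A, B, bandwidth=1.0):
    """Biased squared MMD between sample sets, with a Gaussian RBF kernel."""
    def k(U, V):
        d2 = ((U[:, None, :] - V[None, :, :])**2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth**2))
    return k(A, A).mean() + k(B, B).mean() - 2 * k(A, B).mean()

X = rng.normal(size=(5, 3))                       # a few test inputs
K = nngp_kernel(X, depth=3)
gp_draws = rng.multivariate_normal(np.zeros(5), K, size=1000)
for width in (8, 64, 512):
    nn_draws = finite_mlp_outputs(X, depth=3, width=width, n_samples=1000)
    print(f"width={width:4d}  MMD^2={mmd2(gp_draws, nn_draws):.4f}")
```

As the width grows, the squared MMD should shrink towards the sampling noise floor, which is the empirical signature of the convergence result the paper extends.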

Related research

11/01/2017 · Deep Neural Networks as Gaussian Processes
A deep fully-connected neural network with an i.i.d. prior over its para...

06/18/2020 · Exact posterior distributions of wide Bayesian neural networks
Recent work has shown that the prior over functions induced by a deep Ba...

05/25/2016 · How priors of initial hyperparameters affect Gaussian process regression models
The hyperparameters in Gaussian process regression (GPR) model with a sp...

12/29/2022 · Bayesian Interpolation with Deep Linear Networks
This article concerns Bayesian inference using deep linear networks with...

06/11/2021 · The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective
Large width limits have been a recent focus of deep learning research: m...

02/23/2022 · Wide Mean-Field Bayesian Neural Networks Ignore the Data
Bayesian neural networks (BNNs) combine the expressive power of deep lea...

06/02/2023 · MLP-Mixer as a Wide and Sparse MLP
Multi-layer perceptron (MLP) is a fundamental component of deep learning...
