Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes

by   Roman Novak, et al.

There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test-set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs), both with and without pooling layers, and achieve state-of-the-art results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs with and without weight sharing are identical. As a consequence, translation equivariance in finite-channel CNNs trained with stochastic gradient descent (SGD) has no corresponding property in the Bayesian treatment of the infinite-channel limit - a qualitative difference between the two regimes that is not present in the FCN case. We confirm experimentally that, while in some scenarios the performance of SGD-trained finite CNNs approaches that of the corresponding GPs as the channel count increases, with careful tuning SGD-trained CNNs can significantly outperform their corresponding GPs, suggesting advantages from SGD training compared to fully Bayesian parameter estimation.
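The Monte Carlo idea mentioned in the abstract can be illustrated with a minimal sketch: the NNGP kernel entry K(x1, x2) is the expected product of network outputs over random i.i.d. weight draws, so it can be estimated by instantiating many random finite networks and averaging. The single-hidden-layer ReLU architecture, widths, and variance scalings below are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def mc_nngp_kernel(x1, x2, width=512, n_samples=200, seed=0):
    """Monte Carlo estimate of the NNGP kernel entry K(x1, x2).

    Illustrative one-hidden-layer ReLU network: weights are drawn i.i.d.
    Gaussian with variance 1/fan_in, and the product of the two scalar
    outputs is averaged over independent network draws. As width and
    n_samples grow, the estimate converges to the analytic GP kernel.
    """
    rng = np.random.default_rng(seed)
    d = x1.shape[0]
    total = 0.0
    for _ in range(n_samples):
        # Shared first-layer weights/biases, applied to both inputs.
        W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(width, d))
        b1 = rng.normal(0.0, 0.1, size=width)
        h1 = np.maximum(W1 @ x1 + b1, 0.0)  # ReLU features for x1
        h2 = np.maximum(W1 @ x2 + b1, 0.0)  # ReLU features for x2
        # Readout weights; output covariance accumulates the kernel.
        W2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)
        total += (W2 @ h1) * (W2 @ h2)
    return total / n_samples
```

Because the same random draws are reused for both inputs, the estimator is exactly symmetric in its arguments, and for deeper or structured architectures (e.g. CNNs with pooling) the same recipe applies: sample the finite network, record outputs, and average products.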




Deep Neural Networks as Gaussian Processes

A deep fully-connected neural network with an i.i.d. prior over its para...

Infinite attention: NNGP and NTK for deep attention networks

There is a growing amount of literature on the relationship between wide...

Calibrating Deep Convolutional Gaussian Processes

The wide adoption of Convolutional Neural Networks (CNNs) in application...

Interrelation of equivariant Gaussian processes and convolutional neural networks

Currently there exists a rather promising new trend in machine learning (ML...

Disentangling trainability and generalization in deep learning

A fundamental goal in deep learning is the characterization of trainabil...

Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias

Despite the phenomenal success of deep neural networks in a broad range ...

Infinite-channel deep stable convolutional neural networks

The interplay between infinite-width neural networks (NNs) and classes o...
