The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

06/11/2021
by Geoff Pleiss et al.

Large width limits have been a recent focus of deep learning research: modulo computational practicalities, do wider networks outperform narrower ones? Answering this question has been challenging, as conventional networks gain representational power with width, potentially masking any negative effects. Our analysis in this paper decouples capacity and width via the generalization of neural networks to Deep Gaussian Processes (Deep GP), a class of hierarchical models that subsume neural nets. In doing so, we aim to understand how width affects standard neural networks once they have sufficient capacity for a given modeling task. Our theoretical and empirical results on Deep GP suggest that large width is generally detrimental to hierarchical models. Surprisingly, we prove that even nonparametric Deep GP converge to Gaussian processes, effectively becoming shallower without any increase in representational power. The posterior, which corresponds to a mixture of data-adaptable basis functions, becomes less data-dependent with width. Our tail analysis demonstrates that width and depth have opposite effects: depth accentuates a model's non-Gaussianity, while width makes models increasingly Gaussian. We find there is a "sweet spot" that maximizes test set performance before the limiting GP behavior prevents adaptability, occurring at width = 1 or width = 2 for nonparametric Deep GP. These results make strong predictions about the same phenomenon in conventional neural networks: we show empirically that many neural network architectures need 10-500 hidden units (depending on the dataset) for sufficient capacity, but further width degrades test performance.
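The claim that width makes hierarchical models increasingly Gaussian is easy to probe numerically. The sketch below is not the paper's code; it is a minimal illustration under assumed settings (two layers, RBF kernels with unit lengthscale, a 1-D input grid, and excess kurtosis of an output difference as a crude non-Gaussianity proxy). It samples from the prior of a two-layer deep GP at several hidden widths; the excess kurtosis should shrink toward zero as the width grows, mirroring the limiting GP behavior described above.

```python
# Minimal sketch (not the authors' experiment): how the prior of a two-layer
# deep GP becomes more Gaussian as the hidden width grows. All settings here
# (kernels, lengthscales, grid, kurtosis proxy) are illustrative assumptions.
import numpy as np


def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel on the rows of X (shape: n_points x n_dims)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)


def sample_gp(K, n_outputs, rng, jitter=1e-6):
    """Draw n_outputs independent zero-mean GP samples with covariance K."""
    L = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
    return L @ rng.standard_normal((K.shape[0], n_outputs))


def deep_gp_prior_samples(width, n_samples=2000, seed=0):
    """Sample f(x_b) - f(x_a) under a two-layer deep GP prior with the given hidden width."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-2.0, 2.0, 25)[:, None]             # 1-D input grid
    Kx = rbf_kernel(x)
    diffs = np.empty(n_samples)
    for s in range(n_samples):
        h = sample_gp(Kx, width, rng) / np.sqrt(width)  # hidden layer: width independent GPs
        f = sample_gp(rbf_kernel(h), 1, rng)[:, 0]      # output GP evaluated at the hidden features
        diffs[s] = f[-1] - f[0]                         # output difference between the grid endpoints
    return diffs


def excess_kurtosis(z):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    z = z - z.mean()
    return (z ** 4).mean() / (z ** 2).mean() ** 2 - 3.0


if __name__ == "__main__":
    for width in (1, 2, 10, 100):
        k = excess_kurtosis(deep_gp_prior_samples(width))
        print(f"hidden width {width:3d}: excess kurtosis {k:+.2f}")
```

The output difference f(x_b) - f(x_a) is used rather than a single output value because, with a stationary output kernel, the marginal at any one point is exactly Gaussian regardless of width; the non-Gaussianity that depth introduces shows up only in the joint structure across inputs.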

Related research

11/29/2021  Dependence between Bayesian neural network units
01/03/2020  Wide Neural Networks with Bottlenecks are Deep Gaussian Processes
05/17/2022  Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility
10/29/2020  Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
08/29/2021  Neural Network Gaussian Processes by Increasing Depth
11/29/2019  Richer priors for infinitely wide multi-layer perceptrons
04/02/2020  Predicting the outputs of finite networks trained with noisy gradients
