On the Power of Shallow Learning

06/06/2021
by James B. Simon, et al.

A deluge of recent work has explored equivalences between wide neural networks and kernel methods. A central theme is that one can analytically find the kernel corresponding to a given wide network architecture, but despite major implications for architecture design, no work to date has asked the converse question: given a kernel, can one find a network that realizes it? We affirmatively answer this question for fully-connected architectures, completely characterizing the space of achievable kernels. Furthermore, we give a surprising constructive proof that any kernel of any wide, deep, fully-connected net can also be achieved by a network with just one hidden layer and a specially designed pointwise activation function. We experimentally verify our construction and demonstrate that, just by choosing the activation function, we can design a wide shallow network that mimics the generalization performance of any wide, deep, fully-connected network.
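As a point of reference for the network-to-kernel direction mentioned above, the sketch below implements the standard NNGP kernel recursion for a wide, fully-connected ReLU network in plain NumPy. It only illustrates how a deep architecture induces a kernel; it is not the paper's shallow construction, and the ReLU nonlinearity, depth, and weight/bias variances sigma_w, sigma_b are illustrative choices rather than values from the paper.

import numpy as np

def relu_nngp_kernel(X, depth=3, sigma_w=np.sqrt(2.0), sigma_b=0.0):
    """NNGP kernel matrix of a depth-`depth` wide, fully-connected ReLU net on rows of X."""
    n, d = X.shape
    # Input layer: scaled linear kernel plus bias variance.
    K = sigma_w**2 * (X @ X.T) / d + sigma_b**2
    for _ in range(depth):
        diag = np.sqrt(np.clip(np.diag(K), 1e-12, None))
        cos_theta = np.clip(K / np.outer(diag, diag), -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Closed-form E[relu(u) relu(v)] for (u, v) ~ N(0, K): the degree-1 arc-cosine kernel.
        K = sigma_w**2 / (2 * np.pi) * np.outer(diag, diag) * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta)
        ) + sigma_b**2
    return K

# Example: the kernels induced by a deep and a one-hidden-layer ReLU net on the same inputs.
X = np.random.default_rng(0).normal(size=(5, 10))
K_deep = relu_nngp_kernel(X, depth=5)
K_shallow = relu_nngp_kernel(X, depth=1)

For a fixed pointwise activation such as ReLU, the deep and shallow kernels above differ; the paper's result is that a suitably designed activation lets the one-hidden-layer network reproduce the deep network's kernel exactly.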


08/29/2019

On the rate of convergence of fully connected very deep neural network regression estimates

Recent results in nonparametric regression show that deep learning, i.e....
09/16/2020

Activation Functions: Do They Represent A Trade-Off Between Modular Nature of Neural Networks And Task Performance

Current research suggests that the key factors in designing neural netwo...
09/30/2020

Deep Equals Shallow for ReLU Networks in Kernel Regimes

Deep networks are often considered to be more expressive than shallow on...
04/26/2017

The loss surface of deep and wide neural networks

While the optimization problem behind deep neural networks is highly non...
04/07/2021

Spectral Analysis of the Neural Tangent Kernel for Deep Residual Networks

Deep residual network architectures have been shown to achieve superior ...
04/26/2019

On Exact Computation with an Infinitely Wide Neural Net

How well does a classic deep net architecture like AlexNet or VGG19 clas...
02/28/2017

On architectural choices in deep learning: From network structure to gradient convergence and parameter estimation

We study mechanisms to characterize how the asymptotic convergence of ba...