
Deep Network Approximation for Smooth Functions
This paper establishes optimal approximation error characterization of d...

The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Theoretical results show that neural networks can be approximated by Gau...

Duality of Width and Depth of Neural Networks
Here, we report that the depth and the width of a neural network are dua...

Variance-Preserving Initialization Schemes Improve Deep Network Training: But Which Variance is Preserved?
Before training a neural net, a classic rule of thumb is to randomly ini...

Is deeper better? It depends on locality of relevant features
It has been recognized that a heavily overparameterized artificial neura...

Residual Tangent Kernels
A recent body of work has focused on the theoretical study of neural net...

Neural Optimization Kernel: Towards Robust Deep Learning
Recent studies show a close connection between neural networks (NN) and ...
Finite Depth and Width Corrections to the Neural Tangent Kernel
We prove the precise scaling, at finite depth and width, for the mean and variance of the neural tangent kernel (NTK) in a randomly initialized ReLU network. The standard deviation is exponential in the ratio of network depth to width. Thus, even in the limit of infinite overparameterization, the NTK is not deterministic if depth and width simultaneously tend to infinity. Moreover, we prove that for such deep and wide networks, the NTK has a nontrivial evolution during training by showing that the mean of its first SGD update is also exponential in the ratio of network depth to width. This is in sharp contrast to the regime where depth is fixed and network width is very large. Our results suggest that, unlike relatively shallow and wide networks, deep and wide ReLU networks are capable of learning data-dependent features even in the so-called lazy training regime.