Frequency Bias in Neural Networks for Input of Non-Uniform Density
Recent works have partly attributed the generalization ability of overparameterized neural networks to frequency bias: networks trained with gradient descent on data drawn from a uniform distribution find a low-frequency fit before high-frequency ones. As realistic training sets are not drawn from a uniform distribution, we here use the Neural Tangent Kernel (NTK) model to explore the effect of variable density on training dynamics. Our results, which combine analytic and empirical observations, show that when learning a pure harmonic function of frequency κ, convergence at a point x ∈ S^{d-1} occurs in time O(κ^d / p(x)), where p(x) denotes the local density at x. Specifically, for data in S^1 we analytically derive the eigenfunctions of the kernel associated with the NTK for two-layer networks. We further prove convergence results for deep, fully connected networks with respect to the spectral decomposition of the NTK. Our empirical study highlights similarities and differences between deep and shallow networks in this model.
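The frequency bias that the abstract builds on can be illustrated with a minimal sketch. This is not the paper's construction: it assumes a random-features simplification of the NTK setting (fixed random first layer, only the output layer trained), uniform density on S^1, and arbitrarily chosen frequencies, width, and step counts. Gradient descent is run on a target containing one low- and one high-frequency harmonic, and the residual is projected onto each frequency.

```python
import numpy as np

rng = np.random.default_rng(0)

# Evenly spaced points on the circle S^1 (uniform density).
n = 256
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Target: a low- and a high-frequency harmonic of equal amplitude.
k_low, k_high = 1, 8
y = np.cos(k_low * theta) + np.cos(k_high * theta)

# Wide two-layer ReLU network; only the output layer is trained here,
# a random-features simplification of the full NTK analysis.
m = 2000
W = rng.normal(size=(m, 2))        # fixed random first-layer weights
phi = np.maximum(W @ X.T, 0.0)     # (m, n) hidden activations
a = np.zeros(m)                    # trainable output weights

def freq_amp(residual, k):
    """Amplitude of the residual's Fourier component at frequency k."""
    c = residual @ np.cos(k * theta) * 2.0 / n
    s = residual @ np.sin(k * theta) * 2.0 / n
    return float(np.hypot(c, s))

# Full-batch gradient descent on the mean squared error.
lr = 1.0
for _ in range(2000):
    residual = y - a @ phi / np.sqrt(m)
    a += lr * (phi @ residual) / (np.sqrt(m) * n)

err_low = freq_amp(y - a @ phi / np.sqrt(m), k_low)
err_high = freq_amp(y - a @ phi / np.sqrt(m), k_high)
print(f"residual at k={k_low}: {err_low:.3f}, at k={k_high}: {err_high:.3f}")
```

With these settings the frequency-1 residual is driven close to zero while the frequency-8 residual remains large after the same number of steps, matching the qualitative prediction that convergence time grows rapidly with κ; the constants are illustrative, not the paper's.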