
Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)
Gradient descent yields zero training loss in polynomial time for deep n...

Neural Tangent Kernel: Convergence and Generalization in Neural Networks
At initialization, artificial neural networks (ANNs) are equivalent to G...

When and why PINNs fail to train: A neural tangent kernel perspective
Physics-informed neural networks (PINNs) have lately received great atte...

Residual Tangent Kernels
A recent body of work has focused on the theoretical study of neural net...

Weighted Neural Tangent Kernel: A Generalized and Improved Network-Induced Kernel
The Neural Tangent Kernel (NTK) has recently attracted intense study, as...

Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections
The behavior of the gradient descent (GD) algorithm is analyzed for a de...

Infinite-dimensional Folded-in-time Deep Neural Networks
The method recently introduced in arXiv:2011.10115 realizes a deep neura...
Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
The evolution of a deep neural network trained by gradient descent can be described by its neural tangent kernel (NTK), introduced in [20], where it was proven that in the infinite-width limit the NTK converges to an explicit limiting kernel and stays constant during training. The NTK was also implicit in several other recent papers [6,13,14]. In the overparameterized regime, a fully trained deep neural network is equivalent to the kernel regression predictor using the limiting NTK, and gradient descent achieves zero training loss. However, it was observed in [5] that there is a performance gap between kernel regression using the limiting NTK and deep neural networks. This gap likely originates from the change of the NTK during training, a finite-width effect. Describing how the NTK changes during training is therefore central to understanding the generalization behavior of deep neural networks. In this paper, we study the dynamics of the NTK for finite-width deep fully-connected neural networks. We derive an infinite hierarchy of ordinary differential equations, the neural tangent hierarchy (NTH), which captures the gradient descent dynamics of the deep neural network. Moreover, under certain conditions on the network width and the data set dimension, we prove that the truncated NTH approximates the dynamics of the NTK up to arbitrary precision. This description makes it possible to study the change of the NTK for deep neural networks directly, and sheds light on the observation that deep neural networks outperform kernel regression using the corresponding limiting NTK.
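To show why such a hierarchy appears, here is a schematic derivation under stated assumptions: squared loss L(theta) = (1/2n) * sum over beta of (f_theta(x_beta) - y_beta)^2 on n training points, trained by continuous-time gradient flow d theta/dt = -grad_theta L. The labels K^(r) for the higher-order kernels are notation chosen here for illustration and may differ from the paper's. By the chain rule, the time derivative of each kernel is expressed through the next kernel in the sequence, so the system never closes on its own:

```latex
\begin{aligned}
K^{(2)}_t(x, x') &= \bigl\langle \nabla_\theta f_t(x),\, \nabla_\theta f_t(x') \bigr\rangle
  \quad \text{(the empirical NTK)},\\
K^{(r+1)}_t(x_1, \dots, x_r, x) &= \bigl\langle \nabla_\theta K^{(r)}_t(x_1, \dots, x_r),\, \nabla_\theta f_t(x) \bigr\rangle,
  \qquad r \ge 2,\\[4pt]
\partial_t f_t(x_\alpha) &= -\frac{1}{n} \sum_{\beta=1}^{n} K^{(2)}_t(x_\alpha, x_\beta)\,\bigl(f_t(x_\beta) - y_\beta\bigr),\\
\partial_t K^{(r)}_t(x_{\alpha_1}, \dots, x_{\alpha_r}) &= -\frac{1}{n} \sum_{\beta=1}^{n} K^{(r+1)}_t(x_{\alpha_1}, \dots, x_{\alpha_r}, x_\beta)\,\bigl(f_t(x_\beta) - y_\beta\bigr),
  \qquad r \ge 2.
\end{aligned}
```

If K^(2) stays constant, as in the infinite-width limit, the first evolution equation alone gives linear kernel-regression dynamics. Keeping only the equations up to some order p and holding K^(p) fixed at its initial value is one simple way to close the system (an illustrative assumption). The approximation result stated above concerns how well a truncated hierarchy of this kind tracks the NTK dynamics.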
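The finite-width effect on the NTK can be observed numerically. Below is a minimal NumPy sketch. It is not the paper's method or code, and every concrete choice in it is an assumption made for illustration. It uses a hypothetical two-layer tanh network f(x) = a . tanh(W x) / sqrt(m) in NTK parametrization, whereas the paper treats deep fully-connected networks. The sketch also fixes toy data, a learning rate, and a step count. It trains the network by full-batch gradient descent on the squared loss (1/2n) * sum (f - y)^2. It then computes the empirical NTK K(x_i, x_j) = <grad_theta f(x_i), grad_theta f(x_j)> before and after training and reports the relative change for several widths m.

```python
import numpy as np


def init_params(m, d, rng):
    """Hypothetical two-layer net f(x) = a . tanh(W x) / sqrt(m), standard normal init."""
    W = rng.standard_normal((m, d))
    a = rng.standard_normal(m)
    return W, a


def forward_and_jacobian(W, a, X):
    """Outputs f (n,) and per-example gradients w.r.t. a (n, m) and W (n, m, d)."""
    m = a.shape[0]
    H = np.tanh(X @ W.T)                  # hidden activations, shape (n, m)
    dH = 1.0 - H ** 2                     # tanh'(w_r . x_i), shape (n, m)
    f = H @ a / np.sqrt(m)
    J_a = H / np.sqrt(m)                  # df(x_i)/da_r
    J_W = (dH * a)[:, :, None] * X[:, None, :] / np.sqrt(m)  # df(x_i)/dw_r
    return f, J_a, J_W


def empirical_ntk(J_a, J_W):
    """K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>, summed over all parameters."""
    n = J_a.shape[0]
    J = np.concatenate([J_a, J_W.reshape(n, -1)], axis=1)
    return J @ J.T


def train(m, X, y, lr=0.5, steps=200, seed=0):
    """Full-batch GD on L = (1/2n) sum (f - y)^2; returns losses and relative NTK change."""
    rng = np.random.default_rng(seed)
    W, a = init_params(m, X.shape[1], rng)
    n = X.shape[0]
    _, J_a, J_W = forward_and_jacobian(W, a, X)
    K0 = empirical_ntk(J_a, J_W)
    losses = []
    for _ in range(steps):
        f, J_a, J_W = forward_and_jacobian(W, a, X)
        r = f - y
        losses.append(0.5 * np.mean(r ** 2))
        # grad_theta L = (1/n) J^T r, split into the a-block and the W-block
        a = a - lr * (J_a.T @ r) / n
        W = W - lr * np.einsum("i,imd->md", r, J_W) / n
    f, J_a, J_W = forward_and_jacobian(W, a, X)
    losses.append(0.5 * np.mean((f - y) ** 2))
    KT = empirical_ntk(J_a, J_W)
    rel_change = np.linalg.norm(KT - K0) / np.linalg.norm(K0)  # Frobenius norms
    return losses, rel_change


# Toy data (an assumption): 20 unit-norm inputs in R^5 with smooth scalar targets.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((20, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sin(3.0 * X[:, 0])

results = {m: train(m, X, y) for m in (16, 256, 4096)}
for m, (losses, rel) in results.items():
    print(f"width={m:5d}  loss {losses[0]:.4f} -> {losses[-1]:.4f}  "
          f"relative NTK change {rel:.4f}")
```

Under these assumptions, the relative kernel change should shrink as the width grows. This matches the infinite-width picture, in which the NTK stays constant. At small widths the kernel moves noticeably during training, and that is the regime a hierarchy like the NTH is meant to describe.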