Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?

by Mariia Seleznova et al.

Neural Tangent Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent. But do the results for infinitely-wide networks tell us anything about the behaviour of real finite-width ones? In this paper, we study empirically when NTK theory holds in practice for fully-connected ReLU and sigmoid networks. We find that whether a network is in the NTK regime depends on the hyperparameters of random initialization and on the network's depth. In particular, NTK theory does not explain the behaviour of sufficiently deep networks initialized so that their gradients explode: the kernel is random at initialization and changes significantly during training, contrary to NTK theory. In the case of vanishing gradients, on the other hand, DNNs are in the NTK regime but rapidly become untrainable as depth grows. We also describe a framework for studying the generalization properties of DNNs via NTK theory and discuss its limits.
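The quantity at the centre of the study is the empirical NTK: the Gram matrix of parameter gradients of the network outputs, which NTK theory predicts is deterministic at initialization and constant during training in the infinite-width limit. The following is a minimal numpy sketch (not the authors' code) of how one can compute this matrix for a toy one-hidden-layer ReLU network in NTK parameterization; the network form, widths, and variable names are illustrative assumptions.

```python
import numpy as np

def empirical_ntk(X, W, v):
    """Empirical NTK Gram matrix of the toy network
    f(x) = v @ relu(W @ x) / sqrt(m), computed from explicit
    per-example gradients of f with respect to all parameters."""
    m = W.shape[0]
    grads = []
    for x in X:
        pre = W @ x                                     # pre-activations, shape (m,)
        act = np.maximum(pre, 0.0)                      # ReLU activations
        d_v = act / np.sqrt(m)                          # df/dv
        d_W = np.outer(v * (pre > 0), x) / np.sqrt(m)   # df/dW (ReLU derivative is an indicator)
        grads.append(np.concatenate([d_v, d_W.ravel()]))
    G = np.stack(grads)                                 # rows: flattened gradients per example
    return G @ G.T                                      # Theta[i, j] = <grad f(x_i), grad f(x_j)>

rng = np.random.default_rng(0)
d, m, n = 3, 512, 4                                     # input dim, hidden width, sample count
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))                             # standard-normal init (NTK parameterization)
v = rng.normal(size=m)
Theta = empirical_ntk(X, W, v)
print(Theta.shape)                                      # (4, 4)
```

Recomputing `Theta` at different training steps, or across several random draws of `W` and `v`, gives exactly the kind of measurement the paper relies on: if the kernel varies noticeably across initializations or over training, the network is outside the NTK regime.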






