
Enhanced Convolutional Neural Tangent Kernels
Recent research shows that for training with ℓ_2 loss, convolutional neu...

Gradient Kernel Regression
In this article a surprising result is demonstrated using the neural tan...

Exchangeability and Kernel Invariance in Trained MLPs
In the analysis of machine learning models, it is often convenient to as...

Learning with invariances in random features and kernel models
A number of machine learning tasks entail a high degree of invariance: t...

Disentangling trainability and generalization in deep learning
A fundamental goal in deep learning is the characterization of trainabil...

Deep Networks with Adaptive Nyström Approximation
Recent work has focused on combining kernel methods and deep learning to...

Compressing invariant manifolds in neural nets
We study how neural networks compress uninformative input space in model...
Properties of the After Kernel
The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined using neural networks at initialization, whose embedding is the gradient of the output of the network with respect to its parameters. We study the "after kernel", which is defined using the same embedding, except after training, for neural networks with standard architectures, on binary classification problems extracted from MNIST and CIFAR10, trained using SGD in a standard way. For some dataset-architecture pairs, after a few epochs of neural network training, a hard-margin SVM using the network's after kernel is much more accurate than when the network's initial kernel is used. For networks with an architecture similar to VGG, the after kernel is more "global", in the sense that it is less invariant to transformations of input images that disrupt the global structure of the image while leaving the local statistics largely intact. For fully connected networks, the after kernel is less global in this sense. The after kernel tends to be more invariant to small shifts, rotations and zooms; data augmentation does not improve these invariances. The (finite approximation to the) conjugate kernel, obtained using the last layer of hidden nodes, sometimes, but not always, provides a good approximation to the NTK and the after kernel.
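The kernels named in the abstract can be illustrated with a minimal sketch, which is not the paper's experimental setup. It assumes a toy one-hidden-layer network f(x) = Σ_j a_j tanh(w_j · x), a squared loss, and plain per-example SGD; the function names (`grad_embedding`, `tangent_kernel`, `conjugate_kernel`, `sgd_step`) are hypothetical and chosen for illustration. The tangent kernel is the inner product of parameter gradients, the "after kernel" is the same quantity evaluated at trained parameters, and the conjugate kernel is the inner product of last-hidden-layer activations.

```python
import math
import random


def init_params(d_in, width, seed=0):
    """Random Gaussian initialization of a toy network (assumed scaling)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
         for _ in range(width)]
    a = [rng.gauss(0.0, 1.0 / math.sqrt(width)) for _ in range(width)]
    return W, a


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def hidden(W, x):
    """Activations of the (only, hence last) hidden layer."""
    return [math.tanh(dot(w, x)) for w in W]


def forward(params, x):
    W, a = params
    return dot(a, hidden(W, x))


def grad_embedding(params, x):
    """Gradient of f(x) w.r.t. all parameters, flattened (W rows, then a)."""
    W, a = params
    h = hidden(W, x)
    g = []
    for a_j, h_j in zip(a, h):
        # d f / d W[j][i] = a_j * (1 - tanh^2) * x_i
        g.extend(a_j * (1.0 - h_j * h_j) * x_i for x_i in x)
    # d f / d a_j = h_j
    g.extend(h)
    return g


def tangent_kernel(params, x1, x2):
    """Empirical tangent kernel at the given parameters."""
    return dot(grad_embedding(params, x1), grad_embedding(params, x2))


def conjugate_kernel(params, x1, x2):
    """Inner product of last-hidden-layer features."""
    W, _ = params
    return dot(hidden(W, x1), hidden(W, x2))


def sgd_step(params, x, y, lr=0.1):
    """One SGD step on the squared loss 0.5 * (f(x) - y)^2 (an assumption)."""
    W, a = params
    d, width = len(x), len(W)
    err = forward(params, x) - y
    g = grad_embedding(params, x)
    new_W = [[W[j][i] - lr * err * g[j * d + i] for i in range(d)]
             for j in range(width)]
    new_a = [a[j] - lr * err * g[width * d + j] for j in range(width)]
    return new_W, new_a


if __name__ == "__main__":
    data = [([1.0, 0.0, 0.5], 1.0), ([-1.0, 0.2, -0.5], -1.0)]
    params = init_params(d_in=3, width=8)
    x1, x2 = data[0][0], data[1][0]
    print("initial kernel:", tangent_kernel(params, x1, x2))
    for _ in range(50):
        for x, y in data:
            params = sgd_step(params, x, y, lr=0.05)
    print("after kernel:  ", tangent_kernel(params, x1, x2))
    print("conjugate:     ", conjugate_kernel(params, x1, x2))
```

In this toy network the output layer is linear, so the gradient with respect to the output weights equals the hidden activations. The tangent kernel is therefore the conjugate kernel plus a non-negative first-layer term, which gives one way to see why the conjugate kernel can sometimes, but not always, approximate it well.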