When Do Neural Networks Outperform Kernel Methods?

06/24/2020
by Behrooz Ghorbani, et al.

For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS. This is true even in the wide-network limit, for a different scaling of the initialization. How can we reconcile the above claims? For which tasks do NNs outperform RKHS? If the feature vectors are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality becomes milder if the feature vectors display the same low-dimensional structure as the target function, and we precisely characterize this tradeoff. Building on these results, we present a model that captures both behaviors observed in earlier work within a unified framework. We hypothesize that such a latent low-dimensional structure is present in image classification. We test this hypothesis numerically by showing that specific perturbations of the training distribution degrade the performance of RKHS methods much more significantly than that of NNs.
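As a rough, hypothetical illustration of the tradeoff described above (not the paper's actual experiment), the sketch below compares an RBF kernel ridge regressor, standing in for an RKHS method, with a small two-layer network on a synthetic task whose target depends on a single latent direction of nearly isotropic, high-dimensional inputs. The dimension, sample sizes, target function, and all hyperparameters are arbitrary choices made for this sketch.

```python
# Illustrative sketch only: kernel ridge regression (an RKHS method) vs. a
# small two-layer NN on a target that depends on one latent direction of
# nearly isotropic, high-dimensional features.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
d, n_train, n_test = 200, 2000, 1000  # arbitrary sizes for illustration

# Isotropic Gaussian features; the target depends only on the direction w.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)

def target(X):
    z = X @ w
    return np.maximum(z, 0.0) + 0.5 * z ** 2  # a simple nonlinear ridge function

X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train, y_test = target(X_train), target(X_test)

# RKHS baseline: RBF kernel ridge regression with a fixed kernel.
krr = KernelRidge(alpha=1e-3, kernel="rbf", gamma=1.0 / d)
krr.fit(X_train, y_train)
krr_mse = np.mean((krr.predict(X_test) - y_test) ** 2)

# Two-layer network trained with SGD-style optimization; unlike the fixed
# kernel, it can adapt its first-layer weights to the latent direction w.
nn = MLPRegressor(hidden_layer_sizes=(256,), max_iter=2000, random_state=0)
nn.fit(X_train, y_train)
nn_mse = np.mean((nn.predict(X_test) - y_test) ** 2)

print(f"kernel ridge test MSE: {krr_mse:.4f}")
print(f"two-layer NN test MSE: {nn_mse:.4f}")
```

In this kind of setting one would expect the network, which can adapt to the latent direction, to generalize better than the fixed kernel as the ambient dimension grows; the size of the gap depends heavily on the arbitrary choices above.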


Related Research

Weighted Neural Tangent Kernel: A Generalized and Improved Network-Induced Kernel (03/22/2021)
The Neural Tangent Kernel (NTK) has recently attracted intense study, as...

Learning Curves for SGD on Structured Features (06/04/2021)
The generalization performance of a machine learning algorithm such as a...

Stochastic Gradient Descent in Hilbert Scales: Smoothness, Preconditioning and Earlier Stopping (06/18/2020)
Stochastic Gradient Descent (SGD) has become the method of choice for so...

Neural Networks Optimally Compress the Sawbridge (11/10/2020)
Neural-network-based compressors have proven to be remarkably effective...

Pure Exploration in Kernel and Neural Bandits (06/22/2021)
We study pure exploration in bandits, where the dimension of the feature...

Code Repositories

linearized_neural_networks

The code for the paper "When Do Neural Networks Outperform Kernel Methods?"

