Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

03/01/2021
by Eran Malach, et al.

We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.
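To make the comparison concrete, the tangent kernel of a differentiable model is the inner product of parameter gradients taken at a fixed (typically initial) parameter vector. A minimal sketch, assuming a toy one-hidden-layer model f(x; W, a) = a·tanh(Wx) with illustrative sizes and a random initialization standing in for θ₀ (all names here are hypothetical, not from the paper):

```python
import numpy as np

# Sketch: tangent kernel K(x, x') = <grad f(x; theta_0), grad f(x'; theta_0)>
# for a toy model f(x; W, a) = a . tanh(W x), gradients at initialization.
rng = np.random.default_rng(0)
d, m = 3, 16                          # input dim, hidden width (illustrative)
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.normal(size=m) / np.sqrt(m)

def grad_f(x):
    """Gradient of f(x; W, a) w.r.t. all parameters, flattened."""
    h = np.tanh(W @ x)                    # hidden activations
    dW = np.outer(a * (1 - h ** 2), x)    # df/dW_ij = a_i (1 - h_i^2) x_j
    da = h                                # df/da_i = h_i
    return np.concatenate([dW.ravel(), da])

def tangent_kernel(x, xp):
    """Inner product of gradients at the fixed initialization."""
    return grad_f(x) @ grad_f(xp)

X = rng.normal(size=(5, d))
K = np.array([[tangent_kernel(x, xp) for xp in X] for x in X])
# K is a symmetric positive semidefinite Gram matrix; a kernel method
# (e.g. kernel regression with K) is what the paper compares against
# running gradient descent on f itself.
```

A tangent-kernel learner is restricted to predictors linear in these fixed gradient features, whereas gradient descent on the model moves θ and can leave that fixed feature space, which is the gap the paper quantifies.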


Related research

- 09/26/2019: Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
- 06/28/2023: Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time
- 12/03/2017: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
- 07/26/2023: Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum
- 08/01/2023: An Exact Kernel Equivalence for Finite Classification Models
- 03/30/2017: Diving into the shallows: a computational perspective on large-scale shallow learning
- 02/22/2021: Kernel quadrature by applying a point-wise gradient descent method to discrete energies
