1 Introduction and main results
In the canonical statistical learning problem, we are given independent and identically distributed (i.i.d.) pairs $(y_i, x_i)$, $i \le n$, where $x_i \in \mathbb{R}^d$ is a feature vector and $y_i \in \mathbb{R}$ is a label or response variable. We would like to construct a function $f: \mathbb{R}^d \to \mathbb{R}$ which allows us to predict future responses. Throughout this paper, we will measure the quality of a predictor via its square prediction error (risk): $R(f) = \mathbb{E}\{(y - f(x))^2\}$.
Current practice suggests that, for a number of important applications, the best learning method is a multi-layer neural network. The simplest model in this class is given by two-layer networks (NN):
$$ f_{NN}(x; a, W) = \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle). $$
Here $N$ is the number of neurons and $\sigma: \mathbb{R} \to \mathbb{R}$ is an activation function. Over the last several years, considerable attention has been devoted to two classes of models that can be regarded as linearizations of two-layer networks. The first one is the random features model of Rahimi and Recht [RR08], which only optimizes over the second-layer weights $a_i$, while keeping the first layer fixed:
$$ \mathcal{F}_{RF}(W) = \Big\{ f(x) = \sum_{i=1}^{N} a_i \, \sigma(\langle w_i, x \rangle) \, : \; a_i \in \mathbb{R} \Big\}. $$
Here $W \in \mathbb{R}^{N \times d}$ is a matrix whose $i$-th row is the vector $w_i$. In the RF model, this matrix is chosen randomly, independently of the data.
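As an illustration, fitting the RF model amounts to ordinary least squares on random nonlinear features. The sketch below is a minimal numpy version; the sizes, the ReLU activation, and the quadratic toy target are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 20, 100, 500            # dimension, neurons, samples (illustrative values)

# First-layer weights: rows w_i of W, uniform on the unit sphere, never trained.
W = rng.standard_normal((N, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

relu = lambda u: np.maximum(u, 0.0)

# Covariates uniform on the sphere of radius sqrt(d).
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)
y = X[:, 0] ** 2 - 1.0            # a toy quadratic target (assumption)

# RF model: only the second-layer coefficients a are fit, by least squares.
Phi = relu(X @ W.T)               # n x N feature matrix, entries sigma(<w_i, x_j>)
a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
f_hat = Phi @ a                   # fitted values on the training set
```

The key point is that the optimization over $a$ is convex (a linear least-squares problem), which is what makes the RF model a "linearization" of the network.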
The second model is the neural tangent kernel model of Jacot, Gabriel and Hongler [JGH18], which we define as
$$ \mathcal{F}_{NT}(W) = \Big\{ f(x) = \sum_{i=1}^{N} \langle a_i, x \rangle \, \sigma'(\langle w_i, x \rangle) \, : \; a_i \in \mathbb{R}^d \Big\}. $$
Again, $W \in \mathbb{R}^{N \times d}$ is a matrix of weights that is not optimized over, but instead drawn at random. Further, $\sigma'$ is the derivative of the activation function with respect to its argument (if $w$ has a density, $\sigma$ only needs to be weakly differentiable). This model can be viewed as a first-order Taylor expansion of the neural network model around a random initialization [JGH18]. Several recent papers argue that this linearization indeed captures the behavior of the original neural network when the latter is fitted using stochastic gradient descent (SGD), provided the model is sufficiently overparametrized (see Section 2 for pointers to this line of work).
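A matching sketch of the NT feature map: each neuron contributes the $d$ features $x\,\sigma'(\langle w_i, x\rangle)$, so the model is linear in the $Nd$ coefficients. Sizes are illustrative, and ReLU is used so that $\sigma'$ is the step function.

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, n = 10, 30, 200             # illustrative sizes

W = rng.standard_normal((N, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)

step = lambda u: (u >= 0.0).astype(float)    # weak derivative of the ReLU

# NT features: for each neuron i, the d-vector x * sigma'(<w_i, x>); N*d in total.
S = step(X @ W.T)                            # n x N matrix of sigma'(<w_i, x_j>)
Phi_nt = (S[:, :, None] * X[:, None, :]).reshape(n, N * d)
```

Note the parameter count: $Nd$ for NT versus $N$ for RF, which is the source of the different scalings in the results below.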
1.1 A numerical experiment
[Figure captions; plots omitted.] We use least squares to estimate the model coefficients from $n$ samples and report the test error over fresh samples. Data points correspond to averages over independent repetitions, and the risk is normalized by the risk of the trivial (constant) predictor.
The starting point of this paper is a simple, and yet surprising, simulation study. We consider feature vectors normalized so that $\|x_i\|_2 = \sqrt{d}$, and otherwise uniformly random, and responses $y_i = f_d(x_i)$ for a certain function $f_d$. Indeed, this will be the setting throughout the paper: $x_i \sim \mathrm{Unif}(\mathbb{S}^{d-1}(\sqrt{d}))$ (where $\mathbb{S}^{d-1}(r)$ denotes the sphere with radius $r$ in $d$ dimensions) and $y_i = f_d(x_i)$. We draw random weights $w_i \sim \mathrm{Unif}(\mathbb{S}^{d-1}(1))$, and use $n$ samples to learn a model in $\mathcal{F}_{RF}(W)$ or $\mathcal{F}_{NT}(W)$. We estimate the risk (test error) using fresh samples, and normalize it by the risk of the trivial constant predictor.
We use shifted ReLU activations and learn the model parameters using least squares. If the model is overparametrized, we select the minimum $\ell_2$-norm solution. (We refer to Appendix A for simulations using ridge regression instead.)
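The minimum $\ell_2$-norm solution in the overparametrized regime can be computed with the pseudoinverse. A small sanity check (with arbitrary Gaussian features, an assumption made purely for illustration): any other interpolating coefficient vector differs by a null-space component and therefore has larger norm.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 50                     # overparametrized: more parameters than samples
Phi = rng.standard_normal((n, p))
y = rng.standard_normal(n)

a_min = np.linalg.pinv(Phi) @ y   # minimum l2-norm interpolating solution

# Any other interpolator adds a null-space vector, orthogonal to a_min.
z = rng.standard_normal(p)
a_other = a_min + (np.eye(p) - np.linalg.pinv(Phi) @ Phi) @ z
```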
The results are somewhat disappointing: in two cases (the first and third figures) these advanced machine learning methods do not beat the trivial predictor. In one case (the second one), the NT model surpasses the trivial baseline, and its risk appears to decrease to zero as the number of samples gets large. We also note that the risk shows a cusp when $n \approx p$, with $p$ the number of parameters of the model ($p = N$ for RF, and $p = Nd$ for NT). This phenomenon is related to overparametrization, and will not be discussed further in this paper (see [BHMM18, BHX19, HMRT19] for relevant work). We will instead focus on the population behavior $n \to \infty$.
The reader might wonder whether these poor performances are due to the choice of an extremely complex ground truth $f_d$. This is not the case: Figures 1 and 2 use a simple quadratic function, while in Figure 3 we instead try to learn a third-order polynomial. In other words, the RF model does not appear to be able to learn a simple quadratic function, and the NT model does not appear to be able to learn a third-order polynomial. This is surprising, especially in view of two remarks:
General theory implies that both these functions can be represented arbitrarily well with an unbounded number of neurons (see, e.g., [Cyb89]).
Both functions can in fact be fitted accurately by a two-layer network with a moderate number of neurons, provided the first-layer weights are suitably chosen.
We demonstrate the second point empirically in Fig. 4 by choosing weight vectors supported on randomly chosen coordinates: the nonzero entries sit at i.i.d. uniformly random indices, with a suitable scaling factor. Fixing these random first-layer weights, we fit the second-layer weights by least squares. The risk achieved is an upper bound on the minimum risk within the neural network model, and is significantly smaller than the baseline. (The risk reported in Fig. 4 can also be interpreted as a 'random features' risk. However, the specific distribution of the vectors $w_i$ is tailored to the function $f_d$, and hence not achievable within the RF model.)
1.2 Main results
The origin of the mismatch between classical theory and the findings of Figures 1 to 3 is that universal approximation results apply to the case of a fixed dimension $d$, as the number of neurons $N$ grows to infinity. In this paper we focus on the population behavior (i.e. $n \to \infty$) and unveil a remarkably simple behavior of the RF and NT models when $d$ and $N$ grow together. Our results can be summarized as follows:
If $N = O(d^{2-\delta})$ for some $\delta > 0$, then RF does not outperform linear regression in the raw covariates (i.e. least squares with the model $\hat{f}(x) = a + \langle b, x \rangle$, $a \in \mathbb{R}$, $b \in \mathbb{R}^d$).
More generally, if $N = O(d^{\ell+1-\delta})$, then RF does not outperform linear regression over all monomials of degree at most $\ell$ in $x$.
If $N = O(d^{2-\delta})$, then NT does not outperform linear regression over monomials of degree at most two in $x$ (i.e. least squares with the model $\hat{f}(x) = a + \langle b, x \rangle + \langle x, C x \rangle$, $a \in \mathbb{R}$, $b \in \mathbb{R}^d$, $C \in \mathbb{R}^{d \times d}$).
More generally, if $N = O(d^{\ell-\delta})$, then NT does not outperform linear regression over all monomials of degree at most $\ell$ in $x$.
In the following, we state these results formally. We define the minimum population risk of a model $M \in \{RF, NT\}$ by
$$ R_{M}(f_d, W) = \inf_{f \in \mathcal{F}_{M}(W)} \mathbb{E}\big\{ (f_d(x) - f(x))^2 \big\}. $$
Notice that this is a random variable because of the random features encoded in the matrix $W$. It also depends implicitly on the dimension $d$, but we will make this dependence explicit only when necessary.
For $\ell \ge 0$, we denote by $P_{\le \ell}$ the orthogonal projector onto the subspace of polynomials of degree at most $\ell$. (We also let $P_{> \ell} = I - P_{\le \ell}$.) In other words, $P_{\le \ell} f$ is the function obtained by linear regression of $f$ onto monomials of degree at most $\ell$.
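In finite samples, the degree-$\ell$ baseline that the theorems compare against is just least squares on all monomials of degree at most $\ell$. A minimal sketch (the target here is an assumption, chosen to lie exactly in the degree-2 span so that the residual vanishes):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree):
    """All monomials of degree <= degree in the columns of X (constant included)."""
    n, d = X.shape
    cols = [np.ones(n)]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]       # a degree-2 target (assumption)

Z = poly_features(X, 2)                      # 1 + 5 + 15 = 21 monomials
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
residual = y - Z @ coef                      # ~0: the target has degree <= 2
```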
Our main theorems formalize the above discussion, under certain technical conditions on the activation function. For the RF model we only require that $\sigma$ does not grow too fast at infinity (in particular, exponential growth is fine) and is not a low-degree polynomial (non-trivial results are obtained already when $\sigma$ is a polynomial of low degree).

Theorem 1 (Risk of the RF model). Fix an integer $\ell \ge 1$, and let $\{f_d\}_{d \ge 1}$ be a sequence of functions $f_d \in L^2(\mathbb{S}^{d-1}(\sqrt{d}))$. Let $\sigma$ be an activation function such that $|\sigma(u)| \le c_0 e^{c_1 |u|}$ for some constants $c_0, c_1 > 0$. Further assume that $\sigma$ is not a polynomial of degree at most $\ell$.

Let $W = (w_1, \dots, w_N)$ with $w_i \sim \mathrm{Unif}(\mathbb{S}^{d-1}(1))$ independently. Then, for any $\delta > 0$ and $N \le d^{\ell + 1 - \delta}$, the following happens with high probability:
$$ \big| R_{RF}(f_d, W) - R_{RF}(P_{\le \ell} f_d, W) - \| P_{> \ell} f_d \|_{L^2}^2 \big| \le \varepsilon_d \, \| f_d \|_{L^2}^2, \qquad (2) $$
where $\varepsilon_d = o_d(1)$.
For the NT model we require the same growth condition, although on the weak derivative of $\sigma$, denoted by $\sigma'$. We further require the Hermite decomposition of $\sigma'$ to satisfy a mild 'genericity' condition. Recall that the $k$-th Hermite coefficient of a function $\varphi \in L^2(\mathbb{R}, \gamma)$ can be defined as $\mu_k(\varphi) = \mathbb{E}\{\varphi(G) \, \mathrm{He}_k(G)\}$, $G \sim \mathsf{N}(0,1)$, where $\mathrm{He}_k$ is the $k$-th Hermite polynomial (see Section 3 for further background).

Theorem 2 (Risk of the NT model). Fix an integer $\ell \ge 1$, and let $\{f_d\}_{d \ge 1}$ be a sequence of functions $f_d \in L^2(\mathbb{S}^{d-1}(\sqrt{d}))$. Let $\sigma$ be a weakly differentiable activation function, with weak derivative $\sigma'$ such that $|\sigma'(u)| \le c_0 e^{c_1 |u|}$ for some constants $c_0, c_1 > 0$. Further assume the Hermite coefficients to be such that there exists $k \ge \ell$ with $\mu_k(\sigma') \neq 0$ and $\mu_{k+2}(\sigma') \neq 0$.

Let $W = (w_1, \dots, w_N)$ with $w_i \sim \mathrm{Unif}(\mathbb{S}^{d-1}(1))$ independently. Then, for any $\delta > 0$ and $N \le d^{\ell - \delta}$, the following happens with high probability:
$$ \big| R_{NT}(f_d, W) - R_{NT}(P_{\le \ell} f_d, W) - \| P_{> \ell} f_d \|_{L^2}^2 \big| \le \varepsilon_d \, \| f_d \|_{L^2}^2, \qquad (4) $$
where $\varepsilon_d = o_d(1)$.
In words, Eq. (2) says that the risk of the random features model approximately decomposes into two parts, each non-negative, and each with a simple interpretation:
The second contribution, $\| P_{> \ell} f_d \|_{L^2}^2$, is simply the risk achieved by linear regression with respect to polynomials of degree at most $\ell$. In the special case $\ell = 1$, this is the risk of simple linear regression with respect to the raw features. The first contribution, $R_{RF}(P_{\le \ell} f_d, W)$, is the risk of the RF model when applied to the low-degree component of $f_d$ (the linear component for $\ell = 1$). In general this will be strictly positive. Equation (4) yields a similar decomposition for the NT model. It is easy to check that the conditions on the activation function in Theorem 1 and Theorem 2 hold for all $\ell$, for all commonly used activations.
For instance, the ReLU activation $\sigma(u) = \max(u, 0)$ obviously satisfies the assumptions of Theorem 1 (it has subexponential growth and is not a polynomial). As for Theorem 2, its weak derivative is the step function $\sigma'(u) = \mathbf{1}\{u \ge 0\}$, which is bounded and hence has subexponential growth. Further, its Hermite coefficients can be computed in closed form, and they satisfy the required condition for each $\ell$. (In checking the condition, it is useful to notice the recursive relation between consecutive nonzero Hermite coefficients of the step function.)
In the next section, we briefly overview related literature. Section 3 provides some technical background, in particular on orthogonal polynomials, that is useful for the proofs. We prove the statement for the RF model, Theorem 1, in Section 4. The proof for the NT model, Theorem 2, is similar but technically more involved, and is presented in Section 5.
2 Related work
Approximation properties of neural networks and, more generally, nonlinear approximation were studied in detail in the nineties, see e.g. [DHM89, GJP95, Mha96]. The main concern of the present paper is quite different, since we focus on the random features model and the (recently proposed) neural tangent model. Further, our focus is on the high-dimensional regime in which $d$ grows with the number of neurons. Most of the approximation theory literature considers $d$ fixed, with the number of neurons $N \to \infty$.
The random features model RF has been studied in considerable depth since the original work [RR08]. The classical viewpoint suggests that $\mathcal{F}_{RF}(W)$ should be regarded as an approximation of the reproducing kernel Hilbert space (RKHS) defined by the kernel (see [BTA11] for general background)
$$ K(x_1, x_2) = \mathbb{E}_{w}\big\{ \sigma(\langle w, x_1 \rangle) \, \sigma(\langle w, x_2 \rangle) \big\}. $$
Indeed, $\mathcal{F}_{RF}(W)$ is the RKHS defined by the following finite-rank approximation of this kernel:
$$ K_N(x_1, x_2) = \frac{1}{N} \sum_{i=1}^{N} \sigma(\langle w_i, x_1 \rangle) \, \sigma(\langle w_i, x_2 \rangle). $$
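The convergence of the finite-rank kernel $K_N$ to the population kernel $K$ is easy to check by Monte Carlo: two independent draws of $W$ with large $N$ produce nearly identical kernel values. Dimensions, activation, and sample sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 30
relu = lambda u: np.maximum(u, 0.0)

def sphere(r):
    v = rng.standard_normal(d)
    return r * v / np.linalg.norm(v)

x1, x2 = sphere(np.sqrt(d)), sphere(np.sqrt(d))

def K_N(N):
    """Finite-rank kernel (1/N) sum_i sigma(<w_i, x1>) sigma(<w_i, x2>)."""
    W = rng.standard_normal((N, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    return float(np.mean(relu(W @ x1) * relu(W @ x2)))

# Two independent draws agree closely: both approximate K(x1, x2).
k_a, k_b = K_N(100_000), K_N(100_000)
```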
The most refined analyses provide upper and lower bounds in terms of the eigenvalues of the kernel, which match up to logarithmic terms (see also [Bac13, AM15, RR17] for related work).
Such approximation results can be used to derive risk bounds. Namely, a given function $f$ in a suitable smoothness class (e.g. a Sobolev space) can be approximated by a function in $\mathcal{F}_{RF}(W)$ with a sufficiently small norm of the coefficients $a$. This implies that the risk decays to zero as the number of samples and the number of neurons diverge, for any fixed dimension.
Of course, this approach generally breaks down if the dimension $d$ is large (technically, if it grows with the number of samples and neurons). This 'curse of dimensionality' is already revealed by classical lower bounds in functional approximation, see e.g. [DHM89, Bac17]. However, previous work does not clarify what happens precisely in this high-dimensional regime. In contrast, the picture emerging from our work is remarkably simple. In particular, in the regime $N = O(d^{2-\delta})$, random features models perform vanilla linear regression with respect to the raw features.
Intriguing similarities between some properties of modern deep learning models and large-scale kernel learning were pointed out in recent work. A concrete explanation for this analogy was proposed in [JGH18] via the NT model. This explanation postulates that, for large neural networks, the network weights do not change much during the training phase. Considering a random initialization $\theta_0$, and denoting by $\delta$ the change of the weights during the training phase, we linearize the neural network as
$$ f(x; \theta_0 + \delta) \approx f(x; \theta_0) + \langle \nabla_{\theta} f(x; \theta_0), \delta \rangle. $$
Assuming $f(x; \theta_0) \approx 0$ (which is reasonable for certain random initializations), this suggests that a two-layer neural network learns a model in $\mathcal{F}_{RF}(W) + \mathcal{F}_{NT}(W)$ (if both layers are trained), or simply $\mathcal{F}_{NT}(W)$ (if only the first layer is trained). The analysis of [DZPS18, DLL18, AZLS18, ZCZG18] establishes that this linearization is indeed accurate in a certain highly overparametrized regime, namely when the number of neurons grows polynomially with the sample size, with a sufficiently large exponent. Empirical evidence in the same direction was presented in [LXS19].
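The first-order Taylor expansion behind the NT model is easy to verify numerically on a tiny two-layer network. The sketch below uses tanh so that the remainder is smooth; all sizes, scalings, and the perturbation magnitude are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d, N = 8, 16
W0 = rng.standard_normal((N, d)) / np.sqrt(d)   # random initialization
a = rng.standard_normal(N) / np.sqrt(N)         # second layer, held fixed
x = rng.standard_normal(d)

sig = np.tanh
dsig = lambda u: 1.0 - np.tanh(u) ** 2

f = lambda Wm: float(a @ sig(Wm @ x))

# NT linearization around W0:
#   f(x; W0 + Delta) ~ f(x; W0) + sum_i <Delta_i, x> a_i sig'(<w_i, x>)
Delta = 1e-4 * rng.standard_normal((N, d))
nt_term = float(np.sum((Delta @ x) * a * dsig(W0 @ x)))
lin = f(W0) + nt_term
err = abs(f(W0 + Delta) - lin)                  # second order in ||Delta||
```

The approximation error is quadratic in the perturbation, which is why the linearization becomes accurate when training moves the weights only slightly.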
Does this mean that large (wide) neural networks can be interpreted as random feature approximations to certain kernel methods? Our results suggest some caution: in high dimension, the actual models learnt by random features methods are surprisingly naive. The recent paper [YS19] also suggests caution by showing that a single neuron cannot be approximated by random feature models with a subexponential number of neurons.
It is worth mentioning that an alternative approach to the analysis of two-layer neural networks, in the limit of a large number of neurons, was developed in [MMN18, RVE18, SS18, CB18, MMM19]. Unlike the neural tangent approach, this theory describes the evolution of the network weights beyond the linear regime.
3 Technical background
In this section we introduce some notation and technical background which will be useful for the proofs in the next sections. In particular, we will use decompositions in (hyper-)spherical harmonics on the sphere and in orthogonal polynomials on the real line. All of the properties listed below are classical; we will however prove a few facts that are slightly less standard. We refer the reader to [EF14, Sze39, Chi11] for further information on these topics.
3.1 Functional spaces over the sphere
For $d \ge 1$ and $r > 0$, we let $\mathbb{S}^{d-1}(r)$ denote the sphere with radius $r$ in $\mathbb{R}^d$. We will mostly work with the sphere of radius $\sqrt{d}$, and will denote by $\tau_d$ the uniform probability measure on $\mathbb{S}^{d-1}(\sqrt{d})$. All functions in the following are assumed to be elements of $L^2(\mathbb{S}^{d-1}(\sqrt{d}), \tau_d)$, with scalar product and norm denoted as $\langle \cdot, \cdot \rangle_{L^2}$ and $\| \cdot \|_{L^2}$:
$$ \langle f, g \rangle_{L^2} = \int_{\mathbb{S}^{d-1}(\sqrt{d})} f(x) \, g(x) \, \tau_d(\mathrm{d}x). $$
For $k \ge 0$, let $\tilde{V}_{d,k}$ be the space of homogeneous harmonic polynomials of degree $k$ on $\mathbb{R}^d$ (i.e. homogeneous polynomials $q$ satisfying $\Delta q = 0$), and denote by $V_{d,k}$ the linear space of functions obtained by restricting the polynomials in $\tilde{V}_{d,k}$ to $\mathbb{S}^{d-1}(\sqrt{d})$. With these definitions, we have the following orthogonal decomposition:
$$ L^2(\mathbb{S}^{d-1}(\sqrt{d}), \tau_d) = \bigoplus_{k=0}^{\infty} V_{d,k}. $$
The dimension of each subspace is given by
$$ \dim(V_{d,k}) = B(d, k) = \frac{2k + d - 2}{k} \binom{k + d - 3}{k - 1}, \qquad k \ge 1, $$
with $B(d, 0) = 1$.
For each $k$, the spherical harmonics $\{ Y_{k,j}^{(d)} \}_{1 \le j \le B(d,k)}$ (with $B(d,k) = \dim(V_{d,k})$) form an orthonormal basis of $V_{d,k}$:
$$ \langle Y_{k,j}^{(d)}, Y_{k',j'}^{(d)} \rangle_{L^2} = \delta_{k,k'} \, \delta_{j,j'}. $$
Note that our convention is different from the more standard one, which defines the spherical harmonics as functions on $\mathbb{S}^{d-1}(1)$. It is immediate to pass from one convention to the other by a simple scaling. We will drop the superscript $(d)$ and write $Y_{k,j}$ whenever clear from the context.
We denote by $P_k$ the orthogonal projection onto $V_{d,k}$ in $L^2(\mathbb{S}^{d-1}(\sqrt{d}), \tau_d)$. This can be written in terms of spherical harmonics as
$$ P_k f(x) = \sum_{j} \langle f, Y_{k,j} \rangle_{L^2} \, Y_{k,j}(x), $$
the sum running over an orthonormal basis of $V_{d,k}$. We also define $P_{\le \ell} = \sum_{k=0}^{\ell} P_k$, $P_{> \ell} = I - P_{\le \ell}$, and $P_{< \ell} = P_{\le \ell - 1}$, $P_{\ge \ell} = I - P_{< \ell}$.
3.2 Gegenbauer polynomials
The $k$-th Gegenbauer polynomial $Q_k^{(d)}$ is a polynomial of degree $k$. Consistently with our convention for spherical harmonics, we view $Q_k^{(d)}$ as a function on $[-d, d]$. The set $\{ Q_k^{(d)} \}_{k \ge 0}$ forms an orthogonal basis of $L^2([-d, d], \tilde{\tau}_d)$, where $\tilde{\tau}_d$ is the distribution of $\langle x_1, x_2 \rangle$ when $x_1, x_2 \sim \mathrm{Unif}(\mathbb{S}^{d-1}(\sqrt{d}))$ independently, satisfying the normalization condition:
$$ \langle Q_j^{(d)}, Q_k^{(d)} \rangle_{L^2(\tilde{\tau}_d)} = \frac{1}{B(d, k)} \, \delta_{j,k}, $$
with $B(d, k) = \dim(V_{d,k})$. In particular, they are normalized so that $Q_k^{(d)}(d) = 1$. As above, we will omit the superscript $(d)$ when clear from the context.
Gegenbauer polynomials are directly related to spherical harmonics as follows. Fix $v \in \mathbb{S}^{d-1}(\sqrt{d})$ and consider the subspace of $V_{d,k}$ formed by all functions that are invariant under rotations of $\mathbb{R}^d$ that keep $v$ unchanged. It is not hard to see that this subspace has dimension one, and coincides with the span of the function $x \mapsto Q_k^{(d)}(\langle v, x \rangle)$.
We will use the following properties of Gegenbauer polynomials
3.3 Hermite polynomials
The Hermite polynomials $\{ \mathrm{He}_k \}_{k \ge 0}$ form an orthogonal basis of $L^2(\mathbb{R}, \gamma)$, where $\gamma(\mathrm{d}x) = e^{-x^2/2} \, \mathrm{d}x / \sqrt{2\pi}$ is the standard Gaussian measure, and $\mathrm{He}_k$ has degree $k$. We will follow the classical normalization (here and below, expectation is with respect to $G \sim \mathsf{N}(0, 1)$):
$$ \mathbb{E}\{ \mathrm{He}_j(G) \, \mathrm{He}_k(G) \} = k! \, \delta_{j,k}. $$
As a consequence, for any function $\varphi \in L^2(\mathbb{R}, \gamma)$, we have the decomposition
$$ \varphi(x) = \sum_{k=0}^{\infty} \frac{\mu_k(\varphi)}{k!} \, \mathrm{He}_k(x), \qquad \mu_k(\varphi) = \mathbb{E}\{ \varphi(G) \, \mathrm{He}_k(G) \}. $$
The Hermite polynomials can be obtained as high-dimensional limits of the Gegenbauer polynomials introduced in the previous section. Indeed, the Gegenbauer polynomials are constructed by Gram-Schmidt orthogonalization of the monomials $\{ t^k \}_{k \ge 0}$ with respect to the measure $\tilde{\tau}_d$, while Hermite polynomials are obtained by Gram-Schmidt orthogonalization with respect to the Gaussian measure $\gamma$. Since the law of $\langle x_1, x_2 \rangle / \sqrt{d}$ converges weakly to $\gamma$ as $d \to \infty$, it is immediate to show that, for any fixed integer $k$, the coefficients of the rescaled and $L^2$-normalized Gegenbauer polynomial $t \mapsto Q_k^{(d)}(\sqrt{d}\, t)$ converge to the coefficients of the normalized Hermite polynomial $\mathrm{He}_k / \sqrt{k!}$. Here and below, for $p$ a polynomial, $\mathrm{coeff}(p)$ denotes the vector of the coefficients of $p$.
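This convergence can be checked numerically by running Gram-Schmidt under the two measures on a fine grid. The sketch below uses the law of a single coordinate of $\mathrm{Unif}(\mathbb{S}^{d-1}(\sqrt{d}))$, which coincides with the law of $\langle x_1, x_2 \rangle / \sqrt{d}$; the grid, the value of $d$, and the maximum degree are illustrative choices.

```python
import numpy as np

KMAX = 3

def gs_coeffs(t, w):
    """Coefficient vectors (monomial basis) of orthonormal polynomials for the
    discretized measure w(t) dt, via Gram-Schmidt on the monomials 1, t, t^2, ..."""
    dt = t[1] - t[0]
    V = np.vander(t, KMAX + 1, increasing=True)        # columns 1, t, ..., t^KMAX
    inner = lambda a, b: np.sum((V @ a) * (V @ b) * w) * dt
    polys = []
    for k in range(KMAX + 1):
        v = np.zeros(KMAX + 1)
        v[k] = 1.0
        for q in polys:
            v = v - inner(v, q) * q
        polys.append(v / np.sqrt(inner(v, v)))
    return polys

d = 4000
t = np.linspace(-12.0, 12.0, 200_001)                  # mass outside is negligible
w_sph = np.maximum(1.0 - t**2 / d, 0.0) ** ((d - 3) / 2)
w_sph /= np.sum(w_sph) * (t[1] - t[0])                 # coordinate law on the sphere
w_gau = np.exp(-t**2 / 2.0) / np.sqrt(2.0 * np.pi)     # standard Gaussian

geg = gs_coeffs(t, w_sph)   # rescaled, normalized Gegenbauer
her = gs_coeffs(t, w_gau)   # normalized Hermite he_k = He_k / sqrt(k!)
```

For $d$ in the thousands the two coefficient vectors already agree to a few decimal places, consistent with an $O(1/d)$ rate.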
Throughout the proofs, $O_d(\cdot)$ (resp. $o_d(\cdot)$) denotes the standard big-O (resp. little-o) notation, where the subscript emphasizes the asymptotic variable. We denote by $O_{d,\mathbb{P}}(\cdot)$ (resp. $o_{d,\mathbb{P}}(\cdot)$) the big-O (resp. little-o) in probability notation: $h_1(d) = O_{d,\mathbb{P}}(h_2(d))$ if, for any $\varepsilon > 0$, there exist $C_\varepsilon > 0$ and $d_\varepsilon$ such that
$$ \mathbb{P}\big( |h_1(d)| > C_\varepsilon \, |h_2(d)| \big) \le \varepsilon, \qquad \forall \, d \ge d_\varepsilon; $$
and respectively $h_1(d) = o_{d,\mathbb{P}}(h_2(d))$ if $h_1(d)/h_2(d)$ converges to $0$ in probability.
We will occasionally hide logarithmic factors using the notation $\tilde{O}_d(\cdot)$ (resp. $\tilde{o}_d(\cdot)$): $h_1(d) = \tilde{O}_d(h_2(d))$ if there exists a constant $C$ such that $h_1(d) \le C (\log d)^C \, h_2(d)$. Similarly, we will denote by $\tilde{O}_{d,\mathbb{P}}(\cdot)$ (resp. $\tilde{o}_{d,\mathbb{P}}(\cdot)$) the big-O (resp. little-o) in probability notation up to logarithmic factors.
4 Proof of Theorem 1, RF model
4.1 Preliminaries
We begin with some notation and simple remarks. Assume $\sigma$ is an activation function with $|\sigma(u)| \le c_0 e^{c_1 |u|}$ for some constants $c_0, c_1 > 0$. Then the following hold:
1. $\sigma \in L^2(\mathbb{R}, \gamma)$, i.e. $\mathbb{E}\{ \sigma(G)^2 \} < \infty$ for $G \sim \mathsf{N}(0, 1)$.
2. Let $T_d = \langle w, x \rangle$ for a fixed $w \in \mathbb{S}^{d-1}(1)$ and $x \sim \mathrm{Unif}(\mathbb{S}^{d-1}(\sqrt{d}))$. Then there exists a constant $C < \infty$ such that, for all $d$ large enough, $\mathbb{E}\{ \sigma(T_d)^2 \} \le C$.
3. There exists a coupling of $T_d$ and $G \sim \mathsf{N}(0, 1)$ such that $T_d \to G$ almost surely as $d \to \infty$.
Claim 1 is obvious.
For claim 2, note that the probability distribution of $T_d$, when $x \sim \mathrm{Unif}(\mathbb{S}^{d-1}(\sqrt{d}))$, is given by
$$ p_d(t) = \frac{1}{Z_d} \Big( 1 - \frac{t^2}{d} \Big)_+^{(d-3)/2}, $$
with $Z_d$ a normalization constant. A simple calculation shows that $Z_d \to \sqrt{2\pi}$ as $d \to \infty$, and hence $p_d(t) \le C \, e^{-t^2/4}$ for all $d$ larger than an absolute constant. Therefore
$$ \mathbb{E}\{ \sigma(T_d)^2 \} \le C \int_{\mathbb{R}} c_0^2 \, e^{2 c_1 |t|} \, e^{-t^2/4} \, \mathrm{d}t < \infty. $$
Finally, for claim 3, without loss of generality we take $w = e_1$, so that $T_d = x_1$. Letting $g \sim \mathsf{N}(0, \mathrm{I}_d)$ be a standard Gaussian vector, we construct the coupling via
$$ T_d = \sqrt{d} \, \frac{g_1}{\| g \|_2}, \qquad G = g_1. $$
Indeed, $\sqrt{d} \, g / \| g \|_2 \sim \mathrm{Unif}(\mathbb{S}^{d-1}(\sqrt{d}))$, and $\sqrt{d} / \| g \|_2 \to 1$ almost surely by the law of large numbers, so $T_d - G \to 0$ almost surely. (A truncation argument, approximating $\sigma$ by a bounded continuous function, upgrades this to convergence of the corresponding expectations.) ∎
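The coupling in claim 3 is easy to simulate: representing the sphere point via a normalized Gaussian vector makes $T_d$ and $G$ share the same underlying randomness. A quick numerical check (dimension and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 400, 20_000

g = rng.standard_normal((n, d))
G = g[:, 0]                                            # standard Gaussian
T = np.sqrt(d) * g[:, 0] / np.linalg.norm(g, axis=1)   # law of <w, x>, x ~ Unif(S^{d-1}(sqrt d))

coupling_gap = float(np.mean((T - G) ** 2))            # -> 0 as d grows, roughly like 1/d
```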
We denote the Hermite decomposition of $\sigma$ by
$$ \sigma(u) = \sum_{k=0}^{\infty} \frac{\mu_k(\sigma)}{k!} \, \mathrm{He}_k(u), \qquad \mu_k(\sigma) = \mathbb{E}\{ \sigma(G) \, \mathrm{He}_k(G) \}. $$
For future reference, we state separately the two assumptions we use to prove Theorem 1 for the RF model.
Assumption 1 (Integrability condition). There exist constants $c_0, c_1 > 0$ such that, for all $u \in \mathbb{R}$, $|\sigma(u)| \le c_0 e^{c_1 |u|}$.
Assumption 2 (Non-trivial Hermite components). The activation function $\sigma$ is not a polynomial of degree at most $\ell$. Equivalently, there exists $k > \ell$ such that $\mu_k(\sigma) \neq 0$.
4.2 Proof of Theorem 1: Outline
Recall that $w_i \sim \mathrm{Unif}(\mathbb{S}^{d-1}(1))$ independently. We define $\bar{w}_i = \sqrt{d} \, w_i$ for $i \le N$, so that $\bar{w}_i \sim \mathrm{Unif}(\mathbb{S}^{d-1}(\sqrt{d}))$ independently. Let $x \sim \mathrm{Unif}(\mathbb{S}^{d-1}(\sqrt{d}))$, independent of $W$. We denote by $\mathbb{E}_x$ the expectation operator with respect to $x$, by $\mathbb{E}_W$ the expectation operator with respect to $W$, and by $\mathbb{E}_{x,W}$ the expectation operator with respect to both.
Define the random vectors , , , with
Define the random matrix