Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

07/09/2020
by Yuanzhi Li, et al.

We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input x∈ℝ^d is drawn from a Gaussian distribution and the label of x satisfies f^⋆(x) = a^⊤|W^⋆x|, where a∈ℝ^d is a nonnegative vector, W^⋆∈ℝ^{d×d} is an orthonormal matrix, and |·| is applied entrywise. We show that an over-parametrized two-layer neural network with ReLU activation, trained by gradient descent from random initialization, can provably learn the ground truth network with population loss at most o(1/d) in polynomial time with polynomially many samples. On the other hand, we prove that any kernel method, including the Neural Tangent Kernel, with a number of samples polynomial in d, has population loss at least Ω(1/d).
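To make the setup concrete, here is a minimal PyTorch sketch that illustrates the problem formulation only, not the paper's analysis or experiments: it samples Gaussian inputs, labels them with a random instance of f^⋆(x) = a^⊤|W^⋆x|, and fits an over-parametrized two-layer ReLU network by gradient descent. All hyperparameters (d, m, n, learning rate, step count) are hypothetical choices made so the script runs quickly.

```python
# Hypothetical sketch of the setup: data from f*(x) = a^T |W* x| and an
# over-parametrized two-layer ReLU student trained by gradient descent.
import torch

torch.manual_seed(0)
d, m, n = 20, 512, 4096              # input dim, hidden width m >> d, samples

# Ground truth: orthonormal W* (QR factor of a Gaussian matrix), nonnegative a.
W_star, _ = torch.linalg.qr(torch.randn(d, d))
a_star = torch.rand(d)               # entrywise nonnegative vector

X = torch.randn(n, d)                # inputs x ~ N(0, I_d)
y = (X @ W_star.T).abs() @ a_star    # labels f*(x) = a^T |W* x|, entrywise |.|

# Student: width-m two-layer ReLU network from random initialization.
W = (torch.randn(m, d) / d ** 0.5).requires_grad_()
v = (torch.randn(m) / m ** 0.5).requires_grad_()

opt = torch.optim.SGD([W, v], lr=1e-3)
for step in range(5001):
    pred = torch.relu(X @ W.T) @ v   # network output v^T ReLU(W x)
    loss = ((pred - y) ** 2).mean()  # empirical squared loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")
```

In the paper's regime the width and sample size are polynomial in d and the guarantee is on the population loss; this sketch trains on a fixed empirical sample purely for illustration.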


Related research

06/20/2018 · Learning One-hidden-layer ReLU Networks via Gradient Descent
We study the problem of learning one-hidden-layer neural networks with R...

06/22/2020 · Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
We prove the first superpolynomial lower bounds for learning one-layer n...

11/01/2017 · Learning One-hidden-layer Neural Networks with Landscape Design
We consider the problem of learning a one-hidden-layer neural network: w...

12/24/2020 · Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms
We describe the convex semi-infinite dual of the two-layer vector-output...

02/16/2018 · Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks
We analyze algorithms for approximating a function f(x) = Φx mapping ℝ^d...

11/04/2019 · Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals
We consider the problem of computing the best-fitting ReLU with respect ...

06/14/2020 · Global Convergence of Sobolev Training for Overparametrized Neural Networks
Sobolev loss is used when training a network to approximate the values a...
