On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective

05/26/2019
by   Lili Su, et al.

We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) activations using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of the network prediction errors across GD iterations, which can be neatly described in matrix form. It turns out that when the network is sufficiently over-parameterized, these matrices individually approximate an integral operator that is determined by the feature vector distribution ρ only. Consequently, the GD method can be viewed as approximately applying the powers of this integral operator to the underlying/target function f^* that generates the responses/labels. We show that if f^* admits a low-rank approximation with respect to the eigenspaces of this integral operator, then, even with a constant stepsize, the empirical risk decreases to this low-rank approximation error at a linear rate in the iteration t. In addition, this linear rate is determined by f^* and ρ only. Furthermore, if f^* has zero low-rank approximation error, then Ω(n^2) network over-parameterization is enough, and the empirical risk decreases to Θ(1/√(n)). We provide an application of our general results to the setting where ρ is the uniform distribution on the sphere and f^* is a polynomial.
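The viewpoint described above (prediction errors evolving under a Gram matrix that, at large width, concentrates around a fixed integral operator determined by ρ) can be illustrated with a small numerical sketch. This is not the paper's construction or experiments: the sample size, width, stepsize, and the polynomial target below are hypothetical choices made only for demonstration.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's experiments): gradient descent
# on a two-layer ReLU network
#   f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x),
# training only the first-layer weights.  All sizes, the stepsize, and the
# polynomial target below are hypothetical.

rng = np.random.default_rng(0)
n, d, m = 200, 5, 2000            # samples, input dimension, hidden width (m >> n)
eta, T = 1.0, 500                 # constant stepsize, number of GD iterations

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # feature vectors on the unit sphere
y = X[:, 0] * X[:, 1]                           # a simple polynomial target f^*

W = rng.normal(size=(m, d))                     # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)             # second-layer signs (kept fixed)

def predict(W):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def gram(W):
    # Gram matrix H[i, j] = <d f(x_i)/dW, d f(x_j)/dW>; for large m it
    # concentrates around a fixed kernel matrix whose eigen-decomposition
    # governs how fast each component of the prediction error shrinks.
    act = (X @ W.T > 0).astype(float)
    return (X @ X.T) * (act @ act.T) / m

print("top eigenvalues of H at init:", np.linalg.eigvalsh(gram(W))[-3:])

for t in range(T):
    u = predict(W) - y                          # prediction-error vector
    act = (X @ W.T > 0).astype(float)
    # gradient of the empirical risk (1/2n) * ||u||^2 with respect to each w_r
    grad = (act * u[:, None]).T @ X * (a[:, None] / np.sqrt(m)) / n
    W -= eta * grad
    if t % 100 == 0:
        print(t, float(np.mean(u ** 2)))        # empirical risk across GD iterations
```

In the linearized (large-width) regime the error vector roughly follows u_{t+1} ≈ (I − (η/n) H) u_t, so the components of f^* aligned with large eigenvalues of H decay fastest; this is the low-rank approximation picture the abstract describes.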


Related research

Algorithms for Efficiently Learning Low-Rank Neural Networks (02/02/2022)

A gradient system approach for Hankel structured low-rank approximation (02/16/2020)

Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks (05/26/2023)

Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization (06/16/2020)

Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels (07/07/2023)

Adaptive cross approximation for Tikhonov regularization in general form (04/12/2022)

Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time (12/14/2021)
