Escaping mediocrity: how two-layer networks learn hard single-index models with SGD

05/29/2023
by   Luca Arnaboldi, et al.

This study explores the sample complexity for two-layer neural networks to learn a single-index target function under stochastic gradient descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well established that n = O(d log d) samples are typically needed in this scenario. However, we provide precise results concerning the prefactors in high-dimensional contexts and for varying widths. Notably, our findings suggest that overparameterization can only enhance convergence by a constant factor within this problem class. These insights are grounded in the reduction of the SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to computing an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately characterizes the escape time, implying that the role of stochasticity may be minimal in this scenario.
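To make the setting concrete, here is a minimal, hedged sketch of the regime the abstract describes: online SGD on a two-layer network learning a single-index target with many flat directions at initialization. The dimensions, activation, learning rate, and the specific target (a second Hermite polynomial) are illustrative choices, not taken from the paper; the point is only to show the "mediocre" plateau, where each neuron's overlap with the hidden direction starts at order 1/sqrt(d).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (not from the paper).
d, p = 32, 8        # input dimension, student width
lr = 0.02           # SGD step size
n_steps = 5000      # one fresh Gaussian sample per step (online SGD)

# Hard single-index target: y = He2(<theta, x>) = <theta, x>^2 - 1.
# Targets of this type are flat in most directions at random initialization.
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)

# Two-layer ReLU student: f(x) = (1/p) * sum_j a_j * relu(<w_j, x>).
W = rng.standard_normal((p, d)) / np.sqrt(d)
a = np.ones(p)

def overlap_with_theta(W):
    """Cosine of each neuron's weight vector with the hidden direction."""
    return (W @ theta) / np.linalg.norm(W, axis=1)

m0 = overlap_with_theta(W)   # typically of order 1/sqrt(d): the "mediocre" start

for _ in range(n_steps):
    x = rng.standard_normal(d)            # fresh sample, never reused
    z = theta @ x
    y = z * z - 1.0                       # He2 target
    pre = W @ x
    act = np.maximum(pre, 0.0)            # ReLU
    err = (a @ act) / p - y               # residual for squared loss
    # One SGD step on 0.5 * err**2
    a -= lr * err * act / p
    W -= lr * np.outer(err * a * (pre > 0) / p, x)

m = overlap_with_theta(W)    # overlaps after training
```

In this picture, "escaping mediocrity" corresponds to the overlaps `m` growing from their O(1/sqrt(d)) initialization toward order one, and the paper's analysis tracks such low-dimensional summary statistics of the weights rather than the full d-dimensional dynamics.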

