SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

02/21/2023
by Emmanuel Abbe, et al.

We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure – the leap – which quantifies how "hierarchical" target functions are. For d-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function f with low-dimensional support is Θ̃(d^max(Leap(f), 2)). We prove a version of this conjecture for a class of functions on isotropic Gaussian data and 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that training sequentially learns the support of the target function through a saddle-to-saddle dynamic. Our result departs from [Abbe et al. 2022] by going beyond leap 1 (merged-staircase functions) and beyond the mean-field and gradient-flow approximations, which preclude the full control of complexity obtained here. Finally, we note that this yields an SGD complexity for the full training trajectory that matches the Correlational Statistical Query (CSQ) lower bounds.
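
To make the leap concrete, the sketch below computes the leap of a function from the coordinate sets of its Fourier/monomial support. It assumes the definition suggested by the abstract, namely that Leap(f) is the minimum, over orderings of the support sets, of the largest number of new coordinates any single set introduces; the helper leap and the two example supports are illustrative, not the authors' code.

    # Minimal sketch (assumed definition, not the authors' code): the leap of f
    # is the minimum over orderings of its Fourier-support sets of the largest
    # number of new coordinates that any single set introduces.
    from itertools import permutations

    def leap(support_sets):
        """Brute-force leap; fine for the small supports used here."""
        sets = [frozenset(s) for s in support_sets]
        best = float("inf")
        for order in permutations(sets):
            seen = set()
            worst = 0  # largest jump of new coordinates along this ordering
            for s in order:
                worst = max(worst, len(s - seen))
                seen |= s
            best = min(best, worst)
        return best

    # x1 + x1*x2 + x1*x2*x3: each monomial adds one new coordinate in the
    # right order, so Leap = 1 (a merged-staircase function).
    print(leap([{1}, {1, 2}, {1, 2, 3}]))  # -> 1
    # x1*x2*x3 alone: three coordinates must be learned at once, so Leap = 3.
    print(leap([{1, 2, 3}]))               # -> 3

Under the main conjecture, the staircase example would be learned by SGD in time Θ̃(d^max(1, 2)) = Θ̃(d^2), while the isolated degree-3 parity would require Θ̃(d^3).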

Related research

01/13/2020
Backward Feature Correction: How Deep Learning Performs Deep Learning
How does a 110-layer ResNet learn a high-complexity classifier using rel...

08/05/2022
On the non-universality of deep learning: quantifying the cost of symmetry
We prove computational limitations for learning with neural networks tra...

12/30/2020
SGD Distributional Dynamics of Three Layer Neural Networks
With the rise of big data analytics, multi-layer neural networks have su...

05/28/2019
SGD on Neural Networks Learns Functions of Increasing Complexity
We perform an experimental study of the dynamics of Stochastic Gradient ...

04/10/2023
(Almost) Ruling Out SETH Lower Bounds for All-Pairs Max-Flow
The All-Pairs Max-Flow problem has gained significant popularity in the ...

11/10/2021
SGD Through the Lens of Kolmogorov Complexity
We prove that stochastic gradient descent (SGD) finds a solution that ac...
