On the symmetries in the dynamics of wide two-layer neural networks

11/16/2022
by Karl Hajjar, et al.

We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function f^* and the input distribution, are preserved by the dynamics. We then study more specific cases. When f^* is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When f^* has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
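The setting described above can be illustrated with a small numerical sketch: a two-layer ReLU network without biases, in the mean-field parameterization f(x) = (1/m) Σ_j a_j ReLU(w_j·x), trained by (stochastic) gradient descent on a Monte Carlo approximation of the population risk with a symmetric input distribution and an odd target. The width, learning rate, and target f*(x) = 3·x₁ are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 2, 512                     # input dimension, network width (hypothetical)
W = rng.normal(size=(m, d))       # hidden-layer weights (no bias, as in the setting above)
a = rng.normal(size=m)            # output-layer weights

def f_star(X):
    # an odd target function (hypothetical choice): f*(x) = 3 * x_1
    return 3.0 * X[:, 0]

lr, n = 0.5, 2048
loss0 = None
for step in range(500):
    X = rng.normal(size=(n, d))           # symmetric (Gaussian) input distribution
    h = np.maximum(X @ W.T, 0.0)          # hidden activations, shape (n, m)
    err = (h @ a) / m - f_star(X)         # residual of f(x) = (1/m) a . relu(Wx)
    # gradients of the Monte Carlo population risk 0.5 * E[err^2]
    grad_a = h.T @ err / (n * m)
    grad_W = ((err[:, None] * (h > 0.0) * a).T @ X) / (n * m)
    if step == 0:
        loss0 = 0.5 * np.mean(err ** 2)
    # rescale the step by m to compensate for the 1/m mean-field factor
    a -= lr * m * grad_a
    W -= lr * m * grad_W
loss1 = 0.5 * np.mean(err ** 2)
```

Since the target here is odd and the input distribution is symmetric, this is (an empirical analogue of) the regime in which the paper shows the predictor dynamics reduce to those of a non-linearly parameterized linear predictor with guaranteed exponential convergence; one should observe the risk decreasing steadily in this run.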


