Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

02/02/2023
by   François Caron, et al.

We consider the optimisation of large, shallow neural networks via gradient flow, where the output of each hidden node is scaled by a positive parameter. We focus on the case where the node scalings are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large neural networks, with high probability, gradient flow converges to a global minimum and, unlike in the NTK regime, can learn features. We also provide experiments on synthetic and real-world datasets illustrating our theoretical results and showing the benefits of such scaling for pruning and transfer learning.
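To make the parameterisation concrete, below is a minimal numpy sketch (not the authors' code) of gradient descent, a discretisation of gradient flow, on a shallow ReLU network whose hidden-node outputs are multiplied by fixed, non-identical positive scalings lambda_j. The specific scaling scheme (an NTK-like 1/sqrt(m) term mixed with a few geometrically larger weights), the normalisation sum_j lambda_j^2 = 1, the fixed output signs, and all hyperparameters are illustrative assumptions, not the paper's exact construction.

# Minimal sketch of a shallow ReLU network with asymmetrical node scaling,
# trained by full-batch gradient descent (a discretised gradient flow).
# Scaling scheme, target, and hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

m, d, n = 512, 5, 64             # hidden width, input dim, sample size
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])              # toy regression target

# Asymmetrical node scalings: an NTK-like gamma/m component plus a few
# geometrically larger weights, normalised so sum_j lambda_j^2 = 1
# (an assumed convention; gamma = 1 recovers identical NTK scalings).
gamma = 0.5
lam = np.sqrt(gamma / m + (1 - gamma) * 0.5 ** np.arange(1, m + 1))
lam /= np.linalg.norm(lam)

W = rng.standard_normal((m, d))  # trainable first-layer weights
v = rng.choice([-1.0, 1.0], m)   # fixed output signs (common simplification)

def forward(W):
    # f(x) = sum_j lambda_j * v_j * relu(<w_j, x>)
    H = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
    return H @ (lam * v)                  # (n,) network outputs

lr = 0.1
for step in range(3000):
    H = np.maximum(X @ W.T, 0.0)
    r = H @ (lam * v) - y                 # residuals
    # gradient of 0.5 * mean squared error w.r.t. W:
    # dL/dw_j = lambda_j * v_j * (1/n) * sum_i r_i * 1[<w_j,x_i> > 0] * x_i
    G = ((X.T @ (np.outer(r, lam * v) * (H > 0))) / n).T
    W -= lr * G

print("final MSE:", np.mean((forward(W) - y) ** 2))

With gamma = 1 every node has the identical scaling 1/sqrt(m) and the weights barely move from initialisation (the NTK regime); with gamma < 1 the heavily scaled nodes can travel far from initialisation, which is, informally, the feature-learning effect the abstract refers to.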


Related research

04/22/2022 · On Feature Learning in Neural Networks with Global Convergence Guarantees
We study the optimization of wide neural networks (NNs) via gradient flo...

04/19/2023 · Leveraging the two timescale regime to demonstrate convergence of neural networks
We study the training dynamics of shallow neural networks, in a two-time...

04/15/2021 · On Energy-Based Models with Overparametrized Shallow Neural Networks
Energy-based models (EBMs) are a simple yet powerful framework for gener...

06/03/2022 · A Theoretical Analysis on Feature Learning in Neural Networks: Emergence from Inputs and Advantage over Fixed Features
An important characteristic of neural networks is their ability to learn...

10/12/2019 · Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers
We study the convergence of gradient flows related to learning deep line...

09/17/2021 · AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
We propose a computationally-friendly adaptive learning rate schedule, "...

05/20/2019 · Optimisation of Overparametrized Sum-Product Networks
It seems to be a pearl of conventional wisdom that parameter learning in...
