Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks

05/21/2023
by Mingze Wang, et al.

The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of the model and the non-convexity of the loss pose significant challenges for theoretical analysis. Consequently, most previous theoretical works on the optimization dynamics of neural networks focus either on local analyses (such as the end of training) or on approximately linear models (such as the Neural Tangent Kernel). In this work, we conduct a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on linearly separable data. In this specific setting, our analysis captures the whole optimization process, from random initialization to final convergence. Despite the relatively simple model and data we study, we reveal four distinct phases over the whole training process, exhibiting a general simplifying-to-complicating learning trend. Specific nonlinear behaviors are also precisely identified and captured theoretically, including initial condensation, saddle-to-plateau dynamics, plateau escape, changes in activation patterns, and learning with increasing complexity.
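To make the setting concrete, the sketch below illustrates the kind of training the abstract describes: a two-layer ReLU network with small random initialization, trained on synthetic linearly separable data by gradient descent with a small step size, which serves as an Euler discretization of gradient flow. Everything here is an illustrative assumption rather than the paper's exact construction: the data distribution, the logistic loss, the hidden width, and the step size are all chosen only to make the example runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linearly separable data: labels are the sign of a fixed
# direction w_star (an assumption; not the paper's exact distribution).
n, d = 200, 10
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)

# Two-layer ReLU network f(x) = sum_k a_k * relu(<w_k, x>), with a small
# random initialization so that training starts near the origin.
m = 64                                  # hidden width (illustrative)
scale = 1e-3
W = scale * rng.standard_normal((m, d))
a = scale * rng.standard_normal(m)

# Small-step gradient descent on the logistic loss, as an Euler
# discretization of gradient flow.
lr = 0.1
for step in range(20001):
    H = np.maximum(X @ W.T, 0.0)        # ReLU activations, shape (n, m)
    out = H @ a                         # network outputs, shape (n,)
    margin = np.clip(y * out, -30.0, 30.0)
    g = -y / (1.0 + np.exp(margin))     # d(loss_i)/d(out_i)
    grad_a = H.T @ g / n
    grad_W = ((g[:, None] * (X @ W.T > 0)) * a).T @ X / n  # chain rule through the ReLU gate
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 2000 == 0:
        print(f"step {step:6d}  loss {np.logaddexp(0.0, -margin).mean():.4f}")
```

A toy run of this kind is a convenient playground for eyeballing the phenomena the abstract names: with small initialization and a small step size, the printed loss curve typically shows early plateaus followed by a sharp escape, and the hidden-layer activation patterns (the `X @ W.T > 0` mask) can be tracked across steps to watch them change during training.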

Related research

06/02/2022: Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
07/22/2021: Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time
02/28/2023: Learning time-scales in two-layers neural networks
12/29/2017: The Multilinear Structure of ReLU Networks
05/16/2023: Deep ReLU Networks Have Surprisingly Simple Polytopes
06/02/2022: Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning
03/27/2020: On the Optimization Dynamics of Wide Hypernetworks