Stronger Convergence Results for Deep Residual Networks: Network Width Scales Linearly with Training Data Size

11/11/2019
by Talha Cihad Gulcu, et al.

Deep neural networks are highly expressive machine learning models capable of interpolating arbitrary datasets. Deep nets are typically optimized via first-order methods, and the optimization process crucially depends on the characteristics of the network as well as of the dataset. This work sheds light on the relation between network size and dataset properties, with an emphasis on deep residual networks (ResNets). Our main contribution is to show that, as long as the network Jacobian is full rank, gradient descent on the quadratic loss with a smooth activation converges to a global minimum even when the network width m of the ResNet scales linearly with the sample size n, independently of the network depth. To the best of our knowledge, this is the first work providing a theoretical guarantee for the convergence of neural networks in the m = Ω(n) regime.
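The claim concerns the Jacobian of the network outputs with respect to the parameters. As a purely illustrative aid, and not the paper's construction, the sketch below builds a toy ResNet of width m = n with a smooth (tanh) activation, numerically checks that this Jacobian has full rank n at initialization, and runs plain full-batch gradient descent on the quadratic loss. The architecture, initialization scales, and step size are assumptions of this sketch.

```python
# A minimal illustrative sketch, NOT the paper's construction: a toy ResNet of
# width m = n with a smooth (tanh) activation, a numerical full-rank check on
# the network Jacobian (outputs w.r.t. parameters), and plain gradient descent
# on the quadratic loss. Architecture, init scales, and step size are assumed.
import torch

torch.manual_seed(0)
n, d, L = 20, 5, 4   # samples, input dimension, residual blocks
m = n                # width scales linearly with the sample size

X = torch.randn(n, d)
y = torch.randn(n)

def make(*shape, scale=1.0):
    # Leaf parameter tensor with the given i.i.d. Gaussian scale.
    return (scale * torch.randn(*shape)).requires_grad_()

# h_0 = A x;  h_{l+1} = h_l + (1/L) V_l tanh(W_l h_l);  f(x) = b^T h_L
A = make(m, d, scale=d ** -0.5)
Ws = [make(m, m, scale=m ** -0.5) for _ in range(L)]
Vs = [make(m, m, scale=m ** -0.5) for _ in range(L)]
b = make(m, scale=m ** -0.5)
params = [A, b, *Ws, *Vs]

def f(X):
    h = X @ A.T
    for W, V in zip(Ws, Vs):
        h = h + torch.tanh(h @ W.T) @ V.T / L
    return h @ b

def jacobian_rank():
    # Row i is the gradient of f(x_i) w.r.t. all parameters, flattened, so the
    # stacked matrix is the network Jacobian of shape (n, #parameters).
    rows = []
    for i in range(n):
        grads = torch.autograd.grad(f(X[i:i + 1])[0], params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    return torch.linalg.matrix_rank(torch.stack(rows)).item()

print(f"Jacobian rank at init: {jacobian_rank()} (full rank would be {n})")

lr = 0.2
for step in range(1001):
    loss = 0.5 * ((f(X) - y) ** 2).mean()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():  # plain full-batch gradient descent step
        for p, g in zip(params, grads):
            p -= lr * g
    if step % 200 == 0:
        print(f"step {step:4d}  loss {loss.item():.6f}")
```

The 1/L scaling of each residual block is a design choice of this sketch that keeps the forward pass stable as depth grows, echoing the depth-independence flavor of the stated result; it is not claimed to match the paper's parameterization.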


Related research

03/17/2019 - Training Over-parameterized Deep ResNet Is almost as Easy as Training a Two-layer Network
It has been proved that gradient descent converges linearly to the globa...

06/11/2019 - An Improved Analysis of Training Over-parameterized Deep Neural Networks
A recent line of research has shown that gradient-based algorithms with ...

05/30/2021 - Overparameterization of deep ResNet: zero loss and mean-field analysis
Finding parameters in a deep neural network (NN) that fit training data ...

10/21/2022 - When Expressivity Meets Trainability: Fewer than n Neurons Can Work
Modern neural networks are often quite wide, causing large memory and co...

05/13/2022 - Convergence Analysis of Deep Residual Networks
Various powerful deep neural network architectures have made great contr...

02/02/2023 - Dataset Distillation Fixes Dataset Reconstruction Attacks
Modern deep learning requires large volumes of data, which could contain...

09/29/2022 - Restricted Strong Convexity of Deep Learning Models with Smooth Activations
We consider the problem of optimization of deep learning models with smo...
