A global analysis of global optimisation

10/10/2022
by Lachlan Ewen MacDonald, et al.

Theoretical understanding of the training of deep neural networks has made great strides in recent years. In particular, it has been shown that sufficient width and a sufficiently small learning rate guarantee that chain networks trained with the square cost converge to global minima close to initialisation. However, this theory does not apply to the cross-entropy cost, whose global minima exist only at infinity. In this paper, we introduce a general theoretical framework, designed for the study of optimisation, that encompasses ubiquitous architectural choices including batch normalisation, weight normalisation and skip connections. We use our framework to conduct a global analysis of the curvature and regularity properties of neural network loss landscapes, and give two applications. First, we give the first proof that a class of deep neural networks can be trained by gradient descent to global optima even when such optima exist only at infinity. Second, we use the theory in an empirical analysis of the effect of residual connections on training speed, which we verify with ResNets on MNIST, CIFAR10 and CIFAR100.
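The claim that cross-entropy global minima exist only at infinity is easy to see in miniature. Below is a minimal sketch, not from the paper: it assumes a plain logistic-regression toy problem on linearly separable data rather than a deep network, but it exhibits the same phenomenon. Gradient descent drives the logistic (binary cross-entropy) loss toward its infimum of zero while the parameter norm grows without bound, so no finite minimiser exists.

```python
# Minimal sketch (illustrative assumption: logistic regression on
# separable data, not the paper's deep-network setting).
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: the label is the sign of the first coordinate.
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0])

w = np.zeros(2)
lr = 0.1

def loss_and_grad(w):
    margins = y * (X @ w)                        # per-example margins y_i <x_i, w>
    loss = np.mean(np.log1p(np.exp(-margins)))   # logistic (cross-entropy) loss
    p = 1.0 / (1.0 + np.exp(margins))            # sigmoid(-margin_i)
    grad = -(X * (y * p)[:, None]).mean(axis=0)  # gradient of the mean loss
    return loss, grad

for step in range(1, 20001):
    loss, grad = loss_and_grad(w)
    w -= lr * grad
    if step % 5000 == 0:
        print(f"step {step:6d}  loss {loss:.6f}  ||w|| {np.linalg.norm(w):6.2f}")

# The printed loss decreases toward 0 while ||w|| keeps growing:
# the infimum is approached only as the parameters diverge to infinity.
```

On separable data the loss has no critical point at finite norm, which is precisely why the square-cost convergence theory mentioned above breaks down for cross-entropy and a different kind of guarantee is needed.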


Related research

10/07/2019 · Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks
The skip-connections used in residual networks have become a standard ar...

08/01/2022 · Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization
Training deep neural networks is a very demanding task, especially chall...

06/07/2017 · Are Saddles Good Enough for Deep Learning?
Recent years have seen a growing interest in understanding deep neural n...

03/17/2019 · Training Over-parameterized Deep ResNet Is almost as Easy as Training a Two-layer Network
It has been proved that gradient descent converges linearly to the globa...

06/10/2020 · Is the Skip Connection Provable to Reform the Neural Network Loss Landscape?
The residual network is now one of the most effective structures in deep...

05/13/2022 · Convergence Analysis of Deep Residual Networks
Various powerful deep neural network architectures have made great contr...

11/17/2016 · Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks
Training deep neural networks for solving machine learning problems is o...
