On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

12/17/2021
by   Arnulf Jentzen, et al.

In this article we study fully-connected feedforward deep ReLU artificial neural networks (ANNs) with an arbitrarily large number of hidden layers, and we prove convergence of the risk of the gradient descent (GD) optimization method with random initializations in the training of such ANNs under three assumptions: that the unnormalized probability density function of the distribution of the input data of the considered supervised learning problem is piecewise polynomial, that the target function (describing the relationship between the input data and the output data) is piecewise polynomial, and that the risk function of the considered supervised learning problem admits at least one regular global minimum. In addition, in the special situation of shallow ANNs with just one hidden layer and one-dimensional input, we verify this assumption by proving that, in the training of such shallow ANNs, for every Lipschitz continuous target function there exists a global minimum in the risk landscape. Finally, in the training of deep ReLU ANNs we also study solutions of gradient flow (GF) differential equations, and we prove that every non-divergent GF trajectory converges with a polynomial rate of convergence to a critical point (in the sense of limiting Fréchet subdifferentiability). Our mathematical convergence analysis builds on tools from real algebraic geometry, such as the concept of semi-algebraic functions and generalized Kurdyka-Łojasiewicz inequalities, on tools from functional analysis, such as the Arzelà-Ascoli theorem, on tools from nonsmooth analysis, such as the concept of limiting Fréchet subgradients, as well as on the fact, established by Petersen et al., that the set of realization functions of shallow ReLU ANNs with a fixed architecture forms a closed subset of the set of continuous functions.
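To make the setting more concrete, the following LaTeX sketch records one standard way to write down the risk and the gradient flow dynamics that the abstract refers to. The symbols $f$, $\mu$, $\mathcal{N}_\theta$, $\mathcal{L}$, and $\mathcal{G}$ are illustrative placeholders chosen here, not notation taken from the paper, and the precise definitions (in particular the choice of generalized gradient for the nondifferentiable ReLU activation) are the ones given in the paper.

```latex
% Illustrative sketch only; the exact setup in the paper may differ in detail.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Let $f\colon [a,b]^d \to \mathbb{R}$ denote the target function and let $\mu$ denote
the input distribution, assumed to have a piecewise polynomial unnormalized density.
For a parameter vector $\theta \in \mathbb{R}^{\mathfrak{d}}$ let $\mathcal{N}_\theta$
be the realization function of the ANN. The risk is
\begin{equation}
  \mathcal{L}(\theta)
  = \int_{[a,b]^d} \bigl( \mathcal{N}_\theta(x) - f(x) \bigr)^2 \, \mu(\mathrm{d}x),
\end{equation}
and the gradient flow (GF) trajectories studied in this context solve
\begin{equation}
  \Theta_0 = \xi,
  \qquad
  \tfrac{\mathrm{d}}{\mathrm{d}t}\Theta_t = - \mathcal{G}(\Theta_t),
\end{equation}
where $\mathcal{G}$ is a suitably defined generalized gradient of $\mathcal{L}$,
since the ReLU activation is not everywhere differentiable.
\end{document}
```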

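For readers who want to experiment, here is a minimal NumPy sketch of plain GD with a random initialization for a shallow ReLU ANN with one hidden layer and one-dimensional input, i.e. the special situation mentioned in the abstract. The width, step size, number of steps, sample size, and the Lipschitz continuous target function `target` are arbitrary illustrative choices; this script is not the training scheme analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # A Lipschitz continuous target function on [0, 1] (illustrative choice).
    return np.abs(x - 0.5)

def risk_and_grad(params, x, y):
    w1, b1, w2, b2 = params            # hidden weights/biases, output weights/bias
    z = np.outer(x, w1) + b1           # pre-activations, shape (n, width)
    a = np.maximum(z, 0.0)             # ReLU activation
    pred = a @ w2 + b2                 # network realization at the data points
    err = pred - y
    risk = np.mean(err ** 2)
    # Backpropagation for the mean squared error (empirical risk).
    d_pred = 2.0 * err / len(x)
    g_w2 = a.T @ d_pred
    g_b2 = d_pred.sum()
    d_a = np.outer(d_pred, w2)
    d_z = d_a * (z > 0.0)              # subgradient choice 0 at the ReLU kink
    g_w1 = x @ d_z
    g_b1 = d_z.sum(axis=0)
    return risk, (g_w1, g_b1, g_w2, g_b2)

width, lr, steps = 32, 0.1, 5000
x = rng.uniform(0.0, 1.0, size=512)    # i.i.d. one-dimensional input samples
y = target(x)

# Random initialization of all ANN parameters.
params = [rng.normal(size=width),
          rng.normal(size=width),
          rng.normal(size=width) / width,
          rng.normal()]

for _ in range(steps):
    risk, grads = risk_and_grad(params, x, y)
    params = [p - lr * g for p, g in zip(params, grads)]

print(f"final empirical risk: {risk:.6f}")
```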

