Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function

07/19/2021
by Dong-Young Lim, et al.

We consider non-convex stochastic optimization problems in which the objective functions have super-linearly growing and discontinuous stochastic gradients. In this setting, we provide a non-asymptotic analysis of the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2021). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, a key paradigm in machine learning. Numerical experiments for this example support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve optimization problems involving neural networks with the ReLU activation function. In addition, we provide simulation results for synthetic examples in which popular algorithms, e.g., ADAM, AMSGrad, RMSProp, and (vanilla) SGD, may fail to find the minimizer of the objective function due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution.
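To make the taming idea concrete, below is a minimal sketch of a TUSLA-style update, following the form of the recursion in Lovas et al. (2021): the stochastic gradient is divided by a taming factor 1 + λ^(1/2)‖θ‖^(2r) before the usual Langevin step. The step size lam, inverse temperature beta, taming exponent r, and the toy objective f(θ) = ‖θ‖⁴ are illustrative choices for this sketch, not values taken from the paper.

```python
import numpy as np

def tusla_step(theta, stoch_grad, lam, beta, r, rng):
    """One iteration of a tamed unadjusted stochastic Langevin update.

    The stochastic gradient is divided by the taming factor
    1 + sqrt(lam) * ||theta||^(2r), which keeps the step bounded even
    when the gradient grows super-linearly in theta.
    """
    h = stoch_grad(theta)
    taming = 1.0 + np.sqrt(lam) * np.linalg.norm(theta) ** (2 * r)
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
    return theta - lam * h / taming + noise

# Toy problem (illustrative, not from the paper): minimize f(theta) = ||theta||^4.
# Its gradient 4 * ||theta||^2 * theta grows super-linearly, so vanilla SGD with
# the same step size diverges from theta0 = 30 (the first step already overshoots
# to about -78), while the tamed update stays stable.
rng = np.random.default_rng(0)
theta = np.array([30.0])
noisy_grad = lambda th: 4.0 * np.dot(th, th) * th + rng.standard_normal(th.shape)

for _ in range(20_000):
    theta = tusla_step(theta, noisy_grad, lam=1e-3, beta=1e8, r=1, rng=rng)

print(theta)  # close to the minimizer at 0
```

Note that the taming factor tends to 1 as θ approaches a minimizer, so the scheme behaves like standard stochastic gradient Langevin dynamics near stationary points while preventing blow-up far from them.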


Related research

10/24/2022
Langevin dynamics based algorithm e-THεO POULA for stochastic optimization problems with discontinuous stochastic gradient
We introduce a new Langevin dynamics based algorithm, called e-THεO POUL...

07/02/2020
A fully data-driven approach to minimizing CVaR for portfolio of assets via SGLD with discontinuous updating
A new approach in stochastic optimization via the use of stochastic grad...

07/06/2022
Non-asymptotic convergence bounds for modified tamed unadjusted Langevin algorithm in non-convex setting
We consider the problem of sampling from a high-dimensional target distr...

06/25/2020
Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms
Artificial neural networks (ANNs) are typically highly nonlinear systems...

02/26/2020
Non-Asymptotic Bounds for Zeroth-Order Stochastic Optimization
We consider the problem of optimizing an objective function with and wit...

05/23/2019
How degenerate is the parametrization of neural networks with the ReLU activation function?
Neural network training is usually accomplished by solving a non-convex ...

05/13/2023
Successive Affine Learning for Deep Neural Networks
This paper introduces a successive affine learning (SAL) model for const...
