Gradient is All You Need?

06/16/2023
by Konstantin Riedl, et al.

In this paper we provide a novel analytical perspective on the theoretical understanding of gradient-based learning algorithms by interpreting consensus-based optimization (CBO), a recently proposed multi-particle derivative-free optimization method, as a stochastic relaxation of gradient descent. Remarkably, we observe that, through communication between the particles, CBO exhibits stochastic gradient descent (SGD)-like behavior despite relying solely on evaluations of the objective function. The fundamental value of such a link between CBO and SGD lies in the fact that CBO is provably globally convergent to global minimizers for ample classes of nonsmooth and nonconvex objective functions, hence, on the one hand, offering a novel explanation for the success of stochastic relaxations of gradient descent. On the other hand, contrary to the conventional wisdom that zero-order methods ought to be inefficient or to lack generalization abilities, our results unveil an intrinsic gradient descent nature of such heuristics. This viewpoint furthermore complements previous insights into the working principles of CBO, which describe the dynamics in the mean-field limit through a nonlinear nonlocal partial differential equation that allows one to alleviate complexities of the nonconvex function landscape. Our proofs leverage a completely nonsmooth analysis, which combines a novel quantitative version of the Laplace principle (log-sum-exp trick) with the minimizing movement scheme (proximal iteration). In doing so, we furnish useful and precise insights that explain how stochastic perturbations of gradient descent overcome energy barriers and reach deep levels of nonconvex functions. Instructive numerical illustrations support the provided theoretical insights.
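To make the CBO dynamics described above concrete, the following is a minimal sketch (not the authors' code) of isotropic CBO in Python. Particles drift toward a consensus point computed as an exponentially weighted (softmin) average of their positions, with the weights stabilized via the log-sum-exp trick mentioned in the abstract; no gradient of the objective is ever evaluated. The particle count, step size, and parameter values are illustrative assumptions, not values from the paper.

```python
import numpy as np


def cbo_minimize(f, x0, alpha=50.0, lam=1.0, sigma=0.8, dt=0.01, steps=2000, rng=None):
    """Minimal isotropic consensus-based optimization (CBO) sketch.

    f     : objective, maps an (N, d) array of particle positions to (N,) values
    x0    : (N, d) array of initial particle positions
    alpha : sharpness of the Laplace-principle (softmin) weights
    lam   : drift strength toward the consensus point
    sigma : diffusion strength (noise scaled by distance to consensus)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        fx = f(x)
        # Consensus point via exponential (softmin) weights, computed
        # stably with the log-sum-exp trick.
        logw = -alpha * (fx - fx.min())
        w = np.exp(logw - logw.max())
        w /= w.sum()
        v = w @ x                     # weighted average of particle positions
        diff = x - v
        noise = rng.standard_normal(x.shape)
        # Euler-Maruyama step: deterministic drift toward the consensus point
        # plus isotropic noise scaled by the distance to it.
        x = x - lam * diff * dt \
            + sigma * np.linalg.norm(diff, axis=1, keepdims=True) * noise * np.sqrt(dt)
    return v  # consensus point as the estimate of the global minimizer


if __name__ == "__main__":
    # Example on a nonconvex Rastrigin-type objective in 2D (global minimum at 0).
    def rastrigin(x):
        return 10 * x.shape[1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=1)

    x0 = np.random.default_rng(0).uniform(-3, 3, size=(200, 2))
    print(cbo_minimize(rastrigin, x0))
```

The stochastic term vanishes as particles reach consensus, which is the mechanism the paper links to an SGD-like relaxation of gradient descent while the method itself remains purely derivative-free.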

