Adaptive Strategies in Non-convex Optimization

06/17/2023
by Zhenxun Zhuang, et al.

An algorithm is said to be adaptive to a certain parameter of the problem if it does not need a priori knowledge of that parameter yet performs competitively with algorithms that do know it. This dissertation presents our work on adaptive algorithms in the following scenarios:

1. In the stochastic optimization setting, we only receive stochastic gradients, and the level of noise in evaluating them greatly affects the convergence rate. Without prior knowledge of the noise scale, tuning is typically required to achieve the optimal rate. Motivated by this, we designed and analyzed noise-adaptive algorithms that automatically guarantee (near-)optimal rates under different noise scales without knowing them.

2. In training deep neural networks, the gradient magnitudes across coordinates can span a very wide range unless normalization techniques, such as BatchNorm, are employed. In such situations, algorithms that do not account for these gradient scales can behave very poorly. To mitigate this, we formally established the advantage of scale-free algorithms, which adapt to the gradient scales, and demonstrated their benefits in empirical experiments.

3. Traditional analyses in non-convex optimization typically rely on the smoothness assumption. Yet this condition does not capture the properties of some deep learning objective functions, including those involving Long Short-Term Memory networks and Transformers. Instead, such objectives satisfy a much more relaxed condition that allows potentially unbounded smoothness (see the sketches below). Under this condition, we show that a generalized SignSGD algorithm theoretically matches the best-known convergence rates obtained by SGD with gradient clipping, yet needs no explicit clipping at all, and empirically matches the performance of Adam while beating other methods. Moreover, it can also be made to adapt automatically to the unknown relaxed smoothness.
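For context, one common formalization of this relaxed condition in the optimization literature, usually called (L0, L1)-smoothness, assumes a twice-differentiable objective f and bounds the Hessian norm by an affine function of the gradient norm:

    \[ \| \nabla^2 f(x) \| \;\le\; L_0 + L_1 \| \nabla f(x) \| \quad \text{for all } x. \]

Setting L_1 = 0 recovers the standard smoothness assumption, while L_1 > 0 lets the local smoothness grow with the gradient norm, i.e., be unbounded over the domain. (This is the standard condition from the gradient-clipping literature; the dissertation may work with a variant of it.)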

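To make the sign-based idea concrete, here is a minimal sketch of a sign-of-momentum update, assuming plain NumPy. The function name sign_momentum_step and the hyperparameters lr and beta are hypothetical placeholders, and this is not the dissertation's exact generalized SignSGD algorithm.

```python
# Illustrative sketch only (placeholder names and hyperparameters), not the
# dissertation's exact generalized SignSGD: each coordinate moves by the sign
# of an exponential moving average of stochastic gradients, so the update size
# stays bounded even when gradients (and the local smoothness) blow up.
import numpy as np

def sign_momentum_step(x, grad, m, lr=0.01, beta=0.9):
    """One sign-of-momentum update on parameters x."""
    m = beta * m + (1 - beta) * grad   # running average of stochastic gradients
    x = x - lr * np.sign(m)            # move each coordinate by +/- lr
    return x, m

# Toy usage: minimize f(x) = 0.5 * ||x||^2 from noisy gradients.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
m = np.zeros_like(x)
for _ in range(500):
    noisy_grad = x + 0.1 * rng.normal(size=5)  # stochastic gradient of f at x
    x, m = sign_momentum_step(x, noisy_grad, m)
print(np.linalg.norm(x))  # should end up small (near the minimizer at 0)
```

The appeal of such updates under relaxed smoothness is that each per-coordinate step is capped at lr no matter how large the gradient becomes, which plays a role similar to explicit gradient clipping.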