Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions

08/20/2018
by Zaiyi Chen, et al.

Although the stochastic gradient descent (SGD) method and its variants (e.g., stochastic momentum methods, AdaGrad) are the algorithms of choice for solving non-convex problems (especially deep learning), there still remain big gaps between theory and practice, with many questions unresolved. For example, there is still no convergence theory for SGD when it uses a stagewise step size and returns an averaged solution. In addition, theoretical insight into why the adaptive step size of AdaGrad can improve over the non-adaptive step size of SGD is still missing for non-convex optimization. This paper aims to address these questions and fill the gap between theory and practice. We propose a universal stagewise optimization framework for a broad family of non-smooth non-convex problems with the following key features: (i) each stage calls a basic algorithm (e.g., SGD or AdaGrad) on a regularized convex problem and returns an averaged solution; (ii) the step size is decreased in a stagewise manner; (iii) the final solution is selected from all stagewise averaged solutions, with sampling probabilities that increase with the stage number. Our theoretical results for stagewise AdaGrad exhibit its adaptive convergence and therefore shed light on its faster convergence, compared with stagewise SGD, for problems with sparse stochastic gradients. To the best of our knowledge, these new results are the first of their kind to address the unresolved issues of the existing theories mentioned above.
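The three ingredients (i)-(iii) can be sketched in a few lines of Python. The sketch below is only illustrative: it assumes plain SGD as the basic stage solver, a quadratic proximal regularizer with parameter gamma, a 1/s stagewise step-size decay, and sampling weights proportional to the stage number. None of these specific choices, nor the names stagewise_sgd, grad_oracle, eta0, or gamma, are taken from the paper itself.

    import numpy as np

    def stagewise_sgd(grad_oracle, x0, num_stages=10, iters_per_stage=100,
                      eta0=0.1, gamma=1.0, seed=0):
        # Hypothetical sketch of the stagewise framework described above.
        # grad_oracle(x) should return a stochastic (sub)gradient of the objective at x.
        rng = np.random.default_rng(seed)
        x_ref = np.asarray(x0, dtype=float)      # reference point of the current stage
        stage_averages = []
        for s in range(1, num_stages + 1):
            eta = eta0 / s                       # (ii) step size decreased stagewise
            x = x_ref.copy()
            avg = np.zeros_like(x)
            for t in range(1, iters_per_stage + 1):
                # stochastic gradient of the regularized convex subproblem
                g = grad_oracle(x) + (x - x_ref) / gamma
                x = x - eta * g
                avg += (x - avg) / t             # (i) running average of the stage iterates
            stage_averages.append(avg)
            x_ref = avg                          # next stage is centered at this average
        # (iii) return one stagewise average, sampled with probability increasing in s
        weights = np.arange(1, num_stages + 1, dtype=float)
        tau = rng.choice(num_stages, p=weights / weights.sum())
        return stage_averages[tau]

In such a sketch, the proximal term (x - x_ref) / gamma is what makes each stage's subproblem convex for a weakly convex objective, provided gamma is chosen small enough relative to the weak-convexity modulus; any stochastic convex solver that returns an averaged iterate could be substituted for the inner SGD loop.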

research
02/18/2021

On the Convergence of Step Decay Step-Size for Stochastic Optimization

The convergence of stochastic gradient descent is highly dependent on th...
research
04/12/2022

An Adaptive Time Stepping Scheme for Rate-Independent Systems with Non-Convex Energy

We investigate a local incremental stationary scheme for the numerical s...
research
08/28/2019

Linear Convergence of Adaptive Stochastic Gradient Descent

We prove that the norm version of the adaptive stochastic gradient metho...
research
07/31/2022

Formal guarantees for heuristic optimization algorithms used in machine learning

Recently, Stochastic Gradient Descent (SGD) and its variants have become...
research
09/15/2023

Convergence of ADAM with Constant Step Size in Non-Convex Settings: A Simple Proof

In neural network training, RMSProp and ADAM remain widely favoured opti...
research
05/24/2022

Weak Convergence of Approximate reflection coupling and its Application to Non-convex Optimization

In this paper, we propose a weak approximation of the reflection couplin...
research
11/17/2017

Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size

Learning representation from relative similarity comparisons, often call...
