On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions

02/28/2018
by Songtao Lu et al.

Alternating gradient descent (AGD) is a simple but popular algorithm that has been applied to problems in optimization, machine learning, data mining, and signal processing. The algorithm updates two blocks of variables in an alternating manner: a gradient step is taken on one block while the other block is held fixed. When the objective function is nonconvex, it is well known that AGD converges to a first-order stationary solution at a global sublinear rate. In this paper, we show that a variant of AGD-type algorithms will not be trapped by "bad" stationary solutions such as saddle points and local maxima. In particular, we consider a smooth unconstrained optimization problem and propose a perturbed AGD (PA-GD) that converges (with high probability) to the set of second-order stationary solutions (SS2) at a global sublinear rate. To the best of our knowledge, this is the first alternating-type algorithm that achieves SS2 with high probability in O(polylog(d)/ϵ^(7/3)) iterations, where polylog(d) denotes a polynomial of the logarithm of the problem dimension d.
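To make the mechanism concrete, here is a minimal sketch of a perturbed alternating gradient descent loop in Python. All names, constants, and the escape condition are illustrative assumptions, not the paper's exact procedure: `grad_x` and `grad_y` are assumed callables returning the partial gradients of f(x, y) with respect to each block.

```python
import numpy as np

def pa_gd(grad_x, grad_y, x, y, step=1e-2, pert_radius=1e-3,
          grad_tol=1e-3, max_iters=10_000, rng=None):
    """Minimal sketch of perturbed alternating gradient descent.

    grad_x and grad_y return the partial gradients of f(x, y) with
    respect to each block. Hypothetical interface and constants.
    """
    rng = rng or np.random.default_rng()
    for _ in range(max_iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        if np.sqrt(np.sum(gx**2) + np.sum(gy**2)) <= grad_tol:
            # Near a first-order stationary point: inject a small
            # random perturbation so the iterates can escape strict
            # saddle points instead of stalling there.
            x = x + pert_radius * rng.standard_normal(x.shape)
            y = y + pert_radius * rng.standard_normal(y.shape)
            continue
        # Alternating (Gauss-Seidel) updates: step on x with y held
        # fixed, then step on y using the freshly updated x.
        x = x - step * gx
        y = y - step * grad_y(x, y)
    return x, y

# Toy usage on the nonconvex objective f(x, y) = (x @ y - 1)^2,
# whose origin (x = y = 0) is a strict saddle point.
x0 = np.zeros(3)
y0 = np.zeros(3)
gx = lambda x, y: 2.0 * (x @ y - 1.0) * y
gy = lambda x, y: 2.0 * (x @ y - 1.0) * x
x_star, y_star = pa_gd(gx, gy, x0, y0)
```

The paper's actual PA-GD additionally controls how often the perturbation may fire and monitors the function decrease after each perturbation to certify escape; the sketch omits that bookkeeping for brevity.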
