Escaping Saddle Points in Nonconvex Minimax Optimization via Cubic-Regularized Gradient Descent-Ascent

10/14/2021
by Ziyi Chen, et al.

The gradient descent-ascent (GDA) algorithm has been widely applied to solve nonconvex minimax optimization problems. However, existing GDA-type algorithms can only find first-order stationary points of the envelope function of a nonconvex minimax problem, which does not rule out the possibility of getting stuck at suboptimal saddle points. In this paper, we develop Cubic-GDA, the first GDA-type algorithm for escaping strict saddle points in nonconvex-strongly-concave minimax optimization. Specifically, the algorithm uses gradient ascent to estimate the second-order information of the minimax objective function, and it leverages the cubic regularization technique to efficiently escape strict saddle points. Under standard smoothness assumptions on the objective function, we show that Cubic-GDA admits an intrinsic potential function whose value monotonically decreases along the minimax optimization process. This property leads to the desired global convergence of Cubic-GDA to a second-order stationary point at a sublinear rate. Moreover, we analyze the convergence rate of Cubic-GDA over the full spectrum of a gradient dominance-type nonconvex geometry. Our result shows that Cubic-GDA achieves an orderwise faster convergence rate than standard GDA for a wide range of this gradient dominance geometry. Our study bridges minimax optimization with second-order optimization and may inspire new developments along this direction.
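To make the high-level recipe in the abstract concrete, below is a minimal, illustrative Python sketch of a cubic-regularized descent-ascent loop for min_x max_y f(x, y) with f strongly concave in y. It is not the authors' exact Cubic-GDA update: here the gradient and Hessian of the envelope Phi(x) = max_y f(x, y) are estimated by inner gradient ascent plus finite differences, the cubic subproblem is solved approximately by gradient descent on the cubic model, and all names (cubic_gda, envelope_grad) and parameter values (eta_y, M, step counts) are illustrative assumptions rather than values from the paper.

```python
# A minimal, illustrative sketch of a cubic-regularized descent-ascent loop for
#     min_x max_y f(x, y),   with f(x, .) strongly concave for every x.
# This is NOT the authors' exact Cubic-GDA: the gradient/Hessian of the envelope
# Phi(x) = max_y f(x, y) are estimated here by inner gradient ascent plus finite
# differences, and the cubic subproblem is solved by plain gradient descent on
# the cubic model. All names and parameter values are illustrative assumptions.

import numpy as np


def envelope_grad(gx, gy, x, y, eta_y=0.1, ascent_steps=50):
    """Approximate grad Phi(x) via gradient ascent in y (warm-started at y)."""
    y = y.copy()
    for _ in range(ascent_steps):
        y = y + eta_y * gy(x, y)          # inner maximization step
    return gx(x, y), y                    # Danskin: grad Phi(x) = grad_x f(x, y*(x))


def cubic_gda(gx, gy, x0, y0, M=10.0, outer_steps=100, fd_eps=1e-5):
    """gx, gy: partial gradients of f; M: cubic-regularization parameter."""
    x, y = np.asarray(x0, float).copy(), np.asarray(y0, float).copy()
    d = x.size
    for _ in range(outer_steps):
        g, y = envelope_grad(gx, gy, x, y)
        # Finite-difference estimate of the envelope Hessian: second-order
        # information obtained through extra gradient-ascent solves.
        H = np.zeros((d, d))
        for i in range(d):
            e = np.zeros(d)
            e[i] = fd_eps
            gi, _ = envelope_grad(gx, gy, x + e, y)
            H[:, i] = (gi - g) / fd_eps
        H = 0.5 * (H + H.T)
        # Approximately solve the cubic-regularized subproblem
        #     min_s  g^T s + 0.5 s^T H s + (M / 6) ||s||^3
        # by gradient descent on the model; negative curvature in H lets the
        # step escape strict saddle points of Phi.
        s = np.zeros(d)
        for _ in range(200):
            model_grad = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
            s = s - 0.1 * model_grad
        x = x + s
    return x, y


# Toy nonconvex-strongly-concave instance:
#     f(x, y) = 0.25 ||x||^4 - 0.5 ||x||^2 + x^T y - ||y||^2,
# whose envelope Phi(x) = 0.25 ||x||^4 - 0.25 ||x||^2 has a strict saddle
# point (negative curvature) at x = 0 and minimizers on the circle ||x|| = 0.707.
gx = lambda x, y: (x @ x) * x - x + y     # grad_x f
gy = lambda x, y: x - 2.0 * y             # grad_y f (strong concavity modulus 2)
x_out, y_out = cubic_gda(gx, gy, x0=[0.01, 0.01], y0=[0.0, 0.0])
print(x_out, np.linalg.norm(x_out))       # ||x_out|| should approach ~0.707
```

On the toy instance, plain GDA initialized near x = 0 would make little progress because the envelope gradient vanishes there, whereas the cubic-regularized step exploits the negative curvature in the estimated Hessian to move away from the strict saddle, illustrating the escape behavior described in the abstract.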


Related research

10/10/2021 · Finding Second-Order Stationary Point for Nonconvex-Strongly-Concave Minimax Problem
We study the smooth minimax optimization problem of the form min_x max_ ...

02/09/2021 · Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry
The gradient descent-ascent (GDA) algorithm has been widely applied to s...

08/22/2018 · Convergence of Cubic Regularization for Nonconvex Optimization under KL Property
Cubic-regularized Newton's method (CR) is a popular algorithm that guara...

09/23/2018 · Second-order Guarantees of Distributed Gradient Algorithms
We consider distributed smooth nonconvex unconstrained optimization over...

05/01/2020 · A Dual-Dimer Method for Training Physics-Constrained Neural Networks with Minimax Architecture
Data sparsity is a common issue to train machine learning tools such as ...

02/21/2022 · Semi-Implicit Hybrid Gradient Methods with Application to Adversarial Robustness
Adversarial examples, crafted by adding imperceptible perturbations to n...

02/15/2023 · An abstract convergence framework with application to inertial inexact forward–backward methods
In this paper we introduce a novel abstract descent scheme suited for th...
