SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

10/12/2022
by Hanseul Cho, et al.

Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle the components and use them sequentially (i.e., without-replacement sampling); however, few theoretical results exist for this approach to minimax algorithms, especially outside the easier-to-analyze (strongly) monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. We analyze both simultaneous and alternating SGDA-RR for nonconvex-PŁ and primal-PŁ-PŁ objectives, and obtain convergence rates faster than those of with-replacement SGDA. Our rates also extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a comprehensive lower bound for two-time-scale GDA, which matches the full-batch rate in the primal-PŁ-PŁ case.
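
To make the sampling scheme concrete, below is a minimal Python sketch of simultaneous SGDA-RR for a finite-sum problem min_x max_y (1/n) Σ_i f_i(x, y). The component gradient oracles `grad_x`, `grad_y` and the step sizes `eta_x`, `eta_y` are illustrative placeholders, not taken from the paper; the two step sizes may differ, reflecting a two-time-scale update.

```python
import numpy as np

def sgda_rr(x, y, grad_x, grad_y, n, epochs, eta_x, eta_y, rng=None):
    """Simultaneous SGDA with random reshuffling (SGDA-RR) -- a sketch.

    Solves min_x max_y (1/n) * sum_i f_i(x, y) by drawing a fresh random
    permutation of the n components each epoch and sweeping through it
    (without-replacement sampling).

    grad_x(i, x, y), grad_y(i, x, y): gradients of the i-th component
    w.r.t. x and y (hypothetical oracles for illustration).
    eta_x, eta_y: step sizes, possibly two-time-scale (eta_x != eta_y).
    """
    if rng is None:
        rng = np.random.default_rng()
    for _ in range(epochs):
        perm = rng.permutation(n)      # reshuffle once per epoch
        for i in perm:                 # then use components sequentially
            gx = grad_x(i, x, y)       # simultaneous update: both gradients
            gy = grad_y(i, x, y)       # are evaluated at the same iterate
            x = x - eta_x * gx         # descent on the min variable
            y = y + eta_y * gy         # ascent on the max variable
    return x, y
```

The alternating variant analyzed in the paper would differ only in that the y-update evaluates its gradient at the freshly updated x rather than at the previous iterate.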


