
Competitive Gradient Descent
We introduce a new algorithm for the numerical computation of Nash equil...

Solving Min-Max Optimization with Hidden Structure via Gradient Descent Ascent
Many recent AI architectures are inspired by zero-sum games; however, th...

Convergence of Learning Dynamics in Stackelberg Games
This paper investigates the convergence of learning dynamics in Stackelb...

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach
Many tasks in modern machine learning can be formulated as finding equil...

Asymptotic behaviour of learning rates in Armijo's condition
Fix a constant 0<α<1. For a C^1 function f:ℝ^k→ℝ, a point x and a posit...
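The Armijo condition mentioned in this abstract can be sketched as a backtracking line search: shrink the step size δ until f(x − δ∇f(x)) ≤ f(x) − αδ‖∇f(x)‖². A minimal illustration (function names and constants below are illustrative, not taken from the paper):

```python
import numpy as np

def armijo_step_size(f, grad_f, x, alpha=0.3, delta0=1.0, shrink=0.5):
    """Backtrack until Armijo's condition holds:
    f(x - delta * g) <= f(x) - alpha * delta * ||g||^2."""
    g = grad_f(x)
    delta = delta0
    while f(x - delta * g) > f(x) - alpha * delta * np.dot(g, g):
        delta *= shrink
    return delta

# Toy example: f(x) = 0.5 * ||x||^2, so grad f(x) = x.
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x = np.array([2.0, -1.0])
delta = armijo_step_size(f, grad_f, x)
```

The returned δ is guaranteed to satisfy the condition by construction; how δ behaves asymptotically along the iterates is the question the paper studies.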

Linear Last-iterate Convergence for Matrix Games and Stochastic Games
Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...

A Provably Convergent and Practical Algorithm for Min-max Optimization with Applications to GANs
We present a new algorithm for optimizing min-max loss functions that ar...
Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation
We study the role that a finite timescale separation parameter τ has on gradient descent-ascent in two-player nonconvex, nonconcave zero-sum games, where the learning rate of player 1 is denoted by γ_1 and the learning rate of player 2 is defined to be γ_2=τγ_1. Existing work analyzing the role of timescale separation in gradient descent-ascent has primarily focused on the edge cases of players sharing a learning rate (τ=1) and the maximizing player approximately converging between each update of the minimizing player (τ→∞). For the parameter choice of τ=1, it is known that the learning dynamics are not guaranteed to converge to a game-theoretically meaningful equilibrium in general. In contrast, Jin et al. (2020) showed that the stable critical points of gradient descent-ascent coincide with the set of strict local minmax equilibria as τ→∞. In this work, we bridge the gap between past work by showing there exists a finite timescale separation parameter τ^∗ such that x^∗ is a stable critical point of gradient descent-ascent for all τ∈(τ^∗, ∞) if and only if it is a strict local minmax equilibrium. Moreover, we provide an explicit construction for computing τ^∗ along with corresponding convergence rates and results under deterministic and stochastic gradient feedback. The convergence results we present are complemented by a non-convergence result: given a critical point x^∗ that is not a strict local minmax equilibrium, there exists a finite timescale separation τ_0 such that x^∗ is unstable for all τ∈(τ_0, ∞). Finally, we empirically demonstrate on the CIFAR-10 and CelebA datasets the significant impact timescale separation has on training performance.
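The τ-scaled dynamics described above can be illustrated on a toy quadratic zero-sum game (the function and all constants below are illustrative choices, not taken from the paper): both players update simultaneously, with the maximizing player's learning rate set to γ_2 = τγ_1.

```python
import numpy as np

def gda(grad_x, grad_y, x, y, gamma1, tau, steps):
    """Simultaneous gradient descent-ascent with gamma2 = tau * gamma1."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - gamma1 * gx, y + tau * gamma1 * gy
    return x, y

# Toy game f(x, y) = -3x^2 - y^2 + 4xy (illustrative).
# (0, 0) is a strict local minmax point: d^2f/dy^2 = -2 < 0 and the Schur
# complement -6 - 4 * (-1/2) * 4 = 2 > 0. Yet GDA with tau = 1 is unstable
# there; a sufficiently large timescale separation tau stabilizes it.
grad_x = lambda x, y: -6 * x + 4 * y   # df/dx
grad_y = lambda x, y: -2 * y + 4 * x   # df/dy

x1, y1 = gda(grad_x, grad_y, 1.0, 1.0, gamma1=0.01, tau=1.0, steps=2000)  # diverges
x6, y6 = gda(grad_x, grad_y, 1.0, 1.0, gamma1=0.01, tau=6.0, steps=2000)  # -> (0, 0)
```

For this quadratic the continuous-time threshold is τ^∗ = 3 (it makes the trace of the game Jacobian negative), matching the paper's qualitative claim that stability holds for all τ beyond a finite τ^∗.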