Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

09/30/2020 ∙ by Tanner Fiez, et al.

We study the role of a finite timescale separation parameter τ in gradient descent-ascent for two-player non-convex, non-concave zero-sum games, where player 1 has learning rate γ_1 and player 2 has learning rate γ_2 = τγ_1. Existing work analyzing the role of timescale separation in gradient descent-ascent has primarily focused on two edge cases: the players sharing a learning rate (τ = 1), and the maximizing player approximately converging between each update of the minimizing player (τ → ∞). For τ = 1, it is known that the learning dynamics are not guaranteed to converge to a game-theoretically meaningful equilibrium in general. In contrast, Jin et al. (2020) showed that the stable critical points of gradient descent-ascent coincide with the set of strict local minmax equilibria as τ → ∞. In this work, we bridge the gap between past work by showing that there exists a finite timescale separation parameter τ^∗ such that x^∗ is a stable critical point of gradient descent-ascent for all τ ∈ (τ^∗, ∞) if and only if it is a strict local minmax equilibrium. Moreover, we provide an explicit construction for computing τ^∗, along with corresponding convergence rates and results under deterministic and stochastic gradient feedback. The convergence results we present are complemented by a non-convergence result: given a critical point x^∗ that is not a strict local minmax equilibrium, there exists a finite timescale separation τ_0 such that x^∗ is unstable for all τ ∈ (τ_0, ∞). Finally, we empirically demonstrate on the CIFAR-10 and CelebA datasets the significant impact timescale separation has on training performance.
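To make the role of τ concrete, here is a minimal sketch of gradient descent-ascent with timescale separation on a toy quadratic game. The game f(x, y) = -0.5x² + xy - 0.25y², the step size, and the two values of τ are illustrative assumptions chosen for this example; they are not taken from the paper. The origin is a strict local minmax equilibrium of this game (the second derivative in y is negative and the Schur complement in x is positive), yet f is concave in x on its own, so the minimizing player is only stabilized through the maximizing player's response. A trace and determinant check of the linearized dynamics for this particular toy game puts the stability threshold at τ = 2 in the small-step limit.

```python
import math


def run_gda(tau, gamma=0.01, steps=5000, x=0.1, y=0.1):
    """Run simultaneous gradient descent-ascent on the toy game
    f(x, y) = -0.5*x**2 + x*y - 0.25*y**2 (an illustrative assumption).

    Player 1 (x) descends with learning rate gamma; player 2 (y) ascends
    with learning rate tau * gamma. Returns the final distance from the
    critical point at the origin.
    """
    for _ in range(steps):
        # Partial derivatives of f at the current iterate.
        grad_x = -x + y
        grad_y = x - 0.5 * y
        # Simultaneous update: both gradients use the same (x, y).
        x, y = x - gamma * grad_x, y + tau * gamma * grad_y
    return math.hypot(x, y)


if __name__ == "__main__":
    for tau in (1.0, 4.0):
        print(f"tau = {tau}: final distance from origin = {run_gda(tau):.3e}")
```

With τ = 1 the iterates spiral away from the origin, while with τ = 4 they spiral in, which is the qualitative behavior the abstract describes for a strict local minmax equilibrium once τ exceeds a finite threshold.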





