Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

by Mingrui Liu, et al.

Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While the theory of adaptive gradient methods is well understood for minimization problems, the factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim to bridge this gap from both theoretical and empirical perspectives. First, we analyze a variant of Optimistic Stochastic Gradient (OSG) proposed in <cit.> for solving a class of non-convex non-concave min-max problems and establish O(ϵ^-4) complexity for finding an ϵ-first-order stationary point; the algorithm requires only one stochastic first-order oracle call per iteration while matching the state-of-the-art iteration complexity achieved by the stochastic extragradient method of <cit.>. Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and establish an improved adaptive complexity O(ϵ^(-2/(1-α))), where α characterizes the growth rate of the cumulative stochastic gradient and 0 ≤ α ≤ 1/2. To the best of our knowledge, this is the first work establishing adaptive complexity in non-convex non-concave min-max optimization. Empirically, our experiments show that adaptive gradient algorithms indeed outperform their non-adaptive counterparts in GAN training. Moreover, this advantage can be explained by the slow growth rate of the cumulative stochastic gradient, as observed empirically.
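To make the single-call optimistic update concrete, here is a minimal sketch (not the paper's implementation) contrasting plain simultaneous gradient descent ascent with an optimistic update of the common form v_{t+1} = v_t - 2η F(v_t) + η F(v_{t-1}) on a toy bilinear game f(x, y) = x·y, where plain descent-ascent is known to diverge while the optimistic variant converges to the saddle point. The function names and the toy problem are illustrative assumptions.

```python
import numpy as np

def field(v):
    # Descent-ascent vector field for the toy bilinear game f(x, y) = x * y:
    # x minimizes and y maximizes, so F(x, y) = (df/dx, -df/dy) = (y, -x).
    x, y = v
    return np.array([y, -x])

def gda(v0, eta=0.1, steps=1000):
    # Plain simultaneous gradient descent ascent: v_{t+1} = v_t - eta * F(v_t).
    # On this bilinear game the iterates spiral outward and diverge.
    v = v0.copy()
    for _ in range(steps):
        v = v - eta * field(v)
    return v

def osg(v0, eta=0.1, steps=1000):
    # Optimistic update: v_{t+1} = v_t - 2*eta*F(v_t) + eta*F(v_{t-1}).
    # Only one oracle call per iteration; the previous gradient is reused,
    # mirroring the single-call property of OSG highlighted in the abstract.
    v = v0.copy()
    g_prev = field(v)
    for _ in range(steps):
        g = field(v)
        v = v - 2 * eta * g + eta * g_prev
        g_prev = g
    return v

v0 = np.array([1.0, 1.0])
print(np.linalg.norm(gda(v0)))  # grows large: plain GDA diverges
print(np.linalg.norm(osg(v0)))  # shrinks toward the saddle point (0, 0)
```

OAdagrad would additionally rescale this update by the root of the cumulative squared stochastic gradients, which is where the growth-rate exponent α in the adaptive complexity bound enters.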


Sharp Analysis of Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

Epoch gradient descent method (a.k.a. Epoch-GD) proposed by (Hazan and K...

A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems

Min-max saddle point games have recently been intensely studied, due to ...

Kernel-Based Training of Generative Networks

Generative adversarial networks (GANs) are designed with the help of min...

On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization

Adam-type methods, the extension of adaptive gradient methods, have show...

Solving a class of non-convex min-max games using adaptive momentum methods

Adaptive momentum methods have recently attracted a lot of attention for...

Saddle Point Optimization with Approximate Minimization Oracle

A major approach to saddle point optimization min_xmax_y f(x, y) is a gr...

Decentralized Parallel Algorithm for Training Generative Adversarial Nets

Generative Adversarial Networks (GANs) are a powerful class of generative ...
