We analyze some recent results on the use of Mirror Descent (MD) and Optimistic Mirror Descent (OMD), which have recently been studied extensively for alleviating convergence issues in the training of generative adversarial networks [3, 1]. In particular, these papers consider the following general saddle-point (SP) problem:

$$\min_{x_1 \in \mathcal{X}_1} \max_{x_2 \in \mathcal{X}_2} f(x_1, x_2), \qquad (1)$$

where $\mathcal{X}_1, \mathcal{X}_2$ are compact convex subsets of a finite-dimensional normed space $(\mathbb{R}^d, \|\cdot\|)$ and $f$ is continuously differentiable. Let $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2$ and let $\|\cdot\|_*$ be the dual of $\|\cdot\|$. Define $g$ as

$$g(x) = \big(\nabla_{x_1} f(x_1, x_2),\, -\nabla_{x_2} f(x_1, x_2)\big).$$
A solution $x^*$ of (1) satisfies the Stampacchia Variational Inequality (SVI), $\langle g(x^*), x - x^* \rangle \ge 0$ for all $x \in \mathcal{X}$. When the function $f$ is convex-concave, it is also well known that the point $x^*$ satisfies the Minty Variational Inequality (MVI), i.e.,

$$\langle g(x), x - x^* \rangle \ge 0 \quad \text{for all } x \in \mathcal{X}.$$
For convex-concave problems, it can be shown that these three conditions, namely, (i) $x^*$ is an optimal point of (1), (ii) $x^*$ satisfies (SVI), and (iii) $x^*$ satisfies (MVI), are all equivalent. The interplay of these two VIs in characterizing the optimal set of solutions has been investigated extensively. We focus here on the notion of coherence proposed in [2, 1], which is assumed to be satisfied by problem (1) and which is, in some sense, the weakest set of conditions considered for global optimality going beyond convexity, pseudo-monotonicity, and quasi-convexity.
In order to describe MD and OMD, one needs to define the notion of the Bregman Divergence (BD) with respect to a differentiable and $K$-strongly convex function $h$ whose domain includes the set $\mathcal{X}$. There are several equivalent definitions of $K$-strong convexity; here we provide the one that will be used directly in the proofs later:

$$h(x') \ge h(x) + \langle \nabla h(x), x' - x \rangle + \frac{K}{2}\|x' - x\|^2 \quad \text{for all } x, x' \in \mathcal{X}.$$

We further assume that $\nabla h$ is Lipschitz, which is needed in the proof. The BD is defined as

$$D(p, x) = h(p) - h(x) - \langle \nabla h(x), p - x \rangle.$$
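As a concrete numeric sketch (our own illustration, not taken from [1]): the BD recovers half the squared Euclidean distance for $h(x) = \frac{1}{2}\|x\|_2^2$, and the Kullback-Leibler divergence for the negative entropy on the simplex.

```python
import numpy as np

def bregman(h, grad_h, p, x):
    """Bregman divergence D(p, x) = h(p) - h(x) - <grad_h(x), p - x>."""
    return h(p) - h(x) - np.dot(grad_h(x), p - x)

# h(x) = 0.5 ||x||^2: 1-strongly convex w.r.t. the Euclidean norm
sq = lambda x: 0.5 * np.dot(x, x)
sq_grad = lambda x: x

# h(x) = sum_i x_i log x_i: negative entropy, 1-strongly convex
# w.r.t. the l1-norm on the simplex
ent = lambda x: np.sum(x * np.log(x))
ent_grad = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.3, 0.5])
x = np.array([0.4, 0.4, 0.2])

d_sq = bregman(sq, sq_grad, p, x)    # equals 0.5 * ||p - x||^2
d_kl = bregman(ent, ent_grad, p, x)  # equals KL(p || x) on the simplex

assert abs(d_sq - 0.5 * np.dot(p - x, p - x)) < 1e-9
assert abs(d_kl - np.sum(p * np.log(p / x))) < 1e-9
assert d_sq >= 0 and d_kl >= 0
```

Both instances illustrate the non-negativity of $D(p, x)$ used repeatedly in the proofs below.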
The Bregman divergence enjoys a number of properties that are critical to the success of MD and OMD, and we refer the reader to the appendices in [1], where they are proposed and derived. Given a point $x \in \mathcal{X}$ and a vector $y$, define the following Bregman projection (prox-mapping) operator via

$$P_x(y) = \operatorname*{arg\,min}_{x' \in \mathcal{X}} \big\{ \langle y, x - x' \rangle + D(x', x) \big\}.$$

In the following, we assume that a distance-generating function $h$ is given and fixed throughout.
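For intuition (a hypothetical concrete instance, not from [1]): with the negative-entropy generator on the simplex, the prox-mapping above admits the closed-form multiplicative update $P_x(y)_i \propto x_i e^{y_i}$.

```python
import numpy as np

def prox_entropy(x, y):
    """Entropic prox-mapping on the simplex: P_x(y)_i proportional to x_i * exp(y_i).

    Closed form of argmin_{x'} { <y, x - x'> + D(x', x) } when h is the
    negative entropy (our own illustrative special case)."""
    w = x * np.exp(y - y.max())  # subtract max(y) for numerical stability
    return w / w.sum()

x = np.array([0.25, 0.25, 0.5])
y = np.array([1.0, 0.0, -1.0])
x_new = prox_entropy(x, y)

assert abs(x_new.sum() - 1.0) < 1e-12  # stays on the simplex
assert np.all(x_new > 0)               # stays in the interior
assert x_new[0] > x[0]                 # mass moves toward large components of y
```

With $y = -\gamma \hat{g}_n$ this is exactly the multiplicative-weights step that MD performs on the simplex.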
1.1 Variational Inequalities and Coherence
The following definition of coherence is provided in [1, Definition 2.1], where we explicitly define what it means for a point to be sufficiently close to a solution in Condition 3; this definition will be used in the proofs presented in Section 4.
Definition 1. We say that (SP) is coherent if:

1. Every solution of (SVI) also solves (SP).
2. There exists a solution $x^*$ of (SP) that satisfies (MVI).
3. Every solution $x^*$ of (SP) satisfies (MVI) locally. Specifically, for some fixed $\epsilon > 0$, $\langle g(x), x - x^* \rangle \ge 0$ for all $x$ such that $\|x - x^*\| \le \epsilon$.
If, moreover, (MVI) holds as a strict inequality in Condition 2 when $x$ is not a solution of (SP), then (SP) will be called strictly coherent; by contrast, if (MVI) holds as an equality in Condition 2 for all $x \in \mathcal{X}$, we will say that (SP) is null-coherent.
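To make these notions concrete, consider two standard toy instances (our own illustrative examples, not taken from [1]):

```latex
\textbf{Strictly coherent:} $f(x, y) = x^2 - y^2$ on $[-1, 1]^2$, with saddle
point $x^* = (0, 0)$. Here $g(x, y) = (2x, 2y)$, so
\[
  \langle g(z), z - x^* \rangle = 2x^2 + 2y^2 > 0
  \quad \text{for all } z \neq x^*,
\]
i.e., (MVI) holds strictly away from the solution.

\textbf{Null-coherent:} the bilinear problem $f(x, y) = xy$ on $[-1, 1]^2$,
again with $x^* = (0, 0)$. Here $g(x, y) = (y, -x)$, so
\[
  \langle g(z), z - x^* \rangle = xy - xy = 0
  \quad \text{for all } z,
\]
i.e., (MVI) holds with equality everywhere.
```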
1.2 Mirror Descent Algorithms
Under stochastic gradients, the MD and OMD algorithms are defined below, along with the almost-standard assumptions on the expected values and the variance of the gradient estimates. For all the probabilistic statements in this article, we consider the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathbb{E}$ denote the expectation with respect to $\mathbb{P}$.
Mirror Descent (MD): The vanilla mirror descent (MD) algorithm is defined as

$$X_{n+1} = P_{X_n}(-\gamma \hat{g}_n), \qquad (5)$$

where $\gamma$ is some constant step-size and the gradient estimate $\hat{g}_n$ satisfies

$$\mathbb{E}[\hat{g}_n \mid \mathcal{F}_n] = g(X_n), \qquad \mathbb{E}\big[\|\hat{g}_n - g(X_n)\|_*^2 \mid \mathcal{F}_n\big] \le \sigma^2, \qquad (6)$$

with $\mathcal{F}_n$, the $\sigma$-algebra generated by $(X_1, \dots, X_n)$. Note that with the assumption in (6), we have that there exists some finite constant $M$ such that

$$\mathbb{E}\big[\|\hat{g}_n\|_*^2 \mid \mathcal{F}_n\big] \le M^2,$$

since a continuous function ($g$) is bounded on a compact set ($\mathcal{X}$).
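As a concrete (hypothetical) illustration of the update (5) in the Euclidean case $h = \frac{1}{2}\|\cdot\|^2$, where the prox-mapping reduces to a projected gradient step, consider the strictly coherent toy problem $f(x, y) = x^2 - y^2$ on $[-1, 1]^2$ with exact gradients:

```python
import numpy as np

def g(z):
    # f(x, y) = x**2 - y**2 (our own strictly coherent toy problem):
    # g(z) = (df/dx, -df/dy) = (2x, 2y)
    return np.array([2.0 * z[0], 2.0 * z[1]])

def md_step(z, grad, gamma, lo=-1.0, hi=1.0):
    # Euclidean MD: with h = 0.5||.||^2, the prox-mapping P_z(-gamma * grad)
    # is a gradient step followed by projection onto the box [lo, hi]^2.
    return np.clip(z - gamma * grad, lo, hi)

z = np.array([0.9, -0.8])
gamma = 0.1
for _ in range(200):
    z = md_step(z, g(z), gamma)

assert np.linalg.norm(z) < 1e-6  # iterates converge to the saddle point (0, 0)
```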
Optimistic Mirror Descent (OMD): The optimistic mirror descent (OMD) algorithm is defined as

$$X_{n+1/2} = P_{X_n}(-\gamma \hat{g}_n), \qquad X_{n+1} = P_{X_n}(-\gamma \hat{g}_{n+1/2}),$$

with the assumption that

$$\mathbb{E}[\hat{g}_{n+1/2} \mid \mathcal{F}_{n+1/2}] = g(X_{n+1/2}), \qquad \mathbb{E}\big[\|\hat{g}_{n+1/2} - g(X_{n+1/2})\|_*^2 \mid \mathcal{F}_{n+1/2}\big] \le \sigma^2,$$

where $\mathcal{F}_{n+1/2}$ is the $\sigma$-algebra generated by $(X_1, \dots, X_n, X_{n+1/2})$. Similarly, there exists some finite constant $M$ such that

$$\mathbb{E}\big[\|\hat{g}_{n+1/2}\|_*^2 \mid \mathcal{F}_{n+1/2}\big] \le M^2.$$
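The extra half-step is what distinguishes OMD from MD. A minimal Euclidean sketch (our own toy instance, unconstrained for simplicity) on the bilinear problem $f(x, y) = xy$ with exact gradients, where MD spirals away from the saddle point while OMD converges to it:

```python
import numpy as np

def g(z):
    # Bilinear toy problem f(x, y) = x * y with saddle point (0, 0):
    # g(z) = (df/dx, -df/dy) = (y, -x). (Our own illustrative instance.)
    return np.array([z[1], -z[0]])

gamma = 0.1
z_md = np.array([0.5, 0.5])
z_omd = np.array([0.5, 0.5])
r0 = np.linalg.norm(z_md)

for _ in range(2000):
    # vanilla MD step (Euclidean h, no projection needed here)
    z_md = z_md - gamma * g(z_md)
    # OMD: lead with the gradient at z, then correct with the gradient
    # evaluated at the intermediate half-step
    z_half = z_omd - gamma * g(z_omd)
    z_omd = z_omd - gamma * g(z_half)

assert np.linalg.norm(z_md) > r0     # MD spirals outward
assert np.linalg.norm(z_omd) < 1e-3  # OMD converges toward (0, 0)
```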
2 Main Results
We are now ready to re-state the main results from [1] with several important corrections.
First, for coherent problems and OMD with exact (non-stochastic) gradients, it is claimed in [1] that the Bregman divergence to a saddle point is monotonically decreasing. However, as we fill in the missing details of the proof, we believe that the monotone decrease is guaranteed only after some sufficiently large number of iterations; our modified statement is given in Theorem 1.
Theorem 1 (Modified from [1, Theorem 4.1]).
Next, for strictly coherent problems, almost-sure convergence is claimed in [1] for both MD and OMD with stochastic gradients. However, as we fill in the missing details of their proofs, we believe that only convergence with high probability can be guaranteed; our modified theorem statements for MD and OMD are given in Theorem 2 (a) and Theorem 3, respectively. Moreover, for null-coherent problems, it is claimed in [1] that the sequence $D(x^*, X_n)$ is non-decreasing for all saddle points $x^*$, whereas we believe that this is true only for the $x^*$'s that satisfy Condition 2 in Definition 1; our modified statement is given in Theorem 2 (b).
Theorem 2 (Modified from [1, Theorem 3.1]).
For null-coherent problems, the sequence $D(x^*, X_n)$ is non-decreasing for saddle points $x^*$ that satisfy Condition 2 (global MVI) in Definition 1.
Theorem 3 (Modified from [1, Theorem 4.3]).
3 Conclusions and Future Work
In an attempt to understand the recent body of work on MD/OMD dynamics for saddle-point problems, we have provided in this article more rigorous and corrected statements of the claims in [1]. As part of future work, we aim to shed light on the rates of convergence of MD and OMD under coherence assumptions. In this context, we aim to build upon the analysis conducted in [5].
- [1] P. Mertikopoulos, B. Lecouat, H. Zenati, C.-S. Foo, V. Chandrasekhar, and G. Piliouras, "Optimistic mirror descent in saddle-point problems: Going the extra(-gradient) mile," in International Conference on Learning Representations (ICLR 2019), 2019.
- [2] Z. Zhou, P. Mertikopoulos, N. Bambos, S. Boyd, and P. Glynn, "On the convergence of mirror descent beyond stochastic convex programming," arXiv preprint arXiv:1706.05681, 2017.
- [3] C. Daskalakis, A. Ilyas, V. Syrgkanis, and H. Zeng, "Training GANs with optimism," in International Conference on Learning Representations (ICLR 2018), 2018.
- [4] R. Ferrentino, "Variational inequalities and optimization problems," Applied Mathematical Sciences, vol. 1, pp. 2327–2343, 2007.
- [5] P. Mertikopoulos and M. Staudigl, "Stochastic mirror descent dynamics and their convergence in monotone variational inequalities," Journal of Optimization Theory and Applications, vol. 179, no. 3, pp. 838–867, 2018.
4 Proofs

4.1 Proof of Theorem 1
First, we restate [1, Lemma D.1].
Suppose that (SP) is coherent and $g$ is Lipschitz. For any saddle point $x^*$, the iterates of (OMD) with exact gradients satisfy
If, moreover, $x^*$ is (one of) the special saddle points that satisfy (MVI) globally, then
Next, we add a result that is similar to [1, Proposition B.4(a)].
Lemma 2. Let $h$ be a $K$-strongly convex distance-generating function on $\mathcal{X}$ and further assume that $\nabla h$ is Lipschitz. Then, for any $x \in \mathcal{X}$ and any $y_1, y_2$, we have
Let $x_1 = P_x(y_1)$ and $x_2 = P_x(y_2)$. By [1, Lemma B.1(b)(c)], we have
Letting and and adding the two inequalities:
Note that the LHS of the above is lower bounded by
by strong convexity of . The RHS of the above is upper bounded by
by Cauchy-Schwarz and the Lipschitz property of $\nabla h$. Combining the two, we have
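As a quick sanity check of Lemma 2 in the Euclidean special case $h = \frac{1}{2}\|\cdot\|^2$ (so $K = 1$ and $P_x(y)$ reduces to projecting $x + y$ onto the feasible set), and assuming the lemma's conclusion takes the standard form $\|P_x(y_1) - P_x(y_2)\| \le \frac{1}{K}\|y_1 - y_2\|_*$, a randomized numeric test (our own, hypothetical setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def prox(x, y, lo=-1.0, hi=1.0):
    # Euclidean prox-mapping (h = 0.5||.||^2, K = 1): P_x(y) is the
    # projection of x + y onto the box [lo, hi]^d.
    return np.clip(x + y, lo, hi)

# Randomized check: the prox-mapping is (1/K)-Lipschitz in y, which here
# is exactly the nonexpansiveness of Euclidean projection.
for _ in range(1000):
    x = rng.uniform(-1, 1, size=5)
    y1 = rng.normal(size=5)
    y2 = rng.normal(size=5)
    lhs = np.linalg.norm(prox(x, y1) - prox(x, y2))
    rhs = np.linalg.norm(y1 - y2)  # (1/K) * ||y1 - y2||_* with K = 1
    assert lhs <= rhs + 1e-12
```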
We are now ready to prove Theorem 1.
Since , there exists such that for all . Then with this , we have
Telescoping the above, we have
Rearranging the above, we have
where the last inequality follows by the non-negativity of the Bregman divergence. Taking the limit as $n \to \infty$ on both sides of the above inequality, we have , which implies that .
Next, by compactness of $\mathcal{X}$, the sequence of iterates has a convergent subsequence such that . We show in the following that, in fact, . First notice that
Moreover, suppose that , and so as well. Then
where the last equality follows by
since for all and , taking the limit as on both sides of the resulting inequality, we have that . In the above, step follows by [1, Proposition B.4(a)] and Lemma 2. The above shows that . By [1, (B.7)], we have
That is, for all , which is (SVI). By definition of coherence, must be a saddle point of (SP).
Telescoping the above, we have
where the choice of is as follows: let with being arbitrary but fixed,
Choose sufficiently large such that for all ; such a choice is possible since we have proved that as ;
Choose sufficiently large such that for all ; such a choice is possible since by and the Bregman reciprocity condition, we have as ;
Choose such that ; such a choice is possible since we have proved that .
With this choice of , we will prove that for all .
First we show that for all ,
To see this, in [1, Lemma B.2], letting , we have
where the first inequality follows by Cauchy-Schwarz and the Lipschitz property of , and the second inequality follows by our choice of .
Now starting with , we have
Since , by (16) we have . By our modified Condition 3 in the definition of coherence, we have , which implies
Using (16) again, the above implies and hence . Therefore,
Continuing this procedure, we can show that for all , we have . Since can be chosen arbitrarily close to zero (by choosing arbitrarily close to zero), we have proved that for all , there exists an such that for all ; hence . By the Bregman reciprocity condition, we have . ∎
4.2 Proof of Theorem 2
The same technique used for proving parts (ii) and (iii) of Theorem 3 can also be used to prove the convergence of the mirror descent (MD) algorithm; we omit the details here.
Note that there is a typo in the proof of [1, Theorem 3.1(b)] that can be quite confusing: in [1, (C.14)], the plus sign before the last inner product should be a minus sign. To see how [1, (C.14)] (with the corrected sign) is obtained, we first recall that $h$ is proper, convex, and closed on $\mathcal{X}$. Therefore, we have $h = h^{**}$, where $h^*$ is the convex conjugate of $h$. By [1, (B.5), (B.6b)], MD (5) can be written as
It follows that
where the last equality follows by (17). Taking expectation on both sides,
where the first marked step follows since $X_n$ is $\mathcal{F}_n$-measurable and $\hat{g}_n$ is an unbiased estimate of $g(X_n)$ conditional on $\mathcal{F}_n$ by (6), and the second follows by the definition of null-coherence. This shows that the sequence $D(x^*, X_n)$ is non-decreasing for the special saddle points that satisfy the global (MVI).
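To illustrate this conclusion numerically (our own toy instance, with exact gradients so the conditional expectation is trivial): Euclidean MD on the null-coherent bilinear problem $f(x, y) = xy$, whose saddle point $p = (0, 0)$ satisfies the global (MVI) with equality; the squared distance to $p$ never decreases.

```python
import numpy as np

def g(z):
    # Null-coherent bilinear toy: f(x, y) = x * y, saddle point p = (0, 0),
    # g(z) = (y, -x) so that <g(z), z - p> = 0 for all z.
    return np.array([z[1], -z[0]])

gamma = 0.1
z = np.array([0.5, -0.3])
dists = [np.dot(z, z)]          # D(p, X_n) = 0.5 ||X_n - p||^2 up to a factor
for _ in range(100):
    z = z - gamma * g(z)        # exact-gradient Euclidean MD step
    dists.append(np.dot(z, z))

# The (Bregman) distance to the saddle point never decreases.
assert all(d2 >= d1 for d1, d2 in zip(dists, dists[1:]))
```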
4.3 Proof of Theorem 3
We first include a result from [1, Proposition B.4(b)], which will be used frequently in the proof of the theorem.
Lemma 3 ([1, Proposition B.4(b)]).
We are now ready to prove the theorem. The proof contains three steps:
(i) Let denote a subsequence of . Show that
(ii) Show that (13) holds.
(iii) Let denote a subsequence of ; we notice that (i) and (ii) imply that
We now show (i). Let be one of the special saddle points that satisfy global (MVI). Define