1 Introduction
In this work, we focus on two well-studied classes of problems: monotone variational inequalities (MVIs) and convex-concave min-max problems (Minty et al., 1962; Kinderlehrer and Stampacchia, 1980; Nemirovski, 2004). In an MVI, we are given a monotone operator $F$ over a convex set $\mathcal{Z}$, and the goal is to find a point $z^* \in \mathcal{Z}$ such that
$\langle F(z),\, z - z^* \rangle \ge 0 \quad \text{for all } z \in \mathcal{Z}.$ (1)
Such a point is called a solution to a weak (Minty) MVI (Komlósi, 1999). The MVI problem eq. 1 is closely related to the classic min-max optimization problem:
$\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y),$ (2)
where $f$ is a convex-concave function over convex sets $\mathcal{X}$ and $\mathcal{Y}$. Such problems are ubiquitous in statistics, optimization, machine learning, and game theory. Solving
eq. 2 is equivalent to finding the Nash equilibrium of a zero-sum game, and such problems are also sometimes called saddle point problems. The Mirror Prox (MP) algorithm of Nemirovski (2004) is a popular method for solving both eq. 1 (when $F$ is Lipschitz continuous) and eq. 2 (when $f$ is smooth). MP is a generalization of the extragradient algorithm of Korpelevich (1976), and it converges in $O(1/\epsilon)$ iterations, which is tight for first-order methods (FOMs) (Nemirovski and Yudin, 1983). Given that MP achieves the optimal performance for FOMs, there is a natural question of whether one can improve the iteration complexity by using higher-order methods (HOMs), which tend to converge in fewer iterations but at the expense of higher cost per iteration. HOMs use higher-order derivatives of the objective function and generally require higher-order smoothness, namely that the higher-order derivatives of the objective be Lipschitz continuous.
In convex and nonconvex optimization, while FOMs such as gradient descent are the gold standard, HOMs are useful in a variety of settings. Newton’s method is one of the most well-known HOMs, and it is a central component of path-following interior-point methods (Nesterov and Nemirovski, 1994). In cases when the higher-order update is efficiently computable, HOMs can achieve faster overall running times than FOMs. For example, HOMs have been used to find approximate local minima in nonconvex optimization faster than gradient descent (Agarwal et al., 2017; Carmon et al., 2018). While second-order methods are the most common type of HOM, there has also been significant recent work on HOMs beyond second-order methods (Agarwal and Hazan, 2018; Arjevani et al., 2018; Gasnikov et al., 2018; Jiang et al., 2018; Bubeck et al., 2018; Bullins, 2018).
HOMs have seen much less study in the context of MVIs and min-max problems. Monteiro and Svaiter (2012) use a second-order method with an implicit update that achieves an improved iteration complexity of $O(\epsilon^{-2/3})$ for problems with second-order smoothness. Their method uses the Hybrid Proximal Extragradient (HPE) framework established in Monteiro and Svaiter (2010) and requires access to an oracle for finding a fixed point of a constrained second-order equation. However, it was unknown whether one could achieve further improved iteration complexity in the presence of third-order smoothness and beyond.
Contributions.
Our main contribution is a higher-order method, HigherOrderMirrorProx, for approximately solving MVIs and convex-concave min-max problems that achieves an iteration complexity of $O(\epsilon^{-2/(p+1)})$ for problems with $p^{th}$-order smoothness. To our knowledge, this is the first work showing that improved convergence rates are possible for problems with third-order smoothness and beyond. Our algorithm requires access to an oracle for finding a fixed point of a $(p-1)^{st}$-order equation, using a higher-order implicit update that can be thought of as a generalization of Mirror Prox. Since the implicit update may be difficult to compute in the constrained case, we show how to instantiate our algorithm in the second-order unconstrained case, giving overall running time bounds in that setting.
We begin by reviewing definitions, notions of convergence, and related work in Section 2. Then we summarize our main results and our algorithm in Section 3. In Section 4, we present the proof of our main result. We then show how to fully instantiate our algorithm in the unconstrained case in Section 5.
2 Preliminaries
We will use MVI($F$) to denote the MVI given in eq. 1 over a vector field $F$ and convex constraint set $\mathcal{Z}$. Unless otherwise specified, we will use $z^*$ to signify a solution to MVI($F$). Throughout the paper, we will use $\lambda_1, \dots, \lambda_T$ to represent positive weights, and we let $S_T = \sum_{t=1}^{T} \lambda_t$. We use $\nabla F$ to denote the Jacobian operator. We use $\|\cdot\|$ to denote an arbitrary norm and $\|\cdot\|_*$ to denote its dual norm. We use $\|\cdot\|_2$ to denote the Euclidean norm for vectors and the operator norm for matrices. We use $D_\phi(\cdot, \cdot)$ to denote a Bregman divergence over a distance generating function $\phi$ that is 1-strongly convex with respect to some norm $\|\cdot\|$. Recall that the definition of a Bregman divergence is as follows:
$D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle$ (3)
for all $x, y$.
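As a concrete sketch (not part of the paper's development), eq. 3 can be instantiated for two standard distance generating functions: the squared Euclidean norm, which recovers the squared Euclidean distance, and negative entropy on the probability simplex, which recovers the KL divergence.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> (eq. 3)."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# phi(x) = 0.5*||x||^2 recovers the squared Euclidean distance (times 1/2).
sq = lambda x: 0.5 * x @ x
grad_sq = lambda x: x

# Negative entropy on the probability simplex recovers the KL divergence.
negent = lambda x: np.sum(x * np.log(x))
grad_negent = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.25, 0.25, 0.5])

d_euc = bregman(sq, grad_sq, x, y)          # equals 0.5 * ||x - y||^2
d_kl = bregman(negent, grad_negent, x, y)   # equals KL(x || y) on the simplex
```

Both instances are nonnegative and 1-strongly convex with respect to an appropriate norm, as Definition eq. 3 requires of the general case.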
We now discuss several key definitions:
Definition 2.1.
A vector field $F$ is monotone if $\langle F(x) - F(y),\, x - y \rangle \ge 0$ for all $x, y$.
For notational convenience, we assume our algorithms have access to a monotone operator $F$. This is the usual assumption in MVIs, but it will also allow us to solve min-max problems, as we now show. For min-max problems eq. 2, one can consider the gradient descent-ascent field of $f$:
$F(x, y) = \left( \nabla_x f(x, y),\; -\nabla_y f(x, y) \right).$ (4)
Letting $z = (x, y)$ and $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$, we can say $F$ maps $\mathcal{Z}$ to $\mathbb{R}^{\dim(\mathcal{Z})}$ with only a slight abuse of notation. It is then easy to show that $F$ is monotone when $f$ is convex-concave. So to apply our algorithms to min-max settings, we simply apply them to $F$.
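For concreteness, the following sketch builds the gradient descent-ascent field of eq. 4 for an assumed toy convex-concave objective $f(x, y) = \frac{1}{2}\|x\|^2 + x^\top A y - \frac{1}{2}\|y\|^2$ and numerically checks the monotonicity inequality of Definition 2.1; the objective and dimensions are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

def F(z):
    """Gradient descent-ascent field (eq. 4) for the toy convex-concave
    objective f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2."""
    x, y = z[:3], z[3:]
    grad_x = x + A @ y           # nabla_x f(x, y)
    grad_y = A.T @ x - y         # nabla_y f(x, y)
    return np.concatenate([grad_x, -grad_y])

# Monotonicity (Definition 2.1): <F(z) - F(w), z - w> >= 0 at random pairs.
for _ in range(100):
    z, w = rng.standard_normal(6), rng.standard_normal(6)
    assert (F(z) - F(w)) @ (z - w) >= -1e-9
```

For this toy objective the bilinear terms cancel in the inner product, so $\langle F(z) - F(w), z - w\rangle = \|x - x'\|^2 + \|y - y'\|^2 \ge 0$, which is what the check confirms.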
Our algorithms will require the following general notion of smoothness:
Definition 2.2.
A vector field $F$ is $p^{th}$-order smooth with constant $L_p$ w.r.t. $\|\cdot\|$ if, for all $z, w$, the $(p-1)^{st}$-order derivative of $F$ is $L_p$-Lipschitz continuous, i.e., $\|\nabla^{(p-1)} F(z) - \nabla^{(p-1)} F(w)\| \le L_p \|z - w\|$,
where we define $\nabla^{(0)} F := F$.
Remark 2.3.
Our definition of $p^{th}$-order smoothness as a property of the derivatives of $F$ is motivated by the min-max setting eq. 2, where $F$ is already expressed in terms of the gradient of $f$. If $F$ is $p^{th}$-order smooth, this is a statement about the Lipschitz continuity of the $p^{th}$-order derivatives of $f$.
Another key component of our algorithms is the $(p-1)^{st}$-order Taylor expansion of $F$ at $w$ evaluated at $z$:
$F_{p-1}(z; w) := \sum_{i=0}^{p-1} \frac{1}{i!} \nabla^{(i)} F(w)\,[z - w]^{i}.$ (5)
While $F_{p-1}(\cdot\,; w)$ depends on $w$, we leave this implicit to lighten notation, as the relevant $w$ will be clear from context.
Remark 2.4.
To be consistent with Remark 2.3, when we refer to “$p^{th}$-order methods,” we will be referring to methods that use a $(p-1)^{st}$-order Taylor expansion of $F$ and which typically require $p^{th}$-order smoothness. Again, this indexing makes sense in the context of min-max problems, where a $p^{th}$-order method uses a Taylor expansion involving $p^{th}$-order derivatives of $f$.
A wellstudied consequence of Definition 2.2 is the following:
Fact 2.5.
Let $p \ge 1$, and let $F$ be $p^{th}$-order smooth with constant $L_p$. Then, for all $z, w$,
$\left\| F(z) - F_{p-1}(z; w) \right\|_* \le \frac{L_p}{p!} \|z - w\|^{p}.$ (6)
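To illustrate Fact 2.5 numerically for $p = 2$, the sketch below uses an assumed toy field $F(z) = z + 0.1\sin(z)$ (applied coordinate-wise), whose Jacobian is $0.1$-Lipschitz, and checks the Taylor remainder bound in the Euclidean norm; the field and dimensions are illustrative choices.

```python
import numpy as np

# Toy field F(z) = z + 0.1*sin(z) applied coordinate-wise: its Jacobian is
# 0.1-Lipschitz, so F is 2nd-order smooth with L_2 = 0.1.
F = lambda z: z + 0.1 * np.sin(z)
JF = lambda w: np.eye(len(w)) + 0.1 * np.diag(np.cos(w))

def taylor_F1(z, w):
    """First-order Taylor expansion F_1(z; w) of F at w (eq. 5 with p = 2)."""
    return F(w) + JF(w) @ (z - w)

# Fact 2.5 (eq. 6) with p = 2: ||F(z) - F_1(z; w)|| <= (L_2 / 2!) * ||z - w||^2.
rng = np.random.default_rng(1)
for _ in range(100):
    z, w = rng.standard_normal(4), rng.standard_normal(4)
    lhs = np.linalg.norm(F(z) - taylor_F1(z, w))
    rhs = (0.1 / 2.0) * np.linalg.norm(z - w) ** 2
    assert lhs <= rhs + 1e-12
```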
Finally, our algorithms will all require the following assumption:
Assumption 2.6.
There exists a solution $z^*$ to the weak variational inequality MVI($F$), namely a point $z^*$ that satisfies eq. 1.
Assumption 2.6 always holds when $\mathcal{Z}$ is a compact convex set and $F$ is continuous on $\mathcal{Z}$ (Kinderlehrer and Stampacchia, 1980).
2.1 Notions of convergence for variational inequalities
The main solution concept for eq. 1 that we consider is an $\epsilon$-approximate weak solution to MVI($F$), namely a point $\bar z \in \mathcal{Z}$ such that:
$\langle F(z),\, z - \bar z \rangle \ge -\epsilon \quad \text{for all } z \in \mathcal{Z}.$ (7)
Our main bounds will be of the form:
$\sum_{t=1}^{T} \lambda_t \langle F(z_t),\, z_t - z \rangle \le B \quad \text{for all } z \in \mathcal{Z},$ (8)
where $z_1, \dots, z_T$ are iterates produced by our algorithm, $\lambda_1, \dots, \lambda_T$ are positive weights, and $S_T = \sum_{t=1}^{T} \lambda_t$. We now show conditions under which a guarantee of the form eq. 8 gives approximate weak solutions.
Lemma 2.7.
Let $z_1, \dots, z_T \in \mathcal{Z}$, let $F$ be monotone, and let $\lambda_t > 0$ for $t \in [T]$. Let $\bar z = \frac{1}{S_T} \sum_{t=1}^{T} \lambda_t z_t$. Assume eq. 8 holds. Then $\bar z$ is a $(B/S_T)$-approximate weak solution to MVI($F$).
Proof.
By monotonicity, we have, for all $z \in \mathcal{Z}$ and all $t \in [T]$:
$\langle F(z),\, z - z_t \rangle \ge \langle F(z_t),\, z - z_t \rangle.$
Therefore,
$S_T \langle F(z),\, z - \bar z \rangle = \sum_{t=1}^{T} \lambda_t \langle F(z),\, z - z_t \rangle \ge \sum_{t=1}^{T} \lambda_t \langle F(z_t),\, z - z_t \rangle \ge -B.$
Then $\bar z$ is a $(B/S_T)$-approximate solution to the weak MVI problem. ∎
2.2 Solving convexconcave minmax problems with variational inequalities
The classic notion of convergence for eq. 2 is the duality gap:
$\mathrm{gap}(\bar x, \bar y) := \max_{y \in \mathcal{Y}} f(\bar x, y) - \min_{x \in \mathcal{X}} f(x, \bar y).$ (9)
The duality gap is defined in terms of a min-max objective $f$, but we leave $f$ implicit because the relevant objective will be clear from context. We will now show how to prove bounds on the duality gap given a bound like in eq. 8.
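As an illustration of eq. 9 (with assumed toy data, not an example from the paper), for a matrix game $f(x, y) = x^\top A y$ over probability simplices the inner maximization and minimization are attained at vertices, so the gap is computable in closed form:

```python
import numpy as np

def duality_gap(A, x_bar, y_bar):
    """Duality gap (eq. 9) for the matrix game f(x, y) = x^T A y over
    probability simplices; the inner max and min are attained at vertices."""
    return np.max(A.T @ x_bar) - np.min(A @ y_bar)

# Matching pennies: uniform play is the unique equilibrium, so its gap is 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
gap_eq = duality_gap(A, uniform, uniform)                  # 0 at equilibrium
gap_off = duality_gap(A, np.array([1.0, 0.0]), uniform)    # positive elsewhere
```

By the minimax theorem the gap is always nonnegative and vanishes exactly at saddle points, which is why it serves as a convergence measure.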
We will use the following lemma to prove bounds on the duality gap:
Lemma 2.8.
Let $F$ be the gradient descent-ascent field of a convex-concave $f$, let $z_t = (x_t, y_t)$ for $t \in [T]$ satisfy eq. 8, and let $(\bar x, \bar y) = \frac{1}{S_T} \sum_{t=1}^{T} \lambda_t z_t$. Then $\mathrm{gap}(\bar x, \bar y) \le B / S_T$.
Proof.
When $F$ is the gradient descent-ascent field for a convex-concave problem, convexity in $x$ and concavity in $y$ give, for any $z = (x, y)$:
$\langle F(z_t),\, z_t - z \rangle = \langle \nabla_x f(x_t, y_t),\, x_t - x \rangle - \langle \nabla_y f(x_t, y_t),\, y_t - y \rangle \ge f(x_t, y) - f(x, y_t).$
Overall, we then have:
$\mathrm{gap}(\bar x, \bar y) \le \max_{(x, y) \in \mathcal{Z}} \frac{1}{S_T} \sum_{t=1}^{T} \lambda_t \left( f(x_t, y) - f(x, y_t) \right) \le \max_{z \in \mathcal{Z}} \frac{1}{S_T} \sum_{t=1}^{T} \lambda_t \langle F(z_t),\, z_t - z \rangle \le \frac{B}{S_T}.$
∎
2.3 Related work
Monotone variational inequalities.
The weak MVI eq. 1 is a classic and well-studied optimization problem (Minty et al., 1962; Komlósi, 1999; Nemirovski, 2004; Monteiro and Svaiter, 2010). It is closely related to the strong MVI problem (Stampacchia, 1970), where the goal is to find a $z^* \in \mathcal{Z}$ such that
$\langle F(z^*),\, z - z^* \rangle \ge 0 \quad \text{for all } z \in \mathcal{Z}.$ (10)
When $F$ is continuous and single-valued, any solution to the weak MVI eq. 1 is a solution to the strong MVI.
Our algorithm is based on the Mirror Prox (MP) algorithm of Nemirovski (2004), which is a generalization of the extragradient method of Korpelevich (1976). MP is a first-order method that achieves $O(\epsilon^{-1})$ iteration complexity, which is tight (Nemirovski and Yudin, 1983). Monteiro and Svaiter (2010) prove convergence rates for MP in the unconstrained case by formulating MP as an instance of what they call a Hybrid Proximal Extragradient (HPE) algorithm. Monteiro and Svaiter (2012) provide a second-order algorithm to solve eq. 1 in settings with second-order smoothness. That algorithm achieves an $O(\epsilon^{-2/3})$ iteration complexity, and its analysis goes through the HPE framework from Monteiro and Svaiter (2010).
Minmax optimization.
Many convex-concave min-max optimization problems are solved either with MP or with first-order no-regret algorithms. Ouyang and Xu (2018) show a lower bound of $\Omega(\epsilon^{-1})$ for first-order methods in constrained smooth convex-concave saddle point problems, even in the simple case when $f(x, y) = g(x) + \langle Ax, y \rangle - h(y)$ for convex $g$ and $h$. A number of recent works have also applied second-order methods to unconstrained smooth min-max problems, where the second-order information is often accessed through Hessian-vector products (Balduzzi et al., 2018; Gemp and Mahadevan, 2018; Letcher et al., 2019; Adolphs et al., 2019; Abernethy et al., 2019; Schäfer and Anandkumar, 2019).
Higherorder methods for convex optimization.
Higher-order methods have a long history of use in solving convex optimization problems. Assuming Lipschitz continuity of the Hessian, Nesterov (2008) provided an accelerated variant of the cubic regularization method (Nesterov and Polyak, 2006), which was further generalized by Baes (2009) under $p^{th}$-order smoothness assumptions. The rate in Nesterov (2008) was later improved by Monteiro and Svaiter (2013), and since then several works concerning lower bounds in this setting (Agarwal and Hazan, 2018; Arjevani et al., 2018) have shown that this rate is essentially tight (up to logarithmic factors) when the Hessian is Lipschitz continuous. Recently, several works have shown that the corresponding lower bound is also essentially tight for higher $p$ (Gasnikov et al., 2018; Jiang et al., 2018; Bubeck et al., 2018; Bullins, 2018), leading to advances in related problems, such as regression (Bullins and Peng, 2019) and parallel nonsmooth convex optimization (Bubeck et al., 2019).
3 Main results
Our main result is a new higher-order method, HigherOrderMirrorProx (Algorithm 1), for solving MVIs and convex-concave min-max problems with higher-order smoothness. We prove the following convergence rate:
Theorem 3.1.
Suppose $F$ is $p^{th}$-order smooth. Let $z^*$ be a solution to MVI($F$). Moreover,
let $\bar z = \frac{1}{S_T} \sum_{t=1}^{T} \lambda_t z_{t+1/2}$. Then for the iterates as output by Algorithm 1:

If $F$ is monotone, then $\bar z$ is an $O(1/T^{(p+1)/2})$-approximate solution to the weak MVI problem.

If $F$ is the gradient descent-ascent field for a convex-concave problem over $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ and $\bar z = (\bar x, \bar y)$, then $\mathrm{gap}(\bar x, \bar y) \le O(1/T^{(p+1)/2})$.
Our result matches the $O(1/T^{3/2})$ rate of Monteiro and Svaiter (2012) when $p = 2$ and gives improved convergence rates for higher $p$. To our knowledge, this is the first algorithm to achieve improved iteration complexity in the presence of higher-order smoothness beyond the second order. We compare our algorithm to that of Monteiro and Svaiter (2012) in more detail in Section 3.3.
Similar to other higher-order algorithms, which require an oracle for solving a minimization over a $p^{th}$-order Taylor series (Gasnikov et al., 2018; Jiang et al., 2018; Bubeck et al., 2018), our algorithm requires an oracle for solving a fixed point problem of a $(p-1)^{st}$-order equation. While this oracle is stronger, we believe it is justified given that the MVI and convex-concave min-max settings are significantly more difficult than convex minimization problems. A common downside of higher-order algorithms is that the required oracle may be difficult to compute, particularly in the constrained setting. We can also consider running our algorithm in the unconstrained setting, which requires a slightly weaker unconstrained oracle rather than a constrained one. We discuss how to interpret our bounds in the unconstrained setting in Section 3.1.
Finally, we show how to instantiate our method in the second-order unconstrained case, giving the following running time bounds:
Theorem 3.2 (Main theorem, informal).
Suppose $F$ is sufficiently smooth, and let $z_1, \dots, z_T$ be the output of HigherOrderMirrorProx (Algorithm 2). Then, for $p = 2$, the iterates satisfy, for all $z \in \mathcal{Z}$,
$\frac{1}{S_T} \sum_{t=1}^{T} \lambda_t \langle F(z_{t+1/2}),\, z_{t+1/2} - z \rangle \le O\!\left( 1 / T^{3/2} \right),$ (11)
with per-iteration cost dominated by $\tilde O(1)$ matrix inversions.* (*Here we use the notation $\tilde O(\cdot)$ to suppress logarithmic factors.)
The key steps of Algorithm 1, referenced throughout the paper, are to find a point $z_{t+1/2} \in \mathcal{Z}$ and a weight $\lambda_t > 0$ jointly satisfying
$z_{t+1/2} = \arg\min_{z \in \mathcal{Z}} \left\{ \lambda_t \langle F_{p-1}(z_{t+1/2};\, z_t),\, z \rangle + D_\phi(z, z_t) \right\},$ (12)
$\frac{1}{4} \le \frac{\lambda_t L_p \|z_{t+1/2} - z_t\|^{p-1}}{(p-1)!} \le \frac{1}{2},$ (13)
followed by the mirror step
$z_{t+1} = \arg\min_{z \in \mathcal{Z}} \left\{ \lambda_t \langle F(z_{t+1/2}),\, z \rangle + D_\phi(z, z_t) \right\}.$ (14)
3.1 Interpreting our results in the unconstrained setting
In the unconstrained setting, the standard solution concepts for MVIs and min-max problems can be vacuous in general. For example, for $f(x, y) = \langle x, y \rangle$ over $\mathbb{R}^n \times \mathbb{R}^n$ and the associated vector field $F$, all approximate solutions to the min-max problem / MVI are exact solutions. However, the bounds we prove are still meaningful. In the MVI case, our guarantee can be interpreted as stating that the weak MVI inequality holds approximately for all $z$ in a bounded region around the initial point, with error scaling with $D_\phi(z, z_1)$. Likewise, for min-max problems, if $\mathcal{Z}'$ is a bounded convex set containing the iterates, then we can bound the duality gap restricted to $\mathcal{Z}'$.
3.2 Explanation of our algorithm
Our algorithm is inspired by the Mirror Prox (MP) algorithm of Nemirovski (2004), defined as follows:
$z_{t+1/2} = \arg\min_{z \in \mathcal{Z}} \left\{ \lambda \langle F(z_t),\, z \rangle + D_\phi(z, z_t) \right\},$ (15)
$z_{t+1} = \arg\min_{z \in \mathcal{Z}} \left\{ \lambda \langle F(z_{t+1/2}),\, z \rangle + D_\phi(z, z_t) \right\},$ (16)
where $D_\phi$ is a Bregman divergence. Nemirovski (2004) motivates MP with a “conceptual prox method,” which is given as follows:
$z_{t+1} = \arg\min_{z \in \mathcal{Z}} \left\{ \lambda \langle F(z_{t+1}),\, z \rangle + D_\phi(z, z_t) \right\}.$ (17)
This is an implicit method, as computing $z_{t+1}$ requires solving the equation above for a given stepsize $\lambda$. However, this method has good iteration complexity: Nemirovski (2004) shows that if one could run eq. 17 exactly, then the averaged iterate converges at a rate of $O(1/(\lambda T))$. Thus, if one could implement eq. 17 with large stepsizes, one could achieve faster convergence.
It turns out that as long as one approximates eq. 17 with small error, one can achieve a similar convergence rate. The MP algorithm with constant stepsize $\lambda = \Theta(1/L)$, where $L$ is the Lipschitz constant of $F$, does just that, leading to an $O(L/T)$ convergence rate. While one would like to increase the stepsize in MP to improve the convergence rate, this approach does not work directly, because MP with large stepsizes no longer approximates eq. 17 with small error.
In our algorithm, we replace the first-order minimization in MP eq. 15 with a $(p-1)^{st}$-order minimization (12), while simultaneously choosing a particular stepsize. This can be viewed as approximating eq. 17 with large stepsizes, while using the higher-order minimization to ensure that our algorithm remains a “good” approximation of eq. 17.
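As a point of reference for the discussion above, the following is a minimal sketch of Euclidean Mirror Prox (the extragradient special case of eqs. 15 and 16), run on an assumed bilinear toy problem; it is illustrative only and omits constraints.

```python
import numpy as np

def extragradient(F, z0, eta, T):
    """Euclidean Mirror Prox (eqs. 15-16 with D_phi(z, w) = 0.5*||z - w||^2),
    i.e. the extragradient method: extrapolate with F(z_t), then update with
    F(z_{t+1/2}); returns the last iterate and the averaged midpoints."""
    z, midpoints = z0.astype(float), []
    for _ in range(T):
        z_half = z - eta * F(z)       # eq. 15: extrapolation step
        z = z - eta * F(z_half)       # eq. 16: update with midpoint field
        midpoints.append(z_half)
    return z, np.mean(midpoints, axis=0)

# Bilinear toy problem f(x, y) = x*y with unique saddle point (0, 0); plain
# gradient descent-ascent spirals outward here, while extragradient converges.
F = lambda z: np.array([z[1], -z[0]])
z_last, z_avg = extragradient(F, np.array([1.0, 1.0]), eta=0.5, T=500)
```

The extrapolation step is what distinguishes MP from plain descent-ascent: evaluating the field at the midpoint $z_{t+1/2}$ yields a far better approximation of the implicit update eq. 17.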
3.3 Comparison to Monteiro and Svaiter (2012)
Monteiro and Svaiter (2012) give a second-order algorithm for solving eq. 1 with $O(\epsilon^{-2/3})$ iteration complexity in the presence of second-order smoothness. Like ours, their algorithm relies heavily on the idea of approximating a proximal point method with a large stepsize. In fact, their algorithm is very similar to our algorithm in the second-order case. However, our analysis is rather different and arguably simpler: while their analysis goes through the Hybrid Proximal Extragradient framework of Monteiro and Svaiter (2010), ours relies on a natural extension of the Mirror Prox analysis. Finally, Monteiro and Svaiter (2012) only deal with the Euclidean setting, whereas we allow arbitrary norms.
While Monteiro and Svaiter (2012) do not explicitly instantiate their second-order oracle, they mention that their oracle reduces to solving a strongly monotone variational inequality, which can then be solved using a variety of approaches, including interior point methods. In the $p = 2$ case, our oracle can be instantiated similarly.
4 HigherOrder Mirror Prox Guarantees
In this section, we prove our main result on the convergence guarantees provided by Algorithm 1.
Lemma 4.1.
Suppose $F$ is $p^{th}$-order smooth, and let $z^*$ be a solution to MVI($F$). Then, the iterates generated by Algorithm 1 satisfy, for all $z \in \mathcal{Z}$ and all $T \ge 1$,
(18) 
Theorem 4.2.
Suppose $F$ is $p^{th}$-order smooth. Let $z^*$ be a solution to MVI($F$). Moreover,
let $\bar z = \frac{1}{S_T} \sum_{t=1}^{T} \lambda_t z_{t+1/2}$. Then for the iterates as output by Algorithm 1:

If $F$ is monotone, then $\bar z$ is an $O(1/T^{(p+1)/2})$-approximate solution to the weak MVI problem.

If $F$ is the gradient descent-ascent field for a convex-concave problem over $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ and $\bar z = (\bar x, \bar y)$, then $\mathrm{gap}(\bar x, \bar y) \le O(1/T^{(p+1)/2})$.
Theorem 4.2 follows immediately from Lemmas 2.7, 2.8, and 4.1. To prove Lemma 4.1, we will need to establish our main technical result (Lemma 4.3), which we prove in Section 4.1 and whose proof proceeds in a similar manner to the Mirror Prox analysis (Nemirovski, 2004; Tseng, 2008).
Lemma 4.3.
Suppose $F$ is $p^{th}$-order smooth. Then, the iterates $z_t$, $z_{t+1/2}$ as generated by Algorithm 1 satisfy, for all $z \in \mathcal{Z}$,
$\sum_{t=1}^{T} \lambda_t \langle F(z_{t+1/2}),\, z_{t+1/2} - z \rangle \le D_\phi(z, z_1) - \frac{1}{4} \sum_{t=1}^{T} \left( \|z_{t+1/2} - z_t\|^2 + \|z_{t+1} - z_{t+1/2}\|^2 \right).$ (19)
We will also need the following technical lemma:
Lemma 4.4.
Let $a_t > 0$ for all $t \in [T]$, and let $\sum_{t=1}^{T} a_t \le C$. Then $\sum_{t=1}^{T} a_t^{-\frac{p-1}{2}} \ge T^{\frac{p+1}{2}} C^{-\frac{p-1}{2}}$.
Proof.
We use the following power means: $M_r := \left( \frac{1}{T} \sum_{t=1}^{T} a_t^r \right)^{1/r}$ for $r \ne 0$.
By the power mean inequality, we have $M_r \le M_s$ for $r \le s$, so letting $r = -\frac{p-1}{2}$ and $s = 1$ gives:
$\left( \frac{1}{T} \sum_{t=1}^{T} a_t^{-\frac{p-1}{2}} \right)^{-\frac{2}{p-1}} \le \frac{1}{T} \sum_{t=1}^{T} a_t \le \frac{C}{T},$
and raising both sides to the power $-\frac{p-1}{2}$ (which reverses the inequality) gives $\sum_{t=1}^{T} a_t^{-\frac{p-1}{2}} \ge T^{\frac{p+1}{2}} C^{-\frac{p-1}{2}}$.
∎
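The power mean inequality used above can be sanity-checked numerically; the data here are arbitrary illustrative values.

```python
import numpy as np

def power_mean(a, r):
    """M_r = ((1/T) * sum_t a_t^r)^(1/r) for positive a_t and r != 0."""
    return np.mean(a ** r) ** (1.0 / r)

# Power mean inequality: M_r <= M_s whenever r <= s (nonzero exponents here).
rng = np.random.default_rng(2)
a = rng.uniform(0.1, 10.0, size=50)
exponents = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
means = [power_mean(a, r) for r in exponents]
assert all(m1 <= m2 + 1e-12 for m1, m2 in zip(means, means[1:]))
```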
We now have the necessary tools to prove Lemma 4.1.
Proof of Lemma 4.1.
Using Lemma 4.3, we can divide both sides of (19) by $S_T$, and so, using the nonnegativity of the dropped quadratic terms and of the Bregman divergence, we get:
$\frac{1}{S_T} \sum_{t=1}^{T} \lambda_t \langle F(z_{t+1/2}),\, z_{t+1/2} - z \rangle \le \frac{D_\phi(z, z_1)}{S_T}.$
We simply need to lower bound $S_T$ in order to prove our convergence rate result. By Assumption 2.6, we know that there exists a solution $z^*$ to MVI($F$), which means that for all $t$, we have $\langle F(z_{t+1/2}),\, z_{t+1/2} - z^* \rangle \ge 0$. We can combine this with Lemma 4.3 to get that $\sum_{t=1}^{T} \|z_{t+1/2} - z_t\|^2 \le 4 D_\phi(z^*, z_1)$. Since the lower bound in condition (13) gives $\lambda_t \ge \frac{(p-1)!}{4 L_p} \|z_{t+1/2} - z_t\|^{-(p-1)}$, we can apply Lemma 4.4 by setting $a_t = \|z_{t+1/2} - z_t\|^2$ and $C = 4 D_\phi(z^*, z_1)$, which gives the result. ∎
4.1 Proof of main technical result (Lemma 4.3)
Before proving Lemma 4.3, we state a useful lemma concerning the updates (12) and (14) in Algorithm 1.
Lemma 4.5 (Tseng (2008)).
Let $h$ be a convex function, let $w \in \mathcal{Z}$, and let
$z^+ = \arg\min_{z \in \mathcal{Z}} \left\{ h(z) + D_\phi(z, w) \right\}.$ (20)
Then, for all $u \in \mathcal{Z}$,
$h(z^+) - h(u) \le D_\phi(u, w) - D_\phi(u, z^+) - D_\phi(z^+, w).$ (21)
Proof.
By the optimality condition for $z^+$, we know that for all $u \in \mathcal{Z}$,
$\langle \nabla h(z^+) + \nabla \phi(z^+) - \nabla \phi(w),\, u - z^+ \rangle \ge 0.$ (22)
Rearranging and combining with the convexity of $h$ gives us
$h(z^+) - h(u) \le \langle \nabla h(z^+),\, z^+ - u \rangle \le \langle \nabla \phi(z^+) - \nabla \phi(w),\, u - z^+ \rangle = D_\phi(u, w) - D_\phi(u, z^+) - D_\phi(z^+, w),$
where the final equality comes from the Bregman three-point property, i.e.,
$\langle \nabla \phi(w) - \nabla \phi(z^+),\, u - z^+ \rangle = D_\phi(u, z^+) + D_\phi(z^+, w) - D_\phi(u, w).$ (23)
∎
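The three-point identity used in the proof can be verified numerically; the sketch below checks it for the negative-entropy distance generating function on random points of the simplex (an illustrative choice, not tied to any particular instance in the paper).

```python
import numpy as np

def D(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) as in eq. 3."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Three-point property: <grad phi(w) - grad phi(z), u - z>
#                         = D(u, z) + D(z, w) - D(u, w).
phi = lambda x: np.sum(x * np.log(x))        # negative entropy
grad_phi = lambda x: np.log(x) + 1.0

rng = np.random.default_rng(3)
for _ in range(20):
    u, z, w = (rng.dirichlet(np.ones(4)) for _ in range(3))
    lhs = (grad_phi(w) - grad_phi(z)) @ (u - z)
    rhs = D(phi, grad_phi, u, z) + D(phi, grad_phi, z, w) - D(phi, grad_phi, u, w)
    assert abs(lhs - rhs) < 1e-9
```

The identity is an algebraic consequence of eq. 3 alone, which is why it holds for any choice of distance generating function.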
We now prove Lemma 4.3, which is our main technical result.
Proof of Lemma 4.3.
By Lemma 4.5, applied to the update (12) with $h(z) = \lambda_t \langle F_{p-1}(z_{t+1/2};\, z_t),\, z \rangle$ and $u = z_{t+1}$, along with the algorithm’s determination of $z_{t+1/2}$ and $\lambda_t$, we have that for all $t \in [T]$,
$\lambda_t \langle F_{p-1}(z_{t+1/2};\, z_t),\, z_{t+1/2} - z_{t+1} \rangle \le D_\phi(z_{t+1}, z_t) - D_\phi(z_{t+1}, z_{t+1/2}) - D_\phi(z_{t+1/2}, z_t).$ (24)
Using Lemma 4.5 again with the choice of $h(z) = \lambda_t \langle F(z_{t+1/2}),\, z \rangle$ in the update (14), it follows that for all $z \in \mathcal{Z}$,
$\lambda_t \langle F(z_{t+1/2}),\, z_{t+1} - z \rangle \le D_\phi(z, z_t) - D_\phi(z, z_{t+1}) - D_\phi(z_{t+1}, z_t).$ (25)
We may now observe that
$\lambda_t \langle F(z_{t+1/2}),\, z_{t+1/2} - z \rangle = \lambda_t \langle F(z_{t+1/2}) - F_{p-1}(z_{t+1/2};\, z_t),\, z_{t+1/2} - z_{t+1} \rangle + \lambda_t \langle F_{p-1}(z_{t+1/2};\, z_t),\, z_{t+1/2} - z_{t+1} \rangle + \lambda_t \langle F(z_{t+1/2}),\, z_{t+1} - z \rangle \le \lambda_t \langle F(z_{t+1/2}) - F_{p-1}(z_{t+1/2};\, z_t),\, z_{t+1/2} - z_{t+1} \rangle + D_\phi(z, z_t) - D_\phi(z, z_{t+1}) - D_\phi(z_{t+1}, z_{t+1/2}) - D_\phi(z_{t+1/2}, z_t),$
where the final inequality follows from (24) and (25). Now by Hölder’s inequality, using eq. (6), and the 1-strong convexity of $\phi$ w.r.t. $\|\cdot\|$, it follows that
$\lambda_t \langle F(z_{t+1/2}) - F_{p-1}(z_{t+1/2};\, z_t),\, z_{t+1/2} - z_{t+1} \rangle \le \frac{\lambda_t L_p}{p!} \|z_{t+1/2} - z_t\|^{p} \, \|z_{t+1/2} - z_{t+1}\|,$
while $D_\phi(z_{t+1}, z_{t+1/2}) \ge \frac{1}{2} \|z_{t+1} - z_{t+1/2}\|^2$ and $D_\phi(z_{t+1/2}, z_t) \ge \frac{1}{2} \|z_{t+1/2} - z_t\|^2$. Finally, by our guarantee from Algorithm 1 that $\frac{\lambda_t L_p \|z_{t+1/2} - z_t\|^{p-1}}{(p-1)!} \le \frac{1}{2}$, and using the fact that $ab \le \frac{1}{2}(a^2 + b^2)$ for $a, b \ge 0$, it follows that
$\lambda_t \langle F(z_{t+1/2}),\, z_{t+1/2} - z \rangle \le D_\phi(z, z_t) - D_\phi(z, z_{t+1}) - \frac{1}{4} \|z_{t+1/2} - z_t\|^2 - \frac{1}{4} \|z_{t+1} - z_{t+1/2}\|^2.$ (26)
Summing over $t \in [T]$ gives the result. ∎
5 Instantiating HigherOrderMirrorProx (for $p = 2$)
In this section, we provide an efficient implementation of HigherOrderMirrorProx for the case where $F$ is second-order smooth. In particular, we consider the unconstrained problem (i.e., $\mathcal{Z} = \mathbb{R}^n$) with the Bregman divergence chosen as $D_\phi(x, y) = \frac{1}{2} \|x - y\|_2^2$. First, for technical reasons, we require the following assumption:
Assumption 5.1.
During the execution of Algorithm 2, for all iterates $z_t$ and all stepsizes $\lambda$ considered by the search, we assume that $I + \lambda \nabla F(z_t)$ is invertible and that the resulting updates remain well defined.
As we later discuss in Section 5.2, these conditions always hold for convex-concave min-max problems. We then arrive at the following result for this setting:
Theorem 5.2 (Main theorem, $p = 2$).
Suppose $F$ is first-order smooth, second-order smooth, and Assumption 5.1 holds. Let $z^*$ be a solution to MVI($F$), let $\bar z = \frac{1}{S_T} \sum_{t=1}^{T} \lambda_t z_{t+1/2}$, and let $z_1, \dots, z_T$ be the output of HigherOrderMirrorProx+ (Algorithm 2). Further assume that, for all $t$, the iterates remain in a region where Assumption 5.1 applies. Then, for $p = 2$, the iterates satisfy, for all $z \in \mathcal{Z}$,
(27)
In addition, the computational cost of each iteration of Algorithm 2 is dominated by a total of
$\tilde O(1)$ matrix inversions.
Proof.
We will first show that the choices of $\lambda_{\min}$ and $\lambda_{\max}$ are valid binary search bounds whenever the binary search routine
is called by Algorithm 2, i.e., that and . We begin with our choice of . Suppose that, for some iteration , it is the case that . If so, then the algorithm sets , which means that . Therefore, since we know that
(28) 
it follows that
(29) 
and so we would be done. In addition, supposing it is the case that (at which point, the algorithm sets ), we again reach this conclusion by the same reasoning. For ensuring the validity of , note that by (36), it follows that .
Having established the validity of the binary search bounds in the case that the search routine is in fact called, we now move on to show how we may explicitly instantiate the implicitly defined update in (12). Namely, in this setting the key conditions (12) and (13), which must hold simultaneously, can be equivalently expressed as
$z_{t+1/2} = \arg\min_{z \in \mathbb{R}^n} \left\{ \lambda_t \left\langle F(z_t) + \nabla F(z_t)(z_{t+1/2} - z_t),\, z \right\rangle + \frac{1}{2} \|z - z_t\|_2^2 \right\},$ (30)
$\frac{1}{4} \le \lambda_t L_2 \|z_{t+1/2} - z_t\|_2 \le \frac{1}{2}.$ (31)
From (30), it follows by first-order optimality conditions that $z_{t+1/2} - z_t + \lambda_t \left( F(z_t) + \nabla F(z_t)(z_{t+1/2} - z_t) \right) = 0$, and so rearranging gives us
$\left( I + \lambda_t \nabla F(z_t) \right) (z_{t+1/2} - z_t) = -\lambda_t F(z_t).$
Since we assume that $I + \lambda_t \nabla F(z_t)$ is invertible, it follows that
$z_{t+1/2} = z_t - \lambda_t \left( I + \lambda_t \nabla F(z_t) \right)^{-1} F(z_t).$
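The reduction above, in which the implicit update becomes a single linear system, can be sketched as follows. The field, stepsize, and iteration count are assumed toy choices, and the fixed stepsize stands in for the binary search over $\lambda_t$ in Algorithm 2.

```python
import numpy as np

def second_order_step(F, JF, z, lam):
    """Implicit update for p = 2, unconstrained Euclidean case: solving
    z_half = z - lam * (F(z) + JF(z) @ (z_half - z)) for z_half reduces to
    one linear system in the matrix (I + lam * JF(z))."""
    n = len(z)
    return z - lam * np.linalg.solve(np.eye(n) + lam * JF(z), F(z))

# Assumed toy field: a strongly monotone perturbation of a bilinear game,
# with solution z* = 0, so I + lam * JF(z) is always invertible.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
F = lambda z: z + A @ z
JF = lambda z: np.eye(2) + A

z = np.array([1.0, 1.0])
for _ in range(50):
    z_half = second_order_step(F, JF, z, lam=1.0)
    z = z - 1.0 * F(z_half)   # mirror step (eq. 14) with D(z, w) = 0.5*||z - w||^2
# z is now numerically indistinguishable from the solution z* = 0
```

Each iteration requires only one linear solve, which matches the theme of the section: the second-order implicit oracle is implementable with matrix inversions.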