We first define variational inequalities and complementarity problems. We consider here only problems defined over convex feasible sets, although it is possible to define analogous problems over nonconvex sets.
A cone is a set $K$ such that, for any $x \in K$ and $\lambda \ge 0$, we have $\lambda x \in K$. The dual cone of $K$ is $K^* = \{\, y \mid y^\top x \ge 0 \ \forall x \in K \,\}$.
For any set $K$ and point $x \in K$, the normal cone to $K$ at $x$ is $N_K(x) = \{\, y \mid y^\top (z - x) \le 0 \ \forall z \in K \,\}$.
Definition 1.1 (Variational inequality)
Given a nonempty closed convex set $K \subseteq \mathbb{R}^n$ and an operator $F : K \to \mathbb{R}^n$, the variational inequality VI$(F, K)$ is to find $x \in K$ s.t.:
$$F(x)^\top (z - x) \ge 0 \quad \forall z \in K,$$
or equivalently $-F(x) \in N_K(x)$.
The VI is linear if $F(x) = Mx + q$ for a matrix $M$ and vector $q$.
In a complementarity problem, the feasible set is required to be a cone. Define $(x, y)$ to be $K$-complementary if $x \in K$, $y \in K^*$, and $x^\top y = 0$. Then the complementarity problem asks us to find a $K$-complementary pair $(x, F(x))$:
Definition 1.2 (Complementarity problem)
Given a nonempty closed convex cone $K$ and an operator $F : K \to \mathbb{R}^n$, the complementarity problem CP$(F, K)$ is to find $x$ s.t.:
$$x \in K, \qquad F(x) \in K^*, \qquad x^\top F(x) = 0.$$
The CP is linear if $F(x) = Mx + q$.
A common choice is to take $K$ to be the nonnegative orthant $\mathbb{R}^n_+$; in this case $K^* = K$, and complementarity means that, for all $i$, at least one of $x_i$ and $F(x)_i$ is zero.
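For concreteness, here is a minimal Python sketch (ours, not from the text) of this componentwise complementarity test for the nonnegative orthant; the function name and tolerance are illustrative choices:

```python
# Hypothetical helper: check K-complementarity when K is the nonnegative
# orthant, so K* = K and x.y = 0 reduces to a componentwise condition.
def is_orthant_complementary(x, y, tol=1e-9):
    """True if x >= 0, y >= 0, and for every i at least one of x_i, y_i is ~0."""
    if any(v < -tol for v in x) or any(v < -tol for v in y):
        return False  # x must lie in K and y in K* (both the orthant here)
    return all(min(abs(a), abs(b)) <= tol for a, b in zip(x, y))

# Vectors with disjoint supports are complementary; overlapping supports are not.
print(is_orthant_complementary([2.0, 0.0, 1.0], [0.0, 3.0, 0.0]))  # True
print(is_orthant_complementary([2.0, 0.0], [1.0, 0.0]))            # False
```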
Variational inequalities and complementarity problems are strongly related. First, if the feasible set of a variational inequality is a cone $K$, then VI$(F, K)$ is equivalent to CP$(F, K)$; that is, both problems have the same set of solutions. Second, if the feasible set is the intersection of a cone $K$ with some equality constraints $Ax = b$, then we can eliminate the equality constraints using Lagrange multipliers: VI$(F, K \cap \{x \mid Ax = b\})$ is equivalent to VI$(G, K \times \mathbb{R}^m)$, where $z = (x, \lambda)$ and
$$G(x, \lambda) = \big(F(x) + A^\top \lambda,\ b - Ax\big).$$
Since we can represent any convex set as the intersection of a cone with equality constraints, the above relationships mean that we can transform any VI to a CP and vice versa.
For computational purposes, it is often convenient to transform our problem so that the feasible set is a very simple cone. For example, for the polyhedron
$K = \{x \mid Ax \ge b\}$, we introduce a vector of nonnegative slack variables $s$ and write $Ax - s = b$, with $x$ free and $s \ge 0$. After eliminating the equality constraints with Lagrange multipliers, our feasible set is the cone $\mathbb{R}^n \times \mathbb{R}^m_+$. This cone is separable: it is the product of one-dimensional cones ($\mathbb{R}$ and $\mathbb{R}_+$). The advantage of this sort of transformation is that it is extremely efficient to work with separable cones: e.g., Euclidean projection onto $\mathbb{R}^m_+$ just means thresholding each component of the input at $0$.
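As a small illustration of why separable cones are convenient, the following sketch (our code, matching the free/nonnegative split above; the function name is our own) projects onto $\mathbb{R}^{n_{\mathrm{free}}} \times \mathbb{R}^m_+$ by passing the free coordinates through and thresholding the rest:

```python
# Euclidean projection onto the separable cone R^{n_free} x R^m_+ :
# free coordinates are unchanged, slack coordinates are thresholded at 0.
def project_separable(z, n_free):
    return z[:n_free] + [max(v, 0.0) for v in z[n_free:]]

print(project_separable([-1.5, 2.0, -3.0, 4.0], n_free=2))  # [-1.5, 2.0, 0.0, 4.0]
```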
If we assume that the feasible set is a separable cone, then the computational complexity of a variational inequality or complementarity problem depends on the operator $F$. Even if $F$ is restricted to be linear, it is possible to encode NP-hard problems. An important class that ensures polynomial-time solvability is the class of monotone operators:
An operator $F$ is monotone on the set $K$ if, for some $\alpha \ge 0$,
$$(F(x) - F(y))^\top (x - y) \ge \alpha \|x - y\|^2 \quad \forall x, y \in K.$$
It is strongly monotone if the above holds with $\alpha > 0$.
A linear operator $F(x) = Mx + q$ is monotone iff $M$ is positive semidefinite, and strongly monotone iff $M$ is positive definite. (There is no requirement for $M$ to be symmetric.)
Among Lipschitz operators (which include all finite-dimensional linear operators), strong monotonicity implies a useful contraction property:
Lemma 1.1. If the operator $F$ is $\alpha$-strongly monotone and $L$-Lipschitz on some set $K$, and if we take $\eta = \alpha / L^2$, then the operator $x \mapsto x - \eta F(x)$ is Lipschitz on $K$ with constant $\kappa = \sqrt{1 - \alpha^2 / L^2} < 1$.
Proof: For any $x, y \in K$,
$$\|(x - \eta F(x)) - (y - \eta F(y))\|^2 = \|x - y\|^2 - 2\eta\,(F(x) - F(y))^\top (x - y) + \eta^2 \|F(x) - F(y)\|^2 \le (1 - 2\eta\alpha + \eta^2 L^2)\,\|x - y\|^2.$$
Choosing $\eta$ to minimize the RHS, we have $\eta = \alpha / L^2$; the square of the Lipschitz constant is then $1 - 2\alpha^2/L^2 + \alpha^2/L^2 = 1 - \alpha^2/L^2$.
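The contraction constant above can be checked numerically. The sketch below (our code; the $2 \times 2$ nonsymmetric matrix and closed-form spectra are illustrative choices, not from the text) estimates $\alpha$ as the smallest eigenvalue of the symmetric part of $M$, $L$ as the largest singular value of $M$, and verifies that $x \mapsto x - \eta F(x)$ with $\eta = \alpha/L^2$ contracts sampled pairs by at most $\kappa = \sqrt{1 - \alpha^2/L^2}$:

```python
import math, random

# Nonsymmetric but positive definite, so F(x) = Mx is strongly monotone.
M = [[2.0, 1.0], [0.0, 2.0]]

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def sym_eigs_2x2(A):
    # Eigenvalues of the symmetric part of a 2x2 matrix, in closed form.
    a, b, c = A[0][0], (A[0][1] + A[1][0]) / 2.0, A[1][1]
    r = math.hypot((a - c) / 2.0, b)
    return (a + c) / 2.0 - r, (a + c) / 2.0 + r

alpha = sym_eigs_2x2(M)[0]                       # strong-monotonicity modulus
MtM = [[M[0][0]**2 + M[1][0]**2, M[0][0]*M[0][1] + M[1][0]*M[1][1]],
       [M[0][0]*M[0][1] + M[1][0]*M[1][1], M[0][1]**2 + M[1][1]**2]]
L = math.sqrt(sym_eigs_2x2(MtM)[1])              # Lipschitz constant sigma_max(M)
eta = alpha / L**2
kappa = math.sqrt(1.0 - alpha**2 / L**2)

random.seed(0)
worst = 0.0
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(2)]
    y = [random.uniform(-1, 1) for _ in range(2)]
    Tx = [xi - eta * fi for xi, fi in zip(x, matvec(M, x))]
    Ty = [yi - eta * fi for yi, fi in zip(y, matvec(M, y))]
    worst = max(worst, math.dist(Tx, Ty) / math.dist(x, y))
print(worst <= kappa + 1e-12)  # True: no sampled pair expands by more than kappa
```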
This contraction property means that a very simple algorithm, the projection method, converges linearly to the solution of our variational inequality, as we will see in the next section.
2 Projection method
In a variational inequality, the condition that $-F(x)$ is in the normal cone to $K$ at $x$ is equivalent to
$$x = \Pi_K(x - \eta F(x)), \qquad (1)$$
where $\eta > 0$ is any positive step size and $\Pi_K$ represents Euclidean projection onto the set $K$. That is, if we take a step in the direction $-\eta F(x)$ and project back onto $K$, we don't move. More formally:
Lemma 2.1. If $x = \Pi_K(x - \eta F(x))$ for some step size $\eta > 0$, then $-F(x) \in N_K(x)$.
(We omit the proof.)
The projection method simply treats (1) as an assignment: start from an arbitrary $x_0 \in K$, then for $t = 0, 1, 2, \ldots$, set
$$x_{t+1} = \Pi_K(x_t - \eta F(x_t)). \qquad (2)$$
(We could also choose a separate step size $\eta_t$ for each step $t$, but for simplicity we ignore this possibility.)
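The update (2) is only a few lines of code. The sketch below (our code; the matrix $M$, vector $q$, and step size are illustrative) runs the projection method on a small linear VI over the nonnegative orthant, where projection is componentwise thresholding, and then checks complementarity at the result:

```python
# Projection method for a linear VI over the nonnegative orthant:
# F(x) = Mx + q, and Pi_K is componentwise thresholding at zero.
def projection_method(M, q, eta, steps, x0):
    x = list(x0)
    for _ in range(steps):
        Fx = [sum(m * v for m, v in zip(row, x)) + qi for row, qi in zip(M, q)]
        x = [max(xi - eta * fi, 0.0) for xi, fi in zip(x, Fx)]
    return x

M = [[3.0, 1.0], [1.0, 2.0]]   # symmetric positive definite => strongly monotone
q = [-3.0, -1.0]
x = projection_method(M, q, eta=0.3, steps=200, x0=[0.0, 0.0])
# At the fixed point: x >= 0, F(x) >= 0, and x . F(x) = 0 (complementarity).
Fx = [sum(m * v for m, v in zip(row, x)) + qi for row, qi in zip(M, q)]
print([round(v, 4) for v in x])
```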
The projection method converges linearly for strongly-monotone $F$, since its update operator is a contraction:
Lemma 2.2. In the projection method, suppose the operator $x \mapsto x - \eta F(x)$ is $\kappa$-Lipschitz with $\kappa < 1$. Then the variational inequality (1) has a unique solution $x^*$. Furthermore, for any $\epsilon > 0$, if we take $t \ge \ln(\|x_0 - x^*\| / \epsilon) / \ln(1/\kappa)$ steps of the projection method, we will have $\|x_t - x^*\| \le \epsilon$.
Proof: The update operator $T(x) = \Pi_K(x - \eta F(x))$ is $\kappa$-Lipschitz, since it is the composition of a projection operator (which is $1$-Lipschitz) with $x \mapsto x - \eta F(x)$ (which is $\kappa$-Lipschitz). Existence and uniqueness of the solution follow from the contraction mapping theorem; for the error bound,
$$\|x_{t+1} - x^*\| = \|T(x_t) - T(x^*)\| \le \kappa \|x_t - x^*\|.$$
That is, the distance between $x_t$ and $x^*$ reduces by a factor $\kappa$ with each application of $T$, so $\|x_t - x^*\| \le \kappa^t \|x_0 - x^*\|$. Substituting the assumed value for $t$ now yields the desired bound.
3 Galerkin method—I
If the dimension of $x$ is very large, each iteration of the projection method may be too slow to be practical. The problem is exacerbated if $\kappa$ is close to 1. In this case, it makes sense to search for a Galerkin approximation, i.e., an approximate solution within the span of some basis matrix $\Phi$. The hope is that we can design an approximate projection method that will find such a solution much more cheaply than the raw projection method.
Bertsekas [1] proposed one such Galerkin approximation: assume that $K \cap \operatorname{span}(\Phi)$ is nonempty, and define the approximation $x_B$ as the fixed point of
$$x_{t+1} = \Pi_{K \cap \operatorname{span}(\Phi)}(x_t - \eta F(x_t)). \qquad (3)$$
If the dimension of $x$ is $n$, but the rank of $\Phi$ is $k \ll n$, we may be able to implement the iteration (3) cheaply: e.g., for a linear variational inequality, we can precompute quantities such as $M\Phi$, and work in $k$-dimensional space instead of $n$-dimensional space. The only caveat is projection onto $K \cap \operatorname{span}(\Phi)$; depending on the forms of $K$ and $\Phi$, this projection may or may not be possible to implement efficiently.
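To make the iteration (3) concrete, here is a sketch (our code) of one special case where the troublesome projection is easy: $K$ is the nonnegative orthant and $\operatorname{span}(\Phi)$ is a coordinate subspace, so projecting onto $K \cap \operatorname{span}(\Phi)$ means thresholding the kept coordinates and zeroing the rest. In general this projection is a quadratic program; the data below are illustrative only.

```python
# Bertsekas-Galerkin iteration (3), specialized so that span(Phi) is spanned
# by coordinate axes ('keep'): projection onto K ∩ span(Phi) is then trivial.
def bertsekas_galerkin(M, q, eta, steps, keep, x0):
    x = list(x0)
    for _ in range(steps):
        Fx = [sum(m * v for m, v in zip(row, x)) + qi for row, qi in zip(M, q)]
        y = [xi - eta * fi for xi, fi in zip(x, Fx)]
        x = [max(v, 0.0) if i in keep else 0.0 for i, v in enumerate(y)]
    return x

M = [[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.0]]  # positive definite
q = [-3.0, -1.0, -0.5]
xB = bertsekas_galerkin(M, q, eta=0.2, steps=400, keep={0, 1}, x0=[0.0] * 3)
print([round(v, 3) for v in xB])  # third coordinate is forced to zero
```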
Essentially the same convergence rate analysis holds for the Bertsekas-Galerkin projection method as for the original projection method:
Lemma 3.1. Suppose $x \mapsto x - \eta F(x)$ is $\kappa$-Lipschitz ($\kappa < 1$) and $K \cap \operatorname{span}(\Phi)$ is nonempty. Then the iteration (3) has a unique fixed point, say $x_B$; and, we need at most $\ln(\|x_0 - x_B\| / \epsilon) / \ln(1/\kappa)$ iterations to achieve error $\|x_t - x_B\| \le \epsilon$.
Proof: Identical to the proof of Lemma 2.2.
In addition, we can prove a bound on the distance between the approximate solution $x_B$ and the original solution $x^*$:
Lemma 3.2. For the Bertsekas-Galerkin method, writing $y^* = x^* - \eta F(x^*)$, the error between the approximate solution $x_B$ and the true solution $x^*$ satisfies:
$$\|x_B - x^*\| \le \frac{1}{1 - \kappa}\,\big\|\Pi_{K \cap \operatorname{span}(\Phi)}(y^*) - x^*\big\|. \qquad (4)$$
Proof: Write $T_B(x) = \Pi_{K \cap \operatorname{span}(\Phi)}(x - \eta F(x))$ for the update operator of (3), and $y^* = x^* - \eta F(x^*)$. Note that $x_B = T_B(x_B)$, and $x^* = \Pi_K(y^*)$. So, $\|T_B(x^*) - x^*\| = \|\Pi_{K \cap \operatorname{span}(\Phi)}(y^*) - x^*\|$, and
applying $T_B$ repeatedly to both $T_B(x^*)$ and $x^*$, and using the fact that $T_B$ is $\kappa$-Lipschitz, we have
$$\|T_B^{t+1}(x^*) - T_B^t(x^*)\| \le \kappa^t\,\|T_B(x^*) - x^*\|.$$
So, after $t$ iterations, we have
$$\|T_B^t(x^*) - x^*\| \le (1 + \kappa + \cdots + \kappa^{t-1})\,\|T_B(x^*) - x^*\| \le \frac{1}{1 - \kappa}\,\|T_B(x^*) - x^*\|,$$
where the last line follows by summing a geometric series. Since this inequality holds for all $t$, it holds in the limit as $t \to \infty$, where $T_B^t(x^*) \to x_B$, yielding (4).
The bound (4) depends on $\|\Pi_{K \cap \operatorname{span}(\Phi)}(x^* - \eta F(x^*)) - x^*\|$, a measure of our representation error—that is, how closely we can approximate $x^*$ in our low-dimensional representation.
Qualitatively, compared to the exact projection method, the Bertsekas-Galerkin method tightens the feasible region and relaxes the optimality condition: both the iterates $x_t$ and the approximate solution $x_B$ will be exactly feasible. (In fact, they will be in $K \cap \operatorname{span}(\Phi)$.) But, at convergence, the step direction $-F(x_B)$ will not necessarily be in a normal cone of $K$, as it would be at the exact solution. Instead, Lemma 2.1 means that the step direction will be in the normal cone to $K \cap \operatorname{span}(\Phi)$ at $x_B$.
Another way to interpret this approximate optimality criterion is that, since $K \cap \operatorname{span}(\Phi)$ is the intersection of two sets, the normal cone to $K \cap \operatorname{span}(\Phi)$ is the sum of their two normal cones, $N_K(x) + \operatorname{span}(\Phi)^\perp$. In other words, for some residual vector $r \in \operatorname{span}(\Phi)^\perp$, we have that $-F(x_B) - r$ is in the normal cone to $K$ at $x_B$.
4 Galerkin method—II
While the error bound (4) is nice to have, it is less than perfectly satisfying: the representation error measure may be quite large, even if $\operatorname{span}(\Phi)$ passes close to $x^*$. See Fig. 1 for an example. An extreme version of this problem is that $\operatorname{span}(\Phi)$ might not even intersect the feasible set $K$.
To remedy this problem, in this section we propose a simple new Galerkin method, and bound the new method's convergence rate and error. Our new method's bounds will depend on a different measure of representation error, based on projection onto $\operatorname{span}(\Phi)$ instead of $K \cap \operatorname{span}(\Phi)$. Since $K \cap \operatorname{span}(\Phi) \subseteq \operatorname{span}(\Phi)$, we expect this change to lead to an improvement in the bound: for any given vector $y$, the error $\|\Pi_{\operatorname{span}(\Phi)}(y) - y\|$ will be no larger, and sometimes much smaller, than the error $\|\Pi_{K \cap \operatorname{span}(\Phi)}(y) - y\|$.
To enable projection onto $\operatorname{span}(\Phi)$ instead of $K \cap \operatorname{span}(\Phi)$, we modify the algorithm as described below; in particular, we project the proposed update $y_{t+1}$ instead of the proposed solution $x_{t+1}$. When $\eta$ is small, the two error measures are evaluated at nearly the same point, so the new measure is essentially no larger than the old one. For larger $\eta$ it is possible that this comparison could be reversed; but in our (limited) experience, even for larger $\eta$, the new method's bounds are at least as tight as those of the Bertsekas-Galerkin method, and often much tighter.
In more detail, our approximation is based on the following re-arrangement of the projection method: pick $y_0$ arbitrarily, and for all $t$, repeat:
$$x_t = \Pi_K(y_t), \qquad (5)$$
$$y_{t+1} = x_t - \eta F(x_t). \qquad (6)$$
It is easy to see that the sequence $x_t$ computed by this iteration is the same as the one computed by (2), so long as we start from the same value of $x_0 = \Pi_K(y_0)$.
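This equivalence is easy to confirm numerically. The sketch below (our code, with illustrative data; $K$ is the nonnegative orthant) runs the direct update (2) and the re-arranged form (5–6) side by side from matching starting points and checks that the $x_t$ sequences agree:

```python
# Compare the direct projection update with its re-arranged two-step form.
M = [[3.0, 1.0], [1.0, 2.0]]
q = [-3.0, -1.0]
eta = 0.3

def F(x):
    return [sum(m * v for m, v in zip(row, x)) + qi for row, qi in zip(M, q)]

def proj(v):  # K = nonnegative orthant
    return [max(t, 0.0) for t in v]

x = [0.5, 0.5]        # direct: x <- Pi_K(x - eta F(x))
y = [0.5, 0.5]        # re-arranged: x_t = Pi_K(y_t); y_{t+1} = x_t - eta F(x_t)
assert proj(y) == x   # matching starting states
same = True
for _ in range(50):
    x = proj([xi - eta * fi for xi, fi in zip(x, F(x))])
    xr = proj(y)
    y = [xi - eta * fi for xi, fi in zip(xr, F(xr))]
    same = same and all(abs(a - b) < 1e-12 for a, b in zip(proj(y), x))
print(same)  # True: the two forms produce the same x_t sequence
```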
Now we insert a projection operator into (6), and define our new Galerkin approximation $y_G$ as the fixed point of the resulting iteration:
$$x_t = \Pi_K(y_t), \qquad (7)$$
$$y_{t+1} = \Pi_{\operatorname{span}(\Phi)}(x_t - \eta F(x_t)). \qquad (8)$$
(As before, fixed-point iteration is only one way to compute the approximation; we will discuss others below.)
Right away we can see a difference between the new method and the old one: instead of projecting onto the intersection of $K$ and $\operatorname{span}(\Phi)$, we project onto each of $K$ and $\operatorname{span}(\Phi)$ at different points in the update. So, for example, it is not even necessary that $K \cap \operatorname{span}(\Phi)$ be nonempty.
Another important difference is what is required for efficient implementation of the iteration (7–8). For example, suppose that $K$ is the nonnegative orthant and $\Phi$ is an arbitrary basis matrix. The Bertsekas-Galerkin method requires us to project onto $K \cap \operatorname{span}(\Phi)$, a quadratic program in $n$ dimensions. The new method requires only separate projections onto $K$ and $\operatorname{span}(\Phi)$, both of which are much faster than solving a quadratic program: the former is componentwise thresholding, while the latter is a rank-$k$ linear operator which we can precompute. (For example, if
$\Phi$ is dense, a good strategy might be to precompute its QR decomposition; then we can threshold and project in total time $O(nk)$ with a very low constant.)
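The sketch below (our code; problem data, step size, and the rank-1 basis are all illustrative) implements our reading of the iteration (7–8) for the nonnegative orthant: $\Pi_K$ is componentwise thresholding, and $\Pi_{\operatorname{span}(\Phi)}$ for a single basis vector $\phi$ is the precomputable rank-1 map $y \mapsto \phi\,(\phi^\top y)/(\phi^\top \phi)$, so no quadratic program is needed:

```python
# New Galerkin iteration: x_t = Pi_K(y_t); y_{t+1} = Pi_Phi(x_t - eta F(x_t)),
# with K the nonnegative orthant and span(Phi) the line through phi.
def new_galerkin(M, q, eta, steps, phi, y0):
    pp = sum(p * p for p in phi)
    y = list(y0)
    for _ in range(steps):
        x = [max(v, 0.0) for v in y]                      # x = Pi_K(y)
        Fx = [sum(m * v for m, v in zip(row, x)) + qi for row, qi in zip(M, q)]
        z = [xi - eta * fi for xi, fi in zip(x, Fx)]
        c = sum(p * v for p, v in zip(phi, z)) / pp
        y = [c * p for p in phi]                          # y = Pi_Phi(z)
    return y

M = [[3.0, 1.0], [1.0, 2.0]]
q = [-3.0, -1.0]
phi = [1.0, 0.0]                 # span(Phi) = the first coordinate axis
yG = new_galerkin(M, q, eta=0.3, steps=200, phi=phi, y0=[0.0, 0.0])
xG = [max(v, 0.0) for v in yG]   # approximate solution x_G = Pi_K(y_G)
print([round(v, 4) for v in xG])
```

Here the true solution happens to lie in $\operatorname{span}(\Phi)$, so the Galerkin approximation recovers it exactly.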
As before, the entire update (7–8), viewed as a map on $y_t$, is $\kappa$-Lipschitz (being the composition of $x \mapsto x - \eta F(x)$ with some projections, each of which is $1$-Lipschitz). We therefore have:
Lemma 4.1. Suppose $x \mapsto x - \eta F(x)$ is $\kappa$-Lipschitz with $\kappa < 1$. Then the iteration (7–8) has a unique fixed point, say $y_G$; and, we need at most $\ln(\|y_0 - y_G\| / \epsilon) / \ln(1/\kappa)$ iterations to achieve error $\|y_t - y_G\| \le \epsilon$.
Proof: Identical to the proof of Lemma 2.2.
Lemma 4.2. Write $y^* = x^* - \eta F(x^*)$, so that $x^* = \Pi_K(y^*)$. Then the fixed point $y_G$ of (7–8) satisfies
$$\|y_G - y^*\| \le \frac{1}{1 - \kappa}\,\big\|\Pi_{\operatorname{span}(\Phi)}(y^*) - y^*\big\|. \qquad (9)$$
Proof: We can follow almost the same proof strategy as in Lemma 3.2: the update operator for $y$,
$$T_G(y) = \Pi_{\operatorname{span}(\Phi)}\big(\Pi_K(y) - \eta F(\Pi_K(y))\big),$$
is $\kappa$-Lipschitz as discussed above, so to get our error bound we just need to bound the change in a single update starting from $y^*$. Since $\Pi_K(y^*) = x^*$,
$$\|T_G(y^*) - y^*\| = \big\|\Pi_{\operatorname{span}(\Phi)}(x^* - \eta F(x^*)) - y^*\big\| = \big\|\Pi_{\operatorname{span}(\Phi)}(y^*) - y^*\big\|.$$
Just as before, we can now apply the update operator repeatedly to get bounds on the differences between successive iterates, and sum these bounds to show the desired result.
To bound the error in $x$ rather than $y$, note $x_G = \Pi_K(y_G)$ and $x^* = \Pi_K(y^*)$, and use the fact that $\Pi_K$ is $1$-Lipschitz: $\|x_G - x^*\| \le \|y_G - y^*\|$.
As promised, comparing to (4), the form of the new bound is identical, except that we measure representation error by projecting onto $\operatorname{span}(\Phi)$ rather than onto $K \cap \operatorname{span}(\Phi)$.
Qualitatively, both the iterates $x_t$ and the approximate solution $x_G$ will be exactly feasible (i.e., $x_t \in K$, $x_G \in K$), just as for the Bertsekas-Galerkin method. The approximate optimality criterion is also similar: for some residual vector $r \in \operatorname{span}(\Phi)^\perp$, we have that $-F(x_G) - r$ is in the normal cone to $K$ at $x_G$ (see Lemma 4.3). So, the most important difference between the two methods is that the new method searches over a larger feasible region ($K$ instead of $K \cap \operatorname{span}(\Phi)$).
5 Projective LCPs
The projection method described above is by no means the only way to find the Galerkin approximation (7–8). If the operator $F(x) = Mx + q$ is linear, another attractive algorithm is the following interior point method. At a fixed point of (7–8) we have
$$y_G = \Pi_{\operatorname{span}(\Phi)}\big((I - \eta M)\,\Pi_K(y_G) - \eta q\big). \qquad (10)$$
By the Moreau decomposition, we can split $y_G = x - z$ with $x = \Pi_K(y_G) \in K$, $z \in K^*$, and $x^\top z = 0$; substituting into (10) then gives
$$x \in K, \qquad z = M_G\, x + q_G \in K^*, \qquad x^\top z = 0, \qquad (11)$$
where we have defined
$$M_G = I - \Pi_{\operatorname{span}(\Phi)}(I - \eta M), \qquad q_G = \eta\, \Pi_{\operatorname{span}(\Phi)}\, q.$$
But now we can recognize (11) as the optimality condition for a new problem, CP$(M_G x + q_G, K)$. So, we can use any appropriate linear complementarity problem solver to solve CP$(M_G x + q_G, K)$. Furthermore, if the original problem was strongly monotone, then the new problem is strongly monotone too:
Lemma 5.1. If $I - \eta M$ is $\kappa$-Lipschitz with $\kappa < 1$, then $M_G = I - \Pi_{\operatorname{span}(\Phi)}(I - \eta M)$ is positive definite.
Proof: First note that $\Pi_{\operatorname{span}(\Phi)}(I - \eta M)$ is $\kappa$-Lipschitz, where $\kappa$ is defined in Lemma 1.1: it is the composition of a projection (which is $1$-Lipschitz) with $I - \eta M$ (which is $\kappa$-Lipschitz). Now pick any $d \ne 0$. We will show $d^\top M_G\, d > 0$, which implies that $M_G$ is positive definite:
$$\|d - M_G\, d\|^2 = \big\|\Pi_{\operatorname{span}(\Phi)}(I - \eta M)\, d\big\|^2 \le \kappa^2 \|d\|^2$$
$$\|d\|^2 - 2\, d^\top M_G\, d + \|M_G\, d\|^2 \le \kappa^2 \|d\|^2$$
$$(1 - \kappa^2)\,\|d\|^2 + \|M_G\, d\|^2 \le 2\, d^\top M_G\, d.$$
The first line holds since $\Pi_{\operatorname{span}(\Phi)}(I - \eta M)$ is $\kappa$-Lipschitz and linear. The second line expands the square on the left. The third line rearranges and collects terms. The claim follows since the LHS of line 3 is strictly positive: $(1 - \kappa^2)\|d\|^2$ is strictly positive since $d \ne 0$ and $\kappa < 1$, and $\|M_G\, d\|^2 \ge 0$.
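A numeric check of this positive-definiteness claim is below (our code, and hedged on our reading of the construction above, namely $M_G = I - \Pi_{\operatorname{span}(\Phi)}(I - \eta M)$; the matrix, step size, and basis vector are illustrative). It applies $M_G$ matrix-free and samples the quadratic form on random unit vectors:

```python
import math, random

# Sample d . (M_G d) for unit vectors d, where M_G = I - Pi_Phi (I - eta*M)
# is applied matrix-free; by Lemma 5.1 every sample should be positive.
M = [[3.0, 1.0], [1.0, 2.0]]
eta = 0.2
phi = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]   # orthonormal rank-1 basis

def matvec(A, v):
    return [sum(a * w for a, w in zip(row, v)) for row in A]

def apply_MG(d):
    # Compute (I - eta*M) d, project onto span(phi), subtract from d.
    s = [di - eta * fi for di, fi in zip(d, matvec(M, d))]
    c = sum(p * v for p, v in zip(phi, s))
    return [di - c * p for di, p in zip(d, phi)]

random.seed(1)
quads = []
for _ in range(2000):
    d = [random.gauss(0, 1) for _ in range(2)]
    n = math.hypot(*d)
    d = [v / n for v in d]
    quads.append(sum(a * b for a, b in zip(d, apply_MG(d))))
print(min(quads) > 0.0)  # True: the sampled quadratic form stays positive
```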
The dimension of the new matrix $M_G$ is the same as that of the original matrix $M$, so it is not clear that we are making progress by expressing our LCP this way. However, it turns out that we can solve CP$(M_G x + q_G, K)$ much more quickly than a general LCP of the same dimension, due to the special structure of $M_G$.
In particular, $M_G$ is projective in the sense of [3]. So, as shown in that paper, we can run the Unified Interior Point (UIP) method of Kojima et al. [4] very quickly: if $\Phi \in \mathbb{R}^{n \times k}$ with $k \ll n$, then each iteration of the UIP method takes time linear in $n$. (By contrast, an iteration of UIP ordinarily requires solving an $n \times n$ system of equations, usually much more expensive when $n \gg k$.) Furthermore, as shown in [4], the number of iterations required is polynomial for any monotone LCP.
- [1] Dimitri P. Bertsekas. Projected equations, variational inequalities, and temporal difference methods. Technical Report LIDS-P-2808, MIT Lab for Information and Decision Sciences, 2009.
- [2] Francisco Facchinei and Jong-Shi Pang. Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer Series in Operations Research and Financial Engineering. Springer, 2003.
- [3] Geoffrey J. Gordon. Fast solutions to projective monotone linear complementarity problems. Technical Report arXiv:1212.6958, 2012.
- [4] Masakazu Kojima, Nimrod Megiddo, Toshihito Noma, and Akiko Yoshise. A Unified Approach to Interior Point Algorithms for Linear Complementarity Problems. Lecture Notes in Computer Science. Springer, 1991.