## 1 Background

### 1.1 Definitions

We first define variational inequalities and complementarity problems [2]. We consider here only problems defined over convex feasible sets, although it is possible to define analogous problems over nonconvex sets.

A cone is a set such that, for any and , we have . The dual cone of is

For any set and point , the normal cone to at is .

###### Definition 1.1 (Variational inequality)

Given a nonempty closed convex set and an operator , the variational inequality VI is to find s.t.:

The VI is linear if .

In a complementarity problem, the feasible set is required to be a cone. Define to be -complementary if , , and . Then the complementarity problem asks us to find a -complementary pair :

###### Definition 1.2 (Complementarity problem)

Given a nonempty closed convex cone and an operator , the complementarity problem CP is to find s.t.:

The CP is linear if .

A common choice is to take to be the nonnegative orthant; in this case complementarity means that, for all , at least one of and is zero.

### 1.2 Relationships

Variational inequalities and complementarity problems are strongly related. First, if the feasible set of a variational inequality is a cone , then VI is equivalent to CP; that is, both problems have the same set of solutions. Second, if the feasible set is the intersection of a cone with some equality constraints , then we can eliminate the equality constraints using Lagrange multipliers: VI is equivalent to VI, where and

Since we can represent any convex set as the intersection of a cone with equality constraints, the above relationships mean that we can transform any VI to a CP and vice versa.

For computational purposes, it is often convenient to transform our problem so that the feasible set is a very simple cone. For example, for the polyhedron

, we introduce a vector of nonnegative slack variables

and write , with free and . After eliminating the equality constraints with Lagrange multipliers, our feasible set is the cone . This cone is*separable*: it is the product of one-dimensional cones. The advantage of this sort of transformation is that it is extremely efficient to work with separable cones: e.g., Euclidean projection onto just means thresholding each component of at .

### 1.3 Complexity

If we assume that the feasible set is a separable cone, then the
computational complexity of a variational inequality or
complementarity problem depends on the operator . Even if is
restricted to be linear, it is possible to encode NP-hard problems.
An important class that ensures polynomial-time solvability is the
class of *monotone* operators:

###### Definition 1.3

An operator is monotone on the set if, for some ,

It is strongly monotone if the above holds with .

A linear operator is monotone iff is positive semidefinite, and strongly monotone iff is positive definite. (There is no requirement for to be symmetric.)

Among Lipschitz operators (which include all finite-dimensional linear operators), strong monotonicity implies a useful contraction property:

###### Lemma 1.1

If the operator is -strongly monotone and -Lipschitz on some set , and if we take , then the operator is Lipschitz on with constant .

Proof: For any ,

Choosing to minimize the RHS, we have ; the square of the Lipschitz constant is then

as claimed.

This contraction property means that a very simple algorithm, the
*projection method*, converges linearly to the solution of our
variational inequality, as we will see in the next section.

## 2 Projection method

In a variational inequality, the condition that is in the normal cone to at is equivalent to

(1) |

where is any nonnegative step size and represents Euclidean projection onto the set . That is, if we take a step in the direction and project back onto , we don’t move. More formally:

###### Lemma 2.1

If for some step , then .

(We omit the proof.)

The projection method simply treats (1) as an assignment: start from an arbitrary , then for , set

(2) |

(We could also choose a separate step size for each step , but for simplicity we ignore this possibility.)

The projection method converges linearly for strongly-monotone , since its update operator is a contraction:

###### Lemma 2.2

In the projection method, suppose the operator is -Lipschitz. Then the variational inequality (1) has a unique solution . Furthermore, for any , if we take steps of the projection method, we will have .

Proof: The update operator is -Lipschitz, since it is the composition of a projection operator (which is -Lipschitz) with . Existence and uniqueness of the solution follow from the contraction mapping theorem; for the error bound,

That is, the distance between and reduces by a factor with each application of . Substituting the assumed value for now yields the desired bound.

## 3 Galerkin method—I

If the dimension of is very large, each iteration of the projection method may be too slow to be practical. The problem is exacerbated if is close to 1. In this case, it makes sense to search for a Galerkin approximation, i.e., an approximate solution within the span of some basis matrix . The hope is that we can design an approximate projection method that will find such a solution much more cheaply than the raw projection method.

Bertsekas [1] proposed one such Galerkin approximation: assume that is nonempty, and define the approximation as the fixed point of

(3) |

Compared to (2), we simply project onto instead of at each iteration. Note that (3) also defines an algorithm for computing the Galerkin approximation, although not the only possible one.

If the dimension of is , but the rank of is , we may be able to implement the iteration (3) cheaply: e.g., for a linear variational inequality, we can precompute and , and work in -dimensional space instead of -dimensional space. The only caveat is projection onto ; depending on the forms of and , this projection may or may not be possible to implement efficiently.

Essentially the same convergence rate analysis holds for the Bertsekas-Galerkin projection method as for the original projection method:

###### Lemma 3.1

Suppose is -Lipschitz and is nonempty. Then the iteration (3) has a unique fixed point, say ; and, we need at most iterations to achieve error .

Proof: Identical to the proof of Lemma 2.2.

In addition, we can prove a bound on the distance between the approximate solution and the original solution :

###### Lemma 3.2

For the Bertsekas-Galerkin method, the error between the approximate solution and the true solution satisfies:

(4) |

Proof: Note that , and . So, , and

Applying repeatedly to both and , and using the fact that is -Lipschitz, we have

So, after iterations, we have

where the last line follows by summing a geometric series. Since this inequality holds for all , it holds in the limit as , yielding (4).

The bound (4) depends on , a measure of our *representation error*—that
is, how closely we can approximate in our low-dimensional
representation.

Qualitatively, compared to the exact projection method, the Bertsekas-Galerkin method tightens the feasible region and relaxes the optimality condition: both the iterates and the approximate solution will be exactly feasible. (In fact, they will be in .) But, at convergence, the step direction will not necessarily be in a normal cone of , as it would in the exact solution. Instead, Lemma 2.1 means that the step direction will be in the normal cone to at .

Another way to interpret this approximate optimality criterion is that, since is the interection of two sets, the normal cone to is the sum of their two normal cones, . In other words, for some residual vector , we have that is in the normal cone to at .

## 4 Galerkin method—II

While the error bound (4) is nice to have, it is less than perfectly satisfying: the representation error measure may be quite large, even if passes close to . See Fig. 1 for an example. An extreme version of this problem is that might not even intersect the feasible set .

To remedy this problem, in this section we propose a simple new Galerkin method, and bound the new method’s convergence rate and error. Our new method’s bounds will depend on a different measure of representation error, based on projection onto instead of . Since , we expect this change to lead to an improvement in the bound: for any given vector , the error will be no larger, and sometimes much smaller, than the error .

To enable projection onto instead of , we modify the algorithm as described below; in particular, we project the proposed update instead of the proposed solution . When is small, we have , so that . For larger it is possible that this inequality could be reversed; but in our (limited) experience, even for larger , the new method’s bounds are at least as tight as those of the Bertsekas-Galerkin method, and often much tighter.

In more detail, our approximation is based on the following re-arrangement of the projection method: pick or arbitrarily, and for all , repeat:

(5) | ||||

(6) |

It is easy to see that the sequence computed by this iteration is the same as the one computed by (2), so long as we start from the same value of .

Now we insert a projection operator into (6), and define our new Galerkin approximation as the fixed point of the resulting iteration:

(7) | ||||

(8) |

(As before, fixed-point iteration is only one way to compute the approximation; we will discuss others below.)

Right away we can see a difference between the new method and the old one: instead of projecting onto the intersection of and , we project onto each of and at different points in the update. So, for example, it is not even necessary that be nonempty.

Another important difference is what is required for efficient implementation of the iteration (7–8). For example, suppose that is the nonnegative orthant and is an arbitrary basis matrix. The Bertsekas-Galerkin method requires us to project onto , a quadratic program in dimensions. The new method requires only separate projections onto and , both of which are much faster than solving a quadratic program: the former is componentwise thresholding, while the latter is a rank- linear operator which we can precompute. (For example, if

is dense, a good strategy might be to precompute its QR decomposition; then we can threshold and project in total time

with a very low constant.)As before, the entire update is -Lipschitz (being the composition of with some projections, each of which is -Lipschitz). We therefore have:

###### Lemma 4.1

Proof: Identical to the proof of Lemma 2.2.

Write and for the true solution to our variational inequality, that is, the fixed point of (5–6). We can bound the distance between and :

###### Lemma 4.2

Proof: We can follow almost the same proof strategy as in Lemma 3.2: the update operator for ,

is -Lipschitz as discussed above, so to get our error bound we just need to bound the change in a single update starting from . Since ,

Just as before, we can now apply the update operator repeatedly to get bounds on the difference between successive iterates, and sum these bounds to show the desired result.

To bound the error in , note and , and use the fact that is -Lipschitz.

As promised, comparing to (4), the form of the new bound is identical except that we measure representation error as rather than .

Qualitatively, both the iterates and the approximate solution will be exactly feasible (i.e., , ), just as for the Bertsekas-Galerkin method. The approximate optimality criterion is also similar: for some residual vector , we have that is in the normal cone to at (see Lemma 4.3). So, the most important difference between the two methods is that the new method searches over a larger feasible region ( instead of ).

## 5 Projective LCPs

The projection method described above is by no means the only way to find the Galerkin approximation (7–8). If the operator is linear, another attractive algorithm is the following interior point method.

Let , and suppose that the feasible set is a separable cone , so that we are solving the two equivalent problems VI and CP. Saying that is a fixed point of (7–8) is equivalent to

(11) |

where we have defined

(12) |

(The first line substitutes into (7–8); the second line distributes and then adds and subtracts ; the third substitutes the definitions of and .)

But now we can recognize (11) as the optimality condition for a new problem, CP. So, we can use any appropriate linear complementarity problem solver to solve CP. Furthermore, if the original matrix problem was strongly monotone, then the new problem is strongly monotone too:

###### Lemma 5.1

Proof: First note that is -Lipschitz, where is defined in Lemma 1.1: , and the RHS is the composition of a projection (which is -Lipschitz) with (which is -Lipschitz). Now pick any . We will show , which implies that is positive definite:

The first line holds since is -Lipschitz. The second line expands the squares. The third line rearranges and collects terms. The last line follows since the LHS of line 3 is strictly positive: is strictly positive since and , and .

The dimension of the new matrix is the same as that of the original matrix , so it is not clear that we are making progress by expressing our LCP this way. However, it turns out that we can solve CP much more quickly than a general LCP of the same dimension, due to the special structure of .

In particular, is *projective* in the sense
of [3]. So, as shown in that paper, we can run the
Unified Interior Point (UIP) method of Kojima et
al. [4] very quickly: if and
, then each iteration of the UIP method takes
time , only linear in . (By contrast, an iteration of UIP
ordinarily requires solving an system of equations, usually
much more expensive if .) Furthermore, as shown
in [4], the number of iterations required is
polynomial for any monotone LCP.

## References

- [1] Dimitri P. Bertsekas. Projected equations, variational inequalities, and temporal difference methods. Technical Report LIDS-P-2808, MIT Lab for Information and Decision Sciences, 2009.
- [2] Francisco Facchinei and Jong-Shi Pang. Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer Series in Operations Research and Financial Engineering. Springer, 2003.
- [3] Geoffrey J. Gordon. Fast solutions to projective monotone linear complementarity problems. Technical Report arXiv:1212.6958, 2012.
- [4] Masakazu Kojima, Nimrod Megiddo, Toshihito Noma, and Akiko Yoshise. A Unified Approach to Interior Point Algorithms for Linear Complementarity Problems. Lecture Notes in Computer Science. Springer, 1991.