Lagrange Multipliers and Rayleigh Quotient Iteration in Constrained Type Equations

We generalize the Rayleigh quotient iteration to a class of functions called vector Lagrangians. The Rayleigh quotient is an expression used in literature as an estimate of the Lagrange multiplier in constrained optimization. We discuss two methods of solving the updating equation associated with the iteration. One method leads to a generalization of Riemannian Newton method for embedded manifolds in a Euclidean space while the other leads to a generalization of the classical Rayleigh quotient formula and its invariant subspace extension. We also show how to apply second order iteration in this context to obtain cubic convergence. We discuss applications of this result to linear and nonlinear eigenvalue and subspace problems as well as potential applications in optimization.

Authors

• 7 publications
• Nonsmooth Optimization over Stiefel Manifold: Riemannian Subgradient Methods

Nonsmooth Riemannian optimization is a still under explored subfield of ...
11/12/2019 ∙ by Xiao Li, et al. ∙ 0

• Stochastic Subspace Cubic Newton Method

In this paper, we propose a new randomized second-order optimization alg...
02/21/2020 ∙ by Filip Hanzely, et al. ∙ 0

• Constrained Least Squares for Extended Complex Factor Analysis

For subspace estimation with an unknown colored noise, Factor Analysis (...

• Implicit algorithms for eigenvector nonlinearities

We study and derive algorithms for nonlinear eigenvalue problems, where ...
02/28/2020 ∙ by Elias Jarlebring, et al. ∙ 0

• Generalization of the Secant Method for Nonlinear Equations (extended version)

The secant method is a very effective numerical procedure used for solvi...
12/08/2020 ∙ by Avram Sidi, et al. ∙ 0

• Competitive Mirror Descent

Constrained competitive optimization involves multiple agents trying to ...
06/17/2020 ∙ by Florian Schäfer, et al. ∙ 0

• A framework for second order eigenvector centralities and clustering coefficients

We propose and analyse a general tensor-based framework for incorporatin...
10/28/2019 ∙ by Francesca Arrigo, et al. ∙ 0

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Consider three Euclidean spaces with . We consider a map from into and a map from to . The direct sum :

 L(xλ)=(L(x,λ)C(x)) (1)

is a map from to . When the Jacobian of is invertible in a domain of near a root of , and have Jacobians both of full row rank. In that situation we will call a vector Lagrangian and a constraint. The equation

 L(x,λ)=0 (2)

covers systems of equations where a number of equations in the system are dependent on a group of variables while the remaining equations involve all variables . The remaining variables are named in honor of Lagrange. The full row rank assumption for ensures that is solvable in (but not always explicitly). For a constrained optimization problem a Lagrangian is a scalar function in the literature and what we call vector Lagrangian here is its differential. Since our focus will be on vector Lagrangian in this article we will drop the qualifier vector for the remain of the article.

Of particular interest is the case when is affine in . Let be a vector function from an open set in to . For each , assume is a linear function from to . Then

 L(x,λ)=F(x)−H(x)λ (3)

is a Lagrangian if it has continuous differential and is of full row rank. We will call this special form explicit Lagrangian.

On the constraint side, from our full row rank assumption defines a submanifold of codimension .

This set up covers three classes of equations encountered in the literature:

• The eigenvector/invariant subspace problem:

 F=Ax;H(x)=x (4)
 C(x)=12(xTx−IEL)

where is a matrix, is a matrix. In this case and are both matrices and is the space of symmetric matrices. The case where is the eigenvector problem.

• The constraint optimization problem, one of the most important problems in applied mathematics. Here where is a real value function and . This is the case of the classical Lagrangian multiplier equations. is the domain where is defined and is the target space of the restrictions on . The system (Eq. 1) gives us the set of critical points.

• The nonlinear eigenvalue problem:

 P(λ)x=0 (5)

Here is a matrix with polynomial entries in . While this is not in the form (Eq. 1) we can impose the constraint (or alternatively for a fixed vector ) This equation is not of explicit form. is of dimension one and is a scalar. and are where is the dimension of . There is an extensive literature for this problem ([10]) and Newton-Raphson is an important algorithm.

An iteration method called Rayleigh quotient iteration (RQI) is among the most powerful methods to compute eigenvalues and vectors. For a vector the Rayleigh quotient is

 vTAvvTv

and the iteration computes

 vi+1=(A−λ)−1vi||(A−λ)−1vi||

which is shown to have cubic converge in for almost all initial points if is normal and quadratic otherwise on suitable initial points. ([1]) gives an extension of Rayleigh quotient iteration for the invariant subspace problem where is a -matrix and is a -matrix.

The letter

used in the second example in honor of Lagrange is also often used to denote an eigenvalue. The fact that RQI is related to Lagrange multipliers and Newton-Raphson has been known for a long time but it is probably fair to say the relationship is not yet transparent. We will try to clarify the relationship, namely applying Newton-Raphson iteration to (

Eq. 1), the updating equation for involves solving a linear equation similar to . The iterative series of will have form similar to Rayleigh quotient. When the Lagrangian is of explicit form we can use that form to estimate , resulting in a variation of Newton-Raphson extending RQI. We also obtain an iteration when the Lagrangian is implicit.

In practice,

could be spaces of matrices or tensors.

is then a tensor, and so are the Jacobians of and . On the theoretical side we will consider the space involves as vector, leaving the tensor related treatment to specific implementations.

2 Newton-Raphson applying to eigenvector problem

To explain the ideas involved here we look at the eigenvector problem in details. Here

 L(x,λ)=Ax−xλ

and constraint is The Jacobian of the total Lagrangian is

 JL=(Lx−xxT0)

where . Applying the Schur complement formula we can evaluate the Newton step:

 (ηδ)=J−1L(−Ax+xλ−12(xTx−1))

and get (with and )

 λi+1−λi=δ=(xTL−1xx)−1(xTL−1x(Ax−xλ))−(xTL−1xx)−112(xTx−1)
 xi+1−xi=η=−L−1x(Ax−xλ)+L−1xxδ

So with we simplify the updating equations to:

 δ=λi+1−λi=(2xTζ)−1(1+xTx)
 η=−x+ζδ=−x+ζ(2xTζ)−1(1+xTx)

From here

 xi+1=ζ(2xTζ)−1(1+xTixi)

We see is proportional to , a result known from classical Rayleigh iteration, and the equation for is exactly the Rayleigh equation. However the formula for is iterative. Starting with the general equation for the explicit system:

 L(x,λ)=F(x)−H(x)λ=0

From the full rank assumption has a left inverse (for example ). We can solve for :

 λ=H−F(x) (6)

In the eigenvector case this is exactly the Rayleigh quotient. This expression appeared very early in the multiplier method literature as discussed below.

We will show if we modify the Newton-Raphson process to use this expression to solve (Eq. 1), we get a generalization of the Rayleigh quotient iteration that also have quadratic convergent rate in general. We also provide a formula generalizing the Chebyshev iteration. We will show a similar process exists for implicit equation, in particular for nonlinear eigenvalue problem and provide both the RQI and Rayleigh-Chebyshev results, in particular recover the cubic convergent algorithm of Schwetlick and Schreiber ([10], [17] [18]

Many problems in numerical analysis and optimization problems could be reduced to the form (Eq. 2). We will apply the results of this paper to a number of situations. In most cases the constraints are at most quadratic, and in many cases the function is also at most quadratic. In those cases, second order methods would have simple form which we will try to exploit. In particular, we have a new cubic algorithm for eigenvectors of nonnormal matrices.

In future research we hope to study inequality constraints and applications of this method to optimization problem.

In the next few sections we discuss a few set ups needed to state the main results.

3 Higher derivatives as tensors

The reader can consult [3] for this section. We use slightly different notations in this paper. Recall we can use tensors to denote linear maps between two vector spaces each represented as tensor. If the domain space is of shape and the range space is of shape then a map between them could be represented as a tensor of shape . The map sending a tensor to the tensor formed by contracting to the right is the linear map represented by .

If is a (may be nonlinear) map between two vector spaces and then its Jacobian is a map in of linear maps between and and is represented by a -matrix.

A second derivative is a linear map between and , an element of and can be represented as tensor. We will denote this map as well as this tensor as . In general, we will denote the derivatives as and this is an element of (with copies of and one copy of . We can represent it as a tensor of size .

For vectors consider the tuple

 [η1,η2,…,ηl]

we can define

 T[η1η2…ηl]=(⋯((Tη1)η2)…)ηl

which is the repeated contraction of . We write

 η[l]=[η,η,⋯,η]

times. With this notation we can write the Taylor series expansion up to order around as:

 F(v)+JF(v)(h)+12JF(v)(h[2])+⋯1l!J(l)F(v)(h[l])

where .

To summarize, there are two maps related to higher derivatives. The map from to is generally nonlinear resulting in a tensor. For a fixed , that tensor gives a multilinear map acting on the tangent space which is embedded in , sending to . In code, we need two functions for these two maps. The second map is in general just tensor contraction however depending on problem a custom implementation may be useful.

4 Retractions

Consider a submanifold of . Recall the definitions of retractions from [2]:

• A first order retraction is a map from to around a point if there exists a neighborhood of in such that:

1. and the restriction is of class .

2. for all

• A second order retraction on is a first order retraction on that satisfies for all ,

 d2dt2R(x,tu)|t=0∈NM(x) (7)

is the normal space at . The exponent map is a second order retraction. It is shown in that paper that projection and orthographic projections are second order retractions. The following is clear:

Proposition 1

If is a retraction on then is a retraction on , if is a retraction of second order then is a retraction of second order.

From this proposition and the result of [2]), we can retract intermediate iteration points to , as a result can be made elements of while is unchanged. Following the common literature, we call a Newton-Raphson iteration on with no retraction to infeasible start iteration. When there is retraction to we have feasible start iteration. We note Lagrange multiplier with feasible improvement has already been studied been studied, e.g. in [8].

5 Newton-Raphson iteration of Lagrangians

We first examine the Newton-Raphson iteration. The Jacobian of is

 (8)

We will drop and from time to time to save space in the expressions. Newton-Raphson iteration in the framework of constraint optimization has been studied by many authors in the literature, including [7], [8], [9], [16], [19]. We will make the connection between with the eigenvector equation more explicit, and show converges to the Rayleigh quotient expression. We transform the updating equations to a format closer to one derived from Riemannian Newton optimization which motivates the RQI in the next section.

To invert this Jacobian without inverting the whole matrix, we describe two approaches. One approach would focus on first parametrize using the equation . If the tangent space has an explicit description this would help, otherwise it is parametrized by the null space of ( where is a basis of the null space and ’s are the new parameters). This will help reduce the number of variables to . Substitute back to the first equation we get a system of equations and variables ( from and from ) which we can solve. This solves the problem in the most general case, including the case where is not invertible and is closely related to the Riemannian Newton approach on the embedded manifold defined by . We present a second approach, already investigated in [[16]], [19]] where we assume is invertible. This approach reveals some interesting relations mentioned above. As we use the Schur complement formula in this second approach, we will call this solution the Schur form. The approach to parametrize first is called the tangent form. In a sense, the classical Rayleigh iteration and its extension in [1] is a Schur form solution for a modified Newton-Raphson equation.

The Schur complement with respect to the top block is evaluated at and the inverse of the Jacobian applied on is

 J−1L(ab)=(L−1xa−L−1xLλ[(JCL−1xLλ)−1JCL−1xa−(JCL−1xLλ)−1b](JCL−1xLλ)−1JCL−1xa−(JCL−1xLλ)−1b) (9)

evaluated at . With and the Newton step is with

 δ=−(JCL−1xLλ)−1JCL−1xL(x)+(JCL−1xLλ)−1C(x) (10)
 η=−L−1xL(x)−L−1xLλδ (11)

On a feasible starting point and we thus have:

 δ=−(JCL−1xLλ)−1JCL−1xL(x) (12)

We also note from the second row block of the system in that case. For nonlinear eigenvalue problem the above process is the Nonlinear inverse iteration in the literature ([12], [10]). We will review this in Section 7.4. We note that Schur complement is widely use in equality constraint optimization, for example see chapter 10 of [6] where we see essentially the above calculation. We summarize the steps in algorithm Algorithm 1.

When is explicit, and are two linear maps:

 Lxη=JF(x)η−JH(x)ηλ
 Lλδ=−Hδ

actually has a simpler form. Write it as

 ∑b∑cJabcηcλb=∑c(∑bλbJabc)ηc

we see it is . Here, means transposing with respect to the first two indices.

 Lx(x,λ)η=(JF−λJT[12]H)η (13)

is a generalization of the operator of eigenvalue problem. We do not need the full inverse of in general, but will need to solve for for some matrix in each iteration. We collect all the result thus far in theorem (Theorem 1)

Theorem 1

The Newton-Raphson iteration equations for are

 λi+1−λi=δ=(JCL−1xLλ)−1[C(X)−JCL−1xL(x)] (14)
 xi+1−xi=η=−L−1x(L+Lλδ) (15)

If then we have

 JCη=0

In the explicit case we have

 Lλ(x,λ)=−H(x)
 Lx=JF−λJT[12]H
 λi+1=γ=(JCL−1xH)−1[−C(X)+JCL−1xF(x)] (16)
 xi+1−xi=η=L−1x(−F(x)+H(x)γ) (17)

Proof

We already proved the implicit case. The explicit case is a direct substitution of in to (Eq. 10).

We note the is updated first, then is used in equation for :

 Lxη=−F(x)+H(x)γ (18)

While we have noted before is a generalization of the eigenvalue operator, the right hand side of this equation is different from that of Rayleigh quotient. To compute and we compute

 ζ=Lx(xi,λi)−1H(xi)
 ν=Lx(xi,λi)−1F(xi)

In the eigenvector case and are related:

 ν=x+ζλ

and we have seen the calculation get simplified. In the general case we need to compute both and . The expression of is iterative.

Let us now focus on getting a different expression on . From (Eq. 17)

 H(x)γ=Lxη+F(x)

As before let be a left invert to , we solve for :

 γ=H−[Lxη+F(x)] (19)

We see as converges to zero converges to

 γ=H−F(x)

as noted before. may be of a more general form than , for example we can replace with any map such that is invertible. This general set up is relevant when we discuss the two-sided Rayleigh quotient.

We note that the papers [7], [8] found the same expression for the general Rayleigh quotient as an estimate of the Lagrange multiplier, together with a related quasi-Newton method. The special case of , appeared much earlier in the literature (for example [11]). According to [19] it is difficult to put a name on it without an extensive search of the literature. Lagrange multiplier approximators are widely used in constrained optimization.

Let . This is a projection. If then it is a projection to the tangent space of at , but in general it is not. Substitute the expression (Eq. 19) in (Eq. 18) and moving term to one side

 ΠHLxη=−ΠHF(x) (20)

This system of equations for is of similar format to the Riemannian Newton equations. However, it is dependent on which needs to be solved recursively. It is interesting that it involves only here, and not . In the implicit case, assuming is a left inverse of we have with

 ΠLλLxη=−ΠLλL

Thus far the analysis is mostly algebraic, using the Schur complement formula and the special form of .

For second order iteration we note the formula for Chebyshev iteration (we choose Chebyshev over Halley to avoid another operator inversion) for high dimension is given by

 vi+1=vi−J−1LL−12J(2)L((J−1LL)[2]) (21)

As explain before with and , is an element of and is an element of and the last term is a contraction of the tensor twice on . We note that

 J(2)L((ηδ)[2])=(J(2)F(η[2])−J(2)H(η[2])λ−2JH(η)δJ(2)C(η[2]))

We proceed to use Schur complement to evaluate the second order term to arrive at Chebyshev iteration in algorithm Algorithm 3.

While this looks relatively complex, when , and are at most quadratic the algorithm may be useful. In particular, we have a cubic convergent algorithm for eigenvectors, even when the matrix is not normal: the only nonzero terms are and in the Chebyshev expression.

When is a vector and is represented as a matrix, is a matrix and can be represented as a square matrix, so the calculation is simple. When is a matrix, is could be a higher order tensor, so and in general are complex tensors. The main difficulty of this method is in evaluating these tensors and the inverse of the Schur complement .

6 Rayleigh Quotient Iteration

Motivated by [1] (we learned about [7], [8] late in our research) and the above analysis on Lagrange multipliers, with

 λ=R(x):=H−F(x)

in the expression for , it is plausible that the system:

 ΠHLxη= −ΠHF(x) (22) JCη= 0

would provide a generalization of RQI for to vector Lagrangians. Using augmented Lagrangian technique, [8] proposed a quasi-Newton method with this expression of as an estimate for the Lagrange multiplier. He showed that it converges superlinearly in general. We prove our particular process has quadratic convergence and extend our result to to the Chebyshev case.

Similar to the Newton-Raphson case, if is invertible we have a solution to this system providing a procedure generalizing the classical Rayleigh quotient. In fact, let and we see

 η=ζ(JCζ)−1JCν−ν (23)

satisfies the above equation by direct calculation. As before we call it the Schur form solution. For the eigen problem this provides exactly the classical algorithm. Likewise, we will see later on the invariant subspace RQI of [1] is the Schur form of an equivariant constrained problem on Grassmann manifolds.

We note is now solely dependent on via Rayleigh quotient expression for . We can reparametrize to express as a map from the tangent space of the constraint set to the image of , both of the same dimension . For the case where and ,

 ΠH(x)=IEout−(JCJTC)−1JCJTC
 Lxη=JFη−(JCJTC)−1JCF(JT[12]H)η

( is a just a collection of Hessians for each constraint equation of ) and the equation reduces to the emmbedded Riemannian Newton updating equation. When the last expression is just the expression for the projected Hessian of . We have the following theorem:

Theorem 2

Assuming that having continuous derivatives to degree , and in a neigborhood of a solution to the equation the map from the tangent space to to the image of is invertible satisfying

 ||ΠH(xi)Lxψ||≥C||ψ|| (24)

for any in for some constant . If is a first order retraction then for a starting point close enough to , the system (Eq. 22) provides an update to an iteration process which converges with quadratic rate to a solution of (Eq. 3).

Assuming (Eq. 24), having continuous derivatives to degree and have continuous derivatives to degree . If is a second order retraction, then for a starting point close enough to the Rayleigh-Chebyshev iteration with update

 xi+1=r(xi,η∗+T(x)(η[2]∗))

where and are defined in the following equations

 ΠHLxη∗=−ΠHF(x)ΠHLxT(x)(η[2]∗)=−12J(2)F(η[2]∗)−JH(η∗)JR(η∗)+12J(2)H(η[2]∗)R(xi)JC(η∗+T(x)(η[2]∗))=0 (25)

converges cubically to .

If the Rayleigh quotient iteration converges cubically up to degree .

If is invertible the Schur form solution exists

 ν=L−1xF(x)ζ=L−1xH(x)η=−ν+ζ(JCζ)−1JCνη∗=−ντ∗=L−1x{−12J(2)F(η[2]∗)−JH(η∗)(JR(η∗)+12J(2)H(η[2]∗)R(xi)}τ=τ∗−ζ(JCζ)−1JC(τ∗) (26)

Proof

Both algorithms require similar estimates. For optimization problem it is just the Riemannian Newton iteration, so this theorem could be considered an extension to vector Lagrangians. We focus on Rayleigh-Chebyshev as regular Rayleigh is the easier version and we will prove it again in the implicit case below. Let be a solution of (Eq. 3). As this is a variant of Newton-Raphson and Chebyshev, implicitly we look at Taylor series expansion of with modifications to make sure the steps belong to the tangent space () and note will remove some extra terms. When the Lagrangian is explicit, the Taylor series expansion with term gets cut off early and the second order term would include second derivative with respect to , as well as the cross term of and . This is the famous cross term in Rayleigh quotient literature linking the estimate of eigenvalues to eigenvector and is responsible for the quadratic convergence rate (see [15]). We have:

 H(xi)R(xi)−H(v)R(v)=−(H(xi)−H(v)(R(xi)−R(v))+ (27) H(xi)(R(xi)−R(v))+(H(xi)−H(v))R(xi)

This is purely algebraic (Taylor series of the two variable vector function ). The first term will converge quadratically while the other two are linear and will be made disappear with an appropriate iteration. We also note the first term is cubic if converges quadratically. Also :

 L(x,R(x))=F(x)−H(x)R(x)=ΠH(x)F(x) (28)

So if an iteration step is to cancel out , then also cancel out the same term as . As sends terms of form to zero, we get a simpler equation to solve.

With these two observations, using the notation of higher derivatives as tensor discussed previously we have the Taylor series expansion

 F(v)=F(xi)+JF(xi)(v−xi)+12J(2)F(xi)(v−xi)[2]+O(||v−xi||)3

From here

 −F(xi)=−F(v)+JF(xi)(v−xi)+12J(2)F(v−xi)2+O(||v−xi||)3

Replace on the right hand side with and add to both sides :

 12J(2)F(v−xi)2+O(||v−xi||3)

Using (Eq. 27) and (Eq. 28), with

 −ΠHF(xi)=−ΠH(H(xi)−H(v))(R(xi)−R(v))+ΠH(H(xi)−H(v))R(xi)+ ΠHJF(xi)(v−xi)+12ΠHJ(2)F(v−xi)2+O(||v−xi||3)

We have

 0=ΠHF(xi)−ΠH(H(xi+1)−H(v)+(H(xi)−H(xi+1))(R(xi+1)−R(v)+R(xi)−R(xi+1))+ ΠH(H(xi+1)−H(v)+H(xi)−H(xi+1))R(xi)+ ΠHJF(xi)(v−xi+1+τ)+12ΠHJ(2)F(v−xi+1+τ)2+O(||v−xi||3)

Replacing:

 H(xi+1)−H(xi)=JHτ+12J[2]H(τ[2])+O(τ3)
 R(xi+1)−R(xi)=JHτ+O(τ2)

in the above expression, expanding and separate the terms on the right hand side to two groups, one with terms containing a factor of form