# Optimal Subspace Expansion for Matrix Eigenvalue Problems

In this paper, we consider the optimal subspace expansion problem for the matrix eigenvalue problem Ax=λx: Which vector w in the current subspace V, after being multiplied by A, provides an optimal subspace expansion for approximating a desired eigenvector x in the sense that x has the smallest angle with the expanded subspace V_w=V+span{Aw}? Our research motivation is that many iterative methods construct nested subspaces that successively expand V to V_w. Ye (Linear Algebra Appl., 428 (2008), pp. 911–918) studies the maximization characterization of the cosine of the angle between x and V_w but does not obtain the maximizer. He shows how to approximately maximize the cosine so as to find approximate solutions of the subspace expansion problem for A Hermitian. However, his approach and analysis cannot extend to the non-Hermitian case. We study the optimal expansion problem in the general case and derive explicit expressions of the optimal expansion vector w_opt. By a careful analysis of the theoretical results, we obtain computable nearly optimal choices of w_opt for the standard, harmonic and refined (harmonic) Rayleigh–Ritz methods.


## 1 Introduction

Consider the large scale matrix eigenproblem $Ax=\lambda x$ with $A\in\mathbb{C}^{n\times n}$ and $\|x\|=1$, where $\|\cdot\|$ is the 2-norm of a vector or matrix. We are interested in a simple eigenvalue $\lambda$ of $A$, which is either the largest in magnitude or closest to a given target $\sigma$, and/or the associated unit-length eigenvector $x$. A number of numerical methods are available for solving this kind of problem [1, 11, 12, 19, 17]. Arnoldi type methods [1, 12, 17] are well known methods for this purpose, and they are projection methods on the Krylov subspace

$$\mathcal{V}_m=\mathcal{K}_m(A,v_1)={\rm span}\{v_1,Av_1,\ldots,A^{m-1}v_1\}$$

and compute approximations to $\lambda$ and $x$. For $A$ Hermitian, Arnoldi type methods reduce to Lanczos type methods [1, 11]. Suppose that $V_m=[v_1,v_2,\ldots,v_m]$ is column orthonormal and is generated by the Lanczos process in the Hermitian case or the Arnoldi process in the non-Hermitian case. The next basis vector $v_{m+1}$ is obtained by orthogonalizing $Av_m$ against $V_m$, and the projection matrix $V_m^HAV_m$ of $A$ onto $\mathcal{V}_m$ is Hermitian tridiagonal or upper Hessenberg, where the superscript $H$ denotes the conjugate transpose of a vector or matrix.
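The Arnoldi process just described can be sketched in a few lines of NumPy. This is our own illustrative implementation (the function name and breakdown tolerance are our choices, not from the paper): it builds an orthonormal Krylov basis and the Hessenberg projection matrix by orthogonalizing each new direction against the current basis.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Arnoldi process: orthonormal basis V of K_{m+1}(A, v1) and the
    (m+1) x m upper Hessenberg matrix H with A @ V[:, :m] = V @ H."""
    n = A.shape[0]
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        # orthogonalize the new direction A v_j against the current basis
        for i in range(j + 1):
            H[i, j] = np.vdot(V[:, i], w)
            w -= H[i, j] * V[:, i]
        h = np.linalg.norm(w)
        if h < 1e-12:                  # breakdown: an invariant subspace was found
            return V[:, :j + 1], H[:j + 1, :j]
        H[j + 1, j] = h
        V[:, j + 1] = w / h
    return V, H
```

In exact arithmetic the computed quantities satisfy the Arnoldi relation $AV_m = V_{m+1}\underline{H}_m$, which the sketch reproduces up to roundoff.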

The success of a general projection method requires that the subspace $\mathcal{V}$, which is not necessarily a Krylov subspace, contains accurate enough approximations to $x$; but a sufficiently accurate $\mathcal{V}$ cannot guarantee the convergence of Ritz vectors or harmonic Ritz vectors obtained by the standard or harmonic Rayleigh–Ritz method [3, 8]. To correct the possible irregular convergence and non-convergence, the refined and refined harmonic Rayleigh–Ritz methods have been proposed, which are shown to ensure the convergence of refined and refined harmonic Ritz vectors when $\mathcal{V}$ is good enough [4, 5, 8, 17, 20].

Our objective is, for a given subspace $\mathcal{V}$ that does not contain sufficiently accurate approximations to the desired eigenvector $x$, to best expand $\mathcal{V}$ to $\mathcal{V}_w$ by adding $Aw$ to $\mathcal{V}$ with $w\in\mathcal{V}$, such that $\mathcal{V}_w$ contains approximations to $x$ as accurate as possible. We comment that if $\mathcal{V}$ is already an invariant subspace of $A$ or contains $x$, then the subspace expansion naturally terminates. We will exclude these two lucky and rare events throughout the paper. As we have seen, starting with the $m$-dimensional Krylov subspace $\mathcal{V}_m=\mathcal{K}_m(A,v_1)$, if $w=v_m$, we are led to $\mathcal{V}_w=\mathcal{K}_{m+1}(A,v_1)$. In fact, provided that $w\in\mathcal{V}_m$ is not deficient in $v_m$, it can be justified that $\mathcal{V}_w=\mathcal{K}_{m+1}(A,v_1)$. More generally, if we replace $A$ by $A-\tau I$ with $\tau$ an arbitrary constant, we still have $\mathcal{V}_w=\mathcal{K}_{m+1}(A,v_1)$. We will rigorously prove this assertion later.

However, the situation becomes different and complicated when the matrix-vector product $Aw$ is computed inaccurately, that is, when perturbations or errors are allowed in the process. In this case, $\mathcal{V}_w$ is not a Krylov subspace any longer, and the projection matrix is generally full and loses the Hermitian tridiagonal or Hessenberg structure for $A$ Hermitian or non-Hermitian. Therefore, the choice of $w$ becomes essential to subspace expansion: different $w$ may affect the quality of $\mathcal{V}_w$ substantially. As a matter of fact, van der Vorst [18] has observed that a variant of the Arnoldi algorithm that expands the subspace using a Ritz vector rather than $v_m$ is more robust under perturbations. Specifically, when the error in $Aw$ with $w$ being the Ritz vector is not small at a certain iteration, Ritz approximations at subsequent iterations are almost unaffected, while Arnoldi type algorithms stagnate at the error level in computing $Av_m$. This variant of the Arnoldi method is called the Residual Arnoldi (RA) method [9, 10, 18]. If $w$ is a refined Ritz vector obtained by the refined Rayleigh–Ritz method, we have the refined RA method, which is justified to be more effective than the RA method [21]. We refer the reader to [22, 9, 10, 21] for more experimental phenomena and descriptions.

It is known that the Lanczos and Arnoldi type methods generally favor extreme and exterior eigenvalues but are not effective for computing interior eigenvalues and the associated eigenvectors. Suppose that the eigenvalue $\lambda$ nearest to a given target $\sigma$ and the associated eigenvector $x$ are required. Then shift-invert Arnoldi type methods are commonly used, which apply the original methods to the shift-invert matrix $(A-\sigma I)^{-1}$ and find an approximation to $x$. One now computes $v_{m+1}$ by orthogonalizing $(A-\sigma I)^{-1}v_m$ against $V_m$, whose columns form an orthonormal basis of $\mathcal{K}_m((A-\sigma I)^{-1},v_1)$, leading to $\mathcal{K}_{m+1}((A-\sigma I)^{-1},v_1)$. To achieve this, at outer iteration $m$, one has to solve the inner linear system

$$(A-\sigma I)u=v_m. \tag{1}$$

Similarly, if we replace $(A-\sigma I)^{-1}$ by $(A-\sigma I)^{-1}-\tau I$ with $\tau$ an arbitrary constant and $w\in\mathcal{V}_m$ is not deficient in $v_m$, then $\mathcal{V}_w=\mathcal{K}_{m+1}((A-\sigma I)^{-1},v_1)$.

A Shift-Invert Residual Arnoldi (SIRA) type method is an alternative to the RA type methods, proposed in [9, 10] and studied in [6, 7, 9, 10]. It expands the current subspace by solving a linear system like (1) at each iteration, where the right-hand side of (1) is replaced by the residual of an approximate eigenpair, e.g., a Ritz pair, harmonic Ritz pair or refined (harmonic) Ritz pair, which is obtained by the standard, harmonic or refined (harmonic) Rayleigh–Ritz method that projects the original $A$ rather than $(A-\sigma I)^{-1}$ onto $\mathcal{V}$. A Jacobi–Davidson (JD) type method with the fixed shift $\sigma$ [15] requires solving a correction equation at each iteration, which has been shown to be mathematically equivalent to the inner linear system in SIRA [6, Theorem 2.1]. It has been proved that the exact solutions of the inner linear systems involved in these two kinds of methods are, up to addition of a scalar multiple of $w$, of the form $(A-\sigma I)^{-1}w$ [6, 7]. This means that $\mathcal{V}_w=\mathcal{V}+{\rm span}\{(A-\sigma I)^{-1}w\}$ for these methods when inner linear systems are solved exactly starting with $w\in\mathcal{V}$.

Since $A$ is large, inner linear systems are typically solved inaccurately by iterative solvers. For a detailed theoretical analysis on the least accuracy requirement on inner iterations, i.e., the maximum errors allowed in computing $(A-\sigma I)^{-1}w$ with $w$ being the approximating Ritz vector, harmonic Ritz vector or refined (harmonic) Ritz vector, we refer the reader to [6, 7]; see also [2] for the extensions of JD type methods to singular value decomposition (SVD) computations. A remarkable conclusion drawn in these papers is that, provided that inner linear systems are solved with low or modest accuracy, the expanded subspace has an accuracy similar to that obtained by solving the inner linear systems exactly, and the resulting inexact eigensolvers use almost the same outer iterations to converge as the corresponding exact eigensolvers where inner linear systems are solved exactly. These results provide rigorous theoretical support for van der Vorst's observations in [18].

There are two fundamental changes when the expansion vector is computed inaccurately at each iteration. First, $\mathcal{V}_w$ is not a Krylov subspace any longer. Second, a remarkable distinction between SIRA or JD type methods and shift-invert Arnoldi type methods is that large perturbations in $(A-\sigma I)^{-1}w$ are now allowed in the former methods [6, 7], while the inner linear system (1) in the latter ones has to be solved with high accuracy at all iterations until the approximate eigenpairs start to converge, and afterwards its solution accuracy can be relaxed in an inversely proportional manner to the residual norms of approximate eigenpairs [13, 14, 21]. Combining these with the aforementioned results on JD and SIRA type methods, we conclude that at each iteration the subspace expansion in JD and SIRA type methods costs much less than it does in inexact shift-invert Arnoldi type methods. As a result, JD and SIRA type methods are more promising and efficient for finding an interior eigenpair of $A$ than inexact shift-invert Arnoldi type methods.

For a non-Krylov subspace $\mathcal{V}$, as we have seen, different $w$ lead to different $\mathcal{V}_w$, and the quality of $\mathcal{V}_w$ crucially depends on the choice of $w$. It is well known that the quality of a projection subspace for eigenvalue problems is measured by its angle with the desired eigenvector [11, 12, 17]. An important question naturally arises: Given a general subspace $\mathcal{V}$, which vector $w\in\mathcal{V}$, multiplied by $A$ or $(A-\sigma I)^{-1}$, is an optimal expansion vector for $\mathcal{V}$ in the sense that the desired $x$ has the smallest angle with $\mathcal{V}_w$ over all $w\in\mathcal{V}$? Since the results involve the a-priori inaccessible $x$ in practice, we are naturally concerned with a computational issue: Which vectors are computable nearly optimal expansion ones within the framework of the standard, harmonic and refined (harmonic) Rayleigh–Ritz methods? Since an optimal expansion vector is independent of its scaling, we speak of it up to scaling. Clearly, optimally expanding $\mathcal{V}$ means that $\mathcal{V}_w$ improves over $\mathcal{V}$ as much as possible, so that $Aw$ or $(A-\sigma I)^{-1}w$ makes the greatest contribution to the expanded subspace.

Since $A$ and $(A-\sigma I)^{-1}$ play the same role in the optimal subspace expansion, it suffices to consider $A$ only. All the results and analysis on $A$ trivially apply to $(A-\sigma I)^{-1}$ as well. We measure the quality of a given subspace $\mathcal{V}$ (hereafter we drop the subscript $m$) for approximating the unit-length $x$ by

$$\cos\angle(\mathcal{V},x)=\max_{z\in\mathcal{V},\,z\neq0}\cos\angle(z,x)=\max_{z\in\mathcal{V},\,z\neq0}\frac{|x^Hz|}{\|z\|\|x\|}, \tag{2}$$

where $\angle(z,x)$ denotes the acute angle between $z$ and $x$. For a nonzero $w\in\mathcal{V}$, define the expanded subspace

$$\mathcal{V}_w=\mathcal{V}+{\rm span}\{Aw\}. \tag{3}$$

The optimal subspace expansion problem is then to find a $w$ that maximizes $\cos\angle(\mathcal{V}_w,x)$ or, equivalently, minimizes $\sin\angle(\mathcal{V}_w,x)$, i.e., to find the optimal expansion vector

$$w_{\rm opt}=\arg\max_{w\in\mathcal{V},\,w\neq0}\cos\angle(\mathcal{V}_w,x)=\arg\min_{w\in\mathcal{V},\,w\neq0}\sin\angle(\mathcal{V}_w,x). \tag{4}$$
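To make the quantities in (2)–(4) concrete, here is a small NumPy sketch (our own illustration, not code from the paper) that evaluates $\cos\angle(\mathcal{V}_w,x)$ for a candidate $w\in\mathcal{V}$; a brute-force search over such candidates approximates $w_{\rm opt}$ for small examples:

```python
import numpy as np

def cos_angle(V, x):
    """cos of the angle between a vector x and the column space of V,
    i.e. the norm of the orthogonal projection of x / ||x|| onto span(V)."""
    Q, _ = np.linalg.qr(V)
    return np.linalg.norm(Q.conj().T @ x) / np.linalg.norm(x)

def cos_expanded(A, V, w, x):
    """cos angle(V_w, x) for the expanded subspace V_w = V + span{A w}."""
    return cos_angle(np.column_stack([V, A @ w]), x)
```

Since $\mathcal{V}\subset\mathcal{V}_w$, `cos_expanded` is never smaller than `cos_angle(V, x)`; the optimal expansion problem asks which $w$ makes the gain as large as possible.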

Ye in [22] considers the optimal subspace expansion problem

$$\max_{w\in\mathcal{V}}\cos\angle(\mathcal{V}_w,x) \tag{5}$$

and finds an explicit expression of $\cos\angle(\mathcal{V}_w,x)$ for a general $w$; see Theorem 1 of [22]. His result, however, provides no way or hint of how to choose a good $w$ in practice, as he has elaborated clearly. To this end, he turns to the case that $A$ is symmetric (Hermitian) and derives a maximization characterization of (5) for a given $w$, but he does not find the solution of (5): he formulates (5) as a max-max problem without obtaining an analytic maximizer. By an intuitive analysis of the maximization characterization, he suggests using the Ritz vector or refined Ritz vector as a practical choice of $w$. Unfortunately, Ye's approach and analysis cannot apply to a general $A$, and no optimal or nearly optimal expansion vectors can thus be obtained in the general case. For symmetric (Hermitian) eigenvalue problems, when the initial subspace is non-Krylov, Ye has presented experiments showing the superiority of using the Ritz vector rather than the newest basis vector to expand $\mathcal{V}$. For a general $A$, Wu [21] proposes using a refined Ritz vector [4, 8] as $w$ to expand $\mathcal{V}$ and presents some theoretical justifications for this choice. He has made numerical experiments to justify the advantage of the refined RA method over the RA method and Arnoldi type methods.

In this paper, we consider the optimal subspace expansion problem for $A$ general. We first establish a maximization characterization of $\cos\angle(\mathcal{V}_w,x)$ for a given $w$, which generalizes Ye's result to the non-Hermitian case. Then we analyze the characterization and find the solution to this maximization problem, establishing two explicit forms of the maximizer, which were unknown even in the Hermitian case. We then go on to derive the relationships between $\cos\angle(\mathcal{V}_w,x)$ and $\cos\angle(\mathcal{V},x)$ and between $\sin\angle(\mathcal{V}_w,x)$ and $\sin\angle(\mathcal{V},x)$. Based on them, we ultimately find an exact solution, i.e., the optimal expansion vector $w_{\rm opt}$, to the max-max problem (5). By analyzing the results, we obtain computable nearly optimal choices of $w_{\rm opt}$ within the framework of the standard, harmonic and refined (harmonic) Rayleigh–Ritz methods, which are shown to be the Ritz vector, the harmonic Ritz vector and the refined (harmonic) Ritz vector, respectively. The importance of our results is twofold: first, for a non-Krylov initial subspace, the RA, SIRA and JD type methods expand their subspaces in a computationally optimal way. Second, in contrast to inexact Arnoldi and shift-invert Arnoldi type methods, much larger errors are allowed when computing expansion vectors in SIRA and JD type methods, so that one can use much less cost to iteratively solve inner linear systems approximately.

The paper is organized as follows. In Section 2, we consider the optimal subspace expansion problem, give its explicit solutions, and analyze the results. Based on them, in Section 3 we show what the computable nearly optimal replacements of $w_{\rm opt}$ are for the standard, harmonic and refined (harmonic) Rayleigh–Ritz methods. In Section 4, we conclude the paper.

Throughout the paper, denote by $I$ the identity matrix with the order clear from the context, and by $\mathbb{C}^n$ the complex space of dimension $n$. We use Greek letters to denote scalars and lowercase and capital Roman letters to denote vectors and matrices, respectively.

## 2 Optimal subspace expansion vector

We first establish three characterizations of $\cos\angle(\mathcal{V}_w,x)$. The first characterizes it as a maximization problem, the second gives the explicit solution to this problem, and the third expresses $\cos\angle(\mathcal{V}_w,x)$ in terms of the cosine of the angle between $x$ and a specific vector in $\mathcal{V}_w$.

###### Theorem 2.1.

For $w\in\mathcal{V}$ with $w\neq0$, assume that $Aw\notin\mathcal{V}$ and $x^Hw\neq0$, define $r_w=(A-\varphi_wI)w$ with $\varphi_w=\frac{x^HAw}{x^Hw}$, and let the columns of $Q_w$ form an orthonormal basis of the orthogonal complement of ${\rm span}\{r_w\}$ with respect to $\mathbb{C}^n$. Then

$$\cos\angle(\mathcal{V}_w,x)=\max_{b\in\mathcal{V},\,b\neq0}\frac{\cos\angle(b,x)}{\sin\angle(b,r_w)}=\frac{\cos\angle(b_w^{\rm opt},x)}{\sin\angle(b_w^{\rm opt},r_w)}=\cos\angle(Q_wQ_w^Hb_w^{\rm opt},x) \tag{6}$$

with

$$b_w^{\rm opt}=V(Q_wQ_w^HV)^{\dagger}x=V(V^HQ_wQ_w^HV)^{-1}V^Hx\in\mathcal{V}, \tag{7}$$

where the columns of $V$ form an orthonormal basis of $\mathcal{V}$ and the superscript $\dagger$ denotes the Moore–Penrose generalized inverse.

###### Proof.

Note that any $a\in\mathcal{V}_w$ but $a\notin\mathcal{V}$ can be written as

$$a=b+\beta Aw,$$

where $b\in\mathcal{V}$, $w\in\mathcal{V}$, and $\beta$ is a nonzero scalar. By definition, since $Aw=\varphi_ww+(A-\varphi_wI)w$ and $x^Hr_w=x^HAw-\varphi_wx^Hw=0$, we obtain

$$\begin{aligned}\cos\angle(\mathcal{V}_w,x)&=\max_{a\in\mathcal{V}_w,\,a\neq0}\frac{|x^Ha|}{\|a\|}=\max_{b\in\mathcal{V},\,\beta\neq0,\,b+\beta Aw\neq0}\frac{|x^H(b+\beta Aw)|}{\|b+\beta Aw\|}\\&=\max_{b\in\mathcal{V},\,\beta\neq0,\,b+\beta Aw\neq0}\frac{|x^H(b+\beta\varphi_ww)|}{\|(b+\beta\varphi_ww)+\beta(A-\varphi_wI)w\|}.\end{aligned}$$

Define $b'=b+\beta\varphi_ww$, which belongs to $\mathcal{V}$ as $b,w\in\mathcal{V}$, and recall $r_w=(A-\varphi_wI)w$. Then, by the definition of the cosine of the angle between two nonzero vectors, we have

$$\begin{aligned}\cos\angle(\mathcal{V}_w,x)&=\max_{b'\in\mathcal{V},\,b'+\beta r_w\neq0}\frac{|x^Hb'|}{\|b'+\beta r_w\|}=\max_{b'\in\mathcal{V}}\max_{\beta}\frac{|x^Hb'|}{\|b'+\beta r_w\|}=\max_{b'\in\mathcal{V}}\frac{|x^Hb'|}{\min_{\beta}\|b'+\beta r_w\|}\\&=\max_{b'\in\mathcal{V}}\frac{|x^Hb'|}{\left\|b'-\frac{r_w^Hb'}{\|r_w\|^2}r_w\right\|}=\max_{b'\in\mathcal{V}}\frac{|x^Hb'|}{\|b'\|\sin\angle(b',r_w)}=\max_{b'\in\mathcal{V}}\frac{\cos\angle(b',x)}{\sin\angle(b',r_w)}.\end{aligned}\tag{8}$$

Renaming $b'$ as $b$ establishes the maximization characterization in (6).

We now seek the maximizer of the maximization problem in (6). Since $x^Hr_w=0$, by the definition of $Q_w$, there exists a vector $z_w$ with $\|z_w\|=1$ such that the unit-length eigenvector $x=Q_wz_w$. As a result, we obtain

$$\cos\angle(b,x)=\frac{|x^Hb|}{\|b\|}=\frac{|(Q_wz_w)^Hb|}{\|b\|}=\frac{|z_w^H(Q_w^Hb)|}{\|b\|}.$$

Since $\|Q_w^Hb\|=\|b\|\sin\angle(b,r_w)$, it follows from the above that

$$\cos\angle(b,x)=\frac{\|Q_w^Hb\|}{\|b\|}\cdot\frac{|z_w^H(Q_w^Hb)|}{\|Q_w^Hb\|}=\sin\angle(b,r_w)\cos\angle(Q_w^Hb,z_w).$$

Therefore, from (8) and the above, as well as the column orthonormality of $Q_w$ and $x=Q_wz_w$, we obtain

$$\cos\angle(\mathcal{V}_w,x)=\max_{b\in\mathcal{V},\,b\neq0}\cos\angle(Q_w^Hb,z_w)=\max_{b\in\mathcal{V},\,b\neq0}\cos\angle(Q_wQ_w^Hb,Q_wz_w)=\max_{y\in\mathbb{C}^m,\,y\neq0}\cos\angle(Q_wQ_w^HVy,x)$$

with $b=Vy$. This amounts to maximizing $\cos\angle(Q_wQ_w^HVy,x)$ over $y$, i.e., seeking the best approximation to $x$ from the column space of $Q_wQ_w^HV$. As a result, the maximizing $y_w$ solves the least squares problem

$$\min_{y\in\mathbb{C}^m}\|x-Q_wQ_w^HVy\|. \tag{9}$$

We next show that the solution to problem (9) is unique. To this end, we need to prove that $Q_wQ_w^HV$ has full column rank, that is, the cross-product matrix $V^HQ_wQ_w^HV$ is nonsingular, by noting that $(Q_wQ_w^H)^HQ_wQ_w^H=Q_wQ_w^H$. It suffices to prove that the solution $z$ of the homogeneous linear system $Q_wQ_w^HVz=0$ is zero. Since $Q_wQ_w^H=I-\frac{r_wr_w^H}{\|r_w\|^2}$, we have

$$\left(I-\frac{r_wr_w^H}{\|r_w\|^2}\right)Vz=0,$$

i.e.,

$$\frac{r_wr_w^HV}{\|r_w\|^2}z=Vz.$$

Therefore, we obtain

$$\frac{V^Hr_wr_w^HV}{\|r_w\|^2}z=z. \tag{10}$$

Note that

$$\left\|\frac{V^Hr_wr_w^HV}{\|r_w\|^2}\right\|=\frac{\|V^Hr_w\|^2}{\|r_w\|^2}=\cos^2\angle(\mathcal{V},r_w).$$

Taking norms on both sides of (10) gives

$$\|z\|\cos^2\angle(\mathcal{V},r_w)\geq\|z\|,$$

which holds only if $z=0$ or $\cos\angle(\mathcal{V},r_w)=1$. The latter means that $r_w\in\mathcal{V}$, i.e., $Aw\in\mathcal{V}$, a contradiction to our assumption. Hence we must have $z=0$. Therefore, problem (9) has the unique solution

$$y_w=(Q_wQ_w^HV)^{\dagger}x=(V^HQ_wQ_w^HV)^{-1}V^Hx.$$

Here in the second equality we have exploited $Q_wQ_w^Hx=x$. Since $b_w^{\rm opt}=Vy_w$, we have proved the second and third relations in (6) with $b_w^{\rm opt}$ satisfying (7). ∎

Remark 1. For $A$ Hermitian, we have $\varphi_w=\lambda$, the desired eigenvalue of $A$. In this case, the first relation in (6), i.e., the maximization characterization, is Theorem 2 in [22]. The second and third relations in (6) are new even for $A$ Hermitian: they deliver the maximizer of the maximization problem and simplify $\cos\angle(\mathcal{V}_w,x)$ to the cosine of the angle between two specific vectors. For $A$ Hermitian, based on the maximization characterization in (6), Ye has shown how to approximately maximize $\cos\angle(\mathcal{V}_w,x)$, and argued that the Ritz vector or refined Ritz vector may be a good approximate solution to the expansion problem.

Remark 2. Notice that $Q_wQ_w^Hx=x$. Then from (7) it is straightforward to verify that $Q_wQ_w^Hb_w^{\rm opt}$ is the orthogonal projection of $x$ onto the column space of $Q_wQ_w^HV$. The third relation in (6) means that $\angle(\mathcal{V}_w,x)$ is just the angle between $x$ and this orthogonal projection. As a result, minimizing $\sin\angle(\mathcal{V}_w,x)$ over $w\in\mathcal{V}$ amounts to minimizing the angle between $x$ and this orthogonal projection over $w\in\mathcal{V}$.

Let $v_w$ be the unit-length vector obtained by orthogonalizing $Aw$ against $\mathcal{V}$:

$$v_w=\frac{(I-P_V)Aw}{\|(I-P_V)Aw\|}, \tag{11}$$

where $P_V$ is the orthogonal projector onto $\mathcal{V}$. Then the columns of $[V,\,v_w]$ form an orthonormal basis of the expanded subspace $\mathcal{V}_w$. Denote by $P_{V_w}$ the orthogonal projector onto $\mathcal{V}_w$. Then $P_{V_w}=P_V+v_wv_w^H$. We can derive an explicit relationship between $\cos\angle(\mathcal{V}_w,x)$ and $\cos\angle(\mathcal{V},x)$, showing how much the expanded subspace $\mathcal{V}_w$ improves over $\mathcal{V}$. The relationship will form our basis of finding the solution, i.e., the optimal expansion vector $w_{\rm opt}$, to the optimal subspace expansion problem (5).

###### Theorem 2.2.

Assume that $w\in\mathcal{V}$, $Aw\notin\mathcal{V}$ and $x^Hw\neq0$. It holds that

$$\cos\angle(\mathcal{V}_w,x)=\sqrt{\cos^2\angle(\mathcal{V},x)+\cos^2\angle((I-P_V)r_w,x)}, \tag{12}$$

where $r_w$ is defined as in Theorem 2.1.

###### Proof.

Since $P_{V_w}x$ is the orthogonal projection of $x$ onto $\mathcal{V}_w$, by definition we have

$$\cos\angle(\mathcal{V}_w,x)=\cos\angle(P_{V_w}x,x)=\|P_{V_w}x\|.$$

From $P_{V_w}=P_V+v_wv_w^H$ and $\|x\|=1$, exploiting the orthogonality of $P_Vx$ and $v_w$, we obtain

$$\cos\angle(\mathcal{V}_w,x)=\frac{|x^H(P_V+v_wv_w^H)x|}{\|(P_V+v_wv_w^H)x\|}=\frac{\|P_Vx\|^2+|v_w^Hx|^2}{\sqrt{\|P_Vx\|^2+|v_w^Hx|^2}}=\sqrt{\cos^2\angle(\mathcal{V},x)+\cos^2\angle(v_w,x)}$$

as $\cos\angle(v_w,x)=\cos\angle((I-P_V)r_w,x)$, since $(I-P_V)r_w=(I-P_V)Aw$ is parallel to $v_w$. ∎
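Identity (12) is easy to check numerically. Below is an illustrative sketch with random real data (our own construction, not code from the paper); it uses the fact from the proof that $(I-P_V)r_w=(I-P_V)Aw$ for $w\in\mathcal{V}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 12, 4
A = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))  # orthonormal basis of V
x = rng.standard_normal(n); x /= np.linalg.norm(x)
w = V @ rng.standard_normal(m)                    # an arbitrary w in V

P = V @ V.T                                       # orthogonal projector onto V
u = (np.eye(n) - P) @ (A @ w)                     # = (I - P_V) r_w since w lies in V

cos_V = np.linalg.norm(V.T @ x)                   # cos angle(V, x)
Q, _ = np.linalg.qr(np.column_stack([V, A @ w]))
cos_Vw = np.linalg.norm(Q.T @ x)                  # cos angle(V_w, x)
cos_u = abs(u @ x) / np.linalg.norm(u)            # cos angle((I - P_V) r_w, x)

# Theorem 2.2: cos^2(V_w, x) = cos^2(V, x) + cos^2((I - P_V) r_w, x)
assert abs(cos_Vw**2 - (cos_V**2 + cos_u**2)) < 1e-10
```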

Remark 3. This theorem is different from Lemma 1 established by Ye in [22]. The expansion vector defined in [22] is orthogonal to $\mathcal{V}$ and different from ours: it is of the form $(A-\theta I)u$ with $u\in\mathcal{V}$ and is not necessarily the residual of an approximate eigenpair. If $\theta$ is taken as a Ritz value and $u$ is the associated Ritz vector, then $(A-\theta I)u\perp\mathcal{V}$. However, the residual of a harmonic Ritz pair or refined (harmonic) Ritz pair does not satisfy such an orthogonality requirement. Our $r_w$ is orthogonal to the desired $x$ but generally not orthogonal to $\mathcal{V}$.

This theorem shows that $\cos\angle(\mathcal{V}_w,x)$ simplifies to (12), starting with which we can obtain our ultimate result on the optimal expansion vector $w_{\rm opt}$.

The following result establishes the sine version of Theorem 2.2, which clearly shows how fast $\sin\angle(\mathcal{V}_w,x)$ decreases relative to $\sin\angle(\mathcal{V},x)$.

###### Theorem 2.3.

Assume that $w\in\mathcal{V}$, $Aw\notin\mathcal{V}$ and $x\notin\mathcal{V}$. Then

$$\sin\angle(\mathcal{V}_w,x)=\sin\angle(\mathcal{V},x)\sin\angle(v_w,x_{\perp}), \tag{13}$$

where the unit-length vector

$$x_{\perp}=\frac{(I-P_V)x}{\|(I-P_V)x\|}. \tag{14}$$
###### Proof.

From the definitions of $\sin\angle(\mathcal{V},x)$ and $\sin\angle(\mathcal{V}_w,x)$, we have

$$\begin{aligned}\sin^2\angle(\mathcal{V},x)-\sin^2\angle(\mathcal{V}_w,x)&=\|(I-P_V)x\|^2-\|(I-P_{V_w})x\|^2\\&=1-\|P_Vx\|^2-(1-\|P_{V_w}x\|^2)\\&=-\|P_Vx\|^2+\|P_{V_w}x\|^2\\&=-\|P_Vx\|^2+\|P_Vx\|^2+\|v_wv_w^Hx\|^2\\&=|v_w^Hx|^2.\end{aligned}\tag{15}$$

Note that $v_w\perp\mathcal{V}$ and $\|(I-P_V)x\|=\sin\angle(\mathcal{V},x)$. Then it follows from (14) that

$$v_w^Hx=v_w^H(I-P_V)x=\|(I-P_V)x\|\,v_w^Hx_{\perp}=\sin\angle(\mathcal{V},x)\,v_w^Hx_{\perp}.$$

From this relation and (15), we obtain

$$\frac{\sin\angle(\mathcal{V}_w,x)}{\sin\angle(\mathcal{V},x)}=\sqrt{1-\left(\frac{|v_w^Hx|}{\sin\angle(\mathcal{V},x)}\right)^2}=\sqrt{1-\left(\frac{\sin\angle(\mathcal{V},x)\,|v_w^Hx_{\perp}|}{\sin\angle(\mathcal{V},x)}\right)^2}=\sqrt{1-\cos^2\angle(v_w,x_{\perp})}=\sin\angle(v_w,x_{\perp}),$$

which proves (13). ∎
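Identity (13) can likewise be verified on random data. The following is an illustrative sketch (our own construction) that builds $v_w$ from (11) and $x_{\perp}$ from (14) and checks the multiplicative reduction of the sine:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 3
A = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))  # orthonormal basis of V
x = rng.standard_normal(n); x /= np.linalg.norm(x)
w = V @ rng.standard_normal(m)                    # an arbitrary w in V

P = V @ V.T                                       # orthogonal projector onto V
vw = (np.eye(n) - P) @ (A @ w); vw /= np.linalg.norm(vw)   # v_w of (11)
xp = (np.eye(n) - P) @ x;       xp /= np.linalg.norm(xp)   # x_perp of (14)

sin_V = np.linalg.norm(x - P @ x)                 # sin angle(V, x)
Q, _ = np.linalg.qr(np.column_stack([V, A @ w]))
sin_Vw = np.linalg.norm(x - Q @ (Q.T @ x))        # sin angle(V_w, x)
sin_vx = np.sqrt(1.0 - (vw @ xp) ** 2)            # sin angle(v_w, x_perp)

# Theorem 2.3: sin(V_w, x) = sin(V, x) * sin(v_w, x_perp)
assert abs(sin_Vw - sin_V * sin_vx) < 1e-10
```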

This theorem shows that $\sin\angle(\mathcal{V}_w,x)$ decreases exactly by the factor $\sin\angle(v_w,x_{\perp})$ relative to $\sin\angle(\mathcal{V},x)$. From (4), the optimal expansion vector minimizes $\sin\angle(v_w,x_{\perp})$ and thus $\sin\angle(\mathcal{V}_w,x)$ over $w\in\mathcal{V}$. As $v_w$ is parallel to $(I-P_V)Aw$ and $x_{\perp}$ is parallel to $(I-P_V)x$, from (11) and (14) we have

$$\angle(v_w,x_{\perp})=\angle((I-P_V)Aw,(I-P_V)x)$$

and, since $(I-P_V)r_w=(I-P_V)Aw$ is orthogonal to $\mathcal{V}$,

$$\cos\angle((I-P_V)r_w,x)=\sin\angle(\mathcal{V},x)\,\cos\angle((I-P_V)Aw,(I-P_V)x).$$

Therefore, as $\sin\angle(\mathcal{V},x)$ is independent of $w$, Theorems 2.2–2.3 indicate that the optimal expansion vector

$$w_{\rm opt}=\arg\min_{w\in\mathcal{V}}\sin\angle((I-P_V)Aw,(I-P_V)x)=\arg\max_{w\in\mathcal{V}}\cos\angle((I-P_V)Aw,(I-P_V)x). \tag{16}$$
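Characterization (16) can be probed numerically: the maximizer of $\cos\angle((I-P_V)AVy,(I-P_V)x)$ over $y$ is obtained by projecting $x_{\perp}$ orthogonally onto the range of $(I-P_V)AV$, which a pseudoinverse delivers when that matrix has full column rank. This is an illustrative sketch under that generic full-rank assumption, not the paper's own code:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 12, 4
A = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))  # orthonormal basis of V
x = rng.standard_normal(n); x /= np.linalg.norm(x)

P_perp = np.eye(n) - V @ V.T                      # I - P_V
M = P_perp @ A @ V                                # (I - P_V) A V
w_opt = V @ (np.linalg.pinv(M) @ (P_perp @ x))    # candidate w_opt from (16)

def sin_expanded(w):
    """sin angle(V_w, x) for V_w = V + span{A w}."""
    Q, _ = np.linalg.qr(np.column_stack([V, A @ w]))
    return np.linalg.norm(x - Q @ (Q.T @ x))

best = sin_expanded(w_opt)
for _ in range(200):                              # w_opt beats random w in V
    assert best <= sin_expanded(V @ rng.standard_normal(m)) + 1e-10
```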

Let $[V,\,V_{\perp}]$ be a unitary matrix. Then $P_V=VV^H$ and $I-P_V=V_{\perp}V_{\perp}^H$. Suppose that the dimension $m$ of $\mathcal{V}$ satisfies $m\leq n-m$. Then the matrix pair

$$\{V^HA^Hx_{\perp}x_{\perp}^HAV,\ V^HA^H(I-P_V)AV\}=\{V^HA^Hx_{\perp}x_{\perp}^HAV,\ V^HA^HV_{\perp}V_{\perp}^HAV\} \tag{17}$$

restricted to the orthogonal complement of the null space of $V_{\perp}^HAV$ is Hermitian definite because $V_{\perp}^HAV$ is of full column rank when restricted to this orthogonal complement. This means that the range restricted $V^HA^H(I-P_V)AV$ is Hermitian positive definite. Note that

$$\mathcal{N}^{\perp}(V_{\perp}^HAV)=\mathcal{R}(V^HA^HV_{\perp})$$

is the column space or range of $V^HA^HV_{\perp}$. We can write such a range restricted matrix pair as

$$\{V^HA^Hx_{\perp}x_{\perp}^HAV,\ V^HA^H(I-P_V)AV\}\big|_{\mathcal{R}(V^HA^HV_{\perp})}, \tag{18}$$

whose two matrices map a vector $y\in\mathcal{R}(V^HA^HV_{\perp})$ to $V^HA^Hx_{\perp}x_{\perp}^HAVy$ and $V^HA^H(I-P_V)AVy$, respectively.

Next we derive an explicit expression of the optimal expansion vector $w_{\rm opt}$.

###### Theorem 2.4.

Assume that $\mathcal{V}$ is not an invariant subspace of $A$, $x\notin\mathcal{V}$, and $m\leq n-m$. Then the optimal subspace expansion vector, up to scaling, is

$$w_{\rm opt}=Vy_{\rm opt}, \tag{19}$$

where $y_{\rm opt}$ is an eigenvector of the range restricted Hermitian definite pair (18) associated with its largest eigenvalue. Furthermore, write

$$V^HA^H(I-P_V)AV\big|_{\mathcal{R}(V^HA^HV_{\perp})}=B\big|_{\mathcal{R}(V^HA^HV_{\perp})}. \tag{20}$$

Then the optimal expansion vector, up to scaling, is

$$w_{\rm opt}=V\left(B\big|_{\mathcal{R}(V^HA^HV_{\perp})}\right)^{-1}V^HA^HV_{\perp}V_{\perp}^Hx \tag{21}$$

$$\phantom{w_{\rm opt}}=V\left(V_{\perp}^HAV\big|_{\mathcal{R}(V^HA^HV_{\perp})}\right)^{\dagger}V_{\perp}^Hx. \tag{22}$$
###### Proof.

We consider the solution of the second problem in (16). Write $w=Vy$. By assumption, since $\mathcal{V}$ is not an invariant subspace of $A$, we have $(I-P_V)AV\neq0$. Furthermore, since the rank of the matrix $V^HA^Hx_{\perp}x_{\perp}^HAV$ is one and the range restricted matrix $V^HA^H(I-P_V)AV$ is Hermitian positive definite, it follows from $(I-P_V)^2=I-P_V$ and (14) that

$$\begin{aligned}\cos^2\angle((I-P_V)Aw,(I-P_V)x)&=\cos^2\angle((I-P_V)AVy,(I-P_V)x)\\&=\frac{|x^H(I-P_V)AVy|^2}{\|(I-P_V)x\|^2\|(I-P_V)AVy\|^2}\\&=\frac{y^HV^HA^H(I-P_V)\left(\frac{(I-P_V)xx^H(I-P_V)}{x^H(I-P_V)x}\right)(I-P_V)AVy}{y^HV^HA^H(I-P_V)AVy}\\&=\frac{y^HV^HA^Hx_{\perp}x_{\perp}^HAVy}{y^HV^HA^H(I-P_V)AVy},\end{aligned}$$

which is the Rayleigh quotient of the range restricted Hermitian definite pair (18) with respect to the nonzero vector $y$. Therefore, by the min-max characterization of the eigenvalues of a Hermitian definite pair (cf. e.g., [16, pp. 281–282]), the solution $y_{\rm opt}$ to the maximization problem

$$\max_{w\in\mathcal{V}}\cos\angle((I-P_V)Aw,(I-P_V)x)=\max_{y}\cos\angle((I-P_V)AVy,(I-P_V)x) \tag{23}$$

is an eigenvector of the range restricted Hermitian definite pair (18) associated with its largest eigenvalue $\mu_{\rm opt}$, and $w_{\rm opt}=Vy_{\rm opt}$ is the optimal expansion vector up to scaling.

We now prove (21) and (22). Notice that $V^HA^Hx_{\perp}x_{\perp}^HAV$ is a rank one Hermitian positive semi-definite matrix. Therefore, the range restricted Hermitian definite pair (18) has exactly one positive eigenvalue, and the other eigenvalues are zeros. By (20), the eigenvalues of the matrix pair (18) are identical to those of the standard rank one Hermitian matrix

$$\left(B\big|_{\mathcal{R}(V^HA^HV_{\perp})}\right)^{-\frac12}V^HA^Hx_{\perp}x_{\perp}^HAV\left(B\big|_{\mathcal{R}(V^HA^HV_{\perp})}\right)^{-\frac12}, \tag{24}$$

and the eigenvectors $y$ of the pair (18) are related to the eigenvectors $\hat{y}$ of the matrix defined by (24) by $y=\left(B\big|_{\mathcal{R}(V^HA^HV_{\perp})}\right)^{-\frac12}\hat{y}$. It is known that the unique positive eigenvalue of the matrix in (24) is

$$\mu_{\rm opt}=\left\|\left(B\big|_{\mathcal{R}(V^HA^HV_{\perp})}\right)^{-\frac12}V^HA^Hx_{\perp}\right\|^2$$

and its associated (unnormalized) eigenvector is

$$\hat{y}_{\rm opt}=\left(B\big|_{\mathcal{R}(V^HA^HV_{\perp})}\right)^{-\frac12}V^HA^Hx_{\perp},$$

which, from (14), is