# A cross-product free Jacobi-Davidson type method for computing a partial generalized singular value decomposition (GSVD) of a large matrix pair

A Cross-Product Free (CPF) Jacobi-Davidson (JD) type method is proposed to compute a partial generalized singular value decomposition (GSVD) of a large matrix pair (A,B), which is referred to as the CPF-JDGSVD method. By computing the thin QR factorizations of two certain matrices, the method implicitly solves the mathematically equivalent generalized eigenvalue problem of a certain cross-product matrix pair but does not explicitly form the cross-product matrices and thus avoids the accuracy loss of the computed generalized singular values and generalized singular vectors. At each step, the right searching subspace is expanded by approximately solving a correction equation iteratively, called inner iterations, and the two left searching subspaces are constructed by premultiplying the right one by A and B, respectively. The extraction steps of approximate GSVD components with respect to given searching subspaces constitute outer iterations. A thick-restart CPF-JDGSVD algorithm with deflation is developed to compute several GSVD components, and some convergence results are established on inner and outer iterations, which are exploited to design practical stopping criteria for inner iterations. Numerical experiments illustrate the effectiveness of the algorithm.

## Authors

• 3 publications
• 8 publications
01/09/2022

### Two harmonic Jacobi–Davidson methods for computing a partial generalized singular value decomposition of a large matrix pair

Two harmonic extraction based Jacobi–Davidson (JD) type algorithms are p...
12/18/2019

### Cross product-free matrix pencils for computing generalized singular values

It is well known that the generalized (or quotient) singular values of a...
01/09/2022

### A FEAST SVDsolver for the computation of singular value decompositions of large matrices based on the Chebyshev–Jackson series expansion

The FEAST eigensolver is extended to the computation of the singular tri...
10/22/2020

### Fast Approximate CoSimRanks via Random Projections

Given a graph G with n nodes, and two nodes u,v in G, the CoSim-Rank val...
07/24/2019

### On choices of formulations of computing the generalized singular value decomposition of a matrix pair

For the computation of the generalized singular value decomposition (GSV...
01/10/2020

### The joint bidiagonalization process with partial reorthogonalization

The joint bidiagonalization(JBD) process is a useful algorithm for the c...
02/19/2021

### A comparison of eigenvalue-based algorithms and the generalized Lanczos trust-region algorithm for Solving the trust-region subproblem

Solving the trust-region subproblem (TRS) plays a key role in numerical ...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The generalized singular value decomposition (GSVD) of a matrix pair is first introduced by van Loan [24] and then developed by Paige and Saunders [18]. It has become an important analysis means and computational tool [5], and has been used extensively in, e.g., solutions of discrete linear ill-posed problems [8], weighted or generalized least squares problems [2], information retrieval [10], linear discriminant analysis [19], and many others [1, 3, 5, 17, 23].

Let and with be large and possibly sparse matrices with , i.e., with and the null spaces of and , respectively. Denote and , , and and . Then the GSVD of the matrix pair is

 (1.1) {UTAX=ΣA=\diag{C,0l1,q1,Iq2},VTBX=ΣB=\diag{S,Iq1,0l2,q2},

where is nonsingular, and are orthogonal, and and are diagonal matrices that satisfy

 0<αi,βi<1andα2i+β2i=1,i=1,…,q;

see [18]. Here, in order to distinguish the generalized singular vector matrix blocks from each other, we have adopted the subscripts to denote their column and row numbers, and have denoted by and

the identity matrix of order

and zero matrix of order

, respectively. The subscripts of identity and zero matrices will be omitted in the sequel when their orders are clear from the context. It follows from (1.1) that , i.e., the columns of are -orthonormal.

The GSVD components and are associated with the zero and infinite generalized singular values of , called the trivial ones, and the columns of , and , form orthonormal and -orthonormal bases of , and , , respectively. In applications, one is interested in some nontrivial GSVD components in corresponding to the finite nonzero generalized singular values of . Denote by and the -th columns of , and , respectively, The quintuple is called a GSVD component of with the generalized singular value , the left generalized singular vectors , and the right generalized singular vector . We also refer to a pair as a generalized singular value of .

For a given target , assume that the nontrivial generalized singular values of are labeled as

 (1.2) |σ1−τ|≤|σ2−τ|≤|σℓ−τ|<⋯<|σℓ+1−τ|≤|σq−τ|.

Suppose that we are interested in and/or the corresponding generalized singular vectors . Such or is called an interior GSVD component of if is inside the spectrum of the nontrivial generalized singular values of ; otherwise it is called an extreme GSVD component.

For the computation of some extreme GSVD components of a large matrix pair , Zha [25] proposes a joint bidiagonalization method (JBD), which is a generalization of Lanczos bidiagonalization type methods for computing a partial singular value decomposition (SVD) of with . An advantage of the JBD method is that it works on the matrix pair directly without forming the cross-product matrices and . On the other hand, a large scale least squares problem with the coefficient matrix must be solved with high accuracy at each step of joint bidiagonalization. Jia and Yang [16] has made an analysis on this method and its variant, and provided more theoretical supports for its effectiveness. Zwaan and Hochstenbach [26] have proposed a generalized Davidson method (GDGSVD) and a multidirectional-type method (MDGSVD) to compute a few extreme GSVD components of . They design a fast truncation step in MDGSVD to remove a low quality search direction that is orthogonal to the desired generalized singular vector so as to ensure moderate growth of the searching subspace.

Particularly, Hochstenbach [9] proposes a Jacobi-Davidson type method, called JDGSVD hereafter, to compute an interior GSVD component of with full column rank. The JDGSVD method formulates the GSVD of as the generalized eigenvalue problem of the augmented matrix pair , extracts an approximation to the relevant eigenpair from the subspace that is the direct sum of the right searching subspace and one of the left searching subspaces, and then recovers the desired GSVD component from the converged eigenpair. When expanding the searching subspaces, one needs to approximately solve a correction equation iteratively, so that the approximate solution is split up into two vectors of lengths and , respectively, which are used to expand the current searching subspaces. The side effects of involving the cross-product matrix in JDGSVD method are twofold. On the one hand, in the extraction phase, one is required to compute a -orthonormal basis of the right searching subspace, which would be numerically unstable when is ill conditioned. On the other hand, it is shown in [11]

that the error of the computed eigenvector is bounded by the size of the perturbations with a multiple

. This means, with an ill conditioned , say , the computed eigenvector may have no accuracy even if the residual is at the level of machine precision, so that the recovered generalized singular vectors have no accuracy at all. The result holds for the generalized singular vectors corresponding to any and has been numerically confirmed [11]. We remark that all the eigenvalue-based type GSVD methods share this shortcoming of JDGSVD. For these reasons, in order to compute GSVD components accurately, it is appealing to avoid forming cross-product matrices explicitly and, instead, to work on and directly.

In this paper, we first propose a basic Cross-Product Free (CPF) Jacobi-Davidson (JD) type method for computing one, i.e., , GSVD component of a regular matrix pair , which is referred to as CPF-JDGSVD in the sequel. There are two novelties. Firstly, instead of constructing left and right searching subspaces separately or independently, given a right searching subspace, the CPF-JDGSVD method generates the corresponding two left searching subspaces by acting and on the right subspace, respectively, and constructs their orthonormal bases by computing two thin QR factorizations of the matrices that are formed by premultiplying and with the matrix consisting of the orthonormal basis vectors of the right subspace. Secondly, in the extraction phase, the method projects the GSVD of onto the left and right searching subspaces without involving the cross-product matrices and , and obtains an approximation to the desired GSVD component of by computing the GSVD of the small sized projection matrix pair. To be practical, we develop a thick-restart CPF-JDGSVD algorithm with deflation for computing several, i.e., , GSVD components corresponding to the generalized singular values closest to the target .

We prove that the resulting left searching subspaces have similar accuracy as the right subspace as long as the desired generalized singular value is not very small or large. Therefore, CPF-JDGSVD generates the left searching subspaces effectively. In order to expand the searching subspaces, we give a detailed derivation of certain

correction equations whose solutions expand the searching subspaces effectively. The correction equations are supposed to be approximately solved iteratively, called inner iterations, so that CPF-JDGSVD is an inner-outer iterative method with extraction steps of approximate GSVD components called outer iterations. We establish a convergence result on the approximate singular values in terms of computable residual norms. Meanwhile, we analyze the conditioning of the correction equations and establish estimates for their condition numbers in the asymptotic sense, which critically affect the solution accuracy of the correction equations. Besides, we derive some results on inner iterations of CPF-JDGSVD. Based on these, we propose practical stopping criteria for inner iterations, making the computational cost of inner iterations be minimal at each outer iteration and guarantee that CPF-JDGSVD behaves like the exact CPF-JDGSVD where the correction equations are solved exactly.

In Section 2, we propose the CPF-JDGSVD method and present some important theoretical results on its rationale and convergence. In Section 3, we establish some properties of the correction equations and make a convergence analysis on inner iterations. Based on them, we propose practical stopping criteria for inner iterations. In Section 4 we propose a CPF-JDGSVD algorithm with restart and deflation for computing more than one GSVD components. Numerical experiments are reported in Section 5 to demonstrate the effectiveness of CPF-JDGSVD, and finally we conclude the paper in Section 6.

Let us introduce some notation to be used. We denote by the -norm of a vector or matrix, by the condition number of an arbitrary matrix with and being the largest and smallest singular values of , respectively, and by the transpose of . Throughout the paper, we assume that and themselves are modest, which can be achieved by suitable scaling. To our knowledge, and cannot be ill conditioned simultaneously in applications, meaning that the stacked matrix is well conditioned. Therefore, we assume that is modest.

## 2 A basic CPF-JDGSVD algorithm

We will propose our CPF-JDGSVD method for computing one, i.e., , GSVD component , i.e., with closest to . The method consists of three major parts: (i) the construction of left searching subspaces, (ii) an extraction approach of GSVD components, and (iii) an expansion approach of the right subspace. We will prove that the left searching subspaces constructed have similar accuracy to that of the right subspace, and establish a convergence result on the approximate generalized singular values.

### 2.1 Extraction approach

At iteration , assume that a -dimensional right searching subspace is available, from which we seek an approximation to the right generalized singular vector . For the left generalized singular vectors and , since and , we naturally construct and as left searching subspaces and seek their approximations from them. We will denote and .

Regarding the qualities of and , we have the following results.

###### Theorem 2.1.

Let be the right searching subspace and and be the left searching subspaces. Then for the generalized singular vectors it holds that

 (2.1) sin∠(U,u) ≤ ∥A∥∥x∥αsin∠(X,x), (2.2) sin∠(V,v) ≤ ∥B∥∥x∥βsin∠(X,x).
###### Proof.

For an arbitrary vector , by the definition of sine of the angle between arbitrary two nonzero vectors, we have

 (2.3) sin∠(Ax′,Ax) = minμ∥Ax−μAx′∥∥Ax∥=minμ∥A(x−μx′)∥∥Ax∥ ≤ ∥A∥∥x∥∥Ax∥minμ∥x−μx′∥∥x∥ = ∥A∥∥x∥αsin∠(x′,x),

where the last relation holds since with . Therefore, we obtain

 sin∠(U,u) = sin∠(AX,Ax)=minx′∈Xsin∠(Ax′,Ax) ≤ ∥A∥∥x∥αminx′∈Xsin∠(x′,x) = ∥A∥∥x∥αsin∠(X,x),

i.e., relation (2.1) holds. Following the same derivations as above, we obtain (2.2). ∎

Theorem 2.1 illustrates that when contains good information about the desired , the qualities of and are determined by , , , and by , , , respectively. Since , Theorem 2.3 in [7] states that

 (2.4) ∥X∥=∥∥[AB]†∥∥and∥X−1∥=∥∥[AB]∥∥,

which gives . By the assumption that is well conditioned and is modest, it is clear that is modest. Therefore, the quality of and is similar to that of provided that and are not very small, i.e., not very small or large.

It is worthwhile to notice that for an arbitrarily given -dimensional , the left searching subspaces and are subsets of the column spaces of and , respectively. When has trivial zero and infinite generalized singular values, if contains the components of the generalized singular vectors associated with the zero and infinite generalized singular values, then and contain the components of the corresponding and too. The fundamental significance of this observation is that from the right and left searching subspaces and , it is possible to extract approximations to both trivial and nontrivial GSVD components, no matter which extraction approach is used. In fact, all projection type methods for generalized eigenvalue problems and GSVD problems share a similar feature, where the corresponding and act on a given subspace that contains the components of eigenvectors or generalized right singular vectors corresponding to trivial zero or infinite eigenvalues or generalized singular values. It is clear that such feature is undesirable and may cause trouble because one is only interested in some nontrivial eigenpairs or GSVD components.

Now we propose an extraction approach that seeks an approximate generalized singular value pair and corresponding approximate generalized singular vectors , and satisfying the standard orthogonal projection:

 (2.5) ⎧⎪⎨⎪⎩A~x−~α~u=0,B~x−~β~v=0,~βAT~u−~αBT~v⊥X,with⎧⎪⎨⎪⎩∥~u∥=1,∥~v∥=1,~α2+~β2=1.

Let the columns of form an orthonormal basis of and

 (2.6) A˜X=˜UGandB˜X=˜VH

be the thin QR factorizations of and , respectively, where and are upper triangular. Suppose that and are nonsingular. Then the columns of and form orthonormal bases of and , respectively. Write , and . Then (2.5) is equivalent to

 (2.7)

That is, is a generalized singular value of the matrix pair , and , and are the corresponding generalized left and right singular vectors. We compute the GSVD of , pick up closest to the target , and take

 (2.8) (~α,~β,~u=˜Ue,~v=˜Vf,~x=˜Xd)

as an approximation to the desired GSVD component of .

For the accuracy of and , an application of (2.3) derives the following theorem.

###### Theorem 2.2.

Let , and be the approximations to , and that satisfy (2.5). Then

 (2.9) sin∠(~u,u) ≤ ∥A∥∥x∥αsin∠(~x,x), (2.10) sin∠(~v,v) ≤ ∥B∥∥x∥βsin∠(~x,x).

Theorem 2.2 illustrates that and are as good as , as the stacked matrix is well conditioned and and are not very small; see the analysis after Theorem 2.1.

Our extraction approach (2.5) mathematically amounts to realizing the standard orthogonal projection (Rayleigh–Ritz approximation) of the regular matrix pair onto . It extracts the Ritz vector associated with the Ritz value closest to and computes , and , satisfying (2.5). However, we do not form the cross-products and explicitly and thus avoid the potential accuracy loss of the computed GSVD components.

By (2.5), satisfies and , so that and . Therefore, its associated residual is

 (2.11) r=r(~α,~β,~u,~v,~x):=~βAT~u−~αBT~v

and if and only if is an exact GSVD component of . For a prescribed tolerance , is claimed to have converged if

 (2.12) ∥r∥≤(~β∥A∥1+~α∥B∥1)⋅tol.

In the following, we present our first main result, which, in terms of , gives the accuracy estimate of . To this end and also for our later use, define the function

 (2.13) h(θ,σ)=σ2−θ21+σ2forθ≥0andσ≥0.

By and , we have

 (2.14) α2i−β2iθ2=(σ2i−θ2)β2i=h(θ,σi).
###### Theorem 2.3.

Let be an approximate GSVD component of satisfying (2.5) with and be the corresponding residual defined by (2.11). Then the following results hold: (i) If

 (2.15) σmin√2+σ2min<θ<√1+2σ2max

with and the largest and smallest nontrivial generalized singular values of , respectively, then there exists a nontrivial generalized singular value of such that

 (2.16) |σ2−θ2|(1+σ2)θ≤∥X∥2∥r∥∥~x∥;

(ii) if , then

 (2.17) 1θ≤∥X∥2∥r∥∥~x∥;

(iii) if , then

 (2.18) θ≤∥X∥2∥r∥∥~x∥.
###### Proof.

By definition (2.11) of the residual and and , we have

 (2.19) θr=~αAT~u−~α2~βBT~v=(ATA−θ2BTB)~x.

Premultiplying both hand sides of the above by and exploiting (1.1), we obtain

 θXTr=XT(ATA−θ2BTB)XX−1~x=\diag{C2−θ2S2,−θ2Iq1,Iq2}X−1~x.

Taking norms on the two hand sides of the above equality and exploiting (2.14) gives

 (2.20) θ∥XTr∥ ≥ min{1,θ2,mini=1,…,q|α2i−β2iθ2|}∥X−1~x∥ ≥ ∥~x∥∥X∥min{1,θ2,mini=1,…,q|h(θ,σi)|}.

By (2.13), for and , we have

 |h(θ,σmax)| = h(θ,σmax)=σ2max−θ21+σ2max<1, |h(θ,σmax)| = −h(θ,σmax)=θ2−σ2max1+σ2max<1,

respectively, meaning . For and , we obtain

 |h(θ,σmin)| = −h(θ,σmin)=θ2−σ2min1+σ2min<θ2, |h(θ,σmin)| = h(θ,σmin)=σ2min−θ21+σ2min<θ2,

respectively, meaning . Therefore, under condition (2.15), we have

 min{1,θ2,mini=1,…,q|h(θ,σi)|} = mini=1,…,q|h(θ,σi)|=|h(θ,σ)|,

from which and (2.20) it follows that (2.16) holds.

It is straightforward to justify that, under the conditions in (ii)–(iii),

 min{1,θ2,mini=1,…,q|h(θ,σi)|} = 1, min{1,θ2,mini=1,…,q|h(θ,σi)|} = θ2,

respectively. Therefore, from (2.20) we obtain (2.17) and (2.18). ∎

From (2.6) and the orthonormality of , it is easily justified that and . Exploiting Lemma 2.4 of [11], and adapting (2.4) and its analysis to , we can show that with defined by (2.7) is modest, so is .

Theorem 2.3 indicates that when is small, will be a good approximation to some generalized singular value of . Furthermore, relation (2.16) shows that must converge to some nontrivial as . Observe that is a Ritz value of the symmetric matrix pair with respect to . It is easily seen from (1.1) that the eigenvalues of restricted to the subspace spanned by the columns of , i.e., the -orthogonal complement of with respect to , are the nontrivial . Therefore, all the lie between the nontrivial smallest and largest generalized singular values and of if has no component in ; see [20, 22]. Condition (2.15) is trivially satisfied in this case, and it is expected that the condition is fulfilled if the component of in is not rich. Relation (2.17) or (2.18) corresponds to the case that converges to the infinite or zero trivial generalized singular value of , respectively, as . This implies that or is nearly singular, which occurs when contains a quite accurate component in or . As a result, our method may encounter problems when computing the nontrivial largest or smallest nontrivial GSVD component of , but for it generally delivers some that lie between and , so that they converge to some nontrivial if the residual norms .

### 2.2 Subspace expansion

If the current GSVD approximation does not yet converge, one needs to expand the searching subspaces in order to extract a more accurate approximate GSVD component with respect to them. Since we construct the left searching subspaces by and , we only need to expand effectively and generate and correspondingly.

Keep in mind that is an eigenpair of the matrix pair with the desired generalized singular value . Suppose that an approximate is already available. We aim to seek a correction vector satisfying

 (2.21) t⊥~y:=(ATA+BTB)~x=~αAT~u+~βBT~v,

such that is an unnormalized right generalized singular vector of . Therefore, is an exact eigenpair of :

 (2.22) ATA(~x+t)=σ2BTB(~x+t).

Rearranging this equation, we obtain an equivalent form:

 (2.23) (ATA−θ2BTB)t=−(ATA−θ2BTB)~x+(σ2−θ2)BTB~x+(σ2−θ2)BTBt,

where is the current approximate generalized singular value.

Assume that is reasonably accurate but yet not converged and it is normalized such that . Then the size of must be small relative to that of . Actually, is an approximation to with the first order error since premultiplying both hand sides of (2.22) by gives rise to

 σ2=∥A(~x+t)∥2∥B(~x+t)∥2=~α2+2~α~uTAt+∥At∥2~β2+2~β~vTBt+∥Bt∥2=θ2(1+O(∥t∥)),

showing that the size of the third term in the right-hand side of (2.23) is .

Note from (2.19) that the first term in the right-hand side of (2.23) is collinear with the residual of , which is orthogonal to , as indicated by the third condition in (2.5). Therefore, the first term in the right-hand side of (2.23) is orthogonal to . Moreover, it is straightforward from condition (2.5) and definition (2.21) of that and is an oblique projector onto the orthogonal complement of . Neglecting the third term in the right-hand side of (2.23), we obtain

 (2.24) (I−~y~xT)(ATA−θ2BTB)t=−θr+(θ2−σ2)(I−~y~xT)BTB~x.

From (2.21) and (2.22), we have

 (I−~y~xT)BTB~x = BTB~x−(~xBTB~x)~y=BTB~x−~β2~y = ~α2BTB~x−~β2ATA~x=θ2BTB~x−ATA~x1+θ2 = (θ2−σ2)BTB~x1+θ2+(ATA−σ2BTB)t1+θ2=O(∥t∥),

which means that the second term in the right-hand side of (2.24) is . Neglecting this term in (2.24), we obtain

 (2.25) (I−~y~xT)(ATA−θ2BTB)t=−θrwitht⊥~y.

From the requirement we obtain . Therefore, we can replace with in (2.25). Notice that when expanding with , it is the direction other than the size of that matters. It makes no difference between solving (2.25) with the right-hand side and solving the one with . As a consequence, we have derived an ultimate correction equation

 (2.26) (I−~y~xT)(ATA−θ2BTB)(I−~x~yT)t=−r witht⊥~y.

Solving it for and orthonormalizing against yields the subspace expansion vector . The columns of the updated form an orthonormal basis of the expanded -dimensional right searching subspace .

The coefficient matrix in (2.26) dynamically depends on as outer iterations proceed. It is typical that may have little accuracy as approximations to in an initial stage, so that solving (2.26) with varying may cause unnecessary waste and even may lead to misconvergence. To this end, a more robust way is to solve the correction equation (2.26) with replaced by the fixed target in the left-hand side:

 (2.27) (I−~y~xT)(ATA−τ2BTB)(I−~x~yT)t=−rwitht⊥~y

in the initial stage until is fairly small. Approximately solving this equation or (2.26) iteratively is called inner iterations in CPF-JDGSVD.

An advantage of using (2.27) is that the solution is guaranteed to provide essential information in the direction of the desired and thus expands correctly, making approximate GSVD components converge toward the desired GSVD component. When becomes fairly small, and have become fairly good approximations to and . In this case, it may be beneficial to switch to solving (2.26) so as to gain a faster convergence for outer iterations. Precisely, if

 (2.28) ∥r∥≤(~β∥A∥1+~α∥B∥1)⋅fixtol

with fairly small, say, , we then switch to solving (2.26).

## 3 Properties of the correction equations and stopping criteria for inner iterations

For and large, we suppose that only iterative solvers are computationally viable to solve the correction equations approximately. Since the coefficient matrices in (2.26) and (2.27) are symmetric and possibly indefinite, the minimal residual method (MINRES) is a most commonly used choice [6, 21]. In this section, we establish compact upper bounds for the condition numbers of the correction equations (2.26) and (2.27) in the asymptotic sense. We also make an analysis on inner iterations. Based on them, we propose practical stopping criteria for inner iterations, which are related to the accuracy of approximate solutions and the corresponding residual norms. We focus on (2.26), and, as it will turn out, the results are directly applicable to (2.27).

### 3.1 Conditioning

The coefficient matrix in the correction equation (2.26) maps the orthogonal complement of to the orthogonal complement of , that is, its action restricts to and generates elements in . We denote this restricted linear operator by

 (3.1) M=(ATA−θ2BTB)|~y⊥→~x⊥.

The condition number determines the reliability of adopting the relative residual norm of the correction equation as the measurement of inner iteration accuracy: the smaller is, the more accurate the approximate solution of (2.26) is. As a result, it is insightful to derive a compact estimate for . However, it is generally not possible to derive a sharp estimate for a general approximation . Fortunately, we can do so in the ideal case that , which, as will be shown by (3.3), leads to with the first column of . By the continuity argument, such estimate will get insight into the asymptotic behavior of when .

Based on the GSVD (1.1) of and (1.2), we partition

where the matrices

 (3.2) ΣA,2=\diag{C2,0l1,q1,Iq2}andΣB,2=\diag{S2,Iq1,0l2,q2}

with and .

From , we obtain

 (3.3) Y=X−T=(ATA+BTB)X=[y,Y2]

with

 y=(ATA+BTB)x,Y2=(ATA+BTB)X2.

Then and are orthogonal to and , respectively, i.e.,