The Krylov Subspaces, Low Rank Approximations and Ritz Values of LSQR for Linear Discrete Ill-Posed Problems: the Multiple Singular Value Case

by   Zhongxiao Jia, et al.
Tsinghua University

For the large-scale linear discrete ill-posed problem minAx-b or Ax=b with b contaminated by white noise, the Golub-Kahan bidiagonalization based LSQR method and its mathematically equivalent CGLS, the Conjugate Gradient (CG) method applied to A^TAx=A^Tb, are most commonly used. They have intrinsic regularizing effects, where the iteration number k plays the role of regularization parameter. The long-standing fundamental question is: Can LSQR and CGLS find 2-norm filtering best possible regularized solutions? The author has given definitive answers to this question for severely and moderately ill-posed problems when the singular values of A are simple. This paper extends the results to the multiple singular value case, and studies the approximation accuracy of Krylov subspaces, the quality of low rank approximations generated by Golub-Kahan bidiagonalization and the convergence properties of Ritz values. For the two kinds of problems, we prove that LSQR finds 2-norm filtering best possible regularized solutions at semi-convergence. Particularly, we consider some important and untouched issues on best, near best and general rank k approximations to A for the ill-posed problems with the singular values σ_k=O(k^-α) with α>0, and the relationships between them and their nonzero singular values. Numerical experiments confirm our theory. The results on general rank k approximations and the properties of their nonzero singular values apply to several Krylov solvers, including LSQR, CGME, MINRES, MR-II, GMRES and RRGMRES.



There are no comments yet.


page 1

page 2

page 3

page 4


Best low-rank approximations and Kolmogorov n-widths

We relate the problem of best low-rank approximation in the spectral nor...

Adaptive Low-Rank Approximations for Operator Equations: Accuracy Control and Computational Complexity

The challenge of mastering computational tasks of enormous size tends to...

Dominant subspace and low-rank approximations from block Krylov subspaces without a gap

In this work we obtain results related to the approximation of h-dimensi...

On the sensitivity of singular and ill-Conditioned linear systems

Solving a singular linear system for an individual vector solution is an...

Preconvergence of the randomized extended Kaczmarz method

In this paper, we analyze the convergence behavior of the randomized ext...

The Multi-Symplectic Lanczos Algorithm and Its Applications to Color Image Processing

Low-rank approximations of original samples are playing more and more an...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction and Preliminaries

Consider the linear discrete ill-posed problem


where the norm

is the 2-norm of a vector or matrix, and

is extremely ill conditioned with its singular values decaying to zero without a noticeable gap. Without loss of generality, we assume that . Problem (1) typically arises from the discretization of the first kind Fredholm integral equation


where the kernel and are known functions, while is the unknown function to be sought. If is non-degenerate and satisfies the Picard condition, there exists the unique square integrable solution ; see [12, 23, 25, 39, 40]. Here for brevity we assume that and belong to the same set with . Applications include image deblurring, signal processing, geophysics, computerized tomography, heat propagation, biomedical and optical imaging, groundwater modeling, and many others; see, e.g., [1, 11, 12, 25, 36, 39, 40, 41, 52]. The right-hand side is assumed to be contaminated by a Gaussian white noise , caused by measurement, modeling or discretization errors, where is noise-free and . Because of the presence of noise and the extreme ill-conditioning of , the naive solution of (1) bears no relation to the true solution , where denotes the Moore-Penrose inverse of a matrix. Therefore, one has to use regularization to extract a best possible approximation to .

For a Gaussian white noise , we always assume that satisfies the discrete Picard condition with some constant for arbitrarily large [1, 22, 23, 25, 37]. It is an analog of the Picard condition in the finite dimensional case; see, e.g., [23, p.9], [25, p.12] and [37, p.63]. Without loss of generality, assume that . Then the two dominating regularization approaches are to solve the following two equivalent problems:


with and general-form Tikhonov regularization


with the regularization parameter [23, 25], where is a regularization matrix and its suitable choice is based on a-prior information on . Typically,

is either the identity matrix

or the scaled discrete approximation of a first or second order derivative operator. If , (4) is standard-form Tikhonov regularization, and both (3) and (4) are 2-norm filtering regularization problems.

We are concerned with the case in this paper. If , (3) and (4), in principle, can be transformed into standard-form problems [23, 25]. In this case, for (3

) of small or moderate size, an effective and reliable solution method is the truncated singular value decomposition (TSVD) method, and it obtains the 2-norm filtering best regularized solution

at some [23, 25], where is the optimal regularization parameter, called the transition point, such that . We will review the TSVD method and reformulate it and (4) when has multiple singular values. A key of solving (4) is the determination of the optimal regularization parameter such that . A number of parameter-choice methods have been developed for finding , such as the discrepancy principle, the L-curve criterion, and the generalized cross validation (GCV), etc. We refer the reader to, e.g., [23, 25] for details.

It has been theoretically and numerically justified that and essentially have the minimum 2-norm error; see [51], [22], [23, p.109-11] and [25, Sections 4.2 and 4.4]. In effect, the theory in [12] has shown that the error of is unconditionally order optimal in the Hilbert space setting, i.e., the same order as the worst case error, while is conditionally order optimal. As a result, we can naturally take as a reference standard when assessing the regularization ability of a 2-norm filtering regularization method.

For (1) large, the TSVD method and the Tikhonov regularization method are generally too demanding, and only iterative regularization methods are computationally viable. Krylov iterative solvers are a major class of methods for solving (1), and they project problem (1) onto a sequence of low dimensional Krylov subspaces and computes iterates to approximate [1, 12, 17, 18, 23, 25, 39]. Of them, the CGLS method, which implicitly applies the CG method [26] to , and its mathematically equivalent LSQR algorithm [44] have been most commonly used. The Krylov solvers CGME [4, 5, 10, 18, 19] and LSMR [5, 14] are also choices. These Krylov solvers have general regularizing effects [1, 17, 18, 19, 23, 25, 27, 28] and exhibit semi-convergence [41, p.89]; see also [4, p.314], [23, p.135] and [25, p.110]: The iterates converge to in an initial stage; afterwards the noise starts to deteriorate the iterates so that they start to diverge from and instead converge to . If we stop at the right time, then, in principle, we have a regularization method, where the iteration number plays the role of the regularization parameter. Semi-convergence is due to the fact that the projected problem starts to inherit the ill-conditioning of (1) from some iteration onwards, and the appearance of a small singular value of the projected problem amplifies the noise considerably.

The behavior of (1), (3) and (4) with critically depends on the decay rate of the singular values of . The behavior of ill-posed problems critically depends on the decay rate of . For a linear compact operator equation such as (2) in the Hilbert space setting, let be the singular values of the compact operator . The following characterization of the degree of ill-posedness of (2) was introduced in [29] and has been widely used; see, e.g., [1, 12, 23, 25, 40]: If with , , then (2) is severely ill-posed; if , then (2) is mildly or moderately ill-posed for or . Here for mildly ill-posed problems we add the requirement , which does not appear in [29] but must be met for a linear compact operator equation [20, 23]. In the one dimensional case, i.e., , (1) is severely ill-posed when the kernel function is sufficiently smooth, and it is moderately ill-posed with , where is the highest order of continuous derivatives of ; see, e.g., [23, p.8] and [25, p.10-11]. The singular values of discretized problem (1) resulting from the continuous (2) inherit the decay properties of [23, 25], provided that discretizations are fine enough, so that the classification applies to (1) as well.

Björck and Eldén in their 1979 survey [6] foresightedly expressed a fundamental concern on CGLS (and LSQR): More research is needed to tell for which problems this approach will work, and what stopping criterion to choose. See also [23, p.145]. Hanke and Hansen [20] and Hansen [24] address that a strict proof of the regularizing properties of conjugate gradients is extremely difficult. Over the years, an enormous effort has been made to the study of regularizing effects of LSQR and CGLS; see, e.g., [13, 19, 23, 25, 27, 28, 31, 33, 34, 39, 42, 45, 47]. To echo the concern of Björck and Eldén, such a definition has been introduced in [31, 33]: If a regularized solution to (1) is at least as accurate as , then it is called a best possible 2-norm filtering regularized solution. If the regularized solution by an iterative regularization solver at semi-convergence is such a best possible one, then the solver is said to have the full regularization. Otherwise, the solver is said to have only the partial regularization.

Since it had been unknown whether or not LSQR, CGME and LSMR have the full regularization for a given (1), one commonly combines them with some explicit regularization [1, 23, 25]. The hybrid LSQR variants have been advocated by Björck and Eldén [6] and O’Leary and Simmons [43], and improved and developed by Björck [3], Björck, Grimme and van Dooren [7], and Renaut et al. [46]. A hybrid LSQR first projects (1) onto Krylov subspaces and then regularizes the projected problems explicitly. It aims to remove the effects of small Ritz values and expands Krylov subspaces until they captures all needed dominant SVD components of [3, 7, 20, 43], so that the error norms of regularized solutions and the residual norms possibly decrease further until they ultimately stabilize. The hybrid LSQR, CGME and LSMR have been intensively studied in, e.g., [2, 8, 9, 19, 20, 46] and [1, 25].

If an iterative solver itself, e.g., LSQR, is theoretically proved and practically identified to have the full regularization, one simply stops it after a few iterations of semi-convergence, and no complicated hybrid variant is needed. In computation, semi-convergence can be in principle determined by a parameter-choice method, such as the L-curve criterion and the discrepancy principle. Therefore, we cannot emphasize too much the importance of proving the full or partial regularization of LSQR, CGLS, CGME and LSMR. By the definition of the full or partial regularization, a fundamental question is: Do LSQR, CGLS, LSMR and CGME have the full or partial regularization for severely, moderately and mildly ill-posed problems?

Regarding LSQR and CGLS, for the three kinds of ill-posed problems described above, the author [33, 34] has proved that LSQR has the full regularization for severely and moderately ill-posed problems with certain suitable and under the assumption that all the singular values of are simple. In applications, there are 2D image belurring problems where the matrices ’s have multiple singular values [15]. In this paper, we extend the results in [33, 34] to the multiple singular value case. In Section 2, we reformulate the TSVD method and standard-form Tikhonov regularization in the multiple singular value case, showing that they compute regularized solutions as if they work on a modified form of (1), where the coefficient matrix has the distinct singular values of as its nonzero simple singular values. In Section 3, we show that LSQR works as if it solves the same modified one of (1). In this way, we build a bridge that connects the regularizing effects of the TSVD method and those of LSQR, so that we can analyze the regularization ability of LSQR by taking the best TSVD regularized solution as the reference standard. In Sections 46, we extend the main results in [33, 34] to the multiple singular value case. Consequently, we can draw the same conclusions as those in [33, 34].

After the above, we consider some important issues that have received no attention in the literature: best, near best and general rank approximations to for the ill-posed problems with and , respectively, which include mildly ill-posed problems, and some intrinsic relationships between them and the approximation properties of their nonzero singular values. These results apply to LSQR, where the Ritz values, i.e., the nonzero singular values of rank approximation matrices generated by Golub-Kahan bidiagonalization, critically decide the regularization ability of LSQR. We will show that, unlike for severely and moderately ill-posed problems with suitable and , a best or near best rank approximation to does not mean that its nonzero singular values approximate the large singular values of in natural order. Furthermore, for , given the accuracy of the rank approximation in LSQR, we establish more insightful results on the nonzero singular values of the rank

approximation matrix, which estimate their maximum possible number that are smaller than

. These results also apply to the Krylov solvers CGME, MINRES and MR-II, and GMRES and RRGMRES [18, 25], each of which generates its own rank approximation to at iteration . All these constitutes the work of Section 7. In Section 8, we report the numerical experiments to confirm the results. Finally, we conclude the paper in Section 9.

Throughout the paper, we denote by the dimensional Krylov subspace generated by the matrix and the vector , and by and

the identity matrix and the zero matrix whose orders are omitted whenever clear from the context.

2 The reformulation and analysis of the TSVD method and standard-form Tikhonov regularization in the multiple singular value case

In order to extend the results in [33, 34] to the multiple singular value case, we need to reorganize the SVD of and reformulate the TSVD method and standard-form Tikhonov regularization by taking in (1) into account carefully. To this end, we must make numerous necessary changes and preparations, as will be detailed below.

Let the SVD of be


where with and with are orthogonal, with the distinct singular values , each is multiple and the identify matrix, and the superscript denotes the transpose of a matrix or vector. Then for a given Gaussian white noise , by (5) we obtain


where with , and the norm of the second term is very large (huge) for fixed.

The discrete Picard condition on (1) stems from the necessary requirement

with some constant , independent of and plays a fundamental role in the solution of linear discrete ill-posed problems; see, e.g., [1, 16, 22, 23, 25, 37, 46]. It states that, on average, the (generalized) Fourier coefficients decay faster than , which enables regularization to compute useful approximations to . The following common model has been used throughout Hansen’s books [23, 25] and the references therein as well as [33, 34] and the current paper:


where is a model parameter that controls the decay rates of . We remark that Hansen [23, 25] uses the individual columns of in (7) if is multiple, which is equivalent to assuming that each column of and the corresponding one of makes the same contributions to and and the TSVD method must use all the columns of and associated with to form a regularized solution. It is trivial to unify the discrete Picard condition on the individual columns of in the form of (7) for a multiple .

Based on the above, in the multiple singular value case, the TSVD method [23, 25] solves (3) by dealing with the problem


for some , where is a best rank approximation to and the most common choice (cf. [4, p.12]) is


with and .

In order to extend the results in [33, 34] to the multiple singular value case, the first key step is to take the right-hand side into consideration and to reorganize (5) so as to obtain an SVD of in some desired form by selecting a specific set of left and right singular vectors corresponding to a multiple singular value of . Specifically, for the multiple , we choose an orthonormal basis of its left singular subspace by requiring that have a nonzero orthogonal projection on just one left singular vector in the singular subspace and no components in the remaining ones. Precisely, recall that the columns of form an orthonormal basis of the unique left singular subspace associated with . Then we must have


where is the orthogonal projector onto the left singular subspace associated with . With such , define the corresponding right singular vector . We then select the other orthonormal left singular vectors which are orthogonal to and, together with , form a new orthonormal basis of the left singular subspace associated with . We define the corresponding right singular vectors in the same way as ; they, together with , form a new orthonormal basis of the right singular subspace associated with .

Write the above resulting new left and right singular vector matrices as and with and as their first columns, respectively. Then there exist orthogonal matrices and such that and . After such treatment, we obtain a desired compact SVD


with and . We remind that defined above is unique since the orthogonal projection of onto the left singular subspace associated with is unique and does not depend on the choice of its orthonormal basis.

Now we need to prove that satisfies the discrete Picard condition (7). To see this, notice that


Particularly, we have


We will take the equality when using (13) later, which does not affect all the proofs and results to be presented.

With (10) and (11), a crucial observation is that the solution to (8) becomes


which consists of the first large distinct dominant SVD components of .

Define the new matrix


where , and with and consisting of the other columns of and defined by (11), respectively.



By construction, has the nonzero simple singular values and multiple zero singular value, and . We then see from (14) that the best rank approximation to in (8) can be equivalently replaced by the best rank approximation


to , where , and because .

With the above analysis and simple justifications, we can present the following theorem.

Let , and be defined by (9), (15) and (17). Then for the TSVD solutions satisfy


Particularly, for , we have


The above results show that solving (8) amounts to solving the problem


for the same and that (8) and (21) have the same solutions and residual norms for .

Remark 2.1

Relations (18)–(21) mean that the TSVD method for solving (3) works exactly as if it solves the regularization problem


with , and it computes the same TSVD regularized solutions , to (1) and the modified problem


Relation (12) or (13) states that (23) satisfies the discrete Picard condition.

The covariance matrix of the Gaussian white noise is , and the expected value . With the SVD (15) of , it holds that , and and ; see, e.g., [23, p.70-1] and [25, p.41-2]. The noise thus affects more or less equally. Relation (13) shows that for large singular values is dominant relative to . Once from some onwards, the small singular values magnify , and the noise dominates and must be suppressed. The transition point is such that


see [25, p.42, 98] and a similar description [23, p.70-1]. In this sense, the are divided into the large and small ones. The TSVD solutions


It is easily justified from [23, p.70-1] and [25, p.71,86-8,96] that first converges to and the error and the residual norm monotonically decrease until for , afterwards diverges and instead converges to , while the residual norm stabilizes for not close to . Therefore, the index plays the role of the regularization parameter, exhibits typical semi-convergence at , and the best regularized solution has minimum 2-norm error.

By the construction of and , it follows from (5) and (15) that, for a given parameter , the solution of the Tikhonov regularization (4) is


which is a filtered SVD expansion of , where , are called filters. The above relation has proved the following result.

For the same , (4) with is equivalent to the standard-form Tikhonov regularization


of the modified (23).

by the TSVD method is a special filtered SVD expansion, where and . The best Tikhonov regularized solution , which is defined as , retains the dominant SVD components of and dampens the other small SVD components as much as possible. The semi-convergence of the Tikhonov regularization method occurs at when the parameter varies from zero to infinity.

Finally, we stress that our above changes and reformulations are purely for a mathematical analysis, which aims to extend the results in [33, 34] to the multiple singular value case. Computationally, we never need to reorganize the SVD of and construct .

3 The LSQR algorithm

The LSQR algorithm is based on Golub-Kahan bidiagonalization, Algorithm 3.1, that computes two orthonormal bases and of and for , respectively.

Algorithm 3.1
  • Take , and define with .

  • For





Algorithm 3.1 can be written in the matrix form


where is the -th canonical basis vector of , , , and


It follows from (28) that


We remind that the singular values of , called the Ritz values of with respect to the left and right subspaces and , are all simple, provided that Algorithm 3.1 does not break down until step .

At iteration , LSQR solves the problem

and computes the iterate with


where is the first canonical basis vector of , and the solution norm increases and the residual norm decreases monotonically with respect to . From and (32), we obtain


which solves the problem

Next we will prove that Algorithm 3.1 and LSQR work exactly as if they are applied to and (23), that is, they generate the same results when applied to solving Problems (1) and (23).

By (16), let us expand as

By the SVD (11) of and the SVD (15) of as well as the description on them, it is straightforward to justify that




by noting that


for any integer and


for any integer . Thus, for the given , Algorithm 3.1 works on exactly as if it does on , that is, (28)–(31) hold when is replaced by . As a result, the distinct Ritz values approximate nonzero singular values of , i.e., distinct singular values of . Particularly, from (31) we have


Moreover, (36) and (37) show

As a result, since has nonzero components in all the right singular vectors of associated with its nonzero distinct singular values , Golub-Kahan bidiagonalization cannot break down until step , and the singular values of are exactly the singular values of . At step , Golub-Kahan bidiagonalization generates the orthonormal , and the matrix111 If , it is easily justified that , Algorithm 3.1 produces the orthonormal matrices , and the lower bidiagonal with the positive diagonals and subdiagonals . This does not affect all the derivation and results followed, and we only need to replace by . For example, in (40).




Since (28)–(31) hold when is replaced by , just as the TSVD method (cf. (21)), LSQR works exactly as if it solves (22). We summarize the results as follows.

The LSQR iterate is the solution to the problem


starting with onwards, and it is a regularized solution of (23) and thus of (1) and satisfies


with defined by (32).

The rank approximation to in (41) plays a role similar to the best rank approximation to in (21). Recall that the best rank approximation to satisfies . As a result, if is a near best rank approximation to