1 Introduction and Preliminaries
Consider the linear discrete ill-posed problem

(1) \min_{x\in\mathbb{R}^{n}}\|Ax-b\|, \qquad A\in\mathbb{R}^{m\times n}, \quad b\in\mathbb{R}^{m},

where the norm \|\cdot\| is the 2-norm of a vector or matrix, and A is extremely ill conditioned with its singular values decaying to zero without a noticeable gap. Without loss of generality, we assume that m\geq n. Problem (1) typically arises from the discretization of the first kind Fredholm integral equation

(2) \int_{\Omega}k(s,t)x(t)\,dt=g(s),\qquad s\in\Omega,

where the kernel k(s,t) and g(s) are known functions, while x(t) is the unknown function to be sought. If k(s,t) is nondegenerate and g(s) satisfies the Picard condition, there exists a unique square integrable solution x(t); see [12, 23, 25, 39, 40]. Here for brevity we assume that s and t belong to the same set \Omega. Applications include image deblurring, signal processing, geophysics, computerized tomography, heat propagation, biomedical and optical imaging, groundwater modeling, and many others; see, e.g., [1, 11, 12, 25, 36, 39, 40, 41, 52]. The right-hand side b=b_{true}+e is assumed to be contaminated by a Gaussian white noise e, caused by measurement, modeling or discretization errors, where b_{true} is noise-free and \|e\|<\|b_{true}\|. Because of the presence of noise and the extreme ill-conditioning of A, the naive solution x_{naive}=A^{\dagger}b of (1) bears no relation to the true solution x_{true}=A^{\dagger}b_{true}, where the superscript \dagger denotes the Moore-Penrose inverse of a matrix. Therefore, one has to use regularization to extract a best possible approximation to x_{true}.
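As a concrete illustration of this noise amplification, the following numpy sketch builds a synthetic ill-conditioned matrix from a prescribed SVD (a hypothetical stand-in for a discretized integral equation; the sizes, decay rate, noise level and random seed are illustrative assumptions, not data from the paper) and compares the naive solution with the true one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 24

# Synthetic ill-posed problem: prescribe singular values that decay to zero
# without a noticeable gap (an illustrative choice, not a real discretization).
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** (-0.5 * np.arange(n))       # sigma_1 = 1, ..., sigma_24 ~ 3e-12
A = U @ np.diag(sigma) @ V.T

x_true = V @ sigma**1.5                     # mimics a discrete Picard condition
b_true = A @ x_true
b = b_true + 1e-6 * rng.standard_normal(n)  # Gaussian white noise

x_naive = np.linalg.solve(A, b)             # naive solution
rel_err = np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true)
# rel_err is enormous: each noise component is divided by a tiny singular value
```

Even though the noise is six orders of magnitude below the data, the naive solution is useless because the smallest singular values amplify the noise by many orders of magnitude.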
For a Gaussian white noise e, we always assume that the noise-free right-hand side satisfies the discrete Picard condition with some constant that stays modest for arbitrarily large n [1, 22, 23, 25, 37]. It is an analog of the Picard condition in the finite dimensional case; see, e.g., [23, p.9], [25, p.12] and [37, p.63]. Without loss of generality, assume that . Then the two dominating regularization approaches are to solve the following two equivalent problems:
(3) \min\|Lx\|\quad\mbox{subject to}\quad\|Ax-b\|=\min

with L\in\mathbb{R}^{p\times n} and general-form Tikhonov regularization

(4) \min_{x\in\mathbb{R}^{n}}\left\{\|Ax-b\|^{2}+\lambda^{2}\|Lx\|^{2}\right\}

with the regularization parameter \lambda>0 [23, 25], where L is a regularization matrix and its suitable choice is based on a priori information on the true solution. Typically, L is either the identity matrix I or the scaled discrete approximation of a first or second order derivative operator. If L=I, (4) is standard-form Tikhonov regularization, and both (3) and (4) are 2-norm filtering regularization problems. We are concerned with the case L=I in this paper. If L\neq I, (3) and (4), in principle, can be transformed into standard-form problems [23, 25]. In this case, for (3
) of small or moderate size, an effective and reliable solution method is the truncated singular value decomposition (TSVD) method, which obtains the 2-norm filtering best regularized solution x_{k_{0}} at some k=k_{0} [23, 25], where k_{0} is the optimal regularization parameter, called the transition point, such that x_{k_{0}} has the minimum error over all the TSVD solutions x_{k}. We will review the TSVD method and reformulate it and (4) when A has multiple singular values. A key issue in solving (4) is the determination of the optimal regularization parameter \lambda_{opt} such that the Tikhonov regularized solution x_{\lambda_{opt}} has the minimum error. A number of parameter-choice methods have been developed for finding \lambda_{opt}, such as the discrepancy principle, the L-curve criterion, and generalized cross validation (GCV); we refer the reader to, e.g., [23, 25] for details. It has been theoretically and numerically justified that x_{k_{0}} and x_{\lambda_{opt}} essentially have the minimum 2-norm error; see [51], [22], [23, p.109-111] and [25, Sections 4.2 and 4.4]. In effect, the theory in [12] has shown that the error of the best TSVD regularized solution x_{k_{0}} is unconditionally order optimal in the Hilbert space setting, i.e., of the same order as the worst case error, while the best Tikhonov regularized solution is conditionally order optimal. As a result, we can naturally take x_{k_{0}} as a reference standard when assessing the regularization ability of a 2-norm filtering regularization method.
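On a small synthetic problem of the kind sketched earlier (the prescribed-SVD construction, decay rate and noise level are illustrative assumptions), the TSVD method and its transition point can be demonstrated directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 24
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** (-0.5 * np.arange(n))
A = U @ np.diag(sigma) @ V.T
x_true = V @ sigma**1.5
b = A @ x_true + 1e-6 * rng.standard_normal(n)

def tsvd_solution(A, b, k):
    """TSVD regularized solution: keep only the k dominant SVD components."""
    U, s, Vt = np.linalg.svd(A)
    return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])

errors = [np.linalg.norm(tsvd_solution(A, b, k) - x_true) for k in range(1, n + 1)]
k0 = 1 + int(np.argmin(errors))   # transition point: best truncation level
# errors decrease up to k0 and then grow sharply (semiconvergence in k)
```

The error history exhibits exactly the behavior described above: it decreases while the dominant SVD components are being recovered and blows up once noise-dominated components enter.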
For (1) large, the TSVD method and the Tikhonov regularization method are generally too demanding, and only iterative regularization methods are computationally viable. Krylov iterative solvers are a major class of methods for solving (1); they project (1) onto a sequence of low dimensional Krylov subspaces and compute iterates to approximate the true solution [1, 12, 17, 18, 23, 25, 39]. Of them, the CGLS method, which implicitly applies the CG method [26] to the normal equations of (1), and its mathematically equivalent LSQR algorithm [44] have been most commonly used. The Krylov solvers CGME [4, 5, 10, 18, 19] and LSMR [5, 14] are also choices. These Krylov solvers have general regularizing effects [1, 17, 18, 19, 23, 25, 27, 28] and exhibit semiconvergence [41, p.89]; see also [4, p.314], [23, p.135] and [25, p.110]: the iterates converge to the true solution in an initial stage; afterwards the noise starts to deteriorate the iterates, so that they diverge from the true solution and instead converge to the naive solution of (1). If we stop at the right time, then, in principle, we have a regularization method, where the iteration number plays the role of the regularization parameter. Semiconvergence is due to the fact that the projected problem starts to inherit the ill-conditioning of (1) from some iteration onwards, and the appearance of a small singular value of the projected problem amplifies the noise considerably.
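Semiconvergence is easy to observe numerically. The sketch below runs a plain textbook CGLS iteration (CG applied implicitly to the normal equations) on a synthetic problem of the same kind as before; the problem data, noise level and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 24
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** (-0.5 * np.arange(n))
A = U @ np.diag(sigma) @ V.T
x_true = V @ sigma**1.5
b = A @ x_true + 1e-6 * rng.standard_normal(n)

# Plain CGLS: CG on A^T A x = A^T b without forming A^T A explicitly.
x = np.zeros(n)
r = b.copy()              # residual b - A x
s = A.T @ r
p = s.copy()
gamma = s @ s
errors = []
for _ in range(40):
    q = A @ p
    alpha = gamma / (q @ q)
    x += alpha * p
    r -= alpha * q
    s = A.T @ r
    gamma_new = s @ s
    p = s + (gamma_new / gamma) * p
    gamma = gamma_new
    errors.append(np.linalg.norm(x - x_true))
# errors first decrease, then the noise takes over and they grow again
```

The minimum of the error history marks the semiconvergence point; iterating further only lets noise-dominated components enter the iterates.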
The behavior of (1), (3) and (4) critically depends on the decay rate of the singular values of the matrix in (1). For a linear compact operator equation such as (2) in the Hilbert space setting, let \mu_{1}\geq\mu_{2}\geq\cdots be the singular values of the underlying compact operator. The following characterization of the degree of ill-posedness of (2) was introduced in [29] and has been widely used; see, e.g., [1, 12, 23, 25, 40]: if \mu_{k}=O(\rho^{-k}) with \rho>1, then (2) is severely ill-posed; if \mu_{k}=O(k^{-\alpha}), then (2) is mildly or moderately ill-posed for \frac{1}{2}<\alpha\leq 1 or \alpha>1. Here for mildly ill-posed problems we add the requirement \alpha>\frac{1}{2}, which does not appear in [29] but must be met for a linear compact operator equation [20, 23]. In the one dimensional case, (1) is severely ill-posed when the kernel function is sufficiently smooth, and it is moderately ill-posed with \alpha=p+\frac{1}{2}, where p is the highest order of continuous derivatives of the kernel function; see, e.g., [23, p.8] and [25, p.10-11]. The singular values of the discretized problem (1) resulting from the continuous (2) inherit the decay properties of the \mu_{k} [23, 25], provided that discretizations are fine enough, so that the classification applies to (1) as well.
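The two decay models can be told apart by the ratios of consecutive singular values, as in this small sketch (the values \rho=2 and \alpha=2 are arbitrary illustrative choices):

```python
import numpy as np

k = np.arange(1, 51, dtype=float)
sev = 2.0 ** (-k)     # severe model: sigma_k = rho^(-k) with rho = 2
mod = k ** (-2.0)     # moderate model: sigma_k = k^(-alpha) with alpha = 2

# Consecutive ratios: a constant 1/rho < 1 in the severe case,
# but tending to 1 in the moderate/mild case; in neither case is
# there a noticeable gap in the singular value sequence itself.
ratio_sev = sev[1:] / sev[:-1]
ratio_mod = mod[1:] / mod[:-1]
```

In the severe case every step loses a fixed factor, while in the moderate and mild cases the decay slows down, which is why more SVD components carry useful information there.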
Björck and Eldén in their 1979 survey [6] foresightedly expressed a fundamental concern about CGLS (and LSQR): more research is needed to tell for which problems this approach will work, and what stopping criterion to choose; see also [23, p.145]. Hanke and Hansen [20] and Hansen [24] point out that a strict proof of the regularizing properties of conjugate gradients is extremely difficult. Over the years, an enormous effort has been devoted to the study of the regularizing effects of LSQR and CGLS; see, e.g., [13, 19, 23, 25, 27, 28, 31, 33, 34, 39, 42, 45, 47]. To echo the concern of Björck and Eldén, the following definition has been introduced in [31, 33]: if a regularized solution to (1) is at least as accurate as the best TSVD regularized solution, then it is called a best possible 2-norm filtering regularized solution. If the regularized solution obtained by an iterative regularization solver at semiconvergence is such a best possible one, then the solver is said to have the full regularization. Otherwise, the solver is said to have only the partial regularization.
Since it had been unknown whether or not LSQR, CGME and LSMR have the full regularization for a given (1), one commonly combines them with some explicit regularization [1, 23, 25]. The hybrid LSQR variants have been advocated by Björck and Eldén [6] and O'Leary and Simmons [43], and improved and developed by Björck [3], Björck, Grimme and van Dooren [7], and Renaut et al. [46]. A hybrid LSQR first projects (1) onto Krylov subspaces and then regularizes the projected problems explicitly. It aims to remove the effects of small Ritz values and expands the Krylov subspaces until they capture all the needed dominant SVD components [3, 7, 20, 43], so that the error norms of the regularized solutions and the residual norms possibly decrease further until they ultimately stabilize. The hybrid LSQR, CGME and LSMR have been intensively studied in, e.g., [2, 8, 9, 19, 20, 46] and [1, 25].
If an iterative solver itself, e.g., LSQR, is theoretically proved and practically identified to have the full regularization, one simply stops it when semiconvergence occurs, and no complicated hybrid variant is needed. In computation, semiconvergence can, in principle, be determined by a parameter-choice method, such as the L-curve criterion and the discrepancy principle. Therefore, we cannot emphasize too much the importance of proving the full or partial regularization of LSQR, CGLS, CGME and LSMR. By the definition of the full or partial regularization, a fundamental question is: do LSQR, CGLS, LSMR and CGME have the full or partial regularization for severely, moderately and mildly ill-posed problems?
Regarding LSQR and CGLS, for the three kinds of ill-posed problems described above, the author [33, 34] has proved that LSQR has the full regularization for severely and moderately ill-posed problems with certain suitable \rho and \alpha, under the assumption that all the singular values of the matrix are simple. In applications, there are 2D image deblurring problems where the matrices have multiple singular values [15]. In this paper, we extend the results in [33, 34] to the multiple singular value case. In Section 2, we reformulate the TSVD method and standard-form Tikhonov regularization in the multiple singular value case, showing that they compute regularized solutions as if they worked on a modified form of (1), where the coefficient matrix has the distinct singular values of the original matrix as its nonzero simple singular values. In Section 3, we show that LSQR works as if it solved the same modified form of (1). In this way, we build a bridge that connects the regularizing effects of the TSVD method and those of LSQR, so that we can analyze the regularization ability of LSQR by taking the best TSVD regularized solution as the reference standard. In Sections 4-6, we extend the main results in [33, 34] to the multiple singular value case. Consequently, we can draw the same conclusions as those in [33, 34].
After the above, we consider some important issues that have received no attention in the literature: best, near best and general rank k approximations to the matrix in (1) for the ill-posed problems considered, which include mildly ill-posed problems, and some intrinsic relationships between them and the approximation properties of their nonzero singular values. These results apply to LSQR, where the Ritz values, i.e., the nonzero singular values of the rank k approximation matrices generated by Golub-Kahan bidiagonalization, critically decide the regularization ability of LSQR. We will show that, unlike for severely and moderately ill-posed problems with suitable \rho and \alpha, a best or near best rank k approximation to the matrix does not mean that its nonzero singular values approximate its large singular values in natural order. Furthermore, given the accuracy of the rank k approximation in LSQR, we establish more insightful results on the nonzero singular values of the rank k approximation matrix, which estimate the maximum possible number of them that are smaller than a given singular value of the matrix in (1). These results also apply to the Krylov solvers CGME, MINRES and MR-II, and GMRES and RRGMRES [18, 25], each of which generates its own rank k approximation to the matrix in (1) at iteration k. All this constitutes the work of Section 7. In Section 8, we report the numerical experiments that confirm the results. Finally, we conclude the paper in Section 9. Throughout the paper, we denote by \mathcal{K}_{k}(C,w) the k-dimensional Krylov subspace generated by the matrix C and the vector w, and by I and 0 the identity matrix and the zero matrix, whose orders are omitted whenever they are clear from the context.
2 The reformulation and analysis of the TSVD method and standard-form Tikhonov regularization in the multiple singular value case
In order to extend the results in [33, 34] to the multiple singular value case, we need to reorganize the SVD of the matrix in (1) and reformulate the TSVD method and standard-form Tikhonov regularization by carefully taking the right-hand side of (1) into account. To this end, we must make a number of necessary changes and preparations, as will be detailed below.
Let the SVD of A be

(5) A=U\Sigma V^{T},

where U\in\mathbb{R}^{m\times m} and V\in\mathbb{R}^{n\times n} are orthogonal, \Sigma=\mathrm{diag}(\sigma_{1}I_{c_{1}},\sigma_{2}I_{c_{2}},\ldots,\sigma_{s}I_{c_{s}},\mathbf{0}) with the distinct singular values \sigma_{1}>\sigma_{2}>\cdots>\sigma_{s}>0, each \sigma_{j} has multiplicity c_{j}\geq 1 with I_{c_{j}} the identity matrix of order c_{j}, and the superscript T denotes the transpose of a matrix or vector. Then for a given Gaussian white noise e, by (5) we obtain

(6) A^{\dagger}b=A^{\dagger}b_{true}+A^{\dagger}e=x_{true}+\sum_{j=1}^{s}\frac{V_{j}U_{j}^{T}e}{\sigma_{j}},

where U_{j}\in\mathbb{R}^{m\times c_{j}} and V_{j}\in\mathbb{R}^{n\times c_{j}} consist of the columns of U and V corresponding to \sigma_{j}, and the norm of the second term is very large (huge) for e fixed.
The discrete Picard condition on (1) stems from the necessary requirement that \|A^{\dagger}b_{true}\| be bounded by some constant independent of n, and it plays a fundamental role in the solution of linear discrete ill-posed problems; see, e.g., [1, 16, 22, 23, 25, 37, 46]. It states that, on average, the (generalized) Fourier coefficients |u_{i}^{T}b_{true}| decay faster than the singular values, which enables regularization to compute useful approximations to x_{true}. The following common model has been used throughout Hansen's books [23, 25] and the references therein as well as [33, 34] and the current paper:

(7) |u_{i}^{T}b_{true}|=\sigma_{i}^{1+\beta},\qquad\beta>0,

where \beta is a model parameter that controls the decay rate of the Fourier coefficients. We remark that Hansen [23, 25] uses the individual columns of U in (7) if a singular value is multiple, which is equivalent to assuming that each column of U_{j} and the corresponding column of V_{j} make the same contributions to b_{true} and x_{true}, and the TSVD method must use all the columns of U_{j} and V_{j} associated with \sigma_{j} to form a regularized solution. It is trivial to unify the discrete Picard condition on the individual columns of U_{j} in the form of (7) for a multiple \sigma_{j}.
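The model (7) can be checked numerically through the Picard ratios |u_{i}^{T}b|/\sigma_{i}: they decay while the true data dominates and blow up once the noise takes over. A minimal sketch (the data are synthetic; \beta, the noise level and the sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 24
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** (-0.5 * np.arange(n))
beta = 0.5

b_true = U @ sigma ** (1 + beta)     # model (7): |u_i^T b_true| = sigma_i^(1+beta)
b = b_true + 1e-6 * rng.standard_normal(n)

fourier = np.abs(U.T @ b)            # Fourier coefficients |u_i^T b|
picard = fourier / sigma             # Picard ratios
# picard decays like sigma_i^beta at first, then grows like 1/sigma_i
# once |u_i^T b| reaches the noise level
```

The index at which the ratios turn from decay to growth is precisely where regularization must start suppressing SVD components.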
Based on the above, in the multiple singular value case, the TSVD method [23, 25] solves (3) by dealing with the problem

(8) \min\|x\|\quad\mbox{subject to}\quad\|A_{k}x-b\|=\min

for some k, where A_{k} is a best rank k approximation to A and the most common choice (cf. [4, p.12]) is

(9) A_{k}=U_{k}\Sigma_{k}V_{k}^{T}

with U_{k} and V_{k} consisting of the first k columns of U and V, and \Sigma_{k} the k\times k leading principal submatrix of \Sigma.
In order to extend the results in [33, 34] to the multiple singular value case, the first key step is to take the right-hand side b into consideration and to reorganize (5) so as to obtain an SVD of A in some desired form by selecting a specific set of left and right singular vectors corresponding to a multiple singular value of A. Specifically, for a multiple \sigma_{j}, we choose an orthonormal basis of its left singular subspace by requiring that b have a nonzero orthogonal projection on just one left singular vector in the singular subspace and no components in the remaining ones. Precisely, recall that the columns of U_{j} form an orthonormal basis of the unique left singular subspace associated with \sigma_{j}. Then we must have

(10) \tilde{u}_{j}=\frac{P_{j}b}{\|P_{j}b\|},

where P_{j} is the orthogonal projector onto the left singular subspace associated with \sigma_{j}. With such \tilde{u}_{j}, define the corresponding right singular vector \tilde{v}_{j}=A^{T}\tilde{u}_{j}/\sigma_{j}. We then select the other c_{j}-1 orthonormal left singular vectors that are orthogonal to \tilde{u}_{j} and, together with \tilde{u}_{j}, form a new orthonormal basis of the left singular subspace associated with \sigma_{j}. We define the corresponding right singular vectors in the same way as \tilde{v}_{j}; they, together with \tilde{v}_{j}, form a new orthonormal basis of the right singular subspace associated with \sigma_{j}.

Write the resulting new left and right singular vector matrices as \tilde{U}_{j} and \tilde{V}_{j} with \tilde{u}_{j} and \tilde{v}_{j} as their first columns, respectively. Then there exist orthogonal matrices Q_{j}\in\mathbb{R}^{c_{j}\times c_{j}} such that \tilde{U}_{j}=U_{j}Q_{j} and \tilde{V}_{j}=V_{j}Q_{j}. After such treatment, we obtain a desired compact SVD

(11) A=\tilde{U}\tilde{\Sigma}\tilde{V}^{T}

with \tilde{\Sigma}=\mathrm{diag}(\sigma_{1}I_{c_{1}},\ldots,\sigma_{s}I_{c_{s}}), \tilde{U}=(\tilde{U}_{1},\ldots,\tilde{U}_{s}) and \tilde{V}=(\tilde{V}_{1},\ldots,\tilde{V}_{s}). We remind that \tilde{u}_{j} defined above is unique since the orthogonal projection of b onto the left singular subspace associated with \sigma_{j} is unique and does not depend on the choice of its orthonormal basis.
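The construction above can be sketched in numpy. For a singular value of multiplicity three, rotate its left singular subspace basis so that b has a component along only the first new basis vector; the right singular vectors follow from the same rotation (all data below are synthetic illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.array([1.0, 0.5, 0.5, 0.5, 0.1, 0.1])   # 0.5 has multiplicity 3
A = U @ np.diag(sigma) @ V.T
b = rng.standard_normal(n)

Uj = U[:, 1:4]         # orthonormal basis of the left singular subspace of 0.5
c = Uj.T @ b           # coordinates of the projection of b in that basis
# 3x3 orthogonal rotation whose first column is +/- c/||c||
Q, _ = np.linalg.qr(c.reshape(-1, 1), mode='complete')
U_new = Uj @ Q                 # rotated orthonormal basis of the same subspace
V_new = (A.T @ U_new) / 0.5    # corresponding right singular vectors

coords = U_new.T @ b           # b now touches only the first new basis vector
```

The projection of b onto the subspace is unchanged; it is merely expressed along a single basis vector, which is exactly what makes the modified problem behave as if all singular values were simple.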
Now we need to prove that the discrete Picard condition (7) still holds after the above reorganization. To see this, notice that
(12) 
Particularly, we have
(13) 
We will take the equality when using (13) later, which does not affect any of the proofs and results to be presented.
With (10) and (11), a crucial observation is that the solution to (8) becomes
(14) 
which consists of the first large distinct dominant SVD components of .
Define the new matrix
(15) 
where , and with and consisting of the other columns of and defined by (11), respectively.
Write
(16) 
By construction, it has the nonzero simple singular values \sigma_{1}>\sigma_{2}>\cdots>\sigma_{s} and a multiple zero singular value. We then see from (14) that the best rank k approximation to A in (8) can be equivalently replaced by the best rank k approximation
(17) 
to , where , and because .
With the above analysis and simple justifications, we can present the following theorem.
Let , and be defined by (9), (15) and (17). Then for the TSVD solutions satisfy
(18)  
(19) 
Particularly, for , we have
(20) 
The above results show that solving (8) amounts to solving the problem
(21) 
for the same and that (8) and (21) have the same solutions and residual norms for .
Remark 2.1
Relations (18)–(21) mean that the TSVD method for solving (3) works exactly as if it solves the regularization problem
(22) 
with , and it computes the same TSVD regularized solutions , to (1) and the modified problem
(23) 
Relation (12) or (13) states that (23) satisfies the discrete Picard condition.
The covariance matrix of the Gaussian white noise is , and the expected value . With the SVD (15) of , it holds that , and and ; see, e.g., [23, p.70-71] and [25, p.41-42]. The noise thus affects more or less equally. Relation (13) shows that for large singular values is dominant relative to . Once from some onwards, the small singular values magnify , and the noise dominates and must be suppressed. The transition point is such that
(24) 
see [25, p.42, 98] and a similar description [23, p.701]. In this sense, the are divided into the large and small ones. The TSVD solutions
(25) 
It is easily justified from [23, p.70-71] and [25, p.71, 86-88, 96] that the TSVD solution first converges to the true solution, and the error and the residual norm monotonically decrease until the transition point, after which the TSVD solution diverges and instead converges to the naive solution, while the residual norm stabilizes. Therefore, the index k plays the role of the regularization parameter, the TSVD method exhibits typical semiconvergence at the transition point, and the best regularized solution has minimum 2-norm error.
By the above construction, it follows from (5) and (15) that, for a given regularization parameter \lambda, the solution of the Tikhonov regularization (4) with L=I is

(26) x_{\lambda}=\sum_{j=1}^{s}f_{j}\frac{\tilde{u}_{j}^{T}b}{\sigma_{j}}\tilde{v}_{j},\qquad f_{j}=\frac{\sigma_{j}^{2}}{\sigma_{j}^{2}+\lambda^{2}},

which is a filtered SVD expansion, where the f_{j} are called filters. The above relation has proved the following result.
For the same \lambda, (4) with L=I is equivalent to the standard-form Tikhonov regularization
(27) 
of the modified (23).
The solution computed by the TSVD method is a special filtered SVD expansion, with filters equal to one for the retained dominant SVD components and zero for the others. The best Tikhonov regularized solution x_{\lambda_{opt}}, which minimizes the error over all \lambda>0, retains the dominant SVD components and dampens the other small SVD components as much as possible. The semiconvergence of the Tikhonov regularization method occurs at \lambda_{opt} as the parameter varies from zero to infinity.
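The filtered-SVD form of the Tikhonov solution, with filters \sigma_{i}^{2}/(\sigma_{i}^{2}+\lambda^{2}), can be sketched as follows on synthetic data (the problem construction and the \lambda grid are illustrative assumptions); the error as a function of \lambda shows the semiconvergence-like behavior just described, with a minimum at the optimal parameter:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 24
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** (-0.5 * np.arange(n))
A = U @ np.diag(sigma) @ V.T
x_true = V @ sigma**1.5
b = A @ x_true + 1e-6 * rng.standard_normal(n)

def tikhonov(A, b, lam):
    """Filtered SVD expansion with filters f_i = sigma_i^2 / (sigma_i^2 + lam^2)."""
    U, s, Vt = np.linalg.svd(A)
    f = s**2 / (s**2 + lam**2)
    return Vt.T @ (f * (U.T @ b) / s)

lams = 10.0 ** np.linspace(-8.0, 0.0, 33)
errs = [np.linalg.norm(tikhonov(A, b, lam) - x_true) for lam in lams]
lam_opt = lams[int(np.argmin(errs))]
# too small lam under-regularizes, too large lam over-smooths
```

For any fixed \lambda the filtered expansion coincides with the solution of the regularized normal equations, which the test below also checks.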
3 The LSQR algorithm
The LSQR algorithm is based on Golub-Kahan bidiagonalization, Algorithm 3.1, which computes orthonormal bases \{u_{1},\ldots,u_{k+1}\} and \{v_{1},\ldots,v_{k}\} of the Krylov subspaces \mathcal{K}_{k+1}(AA^{T},b) and \mathcal{K}_{k}(A^{T}A,A^{T}b), respectively.
Algorithm 3.1 (Golub-Kahan bidiagonalization)

1. Take \beta_{1}=\|b\|, u_{1}=b/\beta_{1}, and define v_{0}=0.

2. For j=1,2,\ldots,k:

 (i) r_{j}=A^{T}u_{j}-\beta_{j}v_{j-1};

 (ii) \alpha_{j}=\|r_{j}\|; v_{j}=r_{j}/\alpha_{j};

 (iii) z_{j}=Av_{j}-\alpha_{j}u_{j};

 (iv) \beta_{j+1}=\|z_{j}\|; u_{j+1}=z_{j}/\beta_{j+1}.
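A direct numpy transcription of Algorithm 3.1, with full reorthogonalization added for numerical safety (an implementation choice of this sketch, not part of the algorithm as stated):

```python
import numpy as np

def golub_kahan(A, b, k):
    """k steps of Golub-Kahan bidiagonalization.

    Returns U (m x (k+1)), B ((k+1) x k lower bidiagonal), V (n x k)
    with A V = U B and orthonormal columns in U and V.
    """
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    B = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)
    U[:, 0] = b / beta
    for j in range(k):
        r = A.T @ U[:, j]
        if j > 0:
            r -= B[j, j - 1] * V[:, j - 1]
        r -= V[:, :j] @ (V[:, :j].T @ r)        # reorthogonalization
        B[j, j] = np.linalg.norm(r)             # alpha_{j+1} in 1-based notation
        V[:, j] = r / B[j, j]
        z = A @ V[:, j] - B[j, j] * U[:, j]
        z -= U[:, :j + 1] @ (U[:, :j + 1].T @ z)
        B[j + 1, j] = np.linalg.norm(z)         # beta_{j+2} in 1-based notation
        U[:, j + 1] = z / B[j + 1, j]
    return U, B, V

# Sanity check on random data.
rng = np.random.default_rng(6)
A = rng.standard_normal((10, 8))
b = rng.standard_normal(10)
U, B, V = golub_kahan(A, b, 5)
```

The returned triplet satisfies the matrix relations stated next.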
Algorithm 3.1 can be written in the matrix form

(28) AV_{k}=U_{k+1}B_{k},

(29) A^{T}U_{k+1}=V_{k}B_{k}^{T}+\alpha_{k+1}v_{k+1}e_{k+1}^{T},

where e_{k+1} is the (k+1)-th canonical basis vector of \mathbb{R}^{k+1}, U_{k+1}=(u_{1},u_{2},\ldots,u_{k+1}), V_{k}=(v_{1},v_{2},\ldots,v_{k}), and

(30) B_{k}=\begin{pmatrix}\alpha_{1}&&&\\ \beta_{2}&\alpha_{2}&&\\ &\ddots&\ddots&\\ &&\beta_{k}&\alpha_{k}\\ &&&\beta_{k+1}\end{pmatrix}\in\mathbb{R}^{(k+1)\times k}.

It follows from (28) that

(31) B_{k}=U_{k+1}^{T}AV_{k}.
We remind that the singular values of B_{k}, called the Ritz values of A with respect to the left and right subspaces span(U_{k+1}) and span(V_{k}), are all simple, provided that Algorithm 3.1 does not break down until step k.
At iteration k, LSQR solves the problem

\min_{x\in\mathcal{K}_{k}(A^{T}A,A^{T}b)}\|Ax-b\|

and computes the iterate x_{k}=V_{k}y_{k} with

(32) y_{k}=\arg\min_{y\in\mathbb{R}^{k}}\|B_{k}y-\beta_{1}e_{1}^{(k+1)}\|,

where e_{1}^{(k+1)} is the first canonical basis vector of \mathbb{R}^{k+1}, and the solution norm \|x_{k}\| increases and the residual norm \|Ax_{k}-b\| decreases monotonically with respect to k. From x_{k}=V_{k}y_{k} and (32), we obtain

(33) x_{k}=\beta_{1}V_{k}B_{k}^{\dagger}e_{1}^{(k+1)},

which solves the problem

\min_{x\in\mathcal{K}_{k}(A^{T}A,A^{T}b)}\|Ax-b\|.
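This characterization, the iterate as V_{k} times the solution of the projected bidiagonal least squares problem, can be verified directly. The sketch below is a plain reference implementation that repeats the bidiagonalization for self-containedness; the production LSQR instead updates the iterate by short recurrences:

```python
import numpy as np

def lsqr_iterate(A, b, k):
    """k-th LSQR iterate x_k = V_k y_k, y_k = argmin || B_k y - beta_1 e_1 ||."""
    m, n = A.shape
    U = np.zeros((m, k + 1)); V = np.zeros((n, k)); B = np.zeros((k + 1, k))
    beta1 = np.linalg.norm(b)
    U[:, 0] = b / beta1
    for j in range(k):
        r = A.T @ U[:, j]
        if j > 0:
            r -= B[j, j - 1] * V[:, j - 1]
        r -= V[:, :j] @ (V[:, :j].T @ r)            # reorthogonalization
        B[j, j] = np.linalg.norm(r); V[:, j] = r / B[j, j]
        z = A @ V[:, j] - B[j, j] * U[:, j]
        z -= U[:, :j + 1] @ (U[:, :j + 1].T @ z)
        B[j + 1, j] = np.linalg.norm(z); U[:, j + 1] = z / B[j + 1, j]
    rhs = np.zeros(k + 1); rhs[0] = beta1           # beta_1 e_1
    y, *_ = np.linalg.lstsq(B, rhs, rcond=None)     # projected LS problem
    return V @ y

rng = np.random.default_rng(7)
A = rng.standard_normal((30, 20))
b = rng.standard_normal(30)

# Residual norms decrease monotonically with k, and at k = n the iterate
# coincides with the full least squares solution on this well-posed test matrix.
res = [np.linalg.norm(b - A @ lsqr_iterate(A, b, k)) for k in (1, 3, 5, 20)]
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
```

On a well-conditioned random matrix the iterates simply converge; the regularizing behavior discussed in this paper appears when the same machinery is applied to an ill-conditioned matrix with noisy data.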
Next we will prove that Algorithm 3.1 and LSQR work exactly as if they were applied to the modified problem (23); that is, they generate the same results when applied to solving (1) and (23).
By (16), let us expand b as
By the SVD (11) of and the SVD (15) of as well as the description on them, it is straightforward to justify that
(34) 
and
(35) 
by noting that
(36) 
for any integer and
(37) 
for any integer . Thus, for the given , Algorithm 3.1 works on exactly as if it does on , that is, (28)–(31) hold when is replaced by . As a result, the distinct Ritz values approximate nonzero singular values of , i.e., distinct singular values of . Particularly, from (31) we have
(38) 
As a result, since b has nonzero components in all the right singular vectors of the modified matrix associated with its nonzero distinct singular values, Golub-Kahan bidiagonalization cannot break down until step s, and the singular values of the resulting bidiagonal matrix are exactly the nonzero singular values of the modified matrix. At step s, Golub-Kahan bidiagonalization generates the orthonormal matrices, and the matrix

(39) 

and

(40) 

Footnote 1: If , it is easily justified that Algorithm 3.1 produces the orthonormal matrices and the lower bidiagonal matrix with positive diagonals and subdiagonals. This does not affect the derivations and results that follow, and we only need to replace the relevant quantities accordingly; for example, in (40).
Since (28)-(31) hold when the original matrix is replaced by the modified one, just like the TSVD method (cf. (21)), LSQR works exactly as if it solved (22). We summarize the results as follows.