1 Introduction
Computational algorithms for X-ray computed tomography (CT) come in different forms. Some methods build on an explicit formulation of the inverse operator, leading to the filtered back projection algorithm (for parallel-beam CT), the FDK algorithm (for cone-beam CT), etc. Other methods are based on a discretization of the problem followed by solving the – usually over- or underdetermined – linear system of equations by means of an iterative method. The latter approach, which is the basis for this work, is more general in the sense that it does not assume any specific scanning geometry, and it tends to produce better reconstructions for limited-data and/or limited-angle problems [17].
CT reconstruction is an inverse problem where the forward problem refers to the mapping of the object's linear attenuation coefficient to the data, in the form of projections of the object onto the detector planes for various scan positions. In the case of 2D parallel-beam CT the forward operator is known as the Radon transform. The data consist of (noisy) measurements of the attenuation of the X-rays through the object, recorded in a set of detector elements. Discretization of this problem takes the form
(1.1) A x ≈ b, A ∈ R^(m×n), b = b̄ + e,
where A is a discretization of the forward projector, b is the measured data, and x represents the reconstructed image of the object's interior. Moreover, x̄ represents the exact object, b̄ represents the noise-free data, and e represents the measurement noise. A number of discretization schemes are available for computing the matrix A, see, e.g., [17, Chapter 9], [10] and [24, Appendix A]. A discussion of the noise in CT can be found in [17, Section 4.4]; here we assume that e is white Gaussian noise.
The matrix A is sparse and there are no restrictions on its dimensions m and n; both overdetermined and underdetermined systems are used, depending on the measurement setup. In both cases it is common to consider a least squares solution, which involves the normal equations in one of the forms
A^T A x = A^T b
and
A A^T y = b, x = A^T y.
The matrix B ∈ R^(n×m) represents the so-called back projector, which maps the data back onto the solution domain [27]. The back projector is a mathematical abstraction that has no physical interpretation, but it plays a central role in filtered back projection and many other reconstruction algorithms.
In large-scale CT problems, the matrix A – in spite of the fact that it is sparse – may be too large to store explicitly. Instead, one must use subroutines or functions that compute the multiplications with A and B in a matrix-free fashion, often using graphics processing units (GPUs) or other hardware accelerators. In order to make best use of this hardware and the memory hierarchy it is common to use different discretization techniques for the forward projector and the back projector [28]. This means that the matrix B which represents the back projector is typically different from the transpose of the forward projector, and we say that B is an unmatched back projector or an unmatched transpose.
A consequence of this is that iterative solvers, which are supposed to solve the normal equations, instead solve the so-called unmatched normal equations in one of the forms
(1.2) B A x = B b and A B y = b, x = B y;
see [5] and [3] for details. The main drawback of using an unmatched transpose is that the standard Simultaneous Iterative Reconstruction Technique (SIRT) iterative solvers (Landweber, Cimmino, CAV, DROP, SART) [16] do not converge when the iteration matrix B A has one or more complex eigenvalues with a negative real part – which is very often the case in practice (see Figure 6.1 in Section 6.2). A convergence analysis of proximal gradient methods with an unmatched transpose is given in [2]. As shown in [3], one can modify the SIRT methods such that they solve a slightly modified problem whose iteration matrix has no eigenvalues with a negative real part, thus ensuring convergence. But this introduces a (small) perturbation of the solution, and one must compute an estimate of the leftmost eigenvalue(s) of B A. In addition, the choice of a good relaxation parameter is nontrivial.
An alternative is to use Kaczmarz's method [16], which does not explicitly involve the matrix transpose; but this method has other drawbacks. In its original form, Kaczmarz's method operates on a single row of A at a time, which is unsuited for modern computer architectures. Block versions [30] have better performance, but they require a good choice of the blocks, the blocks may be unmatched, and again the choice of a good relaxation parameter is nontrivial.
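The effect of an unmatched transpose on the spectrum is easy to observe on a small synthetic example. The following sketch (our own toy construction, not one of the CT matrices used later in the paper) perturbs A^T to obtain an unmatched B and inspects the eigenvalues of the iteration matrix B A:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 40
A = rng.standard_normal((m, n))

# Matched case: B = A^T, so BA = A^T A is symmetric positive semidefinite
# and all eigenvalues are real and nonnegative.
eig_matched = np.linalg.eigvalsh(A.T @ A)

# Unmatched case: B is a (hypothetical) perturbed transpose.
B = A.T + 0.5 * rng.standard_normal((n, m))
eig_unmatched = np.linalg.eigvals(B @ A)

# BA is no longer symmetric, so complex eigenvalues appear -- and some may
# have a negative real part, which is exactly the situation where the SIRT
# methods fail to converge.
n_complex = np.sum(np.abs(eig_unmatched.imag) > 1e-10)
n_negative = np.sum(eig_unmatched.real < 0)
print(n_complex, n_negative)
```

For realistic CT projector pairs the perturbation is much smaller than in this toy example, but, as the paper reports, eigenvalues with negative real parts still occur in practice.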
The present work explores an alternative approach, where we use the well-known GMRES algorithm to solve the unmatched normal equations (1.2) with an unmatched back projector. Thus, we avoid introducing a perturbation, as well as the need for eigenvalue computations for the sake of ensuring convergence of the SIRT methods, and we avoid the choice of the relaxation parameter. This makes it easier to develop a black-box CT solver that does not rely on the user's knowledge of convergence criteria, eigenvalue computations, etc. Our work is based on previous work on the preconditioned AB- and BA-GMRES methods for solving least squares problems [19], and here we explicitly demonstrate the successful use of these methods for CT reconstruction problems.
The main goal of our work is to study the performance and the regularizing effects of the AB- and BA-GMRES algorithms when applied to CT reconstruction problems. To do this, we deliberately commit "inverse crime" and generate the noise-free data as b̄ = A x̄, meaning that the same model (the matrix A) is used to generate and reconstruct the data. To determine which unmatched pair (A, B) is preferable, one must use real (or carefully simulated) data to avoid the inverse crime – this is a topic for future research.
We remark that an alternative to using the unmatched transpose is to use Kaczmarz's method as a preconditioner (i.e., for B) in the AB- and BA-GMRES algorithms, as described and analyzed in [25, 4]. We shall not pursue that approach here, but leave it for future work.
Our paper is organized as follows. In Section 2 we summarize the AB- and BA-GMRES algorithms, and in Section 3 we present new first-order perturbation theory for the unmatched normal equations. The behavior of the iterative methods in the presence of noisy data is discussed in Section 4, and in Section 5 we study the regularizing properties of AB- and BA-GMRES when B = A^T. Finally, in Section 6 we present a number of numerical examples that illustrate our theory and the performance of the AB- and BA-GMRES algorithms. We use the following notation: R(·) denotes the range of a matrix and N(·) denotes the null space of a matrix. For Krylov subspaces we use the notation
K_k(M, v) = span{v, M v, M^2 v, ..., M^(k−1) v}.
Moreover, P_S v denotes the orthogonal projection of a vector v on the subspace S, and σ_i(·) denotes the ith singular value of a matrix.
2 The AB-GMRES and BA-GMRES Algorithms
The two algorithms AB-GMRES and BA-GMRES were originally presented and analyzed in [19] as alternative methods for solving linear least squares problems, according to the following principles:

AB-GMRES solves min_y ||b − A B y||_2, x = B y, with B as a right preconditioner.

BA-GMRES solves min_x ||B b − B A x||_2 with B as a left preconditioner.
Here we briefly summarize these algorithms, using the notation A ∈ R^(m×n) and B ∈ R^(n×m); we describe some stopping rules later in Section 6.4.
Algorithm AB-GMRES
  Choose initial x_0; compute r_0 = b − A x_0, β = ||r_0||_2, v_1 = r_0 / β
  for k = 1, 2, ...
    w = A B v_k
    for i = 1, ..., k
      h_{i,k} = v_i^T w; w = w − h_{i,k} v_i
    endfor
    h_{k+1,k} = ||w||_2; v_{k+1} = w / h_{k+1,k}
    y_k = argmin_y ||β e_1 − H_k y||_2; x_k = x_0 + B [v_1, ..., v_k] y_k
    (stopping rule goes here)
  endfor

Algorithm BA-GMRES
  Choose initial x_0; compute r̃_0 = B(b − A x_0), β = ||r̃_0||_2, w_1 = r̃_0 / β
  for k = 1, 2, ...
    q = B A w_k
    for i = 1, ..., k
      h_{i,k} = w_i^T q; q = q − h_{i,k} w_i
    endfor
    h_{k+1,k} = ||q||_2; w_{k+1} = q / h_{k+1,k}
    y_k = argmin_y ||β e_1 − H_k y||_2; x_k = x_0 + [w_1, ..., w_k] y_k
    (stopping rule goes here)
  endfor

Here H_k denotes the (k+1) × k upper Hessenberg matrix with entries h_{i,j}, and e_1 is the first standard basis vector.
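For concreteness, the two algorithms can be sketched in a few lines of Python/NumPy (our own illustrative implementation; the paper's experiments use MATLAB). The helper runs plain GMRES via the Arnoldi process, and the two wrappers apply it to A B y = b with x = B y, and to B A x = B b, respectively:

```python
import numpy as np

def gmres_basic(apply_M, c, k_max):
    """Plain GMRES for M z = c, built on the Arnoldi process (no restarts)."""
    N = c.shape[0]
    Q = np.zeros((N, k_max + 1))
    H = np.zeros((k_max + 1, k_max))
    beta = np.linalg.norm(c)
    Q[:, 0] = c / beta
    k_used = k_max
    for k in range(k_max):
        w = apply_M(Q[:, k])
        for i in range(k + 1):              # modified Gram-Schmidt
            H[i, k] = Q[:, i] @ w
            w -= H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < 1e-10 * beta:      # (lucky) breakdown
            k_used = k + 1
            break
        Q[:, k + 1] = w / H[k + 1, k]
    rhs = np.zeros(k_used + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H[:k_used + 1, :k_used], rhs, rcond=None)
    return Q[:, :k_used] @ y                # z_k, starting from z_0 = 0

def ab_gmres(A, B, b, k):
    """AB-GMRES: GMRES on AB y = b (B as right preconditioner), x = B y."""
    return B @ gmres_basic(lambda v: A @ (B @ v), b, k)

def ba_gmres(A, B, b, k):
    """BA-GMRES: GMRES on BA x = B b (B as left preconditioner)."""
    return gmres_basic(lambda v: B @ (A @ v), B @ b, k)

# Sanity check with a matched transpose: both variants should reproduce the
# least squares solution of a small overdetermined full-rank system.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
x_ab = ab_gmres(A, A.T, b, 10)
x_ba = ba_gmres(A, A.T, b, 10)
```

In a realistic CT setting A and B would of course be matrix-free operators rather than explicit arrays; the structure of the iteration is the same.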
The following statements about the convergence are from [19]. We emphasize that the two methods use the same Krylov subspace for the solution, but they use different objective functions.
AB-GMRES

AB-GMRES applies GMRES to min_y ||b − A B y||_2, x = B y, i.e., it forms the iterates x_k = B y_k that minimize ||b − A B y_k||_2, i.e., that minimize ||b − A x_k||_2.

If R(B) = R(A^T), then AB-GMRES determines a solution of min_x ||b − A x||_2 without breakdown for all b if and only if R(A) = R(B^T) [19, Corollary 3.8].
BA-GMRES

BA-GMRES applies GMRES to min_x ||B(b − A x)||_2, i.e., it forms the iterates x_k ∈ K_k(B A, B b) that minimize ||B(b − A x_k)||_2.

If R(A) = R(B^T) and R(A^T) = R(B), then BA-GMRES determines a solution of min_x ||B(b − A x)||_2 without breakdown for all b [25, Theorem 3.1].
The conditions on the ranges R(A), R(B^T) and R(A^T), R(B) ensure the equivalence between the matched and unmatched normal equations, as seen above. Simply, R(B^T) = R(A) implies N(BA) = N(A), and R(B) = R(A^T) implies N(AB) = N(B). Further, these conditions also ensure the convergence of AB- and BA-GMRES. However, it is not easy to check whether these conditions are satisfied in practice.
If B ≈ A^T, we expect that solving the unmatched normal equations by AB- and BA-GMRES gives an approximation to the least squares solution. This expectation is supported by the observation that if B^T is close to A, then R(B^T) and R(B) can be close to R(A) and R(A^T), respectively; e.g., the principal angles between the pair of ranges R(A) and R(B^T), and the pair R(A^T) and R(B), can be small by the extended sin Θ theorem [34, Section 3], [31, Chapter 2]:
(2.1) ||sin Θ(R(A), R(B^T))|| ≤ ||B^T − A|| / σ_r(A),
where ||·|| is either the 2-norm or the Frobenius norm and we assume that rank(B^T) = rank(A) = r. Moreover, Θ(R(A), R(B^T)) is a diagonal matrix with the principal angles θ_1 ≤ ... ≤ θ_r between the subspaces R(A) and R(B^T). In case of the matrix 2-norm, we can instead use the upper bound ||P_{R(A)} − P_{R(B^T)}||_2 ≤ ||B^T − A||_2 / σ_r(A).
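These principal angles are easy to compute numerically. The sketch below (an illustration with random matrices, not the CT projectors) uses SciPy's `subspace_angles` and checks the 2-norm sin Θ bound for perturbations of increasing size:

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
m, n = 50, 20
A = rng.standard_normal((m, n))
sigma_r = np.linalg.svd(A, compute_uv=False)[-1]   # smallest singular value

max_angles = []
for eps in (1e-3, 1e-2, 1e-1):
    E = eps * rng.standard_normal((m, n))
    Bt = A + E                       # plays the role of B^T = A + perturbation
    theta = subspace_angles(A, Bt)   # principal angles between R(A), R(B^T)
    # Verify the sin-Theta bound: sin(theta_max) <= ||E||_2 / sigma_r(A).
    assert np.sin(theta).max() <= np.linalg.norm(E, 2) / sigma_r + 1e-12
    max_angles.append(theta.max())
```

As expected, the largest principal angle grows with the size of the perturbation, and it stays well below the bound for these random examples.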
3 First-Order Perturbation Theory
To further motivate the use of the unmatched normal equations, we consider the difference between the solutions of the matched and unmatched normal equations, and we study how this difference depends on the unmatchedness of the pair (A, B).
A first-order perturbation analysis for the unmatched normal equations was given in [5, Section 2.1]. Their analysis gave a bound for the distance between the closest pair of a point in the solution set of the least squares problem and a point in the solution set of its perturbed problem. Here, we perform an alternative first-order perturbation analysis specifically for the minimum-norm solutions, as discussed below. This analysis refines the previous perturbation analysis in [5, Section 2.1].
Generally, we can assume to have a modeling error also in A, in addition to B. Let Ā and b̄ be the ideal model and data, respectively, and let
(3.1) A = Ā + E_1, B^T = Ā + E_2, b = b̄ + δb
be the perturbed models and data. Moreover, let x̄ be a ground truth solution of min_x ||b̄ − Ā x||_2 and let r̄ = b̄ − Ā x̄ be the corresponding least squares residual, for which Ā^T r̄ = 0 holds. The perturbed matrices A and B can be regarded as the unmatched pair in the unmatched normal equations (1.2).
Now, consider solving the perturbed unmatched normal equations
(3.2) B A x = B b,
which is a perturbation of the unperturbed normal equations Ā^T Ā x = Ā^T b̄, whose minimum-norm solution is x̄. Note that the linear system (3.2) is consistent if B b ∈ R(BA). Irrespective of the inconsistency of (3.2), the minimum-norm solution of the least squares problem
(3.3) min_x ||B b − B A x||_2
is given by x̃ = (BA)^† B b, where ^† denotes the Moore-Penrose generalized inverse (pseudoinverse). We are concerned with the difference between the minimum-norm solution x̄ of the unperturbed problem and the minimum-norm solution x̃ of (3.3). Note that the solution of interest in (1.1) is not necessarily the minimum-norm solution but may lie close to it.
Theorem 3.1.
Assume that A and B^T are acute perturbations of Ā, i.e.,
||P_{R(A)} − P_{R(Ā)}||_2 < 1 and ||P_{R(A^T)} − P_{R(Ā^T)}||_2 < 1,
and
||P_{R(B^T)} − P_{R(Ā)}||_2 < 1 and ||P_{R(B)} − P_{R(Ā^T)}||_2 < 1,
respectively, where P_S denotes the orthogonal projection onto a subspace S. Then, to first order, the relative error norm is bounded by
(3.4) ||x̃ − x̄||_2 / ||x̄||_2 ≲ κ(Ā) (||E_1||_2 + ||E_2||_2) / ||Ā||_2 + κ(Ā) ||δb||_2 / (||Ā||_2 ||x̄||_2) + κ(Ā)^2 ||E_2||_2 ||r̄||_2 / (||Ā||_2^2 ||x̄||_2),
where σ_r(Ā) denotes the smallest nonzero singular value of Ā and κ(Ā) = ||Ā||_2 / σ_r(Ā) is the condition number of Ā.
See Appendix A for the proof. This theorem shows that the error bound depends linearly on ||E_1||_2, ||E_2||_2, ||δb||_2, and ||r̄||_2. If the smallest nonzero singular value σ_r(Ā) is very small, then the perturbations E_1 and E_2 can greatly affect the error.
The first-order bound of the relative error norm in the "inverse crime" case, where b̄ = Ā x̄ with the minimum-norm solution x̄ and hence r̄ = 0, is given as follows.
Corollary 3.2.
Assume that A = Ā, that B^T is an acute perturbation of Ā, and that b̄ = Ā x̄. Then, to first order, the relative error norm is bounded by
(3.5) ||x̃ − x̄||_2 / ||x̄||_2 ≲ κ(Ā) ||E_2||_2 / ||Ā||_2 + κ(Ā) ||δb||_2 / (||Ā||_2 ||x̄||_2).
This corollary follows directly from Theorem 3.1 and shows that the error bound is independent of E_1 and r̄ in the "inverse crime" case. Note that higher-order terms in these quantities, such as ||E_2||_2^2, can contribute to the error.
The above analysis focuses on the unmatched normal equations B A x = B b that BA-GMRES deals with. A first-order perturbation analysis for the unmatched normal equations A B y = b, x = B y, that AB-GMRES deals with in the consistent case, is performed in [5, Section 2.2]. The corresponding analysis in the inconsistent case is left open.
4 Iterative Regularization and Semi-Convergence
When we discretize an inverse problem we obtain a coefficient matrix A whose nonzero singular values decay gradually to zero with no gap anywhere, and A has a large condition number. Therefore, it is not a good idea to naively solve the problem (1.1) with noisy data. With the notation from (1.1), and assuming that the exact solution satisfies
(4.1) x̄ = A^† b̄,
the minimum-norm least squares solution to the noisy problem has the form A^† b = x̄ + A^† e. Here, the second term A^† e is highly undesired because – due to the large condition number of A – it has elements that are much larger than those in x̄. We need to use a regularization method that filters the influence from the noise.
The singular value decomposition (SVD) provides a convenient framework for analyzing this situation. Let the coefficient matrix A in (1.1) have the SVD
(4.2) A = U Σ V^T = sum_{i=1}^r σ_i u_i v_i^T, r = rank(A),
with
(4.3) U = [u_1, ..., u_m], V = [v_1, ..., v_n], σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0.
Then we can write the minimum-norm least squares solution as
(4.4) A^† b = sum_{i=1}^r (u_i^T b / σ_i) v_i = sum_{i=1}^r (u_i^T b̄ / σ_i) v_i + sum_{i=1}^r (u_i^T e / σ_i) v_i.
Discretizations of inverse problems satisfy the discrete Picard condition (DPC), meaning that, on average, the absolute values |u_i^T b̄| decay faster than the singular values [15, Section 3.3]. The first term in (4.4) is equal to the exact, noise-free solution x̄, and the DPC ensures that its norm stays bounded as the problem size increases. If e is white noise then so is U^T e, meaning that the coefficients |u_i^T e / σ_i| will, on average, increase due to the decreasing singular values. Consequently, the second term in (4.4) is typically much larger than the first term – and its norm increases with the problem size. All regularization methods essentially filter or dampen the SVD components corresponding to the smaller singular values, thus reducing the influence of the noise and, at the same time, computing a good approximation to x̄.
The key mechanism behind the use of iterative solvers for computing solutions to inverse problems with noisy data, such as the CT reconstruction problem, is known as semi-convergence [15, Chapter 6]. When we apply an iterative method (typically a least squares solver) to the noisy problem (1.1), the reconstruction error ||x̄ − x_k||_2, where x_k denotes the kth iterate, exhibits two different phases:

During the initial iterations the error ||x̄ − x_k||_2 decreases and x_k appears to approach the ground truth x̄.

After a while the error starts to increase, and x_k asymptotically converges to the undesired least squares solution A^† b.
To obtain a meaningful regularized solution we must stop the iterations at the transition point where the iterate x_k is as close to x̄ as possible. Development of stopping rules that seek to terminate the iterations at this point is closely related to methods for choosing regularization parameters (see, e.g., [17, Chapter 5] and [29]); overviews of stopping rules in the context of CT are given in [17, Section 11.2] and [18].
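The two phases are easy to reproduce. The following synthetic example (our own construction: a random-orthogonal SVD with algebraically decaying singular values, not a CT matrix) runs the Landweber iteration x_{k+1} = x_k + ω A^T (b − A x_k) on noisy data and records the error history:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.arange(1, n + 1, dtype=float) ** -2.0     # sigma_i = i^(-2)
A = U @ np.diag(sigma) @ V.T

# Exact solution whose SVD coefficients decay, so the data satisfy the DPC.
x_true = V @ (np.arange(1, n + 1, dtype=float) ** -1.0)
b = A @ x_true + 0.03 * rng.standard_normal(n)       # add white Gaussian noise

omega = 1.0 / sigma[0] ** 2                          # relaxation parameter
x = np.zeros(n)
errors = [np.linalg.norm(x - x_true) / np.linalg.norm(x_true)]
for k in range(2000):
    x = x + omega * (A.T @ (b - A @ x))              # Landweber step
    errors.append(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

k_best = int(np.argmin(errors))    # the point of semi-convergence
```

The error history first decreases, reaches a minimum at `k_best`, and then grows again as the noise components take over, which is precisely the semi-convergence behavior described above.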
Deeper insight into semi-convergence, and explanations of when and why it manifests itself, has been a topic of research for many years. For methods where x_k can be expressed as a filtered SVD solution of the form
(4.5) x_k = sum_{i=1}^r φ_i^(k) (u_i^T b / σ_i) v_i,
we have a good understanding, see, e.g., [17] and [15, Chapter 6]. For example, for the Landweber iteration with relaxation parameter ω we have φ_i^(k) = 1 − (1 − ω σ_i^2)^k, and for CGLS the filters can be expressed as polynomials of degree k that depend on the Ritz values associated with the kth CGLS iteration, cf. [13, Section 6.4.3].
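The Landweber filter factors can be verified directly: running the recursion x_{k+1} = x_k + ω A^T (b − A x_k) from x_0 = 0 and comparing with the filtered SVD expansion (4.5) gives agreement to machine precision. A small random example of our own:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 15, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
omega = 0.9 / s[0] ** 2      # 0 < omega < 2 / sigma_1^2 ensures convergence
k = 25

# Landweber recursion, started from x_0 = 0.
x = np.zeros(n)
for _ in range(k):
    x = x + omega * (A.T @ (b - A @ x))

# Filtered SVD solution with phi_i^(k) = 1 - (1 - omega sigma_i^2)^k.
phi = 1.0 - (1.0 - omega * s ** 2) ** k
x_filtered = Vt.T @ (phi * (U.T @ b) / s)
```

The two vectors `x` and `x_filtered` coincide up to rounding errors, confirming the closed-form filter factors.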
For other methods, where x_k cannot easily be expressed in terms of the SVD, the understanding is less mature (see [6], [33] for emerging insight into Kaczmarz's method). The analysis of the regularizing properties of GMRES applied to A x = b is complicated by the fact that it is connected to the convergence of the Ritz values of the underlying Arnoldi algorithm.

The insight obtained from [1] is that if the noise-free data lies in a finite-dimensional Krylov subspace, and if GMRES is equipped with a suitable stopping rule, then the GMRES solution converges to the exact solution as the noise goes to zero.

The focus of [9] is so-called "hybrid methods" where regularization is applied to the Hessenberg systems in GMRES, but the main result in [9, §3.1.2] applies more generally: assume that the system in (1.1) satisfies the DPC and that the left singular vectors of the Hessenberg matrices of two consecutive GMRES steps, applied to A x = b, resemble each other – then the Hessenberg systems in GMRES also satisfy the DPC.
Taken together, these results imply that if all the SVD components corresponding to the larger singular values are captured in order of decreasing magnitude when GMRES is applied to A x = b, then GMRES will exhibit semi-convergence. Unfortunately, a complete understanding of these aspects has not emerged yet. Hence, while the semi-convergence aspect of GMRES is crucial in this work, we primarily rely on insight obtained from numerical experiments.
5 The Regularizing Properties of AB- and BA-GMRES with a Matched Transpose
To understand the regularizing properties of the AB- and BA-GMRES methods when B ≈ A^T, let us consider the limiting case B = A^T (the matched case).
5.1 The AB-GMRES Algorithm with B = A^T
The kth step of AB-GMRES with B = A^T solves
min_y ||b − A A^T y||_2, y ∈ K_k(A A^T, b),
and it determines the kth iterate x_k = A^T y_k, where y_k ∈ K_k(A A^T, b). Hence,
x_k ∈ A^T K_k(A A^T, b) = K_k(A^T A, A^T b).
The method minimizes ||r_k||_2, where r_k = b − A A^T y_k = b − A x_k. In summary, the AB-GMRES method with B = A^T minimizes ||b − A x_k||_2, and its iterates satisfy x_k ∈ K_k(A^T A, A^T b).
5.2 The LSQR Algorithm
Next, consider the LSQR algorithm applied to
(5.1) min_x ||b − A x||_2,
which is mathematically equivalent to the CGLS method. The LSQR and CGLS methods are mathematically equivalent to applying the Conjugate Gradient (CG) method to the normal equations A^T A x = A^T b. They minimize ||x^* − x_k||_{A^T A}, where x_k ∈ K_k(A^T A, A^T b) and x^* is any solution of (5.1). Note
||x^* − x_k||_{A^T A}^2 = (x^* − x_k)^T A^T A (x^* − x_k) = ||A(x^* − x_k)||_2^2.
Also note
b − A x_k = r^* + A(x^* − x_k), where r^* = b − A x^*,
and r^* is orthogonal to R(A), so that
||b − A x_k||_2^2 = ||r^*||_2^2 + ||A(x^* − x_k)||_2^2.
Hence, minimizing ||x^* − x_k||_{A^T A} over the Krylov subspace is equivalent to minimizing ||b − A x_k||_2. Thus, CGLS minimizes ||b − A x_k||_2, and the iterates satisfy x_k ∈ K_k(A^T A, A^T b).
Therefore, AB-GMRES with B = A^T, as well as LSQR and CGLS applied to min_x ||b − A x||_2, minimize ||b − A x_k||_2, where x_k ∈ K_k(A^T A, A^T b), and the iterates (solutions) lie in the same space. Thus, these methods are mathematically equivalent.
In finite precision arithmetic, AB-GMRES with B = A^T should be numerically more stable than CGLS and LSQR, since AB-GMRES is based on the Arnoldi process whereas LSQR and CGLS rely on short-term recurrences. In fact, AB-GMRES may be numerically equivalent to LSQR and CGLS with full reorthogonalization [19], and this is confirmed by our numerical experiments (which are not included here).
Now, it is well known that CGLS has good regularizing properties for discrete ill-posed problems, leading to semi-convergence, see, e.g., [15, 11, 12, 22]. When B ≠ A^T, we may still apply AB-GMRES, while applying LSQR (which would amount to applying CG to the unmatched normal equations B A x = B b) is not well founded and may be problematic since B A is neither symmetric nor positive semidefinite. Also, we may still expect semi-convergence of AB-GMRES, as will be demonstrated in the numerical experiments. Specifically, in Section 6.6 we study experimentally how semi-convergence is influenced by the difference between B and A^T.
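The equivalence claimed above can be checked numerically without any GMRES code: for each k, minimizing ||b − A x||_2 directly over an orthonormal basis of K_k(A^T A, A^T b) must reproduce the residual norms of LSQR (here SciPy's implementation, with its convergence tolerances disabled so that it runs exactly k iterations). This is our own sanity check, not an experiment from the paper:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(4)
m, n = 30, 12
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def krylov_basis(A, b, k):
    """Orthonormal basis of K_k(A^T A, A^T b), with full reorthogonalization."""
    v = A.T @ b
    V = [v / np.linalg.norm(v)]
    for _ in range(k - 1):
        w = A.T @ (A @ V[-1])
        for q in V:
            w -= (q @ w) * q
        V.append(w / np.linalg.norm(w))
    return np.column_stack(V)

res_subspace, res_lsqr = [], []
for k in range(1, 7):
    # Direct minimization of ||b - A x||_2 over the k-dimensional Krylov space.
    Vk = krylov_basis(A, b, k)
    y, *_ = np.linalg.lstsq(A @ Vk, b, rcond=None)
    res_subspace.append(np.linalg.norm(b - A @ (Vk @ y)))
    # LSQR iterate after exactly k iterations (tolerances switched off).
    xk = lsqr(A, b, atol=0.0, btol=0.0, conlim=0.0, iter_lim=k)[0]
    res_lsqr.append(np.linalg.norm(b - A @ xk))
```

The two residual histories agree to rounding error, illustrating that LSQR's kth iterate is the residual minimizer over K_k(A^T A, A^T b).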
5.3 The BA-GMRES Algorithm with a Matched Transpose
BA-GMRES applies GMRES to B A x = B b. It minimizes ||B b − B A x_k||_2, where x_k ∈ K_k(B A, B b).
BA-GMRES with B = A^T applies GMRES to A^T A x = A^T b. It minimizes
||A^T b − A^T A x_k||_2 = ||A^T (b − A x_k)||_2 = ||A^T r_k||_2, where r_k = b − A x_k.
Thus, BA-GMRES with B = A^T minimizes the norm of the residual of the normal equations, ||A^T r_k||_2, and the iterates satisfy x_k ∈ K_k(A^T A, A^T b).
BA-GMRES with B = A^T applies GMRES to A^T A x = A^T b, which is mathematically equivalent to applying MINRES to the normal equations A^T A x = A^T b, which in turn is equivalent to LSMR [8]. Again, BA-GMRES with B = A^T should be numerically more stable than LSMR. It may be numerically equivalent to LSMR with full reorthogonalization, and again our numerical experiments (not included here) confirm this.
Similar to CGLS, LSMR also has good regularizing properties when applied to discrete ill-posed problems [8, 23]. We may still apply BA-GMRES when B ≠ A^T, while it is not well founded to apply LSMR since B A is no longer symmetric and we cannot apply MINRES to B A x = B b. We expect that BA-GMRES exhibits semi-convergence, as will be demonstrated in the numerical experiments.
6 Numerical Examples
In this section we illustrate the use of the AB- and BA-GMRES methods for CT problems with an unmatched transpose. We start with numerical results related to the eigenvalues of the iteration matrices B A and A B, and then we demonstrate the semi-convergence of the methods. All computations are performed in MATLAB using our own implementations of AB- and BA-GMRES, which are available from us; LSQR is from Regularization Tools [14] and LSMR is from MathWorks' File Exchange [7].
6.1 The Test Problems
Table 6.1. Parameters of the two test problems. The ground truth image is the 'threephases' phantom (image omitted here).
Parameter | Small matrix | Large matrix
Image size | 128 × 128 | 420 × 420
No. projection angles | 180 | 600
No. detector elements | 128 | 420
Matrix size | 23040 × 16384 | 252000 × 176400
Sparsity | 99% | 99.6%
Table 6.2. Relative norm-wise differences between the three discretization matrices.
Small matrices | 0.3700 | 0.1402 | 0.2648
Large matrices | 0.3747 | 0.1405 | 0.2685
The matrices A and B used in these experiments are representative of the matrices in many CT software packages and applications. They are generated by means of the CPU version of the software package ASTRA [32]; the matrices in the GPU version are not explicitly available when using this software. Three different discretization models are provided in ASTRA: the line model, the strip model, and the interpolation (or Joseph) model; see [17, Chapter 9] for details. We can then use any of the corresponding matrices to generate unmatched pairs (A, B). We use a parallel-beam geometry and two different sizes of these matrices corresponding to the parameters listed in Table 6.1, while Table 6.2 lists the relative norm-wise differences between these matrices. The exact solution x̄ is generated by means of the function phantomgallery('threephases',N) from [16], and it is shown in Table 6.1. We add white Gaussian noise e to b̄ = A x̄ at two different relative noise levels.
6.2 Eigenvalues
As mentioned in the Introduction, the standard SIRT iterative methods will not converge when B A has complex eigenvalues with negative real part. To demonstrate that this is the case for the discretizations used here, Figure 6.1 shows the leftmost and the largest eigenvalues of B A, computed by means of MATLAB's eigs, for two combinations of the small and large test matrices. In both cases there are indeed eigenvalues with negative real parts, meaning that the SIRT methods do not converge. Hence, it is natural to use the AB- and BA-GMRES methods for these matrices. The leftmost eigenvalues of the third combination have tiny negative real parts.
6.3 Error Histories
Table 6.3. Relative reconstruction errors at the point of semi-convergence, with the number of iterations in parentheses; in each cell the first entry refers to the large test problem and the second to the small one. The storage rows list the number of stored vector elements (number of iterations times the vector length: m for AB-GMRES, n for BA-GMRES).
Pair | AB-GMRES | BA-GMRES | LSQR | LSMR
Pair 1 – errors | 0.1047 (34), 0.1203 (36) | 0.1039 (38), 0.1199 (41) | 0.0954 (57), 0.0996 (62) | 0.0946 (68), 0.0990 (77)
Pair 1 – storage | 8568000, 829440 | 6703200, 671744 | |
Pair 2 – errors | 0.0985 (41), 0.1059 (42) | 0.0978 (47), 0.1056 (50) | 0.0954 (53), 0.0996 (61) | 0.0946 (63), 0.0990 (77)
Pair 2 – storage | 10332000, 967680 | 8290800, 819200 | |
Pair 3 – errors | 0.0878 (61), 0.0926 (77) | 0.0871 (74), 0.0921 (85) | 0.0896 (52), 0.0879 (64) | 0.0875 (63), 0.0795 (80)
Pair 3 – storage | 15372000, 1774080 | 13053600, 1392640 | |
Pair 4 – errors | 0.0899 (50), 0.0927 (60) | 0.0894 (60), 0.0922 (69) | 0.0896 (52), 0.0870 (64) | 0.0875 (63), 0.0795 (80)
Pair 4 – storage | 12600000, 1382400 | 10584000, 1130496 | |
Pair 5 – errors | 0.1005 (39), 0.1112 (42) | 0.0998 (44), 0.1105 (47) | 0.0935 (53), 0.0948 (61) | 0.0928 (63), 0.0940 (77)
Pair 5 – storage | 9828000, 967680 | 7761600, 770048 | |
Pair 6 – errors | 0.0915 (55), 0.0930 (68) | 0.0901 (67), 0.0920 (84) | 0.0935 (53), 0.0948 (61) | 0.0928 (63), 0.0940 (77)
Pair 6 – storage | 13860000, 1566720 | 11818800, 1376256 | |
To illustrate the semi-convergence we use both AB-GMRES and BA-GMRES to solve systems with the small and large test matrices and with noisy data. We tried all six pairs (A, B) with B ≠ A^T, and for comparison we also used LSQR and LSMR (which correspond to the case B = A^T). To check our implementations, we verified numerically that AB- and BA-GMRES with B = A^T give the same results as LSQR and LSMR (we do not show these results here). Figure 6.2 shows the error histories, i.e., the relative reconstruction error ||x̄ − x_k||_2 / ||x̄||_2 versus the number of iterations k, for two of the pairs. Figure 6.3 shows the reconstructions for one of these pairs. These results are representative for all six pairs of matrices.
When we refer to the "reconstruction error," we mean the relative error at the point of semi-convergence (i.e., the minimum of the error histories), indicated by the markers in the plots. The reconstruction errors for all pairs are shown in Table 6.3. For each of the six pairs we observe the following:

We obtain almost the same reconstruction error for AB-GMRES and BA-GMRES, cf. block columns 2 and 3.

We obtain almost the same reconstruction error for LSQR and LSMR, cf. block columns 4 and 5.

For the pairs in the top block rows, LSQR and LSMR give slightly smaller reconstruction errors than AB- and BA-GMRES, cf. Figure 6.2.

For the other two matrices, all four methods give almost the same reconstruction errors, cf. the middle and bottom block rows.

Two of the pairs give marginally more accurate reconstructions than the others, cf. the middle block row.

Often, AB-GMRES uses just slightly fewer iterations than BA-GMRES – and occasionally it uses significantly fewer iterations.

LSQR always uses fewer iterations than LSMR.
We also carried out experiments with underdetermined problems (which are not documented here). The conclusions remain the same.
Table 6.3 also lists the required amount of storage for the orthogonal vectors, which are of length m and n for AB-GMRES and BA-GMRES, respectively. For the overdetermined systems used here with m > n, in spite of BA-GMRES consistently using more iterations than AB-GMRES, BA-GMRES requires less storage due to the shorter vectors. For underdetermined systems (not reported here), AB-GMRES has the advantage of fewer iterations and shorter vectors, cf. [19, p. 2408].
We emphasize that the choice of A and B is dictated by the available software, and therefore one may not always have a choice of the implementations used for A and B. Moreover, the above results are for matrices used in the ASTRA software (which allows easy access to the matrices); other packages may use different discretization methods. We also stress that in these experiments we commit inverse crime, meaning that the noise-free data is generated as b̄ = A x̄; hence the data is different for the three choices of A, meaning that we do not solve precisely the same problem for each choice of A. Therefore, the above results provide important insight about the influence of the pair (A, B), but they do not determine the best choice of A and B for a CT problem with real data and no inverse crime.
6.4 Stopping Rules
Here we demonstrate the use of two stopping rules that seek to terminate the iterations at the point of semiconvergence.

The discrepancy principle (DP) [26] terminates the iterations as soon as the residual norm is smaller than the noise level:
(6.1) ||b − A x_k||_2 ≤ τ ||e||_2. Here, τ is a "safety factor" that can be used when we have only a rough estimate of ||e||_2; in our experiments we use the exact value of ||e||_2.

The NCP criterion uses the normalized cumulative periodogram to perform a spectral analysis of the residual vector b − A x_k, in order to identify when the residual is as close to being white noise as possible, which indicates that all available information has been extracted from the noisy data. See [16, Section 2.3.3] and [18, Section II.D] for details; MATLAB code is available from us.
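Both rules are straightforward to implement. The sketch below (our own Python illustration; the helper names are ours, not from the paper's MATLAB code) encodes the DP test (6.1) and an NCP distance measure, and checks that the NCP distance separates a white-noise residual from a strongly correlated one:

```python
import numpy as np

def discrepancy_stop(res_norms, noise_norm, tau=1.0):
    """Return the first k with ||b - A x_k||_2 <= tau * ||e||_2, cf. (6.1)."""
    for k, r in enumerate(res_norms):
        if r <= tau * noise_norm:
            return k
    return None

def ncp_distance(r):
    """Distance of the residual's normalized cumulative periodogram (NCP)
    from the straight line that an ideal white-noise residual produces."""
    q = r.size // 2
    p = np.abs(np.fft.rfft(r)) ** 2       # power spectrum (periodogram)
    p = p[1:q + 1]                        # drop the zero-frequency component
    c = np.cumsum(p) / np.sum(p)          # the NCP
    return np.linalg.norm(c - np.arange(1, q + 1) / q)

# White-noise residual vs. a smooth, strongly correlated residual.
rng = np.random.default_rng(0)
d_white = ncp_distance(rng.standard_normal(400))
d_smooth = ncp_distance(np.sin(np.linspace(0.0, 4.0 * np.pi, 400)))
# DP on a (hypothetical) decreasing residual history with ||e||_2 = 1.
k_dp = discrepancy_stop([5.0, 3.0, 1.2, 0.9, 0.5], noise_norm=1.0)
```

In an iterative solver one would evaluate these tests on the residual of each iterate and stop as soon as the DP test fires, or when the NCP distance reaches its minimum.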
We apply these stopping rules to the same pairs as in Figure 6.2, with the same noise level, and the results are shown in Figure 6.4. We obtain similar results for the other pairs and hence they are not shown here. We make the following observations:

Both DP and NCP stop the iterations before the minimum is reached. This is better than stopping too late, in which case we would include undesired noise in the solution.

Both DP and NCP stop the iterations when the error history starts to level off; the minimum of the error history is quite flat so this is acceptable.

For unknown reasons, NCP always stops the iterations a bit earlier than DP.
We conclude that both stopping rules work well for this problem. The DP stopping rule requires a good estimate of the noise level; if this is not available (as is typical in CT problems) then the performance of NCP is only slightly inferior to DP.