Computational algorithms for X-ray computed tomography (CT) come in different forms. Some methods take their basis in an explicit formulation of the inverse operator, leading to the filtered back projection algorithm (for parallel-beam CT), the FDK algorithm (for cone-beam CT), etc. Other methods are based on a discretization of the problem followed by solving the – usually over- or underdetermined – linear system of equations by means of an iterative method. The latter approach, which is the basis for this work, is more general in the sense that it does not assume any specific scanning geometry, and it tends to produce better reconstructions in the case of limited-data and/or limited-angle problems .
CT reconstruction is an inverse problem where the forward problem refers to the mapping of the object's linear attenuation coefficient to the data in the form of projections of the object onto the detector planes for various scan positions. In the case of 2D parallel-beam CT the forward operator is known as the Radon transform. Data consists of (noisy) measurements of the attenuation of the X-rays through the object, recorded in a set of detector elements. Discretization of this problem takes the form

$$A x = b , \qquad b = \bar b + e = A \bar x + e , \tag{1.1}$$

where $A \in \mathbb{R}^{m \times n}$ is a discretization of the forward projector, $b \in \mathbb{R}^m$ is the measured data, and $x \in \mathbb{R}^n$ represents the reconstructed image of the object's interior. Moreover, $\bar x$ represents the exact object, $\bar b = A \bar x$ represents the noise-free data, and $e$ represents the measurement noise. A number of discretization schemes are available for computing the matrix $A$; see, e.g., [17, Chapter 9] and [24, Appendix A]. A discussion of the noise in CT can be found in [17, Section 4.4]; here we assume that $e$ is white Gaussian noise.
The matrix $A$ is sparse and there are no restrictions on its dimensions $m$ and $n$; both overdetermined and underdetermined systems are used, depending on the measurement setup. In both cases it is common to consider a least squares solution, which involves the normal equations in one of the forms

$$A^T A x = A^T b \qquad \text{or} \qquad A A^T y = b , \quad x = A^T y .$$

The matrix $A^T$ represents the so-called back projector which maps the data back onto the solution domain $\mathbb{R}^n$. The back projector is a mathematical abstraction that has no physical interpretation, but it plays a central role in filtered back projection and many other reconstruction algorithms.
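Both forms of the normal equations can be checked numerically. The following NumPy sketch (illustrative random toy matrices, not CT projectors) verifies that $A^T A x = A^T b$ reproduces the least squares solution in the overdetermined case, and that $A A^T y = b$, $x = A^T y$ yields the minimum-norm solution in the underdetermined case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined case: solve the normal equations A^T A x = A^T b.
A = rng.standard_normal((8, 4))          # full column rank with probability 1
b = rng.standard_normal(8)
x_ne = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations, first form
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # reference least squares solution

# Underdetermined case: solve A A^T y = b and set x = A^T y, which gives
# the minimum-norm solution of the consistent system A x = b.
A2 = rng.standard_normal((4, 8))         # full row rank with probability 1
b2 = rng.standard_normal(4)
y = np.linalg.solve(A2 @ A2.T, b2)
x_mn = A2.T @ y                               # minimum-norm solution
x_pinv = np.linalg.pinv(A2) @ b2              # same solution via the pseudoinverse
```

Both computations agree with the reference solutions from `lstsq` and `pinv` to rounding error.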
In large-scale CT problems, the matrix $A$ – in spite of the fact that it is sparse – may be too large to store explicitly. Instead, one must use subroutines or functions that compute the multiplications with $A$ and $A^T$ in a matrix-free fashion, often using graphics processing units (GPUs) or other hardware accelerators. In order to make best use of this hardware and the memory hierarchy it is common to use different discretization techniques for the forward projector and the back projector. This means that the matrix $B$ which represents the back projector is typically different from the transpose $A^T$ of the forward projector, and we say that $B$ is an unmatched back projector or an unmatched transpose.
A consequence of this is that iterative solvers, which are supposed to solve the normal equations, instead solve the so-called unmatched normal equations in one of the forms

$$B A x = B b \qquad \text{or} \qquad A B y = b , \quad x = B y , \tag{1.2}$$

see  and  for details. The main drawback of using an unmatched transpose is that the standard Simultaneous Iterative Reconstruction Technique (SIRT) iterative solvers (Landweber, Cimmino, CAV, DROP, SART)  do not converge when the iteration matrix $BA$ has one or more complex eigenvalues with a negative real part – which is very often the case in practice (see Figure 6.1 in Section 6.2). A convergence analysis of proximal gradient methods with an unmatched transpose is given in .
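The divergence mechanism can be seen in a deliberately constructed toy example (the $2 \times 2$ matrices below are artificial, not CT projectors): once $BA$ has an eigenvalue $\lambda$ with $\mathrm{Re}\,\lambda < 0$, the error propagation factor $|1 - \omega\lambda|$ of a Landweber-type SIRT iteration exceeds 1 for every relaxation parameter $\omega > 0$.

```python
import numpy as np

# Toy unmatched pair: BA = diag(1, -0.5) has an eigenvalue with negative
# real part, so |1 - omega*lambda| > 1 for every omega > 0 and the
# Landweber-type iteration x_{k+1} = x_k + omega * B (b - A x_k) diverges.
A = np.eye(2)
B = np.diag([1.0, -0.5])        # deliberately unmatched: B != A.T

eigs = np.linalg.eigvals(B @ A)
has_negative = bool(np.any(eigs.real < 0))

x_true = np.array([1.0, 1.0])
b = A @ x_true

omega = 0.5
x = np.zeros(2)
err0 = np.linalg.norm(x_true - x)
for _ in range(50):
    x = x + omega * B @ (b - A @ x)
err50 = np.linalg.norm(x_true - x)   # error grows like 1.25**50 in one component
```

The error in the component associated with the negative eigenvalue is multiplied by $|1 - 0.5 \cdot (-0.5)| = 1.25$ in every iteration, so the iteration cannot converge for any $\omega$.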
As shown in , one can modify the SIRT methods such that they solve a slightly modified problem whose iteration matrix has no eigenvalues with a negative real part, thus ensuring convergence. But this introduces a (small) perturbation of the solution, and one must compute an estimate of the leftmost eigenvalue(s) of $BA$. In addition, the choice of a good relaxation parameter is nontrivial.
An alternative is to use Kaczmarz's method , which does not explicitly involve the matrix transpose; but this method has other drawbacks. In its original form, Kaczmarz's method operates on a single row of $A$ at a time, which is unsuited for modern computer architectures. Block versions  have better performance, but they require a good choice of the blocks, the blocks may be unmatched, and again the choice of a good relaxation parameter is nontrivial.
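For illustration, here is a minimal sketch of the classical cyclic Kaczmarz method on a toy dense system (real CT implementations use sparse row access and a relaxation parameter tuned to the problem):

```python
import numpy as np

def kaczmarz(A, b, sweeps=100, relax=1.0, x0=None):
    """Classical row-action Kaczmarz: cyclically project the current
    iterate onto the hyperplane of each row equation <a_i, x> = b_i."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else x0.copy()
    row_norms2 = np.sum(A * A, axis=1)
    for _ in range(sweeps):
        for i in range(m):
            r = b[i] - A[i] @ x
            x += relax * (r / row_norms2[i]) * A[i]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true                     # consistent system: Kaczmarz converges
x = kaczmarz(A, b, sweeps=200)
```

Note that only rows of $A$ are accessed; the transpose never appears explicitly, which is exactly why the unmatched-transpose issue does not arise for this method.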
The present work explores an alternative approach, where we use the well-known GMRES algorithm to solve the unmatched normal equations (1.2) with an unmatched back projector. Thus, we avoid introducing a perturbation as well as the need for eigenvalue computations for the sake of ensuring convergence of the SIRT methods, and we avoid the choice of the relaxation parameter. This makes it easier to develop a black-box CT solver that does not rely on the user's knowledge of convergence criteria, eigenvalue computations, etc. Our work is based on previous work on the preconditioned AB- and BA-GMRES methods for solving least squares problems , and here we explicitly demonstrate the successful use of these methods for CT reconstruction problems.
The main goal of our work is to study the performance and the regularizing effects of the AB- and BA-GMRES algorithms when applied to CT reconstruction problems. To do this, we deliberately commit "inverse crime" and generate the noise-free data as $\bar b = A \bar x$, meaning that the same model (the matrix $A$) is used to generate and reconstruct the data. To determine which unmatched pair $(A, B)$ is preferable, one must use real (or carefully simulated) data to avoid the inverse crime – this is a topic for future research.
We remark that an alternative to using the unmatched transpose is to use Kaczmarz's method as a preconditioner (i.e., for $B$) in the AB- and BA-GMRES algorithms, as described and analyzed in [25, 4]. We shall not pursue that approach here, but leave it for future work.
Our paper is organized as follows. In Section 2 we summarize the AB- and BA-GMRES algorithms, and in Section 3 we present new first-order perturbation theory for the unmatched normal equations. The behavior of the iterative methods in the presence of noisy data is discussed in Section 4, and in Section 5 we study the regularizing properties of AB- and BA-GMRES when $B = A^T$. Finally, in Section 6 we present a number of numerical examples that illustrate our theory and the performance of the AB- and BA-GMRES algorithms. We use the following notation: $\mathcal{R}(\cdot)$ denotes the range of a matrix and $\mathcal{N}(\cdot)$ denotes the null space of a matrix. For Krylov subspaces we use the notation

$$\mathcal{K}_k(M, v) = \operatorname{span}\{ v, Mv, M^2 v, \ldots, M^{k-1} v \} .$$

Moreover, $P_{\mathcal{S}}\, v$ denotes the orthogonal projection of a vector $v$ on the subspace $\mathcal{S}$, and $\sigma_i(\cdot)$ denotes the $i$th singular value of a matrix.
2 The AB-GMRES and BA-GMRES Algorithms
The two algorithms AB-GMRES and BA-GMRES were originally presented and analyzed in  as alternative methods for solving linear least squares problems according to the following principles:
AB-GMRES solves $\min_y \|b - ABy\|_2$, $x = By$, with $B$ as a right preconditioner.
BA-GMRES solves $\min_x \|Bb - BAx\|_2$ with $B$ as a left preconditioner.
Here we briefly summarize these algorithms; we describe some stopping rules later in Section 6.4.
|Algorithm AB-GMRES||Algorithm BA-GMRES|
|Choose initial $x_0$||Choose initial $x_0$|
|stopping rule goes here||stopping rule goes here|
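The two algorithms can be sketched as thin wrappers around a generic GMRES routine. The following Python code is our own minimal Arnoldi-based sketch (not the authors' MATLAB implementation); with a matched pair $B = A^T$ on a small consistent system it recovers the exact solution.

```python
import numpy as np

def gmres(M, d, k):
    """Plain GMRES for min ||d - M z||_2 over the Krylov subspace
    K_k(M, d), with zero initial guess (Arnoldi + small least squares)."""
    n = d.size
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    beta = np.linalg.norm(d)
    Q[:, 0] = d / beta
    for j in range(k):
        w = M @ Q[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:                # happy breakdown
            k = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(k + 1)
    e1[0] = beta
    y = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)[0]
    return Q[:, :k] @ y

def ab_gmres(A, B, b, k):
    # right preconditioning: min ||b - (AB) y||_2, then x = B y
    return B @ gmres(A @ B, b, k)

def ba_gmres(A, B, b, k):
    # left preconditioning: min ||B b - (BA) x||_2
    return gmres(B @ A, B @ b, k)

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))
x_true = rng.standard_normal(4)
b = A @ x_true                      # consistent, full column rank
x_ab = ab_gmres(A, A.T, b, 4)       # matched back projector B = A.T
x_ba = ba_gmres(A, A.T, b, 4)
```

In a matrix-free CT setting, `A @ B` and `B @ A` would of course be replaced by operator products that call the forward and back projection routines.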
The following statements about the convergence are from . We emphasize that the two methods use the same Krylov subspace for the solution, but they use different objective functions.
AB-GMRES applies GMRES to $\min_y \|b - ABy\|_2$, $x = By$, i.e., it forms the iterates $x_k = B y_k$ with $y_k \in \mathcal{K}_k(AB, b)$ that minimize $\|b - ABy_k\|_2$, i.e., that minimize $\|b - Ax_k\|_2$ for $x_k \in B\,\mathcal{K}_k(AB, b)$.
If $\mathcal{R}(B^T) = \mathcal{R}(A)$, then AB-GMRES determines a solution of $\min_x \|b - Ax\|_2$ without breakdown for all $b \in \mathbb{R}^m$ if and only if $\mathcal{R}(B) = \mathcal{R}(A^T)$ [19, Corollary 3.8].
BA-GMRES applies GMRES to $\min_x \|Bb - BAx\|_2$, i.e., it forms the iterates $x_k \in \mathcal{K}_k(BA, Bb)$ that minimize $\|B(b - Ax_k)\|_2$.
If $\mathcal{R}(B) = \mathcal{R}(A^T)$ and $\mathcal{R}(B^T) = \mathcal{R}(A)$, then BA-GMRES determines a solution of $\min_x \|b - Ax\|_2$ without breakdown for all $b \in \mathbb{R}^m$ [25, Theorem 3.1].
The conditions on the ranges $\mathcal{R}(B) = \mathcal{R}(A^T)$ and $\mathcal{R}(B^T) = \mathcal{R}(A)$ ensure the equivalence between the matched and unmatched normal equations, as seen above. Simply, $\mathcal{R}(B) = \mathcal{R}(A^T)$ implies $\mathcal{N}(AB) = \mathcal{N}(B)$ and $\mathcal{R}(B^T) = \mathcal{R}(A)$ implies $\mathcal{N}(BA) = \mathcal{N}(A)$. Further, these conditions also ensure the convergence of AB- and BA-GMRES. However, it is not easy to check whether these conditions are satisfied in practice.
If $B \ne A^T$, solving the unmatched normal equations by using AB- and BA-GMRES, we still expect to obtain an approximation to the solution. This expectation is supported by the observation that if $B^T$ is close to $A$, then $\mathcal{R}(B^T)$ and $\mathcal{R}(B)$ can be close to $\mathcal{R}(A)$ and $\mathcal{R}(A^T)$, respectively. For example, the principal angles between the pair of ranges $\mathcal{R}(A)$ and $\mathcal{R}(B^T)$, and between the pair $\mathcal{R}(A^T)$ and $\mathcal{R}(B)$, can be small by the extended $\sin\Theta$ theorem [34, Section 3], [31, Chapter 2]:

$$\|\sin\Theta(\mathcal{R}(A), \mathcal{R}(B^T))\| \le \frac{\|A - B^T\|}{\sigma_r(A)} ,$$

where $\|\cdot\|$ is either the 2-norm or the Frobenius norm and we assume that $\operatorname{rank}(A) = \operatorname{rank}(B^T) = r$. Moreover, $\Theta$ is a diagonal matrix whose diagonal entries are the principal angles $\theta_1 \le \cdots \le \theta_r$ between the subspaces $\mathcal{R}(A)$ and $\mathcal{R}(B^T)$. In the case of the matrix 2-norm, a slightly sharper upper bound is available.
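This observation is easy to illustrate numerically with SciPy's `subspace_angles` (the matrices below are random toy data): a small perturbation $B^T = A + E$ yields small principal angles between $\mathcal{R}(A)$ and $\mathcal{R}(B^T)$.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 6))
E = 1e-6 * rng.standard_normal((30, 6))
Bt = A + E                          # B^T is a small perturbation of A

# Principal angles between R(A) and R(B^T): small when ||A - B^T||_2 is
# small relative to the smallest singular value of A.
theta = subspace_angles(A, Bt)
bound = np.linalg.norm(E, 2) / np.linalg.svd(A, compute_uv=False)[-1]
```

Here `bound` is the right-hand side of the displayed inequality with the 2-norm; for this well-conditioned example the computed angles are of the same (tiny) order.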
3 First-Order Perturbation Theory
To further motivate the use of the unmatched normal equations, we consider the difference between the solutions of the matched and unmatched normal equations, and we study how this difference depends on the unmatchedness of the pair $(A, B)$.
A first-order perturbation analysis for the unmatched normal equations was given in [5, Section 2.1]. That analysis bounds the distance between the closest pair of points, one in the solution set of the least squares problem and one in the solution set of its perturbed problem. Here, we perform an alternative first-order perturbation analysis specifically for the minimum-norm solutions, as discussed below. This analysis refines the previous perturbation analysis in [5, Section 2.1].
Generally, we may assume a modeling error also in $A$, in addition to the one in $B$. Let $\bar A$ and $\bar b$ be the ideal model and data, respectively, and let

$$A = \bar A + E_1 , \qquad B^T = \bar A + E_2 , \qquad b = \bar b + e$$

be the perturbed models and data. Moreover, let $\bar x$ be a ground truth solution of $\min_x \|\bar b - \bar A x\|_2$ and let $\bar r = \bar b - \bar A \bar x$ be the corresponding least squares residual, for which $\bar A^T \bar r = 0$ holds. The perturbed matrices $A$ and $B$ can be regarded as the matrices $A$ and $B$ in the unmatched normal equations (1.2).
Now, consider solving the perturbed unmatched normal equations

$$B A x = B b , \tag{3.2}$$

where the solution is compared with $\bar x$, the minimum-norm solution of the unperturbed normal equations $\bar A^T \bar A x = \bar A^T \bar b$. Note that the linear system (3.2) is consistent if $Bb \in \mathcal{R}(BA)$. Irrespective of the consistency of (3.2), the minimum-norm solution of the least squares problem

$$\min_x \|Bb - BAx\|_2 \tag{3.3}$$

is given by $x = (BA)^\dagger Bb$, where $\dagger$ denotes the Moore-Penrose generalized inverse (pseudoinverse). We are concerned with the difference between the minimum-norm solution of the unperturbed problem and the minimum-norm solution of (3.3). Note that the solution of interest in (1.1) is not necessarily the minimum-norm solution but may lie close to it.
Assume that $A$ and $B^T$ are both acute perturbations of $\bar A$, i.e.,

$$\| P_{\mathcal{R}(A)} - P_{\mathcal{R}(\bar A)} \|_2 < 1 \qquad \text{and} \qquad \| P_{\mathcal{R}(B^T)} - P_{\mathcal{R}(\bar A)} \|_2 < 1 ,$$

respectively, where $P_{\mathcal{S}}$ denotes the orthogonal projection onto a subspace $\mathcal{S}$. Then a first-order bound of the relative error norm $\|x - \bar x\|_2 / \|\bar x\|_2$ holds in which $\sigma_r(\bar A)$ denotes the smallest nonzero singular value of $\bar A$ and $\kappa(\bar A) = \sigma_1(\bar A)/\sigma_r(\bar A)$ denotes the condition number of $\bar A$. See Appendix A for the proof. This theorem shows that the error bound depends linearly on the perturbations $E_1$ and $E_2$, the noise $e$, and the residual $\bar r$. If the smallest nonzero singular value $\sigma_r(\bar A)$ is very small, then the perturbations $E_1$ and $E_2$ can greatly affect the error.
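The linear dependence on the perturbations can be observed numerically. In the sketch below (random toy matrices; the perturbations $E_1$ and $E_2$ are scaled by a parameter `eps`), the minimum-norm solution of the perturbed unmatched normal equations approaches the unperturbed solution as `eps` decreases.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 20, 8
Abar = rng.standard_normal((m, n))
xbar = rng.standard_normal(n)
bbar = Abar @ xbar                  # consistent ideal problem, so rbar = 0

def minnorm_unmatched(Abar, bbar, eps, rng):
    """Minimum-norm solution of the perturbed unmatched normal equations
    B A x = B b with A = Abar + E1 and B^T = Abar + E2 (random E1, E2)."""
    E1 = eps * rng.standard_normal(Abar.shape)
    E2 = eps * rng.standard_normal(Abar.shape)
    A = Abar + E1
    B = (Abar + E2).T
    return np.linalg.pinv(B @ A) @ (B @ bbar)

err = {}
for eps in (1e-3, 1e-6):
    # same seed for both eps, so only the perturbation *size* changes
    x = minnorm_unmatched(Abar, bbar, eps, np.random.default_rng(5))
    err[eps] = np.linalg.norm(x - xbar) / np.linalg.norm(xbar)
```

Shrinking `eps` by three orders of magnitude shrinks the relative error by roughly the same factor, consistent with a first-order (linear) bound.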
The first-order bound of the relative error norm in the “inverse crime” case, where $\bar b = \bar A \bar x$ with the minimum-norm solution $\bar x$ and hence $\bar r = 0$, is given as follows.
Assume that $A = \bar A$, $B^T$ is an acute perturbation of $\bar A$, and $\bar r = 0$. Then the corresponding first-order bound of the relative error norm follows.
This corollary follows directly from Theorem 3.1 and shows that, in the “inverse crime” case, the error bound is independent of $E_1$ and $\bar r$. Note that higher-order terms in these quantities can still contribute to the error.
The above analysis focuses on the unmatched normal equations $BAx = Bb$ that BA-GMRES deals with. A first-order perturbation analysis for the unmatched normal equations $ABy = b$, $x = By$, that AB-GMRES deals with in the consistent case is performed in [5, Section 2.2]. The corresponding analysis in the inconsistent case is left open.
4 Iterative Regularization and Semi-Convergence
When we discretize an inverse problem we obtain a coefficient matrix $A$ whose nonzero singular values decay gradually to zero with no gap anywhere, and $A$ has a large condition number. Therefore, it is not a good idea to naively solve the problem (1.1) with noisy data. With the notation from (1.1), and assuming that the exact solution satisfies $\bar x = A^\dagger \bar b$, the minimum-norm least squares solution to the noisy problem has the form $A^\dagger b = \bar x + A^\dagger e$. Here, the second term is highly undesired because – due to the large condition number of $A$ – it has elements that are much larger than those in $\bar x$. We need to use a regularization method that filters the influence from the noise.
The singular value decomposition (SVD) provides a convenient framework for analyzing this situation. Let the coefficient matrix $A$ in (1.1) have the SVD

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T , \qquad r = \operatorname{rank}(A) ,$$

with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ and singular vectors $u_i$ and $v_i$. Then we can write the minimum-norm least squares solution as

$$A^\dagger b = \sum_{i=1}^{r} \frac{u_i^T \bar b}{\sigma_i}\, v_i + \sum_{i=1}^{r} \frac{u_i^T e}{\sigma_i}\, v_i . \tag{4.4}$$

Discretizations of inverse problems satisfy the discrete Picard condition (DPC), meaning that, on average, the absolute values $|u_i^T \bar b|$ decay faster than the singular values [15, Section 3.3]. The first term in (4.4) is equal to the exact, noise-free solution $\bar x$, and the DPC ensures that its norm stays bounded as the problem size increases. If $e$ is white noise then so is $U^T e$, meaning that the coefficients $|u_i^T e| / \sigma_i$ will, on average, increase due to the decreasing singular values. Consequently, the second term in (4.4) is typically much larger than the first term – and its norm increases with the problem size. All regularization methods essentially filter or dampen the SVD components corresponding to the smaller singular values, thus reducing the influence of the noise and, at the same time, computing a good approximation to $\bar x$.
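The noise amplification described above can be reproduced with a synthetic toy problem whose singular values decay geometrically and whose exact data satisfies a discrete Picard condition:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** np.linspace(0, -10, n)      # gradually decaying singular values
A = U @ np.diag(sigma) @ V.T

# Exact data chosen so that |u_i^T bbar| = sigma_i^1.5 decays faster than
# sigma_i (a discrete Picard condition); white noise does not decay at all.
coef = sigma ** 1.5
bbar = U @ coef
xbar = V @ (coef / sigma)                   # noise-free solution, bounded norm
e = 1e-4 * rng.standard_normal(n)           # white Gaussian noise
b = bbar + e

x_naive = np.linalg.pinv(A) @ b             # = xbar + sum_i (u_i^T e / sigma_i) v_i
noise_term = np.linalg.norm(x_naive - xbar)
```

Even though the noise is four orders of magnitude smaller than the data, the noise term in the naive solution dwarfs the exact solution, because the coefficients $u_i^T e$ are divided by singular values as small as $10^{-10}$.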
The key mechanism behind the use of iterative solvers for computing solutions to inverse problems with noisy data, such as the CT reconstruction problem, is known as semi-convergence [15, Chapter 6]. When we apply an iterative method (typically a least squares solver) to the noisy problem (1.1), the reconstruction error $\|\bar x - x_k\|_2$, where $x_k$ denotes the $k$th iterate, exhibits two different phases:
During the initial iterations $\|\bar x - x_k\|_2$ decreases and $x_k$ appears to approach the ground truth $\bar x$.
After a while $\|\bar x - x_k\|_2$ starts to increase, and $x_k$ asymptotically converges to the undesired least squares solution $A^\dagger b$.
To obtain a meaningful regularized solution we must stop the iterations at the transition point where the iteration vector $x_k$ is as close to $\bar x$ as possible. The development of stopping rules that seek to terminate the iterations at this point is closely related to methods for choosing regularization parameters (see, e.g., [17, Chapter 5] and ); overviews of stopping rules in the context of CT are given in [17, Section 11.2] and .
Deeper insight into semi-convergence, and explanations of when and why it manifests itself, has been a topic of research for many years. For methods where $x_k$ can be expressed as a filtered SVD solution of the form

$$x_k = \sum_{i=1}^{r} \phi_i^{(k)}\, \frac{u_i^T b}{\sigma_i}\, v_i$$

we have a good understanding; see, e.g.,  and [15, Chapter 6]. For example, for the Landweber iteration with relaxation parameter $\omega$ we have $\phi_i^{(k)} = 1 - (1 - \omega \sigma_i^2)^k$, and for CGLS the filters $\phi_i^{(k)}$ can be expressed as polynomials of degree $k$ that depend on the Ritz values associated with the $k$th CGLS iteration, cf. [13, Section 6.4.3].
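The Landweber filter factors can be verified directly: the $k$th Landweber iterate (with $x_0 = 0$) coincides with the filtered SVD solution using $\phi_i^{(k)} = 1 - (1 - \omega \sigma_i^2)^k$. A small NumPy check (random toy matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((12, 5))
b = rng.standard_normal(12)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

omega = 0.9 / s[0] ** 2       # relaxation parameter with 0 < omega < 2/sigma_1^2
k = 25

# Landweber: x_{k+1} = x_k + omega * A^T (b - A x_k), starting from x_0 = 0
x = np.zeros(5)
for _ in range(k):
    x = x + omega * A.T @ (b - A @ x)

# Filtered-SVD form with filter factors phi_i = 1 - (1 - omega sigma_i^2)^k
phi = 1.0 - (1.0 - omega * s ** 2) ** k
x_svd = Vt.T @ (phi * (U.T @ b) / s)
```

The two solutions agree to rounding error, confirming the closed-form filter factors for this simple method.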
For other methods, where $x_k$ cannot easily be expressed in terms of the SVD, the understanding is less mature (see ,  for emerging insight into Kaczmarz's method). The analysis of the regularizing properties of GMRES applied to $Ax = b$ is complicated by the fact that it is connected to the convergence of the Ritz values of the underlying Arnoldi algorithm.
The insight obtained from  is that if the noise-free data lies in a finite-dimensional Krylov subspace, and if GMRES is equipped with a suitable stopping rule, then the GMRES solution converges to the exact solution as the noise goes to zero.
The focus of  is so-called "hybrid methods" where regularization is applied to the Hessenberg systems in GMRES, but the main result in [9, §3.1.2] applies more generally: Assume that the system in (1.1) satisfies the DPC and that the left singular vectors of the Hessenberg matrices of two consecutive GMRES steps, applied to $Ax = b$, resemble each other – then the Hessenberg systems in GMRES also satisfy the DPC.
Taken together, these results imply that if, when GMRES is applied to $Ax = b$, all the SVD components corresponding to the larger singular values are captured in order of decreasing magnitude, then GMRES will exhibit semi-convergence. Unfortunately, a complete understanding of these aspects has not emerged yet. Hence, while the semi-convergence aspect of GMRES is crucial in this work, we primarily rely on insight obtained from numerical experiments.
5 The Regularizing Properties of AB- and BA-GMRES with a Matched Transpose
To understand the regularizing properties of the AB- and BA-GMRES methods when $B \ne A^T$, let us consider the limiting case $B = A^T$ (the matched case).
5.1 The AB-GMRES Algorithm with $B = A^T$
The $k$th step of AB-GMRES with $B = A^T$ solves

$$\min_y \| b - A A^T y \|_2 , \qquad y \in \mathcal{K}_k(A A^T, b) ,$$

and it determines the $k$th iterate $x_k = A^T y_k$, where $y_k \in \mathcal{K}_k(A A^T, b)$. Hence,

$$x_k \in A^T \mathcal{K}_k(A A^T, b) = \mathcal{K}_k(A^T A, A^T b) .$$

The method minimizes

$$\| b - A A^T y \|_2 = \| b - A x \|_2 \qquad \text{for} \qquad x \in \mathcal{K}_k(A^T A, A^T b) .$$

In summary, the AB-GMRES method with $B = A^T$ minimizes $\|b - A x_k\|_2$, and its iterates satisfy $x_k \in \mathcal{K}_k(A^T A, A^T b)$.
5.2 The LSQR Algorithm
Next, consider the LSQR algorithm applied to $\min_x \|b - Ax\|_2$, which is mathematically equivalent to the CGLS method. Both are mathematically equivalent to applying the Conjugate Gradient (CG) method to the normal equations

$$A^T A x = A^T b . \tag{5.1}$$

They minimize the error norm $\| x^* - x \|_{A^T A}$, where $\| z \|_{A^T A} = (z^T A^T A z)^{1/2}$ and $x^*$ is any solution of (5.1). Note that

$$\| x^* - x \|_{A^T A}^2 = \| A(x^* - x) \|_2^2 = \| b - A x \|_2^2 - \| b - A x^* \|_2^2 ,$$

where the cross term vanishes because $A^T (b - A x^*) = 0$. Hence, minimizing $\| x^* - x \|_{A^T A}$ is equivalent to minimizing $\| b - A x \|_2$. Thus, CGLS minimizes $\| b - A x \|_2$ over the Krylov subspace $\mathcal{K}_k(A^T A, A^T b)$, and the iterates satisfy $x_k \in \mathcal{K}_k(A^T A, A^T b)$.
Therefore, AB-GMRES with $B = A^T$, as well as LSQR and CGLS applied to $\min_x \|b - Ax\|_2$, minimize $\| b - A x \|_2$ over the same subspace, i.e., the iterates (solutions) satisfy $x_k \in \mathcal{K}_k(A^T A, A^T b)$. Thus, these methods are mathematically equivalent.
In finite precision arithmetic, AB-GMRES with $B = A^T$ should be numerically more stable than CGLS and LSQR, since AB-GMRES is based on the Arnoldi process, whereas LSQR and CGLS rely on short-term recurrences. In fact, AB-GMRES may be numerically equivalent to LSQR and CGLS with full reorthogonalization , and this is confirmed by our numerical experiments (not included here).
Now, it is well known that CGLS has good regularizing properties for discrete ill-posed problems, leading to semi-convergence; see, e.g., [15, 11, 12, 22]. When $B \ne A^T$, we may still apply AB-GMRES, while applying LSQR (which would amount to applying CG to the unmatched normal equations $BAx = Bb$) is not well founded and may be problematic since $BA$ is neither symmetric nor positive semi-definite. Still, we may expect semi-convergence of AB-GMRES, as will be demonstrated in the numerical experiments. Specifically, in Section 6.6 we study experimentally how semi-convergence is influenced by the difference between $B$ and $A^T$.
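The equivalence argument of Section 5.2 can be checked numerically: the $k$th CGLS iterate coincides with the minimizer of $\|b - Ax\|_2$ over $\mathcal{K}_k(A^T A, A^T b)$, here computed from an explicit (non-orthogonal) Krylov basis. A toy sketch:

```python
import numpy as np

def cgls(A, b, k):
    """CG applied to the normal equations A^T A x = A^T b (CGLS/LSQR)."""
    x = np.zeros(A.shape[1])
    r = b.copy()
    g = A.T @ r
    p = g.copy()
    gamma = g @ g
    for _ in range(k):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        g = A.T @ r
        gamma_new = g @ g
        p = g + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

def krylov_ls(A, b, k):
    """Minimize ||b - A x||_2 directly over the Krylov subspace
    K_k(A^T A, A^T b), using the explicit monomial basis (fine for small k)."""
    v = A.T @ b
    Vk = np.stack([np.linalg.matrix_power(A.T @ A, j) @ v for j in range(k)],
                  axis=1)
    c = np.linalg.lstsq(A @ Vk, b, rcond=None)[0]
    return Vk @ c

rng = np.random.default_rng(8)
A = rng.standard_normal((15, 6))
b = rng.standard_normal(15)
k = 3
x_cg = cgls(A, b, k)
x_kr = krylov_ls(A, b, k)
```

The two solutions agree, which is precisely the optimality property shared by CGLS/LSQR and AB-GMRES with $B = A^T$ (the monomial basis is used only for illustration; it becomes ill-conditioned for larger $k$).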
5.3 The BA-GMRES Algorithm with a Matched Transpose
BA-GMRES applies GMRES to $BAx = Bb$; it minimizes $\|B(b - Ax)\|_2$ for $x \in \mathcal{K}_k(BA, Bb)$. With $B = A^T$ this means applying GMRES to

$$A^T A x = A^T b ,$$

i.e., BA-GMRES with $B = A^T$ minimizes $\|A^T(b - Ax)\|_2 = \|A^T r\|_2$ with $r = b - Ax$, and the iterates satisfy $x_k \in \mathcal{K}_k(A^T A, A^T b)$. This is mathematically equivalent to applying MINRES to the normal equations $A^T A x = A^T b$, which in turn is equivalent to LSMR . Again, BA-GMRES with $B = A^T$ should be numerically more stable than LSMR; it may be numerically equivalent to LSMR with full reorthogonalization, and again our numerical experiments (not included here) confirm this.
Similar to CGLS, LSMR also has good regularizing properties when applied to discrete ill-posed problems [8, 23]. We may still apply BA-GMRES when $B \ne A^T$, while it is not well founded to apply LSMR since $BA$ is no longer symmetric and we cannot apply MINRES to $BAx = Bb$. We expect that BA-GMRES exhibits semi-convergence, as will be demonstrated in the numerical experiments.
6 Numerical Examples
In this section we illustrate the use of the AB- and BA-GMRES methods for CT problems with an unmatched transpose. We start with numerical results related to the eigenvalues of the iteration matrices $BA$ and $AB$, and then we demonstrate the semi-convergence of the methods. All computations are performed in MATLAB using our own implementations of AB- and BA-GMRES, which are available from us; LSQR is from Regularization Tools  and LSMR is from MathWorks' File Exchange .
6.1 The Test Problems
|Parameter||Small matrix||Large matrix|
|No. projection angles||180||600|
|No. detector elements||128||420|
The matrices $A$ and $B$ used in these experiments are representative of the matrices in many CT software packages and applications. They are generated by means of the CPU version of the software package ASTRA ; the matrices in the GPU version are not explicitly available when using this software. Three different discretization models are provided in ASTRA: the line model, the strip model, and the interpolation (or Joseph) model; see [17, Chapter 9] for details. We can then use any of the corresponding matrices to generate unmatched pairs $(A, B)$. We use a parallel-beam geometry and two different sizes of these matrices corresponding to the parameters listed in Table 6.1, while Table 6.2 lists the relative norm-wise differences between these matrices. The exact solution $\bar x$ is generated by means of the function
As mentioned in the Introduction, the standard SIRT iterative methods will not converge when $BA$ has complex eigenvalues with negative real part. To demonstrate that this is the case for the discretizations used here, Figure 6.1 shows the leftmost and the largest eigenvalues of $BA$, computed by means of MATLAB's eigs, for two combinations of the small and large test matrices. In both cases there are indeed eigenvalues with negative real parts, meaning that the SIRT methods do not converge. Hence, it is natural to use the AB- and BA-GMRES methods for these matrices. The leftmost eigenvalues of the third combination have tiny negative real parts.
6.3 Error Histories
|0.1047 (34)||0.1203 (36)||0.1039 (38)||0.1199 (41)||0.0954 (57)||0.0996 (62)||0.0946 (68)||0.0990 (77)|
|0.0985 (41)||0.1059 (42)||0.0978 (47)||0.1056 (50)||0.0954 (53)||0.0996 (61)||0.0946 (63)||0.0990 (77)|
|0.0878 (61)||0.0926 (77)||0.0871 (74)||0.0921 (85)||0.0896 (52)||0.0879 (64)||0.0875 (63)||0.0795 (80)|
|0.0899 (50)||0.0927 (60)||0.0894 (60)||0.0922 (69)||0.0896 (52)||0.0870 (64)||0.0875 (63)||0.0795 (80)|
|0.1005 (39)||0.1112 (42)||0.0998 (44)||0.1105 (47)||0.0935 (53)||0.0948 (61)||0.0928 (63)||0.0940 (77)|
|0.0915 (55)||0.0930 (68)||0.0901 (67)||0.0920 (84)||0.0935 (53)||0.0948 (61)||0.0928 (63)||0.0940 (77)|
To illustrate the semi-convergence we use both AB-GMRES and BA-GMRES to solve systems with the small and large test matrices and with noisy data at a fixed noise level. We tried all six pairs $(A, B)$ with $B \ne A^T$, and for comparison we also used LSQR and LSMR (which correspond to the case $B = A^T$). To check our implementations, we verified numerically that AB- and BA-GMRES with $B = A^T$ give the same results as LSQR and LSMR (we do not show these results here). Figure 6.2 shows the error histories, i.e., the relative reconstruction error $\|\bar x - x_k\|_2 / \|\bar x\|_2$ versus the number of iterations $k$, for two of the pairs. Figure 6.3 shows the corresponding reconstructions. These results are representative for all six pairs of matrices.
When we refer to the “reconstruction error,” we mean the relative error at the point of semi-convergence (i.e., the minimum of the error histories) indicated by the markers in the plots. The reconstruction errors for all pairs are shown in Table 6.3. For each of the six pairs we observe the following:
We obtain almost the same reconstruction error for AB-GMRES and BA-GMRES, cf. block columns 2 and 3.
We obtain almost the same reconstruction error for LSQR and LSMR, cf. block columns 4 and 5.
For the matrices in the top block row, LSQR and LSMR give slightly smaller reconstruction errors than AB- and BA-GMRES, cf. Figure 6.2.
For the other two matrices, all four methods give almost the same reconstruction errors, cf. the middle and bottom block rows.
The pairs in the middle block row give marginally more accurate reconstructions than the other pairs.
Often, AB-GMRES uses slightly fewer iterations than BA-GMRES – and occasionally it uses significantly fewer iterations.
LSQR always uses fewer iterations than LSMR.
We also carried out experiments with underdetermined problems (not documented here). The conclusions remain the same.
Table 6.3 also lists the required amount of storage for the orthonormal basis vectors, which are of length $m$ for AB-GMRES and $n$ for BA-GMRES. For the overdetermined systems used here with $m > n$, in spite of BA-GMRES consistently using more iterations than AB-GMRES, BA-GMRES requires less storage due to the shorter basis vectors. For underdetermined systems (not reported here), AB-GMRES has the advantage of fewer iterations and shorter basis vectors, cf. [19, p. 2408].
We emphasize that the choice of $A$ and $B$ is dictated by the available software, and therefore one may not always have a choice of the implementations used for $A$ and $B$. Moreover, the above results are for matrices from the ASTRA software (which allows easy access to the matrices); other packages may use different discretization methods. We also stress that in these experiments we commit inverse crime, meaning that the noise-free data is generated as $\bar b = A \bar x$; hence the data is different for the three choices of $A$, meaning that we do not solve precisely the same problem for each choice. Therefore, the above results provide important insight about the influence of the unmatched pair, but they do not determine the best choice of $A$ and $B$ for a CT problem with real data and no inverse crime.
6.4 Stopping Rules
Here we demonstrate the use of two stopping rules that seek to terminate the iterations at the point of semi-convergence.
The discrepancy principle (DP)  terminates the iterations as soon as the residual norm is smaller than the noise level:

$$\| b - A x_k \|_2 \le \tau\, \| e \|_2 .$$

Here, $\tau \ge 1$ is a "safety factor" that can be used when we have only a rough estimate of $\|e\|_2$. We use a fixed $\tau$ and the exact value of $\|e\|_2$.
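A sketch of the discrepancy principle in action, using CGLS on a synthetic ill-posed toy problem (the value $\tau = 1.02$ below is illustrative, not the value used in the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 40
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** np.linspace(0, -6, n)
A = U @ np.diag(s) @ V.T

xbar = V @ s ** 0.5                 # exact solution with Picard-type decay
bbar = A @ xbar
e = 1e-4 * rng.standard_normal(n)   # white Gaussian noise
b = bbar + e

tau = 1.02                          # illustrative safety factor
threshold = tau * np.linalg.norm(e) # assumes the noise level is known

# CGLS (CG on the normal equations) with the discrepancy principle:
# stop at the first k with ||b - A x_k||_2 <= tau * ||e||_2.
x = np.zeros(n)
r = b.copy()
g = A.T @ r
p = g.copy()
gamma = g @ g
k_stop = None
for k in range(1, 201):
    q = A @ p
    alpha = gamma / (q @ q)
    x += alpha * p
    r -= alpha * q
    if np.linalg.norm(r) <= threshold:
        k_stop = k
        break
    g = A.T @ r
    gamma_new = g @ g
    p = g + (gamma_new / gamma) * p
    gamma = gamma_new

err_stop = np.linalg.norm(x - xbar) / np.linalg.norm(xbar)
```

The rule terminates well before the iterates start to pick up the amplified noise components, yielding a modest relative reconstruction error at the stopping point.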
The NCP criterion uses the normalized cumulative periodogram to perform a spectral analysis of the residual vector $b - A x_k$, in order to identify when the residual is as close to being white noise as possible, which indicates that all available information has been extracted from the noisy data. See [16, Section 2.3.3] and [18, Section II.D] for details; MATLAB code is available from us.
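A minimal sketch of the NCP idea (our own simplified implementation, not the authors' MATLAB code): compare the normalized cumulative periodogram of a residual with the straight line expected for white noise.

```python
import numpy as np

def ncp_distance(r):
    """Distance of the residual's normalized cumulative periodogram from
    that of ideal white noise (a straight line)."""
    p = np.abs(np.fft.rfft(r)) ** 2
    p = p[1:]                        # drop the DC component
    c = np.cumsum(p) / np.sum(p)     # normalized cumulative periodogram
    q = len(c)
    white = np.arange(1, q + 1) / q  # NCP of ideal white noise
    return np.max(np.abs(c - white))

rng = np.random.default_rng(10)
n = 200
t = np.linspace(0, 1, n)
r_white = rng.standard_normal(n)     # white-noise-like residual
r_smooth = np.sin(2 * np.pi * t)     # residual dominated by low frequencies

d_white = ncp_distance(r_white)
d_smooth = ncp_distance(r_smooth)
```

A residual dominated by low frequencies (information not yet extracted) has an NCP far from the white-noise line, whereas a white-noise residual lies close to it; the stopping rule picks the iteration whose residual minimizes this distance.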
We apply these stopping rules to the same pairs as in Figure 6.2, with the same noise level, and the results are shown in Figure 6.4. We obtain similar results for the other pairs and hence they are not shown here. We make the following observations:
Both DP and NCP stop the iterations before the minimum is reached. This is better than stopping too late, in which case we would include undesired noise in the solution.
Both DP and NCP stop the iterations when the error history starts to level off; the minimum of the error history is quite flat so this is acceptable.
For unknown reasons, NCP always stops the iterations a bit earlier than DP.
We conclude that both stopping rules work well for this problem. The DP stopping rule requires a good estimate of the noise level; if this is not available (as is typical in CT problems) then the performance of NCP is only slightly inferior to DP.