1 Block preconditioners
Consider a block matrix equation,

(1)  $\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix},$

where $A$ and $D$ are square (although potentially of different sizes), and at least one is invertible. Here we assume without loss of generality that $A$ is nonsingular, and focus on the case of $D \neq 0$ (we refer to the case of $D = 0$ as saddle point in this paper, unless otherwise specified).
Such systems arise in numerous applications, and are often solved iteratively using either fixed-point or Krylov methods with some form of block preconditioning [3, 22]. The four most common block preconditioners are block diagonal, block upper triangular, block lower triangular, and block LDU, which we will denote $\mathcal{P}_D$, $\mathcal{P}_U$, $\mathcal{P}_L$, and $\mathcal{P}_{LDU}$, respectively. It is generally believed that one of the diagonal blocks in these preconditioners must approximate the appropriate Schur complement. Given we assumed $A$ to be nonsingular, here we focus on the Schur complement $S := D - CA^{-1}B$. Assume the actions of $A^{-1}$ and $S^{-1}$ are available, and define block preconditioners as follows:
(2)  $\mathcal{P}_{D,\pm} = \begin{bmatrix} A & 0 \\ 0 & \pm S \end{bmatrix}, \quad \widehat{\mathcal{P}}_{D,\pm} = \begin{bmatrix} A & 0 \\ 0 & \pm D \end{bmatrix}, \quad \mathcal{P}_U = \begin{bmatrix} A & B \\ 0 & S \end{bmatrix}, \quad \mathcal{P}_L = \begin{bmatrix} A & 0 \\ C & S \end{bmatrix}, \quad \mathcal{P}_{LDU} = \begin{bmatrix} I & 0 \\ CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & S \end{bmatrix} \begin{bmatrix} I & A^{-1}B \\ 0 & I \end{bmatrix}.$
Note, $\widehat{\mathcal{P}}_{D,\pm}$ is a (more practical) variation on block-diagonal preconditioning that assumes $D$ is nonsingular and its inverse available, rather than the Schur complement $S$. The subscript $\pm$ on block-diagonal preconditioners corresponds to the sign on the (2,2)-block. For the common case of a symmetric indefinite matrix, with $D$ symmetric negative definite (SND) and $A$ symmetric positive definite (SPD), block-diagonal preconditioners with a negative (2,2)-block (like $\mathcal{P}_{D,-}$ or $\widehat{\mathcal{P}}_{D,-}$) are SPD, and a three-term recursion such as preconditioned MINRES can be applied [15].
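As a concrete reference, the preconditioners above can be assembled explicitly for small dense problems. The following sketch (Python/NumPy, with hypothetical random data; the function names and the diagonal shifts are ours, not from the paper) builds dense forms of the block-diagonal, block-triangular, and block-LDU preconditioners from $A$ and the exact Schur complement:

```python
import numpy as np
from scipy.linalg import solve

def schur_complement(A, B, C, D):
    """S = D - C A^{-1} B, assuming the (1,1)-block A is nonsingular."""
    return D - C @ solve(A, B)

def block_preconditioners(A, B, C, D):
    """Dense forms of the block-diagonal (+/- S), block-triangular, and
    block-LDU preconditioners built with the exact Schur complement."""
    S = schur_complement(A, B, C, D)
    n, m = A.shape[0], D.shape[0]
    Znm = np.zeros((n, m))
    P_diag_plus  = np.block([[A, Znm], [Znm.T,  S]])
    P_diag_minus = np.block([[A, Znm], [Znm.T, -S]])
    P_lower = np.block([[A, Znm], [C, S]])
    P_upper = np.block([[A, B], [Znm.T, S]])
    # Exact block LDU factorization: with the exact S it reproduces M itself
    L  = np.block([[np.eye(n), Znm], [C @ np.linalg.inv(A), np.eye(m)]])
    Dc = np.block([[A, Znm], [Znm.T, S]])
    U  = np.block([[np.eye(n), np.linalg.inv(A) @ B], [Znm.T, np.eye(m)]])
    return P_diag_plus, P_diag_minus, P_lower, P_upper, L @ Dc @ U

# Small random example (hypothetical data; diagonal shifts keep A, D safely invertible)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4*np.eye(4)
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 4))
D = rng.standard_normal((3, 3)) + 4*np.eye(3)
Pd, Pdm, Pl, Pu, Pldu = block_preconditioners(A, B, C, D)
```

Because the LDU factors are exact, `Pldu` reproduces the full block matrix, which is why $\mathcal{P}_{LDU}$ with exact factors is the (impractically expensive) ideal case.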
In the following, we restate and tighten known results regarding fixed-point and minimal residual iterations [15, 17] in exact arithmetic, using the preconditioners in (2), for the general case (1) and the saddle-point case, where $D = 0$. First, note the following proposition.
Proposition 1
Let $A$ be invertible. Then $\dim(\ker(M)) = \dim(\ker(S))$, where $M$ denotes the block matrix in (1) and $S = D - CA^{-1}B$.
Proof
Note that if $M$ is singular, there exists $(x, y) \neq 0$ such that $M(x, y) = 0$. Under the assumption that $A$ is nonsingular, expanding in block form (1) gives $x = -A^{-1}By$ from the first block row, and substituting into the second block row shows that $M$ can be singular if and only if $Sy = (D - CA^{-1}B)y = 0$, with corresponding kernel vector $(-A^{-1}By, y)$. Thus the dimension of $\ker(M)$ equals the dimension of $\ker(S)$, and $M$ is nonsingular if and only if $S$ is nonsingular.
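Proposition 1 can be checked numerically: constructing $D$ so that the Schur complement has a prescribed rank deficiency forces the same nullity on the full block matrix. A small sketch (hypothetical data; variable names are ours):

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(5)
n, m = 5, 3
A = rng.standard_normal((n, n)) + 5*np.eye(n)   # nonsingular (1,1)-block
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))

S0 = np.diag([2.0, 3.0, 0.0])    # target Schur complement with nullity 1
D = S0 + C @ solve(A, B)         # so that S = D - C A^{-1} B = S0
M = np.block([[A, B], [C, D]])

# dim ker(M) = dim ker(S): both are 1 here
print((n + m) - np.linalg.matrix_rank(M))
```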
For block-diagonal preconditioners $\mathcal{P}_{D,\pm}$, there is an implicit assumption that $A$ and $S$ are invertible, in which case (by Proposition 1) $M$ is invertible, and so is the preconditioned operator. Proofs regarding the number of preconditioned minimal-residual iterations to exact convergence with preconditioners $\mathcal{P}_{D,\pm}$ in [11, 10] include the possibility of a zero eigenvalue in the characteristic polynomial. Since this cannot be the case, the maximum number of iterations is three, not four as originally stated in [11, 10] (see Proposition 3).

Proposition 2 ($\mathcal{P}_L$, $\mathcal{P}_U$, $\mathcal{P}_{LDU}$ block preconditioners)

If $A$ and $S$ are nonsingular, convergence of fixed-point and minimal residual methods preconditioned with $\mathcal{P}_L$, $\mathcal{P}_U$, or $\mathcal{P}_{LDU}$ is defined by convergence of an equivalent method applied to the preconditioned Schur complements (roughly twice the number of iterations as the larger of the two) [20].
Proposition 3 ($\mathcal{P}_{D,\pm}$ block saddle-point preconditioners ($D = 0$))

Fixed-point iterations preconditioned with $\mathcal{P}_{D,\pm}$ do not necessarily converge [20].

Minimal residual methods preconditioned with $\mathcal{P}_{D,\pm}$ converge in at most three iterations (see [11, 10] and Proposition 1).
Note that block-triangular and LDU-type preconditioners have a certain optimality: in exact arithmetic and using the exact Schur complement, convergence is guaranteed in two iterations, while convergence using an approximate Schur complement is exactly defined by the preconditioned Schur-complement problem. This provides general guidance in the development of block preconditioners: the Schur complement should be well approximated for a robust method.
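The two-iteration optimality of block-triangular preconditioning with an exact Schur complement can be verified directly: the preconditioned operator $T$ satisfies $(T - I)^2 = 0$, so its minimal polynomial has degree two. A sketch under hypothetical random data:

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(1)
n, m = 5, 3
A = rng.standard_normal((n, n)) + 5*np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + 5*np.eye(m)
S = D - C @ solve(A, B)                          # exact Schur complement

M  = np.block([[A, B], [C, D]])
PL = np.block([[A, np.zeros((n, m))], [C, S]])   # block lower triangular

T = solve(PL, M)              # preconditioned operator
R = T - np.eye(n + m)
# (T - I)^2 = 0: minimal polynomial (x - 1)^2, so a minimal-residual
# method converges in at most two iterations in exact arithmetic
print(np.linalg.norm(R @ R))
```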
The above propositions suggest one might also want a block-diagonal preconditioner to approximate the Schur complement in one of the blocks. Indeed, block-diagonal preconditioners are often designed this way (for example, see [18]). However, it turns out that block-diagonal preconditioners with an exact Schur complement, for $D \neq 0$, are not optimal in the sense that block-triangular and LDU preconditioners are. That is, block-diagonally preconditioned minimal-residual methods are not guaranteed to converge in a fixed number of iterations (see Section 2 and Theorem 1), and convergence is not defined by the preconditioned Schur complement.
Numerous papers have looked at eigenvalue analyses for block preconditioning (for example, [4, 7, 2, 5, 11, 10, 1, 18, 12, 16]). This paper falls into the series of notes on the spectrum of various block preconditioners with exact inverses [11, 10, 4, 5], here presenting results for the nonsymmetric and non-saddle-point, block-diagonal case. Beyond those notes, this work is perhaps most related to [18]. There, a lengthy eigenvalue analysis is done on approximate block-diagonal preconditioning, with approximations to the diagonal blocks, but the cases of exact inverses as in (2) are not directly considered or discussed.
Section 2 presents the new theory, deriving the spectrum of the preconditioned operators as a nonlinear function of the rank of $B$ and $C$, and the generalized eigenvalues of an associated matrix pencil. A brief discussion of the implications, particularly for symmetric matrices, is given in Section 2.1, where examples are constructed to show that block-diagonal preconditioning with an exact Schur complement can converge relatively slowly; that block-diagonal preconditioning with an approximate Schur complement can be as fast as, or significantly slower than, block-triangular preconditioning (rather than exactly twice slower, as when $D = 0$ [8]); and that block-diagonal preconditioning is likely more robust with symmetric indefinite matrices than SPD ones. A practical example in the simulation of neutral-particle transport is discussed in Section 3.
2 Block-diagonal preconditioning
It is shown in [11, 10] that block-diagonal and block-triangular preconditioners for saddle-point problems, with an exact Schur complement, have three and two nonzero eigenvalues, respectively. Thus, the minimal polynomial for minimal residual methods will be exact in a Krylov space of at most that size.
Theorem 1 derives the full eigenvalue decomposition for general operators, preconditioned by the block-diagonal preconditioners $\mathcal{P}_{D,\pm}$ and $\widehat{\mathcal{P}}_{D,\pm}$. In particular, the spectrum of the preconditioned operator is fully defined by a generalized eigenvalue problem, depending on whether we assume $S$ or $D$ is nonsingular, respectively. If $D = 0$, the results from [11, 10] immediately follow. For $D \neq 0$, the preconditioned operator can have more eigenvalues, up to a full set of distinct eigenvalues (or the generalized eigenvectors may not form a complete basis for the space), and no such statements can be made on the guaranteed convergence of minimal-residual methods.
Theorem 1 (Eigendecomposition of block-diagonal preconditioned operators)
Assume that $A$ and $S$ are invertible, and let $(\mu, v)$ be a generalized eigenpair of the associated pencil. Then, the spectra of the preconditioned operators for the block-diagonal preconditioners $\mathcal{P}_{D,\pm}$ in (2) are given in terms of $\mu$:
Now assume instead that $A$ and $D$ are invertible, and let $(\mu, v)$ be a generalized eigenpair of the corresponding pencil. Then, the spectra of the preconditioned operators for the block-diagonal preconditioners $\widehat{\mathcal{P}}_{D,\pm}$ in (2) are given in terms of $\mu$:
The multiplicities of the eigenvalues are given by the dimensions of the nullspaces of $B$ and $C$. Eigenvectors for each eigenvalue can also be written in closed form as a function of the generalized eigenpairs, and are derived in the proof (see Section 2.2).
Now assume $D$ is nonsingular. By continuity of eigenvalues as a function of matrix entries, the spectrum converges to that of the saddle-point case as $D \to 0$, and the maximum number of iterations for exact convergence of minimal-residual methods converges to three. Note that the limit does not include the saddle-point spectrum itself because, by assuming $D$ is nonsingular, there is an implicit assumption that $\ker(D) = \{0\}$, which excludes one eigenvalue of the preconditioned operator (see Lemma 2).
2.1 Discussion
The primary implication of Theorem 1 is that block-diagonal preconditioned minimal-residual methods with an exact Schur complement do not necessarily converge in a fixed number of iterations. Recall block-diagonal preconditioning is often applied to symmetric matrices for use with MINRES, and convergence of such Krylov methods is better understood than that of GMRES. Here, we include a brief discussion of the eigenvalues of block-diagonally preconditioned symmetric block matrices with semidefinite diagonal blocks.
Consider real-valued, symmetric, nonsingular matrices with nonzero saddle-point structure, that is,

(3)  $M = \begin{bmatrix} A & B \\ B^T & D \end{bmatrix},$

where $A$ is SPD and $D$ is symmetric semidefinite (positive or negative). Such matrices arise often in practice; see the review papers [3, 22] for many different examples. From Theorem 1, we can consider the spectrum of the preconditioned operator based on the associated generalized eigenvalue spectrum. Figure 1 plots the magnitudes of the resulting eigenvalues as a function of the generalized eigenvalue on a log-log scale (excluding zero on the logarithmic x-axis).
Here we can see that in one regime of generalized eigenvalues, the Schur complement provides a clearly superior preconditioning of eigenvalues, while in the other regime, $D$ likely provides a comparable or potentially superior preconditioning of eigenvalues. The following example demonstrates how SPD operators may be less amenable to block-diagonal Schur-complement preconditioning.
Example 1 (SPD operators)
Suppose $M$ is SPD. Then $A$, $D$, and $S$ are also SPD, and $B^TA^{-1}B$ is symmetric positive semidefinite, so the generalized eigenvalues in question are real and nonnegative. Looking at Figure 1, eigenvalues of the preconditioned operator are largely outside of the region where a Schur complement provides excellent preconditioning of eigenvalues.
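The contrast with the saddle-point case can be seen numerically: for an SPD matrix with nonzero (2,2)-block, block-diagonal preconditioning with the exact Schur complement still yields a real positive spectrum, but generically with a full set of distinct eigenvalues rather than the three of the saddle-point case. A sketch (hypothetical data):

```python
import numpy as np
from scipy.linalg import solve, eigvals

rng = np.random.default_rng(2)
n, m = 6, 4
X = rng.standard_normal((n + m, n + m))
M = X @ X.T + 0.5*np.eye(n + m)          # SPD block matrix
A, B = M[:n, :n], M[:n, n:]
C, D = M[n:, :n], M[n:, n:]
S = D - C @ solve(A, B)                  # SPD Schur complement

P = np.block([[A, np.zeros((n, m))],
              [np.zeros((m, n)), S]])    # block diagonal, exact S
# P^{-1}M is similar to the SPD matrix P^{-1/2} M P^{-1/2},
# so its eigenvalues are real and positive -- but generically
# n + m distinct values, not a small fixed set
lam = np.sort(eigvals(solve(P, M)).real)
print(lam)
```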
Now let the blocks be built from random matrices with entries uniformly distributed in $[-1, 1]$. Then, for positive constants, define symmetric test matrices

(4)  $M_+$ and $M_-$,

where a scaling constant is chosen such that $M_+$ and $M_-$ have the same minimal-magnitude eigenvalue. Here $M_+$ and $M_-$ are symmetric matrices with an SPD (1,1)-block and an SPD or SND (2,2)-block, respectively. Table 1 shows the number of block-preconditioned GMRES iterations required to converge to a relative residual tolerance, using a vector of ones for the right-hand side, and various choices of the constants. Results are shown for block preconditioners including $\mathcal{P}_L$, the block-diagonal variants $\mathcal{P}_{D,\pm}$ and $\widehat{\mathcal{P}}_{D,\pm}$, and $\widehat{\mathcal{P}}_L$, a lower-triangular preconditioner using $D$ instead of $S$.


Table 1: Block-preconditioned GMRES iterations for $M_+$ (left column group) and $M_-$ (right column group); row labels give the constants in (4).

constants |           $M_+$              |           $M_-$
1/1/1     |  3  167  259   347  276  347 |  3  122  115  245  101  245
20/1/1    |  3   36   74    75   74   75 |  3   33   61   65   60   65
1/10/1    |  3  434  399   879  416  880 |  3  343   98  687   88  688
1/10/10   |  3  501  102  1001   97 1001 |  3  496   67  995   62  993

From Table 1, note that block-diagonal preconditioning with an exact Schur complement can make for a relatively poor preconditioner (see row 1/1/1). Moreover, for a problem with some sense of block-diagonal dominance (in this case, scaling the off-diagonal blocks by 1/20 compared with the diagonal blocks; see row 20/1/1), the exact Schur complement provides almost no improvement over a standard block-diagonal preconditioner, $\widehat{\mathcal{P}}_{D,\pm}$. Last, it is worth pointing out that for all cases, convergence of GMRES is faster applied to $M_-$ than $M_+$, in some cases significantly faster, which is generally consistent with Figure 1 and the corresponding discussion.¹

¹Note that $M_+$ and $M_-$ are still fundamentally different matrices, and there may be other factors that explain the difference in convergence. However, for all tests with a given set of random matrices and fixed constants, the condition number, minimum-magnitude eigenvalue, and maximum-magnitude eigenvalue of $M_+$ and $M_-$ are within 1%.
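The iteration counts in Table 1 reflect the degree of the minimal polynomial of the preconditioned operator relative to the right-hand side. A rough numerical proxy for the exact-arithmetic GMRES iteration count is the smallest $k$ for which the Krylov vectors become linearly dependent; the sketch below (hypothetical data, far smaller than the paper's tests) contrasts block-triangular and block-diagonal preconditioning with the same exact Schur complement:

```python
import numpy as np
from scipy.linalg import solve

def krylov_degree(T, b, tol=1e-8):
    """Smallest k such that b, Tb, ..., T^k b are (numerically) linearly
    dependent: a proxy for the exact-arithmetic GMRES iteration count."""
    V = [b / np.linalg.norm(b)]
    for k in range(1, T.shape[0] + 1):
        w = T @ V[-1]
        if np.linalg.matrix_rank(np.column_stack(V + [w]), tol) <= k:
            return k
        V.append(w / np.linalg.norm(w))
    return T.shape[0]

rng = np.random.default_rng(3)
n = 20
A = rng.uniform(-1, 1, (n, n)) + n*np.eye(n)   # hypothetical stand-in blocks
B = rng.uniform(-1, 1, (n, n))
C = rng.uniform(-1, 1, (n, n))
D = rng.uniform(-1, 1, (n, n)) + n*np.eye(n)
M = np.block([[A, B], [C, D]])
S = D - C @ solve(A, B)
Z = np.zeros((n, n))

T_tri  = solve(np.block([[A, Z], [C, S]]), M)  # block lower triangular
T_diag = solve(np.block([[A, Z], [Z, S]]), M)  # block diagonal, exact S
b = np.ones(2*n)
# Triangular: 2 iterations; diagonal with D != 0: many more
print(krylov_degree(T_tri, b), krylov_degree(T_diag, b))
```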
Finally, we consider how the theory extends to approximate block preconditioners. Consider $M_\pm$ with constants 1/10/1 in (4), and two (artificial) approximate Schur complements, obtained by perturbing $S$ with a random error matrix whose entries are uniformly distributed and scaled relative to the average magnitude of the entries in $S$. For zero error scaling the approximation is exact, while the approximations generally become less accurate as the error scaling increases. Then, consider block preconditioners that take the form of the block-diagonal and block-lower-triangular preconditioners in (2), replacing $S$ with its approximation. Figure 2 shows the ratio of block-diagonal to block-lower-triangular preconditioned GMRES iterations to a relative residual tolerance, for $M_+$ and $M_-$, as a function of the corresponding error scaling. Results are also shown for the saddle-point case, $D = 0$.
Note that for small error scalings, the approximate preconditioners applied to $M_-$ are significantly more effective than those applied to $M_+$, in both cases requiring fewer GMRES iterations. At moderate error scaling, consistent with theory in [20], the block-diagonal preconditioner takes almost exactly twice as many iterations as the block-triangular one (interestingly, the same ratio holds in another case, although this appears to be coincidental). Finally, at large error scaling, the block-diagonal preconditioner takes almost the same number of iterations as the block-triangular one. Despite both being relatively poor preconditioners, this again demonstrates that block-diagonal preconditioning can yield convergence just as fast as block-triangular preconditioning in some cases. Last, for $D = 0$, we observe the expected result that a block-diagonal preconditioner requires twice as many iterations as block lower-triangular using the same Schur complement approximation [8].
The above discussion highlights some of the intricacies of Schur-complement preconditioning, and some of the practical insights we can learn from this analysis. An interesting open question is whether the Schur complement is optimal in any sense for block-diagonal preconditioners, or if there is a different "optimal" operator for the (2,2)-block. Of course, optimal in what sense is perhaps the larger question because, as demonstrated in Figure 1 and Table 1 (and as is generally known [9]), the eigenvalues of the preconditioned operator can all be very nicely bounded while GMRES still exhibits arbitrarily slow convergence.
2.2 Proofs
Proof (Proof of Theorem 1)
The proof is presented as a sequence of lemmas for each preconditioner.
Lemma 1 (Spectrum of )
Proof
By nature of the preconditioner, we assume that $A$ and $S$ are nonsingular and, thus, so is $\mathcal{P}_{D,\pm}$. One can then expand the preconditioned operator in block form
Now consider the corresponding eigenvalue problem,
(5) 
which can be expanded as the set of equations
If $B$ or $C$ are not full rank, then each vector in $\ker(B)$ or $\ker(C)$ yields an eigenpair whose eigenvector has a zero block. This accounts for $\dim(\ker(B)) + \dim(\ker(C))$ eigenvalues. For vectors outside $\ker(B)$ and $\ker(C)$, solving the first block equation for the first block variable and plugging in reduces (5) to an eigenvalue problem for the second block variable,
(6) 
Let and assume . Expanding, we have , and solving for yields .
Now consider the eigenvalue problem in (6) for . Applying to both sides and rearranging yields the generalized eigenvalue problem . Subtracting from both sides then yields the equivalent generalized eigenvalue problem
(7) 
Thus, let be a generalized eigenvalue of in (7). Solving for yields as a function of generalized eigenvalues of (note, for ker, we have ). Furthermore, the generalized eigenvectors in (7) are the same as eigenvectors in the (2,2)block for the preconditioned operator (5). Plugging into the equation above for , we can express the eigenvalues and eigenvectors of as a nonlinear function of the generalized eigenvalues of ,
(8) 
Eigenvectors of the former set in (8) are given by for each generalized eigenpair of (7) and eigenvalue . Note that each generalized eigenpair of corresponds to two eigenpairs of , because is a quadratic equation in . The latter eigenvalue in (8) has multiplicity given by the sum of dimensions of ker and ker, with eigenvectors as defined previously in the proof.
Lemma 2 (Spectrum of )
Proof
The proof proceeds analogously to Lemma 1. The block-preconditioned eigenvalue problem can be expressed as the set of equations
If or are not full rank, then for each ker or each ker, there exist eigenpairs or , respectively. This accounts for dim(ker( dim(ker( eigenvalues. For ker and ker, solving for yields . Plugging in reduces to an eigenvalue problem for ,
(9) 
Let and assume . Expanding, we have , and solving for yields . Plugging as in Lemma 1 yields the spectrum of as a nonlinear function of the generalized eigenvalues of ,
(10) 
Eigenvectors of the former set in (10) are given by for each generalized eigenpair of (7) and eigenvalue . Note that each generalized eigenpair of corresponds to two eigenpairs of , because is a quadratic equation in . The latter eigenvalues and in (10) have multiplicity given by dimensions of ker and ker, respectively, with eigenvectors as defined previously in the proof.
Lemma 3 (Spectrum of )
Proof
Observe
with eigenvalues and eigenvectors determined by the off-diagonal matrix on the right. If $B$ or $C$ are not full rank, then each vector in $\ker(B)$ or $\ker(C)$ yields a corresponding eigenpair. For vectors outside these kernels, we consider the eigenvalues of the off-diagonal matrix in (2.2). Squaring this matrix results in the block-diagonal eigenvalue problem
(11) 
We can restrict ourselves to vectors outside $\ker(B)$ and $\ker(C)$. Note by similarity that the spectra of the two diagonal blocks in (11) agree up to zero eigenvalues: if $(\lambda, v)$ is a nonzero eigenpair of a product $XY$, then $(\lambda, Yv)$ is an eigenpair of $YX$. To that end, eigenvalues of (11) consist of 0, corresponding to the kernels of $B$ and $C$, and the nonzero eigenvalues of the lower (or upper) diagonal block, all with multiplicity two.
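The similarity argument used here rests on the standard fact that the products $BC$ and $CB$ share their nonzero eigenvalues; the larger product simply carries extra zeros. A quick numerical check (hypothetical dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 3))
C = rng.standard_normal((3, 5))

# BC is 5x5 with rank at most 3: its spectrum is that of the
# 3x3 product CB, padded with 5 - 3 = 2 zero eigenvalues
lam_BC = np.linalg.eigvals(B @ C)
lam_CB = np.linalg.eigvals(C @ B)
print(np.sort(np.abs(lam_BC)), np.sort(np.abs(lam_CB)))
```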
Now note that , and let be a generalized eigenvalue of . Then, , and the spectrum of can be written as a nonlinear function of the generalized eigenvalues of ,
with inclusion of the latter set of eigenvalues depending on the rank of and /, respectively.
Note, each vector in the kernel of one factor corresponds to a vector in the kernel of the product. In the context of the generalized eigenvalues, each is one-to-one with an eigenvalue of the preconditioned operator, which can be seen by plugging into (2.2).
Lemma 4 (Spectrum of )
Proof
Observe
(12) 
with eigenvalues and eigenvectors determined by the saddlepoint matrix on the right. If or are not full rank, then for each ker or each ker, there exist eigenpairs or , respectively. For ker and ker, analogous to Lemma 1 we can formulate an eigenvalue problem for the saddlepoint matrix, with eigenvalue and block eigenvector . Solving for and plugging into the equation for yields an equivalent reduced eigenvalue problem,
(13) 
Note, this is wellposed in the sense that we only care about ker and ker. Noting that , (13) is equivalent to the generalized eigenvalue problem
(14) 
Letting be a generalized eigenvalue of in (14) and solving for yields . Including the minus identity perturbation in (12) yields the spectrum of as a nonlinear function of the generalized eigenvalues of in (14), given by
Eigenvectors corresponding to these eigenvalues are given as a function of each generalized eigenpair of (14). Note that each generalized eigenpair of (14) corresponds to two eigenpairs of the preconditioned operator, because the eigenvalue relation is a quadratic equation. The latter eigenvalue has multiplicity given by the sum of the dimensions of $\ker(B)$ and $\ker(C)$, with eigenvectors as defined previously in the proof.
3 Applications in transport: VEF
3.1 The variable Eddington factor (VEF) method
The steady-state, monoenergetic, discrete-ordinates, linear Boltzmann equation with isotropic scattering and source is given by
(15)  
Here, the direction of particle motion is discretized into a set of angles corresponding to a quadrature rule for the unit sphere, with associated quadrature weights. A standard approach to solving (15) is to (independently) invert the left-hand side for each angle, update the scattering term, and repeat until convergence. Unfortunately, for many applications of interest, this process can converge arbitrarily slowly.
An alternative scheme for solving (15) is the variable Eddington factor (VEF) method, a nonlinear iterative scheme that builds an exact reduced-order model of the transport equation [21, 6]. The VEF reduced-order model is given by
(16) 
It is derived by taking the zeroth and first angular moments of (15), and introducing the Eddington tensor as a closure:
(17) 
The algorithm is then to replace the scattering sum in (15) with the solution of the VEF equations, invert (15) for each angle, update the Eddington tensor, and solve the VEF equations again.
Here we consider a mixed finite element discretization of (16), as in [14, 13]. This yields the block system
(18) 
where the (1,1)-block is a mass matrix, the (2,2)-block a (block diagonal) mass matrix, and the off-diagonal blocks the divergence matrix and the Eddington gradient matrix, respectively. Note that the (2,2)-block can vanish, in which case (18) simplifies to a saddle-point system with zero (2,2)-block. Due to the Eddington tensor, the operator is nonsymmetric and, thus, at every VEF iteration a nonsymmetric block operator of the form in (1) must be solved.
3.2 Solution of VEF discretization
There are a number of advantages of VEF over other methods of solving (15) [14, 13], but the system in (18) remains difficult to solve. However, because the diagonal blocks of (18) are mass matrices and relatively easy to invert, direct application or preconditioning of the Schur complement is feasible, and the Schur complement corresponds to a drift-diffusion-like equation. The remaining challenge is that the (1,1)-block is a mass matrix and, although the action of its inverse can be computed rapidly using Krylov methods, a closed form for the inverse is typically not available with which to construct the Schur complement. A common solution is to approximate the mass matrix using quadrature lumping, resulting in a diagonal approximation. Thus, consider the four preconditioners,
(19) 
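Quadrature lumping as used here replaces a mass matrix with the diagonal matrix of its row sums, so that the inverse is trivially available. A sketch with a hypothetical 1D linear-element mass matrix (the mesh size and assembly below are illustrative, not the paper's discretization):

```python
import numpy as np

def lump(mass):
    """Diagonal (row-sum) lumping of a mass matrix: the standard
    approximation whose inverse is trivially available."""
    return np.diag(mass.sum(axis=1))

# Toy 1D linear-element mass matrix on a uniform mesh of size h
h, k = 0.1, 5
Mass = h/6 * (np.diag(4*np.ones(k))
              + np.diag(np.ones(k-1),  1)
              + np.diag(np.ones(k-1), -1))
Mass[0, 0] = Mass[-1, -1] = 2*h/6   # boundary entries

ML = lump(Mass)
# Lumping preserves row sums, so the action on constants is exact
print(np.allclose(Mass @ np.ones(k), ML @ np.ones(k)))
```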
Consider a 1D discretization of (16), where one variable is discretized with quadratic elements and the other with linear elements, with and without absorption to demonstrate the effect of the (2,2)-block (which vanishes in the absence of absorption). In the limiting case, the Eddington tensor becomes constant, wherein scaling the second row of (18) then results in a symmetric operator. Although this is unlikely to occur on the whole domain in practice, we perform tests on this case as well to demonstrate that results do not depend on nonsymmetry in (18). Table 2 shows the number of GMRES iterations to converge to a relative residual tolerance using the block preconditioners in (19).


Table 2: GMRES iterations with unlumped and lumped block preconditioners.

                 |       Unlumped       |        Lumped
            N    | Diagonal  Triangular | Diagonal  Triangular
Nonsymm.
  0        401   |    2          2      |    44        23
  0       4001   |    4          4      |    53        25
  0.9      401   |   12          2      |    57        27
  0.9     4001   |   12          4      |    77        32
Symm.
  0        401   |    2          2      |    44        23
  0       4001   |    4          4      |    55        25
  0.9      401   |   12          2      |    57        27
  0.9     4001   |   12          4      |    76        26

The unlumped block-diagonal and triangular preconditioners converge in two or four iterations when the (2,2)-block is zero, largely consistent with theory in [11, 10]. However, when it is nonzero (rows with 0.9), block-diagonal preconditioning requires 12 iterations. Although this is better than the several hundred seen in Table 1, it is still