A note on 2× 2 block-diagonal preconditioning

For 2x2 block matrices, it is well-known that block-triangular or block-LDU preconditioners with an exact Schur complement (inverse) converge in at most two iterations for fixed-point or minimal-residual methods. Similarly, for saddle-point matrices with a zero (2,2)-block, block-diagonal preconditioners converge in at most three iterations for minimal-residual methods, although they may diverge for fixed-point iterations. But, what happens for non-saddle-point matrices and block-diagonal preconditioners with an exact Schur complement? This note proves that minimal-residual methods applied to general 2x2 block matrices, preconditioned with a block-diagonal preconditioner, including an exact Schur complement, do not (necessarily) converge in a fixed number of iterations. Furthermore, examples are constructed where (i) block-diagonal preconditioning with an exact Schur complement converges no faster than block-diagonal preconditioning using diagonal blocks of the matrix, and (ii) block-diagonal preconditioning with an approximate Schur complement converges as fast as the corresponding block-triangular preconditioning. The paper concludes by discussing some practical applications in neutral-particle transport, introducing one algorithm where block-triangular or block-LDU preconditioning are superior to block-diagonal, and a second algorithm where block-diagonal preconditioning is superior both in speed and simplicity.

• 13 publications
• 1 publication
11/06/2019

On fixed-point, Krylov, and 2× 2 block preconditioners for nonsymmetric problems

The solution of matrices with 2× 2 block structure arises in numerous ar...
04/07/2020

Nearly optimal scaling in the SR decomposition

In this paper we analyze the nearly optimal block diagonal scalings of t...
02/03/2020

A power Schur complement Low-Rank correction preconditioner for general sparse linear systems

An effective power based parallel preconditioner is proposed for general...
11/02/2020

Identification of Matrix Joint Block Diagonalization

Given a set 𝒞={C_i}_i=1^m of square matrices, the matrix blind joint blo...
09/28/2017

Preconditioners for Saddle Point Problems on Truncated Domains in Phase Separation Modelling

The discretization of Cahn-Hilliard equation with obstacle potential lea...
08/18/2021

Schur complement based preconditioners for twofold and block tridiagonal saddle point problems

In this paper, two types of Schur complement based preconditioners are s...
05/26/2019

1 2×2 block preconditioners

Consider a block matrix equation,

 [A11A12A21A22][x1x2][b1b2], (1)

where and are square (although potentially different sizes), and at least one is invertible. Here we assume without loss of generality that is nonsingular, and focus on the case of (we refer to the case of as saddle-point in this paper, unless otherwise specified).

Such systems arise in numerous applications, and are often solved iteratively using either fixed-point or Krylov methods with some form of block preconditioning [3, 22]. The four most common block preconditioners are block diagonal, block upper triangular, block lower triangular, and block LDU, which we will denote , , , and , respectively. It is generally believed that one of the diagonal blocks in these preconditioners must approximate the appropriate Schur complement. Given we assumed to be nonsingular, here we focus on the -Schur complement, . Assume the action of and are available, and define block preconditioners as follows:

 L :=[A110A21S22],U:=[A11A120S22], D± (2) M :=[I0A21A−111I][A1100S22][IA−111A120I].

Note, is a (more practical) variation on block-diagonal preconditioning that assumes is nonsingular and its inverse available, rather than the Schur complement . The subscript on block-diagonal preconditioners corresponds to the sign on the (2,2)-block. For the common case of a symmetric indefinite matrix , with symmetric negative definite (SND) and symmetric positive definite (SPD), block-diagonal preconditioners with a negative (2,2)-block (like or ) are SPD, and a three-term recursion such as preconditioned MINRES can be applied [15].

In the following, we restate and tighten known results regarding fixed-point and minimal residual iterations [15, 17] in exact arithmetic, using preconditioners in (1), for the general case (1) and the saddle-point case, where . First, note the following proposition.

Proposition 1

Let be invertible. Then dim(ker( dim(ker()).

Proof

Note that if is singular, such that . Under the assumption that is nonsingular, expanding in block form (1) shows that can be singular if and only if , with corresponding ker. Thus the dimension of ker equals the dimension of ker, and is nonsingular if and only if is nonsingular.

For block-diagonal preconditioner , there is an implicit assumption that and are invertible, in which case (by Proposition 1) is invertible and is invertible. Proofs regarding the number of preconditioned minimal-residual iterations to exact convergence with preconditioner in [11, 10]

include the possibility of a zero eigenvalue in the characteristic polynomial. Since this cannot be the case, the maximum number of iterations is three, not four as originally stated in

[11, 10] (see Proposition 3).

Proposition 2 (2×2 block preconditioners)

1. Fixed-point iterations [20] and minimal residual methods [11, 10] preconditioned with , , and converge in at most two iterations.

2. If and are nonsingular, convergence of fixed-point and minimal residual methods preconditioned with is defined by convergence of an equivalent method applied to the preconditioned Schur complements, and (roughly twice the number of iterations as the larger of the two) [20].

Proposition 3 (2×2 block saddle-point preconditioners (A22=0))

1. Fixed-point iterations preconditioned with do not necessarily converge [20].

2. Minimal residual methods preconditioned with converge in at most three iterations (see [11, 10], and Proposition 1).

Note that block-triangular and LDU-type preconditioners have a certain optimality – in exact arithmetic and using the exact Schur complement, convergence is guaranteed in two iterations, while convergence using an approximate Schur complement is exactly defined by the preconditioned Schur complement-problem. This provides general guidance in the development of block preconditioners, that the Schur complement should be well approximated for a robust method.

The above propositions suggest one might also want a block-diagonal preconditioner to approximate the Schur complement in one of the blocks. Indeed, block-diagonal preconditioners are often designed to approximate a Schur complement in one of the blocks (for example, see [18]). However, it turns out that block-diagonal preconditioners with an exact Schur complement, for , are not optimal in the sense that block-triangular and LDU preconditioners are. That is, block-diagonally preconditioned minimal-residual methods are not guaranteed to converge in iterations (see Section 2 and Theorem 1), and convergence is not defined by the preconditioned Schur complement.

Numerous papers have looked at eigenvalue analyses for block preconditioning (for example, [4, 7, 2, 5, 11, 10, 1, 18, 12, 16]). This paper falls into the series of notes on the spectrum of various block preconditioners with exact inverses [11, 10, 4, 5], here presenting results for the nonsymmetric and non-saddle-point, block-diagonal case. In addition to the notes, this work is perhaps most related to [18]. There, a lengthy eigenvalue analysis is done on approximate block-diagonal preconditioning, with approximations to and , but the cases of exact inverses as in (1) are not directly considered or discussed.

Section 2 presents the new theory, deriving the spectrum of preconditioned operators and as a nonlinear function of the rank of and , and the generalized eigenvalues of matrix pencils or . A brief discussion on implications, particularly for symmetric matrices is given in Section 2.1, where examples are constructed to show that block-diagonal preconditioning with an exact Schur compelment can converge relatively slowly, that block-diagonal preconditioning with an approximate Schur complement can be as fast as, or significantly slower than, block triangular preconditioning (rather than slower when [8]), and that block-diagonal preconditioning is likely more robust with symmetric indefinite matrices than SPD. A practical example in the simulation of neutral-particle transport is discussed in Section 3.

2 Block-diagonal preconditioning

It is shown in [11, 10] that block-diagonal and block-triangular preconditioners for saddle-point problems, with an exact Schur complement, have three and two nonzero eigenvalues. Thus, the minimal polynomial for minimal residual methods will be exact in a Krylov space of at most that size.

Theorem 1 derives the full eigenvalue decomposition for general operators, preconditioned by the block-diagonal preconditioners and . In particular, the spectrum of the preconditioned operator is fully defined by the generalized eigenvalue problems or , depending on whether we assume or are nonsingular, respectively. If , then , and the results from [11, 10] immediately follow. For , the preconditioned operator can have more eigenvalues, up to a full set of

distinct eigenvalues (or, the generalized eigenvectors could not form a complete basis for the space), and no such statements can be made on the guaranteed convergence of minimal-residual methods.

Theorem 1 (Eigendecomposition of block-diagonal preconditioned operators)

Assume that and are invertible. Let be a generalized eigenpair of . Then, the spectra of the preconditioned operator for block-diagonal preconditioners (1) are given by:

 σ(D−1+A) =(˜λ+12±12√(˜λ−1)(˜λ+3))\scalebox1.5$∪${1}, σ(D−1−A) =(−˜λ−12±12√(˜λ−1)2+4)\scalebox1.5$∪${±1}.

Assume that and are invertible. Now let be a generalized eigenpair of . Then, the spectra of the preconditioned operator for block-diagonal preconditioners (1) are given by:

 σ(ˆD−1+A) =(1±√1−ˆλ)\scalebox1.5$∪${±1}, σ(ˆD−1−A) =(±√ˆλ)\scalebox1.5$∪${±1}.

The multiplicity of the eigenvalues are given by the dimensions of the nullspace of and . Eigenvectors for each eigenvalue can also be written in a closed form as a function of , and are derived in the proof (see Section 2.2).

Now assume is nonsingular. By continuity of eigenvalues as a function of matrix entries,

 limA22→0σ(D−1+A) ={1,1±i√32},limA22→0σ(D−1−A)={1,1±√52},

and the maximum number of iterations for exact convergence of minimal-residual methods converges to three. Note that the limit of does not include because by assuming is nonsingular, there is an implicit assumption that ket, which means an eigenvalue of (see Lemma 2).

2.1 Discussion

The primary implication of Theorem 1 is that block-diagonal preconditioned minimal-residual methods with an exact Schur complement do not necessarily converge in a fixed number of iterations. Recall block-diagonal preconditioning is often applied to symmetric matrices for use with MINRES, and convergence of such Krylov methods is better understood than GMRES, Here, we include a brief discussion on the eigenvalues of block-diagonally preconditioned symmetric block matrices with semi-definite diagonal blocks.

Consider real-valued, symmetric, nonsingular matrices with nonzero saddle-point structure, that is,

 A=[A11A12AT12A22], (3)

where is SPD and is symmetric semidefinite (positive or negative definite). Such matrices arise often in practice; see review papers [3, 22] for many different examples. From Theorem 1, we can consider the spectrum of the preconditioned operator based on the generalized spectrum of . Figure 1 plots the magnitude of eigenvalues and as a function of in a log-log scale (excluding zero on the logarithmic x-axis).

Here we can see that when , provides a clearly superior preconditioning of eigenvalues than . Conversely, for , likely provides a comparable or potentially superior preconditioning of eigenvalues. The following example demonstrates how SPD operators may be less amenable to block-diagonal Schur-complement preconditioning.

Example 1 (SPD operators)

Suppose is SPD. Then , and are also SPD, and is symmetric positive semi-definite. Notice that , and the eigenvalues of are real and non-negative. It follows that and . Looking at Figure 1, eigenvalues of the preconditioned operator are largely outside of the region where a Schur complement provides excellent preconditioning of eigenvalues.

Now let , and be random

matrices uniformly distributed between

. Then, for positive constants , define

 (4)

where is chosen such that and have the same minimal magnitude eigenvalue, and . Here and are symmetric matrices, with SPD (1,1)-block, and SPD or SND (2,2)-block. Table 1 shows the number of block preconditioned GMRES iterations to converge to

relative residual tolerance, using a one vector for the right-hand side, and various

. Results are shown for block preconditioners , , , and , a lower-triangular preconditioner using instead of .

From Table 1, note that block-diagonal preconditioning with an exact Schur complement can make for a relatively poor preconditioner (see row 1/1/1). Moreover, for a problem with some sense of block-diagonal dominance (in this case, scaling the off-diagonal blocks by 1/20 compared with diagonal blocks; see row 20/1/1), the exact Schur complement provides almost no improvement over a standard block-diagonal preconditioner, . Last, it is worth pointing out that for all cases, convergence of GMRES is faster applied to () than (), in some cases up to faster, which is generally consistent with Figure 1 and the corresponding discussion.111Note that and are still fundamentally different and there may be other factors that explain the difference in convergence. However, for all tests with a given set of random matrices and fixed , the condition number, minimum magnitude eigenvalue, and maximum magnitude eigenvalue of and are within 1%.

Finally, we consider how the theory extends to approximate block preconditioners. Consider with constants and 1/10/1 (4), and two (artificial) approximate Schur complements,

 S∗22 =A22−(1−ε∗)A21A−111A12,S×22=A22−A21A−111A12+ε×E,

for error scaling . Here, is a random error matrix with entries uniformly distributed between , where is the average magnitude of entries in . Note that for we have , while and generally become decreasingly accurate approximations to as increases. Also, yields . Then, consider block preconditioners , and , which take the form of and in (1), replacing or . Figure 2 shows the ratio of block-diagonal to block-lower-triangular preconditioned GMRES iterations to relative residual tolerance, for and , as a function of the corresponding error scaling. Results for are also shown with .

Note that for and , and are significantly more effective as preconditioners than and , in both cases requiring less GMRES iterations (with and only requiring iterations). At , and, consistent with theory in [20], takes almost exactly twice as many iterations as (interestingly, the same ratio holds for , although this appears to be coincidence). Finally, for , takes almost the same number of iterations as . Despite both being relatively poor preconditioners, this again demonstrates that block-diagonal preconditioning can yield convergence just as fast as block-triangular preconditioning in some cases. Last, for , we observe the expected result that a block-diagonal preconditioner will require twice as many iterations as block lower-triangular using the same Schur complement approximation, when [8].

The above discussion highlights some of the intricacies of Schur-complement preconditioning, and some of the practical insights we can learn from this analysis. An interesting open question is whether the Schur complement is optimal in any sense for block-diagonal preconditioners, or if there is a different “optimal” operator for the (2,2)-block. Of course, optimal in what sense is perhaps the larger question, because as demonstrated in Figure 1 and Table 1 (and generally known [9]), the eigenvalues of the preconditioned operator can all be very nicely bounded, but GMRES can observe arbitrarily slow convergence.

2.2 Proofs

Proof (Proof of Theorem 1)

The proof is presented as a sequence of lemmas for each preconditioner.

Proof

By nature of the preconditioner, we assume that and are nonsingular and, thus, so is . Noting the identity , one can expand to the block form

 D−1+A

Now consider the corresponding eigenvalue problem,

 =λ[xy], (5)

which can be expanded as the set of equations

 x+A−111A12y =λx, S−122A21(x+A−111A12y) =(λ−1)y.

If or are not full rank, then for each ker or each ker, there exist eigenpairs or , respectively (the denotes a column vector). This accounts for dim(ker( dim(ker( eigenvalues. For ker and ker, solving for yields . Plugging in reduces (5) to an eigenvalue problem for ,

 S−122A21A−111A12y =(1−λ)2λy. (6)

Let and assume . Expanding, we have , and solving for yields .

Now consider the eigenvalue problem in (6) for . Applying to both sides and rearranging yields the generalized eigenvalue problem . Subtracting from both sides then yields the equivalent generalized eigenvalue problem

 A22y =(1+ˆλ)S22y. (7)

Thus, let be a generalized eigenvalue of in (7). Solving for yields as a function of generalized eigenvalues of (note, for ker, we have ). Furthermore, the generalized eigenvectors in (7) are the same as eigenvectors in the (2,2)-block for the preconditioned operator (5). Plugging into the equation above for , we can express the eigenvalues and eigenvectors of as a nonlinear function of the generalized eigenvalues of ,

 σ(D−1+A)=(˜λ+12±12√(˜λ−1)(˜λ+3))\scalebox1.5$∪${1}, (8)

Eigenvectors of the former set in (8) are given by for each generalized eigenpair of (7) and eigenvalue . Note that each generalized eigenpair of corresponds to two eigenpairs of , because is a quadratic equation in . The latter eigenvalue in (8) has multiplicity given by the sum of dimensions of ker and ker, with eigenvectors as defined previously in the proof.

Proof

The proof proceed analogous to Lemma 1. The block-preconditioned eigenvalue problem can be expressed as the set of equations

 x+A−111A12y =λx, −S−122A21(x+A−111A12y) =(λ+1)y.

If or are not full rank, then for each ker or each ker, there exist eigenpairs or , respectively. This accounts for dim(ker( dim(ker( eigenvalues. For ker and ker, solving for yields . Plugging in reduces to an eigenvalue problem for ,

 S−122A21A−111A12y =(1−λ)(1+λ)λy. (9)

Let and assume . Expanding, we have , and solving for yields . Plugging as in Lemma 1 yields the spectrum of as a nonlinear function of the generalized eigenvalues of ,

 σ(D−1−A)=(−˜λ−12±12√(˜λ−1)2+4)\scalebox1.5$∪${±1}. (10)

Eigenvectors of the former set in (10) are given by for each generalized eigenpair of (7) and eigenvalue . Note that each generalized eigenpair of corresponds to two eigenpairs of , because is a quadratic equation in . The latter eigenvalues and in (10) have multiplicity given by dimensions of ker and ker, respectively, with eigenvectors as defined previously in the proof.

Proof

Observe

 ˆD−1+A =[A−11100A−122][A11A12A21A22]=I+[0A−111A12A−122A210].

with eigenvalues and eigenvectors determined by the off-diagonal matrix on the right. If or are not full rank, then for each ker or each ker, there exist eigenpairs or , respectively. For ker and ker, we consider eigenvalues of the off-diagonal matrix in (2.2). Squaring this matrix results in the block-diagonal eigenvalue problem

 [A−111A12A−122A2100A−122A21A−111A12][xy] =ˆλ[xy]. (11)

We can restrict ourselves to the case of ker and ker. Note by similarity that the spectrum of is equivalent to that of . Now suppose , and let . Then . Thus if is a nonzero eigenpair of , then is an eigenpair of . To that end, eigenvalues of (11) consist of 0, corresponding to the kernels of and , and the nonzero eigenvalues of the lower diagonal block (or upper), all with multiplicity two.

Now note that , and let be a generalized eigenvalue of . Then, , and the spectrum of can be written as a nonlinear function of the generalized eigenvalues of ,

 σ(ˆD−1+A) =(1±√1−˜λ)\scalebox1.5$∪${±1},

with inclusion of the latter set of eigenvalues depending on the rank of and /, respectively.

Note, each vector ker correspond to a ker. In the context of the generalized eigenvalues of , each is one-to-one with an eigenvalue of , which can be seen by plugging into (2.2).

Proof

Observe

 ˆD−1−A =[A−11100−A−122][A11A12A21A22]=−I+[2IA−111A12−A−122A210], (12)

with eigenvalues and eigenvectors determined by the saddle-point matrix on the right. If or are not full rank, then for each ker or each ker, there exist eigenpairs or , respectively. For ker and ker, analogous to Lemma 1 we can formulate an eigenvalue problem for the saddle-point matrix, with eigenvalue and block eigenvector . Solving for and plugging into the equation for yields an equivalent reduced eigenvalue problem,

 A−122A21A−111A12y =ˆλ(2−ˆλ)y. (13)

Note, this is well-posed in the sense that we only care about ker and ker. Noting that , (13) is equivalent to the generalized eigenvalue problem

 S22y =(1−ˆλ)2A22y. (14)

Letting be a generalized eigenvalue of in (14) and solving for yields . Including the minus identity perturbation in (12) yields the spectrum of as a nonlinear function of the generalized eigenvalues of in (14), given by

 σ(ˆD−1−A)=(±√˜λ)\scalebox1.5$∪${±1},

Eigenvectors corresponding to eigenvalues are given by for each generalized eigenpair of (7) and eigenvalue . Note that each generalized eigenpair of corresponds to two eigenpairs of , because is a quadratic equation in . The latter eigenvalue in (8) has multiplicity given by the sum of dimensions of ker and ker, with eigenvectors as defined previously in the proof.

3 Applications in transport: H1⊗L2 Vef

3.1 The variable Eddington factor (VEF) method

The steady-state, mono-energetic, discrete-ordinates, linear Boltzmann equation with isotropic scattering and source is given by

 Ωd⋅∇ψd(x)+σt(x)ψd(x) =σs(x)4πN∑d′=1wd′ψd′(x)+Q(x)4π,x∈D, (15) ψd(x) =¯¯¯¯ψ(x,Ωd),x∈∂D and Ωd⋅n<0.

Here, the direction of particle motion, , is discretized into angles corresponding to a quadrature rule for the unit sphere, , for quadrature weights . A standard approach to solve (15) is to (independently) invert the left-hand side for each , update , and repeat until convergence. Unfortunately, for many applications of interest, this process can converge arbitrarily slowly.

An alternative scheme for solving (15) is the Variable Eddington factor (VEF) method, a nonlinear iterative scheme that builds an exact reduced-order model of the transport equation [21, 6]. The VEF reduced-order model is given by

 ∇⋅J(x)+σa(x)φ(x)=Q(x)∇⋅[E(x)φ(x)]+σt(x)J(x)=0. (16)

It is derived by taking the zeroth and first angular moments of (

15

), and introducing the Eddington tensor as a closure:

 E(x)=∑Nd=1Ωd⊗Ωdψd(x)wd∑Nd=1ψd(x)wd. (17)

The algorithm is then to replace the scattering sum in (15) with the solution of the VEF equations, , invert (15) for each angle, update the Eddington tensor, and solve the VEF equations.

Here we consider an mixed finite element discretization of (16), as in [14, 13]. This yields the block system

 [Mt−GBMa][Jφ]=[gf], (18)

where is an mass matrix, a (block diagonal) mass matrix, the divergence matrix, and the Eddington gradient matrix, defined as . Note that for , and (18) simplifies to a saddle-point system with zero (2,2)-block. Due to the Eddington tensor, and, thus, at every VEF iteration a non-symmetric block operator of the form in (1) must be solved.

3.2 Solution of H1⊗L2 VEF discretization

There are a number of advantages to VEF over other methods of solving (15) [14, 13], but the system in (18) remains difficult to solve. However, because the diagonal blocks of (18) are mass matrices and relatively easy to invert, direct application or preconditioning of the Schur complement, , is feasible, which corresponds to a drift-diffusion-like equation. The remaining challenge is that is an mass matrix and, although the action of can be computed rapidly using Krylov methods, a closed form to construct is typically not available. A common solution is to approximate using quadrature lumping, resulting in a diagonal approximation, . Thus, consider the four preconditioners,

 D=[Mt00Ma+BM−1tG],L=[Mt0BMa+BM−1tG],˜D=[Mt00Ma+B˜Mt−1G],˜L=[Mt0BMa+B˜Mt−1G]. (19)

Consider a 1d discretization of (16), where is discretized with quadratic and with linear , with and absorption to demonstrate the effect of the block ( when ). In the limit of , , wherein scaling the second row of (18) by then results in a symmetric operator. Although this is unlikely to occur on the whole domain in practice, we perform tests on this case as well to demonstrate that results do not depend on nonsymmetry in (18). Table 2 shows the number of GMRES iterations to converge to relative residual tolerance using block preconditioners in (19).

The unlumped block-diagonal and triangular preconditioners converge in two or four iterations when the block is zero, largely consistent with theory in [11, 10]. However, when , block-diagonal preconditioning requires 12 iterations. Although this is better than the several hundred seen in Table 1, it is still