
Algebraic multigrid block preconditioning for multi-group radiation diffusion equations

The paper focuses on developing and studying efficient block preconditioners based on classical algebraic multigrid for the large-scale sparse linear systems arising from the fully coupled and implicitly cell-centered finite volume discretization of multi-group radiation diffusion equations, whose coefficient matrices can be rearranged into the (G+2)×(G+2) block form, where G is the number of energy groups. The preconditioning techniques are based on the monolithic classical algebraic multigrid method, physical-variable based coarsening two-level algorithm and two types of block Schur complement preconditioners. The classical algebraic multigrid is applied to solve the subsystems that arise in the last three block preconditioners. The coupling strength and diagonal dominance are further explored to improve performance. We use representative one-group and twenty-group linear systems from capsule implosion simulations to test the robustness, efficiency, strong and weak parallel scaling properties of the proposed methods. Numerical results demonstrate that block preconditioners lead to mesh- and problem-independent convergence, and scale well both algorithmically and in parallel.




1 Introduction

The multi-group radiation diffusion (MGD) equations have a broad range of applications, including inertial confinement fusion (ICF) and astrophysics m-04 . Owing to the complicated nonlinear couplings among dozens of physical quantities at multiple temporal and spatial scales, MGD equations are often discretized by the finite volume method, which allows for local conservation h-03 ; n-01 ; h-04 ; g-01 ; p-03 ; s-02 ; x-05 , resulting in a series of nonsymmetric but positive definite large-scale sparse linear systems. The overall complexity increases not only with the mesh size but also with the level of coupling between physical quantities. It must be emphasized that solving these linear systems dominates the cost of ICF numerical simulations (about 80% of the run time in general), because the coefficient matrices are invariably ill-conditioned.

To effectively address the aforesaid bottlenecks, numerous efficient and scalable preconditioned Krylov subspace methods have been proposed over the past decades; see b-01 ; m-01 ; x-03 ; y-01 ; y-02 ; x-01 ; x-02 ; h-02 ; y-03 and the references cited therein. These preconditioners are conceived as approximate inverses and fall into the categories of incomplete LU factorizations, domain decomposition preconditioners, monolithic algebraic multigrid (AMG) methods and their symmetric / nonsymmetric combinations. Since each of these coefficient matrices has an underlying block structure, one can also construct a block preconditioner that separates the global problem into easier-to-solve subproblems and forms an object-oriented framework, allowing for code reuse and the incorporation of single-physics experience into multi-physics simulations. The objective is to approximately invert numerous individual scalar systems instead of the fully coupled system. Preconditioners of this type have been proposed and analyzed in the literature, including the block diagonal preconditioner d-01 ; l-01 ; z-03 , block lower / upper triangular preconditioner b-04 ; b-02 ; c-01 , product (splitting) preconditioner r-02 ; w-01 ; z-02 and constraint preconditioner b-06 ; k-01 ; d-02 . Block preconditioners with multigrid components have proven very successful in a variety of applications, e.g., liquid crystal director modeling b-07 , multiphase flow in porous media b-03 , the Stokes problem c-02 , the incompressible Navier-Stokes problem e-01 , second-order Agmon-Douglis-Nirenberg elliptic systems l-02 , magnetohydrodynamics models m-02 , the Dirichlet biharmonic problem p-02 , electrical activity in the heart s-01 , the Brinkman problem v-01 , all-speed melt pool flow physics w-03 and fully coupled flow and geomechanics w-02 . Our focus in this work is on block preconditioning based on the classical AMG method, owing to its general applicability, high efficiency and easy-to-use interface.

In the recent work a-01 , four types of operator-based preconditioners have been developed in the Jacobian-free Newton-Krylov method for solving two-dimensional three-temperature energy equations. These preconditioners are application specific, relying on physical properties of the energy equations as well as different linearizations on different terms in the nonlinear residual. They are demonstrated numerically to be very effective. However, the corresponding preconditioning matrix has to be assembled explicitly in each time step. In this study, we restrict our consideration to a variety of purely algebraic block preconditioners, which are based only on information available from the original block system, without constructing the preconditioning matrix.

The remainder of this work proceeds as follows. Section 2 provides the mathematical formulation of the MGD equations and the fully coupled and implicit cell-centered finite volume discretization. Section 3 describes several AMG block-based preconditioning strategies, including an adaptive preconditioning strategy and four approximations of the matrix inverses to improve performance. In Section 4, we compare the performance of these preconditioners on representative MGD linear systems from capsule implosion simulations, on both sequential and parallel computers. Finally, we draw conclusions in Section 5.

2 Problem formulation and discretization

This article is concerned with the time-dependent MGD equations in a certain symmetric geometry p-01



  • , , , respectively denote the number of energy groups, velocity of light, density of the medium and energy transfer coefficient between electron and ion;

  • , , respectively denote the -th spectral radiation and electron scattering energy densities and highly nonlinear radiation diffusion coefficient;

  • , , respectively denote the -th radiation source, scattering and absorption coefficients of the Planck-averaged electron energy;

  • , , respectively denote the specific heat capacity, temperature and nonlinear thermal-conductivity coefficient of electron ( = ) or ion ( = ).

Various discretization schemes (see q-01 ) can be applied to reduce the continuous differential equations to finite-dimensional sparse linear systems. Utilizing an (adaptive) backward Euler scheme for the temporal discretization, followed by frozen coefficients for the linearization, and then an appropriate cell-centered finite volume scheme for the spatial discretization, we obtain a series of sparse block-structured linear systems from (2.1) by grouping together the unknowns corresponding to the same physical quantity:


in which and for group index , so that the coefficient matrix is generally positive definite but nonsymmetric. For these linear systems, we use the restarted generalized minimal residual (GMRES()) method, where is the number of Krylov directions to orthogonalize against. Moreover, preconditioning the GMRES() solver is essential for rapid convergence.
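As an illustration of why preconditioning matters here, the following sketch builds a toy two-variable block system (a hypothetical stand-in for (2.2), not the paper's MGD matrices): elliptic-like diagonal blocks, diagonal coupling blocks, and restarted GMRES accelerated by a simple block-diagonal preconditioner via SciPy.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50
# Elliptic-like diagonal block (1D Laplacian stencil) and diagonal coupling.
lap = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
cpl = sp.diags(0.1 * np.ones(n))
A = sp.bmat([[lap + cpl, -cpl], [-cpl, lap + cpl]]).tocsr()
b = np.ones(2 * n)

# Block-diagonal preconditioner: invert each diagonal block separately.
solve_blk = spla.factorized((lap + cpl).tocsc())
M = spla.LinearOperator(
    A.shape, matvec=lambda v: np.concatenate([solve_blk(v[:n]), solve_blk(v[n:])])
)

x, info = spla.gmres(A, b, M=M, restart=30)
assert info == 0  # converged
```

The block-diagonal choice is the crudest member of the family surveyed above; the Schur-complement variants in Section 3 additionally account for the coupling blocks.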

It is worth noting that the diagonal blocks () have the same nonzero structure as a discretized purely elliptic problem, while the coupling terms () are nonsingular diagonal matrices. Typically, the algebraic characteristics of the submatrices () are much more favorable than those of the full matrix . This is the fundamental motivation for devising block preconditioners to solve (2.2).

3 Preconditioning strategies

This section is devoted to four preconditioning strategies: the monolithic classical AMG, the physical-variable based coarsening two-level (PCTL) method and two types of block Schur complement preconditioners. In addition, two improvements are introduced in Sections 3.4 and 3.5.

3.1 Monolithic classical AMG preconditioner

Classical AMG, developed in r-01 , is by now quite mature and one of the most popular preconditioners in real applications, owing to its scalability and its applicability on complicated domains and unstructured grids. It consists of Setup and Solve phases. The former builds all the ingredients required by a hierarchy of grids, from the finest to the coarsest, under the assumption that no information on the underlying geometry, grids or continuous operators is available, while the latter performs V-, W- or F-cycles. For more detailed information and open challenges, we refer to the review articles s-04 ; x-07 ; x-06 and the references therein. It should be noted, however, that different coarsening schemes, interpolation procedures and relaxation methods result in different operator complexities, Setup times, cycle times and convergence rates. We apply BoomerAMG h-01 , a parallel implementation of classical AMG in the HYPRE package, to the fully coupled monolithic linear system as a black-box preconditioner. Empirically, we use a strength threshold of 0.25, a single V-cycle with Falgout (a hybrid RS / CLJP) coarsening h-01 for test runs, aggressive coarsening on the finest level, one pre-smoothing and one post-smoothing sweep of hybrid Gauss-Seidel in symmetric ordering (i.e., the down cycle relaxes first all coarse points and then all fine points, and the up cycle uses the opposite traversal), classical modified interpolation, at most 100 degrees of freedom (DoFs) on the coarsest level, which is solved by Gaussian elimination, and the default configuration for all other parameters.

3.2 PCTL preconditioner

The PCTL algorithm was first proposed by Xu and his co-authors for 3-temperature () linear systems x-03 . The four-level multigrid reduction preconditioner recently proposed by Bui, Wang and Osei-Kuffuor b-08 can be viewed as a variation of PCTL. The generalization of PCTL used to precondition (2.2) is straightforward, although various components have to be chosen. Taking the special structure of the coefficient matrix into account, we designate the electron temperature variables as coarse points and the others as fine points. The relaxation routine used here is C / F block relaxation sweeps, i.e., the variables associated with coarse points are relaxed first, followed by the other unknowns. Denote by

the interpolation matrix to match up with the aforementioned coarse-grid selection, where submatrices () and . However, they are often significantly denser, even though

are all diagonal matrices. From the implementation point of view, they are restricted to be diagonal and give an exact approximation only for constant functions. That is, for vector

, the resulting vectors containing all the diagonal entries of blocks (), denoted by , can be respectively computed as follows:

The restriction is defined as and the coarse-grid operator is taken as


Noting that () are all diagonal, it can be deduced that and have the same nonzero structure. The preconditioning phase of the PCTL algorithm takes the following form:

  1. (pre-smoothing step) Run block relaxation once to subsystems:

  2. Solve the coarse-grid system: ;

  3. (coarse-grid correction) Set .

where is the incoming Krylov vector and is the outgoing preconditioned vector.
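The steps above can be sketched on a toy three-variable (radiation / electron / ion) system. This is a minimal PCTL-style two-level cycle under simplifying assumptions, namely exact subsystem solves, diagonal interpolation weights chosen so that constants are interpolated exactly, and a Galerkin coarse-grid operator; it is not the paper's production algorithm, and the matrices are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 40
L = sp.diags([-1.0, 2.2, -1.0], [-1, 0, 1], shape=(n, n)).tocsc()  # elliptic-like block
C = sp.diags(-0.05 * np.ones(n))    # diagonal coupling to the electron block
# Arrow structure: radiation and ion couple only to the electron variables.
A = sp.bmat([[L, C, None], [C, L, C], [None, C, L]]).tocsr()

# Diagonal interpolation weights chosen so constants are interpolated exactly.
one = np.ones(n)
w = sp.diags(((-C) @ one) / (L @ one))
P = sp.bmat([[w], [sp.identity(n)], [w]]).tocsr()   # electron points are coarse
Ac = (P.T @ A @ P).tocsc()                          # Galerkin coarse-grid operator

solve_f = spla.factorized(L)    # fine-block (radiation / ion) solver
solve_c = spla.factorized(Ac)   # coarse-grid (electron) solver

def pctl(b):
    x = np.zeros(3 * n)
    x[:n] = solve_f(b[:n])            # pre-smoothing: relax the radiation subsystem
    x[2*n:] = solve_f(b[2*n:])        # pre-smoothing: relax the ion subsystem
    r = b - A @ x                     # residual after pre-smoothing
    return x + P @ solve_c(P.T @ r)   # coarse-grid correction

M = spla.LinearOperator(A.shape, matvec=pctl)
b = np.ones(3 * n)
x, info = spla.gmres(A, b, M=M, restart=30)
assert info == 0
```

Used as a right-hand preconditioner inside GMRES, this two-level cycle already yields mesh-independent iteration counts on the toy problem.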

3.3 Two types of block Schur complement preconditioners

Note that the coefficient matrix can be rewritten in the factored form


where the matrix and

is the identity matrix of the same size as

. At this point, it is important to emphasize that the factorization (3.2) breaks away from the original system. Then a similar factorization

is processed to decouple the remaining into two separate systems, where

The error associated with the global factorization would be zero if were inverted exactly and determined completely. However, since it would be prohibitively expensive to form the full block matrix explicitly, we instead implement with for , ignoring all off-diagonal elements of . In this situation we obtain the completely decoupled form, which leads to the following implementation procedure:

  1. (the intermediate ion segment) ;

  2. (the intermediate electron segment) ;

  3. , ;

  4. (the corrected electron segment) ;

  5. (the corrected ion segment) .

This preconditioning strategy is denoted as Schur1, whose error can be formulated as follows:


where denotes the zero matrix of suitable size.
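A minimal sketch of a Schur1-style application on a hypothetical arrow-structured three-block system with diagonal couplings, using a diagonal approximation inside the Schur complement as described above. The step comments loosely mirror the five segments; the matrices are toy stand-ins, not the paper's MGD blocks.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 40
L = sp.diags([-1.0, 2.2, -1.0], [-1, 0, 1], shape=(n, n)).tocsc()
C = sp.diags(-0.05 * np.ones(n))          # diagonal coupling blocks
A = sp.bmat([[L, C, None], [C, L, C], [None, C, L]]).tocsr()

solve_blk = spla.factorized(L)            # radiation / ion subsystem solver
Dinv = sp.diags(1.0 / L.diagonal())       # cheap diagonal approximation of the inverse
Se = (L - 2.0 * (C @ Dinv @ C)).tocsc()   # approximate electron Schur complement
solve_e = spla.factorized(Se)

def schur1(b):
    br, be, bi = b[:n], b[n:2*n], b[2*n:]
    y_i = solve_blk(bi)                    # intermediate ion segment
    y_r = solve_blk(br)                    # intermediate radiation segment
    x_e = solve_e(be - C @ y_r - C @ y_i)  # electron Schur-complement solve
    x_r = solve_blk(br - C @ x_e)          # corrected radiation segment
    x_i = solve_blk(bi - C @ x_e)          # corrected ion segment
    return np.concatenate([x_r, x_e, x_i])

M = spla.LinearOperator(A.shape, matvec=schur1)
b = np.ones(3 * n)
x, info = spla.gmres(A, b, M=M, restart=30)
assert info == 0
```

Because each application performs independent subsystem solves followed by a correction sweep, the only inter-block data exchanged are diagonally scaled vectors, which is what makes the parallel implementation in Section 3.7 communication-light.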

The last preconditioning strategy, denoted by Schur2, is motivated by reducing the block system into and block systems. It is of approximate block factorization type. Assembling the block decompositions


we obtain the full Schur complement preconditioning matrix of the form


where and are the Schur complement matrices, respectively defined by

The first factor in (3.3) represents the “elimination” stage, while the second executes the “substitution” stage. Schur2 can be viewed as an extension to the matrix of the primitive-variable block Schur complement preconditioner recently proposed by Weston et al. w-03 . In the practical implementation, similar to Schur1, we take with . In this setting, the preconditioning operation proceeds as follows:

  1. (the intermediate electron segment) ;

  2. , ;  ;

  3. (the corrected electron segment) .

Observe that the action of Schur2 involves () once and twice. The error associated with this preconditioner can be written as

By comparison with the block matrix , it is not difficult to see that there are more nonzero entries in . As a consequence, Schur1 may perform better than Schur2 in terms of convergence behavior.
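The “elimination” / “substitution” structure behind both Schur-type preconditioners rests on the exact block LDU factorization; the following small dense example verifies it numerically on a generic 2x2 block matrix with diagonal couplings (a hypothetical example, not the MGD system).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A11 = 4.0 * np.eye(n) + rng.random((n, n))
A22 = 4.0 * np.eye(n) + rng.random((n, n))
A12 = np.diag(rng.random(n))              # diagonal coupling blocks,
A21 = np.diag(rng.random(n))              # as in the MGD block systems
A = np.block([[A11, A12], [A21, A22]])

S = A22 - A21 @ np.linalg.inv(A11) @ A12  # exact Schur complement
Lf = np.block([[np.eye(n), np.zeros((n, n))],
               [A21 @ np.linalg.inv(A11), np.eye(n)]])   # "elimination" factor
Uf = np.block([[A11, A12],
               [np.zeros((n, n)), S]])                   # "substitution" factor
assert np.allclose(Lf @ Uf, A)   # the block factorization is exact
# Replacing inv(A11) by a diagonal approximation turns this identity into a
# practical preconditioner, at the cost of a nonzero factorization error.
```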

3.4 Fast approximations of the matrix inverse

It is customary to exploit the diagonal dominance property in order to efficiently approximate the action of each matrix inverse in the preceding block preconditioners. We start with a definition.

Definition 1

For a given threshold , define the weak diagonally dominant factor for the diagonal blocks ():

Remark 1

Since is a strictly increasing function, the larger the factor , the more weakly diagonally dominant the matrix becomes.

It is clear from Definition 1 that can be regarded as a diagonal matrix whenever the indicator holds. In our experiments, is taken to be . We implement two options for approximating the operation :

  1. When , this operation is performed by a fixed number of Jacobi iterations;

  2. Otherwise, a fixed number of AMG V-cycles is used.

It is worthwhile to point out that the interpolation operator appearing in the PCTL preconditioner should be constructed by another option:

  1. The matrix inverse is solved by a number of AMG V-cycles to some prescribed accuracy.

Also, notice that the inverse of still involves the Schur complement matrices and . This can be avoided by a diagonal approximation: and are replaced by

where denotes the diagonal matrix whose diagonal entries are those of . Therefore, () and can be formed explicitly and approximated by option (#3). Approximations of and (), denoted by and , are carried out in exactly the same manner. Alternatively, a more accurate approach can be used to remedy the very slow convergence that occurs when options (#1)-(#3) break down: the action of can be approximated iteratively, that is,

  1. Taking the zero vector as the initial guess, the approximate solution at the -th step is given by

    where () is iteratively approximated by option (#1), (#2) or (#3).

Option (#4) is, of course, considerably more expensive. In addition, the remaining matrix inverses , and () can be treated in the same fashion as above.
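A sketch of the dispatch between options (#1) and (#2) follows, with one plausible form of the weak diagonal-dominance factor; the paper's exact formula is not reproduced in this text, so the indicator, the threshold value and the test matrix below are all assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def weak_dd_factor(A):
    """Max over rows of (off-diagonal absolute row sum) / |diagonal entry|."""
    A = sp.csr_matrix(A)
    d = np.abs(A.diagonal())
    off = np.asarray(abs(A).sum(axis=1)).ravel() - d
    return float(np.max(off / d))

def jacobi(A, b, sweeps=3):
    """A fixed number of Jacobi sweeps (in the spirit of option #1)."""
    A = sp.csr_matrix(A)
    Dinv = 1.0 / A.diagonal()
    x = np.zeros_like(b)
    for _ in range(sweeps):
        x = x + Dinv * (b - A @ x)
    return x

theta = 0.1                          # hypothetical threshold
A = sp.diags([-0.01, 1.0, -0.01], [-1, 0, 1], shape=(10, 10)).tocsr()
b = np.ones(10)
if weak_dd_factor(A) <= theta:       # nearly diagonal: cheap Jacobi suffices
    x = jacobi(A, b)
else:                                # otherwise: AMG V-cycles (option #2, not shown)
    x = spla.spsolve(A.tocsc(), b)
assert np.linalg.norm(b - A @ x) < 1e-3
```

The point of the switch is purely economic: when a block is nearly diagonal, a few Jacobi sweeps already approximate its inverse well, and the Setup cost of an AMG hierarchy would be wasted.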

3.5 An adaptive block preconditioning strategy

The coupling strength can serve as a clear criterion for automatically choosing a preconditioner for each linear system arising from the nonlinear iterations at each integration time step. Although the change from one linear system to the next may be relatively small, the accumulated change over many time steps can be significant. An indicator measuring the coupling strength of each coupling term is introduced in Definition 2.

Definition 2

For a given threshold , define the weak coupling factor for the coupling term () relative to :

Remark 2

The larger the factor , the weaker the coupling between the -th and -th physical quantities. Furthermore, is a strictly increasing function.

With this definition, we may assume that can be neglected without loss of robustness, provided that is larger than a switch criterion . The smaller is chosen, the more aggressive the strategy becomes. In particular, if holds for some physical variable , then it can be extracted from the preconditioning matrix, i.e., the -th row and column are removed. In this case, the -th approximate solution is computed independently, and the remaining unknowns are treated as in the previous scenario. In our computations, and are set to and , respectively. It should be mentioned that this adaptive strategy can be imposed on any of the block preconditioners described in Sections 3.2-3.3.
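One plausible realization of the weak-coupling indicator is sketched below; the paper's exact formula and its threshold values are elided from this text, so the form of the factor, the switch criterion and the matrices are all assumptions for illustration.

```python
import numpy as np
import scipy.sparse as sp

def weak_coupling_factor(Aii, Cij):
    """Min over rows of |diag(A_ii)| / |diag(C_ij)|: larger means weaker coupling."""
    c = np.abs(Cij.diagonal())
    d = np.abs(Aii.diagonal())
    return float(np.min(d / np.maximum(c, 1e-300)))

n = 10
Aii = sp.diags(2.0 * np.ones(n))          # diagonal of an elliptic-like block
C_strong = sp.diags(0.5 * np.ones(n))     # strong coupling term
C_weak = sp.diags(1e-8 * np.ones(n))      # nearly negligible coupling term
theta_c = 1e6                             # hypothetical switch criterion

assert weak_coupling_factor(Aii, C_strong) < theta_c   # keep this coupling
assert weak_coupling_factor(Aii, C_weak) >= theta_c    # drop it: solve the block independently
```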

Remark 3

Obviously, the block lower (upper) triangular preconditioner is equivalent to the particular case when and ( and ) for , while the block diagonal preconditioner corresponds to the simplest case that and for .

Remark 4

It must be emphasized that the coarse-grid correction step in the PCTL preconditioner must be performed unless the conditions () are all satisfied.

3.6 A summary of block preconditioning strategies

In summary, the block Schur complement preconditioners use legacy algorithms to handle the reduced scalar subsystems in an operator-splitting fashion, whereas the PCTL preconditioner exploits the fact that there is no direct coupling between the radiation and ion temperature variables, each being coupled only to the electron temperature variables. One should be aware that both approaches follow the divide-and-conquer paradigm: each task is first split into more tractable subtasks, whose results are then gathered and coupled.

Five types of operations performed in the Setup and Preconditioning phases of these block preconditioners are compared in Table 1 with respect to a certain subsystem. It is important to note that all the coupling terms () are diagonal. Consequently, all matrix-vector multiplications reduce to Hadamard products of two or three vectors, except the one in the second step of the PCTL preconditioner, and a matrix update confined to diagonal entries is mathematically equivalent to one vector update. Note that additional matrix inverses and matrix updates are required in the Setup phase of PCTL. As can be seen, compared with Schur2, PCTL requires more matrix-vector multiplications, and Schur1 one more matrix inverse. Evidently, Schur2 is often preferred over PCTL and Schur1 when they exhibit similar inner and outer convergence behaviors. As mentioned above, however, Schur1 is likely to be more accurate, giving rise to faster convergence and thus better efficiency than Schur2 in many cases.

Setup phase Preconditioning phase
matrix inverse matrix update matrix inverse matrix-vector multiplication Hadamard product of vectors vector update
Schur1 0 0 0
Schur2 0 0 0
Table 1: Operation comparisons in Setup and Preconditioning phases of PCTL, Schur1 and Schur2.

Each adaptive block preconditioned GMRES() solver consists of two levels. The adaptive block preconditioning strategy described in Section 3.5 operates at the first level; its purpose is to find an appropriate, cheaper preconditioner for problems that are relatively easy to solve. The second level dispatches every matrix inverse appearing in the preceding block preconditioners to either the fast or the accurate approximation proposed in Section 3.4.

3.7 Implementation details

The three block preconditioners described in Sections 3.2-3.3 are implemented on top of the HYPRE package using C and the Message Passing Interface (MPI), with data located in distributed memory. The parallel code splits the global communicator, as illustrated in Figure 1, so that each subsystem owns a group holding an ordered collection of processor identifiers, allowing parallelized modules to solve large problems.

Figure 1: The communicator splitting and processor partitioning, where is the number of processors.

An effective parallel data decomposition plays a key role in achieving good performance. Each subsystem should be partitioned so that all parts are of roughly equal size, to avoid load-balancing issues. In our parallel implementation, all diagonal blocks are stored in the ParCSR matrix data structure within the associated communication group, while all coupling terms are represented by the ParVector data structure of global length . More details on ParCSR and ParVector can be found in Falgout et al. f-01 . Each matrix inverse is carried out by BoomerAMG h-01 with one of the options described in Section 3.4, where we use V(1,1)-cycles with hybrid Gauss-Seidel relaxation in symmetric ordering, coarse-grid matrices formed algebraically by the Galerkin process, and the algebraic interface-based coarsening recently presented by Xu and Mo in x-02 to achieve a good balance between convergence and complexity.

It is apparent that Schur1 and Schur2 are far easier to implement than PCTL, since the latter requires the construction and numerical solution of the coarse-grid system, which involves much more complicated global communication between processors y-04 : all diagonal blocks () and coupling terms () must be distributed both locally (data sets used in the Setup and Preconditioning phases) and globally (data sets used to generate the globally distributed matrix via formula (3.1) in the Setup phase). For Schur1 and Schur2, in contrast, the distributed local data sets suffice; their parallel flow diagrams and MPI communication patterns are depicted in Figure 2. The numerical solution of a given subsystem at steps 1, 2, 3 and 4 is followed by the data exchange labeled (a), (b), (c) and (d), respectively. Another communication group is established in the Setup phase so that processors with the same local identifier in the partition (shown in Figure 1) are grouped together, making message exchanges (such as (b) and (c) in Schur1) easier and more efficient to implement. Here the two MPI functions MPI_Bcast and MPI_Gather are used to spread elements from the root processor (belonging to comm_E) to the others and to gather elements from the other processors onto the root processor, respectively.

Figure 2: The parallel flow diagram and MPI communication pattern of Schur1 (left) and Schur2 (right).

4 Numerical results

In this section, we provide experimental results to study the effectiveness and parallel scalability of the block preconditioners against BoomerAMG h-01 as implemented in the parallel C library HYPRE, version 2.15.1. The results are obtained on the Tianhe-1 supercomputer y-05 , a multi-array, configurable and cooperative parallel system with a theoretical peak speed of 1.372 petaflops, composed of high-performance general-purpose microprocessors and a high-speed InfiniBand network. Our code is compiled with mpich-3.1.3 using the icc compiler, version 11.1.059, at the -O2 optimization level. The preconditioned GMRES() iteration is halted when the Euclidean norm of the relative residual drops below . Unless otherwise stated, for performance reasons, the stopping tolerances are chosen to be when approximating , , and () in the Schur complement preconditioners, as well as in the construction of the interpolation and the block relaxations in the PCTL preconditioner.

The following notation is used in the tables below: denotes the number of preconditioned GMRES() iterations; represents the CPU time in seconds to solve the linear system with the preconditioned GMRES() solver; PCTL, Schur1 and Schur2 stand for the adaptive PCTL, Schur1 and Schur2 variants, respectively; indicates the number of processors used.

4.1 Numerical experiments on one processor

The results in this subsection are organized in two parts, based on two suites of representative MGD linear systems from two-dimensional capsule implosion simulations: the first suite includes 4 three-temperature (i.e., one-group) linear systems, denoted by -; the second consists of 37 three-temperature and 20 twenty-group linear systems.

The performance of seven preconditioned GMRES(30) solvers is investigated for problems - on two different grids, as shown in Table 2. We observe that (i) AMG preconditioned GMRES(30) is not sufficiently robust and requires an excessive number of iterations to converge (shown by ); (ii) PCTL, Schur1 and Schur2 exhibit better convergence behavior and, as expected, Schur1 is numerically more robust than the other two; (iii) Schur1 and Schur2 require less wall time than PCTL; (iv) the adaptive variants are much more efficient, even though additional arithmetic operations are needed to determine an appropriate and inexpensive preconditioner.

AMG PCTL PCTL Schur1 Schur1 Schur2 Schur2 AMG PCTL PCTL Schur1 Schur1 Schur2 Schur2
9 3.6 3 1.7 2 1.5 3 1.4 3 1.4 4 1.6 4 1.6 9 14.8 3 7.6 2 6.4 3 7.1 3 6.9 5 7.7 5 7.8
80 31.2 12 6.1 10 5.3 4 3.1 5 2.9 4 3.0 5 2.9 100 164.2 17 270.3 14 28.7 4 13.5 6 14.0 4 11.8 6 13.0
29 12.1 11 5.4 6 3.8 4 3.1 5 2.9 6 3.4 6 3.2 32 56.4 14 28.2 7 17.4 4 14.3 6 13.6 8 18.0 8 16.2
11 4.8 3 11.6 6 3.9 3 9.1 6 3.9 3 9.3 6 3.9 11 20.6 3 24.9 6 16.3 3 18.5 6 16.3 3 19.5 6 16.3
Table 2: Number of iterations and wall time of seven preconditioned GMRES(30) solvers.

An important question raised by Table 2 is why the computational cost of PCTL becomes unacceptable for on the grid, despite its reasonable iteration count. We therefore focus on the inner iteration comparisons among PCTL, PCTL, Schur1, Schur1, Schur2 and Schur2. The results are given in Table 3, where (i) entries with the superscript indicate that option (#1) is used, and option (#2) otherwise; here a single Jacobi iteration or AMG V-cycle is used to approximate a matrix inverse; (ii) entries with the superscript indicate that the block diagonal preconditioner with option (#1) or (#2) is used to improve performance. Since the simulation is much more sensitive to the accuracy of the radiation temperatures, three iterations are used in their approximate solves, while one iteration is applied to the subsystems associated with the electron and ion temperatures. The primary reason PCTL is so expensive is that a total of 623 AMG V-cycles are required to solve all the subsystems associated with the radiation temperatures. This is remedied in PCTL by using option (#1) or (#2) instead of option (#3), at the cost of more coarse-grid correction steps (from 9 to 15). Fortunately, the number of outer iterations decreases (from 17 to 14), resulting in a much faster preconditioner.

PCTL PCTL Schur1 Schur1 Schur2 Schur2
0 1 4 3 3 4 1 1 3 3 6 3 7 6 3 14 3 5 4
2 1 10 623 9 9 6 1 15 15 15 10 12 9 14 7 9 10 5 13 7
3 1 5 19 5 11 6 1 8 8 8 10 14 10 14 7 17 14 9 17 9
7 1 6 27 4 1 0 0 0 5 25 5 7 27 4
Table 3: Inner iteration comparisons of six block preconditioners for the grid.

Next we examine the iteration counts and wall time over the second suite of representative linear systems. The data are plotted in Figure 3, with problem sizes of roughly 2.3 and 16.9 million DoFs, respectively. It is readily seen that PCTL, Schur1, Schur2 and their adaptive variants converge much faster than AMG, with the exception of 3 twenty-group systems that could not be solved by PCTL, Schur1 and Schur2. The reason is that the stopping tolerance in option (#3) is not small enough; however, the smaller the tolerance, the higher the total cost. It should be emphasized that, although they may converge slightly more slowly, PCTL, Schur1 and Schur2 are cheaper to apply, because the actual convergence behavior is produced by the block diagonal preconditioner. In terms of total wall time, Schur1 and Schur2 have comparable computational cost, and both are the most efficient option, achieving a 13.7% (three-temperature) and 41.6% (twenty-group) reduction compared with PCTL, and running 5.5 (three-temperature) and 8.5 (twenty-group) times faster than AMG.

Figure 3: The iteration counts and wall time as functions of representative three-temperature (top) and twenty-group (bottom) linear systems.

4.2 Parallel experiments

This subsection is concerned with run times and with strong and weak parallel scaling properties. Since our previous performance evaluations showed that PCTL, Schur1 and Schur2 generally lead to better convergence, robustness and efficiency, we implemented only their parallel versions; PCTL, Schur1 and Schur2 are not considered here. Timings are measured in seconds with the MPI function MPI_Wtime.

We consider 6 twenty-group linear systems, denoted by -, from the previous subsection, and examine the strong parallel scalability of the AMG, PCTL, Schur1 and Schur2 preconditioned GMRES(30) solvers on a grid (see Table 4). The simulation is run on 22 to 352 cores, doubling the number of cores each time; the problem size per processor is 48,000 DoFs on 352 cores. The results in Table 4 show that these four preconditioners scale well in the strong sense, both in convergence (number of iterations) and numerically (wall time). We note that the PCTL, Schur1 and Schur2 preconditioned GMRES(30) solvers all converge robustly in at most 3 steps with respect to the number of processors and problem character, and run on average 3.4, 4.1 and 5.5 times faster than AMG on 352 cores.

19 22.7 18 14.7 17 9.4 19 5.8 16 3.0 2 6.8 2 4.3 2 2.5 2 1.7 2 1.1
18 22.0 18 15.0 18 9.9 18 5.6 17 3.1 2 6.7 2 4.3 2 2.5 2 1.8 2 1.2
18 21.9 17 14.5 19 10.2 19 5.8 19 3.3 2 6.8 2 4.2 2 2.5 2 1.7 2 1.1
34 55.6 32 23.1 36 16.5 32 8.2 34 4.5 2 6.9 2 4.2 2 2.6 2 1.6 2 1.1
26 53.7 24 22.5 24 13.7 27 7.4 30 4.3 2 7.0 2 4.3 2 2.6 2 1.6 2 1.1
27 49.8 29 31.3 33 17.8 29 7.6 32 4.4 3 8.5 3 5.2 2 2.7 3 2.1 2 1.1
Schur1 Schur2
2 6.8 2 4.0 2 2.5 2 1.6 2 0.9 2 6.1 2 3.8 2 2.1 2 1.3 2 0.7
2 6.7 2 4.1 2 2.5 2 1.5 2 0.9 2 6.1 2 3.9 2 2.1 2 1.2 2 0.7
2 6.8 2 4.1 2 2.4 2 1.4 2 0.9 2 6.2 2 3.8 2 2.3 2 1.1 2 0.6
2 6.8 2 4.2 2 2.5 2 1.5 2 0.9 2 6.2 2 3.9 2 2.2 2 1.3 2 0.7
2 6.8 2 4.1 2 2.5 3 1.9 2 1.0 2 6.2 2 4.0 2 2.3 2 1.2 2 0.7
3 8.2 3 5.0 2 2.5 3 1.9 2 0.9 3 7.4 3 4.7 2 2.2 3 1.6 2 0.7
Table 4: Number of iterations and wall time of four preconditioned GMRES(30) solvers on a grid in a strong scaling study.

The subsequent experiments form a weak parallel scaling study analyzing the convergence and numerical scalability of the GMRES(30) solvers preconditioned by AMG, PCTL, Schur1 and Schur2. We start from the coarsest mesh and refine it in all directions, so that the largest problem has about 16.9 million unknowns; the problems are run on 22, 88 and 352 cores, respectively, keeping 48,000 DoFs per processor. As shown in Table 5, all of the preconditioners tested here scale well in the weak numerical sense. The PCTL, Schur1 and Schur2 preconditioned GMRES(30) solvers converge robustly with respect to the discrete problem size, while the number of iterations of the AMG preconditioned GMRES(30) solver varies from 16 on the coarsest grid to 30 on the finest. Regarding the parallel efficiency [33], computed from the wall time on p processors, the average parallel efficiency of the AMG, PCTL, Schur1 and Schur2 preconditioned GMRES(30) solvers is 72.2%, 79.3%, 74.7% and 75.1%, respectively. Furthermore, in terms of wall time, Schur2 has a clear advantage over AMG, PCTL and Schur1.

Each row below corresponds to one of the six twenty-group linear systems; each entry is the iteration count followed by the wall time in seconds, with 48,000 DoFs kept per processor as the core count grows.

        AMG                       PCTL                      Schur1                    Schur2
cores:  22      88      352       22      88      352       22      88      352       22      88      352
        15 1.33 15 2.29 16 3.01   2 0.62  2 0.87  2 1.08    2 0.48  2 0.67  2 0.91    2 0.35  2 0.50  2 0.71
        14 1.25 17 2.49 17 3.13   2 0.58  2 0.85  2 1.15    2 0.46  2 0.65  2 0.93    2 0.36  2 0.49  2 0.69
        14 1.24 18 2.63 19 3.26   2 0.58  2 0.84  2 1.11    2 0.47  2 0.65  2 0.91    2 0.35  2 0.48  2 0.64
        28 2.05 30 3.55 34 4.52   2 0.59  2 0.88  2 1.12    2 0.46  2 0.64  2 0.89    2 0.35  2 0.46  2 0.67
        16 1.47 18 2.61 30 4.31   2 0.58  3 0.98  2 1.11    2 0.45  3 0.76  2 0.95    2 0.34  3 0.58  2 0.70
        21 1.68 21 2.78 32 4.40   2 0.59  2 0.86  2 1.09    2 0.46  3 0.76  2 0.94    2 0.34  3 0.59  2 0.72
Table 5: Number of iterations and wall time of four preconditioned GMRES(30) solvers in a weak scaling study.
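As a rough check on the weak-scaling behavior, the sketch below applies the textbook weak-scaling efficiency E_p = T_22 / T_p (which need not coincide with the exact metric used in [33]) to the first row of Table 5; a perfectly scaling solver would keep E_p = 1 as cores and unknowns grow together.

```python
# Wall times (seconds) from the first row of Table 5: one value per
# core count (22, 88, 352), with ~48,000 DoFs kept on every core.
wall_time = {
    "AMG":    [1.33, 2.29, 3.01],
    "PCTL":   [0.62, 0.87, 1.08],
    "Schur1": [0.48, 0.67, 0.91],
    "Schur2": [0.35, 0.50, 0.71],
}

# Textbook weak-scaling efficiency relative to the 22-core baseline:
# E_p = T_base / T_p, so constant wall time under refinement gives 1.
efficiency = {
    name: [times[0] / tp for tp in times]
    for name, times in wall_time.items()
}
for name, eff in efficiency.items():
    print(name, " ".join(f"{e:.2f}" for e in eff))
```
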

5 Conclusions

Three types of AMG-based block preconditioning techniques and two improvements have been introduced to solve the linear systems arising from the multi-group radiation diffusion (MGD) equations. They are easy to set up because they only involve operations on blocks that are readily extracted from the monolithic linear system. Our experiments show that they all scale well both algorithmically and in parallel, and exhibit overall better convergence behavior and computational efficiency than the monolithic AMG preconditioner.


Acknowledgments

This work is financially supported by the Science Challenge Project (TZ2016002), the National Natural Science Foundation of China (11601462, 11971414), the Project of the Scientific Research Fund of Hunan Provincial Science and Technology Department (2018WK4006), the Hunan Provincial Civil-Military Integration Industrial Development Project "Adaptive Multilevel Solver and Its Application in ICF Numerical Simulation" and the Hunan Provincial Natural Science Foundation of China (2018JJ3494). The numerical calculations in this paper were carried out on the supercomputing system of the National Supercomputing Center in Changsha.



References

  • (1) H. B. An, Z. Y. Mo, X. W. Xu, X. W. Jia, Operator-based preconditioning for the 2-D 3-T energy equations in radiation hydrodynamics simulations, J. Comput. Phys. 385 (2019) 51–74.
  • (2) Z. Z. Bai, G. H. Golub, L. Z. Lu, J. F. Yin, Block triangular and skew-Hermitian splitting methods for positive-definite linear systems, SIAM J. Sci. Comput. 26 (2005) 844–863.
  • (3) Z. Z. Bai, M. K. Ng, Z. Q. Wang, Constraint preconditioners for symmetric indefinite matrices, SIAM J. Matrix Anal. Appl. 31 (2009) 410–433.
  • (4) C. Baldwin, P. N. Brown, R. Falgout, F. Graziani, J. Jones, Iterative linear solvers in 2D radiation-hydrodynamics code: Methods and performance, J. Comput. Phys. 154 (1999) 1–40.
  • (5) F. P. A. Beik, M. Benzi, Block preconditioners for saddle point systems arising from liquid crystal directors modeling, Calcolo 55 (2018) 29.
  • (6) M. Benzi, B. Uçar, Block triangular preconditioners for M-matrices and Markov chains, Electron. Trans. Numer. Anal. 26 (2007) 209–227.
  • (7) Q. M. Bui, H. C. Elman, J. D. Moulton, Algebraic multigrid preconditioners for multiphase flow in porous media, SIAM J. Sci. Comput. 39 (2017) S662–S680.
  • (8) Q. M. Bui, L. Wang, D. Osei-Kuffuor, Algebraic multigrid preconditioners for two-phase flow in porous media with phase transitions, Adv. Water Resour. 114 (2018) 19–28.
  • (9) Z. H. Cao, Positive stable block triangular preconditioners for symmetric saddle point problems, Appl. Numer. Math. 57 (2007) 899–910.
  • (10) A. M. A. Côrtes, L. Dalcin, A. F. Sarmiento, N. Collier, V. M. Calo, A scalable block-preconditioning strategy for divergence-conforming B-spline discretizations of the Stokes problem, Comput. Methods Appl. Mech. Engrg. 316 (2017) 839–858.
  • (11) E. de Sturler, J. Liesen, Block-diagonal and constraint preconditioners for nonsymmetric indefinite linear systems. Part I: Theory, SIAM J. Sci. Comput. 26 (2005) 1598–1619.
  • (12) H. S. Dollar, Constraint-style preconditioners for regularized saddle-point problems, SIAM J. Matrix Anal. Appl. 29 (2007) 672–684.
  • (13) H. Elman, V. E. Howle, J. Shadid, R. Shuttleworth, R. Tuminaro, A taxonomy and comparison of parallel block multi-level preconditioners for the incompressible Navier-Stokes equations, J. Comput. Phys. 227 (2008) 1790–1808.
  • (14) R. D. Falgout, J. E. Jones, U. M. Yang, The design and implementation of hypre, a library of parallel high performance preconditioners, Lect. Notes Comput. Sci. Eng. 51 (2006) 267–294.
  • (15) Y. N. Gao, X. K. Zhao, Y. H. Li, Finite volume element methods for two-dimensional three-temperature radiation diffusion equations, Numer. Math. Theor. Meth. Appl. 9 (2016) 470–496.
  • (16) A. Hasečić, S. Muzaferija, I. Demirdžić, Finite volume method for radiative transport in multiphase flows with free surfaces, Numer. Heat Trans. A 70 (2016) 347–365.
  • (17) P. Hassanzadeh, G. D. Raithby, Finite-volume solution of the second-order radiative transfer equation: Accuracy and solution cost, Numer. Heat Trans. B 53 (2008) 374–382.
  • (18) V. E. Henson, U. M. Yang, BoomerAMG: A parallel algebraic multigrid solver and preconditioner, Appl. Numer. Math. 41 (2002) 155–177.
  • (19) Q. Y. Hu, L. Zhao, Domain decomposition preconditioners for the system generated by discontinuous Galerkin discretization of 2D-3T heat conduction equations, Commun. Comput. Phys. 22 (2017) 1069–1100.
  • (20) C. Keller, N. I. M. Gould, A. J. Wathen, Constraint preconditioning for indefinite linear systems, SIAM J. Matrix Anal. Appl. 21 (2000) 1300–1317.
  • (21) B. Lee, Multigrid for second-order ADN elliptic systems, Numer. Linear Algebra Appl. 26 (2019) e2256.
  • (22) J. J. Lee, K. A. Mardal, R. Winther, Parameter-robust discretization and preconditioning of Biot’s consolidation model, SIAM J. Sci. Comput. 39 (2017) A1–A24.
  • (23) Y. C. Ma, K. B. Hu, X. Z. Hu, J. Xu, Robust preconditioners for incompressible MHD models, J. Comput. Phys. 316 (2016) 721–746.
  • (24) D. Mihalas, B. Weibel-Mihalas, Foundations of radiation hydrodynamics, Dover, 1999.
  • (25) Z. Y. Mo, L. J. Shen, G. Wittum, Parallel adaptive multigrid algorithm for 2-D 3-T diffusion equations, Int. J. Comput. Math. 81 (2004) 361–374.
  • (26) C. Y. Nie, S. Shu, H. Y. Yu, J. Wu, Superconvergence and asymptotic expansions for bilinear finite volume element approximations, Numer. Math. Theor. Meth. Appl. 6 (2013) 408–423.
  • (27) W. B. Pei, The construction of simulation algorithms for laser fusion, Commun. Comput. Phys. 2 (2007) 255–270.
  • (28) G. Peng, Z. M. Gao, W. J. Yan, X. L. Feng, A positivity-preserving finite volume scheme for three-temperature radiation diffusion equations, Appl. Numer. Math. (2020), doi:
  • (29) J. Pestana, R. Muddle, M. Heil, F. Tisseur, M. Mihajlović, Efficient block preconditioning for a finite element discretization of the Dirichlet biharmonic problem, SIAM J. Sci. Comput. 38 (2016) A325–A345.
  • (30) A. Quarteroni, A. Valli, Numerical approximation of partial differential equations, Springer-Verlag, 1994.
  • (31) Y. H. Ran, J. G. Wang, D. L. Wang, On preconditioners based on HSS for the space fractional CNLS equations, East Asian J. Appl. Math. 7 (2017) 70–81.
  • (32) J. W. Ruge, K. Stüben, Algebraic multigrid, in Multigrid Methods, Front. Appl. Math. 3 (1987) 73–130.
  • (33) S. Sahni, V. Thanvantri, Performance metrics: Keeping the focus on runtime, IEEE Parall. Distrib. 4 (1996) 43–56.
  • (34) Z. Q. Sheng, J. Y. Yue, G. W. Yuan, Monotone finite volume schemes of nonequilibrium radiation diffusion equations on distorted meshes, SIAM J. Sci. Comput. 31 (2009) 2915–2934.
  • (35) K. Stüben, A review of algebraic multigrid, J. Comput. Appl. Math. 128 (2001) 281–309.
  • (36) J. Sundnes, G. T. Lines, K. A. Mardal, A. Tveito, Multigrid block preconditioning for a coupled system of partial differential equations modeling the electrical activity in the heart, Comput. Method Biomec. 5 (2002) 397–409.
  • (37) P. S. Vassilevski, U. Villa, A block-diagonal algebraic multigrid preconditioner for the Brinkman problem, SIAM J. Sci. Comput. 35 (2013) S3–S17.
  • (38) C. Wang, T. Z. Huang, C. Wen, A new preconditioner for indefinite and asymmetric matrices, Appl. Math. Comput. 219 (2013) 11036–11043.
  • (39) B. Weston, R. Nourgaliev, J. P. Delplanque, A. T. Barker, Preconditioning a Newton-Krylov solver for all-speed melt pool flow physics, J. Comput. Phys. 397 (2019) 108847.
  • (40) J. A. White, R. I. Borja, Block-preconditioned Newton-Krylov solvers for fully coupled flow and geomechanics, Comput. Geosci. 15 (2011) 647–659.
  • (41) H. Xie, X. Xu, C. L. Zhai, H. Yong, A positivity-preserving finite volume scheme for heat conduction equation on generalized polyhedral meshes, Commun. Comput. Phys. 24 (2018) 1375–1408.
  • (42) J. Xu, L. Zikatanov, Algebraic multigrid methods, Acta Numer. 26 (2017) 591–721.
  • (43) X. W. Xu, Parallel algebraic multigrid methods: State-of-the art and challenges for extreme-scale applications, J. Num. Method Comp. Appl. 40 (2019) 243–260.
  • (44) X. W. Xu, Z. Y. Mo, Algebraic interface-based coarsening AMG preconditioner for multiscale sparse matrices with applications to radiation hydrodynamics computation, Numer. Linear Algebra Appl. 24 (2017) e2078.
  • (45) X. W. Xu, Z. Y. Mo, H. B. An, Algebraic two-level iterative method for 2-D 3-T radiation diffusion equations, Chinese J. Comput. Phys. 26 (2009) 1–8.
  • (46) X. W. Xu, Z. Y. Mo, H. B. An, An adaptive AMG preconditioning strategy for solving large-scale sparse linear systems, Sci. Sin. Inf. 46 (2016) 1411–1420.
  • (47) X. J. Yang, X. K. Liao, W. X. Xu, J. Q. Song, Q. F. Hu, J. S. Su, L. Q. Xiao, K. Lu, Q. Dou, J. P. Jiang, C. Q. Yang, TH-1: China’s first petaflop supercomputer, Front. Comput. Sci. China 4 (2010) 445–455.
  • (48) X. Q. Yue, S. Shu, J. X. Wang, Z. Y. Zhou, Substructuring preconditioners with a simple coarse space for 2-D 3-T radiation diffusion equations, Commun. Comput. Phys. 23 (2018) 540–560.
  • (49) X. Q. Yue, S. Shu, X. W. Xu, Z. Y. Zhou, An adaptive combined preconditioner with applications in radiation diffusion equations, Commun. Comput. Phys. 18 (2015) 1313–1335.
  • (50) X. Q. Yue, X. W. Xu, S. Shu, JASMIN-based two-dimensional adaptive combined preconditioner for radiation diffusion equations in inertial fusion research, East Asian J. Appl. Math. 7 (2017) 495–507.
  • (51) X. Q. Yue, Z. Y. Zhou, X. W. Xu, S. Shu, A parallel adaptive PCTL preconditioner for 3-T radiation diffusion equations, Nat. Sci. J. Xiangtan Univ. 40 (2018) 6–10.
  • (52) Q. F. Zhang, C. J. Zhang, Block preconditioning strategies for nonlinear viscous wave equations, Appl. Math. Model. 37 (2013) 5801–5813.