In the simulation of astrophysics and inertial confinement fusion (ICF), it is needed to solve radiation diffusion equations. This is a strongly nonlinear system, and the discretized equations are solved by some nonlinear iterative methods, such as Newton kind of methods, Picard method, and source iteration method for multi-group radiation diffusion equations [2, 13, 17, 25]. In each nonlinear iteration, some linearized equations should be solved. In numerical simulations of ICF, usually time steps are needed to finish the whole simulation, and a series of linear equations should be solved. The solution time for linear equations dominates the real numerical simulation cost [1, 23, 25]. It is important to design an efficient method for solving linear equations in ICF and astrophysics simulations.
Usually, the preconditioned Krylov methods  are used to solve the linear systems in ICF simulations. For a preconditioned Krylov method, there are four basic procedures: the construction and application of a preconditioner, the choice of an initial iterate, the computation of the next iterate and the check for stopping criterion [9, 22]. Each procedure has its own effect on the iterative method. In past years, much research work has been focused on preconditioning. Among them, AMG preconditioner is the most widely used one in the simulation of radiation diffusion equations [2, 17, 23, 25].
For a series of linear systems, several techniques for choosing an efficient initial iterate have been proposed in recent years. For linear systems with the same coefficient matrix and different right-hand sides, a method for constructing an efficient initial iterate was given in . Later, Chan further considered the choice of initial iterate for a series of equations with different coefficient matrices in . Fischer proposed another method, in which an approximate solution based upon previous solutions was used . In , a model reduction method based on proper orthogonal decomposition (POD) was used to guess a better initial iterate. The projection methods were also used in [3, 6, 10, 14]. In [9, 20], a better initial guess was given by the extrapolation of previous solutions. An et al. proposed a kind of method for choosing a nonlinear initial iterate for solving two dimension three-temperature (2D 3T) heat conduction equations in . By solving the nonlinear system on a sub-domain where the temperature varied greatly, the method in  can improve the computational efficiency effectively. In [8, 12, 24], some subsystems are constructed to accelerate the convergence of nonlinear problems.
In the simulation of ICF, with time step advancing, the physical variables, such as density, pressure, temperature, etc., vary greatly in some local domain and mildly in the rest domain. This is reflected in the radiation diffusion equations, where the unknown is the temperature or radiation energy. From the perspective of solving linear systems from one nonlinear iteration to the next iteration, or from one time step to the next, the solution only varies greatly in some local domain while mildly in the rest. In ICF simulations, the initial iterate for solving linear equations is usually set as the previous nonlinear iterate or previous time step solution. Therefore, the solution on some local domain where the solution varies greatly has a strong influence on the convergence of an iterative method, while the solution on the rest domain has little influence on the method.
In this paper, we propose a local character-based method for solving the linear systems arising from the simulation of radiation diffusion equations. In this method, a local domain is first constructed, corresponding to a subset of the global degrees-of-freedom, on which the solution is expected to vary greatly. Then, the subsystem on the local domain is solved. At last, the whole system is solved by using the solution on the local domain. Two methods are given to construct the local domain. Specifically, the contributions of this paper are summarized as follows:
A local character-based method for solving linear systems of radiation diffusion equations is proposed.
Two methods are given to construct the local domain. One is based on the spatial gradient and the other is based on the residual.
Some numerical experiments are conducted to verify the effectiveness of the proposed method. Compared with the AMG preconditioned Krylov method, the local character-based method can reduce the solution cost in most cases, and 40% cost can be saved at most.
The rest of the paper is organized as follows. The detail of the local character-based method is presented in section 2, and also two methods for constructing the local domain are presented in this section. In section 3, some numerical results are given and also the choice of the parameter for the local character-based method is analyzed in this section. Finally, section 4 is the summary and some remarks of the paper.
2 The local character-based method
In this section, the local character-based method is introduced. The basic algorithm is presented in section 2.1. Two methods for constructing the local domain are introduced in the rest. The first one, which is based on the spatial gradient, will be presented in section 2.2. The second one, which is based on the residual, will be presented in section 2.3.
2.1 The local character-based algorithm
Consider a linear system resulting from implicit time propagation of discretized time dependent partial differential equations (PDEs), or the implicit linearization of a nonlinear system of PDEs:
Eq (2.1) is usually obtained from some real applications, such as the application of radiation diffusion problem, etc. For solving Eq (2.1), an iterative method, particularly a Krylov method is often used. Because the linear system is obtained in the nonlinear iteration on some time steps, an effective initial iterate , which is the solution of the last nonlinear iterate or the solution at last time step, is used. In the simulation process of ICF, the temperature and radiation energy only varies greatly in some local domain and varies mildly in the rest domain. Therefore, if an iterative method is used to solve Eq (2.1), only some components of the iterate will change greatly from the initial iterate to the converged solution.
For purpose of clarity, in the following we introduce some notations. Let be the index set of all components (degrees-of-freedom) of the linear system (2.1). The linear system considered in this paper is discretized from a partial differential equations, each component corresponds to one spatial point (for example, a node of the grid) in the computing domain of PDEs. Therefore, a component will also be called a point in the following discussion. can also be regarded as an algebraic “domain”. Assume that, from the initial iterate to the converged iterate, the solution varies greatly at components. For convenience, the index set of these components is denoted as . Each component in is also called a bad point. is called a local domain in the following. Let . Based on the partition
the linear system (2.1) can be partitioned as
Here is a permutation matrix obtained by exchanging the rows and (
) of the identity matrix.
In Eq (1), is expected to change greatly from the initial iterate to the converged solution, while is expected to change mildly from the initial iterate to the converged solution. In the local character-based method, first the subsystem
will be solved, and the solution will be obtained. Then
In Algorithm 1, there are two key ingredients: the first one is the construction of the subset in Line 2; the second one is the solution of the local subsystem in Line 5. The construction of is the base of the algorithm. By , the whole system can be partitioned into two subsystems, and particularly the local subsystem is defined. By solving the local subsystem, a solution on is obtained. After assembling the whole solution as the initial iterate, the whole system will be solved by an iterative method. Because the solution on is expected to change mildly, is expected to be very near to the final solution. Therefore will be very near to the solution. Actually, in some of our numerical cases, the residual corresponding to is so small that the convergence criteria is satisfied, and is a final solution.
The construction of the local domain, or the subset , plays a key role in Algorithm 1. In the following, two methods will be introduced to construct the local domain.
2.2 Gradient-based local domain construction
The linear system (2.1) is discretized from a partial differential equations (such as radiation diffusion equations). For the solution of a partial differential equations, its gradient can be defined. For example, if is a solution of a two dimensional radiation diffusion equation at time step , then the gradient of is defined as . The norm of the gradient, , can be used to predict the variation of the solution from time step to time step .
For the linear system (2.1), the similar norm of the gradient can be defined, which can be used to construct the local domain . For a point , if the norm of the gradient of the initial iterate is large at this point, then it is likely that the solution may change greatly around this point, which will be considered as an element for the local domain . Specifically, for a given row of the matrix , the non-zero entries of this row reflect the adjacency relationship of the point to other points. Therefore, the gradient at point (component ) is defined by
Note that this definition is not the exact gradient of a field (such as temperature, density, etc.) in two-dimensional or three-dimensional space, but just a mimicry. More precisely, this is very similar to the norm of the gradient at each point . For example, if is obtained by the five-point difference scheme, then the summation in (4) includes four terms, as shown in Fig. 1, in which and are used as indices for cells. By definition in (4), the gradient of cell is
2.3 Residual-based local domain construction
The second method for constructing the local domain is based on the residual of the linear system. The method consists of two steps. First find the components with large residual, and the bad points set, or the local domain, , will be constructed. Second consider the influence of the bad points on other degrees-of-freedom, and the set of bad points then extended to include other degrees-of-freedom on which the solution is also likely to change.
2.3.1 Finding bad points
To iteratively solve the linear system (2.1), a commonly used convergence criteria is given by
where is a prescribed tolerance and is an iterative solution. Since
the condition (5) will be satisfied if
which is equivalent to
Now, by the initial iterate and the convergence criteria tolerance , a component is defined as a bad point if
Specifically, the initial local domain is defined by
2.3.2 Expanding the local domain
When the local subsystem of the bad points is solved, the solution at the bad points is changed. This then has further influence on points adjacent to the bad points. Therefore, the set of the bad points will further be extended. That is, the subset of the bad points (local domain) obtained by (6) will further be expanded.
By the residuals, the influence of the local domain to the point can be given.
Assume that is a local domain. For one point , let be the residual at point with the initial iterate, and let be the residual at point after solving the local domain subsystem. If and , then the local domain has influence on the point ; otherwise, the local domain has no influence on the point .
It is easy to see that
which means the relative variation of the solution at all the bad points is less than one (This is satisfied in our numerical tests). Then it is easy to see that
This inequality shows that if
then the local domain will have not influence on point (by Definition 2.1). Otherwise, the local domain will be expected to have influence on point . In this case, the local domain will be extended. Specifically, assume that is a neighbor of . If
then will be incorporated into the local domain subset .
Note that the extension process can be implemented several times until the size of the subset no longer increases. In practice, a few times of expanding is enough. The specific process is given in Algorithm 3.
In Algorithm 3, two steps are used to construct the local domain . The first step is to construct by the component residuals, and those components with large residuals are incorporated into . The second step is to extend by at most iterations. In each expanding iteration, some neighbors of are further incorporated into .
The following example shows the specific implementation of Algorithm 3. Consider a linear system with the coefficient matrix given by
The solution and the initial iterate are set, respectively, as
The right hand side is set to .
Consider the application of Algorithm 3 to this linear system. The convergence criterion is set to , and the parameter for maximal number of extension is set to . It is easy to check that . The initial residual is
In the first step of Algorithm 3, points 1, 2 and 3 are marked as the bad points, i.e., . In the second step of Algorithm 3, is further extended. The loop of extension process is implemented 4 times. After each loop, the main results are listed in the following table. In the fourth iteration, and , therefore, and no point will be incorporated into . The loop will break.
2.4 Parallelization of the local character-based methods
For parallel computing, assume that processors are used. For simplicity, assume that , where is an integer. The matrix is partitioned by rows, that is
where is a matrix, . The solution and right-hand side are partitioned correspondingly
where and are vectors, . For convenience, let
be the component index set of the -th processor, .
The key work in the parallel version of the algorithm is to construct the local domain . When the local domain is constructed, the local domain subsystem can be obtained easily. For solving the local domain subsystem, any parallel solution method can be used. In the following, Algorithm 4 and Algorithm 5 are given for the gradient based and residual based local domain construction in parallel computing case.
3 Numerical Results
In this section, one test model and two suites of real application linear systems are used to test the effectiveness of the local character-based method. The test model is a two-dimensional nonlinear heat conduction equation. The two suites of real application linear systems are respectively the multi-group radiation diffusion equations and the three temperature energy equations.
All experiments are carried out on an E5-2620 CPU, which is clocked at 2.10 GHz and has a 16 GB memory. E5-2620 has a total of 12 cores on two sockets.
The local character-based method is compared with the typical solution method. In the test, the compared method is a BoomerAMG preconditioned GMRES method. For BoomerAMG, Falgout coarsening (a combination of CLJP and the classical Ruge-Stuben coarsening) and “classical” interpolation are used, the maximal number of levels is set to 8, and one symmetric Gauss-Seidel iteration is used for both pre-relaxation and post-relaxation. The maximal GMRES iteration number is set to 80, and the Krylov dimension is 40.
In the local character-based method, the local domain subsystem is solved by the BoomerAMG preconditioned GMRES method with completely the same parameters for solving the global system. One Gauss-Seidel iteration is applied to smooth . In the local character-based methods, the parameters in Algorithm 2 and in Algorithm 3 are predefined by users, and some numerical analysis is given for different and .
For the local character-based method, after solving the sub-domain system, the assembled solution in (3) satisfies the convergence criteria in most of the tests. Therefore, no further iteration is implemented for the global system in most cases.
For the purpose of presenting the results, the following notations are used to represent the specific methods:
: BoomerAMG preconditioned GMRES method.
: Local character-based method in which the local domain is constructed by Algorithm 2.
: Local character-based method in which the local domain is constructed by Algorithm 3.
Furthermore, the following notations will be used to report the results.
: the scale of the global linear system.
: the scale of the sub-domain linear system in the local character-based method.
: percentage of the scale of the sub-domain linear system to the scale of the global system.
: CPU time for solving the linear system by BoomerAMG preconditioned GMRES method.
: CPU time for solving the linear system by the local character-based method.
: CPU time for constructing the local domain in the local character-based method.
: CPU time for solving the sub-domain system in the local character-based method.
: the speedup of CPU time for solving the linear system by the local character-based method to BoomerAMG preconditioned GMRES method.
3.1 Test of two-dimensional heat conduction equation
Consider the following two-dimensional (2-D) heat conduction equations:
In this test, , , and are set respectively as
By using the backward Euler method to discretize in time, one obtains
The mesh is , and a five-point finite difference method is used to discretize in space, and a nonlinear system is obtained.
In this test, the time step size is . The simulation is implemented from physical time 0 to 1, and the total number of time steps is 100. The Picard method is used to solve the system, and the linear system in Picard iteration will be solved by , , or . The parameter in is set to , and the in is set to 1. For clarity, the time advancing and nonlinear iteration process is given in Algorithm 6.
The loop in Algorithm 6 is for time advancing, and the loop is for nonlinear iteration. Eq. (9) is the Picard linearized system. The convergence tolerance for the linear system is , and the stopping criterion for the nonlinear Picard iteration is . In the computation, the previous nonlinear iterate is used as the initial guess for solving the current linear system.
First a comparison between the solutions obtained by the three different methods is conducted. Fig. 2 shows the difference in solution obtained by different methods as a function of time, given by
Here, , , and are the solutions obtained by , , and , respectively, at each time step. From this figure, one can see that for all 100 time steps, the maximal value of and are, respectively, and . This shows that the approximation solution obtained by different methods are very close and they can be considered almost the same.
In the test, if , , or is used to solve all the linear systems for all 100 time steps, the total CPU time for solving the linear systems is respectively , , and . Therefore, the speedups of and to are about 1.40 and 1.32, respectively.
Fig. 3 shows the average speedups of the two local character-based methods at each time step. At same time, the average percentage of the scale of the sub-domain linear system to the scale of the global system, , for two local character-based methods is shown in the figure. From this figure, one can see that both methods have the similar performance in the simulation. With time advancing, the speedup deceases, and increases. At later simulation time, performs a little better than .
Fig. 4 shows the specific speedup and percentage of the scale of the sub-domain linear system to the scale of the global system at each nonlinear iteration for the two local character-based methods. In the figure, the results at four time steps are given. From this figure, one can see that have a relatively stable performance at each time step. At the same time, the percentage of the sub-domain linear system is stable for all nonlinear iterations at each time. With time advancing, the performance of decreases. For , one can see that it has similar performance for the early 6 nonlinear iterations at all time steps. For the later nonlinear iterations, performs unstable. At about three nonlinear iterations, its performance deteriorates.
For 2-D heat conduction equation, the local domains determined by and are almost the same. As examples, Fig. 5 shows the specific local domain in the first nonlinear iteration at time steps 25, 50, 75, and 100. At the same time, the solutions at these two time steps are shown in the figure. One see can that the local domain and are almost the same, and they just contain the region where the solution varies greatly. Here represents the local domain obtained by Algorithm 2 and represents the local domain obtained by Algorithm 3.
3.2 Scalability of the local character-based methods for the 2-D heat conduction equation
In this section, the scalability test of the local character-based methods are done for the 2-D heat conduction equation. In the test, the scale on each processor is fixed as , and the number of processors varies from 1 to 256. For all tests, the time step size is fixed as , and the number of time steps is 100. The test results are listed in Table 2. In the table, the following notations are used:
: CPU time for solving all the linear systems in the simulation.
: Total number of linear equations in the simulation.
: The average CPU time for solving one linear system.
: The speedup of and to (based on the average CPU time for solving one linear system).
From Table 2, one can see that and scale better than by comparing the the total CPU time for solving all the linear equations, . By comparing the numbers of linear equations for different methods, we find that more linear equations are solved for and than . By comparing the average CPU time for solving one linear equation, one can see that and perform much better than . Correspondingly, the speedups of and (for solving one linear equation) are from 1.42 to 3.79. With the increase of the scale, the speedup increases for both and .
3.3 Test of multi-group radiation diffusion equations
In this subsection, the local character-based method will be used to solve the multi-group radiation diffusion equations in real applications. The linear equations for the test are extracted from the radiation hydrodynamics code—Lared-integration, which is used for inertial confinement fusion (ICF) study .
3.3.1 Multi-group radiation diffusion equations
The multi-group radiation diffusion equations are a set of coupled partial differential equations, which are given as follows:
In the equations,
is the number of energy groups.
is the radiation energy density of the -th group.
is the radiation diffusion coefficient of the -th group.
is the integration of the Planck function of the -th group.
is the cross section of the -th group.
In real applications, the multi-group radiation diffusion equations are coupled to electron-ion equations, and they form a strong nonlinear system. For more details about the radiation diffusion model, see [11, 25]. We will focus on the solution for the multi-group radiation diffusion equations in this paper. Equations (10) are a time-dependent nonlinear system. The backward Euler method is used to discretize in time. For solving the nonlinear system after temporal discretization, a nonlinear iteration (based on Picard-Newton linearization) is used, and in each nonlinear iteration, the source iteration method is used to solve the multi-group radiation diffusion equations [11, 16]. In each source iteration, the radiation diffusion equations are decoupled, and each group radiation diffusion equation can be solved independently. By considering the time advancing, Algorithm 7 can be used to describe the main solution procedure for multi-group radiation diffusion equations.
A Lagrangian method is used in the Lared-integration code and the mesh moves with the motion of the fluid. Consequently, the multi-group radiation diffusion equations are discretized on two-dimensional deforming meshes. A nine-point scheme is used to discretize the diffusion equations.
3.3.2 Linear systems for test
In the test, the number of cells is 7040. The number of groups is , and therefore, 64 equations must be solved in each source iteration.
Linear systems are tested for all groups, in a given source iteration of a given nonlinear iteration, at time steps 1,000 and 5,000. For notation, each linear system is denoted as TNSG, which means the linear system of group in the th source iteration of th nonlinear iteration at th time step. For example, T1000N0S0G0 represents the linear system of -th group in -th source iteration of -th nonlinear iteration at time step 1,000. All the coefficient matrices of the linear system have the same sparsity pattern, shown in Fig. 6.
3.3.3 Test results
We test a total of 128 linear systems on two time steps, i.e., T1000N0S0G0-G63 and T5000N0S0G0-G63. In this test, the parameter in is set to , and in is set to 1. The convergence tolerance