Hybridized discontinuous Galerkin (HDG) methods introduced a decade ago CockburnGopalakrishnanLazarov:2009:UHO have now been developed for a wide range of PDEs including, but not limited to, Poisson-type equation CockburnGopalakrishnanLazarov:2009:UHO ; CockburnGopalakrishnanSayas10 , Stokes equation CockburnGopalakrishnan09 ; NguyenPeraireCockburn10 ; rhebergen2017analysis , Euler and Navier-Stokes equations NguyenPeraireCockburn11 ; MoroNguyenPeraire11 ; rhebergen2018hybridizable , wave equations NguyenPeraireCockburn11b ; NguyenPeraireCockburn11a ; GriesmaierMonk11 ; lee2017analysis . In Bui-Thanh15 ; Bui-Thanh15a ; bui2016construction , an upwind HDG framework was proposed that provides a unified and a systematic construction of HDG methods for a large class of PDEs.
Roughly speaking, HDG methods combine the advantages of hybrid(ized) methods and discontinuous Galerkin (DG) discretizations. In particular, its inherent characteristics from DG include: i) arbitrary high-order with compact stencil; ii) ability to handle complex geometries; iii) local conservation; and iv) upwinding for hyperbolic systems. On the other hand, it also possesses the advantages of hybrid(ized) methods, namely, i) having smaller and sparser linear system for steady state problems or time-dependent problems with implicit time integrators; ii) -adaptivity-ready using the trace space; iii) facilitating multinumerics with different hybrid(ized) methods in different parts of the domain; and iv) when applicable, providing superconvergence by local post-processing AB85 ; CockburnGopalakrishnanLazarov:2009:UHO . Thus for complex multiphysics applications with disparate spatial and temporal scales (e.g. magnetohydrodynamics and atmospheric flows), high-order HDG spatial discretization together with high-order implicit time integrator is a strong candidate for large scale simulations in modern extreme-scale computing architectures owing to its high computation-to-communication ratio.
The main challenge facing hybridized methods is, however, the construction of scalable solvers/preconditioners for the resulting trace systems. Over the past 30 years, a tremendous amount of research has been devoted to the convergence of multigrid methods for such linear systems, both as iterative methods and as preconditioners for Krylov subspace methods. Optimal convergence with respect to the number of unknowns is usually obtained under mild elliptic regularity assumptions bramble1993multigrid ; Bramble94uniformconvergence ; bpx1991 . Multigrid algorithms have been developed for mortar domain decomposition methods 339353 ; YotovMultigrid . Several multigrid algorithms have been proposed for hybridized mixed finite element methods Bren_MG_MFE_92 ; Chen_equiv_96 , whose optimal convergence has already been established Braess:1990 ; Brenner:nonconfmg . Multigrid algorithms based on restricting the trace (skeletal) space to linear continuous finite element space has been proposed for hybridized mixed methods Gopalakrishnan09aconvergent , hybridized discontinuous Galerkin methods cockburn2014multigrid and weak Galerkin methods chen2015auxiliary .
Iterative solvers/preconditioners for solving HDG trace systems are, however, scarced. Recent efforts on multigrid methods have been presented for elliptic PDEs cockburn2014multigrid ; chen2014robust ; kronbichler2018performance ; wildey2018unified . Attempts using domain decomposition type solvers/preconditioners have been proposed for elliptic PDEs GanderHajian:2015:ASM ; gander2018analysis , Maxwell’s equations li2014hybridizable ; he2016optimized , and hyperbolic systems iHDG ; iHDGII ; diosady2011domain . Recently, an approximate block factorization preconditioner for Stokes equations have been developed in rhebergen2018preconditioning . Thus, there is a critical need for developing robust, scalable solvers/preconditioners for high-order HDG methods to tackle high-fidelity large-scale simulations of multiphysics/multiscale systems. As a step towards to achieve this goal, we propose a multilevel approach for both solving and preconditioning the trace system of HDG discretizations. As will be demonstrated, unlike existing approaches our proposed algorithms are reasonably robust and scalable beyond elliptic PDEs.
Now let us briefly discuss the main idea behind our approach. The goal is to advance the nested dissection george1973nested —a fill-in reducing direct solver strategy—to create a scalable and robust solver utilizing the high-order and variational structure of HDG methods. This is achieved by projecting the skeletal data at different levels to either same or high-order polynomial on a set of increasingly coarser edges/faces. Exploiting the concept of two-level domain decomposition methods we make use of our approach as a coarse solver together with a fine scale solver (e.g. block-Jacobi) to create a solver/preconditioner for solving the trace system iteratively. Thanks to its root in direct solver strategy, the behavior of our approach seems to depend only on the solution smoothness, but otherwise is independent of the nature of the underlying PDE. Indeed, the numerical experiments show that the algorithms are robust even for transport equation with discontinuous solution and elliptic equations with highly heterogeneous and discontinuous permeability. For convection-diffusion equations our multilevel preconditioning algorithms are scalable and reasonably robust for not only diffusion-dominated but also moderately convection-dominated regimes. We show that the two-level approach can also be interpreted as a multigrid algorithm with specific intergrid transfer and smoothing operators
. Our complexity estimates show that the cost of the multilevel algorithms is somewhat in between the cost of nested dissection and standard multigrid solvers.
This paper is organized as follows. Section 2 introduces the model problem, notations, and an upwind HDG method considered in this paper. In Section 3, we first recall the nested dissection approach and then explain how it can be advanced using HDG variational structure and the two-level domain decomposition approach. We also show that our approach can be interpreted as a multigrid method, and estimate the complexity of the proposed multilevel solvers. Section 4 presents several numerical examples to study the robustness and scalability of the proposed algorithm for Poisson, transport and convection-diffusion equations as the mesh and the solution order are refined. Finally, Section 5 summarizes our findings and discusses future research directions.
2 Model Problem, notations, and an upwind HDG method
We consider the following model problem
where is an open, bounded, and connected subset of , with 222Note that the treatment for 1D problems is trivial and hence omitted.. Here,
is a symmetric, bounded, and uniformly positive definite tensor. Letbe a conforming partition of into non-overlapping elements , , with Lipschitz boundaries such that and . The mesh size is defined as . We denote the skeleton of the mesh by : the set of all (uniquely defined) interfaces between elements. We conventionally identify
as the outward normal vector on the boundaryof element (also denoted as ) and as the outward normal vector of the boundary of a neighboring element (also denoted as ). Furthermore, we use to denote either or in an expression that is valid for both cases, and this convention is also used for other quantities (restricted) on a face .
For simplicity, we define as the -inner product on a domain and as the -inner product on a domain if . We shall use as the induced norm for both cases. Boldface lowercase letters are conventionally used for vector-valued functions and in that case the inner product is defined as , and similarly , where is the number of components () of . Moreover, we define and whose induced norms are clear, and hence their definitions are omitted. We employ boldface uppercase letters, e.g. , to denote matrices and tensors. We denote by the space of tensor product polynomials of degree at most in each dimension on a domain . We use the terms “skeletal unknowns” and “trace unknowns” interchangeably and they both refer to the unknowns on the mesh skeleton.
First, we cast equation (1) into the following first-order form:
For simplicity, we have suppressed the explicit statement that equations (3a), (3b) and (3c) must hold for all test functions , , and , respectively (this is implicitly understood throughout the paper), where , and are defined as
and similar spaces , and on and can be defined by replacing with and with , respectively.
3 A Multilevel solver for the HDG trace system
The HDG solution process involves the following steps:
The main advantage of this Schur complement approach is that, for high-order, the global trace system is much smaller and sparser compared to the linear system for the volume unknowns CockburnGopalakrishnanLazarov:2009:UHO ; bui2016construction . Since Steps 1. and 3. are embarassingly parallel, the main bottle neck for HDG in large scale simulations is the solution of the global trace system (Step 2.).
3.1 A brief review on solvers/preconditioners for HDG trace system
In this section we briefly discuss existing works on solvers for HDG methods and our contributions. The first geometric multigrid solver for HDG methods was introduced in cockburn2014multigrid . The main idea was to transfer the residual from the skeletal space to linear continuous Galerkin FEM space, and then carry out the standard multigrid algorithm. A similar concept with few modifications was pursued in chen2014robust for the simulation of high frequency Helmholtz equation discretized by HDG. In kronbichler2018performance the authors studied a version of the multigrid algorithm proposed in cockburn2014multigrid along with multigrid algorithms for continuous Galerkin and interior penalty discontinuous Galerkin for standard elliptic equation. They concluded that both continuous and interior penalty discontinuous Galerkin algorithms with multigrid outperforms HDG with multigrid in terms of time to solution by a significant margin. One level Schwarz type domain decomposition algorithms in the context of HDG have been studied for elliptic equation GanderHajian:2015:ASM ; gander2018analysis , hyperbolic systems iHDG ; iHDGII and Maxwell’s equations li2014hybridizable ; he2016optimized . A balancing domain decomposition by constraints algorithm for HDG was introduced in diosady2011domain and studied for Euler and Navier-Stokes equations. A unified geometric multigrid algorithm based on Dirichlet-to-Neumann maps was developed in wildey2018unified for hybridized methods including HDG. An approximate block factorization based preconditioner for HDG discretization of Stokes system was presented in rhebergen2018preconditioning .
The objective of our current work is to develop a robust multilevel solver and preconditioner for HDG discretizations for a wide variety of PDEs. The ultimate goal is to significantly reduce factorization and memory costs compared to a direct solver. Unlike cost reduction strategies for direct solvers in martinsson2009fast ; gillman2014direct ; ho2016hierarchical which utilizes the elliptic nature of PDEs, here we exploit the high-order and variational structure of HDG methods. As a result, our method is applicable to not only elliptic but also parabolic, hyperbolic, and mixed-type PDEs. For ease of the exposition and implementation, we will focus only on structured grids. While extending the algorithm to block-structured or nested grids is fairly straightforward, applying it to a completely unstructured grid is a non-trivial task and hence left for future work.
3.2 Nested dissection
As nested dissection idea is utilized in the proposed multilevel algorithm, we briefly review its concept (more details can be found in george1973nested ; george1978automatic ; lipton1979generalized ; liu1992multifrontal ; duff1983multifrontal ; davis2006direct ). Nested dissection (ND) is a fill-in reducing ordering strategy introduced in 1973 george1973nested for efficient solution of linear systems. Consider a solution on an quadrilateral HDG skeletal mesh in Figure 1(a) (the boundary nodes are eliminated for clarity). In the ND algorithm, we identify a set of separators which divide the mesh into independent subdomains. For example, the black nodes in Figure 1(a) divide the mesh into four independent subdomains each of which can be recursively divided into four subdomains and so on. We then order the nodes such that the red ones are ordered first, followed by blue and then the black ones. This will enable a recursive Schur complement approach in a multilevel fashion and the nodes remaining in the system after elimination at each level is shown in Figure 1 (for three levels).
There are several advantages to this algorithm. First, it can be shown that the serial complexity of factorization for an matrix arising from 2D problems with this procedure is , and the memory requirement is george1973nested . Whereas with a naive lexicographic ordering, it is for factorization and for memory george1973nested . Moreover, in 2D it is optimal in the sense that the lower bound of the cost for factorization using any ordering algorithm is george1973nested . Second, all the leaf calculations at any level are independent of each other and hence are amenable to parallel implementation. However, in 3D the cost is for factorization and for memory george1973nested , and we no longer have the tremendous savings as in 2D eijkhout2014introduction . This can be intuitively understood using Figure 1. The separator fronts (the crosses at each level in Figure 1) grow in size as the level increases. For example, the black crosses have more edges and nodes than the blue ones, which in turn has more edges and nodes than the red ones. On the last level, the size is for 2D and the cost of a dense matrix factorization for the separator front matrix corresponding to the last level is . In 3D the size of the separator at last level is and hence the factorization cost becomes . Thus in order to cut down the factorization and storage cost of the ND algorithm we need to reduce the front growth.
There have been many efforts in this direction over the past decade martinsson2009fast ; xia2009superfast ; schmitz2012fast ; gillman2014direct ; ho2016hierarchical . The basic idea in these approaches is to exploit the low rank structure of the off-diagonal blocks, a characteristic of elliptic PDEs, to compress the fronts. In this way one can obtain a solver which is or in both 2D and 3D gillman2014direct ; ho2016hierarchical . Unfortunately, since the compression capability is a direct consequence of the ellipticity, it is not trivially applicable for convection-dominated or pure hyperbolic PDEs. Our goal here is to construct a multilevel algorithm that is independent of the nature of PDE and at the same time more efficient than ND. At the heart of our approach is the exploitation of the high-order properties of HDG and the underlying variational structure.
3.3 Direct multilevel solvers
In our multilevel algorithm, we start with the ND ordering of the original fine mesh (red, blue, and black edges) as in Figure 1(a). Here, by edges we mean the original elemental edges (faces) on the fine mesh. Let us denote the fine mesh as level 0. In Figure 2(a), all red crosses have 4 edges, blue crosses have 8 edges and black cross has 16 edges. On these edges are the trace spaces (5c), and thus going from level to level the separator front grows by a factor of two. We propose to reduce the front growth by lumping the edges so that each cross at any level has only four (longer) edges as on level 1 separator fronts. We accomplish this goal by projecting the traces on original fine mesh skeletal edges into a single trace space on a single longer edge (obtained by lumping the edges). Below are the details on how we lump edges and how we construct the projection operators.
The lumping procedure is straightforward. For example, longer edges on each blue cross in Figure 2(b) are obtained by lumping the corresponding two blue (shorter) edges. Similarly, longer edges on each black cross in Figure 2(b) are obtained by lumping the corresponding four black (shorter) edges. The resulting skeleton mesh with the same number of edges on the separator fronts in all levels forms level 1 in our multilevel algorithm.
Next, we project the traces spaces on shorter edges into a single trace space on the corresponding lumped edge. The three obvious choices for the solution order of the single trace spaces: (1) lower than, (2) same as, or (3) higher than the solution order on the shorter edges. Low-order option is not sensible as we have already coarsened in . In particular, additional coarsening in , i.e. option (1), makes the solver even further away from being “direct”. Moreover, since we already invert matrices of size for separators in level 1, low-order choice will not help in reducing the cost. For option (2), we obtain separators which are increasingly coarsened in , and clearly when we apply the ND algorithm this approach do not yield a direct solution nor -convergence. However, it can be used in the construction of an iterative solver/preconditioner as we shall discuss in section 3.4.
Option (3), i.e. high-order projection, is more interesting as we now explain. Due to exponential convergence in for smooth solution CockburnGopalakrishnanSayas10 , we can compensate for the coarseness in by means of refinement in . As a result, for sufficiently smooth solution, projecting to high-order trace spaces can provide accurate approximations to the original ND solution while almost avoiding the front growth. In our algorithm we enrich the order in the following fashion: if is the solution order on the original fine mesh, then level 1 separators have solution of order , for separators on level 2, for separators on level 3 and so on. For practical purposes we also (arbitrarily) limit the growth to order to actually stop the front growth after 10 orders. Specifically, for a generic level we take the solution order on the separator fronts as . We would like to point out that this enriching strategy is natural, but by no means optimal. Optimality requires balancing accuracy and computational complexity, which in turns requires rigorous error analysis of the enrichment. This is, however, beyond the scope of this paper and thus left for future research.
To the end of the paper, we denote option (2) as multilevel (ML) and option (3) as enriched multilevel (EML) to differentiate between them. In Figures 3 and 4 are different levels corresponding to the ML and EML approaches for solution order of on the original fine mesh. Level 0 of both cases corresponds to Figure 1(a). Note that the number of circles on each edge is equal to the solution order plus one. For example, the solution order on each edge of Figure 3(c) is , while the enriched solution order is for each edge in Figure 4(c).
3.4 Combining multilevel approaches with domain decomposition methods
As discussed in Section 3.3, both ML and EML strategies are approximations of direct solver. A natural idea is to use them as “coarse” scale solvers in a two-level domain decomposition method quarteroni1999domain ; smith2004domain ; toselli2006domain . In particular, either ML or EML approach can be used to capture the smooth components and to provide global coupling for the algorithm, whereas a fine scale solver can capture the high-frequency localized error and the small length scale details and sharp discontinuities. This combined approach can be employed in an iterative manner as a two-level solver in the domain decomposition methods.
We select block-Jacobi as our fine scale solver, where each block corresponds to an edge in the original fine mesh in Figure 1(a). The reason for this choice is that block-Jacobi is straightforward to parallelize, insensitive to ordering direction for problems with convection and also reasonably robust with respect to problem parameters. This is also supported by our previous work on geometric multigrid methods wildey2018unified for elliptic PDEs, where we compared few different smoothers and found block-Jacobi to be a competitive choice. We combine the fine and coarse scale solvers in a multiplicative way as this is typically more effective than additive two-level solvers especially for nonsymmetric problems toselli2006domain .
We would like to point out that due to the approximate direct solver characteristic of our coarse scale solvers, regardless of the nature of the underlying PDEs, our two-level approaches are applicable. This will be verified in various numerical results in Section 4. Next we layout the algorithm for our iterative multilevel approach.
3.5 Iterative multilevel solvers/preconditioners
In Figure 5 we show schematic diagrams of the two-level approaches described in Section 3.4 combining block-Jacobi fine-scale solver and ML or EML coarse-scale solvers (coarse in the sense). Algorithm 1 describes in details every step of these iterative multilevel solvers and how to implement them. In particular, there we present the algorithm for linear problems or linear systems arising from Newton linearization or Picard linearization for nonlinear problems. Note that we perform the factorizations of both coarse- and fine-scale matrices before the start of the iteration process so that during each iteration only back solves are needed. To precondition the GMRES algorithm we simply use one v-cycle of these iterative approaches.
3.6 Relationship between iterative multilevel approach and multigrid method
In this section we will show that the iterative multilevel approach presented in Algorithm 1 can be viewed as a multigrid approach with specific prolongation, restriction, and smoothing operators. To that end, let us consider a sequence of interface grids where each contains the set of edges which remain at level . Here, is the fine interface grid and is the coarsest one. Each partition is in turn associated with a skeletal (trace) space . We decompose as , where is the set of interior edges, corresponding to separator fronts at level , and is the set of remaining (boundary) edges. To illustrate this decomposition, let us consider Figures 3 and 4. Red edges, blue, and black lumped edges are , , and , respectively, and for . We also decompose the trace space on into two parts and corresponding to and , respectively. Specifically, we require such that each can be uniquely expressed as , where
The spaces for ML algorithm is given by
whereas for EML algorithm it is given by
If the trace system at level 0 is given by
Given the decomposition , the trace system (6) at the th level can be written as
We next specify the prolongation, restriction and smoothing operators. Since, all of the operators except the ones between level 0 and level 1, correspond to ideal operators in trottenberg2000multigrid we explain them briefly here.
3.7 Prolongation operator
We define the prolongation operator for our iterative algorithm as
Here, we denote by the projection from and the identity operator on the boundary. Clearly, apart from , the prolongation operator is nothing but the ideal prolongation in algebraic multigrid methods trottenberg2000multigrid as well as in the Schur complement multigrid methods reusken1994multigrid ; wagner1997schur ; dendy1982black ; de1990matrix .
3.8 Restriction operator
We define the restriction operator for our iterative algorithm as
Here, is the adjoint of . Similar to prolongation, apart from , the restriction operator is the ideal restriction operator trottenberg2000multigrid . Given the restriction and prolongation operators, the Galerkin coarse grid operator is constructed as
Recall from the Algorithm 1 and Figure 5 that we have both pre- and post-smoothing steps using block-Jacobi at level 0. Either ML or EML algorithm implicitly provides additional smoothing. Indeed, let us consider two generic levels and . At level , given the decomposition (7) we can write the inverse of as vassilevski2008multilevel ; reusken1994multigrid
where is the Galerkin coarse grid matrix (it is also the Schur complement of in (7) trottenberg2000multigrid ). As can be seen, the second term on the right hand side of (11) is the coarse grid correction while the first term is the additive smoothing applied only to the interior nodes. Another way reusken1994multigrid to look at this is the following. If the coarse grid correction is then the block-Jacobi smoothing applied only on the interior nodes with initial guess as is given by
From the definition of prolongation operator for in (8) we see that , where “” is a term that will subsequently be multiplied by , and is thus not relevant for our discussion. As a result, obtained from (12) is the same as acting on . In other words, the implicit smoothing is equivalent to block-Jacobi smoothing on the interior nodes with the initial guess as . In AMG literature trottenberg2000multigrid this is called F-smoothing, where F stands for fine nodes. To summarize, the smoothing operator at different levels is given by
where is the block-Jacobi smoothing operator at level 0 with each block corresponding to an edge in the original fine mesh. If we denote by and the number of pre- and post-smoothing steps at level we have
Note that instead of post-smoothing inside the ML or EML solver we could also consider pre-smoothing (i.e. ) and the result remains the same trottenberg2000multigrid . Now with these specific set of operators the iterative multilevel algorithm 1 is equivalent to the following multigrid v-cycle
where . Here, the action of on a function/vector is defined recursively in the multigrid algorithm 2 and the initial guess is computed from either ML or EML solver. In algorithm 2, , and , represent the smoother with and smoothing steps respectively.
At the coarsest level , we set and the inversion is computed using direct solver. The above multigrid approach trivially satisfies the following relation trottenberg2000multigrid
where , represents the L inner product on and respectively. This is a sufficient condition for the stability of intergrid transfer operators in a multigrid algorithm trottenberg2000multigrid . The trivialness is due to the fact that: 1) our prolongation and restriction operators are ideal ones except for , for which they are adjoint of each other; and 2) the coarse grid matrices are constructed by Galerkin projection (10).
3.10 Complexity estimations
In this section we estimate the serial complexity for ND, ML and EML algorithms in both 2D and 3D. For simplicity, we consider standard square or cubical domain discretized by quad/hex elements, where is the dimension. The total number of levels is . Let be the solution order on separator fronts at level and we denote by the number of nodes on an edge/face. For simplicity, we consider Dirichlet boundary condition and exclude the boundary edges/faces in the complexity estimates.
For the ND algorithm, we define level 0 to be the same as level 1. Following the analysis in eijkhout2014introduction , we have crosses (separator fronts) at level and each front is of size and all matrices are dense. The factorization cost of the ND algorithm in 2D is then given by:
2D ND algorithm
The memory requirement is given by
As the Schur complement matrices are dense, the cost for the back solve is same as that for memory. Similarly the estimates in 3D are given as follows:
3D ND algorithm
Unlike poulson2012fast , here we have not included the factorization and memory costs for the matrix multiplication . The reason is that the asymptotic complexity for ND, ML, and EML is unaffected by this additional cost. For EML in particular, the inclusion of this cost makes the analysis much more complicated because of the different solution orders involved at different levels. As shall be shown, our numerical results in section 4.1.1 indicate that the asymptotic estimates derived in this section are in good agreement with the numerical results.
As ML is a special case of EML with zero enrichment, it is sufficient to show the estimates for EML. In this case we still have fronts at level and each front is of the size . The factorization and memory costs in 2D are then given by:
2D EML algorithm
Similarly in 3D we have:
3D EML algorithm
Here, . To enable a direct comparison with ND, we have taken . As a result, the actual cost is less than the estimated ones as . Note that either ML or EML iterative algorithm 1 requires additional cost of for factorization and of for memory and back solves due to block-Jacobi smoothing. Since these additional costs are less than the costs for ML and EML coarse solvers, they increase the overall complexity of the algorithm by at most a constant, and hence can be omitted.
From the above complexity estimates we observe that the factorization cost of our multilevel algorithms scales like in both 2D and 3D. Compared to the cost for ND algorithms which is in 2D and in 3D, a significant gain (independent of spatial dimensions) can be achieved using our methods. Similarly, the memory cost has reduced to independent of dimensions as opposed to in 2D and in 3D for the ND algorithm. Here, for ML whereas it is greater than one for EML. On the other hand, the memory and computational costs required by multigrid is typically . Thus the proposed multilevel algorithms are times more expensive in computation cost and require more memory compared to standard multigrid algorithms. The cost of the multilevel algorithms lie in between direct (ND) solvers and multigrid solvers.
4 Numerical results
In this section we test the multilevel algorithm 1 on elliptic, transport, and convection-diffusion equations. Except for the transport equation in section 4.2, the domain is a standard unit square discretized with structured quadrilateral elements. Dirichlet boundary condition is enforced strongly by means of the trace unknowns on the boundary. For the transport equation, we take and inflow boundary condition is enforced through trace unknowns while outflow boundary condition is employed on the remaining boundary. The number of levels in the multilevel hierarchy and the corresponding number of quadrilateral elements are shown in Table 1 and they are used in all numerical experiments.
We note that even though the iterative multilevel algorithm works as a solver for most of the PDEs considered in this paper, it either converges slowly or diverges for some of the difficult cases. Hence throughout the numerical section we report the iteration counts for GMRES, preconditioned by one v-cycle of the multilevel algorithm 1 with the number of block-Jacobi smoothing steps taken as .
The UMFPACK davis2004algorithm library is used for the factorization and all the experiments are carried out in MATLAB in serial mode. The specifications of the machine used for the experiments is as follows. The cluster has 24 cores (2 sockets, 12 cores/socket) and 2 threads per core. The cores are Intel Xeon E5-2690 v3 with frequency 2.6 GHz and the total RAM is 256 GB.
The stopping tolerance for the residual is set to be in the GMRES algorithm. The maximum number of iterations is limited to . In the tables of subsequent sections by “*” we mean that the algorithm has reached the maximum number of iterations.
4.1 Elliptic equation
4.1.1 Example I: Poisson equation smooth solution
In this section we test the multilevel algorithm on the Poisson equation with a smooth exact solution given by
The forcing is chosen such that it corresponds to the exact solution, and the exact solution is used to enforce the boundary conditions. The other parameters in equation (1) are taken as , where
is the identity matrix, and.
In Table 2 we show the number of GMRES iterations preconditioned by one v-cycle of iterative ML and EML algorithms 1. First, the number of iterations for EML is much less than that for ML and for high-order () solutions and fine meshes EML performs like a direct solver upto the specified tolerance. That is, the number of preconditioned GMRES iterations is . This is expected due to: 1) smooth exact solution, and 2) exponential convergence of high-order solutions, and thus the initial guess computed by EML is, within tolerance, same as the direct solution. As expected for ML, increasing solution order decreases the number of GMRES iterations; this is again due to the smooth exact solution and the high-order accuracy of HDG.
In Figures 6(b) and 6(a) we compare the time-to-solution of GMRES preconditioned by ML and EML algorithms to the one obtained from direct solver with nested dissection. As can be seen, the ratios (ND/ML and ND/EML) are greater than one for either large number of elements or high solution orders. That is, both ML and EML are faster than ND when accurate solution is desirable. In Figure 6(c), the EML and ML algorithms are compared and, apart from , EML is faster than ML though the speedup is not significant in this example. For high solution orders, ML behaves more like EML as the mesh is refined.
We now compare the memory usage of ML and EML against ND. Figures 7(a) and 7(b) show the ratio of memory usages (costs) of EML and ML algorithms relatively to ND. We can see that regardless of meshsize and solution order, both ML and EML requires (much) less memory than ND does. In particular, ML requires almost 8 times less memory than ND at the finest mesh for any solution order. For EML, memory saving is more significant as the mesh and/or solution order are refined, and EML is six times less memory demanding than ND with sixth-order solution at the finest mesh size.
Figure 7(c) compares the memory usage between EML and ML. As expected, EML always requires more memory than ML due to the enrichment. However, the maximum ratio is around 2.1 and since we limit the maximum enrichment order to , memory requirements at high orders for both methods are similar. This is also the reason that all the curves in Figure 7(c) converge to a constant value as the mesh is refined. As the maximum enrichment order can be tuned, depending on the memory availability of computing infrastructure, one can have the flexibility to adjust the EML algorithm to adapt to memory and computation demands of the problem under consideration. For example, we can perform more iterations for less memory to achieve a given accuracy.
Next we verify the complexity estimates derived in section 3.10. Figures 8, 9 and 10 show that the numerical results agree well with our estimates except for the factorization cost of ND in Figure 8(a), which seems to indicate that the asymptotic complexity of has not yet been reached. Since the results in Figures 8, 9 and 10 are independent of the PDE under consideration, in the subsequent sections we study only the iteration counts. As long as the iterations do not increase much when the mesh and the solution order are refined, we can obtain a scalable algorithm whose cost can be estimated based on the factorization and memory costs derived in section 3.10.
|ML with GMRES||EML with GMRES|
4.1.2 Example II: Discontinuous highly heterogeneous permeability
In this section we test the robustness of the algorithm for elliptic PDE with a highly discontinuous and heterogeneous permeability field. To that end, we take and in (1), where is chosen according to example 2 in ho2016hierarchical and is shown in Figure 11 for three different meshes. The forcing and boundary condition in (1) are chosen as and . This is a difficult test case as the permeability varies by four orders of magnitude and is also highly heterogeneous as seen in Figure 11.
Tables 3 and 4 show the number of iterations taken by ML- and EML-preconditioned GMRES, and in Table 5 we compare them with those taken by GMRES preconditioned by one v-cycle of geometric multigrid proposed in wildey2018unified . In Tables 3, 4 and 5 the error, , after 200 iterations is shown in the parentheses. Here, and denote trace solution vectors obtained by the direct solver and the corresponding iterative solver (ML, EML or geometric multigrid), respectively. The results show that the geometric multigrid in wildey2018unified yields the least number of iterations or more accurate approximation when convergence is not attained with 200 iterations. This is expected since ML and EML algorithms have smoothing only on the fine level while smoothing is performed on all levels (in addition to the local smoothing) for the geometric multigrid algorithm, and for elliptic-type PDEs performing smoothing on all levels typically provides better performance trottenberg2000multigrid . However, the proposed algorithms in this paper targets beyond elliptic PDEs, and for that reason it is not clear if smoothing on coarser levels helps reduce the iteration counts reusken1994multigrid . We would also like to point out that the least number of iterations in geometric multigrid does not translate directly to least overall time to solution which in turn depends on time per iteration and set-up cost for the three methods. In future work we will compare the overall time to solution for ML, EML and geometric multigrid. This challenging example clearly shows the benefits of high-order nature of HDG, that is, for all three methods as the solution order increases the solution is not only more accurate but also obtained with less number of GMRES iterations.
Between ML and EML, we can see that EML requires less number of iterations and attains more accuracy at higher levels (see, e.g., the results with in Tables 3 and 4). The benefit of enrichment is clearly observed for , in which EML is almost four orders of magnitude more accurate than ML (see last row and columns of Tables 3 and 4). For coarser meshes, the iteration counts of ML and EML are similar. Finally, it is interesting to notice that columns in Table 4, corresponding to , for EML (highlighted in blue) have similar iteration counts and accuracy when compared to columns in Table 5, corresponding to , for the geometric multigrid method.
|7||* ()||* ()||167||138||113||99|
|8||* ()||* ()||* ()||* ()||* ()||* ()|
|8||* ()||* ()||* ()||* ()||195||176|
|8||* ()||* ()||* ()||190||176||170|
4.2 Example III: Transport equation
In this section we apply ML and EML to a pure transport equation. To that end, we take in (1). Similar to MR2511736 ; iHDG , we consider the velocity field , forcing , and the inflow boundary conditions
The solution is shown in Figure 12(a) and the difficulty of this test case comes from the presence of a curved discontinuity (shock) emanating from the inflow to the outflow boundaries. In Table 6 we show the iteration counts for both ML- and EML-preconditioned GMRES for different solution orders and mesh levels. As can be seen, while scalability is not attained with both ML and EML, scalability is observed for both methods, i.e., the number of GMRES iterations is almost constant for all solution orders. Again EML takes less iteration counts than ML for all cases.
Table 7 shows the iteration counts for block-Jacobi preconditioned GMRES. Compared to ML and EML in Table 6, the iteration counts for block-Jacobi are higher, and for levels 7 and 8 block-Jacobi does not converge within the maximum number of iteration counts. This indicates though both ML and EML do not give scalable results, they provide a global coupling for the two-level algorithm and thus help in reducing the total number of iterations. Moreover, both the ML and EML algorithms are robust (with respect to convergence) even for solution with shock. It is important to point out that for pure transport problems it is in general not trivial to obtain scalable results unless some special smoother, which follows the direction of convection, is used bank1981comparison ; hackbusch1997downwind ; kim2004uniformly ; wang1999crosswind . For this reason, the moderate growth in the iteration counts for both ML and EML algorithms is encouraging.
Next, we test the algorithms on a smooth exact solution (see Figure 12(b)) given by
All the other parameters are the same as those for the discontinuous solution considered above. Tables 8 and 9 show the number of ML-, EML-, and block-Jacobi-preconditioned GMRES iterations. Table 8 shows that the performance of ML- and EML-preconditioned GMRES is similar to the one observed for the elliptic equation with smooth solution in Table 2. Block-Jacobi preconditioned GMRES, on the other hand, is more or less independent of the smoothness of the solution as Table 9 for the smooth solution is very similar to Table 7 for the discontinuous solution. Thus this example demonstrates that, unlike many standard iterative algorithms which depend on the nature of the PDE under consideration, the performance of ML and EML algorithms seems to depend only on the smoothness of the solution and otherwise is independent of the underlying PDE. This behavior is expected, again, thanks to their root in direct solver strategy.
|ML with GMRES||EML with GMRES|
|ML with GMRES||EML with GMRES|
4.3 Convection-diffusion equation
In this section we test the proposed algorithms for the convection-diffusion equation in both diffusion- and convection-dominated regimes. To that end, we consider in (1). We shall take some standard problems that are often used to test the robustness of multigrid algorithms for convection-diffusion equations.
4.3.1 Example IV
Here we consider an example similar to the one in gillman2014direct . In particular, we take