Efficient discontinuous Galerkin implementations and preconditioners for implicit unsteady compressible flow simulations

by   Matteo Franciolini, et al.
University of Michigan

This work presents and compares efficient implementations of high-order discontinuous Galerkin methods: a modal matrix-free discontinuous Galerkin (DG) method, a hybridizable discontinuous Galerkin (HDG) method, and a primal formulation of HDG, applied to the implicit solution of unsteady compressible flows. The matrix-free implementation allows for a reduction of the memory footprint of the solver when dealing with implicit time-accurate discretizations. HDG reduces the number of globally-coupled degrees of freedom relative to DG, at high order, by statically condensing element-interior degrees of freedom from the system in favor of face unknowns. The primal formulation further reduces the element-interior degrees of freedom by eliminating the gradient as a separate unknown. This paper introduces a p-multigrid preconditioner implementation for these discretizations and presents results for various flow problems. Benefits of the p-multigrid strategy relative to simpler, less expensive, preconditioners are observed for stiff systems, such as those arising from low-Mach number flows at high-order approximation. The p-multigrid preconditioner also shows excellent scalability for parallel computations. Additional savings in both speed and memory occur with a matrix-free/reduced version of the preconditioner.



page 17

page 26

page 29


A hybridizable discontinuous Galerkin method for two-phase flow in heterogeneous porous media

We present a new method for simulating incompressible immiscible two-pha...

Linearizing the hybridizable discontinuous Galerkin method: A linearly scaling operator

This paper proposes a matrix-free residual evaluation technique for the ...

Alias-free, matrix-free, and quadrature-free discontinuous Galerkin algorithms for (plasma) kinetic equations

Understanding fundamental kinetic processes is important for many proble...

Parallel Scaling of the Regionally-Implicit Discontinuous Galerkin Method with Quasi-Quadrature-Free Matrix Assembly

In this work we investigate the parallel scalability of the numerical me...

Recovering missing CFD data for high-order discretizations using deep neural networks and dynamics learning

Data I/O poses a significant bottleneck in large-scale CFD simulations; ...

A quasi-robust discretization error estimate for discontinuous Galerkin Isogeometric Analysis

Isogeometric Analysis is a spline-based discretization method to partial...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In recent years, high-order discontinuous Galerkin (DG) methods have garnered attention in the field of Computational Fluid Dynamics. With increasing availability of high-performance computing (HPC) resources, the use of high-order methods for unsteady flow simulations has become popular. The success of these methods can be attributed to attractive dispersion and diffusion properties at high orders, ease of parallelization thanks to their compact stencil, and accuracy on unstructured meshes around complex geometries. However, the implementation of an efficient solution strategy for high-order DG methods is still a subject of active research, especially for unsteady flow problems involving the solution of the Navier–Stokes (NS) equations.

Several previous studies, see for example Bassi et al. (2007, 2015), demonstrated that implicit schemes in the context of high-order spatial discretizations are necessary to efficiently overcome the strict stability limits of explicit time integration schemes Hesthaven and Warburton (2007). Implicit strategies require the solution of a large system of equations, which is typically performed with iterative solvers such as the generalized minimial residual method (GMRES). The choice of the preconditioner is a key aspect of the strategy and has been explored extensively in the literature, see for example Wang and Mavriplis (2007); Persson and Peraire (2008); Diosady and Darmofal (2009); Birken et al. (2013). Among those, the use of multilevel algorithms to precondition a flexible GMRES solver Saad (1993) has been demonstrated as a promising choice for both compressible Wang and Mavriplis (2007); Fidkowski et al. (2005); Shahbazi et al. (2009); Diosady and Darmofal (2009) and incompressible flow problems Botti et al. (2017); Franciolini et al. (2018). Superior iterative performance compared to single-grid preconditioned solvers has been observed, as well as better parallel efficiency on distributed-memory architectures.

The application of implicit time integration strategies to DG discretizations is hindered by computational time and memory expenses associated with the assembly and storage of the residual Jacobian matrix. Although the Jacobian is sparse, the number of non-zero entries scales as , where is the approximation order and is the spatial dimension. Thus, the costs grow rapidly with approximation order, particularly in three-dimensional problems. Motivated by this scaling, previous works Crivellini and Bassi (2011); Ceze et al. (2016); Franciolini et al. (2017, 2018) considered the possibility of a reduced matrix storage ("matrix-free") implementation of the iterative solver. This implementation avoids the allocation of the Jacobian matrix but still requires the allocation of a preconditioner operator which in some cases may still be quite large. In this context, the use of multilevel matrix-free strategies with cheap element-wise block-Jacobi preconditioners on the finest level appears to balance computational efficiency and memory considerations with iterative performance for stiff systems. The latter is relevant to solvers applied to DG discretizations, for which the condition number scales as  Cockburn et al. (2014), where is the mesh dimension.

The size of the DG linear system can be reduced through hybridizable discontinuous Galerkin (HDG) methods, which have been recently considered as an alternative to the standard discontinous Galerkin discretization Nguyen et al. (2009); Fidkowski (2016). HDG methods introduce an additional trace variable on the mesh faces but can reduce the number of globally-coupled degrees of freedom relative to DG, when a high order of polynomial approximation is employed. The reduction occurs through a static condensation of the element-interior degrees of freedom, exploiting the block structure of the HDG Jacobian matrix. Thanks to this operation, the memory footprint of the solver scales as . Additionally, HDG methods exhibit superconvergence properties of the gradient variable in diffusion-dominated regimes. On the other hand, they increase the number of operations local to each element, both before and after the linear system solution. While several works have compared the accuracy and cost of HDG versus continuous Kirby et al. (2012); Yakovlev et al. (2016) and discontinuous Woopen et al. (2014); Fidkowski (2016) Galerkin methods, a comparison considering the efficiency of iterative solvers applied to the solution of unsteady flows is missing in this context. In fact, it is worth pointing out that, whereas HDG reduces the number of globally-coupled degrees of freedom relative to DG, at high approximation orders, its element-local operation count is non-trivial. This is particularly the case for viscous problems, in which the state gradient is approximated as a separate variable. An alternative approach is to only approximate the state, and to obtain the gradient when needed by differentiating the state. This leads to the primal HDG formulation Dahm (2017); Devloo et al. (2018), which we also consider in this work. The advantage of primal HDG relative to standard HDG lies mainly in the reduction of element-local operations, which translates into improved computational performance.

While for DG the use of multilevel strategies to deal with ill-conditioned systems has been previously studied, their use in HDG contexts appears not to have yet been explored, especially for unsteady flow problems. A multilevel technique has been introduced in the context of an -multigrid strategy built using the trace variable projection on a continuous finite element space Cockburn et al. (2014). In addition, the use of an algebraic multigrid method applied to a linear finite element space obtained by Galerkin projection has been proposed in the context of elliptic problems Kronbichler and Wall (2018). A similar idea is also considered to speed-up the iterative solution process in HDG Schütz and Aizinger (2017).

The present work focuses on the comparison of implicit solution strategies in the context of unsteady flow simulations for the three aforementioned implementations, i.e. DG, HDG, and primal HDG. The comparison includes efficient preconditioning, such as -multigrid, to deal with the solution of stiff linear systems arising from high-order time discretizations. In particular, an efficient algorithm to inherit the coarse space operators at a low computational cost in the context of HDG is presented for the first time. The scalability of the linear solution process is also considered and compared to standard single-grid preconditioners, such as a incomplete lower-upper factorization, ILU(0) Diosady and Darmofal (2009). The efficiency of the different solution strategies and the overall memory footprint is assessed on two-dimensional laminar compressible flow problems. The results of these cases demonstrate that (i) the different discretizations attain similar error levels, (ii) the use of a multilevel strategy reduces the number of linear iterations in all cases tested, (iii) only for the DG discretizations is this advantage reflected in the CPU time, and iv) the primal HDG and -multigrid matrix-free DG solvers yield comparable solution times and memory footprints, faster than standard, single-grid preconditioners like ILU(0) applied to DG.

The paper is structured as follows. Section 2 presents the differential equations, and Sections 3 and 4 show the spatial and temporal discretizations used in this work. Section 6 presents the -multigrid preconditioner implementations. Section 7 reports numerical experiments using different preconditioning strategies, including ILU(0), block Jacobi, and -multigrid. These experiments are performed on a range of test cases, including two-dimensional airfoils and a circular cylinder. The methods are compared in terms of iterations, computational time, and memory footprint.

2 Governing equations

This work considers solutions of the compressible Navier–Stokes (NS) equations, which can be written as


with the velocity, and the number of space dimensions. The total energy , total enthalpy , pressure

, total stress tensor

, and heat flux are given by

, , ,
, , .

Here is the internal energy, is the enthalpy, is the ratio of gas specific heats, is the viscosity, is the molecular Prandtl number, and is the mean strain-rate tensor. The space and time discretizations are outlined in the following sections.

3 Spatial discretization

(a) DG
(b) HDG
Figure 1: Schematic comparison of the solution approximation and flux evaluation in DG versus HDG.

Three versions of a modal discontinuous Galerkin (DG) finite element method are considered in this work. The first one is a standard DG implementation, which employs basis functions defined in the reference element space. The second one, commonly referred to as the hybridizable discontinuous Galerkin (HDG) method, introduces an additional set of variables defined on the mesh element interfaces to reduce the globally coupled degrees of freedom compared to DG, as shown in Figure 1. This implementation explicitly uses a mixed form for the gradient states (also known as the dual variable), i.e. the gradients are used as an additional element-wise variable, and increase the accuracy of the gradient evaluation. A third implementation, primal HDG Dahm (2017), reduces the computational costs of the solver by removing the dual variable from the equations. This approach is desirable when an increased accuracy of the gradient variable is not strictly necessary or achievable for the problem. In all cases, two types of basis functions in the reference space have been considered: polynomial functions of maximum degree equal to as well as tensor product functions of degree in each dimension. In this work the former is employed within triangular grids, and the number of degrees of freedom per equation per element is . The latter approach is used for quadrilateral mesh elements, and .

In all cases, the discretization is based on an approximation of the domain and a triangulation of made by a set of non-overlapping elements, denoted by . Here, stands for the set of internal element faces, the set of boundary element faces and their union. We also define


where denotes a generic mesh element face. Following Brezzi et al. Arnold et al. (2002), we also introduce the average trace operator, which on a generic internal face is defined as , where

denotes a generic scalar or vector quantity. This definitions can be suitably extended to domain boundary faces by accounting for the weak imposition of boundary conditions.

3.1 Discontinuous Galerkin

In compact form, the system (1) can be expressed as


where is the vector of conservative variables and are the inviscid and viscous fluxes, with the number of equations and the number of space dimensions. Note that the diffusive fluxes are connected to the gradients of via the functional relation , with the diffusivity tensor.

The state vector is approximated by a polynomial expansion with no continuity constraints imposed between adjacent elements, i.e. where


and is the order of polynomial approximation. The weak form of (3) follows from multiplying the PDE by the set of test functions in the same approximation space, integrating by parts, and coupling elements via numerical fluxes. The variational formulation for each element reads: find such that


for all . Here, is the sum of the inviscid and viscous flux functions, while is the numerical flux and . Note that the quantities and denote element interior and element neighbor quantities, respectively.

Uniqueness and local conservation of the solution are achieved by the use of proper numerical interface fluxes. The Roe Roe (1981) approximate Riemann solver is employed for the inviscid part , while the second form of Bassi and Rebay (BR2) Bassi et al. (1997) is employed for the viscous part, . Following BR2, the numerical viscous flux is given by


where, according to Brezzi et al. (2000); Arnold et al. (2002), the penalty factor must be greater than the number of faces of the elements. The auxiliary variable is determined from the jump of , via the solution of the following auxiliary problem:


At the boundary of the domain, the numerical flux function appearing in equation (5) must be consistent with the boundary conditions of the problem. In practice, this is accomplished by properly defining a boundary state which accounts for the boundary data and, together with the internal state, allows for the computation of the numerical fluxes and the lifting operator on the portion of the boundary , see Bassi et al. (1997); Bassi and Rebay (1997).

A system of ordinary differential equations for the degrees of freedom (DoFs) arising from Equation (

5) can be compactly written in the form


where is the block-diagonal spatial mass matrix, is the vector of the DoFs of the problem, and is the spatial residual vector.

3.2 Hybridizable discontinuous Galerkin

3.2.1 Mixed form

The second spatial discretization considered in this work is the HDG method in mixed form (HDG), see Fidkowski (2016)

. A system of first-order partial differential equations can be obtained from (

1) by introducing ,


The HDG discretization approximates the state and gradient variables as and , with defined in (4). An additional trace variable, , is defined on the faces, using the space


where is the space of polynomials of order on face . We remark that the trace variable is defined on the internal faces only, while a properly defined boundary value is used for the flux computation on .

The weak form is obtained by weighting the equations in (9) with appropriate test functions, integrating by parts, and using the interface variable for the face state. Consistent and stable numerical fluxes are required at the mesh element interfaces. The variational formulation reads: find , , , such that


for all , , . The third equation, which weakly imposes flux continuity across interior faces, is required to close the system. We remark that, when using the mixed form, the same theoretical convergence rate is observed for the state variable and the gradient variable . In diffusion-dominated regimes, this allows for a local post-processing of the state to a higher order Nguyen et al. (2009).

In HDG, the numerical flux function , which is the sum of the inviscid and viscous fluxes, is defined as


where is a stabilization term for both the inviscid and viscous parts of the flux. In this work is chosen in a Roe-like fashion as


while the viscous stabilization term is based on the BR2 scheme,


with the lifting operator applied to the jump , and the stabilization factor.

Similarly to DG, at the boundary of the domain, the numerical flux function is made consistent with the boundary conditions of the problem through the definition of a boundary state which accounts for the boundary data and, together with the internal state, allows for the computation of numerical fluxes and the lifting operator on the portion of the boundary .

Defining , and as the residual vectors arising from Equations (11), (12) and (13), the discretized system of nonlinear equations can be written as


where is the element-based mass matrix. The compact form of (17) can be written using the solution vector of the discrete unknowns, , and the concatenated vector of residuals ,


where the matrix is given by


3.2.2 Primal form

A variant of the mixed hybridizable discontinuous Galerkin method presented in Section 3.2.1 is the primal HDG (HDG) method and follows the work in Dahm (2017); Devloo et al. (2018). In HDG, the dual variable is eliminated by introducing the definition of the gradient in (12).

The HDG discretization approximates the variable , with defined in Equation (4). The trace variable is still employed for hybridization, and the variational formulation reads: find , such that


for all , . We note that (20)–(21) are not obtained by just substituting from (11)–(13). In fact the fourth term of (20) arises from the elimination of the variable . This term ensures symmetry and adjoint-consistency of the primal HDG discretization. This type of discretization, involving a smaller number of element-wise degrees of freedom than mixed HDG, does not suffer significantly from overhead costs of dealing with the gradients: eliminating the dual variable and adding the adjoint-consistency term typically results in a faster solver. We note that in this case, the gradients are of one order lower accuracy than the state variable . Numerical flux functions, stabilizing terms, and boundary condition enforcement are defined in the same manner as in mixed HDG.

Defining and as the residuals vectors arising from (20)–(21), the ODE system of equations can be written as


where is the elemental mass matrix. Therefore, the compact form of (22) is written using the solution vector of discrete unknowns, , and the concatenated vector of residuals, ,


where the matrix is given by


4 Temporal discretization

The temporal discretization used in this work is an explicit-first stage, singly-diagonal-implicit Runge–Kutta (ESDIRK) scheme. The general formulation of the scheme for (18) is


for where is the number of stages, and are the coefficients of the scheme, and is the time index. Within each stage, the solution of a non-linear system is required. This is performed by the Newton-Krylov method, which requires the solution of a sequence of linear systems within each stage. In this regard, the th Newton–Krylov iteration assumes the form


with . In this work the third-order ESDIRK3 scheme Bijl et al. (2002) is employed. The method involve three non-linear solutions, following an explicit first stage.

5 Linear system solution

The linear system of ODEs can be solved numerically using iterative solvers. To this end, we apply the generalized minimal residual (GMRES) method to the system in (26). The solution process differs between standard DG and HDG, as the latter discretization takes advantage of the introduction of face unknowns in order to reduce the size of the matrix to be allocated. This section provides details of the implementation.

5.1 Discontinuous Galerkin discretization

The linear system arising from a DG discretizations takes the following general form


where is the iteration matrix, is the vector of degrees of freedom updates, and is the right-hand side. The GMRES implementation can follow two approaches. The first one is denoted as matrix-based (MB) and consists of computing and storing the iteration matrix explicitly to perform matrix-vector products as needed within the iterative solution. A second, matrix-free (MF) approach takes advantage of the structure of the matrix vector products, which can be approximated using the matrix-free formula




and . The latter approach offers several advantages over the former, as the Jacobian matrix is no longer required to maintain the temporal accuracy of the solution. The GMRES solver still requires a preconditioning matrix, which generally needs to be stored. However, this matrix can be approximated or frozen for a certain number of iterations without losing the formal order of accuracy of the time integration scheme, thus providing an improvement in computational efficiency. For example, when using a block-Jacobi preconditioning approach, the memory footprint required for the Jacobian matrix can be one order of magnitude lower, as only the memory for the on-diagonal blocks needs to be allocated. The additional residual evaluation, which are required in (28), have a computational cost similar to a matrix-vector product for high-order polynomials. Further details can be found in previous work Franciolini et al. (2017); Crivellini and Bassi (2011).

5.2 Hybridizable discontinuous Galerkin discretization

5.2.1 Mixed form

The system in (26) can be conveniently arranged using the definition of element-interior and face DoFs. To this end, considering first the mixed form of the HDG discretization, it is convenient to define the following elemental block matrices at stage of the Newton-Krylov method


while the right hand side of Eq. (26) is obtained as


The full system of equations can be therefore written in the compact form as


5.2.2 Primal form

In the primal formulation, the elemental block matrices related to the gradient variables are no longer present in the linear system. Moreover, the right-hand side can be evaluated via the following equations


that do not depend anymore on the internal DoFs . The linear system for the primal HDG discretization assumes the form


5.3 Static condensation and back-solve

Considering the block structure of the matrices appearing in (32) and (34), the solution of the system can involve a smaller number of DoFs by statically condensing out the element-interior variables. Partitioning the matrix into element-interior and face components, , and similarly for the right-hand side vector, , the Schur-complement linear system reads


which assumes the same form as system (27) and can be solved using a GMRES algorithm. The definition of each block can be found from (32) and (34). The static condensation is an operation that involves matrix-matrix products for the iteration matrix, as well as matrix-vector products for the right-hand side. Fortunately, the compact structure of the residual Jacobian prevents us from having to allocate global matrices for the computation of the condensed matrix, i.e. the operations described in Eq. (35) are local to each element. In addition, the computation of can be performed in place. By doing so, we do not increase the memory footprint of the HDG implementation during the solve.

After the solution of (35), the interior states have to be recovered for the residual evaluation in the next time step. This operation is commonly referred to as the back solve and assumes the following form


Our implementation choice of assembling the condensed matrix on-the-fly requires re-evaluation of the inverse of the matrix in an element-wise fashion during the back-solve.

As a final remark for the two solvers, we point out that for both the mixed and primal form of HDG, memory allocation and time spent on the global solve are lower than that of a DG solver due to the smaller number of globally-coupled degrees of freedom at high orders. On the other hand, the inversion of the block-structured matrix of equation (35), although local to each element, increases the amount of element-wise operations.

6 Multigrid preconditioning

The use of a -multigrid strategy to precondition a flexible implementation Saad (1993) of GMRES is explored in the context of the spatial discretizations presented. The basic multigrid idea is to exploit iterative solvers to smooth-out high-frequency components of the error with respect to an unknown exact solution. Such iterative solvers are not sufficiently effective at damping low-frequency error components, and to this end, an iterative solution built using coarser problems can be useful to shift, via system projection, the low-frequency modes towards the upper side of the spectrum. This simple and effective strategy allows us to obtain satisfactory rates of convergence over the entire range of error frequencies.

In p-multigrid the coarse problems are built based on lower-order discretizations with respect to the original problem of degree . We consider coarse levels denoted by the index and indicate the fine and coarse levels with and , respectively. The polynomial degree of level is and the polynomial degrees of the coarse levels are chosen such that , with . These orders are used to build coarser linear systems . In order to avoid additional integration of the residuals and Jacobians on the coarse level, we employ subspace inheritance of the matrix operators assembled on the finest space to build the coarse space operators for both DG and HDG discretizations. This choice involves projections of the matrix operators and right hand sides, which are computed only once on the finest level. Compared to subspace non-inheritance, which requires the re-evaluation of the Jacobians in proper coarser-space discretizations of the problem, inheritance is cheaper in processing and memory. Although previous work has shown lower convergence rates when using such cheaper operators Antonietti et al. (2015); Botti et al. (2017); Franciolini et al. (2018), especially in the context of elliptic problems and incompressible flows, we found these operators sufficiently efficient for our target problems involving the compressible NS equations, as will be demonstrated in the results section.

In the context of standard discontinuous Galerkin discretizations, see for example Fidkowski et al. (2005); Diosady and Darmofal (2009); Shahbazi et al. (2009); Botti et al. (2017); Franciolini et al. (2018), the -multigrid approach has been thoroughly investigated and exploited in several ways, e.g. -, -, and -strategies. On the other hand, the use of -multigrid for HDG has not been as widely studied. See, for example, preliminary works Cockburn et al. (2014); Kronbichler and Wall (2018); Schütz and Aizinger (2017) related to this research area. The definition of the restriction and prolongation operators, as well as the coarse grid operators and right hand sides, is not straightforward when considering the statically condensed system. We will therefore first introduce the concept of subspace inheritance for a standard DG solver and then extend it to HDG.

We employ a full multigrid (FMG) -cycle solver, outlined in Algorithm 1. The FMG cycle constructs a good initial guess for a -cycle iteration, which starts on the fine space. To do so, the solution is initially obtained on the coarsest level (), and then prolongated to the next refined one (). At this point, a standard -cycle is called, such that an improved approximation of the solution can be used for the -cycle at level . This procedure is repeated until the -cycle on the finest level is completed.

1:  for  do
2:     if  then
4:        SOLVE
5:     else
9:     end if
10:  end for
11:  return  
Algorithm 1
1:  if  then
2:     SOLVE
3:  else
4:     =SMOOTH()
7:     =
9:     =SMOOTH()
10:  end if
11:  return  
Algorithm 2

The single -cycle is outlined in Algorithm 2. Starting from a level , the solution is initially smoothed using an iterative solver (SMOOTH). The residual of the solution, , is then computed and projected to the coarser level , where another -cycle is recursively called to obtain a coarse-grid correction . This quantity is prolongated to level and used to correct the solution to be smoothed again. When the coarsest level is reached, the problem is solved with a larger number of iterations to decrease as much as possible the solution error at a low computational cost.

In this work the smoothers consist of preconditioned GMRES solvers. Any combination of single grid preconditioners can be coupled with an iterative solver to properly operate as a smoother in a multigrid strategy. Devising a methodology to optimally set the multigrid preconditioner is beyond the scope of the present work. However, previous work Franciolini et al. (2018) has shown that an optimal and scalable solver can be obtained using an aggressive preconditioner on the coarsest space discretization, where the factorization of the matrix can be performed at a low computational cost, and the system has to be solved with a higher accuracy. On the other hand, cheaper operators can be used on the finest levels of the discretization, where the systems need not be solved to a high degree of accuracy. In the following subsections, the details are given about how the matrices and vectors are restricted and prolongated between multigrid levels.

6.1 DG subspace inheritance

Let us define a sequence of approximation spaces on the same triangulation , being a multigrid level, with and the number of coarse levels. Note that denotes the coarsest space. In our -multigrid setting, each approximation space is defined similarly to (4), but using polynomials, with .

In this context, the prolongation operator can be defined as such that


Similarly, the restriction operator can be defined as the projection such that


Such a definition can be extended to operate on vector functions component-wise, i.e. .

Regarding coarse-space matrices, discrete matrix operators , with , and the number of DoFs on level , have to be considered to inherit the fine-space iteration matrix . This matrix can be restricted up to level via . Fortunately, the explicit assembly of the operators can be avoided and the projection can be applied for each matrix block of size , which assumes the following form




Here, denotes the set of basis functions defined in element of order . We point out that, thanks to the use of basis functions defined in a reference element, the projection operators are identical for each element and pair of polynomial orders, and they can be used in the same way for the on-diagonal and off-diagonal blocks of the iteration matrix. As the prolongation operators can be obtained from the restriction by the transpose, , the method requires the allocation of only matrices of size for each different type of element, which is inexpensive from a memory footprint viewpoint.

6.2 HDG subspace approximate-inheritance

In HDG, the globally-coupled unknowns are those related to the face DoFs, and the iteration matrix is obtained through static condensation, see Equation (35), which allows us to solve the system for the face unknowns only. Theoretically speaking, for element-interior degrees of freedom, the same operators of Section 6.1 can be employed, while those for faces degrees of freedom can be obtained through similar considerations. In this case, the sequence of approximation spaces is properly defined on the interior mesh element faces, , with a multigrid level. To this end, we define . The prolongation operator is now , defined by


The restriction operator is defined as such that


These definitions can also be extended to operate on vector functions and are assumed to act component-wise.

Applying the same subspace-inheritance idea used for DG, one obtains the coarse space condensed HDG matrix and right hand side through the application of element-interior and face DoF projections,


We point out that this operation involves the application of mixed element-interior and face degrees of freedom Galerkin projections prior to the static condensation of the system. Thus, it comes with an increased operation count compared to DG subspace inheritance. To minimize the number of operations involved in the projection, we introduce an approximate-inherited approach where the coarse space matrices and right-hand sides are obtained by simply applying face projections to the condensed matrices and vectors on the finest space. In other words, we obtain and through


This coarse space matrix and right-hand side will in general differ from those of Equation (43) and (44), as the element-interior operator in the Schur complement term is projected differently.

Similar considerations have to be made for the residual evaluation on the coarse levels, which are required in the projection from level to . The residual on a level should be computed as


and this requires the evaluation of the element-interior degrees of freedom from the face unknowns. This operation would require a back-solve on the coarse levels, which increases the amount of operations within each multigrid cycle. To reduce the operation count as much as possible we again rely on the computation of an approximate projection that is evaluated using the working variable only:


Despite the use of those approximations compared to DG subspace inheritance, such a strategy exhibits very good performance in HDG solutions of the compressible Navier–Stokes equations, as will be demonstrated in the results section.

Similarly to DG, the global assembly of the projection matrix for face degrees of freedom is not strictly required for HDG, since the projection is applied to each block of the matrix of size , with related to face degrees of freedom on level . The projection of each block of the iteration matrix assumes the form




In this case denotes the set of basis functions defined in the element face of order . Thanks to the use of basis functions defined in the reference element of the space discretization, similar considerations to those related to element-interior degrees of freedom projection hold true.

6.3 Preconditioning options and memory footprint

In this work we exploit single-grid preconditioners both for benchmarking and to precondition the smoothers of the multigrid strategy. Two operators will be considered in this work. The first and simplest one will be here labelled as element-wise block-Jacobi (BJ). The BJ preconditioner extracts the block-diagonal portion of the iteration matrix and factorize it, in a local-to-each element fashion, using the PLU factorization. This cheap and memory saving preconditioner becomes more effective as the time step size of the discretization decreases, i.e. the matrix becomes more diagonally dominant and the condition number decreases. In addition, the BJ preconditioner is applied in the same way in serial and parallel computations.

The second preconditioner is the incomplete lower-upper factorization with zero-fill, ILU(0). In particular, the minimum discarded fill reordering proposed in Persson and Peraire (2008) is employed. The algorithm showed to be suited for stiff spatial discretizations, and it allows an in-place factorization Diosady and Darmofal (2009). When it is applied in parallel, the ILU(0) is performed on each square, partition-wise block of the iteration matrix. In this case this preconditioner will be labelled as block-ILU(0) (BILU). As a natural downside, BILU loses the preconditioning efficiency when it is applied in parallel. A variant that compensate this effect is the Additive Schwarz method, which extends the partition-wise block of the Jacobian with a number of overlapping elements between the mesh partition. This algorithm, employed in the context of incompressible Navier–Stokes equations, increases the memory footprint of the solver when a few elements per partition is used Franciolini et al. (2018). We do not use such an approach in the present study of compressible flow.

(a) Full-order basis functions
(b) Tensor-product basis functions
Figure 2: Allocated number of non-zeros (NNZ) non-dimensionalized by the memory allocation of the DG Jacobian matrix. DG matrix-free solvers compared to HDG for different values of polynomial orders and preconditioning type. -multigrid preconditioning (MG) assumed to be that of Table 4.

We now provide estimates of the memory footprint of the solver as well as the computational time spent evaluating the matrix operators. It is worth pointing out that, for DG, the matrix assembly time, as well as its operation count, is a function of the number of non-zeros of the matrix itself, while HDG involves the overhead costs of the static condensation and back solve. Considering a square, two-dimensional, bi-periodic domain made of quadrilateral elements, we obtain the results shown in Figure 

2, where the number of non-zeros (NNZ), non-dimensionalized by the number of nonzeros of the Jacobian arising from the DG discretization, is reported as a function of the number of elements per partition, . It can be observed that:

  1. The memory footprint of DG, matrix-based as well as HDG solvers is always equal to that of a Jacobian matrix, and this value is a function of the polynomial order for HDG. This is due to the fact that the preconditioner is always evaluated using the same memory as the Jacobian matrix.

  2. For a matrix-free, DG discretization, the allocation involves only the preconditioner operator. When BJ is considered, NNZ reduces by 80% with respect to the allocation of a full DG Jacobian. As , the NNZ of the BILU solver approaches that of the element-wise block Jacobi, while for it tends to be that of a Jacobian matrix. This is due to the fact that as the domain is partitioned, the ILU(0) factorization is performed in the squared, partition-wise block of the iteration matrix and therefore, in a matrix-free fashion, the off-partition blocks can be neglected during the assembly phase.

  3. The -multigrid (MG) matrix-free preconditioning approach applied to a DG discretization, here assumed to be that of Table 4, requires a memory footprint in line with that of an element-wise block Jacobi method, as already observed in Franciolini et al. (2018). In fact, when using lower-order polynomial spaces with , the size of those matrices is considerably smaller than that of the finest space since they scale with .

  4. For HDG, only for high-order polynomials is NNZ reduced with respect to the iteration matrix of a DG method. For , a memory footprint in line with that of a BJ, matrix-free approach is observed, while for the memory is considerably larger. It is worth pointing out that the use of full-order basis functions (Figure 2(a)) and tensor-product basis functions (Figure 2(b)) only affects NNZ ratio in the HDG case: in particular, the memory footprint reduction when using tensor-product basis is larger due to the higher amount of element-wise unknowns compared to internal face unknowns.

  5. The NNZ ratio for HDG increases when . The reason for this is that the face-to-elements ratio within the computational mesh of the domain partition increases.

Finally, it is important to remark that for a matrix-free iterative solver employed in DG contexts, it is possible to optimize the matrix assembly evaluation to compute only the blocks required by the preconditioner. For example, for BJ, the evaluation of the off-diagonal blocks of the Jacobian can be neglected, while for MG matrix-free with BJ on the finest space, the off-diagonal blocks could be computed at a reduced polynomial order consistent with that of the coarser spaces.

7 Numerical results on a model test case

We present numerical experiments to assess the performance of the HDG discretizations in comparison to DG. First, NS solutions of a vortex transported by uniform flow at and are reported. The objective is i) to show the convergence rates of the solver both in space and time; ii) to investigate the effects of grid refinement for the approximate-inherited approach proposed for HDG, providing mesh-independent convergence rates; and iii) compare the effects of polynomial order, time step size and space discretization on the parallel performance of the solution strategy.

7.1 Test case description

The test case is a modified version of the VI1 case studied in the 5th International Workshop on High Order CFD Methods 5HH (2018), and consists of a two-dimensional mesh on the domain with periodic boundary conditions on each side. The flow initialisation involves the definition of the following state