A domain-specific language for the hybridization and static condensation of finite element methods

02/01/2018 ∙ by Thomas H. Gibson, et al. ∙ Imperial College London 0

In this paper, we introduce a domain-specific language (DSL) for concisely expressing localized linear algebra on finite element tensors, and its integration within a code-generation framework. This DSL is general enough to facilitate the automatic generation of cell-local linear algebra kernels necessary for the implementation of static condensation methods and local solvers for a variety of problems. We demonstrate its use for the static condensation of continuous Galerkin problems, and systems arising from hybridizing a finite element discretization. Additionally, we demonstrate how this DSL can be used to implement local post-processing techniques to achieve superconvergent approximation to mixed problems. Finally, we show that these hybridization and static condensation procedures can act as effective preconditioners for mixed problems. We use the DSL in this paper to implement high-level preconditioning interfaces for the hybridization of mixed problems, as well generic static condensation. Our implementation builds on the solver composability of the PETSc library by providing reduced operators, which are obtained from locally assembled expressions, with the necessary context to specify full solver configurations on the resulting linear systems. We present some examples for model second-order elliptic problems, including a new hybridization preconditioner for the linearized system in a nonlinear method for a simplified atmospheric model.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 35

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

The development of simulation software is an increasingly important aspect of modern scientific computing. Such software requires a vast range of knowledge spanning several disciplines, ranging from abstract mathematics to high-performance computing and low-level code optimization. Software projects developing automatic code generation systems have become quite popular in recent years, as such systems help create a separation of concerns which focuses on a particular complexity independent from the rest. Examples of such projects include FreeFEM++ (Hecht, 2012), Sundance (Long et al., 2010), the FEniCS Project (Logg et al., 2012), Feel++ (Prud’Homme et al., 2012), and Firedrake (Rathgeber et al., 2016).

The finite element method (FEM) is a mathematically robust framework for computing solutions of partial differential equations (PDEs), with a formulation that is highly amenable to code-generation techniques. A description of the weak formulation of the PDEs, together with appropriate discrete function spaces, is enough to characterize the finite element problem. Both the FEniCS and Firedrake projects employ the Unified Form Language (UFL) (Alnæs et al., 2014) to specify the finite element integral forms and discrete spaces necessary to properly define the finite element problem. UFL is a highly expressive domain-specific language (DSL) embedded in Python, which provides the necessary abstractions for code generation systems.

There are classes of finite element discretizations resulting in discrete systems that can be solved more efficiently by directly manipulating local tensors. For example, the static condensation technique for the reduction of global finite element systems (Guyan, 1965; Irons, 1965)

produces smaller globally-coupled linear systems by eliminating interior unknowns to arrive at an equation for the facet degrees of freedom only. Alternatively, hybridized finite element methods

(Arnold and Brezzi, 1985; Brezzi and Fortin, 2012; Cockburn et al., 2009a) introduce Lagrange multipliers enforcing certain continuity constraints on finite element functions. Static condensation can then be applied to the augmented system to produce a reduced equation for the multipliers. Methods of this type are often accompanied by local post-processing techniques that produce superconvergent approximations (Bramble and Xu, 1989; Cockburn et al., 2010a, 2009b). These procedures often require invasive manual intervention during the equation assembly process in intricate numerical code.

In this paper, we provide a simple yet effective high-level abstraction for localized dense linear algebra on systems derived from finite element problems. Using embedded DSL technology, we provide a means to enable the rapid development of hybridization and static condensation techniques within an automatic code-generation framework. In other words, the main contribution of this paper is in solving the problem of automatically translating from the mathematics of static condensation and hybridization to compiled code. Our work is implemented in the Firedrake finite element framework (Rathgeber et al., 2016) and the PETSc (Balay et al., 1997, 2016) solver library, accessed via petsc4py (Dalcin et al., 2011).

The rest of the paper is organized as follows. We introduce common notation used throughout the paper in Section 1.1. The embedded DSL, Slate, is introduced in Section 2, which allows concise expression of localized linear algebra operations on finite element tensors. We provide some contextual examples for static condensation and hybridization in Section 3, including a discussion on post-processing. We then outline in Section 4 how, by interpreting static condensation techniques as a preconditioner, we can use Slate to automate solution techniques employing hybridization and static condensation. We demonstrate our implementations for manufactured test problems in Section 5. Section 5.3 illustrates the composability of our implementation of a hybridized mixed method as a preconditioner for the resulting linearized system of a semi-implicit discretization of the rotating shallow water equations. Conclusions follow in Section 6.

1.1. Notation

Let denote a tessellation of , the computational domain, consisting of polygonal elements associated with a mesh size parameter , and the set of facets of . The set of facets interior to the domain is denoted by . Similarly, we denote the set of exterior facets as simply . For brevity, we denote the finite element integral forms over and any facet set by

(1)
(2)

where

should be interpreted as standard multiplication for scalar functions or a dot product for vector functions.

For any double-valued vector field on a facet , we define the jump of its normal component across by

(3)

where and denotes the positive and negative sides of the facet respectively. Here, and are the unit normal vectors with respect to the positive and negative sides of the facet . Whenever the facet domain is clear by the context, we omit the subscripts for brevity and simply write .

2. A system for localized algebra on finite element tensors

We present an expressive language for dense linear algebra on the elemental matrix systems arising from finite element problems. The language, which we call Slate, inherits typical mathematical operations performed on matrices and vectors, hence the input syntax is comparable to high-level linear algebra software such as MATLAB. The Slate language provides basic abstract building blocks which can be used by a specialized compiler for linear algebra to generate low-level code implementations.

Slate is heavily influenced by the Unified Form Language (UFL) (Alnæs et al., 2014; Logg et al., 2012), a DSL embedded in Python which provides symbolic representations of finite element forms. The expressions can be compiled by a form compiler, which translates UFL into low level code for the local assembly of a form over the cells and facets of a mesh. In a similar manner, Slate expressions are compiled to low level code that performs the requested linear algebra element-wise on a mesh.

2.1. An overview of Slate

To clarify conventions and the scope of Slate, we start by considering a general form. Suppose we have a finite element form:

(4)

where and denote appropriate integration measures. The integral form in (4) is uniquely determined by its lists (possibly of 0-length) of arbitrary coefficient functions in the associated finite element spaces, arguments describing any test or trial functions, and its integrand expressions for each integral type: , , . The form describes a finite element form globally over the entire problem domain. The contribution of (4) in each cell of the mesh is simply

(5)

We call (5) the cell-local contribution of . Equation (5) produces an element tensor which is mapped into a global data structure. However, one may want to produce a new local tensor by algebraically manipulating different element tensors. This is precisely the job of Slate.

The Slate language consists of two primary abstractions for linear algebra: (1) terminal element tensors corresponding to multi-linear integral forms, or assembled data; and (2) expressions consisting of operations on terminal tensors or existing Slate expressions.

Terminal tensors:

In Slate, one associates a tensor with data on an element either by using a form, or assembled coefficient data:

  • Tensor()
    associates a form, expressed in UFL, with its local element tensor:

    (6)

    The number of arguments determine the rank of Tensor, i.e. scalars, vectors, and matrices are produced from 0-forms, 1-forms, and 2-forms111As with UFL, Slate is capable of abstractly representing arbitrary rank tensors. However, only rank tensors are typically used in most finite element applications and therefore we currently only generate code for those ranks. respectively.

  • AssembledVector()
    where is some finite element function. The result associates a function with its local coefficient vectors.

Symbolic linear algebra:

Slate supports typical linear algebra operations with a high-level syntax close to mathematics:

  • A + B, the addition of two equal shaped tensors.

  • A * B, a contraction over the last index of A and the first index of B.

  • -A, the additive inverse (negative) of a tensor.

  • A.T, the transpose of a tensor.

  • A.inv, the inverse of a square tensor.

  • A.solve(B, decomposition="..."), the result of solving for , optionally specifying a factorization strategy.

  • A.blocks[indices], where A is a tensor from a mixed finite element space, allows extraction of subblocks of the tensor indexed by field (slices are allowed). For example, if a matrix corresponds to the bilinear form , where and are product spaces consisting of finite element spaces , , then the cell-local tensors have the form:

    (7)

    The associated submatrix of (7) with indices , , , is

    (8)

    where , .

These building blocks may be arbitrarily composed, giving us the necessary algebraic framework for a large class of problems, some of which we present in this paper.

Slate expressions are handled by a linear algebra compiler, which employs TSFC to compile kernels for the assembly of terminal tensors and generates a dense linear algebra kernel to be iterated cell-wise. Our compiler generates C++ code, using the templated library Eigen (Guennebaud et al., 2015) for dense linear algebra. During execution, the local computations in each cell are mapped into global data objects via appropriate indirection mappings. Figure 1 provides an illustration of the complete tool-chain.

Figure 1. The Slate language wraps forms, expressed in UFL, describing the finite element problem. The Slate expressions are handed over to a specialized linear algebra compiler, which produces a single “macro” kernel which assembles the local element contributions, and then performs the dense linear algebra. The resulting kernels are passed to the PyOP2 interface, which wraps the Slate kernel in a mesh-iteration kernel. Parallel scheduling and code generation occurs after the PyOP2 layer.

3. Examples

We now present a few examples and discuss solution methods which require element-wise manipulations of finite element systems and their specification in Slate. We stress here that Slate is not limited to these model problems; rather these examples were chosen for clarity and to demonstrate key features of the Slate language. In Sections 4 and 5, we discuss more intricate ways the Slate DSL is used in custom preconditioners for linear systems.

3.1. Hybridized discontinuous Galerkin methods

For our model problem, consider the second-order elliptic PDE:

(9)
(10)
(11)

where and , are positive-valued coefficients. Rewriting as a first order system, we have the following mixed problem:

(12)
(13)
(14)
(15)

where and .

The hybridized discontinuous Galerkin (HDG) method is a natural extension of discontinuous Galerkin (DG) discretizations. Here, we consider a specific HDG discretization, LDG-H (Cockburn et al., 2010a). Other forms of HDG that involve local lifting operators can also be implemented in this software framework by the introduction of additional local (i.e., discontinuous) variables in the definition of the local solver.

To construct the LDG-H discretization, we define the DG numerical fluxes and to be functions of the trial unknowns and a new independent unknown in the trace space:

(16)

where denotes a polynomial space defined on each facet . The numerical fluxes are:

(17)
(18)

where is a function approximating the trace of on and is a positive function that may vary on each facet . The full LDG-H formulation reads as follows. Find such that

(19)
(20)
(21)
(22)

Equation (21) enforces continuity of the numerical flux on , which in turn produces a flux that is single-valued on the facets. Note that the choice of has a significant influence on the expected convergence rates of the computed solutions. We use this subtle dependency to verify our implementation in Section 5.1.

The matrix system arising from (19)–(22) has the general form:

(23)

The LDG-H system has more degrees of freedom than its DG counterpart, but its matrix-form exhibits some immediate advantages.

  • By our choice of function spaces, the operator coupling and is block-sparse. We can therefore simultaneously eliminate the coupled unknowns by performing element-wise static condensation to (23), producing a significantly smaller problem for only:

    (24)

    where and are given by

    (25)
    (26)
  • The matrix is sparse, symmetric, and positive-definite (Cockburn et al., 2009a). Furthermore, is a discrete elliptic operator and therefore we can solve (24) using existing techniques for global elliptic equations.

  • Once is computed, both and can be recovered locally in each element. This can be accomplished in two stages. First we compute by solving:

    (27)

    followed by:

    (28)
  • The solution approximations can further be improved via local post-processing. We highlight a few procedures in Section 3.3.

Listing 1 displays the corresponding Slate code for assembling the trace system, solving (24), and recovering the eliminated unknowns.

1# Element tensors defining the local 3-by-3 block system
2_A = Tensor(a)
3_F = Tensor(L)
4
5# Extracting blocks to form the reduced system
6A = _A.blocks
7F = _F.blocks
8S = A[2, 2] - A[2, :2] * A[:2, :2].inv * A[:2, 2]
9E = F[2] - A[2, :2] * A[:2, :2].inv * F[:2]
10
11# Assemble and solve: 
12Smat = assemble(S, bcs=[...])
13Evec = assemble(E)
14lambda_h = Function(M)
15solve(Smat, lambda_h, Evec)
16
17p_h = Function(V)                              # Function to store the result: 
18u_h = Function(U)                              # Function to store the result: 
19
20# Intermediate expressions
21Sd = A[1, 1] - A[1, 0] * A[0, 0].inv * A[0, 1]
22Sl = A[1, 2] - A[1, 0] * A[0, 0].inv * A[0, 2]
23Lambda = AssembledVector(lambda_h)             # Coefficient vector for 
24P = AssembledVector(p_h)                       # Coefficient vector for 
25
26# Local solve expressions for  and 
27p_sys = Sd.solve(F[1] - A[1, 0] * A[0, 0].inv * F[0] - Sl * Lambda,
28                decomposition="PartialPivLu")
29u_sys = A[0, 0].solve(F[0] - A[0, 1] * P - A[0, 2] * Lambda,
30                     decomposition="PartialPivLu")
31assemble(p_sys, p_h)
32assemble(u_sys, u_h)
Listing 1: Slate code for solving (23), given UFL expressions a, L for (19)–(21). Arguments of the mixed space are indexed by 0, 1, and 2 respectively. Lines 1 and 1 correspond with (25) and (26) for building the trace system. Any vanishing conditions on the trace variables should be provided as boundary conditions during operator assembly. Lines 1 and 1 correspond to the expressions given in (27) and (28) (using LU).

3.2. Hybridized mixed methods

To motivate the hybridization of mixed methods, we start by recalling the mixed method for the first order system (12)–(15). Standard methods seek a solution in the finite dimensional spaces:

(29)
(30)

respectively. The space consists of -conforming piecewise vector polynomials, where choices of typically include the Raviart-Thomas (RT), Brezzi-Douglas-Marini (BDM), or Brezzi-Douglas-Fortin-Marini (BDFM) elements (Brezzi et al., 1987, 1985; Nédélec, 1980; Raviart and Thomas, 1977).

The mixed finite element formulation of (12)–(15) reads as follows: find satisfying

(31)
(32)

where is the space of functions in whose normal components vanish on . With and denoting the vector of degrees of freedom for and respectively, computing the solution of (31)–(32) requires solving the saddle point system:

(33)

Methods to efficiently invert such systems include -multigrid (Arnold et al., 2000) (requiring complex overlapping-Schwarz smoothers), global Schur complement factorizations (which require an approximation to the inverse of the elliptic Schur complement operator), or auxiliary space multigrid (Hiptmair and Xu, 2007). Here, we focus on a solution approach using a hybridized mixed method (Arnold and Brezzi, 1985; Brezzi and Fortin, 2012).

For hybridized mixed methods, the discrete solution spaces are for and for , where the superscript denotes a discontinuous approximation to in the space

(34)

The vector finite element space is a subspace of consisting of local functions, but normal components are no longer continuous on . Lagrange multipliers enforcing normal continuity are introduced as an auxiliary variable in the trace space .

Deriving the hybridized mixed system is accomplished through integration by parts over each element . Testing with and integrating (12) over produces:

(35)

The trace function is introduced in surface integrals approximating on elemental boundaries. An additional constraint equation is added to close the system. The global hybrid-mixed formulation reads: find such that

(36)
(37)
(38)

where denotes the space of traces vanishing on .

Arising from the hybridized-mixed formulation, we have the augmented system of linear equations with the general form:

(39)

where is a mixed vector of the form , and , are the vectors of degrees of freedom for the flux, scalar, and trace unknowns. Equation (39) defines a 33 block system which is block-sparse by our choices of , , and . If the space of Lagrange multipliers is chosen appropriately, then the flux , albeit sought a priori in a discontinuous space, will coincide with its -conforming counterpart . The constraint in (38) enforces both continuity of the normal components of across elemental boundaries, as well as the Neumann condition on . As a result, the formulations in (36)–(37) and (31)–(32) are solving equivalent problems (Arnold and Brezzi, 1985).

The system in (39) defines a three-field problem similar to that of the HDG method in (23), and hence possesses the same properties previously discussed. The solution approach is nearly identical, including the Slate code. A few simplifications are present in the expressions for building the trace system and local solvers (). The resulting matrix for the -system also defines an elliptic operator. For the interested reader, a unifying framework for the hybridization of mixed and DG methods is detailed by Cockburn et al. (2009a).

3.3. Local post-processing

Local post-processing techniques for the construction of superconvergent approximations were discussed within the context of mixed methods (Arnold and Brezzi, 1985; Bramble and Xu, 1989; Stenberg, 1991), and discontinuous Galerkin methods (Cockburn et al., 2010a, 2009b). Here, we present two post-processing techniques for producing local solutions of higher approximation order. The Slate code follows naturally from previous discussions in Sections 3.1 and 3.2, using the standard set of operations on local tensors.

3.3.1. Post-processing of the scalar solution

There are number of post-processing techniques for enhancing the accuracy of the scalar approximation of the hybridized mixed method and its DG variant. We present a modified version of the procedure presented by Stenberg (1991), and highlighted within the context of hybridizing eigenproblems by Cockburn et al. (2010b).

Let denote a polynomial space of degree on an element . Then for a given pair of computed solutions of the hybridized methods, we define the post-processed scalar as the unique solution of the local problem:

(40)
(41)

where . Here, the space denotes the -orthogonal complement of . This post-processing method directly uses the definition of the flux to construct the local problem above. In practice, the space may be constructed using an orthogonal hierarchical basis, and solving (40)–(41) amounts to inverting a local symmetric positive definite system.

At the time of this work, Firedrake does not support the construction of such a finite element basis. However, we can introduce Lagrange multipliers to enforce the orthogonality constraint. The resulting local problem then becomes the following mixed system: find such that

(42)
(43)

where . The local problems (42)–(43) and (40)–(41) are equivalent, with the Lagrange multiplier enforcing orthogonality of test functions in with functions in .

This post-processing method produces a new approximation which superconverges at a rate of for hybridized mixed methods (Arnold and Brezzi, 1985; Cockburn et al., 2010b; Stenberg, 1991). For the LDG-H method, superconvergence is achieved when and , but only convergence is achieved when (Cockburn et al., 2010a, 2009b). We use this dependency in the convergence rates to verify our software implementation in Section 5.1.

3.3.2. Post-processing of the flux

To post-process the flux in the LDG-H method, we use the numerical trace from (17). The technique we outline here follows that of Cockburn et al. (2009b), and produces a new flux with better conservation properties.

Let be a mesh consisting of simplices. On each element , we define a new function to be the unique element of the local Raviart-Thomas space satisfying

(44)
(45)

This local problem produces a new flux with the following properties:

  1. converges at the same rate as for all choices of producing a solvable system for (19)–(21). However,

  2. . That is,

    (46)
  3. Furthermore, the divergence of convergences at a rate of .

4. Static condensation as a preconditioner

Slate enables static condensation approaches to be expressed very concisely. Nonetheless, application of a particular approach to different variational problems using Slate still produces a certain amount of code repetition. By formulating each form of static condensation as a preconditioner, code can be written once and then applied to any mathematically suitable problem. Rather than writing the static condensation by hand, in many cases, it is sufficient to just select the appropriate, Slate-based, preconditioner.

To understand how we can view static condensation as a preconditioner, it is helpful to frame the problem in the particular context of the solver library. Firedrake uses PETSc to provide linear solvers, and we implement our preconditioners as PETSc PC objects. These are defined to act on the problem residual, and return a correction to the solution. We can think of (left) preconditioning the matrix equation in residual form:

(47)

by an operator (which may not necessarily be linear) as a transformation into an equivalent system of the form

(48)

Given a current iterate the residual at the -th iteration is simply , and acts on the residual to produce an approximation to the error . If is an application of an exact inverse, the residual is converted into an exact (up to numerical round-off) error.

We will denote the application of particular Krylov method for the linear system (47) as . Upon preconditioning the system via as in (48), we write

(49)

If (49) is solved directly via the application of , then . So, we have that produces the exact solution of (47) in a single iteration of .

4.1. Interfacing with PETSc via custom preconditioners

The implementation of preconditioners for these systems requires manipulation not of assembled matrices, but rather their symbolic representation. To do this, we use the preconditioning infrastructure developed by Kirby and Mitchell (2018), which gives preconditioners written in Python access to the symbolic problem description. We manipulate this appropriately and provide operators assembled from Slate expressions to PETSc for further algebraic preconditioning. Using this technique, we have developed a static condensation interface for the hybridization of mixed problems, and a generic interface for hybridized systems (both hybridized mixed and HDG methods). The advantage of writing even the latter as a preconditioner is the ability to switch out the solution scheme for the system, even when nested inside a larger set of coupled equations.

4.1.1. A static condensation interface for hybridization

One of the main advantages of using a hybridized variant of a DG or mixed method is that such systems permit the use of element-wise condensation and recovery. To facilitate this, we provide a PETSc PC static condensation interface, HybridSCPC. This preconditioner takes the discretized hybridized system as in (23), and performs the local elimination and recovery procedures.

More precisely, the incoming system has the form:

(50)

where . Note that in many applications, , however our implementation does not assume symmetry of the off-diagonal operators. Equation (50) defines a 33 block system of equations. In exact arithmetic, the HybridSCPC preconditioner applies the inverse of the Schur complement factorization:

(51)

where is the Schur complement operator for the system. The only globally coupled system that needs an iterative solver is the reduced problem for the Lagrange multipliers:

(52)

where is the trace right-hand side, and is another possible choice of preconditioner. Once is computed, the flux and scalar fields are reconstructed element-wise via inverting the local systems in (27) and (28).

4.1.2. Preconditioning mixed methods via hybridization

The preconditioner HybridizationPC expands on the previous one, this time taking an system and automatically forming the hybridized problem. This is accomplished through manipulating the UFL objects representing the discretized PDE. This includes replacing argument spaces with their discontinuous counterparts, introducing test functions on an appropriate trace space, and providing operators assembled from Slate expressions.

More precisely, let , where , , and and are the flux and scalar unknowns respectively, be the mixed saddle point problem. Then this preconditioner replaces with the augmented system:

(53)

where , , are the right-hand sides for the flux and scalar equations respectively, and indicates modified matrices and co-vectors with discontinuous functions. Here, are the new discontinuous unknowns.

The preconditioning operator for the hybrid-mixed system (53) has the form:

(54)

where is the Schur complement matrix . As before, a single globally coupled system for is required. The recovery of and happens in the same manner as the HybridSCPC.

Since the flux is constructed in a discontinuous space , we must project the computed solution into . This can be done cheaply via local facet averaging. The resulting solution is then updated via , where is the projection mapping. This ensures the residual for the original mixed problem is properly evaluated to test for convergence. With as in (54), the preconditioning operator for the original system is:

(55)

We note here that assembly of the right-hand for the system requires special attention. The situation we are given is that we have for , but require for . For consistency, we also require for any that

(56)

We can construct such a satisfying (56) using the local definition:

(57)

where is the number of cells that the degree of freedom corresponding to the basis function touches. For -elements, for facet nodes, and 1 otherwise. By construction of the space , we have for :

(58)

where , and are functions corresponding to the positive and negative restrictions associated with the -th facet node222 These are the two “broken” parts of on a particular facet connecting two elements. That is, for two adjacent cells, a basis function in for a particular facet node can be decomposed into two basis functions in defined on their respective sides of the facet.. Using (57) and (58), we can verify that our construction of satisfies (56).

5. Numerical results

We now present results utilizing the Slate DSL and our static condensation interfaces for a set of test problems. All parallel results were obtained on a single fully-loaded compute node of dual-socket Intel E5-2630v4 (Xeon) processors with cores (2 threads per core) running at 2.2GHz. In order to avoid potential memory effects due to the operating system migrating processes between sockets, we pin MPI processes to cores.

5.1. Hybridized methods and effects of post-processing

To verify our computed results, we now perform a simple convergence study for a model Dirichlet problem. We seek a solution to the Poisson equation as a first-order system:

(59)

where and are chosen so that the analytic solution is the sinusoid and its negative gradient. We solve this problem by hybridizing the mixed formulation of (59), and employ our static condensation preconditioner described in Section 4.1.1. All results were obtained in serial, with MUMPS providing the LU factorization algorithms for the condensed trace system (Amestoy et al., 2000).

Each mesh in our convergence study is obtained by generating a quadrilateral mesh with cells in each spatial direction, and dividing each quadrilateral cell into two equal simplicial elements. Once the solutions are obtained, we compute a post-processed scalar solution using the method described in (42)–(43) via Slate-generated kernels. Figure 2 displays the results for the hybridized RT method. Our computations are in full agreement with the theory.

Figure 2. Error convergence rates for our implementation of the hybridized RT method of orders 0, 1, 2, and 3. We observe the expected rates for the scalar and flux solutions of the standard RT method: in the -error for both the scalar and flux approximations. Additionally, we see the effects of post-processing the scalar solution, yielding superconvergent rates.

We repeat this experiment for the LDG-H method with varying choices of in order to verify how influences the convergence rates, comparing with the expected rates for the LDG-H method given a particular order of (see Table 1 for a summary). In all our experiments, we use the post-processing methods described in Section 3.3 to produce approximations and . Error convergence plots from our tests are shown in Figure 3 that confirm the expected rates. This rather sensitive test verifies that our software framework is generating correct code.

parameter expected rates of convergence ()
Table 1. The expected convergence rates of the LDG-H method with a stability parameter of a particular order.
Figure 3. Error convergence rates for our implementation of the LDG-H method with and . The expected sensitivity of this discretization subject to appropriate choices of stabilization parameter is verified. We see no change in the convergence rates between the scalar and post-processed scalar solutions when . Superconvergence is achieved when taking . The post-processed flux rates in both cases match the rates of the unprocessed flux.

5.2. Relative performance of the HDG method

Having verified our generated code for the HDG method in the previous section, we now analyze the quality of said code for a model three-dimensional elliptic system:

(60)

where and are chosen such that the analytic solution is . We use a regular mesh consisting tetrahedral elements (). At the time of this work, Firedrake does not support sum-factorized algorithms on simplices. We therefore restrict ourselves to “low” polynomial order, only considering approximation degrees for the HDG method. Additionally, we compute a post-processed scalar approximation of the HDG solution. In all numerical studies here, we set the HDG parameter . All results were computed in parallel, utilizing a single compute node (described previously).

A continuous Galerkin (CG) discretization serves as a reference for this experiment. Due to the superconvergence in the post-processed solution for the HDG method, we use CG discretizations of polynomial order 2, 3, and 4. This takes into account the enhanced accuracy of the HDG solution, despite being initially computed as a lower-order approximation. We therefore expect both methods to produce equally accurate solutions to the model problem.

Our aim here is not to compare the performance of HDG and CG, which has been investigated elsewhere (for example, see Kirby et al. (2012); Yakovlev et al. (2016)). Instead, we provide a reference that the reader might be more familiar with in order to evaluate whether our software framework produces a sufficiently performant HDG implementation relative to what might be expected.

For the CG discretization, we use a matrix-explicit iterative solver consisting of the conjugate gradient method preconditioned with hypre’s boomerAMG implementation of algebraic multigrid (AMG) (Falgout et al., 2006). We use the preconditioner described in Section 4.1.1 to statically condense the LDG-H system, using the same iterative solver as the CG method for the Lagrange multipliers. This is sensible, as the global trace operator defines a symmetric positive-definite operator. To avoid over-solving, we iterate to a relative tolerance such that the discretization error is minimal for a given mesh.

5.2.1. Error versus execution time

The total execution time is recorded for the CG and HDG solvers, which includes the setup time for the AMG preconditioner, matrix-assembly, and the time-to-solution for the Krylov method. In the HDG case, we include the time spent building the Schur-complement for the traces, local recovery of the scalar and flux approximations, and post-processing. The -error against execution time is summarized in Figure 4.

Figure 4. Error against execution time for the CG and HDG (, with post-processing) methods.

The HDG method of order () with post-processing, as expected, produces a solution which is as accurate as the CG method of order (). While the full HDG system is never explicitly assembled, the larger execution time is a result of several factors. The total number of trace unknowns for the , , and discretizations is roughly four, three, and two times larger (resp.) than the corresponding number of CG unknowns. Therefore, each iteration is more expensive. Moreover, we also observe that the trace system requires more Krylov iterations to reach discretization error. The gap in total number of iterations starts to close as the approximation degree increases (see Figure 5). The extra cost of HDG due to the larger degree-of-freedom count and the need to perform local tensor inversion is offset by the local conservation and stabilization properties which are useful for fluid dynamics applications, for example.

Figure 5. Krylov iterations of the AMG-preconditioned conjugate gradient algorithm (to reach discretization error) against number of cells.

5.2.2. Break down of solver time

The HDG method introduces far more additional degrees of freedom than CG or native DG methods. This is largely due to the fact that the HDG method simultaneously approximates the primal solution and its flux. The global matrix for the traces is larger than the one for the CG system at low polynomial order. The execution time for HDG is then compounded by a more expensive global solve. We remind the reader that our goal here is not to compare CG and HDG but to verify that the solve time for HDG is as might be expected given the problem size.

Figure 6. Break down of the and execution times on a simplicial mesh.
Stage
(s) % (s) % (s) %
Matrix assembly (static cond.) 1.05 7.47 % 6.97 11.17 % 31.53 10.51 %
Forward elimination 0.82 5.88 % 6.31 10.11 % 31.74 10.58 %
Trace solve 10.74 76.68 % 39.83 63.84 % 185.18 61.73 %
Back substitution 1.13 8.08 % 8.33 13.35 % 44.86 14.96 %
Post processing 0.26 1.89 % 0.96 1.54 % 6.68 2.23 %
HDG Total 14.00 62.39 299.98
(s) % (s) % (s) %
Matrix assembly (monolithic) 0.50 11.82 % 1.89 8.08 % 10.77 11.93 %
Solve 3.73 88.18 % 21.56 91.92 % 79.50 88.07 %
CG Total 4.23 23.46 90.27
Table 2. Breakdown of the raw timings for the () and methods, , , and . Each method corresponds to a mesh size on a fully-loaded compute node.

Figure 6 displays the execution times on a simplicial mesh consisting of million elements. The execution times have been normalized by the CG total time, showing that the HDG method is roughly 3 times the execution time of the CG method. This is expected given the larger degree-of-freedom count. The raw numerical breakdown of the HDG and CG solvers are shown in Table 2. We isolate each component of the HDG method contributing to the total execution time. Local operations include static condensation (trace operator assembly), forward elimination (right-hand side assembly for the trace system), backwards substitution to recover the scalar and flux unknowns, and local post-processing of the primal solution. For all , our HDG implementation is solver-dominated, indicating that our software framework is producing a relatively performant implementation.

Both operator and right-hand side assembly are dominated by the costs of inverting a local square mixed matrix coupling the primal and dual variables, which is performed directly via an LU factorization. They should therefore be of the same magnitude in time spent. We observe that this is the case across all degrees (ranging between approximately 6—11% of total execution time for both processes). We observe that the cost of HDG assembly takes approximately the same percentage of time as the CG method.

Back-substitution takes slightly more time (between 8—15% of execution time across all ). This is a result of splitting the reconstruction stage into two local solves: one pass to reconstruct , followed by an additional pass to reconstruct . A further performance optimization could be to simultaneously solve for both and in each element. However, at the time of this work, the PyOP2 layer of Firedrake does not support direct insertion into mixed data structures. Finally, the additional cost of post-processing accrues negligible time (roughly 2% of execution time across all degrees).

We note that caching of local tensors does not occur. Each pass to perform the local eliminations and backwards reconstructions rebuilds the local element tensors. It is not clear at this time whether the performance gained from avoiding rebuilding the local operators will offset the memory costs of storing the local matrices. Moreover, in time-dependent problems where the operators contain state-dependent variables, rebuilding local matrices will be necessary in each time-step regardless.

5.3. A hybridized finite element solver for the shallow water equations

A primary motivator for our interest in hybridized methods revolves around developing efficient solvers for problems in geophysical flows. In this last section, we present some results integrating the shallow water equations on the sphere using test case 5 (flow past a mountain) from (Williamson et al., 1992). We use the framework of compatible finite elements (Cotter and Shipton, 2012; Cotter and Thuburn, 2014).

5.3.1. Semi-implicit solver

We start with the vector-invariant rotating nonlinear shallow water system defined on a two-dimensional spherical surface embedded in :

(61)
(62)

where is the fluid velocity, is the depth field, is the Coriolis parameter, is the acceleration due to gravity, is the bottom topography, and , with being the unit normal to the surface .

After discretising in space and time using a semi-implicit scheme and Picard linearization, following Natale and Cotter (2017), we must solve an indefinite saddle point system at each step:

(63)

In staggered finite difference models, the standard approach for solving (63) is to neglect the Coriolis term and eliminate the velocity unknowns to obtain a discrete elliptic problem for which smoothers like Richardson or relaxation methods are convergent. This is more problematic in the compatible finite element framework, since is not diagonal. Instead, we use the preconditioner described in Section 4.1.2 to form the hybridized problem and eliminate both and locally.

5.3.2. Atmospheric flow over a mountain

As a test problem, we solve test case 5 of Williamson et al. (1992), on the surface of an Earth-sized sphere. We refer the reader to Cotter and Shipton (2012); Shipton and Cotter (2017) for a more comprehensive study on mixed finite elements for shallow water systems of this type.

We use the mixed finite element pairs (lowest-order RT method) and (next-to-lowest order BDM method) for the velocity and depth spaces. The sphere mesh is generated from 8 refinements of an octahedron, resulting in a triangulation consisting of 524,288 elements. The grid information, including time-step size, and degree of freedom count is summarized in Table 3.

Grid information
Refinements Number of (s)
cells (km) (km)
8 524,288 28.396 33.052 28.125
Discretization properties
Mixed method Velocity Depth Total
unknowns unknowns
786,432 524,288 1,310,720
3,932,160 1,572,864 5,505,024
Table 3. Grid properties for the isolated mountain test case. The implicit Courant number corresponding to the choice of is approximately . The number of unknowns to be determined are summarized for each mixed method (not hybridized).

We run for a total of 100 time-steps, with a fixed number of 4 Picard iterations in each time-step. We compare the overall simulation time using two different solver configurations for the implicit linear system. First, we use an approximate Schur complement preconditioner for GMRES of the form:

(64)

where , and is a diagonal approximation to the velocity mass matrix. The approximate Schur complement is inverted using conjugate gradients preconditioned by PETSc’s smoothed aggregation multigrid (GAMG). The Krylov method is set to terminate once the preconditioned residual norm is reduced by a factor of . is computed approximately using a single application of incomplete LU (zero fill-in).

Next, we use a preonly application of our hybridization preconditioner, which replaces the original linearized mixed system with its hybrid-mixed equivalent. After hybridization, we have the problem: find