Slate: extending Firedrake's domain-specific abstraction to hybridized solvers for geoscience and beyond

02/01/2018 ∙ by Thomas H. Gibson et al. ∙ Imperial College London

Within the finite element community, discontinuous Galerkin (DG) and mixed finite element methods have become increasingly popular in simulating geophysical flows. However, robust and efficient solvers for the saddle-point and elliptic systems arising from these discretizations remain an ongoing challenge. One possible approach for addressing this issue is to employ a method known as hybridization, where the discrete equations are transformed such that classic static condensation and local post-processing methods can be employed. However, it is challenging to implement hybridization as performant parallel code within complex models, whilst maintaining separation of concerns between applications scientists and software experts. In this paper, we introduce a domain-specific abstraction within the Firedrake finite element library that permits the rapid execution of these hybridization techniques within a code-generating framework. The resulting framework composes naturally with Firedrake's solver environment, allowing for the implementation of hybridization and static condensation as runtime-configurable preconditioners via the Python interface to PETSc, petsc4py. We provide examples derived from second-order elliptic problems and geophysical fluid dynamics. In addition, we demonstrate that hybridization shows great promise for improving the performance of solvers for mixed finite element discretizations of equations related to large-scale geophysical flows.


1 A system for localized algebra on finite element tensors

We present an expressive language for dense linear algebra on the elemental matrix systems arising from finite element problems. The language, which we call Slate, provides the typical mathematical operations performed on matrices and vectors, so its input syntax is comparable to high-level linear algebra software such as MATLAB. Slate provides basic abstract building blocks which a specialized linear algebra compiler uses to generate low-level code implementations.

Slate is heavily influenced by the Unified Form Language (UFL) (Alnæs et al., 2014; Logg et al., 2012), a DSL embedded in Python which provides symbolic representations of finite element forms. UFL expressions are compiled by a form compiler, which translates them into low-level code for the local assembly of a form over the cells and facets of a mesh. In a similar manner, Slate expressions are compiled to low-level code that performs the requested linear algebra element-wise on a mesh.

1.1 An overview of Slate

To clarify conventions and the scope of Slate, we start by considering a general finite element form:

(4)

where dx, ds, and dS denote the cell, exterior facet, and interior facet integration measures respectively. The integral form in (4) is uniquely determined by its lists (possibly of zero length) of arbitrary coefficient functions in the associated finite element spaces, its arguments describing any test or trial functions, and its integrand expressions for each integral type: cell, exterior facet, and interior facet. The form describes a finite element form globally over the entire problem domain. Here, we consider the case where the interior facet integrand can be decomposed into two independent parts on each facet: one for the positive restriction ("+") and one for the negative restriction ("−").

The contribution of (4) in each cell of the mesh is simply

(5)

We call (5) the cell-local contribution of the global form, with

(6)

Equation (5) produces an element tensor which is mapped into a global data structure. However, before doing so, one may want to produce a new local tensor by algebraically manipulating different element tensors. This is precisely the job of Slate.
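To fix ideas, the following display sketches such a global form and its cell-local contribution; this is our own illustrative notation (the integrand symbols and facet-set names are ours, introduced only for this sketch):

    M(c; v) = \sum_{K \in \mathcal{T}_h} \int_K I_{cell} \, dx
            + \sum_{e \in \Gamma_{ext}} \int_e I_{ext} \, ds
            + \sum_{e \in \Gamma_{int}} \int_e I_{int} \, dS,

    M_K(c; v) = \int_K I_{cell} \, dx
              + \int_{\partial K \cap \Gamma_{ext}} I_{ext} \, ds
              + \int_{\partial K \cap \Gamma_{int}} I_{int} \, dS,

where \mathcal{T}_h is the mesh, \Gamma_{ext} the set of exterior facets, and \Gamma_{int} the set of interior facets.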

The Slate language consists of two primary abstractions for linear algebra:

  1. terminal element tensors corresponding to multi-linear integral forms (matrices, vectors, and scalars), or assembled data (coefficient vectors); and

  2. expressions consisting of operations on terminal tensors.

The composition of binary and unary operations on terminal tensors produces a Slate expression. Such expressions can be composed with other Slate objects in arbitrary ways, resulting in concise representations of complex operations on locally assembled arrays.

Terminal tensors:

In Slate, one associates a tensor with data on an element either by using a form or by supplying assembled coefficient data:

  • Tensor(form)
    associates a form, expressed in UFL, with its local element tensor:

    (7)

    The number of arguments determines the rank of the Tensor, i.e. scalars, vectors, and matrices are produced from 0-forms, 1-forms, and 2-forms respectively. (As with UFL, Slate is capable of abstractly representing tensors of arbitrary rank; however, only tensors of rank ≤ 2 are typically used in finite element applications, and we therefore currently only generate code for those ranks.)

  • AssembledVector(f)
    where f is some finite element function. The result associates the function with its local coefficient vectors. A minimal sketch of both terminal types follows.
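For illustration (the mesh, space, and data here are hypothetical, chosen only to make the snippet self-contained):

from firedrake import *

mesh = UnitIntervalMesh(4)
V = FunctionSpace(mesh, "DG", 1)
u, v = TrialFunction(V), TestFunction(V)

M = Tensor(inner(u, v) * dx)        # rank-2 terminal: cell-local mass matrices
b = Tensor(Constant(1.0) * v * dx)  # rank-1 terminal: cell-local load vectors

f = Function(V).assign(2.0)
F = AssembledVector(f)              # coefficient data viewed as cell-local vectors

# Terminals compose symbolically; assembling M * F applies the local mass
# matrices to the coefficient data, cell by cell.
Mf = assemble(M * F)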

Symbolic linear algebra:

Slate supports the typical binary and unary operations of linear algebra, with a high-level syntax close to mathematics. At the time of writing, these include:

  • A + B, the addition of two equal shaped tensors.

  • A * B, a contraction over the last index of A and the first index of B. This is the usual multiplicative operation on matrices, vectors, and scalars.

  • -A, the additive inverse (negation) of a tensor.

  • A.T, the transpose of a tensor.

  • A.inv, the inverse of a square tensor.

  • A.solve(B, decomposition="..."), the result of solving the local linear system AX = B for X, optionally specifying a direct factorization strategy.

  • A.blocks[indices], where A is a tensor on a mixed finite element space, allows the extraction of sub-blocks of the tensor indexed by field (slices are allowed). For example, if a matrix corresponds to a bilinear form whose test and trial functions come from a product of finite element spaces, then the cell-local tensors have the block form:

    (8)

    The sub-matrix of (8) associated with a chosen range of field indices is then

    (9)

    where the indices select the desired row and column fields.

These building blocks may be arbitrarily composed, giving a symbolic expression for the linear algebra we wish to perform on each cell during assembly. They provide the necessary algebraic framework for a large class of problems, some of which we present in this paper.
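As a concrete composition, the following sketch builds a cell-local Schur complement eliminating the first field of a purely illustrative two-field discontinuous problem; the spaces and form are hypothetical:

from firedrake import *

mesh = UnitSquareMesh(8, 8)
W = FunctionSpace(mesh, "DG", 1) * FunctionSpace(mesh, "DG", 0)
u, p = TrialFunctions(W)
v, q = TestFunctions(W)

# A 2-by-2 cell-local block system from an illustrative discontinuous form
A = Tensor((inner(u, v) + inner(p, q) + inner(u, q) + inner(p, v)) * dx)

# Eliminate the first field cell-wise via a local Schur complement
S = A.blocks[1, 1] - A.blocks[1, 0] * A.blocks[0, 0].inv * A.blocks[0, 1]
Smat = assemble(S)  # generates and executes a single element-wise kernel

Because both fields are discontinuous, every operation above is purely local to each cell, which is what makes the element-wise elimination well defined.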

In Firedrake, these Slate expressions are transformed into low-level code by a linear algebra compiler. This uses the form compiler, TSFC (Homolya et al., 2018), to compile kernels for the assembly of terminal tensors and generates a dense linear algebra kernel to be iterated cell-wise. At the time of this work, our compiler generates C++ code, using the templated library Eigen (Guennebaud et al., 2015) for dense linear algebra. During execution, the local computations in each cell are mapped into global data objects via appropriate indirection mappings using the PyOP2 framework (Rathgeber et al., 2012). Figure 1 provides an illustration of the complete tool-chain.

Figure 1: The Slate language wraps UFL objects describing the finite element system. The resulting Slate expressions are passed to a specialized linear algebra compiler, which produces a single “macro” kernel assembling the local contributions and executing the dense linear algebra represented in Slate. The kernels are passed to Firedrake's PyOP2 interface, which wraps the Slate kernel in a mesh-iteration kernel. Parallel scheduling, code generation, and compilation occur after the PyOP2 layer.

2 Examples

We now present a few examples and discuss solution methods which require element-wise manipulations of finite element systems, together with their specification in Slate. We stress here that Slate is not limited to these model problems; rather, these examples were chosen for clarity and to demonstrate key features of the Slate language. In Sections 3 and 4, we discuss more intricate ways Slate is used in custom preconditioners.

For the hybridization of mixed and discontinuous Galerkin methods, we use a model elliptic equation. Consider the second-order PDE with both Dirichlet and Neumann boundary conditions:

(10)
(11)
(12)

where the coefficients are positive-valued. Rewriting as a first-order system, we obtain the mixed problem:

(13)
(14)
(15)
(16)

where u denotes the velocity variable.

2.1 Hybridization of mixed methods

To motivate our discussion in this section, we start by recalling the mixed method for (13)–(16). Methods of this type seek approximations in finite-dimensional subspaces defined by:

(17)
(18)

The velocity space consists of H(div)-conforming piecewise vector polynomials; typical choices include the Raviart–Thomas (RT), Brezzi–Douglas–Marini (BDM), or Brezzi–Douglas–Fortin–Marini (BDFM) elements (Brezzi et al., 1987, 1985; Nédélec, 1980; Raviart and Thomas, 1977). The scalar space is the Lagrange family of discontinuous polynomials. These spaces are of particular interest when simulating geophysical flows, since choosing the right pairing results in stable discretizations with desirable conservation properties and avoids spurious computational modes. We refer the reader to Cotter and Shipton (2012); Cotter and Thuburn (2014); Natale et al. (2016); Shipton et al. (2018) for a discussion of mixed methods relevant for geophysical fluid dynamics. Two examples of such discretizations are presented in Section 4.2.

The mixed finite element formulation of (13)–(16) reads as follows: find the discrete velocity–scalar pair satisfying

(19)
(20)

where the velocity test space consists of those functions whose normal components vanish on the Neumann boundary. The discrete system is obtained by first expanding the solutions in terms of the finite element bases:

(21)

where the bases span the velocity and scalar spaces respectively, and the expansion coefficients are to be determined. As per standard Galerkin finite element methods, taking the test functions to be the basis functions in (19)–(20) produces the discrete saddle-point system:

(22)

where the unknowns are the coefficient vectors, and

(23)
(24)
(25)
(26)
(27)

Methods to efficiently invert such systems include H(div) multigrid (Arnold et al., 2000) (requiring complex overlapping-Schwarz smoothers), global Schur-complement factorizations (requiring an approximation to the inverse of the elliptic Schur complement, which is globally dense because the velocity mass matrix has a dense inverse, itself a consequence of velocities having continuous normal components across cell interfaces), or auxiliary space multigrid (Hiptmair and Xu, 2007). Here, we focus on a solution approach using a hybridized mixed method (Arnold and Brezzi, 1985; Brezzi and Fortin, 2012).

The hybridization technique replaces the original system with a discontinuous variant, decoupling the velocity degrees of freedom between cells. This is done by replacing the discrete velocity space with a “broken” space, defined as:

(28)

The broken space contains the same cell-local functions as the original velocity space, but normal components are no longer required to be continuous across cell interfaces. The approximation space for the scalar remains unchanged.

Next, Lagrange multipliers are introduced as an auxiliary variable in a trace space defined only on cell interfaces:

(29)

where the restriction to each facet lies in a polynomial space defined on that facet. We call this the space of approximate traces. Functions in the trace space are discontinuous across vertices in two dimensions, and across vertices and edges in three dimensions.

Deriving the hybridizable mixed system is accomplished through integration by parts over each element. Testing with a broken velocity function and integrating (13) over a single element produces:

(30)

The trace function is introduced in the surface integrals, where it approximates the scalar unknown on element boundaries. An additional constraint equation is added to close the system. The resulting hybridizable formulation reads: find the broken velocity, the scalar, and the trace such that

(31)
(32)
(33)

where the multiplier test space consists of traces vanishing on the Dirichlet boundary. The constraint in (33) enforces both continuity of the normal components of the velocity across element boundaries and the boundary condition on the Neumann boundary. If the space of Lagrange multipliers is chosen appropriately, then the “broken” velocity, albeit sought a priori in a discontinuous space, will coincide with its H(div)-conforming counterpart. Specifically, the formulations in (31)–(32) and (19)–(20) solve equivalent problems if the normal components of the velocity lie in the same polynomial space as the trace functions (Arnold and Brezzi, 1985).

The discrete matrix system arising from (31)–(33) has the general form:

(34)

where the discrete system is produced by expanding functions in terms of the finite element bases as before. Upon initial inspection, it may not appear advantageous to replace our original formulation with this augmented equation set; the hybridizable system has substantially more total degrees of freedom. However, (34) has considerable advantages over (22) in the following ways:

  1. Since the broken velocity and scalar spaces are both discontinuous, these unknowns are coupled only within each cell. This allows us to simultaneously eliminate both unknowns via local static condensation to produce a significantly smaller global (hybridized) problem for the trace unknowns (see the sketch after this list):

    (35)

    where the condensed operator and right-hand side are assembled by gathering the local contributions:

    (36)
    (37)

    Note that the inverse of the block matrix in (36) and (37) is never evaluated globally; the elimination can be performed locally via a sequence of Schur-complement reductions within each cell.

  2. The condensed trace matrix is sparse, symmetric positive-definite, and spectrally similar to the dense Schur complement from (22) of the original mixed formulation (Cockburn et al., 2009a).

  3. Once the traces are computed, both the scalar and the velocity can be recovered locally in each element. This can be accomplished in a number of ways. One way is to compute the scalar by solving:

    (38)

    followed by solving for the velocity:

    (39)

    Similarly, one could rearrange the order in which each variable is reconstructed.

  4. If desired, the solutions can be improved further through local post-processing. We highlight two such procedures, for and respectively, in Section 2.3.
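In illustrative block notation (ours, mirroring the Slate expressions in Listing 1 below, with indices 0, 1, and 2 labelling the velocity, scalar, and trace fields), the cell-wise elimination reads:

    S = A_{22} - \begin{bmatrix} A_{20} & A_{21} \end{bmatrix}
        \begin{bmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{bmatrix}^{-1}
        \begin{bmatrix} A_{02} \\ A_{12} \end{bmatrix},
    \qquad
    E = F_{2} - \begin{bmatrix} A_{20} & A_{21} \end{bmatrix}
        \begin{bmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{bmatrix}^{-1}
        \begin{bmatrix} F_{0} \\ F_{1} \end{bmatrix}.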

Listing 1 displays the corresponding Slate code for assembling the trace system, solving (35), and recovering the eliminated unknowns. For a complete reference on how to formulate the hybridized mixed system (31)–(33) in UFL, we refer the reader to Alnæs et al. (2014). We remark that, in the case of this hybridizable system, (34) contains zero-valued blocks which can simplify the resulting expressions in (36)–(37) and (38)–(39). This is not true in general and therefore the expanded form using all sub-blocks of (34) is presented for completeness.

# Element tensors defining the local 3-by-3 block system
_A = Tensor(a)
_F = Tensor(L)

# Extract blocks for the Slate expression of the reduced system
A = _A.blocks
F = _F.blocks
S = A[2, 2] - A[2, :2] * A[:2, :2].inv * A[:2, 2]
E = F[2] - A[2, :2] * A[:2, :2].inv * F[:2]

# Assemble and solve the global trace system: S * lambda_h = E
Smat = assemble(S, bcs=[...])
Evec = assemble(E)
lambda_h = Function(M)
solve(Smat, lambda_h, Evec, solver_parameters={"ksp_type": "preonly",
                                               "pc_type": "lu"})
p_h = Function(V)                              # Function to store the scalar unknown
u_h = Function(U)                              # Function to store the velocity unknown

# Intermediate expressions
Sd = A[1, 1] - A[1, 0] * A[0, 0].inv * A[0, 1]
Sl = A[1, 2] - A[1, 0] * A[0, 0].inv * A[0, 2]
Lambda = AssembledVector(lambda_h)             # Local coefficient vector for lambda_h
P = AssembledVector(p_h)                       # Local coefficient vector for p_h

# Local solve expressions for p_h and u_h
p_sys = Sd.solve(F[1] - A[1, 0] * A[0, 0].inv * F[0] - Sl * Lambda,
                 decomposition="PartialPivLu")
u_sys = A[0, 0].solve(F[0] - A[0, 1] * P - A[0, 2] * Lambda,
                      decomposition="PartialPivLu")
assemble(p_sys, p_h)
assemble(u_sys, u_h)
Listing 1: Firedrake code for solving (34) via static condensation and local recovery, given UFL expressions a, L for (31)–(33). Fields of the mixed space are indexed by 0, 1, and 2 respectively. The expressions S and E are symbolic representations of (36) and (37). Any vanishing conditions on the trace variables should be provided as boundary conditions during operator assembly. The expressions p_sys and u_sys correspond to (38) and (39) (using LU factorizations); code generation occurs in the calls to assemble. A global linear solver for the reduced system is created and used in the call to solve; configuring it is done by providing an appropriate Python dictionary of solver options for the PETSc library.

2.2 Hybridization of discontinuous Galerkin methods

The hybridized discontinuous Galerkin (HDG) method is a natural extension of discontinuous Galerkin (DG) discretizations. Here, we consider a specific HDG discretization, namely the LDG-H method (Cockburn et al., 2010b). Other forms of HDG that involve local lifting operators can also be implemented in this software framework by the introduction of additional local (i.e., discontinuous) variables in the definition of the local solver.

To construct the LDG-H discretization, we define the DG numerical fluxes for the velocity and the scalar to be functions of the trial unknowns and of a new independent unknown in the trace space:

(40)
(41)

where the trace unknown approximates the trace of the scalar on element boundaries, and τ is a positive function that may vary on each facet. The full LDG-H formulation reads as follows: find the velocity, the scalar, and the trace such that

(42)
(43)
(44)

Equation (44) enforces continuity of the numerical flux on interior facets, which in turn produces a flux that is single-valued on the facets. Note that the choice of τ has a significant influence on the expected convergence rates of the computed solutions.

The matrix system arising from (42)–(44) has the same general form as that of the hybridized mixed method in (34), except that all sub-blocks are now populated with non-zero entries due to the coupling of trace functions with both the velocity and the scalar. However, all previous properties of the discrete matrix system from Section 2.1 still apply. The Slate expressions for the local elimination and reconstruction operations are identical to those of Listing 1. For the interested reader, a unified analysis of hybridization methods (both mixed and DG) for second-order elliptic equations is presented in Cockburn et al. (2009a); Cockburn (2016).

2.3 Local post-processing

For both mixed (Arnold and Brezzi, 1985; Bramble and Xu, 1989; Stenberg, 1991) and discontinuous Galerkin methods (Cockburn et al., 2010b, 2009b), it is possible to locally post-process solutions to obtain superconvergent approximations (gaining one order of accuracy over the unprocessed solution). These methods can be expressed as local solves on each element, and so, in addition to static condensation, the Slate language also provides access to code generation for local post-processing of computed solutions.

Here, we present two post-processing techniques: one for scalar fields, and another for the vector unknown. The Slate code follows naturally from previous discussions in Sections 2.1 and 2.2, using the standard set of operations on local tensors summarized in Section 1.1.

2.3.1 Post-processing of the scalar solution

Our first example is a modified version of the procedure presented by Stenberg (1991) for enhancing the accuracy of the scalar solution. This was also highlighted within the context of hybridizing eigenproblems by Cockburn et al. (2010a). This post-processing technique can be used for both the hybridized mixed and LDG-H methods.

Consider the polynomial space of one degree higher on each element. For a given pair of computed solutions of the hybridized methods, we define the post-processed scalar as the unique solution of the local problem:

(45)
(46)

Here, the test space in (45) is the L²-orthogonal complement of the cell-wise constants. This post-processing method directly uses the definition of the flux to construct the local problem. In practice, such a space may be constructed using an orthogonal hierarchical basis, and solving (45)–(46) amounts to inverting a local symmetric positive-definite system.

At the time of this work, Firedrake does not support the construction of such a finite element basis. However, we can introduce Lagrange multipliers to enforce the orthogonality constraint. The resulting local problem then becomes the following mixed system: find the post-processed scalar and an auxiliary multiplier such that

(47)
(48)

The local problems (47)–(48) and (45)–(46) are equivalent, with the Lagrange multiplier enforcing orthogonality of the test functions against the cell-wise constants.

This post-processing method produces a new approximation which superconverges at one order higher than the unprocessed scalar for hybridized mixed methods (Arnold and Brezzi, 1985; Cockburn et al., 2010a; Stenberg, 1991). For the LDG-H method, superconvergence is achieved for suitable choices of the parameter τ (such as τ of order one), but only the standard rate is achieved for others (Cockburn et al., 2010b, 2009b). We demonstrate the increased accuracy in computed solutions in Section 4.1. An abridged example using Firedrake and Slate is provided in Listing 2.

# Define spaces for the higher-order pressure approximation and Lagrange multipliers
DGk1 = FunctionSpace(mesh, "DG", degree + 1)
DG0 = FunctionSpace(mesh, "DG", 0)
W = DGk1 * DG0
p, psi = TrialFunctions(W)
w, phi = TestFunctions(W)

# Create local Slate tensors for the post-processing system
K = Tensor((inner(grad(p), grad(w)) + inner(psi, w) + inner(p, phi))*dx)
# Use the computed scalar p_h and flux u_h in the right-hand side
F = Tensor((-inner(u_h, grad(w)) + inner(p_h, phi))*dx)
E = K.inv * F

# Function for the post-processed scalar field
p_star = Function(DGk1, name="Post-processed scalar")
assemble(E.blocks[0], p_star)      # Only want the first field (pressure)
Listing 2: Example of local post-processing using Firedrake and Slate. Here, we locally solve the mixed system (47)–(48). The corresponding symbolic local tensors are K and F. The Slate expression E = K.inv * F directly inverts the local system; upon assembly, a Slate-generated kernel is produced which solves the resulting linear system in each cell. Since we are not interested in the multiplier, we only return the block corresponding to the new pressure field.

2.3.2 Post-processing of the flux

Our second example illustrates a procedure that uses the numerical flux of an HDG discretization for (13)–(16). Within the context of the LDG-H method, we can use the numerical trace in (40) to produce a vector field that is H(div)-conforming. The technique we outline here follows that of Cockburn et al. (2009b).

Let the mesh consist of simplices (this particular post-processing strategy only works on triangles and tetrahedra). On each element, we define a new function to be the unique element of the local Raviart–Thomas space satisfying

(49)
(50)

This local problem produces a new velocity with the following properties:

  1. the post-processed flux converges at the same rate as the unprocessed one for all choices of τ producing a solvable system for (42)–(44). However,

  2. the post-processed flux is H(div)-conforming. That is,

    (51)
  3. Additionally, the divergence of the post-processed flux converges at an enhanced rate.

The Firedrake implementation using Slate is similar to the scalar post-processing example (see Listing 2); the element-wise linear systems (49)–(50) can be expressed in UFL, and therefore the necessary Slate expressions to invert the local systems follow naturally from the set of operations presented in Section 1.1. We use the very sensitive parameter dependence of the post-processing methods to validate our software implementation in Zenodo/Tabula-Rasa (2019).

3 Static condensation as a preconditioner

Slate enables static condensation approaches to be expressed very concisely. Nonetheless, applying a particular approach to different variational problems using Slate still requires a certain amount of code repetition. By formulating each form of static condensation as a preconditioner, code can be written once and then applied to any mathematically suitable problem. Rather than writing the static condensation by hand, in many cases it is sufficient to select the appropriate Slate-based preconditioner.

It is helpful to frame the problem in the particular context of the solver library. Firedrake uses PETSc to provide linear solvers, and we implement our preconditioners as PETSc PC objects. These are defined to act on the problem residual and return a correction to the solution. Specifically, we can think of (left) preconditioning the matrix equation in residual form:

(52)

by an operator (which may not necessarily be linear) as a transformation into an equivalent system of the form

(53)

Given a current iterate, the residual is the difference between the right-hand side and the operator applied to that iterate, and the preconditioner acts on the residual to produce an approximation to the error. If the preconditioner is an application of an exact inverse, the residual is converted into an exact (up to numerical round-off) error.
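In standard notation (our sketch, with A the operator, b the right-hand side, x_k the current iterate, and P^{-1} the preconditioner application):

    r_k = b - A x_k, \qquad e_k = x - x_k = A^{-1} r_k, \qquad P^{-1} r_k \approx e_k,

so if P^{-1} = A^{-1} exactly, the preconditioner maps the residual to the exact error.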

We will denote the application of a particular Krylov subspace method (KSP) to the linear system (52) accordingly. Upon preconditioning the system as in (53), we write

(54)

If (54) is solved directly via the application of an exact inverse, then the preconditioned operator is the identity, and the preconditioner produces the exact solution of (52) in a single iteration of the Krylov method. Having established notation, we now present our implementation of static condensation via Slate by defining the appropriate preconditioning operator.

3.1 Interfacing with PETSc via custom preconditioners

The implementation of preconditioners for these systems requires manipulation not of assembled matrices, but rather of their symbolic representation. To do this, we use the preconditioning infrastructure developed by Kirby and Mitchell (2018), which gives preconditioners written in Python access to the symbolic problem description. In Firedrake, this means all derived preconditioners have direct access to the UFL representation of the PDE system. From this mathematical specification, we manipulate the problem appropriately via Slate and provide operators assembled from Slate expressions to PETSc for further algebraic preconditioning. Using this approach, we have developed a static condensation interface for the hybridization of mixed problems, and a generic interface for statically condensing hybridized systems. The advantage of writing even the latter as a preconditioner is the ability to switch out the solution scheme for the system, even when nested inside a larger set of coupled equations, at runtime.

3.1.1 A static condensation interface for hybridization

As discussed in sections 2.1 and 2.2, one of the main advantages of using a hybridizable variant of a DG or mixed method is that such systems permit the use of element-wise condensation and recovery. To facilitate this, we provide a PETSc PC static condensation interface, SCPC. This preconditioner takes the discretized system as in (34), and performs the local elimination and recovery procedures. Slate expressions are generated from the underlying UFL problem description.

More precisely, the incoming system has the form:

(55)

where the first block holds the unknowns to be eliminated, the second holds the unknowns of the condensed field, and the right-hand side is partitioned accordingly. The partitioning in (55) is determined by the SCPC option pc_sc_eliminate_fields. Field indices are provided in the same way one configures solver options to PETSc. These indices determine which field(s) to statically condense into. For example, on a three-field problem (with indices 0, 1, and 2), setting -pc_sc_eliminate_fields 0,1 will configure SCPC to eliminate fields 0 and 1 cell-wise; the resulting condensed system is associated with field 2.

In exact arithmetic, the SCPC preconditioner applies the inverse of the Schur-complement factorization of (55):

(56)

where S is the Schur-complement operator of the system. The distinction here from block preconditioners via fieldsplit (Brown et al., 2012), for example, is that forming S does not require global actions: by design, the eliminated block can be inverted locally and S is sparse. As a result, S can be assembled or applied exactly via Slate-generated kernels.

In practice, the only globally coupled system requiring iterative inversion is the condensed problem:

(57)

where the right-hand side is the condensed residual, and a further preconditioner may be supplied for S. Once the condensed field is computed, the eliminated unknowns are reconstructed element-wise by inverting the local systems.

By construction, this preconditioner is suitable for both hybridized mixed and HDG discretizations. It can also be used within other contexts, such as the static condensation of continuous Galerkin discretizations (Guyan, 1965; Irons, 1965) or primal-hybrid methods (Devloo et al., 2018). As with any PETSc preconditioner, solver options can be specified for inverting S via the appropriate options prefix (condensed_field). The resulting KSP for (57) is compatible with existing solvers and external packages provided through the PETSc library. This allows users to experiment with a direct method and then switch to a more parallel-efficient iterative solver without changing the core application code.
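As a sketch of typical usage (assuming a three-field variational problem a == L with solution w_h; the option values are illustrative, while the option names follow the Firedrake/PETSc conventions described above):

params = {
    "mat_type": "matfree",             # SCPC acts on the unassembled operator
    "ksp_type": "preonly",             # the preconditioner performs the solve
    "pc_type": "python",
    "pc_python_type": "firedrake.SCPC",
    "pc_sc_eliminate_fields": "0, 1",  # condense onto field 2 (the traces)
    # Options under the "condensed_field" prefix control the trace solve;
    # start with a direct method and swap in an iterative one later:
    "condensed_field": {"ksp_type": "preonly", "pc_type": "lu"},
}
solve(a == L, w_h, solver_parameters=params)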

3.1.2 Preconditioning mixed methods via hybridization

The preconditioner HybridizationPC expands on the previous one, this time taking a mixed saddle-point system and automatically forming the hybridizable problem. This is accomplished by manipulating the UFL objects representing the discretized PDE. This includes replacing argument spaces with their discontinuous counterparts, introducing test functions on an appropriate trace space, and providing operators assembled from Slate expressions in a similar manner as described in Section 3.1.1.

More precisely, let the incoming problem be the mixed saddle-point system (22), with velocity and scalar unknowns. This preconditioner then replaces (22) with the augmented system:

(58)

where the right-hand sides for the flux and scalar equations carry over from the original problem, the matrices and co-vectors are re-assembled with discontinuous functions, and the hybridized (discontinuous) unknowns, together with the traces, are to be determined.

The preconditioning operator for the hybrid-mixed system (58) has the form:

(59)

where S is the Schur-complement matrix of the hybridized system. As before, a single globally coupled system for the traces is required. The recovery of the velocity and scalar unknowns happens in the same manner as in SCPC.

Since the flux is constructed in a discontinuous space, we must project the computed solution back into the H(div)-conforming space. This can be done cheaply via local facet averaging. The resulting solution is then updated via the projection mapping, which ensures that the residual for the original mixed problem is properly evaluated to test for convergence. With (59), the preconditioning operator for the original system is:

(60)

We note here that assembly of the right-hand side for the trace system requires special attention. Firstly, when Neumann conditions are present, the trace right-hand side is not necessarily zero. Since the hybridization preconditioner has access to the entire Python context (which includes the list of boundary conditions and the spaces in which they are applied), surface integrals on the exterior boundary are added where appropriate and incorporated into the generated Slate expressions. A more subtle issue that requires extra care is the incoming right-hand side tested in the conforming velocity space.

The situation we are given is that we have a right-hand side tested against functions in the H(div)-conforming velocity space, but require one tested against the broken space. For consistency, we also require, for any conforming test function, that

(61)

We can construct such a broken right-hand side satisfying (61) in the following way. By construction of the broken space, each conforming basis function associated with a facet node can be decomposed as:

(62)

where the two terms are the functions corresponding to the positive and negative restrictions associated with that facet node (these are the two “broken” parts of the conforming basis function on a particular facet connecting two elements: for two adjacent cells, a basis function for a particular facet node decomposes into two broken basis functions defined on their respective sides of the facet). We then define our “broken” right-hand side via the local definition:

(63)

where the normalization constant is the number of cells that the degree of freedom corresponding to the basis function touches. Using (62), (63), and the fact that the right-hand side is linear in its test argument, we can verify that our construction satisfies (61).
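As an end-to-end sketch of runtime configuration (a hypothetical mixed Poisson setup; the discretization and option values are illustrative):

from firedrake import *

mesh = UnitSquareMesh(16, 16)
U = FunctionSpace(mesh, "RT", 1)
V = FunctionSpace(mesh, "DG", 0)
W = U * V
u, p = TrialFunctions(W)
w, q = TestFunctions(W)
f = Function(V).assign(1.0)

a = (inner(u, w) - p * div(w) + div(u) * q) * dx
L = f * q * dx

sol = Function(W)
solve(a == L, sol, solver_parameters={
    "mat_type": "matfree",
    "ksp_type": "preonly",  # the preconditioner performs the full solve
    "pc_type": "python",
    "pc_python_type": "firedrake.HybridizationPC",
    # Options under the "hybridization" prefix configure the trace solver:
    "hybridization": {"ksp_type": "cg", "pc_type": "gamg"},
})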

4 Numerical studies

We now present results utilizing the Slate DSL and our static condensation preconditioners for a set of test problems. Since we are using the interfaces outlined in Section 3, Slate is accessed indirectly and no manually written solver code is needed for hybridization or static condensation/local recovery. All parallel results were obtained on a single fully-loaded compute node of dual-socket Intel Xeon E5-2630v4 processors (2 threads per core) running at 2.2 GHz. To avoid potential memory effects due to the operating system migrating processes between sockets, we pin MPI processes to cores.

Verification of the generated code is performed using parameter-sensitive convergence tests. The study consists of running a variety of discretizations spanning the methods outlined in Section 2. Details and numerical results are made public and can be viewed in Zenodo/Tabula-Rasa (2019) (see “Code and data availability”). All results are in full agreement with the theory.

4.1 LDG-H method for a three-dimensional elliptic equation

In this section, we take a closer look at the LDG-H method for the model elliptic equation (sign-definite Helmholtz):

(64)
(65)

where the right-hand side and boundary data are chosen such that the analytic solution is known in closed form. We use a regular mesh consisting of tetrahedral elements. First, we reformulate (64)–(65) as the mixed problem:

(66)
(67)
(68)

We use polynomial approximations from linear up to cubic for the LDG-H discretization of (66)–(68). Additionally, we compute a post-processed scalar approximation of the HDG solution, which raises the approximation order of the computed solution by an additional degree. In all numerical studies here, we set the HDG parameter τ = 1. All results were computed in parallel, utilizing a single compute node (described previously).

A continuous Galerkin (CG) discretization of the primal problem (64)–(65) serves as a reference for this experiment. Due to the superconvergence in the post-processed solution for the HDG method, we use CG discretizations of polynomial order 2, 3, and 4. This takes into account the enhanced accuracy of the HDG solution, despite being initially computed as a lower-order approximation. We therefore expect both methods to produce equally accurate solutions to the model problem.

Our aim here is not to compare the performance of HDG and CG, which has been investigated elsewhere (for example, see Kirby et al. (2012); Yakovlev et al. (2016)). Instead, we provide a reference that the reader might be more familiar with in order to evaluate whether our software framework produces a sufficiently performant HDG implementation relative to what might be expected.

For the CG discretization, we use a matrix-explicit iterative solver consisting of the conjugate gradient method preconditioned with hypre's boomerAMG implementation of algebraic multigrid (AMG) (Falgout et al., 2006). We use the preconditioner described in Section 3.1.1 to statically condense the LDG-H system, using the same iterative solver as the CG method for the Lagrange multipliers. This is sensible, as the global trace operator defines a symmetric positive-definite operator. To avoid over-solving, we iterate to a relative tolerance such that the discretization error is minimal for a given mesh.
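In terms of solver options, the two configurations described above might look as follows (a sketch; the tolerance values are illustrative and tuned per mesh, while the option names are standard PETSc/hypre ones):

cg_params = {
    "ksp_type": "cg",
    "ksp_rtol": 1e-8,  # illustrative; chosen per mesh to avoid over-solving
    "pc_type": "hypre",
    "pc_hypre_type": "boomeramg",
}

hdg_params = {
    "mat_type": "matfree",
    "ksp_type": "preonly",
    "pc_type": "python",
    "pc_python_type": "firedrake.SCPC",
    "pc_sc_eliminate_fields": "0, 1",
    # the trace system gets the same solver as the CG reference:
    "condensed_field": cg_params,
}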

4.1.1 Error versus execution time

The total execution time is recorded for the CG and HDG solvers, including the setup time for the AMG preconditioner, matrix assembly, and the time-to-solution for the Krylov method. In the HDG case, we include the time spent building the Schur complement for the traces, the local recovery of the scalar and flux approximations, and post-processing. The L² error against execution time is summarized in Figure 2(a).

(a) Error against execution time for the CG and HDG (with post-processing) methods.
(b) Krylov iterations of the AMG-preconditioned conjugate gradient algorithm (to reach discretization error) against number of cells.
Figure 2: Comparison of continuous Galerkin and LDG-H solvers for the model three-dimensional positive-definite Helmholtz equation.

The HDG method of a given degree with post-processing, as expected, produces a solution which is as accurate as the CG method of one degree higher. While the full HDG system is never explicitly assembled, the larger execution time is a result of several factors. The total number of trace unknowns for the degree 1, 2, and 3 discretizations is roughly four, three, and two times larger (respectively) than the corresponding number of CG unknowns, so each iteration is more expensive. Moreover, we observe that the trace system requires more Krylov iterations to reach discretization error, although the gap in total iteration count starts to close as the approximation degree increases (see Figure 2(b)). The extra cost of HDG due to the larger degree-of-freedom count and the need to perform local tensor inversion is offset by the local conservation and stabilization properties, which are useful for fluid dynamics applications.

4.1.2 Break down of solver time

The HDG method requires many more degrees of freedom than CG or primal DG methods. This is largely due to the fact that the HDG method simultaneously approximates the primal solution and its velocity. The global matrix for the traces is larger than the one for the CG system at low polynomial order. The execution time for HDG is then compounded by a more expensive global solve. We remind the reader that our goal here is to verify that the solve time for HDG is as might be expected given the problem size.

Figure 3: Breakdown of the HDG and CG execution times on a simplicial mesh.
HDG stage                        degree 1          degree 2          degree 3
                                 t (s)     %       t (s)     %       t (s)      %
Matrix assembly (static cond.)    1.05   7.49%      6.95  10.40%     31.66  10.27%
Forward elimination               0.86   6.13%      6.32   9.45%     31.98  10.37%
Trace solve                      10.66  76.24%     43.89  65.66%    192.31  62.36%
Back substitution                 1.16   8.28%      8.71  13.03%     45.81  14.85%
Post-processing                   0.26   1.86%      0.98   1.46%      6.62   2.15%
HDG total                        13.98             66.85            308.37

CG stage                         degree 2          degree 3          degree 4
                                 t (s)     %       t (s)     %       t (s)      %
Matrix assembly (monolithic)      0.50  12.01%      2.91  11.39%     26.37  24.11%
Solve                             3.63  87.99%     22.67  88.61%     82.99  75.89%
CG total                          4.12             25.59            109.36

Table 1: Breakdown of the raw timings for the HDG (degrees 1, 2, and 3) and CG (degrees 2, 3, and 4) methods, each run on the same simplicial mesh on a fully-loaded compute node.

Figure 3 displays the execution times on a large simplicial mesh. The execution times have been normalized by the CG total time, showing that the HDG method takes roughly 3 times the execution time of the CG method. This is expected given the larger degree-of-freedom count. The raw numerical breakdown of the HDG and CG solvers is shown in Table 1. We isolate each component of the HDG method contributing to the total execution time. Local operations include static condensation (trace operator assembly), forward elimination (right-hand side assembly for the trace system), backwards substitution to recover the scalar and velocity unknowns, and local post-processing of the primal solution. For all degrees, our HDG implementation is solver-dominated.

Both operator and right-hand side assembly are dominated by the costs of inverting a local square mixed matrix coupling the primal and dual variables, which is performed directly via an LU factorization. They should therefore be of the same magnitude in time spent. We observe that this is the case across all degrees (ranging between approximately 6–11% of total execution time for both local operations). Back substitution takes roughly the same time as the static condensation and forward elimination stages (between 8–15% of execution time across all degrees). This is expected, as these operations are all dominated by the cost of inverting the local matrix coupling the scalar and velocity degrees of freedom. The slight increase in time is due to splitting the local equations into two local solvers: one for the scalar and another for the velocity. Finally, the additional cost of post-processing accrues negligible time (roughly 2% of execution time across all degrees).

We note that caching of local tensors does not occur. Each pass to perform the local eliminations and backwards reconstructions rebuilds the local element tensors. It is not clear at this time whether the performance gained from avoiding rebuilding the local operators will offset the memory costs of storing the local matrices. Moreover, in time-dependent problems where the operators may contain state-dependent variables, rebuilding local matrices will be necessary in each time-step regardless.

4.2 Hybridized-mixed solver for the shallow water equations

A primary motivation for our interest in hybridized methods is the development of efficient solvers for problems in geophysical flows. In this section, we present some results from integrating the nonlinear shallow water equations on the sphere, using test case 5 (flow past a mountain) from Williamson et al. (1992). We use the framework of compatible finite elements (Cotter and Shipton, 2012; Cotter and Thuburn, 2014).

4.2.1 Semi-implicit finite element discretization

We start with the vector-invariant rotating nonlinear shallow water system defined on a two-dimensional spherical surface embedded in ℝ³:

(69)
(70)

where u is the fluid velocity, D is the depth field, f is the Coriolis parameter, g is the acceleration due to gravity, b is the bottom topography, and