1 A system for localized algebra on finite element tensors
We present an expressive language for dense linear algebra on the elemental matrix systems arising from finite element problems. The language, which we call Slate, provides typical mathematical operations performed on matrices and vectors, so the input syntax is comparable to high-level linear algebra software such as MATLAB. The Slate language provides basic abstract building blocks which can be used by a specialized linear algebra compiler to generate low-level code implementations.
Slate is heavily influenced by the Unified Form Language (UFL) (Alnæs et al., 2014; Logg et al., 2012), a DSL embedded in Python which provides symbolic representations of finite element forms. The expressions can be compiled by a form compiler, which translates UFL into low-level code for the local assembly of a form over the cells and facets of a mesh. In a similar manner, Slate expressions are compiled to low-level code that performs the requested linear algebra element-wise on a mesh.
1.1 An overview of Slate
To clarify conventions and the scope of Slate, we start by considering a general form. Suppose we have a finite element form:
(4) 
where the integrals are taken with respect to appropriate measures on cells, interior facets, and exterior facets. The integral form in (4) is uniquely determined by its lists (possibly of zero length) of arbitrary coefficient functions in the associated finite element spaces, arguments describing any test or trial functions, and its integrand expressions for each integral type: cell, interior facet, and exterior facet. The form describes a finite element form globally over the entire problem domain. Here, we will consider the case where the integrand can be decomposed, on each facet, into two independent parts: one for the positive restriction and one for the negative restriction.
The contribution of (4) in each cell of the mesh is simply
(5) 
We call (5) the cell-local contribution of the form, with
(6) 
Equation (5) produces an element tensor which is mapped into a global data structure. However, before doing so, one may want to produce a new local tensor by algebraically manipulating different element tensors. This is precisely the job of Slate.
The Slate language consists of two primary abstractions for linear algebra:

terminal element tensors corresponding to multilinear integral forms (matrices, vectors, and scalars), or assembled data (coefficient vectors); and

expressions consisting of operations on terminal tensors.
The composition of binary and unary operations on terminal tensors produces a Slate expression. Such expressions can be composed with other Slate objects in arbitrary ways, resulting in concise representations of complex operations on locally assembled arrays.
Terminal tensors:
In Slate, one associates a tensor with data on an element either by using a form, or assembled coefficient data:

Tensor(form) associates a form, expressed in UFL, with its local element tensor: (7) The number of arguments determines the rank of the Tensor, i.e. scalars, vectors, and matrices are produced from 0-forms, 1-forms, and 2-forms respectively. (As with UFL, Slate is capable of abstractly representing arbitrary-rank tensors; however, only tensors of rank at most two are typically used in finite element applications, and therefore we currently only generate code for those ranks.)
AssembledVector(f), where f is some finite element function; the result associates the function with its local coefficient vectors.
Symbolic linear algebra:
Slate supports typical binary and unary operations in linear algebra, with a high-level syntax close to mathematics. At the time of writing, these include:

A + B, the addition of two equal-shaped tensors.

A * B, a contraction over the last index of A and the first index of B. This is the usual multiplicative operation on matrices, vectors, and scalars.

-A, the additive inverse (negation) of a tensor.

A.T, the transpose of a tensor.

A.inv, the inverse of a square tensor.

A.solve(B, decomposition="..."), the result of solving a local linear system with operator A and right-hand side B, optionally specifying a direct factorization strategy.

A.blocks[indices], where A is a tensor from a mixed finite element space, allows extraction of sub-blocks of the tensor indexed by field (slices are allowed). For example, if a matrix corresponds to a bilinear form on a pair of product spaces, each composed of several finite element spaces, then the cell-local tensors have the block form: (8) The associated submatrix of (8), for a given range of field indices, is (9).
These building blocks may be arbitrarily composed, giving a symbolic expression for the linear algebra we wish to perform on each cell during assembly. They provide the necessary algebraic framework for a large class of problems, some of which we present in this paper.
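To make the composition concrete, the following NumPy sketch mirrors the kind of per-element algebra these operations express. The 4x4 element matrix, its 2x2 field blocks, and all variable names are hypothetical stand-ins for Slate's symbolic objects, which operate on one cell at a time:

```python
import numpy as np

# Hypothetical 4x4 element tensor from a two-field problem; fields 0 and 1
# each own two local degrees of freedom.
A = np.array([[4.0, 1.0, 0.5, 0.0],
              [1.0, 3.0, 0.0, 0.5],
              [0.5, 0.0, 2.0, 0.2],
              [0.0, 0.5, 0.2, 1.5]])
b = np.array([1.0, 2.0, 3.0, 4.0])

# A.blocks[indices] corresponds to slicing out field sub-blocks:
A00, A01 = A[:2, :2], A[:2, 2:]
A10, A11 = A[2:, :2], A[2:, 2:]

# A.T, A.inv, and A.solve(b) correspond to the dense operations below:
At = A.T
x = np.linalg.solve(A, b)

# Operations compose arbitrarily, e.g. a Schur complement on field 1:
S = A11 - A10 @ np.linalg.inv(A00) @ A01
```

The difference in Slate is that such an expression is symbolic: it is compiled once into a kernel that is then executed in every cell of the mesh.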
In Firedrake, these Slate expressions are transformed into low-level code by a linear algebra compiler. This uses the form compiler, TSFC (Homolya et al., 2018), to compile kernels for the assembly of terminal tensors and generates a dense linear algebra kernel to be iterated cell-wise. At the time of this work, our compiler generates C++ code, using the templated library Eigen (Guennebaud et al., 2015) for dense linear algebra. During execution, the local computations in each cell are mapped into global data objects via appropriate indirection mappings using the PyOP2 framework (Rathgeber et al., 2012). Figure 1 provides an illustration of the complete toolchain.
2 Examples
We now present a few examples and discuss solution methods which require elementwise manipulations of finite element systems and their specification in Slate. We stress here that Slate is not limited to these model problems; rather these examples were chosen for clarity and to demonstrate key features of the Slate language. In Sections 3 and 4, we discuss more intricate ways Slate is used in custom preconditioners.
For the hybridization of mixed and discontinuous Galerkin methods, we use a model elliptic equation. Consider the second-order PDE with both Dirichlet and Neumann boundary conditions:
(10)  
(11)  
(12) 
where the coefficients are positive-valued. Rewriting (10)–(12) as a first-order system, we obtain the mixed problem:
(13)  
(14)  
(15)  
(16) 
where the newly introduced vector unknown is the velocity variable.
2.1 Hybridization of mixed methods
To motivate our discussion in this section, we start by recalling the mixed method for (13)–(16). Methods of this type seek approximations in finite-dimensional subspaces, defined by:
(17)  
(18) 
The velocity space consists of conforming piecewise vector polynomials, where typical choices include the Raviart-Thomas (RT), Brezzi-Douglas-Marini (BDM), or Brezzi-Douglas-Fortin-Marini (BDFM) elements (Brezzi et al., 1987, 1985; Nédélec, 1980; Raviart and Thomas, 1977). The scalar space is the family of discontinuous Lagrange polynomials. These spaces are of particular interest when simulating geophysical flows, since choosing the right pairing results in stable discretizations with desirable conservation properties and avoids spurious computational modes. We refer the reader to Cotter and Shipton (2012); Cotter and Thuburn (2014); Natale et al. (2016); Shipton et al. (2018) for a discussion of mixed methods relevant for geophysical fluid dynamics. Two examples of such discretizations are presented in Section 4.2.
The mixed finite element formulation of (13)–(16) reads as follows: find satisfying
(19)  
(20) 
where is the space of functions in whose normal components vanish on . The discrete system is obtained by first expanding the solutions in terms of the finite element bases:
(21) 
where the bases for the respective spaces are used, and the coefficients are to be determined. As per standard Galerkin-based finite element methods, taking test functions from these bases in (19)–(20) produces the discrete saddle point system:
(22) 
where , are the coefficient vectors, and
(23)  
(24)  
(25)  
(26)  
(27) 
Methods to efficiently invert such systems include multigrid (Arnold et al., 2000) (requiring complex overlapping Schwarz smoothers), global Schur complement factorizations (which require an approximation to the inverse of the dense elliptic Schur complement), or auxiliary space multigrid (Hiptmair and Xu, 2007). The Schur complement, while elliptic, is globally dense: the velocity block has a dense inverse, a result of velocities having continuous normal components across cell interfaces. Here, we focus on a solution approach using a hybridized mixed method (Arnold and Brezzi, 1985; Brezzi and Fortin, 2012).
The hybridization technique replaces the original system with a discontinuous variant, decoupling the velocity degrees of freedom between cells. This is done by replacing the discrete solution space for with the “broken” space , defined as:
(28) 
The original vector finite element space is a subspace of the broken space, which consists of the same local functions but whose normal components are no longer required to be continuous across cell interfaces. The approximation space for the scalar variable remains unchanged.
Next, Lagrange multipliers are introduced as an auxiliary variable in a trace space, defined only on cell interfaces:
(29) 
where a polynomial space is defined on each facet. We call this the space of approximate traces. Its functions are discontinuous across vertices in two dimensions, and across vertices and edges in three dimensions.
Deriving the hybridizable mixed system is accomplished through integration by parts over each element . Testing with and integrating (13) over produces:
(30) 
The trace function is introduced in surface integrals and approximates on elemental boundaries. An additional constraint equation is added to close the system. The resulting hybridizable formulation reads: find such that
(31)  
(32)  
(33) 
where denotes the space of traces vanishing on . The constraint in (33) enforces both continuity of the normal components of across elemental boundaries, as well as the boundary condition on . If the space of Lagrange multipliers is chosen appropriately, then the “broken” velocity , albeit sought a priori in a discontinuous space, will coincide with its conforming counterpart. Specifically, the formulations in (31)–(32) and (19)–(20) are solving equivalent problems if the normal components of lie in the same polynomial space as the trace functions (Arnold and Brezzi, 1985).
The discrete matrix system arising from (31)–(33) has the general form:
(34) 
where the discrete system is produced by expanding functions in terms of the finite element bases for the three spaces, as before. Upon initial inspection, it may not appear advantageous to replace our original formulation with this augmented equation set; the hybridizable system has substantially more total degrees of freedom. However, (34) has a considerable advantage over (22) in the following ways:

Since both eliminated fields lie in discontinuous spaces, their unknowns are coupled only within each cell. This allows us to simultaneously eliminate both unknowns via local static condensation to produce a significantly smaller global (hybridized) problem for the trace unknowns:
(35) where and are assembled by gathering the local contributions:
(36) (37) Note that the inverse of the block matrix in (36) and (37) is never evaluated globally; the elimination can be performed locally via a sequence of Schur complement reductions within each cell.

Once the trace solution is computed, both eliminated unknowns can be recovered locally in each element. This can be accomplished in a number of ways. One way is to compute the first unknown by solving:
(38) followed by solving for :
(39) Similarly, one could rearrange the order in which each variable is reconstructed.

If desired, the solutions can be improved further through local postprocessing. We highlight two such procedures, for and respectively, in Section 2.3.
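The elimination and recovery steps above can be sketched with NumPy on a single hypothetical cell. The blocks X (coupling the unknowns to be eliminated), E (coupling them to the trace, taken symmetric here for simplicity), and T (the trace block) are random stand-ins for the sub-blocks of (34); the check confirms that static condensation plus back-substitution reproduces the monolithic cell-local solve:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3  # eliminated unknowns per cell, trace unknowns per cell

# Hypothetical cell-local blocks (diagonally shifted to be well-conditioned)
X = rng.standard_normal((n, n)) + n * np.eye(n)
E = rng.standard_normal((n, m))
T = rng.standard_normal((m, m)) + m * np.eye(m)
f = rng.standard_normal(n)
g = rng.standard_normal(m)

# Static condensation, as in (36)-(37): local Schur complement and
# condensed right-hand side for the trace unknowns.
S = T - E.T @ np.linalg.solve(X, E)
r = g - E.T @ np.linalg.solve(X, f)
lam = np.linalg.solve(S, r)

# Back-substitution, as in (38)-(39): recover the eliminated unknowns.
w = np.linalg.solve(X, f - E @ lam)

# Reference: the monolithic cell-local solve.
K = np.block([[X, E], [E.T, T]])
full = np.linalg.solve(K, np.concatenate([f, g]))
```

In Slate, the same steps are written symbolically once and compiled into a kernel executed in every cell; only the condensed trace system is assembled globally.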
Listing 1 displays the corresponding Slate code for assembling the trace system, solving (35), and recovering the eliminated unknowns. For a complete reference on how to formulate the hybridized mixed system (31)–(33) in UFL, we refer the reader to Alnæs et al. (2014). We remark that, in the case of this hybridizable system, (34) contains zero-valued blocks which can simplify the resulting expressions in (36)–(37) and (38)–(39). This is not true in general and therefore the expanded form using all sub-blocks of (34) is presented for completeness.
2.2 Hybridization of discontinuous Galerkin methods
The hybridized discontinuous Galerkin (HDG) method is a natural extension of discontinuous Galerkin (DG) discretizations. Here, we consider a specific HDG discretization, namely the LDG-H method (Cockburn et al., 2010b). Other forms of HDG that involve local lifting operators can also be implemented in this software framework by introducing additional local (i.e., discontinuous) variables in the definition of the local solver.
To construct the LDG-H discretization, we define the DG numerical fluxes to be functions of the trial unknowns and a new independent unknown in the trace space:
(40)  
(41) 
where the first is a function approximating the trace of the scalar solution on element boundaries, and the second is a positive function that may vary on each facet. The full LDG-H formulation reads as follows: find discrete solutions such that
(42)  
(43)  
(44) 
Equation (44) enforces continuity of the numerical flux across cell interfaces, which in turn produces a flux that is single-valued on the facets. Note that the choice of the stabilization parameter has a significant influence on the expected convergence rates of the computed solutions.
The matrix system arising from (42)–(44) has the same general form as that of the hybridized mixed method in (34), except all sub-blocks are now populated with nonzero entries due to the coupling of trace functions with both unknowns. However, all previous properties of the discrete matrix system from Section 2.1 still apply. The Slate expressions for the local elimination and reconstruction operations will be identical to those of Listing 1. For the interested reader, a unified analysis of hybridization methods (both mixed and DG) for second-order elliptic equations is presented in Cockburn et al. (2009a); Cockburn (2016).
2.3 Local postprocessing
For both mixed (Arnold and Brezzi, 1985; Bramble and Xu, 1989; Stenberg, 1991) and discontinuous Galerkin methods (Cockburn et al., 2010b, 2009b), it is possible to locally postprocess solutions to obtain superconvergent approximations (gaining one order of accuracy over the unprocessed solution). These methods can be expressed as local solves on each element, and so, in addition to static condensation, the Slate language also provides access to code generation for local postprocessing of computed solutions.
Here, we present two postprocessing techniques: one for scalar fields, and another for the vector unknown. The Slate code follows naturally from previous discussions in Sections 2.1 and 2.2, using the standard set of operations on local tensors summarized in Section 1.1.
2.3.1 Postprocessing of the scalar solution
Our first example is a modified version of the procedure presented by Stenberg (1991) for enhancing the accuracy of the scalar solution. This was also highlighted within the context of hybridizing eigenproblems by Cockburn et al. (2010a). This postprocessing technique can be used for both the hybridized mixed and LDG-H methods.
Let denote a polynomial space of degree on an element . Then for a given pair of computed solutions of the hybridized methods, we define the postprocessed scalar as the unique solution of the local problem:
(45)  
(46) 
where . Here, the space denotes the orthogonal complement of . This postprocessing method directly uses the definition of the flux to construct the local problem. In practice, the space may be constructed using an orthogonal hierarchical basis, and solving (45)–(46) amounts to inverting a local symmetric positive definite system.
At the time of this work, Firedrake does not support the construction of such a finite element basis. However, we can introduce Lagrange multipliers to enforce the orthogonality constraint. The resulting local problem then becomes the following mixed system: find such that
(47)  
(48) 
where . The local problems (47)–(48) and (45)–(46) are equivalent, with the Lagrange multiplier enforcing orthogonality of test functions in with functions in .
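The role of the multiplier in (47)–(48) can be illustrated with a tiny NumPy analogue: a bordered (KKT) system in which a scalar Lagrange multiplier enforces a single orthogonality constraint on an otherwise unconstrained local solve. All sizes and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
y = rng.standard_normal(n)   # data to be projected
c = rng.standard_normal(n)   # direction the solution must be orthogonal to

# Bordered system: minimize ||x - y||^2 subject to c.x = 0; the last
# row/column carries the Lagrange multiplier enforcing the constraint.
K = np.block([[np.eye(n), c[:, None]],
              [c[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([y, [0.0]]))
x, lam = sol[:n], sol[n]
```

Eliminating the multiplier by hand gives x = y - ((c.y)/(c.c)) c, which is exactly what the bordered solve reproduces; in (47)–(48) the same mechanism enforces orthogonality against the lower-degree polynomial space, cell by cell.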
This postprocessing method produces a new approximation which superconverges for hybridized mixed methods (Arnold and Brezzi, 1985; Cockburn et al., 2010a; Stenberg, 1991). For the LDG-H method, superconvergence depends on the choice of the stabilization parameter; for unfavorable choices only the standard rate is achieved (Cockburn et al., 2010b, 2009b). We demonstrate the increased accuracy in computed solutions in Section 4.1. An abridged example using Firedrake and Slate is provided in Listing 2.
2.3.2 Postprocessing of the flux
Our second example illustrates a procedure that uses the numerical flux of an HDG discretization for (13)–(16). Within the context of the LDG-H method, we can use the numerical trace in (40) to produce a conforming vector field. The technique we outline here follows that of Cockburn et al. (2009b).
Let the mesh consist of simplices (this particular postprocessing strategy only works on triangles and tetrahedra). On each element, we define a new function to be the unique element of the local Raviart-Thomas space satisfying
(49)  
(50) 
This local problem produces a new velocity with the following properties:

. That is,
(51) 
Additionally, the divergence of the postprocessed flux converges at an enhanced rate.
The Firedrake implementation using Slate is similar to the scalar postprocessing example (see Listing 2); the element-wise linear systems (49)–(50) can be expressed in UFL, and therefore the necessary Slate expressions to invert the local systems follow naturally from the set of operations presented in Section 1.1. We use the very sensitive parameter dependency in the postprocessing methods to validate our software implementation in Zenodo/TabulaRasa (2019).
3 Static condensation as a preconditioner
Slate enables static condensation approaches to be expressed very concisely. Nonetheless, application of a particular approach to different variational problems using Slate still requires a certain amount of code repetition. By formulating each form of static condensation as a preconditioner, code can be written once and then applied to any mathematically suitable problem. Rather than writing the static condensation by hand, in many cases it is sufficient to just select the appropriate Slate-based preconditioner.
For context, it is helpful to frame the problem in the particular context of the solver library. Firedrake uses PETSc to provide linear solvers, and we implement our preconditioners as PETSc PC objects. These are defined to act on the problem residual, and return a correction to the solution. Specifically, we can think of (left) preconditioning the matrix equation in residual form:
(52) 
by an operator (which may not necessarily be linear) as a transformation into an equivalent system of the form
(53) 
Given a current iterate, the residual at the current iteration is simply the difference between the right-hand side and the operator applied to the iterate, and the preconditioner acts on the residual to produce an approximation to the error. If the preconditioner is an application of an exact inverse, the residual is converted into an exact (up to numerical roundoff) error.
We will denote the application of a particular Krylov subspace method (KSP) for the linear system (52). Upon preconditioning the system as in (53), we write
(54) 
If (54) is solved directly via the application of the exact inverse, then the preconditioned operator is the identity, and a single iteration of the Krylov method produces the exact solution of (52). Having established notation, we now present our implementation of static condensation via Slate by defining the appropriate preconditioning operator.
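The residual-to-error relation can be checked with a minimal NumPy example: when the preconditioner applies the exact inverse, acting on the residual yields the exact error, so one correction recovers the solution. The matrix here is an arbitrary well-conditioned stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)  # hypothetical operator
x_exact = rng.standard_normal(n)
b = A @ x_exact

x = np.zeros(n)               # current iterate
r = b - A @ x                 # residual
e = np.linalg.solve(A, r)     # exact preconditioner applied to the residual
x = x + e                     # one correction gives the exact solution
```

An inexact preconditioner would produce only an approximate error, and the outer Krylov method then iterates this correction to convergence.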
3.1 Interfacing with PETSc via custom preconditioners
The implementation of preconditioners for these systems requires manipulation not of assembled matrices, but of their symbolic representation. To do this, we use the preconditioning infrastructure developed by Kirby and Mitchell (2018), which gives preconditioners written in Python access to the symbolic problem description. In Firedrake, this means all derived preconditioners have direct access to the UFL representation of the PDE system. This mathematical specification is manipulated via Slate, and operators assembled from Slate expressions are provided to PETSc for further algebraic preconditioning. Using this approach, we have developed a static condensation interface for the hybridization of mixed problems, and a generic interface for statically condensing hybridized systems. The advantage of writing even the latter as a preconditioner is the ability to switch out the solution scheme for the condensed system at runtime, even when nested inside a larger set of coupled equations.
3.1.1 A static condensation interface for hybridization
As discussed in sections 2.1 and 2.2, one of the main advantages of using a hybridizable variant of a DG or mixed method is that such systems permit the use of elementwise condensation and recovery. To facilitate this, we provide a PETSc PC static condensation interface, SCPC. This preconditioner takes the discretized system as in (34), and performs the local elimination and recovery procedures. Slate expressions are generated from the underlying UFL problem description.
More precisely, the incoming system has the form:
(55) 
where the first vector collects the unknowns to be eliminated, the second collects the unknowns for the condensed field, and the incoming right-hand sides are partitioned accordingly. The partitioning in (55) is determined by the SCPC option pc_sc_eliminate_fields. Field indices are provided in the same way one configures solver options to PETSc. These indices determine which field(s) to statically condense. For example, on a three-field problem (with indices 0, 1, and 2), setting pc_sc_eliminate_fields to 0,1 will configure SCPC to eliminate fields 0 and 1 cell-wise; the resulting condensed system is associated with field 2.
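In Firedrake, selecting SCPC might look like the solver-parameters dictionary below. This is a sketch based on the options named above; the sub-options under condensed_field and the matfree setting are plausible choices rather than prescriptions, and exact spellings should be checked against the Firedrake documentation:

```python
# Hypothetical configuration for a three-field problem: eliminate fields
# 0 and 1 cell-wise; the condensed system is associated with field 2.
parameters = {
    "mat_type": "matfree",
    "ksp_type": "preonly",
    "pc_type": "python",
    "pc_python_type": "firedrake.SCPC",
    "pc_sc_eliminate_fields": "0, 1",
    # Solver options for the globally coupled condensed system:
    "condensed_field": {
        "ksp_type": "cg",
        "pc_type": "gamg",
    },
}
```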
In exact arithmetic, the SCPC preconditioner applies the inverse of the Schur complement factorization of (55):
(56) 
where the Schur complement operator for the system appears. The distinction here from block preconditioners via fieldsplit (Brown et al., 2012), for example, is that no global actions are required: by design, the eliminated block can be inverted locally, and the resulting Schur complement is sparse. As a result, it can be assembled or applied exactly via Slate-generated kernels.
In practice, the only globally coupled system requiring iterative inversion is the condensed problem:
(57) 
where the condensed right-hand side appears, and a further preconditioner may be chosen for the condensed operator. Once the condensed field is computed, the eliminated unknowns are reconstructed element-wise by inverting the local systems.
By construction, this preconditioner is suitable for both hybridized mixed and HDG discretizations. It can also be used within other contexts, such as the static condensation of continuous Galerkin discretizations (Guyan, 1965; Irons, 1965) or primal-hybrid methods (Devloo et al., 2018). As with any PETSc preconditioner, solver options can be specified for inverting the condensed system via the appropriate options prefix (condensed_field). The resulting KSP for (57) is compatible with existing solvers and external packages provided through the PETSc library. This allows users to experiment with a direct method and then switch to a more parallel-efficient iterative solver without changing the core application code.
3.1.2 Preconditioning mixed methods via hybridization
The preconditioner HybridizationPC expands on the previous one, this time taking a mixed saddle point system and automatically forming the hybridizable problem. This is accomplished by manipulating the UFL objects representing the discretized PDE. This includes replacing argument spaces with their discontinuous counterparts, introducing test functions on an appropriate trace space, and providing operators assembled from Slate expressions in a similar manner as described in Section 3.1.1.
More precisely, let the incoming problem be the mixed saddle point system with velocity and scalar unknowns. Then this preconditioner replaces it with the augmented system:
(58) 
where the right-hand sides for the flux and scalar equations appear, and modified matrices and co-vectors are assembled with discontinuous functions. Here, the hybridized (discontinuous) unknowns are to be determined.
The preconditioning operator for the hybridmixed system (58) has the form:
(59) 
where the Schur complement matrix appears. As before, a single globally coupled system for the trace unknowns is required. The recovery of the velocity and scalar unknowns happens in the same manner as in SCPC.
Since the flux is constructed in a discontinuous space, we must project the computed solution into the conforming space. This can be done cheaply via local facet averaging. The resulting solution is then updated by applying the projection mapping. This ensures the residual for the original mixed problem is properly evaluated to test for convergence. With the operator as in (59), the preconditioning operator for the original system is:
(60) 
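The facet averaging used for the projection can be sketched with NumPy on a hypothetical two-cell, three-dof layout in which the middle dof is shared: the conforming value at a shared dof is the average of its broken copies:

```python
import numpy as np

# Hypothetical: two cells with two local dofs each; global dof 1 is the
# shared facet dof, duplicated in the broken space.
cell_dofs = [np.array([0, 1]), np.array([1, 2])]
u_broken = [np.array([1.0, 3.0]), np.array([5.0, 2.0])]

total = np.zeros(3)
count = np.zeros(3)
for dofs, u in zip(cell_dofs, u_broken):
    total[dofs] += u   # accumulate broken contributions per global dof
    count[dofs] += 1   # number of cells touching each global dof

u_conforming = total / count  # shared dof 1 becomes (3.0 + 5.0) / 2 = 4.0
```

Interior dofs are untouched (count 1), so the averaging is a purely local operation on facet data.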
We note here that assembly of the right-hand side for the trace system requires special attention. First, when Neumann conditions are present, the right-hand side for the trace variable is not necessarily zero. Since the hybridization preconditioner has access to the entire Python context (which includes a list of boundary conditions and the spaces in which they are applied), surface integrals on the exterior boundary are added where appropriate and incorporated in the generated Slate expressions. A more subtle issue that requires extra care is the incoming right-hand side tested in the conforming velocity space.
The situation we are given is that we have a right-hand side tested against functions in the conforming space, but require one tested against functions in the broken space. For consistency, we also require, for any function in the conforming space, that
(61) 
We can construct such a broken right-hand side satisfying (61) in the following way. By construction of the broken space, we have, for each conforming basis function associated with a facet node:
(62) 
where the two resulting functions correspond to the positive and negative restrictions associated with that facet node. (These are the two “broken” parts of the basis function on a particular facet connecting two elements: for two adjacent cells, a conforming basis function for a particular facet node can be decomposed into two broken basis functions defined on their respective sides of the facet.) We then define our “broken” right-hand side via the local definition:
(63) 
where the denominator is the number of cells that the degree of freedom corresponding to the basis function touches. Using (62), (63), and the fact that the right-hand side functional is linear in its argument, we can verify that our construction satisfies (61).
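The weighting in (63) can be checked with a small NumPy example (the dual of the facet averaging used for the flux projection): each shared right-hand side entry is divided by the number of cells touching it, so that summing the broken contributions recovers the assembled vector, as (61) requires. The two-cell dof map is hypothetical:

```python
import numpy as np

# Hypothetical: two cells, three global dofs; dof 1 is shared by both cells.
cell_dofs = [np.array([0, 1]), np.array([1, 2])]
r = np.array([1.0, 4.0, 2.0])  # assembled ("conforming") right-hand side

# Multiplicity: number of cells touching each global dof.
N = np.zeros_like(r)
for dofs in cell_dofs:
    N[dofs] += 1

# Broken right-hand side, as in (63): each local entry is r_i / N_i.
r_broken = [r[dofs] / N[dofs] for dofs in cell_dofs]

# Consistency check in the spirit of (61): summing the broken
# contributions reproduces the original assembled vector.
r_sum = np.zeros_like(r)
for dofs, rb in zip(cell_dofs, r_broken):
    r_sum[dofs] += rb
```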
4 Numerical studies
We now present results utilizing the Slate DSL and our static condensation preconditioners for a set of test problems. Since we are using the interfaces outlined in Section 3, Slate is accessed indirectly and requires no manually written solver code for hybridization or static condensation/local recovery. All parallel results were obtained on a single fully loaded compute node with dual-socket Intel Xeon E5-2630 v4 processors (2 threads per core) running at 2.2 GHz. In order to avoid potential memory effects due to the operating system migrating processes between sockets, we pin MPI processes to cores.
Verification of the generated code is performed using parameter-sensitive convergence tests. The study consists of running a variety of discretizations spanning the methods outlined in Section 2. Details and numerical results are made public and can be viewed in Zenodo/TabulaRasa (2019) (see “Code and data availability”). All results are in full agreement with the theory.
4.1 LDG-H method for a three-dimensional elliptic equation
In this section, we take a closer look at the LDG-H method for the model elliptic equation (sign-definite Helmholtz):
(64)  
(65) 
where the right-hand side and boundary data are chosen such that the analytic solution is known. We use a regular mesh consisting of tetrahedral elements. First, we reformulate (64)–(65) as the mixed problem:
(66)  
(67)  
(68) 
We use polynomial approximations from linear up to cubic for the LDG-H discretization of (66)–(68). Additionally, we compute a postprocessed scalar approximation of the HDG solution. This raises the approximation order of the computed solution by an additional degree. In all numerical studies here, the HDG stabilization parameter is fixed. All results were computed in parallel, utilizing a single compute node (described previously).
A continuous Galerkin (CG) discretization of the primal problem (64)–(65) serves as a reference for this experiment. Due to the superconvergence in the postprocessed solution for the HDG method, we use CG discretizations of polynomial order 2, 3, and 4. This takes into account the enhanced accuracy of the HDG solution, despite being initially computed as a lower-order approximation. We therefore expect both methods to produce equally accurate solutions to the model problem.
Our aim here is not to compare the performance of HDG and CG, which has been investigated elsewhere (for example, see Kirby et al. (2012); Yakovlev et al. (2016)). Instead, we provide a reference that the reader might be more familiar with in order to evaluate whether our software framework produces a sufficiently performant HDG implementation relative to what might be expected.
For the CG discretization, we use a matrix-explicit iterative solver consisting of the conjugate gradient method preconditioned with hypre’s BoomerAMG implementation of algebraic multigrid (AMG) (Falgout et al., 2006). We use the preconditioner described in Section 3.1.1 to statically condense the LDG-H system, using the same iterative solver as the CG method for the Lagrange multipliers. This is sensible, as the global trace operator defines a symmetric positive-definite operator. To avoid over-solving, we iterate to a relative tolerance such that the discretization error is minimal for a given mesh.
4.1.1 Error versus execution time
The total execution time is recorded for the CG and HDG solvers, which includes the setup time for the AMG preconditioner, matrix assembly, and the time-to-solution for the Krylov method. In the HDG case, we include the time spent building the Schur complement for the traces, local recovery of the scalar and flux approximations, and postprocessing. The error against execution time is summarized in Figure 1(a).
The HDG method with postprocessing, as expected, produces a solution which is as accurate as the corresponding higher-order CG method. While the full HDG system is never explicitly assembled, the larger execution time is a result of several factors. The total number of trace unknowns for the three discretizations is roughly four, three, and two times larger (respectively) than the corresponding number of CG unknowns. Therefore, each iteration is more expensive. Moreover, we also observe that the trace system requires more Krylov iterations to reach discretization error. The gap in total number of iterations starts to close as the approximation degree increases (see Figure 1(b)). The extra cost of HDG due to the larger degree-of-freedom count and the need to perform local tensor inversion is offset by the local conservation and stabilization properties which are useful for fluid dynamics applications.
4.1.2 Breakdown of solver time
The HDG method requires many more degrees of freedom than CG or primal DG methods. This is largely due to the fact that the HDG method simultaneously approximates the primal solution and its velocity. The global matrix for the traces is larger than the one for the CG system at low polynomial order. The execution time for HDG is then compounded by a more expensive global solve. We remind the reader that our goal here is to verify that the solve time for HDG is as might be expected given the problem size.
Table 1. Breakdown of raw timings for the HDG and CG solvers at polynomial degrees 1, 2, and 3.

Stage                             degree 1         degree 2         degree 3
                                  (s)     %        (s)     %        (s)      %
Matrix assembly (static cond.)    1.05    7.49     6.95    10.40    31.66    10.27
Forward elimination               0.86    6.13     6.32    9.45     31.98    10.37
Trace solve                       10.66   76.24    43.89   65.66    192.31   62.36
Back substitution                 1.16    8.28     8.71    13.03    45.81    14.85
Post-processing                   0.26    1.86     0.98    1.46     6.62     2.15
HDG total                         13.98            66.85            308.37

Matrix assembly (monolithic)      0.50    12.01    2.91    11.39    26.37    24.11
Solve                             3.63    87.99    22.67   88.61    82.99    75.89
CG total                          4.12             25.59            109.36
Figure 3 displays the execution times on a fine simplicial mesh. The execution times have been normalized by the CG total time, showing that the HDG method takes roughly 3 times the execution time of the CG method. This is expected given the larger degree-of-freedom count. The raw numerical breakdown of the HDG and CG solvers is shown in Table 1. We isolate each component of the HDG method contributing to the total execution time. Local operations include static condensation (trace operator assembly), forward elimination (right-hand side assembly for the trace system), backward substitution to recover the scalar and velocity unknowns, and local postprocessing of the primal solution. For all degrees, our HDG implementation is solver-dominated.
Both operator and right-hand side assembly are dominated by the costs of inverting a local square mixed matrix coupling the primal and dual variables, which is performed directly via an LU factorization. They should therefore be of the same magnitude in time spent. We observe that this is the case across all degrees (ranging between approximately 6-11% of total execution time for both local operations). Back-substitution takes roughly the same time as the static condensation and forward elimination stages (between 8-15% of execution time across all degrees). This is expected, as these operations are all dominated by the cost of inverting the local matrix coupling the scalar and velocity degrees of freedom. The slight increase in time is due to splitting the local equations into two local solvers: one for the scalar and another for the velocity. Finally, the additional cost of postprocessing accrues negligible time (roughly 2% of execution time across all degrees).
We note that local tensors are not cached: each pass performing the local eliminations and backward reconstructions rebuilds the local element tensors. It is not clear at this time whether the performance gained from avoiding rebuilding the local operators would offset the memory costs of storing the local matrices. Moreover, in time-dependent problems where the operators may contain state-dependent variables, rebuilding local matrices is necessary in each time step regardless.
4.2 Hybridized mixed solver for the shallow water equations
A primary motivator for our interest in hybridized methods is the development of efficient solvers for problems in geophysical flows. In this section, we present some results from integrating the nonlinear shallow water equations on the sphere, using test case 5 (flow past a mountain) from Williamson et al. (1992). We use the framework of compatible finite elements (Cotter and Shipton, 2012; Cotter and Thuburn, 2014).
4.2.1 Semi-implicit finite element discretization
We start with the vector-invariant rotating nonlinear shallow water system defined on a two-dimensional spherical surface embedded in three-dimensional space:
(69)  
(70) 
where is the fluid velocity, is the depth field, is the Coriolis parameter, is the acceleration due to gravity, is the bottom topography, and