1 Introduction
High-order finite element methods (FEM) are becoming increasingly important in scientific computing due to their increased accuracy and performance efficiency on modern computer architectures [12]. Many high-order applications are interested in parallel adaptive mesh refinement (AMR) on unstructured meshes containing quadrilateral and hexahedral elements. Such elements are attractive for their tensor product structure (enabling efficiency) and for their refinement flexibility (enabling anisotropic refinement). However, as opposed to bisection-based methods for local refinement of simplices, hanging nodes that occur after local refinement of quadrilaterals and hexahedra are not easily avoided by further refinement [34]. We are thus interested in handling nonconforming (irregular) meshes [15, 35, 20], in which adjacent elements need not share a complete face or edge and where some finite element degrees of freedom (DOFs) have to be constrained to obtain a conforming, high-order solution.
In this paper we present a set of software abstractions and algorithms for handling large parallel nonconforming meshes and high-order function spaces that we believe strikes the right balance between efficiency, ease of implementation, and applicability to a wide range of problems. In particular, with a single general algorithm we handle a large class of 2D and 3D AMR meshes that consist of quadrilaterals and hexahedra, possibly mixed with triangles and triangular prisms. Generally, the elements can be refined by bisecting their edges, and we allow anisotropic refinement of tensor product elements, meaning that e.g. a hexahedron can be split into 2 or 4 hexahedra only. In addition, the method we propose operates at a general discretization level, independent of the physics simulation, and supports the whole de Rham sequence of finite element spaces, at arbitrarily high order. Our approach is also compatible with curvilinear meshes, as well as with high-order finite element techniques such as static condensation and hybridization [16]. While the algorithms we present can be extended to general refinement, in this paper we focus exclusively on nonconforming refinement with a fixed order $p$.
The methodology of constraining hanging nodes in nonconforming meshes is well established in the literature. Most authors opt to construct algebraic constraints that express constrained degrees of freedom as linear combinations of true degrees of freedom. The constraints are then applied to either the local element matrices as part of the global matrix assembly [1, 15, 10, 25, 5], after global assembly to “condense” the constrained DOFs out of the linear system [4], or, more rarely, at the shape function level [40] or the linear solver level [37]. Yet another possibility is to integrate the interpolation of hanging nodes within a multigrid framework [7, 33]. Our approach is closest to the method of adjusting the global linear system, but we cast the elimination of constrained DOFs as a form of variational restriction. The practical advantage is that the stiffness matrix assembly stays unchanged and completely independent of the treatment of AMR, and we also avoid intermixing AMR with the numerical method of the simulation.
A large part of this paper is concerned with an efficient algorithm to construct a global operator that interpolates the constrained degrees of freedom from the true degrees of freedom. The algorithm is based on composing local interpolation matrices into a global one. It is easy to implement, works for a variety of finite element spaces and element geometries, and is unique in being able to handle 3D anisotropic refinements as well as arbitrarily irregular meshes where adjacent elements are not limited to a 2:1 refinement ratio. The resulting interpolation matrix is also conveniently used to constrain duplicate nodes in parallel domain decomposition.
A modern finite element solver needs to scale to hundreds of thousands of parallel tasks. To achieve that, the mesh has to be fully distributed, as the global mesh no longer fits on each processor. Historically, this has not been the case in popular codes such as libMesh [25]. The deal.II project [4] switched to the p4est library for mesh distribution in 2011. In addition to mesh distribution, an AMR solver at this level of parallelism requires an extremely efficient partitioning and load balancing algorithm that can be applied often. High-quality spectral-type partitioners, such as ParMETIS [24], may be prohibitively expensive for this purpose. We thus take inspiration from octree-based approaches [18, 8, 22], where only the coarse elements (the octree roots) are shared by all parallel tasks and where the refinement tree provides a natural way to enumerate elements. The resulting sequence is equipartitioned and distributed among processors, which has been shown to be a scalable way to load balance the mesh [8]. In contrast to octree methods, however, we allow for more general refinement and support additional types of elements. Our partitioning algorithm is thus closest to the REFTREE method [31].
The rest of the paper is organized as follows: in Section 2, we define the class of meshes we support. In Section 3 we describe the general procedure to construct the AMR system through variational restriction. Section 4 is devoted to the algorithm for constructing the AMR interpolation operator, in serial. The internal data structures for element-based AMR in the context of high-order finite elements are presented in Section 5. Extending these methods to parallel settings is the subject of Section 6. Finally, Section 7 presents numerical experiments that illustrate the performance of our algorithms in practice.
2 Nonconforming meshes
For the purposes of this paper, and following the definitions in [30], an element is a (closed) triangle, quadrilateral, triangular prism or hexahedron. An element contains vertices and edges, with the usual definitions, and three-dimensional elements also contain faces. A mesh is a union of elements $\Omega = \bigcup_i K_i$, not necessarily all of the same type, such that $\Omega$ is a connected, bounded region in $\mathbb{R}^2$ or $\mathbb{R}^3$. Let $K_i^\circ$ and $\partial K_i$ denote the interior and boundary of element $K_i$. We assume that the element interiors do not intersect, i.e., $K_i^\circ \cap K_j^\circ = \emptyset$ for $i \neq j$. A mesh is said to be conforming if the intersection $K_i \cap K_j$, $i \neq j$, is a common vertex, a common edge, a common face, or the empty set. A vertex of an element is called a hanging node if it lies in the interior of an edge or face of another element. A mesh that contains at least one hanging node is called nonconforming or irregular.
To refine an element $K$ means to replace it with at least two smaller elements of the same type, such that $K = \bigcup_i K_i'$. We call the elements $K_i'$ child elements and $K$ the parent element. The parent element is removed from the mesh after refinement, but it is stored along with links to the child elements in a refinement tree. In an inverse, coarsening process (a better term would in fact be derefinement, meaning that only previously introduced refinements can be removed), the parent element can later be reintroduced to the mesh, once its child elements are removed.
When refining an element, we require that new vertices be placed in the center of edges or faces of the parent element. We thus only support bisection of edges during refinement. In this way a triangle can be split into 4 children, a quadrilateral into 2 or 4 children, and a prism or hexahedron into 2, 4 or 8 children. We refer to refinements that split all edges of the parent element as isotropic refinement, and the remaining refinement options as anisotropic. Some examples of the types of nonconforming meshes we support (obtained by refining initially conforming meshes) are shown in Figure 1.
A nonconforming mesh is said to be consistent if lower-dimensional mesh entities (i.e., vertices, edges, faces) that intersect are either identical, or one is a proper subset of the other (see Figure 8 for an example of an inconsistent mesh). We refer to the smaller entity as a slave and the larger containing entity as a master. Lower-dimensional entities of a consistent mesh that are neither masters nor slaves are called conforming entities.
Note that a slave face has exactly one master face, but a 3D edge can have multiple master edges. In that case we always take the smallest master edge, which in turn may be a slave to its own smallest master. This hierarchy ends with a maximal master edge that is not a slave itself. Note that any master edge can be shared by several elements in a “conforming” manner. We call the interface between slave faces and their master, or slave edges and their maximal master, the coarse-fine interface.
3 Nonconforming AMR by variational restriction
In the continuous Galerkin method [14], one starts with a weak variational problem: Find $u \in V$, such that

$a(u, v) = \ell(v) \quad \forall v \in V,$   (1)

where $a(\cdot,\cdot)$ and $\ell(\cdot)$ are bilinear and linear forms, respectively, on the inner-product space $V$, typically a Sobolev space on the domain $\Omega$ with imposed essential boundary conditions. The problem (1) is discretized by constructing an approximate finite-dimensional subspace $V_h \subset V$ with basis $\{\phi_j\}_{j=1}^N$ on the mesh and by assembling the stiffness matrix $A_{ij} = a(\phi_j, \phi_i)$ and load vector $b_i = \ell(\phi_i)$, so that

$a(u_h, v_h) = \ell(v_h) \quad \forall v_h \in V_h.$   (2)

This is equivalent to the linear system $Ax = b$, which yields the approximate solution $u_h = \sum_j x_j \phi_j$.
When constructing the space $V_h$, care must be taken to ensure that conformity requirements of the space are met, so that $V_h$ is a subspace of $V$. For example, if $V$ is the Sobolev space $H^1(\Omega)$, the functions in $V_h$ must be kept continuous across element boundaries. If $V$ is an $H(\mathrm{curl})$ space, the tangential component of the finite element vector fields in $V_h$ needs to be continuous across element faces, etc. In high-order FEM this is achieved by matching suitable shape functions at degrees of freedom associated with the mesh vertices, edges and faces, which together with DOFs in the element interiors form the basis functions of $V_h$ [39].
When the mesh is nonconforming, the shape functions cannot be matched directly at the coarse-fine interfaces. If we assign degrees of freedom in the usual way on the conforming entities (including vertices, and maximal master edges in 3D) and disregard any conformity in the interior coarse-fine interfaces, we obtain a larger space $\hat{V}_h \supset V_h$ that is “broken” along these interfaces, i.e., slave entities have DOFs that are independent of their master DOFs. We call $\hat{V}_h$ the partially conforming space, with $\dim \hat{V}_h = \hat{N}$.
To restore conformity at the coarse-fine interfaces, the degrees of freedom on slave entities (slave DOFs) must be constrained so that the slave entities interpolate the finite element functions of their masters. Since the masters and slaves have matching function spaces (all elements have the same order $p$), the slave DOFs can simply be expressed as linear combinations of the master DOFs. Examples of such local constraints for low-order elements are shown in Figure 2.
Note that in general nonconforming meshes, some DOFs on the boundaries of master entities may themselves be constrained by another master entity, or a chain of them. In this paper we are concerned with solvable nonconforming meshes, where each such dependency chain ends with unconstrained degrees of freedom. The final set of unconstrained DOFs is called true DOFs, denoted $x_c$. A solution vector $x \in \mathbb{R}^{\hat{N}}$ in the partially conforming space can thus be written as

$x = \begin{pmatrix} x_c \\ x_s \end{pmatrix},$   (3)

where $x_s$ represents all slave DOFs and can be evaluated at any time by a linear interpolation $x_s = W x_c$ for some interpolation matrix $W$. We can also write

$x = P x_c, \qquad P = \begin{pmatrix} I \\ W \end{pmatrix}.$   (4)

$P$ will be called the conforming prolongation matrix. The columns of $P$ define the conforming basis functions of $V_h$, expressed as vectors in $\hat{V}_h$. We also use $P$ to variationally restrict the solution to the conforming subspace $V_h$.
Since we want to upgrade an existing finite element code to support nonconforming meshes with as few modifications as possible, we let the code assemble the stiffness matrix $\hat{A}$ and load vector $\hat{b}$ in the space $\hat{V}_h$, as if there were no hanging nodes in the mesh. Of course, when computed in $\hat{V}_h$, the discrete problem (2) and the corresponding linear system result in a nonconforming solution where the slave DOFs are not constrained. However, taking $x = P x_c$ and testing with functions of the form $y = P y_c$, the variational formulation on $V_h$ becomes

$y_c^T P^T \hat{A} P x_c = y_c^T P^T \hat{b} \quad \forall y_c,$   (5)

so we can solve the smaller system

$P^T \hat{A} P \, x_c = P^T \hat{b},$   (6)

where the slave DOFs are eliminated. We then prolongate $x = P x_c$ to obtain a conforming solution.
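To make the restriction concrete, here is a minimal NumPy sketch (an illustration with made-up matrices, not code from the implementation described here): a 3-DOF “broken” space with one slave DOF is assembled obliviously to the constraint, and the conforming solution is recovered by solving the restricted system (6).

```python
import numpy as np

# Hypothetical broken space with 3 DOFs, where DOF 2 is a slave equal to
# the average of true DOFs 0 and 1 (e.g., a hanging midpoint, linear case).
W = np.array([[0.5, 0.5]])           # slave interpolation weights
P = np.vstack([np.eye(2), W])        # conforming prolongation, shape (3, 2)

# Stiffness matrix and load vector assembled in the broken space,
# completely unaware of the hanging-node constraint.
A_hat = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
b_hat = np.array([1.0, 1.0, 1.0])

# Variational restriction: solve the smaller conforming system (6) ...
A_c = P.T @ A_hat @ P
b_c = P.T @ b_hat
x_c = np.linalg.solve(A_c, b_c)

# ... and prolongate back; the slave DOF now satisfies its constraint.
x = P @ x_c
```

Note that the assembly of `A_hat` and `b_hat` is completely unchanged by AMR; only the final solve is wrapped in the triple product with `P`.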
An important point regarding essential boundary conditions is that they have to be applied in the system (6) after the variational transformation, i.e., based on the matrix $P^T \hat{A} P$ and right hand side $P^T \hat{b}$. This is in contrast to many non-AMR finite element settings, where one can eliminate the essential boundary conditions on the element level. Indeed, it is straightforward to verify that, in general, elimination of boundary DOFs in $\hat{A}$ and $\hat{b}$ followed by variational restriction is different from performing variational restriction and then elimination. In our experience, ensuring that the essential boundary conditions are eliminated at the very end is the main change needed in applications adopting the proposed AMR approach.
Another algebraic transformation that is commonly used in practice is static condensation, see e.g. [42]. Since static condensation manipulates DOFs in the interior of the element, and such DOFs are always true, AMR and static condensation commute, i.e., static condensation followed by AMR variational restriction is the same as AMR variational restriction followed by static condensation. The variational restriction approach can also be used to couple AMR with hybridization, see [16].
4 Constructing serial $P$
We now describe the algorithm to calculate the conforming prolongation matrix $P$ in a single-processor setting. As indicated in the previous section, factoring $P$ out of the coupled AMR matrix $P^T \hat{A} P$ makes it easier to incorporate AMR in applications. The explicit availability of $P$ can also facilitate the development of AMR-specific linear solvers and preconditioners. Many codes apply $P$ without assembling it [41], but in our case it is more convenient to build it explicitly due to the complex dependencies that can arise in arbitrarily irregular 3D meshes.
The matrix $P$ is constructed for a concrete instance of a partially conforming finite element space $\hat{V}_h$. The procedure described below constructs both the true-DOF identity and the slave interpolation submatrices of $P$. The first step is to determine the master-slave relations of nonconforming edges and faces. Algorithms for that are described later. Here we assume that we are given lists of master edges and faces and their associated slaves.
For each slave entity, the position of its vertices within the reference domain of its master edge/face is known. For example, the vertices of the slave face in Figure 3 could have coordinates {(0.5, 0), (1, 0), (1, 0.5), (0.5, 0.5)} in the reference domain of the master face. From these we can calculate the local interpolation matrix $I_s$, which expresses the slave DOFs (including DOFs on the boundary of the slave face) as linear combinations of the DOFs of the master face (including its boundary DOFs). In the case of nodal elements, the rows of $I_s$ contain just the shape functions of the master face finite element evaluated at the nodal positions of the slave face, i.e., the slave face performs nodal interpolation of the master face finite element function. Since the polynomial degrees of the two faces match, the interpolation is exact. Although we primarily use nodal elements, the local interpolation matrix is not limited to nodal elements and can be computed for any kind of finite element.
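For nodal elements, the computation of such a local interpolation matrix can be sketched in a few lines. The following is a 1D illustration (a quadratic master edge whose left half is a slave edge; the node placements are hypothetical): the rows of the matrix are the master Lagrange shape functions evaluated at the slave nodal positions.

```python
import numpy as np

# Lagrange nodes of a p=2 master edge in its reference domain [0, 1].
master_nodes = np.array([0.0, 0.5, 1.0])

def shape(x):
    # Evaluate all three quadratic Lagrange basis functions at x.
    return np.array([np.prod([(x - m) / (n - m)
                              for m in master_nodes if m != n])
                     for n in master_nodes])

# Slave edge occupying [0, 0.5] of the master's reference domain;
# its own p=2 nodes land at these master coordinates.
slave_nodes = np.array([0.0, 0.25, 0.5])

# Rows of the local interpolation matrix are master shape functions
# evaluated at the slave nodal positions.
I_s = np.array([shape(x) for x in slave_nodes])
```

Since both sides are quadratic, `I_s` reproduces any quadratic master function exactly at the slave nodes, mirroring the exactness argument above.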
Note that sometimes a row of $I_s$ will be of the trivial form (0, …, 0, 1, 0, …, 0), which means that the slave DOF is identical to a master DOF and may not be constrained at all. This is the case, for example, for the marked node in Figure 3. Since the matrices $I_s$ include the boundary DOFs of the slave face, certain dependencies will be computed multiple times; this is done to simplify the implementation at a relatively small computational cost.
As we iterate through the local interpolation matrices of all slave edges/faces, we continuously update a global dependency matrix $D$ of size $\hat{N} \times \hat{N}$. The matrix $D$, corresponding to the DOFs in $\hat{V}_h$, is represented as a sparse matrix and is initialized to identity. For each local matrix $I_s$, we assign the interpolation weights to the entries $D_{ij}$ for each slave DOF $i$ that depends on a master DOF $j$. This allows us to identify three kinds of rows in the dependency matrix at the end of this phase:

Identity rows: rows where $D_{ii} = 1$ is the only nonzero entry. These represent unconstrained, true degrees of freedom.

Direct slave rows: non-identity rows with nonzero entries corresponding only to unconstrained degrees of freedom. These are slave degrees of freedom directly dependent on true DOFs.

Indirect slave rows: the remaining non-identity rows represent slave degrees of freedom dependent on true and/or other slave DOFs. These rows appear because we do not restrict the mesh regularity, see Figure 5 and the discussion below.
Let $N$ be the number of identity rows of the matrix $D$. This is the number of true degrees of freedom and the dimension of the conforming finite element space $V_h$. We initialize $P$ as an $\hat{N} \times N$ sparse matrix with identity rows in the positions corresponding to the true DOFs. To fill in the remaining rows, we introduce a Boolean vector $f$ of size $\hat{N}$, which keeps track of which rows in $P$ have been resolved. If $i$ is a true DOF, we initialize $f_i = 1$, otherwise we set $f_i = 0$. We loop over the rows of $D$, scanning for rows $i$ where $f_i = 0$ and $f_j = 1$ for all nonzero $D_{ij}$. These rows correspond to unresolved constrained DOFs that depend on already resolved DOFs. When such a row $i$ is found, the (sparse) row $P_i$ is set to

$P_i = \sum_j D_{ij} P_j,$   (7)

and the resolution is marked by setting $f_i$ to one.
This procedure is iterated until all unresolved rows are assigned a linear combination of already known rows. In the first sweep, all direct dependencies are resolved, which unlocks indirectly dependent slave DOFs that are resolved in subsequent sweeps (see Figure 5). The number of sweeps needed is determined by the longest indirect dependency in the nonconforming mesh, known as its irregularity [2, 15, 11]: 1-irregular meshes have only one level of dependency, while $k$-irregular meshes will require $k$ sweeps to resolve all rows of $P$. In practice, we typically have at most 3-irregular meshes, so only a few sweeps are needed for all rows of $P$ to be resolved.
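The sweeping procedure can be sketched as follows. This is a dense toy version of the sparse algorithm, with an invented 5-DOF dependency matrix containing one direct and one indirect slave row; symbol names mirror the description above.

```python
import numpy as np

# Dependency matrix D for 5 hypothetical DOFs, initialized to identity and
# then overwritten with interpolation weights for the slave rows:
# DOFs 0, 1, 2 are true; DOF 3 depends directly on 0 and 1;
# DOF 4 depends indirectly, through slave 3 and true DOF 2.
N_hat = 5
D = np.eye(N_hat)
D[3] = 0.0; D[3, [0, 1]] = 0.5        # direct slave row
D[4] = 0.0; D[4, [2, 3]] = 0.5        # indirect slave row

# Identity rows of D are the true DOFs; they get identity rows in P.
true_dofs = [i for i in range(N_hat) if (D[i] == np.eye(N_hat)[i]).all()]
col = {t: c for c, t in enumerate(true_dofs)}

P = np.zeros((N_hat, len(true_dofs)))
finalized = np.zeros(N_hat, dtype=bool)
for t in true_dofs:
    P[t, col[t]] = 1.0
    finalized[t] = True

# Sweep until every slave row is expressed in true DOFs, as in (7).
while not finalized.all():
    progress = False
    for i in np.where(~finalized)[0]:
        deps = np.nonzero(D[i])[0]
        if finalized[deps].all():
            P[i] = D[i, deps] @ P[deps]   # P_i = sum_j D_ij P_j
            finalized[i] = True
            progress = True
    if not progress:
        raise RuntimeError("cyclic dependency: mesh not solvable")
```

The first sweep resolves the direct slave row, which unlocks the indirect one; an unsolvable (cyclic) mesh would make a full sweep pass with no progress.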
The algorithm is only guaranteed to finish if there are no cyclic dependencies, i.e., if the mesh is solvable. In the serial algorithm we include an explicit check: if there are still unresolved rows but none of them depends only on already resolved rows, we print an error message. This never happens if the AMR process starts on an initially conforming (regular) mesh. However, it is possible to construct a nonconforming mesh with cyclic dependencies. Figure 5 shows a mesh (inspired by [11]) that is representable by our data structures but leads to an infinite loop in the unchecked matrix algorithm. Note that in this case, a single level of additional refinement breaks the cycle and makes the mesh solvable.
5 Serial data structures
In this section we describe how we represent nonconforming meshes and how we determine the master-slave relations needed for the construction of $P$. A high-order FEM code needs to be able to track vertices, edges and faces and their relations to incident elements. Octree-based algorithms can derive this connectivity from the structure of the tree itself [41], though on octree boundaries the algorithms get complex (see inter-octree connectivity in [8]). Since our meshes are more general than octrees, we take a more traditional approach with explicit links between mesh entities. A comprehensive and mathematically elegant approach is the Sieve representation [26], storing a graph with edges connecting entities whose dimension differs by one. However, this cell-complex representation does not generalize naturally to nonconforming meshes [23] and stores more information than we need.
As an alternative, we merely connect each higher-dimensional entity to its vertices and store the inverse mappings (from vertices back to edges and faces) as hash tables. This simple data structure provides the mesh connectivity we need and allows us to traverse the nonconforming interfaces. More specifically,

edges and hanging vertices are identified by pairs of vertex indices,

faces are identified by four vertex indices, and contain two element references,

elements contain indices of up to 8 vertices, or indices of children if refined.
For a uniform hexahedral mesh, this representation requires about 290 bytes per element, counting the vertices, edges, faces and their 32-bit indices, and including the hash tables and the refinement trees. For a high-order code we believe this to be acceptable, as a high-order mesh typically contains at least 8 times fewer elements than a comparable low-order mesh.
5.1 Refinement trees
Since we aim for dynamic AMR capable of coarsening, we need to store the refinement history. When the elements of the initial mesh get refined, they are removed from the list of active elements and serve as roots of refinement trees. A refined element points to its children instead of its vertices, as shown in pseudocode in Figure 7. To save memory, we use 32-bit integers to identify entities instead of pointers. An element can have fewer than 8 children if it is refined anisotropically. We store a computed list of all leaf nodes of all refinement trees. This list represents the current state of the mesh.
5.2 Hash maps
The mapping from vertices to other entities is done by hash tables. We illustrate this on the example shown in Figure 7. Assume the pictured quadrilateral is a 3D face, covered by the smaller faces of three neighboring elements. To access edges and faces from the incident elements, we call the respective Get functions with the vertex indices of the desired entity. A similar hash map exists for hanging vertices and is used to traverse nonconforming interfaces in Section 5.4.
Each Get function searches the respective hash table and returns a 32-bit entity index. The order of the arguments is not relevant, i.e., GetVertex(a, b) = GetVertex(b, a). We use the convention that the Get functions create the requested entity if it does not exist. To query existence without creating, each GetX function has a FindX counterpart taking the same arguments and returning nil (encoded as an invalid index) if the entity does not exist. We make sure the hash tables are properly (dynamically) sized to get amortized $O(1)$ complexity for each query.
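A sketch of this convention in illustrative Python (not the actual implementation, which uses 32-bit indices and custom hash tables; here nil is encoded as -1 and the class name is invented):

```python
class EntityHash:
    """Order-insensitive map from vertex index tuples to entity ids,
    mirroring the Get/Find convention described above."""
    def __init__(self):
        self.table = {}
        self.next_id = 0

    def get(self, *verts):
        # Get: create the entity if it does not exist yet.
        key = tuple(sorted(verts))          # argument order is irrelevant
        if key not in self.table:
            self.table[key] = self.next_id
            self.next_id += 1
        return self.table[key]

    def find(self, *verts):
        # Find: return nil (-1 here) instead of creating the entity.
        return self.table.get(tuple(sorted(verts)), -1)

edges = EntityHash()
e = edges.get(7, 3)     # creates the edge between vertices 3 and 7
```

Sorting the key makes `get(7, 3)` and `get(3, 7)` hit the same table slot, and `find` leaves the table untouched.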
5.3 Refinement and coarsening
To refine e.g. a hexahedron, we first create new mid-edge vertices (or get already existing ones) by calling GetVertex for all 12 edges. Six mid-face vertices are accessed through opposite mid-edge vertices; the two options per face must be checked first with FindVertex to prevent creating a duplicate mid-face vertex in case it already exists. The mid-element vertex is accessed from any two opposite mid-face vertices. New elements are created with the indices of the new vertices. The old element is marked as refined and its vertex indices are replaced with the indices of the child elements. Reference counting is used to keep track of the number of elements pointing to each vertex, edge and face. If a reference count drops to zero, the entity is deleted.
Our data structure allows for anisotropic refinement. The challenge is maintaining consistency of the mesh when refining anisotropically. It is easy to refine adjacent elements in such a way that their faces are not subsets or supersets of each other, so there is no way to determine a master-slave relationship. This is illustrated in Figure 8. To resolve such a situation, the neighbor of the element being refined has to be refined as well. Such forced refinements may propagate until a globally consistent mesh is obtained. We defer the treatment of this ripple effect to a follow-up publication.
When coarsening an element, the indices of its original vertices are first retrieved from the children (since the refinement pattern is known). All of the children need to be leaves, otherwise they are recursively coarsened first. In an inverse process to refinement, the child elements are destroyed and associated vertex, edge and face reference counts are decremented. The coarse element is reactivated, and the appropriate reference counts are incremented.
5.4 Calculating masterslave relations
We loop over all leaf elements and their faces. Each face, triangular or quadrilateral, is checked for being a master face by invoking the function TraverseTriFace (Algorithm 1) or TraverseQuadFace (Algorithm 2). These functions attempt to recursively descend an implicit face refinement “tree” and return a list of slave faces, if they can be reached, or an empty list if the starting face is conforming. The function FindVertex is used here to obtain vertices that were created at the centers of edges of previously refined elements. During the descent, the position within the master face’s reference domain is tracked by the corner points of the current sub-face, which are initialized to the corners of the reference domain.
In the case of quadrilateral faces, the face refinement “tree” is traversed as a binary tree, i.e., isotropically refined faces are treated as if refined anisotropically twice (horizontally and then vertically relative to their reference domain, or vice versa). The function FaceSplitType (Algorithm 3) determines which way a face is split, again using FindVertex to look up mid-edge and mid-face vertices (refer to Figure 7 for a quadrilateral face example).
An analogous function, TraverseEdge, is employed to construct the list of masterslave relations for edges. Edges of each leaf element are checked for slave edges by traversing the local vertex hierarchy in a similar recursive manner.
The 2D case is treated with the same code as a degenerate case: triangular and quadrilateral elements merely have a different number of vertices/edges in internal geometry tables and they have zero faces, so creation and traversal of faces is skipped.
We make no assumptions regarding the level of refinement of adjacent elements. The simple algorithms above work for general nonconforming 3D meshes, including meshes with anisotropic refinements.
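The edge case is the simplest to illustrate. The following sketch mimics TraverseEdge on a master edge refined twice on one side; the dictionary stands in for the FindVertex hash lookup, and names and layout are illustrative rather than taken from the actual code.

```python
def traverse_edge(v1, v2, t0, t1, mid_vertex, slaves):
    """Recursively descend the implicit refinement tree of edge (v1, v2),
    collecting slave sub-edges with their positions [t0, t1] in the master
    edge's reference domain. `mid_vertex` plays the role of FindVertex: it
    maps a vertex pair to the hanging vertex at its midpoint, or -1."""
    mid = mid_vertex.get(tuple(sorted((v1, v2))), -1)
    if mid == -1:
        if (t0, t1) != (0.0, 1.0):      # a proper sub-edge => slave edge
            slaves.append((v1, v2, t0, t1))
        return
    tm = 0.5 * (t0 + t1)
    traverse_edge(v1, mid, t0, tm, mid_vertex, slaves)
    traverse_edge(mid, v2, tm, t1, mid_vertex, slaves)

# Master edge (0, 1) split at hanging vertex 2; its left half split again
# at hanging vertex 3 -- an irregularity greater than one, handled the
# same way as the 2:1 case.
mid_vertex = {(0, 1): 2, (0, 2): 3}
slaves = []
traverse_edge(0, 1, 0.0, 1.0, mid_vertex, slaves)
```

The recursion terminates exactly where FindVertex fails, so no assumption on the refinement ratio between neighbors is needed.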
5.5 Element neighbors
We define two elements to be neighbors if their closed sets have a nonempty intersection, even if the intersection is just a single vertex (i.e., we consider all of vertex-, edge- and face-neighbors). Although algorithms exist to determine such neighbors from the structure of the refinement tree itself, the handling of multiple trees may become complex [8].
Instead, we employ an approach based on a Boolean matrix of element-vertex connectivity. Let $X$ be a Boolean matrix containing one row per element and one column per mesh vertex. For each element, its row marks the vertices that belong to the element, including possible mid-edge and mid-face vertices of neighboring refined elements. The incident vertices are collected for each element by a procedure similar to Algorithms 1 and 2. Then, if $e$ is a Boolean vector representing the set of elements whose neighbors we need to find, the vector $X X^T e$ will represent the original set of elements plus all its neighbors. The product $X X^T$ is not stored; only its action is computed on demand. To save even more memory, the obvious corner vertices of each element are also not explicitly stored in $X$. The matrix $X$ is thus empty if the mesh is conforming.
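The action of the Boolean product on an element set can be sketched with plain Python sets standing in for sparse Boolean rows (a toy mesh with invented element-vertex incidences; for simplicity the corner vertices are stored explicitly here, unlike in the memory-optimized version described above):

```python
# Element -> incident vertex set, standing in for the rows of the Boolean
# element-vertex matrix (toy data: elements sharing a vertex are neighbors).
X = {
    0: {0, 1, 4, 5},
    1: {1, 2, 5, 6},
    2: {2, 3, 6, 7},
    3: {4, 5, 8, 9},   # shares vertices 4, 5 with element 0; 5 with 1
}

def neighbors_of(elem_set):
    """Action of the product on a set of elements: the set plus every
    element sharing at least one vertex with it. The product itself is
    never stored, only applied on demand."""
    verts = set().union(*(X[e] for e in elem_set))    # transpose action
    return {e for e, vs in X.items() if vs & verts}   # forward action
```

The two-step evaluation (elements to vertices, then vertices back to elements) is exactly the matrix-vector product applied twice, without forming the element-element matrix.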
6 Parallelization
We split the mesh into disjoint regions (see the next section), one per MPI task, where the partitioning is element-based, i.e., each element is assigned to exactly one task. The vertices, edges and faces on the boundary of each task’s region are duplicated, so that each region can be treated as an isolated mesh and passed to the serial part of the code. Repeating the approach from Section 3, we assemble the stiffness matrices $\hat{A}_k$ and load vectors $\hat{b}_k$ locally on each MPI task $k$, as if the mesh was conforming and serial on each task.
Globally, we have a new, parallel partially conforming finite element space, whose dimension is the sum of the local dimensions. The parallel stiffness matrix $\hat{A}$ consists of the diagonal blocks $\hat{A}_k$ and the parallel load vector $\hat{b}$ contains the blocks $\hat{b}_k$. Again, the solution of the parallel system would be disconnected along the interfaces of the regions, but we can use the same variational restriction approach to obtain a globally conforming parallel solution. The parallel conforming prolongation matrix, still denoted $P$, now has one row per local DOF on each task and one column per global true DOF, and conveniently handles both conforming interpolation and parallel decomposition, by mapping from the conforming space directly to the distributed partially conforming space.
In our current implementation, we explicitly form the parallel triple matrix product $P^T \hat{A} P$ using the triple-matrix-product kernel from the hypre library [21]. This kernel is highly optimized in hypre as it is used internally for the coarse-grid operator construction in its algebraic multigrid solvers. For higher-order elements we are also considering direct evaluation of the action of $P^T \hat{A} P$, without assembly.
6.1 Treebased partitioning
Similarly to [19, 9, 38, 31, 8, 6], we partition the mesh by splitting a space-filling curve (SFC) obtained by enumerating depth-first all leaf elements of all refinement trees. We assume that the elements of the coarse mesh are ordered as a sequence of face-neighbors, so that a globally continuous SFC can be obtained. For ordering the coarse mesh (i.e., the refinement tree roots) we use the Gecko library [28]. Once the coarse mesh is ordered, we invoke a depth-first traversal of the ordered and oriented roots. Orienting a root element means assigning an initial traversal state (see below) so that the SFCs of successive trees are connected.
In the case of quadrilaterals and hexahedra, the simplest traversal with a fixed order of children at each tree node leads to the well-known Z-curve, as shown in Figure 10 (left). However, it turns out that simply by changing the order in which the subtrees are visited at each internal node, the Hilbert curve can be obtained at no additional cost (i.e., the elements do not need to be physically rotated). The Hilbert curve produces continuous partitions and leads to better interprocess connectivity [9] than the Z-curve. In the simpler 2D case, each tree node can be in 8 states (4 rotations times 2 inversions). Given a state of the root node, we descend to the subtrees in the appropriate order, passing a new state to each, according to Table 10. An example of Hilbert curve partitioning is shown in Figure 10 (right). The method is analogous for hexahedra, where a tree node has 24 states.
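The state-based traversal can be sketched in 2D as follows. This simplified version uses only four rotation states, which already suffice to generate the curve; the child orders and state transitions below are one standard choice, not necessarily the exact tables of our implementation.

```python
# Child quadrants as (x, y) offsets within the parent square. Only the
# visiting order changes per state; the elements are never rotated.
ORDER = {'A': [(0, 0), (0, 1), (1, 1), (1, 0)],
         'B': [(0, 0), (1, 0), (1, 1), (0, 1)],
         'C': [(1, 1), (0, 1), (0, 0), (1, 0)],
         'D': [(1, 1), (1, 0), (0, 0), (0, 1)]}
# State handed down to each child, in visiting order.
NEXT = {'A': 'BAAC', 'B': 'ABBD', 'C': 'DCCA', 'D': 'CDDB'}

def hilbert_leaves(state, depth, x=0, y=0):
    """Depth-first traversal of a uniformly refined quadtree, emitting
    leaf coordinates in Hilbert order."""
    if depth == 0:
        return [(x, y)]
    out = []
    half = 1 << (depth - 1)
    for (dx, dy), nxt in zip(ORDER[state], NEXT[state]):
        out += hilbert_leaves(nxt, depth - 1, x + dx * half, y + dy * half)
    return out
```

Consecutive leaves in the returned sequence are always face-neighbors, which is exactly the continuity property that makes SFC partitions compact.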
Generating a continuous SFC is more difficult for triangles, prisms and in the presence of anisotropic refinements. The Sierpinski curve is optimal for triangles [31], but we would have to switch to triangle bisection. Prisms admit a continuous SFC analogous to hexahedra, but not all successive elements can be face-neighbors (only edge-neighbors). The most serious difficulty is with anisotropic refinement, which probably prevents a continuous SFC; generating a good leaf sequence in that case is an open problem.
At the beginning of the computation, we assume that the mesh is small enough to fit completely in the memory of each MPI task. Each task uses a copy of the serial mesh and traverses its trees as described. The list of leaf elements is split into equal-sized parts and each part is assigned to one MPI task. Each task thus owns a region of the mesh. The region may not have a minimal surface, but it should be continuous and relatively compact due to the SFC locality [9].
6.2 Ghost layer
Although the serial code sees a nonoverlapping decomposition of the mesh, we internally hold the (pruned) refinement trees on all MPI tasks, as in [8]. For the purpose of constructing $P$, we also track ghost elements for each MPI task. A ghost element is one that is a vertex-, edge-, or face-neighbor of an element owned by an MPI task, but is itself owned by another task. The set of all ghost elements forms a minimal layer of elements enclosing a task’s region. This layer needs to be kept synchronized with neighboring tasks.
On each task, elements beyond the ghost layer may not correspond to real elements, and we prune the refinements in this area immediately after partitioning the serial mesh and identifying the ghost elements. Pruning the refinement tree means removing (coarsening) all subtrees that contain only leaves neither owned by the current task nor belonging to its ghost layer [3]. After this step, the mesh becomes fully distributed, as no single task sees all of the actual leaf elements. This is illustrated in Figure 11 for two tasks in 2D.
The last type of element we identify is the parallel boundary element. The boundary layer is a subset of the region owned by each MPI task, formed by elements that are vertex, edge, or face neighbors of the ghost layer. The boundary elements constitute ghost elements of one or more neighboring tasks, and any change in them needs to be communicated to the neighbors to keep all ghost layers synchronized. For example, if a refinement occurs in one task's boundary layer, a message needs to be sent to all neighboring tasks that share the affected elements, so they can duplicate the refinement in their ghost layers.
6.3 Load balancing
Balancing the mesh so that each MPI task has the same number of elements (to within one if the total number of elements is not divisible by the number of tasks) is relatively straightforward in the context of SFC-based partitioning. A parallel partial sum (MPI_Scan) determines the new beginnings of the partitions in the global sequence of leaf elements. New assignments of Element::rank within each parallel region can then be computed. Next, the new assignments within the ghost layers must be communicated between neighboring tasks (this is the most expensive part). This facilitates the final step, in which elements no longer assigned to the current task are sent to their new owners, together with a layer of ghost elements, to ensure a valid new state of the distributed mesh. The exchange of elements ends when each task obtains the right number of elements, which is known beforehand. After the load balancing procedure, a pruning step is performed on each task to remove branches of the refinement tree that no longer need to be represented, i.e., subtrees that only contain leaves beyond the ghost layer.
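The repartitioning arithmetic can be sketched as follows, with MPI_Scan replaced by a serial prefix sum (`rebalance` is a hypothetical helper; the sketch assumes at least one element per task):

```python
# Sketch: given per-task element counts along the global SFC, compute the
# new owner of every element after balancing. The exclusive prefix sum
# plays the role of MPI_Scan: it gives each task the global index of its
# first element.

def rebalance(counts):
    total = sum(counts)
    n = len(counts)
    starts = [0]                       # exclusive prefix sum (MPI_Scan result)
    for c in counts[:-1]:
        starts.append(starts[-1] + c)
    base, rem = divmod(total, n)       # new sizes: rem tasks get base+1 elements

    def new_owner(gi):
        """Owner of global element index gi in the balanced partition."""
        cut = (base + 1) * rem
        if gi < cut:
            return gi // (base + 1)
        return rem + (gi - cut) // base

    # for each task, the new owner of each of its current elements
    return [[new_owner(starts[t] + k) for k in range(c)]
            for t, c in enumerate(counts)]
```

Elements whose new owner differs from the current task are the ones that must be sent away, together with a ghost layer, during the exchange step.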
6.4 Message encoding
The prime advantage of storing the coarse mesh (i.e., all refinement tree roots) on all MPI tasks is the ability to refer to any element of the global mesh by the index of a root element and the refinement path to the element in question. However, rather than identify elements in this way individually, we develop an algorithm to encode a subtree of the global refinement hierarchy, since our messages usually carry information for multiple elements at once.
Given a set of leaf elements in a local mesh, we follow the member Element::parent recursively to identify the set of trees containing elements of the set and store their indices. In each tree, we then descend along the same paths back down to the leaves. At each node, we output an 8-bit mask indicating which subtrees contain elements from the set. A zero mask terminates the descent at leaves.
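A minimal sketch of this mask-based encoding, using refinement paths (tuples of child indices) in place of real tree nodes; the wire format shown is illustrative, not MFEM's actual one:

```python
# Sketch: encode a set of refinement paths (tuples of child indices 0-7)
# as a stream of 8-bit child masks; a zero byte marks a selected leaf.

def encode_paths(paths):
    out = bytearray()
    def descend(paths):
        if () in paths:                 # a selected element ends here
            out.append(0)
            return
        mask, by_child = 0, {}
        for p in paths:
            mask |= 1 << p[0]
            by_child.setdefault(p[0], []).append(p[1:])
        out.append(mask)                # which subtrees contain set elements
        for c in sorted(by_child):
            descend(by_child[c])
    descend(list(paths))
    return bytes(out)

def decode_paths(data):
    """Inverse of encode_paths: recover the set of refinement paths."""
    paths, pos = [], 0
    def descend(prefix):
        nonlocal pos
        mask = data[pos]; pos += 1
        if mask == 0:
            paths.append(prefix)
            return
        for c in range(8):
            if mask & (1 << c):
                descend(prefix + (c,))
    descend(())
    return set(paths)

subset = {(0,), (3, 1), (3, 2)}
data = encode_paths(subset)             # one byte per visited tree node
```

Since a byte describes up to eight children, one descent encodes whole groups of elements far more compactly than listing each refinement path individually.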
The functionality of encoding element sets is used for load balancing, but also as a basis for encoding sets of vertices, edges, faces and their degrees of freedom, as required in Section 6.5. A vertex, edge or face can be identified by the index of an element in an element set, followed by the local number of the vertex/edge/face within that element. A DOF is identified by its index within its mesh entity.
We use variable-length MPI_BYTE messages and try to collect as much data as possible for communication between each pair of tasks, in order to minimize the total number of messages.
6.5 Constructing parallel P
The algorithm to build the P matrix in parallel is more complex, but conceptually similar to the serial algorithm. We still express slave DOF rows of P as linear combinations of other rows; however, some of them may be located on other MPI tasks and must be communicated first.
The ghost layer allows us to determine, without communication, which vertices, edges and faces are shared by more than one task. As we traverse the local mesh and its ghost layer using the serial algorithms of Section 5.4, we build a list of groups of MPI ranks, which we call communication groups. Each vertex, edge and face is assigned a communication group index, and an owner rank. In a group, the task with the lowest rank is defined to be the owner. We store the owner explicitly for each mesh object, because as the next step, we modify the communication groups so that master and slave faces and edges are grouped together. This is to ensure that information will flow from the owners and masters to nonowners and slaves.
Next, for a concrete finite element space, we assign local degrees of freedom on each MPI task. These are the DOFs of all elements of the task's region, whether slave, master or conforming; globally, DOFs shared between tasks are thus counted more than once. Each local DOF has a unique owner, inherited from its underlying vertex, edge or face. DOFs with a remote owner are expected to later receive a row of the P matrix that will make them identical to a remote true DOF.
In addition to local DOFs, we also assign virtual DOFs within the ghost layer (if not already marked as local), all with indices greater than those of the local DOFs. For each local or ghost DOF we store its owner rank, its communication group index, and a resolution flag, as in Section 4.
On each task we define a local dependency matrix, this time spanning both local and ghost DOFs, initialized to identity on the local block. Reusing much of the serial code, we again collect the interpolation weights for each slave DOF that depends on master DOFs; note that a master may now be a ghost DOF. Still without any communication, we can identify the following types of rows:

1. Identity rows for DOFs owned by the current rank. These are true DOFs that we own.

2. Identity rows for DOFs owned by another rank. These correspond to true DOFs of a remote task.

3. Local slave rows: non-identity rows whose nonzero weights all reference local DOFs. These slave DOFs depend only on local DOFs and can be resolved without communication.

4. Remote slave rows: non-identity rows with at least one nonzero weight referencing a ghost DOF. To resolve these slave DOFs, one or more remote rows need to be received first.
Let n_k be the number of rows of type 1 on task k. We perform a parallel partial sum (MPI_Scan) on n_k to determine the global index of the first true DOF on each rank, i.e., the column partitioning of P. We then run a version of the iterative algorithm of Section 4, wrapped in one more outer loop. The outer loop communicates remote rows of P when there is nothing more to be done by the serial (local) inner loop. In the first outer iteration, the inner loop resolves all DOFs that do not depend on remote DOFs. We then send the rows of all DOFs that have just been resolved to neighboring tasks. The communication group of a DOF determines which ranks need to be sent its row. We collect the rows and send a single message per neighbor. The messages are decoded by the recipients, and the received rows, containing global column numbers, are assigned to the appropriate ghost DOFs. Setting the resolution flags of these ghost DOFs unlocks, in the next outer iteration, more DOFs that have been waiting for remote data. The complete procedure is summarized in Algorithm 4.
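The local (inner-loop) part of this resolution can be sketched as follows, with each row given either as identity (a true DOF) or as a list of (master, weight) pairs; the representation is illustrative, not MFEM's:

```python
# Sketch: iteratively resolve slave rows once all of their master rows are
# resolved, expanding each resolved row into a combination of true DOFs.

def resolve(rows):
    """rows[i] is None for a true (identity) DOF, else [(j, w), ...].
    Returns a dict mapping each resolvable DOF to {true_dof: weight}."""
    done = {}
    for i, r in enumerate(rows):        # identity rows resolve immediately
        if r is None:
            done[i] = {i: 1.0}
    changed = True
    while changed:                      # sweep until nothing new resolves
        changed = False
        for i, r in enumerate(rows):
            if i in done or r is None:
                continue
            if all(j in done for j, _ in r):
                comb = {}               # expand into true-DOF columns
                for j, w in r:
                    for t, wt in done[j].items():
                        comb[t] = comb.get(t, 0.0) + w * wt
                done[i] = comb
                changed = True
    return done
```

In the parallel algorithm, rows whose masters are ghost DOFs stay unresolved until the outer loop delivers the remote rows, after which the same sweep continues.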
Situations may occur in which the ghost layer of an MPI task does not provide complete information on the ranks that depend on a particular master DOF. One such setup is shown in Figure 12, where a DOF owned by rank 0 has a row needed by both rank 1 and rank 2 (e.g., to constrain a DOF owned by rank 2). The communication group should therefore be {0, 1, 2}, but rank 0 does not see rank 2's elements, which are beyond its ghost layer. Since rank 0 sees the group {0, 1}, the message is only sent to rank 1. However, rank 1 sees the correct group, and so it forwards the message to rank 2. This is accomplished by line 19 of Algorithm 4, which compares the sender's version of the group (encoded in the message) with the recipient's own view and sends the message on to the missing ranks. To prevent infinite loops, a message may only be forwarded once. Testing shows that the fraction of rows that need forwarding is rarely more than 1%, typically in cases where the number of elements per task is small.
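The forwarding decision on the recipient side can be sketched as a small predicate (a hypothetical helper mirroring the rule just described):

```python
# Sketch: compare the sender's view of a DOF's communication group with the
# recipient's own view, and forward the received row to ranks the sender
# could not see. A row may be forwarded at most once.

def forward_targets(sender_group, my_group, my_rank, already_forwarded):
    """Return the set of ranks this task must forward the row to."""
    if already_forwarded:
        return set()                        # forward-once rule
    missing = set(my_group) - set(sender_group)
    return missing - {my_rank}              # never send to ourselves
```

In the Figure 12 scenario, rank 1 receives a row with the sender group {0, 1}, knows the group is really {0, 1, 2}, and forwards the row to rank 2.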
7 Numerical results
In this section we present numerical experiments that illustrate the performance of our unstructured AMR algorithms in practice. The results were obtained with the open-source implementation of the proposed methods in the MFEM finite element library [29].
7.1 Model problems
As a first test, we solve a standard AMR benchmark problem with a known exact solution, described in [32]. The goal is to reveal potential errors in the prolongation matrix algorithm by checking the behavior of the convergence curves. We also examine the benefit of AMR compared to uniform refinement.
The problem is Poisson's equation on the unit square with a Dirichlet boundary condition. The right-hand side is chosen so that the exact solution is

(8)  u(x, y) = arctan(α (r − r₀)),  where r = √((x − x_c)² + (y − y_c)²).

The solution has a sharp circular wave front of radius r₀ centered at (x_c, y_c), as shown in Figure 14 (left); the parameter α controls the steepness of the front. We start with a coarse mesh of squares and use nodal finite elements of degree p (a subspace of H¹) to discretize the problem. We then perform an adaptation loop consisting of the following steps:

1. Solve the problem on the current mesh to get an approximate solution u_h.

2. For each element K, integrate the energy norm of the exact error,

(9)  e_K² = ∫_K |∇(u − u_h)|² dx.

3. Record the total error and the current number of DOFs.

4. Refine the elements whose error e_K exceeds a fixed fraction of the maximum element error.
Since the exact solution is not a polynomial, the errors must be calculated with a sufficiently high-order integration rule (we used a rule accurate for polynomials of order 30), especially in the first few iterations. Figure 14 (center) shows the mesh after 11 iterations. The convergence curves for several polynomial degrees are graphed in Figure 14 (left; solid lines). We observe that the convergence histories exhibit the expected behavior.
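The adaptation loop can be illustrated by a tiny 1-D analogue (purely illustrative: interpolation stands in for the solve, and the marking fraction 0.7 is an assumed value):

```python
# Sketch: solve-estimate-mark-refine loop on a 1-D mesh, with the sharp
# arctan "wave front" as the exact solution and bisection as refinement.

import math

alpha, x0 = 200.0, 0.5
u = lambda x: math.atan(alpha * (x - x0))

def element_error(a, b):
    # midpoint estimate of the error of the linear interpolant on (a, b)
    mid = 0.5 * (a + b)
    return abs(u(mid) - 0.5 * (u(a) + u(b))) * (b - a)

mesh = [(i / 4, (i + 1) / 4) for i in range(4)]
for _ in range(8):                          # adaptation iterations
    errs = [element_error(a, b) for a, b in mesh]
    thresh = 0.7 * max(errs)                # assumed marking fraction
    new_mesh = []
    for (a, b), e in zip(mesh, errs):
        if e > thresh:                      # mark & refine: bisect
            m = 0.5 * (a + b)
            new_mesh += [(a, m), (m, b)]
        else:
            new_mesh.append((a, b))
    mesh = new_mesh

smallest = min(b - a for a, b in mesh)      # resolution near the front
```

As in the 2D benchmark, refinement concentrates near the front at x₀, so the smallest elements shrink rapidly while the total element count grows slowly.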
For comparison with a previous (non-AMR) version of our code, we also plot convergence curves for the case of uniform refinement, in which we refine all elements in Step 4 above so that no hanging nodes are created. As expected, the benefit of local refinement is substantial. Moreover, the benefit appears greater for higher-order discretizations. For completeness, we also include the best available result for this problem (according to [32]), obtained with hp-FEM, a method that adapts both the element size and the element order (the exact method used is described in [13]).
Anisotropic Refinement
Since our discretization supports anisotropic refinement, we test the same problem again and enable splitting quadrilaterals along one axis only, depending on the shape of the error. Assuming element K is selected for refinement in Step 4 based on its error e_K, we project and scale the error gradient along each transformed reference-space axis of the element and integrate the anisotropic error indicators

(10)  e_{K,i}² = ∫_K |J_i · ∇(u − u_h)|² dx,  i = 1, 2,

where J_i denotes the i-th column of the element transformation Jacobian. We then define a threshold from the indicators and mark K for refinement along its i-th reference axis if e_{K,i} exceeds it. Figure 14 (right) shows the anisotropically refined mesh after 12 iterations, and the convergence is compared to both isotropic and uniform refinement in Figure 14 (right). Even though the wave front in the solution is not truly aligned with the mesh, many elements could still be refined in one direction only, which saved up to 48% of the DOFs in this problem.
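The per-axis marking rule can be sketched as follows (the fraction beta = 0.7 and the function name are illustrative assumptions, not the paper's exact criterion):

```python
# Sketch: given the per-reference-axis error indicators of an element that
# is already marked for refinement, select the axes to split. An axis is
# split only if its indicator is comparable to the largest one, so nearly
# isotropic errors yield a full split and strongly directional errors a
# single-axis (anisotropic) split.

def anisotropic_marks(axis_indicators, beta=0.7):
    """axis_indicators: e_i per reference axis; returns axes to refine."""
    thresh = beta * max(axis_indicators)
    return [i for i, e in enumerate(axis_indicators) if e > thresh]
```

For example, a quadrilateral whose error varies mostly along the first reference axis is bisected only along that axis, saving the DOFs a full isotropic split would create.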
3D Benchmark
Since we aim for 3D AMR, we present a straightforward generalization of the benchmark problem to 3D. This time, the right-hand side is designed so that the exact solution (shown in Figure 16, left) is

(11)  u(x, y, z) = arctan(α (r − r₀)),  where r = √((x − x_c)² + (y − y_c)² + (z − z_c)²),

with parameters similar to the 2D case. We start with a coarse hexahedral mesh and finite elements of degree p, and execute the same AMR loop as in 2D, for both isotropic and anisotropic refinement.
Figure 16 shows the isotropic and anisotropic meshes after 12 and 14 iterations, respectively. Figure 16 plots the convergence histories of uniform refinement, isotropic AMR and anisotropic AMR. We obtain very similar behavior as in the 2D case. The convergence curves are smooth and do not reveal any irregularities. The 3D version of the problem benefits even more from anisotropic refinement: more than 80% of the DOFs are saved for degree p = 4. Even a slight anisotropy in the solution means that elements with edge-length ratios of 1:2, 1:4, or more can be used.
Curved Meshes
One of our target applications is high-order Lagrangian hydrodynamics, where it is routine to use high-order (curvilinear) meshes. In MFEM, mesh curvature is handled simply by maintaining a vector-valued finite element function that represents the physical position of all nodes and, in turn, of any point within the mesh. The associated finite element space is a vector-valued subspace of H¹; it has its own conforming prolongation matrix to keep the curvature continuous and need not coincide with the solution space. When the mesh is refined, the curvature function is interpolated from the original space into the finer space. We use a version of the anisotropic 3D benchmark problem to test this functionality on a spherical domain, as illustrated in Figure 18.
7.2 Parallel scalability
To demonstrate the parallel scalability of our algorithms, we designed a test that puts as much stress on the AMR infrastructure as possible. We again choose the simple Poisson problem and fix the element order to 2, the lowest order that has both edge and face DOFs and for which the stiffness matrix is still cheap to assemble. We also omit the linear solver and instead only perform nodal interpolation of the exact solution. This exposes the following components of the AMR iteration:

constructing the P matrix,

assembling the parallel AMR system (forming the triple product with hypre),

refining elements and synchronizing ghost layers,

load balancing the whole mesh at the end of the iteration.
For the exact solution, we reuse the 3D "wave front" function from the previous section. To make the problem larger, we sum two such functions with radii 0.2 and 0.4 and a common center, and make the gradient steeper by increasing α. We initialize the mesh to a small number of hexahedra and repeat the following steps, measuring their wall-clock times (averaged over all MPI ranks):

Construct the finite element space for the current mesh (create the P matrix).

Assemble locally the stiffness matrix A and the right-hand side b.

Form the triple product P^T A P and the reduced right-hand side P^T b.

Eliminate Dirichlet boundary conditions from the parallel system.

Project the exact solution onto the finite element space by nodal interpolation.

Integrate the exact error on each element.

Refine the elements with the largest errors. Synchronize ghost layers.

Load balance so that each process has the same number of elements (to within one): redistribute elements and synchronize ghost layers again.
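Steps 3 and 4 of the loop above can be illustrated with dense NumPy stand-ins for the parallel hypre matrices (tiny illustrative sizes; the slave constraint shown is the familiar hanging-node average, not data from the benchmark):

```python
# Sketch: form the variationally restricted system P^T A P, P^T b,
# solve it in the true DOFs, and prolongate back to all local DOFs.

import numpy as np

# 3 local DOFs; DOF 2 is a hanging slave constrained to 0.5*(x0 + x1)
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
A = np.array([[ 2.0, -1.0,  0.0],       # SPD "stiffness" stand-in
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
b = np.array([1.0, 0.0, 1.0])

Ahat = P.T @ A @ P                      # reduced (true-DOF) operator
bhat = P.T @ b
x_true = np.linalg.solve(Ahat, bhat)    # solve in true DOFs only
x = P @ x_true                          # prolong back to all local DOFs
```

By construction, the prolonged solution automatically satisfies the hanging-node constraint, which is exactly why assembly code needs no knowledge of the nonconforming mesh.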
Figure 18 shows 1/8 of the mesh after several iterations of this AMR loop on 16384 MPI tasks. The colors represent MPI rank assignment.
Out of the approximately 100 iterations of the AMR loop, we select 8 iterations that have approximately 0.5, 1, 2, 4, 8, 16, 32 and 64 million elements in the mesh at the beginning of the iteration and plot their times as if they were 8 independent problems. This is possible because the sequence of meshes is always the same, regardless of the number of CPU cores used. We run from 64 to 393216 (384K) cores on LLNL’s Vulcan BG/Q machine. Starting with 65536 cores, there are fewer elements in the initial mesh than parallel tasks, but this is not a problem in our implementation.
Figure 19 shows the total times for the selected iterations. The solid lines show strong scaling, i.e. we follow the same AMR iteration and its total time as the number of cores doubles. The dashed lines skip to a twice larger problem when doubling the number of cores, showing weak scaling, which should ideally be horizontal.
Figure 20 gives a partial breakdown of the total iteration time, showing the individual scaling of its dominant components: P matrix construction, local assembly, triple product, element refinement and load balancing. Local assembly scales perfectly because it involves no communication and the mesh is always perfectly balanced. The remaining steps do communicate, and so scale with varying degrees of success, the worst being refinement and load balancing. Fortunately, these steps take relatively little time compared to the total iteration time, even in this synthetic benchmark containing no physics or solvers. Moreover, the cost of load balancing is well worth the perfect scaling it enables in the communication-free parts.
The last graph in Figure 20 shows the weak scaling of two memory measurements. The first is the maximum RSS (resident set size) obtained for each MPI process with getrusage(). This is the high watermark of the RAM allocated for the process by the operating system. The second measurement is the total memory (in MB) used by the nonconforming mesh data structure only. This is the more difficult of the two tests, as this particular data structure stores the ghost elements and other support data that the rest of the code does not see and that could potentially scale badly. For both measurements, the maximum over all MPI ranks is shown. The four curves in each set correspond to the top four weak scaling curves in Figure 19. The remaining weak scaling curves, as well as the strong scaling curves, are not shown here.
Finally, to test support for more than 2³¹ DOFs, we let the problem run on 128K cores with no limit on the number of iterations, until the available RAM was exhausted (1 GB per process in this case); execution stopped after 131 iterations.
7.3 Dynamic AMR in Lagrangian hydrodynamics code
In this section we show that variational restriction-based AMR can be remarkably unintrusive when it comes to integration into a real finite element application code. For a demonstration, we chose the open-source high-order hydrodynamics solver Laghos [27], which simulates the time-dependent Euler equations of compressible gas dynamics in a Lagrangian frame, using a high-order finite element spatial discretization and explicit high-order time stepping. The full numerical method is described in [17]. Here we only briefly note that the simulation state consists of the material position x, velocity field v and internal energy e, defined over the initial domain. Kinematic quantities (position, velocity) are discretized by high-order continuous elements, and thermodynamic quantities (energy) by discontinuous elements of one order lower. The discretized material position corresponds directly to the mesh curvature function, i.e., the mesh moves and deforms together with the material. The gas density can be calculated at any time from the deformation and is not part of the simulation state.
To obtain the change in velocity at each time step, Laghos solves the linear system

(12)  M Δv = F,

where M is the kinematic mass matrix and F contains forces that depend on the position, the energy and the equation of state. To handle domain decomposition in a parallel computation, the system actually solved is

(13)  (P^T M P) Δv_c = P^T F,  with Δv = P Δv_c,
where the prolongation matrix P assembles contributions from shared degrees of freedom on parallel task boundaries (note that the product P^T M P may never be evaluated explicitly in the actual implementation). Since Laghos already uses variational restriction for domain decomposition, running the simulation on a parallel nonconforming mesh amounts to switching the P matrix to the one constructed in Section 6.5. The only minor issue is the order of elimination of essential boundary conditions, as discussed at the end of Section 3.
Our first experiment is running the 2D multi-material shock triple-point test problem (also described in [17]) on a static nonconforming mesh, meaning the initial mesh is already refined and does not change during the simulation. Based on a previous run on a coarse mesh, we anisotropically refine elements in the three material subdomains of the initial mesh in an attempt to counter the deformations the elements will undergo, thus improving the size of the CFL-limited time step. The initial and target meshes are shown in Figure 21. We observe that some element deformations (diagonal compression) cannot be countered in this way, but we conclude that the code is capable of running on an arbitrarily refined nonconforming mesh practically out of the box.
For the second experiment, we modified the Laghos code to allow for dynamic AMR that changes the mesh during the simulation, both via refinement and coarsening. Changing the mesh entails interpolating the state functions on the new mesh and reassembling both the kinematic and thermodynamic mass matrices (which are normally constant). After each mesh change we also load balance the mesh immediately (see Section 6.3). The state vectors are migrated together with the mesh using a parallel Boolean matrix constructed for each finite element space as part of the load balancing operation.
Since devising a general AMR strategy for Lagrangian shock hydrodynamics is beyond the scope of this paper, for the purpose of this demonstration we employ simple refinement and coarsening rules tailored specifically to the well-known Sedov blast problem [17, 36]. Specifically, we look at the artificial viscosity coefficient q to detect the presence of the shock discontinuity and trigger refinement to a predefined maximum level in the undisturbed region of the mesh (q = 0) in front of the shock. In the post-shock region (nonzero q) we coarsen a group of eight elements whenever their maximum densities all fall below a prescribed threshold. The initial mesh is the unit cube refined 5 times towards the origin, i.e., the corner where all internal energy is deposited at time t = 0. This corner is never coarsened. For better stability, we also enforce 1-irregularity of the mesh at all times.
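These rules can be sketched as two small predicates (function names, arguments and thresholds are illustrative assumptions, not the actual Laghos code):

```python
# Sketch of the demonstration's refine/coarsen criteria for the Sedov
# blast: refine undisturbed elements ahead of the shock, coarsen groups
# of siblings in the post-shock region once their densities drop.

def mark_for_refinement(q, depth, max_depth):
    """Refine an element if it is undisturbed (artificial viscosity
    q == 0) and has not yet reached the maximum refinement depth."""
    return q == 0.0 and depth < max_depth

def mark_for_coarsening(max_densities, rho_threshold):
    """Coarsen a group of eight sibling elements (post-shock, q != 0)
    when all of their maximum densities fall below the threshold."""
    return all(rho < rho_threshold for rho in max_densities)
```

Applied every few time steps, these predicates make the finest refinement level track the expanding shock while the wake behind it is coarsened back.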
With these rules we obtain a dynamic AMR computation where the maximum refinement follows the shock, but the remaining areas are resolved with larger elements (see Figure 22). This AMR-enabled version of Laghos is publicly available in [27] and contains about 550 new lines of code. In contrast, obtaining these types of AMR results in similar codes has required multiple man-years of software development in the past.
8 Conclusions and future work
In this paper we presented a highly scalable approach to unstructured nonconforming adaptive mesh refinement that is easy to integrate into applications. The variational restriction approach cleanly separates the finite element assembly from the influence of the nonconforming space. The conforming interpolation algorithm handles a large class of triangular, quadrilateral, prismatic and hexahedral meshes and can be used to construct arbitrarily high-order finite element discretizations in H¹, H(curl) and H(div), as demonstrated by the examples in this paper and in the freely available implementation in the MFEM library [29].
Uniquely, our approach allows anisotropic refinement and supports arbitrarily irregular meshes, which to the best of our knowledge has not been done previously for continuous Galerkin hexahedral elements. Compared to purely octree-based algorithms, we sacrifice some memory and runtime efficiency for generality (e.g., octrees only admit isotropic refinement). In the future, we plan to extend the method to 4D hypercubes and perform independent space and time adaptation, for which anisotropic refinement is crucial. We see this work as a first step towards that goal.
Reproducibility
References
 [1] M. Ainsworth and B. Senior, Aspects of an adaptive hp-finite element method: Adaptive strategy, conforming approximation and efficient solvers, Computer Methods in Applied Mechanics and Engineering, 150 (1997), pp. 65–87.
 [2] I. Babuska and W. Rheinboldt, Reliable error estimation and mesh adaptation for the finite element method, (1979), pp. 67–108.
 [3] W. Bangerth, C. Burstedde, T. Heister, and M. Kronbichler, Algorithms and data structures for massively parallel generic adaptive finite element codes, ACM Trans. Math. Softw., 38 (2012), pp. 14:1–14:28.
 [4] W. Bangerth, R. Hartmann, and G. Kanschat, deal.II—a generalpurpose objectoriented finite element library, ACM Trans. Math. Softw., 33 (2007).
 [5] R. E. Bank, A. H. Sherman, and A. Weiser, Refinement algorithms and data structures for regular local mesh refinement, Scientific Computing, (1983), pp. 3–17.
 [6] E. G. Boman, U. V. Catalyurek, C. Chevalier, and K. D. Devine, The Zoltan and Isorropia parallel toolkits for combinatorial scientific computing: Partitioning, ordering, and coloring, Scientific Programming, 20 (2012), pp. 129–150.
 [7] A. Brandt, Multilevel adaptive solutions to boundary-value problems, Mathematics of Computation, 31 (1977), pp. 333–390.
 [8] C. Burstedde, L. C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees, SIAM J. Sci. Comput., 33 (2011), pp. 1103–1133.
 [9] P. M. Campbell, K. D. Devine, J. E. Flaherty, L. G. Gervasio, and J. D. Teresco, Dynamic octree load balancing using space-filling curves, Tech. Rep. CS-03-01, Williams College Department of Computer Science, 2003.
 [10] G. F. Carey, A mesh-refinement scheme for finite element computations, Computer Methods in Applied Mechanics and Engineering, 7 (1976), pp. 93–105.
 [11] C. Carstensen and J. Hu, Hanging nodes in the unifying theory of a posteriori finite element error control, J. Comput. Math., 27 (2009), pp. 215–236.
 [12] CEED: Center for Efficient Exascale Discretizations, Exascale Computing Project, DOE. http://ceed.exascaleproject.org.
 [13] J. Červený, Automatic hp-Adaptivity for Time-Dependent Problems, PhD thesis, University of West Bohemia, 2012.
 [14] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, 1978.
 [15] L. Demkowicz, J. Oden, W. Rachowicz, and O. Hardy, Toward a universal h-p adaptive finite element strategy, Part 1. Constrained approximation and data structure, Computer Methods in Applied Mechanics and Engineering, 77 (1989), pp. 79–112.
 [16] V. Dobrev, T. Kolev, C. Lee, V. Tomov, and P. Vassilevski, Algebraic hybridization and static condensation with application to scalable H(div) preconditioning, SIAM Journal on Scientific Computing, to appear (2019).
 [17] V. A. Dobrev, T. V. Kolev, and R. N. Rieben, High-order curvilinear finite element methods for Lagrangian hydrodynamics, SIAM Journal on Scientific Computing, 34 (2012), pp. B606–B641.
 [18] J. Flaherty, R. Loy, M. Shephard, B. Szymanski, J. Teresco, and L. Ziantz, Adaptive local refinement with octree load balancing for the parallel solution of three-dimensional conservation laws, J. Parallel Distrib. Comput., 47 (1997), pp. 139–152.
 [19] M. Griebel and G. Zumbusch, Parallel multigrid in an adaptive PDE solver based on hashing and space-filling curves, 1997.
 [20] V. Heuveline and F. Schieweck, H1interpolation on quadrilateral and hexahedral meshes with hanging nodes, Computing, 80 (2007), pp. 203–220.
 [21] hypre: A library of high performance preconditioners. http://www.llnl.gov/CASC/hypre/.
 [22] T. Isaac, C. Burstedde, L. Wilcox, and O. Ghattas, Recursive algorithms for distributed forests of octrees, SIAM Journal on Scientific Computing, 37 (2015), pp. C497–C531.
 [23] T. Isaac and M. G. Knepley, Support for nonconformal meshes in PETSc’s DMPlex interface, CoRR, abs/1508.02470 (2015).
 [24] G. Karypis and V. Kumar, A parallel algorithm for multilevel graph partitioning and sparse matrix ordering, J. Parallel Distrib. Comput., 48 (1998), pp. 71–95.
 [25] B. S. Kirk, J. W. Peterson, R. H. Stogner, and G. F. Carey, libMesh: A C++ library for parallel adaptive mesh refinement/coarsening simulations, Eng. with Comput., 22 (2006), pp. 237–254.
 [26] M. G. Knepley and D. A. Karpeev, Mesh algorithms for PDE with Sieve I: mesh distribution, Sci. Program., 17 (2009), pp. 215–230.
 [27] Laghos: Highorder Lagrangian hydrodynamics miniapp. http://github.com/CEED/Laghos.
 [28] P. Lindstrom, The minimum edge product linear ordering problem, Tech. Rep. LLNL-TR-496076, Lawrence Livermore National Laboratory, Aug. 2011.
 [29] MFEM: Modular finite element methods. http://mfem.org.
 [30] W. F. Mitchell, Hamiltonian paths through two and threedimensional grids, J. Res. Natl. Inst. Stand. Technol., 110 (2005), pp. 127–36.
 [31] W. F. Mitchell, A refinement-tree based partitioning method for dynamic load balancing with adaptively refined grids, J. Parallel Distrib. Comput., 67 (2007), pp. 417–429.
 [32] W. F. Mitchell and M. A. McClain, A Survey of hp-Adaptive Strategies for Elliptic Partial Differential Equations, Springer Netherlands, Dordrecht, 2011, pp. 227–258.
 [33] B. Reps and T. Weinzierl, Complex additive geometric multilevel solvers for Helmholtz equations on spacetrees, ACM Trans. Math. Softw., 44 (2017), pp. 2:1–2:36.
 [34] R. Schneiders, Refining quadrilateral and hexahedral element meshes, Fifth International Meshing Roundtable, 1 (1998).
 [35] T. Schönfeld, Adaptive mesh refinement methods for three-dimensional inviscid flow problems, International Journal of Computational Fluid Dynamics, 4 (1995), pp. 363–391.
 [36] L. I. Sedov, Similarity and Dimensional Methods in Mechanics, CRC Press, 10th ed., 1993.
 [37] F. T. Suttmeier, On concepts of PDE-software: The cell-wise oriented approach in DEAL, International Mathematical Forum, 2 (2007), pp. 1–20.
 [38] T. Tu, D. O’Hallaron, and O. Ghattas, Scalable parallel octree meshing for terascale applications, in Proc. of the 2005 ACM/IEEE Conf. on Supercomputing, 2005, pp. 4–.
 [39] P. Šolín, Partial Differential Equations and the Finite Element Method, John Wiley & Sons, 2005.
 [40] P. Šolín, J. Červený, and I. Doležel, Arbitrary-level hanging nodes and automatic adaptivity in the hp-FEM, Math. Comput. Simul., 77 (2008), pp. 117–132.
 [41] T. Weinzierl and M. Mehl, Peano–a traversal and storage scheme for octreelike adaptive cartesian multiscale grids, SIAM Journal on Scientific Computing, 33 (2011), pp. 2732–2760.
 [42] E. Wilson, The static condensation algorithm, Int. J. Numer. Methods Eng., 8 (1974), pp. 199–203.