1 Motivation
Computer hardware progress towards high Flop-to-Byte ratios renders data movement the deciding factor for efficient PDE solvers, especially for multigrid algorithms, which are the main algorithmic component of solvers for elliptic operators. A consequence of this development is a stronger coupling of the numerical linear algebra part and the computer science part in solver development, since black-box matrix-based solvers that are severely memory-bound are no longer competitive. Therefore, optimal multigrid solvers cannot be designed with a view on convergence rates and iteration counts only, but need to take into account the implementation technique from the very beginning. With the goal of optimizing time-to-solution in mind, we believe that addressing these interdisciplinary challenges deserves special attention.
1.1 Matrix-free implementations and recent trends in computer hardware
It is well known from the spectral element community that computationally efficient implementations of high-order finite element discretizations on tensor-product elements rely on matrix-free algorithms using the sum-factorization technique, see for example Orszag1980 ; Kopriva2009 ; Deville2002 ; Karniadakis2013 . In the emerging field of high-order discontinuous Galerkin discretizations, recent implementations targeting current cache-based multicore architectures with high Flop-to-Byte ratios due to wide SIMD (single-instruction-multiple-data) units have been developed in Kronbichler2012 ; Kronbichler2019fast ; Muething2017arxiv . These matrix-free evaluation routines using sum-factorization have a complexity of $\mathcal{O}(k^{d+1})$ operations and $\mathcal{O}(k^d)$ data transfer from memory per element for polynomial degree $k$ in $d$ space dimensions. Traditional implementation strategies are based on the assembly of sparse matrices in the finite element discretization stage of the numerical algorithm, which are then handed over to the linear algebra part of the solver and can be applied in a black-box fashion to the discrete problem. However, due to the increased complexity in terms of arithmetic operations and data transfer from main memory, with complexity $\mathcal{O}(k^{2d})$ per element for the matrix-vector product (and worse complexity for the assembly part), it is clear that these sparse matrix methods cannot be competitive for high polynomial degrees. Initial studies Vos2010 ; Cantwell2011 identified a break-even point in the polynomial degree beyond which a matrix-free approach with sum-factorization is faster. However, on modern computer hardware characterized by Flop-to-Byte ratios significantly larger than one, matrix-free algorithms with sum-factorization on hexahedral elements outperform sparse matrices already at low polynomial degrees, see Kronbichler2012 ; Kronbichler2019 .
This is also shown in Figure 1, which compares the throughput measured in unknowns processed per second for the evaluation of the scalar Laplace operator in three space dimensions, suggesting a break-even point at a low polynomial degree. Due to the paradigm shift in computer hardware, matrix-free algorithms nowadays tend to become memory-bound on recent hardware Kronbichler2019fast if implemented with a minimum of floating point operations and if vectorization is used. As a consequence of this trend, even solution techniques such as the hybridizable discontinuous Galerkin (HDG) method, see for example Kirby2012 ; Yakovlev2016 , which reduces the global problem to the degrees of freedom on the mesh skeleton by elimination of inner degrees of freedom, cannot keep up with fast matrix-free operator evaluation for quadrilateral/hexahedral elements on current computer hardware, as investigated in detail in the recent article Kronbichler2018 . In terms of Figure 1, the gap would open at a slower pace for the HDG case, but would still exceed one order of magnitude.
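To illustrate why sum-factorization pays off, the following self-contained NumPy sketch (all names are hypothetical and not taken from the implementations cited above) applies a 1D interpolation matrix along each tensor direction of a 3D element and verifies the result against the assembled Kronecker-product matrix:

```python
import numpy as np

def interpolate_sumfac(S, u):
    """Sum-factorized tensor-product interpolation in 3D.

    S: 1D interpolation matrix of shape (q, n) (nodes -> quad points).
    u: element values of shape (n, n, n).
    Applies S along each tensor direction, costing O(q * n^3) per
    direction instead of O(q^3 * n^3) for the assembled Kronecker
    matrix S (x) S (x) S.
    """
    for axis in range(3):
        u = np.moveaxis(np.tensordot(S, u, axes=([1], [axis])), 0, axis)
    return u

n, q = 5, 6                      # nodes / quadrature points per direction
rng = np.random.default_rng(0)
S = rng.random((q, n))
u = rng.random((n, n, n))

full = np.kron(np.kron(S, S), S) @ u.ravel()   # assembled reference
fast = interpolate_sumfac(S, u).ravel()        # sum-factorized version
assert np.allclose(full, fast)
```

The same pattern, applied to interpolation, differentiation, and quadrature matrices, underlies the evaluation routines discussed above.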
1.2 Multigrid for high-order discretizations: State of the art
When it comes to the solution of (non)linear systems of equations by means of iterative solution techniques, the evaluation of nonlinear residuals as well as the application of linear(ized) operators in Krylov solvers are readily available in a matrix-free implementation environment Brown2010 . More importantly, however, preconditioners should also rely on matrix-free algorithms with optimal complexity, since the whole solver will otherwise run into similar bottlenecks with overwhelming memory transfer. Optimal complexity of all solver components is crucial in order to render high-order discretization methods more efficient in under-resolved application scenarios Fehn2018b . For elliptic PDEs, multigrid methods Trottenberg2001 are among the most efficient solution techniques Gholami2016 , especially because of their applicability to problems on complex geometries. In the context of high-order finite element discretizations, multigrid methods can be categorized into $h$-multigrid, $p$-multigrid, and algebraic multigrid (AMG) techniques.
Geometric or $h$-multigrid methods rely on a hierarchy of meshes. Robust $h$-multigrid techniques for high-order DG discretizations have been developed and analyzed in Gopalakrishnan2003 ; Hemker2003 ; Brenner2005 ; Brenner2009 for uniformly refined meshes and in Kanschat2004 ; Kanschat2008 for adaptively refined meshes. Recent improvements of these algorithms towards high-performance, matrix-free implementations have been developed in Kronbichler2018 , where a performance comparison of high-order continuous and discontinuous Galerkin discretizations as well as hybridizable discontinuous Galerkin methods is carried out. A GPU variant for continuous finite elements has been proposed in Kronbichler2019 . The parallel efficiency for adaptively refined meshes is discussed in Clevenger2019 . Large-scale applications of these multigrid methods can be found in the field of computational fluid dynamics (CFD) and the incompressible Navier–Stokes equations Fehn2018b ; Krank2017 .
For spectral element discretizations, $p$-multigrid, or synonymously spectral element multigrid, is frequently used, where the multigrid hierarchy is obtained by reducing the polynomial degree of the shape functions. Coarsening and multigrid transfer is particularly simple since the function spaces of different multigrid levels are nested also for element geometries deformed via high-order mappings, and all operations are element-local. This approach was first proposed and theoretically analyzed in Ronquist1987 ; Maday1988 , and later investigated, for example, in Helenbrook2003 ; Helenbrook2008 ; Mascarenhas2010 ; Lottes2005 ; Stiller2017 ; Stiller2016 ; Stiller2017b ; Huismann2019 . A related hierarchical scale separation solver is proposed in Aizinger2015 . Polynomial multigrid techniques are also frequently used to solve the compressible Euler equations Rasetarinera2001 ; Bassi2003 ; Fidkowski2004 ; Nastase2006JCP ; Luo2006 ; Hillewaert2006 ; Mascarenhas2009 ; Bassi2009 ; Helenbrook2016 and compressible Navier–Stokes equations Fidkowski2005 ; Persson2008 ; Shahbazi2009 ; Diosady2009 ; Bassi2011 ; Luo2012 ; Ghidoni2014 .
Algebraic multigrid techniques extract all information from the assembled system matrix to generate coarser levels and are attractive as they can be applied in a black-box fashion. AMG is considered in Heys2005 for high-order continuous Galerkin discretizations and in Lasser2001 ; Prill2009 ; Olson2011 for (high-order) discontinuous Galerkin discretizations. AMG applied directly to high-order DG discretizations faces several challenges, among them the construction of robust smoothers for matrices that lose diagonal dominance, but most importantly the computational complexity of matrix-based algorithms (especially for three-dimensional problems) compared to their matrix-free counterparts, see Figure 1. These limitations are also exemplified by the fact that AMG methods for high-order discretizations have only been applied to small two-dimensional problems in the works mentioned above. For reasons of computational complexity (see Section 1.1), it appears to be inevitable to combine algebraic multigrid techniques with some form of geometric coarsening to achieve a computationally efficient approach for practical applications Prill2009 ; Bastian2012 ; Siefert2014 .
Multigrid transfer operators are typically negligible in terms of computational costs when implemented in a matrix-free way with optimal complexity Kronbichler2019 ; Bastian2019 . Therefore, multigrid smoothers and coarse-grid solvers remain as the main performance-relevant multigrid components. Adhering to the matrix-free paradigm poses two major challenges that need to be addressed and further improved:

Matrix-free smoothers: To fully exploit the advantages of matrix-free algorithms with sum-factorization for operator evaluation, multigrid smoothers should make use of these algorithms as well, but this problem has so far only been solved for certain types of smoothers (and mainly for elliptic problems). Polynomial smoothers based on the Chebyshev iteration Adams2003 ; Sundar2015 or explicit Runge–Kutta methods Helenbrook2003 ; Luo2006 ; Hillewaert2006 ; Luo2012 ; RuedaRamirez2019 can be immediately realized in a matrix-free way, are inherently parallel, and are therefore widely used in a high-order context. More challenging are smoothers based on element-wise inversion of operators such as block Jacobi, block Gauss–Seidel, block ILU, or (overlapping) Schwarz smoothers. Many implementations rely on matrix-based algorithms for smoothers Rasetarinera2001 ; Fidkowski2004 ; Fidkowski2005 ; Nastase2006JCP ; Bassi2009 ; Persson2008 ; Shahbazi2009 ; Diosady2009 ; Mascarenhas2009 ; Bassi2011 ; Ghidoni2014 ; Helenbrook2016 , limiting applicability mainly to two-dimensional problems, while three-dimensional problems become prohibitively expensive for higher polynomial degrees due to the unfavorable complexity in the polynomial degree of assembly, factorization, and matrix-vector products for the element-wise dense blocks. On Cartesian meshes and tensor product elements, element-wise inversion of operators with optimal complexity is possible via the fast diagonalization method Lynch1964 , which was first applied in Couzy1994 ; Couzy1995 in the context of spectral element discretizations and analyzed in more detail in Fischer2000 ; Lottes2005 ; Fischer2005 ; Stiller2017 ; Stiller2016 ; Stiller2017b in the context of overlapping Schwarz preconditioners and multigrid smoothers.
Other tensor-product based preconditioners for high-order DG discretizations that exploit fast matrix-free operator evaluation with sum-factorization and that are applicable to more complex operators (convection–diffusion, Navier–Stokes) and non-Cartesian meshes have been proposed recently in Bastian2019 ; Pazner2018 , suggesting that close-to-optimal complexity can be achieved in a general setting.
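To make the fast diagonalization idea concrete, the following NumPy sketch (hypothetical names; a simplified separable 2D model operator, not the cited implementations) inverts an operator of the form $A \otimes M + M \otimes A$ using a generalized eigendecomposition of the 1D matrices:

```python
import numpy as np

def fast_diag_solve(A, M, b):
    """Solve (A (x) M + M (x) A) x = b with O(n^3) work.

    A, M: symmetric 1D matrices (n x n), both s.p.d.
    Uses the generalized eigendecomposition A Z = M Z diag(lam) with
    Z^T M Z = I, in the spirit of Lynch1964 fast diagonalization.
    """
    n = A.shape[0]
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    lam, V = np.linalg.eigh(Linv @ A @ Linv.T)
    Z = Linv.T @ V                             # M-orthonormal eigenvectors
    B = b.reshape(n, n)
    Bt = Z.T @ B @ Z                           # transform rhs: (Z^T (x) Z^T) b
    Xt = Bt / (lam[:, None] + lam[None, :])    # invert Lam (x) I + I (x) Lam
    return (Z @ Xt @ Z.T).ravel()              # back-transform: (Z (x) Z)

n = 4
rng = np.random.default_rng(1)
A = rng.random((n, n)); A = A @ A.T + n * np.eye(n)   # s.p.d. stand-ins
M = rng.random((n, n)); M = M @ M.T + n * np.eye(n)
b = rng.random(n * n)

K = np.kron(A, M) + np.kron(M, A)              # assembled reference operator
assert np.allclose(K @ fast_diag_solve(A, M, b), b)
```

The eigendecomposition is computed once per 1D matrix in a setup phase; each subsequent solve costs only a few small dense matrix products.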

Hybrid multigrid algorithms: Due to the improved efficiency of matrix-free evaluation routines and better parallel scalability Sundar2012 as compared to sparse matrix-vector products, the generation of coarser multigrid levels should rely on non-algebraic coarsening, i.e., mesh (or geometric) coarsening and reducing the polynomial degree of the function space for higher order methods, leading to the idea of hybrid $h$- and $p$-multigrid methods as illustrated in Figure 2. It is beneficial to stay as long as possible on the matrix-free curve in Figure 1 and to exploit all possibilities of geometric and polynomial coarsening in the multigrid hierarchy. For complex engineering applications, the number of geometric mesh levels is typically low and coarse meshes might consist of millions of elements, so that simple iterative solution techniques like a conjugate gradient iteration with Jacobi preconditioner used as coarse-grid solver would become too expensive and dominate the overall computational costs of the multigrid solver. Applying algebraic multigrid techniques to the coarse-grid problem discretized with linear finite elements is a good compromise between effectiveness of coarse-grid preconditioning and computational efficiency, since in this regime sparse matrix-vector products are competitive with matrix-free evaluation routines, see Figure 1. At the same time, smoothers for algebraic multigrid methods work best at low polynomial degrees due to the better diagonal dominance of the matrix as opposed to high-order shape functions Kronbichler2018 . In Table 1, we summarize important contributions in the field of hybrid multigrid solvers. In Helenbrook2003 ; Helenbrook2001 , hybrid multigrid solvers combining $p$-MG with $h$-MG have been presented for high-order discretizations. In Nastase2006JCP ; Shahbazi2009 , $p$-multigrid is used along with algebraic multigrid for the coarse problem.
In terms of $p$-multigrid, the works Dobrev2006 ; Bastian2012 ; Siefert2014 can be categorized as two-level algorithms with transfer from the high-order fine level to a coarse level of continuous linear shape functions, which is solved by an $h$-multigrid approach in Dobrev2006 and an algebraic multigrid approach in Bastian2012 ; Siefert2014 . These works have the limitation that the underlying implementation is not matrix-free and, therefore, suffers from the complexity of matrix-based approaches. The main limitation of the approach in Sundar2012 for hexahedral meshes based on the octree concept is that it only supports linear continuous finite elements (a similar approach for tetrahedral elements is presented in Lu2014 ). An extension towards $p$-multigrid has been made in Rudi2015 , but results are limited to linear and quadratic shape functions. The approach in OMalley2017 combines $h$-multigrid with AMG but uses expensive matrix-based implementations, which could explain why results are only shown for quadratic elements. In more recent works, hybrid multigrid algorithms for high-order methods with completely matrix-free implementation are discussed in Bastian2019 , extending a previous work Bastian2012 towards a matrix-free implementation developed in Muething2017arxiv ; Kempf2018arxiv and available in the EXADUNE project Exadune2014 . That work does not exploit geometric coarsening ($h$-multigrid) and the high-order discretization with degree $k$ is immediately reduced to a linear space within the multigrid algorithm (and is therefore categorized as a two-level algorithm rather than $p$-multigrid). Algebraic multigrid is employed for the coarse problem. An elaborate matrix-free implementation in the context of $h$-multigrid solvers is presented in Kronbichler2018 , based on the matrix-free implementation developed in Kronbichler2012 ; Kronbichler2019fast and available in the deal.II finite element library dealII90 .
That work clearly improves on the performance numbers of sophisticated geometric multigrid solvers shown in Gholami2016 . One drawback of this pure $h$-multigrid approach is that its applicability is limited to problems where the coarse grid is comparably simple. Hybrid multigrid techniques in the context of HDG discretizations are considered, e.g., in Fabien2019 .
| Study | high-order | matrix-free | $h$-MG | $p$-MG | AMG |
|---|---|---|---|---|---|
| Helenbrook et al. Helenbrook2003 | yes | – | yes | yes | – |
| Nastase et al. Nastase2006JCP , Shahbazi et al. Shahbazi2009 | yes | – | – | yes | yes |
| Dobrev et al. Dobrev2006 | yes | – | yes | (yes) | – |
| Bastian et al. Bastian2012 , Siefert et al. Siefert2014 | yes | – | – | (yes) | yes |
| Sundar et al. Sundar2012 | – | yes | yes | – | – |
| Lu et al. Lu2014 | – | yes | yes | – | – |
| Rudi et al. Rudi2015 | – | yes | yes | (yes) | – |
| O’Malley et al. OMalley2017 | – | – | yes | – | yes |
| Bastian et al. Bastian2019 | yes | yes | – | (yes) | yes |
| Kronbichler and Wall Kronbichler2018 | yes | yes | yes | – | – |
| present work | yes | yes | yes | yes | yes |
Other techniques exist that aim to overcome the unfavorable complexity of matrix-based methods for high polynomial degrees. Applying preconditioners and multigrid methods to a low-order rediscretization of the operator on a mesh with vertices located on the nodes of the high-order discretization is a well-known technique originating from Deville1985 ; Deville1990 and has been analyzed, for example, in Fischer1997 ; Lottes2005 ; Heys2005 ; Brown2010 ; Sundar2015 ; Pazner2019arxiv . Such an approach is not considered here.
1.3 Contributions of the present work
The present work extends our previous work in Kronbichler2018 towards hybrid multigrid techniques combining geometric ($h$), polynomial ($p$), and algebraic coarsening. Our goal is to devise a multigrid solver applicable to engineering problems with complex geometry characterized by coarse grids with many elements. As can be seen from Table 1, the individual components relevant for efficient hybrid multigrid methods are covered by different works. However, none of these works fulfills all properties, and it is the aim of the present work to fill this gap.
As a model problem, the constant-coefficient Poisson equation in three space dimensions is studied in this work. With respect to the choice of multigrid smoothers, this study makes use of Chebyshev-accelerated Jacobi smoothers, which have the characteristic that convergence rates are independent of the mesh size $h$ and only mildly dependent on the polynomial degree $k$, see Kronbichler2018 ; Sundar2015 . Chebyshev smoothing is particularly attractive since it only requires the application of the matrix-vector product and the inverse diagonal of the system matrix, i.e., the smoother benefits from fast matrix-free evaluation routines and is efficient in a parallel setting. Although more aggressive smoothers based on overlapping Schwarz methods resulting in lower iteration counts exist, it should be noted that Chebyshev smoothing is nonetheless highly efficient, and comparative studies would need to be carried out to answer which approach is more efficient, see the initial investigation in Kronbichler2019arxiv . These aspects are, however, beyond the scope of the present study.
In case of discontinuous Galerkin discretizations, a transfer from discontinuous to continuous function spaces (denoted as DG-to-FE transfer) should be considered in addition to $h$- and $p$-coarsening in order to further reduce the size of the coarse-grid problem. For example, the problem size is reduced by a factor of $2^d = 8$ for linear elements in $d = 3$ dimensions. Moreover, this approach is also beneficial in order to reduce iteration counts for the coarse-grid problem, due to the fact that existing AMG implementations and smoothers are often most effective on continuous function spaces. However, it is unclear whether to perform the DG-to-FE transfer on the high-order polynomial space or for the coarse problem at $k = 1$, or likewise on the finest mesh or the coarsest mesh. It is a main finding of the present work that a DG-to-FE transfer at the fine level is beneficial, both in terms of iteration counts and overall computational costs. Furthermore, we demonstrate that this approach results in a multigrid algorithm whose convergence rates are independent of the interior penalty factor. This leads to multigrid coarsening strategies denoted as $ch$- or $cp$-multigrid, with a transfer to continuous ($c$) function spaces on the finest level followed by geometric ($h$) and polynomial ($p$) coarsening before the coarse-grid solver (e.g., AMG) is invoked.
In summary, the present work discusses the relevant design choices in the context of hybrid multigrid algorithms, i.e., combined geometric–polynomial–algebraic multigrid techniques, with an emphasis on computational costs as the driving force for algorithmic selections. The performance of these methods is detailed using a state-of-the-art matrix-free implementation, considering a series of increasingly complex problems.
1.4 Outline
The model problem studied in this work and the discontinuous Galerkin discretization are introduced in Section 2. Section 3 discusses the hybrid multigrid algorithm, including the main multigrid components such as smoothers, coarsening strategies and transfer operators, as well as the coarse-level solver. The matrix-free implementation, which is the key to an efficient hybrid multigrid solver, is summarized in Section 4. Numerical results are shown in Section 5, and we conclude in Section 6 with a summary of our results and an outlook on future work.
2 High-order discontinuous Galerkin discretization of the Poisson equation
As a model problem, we consider the Poisson equation discretized by discontinuous Galerkin methods with a focus on high-order polynomial spaces. Let us briefly motivate the use of discontinuous Galerkin discretizations for the Poisson problem. While continuous finite element discretizations might be regarded as a suitable discretization scheme as well due to a reduced number of unknowns, DG discretizations can have an advantage over continuous discretizations for nonsmooth problems in terms of accuracy versus computational costs due to a better approximation in proximity to a singularity Kronbichler2018 . Furthermore, DG discretizations of the Poisson equation arise naturally from certain model problems such as the incompressible Navier–Stokes equations discretized with discontinuous Galerkin methods. For this type of problem, efficient multigrid methods for Poisson problems are a key ingredient determining overall efficiency. Large-scale applications in the context of incompressible turbulent flows can be found in Krank2017 ; Fehn2018b , for earth mantle convection problems (with variable coefficients) in Rudi2015 , or for porous media flow in Bastian2014 . The constant-coefficient Poisson equation reads $-\Delta u = f$ in $\Omega$.
On the domain boundary $\partial\Omega$, Dirichlet boundary conditions, $u = g_D$ on $\Gamma_D$, and Neumann boundary conditions, $\nabla u \cdot \boldsymbol{n} = g_N$ on $\Gamma_N$, are prescribed, with $\Gamma_D \cup \Gamma_N = \partial\Omega$ and $\Gamma_D \cap \Gamma_N = \emptyset$.
We consider meshes composed of hexahedral elements $\Omega_e$, $e = 1, \ldots, N_{\text{el}}$, that may be arbitrarily deformed via a high-order polynomial mapping $\boldsymbol{x}_e$ from the reference element $\hat{\Omega}$ with coordinates $\hat{\boldsymbol{\xi}}$ to the physical element with coordinates $\boldsymbol{x}$. The space of test and trial functions is given as
$V_h = \left\{ u_h \in L^2(\Omega) \,:\, u_h|_{\Omega_e} \circ \boldsymbol{x}_e \in \mathcal{Q}_k(\hat{\Omega}) \;\; \forall e = 1, \ldots, N_{\text{el}} \right\}$ (1)
Here, $\mathcal{Q}_k(\hat{\Omega})$ denotes the space of polynomials of tensor degree $\leq k$ defined on the reference element $\hat{\Omega}$, i.e.,
$\mathcal{Q}_k(\hat{\Omega}) = \operatorname{span} \left\{ \prod_{i=1}^{d} \hat{\xi}_i^{\alpha_i} \,:\, 0 \leq \alpha_i \leq k,\ i = 1, \ldots, d \right\}$ (2)
The multi-dimensional shape functions are formed by a tensor product of one-dimensional shape functions. Lagrange polynomials of degree $k$ based on the Legendre–Gauss–Lobatto nodes are a common choice for a nodal basis due to their conditioning and are therefore considered in the present work. As usual, volume and face integrals in the weak formulation are computed by means of Gaussian quadrature with $k+1$ points per coordinate direction, ensuring exact integration on affine geometries. Note that the tensor product structure of both the shape functions and the quadrature rule is important in order to use fast matrix-free evaluation routines exploiting sum-factorization.
The weak formulation of the Poisson problem written in primal formulation reads: Find $u_h \in V_h$ such that

$\sum_{e=1}^{N_{\text{el}}} a_h^{(e)}(u_h, v_h) = \sum_{e=1}^{N_{\text{el}}} (v_h, f)_{\Omega_e} \quad \forall v_h \in V_h \,,$ (3)

where

$a_h^{(e)}(u_h, v_h) = (\nabla v_h, \nabla u_h)_{\Omega_e} - (\nabla v_h \cdot \boldsymbol{n}, u_h - \widehat{u}_h)_{\partial\Omega_e} - (v_h, \widehat{\boldsymbol{\sigma}}_h \cdot \boldsymbol{n})_{\partial\Omega_e} \,.$ (4)
We use the notation $(\cdot, \cdot)_{\Omega_e}$ and $(\cdot, \cdot)_{\partial\Omega_e}$ for volume and face integrals, respectively. As an important representative of the class of DG discretization methods, we consider the symmetric interior penalty method Arnold1982 ; Arnold2002 with numerical fluxes
$\widehat{u}_h = \{\!\{u_h\}\!\} \,,$ (5)

$\widehat{\boldsymbol{\sigma}}_h = \{\!\{\nabla u_h\}\!\} - \tau \, [\![u_h]\!] \,,$ (6)

where $\{\!\{v\}\!\} = (v^- + v^+)/2$ is the average operator and $[\![v]\!] = v^- \boldsymbol{n}^- + v^+ \boldsymbol{n}^+$ the jump operator, and the superscripts $-$, $+$ denote the two neighboring elements $\Omega_{e^-}$, $\Omega_{e^+}$. Boundary conditions are imposed weakly via the mirror principle Hesthaven2007 , setting
$u_h^+ = -u_h^- + 2 g_D \,, \quad \nabla u_h^+ \cdot \boldsymbol{n} = \nabla u_h^- \cdot \boldsymbol{n} \quad$ on $\Gamma_D \,; \qquad u_h^+ = u_h^- \,, \quad \nabla u_h^+ \cdot \boldsymbol{n} = -\nabla u_h^- \cdot \boldsymbol{n} + 2 g_N \quad$ on $\Gamma_N \,.$ (7)
Inserting equation (7) into equation (4), the weak form can be split into homogeneous and inhomogeneous contributions. The definition of the penalty parameter according to Hillewaert2013 for quadrilateral/hexahedral elements is used,
$\tau_e = (k+1)^2 \, \frac{A(\partial\Omega_e)/2}{V(\Omega_e)} \,,$ (8)
with the element volume $V(\Omega_e)$ and the surface area $A(\partial\Omega_e)$. The penalty parameter in equation (6) is given as $\tau = \max(\tau_{e^-}, \tau_{e^+})$ on interior faces, and $\tau = \tau_e$ on boundary faces.
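For illustration, a minimal sketch of this penalty computation for a uniform Cartesian element of edge length $h$ could look as follows (function names are hypothetical; the surface-to-volume evaluation assumes an interior hexahedron):

```python
def penalty_factor(k, h):
    """Penalty factor tau_e = (k+1)^2 * (A/2) / V for an interior
    hexahedral element of edge length h: surface area A = 6 h^2,
    volume V = h^3 (a uniform Cartesian sketch of the definition)."""
    A, V = 6.0 * h ** 2, h ** 3
    return (k + 1) ** 2 * (A / 2.0) / V

def face_penalty(k, h_minus, h_plus):
    """On an interior face, take the maximum of the two adjacent factors."""
    return max(penalty_factor(k, h_minus), penalty_factor(k, h_plus))

# the penalty scales like (k+1)^2 / h, as expected for SIP stabilization
assert penalty_factor(3, 0.5) == 96.0
assert face_penalty(3, 0.5, 1.0) == 96.0
```

On deformed elements, the surface area and volume in equation (8) are computed from the actual element geometry rather than from $h$.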
For the multigrid algorithm detailed below, coarser discretizations of the Laplace operator are required, which is realized by evaluating the operator for modified discretization parameters $h$ and $k$ (including the interior penalty parameter), i.e., on a coarser mesh or for a lower polynomial degree. In the literature, this approach is sometimes denoted as rediscretization, as opposed to a Galerkin product. Further, a transfer from discontinuous to continuous function spaces is considered in our hybrid multigrid algorithm. The conforming finite element (FE) space is given as
$V_h^{\text{FE}} = V_h \cap H^1(\Omega) \,,$ (9)
and the weak form of the (negative) Laplace operator simplifies to

$a_h(u_h, v_h) = (\nabla v_h, \nabla u_h)_{\Omega} \,.$ (10)
Dirichlet boundary conditions are imposed strongly for the FE discretization, but only the homogeneous operator is required inside the multigrid algorithm. When assembling the coarse-level matrix for the AMG coarse solver, constrained degrees of freedom are kept in the system with diagonal entries set to 1 to ensure that the matrix is invertible.
In matrix-vector notation, the discrete problem can be written as the linear system of equations

$\boldsymbol{A} \boldsymbol{u} = \boldsymbol{b} \,,$ (11)

where $\boldsymbol{u}, \boldsymbol{b} \in \mathbb{R}^{N}$, with the problem size (number of unknowns or degrees of freedom) denoted by $N$. Contributions from inhomogeneous boundary conditions are included in the right-hand side vector $\boldsymbol{b}$, and the matrix $\boldsymbol{A}$ only accounts for the homogeneous part. The matrix is not assembled explicitly (apart from the algebraic multigrid coarse solver), since iterative solvers and multigrid smoothers only require the action of $\boldsymbol{A}$ applied to a vector (a notation like $\boldsymbol{A}(\boldsymbol{u})$ rather than $\boldsymbol{A}\boldsymbol{u}$ would be more consistent in the matrix-free context, but we adhere to the matrix-vector notation as this is the common notation in linear algebra), which is realized by means of fast matrix-free evaluation (Section 4).
3 Hybrid multigrid solver
The basic multigrid idea is to tackle oscillatory errors by smoothing and to tackle smooth errors by coarse-grid corrections, which applies to all types of multigrid coarsening (geometric, polynomial, algebraic) discussed here. We use multigrid as a preconditioner inside a Krylov solver instead of using multigrid as a solver. This approach is sometimes also denoted as a Krylov-accelerated multigrid method. The combination of a multigrid cycle with a Krylov method can be expected to be more robust and to result in lower iteration counts in general as opposed to pure multigrid solvers, see Lottes2005 ; Stiller2017 ; Stiller2016 ; Stiller2017b ; Shahbazi2009 ; Sundar2015 , especially for anisotropic problems. Since it appears that this is the most frequent use case in practice, we follow this strategy in the present work. Some performance numbers could be improved by alternative multigrid flavors, e.g., by considering full multigrid cycles Kronbichler2019 . The performance considerations and convergence rates in this work also apply to full multigrid, where only a single or only few iterations on the finest level are needed. Due to the symmetric positive definite nature of the model problem considered here, we use the conjugate gradient (CG) algorithm Hestenes1952 ; Saad2003 as Krylov solver, which is detailed in Algorithm 1. The algorithmic components which are of main interest are the application of the discretized operator in line 8 and the application of the preconditioner in line 13. Other components are vector update operations and inner products (involving global communication), but these parts of the algorithm are overall of subordinate importance since the computational costs are mainly determined by operator evaluation and the multigrid cycle called in the preconditioning step.
However, it should be pointed out that the costs of all parts of the algorithm are explicitly taken into account by the experimental cost measures used in the present work, in the spirit of parallel textbook multigrid efficiency Gmeiner2015b , see Section 5.1.
In the preconditioning step of the conjugate gradient solver, the operator is inverted approximately by performing one multigrid V-cycle according to Algorithm 2 with zero initial solution on the finest level. Pre- and post-smoothing are done in lines 6 and 11, respectively, and the residual evaluation in line 7. The same number of smoothing steps is used for both pre- and post-smoothing and on all multigrid levels. These steps typically form the most expensive part of the multigrid algorithm as long as the workload in degrees of freedom per core is sufficiently large, i.e., away from the strong-scaling limit where latency effects become dominant. The coarse-level correction is called in line 9, recursively calling the multigrid V-cycle for the next coarser level until the coarsest level is reached, on which the coarse-level solver is called (line 3). Restriction and prolongation are done in lines 8 and 10, respectively.
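The V-cycle recursion described above can be sketched as follows (a minimal NumPy illustration with hypothetical data structures, not the actual implementation; the demo uses a two-level 1D Poisson problem):

```python
import numpy as np

def v_cycle(level, x, b, levels):
    """One multigrid V-cycle; 'levels' is a hypothetical list of dicts
    holding the operator, smoother, transfer operators, and the
    coarse solver (levels[0] is the coarsest level)."""
    lv = levels[level]
    if level == 0:
        return lv["coarse_solve"](b)         # coarsest level: direct solve
    x = lv["smooth"](x, b)                   # pre-smoothing
    r = b - lv["A"] @ x                      # residual evaluation
    r_c = lv["R"] @ r                        # restriction R = P^T
    e_c = v_cycle(level - 1, np.zeros_like(r_c), r_c, levels)
    x = x + lv["P"] @ e_c                    # coarse-grid correction
    return lv["smooth"](x, b)                # post-smoothing

# two-level demo: 1D Poisson on 7 interior points, linear interpolation
n = 7
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
P = np.zeros((n, 3))
for j in range(3):                           # standard linear prolongation
    P[2 * j + 1, j] = 1.0
    P[2 * j, j] += 0.5
    P[2 * j + 2, j] += 0.5
A_c = P.T @ A @ P                            # Galerkin coarse operator

def jacobi(x, b, steps=2, omega=0.5):
    for _ in range(steps):
        x = x + omega * (b - A @ x) / 2.0    # diag(A) = 2
    return x

levels = [
    {"coarse_solve": lambda rhs: np.linalg.solve(A_c, rhs)},
    {"A": A, "smooth": jacobi, "R": P.T, "P": P},
]

b = np.ones(n)
x = np.zeros(n)
for _ in range(30):
    x = v_cycle(1, x, b, levels)
assert np.linalg.norm(b - A @ x) < 1e-8
```

In the hybrid setting discussed here, each level of such a hierarchy may differ in mesh size, polynomial degree, or continuity of the function space, and the coarsest solve is replaced by AMG.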
3.1 Chebyshev-accelerated Jacobi smoother
In the context of matrix-free methods analyzed here, an attractive multigrid smoother is a Chebyshev-accelerated Jacobi smoother Adams2003 , which requires the diagonal of the operator as well as the application of the matrix-vector product. Therefore, any fast implementation for the evaluation of the discretized operator can be applied inside the smoother, and parallel scalability is directly inherited from the operator evaluation. Algorithm 3 details the Chebyshev iteration, where two additional scalar parameters are calculated according to the theory of Chebyshev polynomials and require an estimate of the maximum eigenvalue of the Jacobi-preconditioned operator. The parameters are determined such that the Chebyshev smoother tackles eigenvalues in the upper part of the spectrum on the current level, while smaller eigenvalues are damped by the coarse-grid correction. Since the maximum eigenvalue is only estimated, a safety factor is included to ensure robustness of the smoother. Note that the precise value used for the lower bound of the smoothing range is not critical in terms of robustness and iteration counts. A Chebyshev iteration with a given number of pre- and post-smoothing steps is denoted as Chebyshev(m) in the following; the smoother is typically most efficient for small numbers of smoothing steps, see for example Kronbichler2019 . As a default, we use a fixed number of smoothing steps and show additional results in form of a parameter study in Section 5. The diagonal required by the Chebyshev smoother is calculated in the setup phase. The maximum eigenvalue needed by the Chebyshev iteration is estimated by performing a few conjugate gradient iterations. Compared to a single solution of the linear system, this cost is not negligible. However, for many large-scale time-dependent problems where many time steps have to be solved, setup costs are amortized after a few time steps, which is why we do not explicitly consider these costs in the present work. For details on setup costs see, e.g., Kronbichler2019 . This is further justified by the fact that the setup costs are proportional to the costs of a fine-level matrix-vector product, and therefore increase similarly under mesh refinement as the solution of the linear system of equations itself. While the present work is restricted to the constant-coefficient Poisson case, it should be mentioned that Chebyshev smoothing has been reported to work well also for variable-coefficient problems with smoothly varying coefficients Kronbichler2018 ; Sundar2015 .
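A minimal sketch of such a smoother (hypothetical signature; the standard three-term Chebyshev recurrence, with the eigenvalue bounds passed in explicitly) could look as follows:

```python
import numpy as np

def chebyshev_smooth(A_apply, D_inv, b, x, m, lam_max, lam_ratio=4.0):
    """Chebyshev-accelerated Jacobi smoothing (standard three-term
    recurrence; the signature and fixed smoothing range are assumptions).

    Damps error components whose eigenvalues (of the Jacobi-preconditioned
    operator) lie in [lam_max / lam_ratio, lam_max]; lam_max is the
    eigenvalue estimate including any safety factor.
    """
    lam_min = lam_max / lam_ratio
    theta = 0.5 * (lam_max + lam_min)        # center of smoothing interval
    delta = 0.5 * (lam_max - lam_min)        # half width
    sigma = theta / delta
    rho_old = 1.0 / sigma
    r = D_inv * (b - A_apply(x))
    d = r / theta
    x = x + d
    for _ in range(m - 1):
        r = D_inv * (b - A_apply(x))
        rho = 1.0 / (2.0 * sigma - rho_old)
        d = rho * rho_old * d + (2.0 * rho / delta) * r
        rho_old = rho
        x = x + d
    return x

# demo: 1D Poisson, Jacobi-preconditioned spectrum lies in (0, 2)
n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
D_inv = 1.0 / np.diag(A)
rng = np.random.default_rng(2)
b = rng.random(n)
x = chebyshev_smooth(lambda v: A @ v, D_inv, b, np.zeros(n), 5, 2.2)
assert np.linalg.norm(b - A @ x) < np.linalg.norm(b)
```

Note that, as described in the text, only the operator action and the inverse diagonal enter the iteration, so any fast matrix-free operator evaluation can be plugged in for `A_apply`.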
3.2 Coarsening strategies and multigrid transfer operations
The multigrid level introduced in Algorithm 2 is uniquely defined by the grid size $h$, the polynomial degree $k$, and the continuity parameter $c$ indicating whether the function space is continuous or discontinuous.
From one multigrid level to the next, only one of the three parameters may change for the hybrid multigrid methods discussed in this work. For example, a transfer from DG space to FE space leads to two multigrid levels that coincide with respect to grid size $h$ and polynomial degree $k$, i.e., a combined coarsening from a high-order discontinuous space to a low-order continuous space is not considered here. The approach is denoted as $h$-/$p$-multigrid if geometric/polynomial coarsening is employed only. Combined geometric–polynomial multigrid is denoted as $hp$- or $ph$-multigrid, depending on which coarsening is applied first (we only consider sequential coarsening in $h$ and $k$, as opposed to, e.g., Antonietti2015 , where the term $hp$-multigrid is used for simultaneous coarsening in both $h$ and $k$ from one level to the next), as illustrated in Figure 2. Following this notation and depending on whether the DG-to-FE transfer is performed at high degree or at low degree, we denote this approach as $cp$-multigrid or $pc$-multigrid, respectively, or as $ch$-multigrid or $hc$-multigrid if geometric coarsening is involved. Applying all three possibilities for coarsening would, for example, result in a $chp$-multigrid strategy, with the $c$-coarsening performed first, followed by $h$-coarsening and finally $p$-coarsening. The different types of coarsening are illustrated in Figure 3. In all cases, algebraic multigrid may be applied as a coarse-grid solver.
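The level-sequencing logic of, e.g., a $chp$-type strategy can be sketched as follows (a hypothetical illustration; degree halving is just one possible $p$-coarsening rule):

```python
def build_level_sequence(h_levels, k_fine):
    """Hypothetical sketch of a chp-type level sequence.

    A level is (h_level, degree, continuous). Starting from the
    discontinuous fine level: DG-to-FE transfer (c), then geometric
    coarsening (h), then polynomial coarsening (p) down to degree 1;
    only one parameter changes between consecutive levels. Degree
    halving is just one possible p-coarsening rule.
    """
    levels = [(h_levels, k_fine, False)]        # fine-level DG space
    levels.append((h_levels, k_fine, True))     # c: DG-to-FE transfer
    for h in range(h_levels - 1, -1, -1):       # h: mesh coarsening
        levels.append((h, k_fine, True))
    k = k_fine
    while k > 1:                                # p: degree coarsening
        k = k // 2
        levels.append((0, k, True))
    return levels

assert build_level_sequence(2, 4) == [
    (2, 4, False), (2, 4, True),
    (1, 4, True), (0, 4, True),
    (0, 2, True), (0, 1, True),
]
```

The last entry of the sequence is then handed to the coarse-grid solver, e.g., AMG.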
We write the prolongation of the coarselevel correction from coarse to fine levels generically (for all types of transfers ) as
$$x^{(f)} = \mathbb{P}\, x^{(c)} = \sum_{e} \mathbb{S}_e\, \mathbb{P}_e\, \mathbb{G}_e\, x^{(c)}, \qquad (12)$$
where the global prolongation operator $\mathbb{P}$ is expanded into the sum over all elements $e$ on the fine level with elementwise prolongation operator $\mathbb{P}_e$. The gather operator $\mathbb{G}_e$ extracts the local degrees of freedom of a coarse-level element from the global DoF vector. The scatter operator $\mathbb{S}_e$ coordinates the write of local degrees of freedom into the global fine-level DoF vector and additionally performs a weighting of degrees of freedom according to the multiplicity of a shared node in the case of continuous function spaces. The elementwise prolongation operator is realized as an orthogonal projection
$$\mathbb{P}_e = M^{-1} B, \qquad (13)$$
where $M$ denotes the mass matrix and $B$ the embedding from the coarse-level space into the fine-level space. Note that the integral is performed in reference space over the fine-level element. Therefore, the operation is the same for all elements and is done only once in the setup phase, where the 1D prolongation matrices are constructed. Prolongation in multiple dimensions is constructed as the tensor product of 1D operations, exploiting fast matrix-free evaluation techniques. The 1D prolongation matrices represent the interpolation of coarse-level basis functions into the nodes of the fine-level basis functions. In the case of h-coarsening and for general mappings from reference to physical space, however, the coarse-level space is no longer a subset of the fine-level space. Therefore, the chosen multigrid transfer operations implicitly introduce the approximation of nested function spaces, as also mentioned, e.g., in Lu2014 . In the case of p-transfer and c-transfer, the function spaces are "strictly" nested. Restriction of the residual onto coarser levels is defined as the transpose of prolongation,

$$\mathbb{R} = \mathbb{P}^{\mathsf{T}}. \qquad (14)$$
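A minimal sketch of how such a 1D prolongation matrix can be set up, by evaluating the coarse-level Lagrange basis at the fine-level nodes (equidistant nodes are used here for brevity; an actual implementation would use the Gauss-Lobatto points, and for nested spaces this interpolation yields the same result as the elementwise projection):

```python
import numpy as np

def lagrange_eval(nodes, x):
    """Evaluate the Lagrange basis defined on `nodes` at the points `x`;
    returns a matrix of shape (len(x), len(nodes))."""
    V = np.ones((len(x), len(nodes)))
    for j, nj in enumerate(nodes):
        for m, nm in enumerate(nodes):
            if m != j:
                V[:, j] *= (x - nm) / (nj - nm)
    return V

# 1D prolongation from degree 1 to degree 2: coarse-level basis
# functions evaluated at the fine-level nodes
coarse_nodes = np.linspace(-1.0, 1.0, 2)
fine_nodes = np.linspace(-1.0, 1.0, 3)
P1d = lagrange_eval(coarse_nodes, fine_nodes)   # shape (3, 2)

# prolongation in 3D as a tensor product of the 1D operation
P3d = np.kron(np.kron(P1d, P1d), P1d)           # shape (27, 8)
```

In a matrix-free implementation, the Kronecker product is never formed explicitly; the 1D matrix is applied dimension by dimension via sum factorization.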
3.2.1 h-coarsening
A hierarchy of mesh levels is constructed based on the octree concept; see for example Sundar2012 ; Burstedde2011 for details on aspects of the chosen mesh topology. Therefore, coarser meshes in the multigrid context are not obtained by coarsening a fine mesh, but rather by starting from a coarse mesh that is refined uniformly several times to obtain the fine mesh. This coarse mesh also forms the coarse-grid problem in the multigrid algorithm. From this perspective, it is clear that pure h-multigrid based on the octree approach works well for cube-like domains of moderate geometrical complexity, but reaches limitations for complex geometries where only one or two refinement levels applied to the coarse mesh might be affordable in practice. In these cases, it is essential to further coarsen the problem by the use of p-multigrid and algebraic multigrid techniques described in more detail below. Here, we restrict ourselves to meshes without hanging nodes, and each octree has the same number of mesh refinement levels. Multigrid methods for high-order discretizations on adaptively refined meshes are discussed in Kanschat2004 ; Janssen2011 ; Kronbichler2018 ; Kronbichler2019 ; Sundar2012 in a pure h-multigrid context.
3.2.2 p-coarsening
As opposed to h-multigrid, p-multigrid offers the possibility for various coarsening strategies. Reducing the polynomial degree by one, $k \mapsto k-1$, is frequently applied in the literature Bassi2003 ; Fidkowski2004 ; Fidkowski2005 ; Nastase2006JCP ; Shahbazi2009 ; Diosady2009 ; Bassi2009 ; Bassi2011 ; Ghidoni2014 ; Antonietti2015 ; Fabien2019 . An alternative is to reduce the polynomial degree by a factor of two, $k \mapsto \lfloor k/2 \rfloor$ (with appropriate rounding operation), which has been used in Helenbrook2003 ; Helenbrook2008 ; Mascarenhas2009 ; Mascarenhas2010 ; Helenbrook2016 . This coarsening strategy has a close analogy to h-coarsening since the number of degrees of freedom is reduced by a factor of two in each coordinate direction from one level to the next. It is also not uncommon to immediately reduce the polynomial degree to its minimum, $k \mapsto 1$ or $k \mapsto 0$, for all fine-level degrees (two-level algorithm); see for example Rasetarinera2001 ; Persson2008 ; Bastian2019 . Elementwise constant shape functions with $k=0$ are not considered in this work. On the one hand, the present DG discretization is not consistent for polynomial degree $k=0$. On the other hand, as argued in Helenbrook2008 , the small-wavenumber modes that remain after smoothing are essentially continuous for diffusive problems and are, therefore, not well represented by a piecewise constant coarse space with $k=0$. For the neutron diffusion problems studied in OMalley2017 , a continuous coarse space has been found to be advantageous over a piecewise constant space. It has been observed in Persson2008 ; Shahbazi2009 by the example of the compressible Navier–Stokes equations involving diffusive terms that $k=1$ performs better than $k=0$. A piecewise constant space with $k=0$ is typically used in the convection-dominated limit and for the compressible Euler equations Luo2006 ; Nastase2006JCP . In Bastian2019 , such a coarse space is also used for a Poisson problem with variable coefficients. These previous studies indicate that the optimal coarse space depends on the model problem under investigation.
Since the present work is restricted to the constant-coefficient Poisson problem, we also restrict ourselves to a specific choice for the coarse space. Discussions and comparisons of different p-sequences can be found in Helenbrook2008 in the context of iteration counts and in Nastase2006JCP in terms of iteration counts and computational costs. In that work, only a single polynomial degree is investigated. Here, we foster a more rigorous investigation of the following coarsening strategies:

$k \mapsto k-1$ (decrease by one),

$k \mapsto \max(\lfloor k/2 \rfloor, 1)$ (bisection),

$k \mapsto 1$ (two-level algorithm),

considering a wide range of polynomial degrees and studying the impact on both iteration counts and computational costs. All levels are exploited in our multigrid algorithm until $k=1$ is reached.
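The three coarsening sequences can be generated as follows (function and strategy names are ours):

```python
def p_sequence(k_fine, strategy):
    """Sequence of polynomial degrees from the fine level down to k=1
    for the three p-coarsening strategies discussed in the text."""
    levels = [k_fine]
    while levels[-1] > 1:
        k = levels[-1]
        if strategy == "decrease_by_one":
            levels.append(k - 1)
        elif strategy == "bisection":
            levels.append(max(k // 2, 1))
        elif strategy == "two_level":
            levels.append(1)
    return levels
```

For example, starting from degree 7, bisection produces the levels 7, 3, 1, while decrease-by-one visits every intermediate degree and the two-level variant jumps directly to degree 1.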
3.2.3 c-coarsening (transfer from discontinuous to continuous space)
A transfer from the discontinuous space to a continuous space at the coarsest degree is used in Helenbrook2008 ; OMalley2017 , an idea that had already been described in Lasser2001 in the context of two-level overlapping preconditioners. A transfer at the highest degree is suggested in Rudi2015 without justification and with results shown only for the lowest degree considered. This approach might be counterintuitive at first sight since an additional multigrid level at high polynomial degree (and therefore with expensive smoothers) is introduced and the problem size is only marginally reduced for a DG-to-FE transfer at high degree. It is interesting to note that a similar idea called conforming aggregation is used in Olson2011 in the context of smoothed aggregation algebraic multigrid techniques, where degrees of freedom at the same spatial location are aggregated on the finest level. For the two-level scheme proposed in Dobrev2006 ; Bastian2012 ; Siefert2014 ; Bastian2019 , the high-order DG space is directly reduced to a linear conforming space. According to our experiments, this could be the reason for the strong increase in iteration counts observed in Siefert2014 ; Bastian2019 for increasing polynomial degree (and for a similar two-level preconditioner used in Remacle2016 ). As mentioned previously, we introduce an additional multigrid level for the DG-to-FE transfer as in OMalley2017 , i.e., the transfer to the continuous FE space is done at constant degree and mesh size, and we found that this is important for optimal multigrid convergence rates. We investigate two variants of the DG-to-FE transfer in this work, namely performing this transfer at the highest degree or the lowest degree (and similarly on the finest mesh or the coarsest mesh). Performing the transfer to continuous elements on the finest level has very attractive properties: it reduces the iteration counts considerably and yields a multigrid solver for SIPG discretizations of the Poisson equation that is robust w.r.t. the penalty parameter of the SIPG method. Theoretical background for this behavior is provided in Antonietti2017 , where this approach is motivated from the perspective of space splitting and auxiliary space methods. The important difference is that we here integrate this splitting into the multigrid hierarchy with the same smoother used on all levels.
The elementwise prolongation matrix is an identity matrix in the case of a DG-to-FE transfer since the continuous and discontinuous function spaces are the same from an elementwise perspective. Accordingly, the degrees of freedom shared by neighboring elements in the continuous case are simply injected into the duplicated degrees of freedom of the discontinuous case. With restriction being the transposed operation, the residual contributions of duplicated nodes in the discontinuous case are summed into the uniquely defined degree of freedom in the continuous case.
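The injection/summation described above can be sketched with a hypothetical one-dimensional example of two linear elements (the DoF numbering is illustrative):

```python
import numpy as np

# Two linear elements in 1D: continuous (FE) dofs [0, 1, 2];
# discontinuous (DG) dofs [0, 1 | 2, 3], so the shared vertex is
# duplicated in the DG space (dofs 1 and 2 map to the same FE dof).
dg_to_cg = np.array([0, 1, 1, 2])

def prolongate_cg_to_dg(u_cg):
    # injection: shared values are copied into their DG duplicates
    return u_cg[dg_to_cg]

def restrict_dg_to_cg(r_dg, n_cg=3):
    # transpose of injection: duplicated residual contributions are
    # summed into the unique continuous degree of freedom
    r_cg = np.zeros(n_cg)
    np.add.at(r_cg, dg_to_cg, r_dg)
    return r_cg
```

Restriction is by construction the transpose of prolongation, consistent with the general definition of the transfer operators above.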

The two-level approaches in Lasser2001 ; Dobrev2006 ; Antonietti2017 are also known as, or interpreted as, auxiliary space preconditioning. We refrain from this nomenclature in the present work and rather categorize these approaches as one type of multigrid coarsening in the generalized framework of hybrid multigrid algorithms. The multigrid methods in Bastian2012 ; Siefert2014 are introduced as algebraic multigrid methods that are “not fully algebraic”. In the present work, we foster a fine-level point of view and categorize these approaches as c-multigrid (potentially with additional coarsening) with algebraic multigrid applied as a coarse-grid solver; for good reason, because the fine levels are those where the numerical method spends its time (assuming that the method is applied away from the strong scaling limit) and are those that determine the computational efficiency of the approach.
3.3 Coarse-grid solver
The success of multigrid methods originates from the fact that the coarse-grid correction ensures mesh-independent convergence rates as well as low iteration counts and, at the same time, causes only low computational overhead as compared to the operations on the fine level. It is therefore important that the coarse-grid correction does not deteriorate the multigrid convergence rate, which should only be limited by the smoother on the fine level. This is particularly important for an AMG coarse-grid solver that does not necessarily converge at the same rate as the smoothers on the geometric levels of the multigrid hierarchy would allow. For this reason, it is reasonable to solve the coarse-level problem by an inner Krylov solver preconditioned by an AMG V-cycle to a specified tolerance, instead of using only a single AMG V-cycle as coarse-grid solver. Note that using a Krylov solver within the preconditioner no longer guarantees that the preconditioner is a stationary operator, which might require the use of flexible solvers in general. Since we did not observe convergence problems in the present work, a basic CG iteration is used throughout as the outer Krylov solver. Extending AMG solvers designed for continuous discretizations to the discontinuous case is not trivial without further measures, as shown in Olson2011 . Since we want to apply the AMG coarse-grid solver in a black-box fashion in its optimal regime, we mainly show performance numbers for AMG applied to a continuous discretization of the coarse problem with lowest degree, see also Bastian2012 ; Siefert2014 . The present work makes use of the AMG implementation provided by the Trilinos ML project TrilionsML2006 , using one V-cycle with one smoothing step of an ILU smoother without fill-in and no overlap (i.e., in a block-Jacobi fashion over the MPI ranks) and an Amesos-KLU coarse solver unless specified otherwise.
A comparative study of different AMG solver frameworks is beyond the scope of this work; such comparisons can be found, for example, in OMalley2017 for the neutron diffusion equation and in Offermans2019 in the context of computational fluid dynamics.
4 Matrixfree operator evaluation
The overall performance of the multigrid solver crucially depends on the speed at which the matrix-vector product can be performed. The outer Krylov solver, the multigrid V-cycle, and also the multigrid smoothers only require the action of the linear operator applied to a vector. Since the multigrid transfer operators can also be realized in a matrix-free way using sum factorization, all components of the algorithm outlined in Section 3 (apart from the AMG coarse-grid solver) are amenable to fast matrix-free operator evaluation. The present work builds on the matrix-free evaluation routines developed in Kronbichler2012 ; Kronbichler2019fast and available in the deal.II finite element library dealII90 . The global matrix-vector product is written as a loop over all elements and faces with the local weak form evaluated by numerical quadrature,
$$y = A x = \sum_{e} \mathbb{S}_e\, A_e\, \mathbb{G}_e\, x + \sum_{f} \mathbb{S}_f\, A_f\, \mathbb{G}_f\, x. \qquad (15)$$
For volume integrals over an element $e$, the gather operation $\mathbb{G}_e$ extracts the local degrees of freedom associated to element $e$. Similarly, for the integral over a face $f$, $\mathbb{G}_f$ extracts the relevant degrees of freedom of the two elements required for the computation of the face integral. Then, the computation of volume and face integrals is a 3-step process described by $A_e = \mathcal{I}^{\mathsf{T}} \mathcal{D}_e \mathcal{I}$ and $A_f = \mathcal{I}_f^{\mathsf{T}} \mathcal{D}_f \mathcal{I}_f$, respectively. This 3-step process forms the core of the matrix-free operator evaluation and is explained in more detail below. Finally, the scatter operations $\mathbb{S}_e$ and $\mathbb{S}_f$ add the contributions of volume and face integrals into the global residual vector according to the mapping of local-to-global degrees of freedom. Independently of the specific discretization technique, matrix-free techniques are the state-of-the-art implementation for high-performance realizations of PDE solvers, see for example Rudi2015 ; Ichimura2015 ; Gmeiner2015 .
Next, we detail the procedure for the matrix-free evaluation of the volume integral over an element, exemplarily for the Laplace operator:

$$\left(A_e u_e\right)_i = \int_{\Omega_e} \nabla \phi_i \cdot \nabla u_h \, \mathrm{d}x = \int_{\hat{e}} \left(J^{-\mathsf{T}} \hat{\nabla} \hat{\phi}_i\right) \cdot \left(J^{-\mathsf{T}} \hat{\nabla} \hat{u}_h\right) \det(J)\, \mathrm{d}\hat{x} \approx \sum_{q} \left(\hat{\nabla} \hat{\phi}_i(\hat{x}_q)\right)^{\mathsf{T}} J_q^{-1} J_q^{-\mathsf{T}} \det(J_q)\, w_q\, \hat{\nabla} \hat{u}_h(\hat{x}_q) = \left(\mathcal{I}^{\mathsf{T}} \mathcal{D}_e\, \mathcal{I}\, u_e\right)_i.$$
The integral over the physical domain is first transformed to the reference element, giving rise to geometry terms such as the Jacobian $J$. Integration is then performed by Gaussian quadrature, introducing the quadrature weight $w_q$ and replacing the integral by a sum over all quadrature points. The last row shows how the elementwise computation of integrals can be interpreted in terms of the more abstract notation introduced in equation (15). The interpolation operator $\mathcal{I}$ computes the gradient (in reference coordinates) of the solution at all quadrature points by interpolation of the basis functions according to the polynomial expansion introduced in Section 2. The differential operator $\mathcal{D}_e$ applies the PDE operator at all quadrature points and depends on data associated to the current element for non-Cartesian element geometries. The integration operator $\mathcal{I}^{\mathsf{T}}$ multiplies by the gradient of the test function and sums over all quadrature points (= integration). It can easily be seen from the above equation that the integration step is the transpose of the interpolation step. Interpolation and integration are done in reference coordinates and do not depend on the current element. To obtain optimal computational complexity, it is essential to exploit the tensor-product structure of the shape functions in the interpolation and integration steps. This optimization technique is called sum factorization and replaces the sum over all nodes of the element by sums over the one-dimensional nodes, leading to a complexity of $\mathcal{O}(k^{d+1})$ operations per element. Naively applying one-dimensional interpolation kernels for gradients gives rise to $d^2$ kernels. However, the operations can be reduced to $2d$ kernels by first interpolating into a collocation basis ($d$ kernels) and subsequently evaluating the gradient in the collocation basis (another $d$ kernels) Kronbichler2019fast
. Another optimization technique reducing the number of operations for the one-dimensional kernels exploits the symmetry of the one-dimensional shape functions and is called even-odd decomposition Kopriva2009 . An illustration of the matrix-free evaluation process is provided in Figure 4. The computation of face integrals follows the same principles and we refer the interested reader to Kronbichler2012 ; Kronbichler2019fast for more details. For the special case of affine element geometries, a single Jacobian is used at all quadrature points of a cell. For deformed elements, a separate Jacobian is precomputed and stored for each quadrature point, which is then accessed during the operator evaluation and represents the main memory traffic. On faces, the corresponding geometry quantity is precomputed at each quadrature point. We refer to Kronbichler2019fast for possible alternatives regarding the evaluation of the geometry. Apart from the operator evaluation and smoothing on all multigrid levels, also the multigrid transfer operators discussed in Section 3.2 are implemented with optimal-complexity matrix-free algorithms.

Vectorization over elements and faces
The matrix-free operator evaluation performs the same operations for all elements; the only difference is that integrals over different elements operate on different parts of the solution vector, and the geometry information has to be stored and loaded separately for each element in the case of deformed element geometries. In order to exploit the single-instruction-multiple-data (SIMD) vectorization capabilities of modern hardware with wide SIMD units, the present implementation groups together several elements or faces and performs the integrals in the weak form concurrently for this batch of elements or faces. This technique was first proposed in Kronbichler2012 . The basic data type for the operations in the matrix-free evaluation process is therefore VectorizedArray<Number>, with Number being a template for a C++ data type such as double or float. For the hardware used in the present work with support for AVX-512 (see Table 3), vectorization is done over 8 elements/faces in double precision and over 16 in single precision. For meshes with the number of elements/faces not being a multiple of the vectorization width, parts of the vectorized array remain empty for these corner cases.
Complexity and throughput
The theoretical complexity of the matrix-free evaluation is $\mathcal{O}(k^{d+1})$ operations and $\mathcal{O}(k^d)$ data per element, resulting in a linear complexity, $\mathcal{O}(k)$, or constant complexity, $\mathcal{O}(1)$, per degree of freedom, depending on whether arithmetics or memory transfer forms the main bottleneck. On modern hardware with high Flop-to-Byte ratios, the present matrix-free implementation tends to be memory-bound when implemented with a minimum of arithmetic operations Kronbichler2019fast . Figure 1 shows the throughput of the present implementation measured for the evaluation of the scalar Laplace operator on a 3D cube geometry with periodic boundary conditions for both Cartesian and curved elements. In practice, the throughput measured in degrees of freedom per second depends only mildly on the polynomial degree and suggests an almost constant complexity up to moderately high polynomial degrees. The fact that the observed complexity is significantly better than the theoretical $\mathcal{O}(k)$ complexity of volume integrals can be explained by the fact that face integrals as well as data access (with constant complexity per unknown) are performance-relevant for moderately high polynomial degrees.
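The sum-factorized interpolation underlying these complexity estimates replaces one large tensor-product matrix by $d$ one-dimensional contractions. A minimal sketch for the 3D interpolation step, compared against the explicit Kronecker-product matrix (sizes are illustrative):

```python
import numpy as np

n_dofs_1d, n_q_1d = 4, 5                 # 1D dofs and quadrature points
S = np.random.rand(n_q_1d, n_dofs_1d)    # 1D shape values at quad points
u = np.random.rand(n_dofs_1d, n_dofs_1d, n_dofs_1d)  # element coefficients

# sum factorization: three 1D contractions, one per coordinate direction
v = np.einsum('ai,ijk->ajk', S, u)
v = np.einsum('bj,ajk->abk', S, v)
v = np.einsum('ck,abk->abc', S, v)

# reference: the full tensor-product matrix applied to the flattened dofs
v_ref = np.kron(np.kron(S, S), S) @ u.reshape(-1)
assert np.allclose(v.reshape(-1), v_ref)
```

The factorized version performs $\mathcal{O}(k^{d+1})$ operations per element, whereas the explicit matrix requires $\mathcal{O}(k^{2d})$, which is the essence of the complexity gap discussed above.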
Mixed-precision multigrid
The matrix-free algorithm outlined above is perfectly suited for mixed-precision computations in the multigrid-preconditioned Krylov solver, following the idea of Gropp2000 . This is due to the fact that the amount of data transferred from main memory reduces by a factor of two in the case of single precision (implying twice the throughput in terms of elements processed per time), and the vectorization strategy with explicit vectorization over elements/faces also allows twice the throughput in terms of arithmetics. The throughput of the matrix-vector product shown in Figure 1 is therefore raised by a factor of approximately two when reducing accuracy from double precision to single precision. To not spoil the accuracy of the numerical approximation of the solution and to ensure convergence of the outer Krylov solver, single precision is only used in the multigrid V-cycle; the outer Krylov solver operates in double precision. Larger round-off errors in the multigrid cycle can be tolerated since the high-frequency errors introduced by single-precision round-off are tackled by the multigrid smoothers Kronbichler2019 and since multigrid is only a preconditioner applied to the residual of the outer Krylov solver, see Algorithm 1. Since the Trilinos ML solver used here only supports double precision, the AMG coarse-grid preconditioner operates in double precision. The performance of mixed precision is compared to pure double-precision computations in Figure 6 below and discussed in Section 5.
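The principle can be illustrated with a small defect-correction sketch: a stand-in for the single-precision V-cycle (here simply an approximate solve carried out in float32) is wrapped by an outer iteration whose residual is computed in double precision, so the single-precision round-off does not limit the attainable accuracy. The problem and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # SPD model problem
b = rng.standard_normal(n)

# stand-in for the single-precision multigrid V-cycle: an approximate
# solve performed entirely in float32
A32 = A.astype(np.float32)
def vcycle(r):
    return np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)

# outer iteration in double precision: the residual is always computed
# in double, so the float32 correction errors are damped iteratively
x = np.zeros(n)
for _ in range(20):
    r = b - A @ x
    x += vcycle(r)
```

Despite the single-precision inner solves, the outer iteration reaches a relative residual far below single-precision accuracy, mirroring the behavior of the mixed-precision V-cycle inside a double-precision Krylov solver.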
5 Results
We introduce relevant performance metrics used to evaluate the efficiency of the present hybrid multigrid methods in Section 5.1. Information on the hardware under consideration is given in Section 5.2. The considered test cases are briefly summarized in Section 5.3, before numerical results are shown for each problem in the subsequent sections.
5.1 Performance metrics
Frequently used metrics are the average multigrid convergence rate $\rho = \left( \| r_n \| / \| r_0 \| \right)^{1/n}$ and the number of iterations $n_{10} = \lceil -10 / \log_{10} \rho \rceil$ needed to reduce the residual by ten orders of magnitude,
where $r_n$ denotes the residual after $n$ iterations. These quantities are well suited to demonstrate mesh-independent convergence rates, or to quantitatively investigate the robustness of the multigrid method, i.e., the influence of certain parameters such as mesh anisotropies, variable coefficients, or the polynomial degree on the convergence behavior of the multigrid algorithm. However, they are not suited to quantify the effectiveness of smoothers in terms of computational efficiency. To measure computational costs, theoretical measures such as operation counts required for the matrix-vector product or matrix nonzeros are often considered Lottes2005 ; Sundar2015 . These quantities should be considered with some care because they inherently contain assumptions on the bottleneck (for example, that the algorithm is compute-bound so that floating-point operations are really a cost measure). However, this depends on many aspects such as the hardware under consideration (Flop-to-Byte ratio), the implementation strategy (matrix-based vs. matrix-free), and the optimization level of the implementation. For example, it is important to implement the matrix-free algorithms discussed here with a minimum of operations and to exploit the SIMD capabilities of modern hardware. Due to these uncertainties and model assumptions of theoretical cost measures, we prefer experimental cost measures determined from the actual performance of the multigrid solver, in the spirit of Kronbichler2018 ; Bastian2019 . An effective number of fine-level matrix-vector products, denoted as $n_{\mathrm{MV}}$ in the following, is helpful to incorporate the computational costs of the smoother and to compare different smoothers in the metric of computational costs instead of global iteration counts. It is unclear whether more aggressive matrix-based smoothers resulting in lower iteration counts are also superior in the practically relevant time-to-solution metric.
The quantity $n_{\mathrm{MV}}$ is defined as the ratio of the wall time for one application of the linear solver (residual reduction by ten orders of magnitude) over the wall time for one operator evaluation. Since absolute wall times depend on the problem size, it is useful to express $n_{\mathrm{MV}}$ as a function of two normalized quantities. The first one is the efficiency of the matrix-free operator evaluation measured as the number of degrees of freedom processed per second per core (also denoted as throughput),
$$\text{throughput} = \frac{\text{DoFs}}{t_{\text{matvec}} \cdot \text{cores}}. \qquad (16)$$
The second one is the time $t_{10}$ required by the multigrid solver to solve one degree of freedom per core with a residual reduction of ten orders of magnitude,
$$t_{10} = \frac{t_{\text{solve}} \cdot \text{cores}}{\text{DoFs}}, \qquad (17)$$
or equivalently the throughput of the linear solver in degrees of freedom solved per second per core, $1/t_{10}$. Then, the effective number of fine-level matrix-vector products is determined experimentally as
$$n_{\mathrm{MV}} = t_{10} \cdot \text{throughput} = \frac{t_{\text{solve}}}{t_{\text{matvec}}}. \qquad (18)$$
The aim of $n_{\mathrm{MV}}$ is to obtain a measure for the algorithmic complexity of the whole multigrid solver that is as independent of hardware and absolute performance numbers as possible. The definition of $n_{\mathrm{MV}}$ has similarities with the parallel textbook efficiency factor defined in Gmeiner2015b , with the important difference that we define one fine-level matrix-vector product as the work unit instead of one fine-level smoothing step. Since the overall goal is optimizing time-to-solution and since the operator evaluation is the only algorithmic component reoccurring in practically all iterative solvers and preconditioners, it is important to use the matrix-vector product as the work unit so that the algorithmic complexity of different smoothers is reflected in the values achieved for $n_{\mathrm{MV}}$. Furthermore, the performance advantage achieved by the use of mixed-precision multigrid is naturally included in our definition of $n_{\mathrm{MV}}$. Table 2 summarizes our performance metrics.
Quantity | Description
$n_{10}$ | number of iterations to reduce the residual by ten orders of magnitude
$t_{10}$ | wall time in seconds to solve one unknown per core (residual reduction by ten orders of magnitude)
$1/t_{10}$ | throughput of the solver in unknowns solved per second per core
throughput | throughput of the matrix-free operator evaluation in unknowns processed per second per core
$n_{\mathrm{MV}}$ | effective number of fine-level matrix-vector products to reach the residual reduction
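The experimental metrics above can be computed from measured wall times as follows (function and variable names are ours):

```python
def performance_metrics(t_solve, t_matvec, n_dofs, n_cores):
    """Experimental performance metrics (symbol names assumed):
    operator throughput (16), solver time per unknown per core (17),
    and the effective number of fine-level matvecs (18)."""
    throughput = n_dofs / (t_matvec * n_cores)   # DoFs / (s * core)
    t10 = t_solve * n_cores / n_dofs             # s * core / DoF
    n_mv = t10 * throughput                      # = t_solve / t_matvec
    return throughput, t10, n_mv
```

Note that the problem size and the core count cancel in the product, so $n_{\mathrm{MV}}$ reduces to the ratio of the two measured wall times.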
5.2 Hardware
The numerical experiments shown in this work are performed on an Intel Skylake architecture with AVX-512 vectorization. Table 3 lists the specifications of the SuperMUC-NG supercomputer in Garching, Germany. The GNU compiler g++ version 7.3 with the optimization flags -O3 -funroll-loops -std=c++17 -march=skylake-avx512 is used. All computations are done on thin nodes unless specified otherwise. The present analysis focuses mainly on node-level performance because multigrid solvers are well known to be scalable even to the largest supercomputers Kronbichler2018 ; Gholami2016 . The multigrid communication is between nearest neighbors, both horizontally within the matrix-vector products and vertically between the multigrid levels, with one round trip per V-cycle through the coarse solver, assuming the latter is sufficiently cheap. This is backed up by performance projections to exascale machines where multigrid is expected to be primarily memory-limited within the nodes Ibeid2018 . Good parallel scalability up to high core counts on large supercomputers when using AMG coarse-grid solvers is shown in Offermans2019 ; Offermans2016 .
Processor  Memory and Caches  

Processor type  Intel Skylake Xeon Platinum 8174  Memory per node (thin/fat)  96/768 GByte 
Frequency  2.7 GHz  Theoretical memory bandwidth  256 GByte/s 
Cores per node  48  STREAM memory bandwidth  205 GByte/s 
SIMD width  512 bit (AVX512)  Cache size (L2 + L3) per node  MByte 
5.3 Test cases
The proposed hybrid multigrid methods are investigated for a series of test cases with increasing complexity regarding the geometry and the number of elements on the coarse grid, as well as the maximum aspect ratio defined as

$$\mathrm{AR} = \max_{e} \frac{\sigma_{\max}(J)}{\sigma_{\min}(J)}, \qquad (19)$$

where $\sigma_{\max}(J)$ and $\sigma_{\min}(J)$ are the largest and smallest singular values of the Jacobian matrix $J$ (evaluated at all quadrature points of the element). A visualization of the geometries and the meshes of the different test cases is shown in Figure 5. We consider the following problems:
Cube: the geometry is a unit cube with more than one element on the coarse grid. This geometry could also be discretized with a single element on the coarse grid, but we consider configurations with several elements on the coarse grid to test all multigrid components and make sure that the coarse-grid problem is non-trivial (but very small). This test case is well suited to test the different multigrid ingredients, identify optimal multigrid coarsening strategies, perform parameter studies, study the impact of Cartesian and curved elements on iteration counts and throughput, and compare the present implementation to the state of the art (since data is mainly available for simple cube-like geometries in the literature).

Nozzle: the geometry of this test case is the nozzle of the FDA benchmark, which has been designed to assess CFD solvers for the simulation of the flow through medical devices Malinauskas2017 . The geometry is a tube with gradual or sudden contractions/expansions of the cross-section area, inducing separating flows and involving laminar, transitional, and turbulent flow regimes. We use a fixed coarse-grid mesh in the present work. The tube and cone geometries are known analytically and are used for a high-order representation of the geometry via manifolds (using a cubic mapping). The blood flow through this device can be modeled as an incompressible flow, and the present work investigates the pressure Poisson component of the related incompressible Navier–Stokes solver. The mesh contains high-aspect-ratio elements with a moderate distortion, especially in the outflow part of the domain on the right.

Lung: the most complex test case studied in this work is the geometry of the upper airway generations of the human lung, using a patient-specific geometry of a preterm infant, for which gas exchange mechanisms have been investigated recently in the literature Roth2018 . The geometry is discretized with a purely hexahedral mesh, and the coarse-grid problem consists of the elements resolving 8 airway generations. Simulating the flow of air through the human lung as a numerical solution of the incompressible Navier–Stokes equations again involves the solution of a pressure Poisson equation, which is studied in this work.
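The aspect ratio (19) of an element can be evaluated from the Jacobians at its quadrature points, for example as follows:

```python
import numpy as np

def max_aspect_ratio(jacobians):
    """Aspect ratio as in (19): maximum over the given Jacobians of the
    ratio of largest to smallest singular value."""
    singular_values = [np.linalg.svd(J, compute_uv=False) for J in jacobians]
    return max(s.max() / s.min() for s in singular_values)

# an affine element stretched by a factor of two has aspect ratio 2
J = np.diag([2.0, 1.0, 1.0])
ar = max_aspect_ratio([J])
```

For a Cartesian cube element all singular values coincide and the aspect ratio is 1; stretched or distorted elements, as in the nozzle and lung meshes, yield larger values.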



5.4 Cube
We consider a simple analytical test case with a smooth solution prescribed on a cube geometry in 3D. Dirichlet boundary conditions are prescribed on the domain boundary using the known analytical solution. The right-hand side is determined according to the method of manufactured solutions, $f = -\Delta u$. We analyze both a Cartesian mesh and a curvilinear mesh with deformation
(20) 
in each coordinate direction. The chosen amplitude leads to elements that are deformed significantly, see Figure 5. For the curvilinear mesh with deformation manifold, element mappings of a fixed polynomial degree are used throughout this work, independently of the polynomial degree of the shape functions.
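The method of manufactured solutions can be illustrated as follows; the solution below is an example choice, not necessarily the one used in this work.

```python
import numpy as np

def u(x, y, z):
    # example analytical solution (illustrative choice)
    return np.sin(np.pi * x) * np.sin(np.pi * y) * np.sin(np.pi * z)

def f(x, y, z):
    # manufactured right-hand side f = -Laplace(u); for this particular
    # u it evaluates to 3*pi^2*u
    return 3.0 * np.pi**2 * u(x, y, z)

# verify f = -Laplace(u) at a sample point with central differences
p, h = np.array([0.3, 0.4, 0.6]), 1e-4
lap = sum((u(*(p + h * e)) - 2.0 * u(*p) + u(*(p - h * e))) / h**2
          for e in np.eye(3))
assert abs(-lap - f(*p)) < 1e-4
```

Since the exact solution is known, the discretization error can be measured directly, while the solver is exercised with a non-trivial right-hand side.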
5.4.1 Robustness with respect to mesh refinement




