1 Introduction
High-order spectral element methods (SEMs) are well established as an effective means for simulation of turbulent flow and heat transfer in a variety of engineering applications (e.g., dfm02 ; dutta2016 ; hosseini2016 ; merzari2017 ). Central to the performance and accuracy of these methods is the use of hexahedral elements, which are represented by isoparametric mappings of the reference element $\hat{\Omega}=[-1,1]^d$ in $d$ space dimensions. With such a configuration, it is possible to express all operators in a factored matrix-free form that requires only $O(n)$ storage, where $n \approx EN^3$ is the number of gridpoints for a mesh comprising $E$ elements of order $N$. The fact that the storage scales as $O(n)$ and not $O(n^2)$ is a major advantage of the spectral element method that was put forth in the pioneering work of Orszag sao80 and Patera patera84 in the early 80s. The work is also low, $O(nN)$, and can be cast as dense matrix-matrix products, which are highly efficient on modern-day architectures dfm02 .
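The factored, matrix-free evaluation can be illustrated with a minimal NumPy sketch (an illustration only, not the production implementation): the derivative in one reference direction on a single element is applied as a tensor contraction with the 1D differentiation matrix, avoiding the large Kronecker-product operator that gives the same result.

```python
import numpy as np

# Sketch: apply the derivative in the first reference direction on one
# spectral element via tensor contraction, never forming the
# (N+1)^3 x (N+1)^3 operator.  D is a stand-in for a GLL differentiation
# matrix (random here, for illustration only).
def tensor_apply(D, u):
    # u has shape (N+1, N+1, N+1); cost is O(N^4) per element
    return np.einsum('ai,ijk->ajk', D, u)

N = 7
D = np.random.rand(N + 1, N + 1)          # stand-in 1D operator
u = np.random.rand(N + 1, N + 1, N + 1)   # nodal values on one element
ur = tensor_apply(D, u)

# Same result via the full Kronecker operator, whose storage is O(N^6)
I = np.eye(N + 1)
A = np.kron(D, np.kron(I, I))
assert np.allclose(A @ u.ravel(), ur.ravel())
```

The contraction touches only $O(N^3)$ data per element while the assembled operator would need $O(N^6)$ storage, which is the storage argument made above.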
The efficiency and applicability of the SEM is tied closely to the ability to generate all-hex meshes for a given computational domain. While all-tet meshing is effectively a solved problem, the all-hex case remains challenging in many configurations. Here, we explore Schwarz overlapping methods as an avenue to support nonconforming discretizations in the context of the SEM. Overlapping grids (also known as overset or chimera grids) simplify mesh generation by allowing the user to represent the solution on a complex domain as grid functions on relatively simpler overlapping regions. These simpler overlapping regions allow grid generation where local mesh topologies are otherwise incompatible, a feature of particular importance for complex 3D domains and for the simulation of flows in time-dependent domains subject to extreme mesh deformation. Overlapping grids also enable the use of discretizations of varying resolution in each domain, based on the physics of the problem in that region.
Overlapping grids introduce a set of challenges that are not posed by a single conforming grid Rogersbestpractices . Using multiple meshes requires boundary conditions for each interface (artificial boundary) that is interior to the other domain (Fig. 1). Since these surfaces are not actual domain boundaries, boundary condition data must be interpolated from a donor element in the overlapping mesh, which must be identified at the beginning of a calculation. For moving or deforming meshes, donor elements must be re-identified after each timestep. Additionally, if multiple overlapping meshes share a target donor point, it is important to pick the donor mesh that will provide the most accurate solution. For production-level parallel computing applications, the identification of donor elements, interpolation of data, and communication must be robust and scalable. Additionally, differences in resolution of the overlapping meshes can impact global mass-flux balances, leading to a violation of the divergence-free constraint, with potential stability consequences for incompressible flow calculations. These issues must be addressed in order to enable scalable and accurate turbulent flow and heat transfer calculations.
Since their introduction in 1983 steger1983 , several overset methods have been developed for a variety of problems, ranging from Maxwell's equations angel2018 to fluid dynamics chandar2018comparative to particle tracking koblitz2017 . Overset-grid methods are also included in commercial and research software packages such as Star-CCM+ cd2012v7 , Overflow nicholsoverflowmanual , and Overture overturemanual . Most of these implementations, however, are at most fourth-order accurate in space angel2018 ; koblitz2017 ; coder2017contributions ; henshaw1994 ; brazell2016overset ; crabill2016 . Sixth-order finite-difference based methods have been presented in aarnes2018high ; nicholsoverflowmanual , while sixth-order finite-volume based schemes are available in elsA cambier2013onera . A spectral element based Schwarz method presented by Merrill et al. merrill2016 has demonstrated exponential convergence.
Here, we extend the work of Merrill et al. merrill2016 to develop a robust, accurate, and scalable framework for overlapping grids based on the spectral element method. We start with a brief description of the SEM and the existing framework on which we build (Section 2). We discuss our approach to fast, parallel, high-order interpolation of boundary data and a mass-balance correction that is critical for incompressible flow. Finally, we present applications that demonstrate the scalability and accuracy of our overlapping-grid framework (Section 3).
2 SEM-Schwarz for Navier-Stokes
The spectral element method (SEM) was introduced by Patera for the solution of the incompressible Navier-Stokes equations (NSE) in patera84 . The geometry and the solution are represented using $N$th-order tensor-product polynomials on isoparametrically mapped elements. Variational projection operators are used to discretize the associated partial differential equations. While technically a finite element method (FEM), the SEM's strict matrix-free tensor-product formulation leads to fast implementations that are qualitatively different from classic FE methods. SE storage is only $O(n)$ and work scales as $O(nN)$, where $n \approx EN^3$ is the number of points for an $E$-element discretization of order $N$ in $\mathbb{R}^3$. (Standard $p$-type FE formulations have work and storage complexities of $O(nN^3)$, which is prohibitive for large $N$.) All work terms in the SEM can be cast as fast matrix-matrix products. A central feature of the SEM is the use of nodal bases on the Gauss-Lobatto-Legendre (GLL) points, which lead to an efficient and accurate (because of the high order) diagonal mass matrix. Over-integration (dealiasing) is applied only to the advection terms to ensure stability malm13 .

For the unsteady NSE, we use semi-implicit BDF$k$/EXT$k$ timestepping in which the time derivative is approximated by a $k$th-order backward difference formula (BDF), the nonlinear terms (and any other forcing) are treated with $k$th-order extrapolation (EXT), and the viscous, pressure, and divergence terms are treated implicitly. This approach leads to a linear unsteady Stokes problem to be solved at each timestep, which is split into independent viscous and pressure (Poisson) updates, with the pressure-Poisson problem being the least well conditioned (and most expensive) substep in the time advancement. We support two spatial discretizations for the Stokes problem: the $\mathbb{P}_N$-$\mathbb{P}_{N-2}$ formulation with a continuous velocity space and a discontinuous pressure space; and the $\mathbb{P}_N$-$\mathbb{P}_N$ approach having equal-order continuous velocity and pressure. Full details are provided in dfm02 ; fischer17 .
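The BDF$k$/EXT$k$ splitting can be sketched on a model scalar ODE. The snippet below is a hedged illustration (the model problem and its exact solution are chosen for demonstration): the linear term is treated implicitly with BDF3, the forcing explicitly with EXT3, and third-order convergence is checked by halving the timestep.

```python
import math

# Illustration of BDF3/EXT3 semi-implicit time stepping on the model ODE
#   u' = -u + f(t),  f(t) = sin(t),
# mirroring the implicit/explicit splitting described for the NSE.
# With u(0) = -1/2 the exact solution is u(t) = (sin t - cos t)/2.
def bdf3_ext3(dt, T):
    exact = lambda t: 0.5 * (math.sin(t) - math.cos(t))
    f = math.sin
    n = round(T / dt)
    u = [exact(0.0), exact(dt), exact(2 * dt)]   # bootstrap with exact values
    for k in range(2, n):
        t = k * dt
        fext = 3 * f(t) - 3 * f(t - dt) + f(t - 2 * dt)      # EXT3 of forcing
        rhs = (3 * u[-1] - 1.5 * u[-2] + u[-3] / 3) / dt + fext
        u.append(rhs / (11 / (6 * dt) + 1))                  # implicit BDF3 solve
    return abs(u[-1] - exact(n * dt))

e1, e2 = bdf3_ext3(1e-2, 1.0), bdf3_ext3(5e-3, 1.0)
print(e1 / e2)   # ~8 for a third-order scheme
```

Halving the timestep reduces the error by roughly a factor of eight, consistent with third-order accuracy of the combined BDF3/EXT3 scheme.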
The Schwarz-SEM formulation of the NSE brings forth several considerations. As in all Schwarz methods, the basic idea is to interpolate data on subdomain boundaries that are interior to the overall domain from donor subdomains, and then to solve the subdomain problems independently (in parallel) on different processor subsets. In principle, the NSE require only velocity boundary conditions. The first challenge is to produce that boundary data in an accurate and stable way. Spatial accuracy comes from using high-order (spectral) interpolation, as described in the next section.
For temporal accuracy, there are several new concerns. As a cost-saving measure, one can simply use lagged donor-velocity values (corresponding to piecewise-constant extrapolation) for the subdomain interface updates. While only first-order accurate, this scheme is stable without the need for subiterations. To realize higher-order accuracy, one can extrapolate the boundary data in time. Generally, this extrapolation must be coupled with a predictor-corrector iteration for the unsteady-Stokes substep. (The nonlinear term is already accurately treated by extrapolation and is stable provided one adheres to standard CFL stability limits.) Typically three to five subiterations are needed per timestep to ensure stability peet2012 for high-order extrapolation of the interface boundary data. Using this approach, Merrill et al. merrill2016 demonstrated exponential convergence in space and up to third-order accuracy in time for Schwarz-SEM flow applications.
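The Schwarz update itself can be illustrated on a 1D model problem. This is a sketch under simplifying assumptions: a steady Poisson problem, second-order finite differences, and linear interpolation at the interfaces rather than the spectral interpolation used in the actual method.

```python
import numpy as np

# 1D illustration of the Schwarz idea: solve -u'' = f on (0,1) with two
# overlapping subdomains, each pulling its interface boundary value from
# the other's latest solution (here by linear interpolation).
def fd_solve(x, f, ua, ub):
    # second-order FD Dirichlet solve on the uniform nodes x
    n, h = len(x), x[1] - x[0]
    A = (np.diag(2 * np.ones(n - 2))
         - np.diag(np.ones(n - 3), 1) - np.diag(np.ones(n - 3), -1))
    rhs = h**2 * f(x[1:-1])
    rhs[0] += ua
    rhs[-1] += ub
    u = np.empty(n)
    u[0], u[-1], u[1:-1] = ua, ub, np.linalg.solve(A, rhs)
    return u

f = lambda x: np.pi**2 * np.sin(np.pi * x)
exact = lambda x: np.sin(np.pi * x)
xL, xR = np.linspace(0.0, 0.6, 61), np.linspace(0.4, 1.0, 61)
uL, uR = np.zeros(61), np.zeros(61)
for it in range(30):                                   # Schwarz iterations
    uL = fd_solve(xL, f, 0.0, np.interp(0.6, xR, uR))  # interface value from right grid
    uR = fd_solve(xR, f, np.interp(0.4, xL, uL), 0.0)  # interface value from left grid
print(np.max(np.abs(uL - exact(xL))))  # approaches the FD discretization error
```

The interface errors contract geometrically with each sweep (faster for larger overlap), so after a modest number of iterations the remaining error is the discretization error of each subdomain solve.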
From a practical standpoint, our SEM domain decomposition approach is enabled by using separate MPI communicators for each overlapping mesh, which allows all of the existing solver technology (100K lines of code) to operate with minimal change. The union of these separate session (i.e., subdomain) communicators is MPI_COMM_WORLD, which is invoked for the subdomain data exchanges and for occasional collectives such as partition-of-unity-weighted integrals over the full domain. Donor-point identification and data exchange are significantly streamlined through the availability of a robust interpolation library, gslib-findpts, which obviates the need for direct development of any new MPI code, as discussed next.
2.1 Interpolation
The centerpiece of our multidomain SEM-based nonconforming Schwarz code is the fast and robust interpolation utility findpts. This utility grew out of a need to support data interrogation and Lagrangian particle tracking on large processor counts. High-fidelity interpolation for highly curved elements, such as those supported by the SEM, is quite challenging. Thus, findpts was designed with the principles of robustness (i.e., it should never fail) and speed (i.e., it should be fast).
findpts is part of gslib, a lightweight C communication package that readily links with any Fortran, C, or C++ code. findpts provides two key capabilities. First, it determines computational coordinates (the owning element and processor, and the reference-space coordinates $\mathbf{r}=(r,s,t)$) for any given point $\mathbf{x}^*=(x,y,z)$. Second, findpts_eval interpolates any given scalar field at a given set of computational coordinates (determined from findpts). Because interpolation is nonlocal (a point may originate from any processor), findpts and findpts_eval require interprocessor coordination and must be called by all processors in a given communicator. For efficiency reasons, interpolation calls are typically batched, with all queries posted in a single call, but the work can become serialized if all target points are ultimately owned by a single processor. Communication overhead is kept low by virtue of gslib's generalized and scalable all-to-all utility, gs_crystal, which is based on the crystal router algorithm of fox88 . (We note that mpi_alltoall is not scalable because its interface requires arguments of length $P$ on each of $P$ processors, which is prohibitive at large $P$.)
To find a given point $\mathbf{x}^*$, findpts first uses a hash table to identify processors that could potentially own the target point. A call to gs_crystal exchanges copies of the entries between sources and potential destinations. Once there, element-wise bounding boxes further discriminate against entries passing the hash test. At that point, a trust-region based Newton optimization is used to minimize $\|\mathbf{x}^* - \mathbf{x}(\mathbf{r})\|^2$ and determine the computational coordinates for that point. The initial guess in the minimization is taken to be the closest point on the mapped GLL mesh for the candidate element. In addition to returning the computational coordinates of a point, findpts also indicates whether a point was found inside an element, on a border, or was not found within the mesh. Details of the findpts algorithm are given in findpts .
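The Newton kernel at the heart of this search can be sketched as follows. This is a toy 2D mapping chosen for illustration; the actual findpts implementation adds the trust-region safeguards and GLL-based initial guesses described above, which are omitted here.

```python
import numpy as np

# Sketch of the findpts kernel: invert an isoparametric mapping x(r) by
# Newton iteration to recover reference coordinates r in [-1,1]^2.
# The mapping below is an arbitrary smooth deformation, for illustration.
def x_map(r, s):
    return np.array([r + 0.1 * s**2, s + 0.1 * r**2])

def jac(r, s):
    # Jacobian dx/dr of the mapping above
    return np.array([[1.0, 0.2 * s], [0.2 * r, 1.0]])

def invert(x_target, r0=(0.0, 0.0), tol=1e-12):
    r = np.array(r0, dtype=float)
    for _ in range(50):
        res = x_map(*r) - x_target
        if np.linalg.norm(res) < tol:
            break
        r -= np.linalg.solve(jac(*r), res)   # plain Newton update
    return r

r_true = np.array([0.3, -0.7])
r_found = invert(x_map(*r_true))
assert np.allclose(r_found, r_true, atol=1e-10)
```

For well-shaped elements the iteration converges in a handful of steps; the trust-region machinery in findpts exists precisely for the badly deformed cases where plain Newton can wander.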
In the context of our multidomain Schwarz solver, findpts is called at the level of MPI_COMM_WORLD. One simply posts all meshes to findpts_setup along with a pair of discriminators. The first discriminator is an integer field which, at the setup and execute phases, is equal to the subdomain number (the session id). In the findpts setup phase, each element is associated with a single subdomain $\Omega_s$, and the subdomain number $s$ is passed in as the discriminator. During the findpts execute phase, one needs boundary values for interface points on $\partial\Omega_s$ but does not want these values to be derived from elements in $\Omega_s$ itself. All interface points associated with $\partial\Omega_s$ are therefore tagged with the session id $s$, and findpts will search only elements belonging to other sessions.
In the case of multiply-overlapped domains, it is still possible to have more than one subdomain claim ownership of a given boundary point $\mathbf{x}^*$. To resolve such conflicts, we associate with each subdomain $\Omega_s$ a local distance function, $\delta_s(\mathbf{x})$, which indicates the minimum distance from any point $\mathbf{x}$ to $\partial\Omega_s$. The ownership of any boundary point shared between two or more domains is assigned to the domain that maximizes $\delta_s(\mathbf{x}^*)$. This choice is motivated by the standard Schwarz arguments, which imply that errors decay as one moves away from the interface, in accordance with the decay of the associated Green's functions. We illustrate this situation in Fig. 2, where $\mathbf{x}^*$ belongs to two overlapping domains: interpolated values (from findpts_eval) will come from the domain to which $\mathbf{x}^*$ is "more interior."
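A minimal sketch of this ownership rule, with hypothetical session ids and boundary distances:

```python
# Donor-domain selection sketch: among the subdomains that contain an
# interface point, pick the one whose own boundary is farthest away,
# i.e., the one that maximizes delta_s (the point is "most interior" there).
def pick_donor(candidates):
    # candidates: list of (session_id, distance_to_that_session_boundary)
    return max(candidates, key=lambda c: c[1])[0]

# hypothetical point found in sessions 1 and 2; session 2 is more interior
assert pick_donor([(1, 0.05), (2, 0.30)]) == 2
```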
2.2 Mass Balance
Since the pressure and the divergence-free constraint are tightly coupled in incompressible flow, even small errors at interface boundaries (due to interpolation or to the use of overlapping meshes of disparate scales) can lead to a mass imbalance in the system, resulting in erroneous and unsmooth pressure contours. For a given subdomain $\Omega_s$, the mass conservation statement for incompressible flow is simply
$\oint_{\partial\Omega_s} \mathbf{u} \cdot \hat{\mathbf{n}} \, dA \;=\; 0, \qquad (1)$

where $\hat{\mathbf{n}}$ represents the outward-pointing unit normal vector on $\partial\Omega_s$. Our goal is to find a nearby correction to the interpolated surface data that satisfies (1).

Let $\partial\Omega^D$ denote the subset of the domain boundary corresponding to Dirichlet velocity conditions and $\partial\Omega^N$ be the Neumann (outflow) subset. If $\partial\Omega^N=\emptyset$, then there is a potential to fail to satisfy (1) because the interpolated fluxes on the interface boundary $\partial\Omega^I$ may not integrate to zero. Let $\tilde{\mathbf{u}}$ denote the tentative velocity field defined on $\partial\Omega_s$ through prescribed data on $\partial\Omega^D$ and interpolation on $\partial\Omega^I$. Let $\mathbf{u}^*$ be the flux-corrected boundary data on $\partial\Omega^I$ and $\mathbf{u}^*-\tilde{\mathbf{u}}$ be the correction required to satisfy (1). One can readily show that the choice
$\mathbf{u}^* \;=\; \tilde{\mathbf{u}} \,+\, \gamma\,\hat{\mathbf{n}} \quad \text{on } \partial\Omega^I \qquad (2)$

is the minimizer over possible trace-space corrections that allow (1) to be satisfied, provided that

$\gamma \;=\; -\,\frac{\oint_{\partial\Omega_s} \tilde{\mathbf{u}} \cdot \hat{\mathbf{n}} \, dA}{\oint_{\partial\Omega^I} dA}. \qquad (3)$
The denominator of (3) of course equates to the surface area of the interface boundary on $\partial\Omega_s$. The correction (2) is imposed every time boundary data is interpolated between overlapping domains during the Schwarz iterations.
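A discrete analogue of (2)-(3) can be sketched as follows, with synthetic face areas, normals, and velocities; the real implementation applies the correction through the SEM surface quadrature.

```python
import numpy as np

# Discrete sketch of the flux correction: add a constant multiple gamma of
# the outward normal on the interpolated interface faces so the net flux
# through the whole subdomain boundary vanishes.
rng = np.random.default_rng(0)
n_faces = 40
area = rng.uniform(0.5, 1.5, n_faces)             # face areas (quadrature weights)
normal = rng.normal(size=(n_faces, 3))
normal /= np.linalg.norm(normal, axis=1)[:, None] # unit outward normals
u_tilde = rng.normal(size=(n_faces, 3))           # tentative boundary velocity
interface = np.zeros(n_faces, dtype=bool)
interface[:15] = True                             # faces on the interpolated interface

net_flux = np.sum(area * np.einsum('ij,ij->i', u_tilde, normal))
gamma = -net_flux / np.sum(area[interface])       # denominator = interface area
u_star = u_tilde + gamma * interface[:, None] * normal

corrected = np.sum(area * np.einsum('ij,ij->i', u_star, normal))
assert abs(corrected) < 1e-12                     # net flux now vanishes
```

Because $\hat{\mathbf{n}}\cdot\hat{\mathbf{n}}=1$, adding $\gamma\hat{\mathbf{n}}$ on the interface changes the net flux by exactly $\gamma$ times the interface area, which is why (3) restores (1).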
3 Results & Applications
We now present several applications that demonstrate the performance and accuracy of the Schwarz-SEM implementation.
3.1 Parallel Performance
A principal concern for the performance of Schwarz methods is the overhead associated with interpolation, especially for high-order methods, where interpolation costs scale as $O(N^3)$ per interrogation point in $\mathbb{R}^3$. Our first examples address this question.
Figure 3 shows a test domain consisting of a single spectral element in $\mathbb{R}^3$. The spiral configuration leads to many local minima in the Newton functional, but the use of the nearest GLL points as initial guesses generally avoids entrapment in false minima. On a 2.3 GHz Macbook Pro, findpts requires 0.08 seconds to find the computational coordinates of 1000 randomly distributed points in the element to the requested tolerance.
In addition to the findpts cost, we must be concerned with the overhead of the repeated findpts_eval calls, which are invoked once per subiteration for each Schwarz update. In a parallel setting, interpolation comes with the additional overhead of communication, which is handled automatically by findpts_eval. Figure 4 shows a strong-scaling plot of time versus number of processors for a calculation with $E=20{,}000$ spectral elements ($n=6.8$ million grid points) on Cetus, an IBM Blue Gene/Q at the Argonne Leadership Computing Facility. For this geometry, shown in Fig. 5, there are 2000 elements at the interface boundary, with a total of 128,000 interface points that require interpolation at each Schwarz iteration. Overlapping domains simplify mesh generation for this problem by allowing an inner mesh (twisting in the spanwise direction) to overlap with an extruded outer mesh.
Figure 4 shows scaling for the overall time per step and the time spent in findpts and findpts_eval. Due to the inherent load imbalance in overlapping grids (all interface elements might not be located on separate processors), the scaling for findpts and findpts_eval is not ideal. However, findpts takes only about 10% and findpts_eval only about 1% of the total time to solution per timestep, so the scaling of the overall method is maintained. The parallel efficiency of the calculation remains high until the number of MPI ranks exceeds the number of interface elements, beyond which it drops, in accord with performance models for the (monodomain) SEM fischer2015scaling .
3.2 Exact Solution for Decaying Vortices
Our next example demonstrates that the Schwarz implementation preserves the exponential convergence expected of the SEM. Walsh derived a family of exact eigenfunctions for the Stokes and Navier-Stokes equations based on a generalization of Taylor-Green vortices in the periodic domain $[0,2\pi]^2$. For all integer pairs $(m,n)$ satisfying $m^2+n^2=\lambda$, families of eigenfunctions can be formed by defining streamfunctions that are linear combinations of products of the form $\sin(mx)\sin(ny)$ and $\cos(mx)\cos(ny)$. Taking as an initial condition the eigenfunction $\mathbf{u}_0$, a solution to the NSE is $\mathbf{u}(\mathbf{x},t)=e^{-\nu\lambda t}\mathbf{u}_0(\mathbf{x})$. The solution is stable only for modest Reynolds numbers. Interesting long-time solutions can be realized, however, by adding a relatively high-speed mean flow to the eigenfunction. We demonstrate the multidomain capability using the three meshes illustrated in Fig. 6 (left); a periodic background mesh has a square hole in the center, while a pair of circular meshes covers the hole.

Exponential convergence of the velocity error with respect to $N$ is demonstrated in Fig. 6 (right). (Here, the norm is the pointwise maximum of the 2-norm of the vector field, i.e., $\max_{\mathbf{x}}\|\mathbf{u}(\mathbf{x})\|_2$.) For extrapolation orders $k=1$, 2, and 3, $Q$ was set to 1, 4, and 7 to ensure stability.
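The Walsh construction can be checked numerically. The sketch below, with an arbitrarily chosen mode pair, verifies that the velocity field derived from the streamfunction is a Laplacian eigenfunction with the stated eigenvalue, which is what makes the exponentially decaying solution exact.

```python
import numpy as np

# Check of the Walsh construction: the streamfunction
#   psi = sin(m x) sin(n y)
# yields a velocity field u0 = (psi_y, -psi_x) that is a Laplacian
# eigenfunction with eigenvalue lam = -(m^2 + n^2), so
#   u(x, t) = exp(nu * lam * t) * u0(x)
# decays exponentially at the rate used in the text.
m, n = 3, 4
lam = -(m**2 + n**2)
u0 = lambda x, y: np.array([ n * np.sin(m * x) * np.cos(n * y),
                            -m * np.cos(m * x) * np.sin(n * y)])

# finite-difference Laplacian at a sample point confirms the eigenvalue
x, y, h = 0.7, 1.1, 1e-4
lap = (u0(x + h, y) + u0(x - h, y) + u0(x, y + h) + u0(x, y - h)
       - 4 * u0(x, y)) / h**2
assert np.allclose(lap, lam * u0(x, y), atol=1e-3)
```

Since the velocity is an eigenfunction, the viscous term reproduces the field up to the factor $\nu\lambda$, and (for this special family) the nonlinear term is balanced by a pressure gradient, giving the closed-form decaying solution.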
3.3 Vortex Breakdown
Escudier escudier1984 studied vortex breakdown in a cylindrical container with a lid rotating at a prescribed angular velocity. He considered cylinders of various height-to-radius ratios at different rotation rates. Sotiropoulos & Ventikos sotiropoulos1998 performed a computational study of this experiment, comparing the structure and location of the bubbles that form as a result of vortex breakdown. Here, we use overlapping grids for one of these cases to compare our results against escudier1984 , sotiropoulos1998 , and a monodomain SEM-based solution. The monodomain mesh has 140 elements. The overlapping mesh was generated by cutting the monodomain mesh across the cylinder axis and extruding the two halves. The calculations were run with no subiteration for 2000 convective time units to reach steady state. Figure 7 compares the axial velocity along the centerline for the monodomain and overlapping-grid solutions, and Table 1 compares these solutions with Escudier and Sotiropoulos. At this resolution, the Schwarz-SEM and SEM results agree to within 1 percent of each other and to within 1 to 2 percent of the results of sotiropoulos1998 .
Overlapping  0.4222  0.7752  0.9596  1.1186 
Monodomain  0.4224  0.7748  0.9609  1.1191 
Escudier  0.42  0.74  1.04  1.18 
Sotiropoulos  0.42  0.772  0.928  1.09 
3.4 Turbulent Channel Flow
Turbulent boundary layers are one of the applications where overlapping grids offer the potential for significant savings. These flows feature fine-scale structures near the wall, with relatively larger scales in the far field. As a first step toward addressing this class of problems, we validate our Schwarz-SEM scheme for turbulent channel flow, for which abundant data is available in the literature. In particular, we compare mono- and multidomain SEM results at friction Reynolds number $Re_\tau \approx 180$ with the direct numerical simulation (DNS) results of Moser et al. moser1999 , who used 2.1 million grid points, and with the DNS of Vreman & Kuerten vreman2014 , which used 14.2 million gridpoints. The Reynolds number $Re_\tau = u_\tau \delta/\nu$ is based on the friction velocity $u_\tau$ at the wall, the channel half-height $\delta$, and the fluid kinematic viscosity $\nu$, with $u_\tau = \sqrt{\tau_w/\rho}$ determined using the wall shear stress $\tau_w$ and the fluid density $\rho$.
Table 2 lists the key parameters for the four different calculations. Following moser1999 and vreman2014 , the streamwise and spanwise lengths of the channel were $4\pi\delta$ and $\tfrac{4}{3}\pi\delta$, respectively. Statistics for all SEM results were collected over 50 convective time units.
Gridsize  

Monodomain  179.9  
Overlapping  179.9  
Moser  178.1  
Vreman  180 
Figure 8 shows the mean streamwise velocity ($U^+$) and fluctuations ($u^+_{rms}$, $v^+_{rms}$, and $w^+_{rms}$) versus $y^+$ for each case, where $y$ is the distance from the nearest wall. For the Schwarz case, several combinations of resolution ($N$) and subiteration count ($Q$) were considered. We quantify the relative error for each quantity by computing the maximum norm of the relative percent difference from the results of Vreman & Kuerten. Specifically, define
$\Delta q(y) \;=\; 100\,\frac{q(y)-q_v(y)}{\max_y |q_v(y)|} \qquad (4)$

and

$e_q \;=\; \max_y |\Delta q(y)| \qquad (5)$
for each quantity $q$ = $U^+$, $u^+_{rms}$, $v^+_{rms}$, and $w^+_{rms}$ in Table 3. All statistics are within approximately one percent of the results of vreman2014 . The results indicate that increasing the number of Schwarz iterations and the resolution leads to better agreement, as expected.
Monodomain,  0.17  1.00  0.60  0.57 
Monodomain,  0.16  0.46  0.49  0.71 
Overlapping, , , = 1  0.43  1.69  0.89  0.77 
Overlapping, , , = 3  0.21  0.92  0.60  1.05 
Overlapping, , , = 1  0.17  1.58  0.74  0.31 
Moser et al.  0.04  0.15  0.52  1.62 
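The error metric of (4)-(5) amounts to a max-norm percent difference; a small sketch with synthetic profiles follows (the reference profile here is a stand-in, not DNS data).

```python
import numpy as np

# Sketch of the error metric: max-norm of the percent difference between a
# computed profile q and a reference profile q_ref, normalized by the peak
# of the reference.
def rel_err_percent(q, q_ref):
    return 100.0 * np.max(np.abs(q - q_ref)) / np.max(np.abs(q_ref))

y = np.linspace(0.0, 1.0, 100)
q_ref = 2.5 * np.log(1 + 180 * y) + 5.5 * y    # stand-in reference profile
q = q_ref * (1 + 0.002 * np.sin(5 * y))        # computed profile, ~0.2% off
assert rel_err_percent(q, q_ref) < 1.0
```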
3.5 Heat Transfer Enhancement
The effectiveness of wire-coil inserts in increasing heat transfer in pipe flow has been studied through an extensive set of experiments by Collins et al. at Argonne National Laboratory collins2002 . Monodomain spectral element simulations for this configuration, based on 2D meshes that were extruded and twisted, were described in goering2016 . Here, we consider an overlapping mesh approach, pictured in Fig. 9, in which a 2D mesh is extruded azimuthally with a helical pitch. The singularity at the pipe centerline is avoided by using a second (overlapping) mesh to model the central flow channel. The overlapping mesh avoids the topological constraint of mesh conformity and leads to better mesh quality. Mesh smoothing mittal2018 further improves the conditioning of the system for the pressure Poisson equation. This approach also allows us to consider noncircular (e.g., square) casings, which are not readily accessible with the monodomain approach. As a first step toward these more complex configurations, we validate our Schwarz-SEM method with this real-world turbulent heat transfer problem.
Here, we present results from calculations for two different pitches of the wire-coil insert, a long-pitch and an optimal-pitch configuration, to demonstrate the accuracy of our method. The geometry is characterized by the wire diameter, the coil pitch, and the inner diameter of the pipe. The Reynolds number and the Prandtl number ($Pr = 5.8$) match the experimental conditions. Figure 9 shows a typical wire-coil insert (left), the overlapping meshes used to discretize the problem (center), and a plot of velocity magnitude for the flow (right).
For the long-pitch case, the Nusselt number was determined to be 100.25 by the experiment, 112.89 by the monodomain calculation, and 113.34 by the overlapping-grid calculation. For the optimal-pitch case, the Nusselt number was determined to be 184.25 by the experiment and 186 by the overlapping-grid calculation. The overlapping-grid calculations were spatially converged. Figure 10 shows a slice of the velocity and temperature contours for the optimal-pitch calculation.
4 Conclusion
We have presented a scalable Schwarz-SEM method for incompressible flow simulation. Use of an extended gslib-findpts library for donor-element identification and for data interpolation obviates the need for direct development of MPI code, even in the presence of multiple overlapping meshes. Strong-scaling tests show that the parallel performance meets theoretical expectations. We have introduced mass-flux corrections to ensure divergence-free flow, even in the presence of interpolation error. We demonstrated exponential convergence for the method and showed excellent agreement in several complex three-dimensional flow configurations.
5 Acknowledgments
This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. An award of computer time on Blue Waters was provided by the National Center for Supercomputing Applications. Blue Waters is a sustained-petascale HPC system and a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. The Blue Waters sustained-petascale computing project is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility.
6 References
 (1) M. O. Deville, P. F. Fischer, E. H. Mund, High-Order Methods for Incompressible Fluid Flow, Vol. 9, Cambridge University Press, 2002.
 (2) S. Dutta, P. Fischer, M. H. Garcia, Large Eddy Simulation (LES) of flow and bedload transport at an idealized 90degree diversion: Insight into Bulle effect, River Flow 2016: Iowa City, USA, July 1114, 2016 (2016) 101.
 (3) S. M. Hosseini, R. Vinuesa, P. Schlatter, A. Hanifi, D. S. Henningson, Direct numerical simulation of the flow around a wing section at moderate Reynolds number, International Journal of Heat and Fluid Flow 61 (2016) 117–128.
 (4) E. Merzari, A. Obabko, P. Fischer, Spectral element methods for liquid metal reactors applications, arXiv preprint arXiv:1711.09307.
 (5) S. A. Orszag, Spectral methods for problems in complex geometries, in: Numerical Methods for Partial Differential Equations, Elsevier, 1979, pp. 273–305.
 (6) A. T. Patera, A spectral element method for fluid dynamics: laminar flow in a channel expansion, Journal of computational Physics 54 (3) (1984) 468–488.
 (7) W. Chan, R. Gomez, S. E. Rogers, P. G. Buning, Best practices in overset grid generation, AIAA Paper 2002-3191, in: 32nd Fluid Dynamics Conference, St. Louis, MO, 2002.
 (8) J. L. Steger, F. C. Dougherty, J. A. Benek, A chimera grid scheme [multiple overset body-conforming mesh system for finite difference adaptation to complex aircraft configurations], 1983.
 (9) J. B. Angel, J. W. Banks, W. D. Henshaw, A high-order accurate FDTD scheme for Maxwell's equations on overset grids, in: 2018 International Applied Computational Electromagnetics Society Symposium (ACES), IEEE, 2018, pp. 1–2.
 (10) D. D. Chandar, B. Boppana, V. Kumar, A comparative study of different overset grid solvers between OpenFOAM, STAR-CCM+ and ANSYS Fluent, in: 2018 AIAA Aerospace Sciences Meeting, 2018, p. 1564.
 (11) A. Koblitz, S. Lovett, N. Nikiforakis, W. Henshaw, Direct numerical simulation of particulate flows with an overset grid method, Journal of Computational Physics 343 (2017) 414–431.
 (12) CD-adapco, STAR-CCM+ Version 7.02.008, User Manual.
 (13) R. H. Nichols, P. G. Buning, User's Manual for OVERFLOW 2.1, University of Alabama and NASA Langley Research Center.
 (14) F. Bassetti, D. Brown, K. Davis, W. Henshaw, D. Quinlan, Overture: an objectoriented framework for high performance scientific computing, in: Proceedings of the 1998 ACM/IEEE conference on Supercomputing, IEEE Computer Society, 1998, pp. 1–9.
 (15) J. G. Coder, D. Hue, G. Kenway, T. H. Pulliam, A. J. Sclafani, L. Serrano, J. C. Vassberg, Contributions to the sixth drag prediction workshop using structured, overset grid methods, Journal of Aircraft (2017) 1–14.
 (16) W. D. Henshaw, A fourth-order accurate method for the incompressible Navier-Stokes equations on overlapping grids, Journal of Computational Physics 113 (1) (1994) 13–25.
 (17) M. J. Brazell, J. Sitaraman, D. J. Mavriplis, An overset mesh approach for 3d mixed element highorder discretizations, Journal of Computational Physics 322 (2016) 33–51.
 (18) J. A. Crabill, J. Sitaraman, A. Jameson, A highorder overset method on moving and deforming grids, in: AIAA Modeling and Simulation Technologies Conference, 2016, p. 3225.
 (19) J. Aarnes, N. Haugen, H. Andersson, High-order overset grid method for detecting particle impaction on a cylinder in a cross flow, arXiv preprint arXiv:1805.10039.
 (20) L. Cambier, S. Heib, S. Plot, The ONERA elsA CFD software: input from research and feedback from industry, Mechanics & Industry 14 (3) (2013) 159–174.
 (21) B. E. Merrill, Y. T. Peet, P. F. Fischer, J. W. Lottes, A spectrally accurate method for overlapping grid solution of incompressible Navier-Stokes equations, Journal of Computational Physics 307 (2016) 60–93.
 (22) J. Malm, P. Schlatter, P. F. Fischer, D. S. Henningson, Stabilization of the spectral element method in convection dominated flows by recovery of skew-symmetry, Journal of Scientific Computing 57 (2) (2013) 254–277.
 (23) P. Fischer, M. Schmitt, A. Tomboulides, Recent developments in spectral element simulations of movingdomain problems, in: Recent Progress and Modern Challenges in Applied Mathematics, Modeling and Computational Science, Springer, 2017, pp. 213–244.
 (24) Y. T. Peet, P. F. Fischer, Stability analysis of interface temporal discretization in grid overlapping methods, SIAM Journal on Numerical Analysis 50 (6) (2012) 3375–3401.
 (25) G. Fox, W. Furmanski, Hypercube algorithms for neural network simulation: the crystal_accumulator and the crystal router, in: Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, Vol. 1, ACM, 1988, pp. 714–724.
 (26) A. Noorani, A. Peplinski, P. Schlatter, Informal introduction to program structure of spectral interpolation in nek5000.
 (27) P. F. Fischer, Scaling limits for PDE-based simulation, in: 22nd AIAA Computational Fluid Dynamics Conference, 2015, p. 3049.
 (28) M. Escudier, Observations of the flow produced in a cylindrical container by a rotating endwall, Experiments in fluids 2 (4) (1984) 189–196.
 (29) F. Sotiropoulos, Y. Ventikos, Transition from bubbletype vortex breakdown to columnar vortex in a confined swirling flow, International journal of heat and fluid flow 19 (5) (1998) 446–458.
 (30) R. D. Moser, J. Kim, N. N. Mansour, Direct numerical simulation of turbulent channel flow up to Re_τ = 590, Physics of Fluids 11 (4) (1999) 943–945.
 (31) A. Vreman, J. G. Kuerten, Comparison of direct numerical simulation databases of turbulent channel flow at Re_τ = 180, Physics of Fluids 26 (1) (2014) 015102.
 (32) J. T. Collins, C. M. Conley, J. N. Attig, M. M. Baehl, Enhanced heat transfer using wirecoil inserts for highheatload applications., Tech. rep., Argonne National Lab., IL (US) (2002).
 (33) A. Y. Goering, Numerical investigation of wire coil heat transfer augmentation, Ph.D. thesis (2016).
 (34) K. Mittal, P. Fischer, Mesh smoothing for the spectral element method, Journal of Scientific Computing (2018) 1–22.