Nektar++: enhancing the capability and application of high-fidelity spectral/hp element methods

06/08/2019, by David Moxey et al.

Nektar++ is an open-source framework that provides a flexible, high-performance and scalable platform for the development of solvers for partial differential equations using the high-order spectral/hp element method. In particular, Nektar++ aims to overcome the complex implementation challenges that are often associated with high-order methods, thereby allowing them to be more readily used in a wide range of application areas. In this paper, we present the algorithmic, implementation and application developments associated with our Nektar++ version 5.0 release. We describe some of the key software and performance developments, including our strategies on parallel I/O, on in situ processing, the use of collective operations for exploiting current and emerging hardware, and interfaces to enable multi-solver coupling. Furthermore, we provide details on a newly developed Python interface that enables a more rapid introduction for new users unfamiliar with spectral/hp element methods, C++ and/or Nektar++. This release also incorporates a number of numerical method developments - in particular: the method of moving frames, which provides an additional approach for the simulation of equations on embedded curvilinear manifolds and domains; a means of handling spatially variable polynomial order; and a novel technique for quasi-3D simulations to permit spatially-varying perturbations to the geometry in the homogeneous direction. Finally, we demonstrate the new application-level features provided in this release, namely: a facility for generating high-order curvilinear meshes called NekMesh; a novel AcousticSolver for aeroacoustic problems; and our development of a 'thick' strip model for the modelling of fluid-structure interaction problems in the context of vortex-induced vibrations. We conclude by commenting on some directions for future code development and expansion.


1 Introduction

High-order finite element methods are becoming increasingly popular in both academia and industry, as scientists and technological innovators strive to increase the fidelity and accuracy of their simulations whilst retaining computational efficiency. The spectral/hp element method in particular, which combines the geometric flexibility of classical linear finite element methods with the attractive convergence properties of spectral discretisations, can yield a number of advantages in this regard. From a numerical analysis perspective, its diffusion and dispersion characteristics mean that it is ideally suited to applications such as fluid dynamics, where flow structures must be convected across long time- and length-scales without suffering from artificial dissipation moura2017eddy ; moura2017setting ; mengaldo2018spatial_a ; mengaldo2018spatial_b ; fernandez2019non .

High-order methods are also less computationally costly than traditional low-order numerical schemes for a given number of degrees of freedom, owing to their ability to exploit more locally compact and dense elemental operators when compared to sparse low-order discretisations VoShKi10 ; CaShKiKe11a ; CaShKiKe11b . In addition, high-order methods in various formulations can be seen to encapsulate other existing methods, such as finite volume, finite difference (e.g. summation-by-parts finite difference gassner2013skew ), finite element, and flux reconstruction approaches mengaldo2016connections ; mengaldo2015discontinuous . All of these features make the spectral/hp element method an extremely attractive tool to practitioners.

As the name suggests, the spectral/hp element method relies on the tessellation of a computational domain into a collection of elements of arbitrary size that form a mesh, where each element is equipped with a polynomial expansion of arbitrary and potentially spatially-variable order KaSh05 . Within this definition, we include continuous Galerkin (CG) and discontinuous Galerkin (DG) methods, along with their variants. High-order methods have historically been seen as complex to implement, and their adoption has consequently been limited to academic groups and numerical analysts. This perception is rapidly changing thanks to the development of open-source numerical libraries that facilitate the implementation of new high-fidelity solvers for problems arising across a broad spectrum of research areas in engineering, biomedicine, numerical weather and climate prediction, and economics. An additional challenge in the use of high-order methods, particularly for problems involving complex geometries, is the generation of a curvilinear mesh that conforms to the underlying curves and surfaces. However, advances in curvilinear mesh generation (such as turner2018curvilinear ), combined with open-source efforts to increase their prevalence, mean that simulations across extremely complex geometries are now possible.

Nektar++ is a project initiated in 2005 (the first commit to an online repository was made on 4 May 2006), with the aim of facilitating the development of high-fidelity, computationally-efficient and scalable spectral element solvers, thereby helping close the gap between application-domain experts (or users) and numerical-method experts (or developers). Along with Nektar++, other packages that implement such high-order methods have been developed in the past several years. Nek5000, developed at Argonne National Laboratory, implements CG discretisations mainly for solving incompressible and low-Mach number flow problems on hexahedral meshes, using the classical spectral element method of collocated Lagrange interpolants Nek5000 . Semtex BlSh04 is a fluid dynamics code that also uses the classical spectral element method in two dimensions, with the use of a 1D pseudospectral Fourier expansion for three-dimensional problems that incorporate a homogeneous component of geometry, either in Cartesian or cylindrical coordinate systems. deal.II BaHaKa07 is a more generic finite element framework, which likewise restricts its element choices to quadrilaterals and hexahedra, but permits the solution of a wide array of problems, alongside the use of adaptive mesh refinement. Flexi hindenlang2012explicit and its spinoff Fluxo fluxo_SplitForm , developed at the University of Stuttgart and the University of Cologne, implement discontinuous Galerkin methods for flow problems on hexahedral meshes. GNuME, and its NUMO and NUMA components developed at the Naval Postgraduate School, implement both continuous and discontinuous Galerkin methods mainly for weather and climate prediction purposes giraldo2008study ; abdi2016efficient . PyFR WiFaVi14 , developed at Imperial College London, implements the flux reconstruction approach huynh2007flux , which shares various numerical properties with DG in particular mengaldo2016connections ; allaneau2011connections . DUNE implements a DG formulation, among a wide variety of other numerical methods such as finite difference and finite volume methods DeKlNoOh11 .

Nektar++ is a continuation of an earlier code, Nektar, itself developed at Brown University originally using the C programming language, with some parts extended to use C++. Nektar++ is instead written in C++, and greatly exploits its object-oriented capabilities. The aim of Nektar++ is to encapsulate many of the high-order discretisation strategies mentioned previously in a readily accessible framework. The current release includes both CG and DG methods and, arguably, its distinguishing feature is its inherent support for complex geometries through various unstructured high-order element types: namely hexahedra, tetrahedra, prisms and pyramids for three-dimensional problems, and quadrilaterals and triangles for two-dimensional problems. Both CG and DG can be used on meshes that contain different element shapes (also referred to as hybrid meshes), and allow for curvilinear element boundaries in the proximity of curved geometrical shapes. Along with these spatial discretisations, Nektar++ supports so-called quasi-3D simulations in a manner similar to Semtex, where a geometrically complex 2D spectral element mesh is combined with a classical 1D Fourier expansion in a third, geometrically homogeneous, direction. This mixed formulation can significantly enhance computational efficiency for problems of the appropriate geometry BlSh04 , and Nektar++ supports a number of different parallelisation strategies for this approach bolis2016adaptable . The time discretisation is achieved using a general linear method formulation for the encapsulation of implicit, explicit and mixed implicit-explicit timestepping schemes VoEsBoChKi11 . While the main purpose of the library is to create an environment under which users can develop novel solvers for the applications of their interest, Nektar++ already includes fully-fledged solvers for several common systems, including fluid flows governed by the incompressible or compressible Navier-Stokes and Euler equations, and advection-diffusion-reaction problems, including on a manifold, with specific applications to cardiac electrophysiology CaYaKiPeSh14 . One of the main shortcomings of the spectral/hp element method is a perceived lack of robustness, arising from its low dissipation, which can be a significant challenge for industrial applications. Nektar++ implements several techniques to address this problem, namely efficient dealiasing techniques mengaldo2015dealiasing ; winters2018comparative and spectral vanishing viscosity kirby2006stabilisation , which have proved invaluable for particularly challenging applications lombard2015implicit .

The scope of this review is to highlight the substantial number of new developments in Nektar++ since the last publication related to the software, released in 2015 and coinciding with the version 4 release of Nektar++ nektarpp2015 . Since this release, over 7,000 commits have been added to the main code for the version 5 documented here, with a key focus on expanding the capability of the code to provide efficient high-fidelity simulations for challenging problems in various scientific and engineering fields. To this end, the paper is organized as follows. After a brief review of the formulation in Section 2, in Section 3 we present our software and performance developments. This includes our strategies on parallel I/O; in situ processing; the use of collective linear algebra operations for exploiting current and emerging hardware; and interfaces for multi-solver coupling to enable multi-physics simulations using Nektar++. Furthermore, we provide details on our new Python interfaces that enable more rapid on-boarding of new users unfamiliar with either spectral/hp element methods, C++ or Nektar++. In Section 4, we then present recent numerical method developments – in particular, the method of moving frames (MMF); recently added support for spatially-variable polynomial order for p-adaptive simulations; and new ways of incorporating global mappings to enable spatially variable quasi-3D approaches. In Section 5, we then demonstrate some of the new features provided in our new release, namely: our new facility for generating high-order (curvilinear) meshes called NekMesh; a new AcousticSolver for aeroacoustic problems; and our development of a 'thick' strip model for enabling the solution of fluid-structure interaction (FSI) problems in the context of vortex-induced vibrations (VIV). We conclude in Section 7 by commenting on some lessons learned and by discussing some directions for future code development and expansion.

Contributors

Nektar++ has been developed across more than a decade, and we would like to explicitly acknowledge the many people who have made contributions to the specific application codes distributed with the libraries. In addition to the coauthors of the previous publication nektarpp2015 we would like to explicitly thank the following for their contributions:

  • Dr. Rheeda Ali (Department of Biomedical Engineering, Johns Hopkins University, USA) and Dr. Caroline Roney (Biomedical Engineering Department, King’s College London, UK) for their work on the cardiac electrophysiology solver;

  • Dr. Michael Barbour and Dr. Kurt Sansom (Department of Mechanical Engineering, University of Washington, USA) for developments related to biological flows;

  • Mr. Filipe Buscariolo (Department of Aeronautics, Imperial College London, UK) for contributions to the incompressible Navier-Stokes solver;

  • Dr. Jeremy Cohen (Department of Aeronautics, Imperial College London, UK) for work relating to cloud deployment and the Nekkloud interface;

  • Mr. Ryan Denny (Department of Aeronautics, Imperial College London, UK) for enhancements in the 1D arterial pulse wave model;

  • Mr. Jan Eichstädt (Department of Aeronautics, Imperial College London, UK) for initial investigations towards using many-core and GPU platforms;

  • Dr. Stanisław Gepner (Faculty of Power and Aeronautical Engineering, Warsaw University of Technology, Poland) for enhancements in the Navier-Stokes linear stability solver;

  • Mr. Dav de St. Germain (SCI Institute, University of Utah, USA) for enhancements of timestepping schemes;

  • Mr. Ashok Jallepalli (SCI Institute, University of Utah, USA) for initial efforts on the integration of SIAC filters into post-processing routines;

  • Prof. Dr. Johannes Janicka and Dr. Kilian Lackhove (Department of Energy and Power Plant Technology, Technische Universität Darmstadt, Germany) for support and development of the acoustic solver and solver coupling;

  • Mr. Edward Laughton (College of Engineering, Mathematics and Physical Sciences, University of Exeter, UK) for testing enhancements and initial efforts on non-conformal grids;

  • Dr. Rodrigo Moura (Divisão de Engenharia Aeronáutica, Instituto Tecnológico de Aeronáutica, Brasil) for numerical developments related to spectral vanishing viscosity stabilisation;

  • Dr. Rupert Nash and Dr. Michael Bareford (EPCC, University of Edinburgh, UK) for their work on parallel I/O; and

  • Mr. Zhenguo Yan and Mr. Yu Pan (Department of Aeronautics, Imperial College London, UK) for development of the compressible flow solver.

2 Methods

In this section, we outline the mathematical framework that underpins Nektar++, as originally presented in nektarpp2015 ; jhd . Nektar++ supports a variety of spatial discretisation choices, primarily based around the continuous and discontinuous Galerkin methods (CG and DG). However, in the majority of cases CG and DG use the same generic numerical infrastructure. Here we therefore present a brief overview and refer the reader to KaSh05 , which contains a comprehensive description of the method and its corresponding implementation choices, for further details.

Without loss of generality, the broad goal of Nektar++ is to provide a framework for the numerical solution of partial differential equations (PDEs) of the form $\mathcal{L}u = 0$ on a domain $\Omega$, which may be geometrically complex, for some solution $u$. Practically, in order to carry out a spatial discretisation of the PDE problem, $\Omega$ needs to be partitioned into a finite number of $d$-dimensional non-overlapping elements $\Omega^e$, where in Nektar++ we support $1 \le d \le 3$, such that the collection of all elements recovers the original region ($\bigcup_e \Omega^e = \Omega$) and, for $e_1 \neq e_2$, $\Omega^{e_1} \cap \Omega^{e_2}$ is an empty set or an interface of dimension $\bar{d} < d$. The domain may be embedded in a space of equal or higher dimension $\hat{d} \ge d$, as described in CaYaKiPeSh14 . One of the distinguishing features of Nektar++ is that it supports a wide variety of elements: namely triangles and quadrilaterals in two dimensions, and tetrahedra, pyramids, prisms and hexahedra in three dimensions. This makes it broadly suitable for the solution of problems in complex domains, in which hybrid meshes of multiple element types are generally required.

Nektar++ supports the solution of PDE systems that are either steady-state or time-dependent; for time-dependent problems, there is subsequently a choice to use either explicit, implicit or implicit-explicit timestepping. From an implementation and formulation perspective, steady-state and implicit-type problems typically require the efficient solution of a system of linear equations, whereas explicit-type problems rely on the evaluation of the spatially discretised mathematical operators. In the following sections, we briefly outline the support in Nektar++ for these regimes.

2.1 Implicit-type methods

This approach follows the standard finite element method: before establishing the spatial discretisation, the abstract operator form is formulated in the weak sense alongside appropriate trial and test spaces $\mathcal{U}$ and $\mathcal{V}$. In general, we require at least first-order derivatives and select, for example, $\mathcal{U} = \mathcal{V} = H^1(\Omega)$, where

$H^1(\Omega) = \{ v \in L^2(\Omega) : \nabla v \in [L^2(\Omega)]^d \}.$

Following the Galerkin approach, we select $\mathcal{V} = \mathcal{U}$. We note that where problems involve Dirichlet boundary conditions of the form $u = g_D$ on a boundary $\partial\Omega_D$, we typically enforce this by lifting the boundary data. For illustrative purposes, we assume that $\mathcal{L}$ is linear and its corresponding weak form can be expressed as: find $u \in \mathcal{U}$ such that

$a(u, v) = \ell(v) \quad \forall\, v \in \mathcal{V}, \qquad (1)$

where $a(\cdot,\cdot)$ is a bilinear form and $\ell(\cdot)$ is a linear form.

To numerically solve the problem given in Equation (1) with the spatial partition of $\Omega$, we consider solutions in an appropriate finite-dimensional subspace $\mathcal{U}^\delta \subset \mathcal{U}$. In a high-order setting, these spaces correspond to the choice of spatial discretisation on the mesh. For example, in the discontinuous setting we choose

$\mathcal{U}^\delta = \{ v \in L^2(\Omega) : v|_{\Omega^e} \in \mathcal{P}_P(\Omega^e) \},$

where $\mathcal{P}_P(\Omega^e)$ represents the space of polynomials on $\Omega^e$ up to total degree $P$, so that functions are permitted to be discontinuous across elemental boundaries. In the continuous setting, we select the same space intersected with $C^0(\Omega)$, so that expansions are continuous across elemental boundaries. The solution to the weak form in Equation (1) can then be cast as: find $u^\delta \in \mathcal{U}^\delta$ such that

$a(u^\delta, v^\delta) = \ell(v^\delta) \quad \forall\, v^\delta \in \mathcal{V}^\delta. \qquad (2)$

Assuming that the solution can be represented as $u^\delta(\boldsymbol{x}) = \sum_n \hat{u}_n \Phi_n(\boldsymbol{x})$, a weighted sum of trial functions $\Phi_n$ defined on $\Omega$, the problem then becomes that of finding the coefficients $\hat{u}_n$, which in the Galerkin approach translates into the solution of a system of linear equations.
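To make this concrete, the following minimal sketch performs a one-dimensional Galerkin projection using a Legendre basis in plain NumPy. It is purely illustrative and does not use the Nektar++ API: the mass matrix and load vector are assembled from quadrature, and the coefficients follow from a single linear solve.

# Sketch: 1D Galerkin L^2 projection with a Legendre basis (plain NumPy,
# not the Nektar++ API).
import numpy as np
from numpy.polynomial import legendre

P = 7                               # polynomial order
Q = P + 2                           # quadrature exact for degree 2Q-1 >= 2P
z, w = legendre.leggauss(Q)         # Gauss-Legendre points/weights on [-1, 1]

# B[n, q] = phi_n(z_q), with phi_n the n-th Legendre polynomial.
B = np.stack([legendre.Legendre.basis(n)(z) for n in range(P + 1)])

f = np.cos(z)                       # function to project, at quadrature points
M = (B * w) @ B.T                   # mass matrix M_mn = (phi_m, phi_n)
rhs = (B * w) @ f                   # load vector (phi_m, f)
u_hat = np.linalg.solve(M, rhs)     # expansion coefficients: one linear solve

# The projection reproduces the smooth function to high accuracy.
print(np.max(np.abs(u_hat @ B - f)))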

Figure 1: Coordinate systems and mappings between collapsed coordinates $\boldsymbol{\eta}$, reference coordinates $\boldsymbol{\xi}$ and Cartesian coordinates $\boldsymbol{x}$ for a high-order triangular element $\Omega^e$.

To establish the global basis $\Phi_n$, we first consider the contributions from each element in the domain. To simplify the definition of basis functions on each element, we follow the standard approach where $\Omega^e$ is mapped from a standard reference space $\Omega_{\mathrm{st}}$ by a parametric mapping $\chi^e : \Omega_{\mathrm{st}} \to \Omega^e$, so that $\Omega^e = \chi^e(\Omega_{\mathrm{st}})$. Here, $\Omega_{\mathrm{st}}$ is one of the supported reference regions in Table 1, and $\boldsymbol{\xi}$ are $d$-dimensional coordinates representing positions in a reference element, distinguishing them from $\boldsymbol{x}$, which are $d$-dimensional coordinates in the Cartesian coordinate space. On triangular, tetrahedral, prismatic and pyramidal elements, one or more of the coordinate directions of a quadrilateral or hexahedral region are collapsed to form the appropriate reference shape, creating one or more singular vertices within these regions Du91 ; ShKa95 . Operations, such as calculating derivatives, map the tensor-product coordinate system to these shapes through Duffy transformations Du82 (for example, mapping the triangular region to the quadrilateral region) to allow these methods to be well-defined. The relationship between these coordinates is depicted in Figure 1.
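As a concrete illustration of the collapsed-coordinate idea, the sketch below implements the standard Duffy-type mapping between the reference quadrilateral and the reference triangle, following the construction in Karniadakis & Sherwin; it is an illustrative stand-alone function, not the Nektar++ implementation.

# Sketch: Duffy/collapsed-coordinate mapping between the reference
# quadrilateral (eta) and the reference triangle (xi).
import numpy as np

def quad_to_tri(eta1, eta2):
    """Map (eta1, eta2) in [-1, 1]^2 to the triangle
    {(xi1, xi2) : -1 <= xi1, xi2; xi1 + xi2 <= 0}."""
    return 0.5 * (1 + eta1) * (1 - eta2) - 1, eta2

def tri_to_quad(xi1, xi2):
    """Inverse map; singular along the collapsed edge xi2 = 1."""
    return 2 * (1 + xi1) / (1 - xi2) - 1, xi2

# Round trip away from the singular (collapsed) vertex:
eta = (0.3, -0.5)
xi = quad_to_tri(*eta)
assert xi[0] + xi[1] <= 0                  # lies inside the reference triangle
assert np.allclose(tri_to_quad(*xi), eta)  # mapping inverts cleanly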

The mapping $\chi^e$ need not necessarily exhibit a constant Jacobian, so that the resulting element is deformed, as shown in Figure 1. Nektar++ represents the curvature of these elements by taking a sub- or iso-parametric mapping for $\chi^e$, so that the curvature is defined using at least a subset of the basis functions used to represent the solution field. The ability to use such elements in high-order simulations is critical in the simulation of complex geometries, as without curvilinear elements one could not accurately represent the underlying curves and surfaces of the geometry, as demonstrated in bassi1997high . The generation of meshes involving curved elements is, however, a challenging problem. Our efforts to generate such meshes, as well as to adapt linear meshes for high-order simulations, are implemented in the NekMesh generator tool described in Section 5.1.

Name           Class      Domain definition
Segment        StdSeg     $\{\xi_1 : -1 \le \xi_1 \le 1\}$
Quadrilateral  StdQuad    $\{(\xi_1,\xi_2) : -1 \le \xi_1, \xi_2 \le 1\}$
Triangle       StdTri     $\{(\xi_1,\xi_2) : -1 \le \xi_1, \xi_2;\ \xi_1 + \xi_2 \le 0\}$
Hexahedron     StdHex     $\{(\xi_1,\xi_2,\xi_3) : -1 \le \xi_1, \xi_2, \xi_3 \le 1\}$
Prism          StdPrism   $\{(\xi_1,\xi_2,\xi_3) : -1 \le \xi_1, \xi_2, \xi_3;\ \xi_1 + \xi_3 \le 0\}$
Pyramid        StdPyr     $\{(\xi_1,\xi_2,\xi_3) : -1 \le \xi_1, \xi_2, \xi_3;\ \xi_1 + \xi_3 \le 0,\ \xi_2 + \xi_3 \le 0\}$
Tetrahedron    StdTet     $\{(\xi_1,\xi_2,\xi_3) : -1 \le \xi_1, \xi_2, \xi_3;\ \xi_1 + \xi_2 + \xi_3 \le -1\}$
Table 1: List of supported elemental reference regions.

With the mapping $\chi^e$ and the transformation to the reference element, the discrete approximation $u^\delta$ to the solution on a single element $\Omega^e$ can then be expressed as

$u^\delta(\boldsymbol{x}) = \sum_n \hat{u}^e_n\, \phi_n\!\left[(\chi^e)^{-1}(\boldsymbol{x})\right],$

where $\phi_n$ form a basis of $\mathcal{P}(\Omega_{\mathrm{st}})$; i.e. a local polynomial basis need only be constructed on each reference element. A one-dimensional order-$P$ basis is a set of polynomials $\phi_p(\xi)$, $0 \le p \le P$, defined on the reference segment $[-1, 1]$. The choice of basis is usually made based on its mathematical or numerical properties, and may be modal or nodal in nature. For two- and three-dimensional regions, a tensorial basis may be employed, where the polynomial space is constructed as the tensor product of one-dimensional bases on segments, quadrilaterals or hexahedral regions. In spectral/hp element methods, a common choice is to use a modified hierarchical Legendre basis (a 'bubble'-like polynomial basis), given as a function of one variable by

$\phi_p(\xi) = \begin{cases} \tfrac{1-\xi}{2}, & p = 0, \\[2pt] \left(\tfrac{1-\xi}{2}\right)\left(\tfrac{1+\xi}{2}\right) P^{1,1}_{p-1}(\xi), & 0 < p < P, \\[2pt] \tfrac{1+\xi}{2}, & p = P, \end{cases}$

where $P^{1,1}_{p}$ denotes a Jacobi polynomial. This basis supports boundary-interior decomposition and therefore improves numerical efficiency when solving the globally assembled system. Equivalently, $\phi_p$ could also be defined by the Lagrange polynomials through the Gauss-Lobatto-Legendre quadrature points, which would lead to a traditional spectral element method. In higher dimensions, a tensor product of either basis can be used on quadrilateral and hexahedral elements respectively. On the other hand, the use of a collapsed coordinate system also permits the use of a tensor-product modal basis for the triangle, tetrahedron, prism and pyramid, which when combined with contraction techniques can yield performance improvements. This aspect is considered further in Section 3.3 and moxey-2016b ; moxey-2019b .
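For reference, the sketch below evaluates this modified basis using SciPy's Jacobi polynomials. It is an illustrative transcription of the formula above, not the Nektar++ basis classes.

# Sketch: the 1D modified hierarchical ('bubble') basis, with linear
# boundary modes and Jacobi-weighted interior modes.
import numpy as np
from scipy.special import eval_jacobi

def modified_basis(p, P, xi):
    """Evaluate mode p (0 <= p <= P) of an order-P modified basis at xi."""
    lo, hi = 0.5 * (1 - xi), 0.5 * (1 + xi)
    if p == 0:
        return lo                                   # left boundary mode
    if p == P:
        return hi                                   # right boundary mode
    return lo * hi * eval_jacobi(p - 1, 1, 1, xi)   # interior 'bubble' modes

# Interior modes vanish at both endpoints, enabling the boundary-interior
# decomposition mentioned above:
print(modified_basis(2, 7, np.array([-1.0, 1.0])))  # -> [0. 0.]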

Elemental contributions to the solution may be assembled to form a global solution through an assembly operator. In a continuous Galerkin setting, the assembly operator sums contributions from neighbouring elements to enforce the $C^0$-continuity requirement. In a discontinuous Galerkin formulation, such mappings transfer flux values from the element interfaces into the global solution vector. For elliptic operators, Nektar++ has a wide range of implementation choices available to improve computational performance. A common choice is the use of a (possibly multi-level) static condensation of the assembled system, where a global system is formed only on elemental boundary degrees of freedom. This is supported both for the classical continuous framework, as well as in the DG method, which gives rise to the hybridizable discontinuous Galerkin (HDG) approach yakovlev-2016 , in which a global system is solved on the trace or skeleton of the overall mesh.

2.2 Explicit-type methods

Nektar++ has extensive support for the solution of problems in a time-explicit manner, which requires the evaluation of discretised spatial operators alongside projection into the appropriate space. As the construction of the implicit operators requires these same operator evaluations, most of the formulation previously discussed translates directly to this approach. We do note, however, that there is a particular focus on the discontinuous Galerkin method for multi-dimensional hyperbolic systems of the form

$\frac{\partial \boldsymbol{u}}{\partial t} + \nabla \cdot \boldsymbol{F}(\boldsymbol{u}) = \boldsymbol{S},$

such as the acoustic perturbation equations that we discuss in Section 5.2 and the compressible Navier-Stokes system used for aerodynamics simulations in Section 5.4. In this setting, on a single element $\Omega^e$, and further assuming the source term $\boldsymbol{S}$ is zero for simplicity of presentation, we multiply the above equation by an appropriate test function $v$ and integrate by parts to obtain

$\int_{\Omega^e} \frac{\partial \boldsymbol{u}}{\partial t}\, v \, d\boldsymbol{x} - \int_{\Omega^e} \boldsymbol{F}(\boldsymbol{u}) \cdot \nabla v \, d\boldsymbol{x} + \int_{\partial\Omega^e} \tilde{\boldsymbol{F}}(\boldsymbol{u}^-, \boldsymbol{u}^+) \cdot \boldsymbol{n}\, v \, ds = 0.$

In the above, $\tilde{\boldsymbol{F}}$ denotes a numerically calculated boundary flux, determined through the use of an upwinding or Riemann solver depending on the element-interior state $\boldsymbol{u}^-$ and its neighbour's state $\boldsymbol{u}^+$, and is specific to the system to be solved. Where second-order diffusive terms are required, Nektar++ supports the use of a local discontinuous Galerkin (LDG) approach to minimise the stencil required for communication cockburn1998local . From a solver perspective, the implementation is fairly generic, requiring only the evaluation of the flux term $\boldsymbol{F}$, the conservation law and the right-hand side source terms $\boldsymbol{S}$.
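To illustrate this structure in the simplest possible setting, the sketch below assembles the DG right-hand side for 1D linear advection, u_t + a u_x = 0 with a > 0, using a modal Legendre basis on a periodic mesh and an upwind interface flux. It is a pedagogical sketch in plain NumPy, not the Nektar++ implementation.

# Sketch: DG right-hand side for 1D linear advection with upwind fluxes.
import numpy as np
from numpy.polynomial import legendre

K, P, a = 16, 4, 1.0                       # elements, polynomial order, wave speed
h = 2.0 / K                                # uniform element size on [-1, 1]

# Reference-element matrices for Legendre modes P_0 .. P_P.
z, w = legendre.leggauss(P + 2)
B  = np.stack([legendre.Legendre.basis(n)(z) for n in range(P + 1)])
dB = np.stack([legendre.Legendre.basis(n).deriv()(z) for n in range(P + 1)])
M  = (B * w) @ B.T * (h / 2)               # elemental mass matrix
S  = (dB * w) @ B.T                        # S_mn = int P_m' P_n dxi
eL, eR = (-1.0) ** np.arange(P + 1), np.ones(P + 1)   # P_n(-1), P_n(+1)

def rhs(u_hat):
    """u_hat: (K, P+1) modal coefficients; returns du_hat/dt."""
    uR = u_hat @ eR                        # trace at each element's right end
    f_in  = a * np.roll(uR, 1)             # upwind flux entering (periodic)
    f_out = a * uR                         # flux leaving each element
    r = a * (u_hat @ S.T)                  # volume term: int a u v' dx
    r += np.outer(f_in, eL) - np.outer(f_out, eR)   # boundary flux terms
    return np.linalg.solve(M, r.T).T       # apply inverse elemental mass matrix

Any standard explicit time integrator (for example, fourth-order Runge-Kutta) can then advance the modal coefficients using this right-hand side.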

2.3 Recap of Nektar++ implementation

In this section, we briefly outline the implementation of these methods inside Nektar++. Further details on the overall design of Nektar++, as well as examples of how to use it, can be found in the previous publication nektarpp2015 .

The core of Nektar++ comprises six libraries which are designed to emulate the general mathematical formulation expressed above. They describe the problem in a hierarchical manner, by working from elemental basis functions and shapes through to a C++ representation of a high-order field and complete systems of partial differential equations on a complex computational domain. Depending on the specific application, this then allows developers to choose an appropriate level for their needs, balancing ease of use at the highest level with fine-level implementation details at the lowest. A summary of each library’s purpose is the following:

  • LibUtilities: elemental basis functions $\phi_p$, quadrature point distributions, and basic building blocks such as I/O handling;

  • StdRegions: reference regions along with the definition of key finite element operations across them, such as integration, differentiation and transforms;

  • SpatialDomains: the geometric mappings $\chi^e$ and their factors, Jacobians of the mappings, and the construction of the topology of the mesh from the input geometry;

  • LocalRegions: physical regions in the domain, each composing a reference region with a mapping $\chi^e$, together with extensions of core operations onto these regions;

  • MultiRegions: lists of physical regions comprising $\Omega$, global assembly maps which may optionally enforce continuity, construction and solution of global linear systems, and extension of local core operations to domain-level operations; and

  • SolverUtils: building blocks for developing complete solvers.

In version 5.0, four additional libraries have been included. Each of these can be seen as non-core, in the sense that they provide additional functionality to the core libraries above:

  • Collections: encapsulates the implementation of key kernels (such as inner products and transforms) with an emphasis on evaluating operators collectively for similar elements;

  • GlobalMapping: implements a mapping technique that allows quasi-3D simulations (i.e. those using a hybrid 2D spectral element/1D Fourier spectral discretisation) to define spatially-inhomogeneous deformations;

  • NekMeshUtils: contains interfaces for CAD engines and key mesh generation routines, to be used by the NekMesh mesh generator; and

  • FieldUtils: defines various post-processing modules that can be used both by the post-processing utility FieldConvert, as well as solvers for in-situ processing.

We describe the purpose of these libraries in greater detail in Sections 3.3, 4.3, 5.1 and 3.2 respectively.

3 Software and Performance Developments

This section reviews the software and performance developments added to Nektar++ since our last major release. We note that a significant change from previous releases is the use of C++11-specific language features throughout the framework, which alongside the many developments outlined here, further motivates the release of a new major version of the code. The developments described in this section are primarily geared towards our continuing efforts to support our users on large-scale high-performance computing (HPC) systems.

3.1 Parallel I/O

Although the core of Nektar++ has offered efficient parallel scaling for some time (as reported in previous work nektarpp2015 ), one aspect that has been improved substantially in the latest release is support for parallel I/O, both during the setup phase of the simulation and when writing checkpoints of field data for unsteady simulations. In both cases, we have added support for new, parallel-friendly mesh input files and data checkpoint files that use the HDF5 file format folk2011overview , in combination with Message Passing Interface (MPI) I/O, to significantly reduce bottlenecks relating to the use of parallel filesystems. This approach enables Nektar++ to either read or write a single file across all processes, as opposed to a file-per-rank output scheme that can place significant pressure on parallel filesystems, particularly during the partitioning phase of the simulation. Here we discuss the implementation of the mesh input file format; details regarding the field output can be found in barefordimproving .

One of the key challenges identified in the use of Nektar++ within large-scale HPC environments is the use of an XML-based input format used for defining the mesh topology. Although XML is highly convenient from the perspective of readability and portability, particularly for small simulations, the format does pose a significant challenge at larger scales since, when running in parallel, there is no straightforward way to extract a part of an XML file on each process. This means that in the initial phase of the simulation, where the mesh is partitioned into smaller chunks that run on each process, there is a need for at least one process to read the entire XML file. Even at higher orders, where meshes are typically coarse to reflect the additional degrees of freedom in each element, detailed simulations of complex geometries typically require large, unstructured meshes of millions of high-order elements. Having only a single process read this file therefore imposes a natural limit to the strong scaling of the setup phase of simulations – that is, the maximum number of processes that can be used – due to the large memory requirement and processing time to produce partitioned meshes. It also imposes potentially severe restrictions on start-up time of simulations and/or the post-processing of data, hindering throughput for very large cases.

Although various approaches have been used to partially mitigate this restriction, such as pre-partitioning the mesh before the start of the simulation and utilising a compressed XML format that reduces file sizes and the XML tree size, these do not themselves cure the problem entirely. In the latest version of Nektar++ we address this issue with a new Hierarchical Data Format v5 (HDF5) file format. To design the layout of this file, we recall that the structure of a basic Nektar++ mesh requires the following storage:

  • Vertices of the mesh are defined using double-precision floating point numbers for their three coordinates. Each vertex has a unique ID.

  • All other elements are defined using integer IDs in a hierarchical manner; for example, in three dimensions, edges are formed from pairs of vertices, faces from three or four edges (for triangles and quadrilaterals respectively), and elements from a collection of faces.

This hierarchical definition clearly aligns with the HDF5 data layout. To accommodate the ‘mapping’ of a given ID into a tuple of IDs or vertex coordinates, we adopt the following basic structure:

  • The mesh group contains multi-dimensional datasets that define the elements of a given dimension. For example, given quadrilaterals, the quad dataset within the mesh group is a dataset of integers, where each row denotes the 4 integer IDs of edges that define that quadrilateral.

  • The maps group contains one-dimensional datasets that define the IDs of each row of the corresponding two-dimensional dataset inside mesh.

An example of this structure for a simple quadrilateral mesh is given in Figure 2. We also define additional datasets to define element curvature and other ancillary structures such as boundary regions.

Figure 2: HDF5 specification of a single quadrilateral mesh: (a) 2D quadrilateral mesh; (b) schematic HDF5 specification.

When running in parallel, Nektar++ adopts a domain decomposition strategy, whereby the mesh is partitioned into a subset of the whole domain for each process. This can be done either at the start of the simulation, or prior to running it. Parallelisation is achieved using the standard MPI protocol, where each process is independently executed and there is no explicit use of shared memory in program code. Under the new HDF5 format, we perform a parallel partitioning strategy at startup. Each process is initially assigned a roughly equal number of elements to read by examining the size of the appropriate elemental datasets. Each process constructs the dual graph corresponding to this subdomain and links to other processes’ subdomains are established by using ghost nodes to those processes’ nodes. The dual graph is then passed to the PT-Scotch library chevalier-2008 to perform partitioning in parallel on either the full system or a subset of processes, depending on the size of the graph. Once the resulting graph is partitioned, the datasets are read in parallel using a top-down process; i.e. in the context of Figure 2, the quad dataset, followed by the seg dataset, followed by the vert dataset. At each stage, each processor only reads the geometric entities that are required for its own partition, which is achieved through the use of HDF5 selection routines when reading the datasets. The Nektar++ geometry objects are then constructed from these data in a bottom-up manner (i.e. vertices, followed by edges, followed by faces) as required by each processor. Curvature information is also read at this stage as required. Finally, ancillary information such as composites and domain definition are read from the remaining datasets.
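The sketch below illustrates the mesh/maps layout and the row-selective reads described above using h5py. The group and dataset names follow the description in this section, while the file name and contents are hypothetical; consult the Nektar++ user guide for the authoritative format.

# Sketch: writing and selectively reading an HDF5 mesh in the layout above.
import h5py
import numpy as np

with h5py.File("mesh.nekg", "w") as f:               # hypothetical file name
    mesh, maps = f.create_group("mesh"), f.create_group("maps")
    # Two quadrilaterals, each row holding the 4 integer IDs of its edges.
    mesh["quad"] = np.array([[0, 1, 2, 3], [1, 4, 5, 6]], dtype=np.int64)
    maps["quad"] = np.array([0, 1], dtype=np.int64)  # ID of each row above

with h5py.File("mesh.nekg", "r") as f:
    my_rows = [1]                                    # rows owned by this rank
    print(f["mesh/quad"][my_rows, :])                # HDF5 selection read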

The new HDF5 based format is typically significantly faster than the existing XML format to perform the initial partitioning phase of the simulation. Notably, whereas execution times for the XML format increase with the number of nodes being used (likely owing to the file that must be written for each rank by the root processor), the HDF5 input time remains roughly constant. We note that the HDF5 format also provides benefits for the post-processing of large simulation data, as the FieldConvert utility is capable of using this format for parallel post-processing of data.

3.2 In-situ processing

The increasing capabilities of high-performance computing facilities allow us to perform simulations with a large number of degrees of freedom, which leads to challenges in terms of post-processing. The first problem arises when we consider the storage requirements of the complete solution of these simulations. Tasks such as generating animations, where we need to consider the solution at many time instances, may become infeasible if we have to store the complete fields at each time instance. Another difficulty occurs due to the memory requirements of loading the solution for post-processing. Although this can be alleviated by techniques such as subdividing the data and processing one subdivision at a time, this is not possible for some operations requiring global information, such as performing a $C^0$ projection, which involves the inversion of a global mass matrix. In such cases, the memory requirements might force the user to perform post-processing using a number of compute nodes similar to that used for the simulation.

To aid in dealing with this issue, Nektar++ now supports processing the solution in situ during the simulation. The implementation of this feature was facilitated by the modular structure of our post-processing tool, FieldConvert. This tool uses a pipeline of modules, passing mesh and field data between them, to arrive at a final output file. This comprises one or more input modules (to read mesh and field data), zero or more processing modules (to manipulate the data, such as calculating derived quantities or extracting boundary information), and a single output module (to write the data in one of a number of field and visualisation formats). To achieve in situ processing, FieldConvert modules were moved to a new library (FieldUtils), allowing them to be executed during the simulation as well as shared with the FieldConvert utility. The actual execution of the modules during in situ processing is performed by a new subclass of the Filter class, which is called periodically after a prescribed number of time-steps to perform operations which do not modify the solution field. This filter structure allows the user to choose which modules should be used and to set configuration parameters. Multiple instances of the filter can be used if more than one post-processing pipeline is desired.

There are many example applications for this new feature. The most obvious is to generate a field or derived quantity, such as vorticity, as the simulation is running. An example of this is given in Example 16 of the supplementary materials, in which the vorticity is calculated every 100 timesteps, whilst removing the velocity and pressure fields to save output file space, using the following FILTER configuration in the session file:

<FILTER TYPE="FieldConvert">
  <PARAM NAME="OutputFile"> vorticity.vtu </PARAM>
  <PARAM NAME="OutputFrequency"> 100 </PARAM>
  <PARAM NAME="Modules">
    vorticity
    removefield:fieldname=u,v,p
  </PARAM>
</FILTER>

This yields a number of parallel-format VTU files that can be assembled to form an animation. Other example applications include extracting slices or isocontours of the solution at several time instants for creating an animation. Since the resulting files are much smaller than the complete solution, there are significant savings in terms of storage when compared to the traditional approach of writing checkpoints which are later post-processed. Another possibility is to perform the post-processing operations after the last time-step, but before the solver returns. This avoids starting a new process which would have to load the solution again, leading to savings in computing costs.

3.3 Collective linear algebra operations

One of the primary motivations for the use of high-order methods is their ability to outperform standard linear methods on modern computational architectures in terms of equivalent error per degree of freedom. Although the cost in terms of floating point operations (FLOPS) of calculating these degrees of freedom increases with polynomial order, the dense, locally-compact structure of higher-order operators lends itself to the current hardware environment, in which FLOPS are readily available but memory bandwidth is highly limited. In this setting, the determining factor in computational efficiency, or ability to reach peak performance of hardware, is the arithmetic intensity of the method; that is, the number of FLOPS performed for each byte of data transferred over the memory bus. Algorithms need to have high arithmetic intensity in order to fully utilise the computing power of modern computational hardware.

However, the increase in FLOPS at higher polynomial orders must be balanced so that execution times are not excessively high. An observation made early in the development of spectral element methods is that operator counts can be substantially reduced by using a tensor-product basis together with a tensor contraction technique referred to as sum-factorisation. This technique, exploited inside Nektar++ as well as other high-order frameworks such as deal.II BaHaKa07 , uses a small amount of temporary storage to reduce operator counts from $\mathcal{O}(P^{2d})$ to $\mathcal{O}(P^{d+1})$ at a given order $P$. For example, consider a polynomial interpolation on a quadrilateral across a tensor product of quadrature points $(\xi_{1i}, \xi_{2j})$, where the basis admits a tensor-product decomposition $\phi_{pq}(\xi_1, \xi_2) = \phi_p(\xi_1)\,\phi_q(\xi_2)$. This expansion takes the form

$u^\delta(\xi_{1i}, \xi_{2j}) = \sum_{p=0}^{P} \sum_{q=0}^{P} \hat{u}_{pq}\, \phi_p(\xi_{1i})\, \phi_q(\xi_{2j}) = \sum_{p=0}^{P} \phi_p(\xi_{1i}) \left[ \sum_{q=0}^{P} \hat{u}_{pq}\, \phi_q(\xi_{2j}) \right].$

By precomputing the bracketed term and storing it for each $p$ and $j$, we can reduce the number of floating point operations from $\mathcal{O}(P^4)$ to $\mathcal{O}(P^3)$. One of the distinguishing features of Nektar++ is that these types of basis functions are defined not only for tensor-product quadrilaterals and hexahedra, but also for unstructured elements (triangles, tetrahedra, prisms and pyramids), through the use of a collapsed coordinate system and appropriate basis functions. For more details on this formulation, see KaSh05 .
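The effect of sum-factorisation is visible in a few lines of NumPy: the naive evaluation contracts all indices at once, while the factorised version contracts one direction at a time. The basis table below is a random stand-in for the true basis values; this is an illustration of the cost argument, not the Nektar++ kernels.

# Sketch: naive versus sum-factorised evaluation on a quadrilateral.
import numpy as np

P, Q = 8, 9                          # modes and quadrature points per direction
B = np.random.rand(P, Q)             # stand-in basis table: B[p, i] = phi_p(xi_i)
uhat = np.random.rand(P, P)          # modal coefficients uhat_pq

# Naive evaluation: O(P^2 Q^2) operations.
u_naive = np.einsum('pq,pi,qj->ij', uhat, B, B)

# Sum-factorised evaluation: two smaller contractions, O(PQ(P+Q)) operations.
tmp = uhat.T @ B                     # tmp[q, i] = sum_p uhat_pq phi_p(xi1_i)
u_fast = tmp.T @ B                   # u[i, j]  = sum_q tmp[q, i] phi_q(xi2_j)

assert np.allclose(u_naive, u_fast)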

The efficient implementation of the above techniques on computational hardware still poses a significant challenge for practitioners of high-order methods. For example, Nektar++ was originally designed using a hierarchical, inheritance-based approach, where memory associated with elemental degrees of freedom is potentially scattered non-contiguously in memory. Although this was appropriate at the initial time of development a decade ago, in modern terms it does not align with the requirements for optimal performance, in which large blocks of memory should be transferred and as many operations as possible applied sequentially across elements, so as to reduce memory accesses and increase data locality and cache usage. The current efforts of the development team are therefore focused on redesigns of the library to accommodate this. In particular, since version 4.1, Nektar++ has included a library called Collections which is designed to provide this optimisation. In the hierarchy of Nektar++ libraries, Collections sits between LocalRegions, which represents individual elements, and MultiRegions, which represents their connection in either a continuous or discontinuous Galerkin setting. The purpose of the library, which is described fully in moxey-2016b , is to facilitate optimal linear algebra strategies for large groupings of elements that are of the same shape and utilise the same basis. To facilitate efficient execution across a broad range of polynomial orders, we then consider a number of implementation strategies, including:

  • StdMat: where a full-rank matrix of the operator on a standard element is constructed, so that the operator can be evaluated with a single matrix-matrix multiplication;

  • IterPerExp: where the sum-factorisation technique is evaluated using an iteration over each element, but geometric factors (e.g. Jacobians of the elemental mappings) are amalgamated between elements; and

  • SumFac: where the sum-factorisation technique is evaluated across multiple elements concurrently.

This is then combined with an autotuning strategy, run at simulation startup, which attempts to identify the fastest evaluation strategy depending on characteristics of the computational mesh and basis parameters. Autotuning can be enabled in any simulation through the definition of an appropriate tag inside the NEKTAR block that defines a session file:

<COLLECTIONS DEFAULT="auto" />

A finer-grained level of control over the Collections setup and implementation strategies is documented in the user guide. Performance improvements using collections are most readily seen in fully-explicit codes such as the CompressibleFlowSolver and AcousticSolver. The vortex pair example defined in Section 5.2 and provided in Example 15 demonstrates the use of the collections library.

3.4 Solver coupling

The Nektar++ framework was extended with a coupling interface Lackhove2018 that enables sending and receiving arbitrary variable fields at run time. Using such a technique, a coupling-enabled solver can exchange data with other applications to model multi-physics problems in a co-simulation setup. Currently, two coupling interfaces are available within Nektar++: a file-based system for testing purposes, and an MPI-based implementation for large-scale HPC applications. The latter was designed to facilitate the coupling of Nektar++ solvers with different software packages which use other discretisation methods and operate on vastly different time- and length-scales. Coupling is achieved by introducing an intermediate expansion, which uses the same polynomial order and basis definitions as the parent Nektar++ solver, but with a continuous projection and a larger number of quadrature points than the original expansion. Based on this intermediate representation, the coupling strategy comprises three major steps:

  • Step 1: The field values are requested from the sending application at the intermediate expansion’s quadrature points. Here, aliasing can be effectively avoided by an appropriate selection of quadrature order and distribution. Point values that lie outside of the sender’s computational domain can be either replaced by a default value or extrapolated from their available nearest neighbour.

  • Step 2: The physical values at the quadrature points are then transformed into modal space. This is achieved by a modified forward transform that involves the differential low-pass filter Germano1986a :

    $\bar{u} - \bar{\Delta}^2 \nabla^2 \bar{u} = u, \qquad (3)$

    where $u$ denotes the received field, $\bar{u}$ the filtered field and $\bar{\Delta}$ the user-specified filter width. The filter removes small-scale features a priori and thus reduces the error associated with the transform. Moreover, it does not add unwanted discontinuities at the element boundaries and imposes a global smoothing, due to the continuity of the intermediate expansion. A one-dimensional sketch of this filter is given after this list.

  • Step 3: A linear interpolation in time can be performed to bridge the larger time scales of the sending application. Since the intermediate and original expansions have identical bases and orders, the resulting coefficients can be directly used in the original expansion of the solver.
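To give a feel for Step 2, the sketch below applies a one-dimensional analogue of the differential filter in Equation (3), discretised with periodic second-order finite differences. The operator form and all parameters are illustrative assumptions, and this is not the Nektar++/CWIPI code path.

# Sketch: 1D periodic differential low-pass filter (illustrative only).
import numpy as np

N, L, width = 256, 2 * np.pi, 0.2          # grid size, domain length, filter width
x = np.linspace(0, L, N, endpoint=False)
dx = L / N
u = np.sin(x) + 0.2 * np.sin(20 * x)       # smooth field plus small-scale content

# Periodic second-difference Laplacian as a dense matrix (fine at this size).
I = np.eye(N)
D2 = (np.roll(I, 1, 0) - 2 * I + np.roll(I, -1, 0)) / dx**2
u_bar = np.linalg.solve(I - width**2 * D2, u)

# Fourier mode k is damped by roughly 1/(1 + (width*k)^2): the k = 20
# component shrinks to about 6% while k = 1 is nearly untouched.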

As is evident from the above strategy, sending fields to other solvers only requires an application to provide discrete values at the requested locations. In Nektar++, this can be achieved by evaluating the expansions or by a simpler approximation from the immediately available quadrature point values. All processing is performed by the receiver. The complex handling of data transfers is accomplished by the open-source CWIPI library Refloch2011 , which enables coupling of multiple applications by using decentralized communication. It is based purely on MPI calls, has bindings for C, Fortran and Python, handles detection of out-of-domain points and has been shown to exhibit good performance Duchaine2015 . With only CWIPI as a dependency and a receiver-centric strategy that can be adjusted to any numerical setup, the implementation of compatible coupling interfaces is relatively straightforward.

Figure 3: Instantaneous acoustic source term as represented in CFD (proprietary finite volume flow solver mesh) and CAA (Nektar++ AcousticSolver mesh with fourth-order expansion); slice through a three-dimensional domain.

An example of a transferred field is given in Figure 3. For a hybrid noise simulation Lackhove2018 , the acoustic source term depicted at the top was computed by a proprietary finite volume flow solver on a high-resolution mesh and transferred to the Nektar++ AcousticSolver, which we describe in Section 5.2. After sampling, receiving, filtering, projection and temporal interpolation, the extrema of the source term are attenuated and blurred by the spatial filter. Consequently, a much coarser mesh with a fourth-order expansion is sufficient for the correct representation of the resulting field, which significantly reduces the computational cost of the simulation. The corresponding loss of information is well defined by the filter width and limited to the high-frequency range, which is irrelevant for the given application.

3.5 Python interface

Although Nektar++ is designed to provide a modern C++ interface to high-order methods, its use of complex hierarchies of classes and inheritance, as well as the fairly complex syntax of C++ itself, can lead to a significant barrier to entry for new users of the code. At the same time, the use of Python in general scientific computing applications, and data science application areas in particular, continues to grow, in part due to its relatively simple syntax and ease of use. Additionally, the wider Python community offers a multitude of packages and modules to end users, making it an ideal language through which disparate software can be ‘glued’ together to perform very complex tasks with relative ease. For the purposes of scientific computing codes, the Python C API also enables the use of higher-performance compiled code, making it suitable in instances where interpreted pure Python would be inefficient and impractical, as can be seen in packages such as NumPy and SciPy. These factors therefore make Python an ideal language through which to introduce new users to a complex piece of software, interact with other software packages and, at the same time, retain a degree of performance that would not be possible from a purely interpreted approach.

The Version 5.0 release of Nektar++ offers a set of high-level bindings for a number of classes within the core Nektar++ libraries. The purpose of these bindings is to significantly simplify the interfaces to key Nektar++ libraries, offering both a teaching aide for new users to the code, as well as a way to connect with other software packages and expand the scope of the overall software. To achieve this, we leverage the Boost.Python package abrahams2003building , which offers a route to handling many of the complexities and subtleties of translating C++ functions and classes across to the Python C API. A perceived drawback of this approach is the lack of automation. As Boost.Python is essentially a wrapper around the Python C API, any bindings must be handwritten, whereas other software such as f2py peterson2009f2py or SWIG beazley1996swig offer the ability to automatically generate bindings from the C++ source. However, our experience of this process has been that, other than implementation effort, handwritten wrappers provide higher quality and more stability, particularly when combined with an automated continuous integration process as is adopted in Nektar++, as well as better interoperability with key Python packages such as numpy. In our particular case, heavy use of C++11 features such as shared_ptr and the Nektar++ Array class for shared storage meant that many automated solutions would not be well-suited to this particular application.

An example of the Python bindings can be seen in Listing 1, where we perform the Galerkin projection of the smooth function $f(x,y) = \cos(x)\cos(y)$ onto a standard quadrilateral expansion with 8 modes in each direction, using 9 Gauss-Lobatto-Legendre quadrature points per direction to exactly integrate the mass matrix. We additionally perform an integral of this function (whose exact value over the standard element is $4\sin^2(1) \approx 2.8323$). As can be seen in this example, the aim of the bindings is to closely mimic the layout and structure of the C++ interface, so that they can be used as a learning aide for the full C++ API. Additionally, the Python bindings make full use of Boost.Python’s automatic datatype conversion facilities. In particular, significant effort has been expended to facilitate seamless interaction between the NumPy ndarray class, which is almost universally used in Python scientific computing applications for data storage, and the Nektar++ Array<OneD, *> storage classes. This allows an ndarray to be passed into Nektar++ functions directly and vice versa. Moreover, the Boost.Python interface to NumPy is used to ensure that, instead of copying data (which could be rather inefficient for large arrays), the two data structures share a common memory space. Reference counting is then used to ensure data persistence and memory deallocation, depending on whether memory was first allocated within the C++ environment or Python.

import NekPy.LibUtilities as LibUtil
import NekPy.StdRegions as StdReg
import numpy as np
# Set P = 8 modes and Q = P + 1 quadrature points.
nModes = 8
nPts   = nModes + 1
# Create GLL-distributed quadrature points.
pType  = LibUtil.PointsType.GaussLobattoLegendre
pKey   = LibUtil.PointsKey(nPts, pType)
# Create modified C^0 basis on these points.
bType  = LibUtil.BasisType.Modified_A
bKey   = LibUtil.BasisKey(bType, nModes, pKey)
# Create quadrilateral expansion using this basis
# in each coordinate direction (tensor product).
quad   = StdReg.StdQuadExp(bKey, bKey)
# L^2 projection of f(x,y) = cos(x)*cos(y) onto the
# quadrilateral element. Note x,y are numpy ndarrays
# and evaluation of cos() is performed using numpy.
x, y   = quad.GetCoords()
fx     = np.cos(x) * np.cos(y)
proj   = quad.FwdTrans(fx)
# Integrate function over the element.
print("Integral = {:.4f}".format(quad.Integral(fx)))
Listing 1: Using the Nektar++ 5.0 Python bindings to perform a simple Galerkin projection and integral on a standard quadrilateral element.

4 Developments in Numerical Methods

This section highlights our recent developments in numerical methods contained within the Nektar++ release.

4.1 Method of moving frames

(a) Conservation laws
(b) Diffusion equation
(c) Shallow water equations
(d) Maxwell’s equations
Figure 4: Numerical simulation of the MMF scheme in Nektar++ for several partial differential equations solved on the sphere.

Modern scientific computing faces unprecedented demands for simulation in multidimensional complex domains composed of various materials. Examples include solving the shallow water equations on a rotating sphere for weather prediction, incorporating the biological anisotropic properties of cardiac and neural tissue for clinical diagnosis, and simulating electromagnetic wave propagation on metamaterials for controlling electromagnetic nonlinear phenomena. All of these examples require the ability to solve PDEs on manifolds embedded in higher-dimensional domains. The method of moving frames (MMF) implemented in Nektar++ is a novel numerical scheme for performing such simulations in geometrically-complex domains.

Moving frames, principally developed by Élie Cartan in the context of Lie groups in the early 20th century Cartan , are orthonormal vector bases that are constructed at every grid point in the discrete space of a domain. Building such moving frames is easily achieved by differentiating the parametric mapping of a domain element with respect to each coordinate axis of a standard reference space, followed by a Gram-Schmidt orthogonalisation process. We obtain orthonormal vector bases, denoted $\boldsymbol{e}^i$, with the following properties:

$\boldsymbol{e}^i \cdot \boldsymbol{e}^j = \delta_{ij}, \qquad \|\boldsymbol{e}^i\| = 1,$

where $\delta_{ij}$ denotes the Kronecker delta. Moreover, the moving frames are constructed such that they are differentiable within each element and always lie on the tangent plane at each grid point. These two intrinsic properties imply that any vector $\boldsymbol{v}$, or the gradient $\nabla f$ of a scalar $f$, can be expanded in terms of the moving frames as follows:

$\boldsymbol{v} = \sum_i (\boldsymbol{v} \cdot \boldsymbol{e}^i)\, \boldsymbol{e}^i, \qquad \nabla f = \sum_i (\nabla f \cdot \boldsymbol{e}^i)\, \boldsymbol{e}^i.$

Applying this expansion to a given PDE enables us to re-express it with moving frames on any curved surface. The weak formulation of the PDE with moving frames, called the MMF scheme, on a curved surface is then similar to the corresponding scheme in Euclidean space, in the sense that it contains no metric tensor or its derivatives, and it does not require the construction of a continuous curved coordinate axis, which often produces geometric singularities. This is a direct result of the fact that moving frames are locally Euclidean. Nevertheless, the scheme yields accurate solutions of PDEs on many types of surfaces, such as spheres, irregular surfaces, or non-convex surfaces. Some examples of simulations that can be achieved under this approach include conservation laws MMF1 , the diffusion equation MMF2 , the shallow water equations (SWEs) MMF3 , and Maxwell’s equations MMF4 . Representative results from Nektar++ for these equations on the surface of a sphere are shown in Figure 4.
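The construction described above (differentiating the parametric mapping, then orthonormalising) can be sketched in a few lines for the unit sphere. The parametrisation and evaluation point are illustrative only.

# Sketch: moving frames on the unit sphere via numerical differentiation
# of a parametric mapping followed by Gram-Schmidt orthonormalisation.
import numpy as np

def sphere(theta, phi):
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def frames(theta, phi, eps=1e-6):
    # Tangent vectors from central differences of the mapping.
    t1 = (sphere(theta + eps, phi) - sphere(theta - eps, phi)) / (2 * eps)
    t2 = (sphere(theta, phi + eps) - sphere(theta, phi - eps)) / (2 * eps)
    # Gram-Schmidt orthonormalisation of the tangent-plane basis.
    e1 = t1 / np.linalg.norm(t1)
    t2 = t2 - (t2 @ e1) * e1
    e2 = t2 / np.linalg.norm(t2)
    e3 = np.cross(e1, e2)                  # unit normal completes the triad
    return e1, e2, e3

e1, e2, e3 = frames(1.0, 0.5)
print(e1 @ e2, e1 @ e1)                    # ~0 and ~1: orthonormal
print(e3 @ sphere(1.0, 0.5))               # ~1: normal is radial on the sphere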

Moreover, moving frames have proven to be efficient for other geometrical realisations, such as representing the anisotropic properties of media in complex domains MMF2 , incorporating the rotational effect of an arbitrarily-shaped domain MMF3 , and adapting an isotropic upwind numerical flux to anisotropic media MMF4 . With the higher-order curvilinear meshes produced by NekMesh, described in Section 5.1, the accuracy of the MMF scheme is reported to be significantly improved at high polynomial order, as are conservation properties such as mass and energy after long time integration; the accuracy of the MMF-SWE scheme on such meshes is presented as the best among all the previous SWE numerical schemes MMF5 . Ongoing research topics on moving frames include constructing the connections of frames, computing propagational curvature and, ultimately, building an atlas in order to provide a quantitative measurement and analysis of a flow on complex geometry, such as the electrical activation of the heart MMF6 and fibre tracking of white matter in the brain.

4.2 Spatially-variable polynomial order

An important difficulty in the simulation of flows of practical interest is the wide range of length- and time-scales involved, especially in the presence of turbulence. This problem is aggravated by the fact that in many cases it is difficult to predict where in the domain an increase in the spatial resolution is required before conducting the simulation, while performing a uniform refinement across the domain is computationally prohibitive. Therefore, in dealing with these types of flows, it is advantageous to have an adaptive approach which allows us to dynamically adjust the spatial resolution of the discretisation both in time and in space.

Within the spectral/hp element framework, it is possible to refine the solution following two different routes. h-refinement consists of reducing the size of the elements, as would be done in low-order methods. This is the common approach for the initial distribution of degrees of freedom over the domain, with the computational mesh clustering more elements in regions where small scales are expected to occur, such as boundary layers. The other route is p-refinement (sometimes called p-enrichment), where the spatial resolution is increased by using a higher polynomial order to represent the solution. As discussed in nekpp-icosahom2016 , the polynomial order can easily be varied across elements in the spectral/hp element method if the expansion basis is chosen appropriately. This allows for a simple approach to performing local refinement of the solution, requiring only the adjustment of the polynomial order in each element.

With this in mind, an adaptive polynomial order procedure has been implemented in Nektar++, with successful applications to simulations of incompressible flows. The basic idea of this approach is to adjust the polynomial order P in each element during the solution, based on an estimate of the spatial discretisation error, using an approach similar to that demonstrated for shock capturing in PePe2006 . Every n time-steps, this estimate is evaluated for each element. For elements where the estimated error is above a chosen threshold, P is incremented by one, whereas in elements with low error P is decremented by one, respecting prescribed minimum and maximum values of P. The choice of n is critical for the efficiency of this scheme, since it has to be sufficiently large to compensate for the cost of performing the refinement over a large number of time-steps, yet small enough to adjust to changes in the flow. More details on this adaptive procedure, as well as its implementation in both CG and DG settings, are found in nekpp-icosahom2016 .
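As a minimal sketch of the adjustment logic described above (the function and variable names are hypothetical, and the error estimator is left abstract; this is not the Nektar++ implementation):

```python
def adapt_order(P, error, tol_upper, tol_lower, P_min=2, P_max=12):
    """Adjust per-element polynomial orders: raise P where the error
    estimate exceeds the upper threshold, lower it where the error is
    below the lower threshold, clamped to [P_min, P_max]."""
    P_new = []
    for p, e in zip(P, error):
        if e > tol_upper:
            p = min(p + 1, P_max)  # refine: increment order
        elif e < tol_lower:
            p = max(p - 1, P_min)  # coarsen: decrement order
        P_new.append(p)
    return P_new

# Invoked every n time-steps inside the solver loop, e.g.:
# if step % n == 0:
#     P = adapt_order(P, estimate_error(solution), 1e-4, 1e-6)
```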

An example application of the adaptive polynomial order procedure is presented in Figure 5, which shows the spanwise vorticity and polynomial order distributions for a quasi-3D simulation of the incompressible flow around a NACA 0012 profile. The session files used to generate this data can be found in Example 17. It is clear that the regions with transition to turbulence and the boundary layers are resolved using the largest polynomial order allowed, while regions far from the airfoil use a low polynomial order. In this way, the scheme succeeds in refining the discretisation in the more critical regions where small scales are present, without incurring the large computational cost that would be required to uniformly increase the polynomial order. More simply stated, it is possible to specify different polynomial orders in the quadrilateral elements (typically used in the boundary layer discretisation) and the triangular elements (typically used to fill the outer volume). As a final point, we note that the use of variable polynomial order is not limited to quasi-3D simulations; both CG and DG discretisations fully support all element shape types in 2D and 3D, with the parallel implementation (including frequently used preconditioners) also supporting this discretisation option.

(a) Spanwise vorticity

(b) Polynomial order
Figure 5: Polynomial order and vorticity distributions obtained with adaptive polynomial order for the flow around a NACA 0012 profile. Taken from nekpp-icosahom2016 .

4.3 Global mapping

Even though the spectral/hp element spatial discretisation allows us to model complex geometries, in some cases it can be advantageous to apply a coordinate transformation in order to solve problems posed in a coordinate system other than the Cartesian frame of reference. This is typically the case when the transformed domain possesses a symmetry, which allows the equations to be solved more efficiently, compensating for the extra cost of applying the coordinate transformation. Examples occur when a transformation can be used to map a non-homogeneous geometry to a geometry that is homogeneous in one or more directions, making it possible to use the cheaper quasi-3D approach, in which those directions are discretised using a Fourier expansion; and in problems with moving boundaries, where we can map the true domain to a fixed computational domain, avoiding the need to recompute the system matrices after every time-step.

The implementation of this method was achieved in two parts. First, a new library called GlobalMapping was created, implementing general tensor calculus operations for several types of transformation. Even though it would be sufficient to consider only a completely general transformation, specific implementations for particular, simpler transformations are also included to improve computational efficiency, since in these cases many of the metric terms are trivial. In a second stage, the incompressible Navier-Stokes solver was extended, using the functionality of the GlobalMapping library, to implement the methods presented in SeMeSh2016 . Some example applications are given in SeMeSh2017a ; SeMeSh2017b . Embedding these global mappings at the library level allows similar techniques to be introduced in other solvers in the future.

Figure 6 presents an example of the application of this technique, showing the recirculation regions (i.e. regions where the streamwise velocity is negative) for the flow over a wing with spanwise waviness. In this case, the coordinate transformation removes the waviness from the wing, allowing us to treat the transformed geometry with the quasi-3D formulation. It is important to note that this technique becomes unstable when the waviness amplitude is too large. However, in cases where it can be successfully applied, it leads to significant gains in computational cost when compared against a fully 3D discretisation. The session files used in this example can be found in Example 18.
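As an illustration of the kind of transformation involved, the hypothetical mapping below absorbs a sinusoidal spanwise waviness of amplitude A and wavelength lam into the coordinate system, so that the computational domain is homogeneous in z; the actual mapping used for Figure 6 may differ:

```python
import numpy as np

def to_computational(x, y, z, A, lam):
    """Map physical coordinates over a wing with spanwise waviness of
    amplitude A and wavelength lam to a computational domain that is
    homogeneous in z: the waviness is absorbed into the mapping, whose
    metric terms then enter the transformed Navier-Stokes operators."""
    return x, y - A * np.cos(2.0 * np.pi * z / lam), z

def to_physical(xc, yc, zc, A, lam):
    """Inverse mapping, recovering the wavy physical geometry."""
    return xc, yc + A * np.cos(2.0 * np.pi * zc / lam), zc
```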

Figure 6: Time-averaged recirculation regions for incompressible flow over a wing with spanwise waviness.

5 Applications

In this section, we demonstrate some of the features provided in this release, with a focus on application areas.

5.1 NekMesh

In our previous publication nektarpp2015 , we briefly outlined the MeshConvert utility, which was designed to read various external file formats and perform basic processing and format conversion. In the new release of Nektar++, MeshConvert has been transformed into a new tool, NekMesh, which provides a series of tools both for the generation of meshes from an underlying CAD geometry and for the manipulation of linear meshes to make them suitable for high-order simulations. While MeshConvert was dedicated to the conversion of external mesh file formats, the scope of NekMesh has been significantly broadened, making it a true stand-alone high-order mesh generator.

The generation of high-order meshes in NekMesh follows the a posteriori approach proposed in Peiro1 : we first generate a linear mesh using traditional low-order mesh generation techniques, and then add additional (high-order) nodes along edges, faces and volumes to achieve a high-order polynomial discretisation of the mesh. In this bottom-up procedure, high-order nodes are first added on edges, then on faces and finally in volumes. At each step, nodes are generated on boundaries to ensure a geometrically accurate representation of the model; a sketch of the edge step is given below.
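The following sketch illustrates the edge step, placing Gauss-Lobatto-Legendre points along a linear edge; on a curved boundary these nodes would subsequently be projected onto the CAD surface. The helper names are illustrative, not the NekMesh API:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def gll_points(P):
    """Gauss-Lobatto-Legendre points on [-1, 1] for polynomial order P:
    the endpoints plus the roots of the derivative of the Legendre
    polynomial of degree P."""
    interior = Legendre.basis(P).deriv().roots()
    return np.concatenate(([-1.0], interior, [1.0]))

def high_order_edge_nodes(v0, v1, P):
    """Place P+1 high-order nodes along the straight edge (v0, v1) by
    affine-mapping the reference distribution onto the edge."""
    xi = gll_points(P)  # reference coordinates in [-1, 1]
    return [(1 - t) / 2 * v0 + (1 + t) / 2 * v1 for t in xi]

nodes = high_order_edge_nodes(np.array([0.0, 0.0]), np.array([1.0, 0.0]), 4)
```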

Much of the development of NekMesh has focused on access to a robust CAD system for the CAD queries required in traditional meshing operations. Assuming the CAD model is watertight, only a handful of CAD operations are required for mesh generation. NekMesh therefore implements a lightweight wrapper around these CAD queries, allowing it to be interfaced to a number of CAD systems. By default, we provide an interface to the open-source OpenCASCADE library OpenCascadeSAS2019 . OpenCASCADE is able to read the STEP CAD file format, natively exported by most CAD design tools, and load it into the system. At the same time, the use of a wrapper means that users and developers of NekMesh are not exposed to the extensive OpenCASCADE API. Although OpenCASCADE is freely available and very well suited to simple geometries, it lacks many of the CAD healing tools required for the more complex geometries typically found in industrial CFD environments, which frequently contain imperfections and inconsistencies. However, the lightweight wrapper means that other commercial CAD packages can be interfaced to NekMesh where available. To this end, we have implemented a second CAD interface to the commercial CFI CAD engine, which provides a highly robust interface and is described further in Marcon2018 ; Marcon2019 ; Turner2017 .
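The spirit of such a lightweight wrapper can be illustrated with a minimal abstract interface exposing only the queries mesh generation needs; the class and method names below are hypothetical and do not correspond to the actual NekMesh CAD API:

```python
from abc import ABC, abstractmethod

class CADSurface(ABC):
    """Minimal surface-query interface a mesh generator needs; concrete
    subclasses would delegate to OpenCASCADE, CFI, or another engine,
    keeping the rest of the mesher independent of the CAD kernel."""

    @abstractmethod
    def point(self, u, v):
        """Evaluate the surface at parametric coordinates (u, v)."""

    @abstractmethod
    def project(self, xyz):
        """Project a physical point onto the surface, returning (u, v)."""

    @abstractmethod
    def normal(self, u, v):
        """Unit normal at (u, v), e.g. for boundary-layer directions."""
```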

While users are recommended to create their CAD models in dedicated CAD software, export them in STEP format and load them into NekMesh, they also have the possibility of creating simple two-dimensional models using one of two tools. The first is an automatic NACA aerofoil generator: with just three inputs – a NACA code, an angle of attack and the dimensions of the bounding box – a geometry is generated and passed to the meshing software. An example of a mesh generated around a NACA 0012 aerofoil at an angle of attack is shown in Figure 7.

Figure 7: Example of mesh generated around a NACA 0012 aerofoil.

The other tool is based on the GEO geometry file format of the open-source mesh generator Gmsh Geuzaine2009 . The GEO format is an interpreted language for the procedural generation of geometries. NekMesh is able to understand basic GEO commands, which makes it possible to generate simple two-dimensional geometries. An example is shown in Figure 8 of a mesh generated around a T106C turbine blade, where the geometry was created using a GEO script of lines and splines.

Figure 8: Example of mesh generated around a T106C turbine blade.

An important contribution of NekMesh to the high-order mesh generation community was presented in MoHaPeSh13 ; MoHaPeSh14 . High-order mesh generation is highly sensitive to boundary curvature: after high-order nodes are projected onto CAD boundaries, the quality and validity of the mesh may be compromised by self-intersecting elements. One approach to alleviating this risk is the creation of a coarse boundary-layer mesh with edges orthogonal to the boundary, so that the thickness of this layer of elements leaves enough room for a valid curving of elements. After creation of the high-order mesh, these boundary elements can be split using the isoparametric mapping between the reference and physical spaces, which preserves the validity and quality of the subdivided elements while achieving very fine meshes. An example is shown in Figure 9, where the coarse boundary-layer mesh of Figure 7 was split into layers whose thicknesses follow a geometric progression (see the sketch below). The session files used to create the meshes for Figures 7 and 8 can be found in Example 20.

Figure 9: Example of split boundary layer mesh.
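The spacing of such a split can be computed in the reference coordinate of the element and then passed through the isoparametric mapping, so that the curvature of the element is inherited by each layer. A minimal sketch, with an assumed ratio r between successive layer thicknesses:

```python
def split_positions(n, r):
    """Interface positions in the reference coordinate [0, 1] for
    splitting one boundary-layer element into n layers whose thicknesses
    follow a geometric progression of ratio r."""
    total = sum(r**k for k in range(n))
    pos, acc = [0.0], 0.0
    for k in range(n):
        acc += r**k / total
        pos.append(acc)
    return pos  # map through the element's isoparametric mapping

print(split_positions(4, 2.0))  # [0.0, 0.067, 0.2, 0.467, 1.0]
```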

Another approach to avoiding invalid or low-quality high-order elements is to optimise the location of the high-order nodes in the mesh. The variational framework for high-order mesh optimisation proposed in turner2018curvilinear ; Turner2017 has been implemented in NekMesh. The process aims at minimising a functional of the deformation of each high-order element by moving the high-order nodes individually. The approach is scalable and can use different formulations of the functional.

5.2 Acoustic solver

Time-domain computational aeroacoustics simulations are commonly used to model noise emission over wide frequency ranges or to augment flow simulations in hybrid setups. Compared with fully compressible flow simulations, they require less computational effort due to the reduced complexity of the governing equations and the larger length scales involved Colonius2004 . However, due to the small diffusive terms, as well as the long integration times and distances required for these simulations, highly accurate numerical schemes are crucial for stability Tam2006 . This numerical accuracy can be provided by spectral/hp element methods, even on unstructured meshes in complex geometries, and hence Nektar++ provides a good candidate framework on which to build such an application code.

The latest release of Nektar++ includes a new AcousticSolver, which implements several variants of aeroacoustic models. These are formulated in a hyperbolic setting and implemented in a similar fashion to the compressible Euler and Navier-Stokes equations encapsulated in the Nektar++ CompressibleFlowSolver. Following this implementation guideline, the AcousticSolver uses a discontinuous Galerkin spatial discretisation with modal or nodal expansions to model time-domain acoustic wave propagation in one, two or three dimensions. It implements the operators of the linearized Euler equations (LEE) and the acoustic perturbation equations 1 and 4 (APE-1/4) Ewert2003 , both of which describe the evolution of perturbations around a base flow state. For the APE-1/4 operator, the system is defined by the hyperbolic equations

$$\frac{\partial p^a}{\partial t} + \overline{c}^2 \nabla \cdot \left( \overline{\rho}\, \mathbf{u}^a + \overline{\mathbf{u}}\, \frac{p^a}{\overline{c}^2} \right) = \overline{c}^2 q_c, \tag{4a}$$
$$\frac{\partial \mathbf{u}^a}{\partial t} + \nabla \left( \overline{\mathbf{u}} \cdot \mathbf{u}^a \right) + \nabla \left( \frac{p^a}{\overline{\rho}} \right) = \mathbf{q}_m, \tag{4b}$$

where $\mathbf{u}$ denotes the flow velocity, $\rho$ its density, $p$ its pressure and $c$ the speed of sound. The quantities $\mathbf{u}^a$ and $p^a$ refer to the irrotational acoustic perturbations of the flow velocity and pressure, with overline quantities such as $\overline{\mathbf{u}}$ denoting time-averaged means. The right-hand-side acoustic source terms $q_c$ and $\mathbf{q}_m$ are specified in the session file. This allows for the implementation of any acoustic source term formulation so that, for example, the full APE-1 or APE-4 system can be obtained. In addition to analytical expressions, the source terms and base flow quantities can be read from disk or transferred from coupled applications, enabling co-simulation with a driving flow solver. Both LEE and APE support non-quiescent base flows with an inhomogeneous speed of sound. Accordingly, the Lax-Friedrichs and upwind Riemann solvers used in the AcousticSolver employ a formulation which remains valid even for strong base flow gradients. Numerical stability can be further improved by optional sponge layers and suitable boundary conditions, such as rigid wall, farfield or white noise.
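As a sketch of the simpler of the two Riemann solvers mentioned above, a local Lax-Friedrichs flux for a generic hyperbolic system can be written as follows; the flux function and wave-speed estimate would be supplied by the particular LEE or APE operator (illustrative, not the AcousticSolver implementation):

```python
import numpy as np

def lax_friedrichs(qL, qR, flux, lam_max):
    """Local Lax-Friedrichs numerical flux between left/right interface
    states qL, qR: central flux plus dissipation scaled by the largest
    local characteristic speed (e.g. |u_n| + c for acoustic systems)."""
    return 0.5 * (flux(qL) + flux(qR)) - 0.5 * lam_max * (qR - qL)
```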

A recurring test case for APE implementations is the “spinning vortex pair” Muller1967 . It is defined by two opposing vortices, each located at the same distance from the centre of a square domain, which have equal circulation and rotate around the centre at a fixed angular frequency and circumferential Mach number. The resulting acoustic pressure distribution, shown in Figure 10(a), was obtained on an unstructured mesh of 465 quadrilateral elements with a fifth-order modal expansion (P = 5). The session files used to generate this example can be found in Example 19. Along the black dashed line, the acoustic pressure shown in Figure 10(b) exhibits minor deviations from the analytical solution defined in Muller1967 , but is in excellent agreement with the results of the original simulation in Ewert2003 . The latter was based on a structured mesh with 19,881 nodes and employed a sponge-layer boundary condition and spatial filtering to improve stability. Due to the flexibility and numerical accuracy of the spectral/hp method, a discretisation with only 16,740 degrees of freedom was sufficient for this simulation, and no stabilisation measures (e.g. SVV or filtering) were necessary to reproduce this result.

(a) Normalized acoustic pressure distribution, with the mesh shown in light gray and the sampling line indicated by a black dashed line.
(b) Normalized acoustic pressure along the sampling line, obtained with the AcousticSolver and compared against the analytical solution Muller1967 .
Figure 10: Normalized acoustic pressure for the spinning vortex pair test case.

5.3 Fluid-Structure Interaction (FSI) and Vortex-Induced Vibration (VIV)

Fluid-structure interaction (FSI) modelling poses a great challenge for the accurate prediction of vortex-induced vibration (VIV) of long flexible bodies, as fully resolving the turbulent flow along their whole span requires considerable computational resources. This is particularly true for bodies of large aspect ratio. Although 2D strip-theory-based modelling of such problems is much more computationally efficient, it is unable to resolve the effects of turbulent fluctuations on the dynamic coupling of the FSI system ChBe05 ; WiGr01 . A novel strip model, which we refer to as ‘thick’ strip modelling, has been developed in the Nektar++ framework in bao2016generalized , with an implementation supported within the incompressible Navier-Stokes solver. In this approach, a three-dimensional DNS model with a local spanwise scale is constructed for each individual strip, and coupling between strips is modelled implicitly through the structural dynamics of the flexible body.

In the ‘thick’ strip model, the flow dynamics are governed by a series of incompressible Navier-Stokes equations. The governing equations over the local domain associated with the $n$-th strip are written as

$$\frac{\partial \mathbf{u}_n}{\partial t} + (\mathbf{u}_n \cdot \nabla)\,\mathbf{u}_n = -\nabla p_n + \frac{1}{Re} \nabla^2 \mathbf{u}_n, \tag{5}$$
$$\nabla \cdot \mathbf{u}_n = 0, \tag{6}$$

where the vector $\mathbf{u}_n$ denotes the fluid velocity inside the $n$-th strip, $p_n$ is the corresponding dynamic pressure and $Re$ the Reynolds number, which we assume to be constant across all strips. The governing equations are supplemented by boundary conditions of either Dirichlet or Neumann type. In particular, no-slip boundary conditions are applied at the wall of the body, where the velocity of the moving wall is imposed and determined from the transient solution of the structural dynamics equations of motion. A linearized tensioned-beam model is used to describe the structural dynamic behaviour of the flexible body, expressed by the system

$$\mu \frac{\partial^2 \boldsymbol{\eta}}{\partial t^2} + \gamma \frac{\partial \boldsymbol{\eta}}{\partial t} - T \frac{\partial^2 \boldsymbol{\eta}}{\partial z^2} + EI \frac{\partial^4 \boldsymbol{\eta}}{\partial z^4} = \mathbf{f}. \tag{7}$$

In the above, $\mu$ is the structural mass per unit length, $\gamma$ is the structural damping per unit length, $T$ is the tension and $EI$ is the flexural rigidity. $\mathbf{f}$ denotes the vector of hydrodynamic force per unit length exerted on the body’s wall and $\boldsymbol{\eta}$ is the structural displacement vector.

Homogeneity is imposed on the local flow within individual strips in the spanwise direction, under the assumption that the width of the strips is much shorter than the oscillation wavelength of the excited higher-order modes of the flexible body. This enables the use, within each strip domain, of the computationally-efficient quasi-3D approach discussed in previous sections, in which two-dimensional spectral elements with piecewise polynomial expansions are used in the plane and Fourier expansions are used in the homogeneous direction. This also requires the assumption of a spanwise oscillation of the flexible body with respect to its full-length scale. As a consequence, the motion variables and fluid forces are expressed as complex Fourier series, and the tensioned-beam model decouples into a set of ordinary differential equations, one per Fourier mode, which can be solved simply by a second-order Newmark-β method NeKa97 (sketched below). A partitioned approach is adopted to solve the coupled FSI system, in which the coordinate mapping technique discussed in Section 4.3 is used for the treatment of the moving wall SeMeSh2016 .
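For reference, a minimal sketch of the Newmark-β update for a single decoupled modal equation $m\ddot{q} + c\dot{q} + kq = f(t)$ is given below, using the standard average-acceleration parameters; this is illustrative, not the Nektar++ implementation:

```python
def newmark_beta(m, c, k, f, q0, v0, dt, nsteps, beta=0.25, gamma=0.5):
    """Newmark-beta integration of one modal equation
    m q'' + c q' + k q = f(t), as arises for each Fourier mode of the
    decoupled tensioned-beam model."""
    q, v = q0, v0
    a = (f(0.0) - c * v - k * q) / m  # initial acceleration
    hist = [q]
    # Effective stiffness is constant for a fixed time-step
    keff = m / (beta * dt**2) + gamma * c / (beta * dt) + k
    for n in range(1, nsteps + 1):
        t = n * dt
        rhs = (f(t)
               + m * (q / (beta * dt**2) + v / (beta * dt)
                      + (0.5 / beta - 1.0) * a)
               + c * (gamma * q / (beta * dt)
                      + (gamma / beta - 1.0) * v
                      + dt * (0.5 * gamma / beta - 1.0) * a))
        q_new = rhs / keff
        a_new = ((q_new - q) / (beta * dt**2) - v / (beta * dt)
                 - (0.5 / beta - 1.0) * a)
        v_new = v + dt * ((1.0 - gamma) * a + gamma * a_new)
        q, v, a = q_new, v_new, a_new
        hist.append(q)
    return hist
```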

To illustrate the application of this modelling approach, the VIV of a long flexible cylinder pinned at both ends was simulated, with 16 thick strips allocated evenly along the axial line of the cylinder. The instantaneous spanwise wake structure, visualized using the Q-criterion, is shown in Figure 11. As the figure demonstrates, the distribution of vortex shedding shows that a second harmonic mode is excited along the spanwise length, and the turbulent structure is captured well in the local domain of each strip. This emphasises the key advantage of the approach: a highly-resolved description of the hydrodynamics involved in the FSI process. The session files used to run this simulation can be found in Example 21.

Figure 11: Instantaneous vortex shedding visualized by iso-surfaces of the Q-criterion in body-fitted coordinates: (a) full domain view, and zoomed views of (b) the first strip, (c) the fourth strip and (d) the 16th strip.

5.4 Aeronautical applications

CFD is now an indispensable tool for the design of aircraft engines and has become commonplace in guiding the design of new technologies and products Laskowski2016 . For CFD to be effectively adopted in industry, validation and verification are required over a broad design space. This is challenging for a number of reasons, including the range of operating conditions (i.e. Reynolds numbers, Mach numbers, temperatures and pressures), the complexity of industrial geometries (including uncertainty due to manufacturing variations) and their relative motion (i.e. rotor-stator interactions). Even though RANS continues to be the backbone of CFD-based design, the recent development of high-order unstructured solvers and high-order unstructured meshing algorithms, combined with the decreasing cost of HPC infrastructure, has the potential to allow the introduction of high-fidelity transient simulations using large-eddy or direct numerical simulation (LES/DNS) into the design loop, taking the role of a virtual wind tunnel.

As part of our effort to bridge the gap between academia and industry, we have been developing the expertise to analyse turbomachinery cases of industrial interest using Nektar++. A key problem to overcome in these cases is the sensitivity of the simulations to variations in spatial resolution, which requires the use of stabilisation techniques to improve robustness. Nektar++ makes use of the spectral vanishing viscosity (SVV) method, originally introduced for the Fourier spectral method by Tadmor tadmor1989convergence . SVV is a model-free stabilisation technique that acts at the subgrid-scale level and allows exponential convergence properties to be retained in sufficiently-resolved simulations. Recent developments in this area have focused on a new SVV kernel by Moura et al. mengaldo2018spatial_a , which replicates the desirable dispersion and diffusion properties of DG schemes and does not require the manual tuning of parameters needed in the classical SVV formulation. More specifically, the dissipation curves of the CG scheme at a given order were compared to those of DG schemes of lower order, and the DG kernel was determined by minimising the point-wise norm between these curves. SVV stabilisation is combined with spectral/hp dealiasing mengaldo2015dealiasing to eliminate errors arising from the integration of non-linear terms.
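For illustration, the classical exponential SVV kernel referred to above, which activates dissipation only on modes beyond a cutoff, can be sketched as follows; the cutoff ratio is a typical but assumed choice, and this is not the Nektar++ implementation:

```python
import numpy as np

def svv_exponential_kernel(P, cutoff_ratio=0.5):
    """Classical exponential SVV activation Q_k on modes k = 0..P:
    zero below the cutoff mode M, rising smoothly to 1 at k = P, so
    that dissipation acts only on the highest (unresolved) modes."""
    M = int(cutoff_ratio * P)
    Q = np.zeros(P + 1)
    for k in range(M + 1, P + 1):
        Q[k] = np.exp(-((k - P) / (k - M)) ** 2)
    return Q

print(svv_exponential_kernel(8))  # Q_8 = 1; Q_k -> 0 just above the cutoff
```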

A T106A low-pressure turbine vane was investigated at a moderate Reynolds number, and the convergence properties of the main flow statistics were extensively explored with the aim of developing a set of best practices for the use of spectral/hp element methods as a high-fidelity digital twin Cassinelli2018 . The velocity correction scheme of KaIsOr91 implemented in the IncNavierStokesSolver was adopted, using the quasi-3D approach discussed in the previous sections and Taylor-Hood-type elements in 2D, in which the polynomial space used for the velocity components is one order higher than that used for the pressure. Uniform inflow velocity is combined with pitchwise periodicity and high-order outflow boundary conditions DoKaCh14 . Numerical stability is ensured by employing SVV with the DG kernel in the spectral element planes, and the traditional exponential kernel in the spanwise Fourier direction. A representation of the vortical structures is shown in Figure 12: transition to turbulence takes place only in the final portion of the suction surface, where the separated shear layer rolls up due to a Kelvin-Helmholtz instability. The separation bubble remains open and merges into the trailing-edge wake, giving rise to large-scale vortical structures. This work was conducted with clean inflow conditions to isolate the effect of the numerical setup on the various flow statistics. However, turbomachinery flows are highly turbulent, and subsequent work therefore focused on the treatment of flow disturbances to reproduce a realistic environment more accurately Cassinelli2019 . With this aim, localised synthetic momentum forcing was introduced in the leading-edge region to break the flow symmetry on the suction surface and promote earlier transition to turbulence. This approach yields an improvement in the agreement with experimental data, with no increase in computational cost.

Figure 12: Instantaneous iso-surfaces of the Q-criterion contoured by velocity magnitude, showing the vortical structures evolving on the suction surface and in the wake of a T106A cascade. The computational domain is replicated in the spanwise and pitchwise directions for visual clarity.

With the intent of being able to tackle cases in which compressibility effects are not negligible, effort has been put into validating the CompressibleFlowSolver for shock-wave boundary-layer interaction (SWBLI) configurations. This solver, described in our previous publication nektarpp2015 , formulates the compressible Navier-Stokes equations in their conservative form, discretised using a DG scheme and explicit time-stepping methods. In order to regularise the solution in the presence of discontinuities, the right-hand side of the Navier-Stokes equations is augmented with a Laplacian viscosity term of the form $\nabla \cdot (\epsilon \nabla \mathbf{q})$, where $\mathbf{q}$ is the vector of conserved variables and $\epsilon$ is a spatially-dependent diffusion coefficient defined on each element as

$$\epsilon = \epsilon_0 \frac{h}{p} \lambda_{\max} S.$$

Here, $\epsilon_0$ is a constant, $\lambda_{\max}$ is the maximum local characteristic speed, $h$ is a reference length of the element, $p$ its polynomial order, and $S$ a discontinuity sensor value computed using the formulation of PePe2006 , sketched below. To benchmark this approach in the context of SWBLI problems, we consider a laminar case studied experimentally and numerically in degrez1987interaction . Several authors have studied this SWBLI with slightly different free-stream conditions; here we follow the physical parameters used by boin20063d , namely the free-stream Mach number, shock angle, stagnation pressure and temperature, the Reynolds number (referred to the inviscid shock impingement location measured from the plate leading edge) and the Prandtl number. Unlike boin20063d , the leading edge is not included in the simulations: the inflow boundary is located downstream of it, where the analytical compressible boundary-layer solution of white2006viscous is imposed. The session files used in this example can be found in Example 22. At the inlet, the Rankine-Hugoniot relations that describe the incident shock are superimposed on the compressible boundary-layer solution. At the top boundary, we impose the constant states corresponding to the inviscid post-incident-shock state. At the outlet, in the subsonic part of the boundary layer, a pressure outlet is imposed based on the inviscid post-reflected-shock conditions. All boundary conditions are imposed in a weak sense through a Riemann solver, as described in mengaldo2014guide , on a coarse grid of quadrilateral elements. For illustrative purposes, Figure 13 shows a snapshot of the Mach number field. For a more quantitative comparison, Figure 14 compares the skin friction coefficient with those from eckert1955engineering and boin20063d , showing fair agreement with the results of boin20063d .
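A minimal sketch of the sensor of PePe2006 , which measures the relative energy content of the highest mode of an element's modal expansion, is given below; 1D modal coefficients are assumed for simplicity, and the ramping of $\epsilon$ is only indicated in the comments:

```python
import numpy as np

def persson_peraire_sensor(uhat):
    """Modal smoothness sensor of Persson & Peraire: the energy of the
    highest-order mode relative to the total energy, for one element's
    modal coefficients uhat (an order-P expansion has P+1 coefficients
    in 1D). Returns log10 of the energy ratio; large negative values
    indicate a smooth solution, values near zero a discontinuity."""
    num = np.sum(uhat[-1:] ** 2)      # energy in the top mode
    den = np.sum(uhat ** 2) + 1e-14   # total energy (guarded)
    return np.log10(num / den + 1e-14)

# epsilon on each element then follows the form reconstructed above,
# e.g. ramped smoothly from 0 up to eps0 * (h / p) * lam_max as the
# sensor crosses a threshold value.
```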

Figure 13: Mach number field of the SWBLI test case on a coarse grid of quadrilateral elements; configuration based on degrez1987interaction .
Figure 14: Skin friction coefficient for the SWBLI test case: blue line, Nektar++; triangles, results from boin20063d ; dotted line, empirical correlation of eckert1955engineering .

6 Availability

Nektar++ is open-source software, released under the MIT license, and is freely available from the project website (https://www.nektar.info/). While the git repository is freely accessible and can be found at https://gitlab.nektar.info/, discrete releases are made at milestones in the project and are available to download as compressed tar archives, or as binary packages for a range of operating systems. These releases are considered to contain relatively complete and stable functionality compared with the repository master branch.

7 Conclusions

In this paper, we have reviewed the latest features and enhancements of the Nektar++ version 5.0 release. A key theme of our work in this release has been to evolve the fundamental design of the software detailed in our previous publication nektarpp2015 towards providing an enabling tool for efficient high-fidelity simulations in various scientific areas. To this end, this latest version of Nektar++ provides a complete pipeline of tools: from pre-processing with NekMesh and a new parallel I/O interface for mesh and field representations; through new solvers and improvements to existing ones, enabled by numerical developments such as spatially-variable polynomial order and the global mapping technique; to parallel post-processing and in-situ processing with the FieldConvert utility. This gives scientific end-users a tool for efficient high-fidelity simulations in a number of fields, such as the applications we discuss in Section 5.

Although this version represents a major milestone in the development of Nektar++, there is still clear scope for future work. A particular area of focus remains the efficient use of many-core CPU and GPU systems, recognising that optimisation and performance on an increasingly diverse range of hardware presents a major challenge. Initial research in this area has investigated the use of matrix-free methods as a potential route towards fully utilising computational hardware even on unstructured grids, by combining efficient sum-factorisation techniques with the tensor-product bases for unstructured elements presented in ShKa95 . From the perspective of code maintainability, we have also investigated various performance-portable programming models in the context of mesh generation eichstadt-2018 and implicit solvers eichstadt-2019 . Looking towards the next major release of Nektar++, we envision using these studies as a guideline for implementing efficient operators for the spectral/hp element method, whilst retaining ease of use for the development of increasingly efficient solvers.

Acknowledgements

The development of Nektar++ has been supported by a number of funding agencies including the Engineering and Physical Sciences Research Council (grants EP/R029423/1, EP/R029326/1, EP/L000407/1, EP/K037536/1, EP/K038788/1, EP/L000261/1, EP/I037946/1, EP/H000208/1, EP/I030239/1, EP/H050507/1, EP/D044073/1, EP/C539834/1), the British Heart Foundation (grants FS/11/22/28745 and RG/10/11/28457), the Royal Society of Engineering, European Union FP7 and Horizon 2020 programmes (grant nos. 265780, 671571 and 675008), McLaren Racing, the National Science Foundation (IIS-0914564, IIS-1212806 and DMS-1521748), the Army Research Office (W911NF-15-1-0222), the Air Force Office of Scientific Research and the Department of Energy. HX acknowledges support from the NSF Grants 91852106 and 91841303. SC acknowledges the support of the National Research Foundation of Korea (No. 2016R1D1A1A02937255). KL acknowledges the Seventh Framework Programme FP7 Grant No. 312444 and German Research Foundation (DFG) Grant No. JA 544/37-2.

Appendix A Supplementary material

File: user-guide.pdf

Figure 15: User guide for Nektar++ detailing compilation, installation, input format and usage, including examples.

File: insitu.zip

Figure 16: Nektar++ input files for simulating flow past a cylinder using the IncNavierStokesSolver. This example uses the HDF5 input format described in Section 3.1 and the in-situ processing facilities of Section 3.2 to generate an animation of the vorticity field, showing the von Kármán vortex shedding in this regime.

File: adaptiveOrder.zip

Figure 17: Nektar++ input files for simulating flow over a NACA0012 wing using the IncNavierStokesSolver. This example uses the adaptive-in-time polynomial order described in Section 4.2 to increase the efficiency of the simulation compared to a spatially-constant polynomial order.

File: wavyWing.zip

Figure 18: Nektar++ input files for simulating flow over a wavy NACA0012 wing using the IncNavierStokesSolver. This example uses the mapping technique described in Section 4.3 to perform the simulation in a quasi-3D setting.

File: acoustic.zip

Figure 19: Nektar++ input files for simulating the spinning vortex pair using the AcousticSolver of Section 5.2 at polynomial order P = 5, accelerated using the Collections library described in Section 3.3.

File: meshGen.zip

Figure 20: NekMesh input files for a NACA0012 aerofoil section and T106C turbine blade geometry outlined in Section 5.1.

File: vivCylFlow.zip

Figure 21: Nektar++ input files for simulating flow over a flexible cylinder using the IncNavierStokesSolver. This example uses the thick strip model outlined in Section 5.3 to reduce computational cost compared with a full 3D simulation.

File: shockBL.zip

Figure 22: Nektar++ input files for simulating a shock boundary-layer interaction test case at the Reynolds number, Mach number and shock angle outlined in Section 5.4.

References


  • (1) R. C. Moura, G. Mengaldo, J. Peiró, S. Sherwin, On the eddy-resolving capability of high-order discontinuous Galerkin approaches to implicit LES/under-resolved DNS of Euler turbulence, Journal of Computational Physics 330 (2017) 615–623.
  • (2) R. C. Moura, G. Mengaldo, J. Peiró, S. J. Sherwin, An LES setting for DG-based implicit LES with insights on dissipation and robustness, in: Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2016, Springer, 2017, pp. 161–173.
  • (3) G. Mengaldo, R. Moura, B. Giralda, J. Peiró, S. Sherwin, Spatial eigensolution analysis of discontinuous Galerkin schemes with practical insights for under-resolved computations and implicit LES, Computers & Fluids 169 (2018) 349–364.
  • (4) G. Mengaldo, D. De Grazia, R. C. Moura, S. J. Sherwin, Spatial eigensolution analysis of energy-stable flux reconstruction schemes and influence of the numerical flux on accuracy and robustness, Journal of Computational Physics 358 (2018) 1–20.
  • (5) P. Fernandez, R. C. Moura, G. Mengaldo, J. Peraire, Non-modal analysis of spectral element methods: Towards accurate and robust large-eddy simulations, Computer Methods in Applied Mechanics and Engineering 346 (2019) 43–62.
  • (6) P. E. Vos, S. J. Sherwin, R. M. Kirby, From h to p efficiently: Implementing finite and spectral/hp element methods to achieve optimal performance for low- and high-order discretisations, Journal of Computational Physics 229 (13) (2010) 5161–5181.
  • (7) C. D. Cantwell, S. J. Sherwin, R. M. Kirby, P. H. J. Kelly, From h to p efficiently: strategy selection for operator evaluation on hexahedral and tetrahedral elements, Computers & Fluids 43 (2011) 23–28.
  • (8) C. D. Cantwell, S. J. Sherwin, R. M. Kirby, P. H. J. Kelly, From h to p efficiently: selecting the optimal spectral/hp discretisation in three dimensions, Math. Mod. Nat. Phenom. 6 (2011) 84–96.
  • (9) G. J. Gassner, A skew-symmetric discontinuous Galerkin spectral element discretization and its relation to SBP-SAT finite difference methods, SIAM Journal on Scientific Computing 35 (3) (2013) A1233–A1253.
  • (10) G. Mengaldo, D. De Grazia, P. E. Vincent, S. J. Sherwin, On the connections between discontinuous Galerkin and flux reconstruction schemes: extension to curvilinear meshes, Journal of Scientific Computing 67 (3) (2016) 1272–1292.
  • (11) G. Mengaldo, Discontinuous spectral/hp element methods: development, analysis and applications to compressible flows, Ph.D. dissertation (2015).
  • (12) G. E. Karniadakis, S. J. Sherwin, Spectral/hp element methods for CFD, Oxford University Press, 2005.
  • (13) M. Turner, J. Peiró, D. Moxey, Curvilinear mesh generation using a variational framework, Computer-Aided Design 103 (2018) 73–91.
  • (14) P. Fischer, J. Kruse, J. Mullen, H. Tufo, J. Lottes, S. Kerkemeier, Nek5000: open source spectral element CFD solver, Argonne National Laboratory, Mathematics and Computer Science Division, Argonne, IL, see https://nek5000.mcs.anl.gov (2008).
  • (15) H. M. Blackburn, S. Sherwin, Formulation of a Galerkin spectral element–Fourier method for three-dimensional incompressible flows in cylindrical geometries, Journal of Computational Physics 197 (2) (2004) 759–778.
  • (16) W. Bangerth, R. Hartmann, G. Kanschat, deal.II–a general-purpose object-oriented finite element library, ACM Transactions on Mathematical Software (TOMS) 33 (4) (2007) 24.
  • (17) F. Hindenlang, G. J. Gassner, C. Altmann, A. Beck, M. Staudenmaier, C.-D. Munz, Explicit discontinuous Galerkin methods for unsteady problems, Computers & Fluids 61 (2012) 86–93.
  • (18) G. J. Gassner, A. R. Winters, D. A. Kopriva, Split form nodal discontinuous Galerkin schemes with summation-by-parts property for the compressible Euler equations, Journal of Computational Physics 327 (2016) 39–66.
  • (19) F. X. Giraldo, M. Restelli, A study of spectral element and discontinuous Galerkin methods for the Navier–Stokes equations in nonhydrostatic mesoscale atmospheric modeling: Equation sets and test cases, Journal of Computational Physics 227 (8) (2008) 3849–3877.
  • (20) D. S. Abdi, F. X. Giraldo, Efficient construction of unified continuous and discontinuous Galerkin formulations for the 3D Euler equations, Journal of Computational Physics 320 (2016) 46–68.
  • (21) F. Witherden, A. Farrington, P. Vincent, PyFR: An open source framework for solving advection–diffusion type problems on streaming architectures using the flux reconstruction approach, Computer Physics Communications 185 (2014) 3028–3040. doi:10.1016/j.cpc.2014.07.011.
  • (22) H. T. Huynh, A flux reconstruction approach to high-order schemes including discontinuous Galerkin methods, in: 18th AIAA Computational Fluid Dynamics Conference, 2007, p. 4079.
  • (23) Y. Allaneau, A. Jameson, Connections between the filtered discontinuous Galerkin method and the flux reconstruction approach to high order discretizations, Computer Methods in Applied Mechanics and Engineering 200 (49-52) (2011) 3628–3636.
  • (24) A. Dedner, R. Klöfkorn, M. Nolte, M. Ohlberger, A generic interface for parallel and adaptive discretization schemes: abstraction principles and the DUNE-FEM module, Computing 90 (3-4) (2010) 165–196.
  • (25) A. Bolis, C. D. Cantwell, D. Moxey, D. Serson, S. Sherwin, An adaptable parallel algorithm for the direct numerical simulation of incompressible turbulent flows using a fourier spectral/hp element method and mpi virtual topologies, Computer Physics Communications 206 (2016) 17–25.
  • (26) P. E. Vos, C. Eskilsson, A. Bolis, S. Chun, R. M. Kirby, S. J. Sherwin, A generic framework for time-stepping partial differential equations (PDEs): general linear methods, object-oriented implementation and application to fluid problems, International Journal of Computational Fluid Dynamics 25 (3) (2011) 107–125.
  • (27) C. D. Cantwell, S. Yakovlev, R. M. Kirby, N. S. Peters, S. J. Sherwin, High-order spectral/hp element discretisation for reaction-diffusion problems on surfaces: Application to cardiac electrophysiology, Journal of Computational Physics 257 (2014) 813–829.
  • (28) G. Mengaldo, D. De Grazia, D. Moxey, P. E. Vincent, S. Sherwin, Dealiasing techniques for high-order spectral element methods on regular and irregular grids, Journal of Computational Physics 299 (2015) 56–81.
  • (29) A. R. Winters, R. C. Moura, G. Mengaldo, G. J. Gassner, S. Walch, J. Peiro, S. J. Sherwin, A comparative study on polynomial dealiasing and split form discontinuous Galerkin schemes for under-resolved turbulence computations, Journal of Computational Physics 372 (2018) 1–21.
  • (30) R. M. Kirby, S. J. Sherwin, Stabilisation of spectral/hp element methods through spectral vanishing viscosity: Application to fluid mechanics modelling, Computer Methods in Applied Mechanics and Engineering 195 (23-24) (2006) 3128–3144.
  • (31) J.-E. W. Lombard, D. Moxey, S. J. Sherwin, J. F. Hoessler, S. Dhandapani, M. J. Taylor, Implicit large-eddy simulation of a wingtip vortex, AIAA Journal 54 (2) (2015) 506–518.
  • (32) C. D. Cantwell, D. Moxey, A. Comerford, A. Bolis, G. Rocco, G. Mengaldo, D. de Grazia, S. Yakovlev, J.-E. Lombard, D. Ekelschot, B. Jordi, H. Xu, Y. Mohamied, C. Eskilsson, B. Nelson, P. Vos, C. Biotto, R. M. Kirby, S. J. Sherwin, Nektar++: An open-source spectral/hp element framework, Computer Physics Communications 192 (2015) 205–219. doi:10.1016/j.cpc.2015.02.008.
  • (33) H. Xu, C. D. Cantwell, C. Monteserin, C. Eskilsson, A. P. Engsig-Karup, S. J. Sherwin, Spectral/hp element methods: Recent developments, applications, and perspectives, Journal of Hydrodynamics 30 (1) (2018) 1–22.
  • (34) M. Dubiner, Spectral methods on triangles and other domains, Journal of Scientific Computing 6 (4) (1991) 345–390.
  • (35) S. J. Sherwin, G. E. Karniadakis, A triangular spectral element method; applications to the incompressible Navier-Stokes equations, Computer Methods in Applied Mechanics and Engineering 123 (1-4) (1995) 189–229.
  • (36) M. G. Duffy, Quadrature over a pyramid or cube of integrands with a singularity at a vertex, SIAM journal on Numerical Analysis 19 (6) (1982) 1260–1262.
  • (37) F. Bassi, S. Rebay, High-order accurate discontinuous finite element solution of the 2d Euler equations, Journal of Computational Physics 138 (2) (1997) 251–285.
  • (38) D. Moxey, C. D. Cantwell, R. M. Kirby, S. J. Sherwin, Optimizing the performance of the spectral/hp element method with collective linear algebra operations, Comput. Meth. Appl. Mech. Eng. 310 (2016) 628–645. doi:10.1016/j.cma.2016.07.001.
  • (39) D. Moxey, R. Amici, R. M. Kirby, Efficient matrix-free high-order finite element evaluation for simplicial elements, under review in SIAM J. Sci. Comp. (2019).
  • (40) S. Yakovlev, D. Moxey, S. J. Sherwin, R. M. Kirby, To CG or to HDG: a comparative study in 3D, Journal of Scientific Computing 67 (1) (2016) 192–220.
  • (41) B. Cockburn, C.-W. Shu, The local discontinuous Galerkin method for time-dependent convection-diffusion systems, SIAM Journal on Numerical Analysis 35 (6) (1998) 2440–2463.
  • (42) M. Folk, G. Heber, Q. Koziol, E. Pourmal, D. Robinson, An overview of the HDF5 technology suite and its applications, in: Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases, ACM, 2011, pp. 36–47.
  • (43) M. Bareford, N. Johnson, M. Weiland, Improving Nektar++ I/O performance for the Cray XC architecture, in: Cray User Group Proceedings, Stockholm, Sweden, 2018.
  • (44) C. Chevalier, F. Pellegrini, PT-Scotch: A tool for efficient parallel graph ordering, Parallel computing 34 (6-8) (2008) 318–331.
  • (45) K. Lackhove, Hybrid Noise Simulation for Enclosed Configurations, Doctoral thesis, Technische Universität Darmstadt (2018).
  • (46) M. Germano, Differential filters for the large eddy numerical simulation of turbulent flows, Physics of Fluids 29 (6) (1986) 1755. doi:10.1063/1.865649.
  • (47) A. Refloch, B. Courbet, A. Murrone, C. Laurent, J. Troyes, G. Chaineray, J. B. Dargaud, F. Vuillot, CEDRE Software, AerospaceLab (2011).
  • (48) F. Duchaine, S. Jauré, D. Poitou, E. Quémerais, G. Staffelbach, T. Morel, L. Gicquel, Analysis of high performance conjugate heat transfer with the OpenPALM coupler, Computational Science & Discovery 8 (1) (7 2015). doi:10.1088/1749-4699/8/1/015003.
  • (49) D. Abrahams, R. W. Grosse-Kunstleve, Building hybrid systems with Boost.Python, C/C++ Users Journal 21 (7) (2003) 29–36.
  • (50) P. Peterson, F2py: a tool for connecting fortran and python programs, International Journal of Computational Science and Engineering 4 (4) (2009) 296–305.
  • (51) D. M. Beazley, et al., SWIG: An easy to use tool for integrating scripting languages with C and C++, in: Tcl/Tk Workshop, 1996, p. 43.
  • (52) Élie Cartan, Riemannian geometry in an orthogonal frame, World Scientific Pub. Co. Inc., 2002.
  • (53) S. Chun, Method of moving frames to solve conservation laws on curved surfaces, Journal of Scientific Computing 53 (2) (2012) 268–294.
  • (54) S. Chun, Method of moving frames to solve (an)isotropic diffusion equations on curved surfaces, Journal of Scientific Computing 59 (3) (2013) 626–666.
  • (55) S. Chun, C. Eskilsson, Method of moving frames to solve the shallow water equations on arbitrary rotating curved surfaces, Journal of Computational Physics 333 (2017) 1–23.
  • (56) S. Chun, Method of moving frames to solve the time-dependent Maxwell’s equations on anisotropic curved surfaces: Applications to invisible cloak and ELF propagation, Journal of Computational Physics 340 (2017) 85–104.
  • (57) S. Chun, J. Marcon, J. Peiró, S. J. Sherwin, High-order curvilinear mesh in the numerical solution of PDEs with moving frames on the sphere, submitted.
  • (58) S. Chun, C. Cantwell, A PDE-induced connection of moving frames for building Atlas in application to the analysis of the electric flow on the atrium, in preparation.
  • (59) D. Moxey, C. D. Cantwell, G. Mengaldo, D. Serson, D. Ekelschot, J. Peiró, S. J. Sherwin, R. M. Kirby, Towards p-adaptive spectral/hp element methods for modelling industrial flows, in: M. L. Bittencourt, N. A. Dumont, J. S. Hesthaven (Eds.), Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2016, Springer International Publishing, 2017, pp. 63–79.
  • (60) P.-O. Persson, J. Peraire, Sub-cell shock capturing for discontinuous galerkin methods, in: 44th AIAA Aerospace Sciences Meeting and Exhibit, 2006, p. 112.
  • (61) D. Serson, J. R. Meneghini, S. J. Sherwin, Velocity-correction schemes for the incompressible Navier–Stokes equations in general coordinate systems, Journal of Computational Physics 316 (2016) 243 – 254.
  • (62) D. Serson, J. R. Meneghini, S. J. Sherwin, Direct numerical simulations of the flow around wings with spanwise waviness at a very low Reynolds number, Computers & Fluids 146 (2017) 117 – 124.
  • (63) D. Serson, J. R. Meneghini, S. J. Sherwin, Direct numerical simulations of the flow around wings with spanwise waviness, Journal of Fluid Mechanics 826 (2017) 714 –– 731.
  • (64) S. J. Sherwin, J. Peiró, Mesh generation in curvilinear domains using high-order elements, Int. J. Numer. Meth. Engng 53 (2002) 207–223.
  • (65) Open Cascade SAS, Open Cascade (2019).
  • (66) J. Marcon, M. Turner, J. Peiró, D. Moxey, C. Pollard, H. Bucklow, M. Gammon, High-order curvilinear hybrid mesh generation for CFD simulations, in: 2018 AIAA Aerospace Sciences Meeting, American Institute of Aeronautics and Astronautics, Reston, Virginia, 2018, pp. 1403–. doi:10.2514/6.2018-1403.
  • (67) J. Marcon, J. Peiró, D. Moxey, N. Bergemann, H. Bucklow, M. Gammon, A semi-structured approach to curvilinear mesh generation around streamlined bodies, in: AIAA Scitech 2019 Forum, American Institute of Aeronautics and Astronautics, Reston, Virginia, 2019, pp. 1725–. doi:10.2514/6.2019-1725.
  • (68) M. Turner, D. Moxey, J. Peiró, M. Gammon, C. Pollard, H. Bucklow, A framework for the generation of high-order curvilinear hybrid meshes for CFD simulations, Procedia Engineering 203 (2017) 206–218. doi:10.1016/j.proeng.2017.09.808.
  • (69) C. Geuzaine, J.-F. Remacle, Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities, International Journal for Numerical Methods in Engineering 79 (11) (2009) 1309–1331. doi:10.1002/nme.2579.
  • (70) D. Moxey, M. D. Green, S. J. Sherwin, J. Peiró, An isoparametric approach to high-order curvilinear boundary-layer meshing, Computer Methods in Applied Mechanics and Engineering 283 (2015) 636–650. doi:10.1016/j.cma.2014.09.019.
  • (71) D. Moxey, M. Hazan, J. Peiró, S. J. Sherwin, On the generation of curvilinear meshes through subdivision of isoparametric elements, to appear in proceedings of Tetrahedron IV (Jan. 2014).
  • (72) T. Colonius, S. K. Lele, Computational aeroacoustics: Progress on nonlinear problems of sound generation, Progress in Aerospace Sciences 40 (6) (2004) 345–416. doi:10.1016/j.paerosci.2004.09.001.
  • (73) C. K. W. Tam, Recent advances in computational aeroacoustics, Fluid Dynamics Research 38 (9) (2006) 591–615. doi:10.1016/j.fluiddyn.2006.03.006.
  • (74) R. Ewert, W. Schröder, Acoustic perturbation equations based on flow decomposition via source filtering, Journal of Computational Physics 188 (2) (2003) 365–398. doi:10.1016/S0021-9991(03)00168-2.
  • (75) E.-A. Müller, F. Obermeier, The spinning vortices as a source of sound, in: AGARD CP-22, 1967, pp. 21–22.
  • (76) J. Chaplin, P. Bearman, Y. Cheng, E. Fontaine, J. Graham, K. Herfjord, F. H. Huarte, M. Isherwood, K. Lambrakos, C. Larsen, et al., Blind predictions of laboratory measurements of vortex-induced vibrations of a tension riser, Journal of Fluids and Structures 21 (1) (2005) 25–40.
  • (77) R. Willden, J. Graham, Numerical prediction of VIV on long flexible circular cylinders, Journal of Fluids and Structures 15 (3) (2001) 659–669.
  • (78) Y. Bao, R. Palacios, M. Graham, S. Sherwin, Generalized thick strip modelling for vortex-induced vibration of long flexible cylinders, Journal of Computational Physics 321 (2016) 1079–1097.
  • (79) D. Newman, G. Karniadakis, A direct numerical simulation study of flow past a freely vibrating cable, Journal of Fluid Mechanics 344 (1997) 95–136.
  • (80) G. M. Laskowski, J. Kopriva, V. Michelassi, S. Shankaran, U. Paliath, R. Bhaskaran, Q. Wang, C. Talnikar, Z. J. Wang, F. Jia, Future Directions of High Fidelity CFD for Aerothermal Turbomachinery Analysis and Design, in: 46th AIAA Fluid Dynamics Conference, Washington, D.C., USA, 2016, pp. 1–30.
  • (81) E. Tadmor, Convergence of spectral methods for nonlinear conservation laws, SIAM Journal on Numerical Analysis 26 (1) (1989) 30–44.
  • (82) A. Cassinelli, F. Montomoli, P. Adami, S. J. Sherwin, High Fidelity Spectral/hp Element Methods for Turbomachinery, in: ASME Paper No. GT2018-75733, 2018, pp. 1–12.
  • (83) G. E. Karniadakis, M. Israeli, S. A. Orszag, High-order splitting methods for the incompressible Navier-Stokes equations, Journal of Computational Physics 97 (2) (1991) 414–443.
  • (84) S. Dong, G. E. Karniadakis, C. Chryssostomidis, A robust and accurate outflow boundary condition for incompressible flow simulations on severely-truncated unbounded domains, Journal of Computational Physics 261 (2014) 83–105.
  • (85) A. Cassinelli, H. Xu, F. Montomoli, P. Adami, R. Vazquez Diaz, S. J. Sherwin, On the effect of inflow disturbances on the flow past a linear LPT vane using spectral/hp element methods, in: ASME Paper No. GT2019-91622, 2019, pp. 1–12.
  • (86) G. Degrez, C. Boccadoro, J. Wendt, The interaction of an oblique shock wave with a laminar boundary layer revisited. An experimental and numerical study, Journal of Fluid Mechanics 177 (1987) 247–263.
  • (87) J.-P. Boin, J. Robinet, C. Corre, H. Deniau, 3D steady and unsteady bifurcations in a shock-wave/laminar boundary layer interaction: a numerical study, Theoretical and Computational Fluid Dynamics 20 (3) (2006) 163–180.
  • (88) F. M. White, Viscous fluid flow, McGraw-Hill New York, 2006.
  • (89) G. Mengaldo, D. De Grazia, F. Witherden, A. Farrington, P. Vincent, S. Sherwin, J. Peiro, A guide to the implementation of boundary conditions in compact high-order methods for compressible aerodynamics, in: 7th AIAA Theoretical Fluid Mechanics Conference, 2014, p. 2923.
  • (90) E. Eckert, Engineering relations for friction and heat transfer to surfaces in high velocity flow, Journal of the Aeronautical Sciences 22 (8) (1955) 585–587.
  • (91) J. Eichstädt, M. Green, M. Turner, J. Peiró, D. Moxey, Accelerating high-order mesh generation with an architecture-independent programming model, Comp. Phys. Comm. 229 (2018) 36–53.
  • (92) J. Eichstädt, M. Vymazal, D. Moxey, J. Peiró, A comparison of the shared-memory parallel programming models OpenMP, OpenACC and Kokkos in the context of implicit solvers for high-order FEM, under review in Comp. Phys. Comm. (2019).