ExaHyPE: An Engine for Parallel Dynamically Adaptive Simulations of Wave Problems

05/20/2019
by   Anne Reinarz, et al.

ExaHyPE ("An Exascale Hyperbolic PDE Engine") is a software engine for solving systems of first-order hyperbolic partial differential equations (PDEs). Hyperbolic PDEs are typically derived from the conservation laws of physics and are useful in a wide range of application areas. Applications powered by ExaHyPE can be run on a student's laptop, but are also able to exploit thousands of processor cores on state-of-the-art supercomputers. The engine is able to dynamically increase the accuracy of the simulation using adaptive mesh refinement where required. Due to the robustness and shock capturing abilities of ExaHyPE's numerical methods, users of the engine can simulate linear and non-linear hyperbolic PDEs with very high accuracy. Users can tailor the engine to their particular PDE by specifying evolved quantities, fluxes, and source terms. A complete simulation code for a new hyperbolic PDE can often be realised within a few hours - a task that, traditionally, can take weeks, months, often years for researchers starting from scratch. In this paper, we showcase ExaHyPE's workflow and capabilities through real-world scenarios from our two main application areas: seismology and astrophysics.


1 Introduction

The study of waves has always been an important subject of research. It is not difficult to see why: Earthquakes and tsunamis have a direct and serious impact on the daily lives of millions of people. A better understanding of electromagnetic waves has enabled ever faster wireless communication. The study of gravitational waves has allowed new insight into the composition and history of our Universe. These physical phenomena, despite arising in different fields of physics and engineering, can be modelled in a similar way from a mathematical perspective: as a system of hyperbolic partial differential equations (PDEs). The consortium behind the ExaHyPE project (“An Exascale Hyperbolic PDE Engine”) translated this structural similarity into a software engine for modelling and simulating a wide range of hyperbolic PDE systems. The consortium focuses on two challenging scenarios: long-range seismic risk assessment, see e.g. Duru:2017 ; tavelli:dim , and the search for gravitational waves emitted by binary neutron stars, see e.g. Bishop:2016 ; Tsorkaros:2016 ; dumbser:2017 ; zanotti:2016 ; fambri:2017 .

ExaHyPE implements a high-order discontinuous Galerkin (DG) approach. DG schemes were first introduced by Reed et al. reed:1973 for the neutron transport equation and subsequently extended to general hyperbolic systems in a series of papers by Cockburn et al. shu:1 ; shu:2 ; shu:3 ; shu:4 ; shu:5 . Within our DG framework, higher-order accuracy in time and space is achieved using the arbitrary high-order accurate ADER-DG approach first introduced by Toro and Titarev Titarev:2002 . In the original ADER approach, high-order accuracy in time is achieved by using the Cauchy-Kowalewski procedure to replace time derivatives with spatial derivatives. This procedure is very efficient for linear problems Gassner:2011:ExplicitOneStep , but becomes cumbersome for non-linear problems. Moreover, this procedure cannot deal with stiff source terms. To tackle these shortcomings, an alternative ADER approach was introduced by Dumbser et al. Dumbser:2008 . In this approach, the Cauchy-Kowalewski procedure is replaced by an implicit solve of a cell-local space-time weak formulation of the PDE. This removes the problem dependency of the approach and allows the handling of stiff source terms. ExaHyPE provides an implementation of both ADER-DG variants.

In non-linear hyperbolic PDE systems, discontinuities and steep gradients can arise even from smooth initial conditions. High-order methods may then produce spurious oscillations which decrease the approximation quality and may even render the computed solution unphysical. To remedy this issue, a wide range of limiters for DG schemes have been proposed. Notably, these include approaches based on artificial viscosity hartmann:2002 ; persson:2006 , filtering rezzolla:2011 and WENO- or HWENO-based reconstruction shu:WENO ; shu:HWENO . The approach taken within the ExaHyPE engine is multi-dimensional optimal order detection (MOOD). This approach was initially applied to finite volume schemes loubere:1 ; loubere:2 ; loubere:3 and has recently been extended to DG schemes loubere:4 . In this approach, the solution is checked a-posteriori for certain admissibility or plausibility criteria and is marked troubled if it does not meet them. Troubled cells are then recalculated with a more robust finite volume scheme. The MOOD approach bypasses limitations of a-priori detection of troubled zones. We identify problematic regions after each time step and roll back to a more robust scheme on demand. This means that we do not need to reliably know all areas with issues a-priori. This approach allows for good resolution of shocks and other discontinuities Dumbser:14:Posteriori .

In the developed engine, problems are discretised on tree-structured fully adaptive Cartesian meshes provided by the Peano framework Weinzierl:11:Peano ; Weinzierl:2019 . Dynamic adaptive mesh refinement (AMR) enhances the shock-capturing abilities of the ADER-DG scheme further Zanotti:2015:SpaceTimeAMR and allows good resolution of local features.

ExaHyPE comprises the following key features:

  • High-order ADER discontinuous Galerkin (ADER-DG) with a-posteriori subcell limiting and finite volume (FV) schemes;

  • Dynamic mesh refinement on Cartesian grids in two and three dimensions;

  • A simple API that allows users to quickly realise complex applications;

  • User-provided code can be written in Fortran or C++;

  • Automatically generated architecture- and application-specific optimised ADER-DG routines;

  • Shared memory parallelisation through Intel’s Threading Building Blocks (TBB);

  • Distributed memory parallelisation with MPI.

Furthermore, ExaHyPE offers a wide range of post-processing and plotting facilities such as support for vtk. The software can be compiled with Intel and GNU compilers and provides a switch for choosing different compilation modes: Our release mode aggressively optimises the application, while the assertion and debug compilation modes activate the assertions within the code and print additional output. Users and developers can write log filters to filter output relevant to them. For continuous testing a Jenkins server jenkins verifies that the code compiles with all parallelisation features in all different compilation modes. The server is currently being extended to also run automatic convergence and scalability studies.

Our paper starts with an overview of the problem setting using several examples. We then briefly sketch the solution methods used in ExaHyPE, focusing on the ADER-DG algorithm. Next we use a simple example to demonstrate the workflow when using ExaHyPE and describe the engine architecture. We conclude with a sequence of numerical examples from various application areas to demonstrate the capabilities of the engine.

2 Problem Formulation

We consider hyperbolic systems of balance laws that may contain both conservative and non-conservative terms. They have to be given in the following first-order form:

\mathbf{P}\,\frac{\partial \mathbf{Q}}{\partial t}
  + \nabla \cdot \mathbf{F}(\mathbf{Q}, \nabla \mathbf{Q})
  + \mathbf{B}(\mathbf{Q}) \cdot \nabla \mathbf{Q}
  = \mathbf{S}(\mathbf{Q}) + \sum_{i} \delta_i ,
\qquad (1)

where Q = Q(x,t) ∈ R^q, x ∈ Ω, is the state vector of the q conserved variables, Ω ⊂ R^d is the computational domain, P is the material matrix, F is the flux tensor, and B(Q) · ∇Q represents its non-conservative part. Finally, S(Q) is the source term and the δ_i are the given point sources.

Hyperbolic systems in the form (1) can be used to model a wide range of applications that involve waves. In the following, we demonstrate the versatility of our formulation by introducing equations from three different application areas. Numerical experiments for these examples are provided in Section 6.

2.1 Waves in Elastic Media

Linear elasticity models velocities, stresses and strains in a heterogeneous medium, driven by Hooke’s law and the conservation of momentum. Following the form of equation (1) we write them in velocity-stress form as

\frac{\partial \sigma}{\partial t} - E(\lambda,\mu)\,\nabla v = 0 ,
\qquad
\rho\,\frac{\partial v}{\partial t} - \nabla \cdot \sigma = 0 ,

where ρ denotes the mass density, v the velocity and σ the stress tensor, which can be written in terms of its six independent components as σ = (σ_xx, σ_yy, σ_zz, σ_xy, σ_xz, σ_yz)^T. In this paper we consider isotropic materials, i.e. the material matrix depends only on the two Lamé constants λ and μ of the material. However, the formulation holds also for more general anisotropic materials.

These equations can be used to simulate seismic waves, such as earthquakes. In this context the restriction of ExaHyPE to Cartesian meshes seems to be restrictive. However, adaptive Cartesian meshes can be extended to allow the modelling of complex topography. Two methods have been realised in ExaHyPE to represent complex topographies. The first approach treats ExaHyPE’s adaptive Cartesian mesh as reference domain that is mapped to a complex topography via high-order curvilinear transformations duru:curvilinear . The second approach, a diffuse interface method, represents the topography as a smooth field tavelli:dim . These approaches are described in more detail in Section 6.2.

2.2 Shallow Water Equations

In atmospheric and oceanic modelling of coastal areas, horizontal length scales are typically significantly greater than the vertical length scale. In this case, fluid flow can be modelled with the two-dimensional shallow water equations instead of the more complicated three-dimensional Navier-Stokes equations.

Following the form of equation (1) they can be written as

\frac{\partial}{\partial t}
\begin{pmatrix} h \\ hu \\ hv \\ b \end{pmatrix}
+ \nabla \cdot
\begin{pmatrix} hu & hv \\ hu^{2} & huv \\ huv & hv^{2} \\ 0 & 0 \end{pmatrix}
+ \begin{pmatrix} 0 \\ hg\,\partial_{x}(b+h) \\ hg\,\partial_{y}(b+h) \\ 0 \end{pmatrix}
= 0 ,
\qquad (2)

where h denotes the height of the water column, (u, v) the horizontal flow velocity, g the gravity and b the bathymetry.

Hyperbolic systems of balance laws have non-trivial equilibrium solutions in which flux and source terms cancel. A well-balanced numerical scheme is capable of maintaining such an equilibrium state. In Section 3 we describe the ADER-DG scheme used in ExaHyPE and in Section 6 we will describe how to keep the scheme well balanced while allowing wetting and drying.
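A classical example of such an equilibrium is the lake-at-rest state, in which the fluid is at rest and the free surface is flat; for (2) the flux and the non-conservative term then cancel exactly, as the following standard relations summarise.

% Lake-at-rest equilibrium: zero velocity and a flat free surface.
u = v = 0, \qquad h(x,y) + b(x,y) = \text{const}
\quad \Longrightarrow \quad
\partial_t h = \partial_t (hu) = \partial_t (hv) = 0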

2.3 Compressible Navier-Stokes Equations

ExaHyPE is extensible to non-hyperbolic equations. As an example of such an extension we show the compressible Navier-Stokes equations. They are used to model the dynamics of a viscous fluid, and are given by

\frac{\partial}{\partial t}
\begin{pmatrix} \rho \\ \rho v \\ \rho E \end{pmatrix}
+ \nabla \cdot
\begin{pmatrix} \rho v \\ v \otimes \rho v + I p + \sigma(Q, \nabla Q) \\
  v \cdot \left( I \rho E + I p + \sigma(Q, \nabla Q) \right) - \kappa \nabla T \end{pmatrix}
= \begin{pmatrix} 0 \\ -k\,\rho g \\ 0 \end{pmatrix} ,
\qquad (3)

where ρ denotes the density, ρv the momentum, ρE the energy density, T the temperature and p the pressure (this term includes gravitational effects). The temperature diffusion is given by κ∇T with constant κ; these effects depend on the gradient of Q. In the source term, the vector k is the vertical unit vector and g is the gravitation of Earth. The viscous effects are modelled by the stress tensor σ(Q, ∇Q). Thus, the flux from (1) has been modified to allow ∇Q as input. More details on the implementation of these equations can be found in Krenz:19:ExaCloud .
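The concrete form of the stress tensor is not restated here; a common choice for a Newtonian fluid under the Stokes hypothesis, which we assume only for illustration, is

% Newtonian viscous stress under the Stokes hypothesis; the sign depends
% on the convention used in the flux and is not fixed here.
\sigma(Q,\nabla Q) \;=\; \mu \left( \nabla v + (\nabla v)^{\mathsf T} - \tfrac{2}{3}\,(\nabla \cdot v)\, I \right)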

2.4 General Relativistic Magneto-Hydrodynamics

The equations of classical magnetohydrodynamics (MHD) are used to model the dynamics of an electrically ideally conducting fluid with comparable hydrodynamic and electromagnetic forces. When modelling astrophysical objects with strong gravitational fields, e.g. neutron stars, it becomes necessary to model the background space-time as well. We use the standard 3+1 split to decompose the four-dimensional space-time manifold into 3D hyper-surfaces parameterised by a time coordinate t. The background space-time is introduced into the equations in the form of a non-conservative product.

Following the form of equation (1), they can be written in terms of a vector of conserved quantities together with the corresponding fluxes and non-conservative products; the individual quantities are described below.

The curved space-time is parameterised by several hyper-surface variables: the lapse α, the spatial metric tensor γ_ij, the shift vector β^i and the extrinsic curvature K_ij. The spatial metric tensor is given as a vector of its six independent components and has the determinant γ. Further, D is the conserved density, which is related to the rest mass density ρ by the Lorentz factor W, v is the fluid velocity, the Maxwell 3-energy momentum tensor enters the fluxes, S is the conserved momentum, B is the magnetic field and E is the conserved energy density. Finally, φ is an artificial scalar introduced to ensure a divergence-free magnetic field, and c_h is the characteristic velocity of the divergence cleaning. For more details on this formulation see e.g. koeppel:grmhd ; fambri:grmhd ; rezzolla:2013 .

3 Solver Components

This section briefly summarises the numerical algorithms used in ExaHyPE. For brevity, let us use a pared-down form of (1), in which only the flux term is non-trivial:

\frac{\partial \mathbf{Q}}{\partial t} + \nabla \cdot \mathbf{F}(\mathbf{Q}) = 0 .
\qquad (4)

Assume that (4) is subject to appropriate initial and boundary conditions,

\mathbf{Q}(x,0) = \mathbf{Q}_0(x) \quad \text{for } x \in \Omega,
\qquad
\mathbf{Q}(x,t) = \mathbf{Q}_B(x,t) \quad \text{for } x \in \partial\Omega, \; t \in (0,T).

3.1 Discretisation

ExaHyPE allows for Cartesian grids in two and three dimensions. The computational domain is divided into a grid using a space-tree construction scheme based on tripartitioning Weinzierl:11:Peano . Each cell is recursively refined, giving an adaptive Cartesian grid as shown in Figure 1. Meshes can be defined with an arbitrary number of elements in each direction on the coarsest level.

Figure 1: Left: Adaptive Cartesian mesh in 3 dimensions. The FV limiter is active on the finest level. Right: Peano space-filling curve running through a 2D mesh that has a similar refinement pattern.

3.2 Finite Volumes

Classically, systems of hyperbolic PDEs have been solved using finite difference or finite volume schemes. In a finite volume method, cell averages of the solution are evolved. Volume integrals in a PDE that contain a divergence term are converted to surface integrals using the divergence theorem. These terms are then evaluated as fluxes at the surfaces of each element. However, to achieve high-order accuracy in a finite volume scheme, large stencils and expensive recovery or reconstruction procedures are needed. Examples include essentially non-oscillatory (ENO) or weighted ENO (WENO) schemes, see e.g. abgrall:1994 ; dumbser:weno .
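For a single cell this leads to the familiar Godunov-type update, sketched here in one space dimension for a first-order scheme.

% First-order finite volume update of the cell average Q_i in 1D;
% \mathcal{F}_{i+1/2} is the numerical flux returned by a Riemann solver
% evaluated at the interface between cells i and i+1.
Q_i^{n+1} \;=\; Q_i^{n} - \frac{\Delta t}{\Delta x}
   \left( \mathcal{F}_{i+1/2} - \mathcal{F}_{i-1/2} \right),
\qquad
\mathcal{F}_{i+1/2} \;=\; \mathcal{F}\!\left( Q_i^{n}, Q_{i+1}^{n} \right)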

ExaHyPE provides access to classical Godunov-type schemes as well as MUSCL schemes leer:1997 . Further, users can use parts of the generic compute routines while implementing other parts on their own. For instance, in both the ADER-DG and finite volume schemes, users can overwrite the Riemann solver with their own implementation. The finite volume scheme in ExaHyPE always works on uniform grids of a specific size.

3.3 Ader-Dg

In a DG method the numerical solution for the state vector is represented within each mesh element by basis functions from a space of polynomials. The method operates on a weak form of the equation (4). We use the ADER-DG method as proposed by Dumbser et al. in Zanotti:2015 . The ADER-DG method, which was introduced by Toro and Titarev in 2002 Titarev:2002 , reaches high-order convergence in time and space element-locally: first, the integration in time is performed only within the element, neglecting the element interfaces; finally, a single correction step is performed to take the element interfaces into account.

At the start of a time step, the numerical solution of (4) is represented by a piecewise polynomial of degree N. The basis of polynomials is constructed using tensor products of Lagrange polynomials over Gauss-Legendre or Gauss-Lobatto points. The solution can thus be represented by q(N+1)^d coefficients on each cell, where q is the number of quantities and d the spatial dimension.
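In formulas, the element-local representation is the usual tensor-product Lagrange ansatz; the following is a generic sketch for three space dimensions, written in the notation introduced above.

% Element-local DG ansatz on a cell T at time t^n: tensor products of
% one-dimensional Lagrange polynomials l_k of degree N over the chosen
% quadrature nodes, with vector-valued coefficients \hat{u}_{ijk}.
u_h(x,y,z,t^n)\big|_{T} \;=\; \sum_{i,j,k=0}^{N} \hat{u}_{ijk}\; l_i(x)\, l_j(y)\, l_k(z),
\qquad \hat{u}_{ijk} \in \mathbb{R}^{q}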

Let us derive the weak form of (4). To do so we multiply with a space-time test function θ from the same space of piecewise polynomials as used for the solution q_h. Integrating over each element T and over the current time interval [t^n, t^{n+1}] then gives the element-local weak formulation:

\int_{t^n}^{t^{n+1}}\!\!\int_{T} \theta \, \frac{\partial q_h}{\partial t} \, dx \, dt
  + \int_{t^n}^{t^{n+1}}\!\!\int_{T} \theta \, \nabla \cdot \mathbf{F}(q_h) \, dx \, dt = 0 .
\qquad (5)

There are three phases per ADER-DG time step:

  1. Per grid cell T and time interval [t^n, t^{n+1}], we first implicitly solve (5) for the space-time predictor q_h using Picard iterations Dumbser:14:Posteriori . The concurrent solves of (5) do not take into account any information from neighbouring elements and thus yield jumps in q_h and F(q_h) along the cell faces.

  2. The second phase traverses all faces of the grid and computes a numerical normal flux from both adjacent cells. ExaHyPE uses a Rusanov flux by default; users can replace it with any other Riemann solver.

  3. In the third algorithmic phase, we traverse the cells again and solve

    \int_{T} \phi \left( u_h^{n+1} - u_h^{n} \right) dx
      = - \int_{t^n}^{t^{n+1}}\!\!\int_{\partial T} \phi \,\mathcal{F}(q_h^-, q_h^+) \cdot n \, ds \, dt
        + \int_{t^n}^{t^{n+1}}\!\!\int_{T} \nabla \phi \cdot \mathbf{F}(q_h) \, dx \, dt
    \qquad (6)

    for u_h^{n+1}, see Dumbser:2008 . The time step (6) is derived from spatially testing and partially integrating (4). Equation (6) can easily be inverted given that the ansatz and test space typically yield a diagonal mass matrix. A structural sketch of these three phases is given below.
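The following C++ sketch shows how the three phases could be arranged around a mesh. All types and function names are placeholders invented for this illustration and are not ExaHyPE or Peano interfaces; in ExaHyPE the phases are plugged into Peano's mesh traversals and may be fused, see Section 5.1.

#include <vector>

// Placeholder types for illustration only; the member functions are left
// undefined on purpose.
struct Cell {
  void spaceTimePredictor(double t, double dt);  // phase 1, eq. (5)
  void correctorUpdate(double dt);               // phase 3, eq. (6)
};
struct Face {
  void solveRiemannProblem();                    // phase 2, Rusanov by default
};
struct Mesh {
  std::vector<Cell> cells;
  std::vector<Face> faces;
};

// One ADER-DG time step, phase by phase.
void aderdgTimeStep(Mesh& mesh, double t, double dt) {
  // Phase 1: element-local space-time predictor; also extrapolates the
  // predicted solution onto the cell faces.
  for (Cell& cell : mesh.cells) cell.spaceTimePredictor(t, dt);

  // Phase 2: one Riemann solve per face, using the extrapolated predictor
  // values of both adjacent cells.
  for (Face& face : mesh.faces) face.solveRiemannProblem();

  // Phase 3: corrector step, advancing the solution from t^n to t^{n+1}.
  for (Cell& cell : mesh.cells) cell.correctorUpdate(dt);
}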

3.4 Time-Step Restrictions

Nonlinear effects and mesh adaptation require adjustments to the time step size during a simulation. This is expressed by the CFL condition, which gives an upper bound on the stable time step size for explicit DG schemes:

\Delta t \;\le\; \frac{C(N)}{d\,(2N+1)} \cdot \frac{h}{|\lambda_{\max}|} ,
\qquad (7)

where h and |λ_max| are the mesh size and the maximum eigenvalue (signal velocity), respectively, d is the number of space dimensions, and C(N) < 1 is a stability factor that depends on the polynomial order N Dumbser:2008 .
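In code, evaluating (7) amounts to a one-line formula per cell. The sketch below is illustrative only and uses no ExaHyPE identifiers; the stability factor C(N) is tabulated per polynomial order in practice.

#include <limits>

// A minimal sketch of a time-step computation based on (7).
double stableTimeStepSize(double h, double lambdaMax, int order,
                          double cflFactor, int dimensions) {
  // Guard against vanishing wave speeds (e.g. a quiescent state).
  if (lambdaMax <= 0.0) {
    return std::numeric_limits<double>::max();
  }
  return cflFactor * h / (dimensions * (2.0 * order + 1.0) * lambdaMax);
}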

3.5 A-Posteriori Limiting

The unlimited ADER-DG algorithm will suffer from numerical oscillations (Gibbs phenomenon) in the presence of steep gradients or shock waves. Therefore a limiter must be applied. The approach followed in ExaHyPE is based on the a-posteriori MOOD method of Loubère et al. loubere:4 . The solution is checked a-posteriori for certain admissibility or plausibility criteria and is recalculated with a robust FV scheme if it does not meet them. The FV patch size is chosen to be 2N+1 subcells per coordinate direction; this is the smallest subcell size that does not violate the CFL condition (7).

In contrast to the original approach, ExaHyPE’s approach incorporates the observation that cells usually require a recalculation with FV multiple time steps in a row after the initial check failed. Therefore, ExaHyPE implements the a-posteriori limiting ADER-DG method as a hybrid ADER-DG-FV method (Figure 2).

Figure 2: Left: Limiter status stencil. Two FV cells (red) are surrounded by (ADER-)DG cells (white). Cells in the interface layers compute with FV and project to DG (orange) or with DG and project to FV (yellow). Right: The FV and DG subdomains are initialised along a discontinuity.

As a-posteriori detection criteria we use

  1. Physical Admissibility Criteria: Depending on the system of PDEs being studied, certain physical constraints can be placed on the solution. For the shallow water equations this is positivity of the water height. These criteria are supplied by the user along with the PDE terms.

  2. Numerical Admissibility Criteria: To identify shocks we use a relaxed discrete maximum principle (DMP) in the sense of polynomials, for details see e.g. loubere:4 .

We call DG cells that do not satisfy the above criteria troubled cells. If a cell is flagged as troubled and has not been troubled in the previous time step, the scheme goes back to the old time step and recomputes the solution in all troubled cells (and their direct neighbours) with the FV scheme.
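As an illustration, a positivity check for the shallow water equations could look like the sketch below. The function name and signature are invented for this example and do not reproduce ExaHyPE's actual user hook; the numerical DMP check is assumed to be handled separately.

#include <cstddef>

// A minimal sketch of a physical admissibility check for the shallow water
// equations: the water height must stay non-negative.
bool isPhysicallyAdmissible(const double* Q, std::size_t numberOfDofs,
                            std::size_t numberOfVariables) {
  for (std::size_t dof = 0; dof < numberOfDofs; ++dof) {
    const double h = Q[dof * numberOfVariables + 0];  // water height
    if (h < 0.0) {
      return false;  // mark the cell as troubled
    }
  }
  return true;
}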

4 Engine Architecture and Programming Workflow

Figure 3: Engine architecture

This section briefly walks through the workflow of setting up an application in ExaHyPE, using the shallow water equations given in (2) as an example. The architecture of ExaHyPE is illustrated in Figure 3. ExaHyPE is a solver engine: domain-specific code has to be written by the user to obtain a simulation code. In Figure 3 a turquoise colour is used to highlight files written by the user. To write an ExaHyPE application, users typically start from a specification file. The specification file is passed to the ExaHyPE toolkit, which creates glue code, empty application-specific classes and, optionally, application- and architecture-tailored core routines. The application-specific classes are filled by the user with the PDE terms. This code can be written in C++ or Fortran. The generated glue code and the initially empty templates make up the ExaHyPE user solver.

The ExaHyPE core is realised upon the Peano framework (green), which builds dynamically adaptive Cartesian meshes. In addition, it provides an efficient mesh traversal loop that ExaHyPE’s algorithms plug into. Peano itself is a third-party component.

The number of dependencies in the ExaHyPE core is minimal. However, the architecture may not fulfil the requirements of all applications, so the user can extend and connect further software fragments to the ExaHyPE core. In red we show an optional dependency, libxsmm Heinecke:16:LIBXSMM . This package provides efficient kernels for small matrix multiplications, which are used by the optimised ADER-DG routines described in Section 5.2. Similarly, further software packages can be added by the user as needed.

In the following section, we will provide a brief summary of the components of the user solver, highlighting the parts of the specification file that need to be modified for each component. The solver components can be set up flexibly by modifying the specification file. Users only need to write application-specific code that sets up their PDE system.

4.1 Code Generation and Compilation

After the specification file has been written it is handed over to the ExaHyPE toolkit. A minimal ExaHyPE specification file is shown below.

 exahype-project SWE
  output-directory const = ./SWE

  computational-domain
    dimension const         = 2
    width                   = 1.0, 1.0
    offset                  = 0.0, 0.0
    end-time                = 1.0
  end computational-domain

  solver ADER-DG MySWESolver
    variables const    = h:1,hu:1,hv:1,b:1
    order const        = 3
    maximum-mesh-size  = 0.1
    time-stepping      = global
    type const         = non-linear
    terms const        = flux,ncp
    optimisation const = generic, usestack
  end solver
end exahype-project

To prepare this example for the simulation, run

> ./Toolkit/toolkit.sh Demonstrators/SWE.exahype

The toolkit generates a Makefile, glue code, as well as various helper files. Among them is one C++ class per solver that was specified in the specification file. Within each implementation file, the user can specify initial conditions, mesh refinement control, etc. For example, to set the eigenvalues the following function is generated in the file MySWESolver.cpp.

void SWE::MySWESolver::eigenvalues(const double* const Q,
                                   const int direction,
                                   double* const lambda) {
  // @todo Please implement/augment if required
  lambda[0] = 1.0;
  lambda[1] = 1.0;
  lambda[2] = 1.0;
  lambda[3] = 1.0;
}

Each such generated stub can then be filled in by the user. Similarly, the other functions, such as the flux, the initial conditions and the boundary conditions, can be defined in the same file. For the shallow water equations the eigenvalues would be:

void SWE::MySWESolver::eigenvalues(const double* const Q,
                                   const int direction,
                                   double* const lambda) {
  ReadOnlyVariables vars(Q);
  Variables eigs(lambda);

  const double c = std::sqrt(gravity*vars.h());
  double u_n = Q[direction + 1] * 1.0/vars.h();

  eigs.h()  = u_n + c;
  eigs.hu() = u_n - c;
  eigs.hv() = u_n;
  eigs.b()  = 0.0;
}

In this example we have used named variables to enhance readability. However, ExaHyPE also allows the user to access the vectors directly as can be seen in the generated function above.
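A flux implementation for the non-conservative formulation in (2) might then look as follows. The signature is assumed to mirror the generated eigenvalues stub above and is not copied from generated code; with the pressure term handled by the ncp routine, only the advective part appears in the flux.

void SWE::MySWESolver::flux(const double* const Q, double** const F) {
  const double h  = Q[0];
  const double hu = Q[1];
  const double hv = Q[2];
  // Assumes a wet cell, i.e. h lies above the dry tolerance.
  const double u = hu / h;
  const double v = hv / h;

  // Flux in x-direction.
  F[0][0] = hu;
  F[0][1] = hu * u;
  F[0][2] = hu * v;
  F[0][3] = 0.0;   // the bathymetry carries no flux

  // Flux in y-direction.
  F[1][0] = hv;
  F[1][1] = hv * u;
  F[1][2] = hv * v;
  F[1][3] = 0.0;
}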

Figure 4: Vtk output of a shallow water simulation of the Tohoku tsunami. Left: FV and DG domains. Right: the tsunami 5 min after the initial event. These figures are taken from rannabauer:swe:2018 .

The whole build environment is generated. A simple make will create the ExaHyPE executable. ExaHyPE’s specification files always act as both specification and configuration file, i.e. when running the code the specification file is passed in again.

> ./ExaHyPE-SWE ./SWE.exahype

A successful run yields a sequence of vtk files that you can open with Paraview or VisIt. In this example, we plot two quantities. Such an output is shown in Figure 4.

Global metrics such as integrals can also be realised using a plotter with no output variables, i.e. variable const = 0.

5 Parallelisation and Optimisation Features

ExaHyPE relies on the Peano framework for mesh generation. Peano realises cache-efficient, tree-structured, adaptive mesh refinement. To traverse the cells or vertices, the mesh traversal automaton of Peano runs along the Peano space-filling curve (SFC), see Figure 1. The action of a PDE operator is mapped to the SFC traversal by plugging into events triggered by the traversal automaton; these mappings can be generated by the toolkit.

Peano also takes care of the distributed-memory and shared-memory parallelisation. Domain decomposition for distributed-memory parallel simulations is realised by forking off or merging subtrees. The domain decomposition can be influenced via load balancing callbacks. The shared-memory parallelisation relies on identifying regular substructures in the tree and employing parallel-for loops in these areas Weinzierl:RunLengthEncoding . Recently, we introduced a runtime tasking system to Peano and ExaHyPE that introduces additional multi-threading concurrency Charrier:18:Enclave in highly adaptive mesh regions and allows overlapping computations with MPI communication.

ExaHyPE builds on the Peano framework and it inherits a user model from Peano: Our engine removes the responsibility for algorithmic issues from the user. The time step, the mesh traversal and the parallelisation are implemented in a generic way. The user only controls the PDE system being solved, by specifying how many quantities are needed, which terms are required, and ultimately what all terms look like. Our goal is to allow users to focus on the physics only and to hide away as many implementation details as possible.

5.1 High-Level Optimisations

ExaHyPE realises a few high-level algorithmic optimisations that users can switch on and off at code startup through the specification file. To gain access to these optimisations, we add the following optional section to the minimal specification file shown in Section 4.

global-optimisation
    fuse-algorithmic-steps               = all
    spawn-predictor-as-background-thread = on
    spawn-amr-background-threads         = on
end global-optimisation

A discussion of each individual algorithmic tuning is beyond the scope of this paper. Some techniques are:

  • Step fusion: Each time step of an ExaHyPE solver consists of three phases: computation of the local ADER-DG predictor (including the projection of the prediction onto the cell faces), solution of the Riemann problems at the cell faces, and update of the predicted solution. We may speed up the code if we fuse these steps into one grid traversal.

  • Spawn background jobs: TBB has a thread pool in which idle threads wait for tasks; as soon as new tasks are spawned, the threads in this pool start working on them. We refer to these threads as background threads. Since spawning and scheduling these tasks has an overhead, we wait until a certain number of tasks has accumulated and schedule them together. Certain space-time-predictor computations can be spawned as background jobs. Costly AMR operations such as the imposition of initial conditions and the evaluation of refinement criteria can also be performed as background jobs.

  • Modify storage precision: ExaHyPE internally can store data in less-than-IEEE double precision. This can reduce the memory footprint of the code significantly.

Most users will not modify these options while they prototype. When they start production runs, they can tweak the engine instantiation through them. The rationale behind exposing these control values is simple: it is not clear at the moment which option combinations robustly improve performance. An auto-tuning/machine learning approach to find the optimal parameter combinations could be considered.

ExaHyPE allows running multiple simulations on the same computational mesh. This can be facilitated by adding multiple solver descriptions to the specification file. Each solver uses its own base grid and refinement criterion.

5.2 Optimised ADER-DG Routines

One of ExaHyPE’s key ideas is to use tailored, extremely optimised code routines whenever it comes to the evaluation of the ADER-DG scheme’s steps and other computationally expensive routines. AMR projections and the space-time predictor (5) are typically the dominant steps. If the user decides to use these optimised variants in the specification file, the toolkit calls a specific code generator Python3 module and links to its output in the generated glue code so that the calls to the generic routines are replaced by calls to the generated optimised ones.

SIMD operations, notably introduced with AVX-512 on KNL and Skylake, become increasingly critical to fully exploit the potential of modern CPUs. Therefore, on Intel machines, the optimised routines’ main goal is to either directly use SIMD or enable as much auto-vectorisation by the compiler as possible. To that end the optimised routines use Intel’s libxsmm Heinecke:16:LIBXSMM , which is the second third-party building block in the basic ExaHyPE architecture. We map all tensor operations required by the ADER-DG algorithm to general matrix multiplications (gemms) and apply architecture-specific vectorised matrix operations. Furthermore, the code generation allows the introduction of data padding and alignment to get the most out of the compiler auto-vectorisation.
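To illustrate the idea (this is a conceptual sketch, not generated ExaHyPE code), consider one spatial direction of the predictor: for every quantity, a small dense operator matrix acts on the degrees of freedom along that direction, and flattening all remaining indices into a second matrix dimension turns the tensor contraction into a single small gemm that libxsmm can dispatch as a vectorised kernel.

// Conceptual sketch only: a tensor contraction over one spatial index,
// rewritten as a small dense matrix-matrix product C = A * B, with
//   A: (N+1) x (N+1) operator (e.g. differentiation/stiffness) matrix,
//   B: (N+1) x M     degrees of freedom, all remaining indices flattened,
//   C: (N+1) x M     result.
// The generated kernels replace such loops by libxsmm gemm calls and add
// padding/alignment; none of the names below are ExaHyPE identifiers.
void smallGemm(const double* A, const double* B, double* C, int n, int m) {
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < m; ++j) {
      double sum = 0.0;
      for (int k = 0; k < n; ++k) {
        sum += A[i * n + k] * B[k * m + j];  // row-major storage assumed
      }
      C[i * m + j] = sum;
    }
  }
}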

While improving SIMD vectorisation is our current prime use case for the optimised kernels, we provide alternative optimisation dimensions: Instead of a vectorisation of the native ADER-DG loops, the optimised routines can also be configured to work with vectorised PDE formulations. This implies that the PDE terms are computed through SIMD operations instead of scalar ones and requires some work from the user. This has been found to greatly improve performance Dumbser:2018 .

Instead of the standard Picard iteration, we provide an alternative realisation which uses an explicit Finite Volume scheme to yield an initial guess to the Picard solve, leading to a much lower number of required iterations.

6 Numerical Results

In this section, we show the ExaHyPE engine in action. The applications in this section are taken from our introductory discussion in Section 2. Only the Euler equations are added as an all-time classic starting point for solvers of hyperbolic systems. All specification and source files used to generate the results in this section are made publicly available, and all of the tests themselves can be run on a standard laptop. The scaling tests shown require the use of a larger cluster.

6.1 Euler Equations

The non-linear compressible Euler equations model the adiabatic flow of an inviscid fluid. Solutions of the Euler equations are sometimes used as approximations to real fluid problems, e.g. the lift of a thin airfoil. They are given by

\frac{\partial}{\partial t}
\begin{pmatrix} \rho \\ \rho v \\ E \end{pmatrix}
+ \nabla \cdot
\begin{pmatrix} \rho v \\ v \otimes \rho v + I p \\ v \left( E + p \right) \end{pmatrix}
= 0 ,
\qquad (8)

where ρ denotes the mass density, ρv the momentum density, E the energy density, p the fluid pressure, to be given by an equation of state, and ⊗ the outer product.
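Equation (8) needs to be closed by an equation of state; assuming the usual ideal-gas law (the concrete choice used for the results below is not restated here), the pressure reads

% Ideal-gas equation of state with ratio of specific heats \gamma
% (assumed closure for illustration):
p \;=\; (\gamma - 1) \left( E - \tfrac{1}{2}\,\rho\,\lvert v \rvert^{2} \right)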

Figure 5: An ADER-DG implementation of the sod shock tube problem in 2D using a base grid of cells. From left to right: The density at , a comparison of the analytical solution and the solution for polynomial degrees for the density across , the same comparison with one level of adaptive mesh refinement at the shock.

The Sod shock tube problem is a well-known benchmark for the accuracy of hyperbolic system solvers book:leveque . Initially, the domain contains gas at two different densities and pressures at rest, separated by an interface. This could be two different gases, or simply one gas at different temperatures.

The time evolution consists of a shock moving into the gas at lower pressure and a rarefaction wave expanding into the gas at higher pressure. A contact discontinuity moves at the interface between the two gases.

This very simple test, which is a special case of the Riemann problem, nevertheless allows us to test several features of the code at once. We test the ability of the code to capture and resolve shocks and contact discontinuities, as well as to accurately capture the density profile of the rarefaction wave. Due to the Gibbs phenomenon, the shock wave cannot be accurately captured by the high-order DG solution. However, the FV limiter can indeed deal with the shock.

In Figure 5, we show the density at time . We see that with only elements in each direction the solution with polynomial order is already very accurate. Further, we note that the adaptive mesh refinement around the areas of discontinuity allows us to accurately capture the solution even with low polynomial order.

Figure 6: Shared-memory scalability of the non-linear ADER-DG implementation for the circular/spherical explosion problem, using a base grid with up to two levels of adaptive refinement. Left: 2D test with a base grid of cells. Right: 3D test with a base grid of cells.

To demonstrate the shared-memory scalability of the code we use a variant of the Sod shock tube problem, the circular (2D) or spherical (3D) explosion problem, also referred to as a “multi-dimensional Sod shock tube”. This test case places gas at a higher density and pressure inside a circle or sphere around the centre of the domain than in the surrounding region, and uses wall boundary conditions.

In Figure 6 we show the shared-memory scalability in two and three dimensions. This scaling test was run on one Intel Xeon E5-v3 processor of SuperMUC Phase 2, using its 14 cores and Intel’s TBB Reinders:2007 for parallelisation. We consider a regular base grid and allow a dynamic refinement criterion to add up to two additional levels of cells around the shock front. In these tests all kernel-level optimisations that are available in ExaHyPE have been used. As a result, scalability is more challenging; however, we are mainly interested in reducing the overall run-time.

The scalability improves as the work per element increases, i.e. it improves with higher polynomial degree or for increasing dimension. In general good scalability can be observed on up to 14 cores at higher orders. All ExaHyPE codes employ a hybrid parallelisation strategy with at least two MPI ranks per node CharrierUndEkaterinasPaper2019 . Results for this hybrid parallelisation strategy are provided in Section 6.6.

6.2 Elastic Wave Equation

Figure 7: From left to right: plot of the limited (red) and unlimited (blue) areas; the velocity field in x-direction with the simulated topography in the background (transparent indicates the free surface).

The restriction of ExaHyPE to Cartesian meshes seems restrictive in the context of most seismic applications. In this section, we introduce two methods for incorporating complex geometries into such a mesh.

6.2.1 Diffuse Interface Approach

In our diffuse interface approach the geometry is represented using a volume fraction of the solid medium present in a control volume tavelli:dim . Diffuse interfaces completely avoid the problem of mesh generation, since all that is needed for the definition of the complex surface topography is to set a scalar colour function to unity inside the regions covered by the solid and to zero outside.

The diffuse interface model augments the linear elastic wave equations of Section 2.1 by a scalar colour function α; its complete statement is given in tavelli:dim .

The quantities ρ, v, λ, μ and the stress tensor σ are defined as in Section 2.1.

The material parameters are assumed to remain constant in time, i.e. ∂ρ/∂t = ∂λ/∂t = ∂μ/∂t = 0.

Physically, α represents the volume fraction of the solid medium present in a control volume. The equations become non-linear in those regions in which 0 < α < 1.

This formulation can be extended to allow moving materials: instead of solving ∂α/∂t = 0, we advect the colour function with the local velocity field, ∂α/∂t + v · ∇α = 0.

In this way the free surface boundary is allowed to move according to the local velocity field tavelli:dim .

To verify the accuracy of this method we solve the layer over homogeneous halfspace (LOH.1) benchmark problem described by Day et al. Day:LOH1 . This problem is a well-known reference benchmark for seismic wave propagation in numerical codes. The LOH.1 benchmark considers wave propagation in a hexahedral geometry filled with two materials that are stacked on top of each other. The first material is characterised by a lower density and smaller seismic wave speeds. The exact material parameters are defined in Table 1. The Lamé parameters λ and μ of the equation can be derived from these values.

Table 1: Material parameters of the LOH.1 benchmark.

A point source is placed km below the surface at the centre of the domain, such that the resulting wave propagates through the change of material.

In the diffuse interface method we limit the free surface to be able to resolve the local discontinuity of α. The left picture in Figure 7 shows the resulting distribution of limited (red) and unlimited areas (blue) and indicates the transition between the two (light blue and red). The right picture shows the velocity field in x-direction. Our implementation of the diffuse interface method is able to successfully resolve both the changing material parameters and the absorbing surface boundary conditions.

Figure 8: Simulation of seismic wave fields around the Zugspitze with curvilinear meshes. The internal Cartesian mesh (left) is used to simulate a complex topography (right).

6.2.2 Curvilinear meshes

Our second approach models complex topographies by applying a curvilinear transformation to the elements of an adaptive Cartesian mesh. Taking this mapping into account, the linear elastic wave equations are rewritten in the reference coordinates of the Cartesian mesh, with the Jacobian matrix of the mapping from mesh to topography entering the equations as a variable coefficient. This approach allows us to model meshes with complex topographies including faults and inner branches duru:curvilinear .

Figure 8 shows a snapshot of an experiment that simulates the propagation of seismic waves in the area around the mountain Zugspitze, in Germany. Both our curvilinear approach and the diffuse interface method are able to resolve the complicated topographies in this benchmark scenario.

Figure 9 shows the shared-memory scalability of the linear ADER-DG implementation on SuperMUC Phase 2 when running the LOH.1 benchmark, comparing the two meshing approaches. We consider a regular base grid with cells and allow a dynamic refinement criterion to add one additional level of cells to the layer over the halfspace. Memory constraints prevented further refinement. As in the non-linear test case, the scalability improves as the work per element increases. However, due to the lower amount of total work in the linear setting, a higher polynomial degree needs to be reached to attain scalability in this case. This means that the overall scalability of the diffuse interface approach is higher; however, this comes at the price of a higher overall computational cost due to the non-linearity of the formulation.

The question of which method should be used is highly problem dependent. The curvilinear method is purely linear and as such computationally cheap. The non-linear diffuse interface approach, on the other hand, requires additional limiting on the surface. However, the time step size for the DIM is independent of the simulated topography, while in the curvilinear method it is highly dependent on the distortion introduced by the topography. Simulations with a relatively smooth, flat surface are expected to be faster with the curvilinear method, while highly varying topographies are better discretised with the DIM. DIM has the added advantage of allowing moving free surface boundaries, a feature that is difficult to realise with curvilinear meshes.

Figure 9: Shared-memory scalability of the linear ADER-DG implementation on SuperMUC’s Phase 2 when running the LOH.1 benchmark. Left: Curvilinear elements are used. Right: The diffuse interface method is used.

6.3 Shallow Water Equations

We demonstrate the dynamic mesh refinement capabilities of ExaHyPE via the shallow water equations (2). Even more importantly, we present a non-toy problem which uses ExaHyPE’s 2D facilities. To solve (2), we use the a-posteriori limiting ADER-DG method and equip its FV limiter with a recently developed HLLEM Riemann solver Dumbser:2016:HLLEM . The solver allows for wetting and drying. This also demonstrates that users can inject their own Riemann solvers in ExaHyPE.

The same scenario is available in ExaHyPE using a Finite Volume scheme instead of an ADER-DG scheme with FV limiter rannabauer:swe:2018 ; Rannabauer:2018 . In both implementations the same Riemann solver can be used. The Riemann solver considers a dry tolerance below which cells are marked as dry. This tolerance is necessary to avoid negative water heights, which usually lead to unphysical non-linear effects. In the case of two adjacent wet cells the solver reduces to the well-known Rusanov flux. If one of the cells is being flooded, the jump in bathymetry is taken into account by the Riemann solver. When reconstructing the DG solution from the FV limiter, the water level is reconstructed first and then the bathymetry is subtracted. This ensures that the non-linear reconstruction process does not produce artificial waves. In tsunami simulations, this scheme allows us to simulate areas close to the coast with the FV subcell limiter, while areas in the deep ocean are processed by the high order ADER-DG method.
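The wet/dry branching described above can be summarised as in the following sketch; the type and function names are invented for illustration and do not reproduce the HLLEM solver of Dumbser:2016:HLLEM or any ExaHyPE routine.

// Illustrative wet/dry branching around the Riemann solve.
enum class FaceType { Dry, Wet, Inundation };

FaceType classifyFace(double hLeft, double hRight, double dryTolerance) {
  const bool leftDry  = hLeft  < dryTolerance;
  const bool rightDry = hRight < dryTolerance;
  if (leftDry && rightDry)   return FaceType::Dry;  // nothing to compute
  if (!leftDry && !rightDry) return FaceType::Wet;  // plain Rusanov flux
  return FaceType::Inundation;  // wet/dry front: account for the bathymetry jump
}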


Figure 10: ADER-DG with a-posteriori limiting for the oscillating lake scenario (shallow water equations). Top row: water height at time . Bottom row: The locations (red) in which the FV limiter is active, the adaptively refined mesh and the error at times and .

To test the shallow water equations given in (2) we use the oscillating lake scenario. Here, a water droplet travels in circular motion over a dry basin. The analytical solution of this scenario, which depends on the gravitational constant g, is used as the initial condition.

where and denotes the gravitational constant. The initial conditions are supplemented by wall boundary conditions. In this setup we use polynomial order and a base grid of elements. We adaptively refine mesh elements where the water height is small but not zero. We allow up to two levels of adaptive refinement.

This scenario provides a challenging test case for numerical codes due to the continuous wetting and drying of cells. The existence of an analytical solution for this benchmark scenario allows us to verify the well-balancedness of the scheme, as well as its resolution of drying and inundated cells.

In the top row of Figure 10, we plot the water height at . Despite the frequent wetting and drying the scheme performs well. In those areas in which the DG solution is used, we note the accuracy of the higher order scheme on a coarse grid. In those areas in which a FV solution is necessary, the dynamic AMR is useful to retain accuracy and avoid diffusion. The bottom row of Figure 10 shows the locations in which the FV limiter is active and adaptive mesh refinement is used. Furthermore, it shows the error in the water height. We observe that the FV limiter is used only in those cells in which it is required, the limiter locations move with the solution. The error remains below in all cells.

Figure 11: Shared-memory scalability of the non-linear ADER-DG implementation for the oscillating lake experiments using a base grid of cells with up to two levels of adaptive refinement.

In Figure 11, we show the shared-memory scalability for the oscillating lake. This scaling test was run on one Intel Xeon E5-v3 processor of SuperMUC Phase 2. We consider a regular base grid and allow a dynamic refinement criterion to add up to two additional levels of cells around the water droplet. In this area the FV limiter is active. As in the previous test case, the scalability improves for higher polynomial degrees.

6.4 General Relativistic Magneto-Hydrodynamics

Figure 12: (Left) The solution for pressure of the Michel accretion test at . (Right) Convergence of the ADER-DG scheme at . The dotted lines show the expected convergence rates.
#cells | Error      Order | Error      Order | Error       Order
       | 2.56e-02    --   | 2.31e-02    --   | 3.23e-02     --
       | 4.22e-03   1.65  | 2.27e-03   2.11  | 2.28e-05    6.45
       | 4.66e-04   2.01  | 3.09e-05   3.91  | 6.24e-08    5.52
       | 4.06e-05   2.21  | 4.47e-07   3.86  | 1.09e-11    7.88
Table 2: Convergence of the ADER-DG scheme for the Michel accretion test. Each Error/Order column pair corresponds to one of the three polynomial degrees considered.

To validate ExaHyPE's high convergence order, we require a test case which does not contain discontinuities, as the limiter reduces the convergence order. Furthermore, the analytical solution has to be known. We simulate the spherical accretion onto a stationary black hole. This well-known test case has a known analytical solution michel:1972 . Details of the setup in the context of ADER-DG methods can also be found in fambri:2017 . This setup is challenging since GRMHD is a large non-linear system containing both fluxes and non-trivial non-conservative products. Further complexity arises from the closeness of the singularity at the critical radius to the computational domain.

We use the analytical solution as the external state vector for the Riemann solver at the boundary of the domain . The test is performed in Kerr-Schild coordinates on the spatial domain , away from the critical radius.

In Figure 12 we show the solution and the convergence of the error for three different polynomial degrees. Table 2 lists the errors and convergence orders for each polynomial degree. This shows that our scheme converges at the expected order.

For the next test, we move to a simulation for which the analytical solution is not known. In this test, a stable non-rotating and non-magnetised neutron star is simulated in three space dimensions by solving the GRMHD equations in the Cowling approximation, i.e. assuming a static background spacetime. The initial state has been obtained as a solution to the Tolman-Oppenheimer-Volkoff (TOV) equations. The corresponding fluid and metric variables are compatible with the Einstein field equations. We set the magnetic field to zero for TOV stars.

The TOV equations constitute a non-linear ODE system in the radial space coordinate, which has been solved numerically by means of a fourth-order Runge-Kutta scheme on a very fine grid. The free parameters of the problem are the adiabatic exponent and a constant atmospheric pressure.
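For reference, one classical fourth-order Runge-Kutta step has the following generic form; this is a self-contained sketch and not the TOV solver used for the results reported here.

#include <array>
#include <cstddef>
#include <functional>

// One classical RK4 step for dy/dr = f(r, y).
template <std::size_t N>
std::array<double, N> rk4Step(
    const std::function<std::array<double, N>(double, const std::array<double, N>&)>& f,
    double r, const std::array<double, N>& y, double dr) {
  // Helper: component-wise yIn + a * k.
  auto add = [](const std::array<double, N>& yIn, const std::array<double, N>& k, double a) {
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = yIn[i] + a * k[i];
    return out;
  };
  const auto k1 = f(r, y);
  const auto k2 = f(r + 0.5 * dr, add(y, k1, 0.5 * dr));
  const auto k3 = f(r + 0.5 * dr, add(y, k2, 0.5 * dr));
  const auto k4 = f(r + dr,       add(y, k3, dr));
  std::array<double, N> yNext{};
  for (std::size_t i = 0; i < N; ++i)
    yNext[i] = y[i] + dr / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
  return yNext;
}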

The star sits at the origin of the domain. We apply reflection boundary conditions at the three boundary surfaces that touch the origin and exact boundary conditions, evaluating the initial data at the boundary, at the three opposite surfaces.

In Figure 13, we show the shared-memory scalability of the non-linear hybrid ADER-DG-FV implementation of the GRMHD equations for the TOV star, run on SuperMUC Phase 2. We consider a regular base grid with cells and allow a dynamic refinement criterion to add one additional level of cells along a sphere around the TOV star.

Figure 13: Left: The TOV star setup showing two levels of adaptive mesh refinement along a sphere around the star. Right: Shared-memory scalability of the non-linear hybrid ADER-DG - FV implementation when running the TOV star example.

6.5 Compressible Navier-Stokes

The compressible Navier-Stokes equations (3) demonstrate that ExaHyPE is not only able to solve hyperbolic PDEs, but is also capable of handling viscous effects. We use the numerical flux of Gassner:2008:ViscousDG to integrate this into our ADER-DG framework.

To test the equations, we utilise the Arnold-Beltrami-Childress (ABC) flow

where one constant governs the Mach number and another sets the viscosity Tavelli:2016:StaggeredDG . This equation has an analytical solution in the incompressible limit. We impose this solution at the boundary. In Figure 14 we show a comparison between the analytical solution and a simulation with order on a regular mesh of cells. We see good agreement with the analytical solution at the time shown.

Figure 14: Left: Pressure of the ABC-flow, - slice. Right: Velocity slices of the ABC-flow. Markers indicate samples of our solution, line indicates analytical solution. Image reproduced from Krenz:19:ExaCloud

In addition, we consider the colliding bubbles scenario of Muller:2010:Cumulus . The initial conditions of this scenario are given by an atmosphere whose background state is in hydrostatic balance, i.e. the vertical pressure gradient of the background state balances gravity.

Initially, the whole domain has the same constant potential temperature. This background is then perturbed by a bubble-shaped perturbation whose amplitude depends on the Euclidean distance of the spatial position from the centre of the bubble. We superimpose two such bubbles with different sets of constants.

The simulation runs for and uses a viscosity of for regularisation. We use order and a mesh with up to two levels of adaptive mesh refinement. The results of this simulation are shown in Figure 15 (left). There is excellent agreement with the reference solution of Muller:2010:Cumulus . For further details on the setup, we refer to Krenz:19:ExaCloud . In Figure 15 (right), we show the shared-memory scalability using the ABC-flow on SuperMUC Phase 2. In this test case the shared-memory scalability is very good for all tested polynomial orders.

Figure 15: Left: The colliding bubble scenario with two levels of dynamic adaptive mesh refinement. Image taken from Krenz:19:ExaCloud . Right: Shared-memory scalability of the non-linear ADER-DG implementation when running the ABC-flow example with viscous effects.

6.6 Hybrid Parallelisation Strategy

In the previous sections we showed the shared-memory scalability of various test cases in two and three dimensions. In general we observed that the scalability improves as the work per element increases, i.e. with higher polynomial degree, increasing dimension, or increasing PDE complexity. In this section we demonstrate the effectiveness of a hybrid parallelisation strategy using the GRMHD TOV star setup described in Section 6.4. We consider a regular base grid of cells and allow up to two levels of adaptive mesh refinement. As before, all kernel-level optimisations have been switched on.

This scaling test was run on up to 28 nodes of SuperMUC Phase 2, each consisting of two Intel Xeon E5-v3 processors. The tests were run using both Intel's TBB and MPI for parallelisation. As a reference value we use a test run on 56 cores (two nodes) and calculate the corresponding parallel efficiency compared to this baseline. Table 3 contains the results of this strong scaling test. Here, efficiency measures the speedup obtained divided by the expected optimal speedup. That the efficiency exceeds 1.0 in some cases can be explained by the chosen baseline of 56 cores.

# cores (nodes)   Threads   time     efficiency   time     efficiency   time     efficiency
 56 (2)             14       91.93      1.00       139.43      1.00      395.90      1.00
112 (4)              7       37.68      1.22        57.03      1.29      151.11      1.31
224 (8)              4       30.88      0.77        34.17      0.89      139.42      0.93
448 (16)             2       17.67      0.43        26.66      0.65       57.02      0.85
784 (28)             1       13.88      0.32        22.89      0.44       34.17      0.67
Table 3: Hybrid scaling of the GRMHD application. An ADER-DG approximation plus Finite Volume limiting is used. Up to two levels of adaptive mesh refinement are employed.

7 Conclusions and Future Work

This paper introduces a software engine that allows users to write higher-order ADER-DG codes for hyperbolic PDE systems with both conservative and non-conservative terms and a-posteriori limiters. The engine, ExaHyPE, has been designed to work on a wide range of computer systems from laptops to large high-performance compute clusters. Its core vision is to provide an engine rather than a framework: Technical details both on the computer science and numerics side are more or less completely hidden from the user. A specification file plus very few routines realising the PDE terms are typically the only customisation points modified by user codes. Users can focus on the physics. To achieve this high abstraction, writing codes in ExaHyPE requires the use of a certain set of numerical methods. The code thus clearly stands in the tradition of software packages such as Clawpack Clawpack:2017 , as opposed to more generic, general-purpose software packages such as AMReX, deal.II or DUNE.

There are three canonical directions of travel for future work. First, the sketched application areas have to make an impact in their respective domains. This comprises application-specific performance engineering and studies of real-world setups and data. It notably also comprises the coupling of engine applications with other code building blocks. Examples of first work in this direction are duru:curvilinear ; tavelli:dim . Second, we have to continue to investigate and to invest into the methods under the hood of ExaHyPE. Examples of such new ingredients on our agenda are accelerator support, local time stepping, and more dynamic load balancing and autotuning Schreiber:18:Invasic ; Charrier:17 . Finally, we plan extensions of the core paradigm of the engine. The development of ExaHyPE from scratch has been possible because we restricted ourselves to ADER-DG only. In the future, we plan to explore to what degree we are able to add particle-based algorithms (Particle-in-Cell methods or simple tracers) or multigrid solvers (for elliptic subterms) on top of ExaHyPE. This will open the engine up to new user communities. First feasibility studies for this Reps:15:Helmholtz ; Weinzierl:16:PIC ; Weinzierl:18:BoxMG already exist.

8 Acknowledgements and Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671698, www.exahype.eu.

The authors gratefully acknowledge the support by the Leibniz Supercomputing Centre (www.lrz.de), which also provided the computing resources on SuperMUC (grant no. pr48ma).

We would especially like to thank the many people who have made contributions to ExaHyPE, in particular our previous team members Benjamin Hazelwood, Angelika Schwarz, Vasco Varduhn and Olindo Zanotti.

References

  • (1) K. Duru, A.-A. Gabriel, H. Igel, A new discontinuous Galerkin spectral element method for elastic waves with physically motivated numerical fluxes, arXiv:1802.06380 (2017).
  • (2) M. Tavelli, M. Dumbser, D. E. Charrier, L. Rannabauer, T. Weinzierl, M. Bader, A simple diffuse interface approach on adaptive Cartesian grids for the linear elastic wave equations with complex topography, Journal of Computational Physics 386 (2019) 158–189.
  • (3) N. T. Bishop, L. Rezzolla, Extraction of gravitational waves in numerical relativity, Living Reviews in Relativity 19 (1) (2016) 2. doi:10.1007/s41114-016-0001-9.
  • (4) A. Tsokaros, B. C. Mundim, F. Galeazzi, L. Rezzolla, K. b. o. Uryū, Initial-data contribution to the error budget of gravitational waves from neutron-star binaries, Phys. Rev. D 94 (2016) 044049. doi:10.1103/PhysRevD.94.044049.
  • (5) M. Dumbser, F. Guercilena, S. Koeppel, L. Rezzolla, O. Zanotti, A strongly hyperbolic first-order CCZ4 formulation of the Einstein equations and its solution with discontinuous Galerkin schemes, arXiv e-prints, arXiv:1707.09910 (2017).
  • (6) O. Zanotti, M. Dumbser, Efficient conservative ADER schemes based on WENO reconstruction and space-time predictor in primitive variables, Computational Astrophysics and Cosmology 3 (2016).
  • (7) F. Fambri, M. Dumbser, O. Zanotti, Space-time adaptive ADER-DG schemes for dissipative flows: Compressible Navier-Stokes and resistive MHD equations, Computer Physics Communications 219 (2017).
  • (8) W. Reed, T. R. Hill, Triangular mesh methods for the neutron transport equation, Tech. Rep. LA-UR-73-479, Los Alamos Scientific Laboratory (1973).
  • (9) B. Cockburn, C.-W. Shu, The Runge-Kutta local projection P1-discontinuous-Galerkin finite element method for scalar conservation laws, ESAIM: Mathematical Modelling and Numerical Analysis 25 (3) (1991) 337–361.
  • (10) B. Cockburn, C.-W. Shu, TVB Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws II: General framework, Mathematics of Computation 52 (186) (1989) 411–435.
  • (11) B. Cockburn, S.-Y. Lin, C.-W. Shu, TVB Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws III: One-dimensional systems, Journal of Computational Physics 84 (1) (1989) 90 – 113.
  • (12) B. Cockburn, S. Hou, C.-W. Shu, The Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws. IV. The multidimensional case, Mathematics of Computation 54 (1990) 545–581.
  • (13) B. Cockburn, C.-W. Shu, The Runge–Kutta discontinuous Galerkin method for conservation laws V: Multidimensional systems, Journal of Computational Physics 141 (2) (1998) 199 – 224.
  • (14) V. A. Titarev, E. F. Toro, ADER: Arbitrary High Order Godunov Approach, Journal of Scientific Computing 17 (1) (2002) 609–618. doi:10.1023/A:1015126814947.
  • (15) G. Gassner, M. Dumbser, F. Hindenlang, C.-D. Munz, Explicit one-step time discretizations for discontinuous Galerkin and finite volume schemes based on local predictors, Journal of Computational Physics 230 (11) (2011) 4232–4247. doi:10.1016/j.jcp.2010.10.024.
  • (16) M. Dumbser, C. Enaux, E. F. Toro, Finite volume schemes of very high order of accuracy for stiff hyperbolic balance laws, J. Comput. Phys. 227 (8) (2008) 3971–4001.
  • (17) R. Hartmann, P. Houston, Adaptive discontinuous Galerkin finite element methods for the compressible Euler equations, Journal of Computational Physics 183 (2) (2002) 508 – 532.
  • (18) P.-O. Persson, J. Peraire, Sub-cell shock capturing for discontinuous Galerkin methods, in: 44th AIAA Aerospace Sciences Meeting and Exhibit, p. 112.
  • (19) D. Radice, L. Rezzolla, Discontinuous Galerkin methods for general-relativistic hydrodynamics: Formulation and application to spherically symmetric spacetimes, Phys. Rev. D 84 (2011) 024010.
  • (20) J. Qiu, C. Shu, Runge–Kutta discontinuous Galerkin method using WENO limiters, SIAM Journal on Scientific Computing 26 (3) (2005) 907–929.
  • (21) J. Qiu, C.-W. Shu, Hermite WENO schemes and their application as limiters for Runge–Kutta discontinuous Galerkin method: one-dimensional case, Journal of Computational Physics 193 (2003) 115–135.
  • (22) R. Loubère, M. Dumbser, S. Diot, A new family of high order unstructured MOOD and ADER finite volume schemes for multidimensional systems of hyperbolic conservation laws, Communications in Computational Physics 16 (3) (2014) 718–763. doi:10.4208/cicp.181113.140314a.
  • (23) S. Diot, R. Loubère, S. Clain, The MOOD method in the three-dimensional case: Very-high-order finite volume method for hyperbolic systems.
  • (24) S. Diot, S. Clain, R. Loubère, Multi-dimensional optimal order detection (MOOD) – a very high-order finite volume scheme for conservation laws on unstructured meshes.
  • (25) M. Dumbser, O. Zanotti, R. Loubère, S. Diot, A Posteriori Subcell Limiting of the Discontinuous Galerkin Finite Element Method for Hyperbolic Conservation Laws, J. Comput. Phys. 278 (C) (2013) 47–75.
  • (26) M. Dumbser, O. Zanotti, R. Loubère, S. Diot, A posteriori subcell limiting of the discontinuous Galerkin finite element method for hyperbolic conservation laws, Journal of Computational Physics 278 (2014) 47–75.
  • (27) T. Weinzierl, M. Mehl, Peano – A Traversal and Storage Scheme for Octree-Like Adaptive Cartesian Multiscale Grids, SIAM J. Sci. Comput. 33 (5) (2011) 2732–2760. doi:10.1137/100799071.
  • (28) T. Weinzierl, The Peano software—parallel, automaton-based, dynamically adaptive grid traversals, ACM Transactions on Mathematical Software, in press.
  • (29) O. Zanotti, F. Fambri, M. Dumbser, A. Hidalgo, Space–time adaptive ADER discontinuous Galerkin finite element schemes with a posteriori sub-cell finite volume limiting, Computers & Fluids 118 (2015) 204–224.
  • (30) The Jenkins project, https://jenkins.io/.
  • (31) K. Duru, L. Rannabauer, A.-A. Gabriel, On energy stable discontinuous Galerkin approximations for scattering problems in complex elastic media with adaptive curvilinear meshes, Computer Methods in Applied Mechanics and Engineering.
  • (32) L. Krenz, L. Rannabauer, M. Bader, A high-order discontinuous Galerkin solver with dynamic adaptive mesh refinement to simulate cloud formation processes (2019). arXiv:1905.05524.
  • (33) S. Köppel, Towards an exascale code for GRMHD on dynamical spacetimes, arXiv:1711.08221 (2017).
  • (34) F. Fambri, M. Dumbser, S. Köppel, L. Rezzolla, O. Zanotti, ADER discontinuous Galerkin schemes for general-relativistic ideal magnetohydrodynamics, Monthly Notices of the Royal Astronomical Society 477 (4) (2018) 4543–4564. doi:10.1093/mnras/sty734.
  • (35) L. Rezzolla, O. Zanotti, Relativistic Hydrodynamics, Oxford University Press, Oxford UK, 2013.
  • (36) R. Abgrall, On essentially non-oscillatory schemes on unstructured meshes: Analysis and implementation, Journal of Computational Physics 114 (1) (1994) 45 – 58.
  • (37) M. Dumbser, A. Hidalgo, O. Zanotti, High order space–time adaptive ADER-WENO finite volume schemes for non-conservative hyperbolic systems, Computer Methods in Applied Mechanics and Engineering 268 (2014) 359 – 387.
  • (38) B. van Leer, Towards the ultimate conservative difference scheme, Journal of Computational Physics 135 (2) (1997) 229–248. doi:10.1006/jcph.1997.5704.
  • (39) O. Zanotti, F. Fambri, M. Dumbser, A. Hidalgo, Space–time adaptive ADER discontinuous Galerkin finite element schemes with a posteriori sub-cell finite volume limiting, Computers & Fluids 118 (2015) 204–224. doi:10.1016/j.compfluid.2015.06.020.
  • (40) A. Heinecke, G. Henry, M. Hutchinson, H. Pabst, LIBXSMM: Accelerating small matrix multiplications by runtime code generation, in: SC16: International Conference for High Performance Computing, Networking, Storage and Analysis(SC), Vol. 00, 2016, pp. 981–991. doi:10.1109/SC.2016.83.
  • (41) L. Rannabauer, S. Haas, D. E. Charrier, T. Weinzierl, M. Bader, Simulation of tsunamis with the exascale hyperbolic PDE engine ExaHyPE, in: Environmental Informatics: Techniques and Trends. Adjunct Proceedings of the 32nd edition of the EnviroInfo, 2018.
  • (42) W. Eckhardt, T. Weinzierl, A blocking strategy on multicore architectures for dynamically adaptive PDE solvers, in: Parallel Processing and Applied Mathematics, 8th International Conference, PPAM 2009, Wroclaw, Poland, September 13-16, 2009. Revised Selected Papers, Part I, 2009, pp. 567–575. doi:10.1007/978-3-642-14390-8_59.
  • (43) D. Charrier, B. Hazelwood, T. Weinzierl, Enclave tasking and excessive grid regularisation for discontinuous Galerkin methods on dynamically adaptive meshes (to be submitted).
  • (44) M. Dumbser, F. Fambri, M. Tavelli, M. Bader, T. Weinzierl, Efficient implementation of ADER Discontinuous Galerkin schemes for a scalable hyperbolic PDE engine, Axioms 7.
  • (45) R. J. LeVeque, Finite-Volume Methods for Hyperbolic Problems, Cambridge University Press, 2002.
  • (46) J. Reinders, Intel Threading Building Blocks, 1st Edition, O’Reilly & Associates, Inc., Sebastopol, CA, USA, 2007.
  • (47) D. E. Charrier, B. Hazelwood, E. Tutlyaeva, M. Bader, M. Dumbser, A. Kudryavtsev, A. Moskovsky, T. Weinzierl, Studies on the energy and deep memory behaviour of a cache-oblivious, task-based hyperbolic PDE solver, The International Journal of High Performance Computing Applications (2019). doi:10.1177/1094342019842645.
  • (48) S. M. Day, C. R. Bradley, Memory-efficient simulation of anelastic wave propagation, Bulletin of the Seismological Society of America 91 (3) (2001) 520–531. doi:10.1785/0120000103.
  • (49) M. Dumbser, D. S. Balsara, A new efficient formulation of the HLLEM Riemann solver for general conservative and non-conservative hyperbolic systems, J. Comput. Phys. 304 (C) (2016) 275–319. doi:10.1016/j.jcp.2015.10.014.
  • (50) L. Rannabauer, M. Dumbser, M. Bader, ADER-DG with a-posteriori finite-volume limiting to simulate tsunamis in a parallel adaptive mesh refinement framework, Computers & Fluids. doi:10.1016/j.compfluid.2018.01.031.
  • (51) F. C. Michel, Accretion of Matter by Condensed Objects, Astrophysics and Space Science 15 (1972) 153–160. doi:10.1007/BF00649949.
  • (52) G. Gassner, F. Lörcher, C.-D. Munz, A discontinuous Galerkin scheme based on a space-time expansion II. viscous flow equations in multi dimensions, Journal of Scientific Computing 34 (3) (2008) 260–286. doi:10.1007/s10915-007-9169-1.
  • (53) M. Tavelli, M. Dumbser, A staggered space–time discontinuous Galerkin method for the three-dimensional incompressible Navier–Stokes equations on unstructured tetrahedral meshes, Journal of Computational Physics 319 (2016) 294–323. doi:10.1016/j.jcp.2016.05.009.
  • (54) A. Müller, J. Behrens, F. X. Giraldo, V. Wirth, An adaptive discontinuous Galerkin method for modeling cumulus clouds, in: Fifth European Conference on Computational Fluid Dynamics, ECCOMAS CFD, 2010.
  • (55) Clawpack Development Team, Clawpack software, version 5.4.0 (2017). doi:10.5281/zenodo.262111.
    URL http://www.clawpack.org
  • (56) M. Schreiber, T. Weinzierl, A case study for a new invasive extension of Intel’s Threading Building Blocks, in: HiPEAC 2018—3rd COSH Workshop on Co-Scheduling of HPC Applications, 2018.
  • (57) D. E. Charrier, T. Weinzierl, An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems, in: Parallel Processing and Applied Mathematics - 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, 2017, pp. 3–13. doi:10.1007/978-3-319-78054-2_1.
  • (58) B. Reps, T. Weinzierl, A complex additive geometric multigrid solver for the Helmholtz equations on spacetrees, ACM Transactions on Mathematical Software 44 (1) (2017) 2:1–2:36.
  • (59) T. Weinzierl, B. Verleye, P. Henri, D. Roose, Two particle-in-grid realisations on spacetrees, Parallel Computing 52 (2016) 42–64.
  • (60) M. Weinzierl, T. Weinzierl, Quasi-matrix-free hybrid multigrid on dynamically adaptive Cartesian grids, ACM Trans. Math. Softw. (in press).

Appendix A Dependencies and prerequisites

ExaHyPE requires the following prerequisites:

  • For sequential simulations, only a C++ compiler is required. The code uses only a few C++14 features; for older compiler versions these may have to be enabled explicitly through a flag such as --std=c++0x.

  • Python 3

  • ExaHyPE’s default build environment uses GNU Make.

Further, ExaHyPE has the following optional dependencies:

  • ExaHyPE user code can also be written in Fortran. In this case, a Fortran compiler is needed.

  • To exploit multi- or manycore computers, Intel’s TBB 2017 is required. It is open source and works with GCC and Intel compilers.

  • To run ExaHyPE on a distributed-memory cluster, MPI is needed. ExaHyPE uses only very basic MPI routines; any implementation supporting, e.g., the MPI 1.3 standard suffices.

  • To use ExaHyPE’s optimised compute kernels, Intel’s libxsmm (open source: https://github.com/hfp/libxsmm) and the Python module Jinja2 (open source: http://jinja.pocoo.org/) are required. A local installation script is made available; a sketch of how the prerequisites and these optional dependencies can be checked follows below this list.
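The shell sketch below illustrates how the prerequisites and optional dependencies listed above might be checked and installed on a typical Linux system. It is only an illustration: the commands are generic tooling checks, not part of ExaHyPE, and the TBBROOT variable as well as the user-local Jinja2 installation are assumptions about a typical setup.

    # Check the required tools (generic commands, not ExaHyPE-specific).
    g++ --version        # C++ compiler providing the required C++14 features
    python3 --version    # Python 3
    make --version       # GNU Make, ExaHyPE's default build environment

    # Check the optional dependencies.
    gfortran --version   # only needed if user code is written in Fortran
    mpicxx --version     # an MPI installation for distributed-memory runs
    echo "${TBBROOT:-TBB not configured}"   # Intel TBB for shared memory (variable name assumed)

    # Jinja2 for the optimised kernels; libxsmm can be fetched via the
    # local installation script mentioned above.
    pip3 install --user jinja2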

Appendix B Obtaining ExaHyPE

ExaHyPE is free software and is hosted at www.exahype.eu. There are two options to obtain ExaHyPE: the first is to download a complete snapshot of ExaHyPE, which already includes a snapshot of Peano; the second is to clone the repository, in which case Peano has to be added manually. In order to access the repository, the mailing list and the bug tracker, users are asked to register at http://www.peano-framework.org/hpcsoftware/exahype/. The ExaHyPE guidebook contains documentation, detailed rationale and links to further resources.
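As an illustration of the second option, the sketch below shows what a clone-based setup might look like. The repository addresses are placeholders, since the actual URLs are provided after registration, and the directory names are assumptions rather than part of ExaHyPE’s documentation.

    # Option two: clone the repository and add Peano manually.
    # <EXAHYPE_REPO_URL> and <PEANO_REPO_URL> are placeholders for the addresses
    # obtained after registering at http://www.peano-framework.org/hpcsoftware/exahype/.
    git clone <EXAHYPE_REPO_URL> ExaHyPE-Engine
    cd ExaHyPE-Engine

    # A clone, unlike the snapshot download, does not include Peano;
    # add it manually into the location expected by the build (directory name assumed).
    git clone <PEANO_REPO_URL> Peano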