1 Introduction
The study of waves has always been an important subject of research. It is not difficult to see why: earthquakes and tsunamis have a direct and serious impact on the daily lives of millions of people. A better understanding of electromagnetic waves has enabled ever faster wireless communication. The study of gravitational waves has allowed new insights into the composition and history of our Universe. These physical phenomena, despite arising in different fields of physics and engineering, can be modelled in a similar way from a mathematical perspective: as systems of hyperbolic partial differential equations (PDEs). The consortium behind the ExaHyPE project (“An Exascale Hyperbolic PDE Engine”) translated this structural similarity into a software engine for modelling and simulating a wide range of hyperbolic PDE systems. The consortium focuses on two challenging scenarios: long-range seismic risk assessment, see e.g. Duru:2017 ; tavelli:dim , and the search for gravitational waves emitted by binary neutron stars, see e.g. Bishop:2016 ; Tsorkaros:2016 ; dumbser:2017 ; zanotti:2016 ; fambri:2017 .
ExaHyPE implements a high-order discontinuous Galerkin (DG) approach. DG schemes were first introduced by Reed and Hill reed:1973 for the neutron transport equation and subsequently extended to general hyperbolic systems in a series of papers by Cockburn et al. shu:1 ; shu:2 ; shu:3 ; shu:4 ; shu:5 . Within our DG framework, higher-order accuracy in time and space is achieved using the arbitrary high-order accurate ADER-DG approach first introduced by Toro and Titarev Titarev:2002 . In the original ADER approach, high-order accuracy in time is achieved by using the Cauchy-Kowalewski procedure to replace time derivatives with spatial derivatives. This procedure is very efficient for linear problems Gassner:2011:ExplicitOneStep , but becomes cumbersome for nonlinear problems. Moreover, it cannot deal with stiff source terms. To tackle these shortcomings, an alternative ADER approach was introduced by Dumbser et al. Dumbser:2008 . In this approach, the Cauchy-Kowalewski procedure is replaced by an implicit solve of a cell-local space-time weak formulation of the PDE. This removes the problem dependency of the approach and allows the handling of stiff source terms. ExaHyPE provides an implementation of both ADER-DG variants.
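The Cauchy-Kowalewski idea is easiest to see for the linear advection equation ∂u/∂t + a ∂u/∂x = 0, where every time derivative can be exchanged for a space derivative via ∂ᵏu/∂tᵏ = (−a)ᵏ ∂ᵏu/∂xᵏ. The following sketch (illustrative only, not ExaHyPE code) applies this to a polynomial initial state, for which the temporal Taylor series is exact:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Advance u(x, t) to u(x, t + dt) for the linear advection equation
//   du/dt + a * du/dx = 0
// via the Cauchy-Kowalewski procedure: every time derivative is rewritten
// as a space derivative, d^k u/dt^k = (-a)^k d^k u/dx^k, and summed in a
// temporal Taylor series. Here u is a polynomial given by its coefficients,
// u(x) = sum_k c_k x^k, so all spatial derivatives are exact.
double ckAdvectionUpdate(const std::vector<double>& coeffs,
                         double a, double x, double dt) {
  // evaluate the k-th spatial derivative of the polynomial at x
  auto derivative = [&](int k) {
    double result = 0.0;
    for (std::size_t n = k; n < coeffs.size(); ++n) {
      double factor = 1.0;
      for (std::size_t j = n; j > n - k; --j) factor *= j;  // n*(n-1)*...*(n-k+1)
      result += coeffs[n] * factor * std::pow(x, double(n - k));
    }
    return result;
  };

  // temporal Taylor series: u(x, t+dt) = sum_k dt^k/k! * d^k u/dt^k
  //                                    = sum_k (-a*dt)^k/k! * d^k u/dx^k
  double u = 0.0;
  double term = 1.0;  // term = (-a*dt)^k / k!
  for (int k = 0; k < int(coeffs.size()); ++k) {
    u += term * derivative(k);
    term *= (-a * dt) / (k + 1);
  }
  return u;
}
```

For the quadratic u₀(x) = x² with a = 2 and dt = 0.5, the series reproduces the exact translated solution u₀(x − a·dt).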
In nonlinear hyperbolic PDE systems, discontinuities and steep gradients can arise even from smooth initial conditions. High-order methods may then produce spurious oscillations which decrease the approximation quality and may even render the computed solution unphysical. To remedy this issue, a wide range of limiters for DG schemes have been proposed. Notably, these include approaches based on artificial viscosity hartmann:2002 ; persson:2006 , filtering rezzolla:2011 and WENO- or HWENO-based reconstruction shu:WENO ; shu:HWENO . The approach taken within the ExaHyPE engine is multi-dimensional optimal-order detection (MOOD). This approach was initially applied to finite volume schemes loubere:1 ; loubere:2 ; loubere:3 and has recently been extended to DG schemes loubere:4 . In this approach, the solution is checked a posteriori for certain admissibility or plausibility criteria and is marked troubled if it does not meet them. Troubled cells are then recalculated with a more robust finite volume scheme. The MOOD approach bypasses the limitations of a-priori detection of troubled zones: problematic regions are identified after each time step and rolled back to a more robust scheme on demand, so we do not need to reliably know all problematic areas in advance. This approach allows for good resolution of shocks and other discontinuities Dumbser:14:Posteriori .
In the developed engine, problems are discretised on tree-structured fully adaptive Cartesian meshes provided by the Peano framework Weinzierl:11:Peano ; Weinzierl:2019 . Dynamic adaptive mesh refinement (AMR) further enhances the shock-capturing abilities of the ADER-DG scheme Zanotti:2015:SpaceTimeAMR and allows good resolution of local features.
ExaHyPE comprises the following key features:

Highorder ADER discontinuous Galerkin (ADERDG) with aposteriori subcell limiting and finite volume (FV) schemes;

Dynamic mesh refinement on Cartesian grids in two and three dimensions;

A simple API that allows users to quickly realise complex applications;

User-provided code can be written in Fortran or C++;

Automatically generated architecture- and application-specific optimised ADER-DG routines;

Shared memory parallelisation through Intel’s Threading Building Blocks (TBB);

Distributed memory parallelisation with MPI.
Furthermore, ExaHyPE offers a wide range of post-processing and plotting facilities, such as support for VTK output. The software can be compiled with Intel and GNU compilers and provides a switch for choosing different compilation modes: our release mode aggressively optimises the application, while the assertion and debug compilation modes activate the assertions within the code and print additional output. Users and developers can write log filters to restrict the output to what is relevant to them. For continuous testing, a Jenkins server jenkins verifies that the code compiles with all parallelisation features in all compilation modes. The server is currently being extended to also run automatic convergence and scalability studies.
Our paper starts with an overview of the problem setting using several examples. We then briefly sketch the solution methods used in ExaHyPE, focusing on the ADER-DG algorithm. Next, we use a simple example to demonstrate the workflow when using ExaHyPE and describe the engine architecture. We conclude with a sequence of numerical examples from various application areas to demonstrate the capabilities of the engine.
2 Problem Formulation
We consider hyperbolic systems of balance laws that may contain both conservative and non-conservative terms. They have to be given in the following first-order form:
(1) $M\,\frac{\partial Q}{\partial t} + \nabla\cdot F(Q,\nabla Q) + B(Q)\cdot\nabla Q = S(Q) + \sum_{i=1}^{n_{ps}} \delta_i$
where Q ∈ ℝ^q is the state vector of the q conserved variables, Ω ⊂ ℝ^d is the computational domain, M is the material matrix, F(Q, ∇Q) is the flux tensor, and B(Q)·∇Q represents its non-conservative part. Finally, S(Q) is the source term and δ_i are the given point sources. Hyperbolic systems in the form (1) can be used to model a wide range of applications that involve waves. In the following, we demonstrate the versatility of our formulation by introducing equations from different application areas. Numerical experiments for these examples are provided in Section 6.
2.1 Waves in Elastic Media
Linear elasticity models velocities, stresses and strains of a heterogeneous medium. The equations are derived from Hooke’s law and the conservation of momentum. Following the form of equation (1), the velocity-stress formulation can be written as
$\frac{\partial \sigma}{\partial t} - E(\lambda,\mu)\cdot\nabla v = 0, \qquad \rho\,\frac{\partial v}{\partial t} - \nabla\cdot\sigma = 0,$
where ρ denotes the mass density, v the velocity and σ the stress tensor, which can be written in terms of its six independent components as σ = (σ_xx, σ_yy, σ_zz, σ_xy, σ_xz, σ_yz). In this paper we consider isotropic materials, i.e. the material matrix E depends only on the two Lamé constants λ and μ of the material. However, the formulation also holds for more general anisotropic materials.
These equations can be used to simulate seismic waves, e.g. those generated by earthquakes. In this context, the restriction of ExaHyPE to Cartesian meshes may seem restrictive. However, adaptive Cartesian meshes can be extended to allow the modelling of complex topography. Two methods have been realised in ExaHyPE to represent complex topographies. The first approach treats ExaHyPE’s adaptive Cartesian mesh as a reference domain that is mapped to a complex topography via high-order curvilinear transformations duru:curvilinear . The second approach, a diffuse interface method, represents the topography as a smooth field tavelli:dim . Both approaches are described in more detail in Section 6.2.
2.2 Shallow Water Equations
In atmospheric and oceanic modelling of coastal areas, horizontal length scales are typically significantly greater than the vertical length scale. In this case, fluid flow can be modelled with the two-dimensional shallow water equations instead of the more complicated three-dimensional Navier-Stokes equations.
Following the form of equation (1), they can be written as
(2) $\frac{\partial}{\partial t}\begin{pmatrix} h\\ hu\\ hv\\ b \end{pmatrix} + \frac{\partial}{\partial x}\begin{pmatrix} hu\\ hu^2+\frac{1}{2}gh^2\\ huv\\ 0 \end{pmatrix} + \frac{\partial}{\partial y}\begin{pmatrix} hv\\ huv\\ hv^2+\frac{1}{2}gh^2\\ 0 \end{pmatrix} + \begin{pmatrix} 0\\ gh\,\partial_x b\\ gh\,\partial_y b\\ 0 \end{pmatrix} = 0,$
where h denotes the height of the water column, (u, v) the horizontal flow velocity, g the gravity and b the bathymetry.
Hyperbolic systems of balance laws have nontrivial equilibrium solutions in which flux and source terms cancel. A well-balanced numerical scheme is capable of maintaining such an equilibrium state. In Section 3 we describe the ADER-DG scheme used in ExaHyPE, and in Section 6 we describe how to keep the scheme well balanced while allowing wetting and drying.
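The canonical equilibrium for the shallow water equations is the “lake at rest”: a constant free surface h + b with zero velocity. There, the gradient of the pressure flux ½gh² must cancel the non-conservative bathymetry term gh ∂b/∂x. The following check (a sketch with assumed symbols h, b, g as in Section 2.2, not ExaHyPE code) evaluates this cancellation with central differences; the residual vanishes only up to discretisation error, which is precisely why well-balanced schemes must discretise both terms consistently:

```cpp
#include <cmath>

// Lake-at-rest equilibrium of the shallow water equations:
// with h + b = const and zero velocity, the gradient of the pressure flux
// 0.5*g*h^2 must cancel the non-conservative bathymetry term g*h*db/dx,
// so the momentum update vanishes. We check this with central finite
// differences on a smooth bathymetry bump (all names illustrative).
double momentumResidual(double x, double dx) {
  const double g = 9.81;
  const double surface = 1.0;  // constant free surface elevation h + b
  auto b = [](double s) { return 0.2 * std::exp(-s * s); };  // bathymetry bump
  auto h = [&](double s) { return surface - b(s); };         // water column height

  // d/dx (0.5*g*h^2) + g*h*db/dx, discretised with central differences
  const double fluxGrad = (0.5 * g * h(x + dx) * h(x + dx)
                         - 0.5 * g * h(x - dx) * h(x - dx)) / (2.0 * dx);
  const double ncpTerm  = g * h(x) * (b(x + dx) - b(x - dx)) / (2.0 * dx);
  return fluxGrad + ncpTerm;
}
```

With a matching discretisation of both terms the cancellation is exact; here the mismatch is only the O(dx²) truncation error of the finite differences.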
2.3 Compressible Navier-Stokes Equations
ExaHyPE is extensible to non-hyperbolic equations. As an example of such an extension, we show the compressible Navier-Stokes equations. They model the dynamics of a viscous fluid and are given by
(3) $\frac{\partial}{\partial t}\begin{pmatrix}\rho\\ \rho v\\ \rho E\end{pmatrix} + \nabla\cdot\begin{pmatrix}\rho v\\ v\otimes\rho v + I p + \sigma(Q,\nabla Q)\\ v\cdot\left(I\rho E + I p + \sigma(Q,\nabla Q)\right) - \kappa\nabla T\end{pmatrix} = \begin{pmatrix}0\\ -k\,\rho g\\ 0\end{pmatrix},$
where ρ denotes the density, ρv the momentum, ρE the energy density, T the temperature and p the pressure (this term includes gravitational effects). The temperature diffusion is given by κ∇T with constant κ; these effects depend on the gradient of Q. In the source term, the vector k is the unit vector in z-direction and g is the gravitation of Earth. The viscous effects are modelled by the stress tensor σ(Q, ∇Q). Thus, the flux from (1) has been modified to allow ∇Q as input. More details on the implementation of these equations can be found in Krenz:19:ExaCloud .
2.4 General Relativistic Magneto-Hydrodynamics
The equations of classical magnetohydrodynamics (MHD) are used to model the dynamics of an electrically ideally conducting fluid with comparable hydrodynamic and electromagnetic forces. When modelling astrophysical objects with strong gravitational fields, e.g. neutron stars, it becomes necessary to model the background spacetime as well. We use the standard 3+1 split to decompose the four-dimensional spacetime manifold into 3D hypersurfaces parameterised by a time coordinate t. The background spacetime is introduced into the equations in the form of a non-conservative product.
The curved spacetime is parameterised by several hypersurface variables: the lapse α, the spatial metric tensor γ_ij, the shift vector β^i and the extrinsic curvature K_ij. The spatial metric tensor is given as a vector of its six independent components and has the determinant γ. Further, D is the conserved density, which is related to the rest mass density ρ by the Lorentz factor W, v^i is the fluid velocity, T^ij is the Maxwell 3-energy momentum tensor, S_j is the conserved momentum, B^i is the magnetic field and τ is the conserved energy density. Finally, φ is an artificial scalar introduced to ensure a divergence-free magnetic field, and c_h is the characteristic velocity of the divergence cleaning. For more details on this formulation see e.g. koeppel:grmhd ; fambri:grmhd ; rezzolla:2013 .
3 Solver Components
This section briefly summarises the numerical algorithms used in ExaHyPE. For brevity, let us use a pared-down form of (1), in which only the flux term is nontrivial:
(4) $\frac{\partial Q}{\partial t} + \nabla\cdot F(Q) = 0$
Assume that (4) is subject to appropriate initial and boundary conditions:
$Q(x,0) = Q_0(x) \ \text{for } x \in \Omega, \qquad Q(x,t) = Q_B(x,t) \ \text{for } x \in \partial\Omega.$
3.1 Discretisation
ExaHyPE allows for Cartesian grids in two and three dimensions. The computational domain is divided into a grid using a spacetree construction scheme based on tripartitioning Weinzierl:11:Peano : each cell is recursively refined into three parts per coordinate direction, giving an adaptive Cartesian grid as shown in Figure 1. Meshes can be defined with an arbitrary number of elements in each direction on the coarsest level.
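The resulting grid sizes follow directly from the tripartitioning: each refinement step divides the mesh width by three, and a cell refined ℓ times in d dimensions contains 3^(dℓ) subcells. A small sketch (helper names are illustrative, not ExaHyPE's API):

```cpp
#include <cmath>
#include <cstdint>

// Tripartition-based refinement: every refinement step splits a cell into
// three parts per coordinate direction. Starting from a coarse cell of
// width h0, after `level` refinements the mesh width is h0 / 3^level and
// a fully refined cell contains 3^(dim*level) subcells.
double meshWidth(double h0, int level) {
  return h0 / std::pow(3.0, level);
}

std::int64_t cellCount(int dim, int level) {
  std::int64_t count = 1;
  for (int i = 0; i < dim * level; ++i) count *= 3;
  return count;
}

// Smallest refinement level whose mesh width does not exceed hMax,
// mimicking how a maximum mesh size parameter selects the grid level.
int levelFor(double h0, double hMax) {
  int level = 0;
  while (meshWidth(h0, level) > hMax) ++level;
  return level;
}
```

For example, a unit square with a maximum mesh size of 0.1 requires three refinement levels, i.e. a 27×27 grid of fine cells.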
3.2 Finite Volumes
Classically, systems of hyperbolic PDEs have been solved using finite difference or finite volume schemes. In a finite volume method, cell averages are calculated: volume integrals in a PDE that contain a divergence term are converted to surface integrals using the divergence theorem, and these terms are then evaluated as fluxes at the surfaces of each element. However, to achieve high-order accuracy in a finite volume scheme, large stencils and expensive recovery or reconstruction procedures are needed. Examples include essentially non-oscillatory (ENO) or weighted ENO (WENO) schemes, see e.g. abgrall:1994 ; dumbser:weno .
ExaHyPE provides access to classical Godunov-type schemes as well as MUSCL schemes leer:1997 . Further, users can use parts of the generic compute routines while implementing other parts on their own. For instance, in both the ADER-DG and finite volume schemes, users can overwrite the Riemann solver with their own implementation. The finite volume scheme in ExaHyPE always works on uniform grids of a specific size.
3.3 ADER-DG
In a DG method, the numerical solution for the state vector is represented within each mesh element by basis functions from a space of polynomials. The method operates on a weak form of equation (4). We use the ADER-DG method as proposed by Dumbser et al. in Zanotti:2015 . The ADER-DG method, introduced by Toro and Titarev in 2002 Titarev:2002 , reaches high-order convergence in time and space element-locally: first, the integration in time is performed only within the element, neglecting the element interfaces; finally, a single correction step is performed to take the element interfaces into account.
At the start of a time step, the numerical solution of (4) is represented by a piecewise polynomial of degree N. The basis of polynomials is constructed using tensor products of Lagrange polynomials over Gauss-Legendre or Gauss-Lobatto points. The solution can thus be represented by q(N+1)^d coefficients on each cell, where q is the number of quantities.
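The size of the per-cell data follows from the tensor-product basis: (N+1) nodes per coordinate direction for each of the q quantities. A short helper (illustrative, not ExaHyPE's actual bookkeeping) makes the count explicit:

```cpp
// Number of degrees of freedom held by one ADER-DG cell: a tensor-product
// basis of (N+1) Lagrange polynomials per coordinate direction, for each
// of the q quantities of the PDE system.
int dofsPerCell(int order, int dim, int quantities) {
  int nodes = 1;
  for (int d = 0; d < dim; ++d) nodes *= (order + 1);
  return quantities * nodes;
}
```

For the shallow water example used later (order 3, two dimensions, four quantities) this gives 4·4² = 64 coefficients per cell.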
Let us derive the weak form of (4). To do so, we multiply with a space-time test function θ from the same space of piecewise polynomials as used for the solution. Integrating over each element and over the current time interval then gives the element-local weak formulation:
(5) $\int_{t^n}^{t^{n+1}}\!\!\int_{K} \theta \left(\frac{\partial q_h}{\partial t} + \nabla\cdot F(q_h)\right) dx\,dt = 0,$
where q_h denotes the element-local space-time predictor polynomial.
There are three phases per ADER-DG time step:

Per grid cell and current time interval, we first implicitly solve (5) using Picard iterations Dumbser:14:Posteriori . The concurrent solves of (5) do not take into account any information from neighbouring elements and thus yield jumps in the predicted solution and its flux along the cell faces.

The second phase traverses all faces of the grid and computes a numerical normal flux from both adjacent cells. ExaHyPE uses a Rusanov flux by default; users can replace it with any other Riemann solver.

In the third algorithmic phase, we traverse the cells again and solve
(6) $\int_K \varphi\, u_h^{n+1}\,dx = \int_K \varphi\, u_h^{n}\,dx + \int_{t^n}^{t^{n+1}}\!\!\int_K \nabla\varphi\cdot F(q_h)\,dx\,dt - \int_{t^n}^{t^{n+1}}\!\!\int_{\partial K} \varphi\,\mathcal{F}(q_h^-,q_h^+)\cdot n\,ds\,dt$
for the new solution u_h^{n+1}, where 𝓕 denotes the numerical flux from the second phase, see Dumbser:2008 . The time step (6) is derived from spatially testing and partially integrating (4). Equation (6) can easily be inverted, given that the ansatz and test space typically yield a diagonal mass matrix.
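The default Rusanov (local Lax-Friedrichs) flux of the second phase combines the physical fluxes of both adjacent cells and penalises the jump with the fastest local wave speed. A generic sketch (the signature is illustrative and does not match ExaHyPE's actual kernel API):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Rusanov (local Lax-Friedrichs) numerical flux between the states of two
// adjacent cells:
//   F*(qL, qR) = 0.5*(F(qL) + F(qR)) - 0.5*smax*(qR - qL),
// where smax is the largest wave speed encountered in either state.
std::vector<double> rusanovFlux(
    const std::vector<double>& qL,
    const std::vector<double>& qR,
    const std::function<std::vector<double>(const std::vector<double>&)>& flux,
    const std::function<double(const std::vector<double>&)>& maxAbsEigenvalue) {
  const double smax = std::max(maxAbsEigenvalue(qL), maxAbsEigenvalue(qR));
  const std::vector<double> fL = flux(qL);
  const std::vector<double> fR = flux(qR);

  std::vector<double> result(qL.size());
  for (std::size_t i = 0; i < qL.size(); ++i) {
    result[i] = 0.5 * (fL[i] + fR[i]) - 0.5 * smax * (qR[i] - qL[i]);
  }
  return result;
}
```

For scalar advection with flux F(q) = a·q and smax = |a|, this reduces to the exact upwind flux.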
3.4 Time-Step Restrictions
Nonlinear effects and mesh adaptation require adjustments of the time step size during a simulation. This is expressed by the CFL condition, which gives an upper bound on the stable time step size for explicit DG schemes:
(7) $\Delta t \le C(N)\,\frac{h}{|\lambda_{\max}|},$
where h and |λ_max| are the mesh size and the maximum eigenvalue, respectively, and C(N) is a stability factor that depends on the polynomial order N Dumbser:2008 .
3.5 A-Posteriori Limiting
The unlimited ADER-DG algorithm will suffer from numerical oscillations (Gibbs phenomenon) in the presence of steep gradients or shock waves. Therefore, a limiter must be applied. The approach followed in ExaHyPE is based on the a-posteriori MOOD method of Loubère et al. loubere:4 . The solution is checked a posteriori for certain admissibility or plausibility criteria and is recalculated with a robust FV scheme if it does not meet them. The FV patch consists of 2N+1 subcells per coordinate direction; this is the smallest cell size that does not violate the CFL condition (7).
In contrast to the original approach, ExaHyPE’s implementation incorporates the observation that cells usually require recalculation with FV for multiple time steps in a row after the initial check has failed. Therefore, ExaHyPE implements the a-posteriori limiting ADER-DG method as a hybrid ADER-DG-FV method (Figure 2).
As a-posteriori detection criteria we use

Physical Admissibility Criteria: Depending on the system of PDEs being studied, certain physical constraints can be placed on the solution. For the shallow water equations, this is the positivity of the water height. These criteria are supplied by the user along with the PDE terms.

Numerical Admissibility Criteria: To identify shocks, we use a relaxed discrete maximum principle (DMP) in the sense of polynomials; for details see e.g. loubere:4 .
We call DG cells that do not satisfy the above criteria troubled cells. If a cell is flagged as troubled and has not been troubled in the previous time step, the scheme goes back to the old time step and recomputes the solution in all troubled cells (and their direct neighbours) with the FV scheme.
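A minimal sketch of such a troubled-cell check for the shallow water case (names and relaxation constants are illustrative; ExaHyPE's actual DMP builds its bounds from the previous solution in the cell and its neighbours):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// A-posteriori troubled-cell detection, sketched for the shallow water
// equations: a candidate DG solution is rejected if it violates physical
// admissibility (here: positive, finite water height) or leaves a relaxed
// discrete maximum principle (DMP) interval derived from the previous
// solution in the cell's neighbourhood.
bool isTroubled(const std::vector<double>& candidateHeights,
                const std::vector<double>& previousNeighbourhoodHeights) {
  // physical admissibility: water height must stay positive and finite
  for (double h : candidateHeights) {
    if (!(h > 0.0) || !std::isfinite(h)) return true;
  }

  // relaxed DMP: candidate must stay inside [min - delta, max + delta]
  auto [minIt, maxIt] = std::minmax_element(
      previousNeighbourhoodHeights.begin(), previousNeighbourhoodHeights.end());
  const double delta = std::max(1e-4, 1e-3 * (*maxIt - *minIt));  // relaxation
  for (double h : candidateHeights) {
    if (h < *minIt - delta || h > *maxIt + delta) return true;
  }
  return false;
}
```

A cell flagged by this predicate would be rolled back and recomputed with the FV scheme, as described above.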
4 Engine Architecture and Programming Workflow
This section briefly walks through the workflow of setting up an application in ExaHyPE, using the shallow water equations (2) as an example. The architecture of ExaHyPE is illustrated in Figure 3. ExaHyPE is a solver engine; domain-specific code has to be written by the user to obtain a simulation code. In Figure 3, a turquoise colour is used to highlight files written by the user. To write an ExaHyPE application, users typically start from a specification file. The specification file is passed to the ExaHyPE toolkit, which creates glue code, empty application-specific classes and, optionally, application- and architecture-tailored core routines. The application-specific classes are filled by the user with the PDE terms. This code can be written in C++ or Fortran. The generated glue code and the initially empty templates make up the ExaHyPE user solver.
The ExaHyPE core is realised on top of the Peano framework (green), which builds dynamically adaptive Cartesian meshes. In addition, it provides an efficient mesh traversal loop that ExaHyPE’s algorithms plug into. Peano itself is a third-party component.
The number of dependencies in the ExaHyPE core is kept minimal. However, the architecture may not fulfil the requirements of all applications, so users can extend and connect further software fragments to the ExaHyPE core. In red we show an optional dependency, libxsmm Heinecke:16:LIBXSMM . This package provides efficient kernels for small matrix multiplications, which are used by the optimised ADER-DG routines described in Section 5.2. Similarly, further software packages can be added by the user as needed.
In the following section, we provide a brief summary of the components of the user solver, highlighting the parts of the specification file that need to be modified for each component. The solver components can be set up flexibly by modifying the specification file; users only need to write application-specific code that sets up their PDE system.
4.1 Code Generation and Compilation
After the specification file has been written it is handed over to the ExaHyPE toolkit. A minimal ExaHyPE specification file is shown below.
exahype-project SWE
  output-directory const = ./SWE
  computational-domain
    dimension const = 2
    width           = 1.0, 1.0
    offset          = 0.0, 0.0
    end-time        = 1.0
  end computational-domain
  solver ADER-DG MySWESolver
    variables const    = h:1,hu:1,hv:1,b:1
    order const        = 3
    maximum-mesh-size  = 0.1
    time-stepping      = global
    type const         = non-linear
    terms const        = flux,ncp
    optimisation const = generic, usestack
  end solver
end exahype-project
To prepare this example for the simulation, run
> ./Toolkit/toolkit.sh Demonstrators/SWE.exahype
The toolkit generates a Makefile, glue code, and various helper files. Among them is one C++ class per solver specified in the specification file. Within each implementation file, the user can specify initial conditions, mesh refinement control, etc. For example, to set the eigenvalues, the following function is generated in the file MySWESolver.cpp:
void SWE::MySWESolver::eigenvalues(const double* const Q,
    const int direction, double* const lambda) {
  // @todo Please implement/augment if required
  lambda[0] = 1.0;
  lambda[1] = 1.0;
  lambda[2] = 1.0;
  lambda[3] = 1.0;
}
This function can then be filled with the eigenvalues of the PDE system. Similarly, the other functions, such as the flux, initial conditions and boundary conditions, can be defined in the file. For the shallow water equations the eigenvalues would be:
void SWE::MySWESolver::eigenvalues(const double* const Q,
    const int direction, double* const lambda) {
  ReadOnlyVariables vars(Q);
  Variables eigs(lambda);

  const double c   = std::sqrt(gravity * vars.h());
  const double u_n = Q[direction + 1] * 1.0 / vars.h();

  eigs.h()  = u_n + c;
  eigs.hu() = u_n - c;
  eigs.hv() = u_n;
  eigs.b()  = 0.0;
}
In this example we have used named variables to enhance readability. However, ExaHyPE also allows the user to access the vectors directly, as can be seen in the generated function above.
The whole build environment is generated; a simple make will create the ExaHyPE executable. ExaHyPE’s specification files always act as both specification and configuration file, i.e. when running the code the specification file is passed in again.
> ./ExaHyPE-SWE ./SWE.exahype
A successful run yields a sequence of VTK files that can be opened with ParaView or VisIt. In this example, we plot two quantities. Such an output is shown in Figure 4.
Global metrics such as integrals can also be realised using a plotter with no output variables, i.e. variable const = 0.
5 Parallelisation and Optimisation Features
ExaHyPE relies on the Peano framework for mesh generation. Peano realises cache-efficient, tree-structured, adaptive mesh refinement. To traverse the cells or vertices, the mesh traversal automaton of Peano runs along the Peano space-filling curve (SFC), see Figure 1. The action of a PDE operator is mapped onto the SFC traversal by plugging into events triggered by the traversal automaton; these mappings can be generated by a toolkit.
Peano also takes care of the distributed-memory and shared-memory parallelisation. Domain decomposition for distributed-memory parallel simulations is realised by forking off or merging subtrees. The domain decomposition can be influenced via load balancing callbacks. The shared-memory parallelisation relies on identifying regular substructures in the tree and employing parallel-for loops in these areas Weinzierl:RunLengthEncoding . Recently, we introduced a runtime tasking system to Peano and ExaHyPE that provides additional multithreading concurrency Charrier:18:Enclave in highly adaptive mesh regions and allows overlapping computations with MPI communication.
ExaHyPE builds on the Peano framework and inherits its user model: the engine removes the responsibility for algorithmic issues from the user. The time stepping, the mesh traversal and the parallelisation are implemented in a generic way. The user only controls the PDE system being solved, by specifying how many quantities are needed, which terms are required, and ultimately what all terms look like. Our goal is to allow users to focus on the physics only and to hide away as many implementation details as possible.
5.1 High-Level Optimisations
ExaHyPE realises a few high-level algorithmic optimisations that users can switch on and off at code startup through the specification file. To gain access to these optimisations, we add the following optional section to the minimal specification file shown in Section 4.
global-optimisation
  fuse-algorithmic-steps               = all
  spawn-predictor-as-background-thread = on
  spawn-amr-background-threads         = on
end global-optimisation
A discussion of each individual algorithmic tuning is beyond the scope of this paper. Some techniques are:

Step fusion: Each time step of an ExaHyPE solver consists of three phases: the computation of the local ADER-DG predictor (and the projection of the prediction onto the cell faces), the solution of the Riemann problems at the cell faces, and the update of the predicted solution. We may speed up the code by fusing these steps into one grid traversal.

Spawn background jobs: TBB has a thread pool in which idle threads wait for tasks; as soon as new tasks are spawned, the threads in this pool start to process them. We refer to these threads as background threads. Since spawning and scheduling these tasks has an overhead, we wait until a certain number of tasks has accumulated and schedule them together. Certain space-time predictor computations can be spawned as background jobs. Costly AMR operations such as the imposition of initial conditions and the evaluation of refinement criteria can also be performed as background jobs.

Modify storage precision: ExaHyPE can internally store data in less than IEEE double precision. This can reduce the memory footprint of the code significantly.
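The effect of the last option can be illustrated by storing double-precision coefficients in single precision: the footprint halves while each value retains roughly seven significant digits. A toy sketch (ExaHyPE's actual compression machinery is more general):

```cpp
#include <cstddef>
#include <vector>

// Storing solution coefficients in single instead of double precision
// halves the memory footprint, at the cost of rounding each value to
// roughly 7 significant decimal digits.
class CompressedPatch {
 public:
  explicit CompressedPatch(const std::vector<double>& values)
      : data_(values.begin(), values.end()) {}  // narrow to float on store

  double value(std::size_t i) const { return data_[i]; }  // widen on load

  std::size_t bytes() const { return data_.size() * sizeof(float); }

 private:
  std::vector<float> data_;
};
```

Whether the rounding is acceptable depends on the application; for prototyping runs the loss is usually invisible next to the discretisation error.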
Most users will not modify these options while they prototype. When they start production runs, they can tweak the engine instantiation through them. The rationale behind exposing these control values is simple: it is not clear at the moment which option combinations robustly improve performance. An autotuning or machine learning approach to find the optimal parameter combinations could be considered.
ExaHyPE also allows running multiple simulations on the same computational mesh. This is realised by adding multiple solver descriptions to the specification file. Each solver uses its own base grid and refinement criterion.
5.2 Optimised ADER-DG Routines
One of ExaHyPE’s key ideas is to use tailored, highly optimised code for the evaluation of the ADER-DG scheme’s steps and other computationally expensive routines; AMR projections and the space-time predictor (5) are typically the dominant steps. If the user selects these optimised variants in the specification file, the toolkit calls a dedicated code generator (a Python 3 module) and links to its output in the generated glue code, so that the calls to the generic routines are replaced by calls to the generated optimised ones.
SIMD operations, notably introduced with AVX-512 on KNL and Skylake, are increasingly critical to fully exploit the potential of modern CPUs. Therefore, on Intel machines, the optimised routines’ main goal is to either directly use SIMD instructions or to enable as much auto-vectorisation by the compiler as possible. To that end, the optimised routines use Intel’s libxsmm Heinecke:16:LIBXSMM , which is the second third-party building block in the basic ExaHyPE architecture. We map all tensor operations required by the ADER-DG algorithm to general matrix multiplications (gemms) and apply architecture-specific vectorised matrix operations. Furthermore, the code generation allows the introduction of data padding and alignment to get the most out of the compiler’s auto-vectorisation.
While improving SIMD vectorisation is our current prime use case for the optimised kernels, we provide alternative optimisation dimensions: instead of vectorising the native ADER-DG loops, the optimised routines can also be configured to work with vectorised PDE formulations. This implies that the PDE terms are computed through SIMD operations instead of scalar ones, and requires some work from the user. This has been found to greatly improve performance Dumbser:2018 .
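The difference between scalar and vectorised PDE terms is essentially a data-layout question: if each quantity is stored contiguously over all quadrature nodes, the flux loop becomes branch-free with unit stride and the compiler can emit SIMD instructions for it. A sketch for one shallow water flux component (illustrative, not ExaHyPE's API):

```cpp
#include <cstddef>
#include <vector>

// Vectorised PDE terms: instead of calling a scalar flux function once per
// quadrature node (array-of-structs), the quantities are laid out as one
// contiguous array per variable (struct-of-arrays) and the flux is evaluated
// for all nodes in a single loop that the compiler can turn into SIMD
// instructions. Shown for the shallow water x-momentum flux
//   F_hu = (hu)^2 / h + 0.5 * g * h^2.
void momentumFluxX(const std::vector<double>& h,
                   const std::vector<double>& hu,
                   std::vector<double>& fluxHu) {
  const double g = 9.81;
  const std::size_t n = h.size();
  for (std::size_t i = 0; i < n; ++i) {  // vectorisable: no branches, unit stride
    fluxHu[i] = hu[i] * hu[i] / h[i] + 0.5 * g * h[i] * h[i];
  }
}
```

The scalar variant calls the flux once per node with strided access; this layout change alone often lets the compiler vectorise without any intrinsics.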
Instead of the standard Picard iteration, we also provide an alternative realisation which uses an explicit finite volume scheme to produce an initial guess for the Picard solve, leading to a much lower number of required iterations.
6 Numerical Results
In this section, we show the ExaHyPE engine in action. The applications are taken from our introductory discussion in Section 2; only the Euler equations are added as an all-time classic starting point for solvers of hyperbolic systems. All specification and source files used to generate the results in this section are publicly available, and all of the tests themselves can be run on a standard laptop. Only the scaling tests require the use of a larger cluster.
6.1 Euler Equations
The nonlinear compressible Euler equations model the adiabatic flow of an inviscid fluid. Solutions of the Euler equations are sometimes used as approximations to real fluid problems, e.g. the lift of a thin airfoil. They are given by
(8) $\frac{\partial}{\partial t}\begin{pmatrix}\rho\\ \rho v\\ \rho E\end{pmatrix} + \nabla\cdot\begin{pmatrix}\rho v\\ v\otimes\rho v + I p\\ v\,(\rho E + p)\end{pmatrix} = 0,$
where ρ denotes the mass density, ρv the momentum density, ρE the energy density, p the fluid pressure, given by an equation of state, and ⊗ the outer product.
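As in the shallow water example of Section 4, users supply the eigenvalues for (8). Assuming an ideal-gas equation of state p = (γ−1)(ρE − ½ρ|v|²), the wave speeds in a coordinate direction are v_n − c, v_n (threefold) and v_n + c with sound speed c = √(γp/ρ). A sketch (state layout and names are illustrative, not ExaHyPE's generated signature):

```cpp
#include <array>
#include <cmath>

// Eigenvalues of the 3D Euler equations in coordinate direction `dir`,
// assuming an ideal-gas equation of state
//   p = (gamma - 1) * (rhoE - 0.5 * rho * |v|^2).
// The wave speeds are v_n - c, v_n (threefold), v_n + c with sound speed
// c = sqrt(gamma * p / rho). State layout Q = (rho, rho*u, rho*v, rho*w, rho*E).
std::array<double, 5> eulerEigenvalues(const std::array<double, 5>& Q, int dir) {
  const double gamma = 1.4;
  const double rho  = Q[0];
  const double irho = 1.0 / rho;
  const double kinetic = 0.5 * irho * (Q[1] * Q[1] + Q[2] * Q[2] + Q[3] * Q[3]);
  const double p  = (gamma - 1.0) * (Q[4] - kinetic);
  const double c  = std::sqrt(gamma * p * irho);
  const double vn = Q[dir + 1] * irho;  // velocity component in direction dir
  return {vn - c, vn, vn, vn, vn + c};
}
```

For a gas at rest with ρ = 1 and p = 1, the spectrum reduces to ±√1.4 and a triple zero eigenvalue.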
The Sod shock tube problem is a well-known benchmark for the accuracy of solvers for hyperbolic systems book:leveque . Initially, the domain contains gases at two different densities at rest, separated by a discontinuity. This could be two different gases, or simply one gas at different temperatures. More precisely,
The time evolution consists of a shock moving into the gas at lower pressure and a rarefaction wave expanding into the gas at higher pressure. A contact discontinuity moves at the interface between the two gases.
This very simple test, a special case of the Riemann problem, nevertheless allows us to test several features of the code at once: the ability of the code to capture and resolve shocks and contact discontinuities, as well as to accurately capture the density profile of the rarefaction wave. Due to the Gibbs phenomenon, the shock wave cannot be accurately captured by the high-order DG solution. However, the FV limiter can indeed deal with the shock.
In Figure 5, we show the density at time . We see that with only elements in each direction the solution with polynomial order is already very accurate. Further, we note that the adaptive mesh refinement around the areas of discontinuity allows us to accurately capture the solution even with low polynomial order.
To demonstrate the shared-memory scalability of the code, we use a variant of the Sod shock tube problem: the circular (in 3D: spherical) explosion problem, also referred to as a “multi-dimensional Sod shock tube”. This test case uses the following initial conditions
and wall boundary conditions.
In Figure 6 we show the shared-memory scalability in two and three dimensions. This scaling test was run on the cores of an Intel Xeon E5 v3 processor (SuperMUC Phase 2). The tests were run on one processor with 14 cores, using Intel’s TBB Reinders:2007 for parallelisation. We consider a regular base grid and allow a dynamic refinement criterion to add up to two additional levels of cells around the shock front. In these tests, all kernel-level optimisations available in ExaHyPE have been used. As a result, scalability becomes more challenging; however, we are mainly interested in reducing the overall runtime.
The scalability improves as the work per element increases, i.e. with higher polynomial degree or increasing dimension. In general, good scalability can be observed on up to 14 cores at higher orders. All ExaHyPE codes employ a hybrid parallelisation strategy with at least two MPI ranks per node CharrierUndEkaterinasPaper2019 . Results for this hybrid parallelisation strategy are provided in Section 6.6.
6.2 Elastic Wave Equation
The restriction of ExaHyPE to Cartesian meshes may seem restrictive in the context of most seismic applications. In this section, we introduce two methods for incorporating complex geometries into such a mesh.
6.2.1 Diffuse Interface Approach
In our diffuse interface approach, the geometry is represented via the volume fraction of the solid medium present in a control volume tavelli:dim . Diffuse interfaces completely avoid the problem of mesh generation, since all that is needed for the definition of the complex surface topography is to set a scalar colour function to unity inside the regions covered by the solid and to zero outside.
The diffuse interface model is given by:
where ρ, λ, μ, v and the stress tensor σ are defined as in Section 2.1. The material parameters are assumed to remain constant in time. Physically, α represents the volume fraction of the solid medium present in a control volume. The equations become nonlinear in those regions in which 0 < α < 1.
This formulation can be extended to allow moving materials: instead of keeping α constant in time, we can solve an additional evolution equation for α. In this way, the free surface boundary is allowed to move according to the local velocity field tavelli:dim .
To verify the accuracy of this method we solve the layer over homogeneous halfspace (LOH.1) benchmark problem described by Day et al. Day:LOH1 , a well-known reference benchmark for seismic wave propagation codes. The LOH.1 benchmark considers wave propagation in a hexahedral geometry filled with two materials that are stacked on top of each other. The first material is characterised by a lower density and smaller seismic wave speeds. The exact material parameters are defined in Table 1. The Lamé parameters of the equation can be derived from these values.
A point source is placed below the surface at the centre of the domain, such that the resulting wave propagates through the change of material.
In the diffuse interface method we apply the limiter at the free surface to be able to resolve the local discontinuity of the volume fraction. The left picture in Figure 7 shows the resulting distribution of limited (red) and unlimited areas (blue) and indicates the transition between both (light blue and red). The right picture shows the velocity field in x-direction. Our implementation of the diffuse interface method successfully resolves both the changing material parameters and the absorbing surface boundary conditions.
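The relation between tabulated material values and the coefficients of the elastic wave equation is the standard isotropic one. The numbers below are illustrative placeholders, not the entries of Table 1:

```python
def lame_parameters(rho, cp, cs):
    """Standard isotropic relations between seismic wave speeds and
    the Lame parameters:

        mu     = rho * cs^2             (shear modulus)
        lambda = rho * (cp^2 - 2 cs^2)

    rho is the density, cp and cs the P- and S-wave speeds.
    """
    mu = rho * cs * cs
    lam = rho * (cp * cp - 2.0 * cs * cs)
    return lam, mu

# Illustrative values (NOT the entries of Table 1)
lam, mu = lame_parameters(rho=2700.0, cp=6000.0, cs=3464.0)
```

As a sanity check, the relations can be inverted: cp^2 = (lambda + 2 mu) / rho and cs^2 = mu / rho must recover the input wave speeds.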
6.2.2 Curvilinear Meshes
Our second approach models complex topographies by applying a curvilinear transformation to the elements of an adaptive Cartesian mesh. Taking this mapping into account, the linear elastic wave equations can be written in a transformed form in which the Jacobian matrix of the mapping from the mesh to the topography appears. This approach allows us to model meshes with complex topographies including faults and inner branches duru:curvilinear .
Figure 8 shows a snapshot of an experiment that simulates the propagation of seismic waves in the area around the mountain Zugspitze in Germany. Both our curvilinear approach and the diffuse interface method are able to resolve the complicated topography in this benchmark scenario.
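The role of the Jacobian can be illustrated with a simple terrain-following mapping. The mapping, the topography profile and all names below are illustrative assumptions, not the construction of duru:curvilinear :

```python
import numpy as np

def topography(x):
    # Illustrative smooth surface profile (not real topography data)
    return 0.2 * np.sin(2.0 * np.pi * x)

def mapping(xi, eta, depth=1.0):
    """Map reference coordinates (xi, eta) in [0,1]^2 to physical (x, y).

    The vertical coordinate is stretched so that eta = 1 follows the
    topography and eta = 0 the flat bottom; a standard terrain-following
    construction, used here for illustration only.
    """
    x = xi
    y = -depth + eta * (depth + topography(xi))
    return x, y

def jacobian(xi, eta, h=1e-6):
    """Finite-difference Jacobian d(x,y)/d(xi,eta) of the mapping."""
    f = lambda a, b: np.array(mapping(a, b))
    dxi  = (f(xi + h, eta) - f(xi - h, eta)) / (2 * h)
    deta = (f(xi, eta + h) - f(xi, eta - h)) / (2 * h)
    return np.column_stack([dxi, deta])
```

The Jacobian determinant must stay positive for a valid mesh; its spatial variation reflects the distortion introduced by the topography, which in turn restricts the admissible time step.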
Figure 9 shows the shared-memory scalability of the linear ADER-DG implementation on SuperMUC Phase 2 when running the LOH.1 benchmark, comparing the two meshing approaches. We consider a regular base grid and allow a dynamic refinement criterion to add one additional level of cells to the layer over the halfspace; memory constraints prevented further refinement. As in the nonlinear test case, the scalability improves as the work per element increases. However, due to the lower amount of total work in the linear setting, a higher polynomial degree is needed to attain scalability. This means that the overall scalability of the diffuse interface approach is higher; however, this comes at the price of a higher overall computational cost due to the nonlinearity of the formulation.
The question of which method should be used is highly problem dependent. The curvilinear method is purely linear and as such computationally cheap. The nonlinear diffuse interface approach, on the other hand, requires additional limiting on the surface. However, the time step size for the DIM is independent of the simulated topography, while in the curvilinear method it is highly dependent on the distortion introduced by the topography. Simulations with a relatively smooth, flat surface are expected to be faster with the curvilinear method, while highly varying topographies are better discretised with the DIM. DIM has the added advantage of allowing moving free surface boundaries, a feature that is difficult to realise with curvilinear meshes.
6.3 Shallow Water Equations
We demonstrate the dynamic mesh refinement capabilities of ExaHyPE via the shallow water equations (2). Even more importantly, we present a non-toy problem which uses ExaHyPE's 2D facilities. To solve (2), we use the a posteriori limiting ADER-DG method and equip its FV limiter with a recently developed HLLEM Riemann solver Dumbser:2016:HLLEM . The solver allows for wetting and drying. This also demonstrates that users can inject their own Riemann solvers into ExaHyPE.
The same scenario is available in ExaHyPE using a Finite Volume scheme instead of an ADER-DG scheme with FV limiter rannabauer:swe:2018 ; Rannabauer:2018 . In both implementations the same Riemann solver can be used. The Riemann solver employs a dry tolerance below which cells are marked as dry. This tolerance is necessary to avoid negative water heights, which usually lead to unphysical nonlinear effects. In the case of two adjacent wet cells the solver reduces to the well-known Rusanov flux. If one of the cells is flooded, the jump in bathymetry is taken into account by the Riemann solver. When reconstructing the DG solution from the FV limiter, the water level is reconstructed first and the bathymetry is subtracted afterwards. This ensures that the nonlinear reconstruction process does not produce artificial waves. In tsunami simulations, this scheme allows us to simulate areas close to the coast with the FV subcell limiter, while areas in the deep ocean are processed by the high-order ADER-DG method.
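The wet/wet branch of such a solver can be sketched as follows. This is a minimal 1D Rusanov flux with a dry tolerance, not the full HLLEM solver of Dumbser:2016:HLLEM ; in particular, the treatment of bathymetry jumps at wet/dry fronts is omitted, and the tolerance value is an illustrative assumption:

```python
import numpy as np

G = 9.81          # gravitational constant
DRY_TOL = 1e-6    # dry tolerance (illustrative value)

def swe_flux(q):
    """Physical flux of the 1D shallow water equations, q = (h, hu)."""
    h, hu = q
    u = hu / h
    return np.array([hu, hu * u + 0.5 * G * h * h])

def rusanov_flux(qL, qR):
    """Rusanov (local Lax-Friedrichs) flux with a dry tolerance.

    Cells with water height below DRY_TOL are treated as dry; if both
    neighbours are dry, no flux is exchanged.  Simplified sketch of the
    wet/wet branch only.
    """
    if qL[0] < DRY_TOL and qR[0] < DRY_TOL:
        return np.zeros(2)              # both cells dry
    if qL[0] < DRY_TOL:
        qL = np.array([DRY_TOL, 0.0])   # clip to avoid negative heights
    if qR[0] < DRY_TOL:
        qR = np.array([DRY_TOL, 0.0])
    sL = abs(qL[1] / qL[0]) + np.sqrt(G * qL[0])
    sR = abs(qR[1] / qR[0]) + np.sqrt(G * qR[0])
    smax = max(sL, sR)                  # maximal signal speed
    return 0.5 * (swe_flux(qL) + swe_flux(qR)) - 0.5 * smax * (qR - qL)
```

For two identical wet states the numerical flux consistently reduces to the physical flux, and two dry cells exchange nothing.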
To test the shallow water equations given in (2) we use the oscillating lake scenario, in which a water droplet travels in circular motion over a dry basin. The analytical solution, which involves the gravitational constant, is also used as the initial condition and is supplemented by wall boundary conditions. In this setup we use a fixed polynomial order and a coarse base grid. We adaptively refine mesh elements where the water height is small but not zero, allowing up to two levels of adaptive refinement.
This scenario provides a challenging test case for numerical codes due to the continuous wetting and drying of cells. The analytical solution allows us to verify the well-balancedness of the scheme as well as its resolution of drying and inundated cells.
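The refinement criterion described above, refine where the water height is small but nonzero, can be sketched as follows. The thresholds and the string-based interface are illustrative assumptions, not ExaHyPE's actual API:

```python
def refinement_criterion(water_height, level, wet_tol=1e-6,
                         shore_tol=1e-2, max_level=2):
    """Illustrative AMR criterion for the oscillating lake.

    Refine cells whose water height is small but nonzero, i.e. cells
    near the moving wet/dry front, up to a maximum refinement level.
    """
    near_front = wet_tol < water_height < shore_tol
    if near_front and level < max_level:
        return "refine"
    return "keep"
```

Fully dry cells and deep-water cells stay coarse; only the band around the moving shoreline is refined, and the refined region follows the droplet as it travels.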
In the top row of Figure 10, we plot the water height. Despite the frequent wetting and drying the scheme performs well. In those areas in which the DG solution is used, we note the accuracy of the high-order scheme on a coarse grid. In those areas in which an FV solution is necessary, the dynamic AMR helps to retain accuracy and avoid diffusion. The bottom row of Figure 10 shows the locations in which the FV limiter is active and adaptive mesh refinement is used, as well as the error in the water height. We observe that the FV limiter is active only in those cells in which it is required, and the limiter locations move with the solution. The error remains small in all cells.
In Figure 11, we show the shared-memory scalability for the oscillating lake. This scaling test was run on one 14-core Intel Xeon E5 v3 processor of SuperMUC Phase 2. We consider a regular base grid and allow a dynamic refinement criterion to add up to two additional levels of cells around the water droplet; in this area the FV limiter is active. As in the previous test case, the scalability improves with higher polynomial degree.
6.4 General Relativistic Magnetohydrodynamics
#cells   Error      Order   Error      Order   Error      Order
         2.56e-02           2.31e-02           3.23e-02
         4.22e-03   1.65    2.27e-03   2.11    2.28e-05   6.45
         4.66e-04   2.01    3.09e-05   3.91    6.24e-08   5.52
         4.06e-05   2.21    4.47e-07   3.86    1.09e-11   7.88
To validate ExaHyPE's high convergence order, we require a test case whose solution does not contain discontinuities, as the limiter reduces the convergence order, and whose analytical solution is known. We therefore simulate the spherical accretion onto a stationary black hole, a well-known test case with a known analytical solution michel:1972 . Details of the setup in the context of ADER-DG methods can also be found in fambri:2017 . This setup is challenging since GRMHD is a large nonlinear system containing both fluxes and nontrivial nonconservative products. Further complexity arises from the closeness of the singularity at the critical radius to the computational domain.
We use the analytical solution as the external state vector for the Riemann solver at the boundary of the domain. The test is performed in Kerr-Schild coordinates on a spatial domain that stays away from the critical radius.
In Figure 12 we show the solution and the convergence of the error for three different polynomial degrees. Table 2 lists errors and convergence orders for each polynomial degree. This shows that our scheme converges with the expected order.
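The tabulated orders can be recomputed from successive errors. Since the underlying Peano grid refines by tripartitioning, the mesh width shrinks by a factor of three between consecutive rows of Table 2:

```python
import math

def observed_order(err_coarse, err_fine, refinement_factor=3.0):
    """Experimental order of convergence between two mesh resolutions."""
    return math.log(err_coarse / err_fine) / math.log(refinement_factor)

# Second error column of Table 2; successive rows differ by one
# refinement level, i.e. a factor 3 in mesh width.
errors = [2.27e-3, 3.09e-5, 4.47e-7]
orders = [observed_order(a, b) for a, b in zip(errors, errors[1:])]
```

The computed values agree with the tabulated orders 3.91 and 3.86 up to rounding of the printed errors.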
For the next test, we move to a simulation for which the analytical solution is not known. In this test, a stable nonrotating and nonmagnetised neutron star is simulated in three space dimensions by solving the GRMHD equations in the Cowling approximation, i.e. assuming a static background spacetime. The initial state has been obtained as a solution to the TolmanOppenheimerVolkoff (TOV) equations. The corresponding fluid and metric variables are compatible with the Einstein field equations. We set the magnetic field to zero for TOV stars.
The TOV equations constitute a nonlinear ODE system in the radial space coordinate, which has been solved numerically by means of a fourth-order Runge-Kutta scheme on a very fine grid. The problem setup is completed by fixing the adiabatic exponent and a constant atmospheric pressure.
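The classical fourth-order Runge-Kutta scheme used for the radial integration can be sketched generically. The TOV right-hand side itself is omitted here; any radial ODE system can be plugged in as `f`:

```python
import numpy as np

def rk4_step(f, r, y, dr):
    """One step of the classical fourth-order Runge-Kutta scheme."""
    k1 = f(r, y)
    k2 = f(r + 0.5 * dr, y + 0.5 * dr * k1)
    k3 = f(r + 0.5 * dr, y + 0.5 * dr * k2)
    k4 = f(r + dr, y + dr * k3)
    return y + dr / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, r0, y0, r_end, dr):
    """March the radial ODE system y'(r) = f(r, y) from r0 to r_end."""
    r, y = r0, np.asarray(y0, dtype=float)
    while r < r_end - 1e-12:
        step = min(dr, r_end - r)   # shorten the last step to hit r_end
        y = rk4_step(f, r, y, step)
        r += step
    return y
```

On the simple test problem y' = y the scheme reproduces the exponential to the expected fourth-order accuracy.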
The star sits at the origin of the domain. Exploiting symmetry, we apply reflection boundary conditions at the three boundary surfaces adjacent to the origin; at the three opposite surfaces, we apply exact boundary conditions, evaluating the initial data at the boundary.
In Figure 13, we show the shared-memory scalability of the nonlinear hybrid ADER-DG-FV implementation of the GRMHD equations for the TOV star, run on SuperMUC Phase 2. We consider a regular base grid and allow a dynamic refinement criterion to add one additional level of cells along a sphere around the TOV star.
6.5 Compressible Navier-Stokes
The compressible Navier-Stokes equations (3) demonstrate that ExaHyPE is not only able to solve hyperbolic PDEs, but is also capable of handling viscous effects. We use the numerical flux of Gassner:2008:ViscousDG to integrate the viscous terms into our ADER-DG framework.
To test the equations, we utilise the Arnold-Beltrami-Childress (ABC) flow, in which one constant governs the Mach number and another the viscosity Tavelli:2016:StaggeredDG . This flow has an analytical solution in the incompressible limit, which we impose at the boundary. In Figure 14 we show a comparison between the analytical solution and a simulation on a regular mesh. We see good agreement with the analytical solution.
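For reference, the standard ABC velocity field (with all constants and the wavenumber normalised to one, an illustrative choice) is a Beltrami field: its curl equals the field itself, which is why it solves the incompressible equations exactly:

```python
import numpy as np

A, B, C = 1.0, 1.0, 1.0   # illustrative ABC constants (unit wavenumber)

def abc_velocity(x, y, z):
    """The standard Arnold-Beltrami-Childress velocity field."""
    u = A * np.sin(z) + C * np.cos(y)
    v = B * np.sin(x) + A * np.cos(z)
    w = C * np.sin(y) + B * np.cos(x)
    return np.array([u, v, w])

def abc_curl(x, y, z):
    """Analytic curl of the ABC field, component by component."""
    cx = C * np.cos(y) + A * np.sin(z)   # d_y w - d_z v
    cy = A * np.cos(z) + B * np.sin(x)   # d_z u - d_x w
    cz = B * np.cos(x) + C * np.sin(y)   # d_x v - d_y u
    return np.array([cx, cy, cz])
```

The field is trivially divergence-free (each component is independent of its own coordinate), and the Beltrami property curl(u) = u holds pointwise.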
In addition, we evaluate the colliding bubbles scenario of Muller:2010:Cumulus . The initial condition is an atmosphere with a background state in hydrostatic balance and an initially constant potential temperature throughout the domain. This state is then perturbed; the perturbation depends on the Euclidean distance between the spatial position and the centre of each bubble. We consider two bubbles with different constants.
The simulation uses a small viscosity for regularisation, a high polynomial order, and a mesh with up to two levels of adaptive refinement. The results of this simulation are shown in Figure 15 (left). There is excellent agreement with the reference solution of Muller:2010:Cumulus . For further details on the setup, we refer to Krenz:19:ExaCloud . In Figure 15 (right), we show the shared-memory scalability using the ABC flow on SuperMUC Phase 2. In this test case the shared-memory scalability is very good for all tested polynomial orders.
6.6 Hybrid Parallelisation Strategy
In the previous sections we showed the shared-memory scalability of various test cases in two and three dimensions. In general we observed that the scalability improves as the work per element increases, i.e. with higher polynomial degree, increasing dimension, or greater PDE complexity. In this section we demonstrate the effectiveness of a hybrid parallelisation strategy using the GRMHD TOV star setup described in Section 6.4. We consider a regular base grid and allow up to two levels of adaptive mesh refinement. As before, all kernel-level optimisations have been switched on.
This scaling test was run on up to 28 nodes of SuperMUC Phase 2, each consisting of two 14-core Intel Xeon E5 v3 processors. The tests were run using both Intel's TBB and MPI for parallelisation. As a reference value we use a test run on 56 cores and calculate the parallel efficiency of all other runs relative to this baseline. Table 3 contains the results of this strong-scaling test. Here, efficiency is the measured speedup divided by the ideal speedup. That the efficiency exceeds 1.0 in some cases can be explained by the chosen baseline of 56 cores.
                            regular
# cores (nodes)   Threads   time     efficiency   time     efficiency   time     efficiency
56 (2)            14         91.93   1.00         139.43   1.00         395.90   1.00
112 (4)            7         37.68   1.22          57.03   1.29         151.11   1.31
224 (8)            4         30.88   0.77          34.17   0.89         139.42   0.93
448 (16)           2         17.67   0.43          26.66   0.65          57.02   0.85
784 (28)           1         13.88   0.32          22.89   0.44          34.17   0.67
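The efficiency column of Table 3 follows from the usual strong-scaling definition; a minimal sketch (the function name is ours):

```python
def strong_scaling_efficiency(t_base, p_base, t, p):
    """Parallel efficiency relative to a baseline run.

    efficiency = measured speedup / ideal speedup, where the ideal
    speedup from p_base to p cores is p / p_base.
    """
    speedup = t_base / t
    ideal = p / p_base
    return speedup / ideal

# First column of Table 3: baseline 91.93 s on 56 cores
eff_112 = strong_scaling_efficiency(91.93, 56, 37.68, 112)  # ~1.22
```

An efficiency above one, as observed when going from 56 to 112 cores, simply means the run is more than twice as fast on twice the cores, which the relatively slow baseline makes possible.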
7 Conclusions and Future Work
This paper introduces a software engine that allows users to write higher-order ADER-DG codes for hyperbolic PDE systems with both conservative and nonconservative terms and a posteriori limiters. The engine, ExaHyPE, has been designed to work on a wide range of computer systems, from laptops to large high-performance compute clusters. Its core vision is to provide an engine rather than a framework: technical details, on both the computer science and the numerics side, are almost completely hidden from the user. A specification file plus very few routines realising the PDE terms are typically the only customisation points modified by user codes; users can focus on the physics. To achieve this high level of abstraction, writing codes in ExaHyPE requires the use of a certain set of numerical methods. The code thus clearly stands in the tradition of software packages such as Clawpack Clawpack:2017 , as opposed to more generic, general-purpose software packages such as AMReX, deal.II or DUNE.
There are three canonical directions for future work. First, the sketched application areas have to make an impact in their respective domains. This comprises application-specific performance engineering and studies of real-world setups and data. It notably also comprises the coupling of engine applications with other code building blocks; examples of first work in this direction are duru:curvilinear ; tavelli:dim . Second, we have to continue to investigate and invest in the methods under the hood of ExaHyPE. Examples of such new ingredients on our agenda are accelerator support, local time stepping, and more dynamic load balancing and autotuning Schreiber:18:Invasic ; Charrier:17 . Finally, we plan extensions of the core paradigm of the engine. The development of ExaHyPE from scratch has been possible because we restricted ourselves to ADER-DG only. In the future, we plan to explore to what degree we are able to add particle-based algorithms (Particle-in-Cell methods or simple tracers) or multigrid solvers (for elliptic subterms) on top of ExaHyPE. This will open the engine to new user communities. First feasibility studies for this already exist Reps:15:Helmholtz ; Weinzierl:16:PIC ; Weinzierl:18:BoxMG .
8 Acknowledgements and Funding
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671698, www.exahype.eu.
The authors gratefully acknowledge the support of the Leibniz Supercomputing Centre (www.lrz.de), which also provided the computing resources on SuperMUC (grant no. pr48ma).
We would especially like to thank the many people who have made contributions to ExaHyPE, in particular our previous team members Benjamin Hazelwood, Angelika Schwarz, Vasco Varduhn and Olindo Zanotti.
References
 (1) K. Duru, A.-A. Gabriel, H. Igel, A new discontinuous Galerkin spectral element method for elastic waves with physically motivated numerical fluxes, arXiv:1802.06380 (2017).
 (2) M. Tavelli, M. Dumbser, D. E. Charrier, L. Rannabauer, T. Weinzierl, M. Bader, A simple diffuse interface approach on adaptive Cartesian grids for the linear elastic wave equations with complex topography, Journal of Computational Physics 386 (2019) 158–189.
 (3) N. T. Bishop, L. Rezzolla, Extraction of gravitational waves in numerical relativity, Living Reviews in Relativity 19 (1) (2016) 2. doi:10.1007/s41114-016-0001-9.
 (4) A. Tsokaros, B. C. Mundim, F. Galeazzi, L. Rezzolla, K. Uryū, Initial-data contribution to the error budget of gravitational waves from neutron-star binaries, Phys. Rev. D 94 (2016) 044049. doi:10.1103/PhysRevD.94.044049.
 (5) M. Dumbser, F. Guercilena, S. Koeppel, L. Rezzolla, O. Zanotti, A strongly hyperbolic first-order CCZ4 formulation of the Einstein equations and its solution with discontinuous Galerkin schemes, arXiv:1707.09910 (2017).
 (6) O. Zanotti, M. Dumbser, Efficient conservative ADER schemes based on WENO reconstruction and space-time predictor in primitive variables, Computational Astrophysics and Cosmology 3 (2016).
 (7) F. Fambri, M. Dumbser, O. Zanotti, Space-time adaptive ADER-DG schemes for dissipative flows: Compressible Navier-Stokes and resistive MHD equations, Computer Physics Communications 219 (2017).
 (8) W. Reed, T. R. Hill, Triangular mesh methods for the neutron transport equation, Tech. Rep. LA-UR-73-479, Los Alamos Scientific Laboratory (1973).
 (9) B. Cockburn, C.-W. Shu, The Runge-Kutta local projection P1-discontinuous-Galerkin finite element method for scalar conservation laws, ESAIM: Mathematical Modelling and Numerical Analysis 25 (3) (1991) 337–361.
 (10) B. Cockburn, C.-W. Shu, TVB Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws II: General framework, Mathematics of Computation 52 (186) (1989) 411–435.
 (11) B. Cockburn, S.-Y. Lin, C.-W. Shu, TVB Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws III: One-dimensional systems, Journal of Computational Physics 84 (1) (1989) 90–113.
 (12) B. Cockburn, S. Hou, C.-W. Shu, The Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws IV: The multidimensional case, Mathematics of Computation 54 (1990) 545–581.
 (13) B. Cockburn, C.-W. Shu, The Runge-Kutta discontinuous Galerkin method for conservation laws V: Multidimensional systems, Journal of Computational Physics 141 (2) (1998) 199–224.
 (14) V. A. Titarev, E. F. Toro, ADER: Arbitrary High Order Godunov Approach, Journal of Scientific Computing 17 (1) (2002) 609–618. doi:10.1023/A:1015126814947.
 (15) G. Gassner, M. Dumbser, F. Hindenlang, C.D. Munz, Explicit onestep time discretizations for discontinuous Galerkin and finite volume schemes based on local predictors, Journal of Computational Physics 230 (11) (2011) 4232–4247. doi:10.1016/j.jcp.2010.10.024.
 (16) M. Dumbser, C. Enaux, E. F. Toro, Finite volume schemes of very high order of accuracy for stiff hyperbolic balance laws, J. Comput. Phys. 227 (8) (2008) 3971–4001.
 (17) R. Hartmann, P. Houston, Adaptive discontinuous Galerkin finite element methods for the compressible Euler equations, Journal of Computational Physics 183 (2) (2002) 508 – 532.
 (18) P.O. Persson, J. Peraire, Subcell shock capturing for discontinuous Galerkin methods, in: 44th AIAA Aerospace Sciences Meeting and Exhibit, p. 112.
 (19) D. Radice, L. Rezzolla, Discontinuous Galerkin methods for generalrelativistic hydrodynamics: Formulation and application to spherically symmetric spacetimes, Phys. Rev. D 84 (2011) 024010.
 (20) J. Qiu, C. Shu, Runge–Kutta discontinuous Galerkin method using WENO limiters, SIAM Journal on Scientific Computing 26 (3) (2005) 907–929.
 (21) J. Qiu, C.-W. Shu, Hermite WENO schemes and their application as limiters for Runge-Kutta discontinuous Galerkin method: one-dimensional case, Journal of Computational Physics 193 (2003) 115–135.
 (22) R. Loubère, M. Dumbser, S. Diot, A new family of high order unstructured MOOD and ADER finite volume schemes for multidimensional systems of hyperbolic conservation laws, Communications in Computational Physics 16 (3) (2014) 718–763. doi:10.4208/cicp.181113.140314a.
 (23) S. Diot, R. Loubère, S. Clain, The MOOD method in the threedimensional case: Veryhighorder finite volume method for hyperbolic systems.
 (24) S. Diot, S. Clain, R. Loubère, Multidimensional optimal order detection (MOOD) – a very highorder finite volume scheme for conservation laws on unstructured meshes.
 (26) M. Dumbser, O. Zanotti, R. Loubère, S. Diot, A posteriori subcell limiting of the discontinuous Galerkin finite element method for hyperbolic conservation laws, Journal of Computational Physics 278 (2014) 47–75.
 (27) T. Weinzierl, M. Mehl, Peano – A Traversal and Storage Scheme for Octree-Like Adaptive Cartesian Multiscale Grids, SIAM J. Sci. Comput. 33 (5) (2011) 2732–2760. doi:10.1137/100799071.
 (28) T. Weinzierl, The Peano software: parallel, automaton-based, dynamically adaptive grid traversals, ACM Transactions on Mathematical Software, in press.
 (29) O. Zanotti, F. Fambri, M. Dumbser, A. Hidalgo, Space–time adaptive ADER discontinuous Galerkin finite element schemes with a posteriori subcell finite volume limiting, Computers & Fluids 118 (2015) 204–224.
 (30) The Jenkins project, https://jenkins.io/.
 (31) K. Duru, L. Rannabauer, A.A. Gabriel, On energy stable discontinuous Galerkin approximations for scattering problems in complex elastic media with adaptive curvilinear meshes, Computer Methods in Applied Mechanics and Engineering.
 (32) L. Krenz, L. Rannabauer, M. Bader, A highorder discontinuous Galerkin solver with dynamic adaptive mesh refinement to simulate cloud formation processes (2019). arXiv:1905.05524.
 (33) S. Köppel, Towards an exascale code for GRMHD on dynamical spacetimes, arXiv:1711.08221 (2017).
 (34) F. Fambri, M. Dumbser, S. Köppel, L. Rezzolla, O. Zanotti, ADER discontinuous Galerkin schemes for general-relativistic ideal magnetohydrodynamics, Monthly Notices of the Royal Astronomical Society 477 (4) (2018) 4543–4564. doi:10.1093/mnras/sty734.
 (35) L. Rezzolla, O. Zanotti, Relativistic Hydrodynamics, Oxford University Press, Oxford UK, 2013.
 (36) R. Abgrall, On essentially non-oscillatory schemes on unstructured meshes: Analysis and implementation, Journal of Computational Physics 114 (1) (1994) 45–58.
 (37) M. Dumbser, A. Hidalgo, O. Zanotti, High order space-time adaptive ADER-WENO finite volume schemes for nonconservative hyperbolic systems, Computer Methods in Applied Mechanics and Engineering 268 (2014) 359–387.
 (38) B. van Leer, Towards the ultimate conservative difference scheme, Journal of Computational Physics 135 (2) (1997) 229–248. doi:10.1006/jcph.1997.5704.
 (39) O. Zanotti, F. Fambri, M. Dumbser, A. Hidalgo, Space-time adaptive ADER discontinuous Galerkin finite element schemes with a posteriori subcell finite volume limiting, Computers & Fluids 118 (2015) 204–224. doi:10.1016/j.compfluid.2015.06.020.
 (40) A. Heinecke, G. Henry, M. Hutchinson, H. Pabst, LIBXSMM: Accelerating small matrix multiplications by runtime code generation, in: SC16: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2016, pp. 981–991. doi:10.1109/SC.2016.83.
 (41) L. Rannabauer, S. Haas, D. E. Charrier, T. Weinzierl, M. Bader, Simulation of tsunamis with the exascale hyperbolic PDE engine ExaHyPE, in: Environmental Informatics: Techniques and Trends. Adjunct Proceedings of the 32nd edition of the EnviroInfo, 2018.
 (42) W. Eckhardt, T. Weinzierl, A blocking strategy on multicore architectures for dynamically adaptive PDE solvers, in: Parallel Processing and Applied Mathematics, 8th International Conference, PPAM 2009, Wroclaw, Poland, September 13-16, 2009. Revised Selected Papers, Part I, 2009, pp. 567–575. doi:10.1007/978-3-642-14390-8_59.
 (43) D. Charrier, B. Hazelwood, T. Weinzierl, Enclave tasking and excessive grid regularisation for discontinuous Galerkin methods on dynamically adaptive meshes, to be submitted.
 (44) M. Dumbser, F. Fambri, M. Tavelli, M. Bader, T. Weinzierl, Efficient implementation of ADER discontinuous Galerkin schemes for a scalable hyperbolic PDE engine, Axioms 7 (3) (2018).
 (45) R. J. LeVeque, FiniteVolume Methods for Hyperbolic Problems, Cambridge University Press, 2002.
 (46) J. Reinders, Intel Threading Building Blocks, 1st Edition, O’Reilly & Associates, Inc., Sebastopol, CA, USA, 2007.
 (47) D. E. Charrier, B. Hazelwood, E. Tutlyaeva, M. Bader, M. Dumbser, A. Kudryavtsev, A. Moskovsky, T. Weinzierl, Studies on the energy and deep memory behaviour of a cache-oblivious, task-based hyperbolic PDE solver, The International Journal of High Performance Computing Applications (2019). doi:10.1177/1094342019842645.
 (48) S. M. Day, C. R. Bradley, Memory-efficient simulation of anelastic wave propagation, Bulletin of the Seismological Society of America 91 (3) (2001) 520–531. doi:10.1785/0120000103.
 (49) M. Dumbser, D. S. Balsara, A new efficient formulation of the HLLEM Riemann solver for general conservative and nonconservative hyperbolic systems, J. Comput. Phys. 304 (C) (2016) 275–319. doi:10.1016/j.jcp.2015.10.014.
 (50) L. Rannabauer, M. Dumbser, M. Bader, ADER-DG with a posteriori finite-volume limiting to simulate tsunamis in a parallel adaptive mesh refinement framework, Computers & Fluids (2018). doi:10.1016/j.compfluid.2018.01.031.
 (51) F. C. Michel, Accretion of matter by condensed objects, Astrophysics and Space Science 15 (1972) 153–160. doi:10.1007/BF00649949.
 (52) G. Gassner, F. Lörcher, C.-D. Munz, A discontinuous Galerkin scheme based on a space-time expansion II. Viscous flow equations in multi dimensions, Journal of Scientific Computing 34 (3) (2008) 260–286. doi:10.1007/s10915-007-9169-1.
 (53) M. Tavelli, M. Dumbser, A staggered space–time discontinuous Galerkin method for the threedimensional incompressible Navier–Stokes equations on unstructured tetrahedral meshes, Journal of Computational Physics 319 (2016) 294–323. doi:10.1016/j.jcp.2016.05.009.
 (54) A. Müller, J. Behrens, F. X. Giraldo, V. Wirth, An adaptive discontinuous Galerkin method for modeling cumulus clouds, in: Fifth European Conference on Computational Fluid Dynamics, ECCOMAS CFD, 2010.

 (55) Clawpack Development Team, Clawpack software, version 5.4.0 (2017). doi:10.5281/zenodo.262111. URL http://www.clawpack.org
 (56) M. Schreiber, T. Weinzierl, A case study for a new invasive extension of Intel's Threading Building Blocks, in: HiPEAC 2018, 3rd COSH Workshop on Co-Scheduling of HPC Applications, 2018.
 (57) D. E. Charrier, T. Weinzierl, An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems, in: Parallel Processing and Applied Mathematics, 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, pp. 3–13. doi:10.1007/978-3-319-78054-2_1.
 (58) B. Reps, T. Weinzierl, A complex additive geometric multigrid solver for the Helmholtz equations on spacetrees, ACM Transactions on Mathematical Software 44 (1) (2017) 2:1–2:36.
 (59) T. Weinzierl, B. Verleye, P. Henri, D. Roose, Two particle-in-grid realisations on spacetrees, Parallel Computing 52 (2016) 42–64.
 (60) M. Weinzierl, T. Weinzierl, Quasi-matrix-free hybrid multigrid on dynamically adaptive Cartesian grids, ACM Trans. Math. Softw., in press.
Appendix A Dependencies and prerequisites
ExaHyPE requires the following prerequisites:

For sequential simulations, only a C++ compiler is required. The code uses only a few C++14 features, but for many older compiler versions these features have to be enabled explicitly via -std=c++0x.

Python 3

ExaHyPE’s default build environment uses GNU Make.
Further, ExaHyPE has the following optional dependencies:

ExaHyPE user code can also be written in Fortran. In this case, a Fortran compiler is needed.

To exploit multi- or many-core computers, Intel's TBB 2017 is required. It is open source and works with GCC and Intel compilers.

To run ExaHyPE on a distributed-memory cluster, MPI is needed. ExaHyPE uses only very basic MPI routines (e.g. MPI 1.3 is sufficient).

To use ExaHyPE's optimised compute kernels, Intel's libxsmm (open source: https://github.com/hfp/libxsmm) and Python's Jinja2 module (open source: http://jinja.pocoo.org/) are required. A local installation script is made available.
Appendix B Obtaining ExaHyPE
ExaHyPE is free software and is hosted at www.exahype.eu. There are two options to obtain ExaHyPE: the first is to download a complete snapshot of ExaHyPE, which includes a snapshot of Peano; the second is to clone the repository, in which case Peano has to be added manually. In order to access the repository, the mailing list and the bug tracker, users are asked to register at http://www.peanoframework.org/hpcsoftware/exahype/. The ExaHyPE guidebook contains documentation, detailed rationale and links to further resources.