A comparison of semi-Lagrangian discontinuous Galerkin and spline based Vlasov solvers in four dimensions

03/06/2018
by   Lukas Einkemmer, et al.
Universität Tübingen
0

The purpose of the present paper is to compare two semi-Lagrangian methods in the context of the four-dimensional Vlasov--Poisson equation. More specifically, our goal is to compare the performance of the more recently developed semi-Lagrangian discontinuous Galerkin scheme with the de facto standard in Eulerian Vlasov simulation (i.e. using cubic spline interpolation). To that end, we perform simulations for nonlinear Landau damping and a two-stream instability and provide benchmarks for the SeLaLib and sldg codes (both on a workstation and using MPI on a cluster). We find that the semi-Lagrangian discontinuous Galerkin scheme shows a moderate improvement in run time for nonlinear Landau damping and a substantial improvement for the two-stream instability. It should be emphasized that these results are markedly different from results obtained in the asymptotic regime (which favor spline interpolation). Thus, we conclude that the traditional approach of evaluating numerical methods is misleading, even for short time simulations. In addition, the absence of any All-to-All communication in the semi-Lagrangian discontinuous Galerkin method gives it a decisive advantage for scaling to more than 256 cores.

READ FULL TEXT VIEW PDF

Authors

page 1

page 2

page 3

page 4

02/17/2022

Augmented Lagrangian approach to deriving discontinuous Galerkin methods for nonlinear elasticity problems

We use the augmented Lagrangian formalism to derive discontinuous Galerk...
03/22/2016

A mixed precision semi-Lagrangian algorithm and its performance on accelerators

In this paper we propose a mixed precision algorithm in the context of t...
12/16/2019

Arbitrary Lagrangian-Eulerian hybridizable discontinuous Galerkin methods for incompressible flow with moving boundaries and interfaces

We present a class of Arbitrary Lagrangian-Eulerian hybridizable discont...
03/01/2019

A massively parallel semi-Lagrangian solver for the six-dimensional Vlasov-Poisson equation

This paper presents an optimized and scalable semi-Lagrangian solver for...
11/27/2019

High Order Semi-Lagrangian Discontinuous Galerkin Method Coupled with Runge-Kutta Exponential Integrators for Nonlinear Vlasov Dynamics

In this paper, we propose a semi-Lagrangian discontinuous Galerkin metho...
02/07/2020

An Eulerian-Lagrangian discontinuous Galerkin method for transport problems and its application to nonlinear dynamics

We propose a new Eulerian-Lagrangian (EL) discontinuous Galerkin (DG) me...
08/02/2021

Modeling and simulation of thin sheet folding

The article addresses the mathematical modeling of the folding of a thin...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Many problems in computational plasma physics require the solution of kinetic equations. Due to the high dimensionality of these models, particle methods have been, and still are, heavily used to conduct these simulations. However, due to the rapid increase in computational performance, numerical methods based on the discretization of the entire phase space (the so-called Eulerian approach) have gained increasing attention in recent years (at least for four-dimensional and certain five-dimensional problems). These Eulerian methods offer superior accuracy and the possibility to resolve regions of the phase space with low density, but are extremely demanding from a computational point of view. Consequently, it is important to develop efficient numerical methods that can be parallelized to state of the art supercomputers.

While standard techniques from fluid dynamics can be applied to kinetic equations, these usually suffer from a severe restriction in the time step size due to a Courant–Friedrichs–Lewy (CFL) condition. On the other hand, the use of implicit methods would significantly increase the memory footprint, which is already a scarce resource for high-dimensional simulations. This is an even more pronounced issue for accelerators such as graphic processing units (GPUs) and the Intel Xeon Phi (which in all likelihood will become more pervasive in the future). Semi-Lagrangian methods offer a viable alternative. These methods follow the characteristics backward in time. In many kinetic models (such as the Vlasov–Poisson and Vlasov–Maxwell equations) applying a splitting procedure results in sub-steps for which the characteristics can be determined analytically (if this is not the case an explicit ordinary differential equation solver is usually used). For the Vlasov–Poisson equation this approach has been first suggested in the seminal paper of Cheng & Knorr

cheng1976 . Later it was extended in various ways to the Vlasov–Maxwell equation crouseilles2015hamiltonian ; valis ; mangeney2002 . However, since the endpoint of a characteristic curve does not necessarily coincide with the grid used, an interpolation procedure has to be employed. An attractive choice is to reconstruct the desired function by spline interpolation (see, for example, sonnendrucker1999 ; filbet2003 ), which according to sonnendruecker2011 is still considered the de facto standard in Vlasov simulations. The main downside of this approach is that a tridiagonal linear system of equations has to be solved to construct the spline. While this can be done very efficiently in a sequential implementation, for modern supercomputers (which require scaling to hundreds of thousands of cores) such global dependencies seriously limit the utility of this algorithm.

On the other hand, the semi-Lagrangian discontinuous Galerkin method employs a piecewise polynomial approximation in each cell of the computational domain (see, for example, rossmanith2011 ; crouseilles2011 ; einkemmer2014 ; qiu2011 ). In the case of an advection equation the discretized function is translated and then projected back to the appropriate subspace of piecewise polynomial functions. This method, per construction, is mass conservative and only accesses two adjacent cells in order to compute the necessary projection (this is true independent of the order of the approximation). The local nature of this method also makes it easy to incorporate block structured mesh refinement and employing limiters to maintain positivity. Positivity limiters were also investigated for spline interpolation (see zerroukat2005 ; zerroukat2006 ; crouseilles2010 ), but these schemes are much more involved. For the semi-Lagrangian discontinuous Galerkin scheme mathematical rigorous convergence results are available in einkemmer2014convergence ; einkemmer2014 . The method is fully explicit (i.e. no linear system has to be solved to advance the solution in time) and thus it is easier to implement (especially on parallel architectures) and shows a more favorable communication pattern. We should note that some measures have been taken to improve the parallel scalability of cubic spline interpolation crouseilles2009 . However, even for this approach a relatively large communication overhead is incurred. This is due to the fact that the boundary condition for the local spline reconstruction requires a large stencil if the desirable properties of the global cubic spline interpolation are to be preserved (the method derived in crouseilles2009 requires a centered stencil of size ). In addition, a very efficient mixed precision implementation is available for the semi-Lagrangian discontinuous Galerkin method einkemmer2016 . This is particular useful for modern accelerators (such as GPUs) and helps to alleviate the problem of memory scarcity on such systems. Furthermore, it is worth mentioning that a filamentation filtration strategy has been developed einkemmer2014recurrence (for problems where numerical recurrence is an issue) and that the error propagation is superior to other semi-Lagrangian schemes steiner2013semi ; einkemmer2015error (although to our knowledge no proof of this fact is available for the Vlasov equation).

Before proceeding, let us note that a range of other methods have been used to solve the Vlasov equation as well. A method that is related to the discontinuous Galerkin approach and which has been widely employed in plasma simulations is the van Leer scheme (see, for example, fijalkow1999 ; mangeney2002 ; galeotti2005 ; califano2006 ). What distinguishes the discontinuous Galerkin method from the van Leer scheme is that the coefficients in the corresponding basis expansion are stored directly in computer memory. In contrast, the van Leer scheme replaces these coefficients by performing an approximation using suitable differences on an equidistant grid. Its implementation and performance characteristics are thus more closely related to other finite difference schemes.

Despite the many advantages of the semi-Lagrangian discontinuous Galerkin method outlined above, there is certainly no universal agreement that this method will be a viable alternative for large scale high-dimensional kinetic simulation. Specifically, it is often argued, both by reviewers and in informal conversation, that these methods require more degrees of freedom to achieve the same precision (compared to say spline interpolation) and that this is not maintainable in a higher-dimensional setting. This argument is certainly not easily dismissed. Let us assume that instead of

degrees of freedom per direction our numerical approach requires , with . Then the memory footprint and run time are increased by a factor , where is the dimensionality of the problem. For example, even a moderate increase, for example , results in a substantial increase of memory footprint/run time (a factor of for and factor of for ).

Of course, the important question then becomes if the semi-Lagrangian discontinuous Galerkin method actually requires more degrees of freedom and if so by how much. Although this is a common problem for classic (i.e. non semi-Lagrangian) discontinuous Galerkin methods and has, for example, lead to so-called hybrid schemes (see, for example, lehrenfeld2016high ), the situation considered here is quite different. Specifically, the problem is reduced to a sequence of one-dimensional advection equations which are then solved using the method of characteristics. In the present paper we will investigate if the semi-Lagrangian discontinuous Galerkin method requires more degrees of freedom compared to spline interpolation. This will be done in the context of the Vlasov–Poisson equation

(1)

However, before we proceed and discuss the comparative efficiency of the semi-Lagrangian discontinuous Galerkin method (section 3) and the four-dimensional simulations conducted (sections 4 and 5), let us dwell a bit more on the argument (against the discontinuous Galerkin method) outlined above. To do that we will consider nonlinear Landau damping as a test problem and, for the time being, still restrict ourselves to the two-dimensional case (for reasons that will be apparent shortly; the four-dimensional variant of this problem is discussed in section 4). That is, we consider the initial value

where and , on the domain . A splitting procedure is employed to obtain a sequence of advection equations which are discretized using either a semi-Lagrangian approach based on spline interpolation or the semi-Lagrangian discontinuous Galerkin approach (both numerical methods will be discussed in some detail in section 2). The error committed by these two numerical schemes is computed by comparing it to a reference solution (with a sufficiently fine resolution).

Many publications (see, for example, einkemmer2017study ; crouseilles2011 ; cheng2014discontinuous ), including work by the authors of this paper, have reported such results. To facilitate the present discussion. we have computed the error in the particle-density at time as a function of the degrees of freedom (per direction). These results are shown on the left in Figure 1. We clearly observe that, at best, the discontinuous Galerkin method struggles in a direct comparison with spline interpolation. The sixth-order discontinuous Galerkin method (dG6) requires approximately 50% more degrees of freedom to obtain the same accuracy. Thus, it seems that the discussion has come to a close and the criticism outlined above seems to be completely justified. However, the major caveat here is that in practical applications no one is interested in solving nonlinear Landau damping only up to . Now, if we look at the error at (right-hand side of Figure 1), the error does not appreciably decrease at all (up to degrees of freedom per direction) for any of the numerical methods considered. To run a simulation with degrees of freedom in four dimensions would require at least TB of memory, which would stress even the world’s most powerful supercomputers (the number one system on the present TOP500 supercomputing list has TB of memory; thus such a simulation would be possible in principle, but certainly not very practical). Thus, while the argument provided is valid for very small times, it is all but pointless for realistic simulations. The question that we should ask instead is how well does a numerical method resolve the physics of the problem under consideration (for reasonable problem sizes).

Figure 1: The error, as a function of the number of degrees of freedom (per direction), for nonlinear Landau damping is shown at (left) and (right). The error is computed in the infinity norm and is normalized by the norm of the reference solution. The reference solution is computed using spline interpolation with grid points (per direction). For all simulations a Strang splitting procedure with time step size has been used.

The example considered here is two-dimensional and thus should not be considered the gold standard for higher-dimensional simulations. In fact, the reason why we consider this in two dimensions is that it is prohibitively expensive to run simulations with or even more degrees of freedom. This paper will show that for the four-dimensional case the performance of a numerical method in the asymptotic regime is markedly different from the performance of interest in realistic simulations. More specifically, we will show that even though the semi-Lagrangian discontinuous Galerkin method is inferior to spline interpolation for small times, depending on the problem, similar or even superior performance can be achieved for more realistic simulations.

Of course, the importance of designing numerical methods that faithfully resolve the physics (as opposed to primarily decreasing the error in some norm) has been considered extensively in the context of long time integration (see, for example, hairer2006geometric ). The difference here is that we certainly would not consider for nonlinear Landau damping a long-time result. In fact, to determine the long-time behavior of nonlinear Landau damping, numerical simulations, mostly in two dimensions, for at least up to have been considered (see, for example, brunetti2000 and manfredi1997 ). In the literature much effort has been dedicated to numerical methods which conserve certain invariants (such as mass or energy) exactly and to methods which have good long time properties with respect to these quantities. This is also an important aspect for the Vlasov equation and many works exist that consider conservation of invariants (see, for example, crouseilles2010 ; filbet2003 for spline interpolation and einkemmer2017study ; crouseilles2011 for the discontinuous Galerkin approach). In addition, since the Vlasov equation is often used to simulate plasma instabilities, the time evolution of certain averaged quantities (most notably the total electric energy) are of significant interest. Both of these aspects will be considered in sections 4 and 5.

2 Numerical methods

To advance the numerical solution in time we use the splitting approach introduced in cheng1976 . That is, we first consider the free-streaming sub-flow

(2)

The corresponding solution at time is then denoted by

For the second part of the splitting algorithm we consider the acceleration sub-flow

(3)

where the electric field is determined from the (charge) density and is thus taken constant during that step. This property actually follows from equation (3) as the translation is only in the -direction and thus leaves the density invariant. We denote the corresponding solution by

Note that, contrary to what the notation suggests, is a nonlinear operator as the electric field depends on . Using this notation we can easily formulate a time step of the splitting algorithm. In all our numerical results we will use the second-order Strang splitting given by

where is an approximation of and is the time step size. Let us note, however, that higher order splitting methods have been constructed as well (see, for example, crouseilles2011high ).

The main computational advantage of the splitting scheme is that the resulting sub-flows, given by equations (2) and (3), are in the form of an advection equation, where the advection speed is independent of the variables being advected. Thus, we can immediately solve equation (2) to obtain

and solve equation (3) to obtain

Thus, by using the splitting approach outlined above we have reduced the task of computing a numerical approximation to the four-dimensional Vlasov–Poisson equation to computing two-dimensional translations in phase space. However, this can be further simplified by splitting equation (2) into an advection in the direction

and an advection in the direction

Let us denote the corresponding flows at time by and , respectively. Then since (i.e. the two operators commute) we have

and no further splitting error is incurred. A very similar procedure can be carried out for the acceleration sub-flow. Thus, the four-dimensional Vlasov–Poisson equation has been reduced to a sequence of one-dimensional advections. This has significant advantages from a computational point of view and, in particular, implies that we only have to formulate our semi-Lagrangian scheme in a single dimension. We should remark that this approach can be extended to more complicated models, such as the Vlasov–Maxwell equations crouseilles2015hamiltonian .

We will now describe the two competing approaches considered in this paper. First, for the semi-Lagrangian method based on spline interpolation an equidistant grid is used. However, for any grid point the translations , , etc., in general, do not coincide with a grid point (here denotes the first component of the electric field). Thus, we have to use an interpolation scheme in order to evaluate , . This is done by cubic spline interpolation. Note that only splines in one space dimension have to be constructed. This is essential for the computational efficiency of the scheme. Further computational ramifications are discussed in more detail in section 3.

Second, the semi-Lagrangian discontinuous Galerkin scheme divides the domain into cells , where the are one-dimensional intervals of length . We further assume that a function is given such that , i.e. the restriction of to the th cell, is a polynomial of degree . Then the function lies in the approximation space (note that we do not enforce a continuity constraint across cell interfaces). However, in general, this is not true for the translated function (for the other three translations we proceed in exactly the same way)

Thus, we perform an approximation by applying a projection operator and obtain . In order to turn this into a sensible numerical scheme, it must hold that is a polynomial of degree for any function . Then constitutes the sought-after approximation of . In the present case the operator is the projection on the (finite dimensional) subspace of cell-wise polynomials of degree . Note that this projection is the only approximation made. There is some freedom in selecting appropriate degrees of freedom for this method. Two obvious choices are to use the coefficients in a Legendre expansion or the value of function evaluations at the Gauss–Legendre points. In our implementation we use the latter. Since the value at the beginning of each time step is (per assumption) a cell-wise polynomial of degree , we can use Gauss–Legendre quadrature in order to analytically compute the coefficients required to implement the projection. The performance implications will be discussed in section 3.

3 Computational consideration & performance results

The goal of this section is to discuss the performance of the cubic spline and discontinuous Galerkin based semi-Lagrangian methods on modern computer architectures. We will do this both from a theoretical and a practical (i.e. measuring specific implementations of these algorithms) point of view.

As has been discussed in the previous section, only a set of one-dimensional advection equations have to be solved for both algorithms. Thus, to simplify the notation, we only consider the one-dimensional advection

(4)

in this section. This is, in fact, exactly how the numerical algorithm is implemented on a computer. A function is provided that solves the one-dimensional advection given by equation (4). This function is then executed multiple times with different parameters.

For spline interpolation, the function values are given at the grid points . The index runs from to , where is the number of degrees of freedom. From this a spline that covers the entire domain has to be constructed. Since it would be wasteful to represent a spline as a collection of piecewise polynomials (as most of the coefficients of these polynomials are fixed by the continuity constraints), a B-spline basis is used instead. That is, the spline interpolation of is represented as

where is given by

and is the grid spacing. The coefficients are uniquely determined by the function values and can be obtained by solving the following linear system of equations (assuming periodic boundary conditions)

Note that the matrix is tridiagonal and symmetric and thus efficient methods exist to solve this system. Once the spline is constructed it is used to solve equation (4) as follows

Thus, the numerical algorithm proceeds in two steps. First, the spline is constructed by solving a linear system and this intermediate representation is then used to evaluate the numerical solution at the grid points. Both of these steps require memory and arithmetic operations, where is the number of degrees of freedom. In the construction step this is due to the fact that linear complexity solvers are available for tridiagonal systems (such as the Thomas algorithm) and for the evaluation step this is true since the support of is compact and independent of .

On modern computer hardware both of these steps are memory bound. That is, performance is dictated by how fast data can be read and written to memory (as opposed to how many arithmetic operations the computer can perform). In the present case we have to at least read all and write all . In addition, we have to write and read all at least once. The latter is more subtle than one might expect. Since in a sequential implementation only one one-dimensional array of s needs to exist at any given time, this array could conceivable be kept in cache (except for extremely large problem sizes) and then reads and writes to it would not count as a memory operation (in the sense that those do not negatively impact performance).

For the semi-Lagrangian discontinuous Galerkin scheme the degrees of freedom are given by function values , where is the cell-index (which runs from to ), the left cell interface, and is the th Gauss–Legendre node scaled to the interval . The indices run from to (the degree of the polynomial approximant). The resulting scheme has order and we will henceforth denote it as dG(). It is clear that to perform the projection on the th cell only data from at most two adjacent cells is required. This is illustrated in Figure 2. We denote the index of these two cells by and , respectively. Since the projection is a linear operator we can write the resulting numerical scheme as follows

(5)

where and are matrices that only depend on . The matrices can be precomputed at the beginning of each time step and thus their cost will not be a major concern except for extremely coarse space discretizations. The derivation of their exact form is somewhat lengthy and we thus refer the interested reader to einkemmer2015 .

For the implementation equation (5) is employed. The size of the matrices is negligible in comparison to the cache size and we therefore do not have to count reads and writes to them as memory accesses. The present algorithm has to read all entries in and write to all entries in ; it thus requires memory operations, where is the number of degrees of freedom. Technically, the algorithm requires arithmetic operations and thus does not scale linearly in the degrees of freedom. However, since, for reasonable values of , this is still an extremely memory bound problem this is of no consequence (i.e. performance is dictated by the memory accesses and not by amount of arithmetic operations that need to be performed).

Figure 2: Illustration of the semi-Lagrangian discontinuous Galerkin scheme. The piecewise polynomial function of degree (left picture) is translated by (right picture). The translated function is then projected back to the approximation subspace (which requires data from at most two adjacent cells).

In an idealized setting both algorithms could achieve very similar levels of performance (both require us to read once and write to once). Of course, achieving this in an actual implementation is not a trivial task. However, it is our believe that the local nature of the discontinuous Galerkin approach simplifies this task, at least, to some extent. In the following we will compare the performance of two Vlasov solvers. SeLaLib selalib is a library that facilitate kinetic simulations based on the semi-Lagrangian approach. It has been used extensively (see, for example, crouseilles2014new ; guisset2016lagrangian ; barsamian2017verification ). The library is written in Fortran and provides an efficient implementation of the cubic spline based semi-Lagrangian discontinuous Galerkin method. SeLaLib is compared with our own implementation (called sldg) of the semi-Lagrangian discontinuous Galerkin approach. The code is written in C++ and due to the use of templates facilitates implementations that work in arbitrary dimensions, while still maintaining good performance. This architecture is described in einkemmer2015 . In addition to solving the Vlasov equation, it has been used in non-traditional applications of semi-Lagrangian methods (such as for the Kadomtsev–Petviashvili equation that requires the solution of a nonlinear advection einkemmer2017semi ).

Another major difference in the implementation of these two codes is how data is presented to the routines that perform the one-dimensional advections. For SeLaLib the corresponding arrays are always contiguous in memory. That is, for the advection in the -direction the elements in the direction are laid out sequentially in memory, for the advection in the -direction the elements in the direction are laid out sequentially in memory, etc. This requires that the four dimensional array is converted (i.e. transposed) to the appropriate form for each advection. This approach has the advantage that the same code can be used for all advection steps and cache locality is ensured almost automatically. It also fits very well with spline interpolation, as the global algorithm for constructing the spline is pointed at one contiguous array of memory. The disadvantage of this approach is that the transpositions do not have a very favorable memory access pattern and thus incur a significant computational cost.

On the other hand, the semi-Lagrangian discontinuous Galerkin implementation uses the ability of abstractions in C++ to hide the actual memory layout from the function that performs the advection. Thus, it is still possible to write the one-dimensional advection code as if the arrays are contiguous, even though the actual memory layout is more complicated. As a consequence, no transpositions have to be carried out in this implementation. The main concern here is that cache locality has to be taken into account. However, the local nature of the discontinuous Galerkin approach makes this less problematic. To further alleviate this issue we use cache blocking for the dimension with the largest stride.

We have measured both the performance and the amount of memory used for these two codes. The results are shown in Table 1. For both methods one dual socket node of the described computer systems is used. Both codes are run using parallel threads. We observe that the discontinuous Galerkin implementation is both faster (by approximately a factor of on the newer Haswell CPUs and approximately a factor of on the Ivy Bridge CPUs) and uses significantly less memory (by approximately a factor of ), if the same number of degrees of freedom are used.

Workstation

Time per step [s] Memory used [GB]
dof=64 dof=128 dof=64 dof=128
SeLaLib 0.194 3.25 0.85 13.0
sldg (dG4) 0.105 1.62 0.28 4.2

VSC3

Time per step [s]
dof=64 dof=128
SeLaLib 0.219 4.02
sldg (dG4) 0.151 2.31
Table 1: Comparison of performance and memory usage between SeLaLib (cubic spline interpolation) and sldg (fourth order semi-Lagrangian discontinuous Galerkin method). For both codes the nonlinear Landau damping initial value of section 4 is used and and degrees of freedom (dof) per direction are considered. The Workstation consists of two Intel Xeon E5-2630 v3 CPUs and 32 GB of DDR4 memory. One node of the VSC3 consists of two Intel Xeon E5-2650v2 and 64GB of DDR3 memory. The memory consumption is measured using the /usr/bin/time utility.

Both codes support parallelization using MPI. This is essential even to obtain good performance on a single node (i.e. for the results presented in Table 1

). However, both codes can also be run on distributed memory clusters (supercomputers). In this case the issue of how to distribute the computational domain to the different MPI processes becomes important. Although this choice of data distribution is identical for both codes it is instructive to discuss it. The main issue here is that most FFT libraries (in particular, FFTW, which is used in both codes) only allow a so-called pencil decomposition. Thus it is not straightforward to parallelize the Fourier transforms (without incurring a penalty associated with additional data transfer). Both codes avoid this problem by only parallelizing in the velocity directions. Then, the Fourier transforms can be conducted independently on each of the processes. For four/six dimensional simulations there are still two/three dimensions available for parallelization. The main difference in the two codes is that while sldg only performs halo communication (due to the local nature of the semi-Lagrangian discontinuous Galerkin scheme), SeLaLib requires two calls to

MPI_Alltoall in each time step (due to the global data dependency inherent in constructing the spline).

We now investigate the scaling of both codes on the Vienna Scientific Cluster 3 (VSC3). For more information and details on the hardware of that system we refer the reader to http://vsc.ac.at/systems/vsc-3/. On the left hand side of Figure 3 we present strong scaling results (i.e. the problem size is kept constant as the number of cores is increased) for the problem considered in section 4. We observe that the discontinuous Galerkin scheme maintains its advantage independently of the number of cores the program is run on (i.e. both codes show a similar behavior with respect to scaling). On the right hand side of Figure 3 we present weak scaling results (i.e. the problem size increases proportionally to the number of cores). In this case we observe that sldg significantly outperforms SeLaLib (in particular, as we consider more than cores). It is interesting to investigate the reason for this discrepancy in performance. To that end we have subtracted the time required for the calls to MPI_Alltoall, which are used to map the velocity grid into contiguous memory on each process, in the SeLaLib code from the total run time (note that even on a single node this operation requires a significant amount of time, as computing the transpose is a memory intensive operation). The resulting line is very similar to the run time of the sldg code. That is, almost all of the difference in performance between the two codes is explained by the global communication required for the spline interpolation.

Figure 3: Strong and weak scaling for SeLaLib (cubic spline interpolation) and sldg (fourth order semi-Lagrangian discontinuous Galerkin method). For strong scaling the nonlinear Landau damping initial value of section 4 with degrees of freedom per direction is used. For weak scaling degrees of freedom, where is the number of nodes employed, are used. The simulations are conducted on the VSC3, each node of which consists of two Intel Xeon E5-2650v2 and 64GB of DDR3 memory.

4 Nonlinear Landau damping

In this section we consider nonlinear Landau damping in four dimensions. That is, we impose the initial value

(6)

on the domain . We set and . This is precisely the setup used in filbet2003 . Strang splitting with step size is used for the time integration. We are interested in resolving the dynamics of nonlinear Landau damping up to . More precisely, the characteristic growth and decay of the electric energy should be indistinguishable (in a plot) from the exact solution until . To accomplish this for spline interpolation we need grid points per direction (see Figure 1). Note that despite the fact that, as the results in Figure 1 show, the error in the density function is on the order of unity, the physically important quantity is resolved very well. Therefore, there is simply no advantage of running the integrator in the asymptotic regime (which in any case would not be feasible in four dimensions).

In the introduction second, fourth, and sixth order discontinuous Galerkin schemes were considered. It was already clear at that point that the second order scheme is not competitive. However, the order plots (for ) presented there, seem to indicate that the sixth order discontinuous Galerkin approach performs best. However, this is misleading as numerical simulations have shown that for (where the plots in the introduction give no sensible indication) the fourth order method should actually be favored. Therefore, in this section we will exclusively use dG4. Before proceeding, let us note, however, that for long time integration this is not necessarily the case. For example einkemmer2017study found by performing numerical simulations in two dimensions that for long times (both for nonlinear Landau damping and for a bump on tail instabilities) the sixth order method can be significantly more efficient.

If we run the same simulation for the discontinuous Galerkin approach we see that the solution is somewhat worse. In particular, between and the growth rate of the electric energy is slightly increased compared to the reference solution. In order to remedy this we can increase the number of degrees of freedom per direction to (corresponding to ). This means that in four dimensions 60% more degrees of freedom are required by the discontinuous Galerkin approach to achieve the same accuracy. However, as we have seen in section 3, the implementation of the discontinuous Galerkin scheme is also more efficient (by a factor of on Haswell CPUs) and uses less memory (by a factor of ). Thus, the discontinuous Galerkin scheme retains an advantage in performance (approximately 25%). In addition, the amount of memory required is reduced by a factor of .

Figure 4: The electric energy (top left), the energy error (top right), the norm (bottom left), and the norm (bottom right) are shown as a function of time for the nonlinear Landau damping problem given in equation (6). The reference solution is computed using spline interpolation and grid points (per direction).

In addition, Figure 4 shows the important physical invariants of energy, norm, and norm. Although the main concern for the nonlinear Landau damping is the time evolution of the electric energy, these invariants give additional diagnostics of how well physical properties of the exact solution are respected by a numerical method.

The error in energy is almost identical for both methods and is on the order of (after a transient phase). This is due to the fact that the energy error for the present problem is dominated by the time integration error.

For collisionless plasma simulations the error is an important diagnostic. A decrease in the norm implies that our numerical scheme adds numerical diffusion to the simulation. Although most numerical methods introduce a certain amount of diffusion, it is clearly preferable to limit this as much as possible. For nonlinear Landau damping the discontinuous Galerkin method outperforms spline interpolation in this metric (in particular, between and ).

Finally, we look at the norm. For the exact solution mass and norm are equal in value and the norm is thus conserved. However, if a mass conservative numerical scheme commits a numerical error that introduces negative values, the result is a change in the norm. This is, in particular, a problem for schemes that are susceptible to oscillations (such as spectral methods). However, most numerical methods for the Vlasov equation experience this behavior to some extent. Let us emphasize that (contrary to say the Euler equations) this neither leads to numerical instabilities nor necessarily diminishes averaged quantities (such as the electric energy). However, since negative values make the physical interpretation of the density function difficult, it is still desirable to limit this behavior as much as possible. For the nonlinear Landau damping the discontinuous Galerkin scheme shows worse performance with respect to conservation compared to spline interpolation.

Overall we conclude that the discontinuous Galerkin implementation has a moderate advantage with respect to performance and conservation as well as a significant advantage with respect to memory consumption. On the other hand, spline interpolation has a moderate advantage with respect to conservation. If we assume that the present results generalize to the six-dimensional case (i.e.  as above), then the performance of both methods would be evenly matched. This is due to the fact that more degrees of freedom would be required for the discontinuous Galerkin scheme, which exactly cancels the factor of advantage the corresponding implementation provides.

5 Two-stream instability

In this section we consider the two-stream instability in four dimensions. That is, we impose the initial value

(7)

with

where , , and . The computational domain is . Strang splitting with step size is used for the time integration.

This is an interesting problem as the electric energy is small (even decreases by three orders of magnitude compared to the initial value) during a quiescent phase (from to approximately ). However, eventually a physical instability develops that results in an exponential increase of the electric energy (approximately to ). This is followed by nonlinear saturation. In this nonlinear phase the electric energy oscillates around a constant value with an amplitude that decreases in time.

It turns out that to obtain agreement between a numerical and the exact solution for the quiescent phase is challenging. In fact, simulations conducted for a similar two-dimensional problem indicate that for spline interpolation this is only achieved for grid points per direction (which is extremely expensive). Nevertheless, simulations with fewer degrees of freedom are still able to give pertinent information on the plasma instability. In particular, at which time the instability occurs, the growth rate of the instability, and the behavior of the electric energy in the nonlinear regime (i.e. after saturation).

Figure 5 shows a comparison of the fourth order discontinuous Galerkin method and spline interpolation for , , and degrees of freedom (per direction). For the dG4 method already degrees of freedom are sufficient in order to accurately predict the time and growth of the instability. On the other hand, this is obviously not true for spline interpolation (which shows an instability that is too late and does not reach the peak for the electric energy). While increasing the number of degrees of freedom for spline interpolation results in significant improvements with respect to these metrics, the discontinuous Galerkin approach always retains an advantage in the nonlinear phase. More specifically, spline interpolation exhibits a decay of the electric energy that is not present in the exact solution.

Figure 5: The time evolution of the electric energy for the two-stream instability given by equation (7) is shown for (top left), (top right), and (center left) degrees of freedom. In addition, the energy error (center right), the norm (bottom left), and the norm (bottom right) are shown as a function of time. The reference solution is computed using spline interpolation and grid points (per direction).

This behavior can be explained by looking at the time evolution of the norm (see Figure 5). We observe that the norm for dG4 is significantly better conserved compared to spline interpolation once we enter the nonlinear phase. Thus, more diffusion is introduced by spline interpolation which results in the corresponding (unphysical) decrease in electric energy observed (and the absence of the same artefact for the discontinuous Galerkin approach).

For the present example, the difference between performance in the asymptotic regime (valid only for small times) and performance for time scales of physical interest, can be highlighted very clearly. In fact, if we look at the plots in Figure 5, we see that in the quiescent phase, the numerical solution obtained with spline interpolation follows the exact solution somewhat longer. However, even for degrees of freedom agreement is only achieved up to approximately (for spline interpolation). Nevertheless, the dG4 method is clearly superior in terms of both predicting the onset of the instability and in providing qualitatively correct predictions for the nonlinear phase (i.e. in obtaining the physics of interest).

Let us further remark that the norm is significantly better conserved by the discontinuous Galerkin approach compared to spline interpolation. Both methods conserve energy quite well (at least up to ). Although for the present simulation there is some advantage for spline interpolation.

We conclude that for the two-stream instability considered here, the discontinuous Galerkin approach is clearly the superior choice compare to spline interpolation.

References

  • [1] SeLaLib. http://selalib.gforge.inria.fr/.
  • [2] Y. Barsamian, J. Bernier, S. Hirstoaga, and M. Mehrenberger. Verification of 2Dx2D and two-species Vlasov-Poisson solvers. preprint, hal-01668744, 2017.
  • [3] M. Brunetti, F. Califano, and F. Pegoraro. Asymptotic evolution of nonlinear Landau damping. Phys. Rev. E, 62(3):4109, 2000.
  • [4] F. Califano, L. Galeotti, and A. Mangeney. The Vlasov-Poisson model and the validity of a numerical approach. Phys. Plasmas, 13(8):082102, 2006.
  • [5] C. Cheng and G. Knorr. The integration of the Vlasov equation in configuration space. J. Comput. Phys., 22(3):330–351, 1976.
  • [6] Y. Cheng, I.M. Gamba, F. Li, and P.J. Morrison. Discontinuous Galerkin Methods for the Vlasov–Maxwell Equations. SIAM J. Numer. Anal., 52(2):1017–1049, 2014.
  • [7] N. Crouseilles, L. Einkemmer, and E. Faou. Hamiltonian splitting for the Vlasov–Maxwell equations. J. Comput. Phys., 283:224–240, 2015.
  • [8] N. Crouseilles, E. Faou, and M. Mehrenberger. High order Runge-Kutta-Nyström splitting methods for the Vlasov-Poisson equation. preprint, inria-00633934, 2011.
  • [9] N. Crouseilles, P. Glanc, S.A. Hirstoaga, E. Madaule, M. Mehrenberger, and J. Pétri. A new fully two-dimensional conservative semi-Lagrangian method: applications on polar grids, from diocotron instability to ITG turbulence. Eur. Phys. J. D, 68(9):252, 2014.
  • [10] N. Crouseilles, G. Latu, and E. Sonnendrücker. A parallel Vlasov solver based on local cubic spline interpolation on patches. J. Comput. Phys., 228(5):1429–1446, 2009.
  • [11] N. Crouseilles, M. Mehrenberger, and E. Sonnendrücker. Conservative semi-Lagrangian schemes for Vlasov equations. J. Comput. Phys., 229(6):1927–1953, 2010.
  • [12] N. Crouseilles, M. Mehrenberger, and F. Vecil. Discontinuous Galerkin semi-Lagrangian method for Vlasov-Poisson. In ESAIM: Proceedings, volume 32, pages 211–230. EDP Sciences, 2011.
  • [13] E. Sonnendrücker. Numerical methods for the Vlasov equation. http://icerm.brown.edu/html/programs/sp_f11/schedules/slides_workshop_1_tutorial/LectureNumericalVlasov.pdf, 2011.
  • [14] L. Einkemmer. A mixed precision semi-Lagrangian algorithm and its performance on accelerators. In High Performance Computing and Simulation (HPCS), International Conference on, 2016.
  • [15] L. Einkemmer. High performance computing aspects of a dimension independent semi-Lagrangian discontinuous Galerkin code. Comput. Phys. Commun., 202:326–336, 2016.
  • [16] L. Einkemmer. A study on conserving invariants of the Vlasov equation in semi-Lagrangian computer simulations. J. Plasma Phys., 83(2), 2017.
  • [17] L. Einkemmer and A. Ostermann. A strategy to suppress recurrence in grid-based Vlasov solvers. Eur. Phys. J. D, 68(7):197, 2014.
  • [18] L. Einkemmer and A. Ostermann. Convergence analysis of a discontinuous Galerkin/Strang splitting approximation for the Vlasov–Poisson equations. SIAM J. Numer. Anal., 52(2):757–778, 2014.
  • [19] L. Einkemmer and A. Ostermann. Convergence analysis of Strang splitting for Vlasov-type equations. SIAM J. Numer. Anal., 52(1):140–155, 2014.
  • [20] L. Einkemmer and A. Ostermann. On the error propagation of semi-Lagrange and Fourier methods for advection problems. Comput. Math. Appl., 69(3):170–179, 2015.
  • [21] L. Einkemmer and A. Ostermann. A semi-Lagrangian discontinuous Galerkin scheme for Burgers’ equation with application to the Kadomtsev–Petviashvili equation. preprint, arXiv:1701.05602, 2017.
  • [22] E. Fijalkow. Numerical solution to the Vlasov equation: The 1D code. Comput. Phys. Commun., 116(2):329 – 335, 1999.
  • [23] F. Filbet and E. Sonnendrücker. Comparison of Eulerian Vlasov solvers. Comput. Phys. Commun., 150(3):247–266, 2003.
  • [24] L. Galeotti and F. Califano. Asymptotic Evolution of Weakly Collisional Vlasov-Poisson Plasmas. Phys. Rev. Lett., 95:015002, 2005.
  • [25] S. Guisset, M. Gutnic, P. Helluy, M. Massaro, L. Navoret, N. Pham, and M. Roberts. Lagrangian/Eulerian solvers and simulations for Vlasov-Poisson. ESAIM: proceedings and surveys, 53:120–132, 2016.
  • [26] E. Hairer, C. Lubich, and G. Wanner. Geometric numerical integration: structure-preserving algorithms for ordinary differential equations, volume 2nd edition. Springer, 2006.
  • [27] C. Lehrenfeld. High order exactly divergence-free hybrid discontinuous Galerkin methods for unsteady incompressible flows. Comput. Methods Appl. Mech. Eng., 307:339–361, 2016.
  • [28] G. Manfredi. Long-time behavior of nonlinear Landau damping. Phys. Rev. Lett., 79(15):2815, 1997.
  • [29] A. Mangeney, F. Califano, C. Cavazzoni, and P. Travnicek. A numerical scheme for the integration of the Vlasov–Maxwell system of equations. J. Comput. Phys., 179(2):495–538, 2002.
  • [30] J.M. Qiu and C.W. Shu. Positivity preserving semi-Lagrangian discontinuous Galerkin formulation: theoretical analysis and application to the Vlasov–Poisson system. J. Comput. Phys., 230(23):8386–8409, 2011.
  • [31] J.A. Rossmanith and D.C. Seal. A positivity-preserving high-order semi-Lagrangian discontinuous Galerkin scheme for the Vlasov–Poisson equations. J. Comput. Phys., 230(16):6203–6232, 2011.
  • [32] N.J. Sircombe and T.D. Arber. VALIS: A split-conservative scheme for the relativistic 2D Vlasov–Maxwell system. J. Comput. Phys, 228:4773–4788, 2009.
  • [33] E. Sonnendrücker, J. Roche, P. Bertrand, and A. Ghizzo. The semi-Lagrangian method for the numerical resolution of the Vlasov equation. J. Comput. Phys., 149(2):201–220, 1999.
  • [34] C. Steiner, M. Mehrenberger, and D. Bouche. A semi-Lagrangian discontinuous Galerkin convergence. preprint, hal-00852411, 2013.
  • [35] M. Zerroukat, N. Wood, and A. Staniforth. A monotonic and positive-definite filter for a semi-Lagrangian inherently conserving and efficient (SLICE) scheme. Q.J.R. Meteorol. Soc., 131(611):2923–2936, 2005.
  • [36] M. Zerroukat, N. Wood, and A. Staniforth. The parabolic spline method (PSM) for conservative transport problems. Int. J. Numer. Meth. Fluids, 51(11):1297–1318, 2006.