# Parallel Algorithms for Successive Convolution

In this work, we consider alternative discretizations for PDEs which use expansions involving integral operators to approximate spatial derivatives. These constructions use explicit information within the integral terms, but treat boundary data implicitly, which contributes to the overall speed of the method. This approach is provably unconditionally stable for linear problems and stability has been demonstrated experimentally for nonlinear problems. Additionally, it is matrix-free in the sense that it is not necessary to invert linear systems and iteration is not required for nonlinear terms. Moreover, the scheme employs a fast summation algorithm that yields a method with a computational complexity of 𝒪(N), where N is the number of mesh points along a direction. While much work has been done to explore the theory behind these methods, their practicality in large scale computing environments is a largely unexplored topic. In this work, we explore the performance of these methods by developing a domain decomposition algorithm suitable for distributed memory systems along with shared memory algorithms. As a first pass, we derive an artificial CFL condition that enforces a nearest-neighbor communication pattern and briefly discuss possible generalizations. We also analyze several approaches for implementing the parallel algorithms by optimizing predominant loop structures and maximizing data reuse. Using a hybrid design that employs MPI and Kokkos for the distributed and shared memory components of the algorithms, respectively, we show that our methods are efficient and can sustain an update rate > 1×10^8 DOF/node/s. We provide results that demonstrate the scalability and versatility of our algorithms using several different PDE test problems, including a nonlinear example, which employs an adaptive time-stepping rule.

## Authors

• 5 publications
• 2 publications
• 1 publication
• 1 publication
04/01/2022

### Computational stability analysis of PDEs with integral terms using the PIE framework

The Partial Integral Equation (PIE) framework was developed to computati...
09/09/2020

### KNN-DBSCAN: a DBSCAN in high dimensions

Clustering is a fundamental task in machine learning. One of the most su...
03/02/2020

### High Performance Parallel Sort for Shared and Distributed Memory MIMD

We present four high performance hybrid sorting methods developed for va...
09/12/2019

### The Fast and Free Memory Method for the efficient computation of convolution kernels

We introduce the Fast Free Memory method (FFM), a new fast method for th...
11/28/2018

### An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation

We propose a novel application of coded computing to the problem of the ...
02/10/2015

### Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

In this era of large-scale data, distributed systems built on top of clu...
03/02/2018

### Specialized Interior Point Algorithm for Stable Nonlinear System Identification

Estimation of nonlinear dynamic models from data poses many challenges, ...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

In this work, we develop parallel algorithms using novel approaches to represent derivative operators for linear and nonlinear time-dependent partial differential equations (PDEs). We chose to investigate algorithms for these representations due to the stability properties observed for a wide range of linear and nonlinear PDEs. The approach considered here uses expansions involving integral operators to approximate spatial derivatives. Here, we shall refer to this approach as the Method of Lines Transpose (MOL) though this can be more broadly categorized within a larger class of successive convolution methods. The name arises because the terms in the operator expansions, which we describe later, involve convolution integrals whose operand is recursively or successively defined. Despite the use of explicit data in these integral terms, the boundary data remains implicit, which contributes to both the speed and stability of the representations. The inclusion of more terms in these operator expansions, when combined with a high-order quadrature method, allow one to obtain a high-order discretization in both space and time. Another benefit of this approach is that extensions to multiple spatial dimensions are straightforward as operators can be treated in a line-by-line fashion. Moreover, the integral equations are amenable to fast-summation techniques, which reduce the overall computational complexity, along a given dimension, from to , where is the number of discrete grid points along a dimension.

High-order successive convolution algorithms have been developed to solve a range of time-dependent PDEs, including the wave equation [causley2014method], heat equation (e.g., Allen-Cahn [causley2016method] and Cahn-Hilliard equations [causley2017method]), Maxwell’s equations [cheng2017asymptotic], Vlasov equation [christlieb2016weno], degenerate advection-diffusion (A-D) equation [christlieb2020_NDAD], and the Hamilton-Jacobi (H-J) equation [christlieb2019kernel, christlieb2020nonuniformHJE]. In contrast to these papers, this work focuses on the performance of the method in parallel computing environments, which is a largely unexplored area of research. Specifically, our work focuses on developing effective domain decomposition strategies for distributed memory systems and building thread-scalable algorithms using the low-order schemes as a baseline. By leveraging the decay properties of the integral representation, we restrict the calculations to localized non-overlapping subsets of the spatial domain. The algorithms presented in this work consider dependencies between nearest-neighbors (N-N), but, as we will see, this restriction can be generalized to include additional information, at the cost of additional communication. Using a hybrid design that employs MPI and Kokkos [kokkos2014] for the distributed and shared memory components of the algorithms, respectively, we show that our methods are efficient and can sustain an update rate DOF/node/s. While experimentation on graphics processing units (GPUs) shall be left to future work, we believe choosing Kokkos will provide a path for a more performant and seamless integration of our algorithms with new computing hardware.

Recent developments in successive convolution methods have focused on extensions to solve more general nonlinear PDEs, for which an integral solution is generally not applicable. This work considers discretizations developed for degenerate advection-diffusion (A-D) equations [christlieb2020_NDAD], as well as the Hamilton-Jacobi (H-J) equations [christlieb2019kernel, christlieb2020nonuniformHJE]. The key idea of these papers exploited the linearity of a given differential operator rather than the underlying equations, allowing derivatives in nonlinear problems to be expressed using the same representations developed for linear problems. For linear problems, it was demonstrated that one could couple these representations for the derivative operators with an explicit time-stepping method, such as the strong-stability-preserving Runge-Kutta (SSP-RK) methods, [gottlieb2001strong] and still obtain schemes which maintain unconditional stability [christlieb2019kernel, christlieb2020_NDAD]

. To address shock-capturing and control non-physical oscillations, the latter two papers introduced quadratures that use WENO reconstructions, along with a nonlinear filter to further control oscillations. In

[christlieb2020nonuniformHJE], the schemes for the H-J equations were extended to enable calculations on mapped grids. This paper also proposed a new WENO quadrature method that uses a basis that consists of exponential polynomials that improves the shock capturing capabilities.

Our choice in discretizing time, first, before treating the spatial quantities, is not a new idea. A well-known approach is Rothe’s method [salazar2000theoretical, schemann1998adaptive] in which a finite difference approximation is used for time derivatives and an integral equation solver is developed for the resulting sequence of elliptic PDEs (see e.g., [biros_navier_stokes, biros_stokes_2004, chiu_moore_quaife_transport_2020, quaife_krop_Rothe, quaife_moore_viscous_erosion, huang_stokes_2007, biros_BIE_elliptic_3d]). The earlier developments for successive convolution methods, such as [causley2014method], are quite similar to Rothe’s method in the treatment of the time derivatives. However, successive convolution methods differ from Rothe’s method considerably in the treatment of spatial derivatives for nonlinear problems, such as those considered in more recent work on successive convolution (see e.g., [christlieb2019kernel, christlieb2020_NDAD, christlieb2020nonuniformHJE]), as Newton iteration can be avoided on nonlinear terms. Additionally, these methods do not require solutions to linear systems. In contrast, Nyström methods, which are used to discretize the integral equations in Rothe’s method, result in dense linear systems, which are typically solved using an iterative method such as GMRES [saad_GMRES_1986]. Despite the fact that the linear systems are well-conditioned, the various collective operations that occur in distributed GMRES solves can become quite expensive on large computing platforms.

Similarly, in [bruno_lyon_pt_1, bruno_lyon_pt_2], Bruno and Lyon introduced a spectral method, based on the FFT, for computing spatial derivatives of general, possibly non-periodic, functions known as Fourier-Continuation (FC). They combined this representation with the well-known Alternating-Direction (AD) methods, e.g., [douglasADIheat, douglasADI3space, Peaceman_Rachford], dubbed FC-AD, to develop implicit solvers suitable for linear equations. This resulted in a method capable of computing spatial derivatives in a rapidly convergent and dispersionless fashion. A domain decomposition technique for the FC method is described in [FC-CNS_pt_1] and weak scaling was demonstrated to 256 processors, using 4 processors per node, but larger runs are not considered. Another related transform approach was developed to solve the linear wave equation [anderson_hybrid_2020]

. This work introduced a windowed Fourier methodology and combined this with a frequency-domain boundary integral equation solver to simulate long-time, high-frequency wave propagation. While this particular work does not focus on parallel implementations, they suggest several generic strategies, including a trivially parallelizable approach that solves a collection of frequency-domain integral equations in parallel. However, a purely parallel-in-time approach may not be appropriate for massively parallel systems, especially if few frequencies are required across time. This issue may be further complicated by the parallel implementation of the frequency-domain integral equation solvers, which, as previously mentioned, require the solution of dense linear systems. Therefore, it may be a rather difficult task to develop robust parallel algorithms, which are capable of achieving their peak performance.

This paper is organized as follows: In section 2, we provide an overview of the numerical scheme used to formulate our algorithms. To this end, we first illustrate the connections among several characteristically different PDEs through the appearance of common operators in section 2.1. Using these fundamental operators, we define the integral representation in section 2.2, which is used to approximate spatial derivatives. Once the representations have been introduced, we briefly discuss the complications associated with boundary conditions and provide the relevant information, in section 2.3, for implementing boundary conditions used in our numerical tests. Sections 2.5 and 2.4 briefly review the spatial discretization process (including the fast-summation method and the quadrature) and the coupling with time integration methods, respectively. Section 3 provides the details of our new domain decomposition algorithm, beginning with the derivation of the so-called N-N conditions in section 3.1. Using these conditions, we show how this can be used to enforce boundary conditions, locally, for first and second derivative operators (section 3.2 and section 3.3, respectively). Details concerning the implementation of the parallel algorithms are contained entirely in section 4. This includes the introduction of the shared memory programming model (section 4.1), the definition of a certain performance metric (section 4.2) used in both loop optimization experiments and scaling studies (section 4.3), the presentation of shared memory algorithms (section 4.4), and, lastly, implementation details concerning the distributed memory algorithms (section 4.5). Section 5 contains the core numerical results which confirm the convergence (section 5.1), as well as, the weak and strong scalability (section 5.2 and section 5.3, respectively) of the proposed algorithms. In section 5.4, we examine the impact of the restriction posed by the N-N conditions. Finally, we summarize our findings with a brief conclusion in section 6.

## 2 Description of Numerical Methods

In this section, we outline the approach used to develop unconditionally stable solvers making use of knowledge for linear operators. We will start by demonstrating the connections between several different PDEs using operator notation, which will allow us to reuse, or combine, approximations in several different ways. Once we have established these connections, we define an appropriate “inverse” operator and use this to develop the expansions used to represent derivatives. The representations we develop for derivative operators are motivated by the solution of simple 1-D problems. However, in multi-dimensional problems, these expressions are still valid approximations, in a certain sense, even though the kernels in the integral representation may not be solutions to the PDE in question. While these approximations can be made high-order in both time and space, the focus of this work is strictly on the scalability of the method, so we will limit ourselves to formulations which are first-order in time. Note that the approach described in sections 4 and 3, which considers first-order schemes, is quite general and can be easily extended for high-order representations. Once we have discussed our treatment of derivative terms, we describe the fast summation algorithm and quadrature method in section 2.4. Despite the fact that this work only considers smooth test problems, we include the relevant modifications required for non-smooth problems for completeness. In section 2.5, we illustrate how the representation of derivative operators can be used within a time stepping method to solve PDEs.

### 2.1 Connections Among Different PDEs

Before introducing the operators relevant to successive convolution algorithms, we establish the operator connections appearing in several linear PDE examples. This process helps identify key operators that can be represented with successive convolution. Specifically, we shall consider the following three prototypical linear PDEs:

• [topsep=1em,itemsep=1em]

• Diffusion equation: ,

• Wave equation: .

Next, we apply an implicit time discretization to each of these problems. For discussion purposes, we shall consider lower-order time discretizations, i.e., backward-Euler for the and a second-order central difference for . If we identify the current time as , the new time level as , and , then we obtain the corresponding set of semi-discrete equations:

• [topsep=1em,itemsep=1em]

• Diffusion equation: ,

• Wave equation: .

Here, we use to denote the identity operator, and, in all cases, each of the spatial derivatives are taken at time level to keep the schemes implicit. The key observation is that the operator arises in each of these examples. Notice that

 (I−1α2∂xx)=(I−1α∂x)(I+1α∂x),

where is a parameter that is selected according to the equation one wishes to solve. For example, in the case of diffusion, one selects

 α=β√νΔt,

while for the linear advection and wave equations, one selects

 α=βcΔt.

The parameter , which does not depend on , is then used to tune the stability of the approximations. For test problems appearing in this paper, we always use . In section 2.2, we demonstrate how the operator can be used to approximate spatial derivatives. We remark that for second derivatives, one can also use to obtain a representation for second order spatial derivatives, instead of factoring into “left” and “right” characteristics.

Next, we introduce the following definitions to simplify the notation:

 LL≡I−1α∂x,LR≡I+1α∂x. (2.1)

Written in this manner, these definitions indicate the left and right-moving components of the characteristics, respectively, as the subscripts are associated with the direction of propagation. For second derivative operators, which are not factored into first derivatives, we shall use

 L0≡I−1α2∂xx. (2.2)

In order to connect these operators with suitable expressions for spatial derivatives, we need to define the corresponding “inverse” for each of these linear operators on a 1-D interval . These definitions are given as

 L−1L[ ⋅ ;α](x) ≡α∫bxe−α(s−x)(⋅)ds+Be−α(b−x), (2.3) ≡IL[ ⋅ ;α](x)+Be−α(b−x), L−1R[ ⋅ ;α](x) ≡α∫xae−α(x−s)(⋅)ds+Ae−α(x−a), (2.4) ≡IR[ ⋅ ;α](x)+Ae−α(x−a).

These definitions can be derived in a number of ways. In appendix A, we demonstrate how these definitions can be derived for the linear advection equation using the integrating factor method. In these definitions, and are constants associated with the “homogeneous solution” of a corresponding semi-discrete problem and are used to satisfy the boundary conditions. In a similar way, one can compute the inverse operator for definition (2.2), which yields

 L−10[ ⋅ ;α](x) ≡α2∫bae−α|x−s|(⋅)ds+Ae−α(x−a)+Be−α(b−x), (2.5) ≡I0[ ⋅ ;α](x)+Ae−α(x−a)+Be−α(b−x).

In these definitions, we refer to “” as the operand and, again, is a parameter selected according to the problem being solved. Although it is a slight abuse of notation, when it is not necessary to explicitly indicate the parameter or the point of evaluation, we shall place the operand inside a pair of parenthesis.

If we connect these definitions to each of the linear semi-discrete equations mentioned earlier, we can determine the update equation through an analytic inversion of the corresponding linear operator(s):

• [topsep=1em,itemsep=1em]

• Linear advection equation: , or ,

• Diffusion equation: , or ,

• Wave equation: , or ,

with the appropriate choice of for the problem being considered. We note that each of these methods can be made high-order following the work in [christlieb2019kernel, christlieb2020_NDAD, christlieb2020nonuniformHJE, causley2016method, causley2014higher], where it was demonstrated that these approaches lead to methods that are unconditionally stable to all orders of accuracy for these linear PDEs, even with variable wave speeds or diffusion coefficients. Since the process of analytic inversion yields an integral term, a fast-summation technique should be used to reduce the computational complexity of a naive implementation, which would otherwise scale as . Some details concerning the spatial discretization and the fast-summation method are briefly summarized in section 2.4 (for full details, please see [causley2016method]). Next we demonstrate how the operator can be used to approximate spatial derivatives.

### 2.2 Representation of Derivatives

In the previous section, we observed that characteristically different PDEs can be described in-terms of a common set of operators. The focus of this section shall be on manipulating these approximations to obtain a high-order discretization in time through certain operator expansions. The process begins by introducing an operator related to , namely,

 D∗≡I−L−1∗, (2.6)

where can be , , or . The motivation for these definitions will become clear soon. Additionally, we can derive an identity from the definitions (2.6). By manipulating the terms we quickly find that

 L∗≡(I−D∗)−1, (2.7)

again, where can be , , or . The purpose of the identity (2.7) is that it connects the spatial derivative to an expression involving integrals of the solution rather than derivatives. In other words, it allows us to avoid having to use a stencil operation for derivatives.

To obtain an approximation for the first derivative in space, we can use , , or both of them, which may occur as part of a monotone splitting. If we combine the definition of the left propagating first derivative operator in equation (2.1) with the definition (2.6) and identity (2.7), we can define the first derivative in terms of the operator. Observe that

 ∂+x =α(I−LL), =α(LLL−1L−LL), =αLL(L−1L−I), =−αLL(I−L−1L), =−α(I−DL)−1DL, =−α∞∑p=1DpL, (2.8)

where, in the last step, we used the fact that the operator is bounded by unity in an operator norm. We use the convention to indicate that this is a right-sided derivative. Likewise, for the right propagating first derivative operator, we find that the complementary left-biased derivative is given by

 ∂−x=α∞∑p=1DpR. (2.9)

Additionally, the second derivative can be expressed as

 ∂xx=−α2∞∑p=1Dp0. (2.10)

As the name implies, each power of is successively defined according to

 Dk∗≡D∗(Dk−1∗). (2.11)

In previous work, [christlieb2020_NDAD], for periodic boundary conditions, it was established that the partial sums for the left and right-biased approximations to satisfy

 ∂+x=−α(n∑p=1DpL+O(1αn+1)),∂−x=α(n∑p=1DpR+O(1αn+1)). (2.12)

Similarly for second derivatives, with periodic boundaries, retaining terms leads to a truncation error with the form

 ∂xx=−α2(n∑p=1Dp0+O(1α2n+2)). (2.13)

In both cases, the relations can be obtained through a repeated application of integration by parts with induction. These approximations are still exact, in space, but the integral operators nested in will eventually be approximated with quadrature. From these relations, we can also observe the impact of on the size of the error in time. In particular, if we select in (2.12) and in (2.13), each of the approximations should have an error of the form . The results concerning the consistency and stability of these higher order approximations were established in [christlieb2019kernel, christlieb2020_NDAD, causley2014higher].

As mentioned earlier, this work only considers approximations which are first-order with respect to time. Therefore, we shall restrict ourselves to the following operator representations:

 ∂+x≈−αDL,∂−x≈αDR,∂xx≈−α2D0. (2.14)

This is a consequence of retaining a single term from each of the partial sums in equations (2.8), (2.9), and (2.10). Consequently, computing higher powers of is unnecessary, so the successive property (2.11

) is not needed. However, it indicates, clearly, a possible path for higher-order extensions of the ideas which will be presented here. For the moment, we shall delay prescribing the choice of

used in the representations (2.14), in order to avoid a problem-dependent selection. As alluded to at the beginning of this section, an identical representation is used for multi-dimensional problems, where the operators are now associated with a particular dimension of the problem. The operators along a particular dimension are constructed using data along that dimension of the domain, so that each of the directions remains uncoupled. This completes the discussion on the generic form of the representations used for spatial derivatives. Next, we provide some information regarding the treatment of boundary conditions, which determine the constants and appearing in the operators.

### 2.3 Comment on Boundary Conditions

The process of prescribing values of and , inside equations eqs. 2.5, 2.4 and 2.3, which are required to construct , is highly dependent on the structure of the problem being solved. Previous work has shown how to prescribe a variety of boundary conditions for linear PDEs (see e.g., [causley2014method, causley2016method, causley2017method, causley2014higher]). For example, in linear problems, such as the wave equation, with either periodic or non-periodic boundary conditions, one can directly enforce the boundary conditions to determine the constants and . The situation can become much more complicated for problems which are both nonlinear and non-periodic. For approximations of at most third-order accuracy, with nonlinear PDEs, one can use the techniques from [christlieb2019kernel] for non-periodic problems. To achieve high-order time accuracy, the partial sums for this case were modified to eliminate certain low-order terms along the boundaries. We note that the development of high-order time discretizations, subject to non-trivial boundary conditions, for nonlinear operators, is still an open area of research for successive convolution methods.

As this paper concerns the scalability of the method, we shall consider test problems that involve periodic boundary conditions. For periodic problems defined on the line interval , the constants associated with the boundary conditions for first derivatives are given by

 A=IR[v;α](b)1−μ,B=IL[v;α](a)1−μ, (2.15)

where and were defined in equations (2.3) and (2.4). Similarly, for second derivatives, the constants can be determined to be

 A=I0[v;α](b)1−μ,B=I0[v;α](a)1−μ, (2.16)

with the definition of coming from (2.5). In the expressions (2.15) and (2.16) provided above, we use

 μ≡e−α(b−a),

where is the appropriately chosen parameter. Note that the function denotes the generic operand of the operator . This helps reduce the complexity of the notation when several applications of are required, since they are recursively defined. As an example, suppose we wish to compute , where is some known function. For this, we can use the first-order scheme for second derivatives (see (2.14)) and take in the expressions (2.16) for the boundary terms.

### 2.4 Fast Convolution Algorithm and Spatial Discretization

To perform a spatial discretization over , we first create a grid of points:

 xi=a+iΔxi,i=0,⋯,N,

where

 Δxi=xi+1−xi.

A naive approach to computing the convolution integral would lead to method of complexity , where is the number of grid points. However, using some algebra, we can write recurrence relations for the integral terms which comprise :

 IR[v;α](xi)=e−αΔxi−1IR[v;α](xi−1)+JR[v;α](xi),IR[v;α](x0)=0, IL[v;α](xi)=e−αΔxiIL[v;α](xi+1)+JL[v;α](xi),IL[v;α](xN)=0.

Here, we have defined the local integrals

 JR[v;α](xi) =α∫xixi−1e−α(xi−s)v(s)ds, (2.17) JL[v;α](xi) =α∫xi+1xie−α(s−xi)v(s)ds. (2.18)

By writing the convolution integrals this way, we obtain a summation method which has a complexity of . Note that the same algorithm can be applied to compute the convolution integral for the second derivative operator by splitting the integral at a point . After applying the above algorithm to the left and right contributions, we can recombine them through “averaging” to recover the original integral. While a variety of quadrature methods have been proposed to compute the local integrals (2.18) and (2.17) (see e.g., [christlieb2020nonuniformHJE, causley2014method, causley2016method, causley2017method, causley2014higher, causley2013method]), we shall consider sixth-order quadrature methods introduced in [christlieb2020_NDAD]

, which use WENO interpolation to address both smooth and non-smooth problems. In what follows, we describe the procedure for

, since the reconstruction for is similar. This approximation uses a six-point stencil given by

 S(i)={xi−3,⋯,xi+2},

which is then divided into three smaller stencils, each of which contains four points, defined by for . We associate with the shift in the stencil. A graphical depiction of this stencil is provided in fig. 1. The quadrature method is developed as follows:

1. [topsep=1em,itemsep=1em]

2. On each of the small stencils ., we use the approximation

 J(r)R[v;α](xi)≈α∫xixi−1e−α(xi−s)pr(s)ds=3∑j=0c(r)−3+r+jv−3+r+j, (2.19)

where is the Lagrange interpolating polynomial formed from points in and are the interpolation coefficients, which depend on the parameter and the grid spacing, but not .

3. In a similar way, on the large stencil we obtain the approximation

 JR[v;α](xi)≈α∫xixi−1e−α(xi−s)p(s)ds. (2.20)
4. When function is smooth, we can combine the interpolants on the smaller stencils, so they are consistent with the high-order approximation obtained on the larger stencil, i.e.,

 JR[v;α](xi)=2∑r=0drJ(r)R[v;α](xi), (2.21)

where are called the linear weights, which form a partition of unity. The problems we consider in this work involve smooth functions, so this is sufficient for the final approximation. For instances in which the solution is not smooth, the linear weights can be mapped to nonlinear weights using the notion of smoothness. We refer the interested reader to our previous work [christlieb2019kernel, christlieb2020_NDAD, christlieb2020nonuniformHJE] for details concerning non-smooth data sets.

In appendix C, we provide the expressions used to compute coefficients for a uniform grid, although the non-uniform grid case can be done as well [shu2009high]. In the case of a non-uniform mesh, the linear weights would become locally defined in the neighborhood of a given point and would need to be computed on-the-fly. Uniform grids eliminate this requirement as the linear weights, for a given direction, can be computed once per time step and reused in each of the lines pointing along that direction.

### 2.5 Coupling Approximations with Time Integration Methods

Here, we demonstrate how one can use this approach to solve a large class of PDEs by coupling the spatial discretizations in section 2.4 with explicit time stepping methods. In what follows, we shall consider general PDEs of the form

 ∂tU=F(t,U),

where is a collective term for spatial derivatives involving the solution variable . Possible choices for might include generic nonlinear advection and diffusion terms

 F(t,U)=∂xg1(U)+∂xxg2(U),

or even components of the HJ equations

 F(t,U)=H(U,∂xU).

To demonstrate how one can couple these approaches, we start by discretizing a PDE in time, but, rather than use backwards Euler, we use an -stage explicit Runge-Kutta (RK) method, i.e.,

 un+1=un+s∑i=1biki,

where the various stages are given by

 k1 =F(tn,un), k2 =F(tn+c2Δt,un+Δta21k1), ⋮ ks =F(tn+csΔt,un+Δts∑j=1asjkj).

As with a standard Method-of-Lines (MOL) discretization, we would need to reconstruct derivatives within each RK-stage. To illustrate, consider the nonlinear A-D equation,

 F(t,u)=∂xg1(u)+∂xxg2(u).

For a term such as , we would use a monotone Lax-Friedrichs flux splitting, i.e., , where with . Hence, a particular RK-stage can be approximated using

 F(t,u)≈−12αs∑p=1DpL[g+1(u);α]+12αs∑p=1DpR[g−1(u);α]+α2νs∑p=1Dp0[g2(u);αν].

The resulting approximation to the RK-stage can be shown to be accurate. Another nonlinear PDE of interest to us is the H-J equation

 F(t,u)=H(∂xu).

In a similar way, we would replace the Hamiltonian with a monotone numerical Hamiltonian, such as

 ^H(v−,v+)=H(v−+v+2)+r(v−,v+)v−−v+2,

where . Then, the left and right derivative operators in the numerical Hamiltonian can be replaced with

 ∂−xu=αs∑p=1DpR[u;α],∂+xu=−αs∑p=1DpL[u;α],

which, again, yields an approximation.

In previous work [christlieb2019kernel, christlieb2020_NDAD], for linear forms of , it was shown that the resulting methods are unconditionally stable when coupled to explicit RK methods, up to order 3. Extensions beyond third-order are, indeed, possible, but were not considered. For general, nonlinear problems, we typically couple an -stage RK method to a successive convolution approximation of the same time accuracy, so that the error in the resulting approximation is .

## 3 Nearest-Neighbor Domain Decomposition Algorithm

In this section, we provide the relevant mathematical definitions of our domain decomposition algorithm, which are derived from the key operators used in successive convolution. Our goal is to establish and exploit data locality in the method, so that certain reconstructions, which are nonlocal, can be independently completed on non-overlapping blocks of the domain. This is achieved in part by leveraging certain decay properties of the integral representations. Once we have established some useful definitions, we use them to derive conditions in section 3.1, which restrict the communication pattern to N-Ns. Then in sections 3.3 and 3.2, we illustrate how this condition can be used to enforce boundary conditions, in a consistent manner, for first and second derivative operators, on each of the blocks. We then provide a brief summary of these findings along with additional comments in section 3.4.

Maintaining a localized stencil is often advantageous in parallel computing applications. Code which is based on N-Ns is generally much easier to write and maintain. Additionally, messages used to exchange data owned by other blocks may not have to travel long distances within the network, provided that the blocks are mapped physically close together in hardware. Communication, even on modern computing systems, is far more expensive than computation. Therefore, an initial strategy for domain decomposition is to enforce N-N dependencies between the blocks. In order to decompose a problem into smaller, independent pieces, we separate the global domain into blocks that share borders with their nearest neighbors. For example, in the case of a 1-D problem defined on the interval we can form blocks by writing

 a=c0

with denoting the width of block . Multidimensional problems can be addressed in a similar way by partitioning the domain along multiple directions. Solving a PDE on each of these blocks, independently, requires an understanding of the various data dependencies. First, we address the local integrals . Depending on the quadrature method, the reconstruction algorithm might require data from neighboring blocks. Reconstructions based on previously described WENO-type quadratures require an extension of the grid in order to build the interpolant. This involves a “halo” region (see fig. 2), which is distributed amongst N-N blocks in the decomposition. On the other hand, more compact quadratures, such as Simpson’s method [causley2013method], do not require this data. In this case, the quadrature communication phase can be ignored.

The major task for this work involves efficiently communicating the data necessary to build each of the convolution integrals . Once the local integrals are constructed through quadrature, we sweep across the lines of the domain to build the convolution integrals. It is this operation which couples the integrals across the blocks. To decompose this operation, we first rewrite the integral operators , assuming we are in block , as

 IL[v;α](x) =α∫bxe−α(s−x)v(s)ds, =α∫ci+1xe−α(s−x)v(s)ds+α∫cNci+1e−α(s−x)v(s)ds,

and

 IR[v;α](x) =α∫xae−α(x−s)v(s)ds, =α∫cic0e−α(x−s)v(s)ds+α∫xcie−α(x−s)v(s)ds.

These relations, which assume is within the interval , elucidate the local and non-local contributions to the convolution integrals within block . Using simple algebraic manipulations, we can expand the non-local contributions to find that

and

 ∫cic0e−α(x−s)v(s)ds =i−1∑j=0∫cj+1cje−α(x−s)v(s)ds, =i−1∑j=0e−α(x−cj+1)∫cj+1cje−α(cj+1−s)v(s)ds,

for the right and left-moving data, respectively. With these relations, each of the convolution integrals can be formed according to

 IL[v;α](x)=α∫ci+1xe−α(s−x)v(s)ds+α(N−1∑j=i+1e−α(cj−x)∫cj+1cje−α(s−cj)v(s)ds), (3.1)

and

 IR[v;α](x)=α(i−1∑j=0e−α(x−cj+1)∫cj+1cje−α(cj+1−s)v(s)ds)+α∫xcie−α(x−s)v(s)ds. (3.2)

From equations (3.1) and (3.2), we observe that both of the global convolution integrals can be split into a localized convolution with additional contributions coming from preceding or successive global integrals owned by other blocks in the decomposition. These global integrals contain exponential attenuation factors, the size of which depends on the respective distances between any pair of sub-domains. Next, we use this result to derive the restriction that facilitates N-N dependencies.

### 3.1 Nearest-Neighbor Criterion

Building a consistent block-decomposition for the convolution integral is non-trivial, since this operation globally couples unknowns along a dimension of the grid. Fortunately, the exponential kernel used in these reconstructions is pleasant in the sense that it automatically generates a region of compact support around a given block. Examining the exponential attenuation factors in (3.1) and (3.2), we see that contributions from blocks beyond N-Ns become small provided that (1) the distance between the blocks is large or (2) is taken to be sufficiently large. Since we have less control over the block sizes e.g., , we can enforce the latter criterion. That is, we constrain so that

 e−αLm≤ϵ, (3.3)

where is some prescribed error tolerance, typically taken as , and denotes the length smallest block. Taking logarithms of both sides and rearranging the inequality, we obtain the bound

 −α≤log(ϵ)Lm.

Our next step is to write this in terms of the time step , using the choice of . However, the bound on the time step depends on the choice of . In section 2.1, we presented two definitions for the parameter , namely

 α≡βcmaxΔt, or α≡β√νΔt.

Using these definitions for , we obtain two conditions depending on the choice of . For the linear advection equation and the wave equation, we obtain the condition

 −βcmaxΔt≤log(ϵ)Lm⟹Δt≤−βLmcmaxlog(ϵ). (3.4)

Likewise, for the diffusion equation, the restriction is given by

 −β√νΔt≤log(ϵ)Lm⟹Δt≤1ν(βLmlog(ϵ))2. (3.5)

Depending on the problem, if the condition (3.4) or (3.5) is not satisfied, then we use the maximally allowable time step for a given tolerance , which is given by the equality component of the relevant condition. If several different operators appear in a given problem and are to be approximated with successive convolution, then each operator will be associated with its own . In such a case, we should bound the time step according to the condition that is more restrictive among (3.4) and (3.5), which can be accomplished through the choice

 Δt≤min⎛⎝−βLmcmaxlog(ϵ),1ν(βLmlog(ϵ))2⎞⎠. (3.6)

As before, when the condition is not met, then we use the equality in (3.6).

Restricting according to (3.4), (3.5), or (3.6) ensures that contributions to the right and left-moving convolution integrals, beyond N-Ns, become negligible. This is important because it significantly reduces the amount of communication, at the expense of a potentially restrictive time step. Note that in section 5.4, we analyze the limitations of such restrictions for the linear advection equation. In our future work, we shall consider generalizations of our approach, which do not require (3.4), (3.5), or (3.6). In sections 3.3 and 3.2, we demonstrate how to formulate block-wise definitions of the global operators using the derived conditions (3.4), (3.5), or (3.6).

### 3.2 Enforcing Boundary Conditions for ∂x

In order to enforce the block-wise boundary conditions for the first derivative , we recall our definitions (2.3) and (2.4) for the left and right-moving inverse operators:

 L−1L[v;α](x) =IL[v;α](x)+Be−α(b−x), (3.7) L−1R[v;α](x) =IR[v;α](x)+Ae−α(x−a). (3.8)

We can modify these definitions so that each block contains a pair of inverse operators given by

 L−1L,i[v;α](x) =IL,i[v;α](x)+Bie−α(ci+1−x), (3.9) L−1R,i[v;α](x) =IR,i[v;α](x)+Aie−α(x−ci), (3.10)

where are defined as

 IL,i[v;α](x)=α∫ci+1xe−α(s−x)v(s)ds,IR,i[v;α](x)=α∫xcie−α(x−s)v(s)ds, (3.11)

and the subscript denotes the block in which the operator is defined. As before, this assumes . To address the boundary conditions, we need to determine expressions for the constants and on each of the blocks in the domain. First, substitute the definitions (3.1) and (3.2) into (3.7) and (3.8):

 L−1L[v;α](x) =α∫ci+1xe−α(s−x)v(s)ds (3.12) +α(N−1∑j=i+1e−α(cj−x)∫cj+1cje−α(s−cj)v(s)ds)+Be−α(cN−x), L−1R[v;α](x) =α(i−1∑j=0e−α(x−cj+1)∫cj+1cje−α(cj+1−s)v(s)ds) (3.13) +α∫xcie−α(x−s)v(s)ds+Ae−α(x−c0).

Since we wish to maintain consistency with the true operator being inverted, we require that each of the block-wise operators satisfy

 L−1L[v;α](x)=L−1L,i[v;α](x),L−1R[v;α](x)=L−1R,i[v;α](x),

which can be explicitly written as

 Bie−α(ci+1−x) =(N−1∑j=i+1e−α(cj−x)IL,j[v;α](cj))+Be−α(cN−x), (3.14) Aie−α(x−ci) =(i−1∑j=0e−α(x−cj+1)IR,j[v;α](cj+1))+Ae−α(x−c0). (3.15)

Evaluating (3.14) at and (3.15) at , we obtain

 Bi =(N−1∑j=i+1e−α(cj−ci+1)IL,j[v;α](cj))+Be−α(cN−ci+1), (3.16) Ai =(i−1∑j=0e−α(ci−cj+1)IR,j[v;α](cj+1))+Ae−α(ci−c0). (3.17)

Modifying according to either (3.4) or, if necessary (3.6), results in the communication stencil shown in fig. 3. More specifically, the terms representing the boundary contributions in each of the blocks are given by

 Bi={B,i=N−1,IR,i+1[v;α](ci+1),i

and

 Ai={A,i=0,IR,i−1[v;α](ci),0

These relations generalize the various boundary conditions set by a problem. For example, with periodic problems, we can select

 B=IL,0[v;α](c0),A=IR,N−1[v;α](cN).

This is the relevant strategy employed by domain decomposition algorithms in this work.

### 3.3 Enforcing Boundary Conditions for ∂xx

The enforcement of boundary conditions on blocks of the domain for the second derivative can be accomplished using an identical procedure to the one described in section 3.2. First, we recall the inverse operator associated with a second derivative (2.5):

 L−10[v;α](x)=I0[v;α](x)+Ae−α(x−a)+Be−α(b−x), (3.18)

and define an analogous block-wise definition of (3.18) as

 L−10,i[v;α](x)=I0,i[v;α](x)+Aie−α(x−c0)+Bie−α(cN−x),

with the localized convolution integral

 I0,i[v;α](x)=α2∫ci+1cie−α|x−s|v(s)ds.

Again, the subscript denotes the block in which the operator is defined and we take . For the purposes of the fast summation algorithm, it is convenient to split this integral term into an average of left and right contributions, i.e.,

 L−10,i[v;α](x)=12(IL,i[v;α](x)+IR,i[v;α](x))+Aie−α(x−ci)+Bie−α(ci+1−x), (3.19)

where are the same integral operators shown in equation (3.11) used to build the first derivative. As in the case of the first derivative, a condition connecting the boundary conditions on the blocks to the non-local integrals can be derived, which, if evaluated at the ends of the block, results in the linear system

 Ai+Bie−αΔci =12N−1∑j=i+1e−α(cj−ci)IL,j[v;α](cj) (3.20) +12i−1∑j=0e−α(ci−cj+1)IR,j[v;α](cj+1) +Ae−α(ci−c0)+Be−α(cN−ci), Aie−αΔci+Bi =12N−1∑j=i+1e−α(cj−ci+1)IL,j[v;α](cj) (3.21) +12i−1∑j=0e−α(ci+1−cj+1)IR,j[v;α](cj+1) +Ae−α(ci+1−c0)+Be−α(cN−ci+1).

Equations (3.20) and (3.21) can be solved analytically to find that

 (AiBi)=11−e−2αΔci(1−e−αΔci−e