# Fast algorithm for computing nonlocal operators with finite interaction distance

Developments of nonlocal operators for modeling processes that traditionally have been described by local differential operators have been increasingly active during the last few years. One example is peridynamics for brittle materials and another is nonstandard diffusion including the use of fractional derivatives. A major obstacle for application of these methods is the high computational cost from the numerical implementation of the nonlocal operators. It is natural to consider fast methods of fast multipole or hierarchical matrix type to overcome this challenge. Unfortunately the relevant kernels do not satisfy the standard necessary conditions. In this work a new class of fast algorithms is developed and analyzed, which is some cases reduces the computational complexity of applying nonlocal operators to essentially the same order of magnitude as the complexity of standard local numerical methods.

## Authors

• 10 publications
• 6 publications
10/26/2021

### Fast Solution Methods for Fractional Differential Equations in the Modeling of Viscoelastic Materials

Fractional order models have proven to be a very useful tool for the mod...
07/19/2012

### Exploring the rationality of some syntactic merging operators (extended version)

Most merging operators are defined by semantics methods which have very ...
01/01/2019

### High order numerical schemes for solving fractional powers of elliptic operators

In many recent applications when new materials and technologies are deve...
03/03/2020

01/20/2021

### Exponential-sum-approximation technique for variable-order time-fractional diffusion equations

In this paper, we study the variable-order (VO) time-fractional diffusio...
05/18/2022

### Symbolic-Numeric Factorization of Differential Operators

We present a symbolic-numeric Las Vegas algorithm for factoring Fuchsian...
11/05/2011

### Covariant fractional extension of the modified Laplace-operator used in 3D-shape recovery

Extending the Liouville-Caputo definition of a fractional derivative to ...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Nonlocal operators as alternatives to the local differential operators have been widely applied to the modeling of a number of physical and biological processes over the last few years. One example is given by the peridynamics models for brittle materials [25, 3, 10, 14, 15, 26], and another is nonstandard diffusion including the use of fractional derivatives [1, 4, 6, 9, 21, 20]. Moreover, even in the case of solving differential equations, nonlocal integral relaxations to local differentiations are sometimes introduced as a numerical technique, see [11, 19, 22, 23].

While the nonlocal models can provide more accurate descriptions of a physical system, the nonlocality also increases the computational cost compared to classical local models based on partial differential equations (PDEs). Thus it is imperative to develop fast algorithms for the computation of nonlocal models. In this work, we focus on nonlocal models with finite nonlocal interaction distances (see

[28]). Each point in the domain interacts with points within certain “horizon” of it. That is to say the interaction kernel is compactly supported in certain finite region and may also be singular at zero distance. When the interaction region is spherical, we call the radius of the region the horizon radius or just horizon in short. Such kernels make most fast algorithms that utilizing the low rank property of the far field interactions fail to work, because the kernel is typically not regular around the boundary of its support. In the work of [33, 31], FFT based fast algorithms are developed for solving peridynamic models, in which Toeplitz structures of matrices are greatly exploited. However, such methods fail to work for the more generals cases of inhomogeneuous media and variable horizons resulting in non-translation invariant kernels, which is the concern of this work.

The peridynamics (PD) theory of solid mechanics takes into account of large scale deformations and reformulates the internal force of a deformed body by introducing the nonlocal way of interactions of materials points. Let denote the displacement field, the force balance equation in the bond-based peridynamic theory is given by

 −∫Hxf(u(y)−u(x),y,x)dy=b(x), (1.1)

where the integration is limited to the finite region . denotes the body force and

is the force density function, which contains all the constitutive properties. For a prototype micro-elastic material, the force vector

points to the direction of the deformed bond with its magnitude a function of the bond relative elongation(stretch). In general, depends nonlinearly on the bond relative elongation, which gives the model the flexibility to describe changes of material properties. In this work, we will only consider the linear model obtained from the assumption of small deformation, namely,

 f(u(y)−u(x),y,x)=ω(x,y)[e⋅(u(y)−u(x))]e, (1.2)

where is the unit vector in the bond direction , and is the micromodulus function satisfying

 ∫Hx|y−x|2ω(x,y)dy<∞.

For each , is a function with support on

. In practice, this is achieved by choosing a fractional type kernel multiplied with the characteristic function or the conical function on

(see [14]), where both choices create non-smooth transition to zero at the boundary .

As we can see, the discretization of the nonlocal operator in (1.1) results in a matrix with high density, for which fast algorithms must be considered in order to lower the cost of matrix multiplication and inversion. There are several classical algorithms for the treatment of non-sparse matrices, including the fast multipole methods (FMMs) [12, 13, 32] and the hierarchical matrix techniques [16, 17]. More recent developments can be found in [18]. We will first consider the existing fast solvers to see whether they are effective in the computation of nonlocal models like (1.1). In Section 2 we will show that the performance of the fast solvers depends largely on the regularity of the kernel across the boundary . This sends the clear message of the preference of the far field behavior of the kernel function in terms of computation efficiency. Popular choices of kernels in the peridynamics simulation will cause serious problems in the interest of fast computation. We then propose in Section 3 to split the two types of singularities – one at origin and the other at the boundary – of the kernel function, and treat them separately. Once the kernel is decomposed into two, one that is smooth away from origin and the other that is smooth away from the truncation, we then use different fast algorithms that exploit the different types of smoothness of the two kernels. The classical fast algorithms such as FMMs and hierarchical matrix techniques only resolve the first type of singularity, namely that the singularity set is one point (the origin of the kernel in most cases). Notice that the second type of singularity is on the boundary set , and it is a set of codimension . In 2d the sets of singularity are boundary curves, and in 3d they are boundary surfaces. Our algorithm that deals with second type of singularity is basically a new FMM type method for kernels that exhibit singularities on codimension 1 sets. There is a reduction of computational complexity in all dimension with optimal rate for unknowns in 1d. Other potential applications of this work include computing with the retarded potentials raised in time domain boundary integral equations [24], where the potentials are discontinuous functions defined in space-time.

## 2 The role of far field behaviors of kernels

We will use the nonlocal diffusion operator to illustrate our methodology in this section. The nonlocal diffusion operator is given by

 Lu(x)=∫Hxω(x,y)(u(y)−u(x))dy. (2.3)

See more discussions of details in [6]. Here we assume that the kernel is given by

 ω(x,y)=C(x,y)δ(x,y)d+2γ(|y−x|δ(x,y)), (2.4)

where the kernel is a nonnegative function supported on with . is the diffusion coefficient and is the horizon function that satisfy

 C0≤C(x,y)≤C1, and 0<δ(x,y)≤δ1. (2.5)

Here we keep the dependence of and on , so that it could model the inhomogeneous medium, see related works [27, 30, 7].

The common practice in the peridynamics simulation (see [14]) takes to be

 γ(|s|)=c1|s|χ(|s|<1), or γ(|s|)=c2|s|(1−|s|)χ(|s|<1). (2.6)

We distinguish between the two cases:

1. If and only depends on , namely and , then (2.3) is the nonlocal diffusion operator of non-divergence type. As , we have

 Lu(x)→C(x)Δu(x).
2. If and are symmetric, namely and , then (2.3) is the nonlocal diffusion operator of divergence type. As , we have

 Lu(x)→∇(σ(x)∇u(x)), where σ(x)=C(x,x).

The FFT-based methods mentioned earlier are restrictive in the application to heterogeneous media. Here we seek algorithms that can be applied to the more general cases given by (2.4

). We use the recently proposed hierarchical interpolative factorization (HIF)

[18] method to factorize the dense matrix obtained from discretizing the nonlocal operator . The factorization of the matrix can be used to rapidly apply both and . Therefore it can serve as a direct solver or be stored and reused for iterative methods or time-dependent problems. The HIF takes advantage of the low-rank behavior of the off-diagonal entries of the dense matrix, so the far field behavior of the kernel is the central factor on the performance of the the algorithm. We will illustrate in this section that the performance of the HIF algorithm improves greatly as the smoothness of the kernel away from zero enhances. All computations are performed in MATLAB R2014b on a single core of a 1.1GHz Intel Core M CPU on a 64-bit Mac laptop.

### 2.1 Nonlocal diffusion in homogeneous media

In this section, we solve the the nonlocal problem on the interval with and with being the horizon radius of the homogeneous medium. The matrix is obtained form the discretization of the nonlocal operator , which is is based on the asymptotically compatible schemes in [29], which ensures uniform discretization error independent of the horizon function . In Table 1, the storage and the time of matrix-vector multiplication are listed for the original matrix as well as its compression after using the HIF method. The original matrix is stored using the sparse matrix representation with the MATLAB built-in matrix-vector multiplication applied to the test. The kernel has singularity at and is assumed to be smooth expect at origin and ( has singularity at and ). The number of discretization nodes is fixed to be in Table 1. The relative precision parameter for the HIF method is set to be .

The left column of Table 1 lists the different types of regularity (at ) of the kernel and the different choices of the horizon parameter . The discontinuous kernel (denoted as ) is the one that is most often used in peridynamics simulations given by

 γ(|s|)=1|s|χ(|s|<1).

The more regular () kernels are obtained by subtracting with polynomials () of degree . The details of constructing these polynomials will be given in Section 3. When , the matrix is fully dense and it does not see the tail of the kernel. As a result, the HIF works well for all the types of kernels. When gets smaller, the regularity of the kernel at plays a decisive role on the performance of the HIF. We see that when is discontinuous at , no compression of the matrix is observed using the HIF. The storage taken and computation time decrease with increasing regularity of from discontinuous () all the way to three times continuously differentiable (). In the case of , the critical improvement of the computational efficiency happens between kernel functions and kernel functions as shown in Table 1. We remark that the location of critical regularity changes with the precision parameter . When the HIF algorithm runs with higher precision, the higher critical regularity is observed; and when the HIF algorithm runs with lower precision, the lower critical regularity is observed. Similar patterns are also observed in the experiments that will be discussed in Section 2.2 and 2.3. When keeps decreasing, the effect of different regularity of the kernel will diminish. The extreme case is that equals the mesh size when the matrix is in fact tridiagonal. The case of is listed in Table 1 to compare the overhead of using the HIF algorithm with the sparse matrix. The overhead of using the HIF algorithm in this case is due to the fact that HIF does not build into itself the ability to detect the horizon and ignore the zeros outside. It will be future work to explore this possibility depending on application.

To account for the complexity of the HIF, we provide the computation time for different discretization nodes in Table 2. The horizon parameter in this table is fixed to be . The scaling results for the choice of the kernel is compared with the choice of the kernel. The complexity of the matrix-vector multiplication for the original matrix without compression is approximately in both cases. On the other hand, the HIF code behaves quite differently for the kernel and the kernel. The observed complexity of HIF with the kernel is still , while the complexity of HIF with the kernel is just around .

The simulations show effectiveness of the HIF fast algorithm with the improving far field regularity of nonlocal interaction kernels. Finally, we remark that the cost of constructing the factorization is in general larger than the cost of matrix-vector multiplication. This is acceptable since one only need to construct the factorization once and reuse it for an iterative method or in time-dependent problems.

### 2.2 Nonlocal diffusion in heterogeneous media

We test the effectiveness of the HIF algorithm for nonlocal diffusion in heterogeneous media in this section. The kernel is defined as (2.4), where and . Each point has its own horizon radius . In this case, is a nonlocal diffusion operator of non-divergence type. The matrix obtained form the discretization of the nonlocal operator is now non-symmetric, in which case the FFT-based fast algorithm will not work. In Table 3, the storage and the time of matrix-vector multiplication are listed for the original matrix as well as its compression after using the HIF algorithm.

In the numerical experiments we use for so that is a function between and , as shown in Fig 1.

The kernel has singularity at and is assumed to be smooth expect at origin and . The left column of Table 3 lists the types of regularity (at and ) of the kernel and the horizon parameter . The HIF algorithm for nonlocal diffusion in heterogeneous media works similarly as the homogeneous cases. In particular, we tailored Table 3 to the case where because for the extreme cases where or , the variation of regularity of the kernel does not result in any change in the computational cost, which is exactly the same behavior observed in Table 1.

To account for the complexity, we provide the computation time for different numbers of discretization nodes in Table 4. The horizon parameter is fixed to be . Two types of kernels – the kernel and the kernel – are used in the simulation. The complexity of the matrix-vector multiplication for the original matrix without compression is as expected. Similar to the homogeneous case, the observed complexity of HIF in the heterogeneous case is again for the kernel and for the kernel. The numerical examples in Sections 2.1 and 2.2 indicate that the HIF has the same effectiveness for models in heterogeneous media as it has for models in homogeneous media if the kernel functions are sufficiently smooth away from origin.

### 2.3 Two-dimensional test

In this section, we perform 2d tests on the domain to show the effectiveness of the HIF code. We choose and . The homogeneous medium has the same the horizon radius for every point. The kernel is the same as before and has singularity at and is assumed to be smooth expect at origin and .

In Table 5, the storage and the time of matrix-vector multiplication are listed for the original matrix as well as its compression after using the HIF method. The original matrix is again stored using the sparse matrix representation and the MATLAB built-in matrix-vector multiplication is used to record the computation time in the right column of Table 5. The relative precision is chosen to be in the 2d test. We remark that in the case of , the critical improvement of the computational efficiency happens between discontinuous kernel functions and kernel functions as a result of the choice of the precision parameter . Similarly as in 1d, higher critical regularity is observed if is getting smaller.

Table 6 shows the storage and time for different value of with horizon parameter fixed to be . The discontinuous () kernel and the kernel are used in the simulation. Here we remark that we choose the kernel to be compared with the kernel because is right before the critical improvement of efficiency happens with the increase of regularity as seen in Table 5. This is a result of the particular choice of relative precision . As becomes smaller, higher regularity is needed in order for the critical improvement of efficiency to happen. We observe that the complexity of the matrix-vector multiplication for the original matrix without compression is approximately . HIF also gives computation time for matrix-vector multiplication with the discontinuous kernel being used, while in the case of kernel, HIF performs much better and the observed complexity for matrix-vector multiplication is about .

To sum up, the simulations in Sections 2.1, 2.2 and 2.3 show that the popular choice of the kernel in (2.6) is problematic in the interest of fast computation since it has non-smooth truncation at . On the other hand, when the regularity of the kernel gets better the HIF fast algorithm becomes more effective. Those observations motivate us to propose the splitting method that deals kernels with non-smooth truncation in the simulations of the peridynamic model.

## 3 The splitting of singularities

Popular choices of kernels in the peridynamic simulations contain two types of singularities: the singularity at origin and the non-smooth truncation at the boundary . The first type singularity is handled by the HIF algorithm through the idea of far-field compression. In 1d, the compression is performed through a block matrix with small block sizes near the diagonal. The algorithm is thus effective when the kernel is sufficiently smooth away from origin, as we have seen in the previous numerical examples. Next, we propose to split the two types of singularities and treat them respectively. In Fig 2, the left plot is the kernel . The kernel splits into two parts, the middle plot which has singularity at zero but away from zero, and the right plot which is discontinuous at but smooth in the interior. In practice, the kernel is written into

 γ(|s|)=κ(|s|)+p2K(s)χ(|s|<1),

where is regular away from zero, and is an even polynomial of degree that matches and its derivatives at up to the -th order. For the choice of , is explicitly given by the following formulas for .

 p2K(s)=⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩1K=032−12s2K=1158−54s2+38s4K=2178−2s2+98s4−14s6K=3

As suggested by Section 2, the nonlocal operator corresponding to the kernel can be treated by the HIF fast algorithm effectively. So we only need to deal with the second part corresponding to the kernel . Taking in the -dimensional Euclidean space, we consider the following nonlocal diffusion problem with Dirichlet type boundary

 ⎧⎪⎨⎪⎩−LPu(x):=−∫HxP(x,y)(u(y)−u(x))dy=f(x)x∈Ωu(x)=0x∈Ωc (3.7)

where is some neighborhood of , and is given by

 P(x,y)=C(x,y)δ(x,y)d+2p2K(|y−x|δ(x,y)). (3.8)

For convenience of illustration, we take , , and in the next. The generalization to for is straightforward. Taking into account of the boundary condition in (3.7), the nonlocal operator is to be evaluated at each :

 LPu(x)=C(x)δ(x)d+2(∫Hx∩Ωu(y)dy−|Hx|u(x)). (3.9)

We therefore need only to find a fast algorithm compute . Our algorithm starts by the hierarchical decomposition of the space , which is similar to the one in standard FMM [12]. Assume that the domain is uniformly be the total number of discretization nodes. Let , then from level to , the computation domain is hierarchically subdivided into panels. Each panel in the -th level can be represented by one of the cubes , .

The hierarchical subdivision of forms a tree structure. Fig 3 is the quadtree formed by the subdivision of the two-dimensional domain . The blue dot in Fig 3 represents the original domain , and it has four children represented by the red dots in level . The subdivision of the domains in level results in subdomains in level represented by the black dots. The hierarchical subdivision is performed until reaching the -th level. Similarly, in 1d a binary tree will be formed by the subdivision of and in 3d an octree will be formed by the subdivision of .

We now present the algorithm. It contains the following steps that will be explained in detail.

1. Initialization. Traverse the tree from top to bottom to obtain the decomposition of for each grid point .

2. Traverse the tree from bottom to top to obtain the partial sums for each panel in the -th level.

3. Approximate the integral by using (1) and (2).

Step 1. Initialization step. For each on the grid point, decompose the domain of integration using the panels in (). The decomposition of for each is stored for reuse. More specifically, we traverse the tree from top to bottom to determine whether a panel from level is included or not, and then express , where is the set of indices at the -th to be included in the decomposition. We may do this recursively by calling the function recur(), where recur is defined in Algorithm 1 below.

Let us now discuss on the complexity of performing the recur function when is a -dimensional ball centered at . The main cost of running the recur function comes from the intersection test of balls and cubes. To test whether a cube is fully contained in a ball, we only need to test whether the total of corners of are all contained in the ball. This requires a total of operations, where the constant in is independent of the dimension . Now to test whether a cube has nontrivial intersection with a ball, we use the algorithm given in [2], which is an algorithm that determines whether an axis-aligned bounding box (AABB) intersects a ball. See the function defined in Algorithm 2, where the inputs are three vectors and a positive number . The vector stores the minima of the AABB for each axis, stores the maxima of the AABB for each axis, and and are the center and radius of the ball resepctively. Combining the above discussions, we know that each time calling the recur function requires operations, where is independent of dimension.

Fig 4 shows an example of decomposing a circular region in 2d. Now the total complexity of Step 1 is obviously equal to times the total number of times the recur function is called. Now for each discretization node (), let denote the number of times the recur function is called in order to decompose the domain . The total number of times the recur function is called is then given by . Now observe that is also equal to the number of panels that intersect with the boundary non-trivially, namely

 Nd(xi)=#L⋃l=0{Qjl∈Ll:(Qjl)o∩∂Hxi≠∅}=L∑l=0Nld(xi),

where denotes the number of panels in the -th level that intersect with non-trivially. Assume that for each point , the set has uniform bounded mean curvature, then

can be estimated by the following formula

 Nld(xi)=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩O(1)for d=1;O(N(2d)l)1−1/d% for d≥2.

To sum up, the total complexity of Step 1, which equals is given by

 {O(NlogN)for d=1;Cd⋅N2−1/dfor d≥2, (3.10)

where the dimensional dependent constant is of order .

Step 2. Compute the the partial sums for each tree node . For a given vector , we assign the value to a leaf node. Then traverse the tree bottom to top, we assign each parent node the value of the sum of all its children. The completion of this step gives us the partial sums for every panel . The partial sums obtained can then be used to approximate the integral . We remark that if the kernel function given by (3.8) is generated from the polynomial with , we then need to compute the partial sums in the form of for all , so that they can be used to approximate the integral .

Fig 5 exemplifies the way partial sums are computed for a two-dimensional problem. Each of the black dots on the left figure represents a leaf node that contains the partial sum of the function in the little black box that includes it. Then the partial sums are passed to the coarser level where each parent node presented by the red dots stores the sum of the values from its four children. The total sum of the function on the whole domain is stored in the top tree node presented by the blue dot in the right figure in Fig 5. Therefore, the computational cost of Step 2 depends only on the number of discretization nodes and the degree of the polynomial . To sum up, the complexity of Step 2 is given by , where the constant in is independent of dimension.

Step 3. For each discretization node (), use the decomposition of obtained in Step 1 and the partial sums obtained in step 2 to approximate the integral

 ∫Hxi∩Ωu(y)dy=L∑l=0∑j∈Il(xi)∫Qjlu(y)dy. (3.11)

Finally, the value of is obtained through the relation (3.9). We again remark that in the case of for the polynomial , all the partial sums of the form will be needed to approximate the desired integral. With the precomputed sums for the approximate integrals of the form in Step 2, the number of summations we need to take in (3.11) is essentially equal to defined in Step 1. By the calculations in Step 1 and taking into the account of the general case that being a polynomial of degree , complexity of Step 3 is then given by the following formulas,

 {O(KNlogN)for d=1;O(KN2−1/d)for d≥2, (3.12)

and the constants in the above estimates are independent of dimension.

### 3.1 Numerical tests

We perform numerical tests based on the proposed algorithm.

Example 1. In this example, we take in the 2-dimensional space. Assume , , and . The nonlocal operator to be evaluated is given by

 LPδu(x)=1δ4∫Bδ(x)(u(y)−u(x))dy=1δ4∫Bδ(x)∩Ωu(y)dy−|B1|δ2u(x).

Table 7 contains the experimental data of the new algorithm in comparison with two versions of matrix-vector multiplication for the original matrix. The scaling results are shown in Fig. 6. Since the initialization step (Step 1) is only needed to be performed once in an iterative method or time-dependent problem, the computation time recorded for the new algorithm is given by the cost for Step 2 and Step 3 combined together. The computation time of the new algorithm grows at the rate , which verifies that the the formula given in (3.10) for . In comparison, the two versions of matrix-vector multiplication for the original matrix shows complexity. The first version (Original - 1) is performed with the MATLAB built-in matrix-vector multiplication. Because of the nature of MATLAB, the second version (Original -2), which goes through the evaluation of the nonlocal operator at each node by using a for-loop, is also listed for a fair comparison with the new algorithm.

Example 2. To show that the algorithm can be applied to more generalized cases, we perform the numerical test on the 1d domain and the kernel function is given by (3.8) with , as shown in Fig 1. We take and a polynomial of degree . Now in Step 2 of the algorithm, we not only need to compute the partial sums , but also for all so that they can be used to approximate .

The computation time for the new algorithm is given in Table 8 in comparison with two versions of matrix-vector multiplication for the original matrix, with scaling results shown in Fig. 7. The computation time of the new algorithm grows at the rate which is slightly higher than , which relates well to our theoretical estimate of for .