1 Introduction
Nonlocal operators, as alternatives to local differential operators, have been widely applied to the modeling of a number of physical and biological processes over the last few years. One example is given by the peridynamics models for brittle materials [25, 3, 10, 14, 15, 26]; another is nonstandard diffusion, including the use of fractional derivatives [1, 4, 6, 9, 21, 20]. Moreover, even when solving differential equations, nonlocal integral relaxations of local differentiations are sometimes introduced as a numerical technique; see [11, 19, 22, 23].
While nonlocal models can provide more accurate descriptions of a physical system, the nonlocality also increases the computational cost compared with classical local models based on partial differential equations (PDEs). It is therefore imperative to develop fast algorithms for the computation of nonlocal models. In this work, we focus on nonlocal models with finite nonlocal interaction distances (see [28]). Each point in the domain interacts with points within a certain "horizon" of it. That is, the interaction kernel is compactly supported in a certain finite region and may also be singular at zero distance. When the interaction region is spherical, we call the radius of the region the horizon radius, or horizon for short. Such kernels make most fast algorithms that utilize the low-rank property of far-field interactions fail to work, because the kernel is typically not regular around the boundary of its support. In [33, 31], FFT-based fast algorithms are developed for solving peridynamic models, in which the Toeplitz structures of the matrices are greatly exploited. However, such methods fail for the more general cases of inhomogeneous media and variable horizons, which result in non-translation-invariant kernels and which are the concern of this work.

The peridynamics (PD) theory of solid mechanics takes into account large-scale deformations and reformulates the internal force of a deformed body by introducing nonlocal interactions of material points. Let u denote the displacement field; the force balance equation in the bond-based peridynamic theory is given by
(1.1)  −∫_{B_δ(x)} f(u(y) − u(x), y − x) dy = b(x),
where the integration is limited to the finite region B_δ(x); b denotes the body force and
f is the force density function, which contains all the constitutive properties. For a prototype microelastic material, the force vector f
points in the direction of the deformed bond, with its magnitude a function of the bond relative elongation (stretch). In general, f depends nonlinearly on the bond relative elongation, which gives the model the flexibility to describe changes of material properties. In this work, we will only consider the linear model obtained under the assumption of small deformation, namely,

(1.2)  f(u(y) − u(x), y − x) = ω(|y − x|) (e ⊗ e) (u(y) − u(x)),
where e = (y − x)/|y − x| is the unit vector in the bond direction, and ω is the micromodulus function.
For each x, the micromodulus ω(x, ·) is a function with support on [0, δ(x)). In practice, this is achieved by choosing a fractional-type kernel multiplied by the characteristic function or the conical function on the interaction region (see [14]); both choices create a nonsmooth transition to zero at the boundary of the support.

As we can see, the discretization of the nonlocal operator in (1.1) results in a dense matrix, for which fast algorithms must be considered in order to lower the cost of matrix multiplication and inversion. There are several classical algorithms for the treatment of nonsparse matrices, including the fast multipole methods (FMMs) [12, 13, 32] and the hierarchical matrix techniques [16, 17]. More recent developments can be found in [18]. We will first examine whether the existing fast solvers are effective in the computation of nonlocal models like (1.1). In Section 2 we will show that the performance of the fast solvers depends largely on the regularity of the kernel across the boundary of its support. This sends a clear message about which far-field behavior of the kernel function is preferable for computational efficiency: popular choices of kernels in peridynamics simulations cause serious problems for fast computation. We then propose in Section 3 to split the two types of singularities of the kernel function (one at the origin and the other at the boundary) and to treat them separately. Once the kernel is decomposed into two parts, one that is smooth away from the origin and the other that is smooth away from the truncation, we use different fast algorithms that exploit the different types of smoothness of the two kernels. The classical fast algorithms such as FMMs and hierarchical matrix techniques only resolve the first type of singularity, whose singular set is a single point (the origin of the kernel in most cases). Notice that the second type of singularity lies on the boundary of the support, which is a set of codimension 1: in 2d the singular sets are boundary curves, and in 3d they are boundary surfaces.
Our algorithm for the second type of singularity is essentially a new FMM-type method for kernels that exhibit singularities on codimension-1 sets. It reduces the computational complexity in all dimensions, with the optimal rate in 1d. Other potential applications of this work include computing with the retarded potentials arising in time-domain boundary integral equations [24], where the potentials are discontinuous functions defined in space-time.
2 The role of the far-field behavior of kernels
We will use the nonlocal diffusion operator to illustrate our methodology in this section. The nonlocal diffusion operator L is given by
(2.3)  L u(x) = ∫_{B_{δ(x)}(x)} γ(x, y) (u(y) − u(x)) dy.
See [6] for more detailed discussions. Here we assume that the kernel is given by

(2.4)  γ(x, y) = σ(x) γ_{δ(x)}(|y − x|),

where the rescaled kernel γ_δ is a nonnegative function supported on [0, δ] with a fixed normalization, σ is the diffusion coefficient, and δ is the horizon function; they satisfy

(2.5)  0 < σ_min ≤ σ(x) ≤ σ_max,   0 < δ_min ≤ δ(x) ≤ δ_max.
Here we keep the dependence of σ and δ on x so that the operator can model inhomogeneous media; see the related works [27, 30, 7].
Common practice in peridynamics simulations (see [14]) takes the kernel to be
(2.6) 
We distinguish between the two cases:
The FFT-based methods mentioned earlier are restrictive in their application to heterogeneous media. Here we seek algorithms that can be applied to the more general cases given by (2.4). We use the recently proposed hierarchical interpolative factorization (HIF) [18] to factorize the dense matrix obtained from discretizing the nonlocal operator. The factorization can be used to rapidly apply both the matrix and its inverse, so it can serve as a direct solver or be stored and reused for iterative methods or time-dependent problems. The HIF takes advantage of the low-rank behavior of the off-diagonal blocks of the dense matrix, so the far-field behavior of the kernel is the central factor in the performance of the algorithm. We will illustrate in this section that the performance of the HIF algorithm improves greatly as the smoothness of the kernel away from zero increases. All computations are performed in MATLAB R2014b on a single core of a 1.1 GHz Intel Core M CPU on a 64-bit Mac laptop.

2.1 Nonlocal diffusion in homogeneous media
In this section, we solve the nonlocal problem on an interval with a constant horizon radius δ for the homogeneous medium. The matrix is obtained from the discretization of the nonlocal operator based on the asymptotically compatible schemes in [29], which ensure a uniform discretization error independent of the horizon function. In Table 1, the storage and the time of matrix-vector multiplication are listed for the original matrix as well as for its compression obtained with the HIF method. The original matrix is stored using the sparse matrix representation, with the MATLAB built-in matrix-vector multiplication applied in the test. The kernel has a singularity at the origin and is assumed to be smooth except at the origin and at the truncation. The number of discretization nodes and the relative precision parameter for the HIF method are fixed in Table 1.
The left column of Table 1 lists the different types of regularity (at the truncation) of the kernel and the different choices of the horizon parameter δ. The discontinuous kernel is the one most often used in peridynamics simulations.
The more regular kernels are obtained by subtracting from it polynomials of suitable degree; the details of constructing these polynomials will be given in Section 3. When δ is comparable to the domain size, the matrix is fully dense and does not see the tail of the kernel; as a result, the HIF works well for all types of kernels. When δ gets smaller, the regularity of the kernel at the truncation plays a decisive role in the performance of the HIF. We see that when the kernel is discontinuous there, no compression of the matrix is achieved by the HIF. The storage taken and the computation time decrease with increasing regularity of the kernel, from discontinuous all the way to three times continuously differentiable. The critical improvement of the computational efficiency happens between two adjacent regularity classes, as shown in Table 1. We remark that the location of the critical regularity changes with the precision parameter: when the HIF algorithm runs with higher precision, a higher critical regularity is observed, and with lower precision, a lower critical regularity is observed. Similar patterns are also observed in the experiments discussed in Sections 2.2 and 2.3. As δ keeps decreasing, the effect of the different regularities of the kernel diminishes. The extreme case is when δ equals the mesh size, in which case the matrix is in fact tridiagonal. This case is listed in Table 1 to compare the overhead of using the HIF algorithm with the sparse matrix. The overhead in this case is due to the fact that the HIF does not build into itself the ability to detect the horizon and ignore the zeros outside it. It will be future work to explore this possibility depending on the application.
To assess the complexity of the HIF, we provide the computation time for different numbers of discretization nodes in Table 2, with the horizon parameter fixed. The scaling results for the two choices of kernel are compared. The complexity of matrix-vector multiplication for the original matrix without compression is approximately the same in both cases. On the other hand, the HIF code behaves quite differently for the two kernels: with the less regular kernel the observed complexity shows essentially no improvement, while with the more regular kernel the HIF is significantly faster.
The simulations show the increasing effectiveness of the HIF fast algorithm as the far-field regularity of the nonlocal interaction kernel improves. Finally, we remark that the cost of constructing the factorization is in general larger than the cost of a matrix-vector multiplication. This is acceptable since one only needs to construct the factorization once and can reuse it in an iterative method or in time-dependent problems.
2.2 Nonlocal diffusion in heterogeneous media
We test the effectiveness of the HIF algorithm for nonlocal diffusion in heterogeneous media in this section. The kernel is defined as in (2.4), with a variable diffusion coefficient and horizon function; each point x has its own horizon radius δ(x). In this case, the operator is a nonlocal diffusion operator of nondivergence type. The matrix obtained from the discretization of the nonlocal operator is now nonsymmetric, in which case the FFT-based fast algorithms will not work. In Table 3, the storage and the time of matrix-vector multiplication are listed for the original matrix as well as for its compression after using the HIF algorithm.
In the numerical experiments the horizon function δ(x) varies between fixed lower and upper bounds, as shown in Fig. 1.
The kernel has a singularity at the origin and is assumed to be smooth except at the origin and at the truncation. The left column of Table 3 lists the types of regularity (at the truncation) of the kernel and the horizon parameter. The HIF algorithm for nonlocal diffusion in heterogeneous media behaves similarly to the homogeneous case. In particular, we tailored Table 3 to intermediate horizon sizes, because for the extreme cases of very large or very small horizons the variation of the regularity of the kernel does not result in any change in the computational cost, exactly the behavior observed in Table 1.
To assess the complexity, we provide the computation time for different numbers of discretization nodes in Table 4, with the horizon parameter fixed. Two types of kernels, one less regular and one more regular at the truncation, are used in the simulation. The complexity of matrix-vector multiplication for the original matrix without compression is as expected. Similar to the homogeneous case, the observed complexity of the HIF in the heterogeneous case again improves markedly for the more regular kernel. The numerical examples in Sections 2.1 and 2.2 indicate that the HIF is as effective for models in heterogeneous media as it is for models in homogeneous media, provided the kernel functions are sufficiently smooth away from the origin.
2.3 Twodimensional test
In this section, we perform 2d tests on a square domain to show the effectiveness of the HIF code. The homogeneous medium has the same horizon radius for every point. The kernel is the same as before: it has a singularity at the origin and is assumed to be smooth except at the origin and at the truncation.
In Table 5, the storage and the time of matrix-vector multiplication are listed for the original matrix as well as for its compression after using the HIF method. The original matrix is again stored using the sparse matrix representation, and the MATLAB built-in matrix-vector multiplication is used to record the computation time in the right column of Table 5. We remark that the critical improvement of the computational efficiency happens between the discontinuous kernel functions and the more regular ones as a result of the choice of the relative precision parameter. As in 1d, a higher critical regularity is observed as the precision parameter decreases.
Table 6 shows the storage and time for different numbers of discretization nodes, with the horizon parameter fixed. The discontinuous kernel and a smoother kernel are used in the simulation; the smoother kernel is chosen near the critical regularity at which the improvement of efficiency happens, as seen in Table 5. This is a result of the particular choice of relative precision: as the precision parameter becomes smaller, higher regularity is needed for the critical improvement of efficiency to happen. We observe that the complexity of matrix-vector multiplication for the original matrix without compression grows as expected. The HIF gives a similar growth rate when the discontinuous kernel is used, while in the case of the smoother kernel the HIF performs much better and the observed complexity of matrix-vector multiplication is markedly reduced.
To sum up, the simulations in Sections 2.1, 2.2 and 2.3 show that the popular choice of kernel in (2.6) is problematic for fast computation, since it has a nonsmooth truncation at the horizon. On the other hand, as the regularity of the kernel improves, the HIF fast algorithm becomes more effective. These observations motivate us to propose a splitting method that deals with kernels with nonsmooth truncation in simulations of the peridynamic model.
3 The splitting of singularities
Popular choices of kernels in peridynamic simulations contain two types of singularities: the singularity at the origin and the nonsmooth truncation at the boundary of the support. The first type of singularity is handled by the HIF algorithm through the idea of far-field compression; in 1d, the compression is performed through a block matrix with small block sizes near the diagonal. The algorithm is thus effective when the kernel is sufficiently smooth away from the origin, as we have seen in the previous numerical examples. Next, we propose to split the two types of singularities and treat them separately. In Fig. 2, the left plot shows the original kernel; it splits into two parts, the middle plot, which has a singularity at zero but is smooth away from zero, and the right plot, which is discontinuous at the truncation but smooth in the interior. In practice, the kernel is written as
γ(s) = [γ(s) − p(s)] + p(s),  0 < s ≤ δ,

where γ − p is regular away from zero, and p is an even polynomial of degree 2k that matches γ and its derivatives at s = δ up to the kth order. For small k, p is explicitly given by the following formulas.
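As a numerical illustration of this matching (our own sketch, not the paper's explicit formulas), the snippet below takes a fractional-type kernel γ(s) = 1/s, assumes k = 2 and horizon δ = 0.5, and solves a small linear system for the even polynomial p(s) = c0 + c1 s² + c2 s⁴ with p, p′, p″ agreeing with γ at s = δ:

```python
import math
import numpy as np

def matching_polynomial(delta, k=2):
    """Coefficients c of the even polynomial p(s) = sum_m c[m] * s**(2m)
    that matches gamma(s) = 1/s and its first k derivatives at s = delta."""
    A = np.zeros((k + 1, k + 1))
    b = np.zeros(k + 1)
    for j in range(k + 1):                 # j-th derivative condition at delta
        for m in range(k + 1):
            coef = 1.0                     # falling factorial (2m)(2m-1)...(2m-j+1)
            for r in range(j):
                coef *= 2 * m - r
            A[j, m] = coef * delta ** (2 * m - j) if coef != 0.0 else 0.0
        # j-th derivative of 1/s at delta: (-1)^j * j! * delta^-(j+1)
        b[j] = (-1) ** j * math.factorial(j) * delta ** (-(j + 1))
    return np.linalg.solve(A, b)

c = matching_polynomial(0.5, k=2)
p = lambda s: c[0] + c[1] * s ** 2 + c[2] * s ** 4
```

By construction, γ − p then vanishes to second order at the truncation s = δ, which is exactly the smoothness of the first part that the HIF exploits.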
As suggested by Section 2, the nonlocal operator corresponding to the kernel γ − p can be treated effectively by the HIF fast algorithm. So we only need to deal with the second part, corresponding to the kernel generated by p. Working in the d-dimensional Euclidean space, we consider the following nonlocal diffusion problem with Dirichlet-type boundary condition
(3.7)  −L u(x) = b(x) for x in Ω,  u(x) = 0 for x in Ω_I,
where Ω_I is some neighborhood of Ω on which the volumetric boundary data are imposed, and the kernel is given by

(3.8)  γ(x, y) = p(|y − x|) χ_{B_{δ(x)}(x)}(y),  with p an even polynomial.
For convenience of illustration, we take a constant polynomial p in what follows; the generalization to polynomials of higher degree is straightforward. Taking into account the boundary condition in (3.7), the nonlocal operator is to be evaluated at each x in Ω:
(3.9)  L u(x) = ∫_{B_{δ(x)}(x)} p(|y − x|) u(y) dy − u(x) ∫_{B_{δ(x)}(x)} p(|y − x|) dy.
We therefore only need a fast algorithm to compute the first integral above. Our algorithm starts with a hierarchical decomposition of the space, similar to the one in the standard FMM [12]. Assume that the domain is uniformly discretized, and let N denote the total number of discretization nodes. From level 0 to level L, the computational domain is hierarchically subdivided into panels; each panel in the lth level can be represented by one of the congruent cubes of side length proportional to 2^{−l}.
The hierarchical subdivision forms a tree structure. Fig. 3 shows the quadtree formed by the subdivision of a two-dimensional domain. The blue dot in Fig. 3 represents the original domain, and it has four children, represented by the red dots, in level 1. The subdivision of the domains in level 1 results in 16 subdomains in level 2, represented by the black dots. The hierarchical subdivision is performed until reaching the Lth level. Similarly, in 1d a binary tree is formed by the subdivision, and in 3d an octree is formed.
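The bookkeeping for this subdivision can be sketched in a few lines; the panel indexing (l, i, j) and the helper names below are illustrative choices of ours, not the paper's notation. Level l of the quadtree over the unit square holds 4^l panels of side 2^{−l}:

```python
def children(node):
    """The four subpanels of panel (l, i, j) at level l + 1 of the quadtree."""
    l, i, j = node
    return [(l + 1, 2 * i + a, 2 * j + b) for a in (0, 1) for b in (0, 1)]

def level_panels(L):
    """All panels of the quadtree over [0, 1]^2, grouped by level 0..L."""
    levels = [[(0, 0, 0)]]               # the root panel: the whole domain
    for _ in range(L):
        levels.append([c for node in levels[-1] for c in children(node)])
    return levels

def panel_box(node):
    """Lower-left and upper-right corners of panel (l, i, j); side h = 2^-l."""
    l, i, j = node
    h = 2.0 ** -l
    return (i * h, j * h), ((i + 1) * h, (j + 1) * h)
```

A binary tree (1d) or octree (3d) version differs only in the number of children per node.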
We now present the algorithm. It contains the following steps, which will be explained in detail.

Initialization. Traverse the tree from top to bottom to obtain the decomposition of the interaction region for each grid point.

Traverse the tree from bottom to top to obtain the partial sums of the discrete solution for each panel at every level.

Approximate the integral by combining the decompositions from (1) with the partial sums from (2).
Step 1. Initialization step. For each grid point x, decompose the domain of integration B_{δ(x)}(x) using the panels of the tree. The decomposition for each x is stored for reuse. More specifically, we traverse the tree from top to bottom to determine whether each panel is included or not, and we record, at every level, the set of indices of the panels included in the decomposition. We may do this recursively by calling the function recur(), where recur is defined in Algorithm 1 below.
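In 1d, for instance, the recursion of Algorithm 1 reduces to a short routine over dyadic intervals. The sketch below (illustrative names; straddling leaf panels are simply dropped here, whereas the full algorithm would hand them to direct near-field summation) decomposes an interval (a, b) ⊂ [0, 1] into maximal fully contained panels:

```python
def recur(level, idx, L, a, b, out):
    """Collect the maximal dyadic panels of [0, 1], down to level L, that are
    fully contained in the interval (a, b).  Panels straddling an endpoint of
    (a, b) are subdivided; a straddling leaf panel is dropped in this sketch."""
    lo = idx * 2.0 ** -level
    hi = lo + 2.0 ** -level
    if a <= lo and hi <= b:            # fully inside: record and stop
        out.append((level, idx))
        return
    if hi <= a or b <= lo:             # fully outside: discard
        return
    if level < L:                      # straddles an endpoint: subdivide
        recur(level + 1, 2 * idx, L, a, b, out)
        recur(level + 1, 2 * idx + 1, L, a, b, out)

panels = []
recur(0, 0, 6, 0.3, 0.8, panels)       # decompose (0.3, 0.8) down to level 6
covered = sum(2.0 ** -lvl for lvl, _ in panels)
```

In 1d at most two panels straddle the endpoints at each level, so the number of recursive calls per point grows only like the depth L of the tree.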
Let us now discuss the complexity of performing the recur function when the integration domain is a d-dimensional ball centered at x. The main cost of running the recur function comes from the intersection tests between balls and cubes. To test whether a cube is fully contained in a ball, we only need to test whether all 2^d corners of the cube are contained in the ball; this requires O(2^d) operations, where the constant in the O is independent of the dimension d. To test whether a cube has nontrivial intersection with a ball, we use the algorithm given in [2], which determines whether an axis-aligned bounding box (AABB) intersects a ball. See the function defined in Algorithm 2, where the inputs are three vectors and a positive number: the first vector stores the minima of the AABB for each axis, the second stores the maxima of the AABB for each axis, and the remaining vector and number are the center and radius of the ball, respectively. Combining the above discussions, each call of the recur function requires O(2^d) operations, where the constant is independent of dimension.
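The two geometric predicates can be sketched as follows (function names are ours). The containment test checks all 2^d corners; the intersection test follows the AABB-ball idea of [2], accumulating axis by axis the squared distance from the ball center to the nearest point of the box:

```python
from itertools import product

def ball_contains_box(bmin, bmax, center, r):
    """Full containment: a box lies inside a ball iff all 2^d corners do."""
    return all(sum((c - v) ** 2 for c, v in zip(center, corner)) <= r * r
               for corner in product(*zip(bmin, bmax)))

def box_intersects_ball(bmin, bmax, center, r):
    """AABB-ball overlap test: accumulate, axis by axis, the squared distance
    from the ball center to the box (zero on axes where the center projects
    inside the box), then compare with the squared radius."""
    d2 = 0.0
    for lo, hi, c in zip(bmin, bmax, center):
        if c < lo:
            d2 += (lo - c) ** 2
        elif c > hi:
            d2 += (c - hi) ** 2
    return d2 <= r * r
```

Note that the intersection test costs only O(d) per box, while the containment test accounts for the O(2^d) factor mentioned above.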
Fig. 4 shows an example of decomposing a circular region in 2d. The total complexity of Step 1 is proportional to the total number of times the recur function is called. For each discretization node x, let R(x) denote the number of times the recur function is called in order to decompose the domain B_{δ(x)}(x); the total number of calls is the sum of R(x) over all nodes. Observe that R(x) is, up to a constant, the number of panels that intersect the boundary ∂B_{δ(x)}(x) nontrivially, namely

R(x) ≍ Σ_{l=0}^{L} M_l(x),

where M_l(x) denotes the number of panels in the lth level that intersect ∂B_{δ(x)}(x) nontrivially. Assuming that for each point x the boundary ∂B_{δ(x)}(x) has uniformly bounded mean curvature, M_l(x) can be estimated by

M_l(x) = O(2^{l(d−1)}).
To sum up, the total complexity of Step 1, which equals the total number of recursive calls times the cost per call, is given by

(3.10)  O(N log N) for d = 1,  O(N^{2 − 1/d}) for d ≥ 2,

where N is the total number of discretization nodes and the dimension-dependent constant is of order 2^d.
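The count behind this estimate can be sketched as a geometric sum over levels (assuming M_l = O(2^{l(d−1)}) boundary-cutting panels at level l, N = 2^{Ld} nodes, and summing the per-point call count R(x) over the N evaluation points):

```latex
\sum_{x} R(x) \;\lesssim\; N \sum_{l=0}^{L} 2^{l(d-1)}
\;\sim\;
\begin{cases}
N L \sim N \log N, & d = 1,\\[4pt]
N \, 2^{L(d-1)} = N \cdot N^{(d-1)/d} = N^{2-\frac{1}{d}}, & d \ge 2.
\end{cases}
```

The geometric sum is dominated by its last term when d ≥ 2, while for d = 1 every level contributes equally, which produces the logarithmic factor.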
Step 2. Compute the partial sums for each tree node. For a given vector of nodal values, we assign to each leaf node the sum of the values it contains. Then, traversing the tree from bottom to top, we assign to each parent node the sum of the values of all its children. The completion of this step gives the partial sums over every panel, which can then be used to approximate the integral. We remark that if the kernel function given by (3.8) is generated from a polynomial p of positive degree, we also need to compute partial sums of the corresponding polynomial moments, so that they can be used to approximate the integral.
Fig. 5 exemplifies the way partial sums are computed for a two-dimensional problem. Each black dot in the left figure represents a leaf node that contains the partial sum of the function over the little black box enclosing it. The partial sums are then passed to the coarser level, where each parent node, represented by the red dots, stores the sum of the values of its four children. The total sum of the function over the whole domain is stored in the top tree node, represented by the blue dot in the right figure of Fig. 5. The computational cost of Step 2 therefore depends only on the number of discretization nodes and the degree of the polynomial p. To sum up, the complexity of Step 2 is linear in N, where the constant is independent of dimension.
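The bottom-up pass of Step 2 can be sketched as below for a binary tree (the 1d case; names are illustrative, and a quadtree version would sum four children per parent instead of two):

```python
def build_partial_sums(leaf_values):
    """Bottom-up pass of Step 2 on a binary tree: the last level holds the
    2^L leaf values, each parent stores the sum of its two children, and
    the returned levels[0][0] is the total sum over the whole domain."""
    levels = [list(leaf_values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * i] + prev[2 * i + 1]
                       for i in range(len(prev) // 2)])
    return levels[::-1]                 # coarsest level first

vals = [float(i) for i in range(8)]     # 2^3 leaf values
levels = build_partial_sums(vals)
```

Each leaf value is touched once and every level halves the work, so the whole pass is linear in the number of leaves, as stated above.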
Step 3. For each discretization node x, use the decomposition of B_{δ(x)}(x) obtained in Step 1 and the partial sums obtained in Step 2 to approximate the integral

(3.11)  ∫_{B_{δ(x)}(x)} p(|y − x|) u(y) dy ≈ Σ_{panels P in the decomposition of B_{δ(x)}(x)} S(P),

where S(P) denotes the precomputed partial sum over panel P. Finally, the value of L u(x) is obtained through the relation (3.9). We again remark that in the case of a polynomial p of positive degree, all the partial sums of the corresponding polynomial moments will be needed to approximate the desired integral. With the sums precomputed in Step 2, the number of additions needed in (3.11) is essentially equal to R(x) defined in Step 1. By the calculations in Step 1, and taking into account the general case of p being a polynomial of degree 2k, the complexity of Step 3 is given by the following formulas,
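Putting Steps 1 to 3 together in the simplest 1d setting (constant p, so the integral reduces to a plain sum of nodal values over the decomposition panels), a sketch reads as follows; when the endpoints of the interaction interval are dyadic, the decomposition is exact and the fast sum reproduces the direct sum. All names here are illustrative:

```python
def recur(level, idx, L, a, b, out):
    # Step 1: maximal dyadic panels of [0, 1] fully contained in (a, b)
    lo, hi = idx * 2.0 ** -level, (idx + 1) * 2.0 ** -level
    if a <= lo and hi <= b:
        out.append((level, idx))
    elif lo < b and a < hi and level < L:
        recur(level + 1, 2 * idx, L, a, b, out)
        recur(level + 1, 2 * idx + 1, L, a, b, out)

def build_partial_sums(leaf_values):
    # Step 2: each parent stores the sum of its two children, leaves up
    levels = [list(leaf_values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * i] + prev[2 * i + 1]
                       for i in range(len(prev) // 2)])
    return levels[::-1]

L = 4
u = [(i % 5) * 0.5 for i in range(2 ** L)]      # nodal values on [0, 1]
sums = build_partial_sums(u)

# Step 3: sum the stored panel sums over the decomposition of (0.25, 0.75)
panels = []
recur(0, 0, L, 0.25, 0.75, panels)
fast = sum(sums[lvl][i] for lvl, i in panels)

# Reference: direct summation over the leaves contained in (0.25, 0.75)
direct = sum(u[i] for i in range(2 ** L)
             if 0.25 <= i * 2.0 ** -L and (i + 1) * 2.0 ** -L <= 0.75)
```

Only two panels are needed here instead of eight leaves, and the gap widens as the tree deepens, which is the source of the complexity reduction.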
(3.12)  O(N log N) for d = 1,  O(N^{2 − 1/d}) for d ≥ 2, with a prefactor depending on the degree 2k of p,
and the constants in the above estimates are otherwise independent of dimension.
3.1 Numerical tests
We perform numerical tests based on the proposed algorithm.
Example 1. In this example, we work in 2-dimensional space with a constant polynomial p, so that the nonlocal operator to be evaluated is

L u(x) = ∫_{B_{δ(x)}(x)} (u(y) − u(x)) dy.
Table 7 contains the experimental data of the new algorithm in comparison with two versions of matrix-vector multiplication for the original matrix. The scaling results are shown in Fig. 6. Since the initialization step (Step 1) only needs to be performed once in an iterative method or a time-dependent problem, the computation time recorded for the new algorithm is the combined cost of Step 2 and Step 3. The computation time of the new algorithm grows at the rate N^{3/2}, which verifies the formula given in (3.10) for d = 2. In comparison, the two versions of matrix-vector multiplication for the original matrix show the expected unaccelerated complexity. The first version (Original 1) is performed with the MATLAB built-in matrix-vector multiplication. Because of the nature of MATLAB, the second version (Original 2), which goes through the evaluation of the nonlocal operator at each node using a for-loop, is also listed for a fair comparison with the new algorithm.
Example 2. To show that the algorithm applies to more general cases, we perform the numerical test on a 1d domain with a variable horizon function, as shown in Fig. 1, and a kernel given by (3.8) with a nonconstant polynomial p. Now in Step 2 of the algorithm, we not only need to compute the partial sums of the nodal values, but also the partial sums of the corresponding polynomial moments, so that they can be used to approximate the integral.
The computation time for the new algorithm is given in Table 8 in comparison with two versions of matrix-vector multiplication for the original matrix, with scaling results shown in Fig. 7. The computation time of the new algorithm grows at a rate slightly higher than linear, which agrees well with our theoretical estimate of O(N log N) for d = 1.