1 Introduction
Nonlocal equations have become the model of choice in applications where the global behavior of the system is affected by longrange forces at small scales. In particular, these equations are preferable to partial differential equations (PDEs) in presence of anomalous behavior, such as superdiffusion and subdiffusion, multiscale behavior, and discontinuities or irregularities in the solution that cannot be captured by classical models. For these reasons, nonlocal models are currently employed in several scientific and engineering applications including surface or subsurface transport
[7, 8, 24, 48, 49], fracture mechanics [31, 34, 51], turbulence [33, 42], image processing [9, 16, 28, 35] and stochastic processes [10, 18, 37, 39, 40].The most general form [19] of a nonlocal operator for a scalar function is given by
where , the kernel, is a compactly supported function over , the ball of radius centered at . We refer to as horizon or interaction radius; this quantity determines the extent of the nonlocal interactions and represents the length scale of the operator. The integral form allows one to catch longrange forces within the length scale and reduces the regularity requirements on the solutions. It also highlights the main difference between nonlocal models and PDEs, i.e., the fact that interactions can occur at distance, without contact. The kernel depends on the application and determines the regularity properties of the solutions; the choice of its parameters or functional form is among the most investigated open questions in the current nonlocal literature [42, 11, 20, 21, 30, 43, 44, 55, 57, 56]. In this work we limit ourselves to smooth integrable kernels since the treatment of more complex functions is not germane to the issues investigated in this paper, as clarified later on.
The integral nature of the operator poses several modeling and numerical challenges including the treatment of nonlocal interfaces [3, 12], the prescription of nonlocal boundary conditions [15, 23] and the design of efficient discretization schemes and numerical solvers [2, 13, 17, 22, 47, 50, 54]. In fact, the numerical solution of nonlocal equations becomes prohibitively expensive when the ratio between the interaction radius and the discretization size increases. Even though the nonlocal literature offers several examples of meshfree, particletype discretizations [14, 45, 46, 50]
, in this paper we focus on finite element (FE) methods. This allows us to easily deal with nontrivial domains, achieve highorder accuracy, and use mesh adaptivity. Furthermore, the nonlocal vector calculus
[26] provides a means for a rigorous stability and convergence analysis of variational methods as it allows us to analyze nonlocal diffusion problems in a similar way as elliptic PDEs [27].When cast in a variational form, the nonlocal problem associated with the operator results in a bilinear form characterized by the following double integral
where we explicitly reported the domain of integration in the inner integral. Here, and are the domain of interest and the corresponding “nonlocal boundary” and is an appropriate test function. Thus, the variational setting introduces further computational challenges. Not only do we have to numerically evaluate a double integral, but the integrand function is discontinuous, due to the compact support of and to the fact that, in FE settings, the tests functions are also compactly supported.
The paper by D’Elia et al.[22] thoroughly describes the challenges associated with nonlocal FE discretizations and proposes approximation techniques for efficient and accurate implementations. In particular, the authors introduce “approximate balls” that facilitate the assembly procedure by substituting the Euclidean ball with suitable polygonal approximations. They also suggest a set of quadrature rules for the outer and inner integration and analyze the convergence properties of the resulting scheme. With the same spirit, in this work, we propose an alternative way to efficiently evaluate the integral above by circumventing the issue of integrating a truncated function. The key idea of this paper is the introduction of a mollifier [41] to approximate the discontinuous kernel function; by doing so, the new, approximate, and parameterized kernel is a smooth function for which standard Gaussian quadrature rules can be employed over every element without compromising their accuracy. Additionally, we introduce adaptive quadrature rules for the numerical integration of the outer integral. In fact, contrary to intuition, sophisticated integration techniques for the outer integral are required in order to prevent the quadrature error from exceeding the FE one [22].
The main contributions of this paper are

The introduction of a parametrized, smooth, approximate kernel, by means of a mollifier, that yields a smooth integrand over every element. This allows us to avoid the tedious and impractical task of determining the intersections of the ball with the elements, and hence represents a major advantage of our method in threedimensional simulations.

The design of adaptive quadrature rules for the outer integral and of a parallel algorithm for efficient simulations.

The theoretical proof and numerical illustration of the convergence of the approximate, mollified solution to the analytic one as the mollified kernel approaches .

The numerical illustration of the convergence of the mollified solution to the exact one as we refine the mesh and a numerical study of the dependence of the convergence behavior with respect to the parameters.

The demonstration via twodimensional and threedimensional numerical tests of the scalability of our algorithm.
Outline of the paper
In the following section we define the notation that is used throughout the paper and recall important results on nonlocal calculus. In Section 3 we introduce the mollifier function and the associated approximate, parametrized weak form of the nonlocal diffusion problem. We also analyze the convergence of the solution of the latter to the original weak solution. In Section 4 we describe the nonlocal FE discretization, with special focus on the assembly procedure, and briefly recall its challenges. In Section 5 we introduce adaptive quadrature rules for the numerical integration of the outer integral. In Section 6 we illustrate our theoretical results via two and threedimensional tests. We also discuss a parallel implementation of the FE assembly procedure and show the corresponding scaling results. Finally, we summarize our contributions in Section 7.
2 Preliminaries
Let be open and bounded, . Given some , we define the interaction domain of as the set of all points not in that are within a distance from points in , i.e.
(1) 
see Figure 1 for a graphical example in . Note that depends on even if it is not explicitly indicated. Let be an integrable nonnegative symmetric kernel that is also radial^{e}^{e}eFor a discussion on nonpositive kernels and nonsymmetric kernels, see [38] and [18], respectively., namely and . We also assume that has bounded support over the ball of radius centered at , i.e. . For a scalar function we define the nonlocal Laplacian as
(2) 
The strong form of a nonlocal Poisson problem is then given by: for , and , find such that
(3) 
where the second condition in (3) is the nonlocal counterpart of a Dirichlet boundary condition for PDEs and it is referred to as Dirichlet volume constraint^{f}^{f}fFor definition and analysis of Neumann volume constraints we refer to [25] and for its numerical treatment we refer to, e.g., [23].. Such condition is required [27] to guarantee the wellposedness of (3). The weak form of the Poisson problem is obtained by multiplying the first equation in (3) by a test function in and by applying the nonlocal first Green’s identity [26]. This yields
(4)  
Then, the weak form of the nonlocal diffusion problem reads as follows. For and , find such that
(5) 
where
(6)  
and where the function spaces are defined as
(7)  
Here, the energy seminorm is defined as
(8) 
the space is the dual space of and is a nonlocal trace space. Note that since the kernel is integrable and translation invariant, the energy seminorm is a norm in the constrained space and satisfies a Poincaré inequality [27]. Furthermore, by construction, the bilinear form defines an inner product on and it is continuous and coercive with respect to the energy norm . Finally, the latter is equivalent to the norm; this allows us to establish an equivalence relationship between and . Together with the continuity of , these facts yield the wellposedness of the weak form (5) [27].
As commonly done in the PDE context, we recast the problem in by simply rewriting the solution as , where and is an extension of to zero into , known in the FE framework as a lifting function. Thus, equation (5) can be rewritten in terms of as follows
(9) 
or, equivalently,
(10) 
The latter is useful for implementation purposes as it allows us to automatically take into account the presence of a nonhomogeneous Dirichlet volume constraint.
3 Weak form approximation
We introduce a parametrized approximation of the bilinear form defined in (6) with the purpose of obtaining a weak problem that is computationally less challenging. Specifically, the approximated bilinear form is associated with a parametrized kernel function that is still integrable, radial, and compactly supported, but not discontinuous in . This fact makes the numerical integration of the inner integral in, e.g., (5), a much simpler task, compared to the case of discontinuous kernel functions.
For simplicity of exposition, we rewrite the “exact” kernel as
(11) 
where is a scaling constant that guarantees that the nonlocal operator associated with is such that as . Clearly, by definition, . Given , we approximate the bilinear form defined in (6) with the parametrized bilinear form obtained by replacing the kernel with
(12) 
where is an appropriately scaled mollifier function. Inspired by the mollifier function introduced in [41], for we define as the following radial function
(13)  
where, for given , is such that the nonlocal operator , associated with , converges to as . The constant is also such that it converges to as . Furthermore, it follows from the definition of that
which, together with the property of , implies that converges pointwise to as . Note that the support of the mollifier is bigger than the one of the original kernel function as it corresponds to . The parametrized bilinear form is therefore defined as follows
(14) 
and consequently the approximate weak formulation of the nonlocal volumeconstrained problem now reads: find such that
(15) 
Note that the weak formulation above is defined over the same function spaces of (5). This is allowed because the parametrized kernel belongs to the same class of kernels as . As a consequence, problem (15) is also wellposed.
3.1 Convergence of the approximate weak solution
We find a bound for the energy norm of the difference between solutions of the weak form (5) and (15). First, we note that by subtracting (5) from (15) we obtain
(16) 
Our ultimate goal is to find a bound for , or, equivalently, for , being and solutions of (5) and (15) respectively. We first consider a generic test function; equality (16) implies
By expanding the product, we have
By switching the order of integration and renaming dummy variables in the second and fourth terms above, we obtain
(17)  
where we used the CauchySchwarz inequality for the outer and inner integral for the first and second term, respectively, and where
We recall that, by definition, pointwise as ; thus, as , for . Furthermore, thanks to the properties of and , the integrals above are welldefined.
To obtain the final estimate, we consider
and recall that for the kernels considered in this work the energy norm is equivalent to the norm. In particular there exists a positive constant such that . Thus, we have the following estimatewhere is obtained from the constants in (17). We finally conclude that
(18) 
where the constant is such that as .
4 Finite element formulation
In this section we introduce a FE discretization of problem (10), highlight the associated computational challenges, and describe how the formulation introduced in the previous section helps circumventing them. Let be a shaperegular triangulation of into finite elements ; the latter can either be triangles and/or quadrilaterals in two dimensions and tetrahedra and/or hexaheadra in three dimensions^{g}^{g}gSee [22] for a description of appropriate triangulation techniques for nonlocal problems.. The parameter represents the size of the triangulation and corresponds to the larger element diameter. Also, let be a finite dimensional subspace of of dimension , proportional to , and let be a basis for . In this work we consider Lagrange basis functions over the triangulation . Thus, we can write the FE solution of equation (10) as . By using this expression and , equation (10) reduces to the algebraic system
(19) 
where
is the vector whose components are the degrees of freedom of the numerical solution
, is such that , and is the stiffness matrix with entries(20) 
4.1 Circumventing computational challenges
The computation of the entries of the stiffness matrix raises several diverse challenges. In this work we specifically focus on the challenges related to the presence of the indicator function in the definition of the kernel. Other challenges, such as the presence of singularities in fractionaltype kernels or peridynamics kernels are not considered here. We point out that our method can be combined with any technique that takes into account the presence of the singularity. In fact, while the singularity is located at the center of the ball, the issues considered in this paper arise at the boundary. To make our description clear, we rewrite (20) by explicitly indicating the domain of integration, i.e.
(21)  
where we split the integrals over the elements with the purpose of using composite quadrature rules. In fact, global quadrature rules used, e.g., over the ball for the inner integration are not convenient due to the basis functions’ bounded support [22].
It is evident that the first challenge that one has to face is the integration over partial elements: when the element is not fully contained in the ball, standard quadrature rules such as Gauss quadrature rules defined over are not suitable due to the presence of the discontinuity induced by the indicator function. Thus, it is necessary to determine the intersection regions and define quadrature rules there. This task, while affordable in two dimensions, becomes extremely complex and impractical in three dimensions. Furthermore, when is a Euclidean ball, such regions are curved so that appropriate approximations or quadrature rules for curved domains must be taken into account [22]. A key observation is that these issues do not arise in case of smooth kernel functions, e.g. functions that do not abruptly jump to zero outside of , but that approach zero smoothly. This would allow the use of quadrature rules defined over the whole element , circumventing the issue of determining intersections or integrating over curved regions.
The parametrized kernel introduced in Section 3 is such that the transition to zero happens smoothly (as an example, for constant kernel functions , the kernel function is a piecewise polynomial in . Thus, the inner integration can be performed over the whole element , using accurate enough quadrature rules, without worrying about the presence of a discontinuity. We then propose to solve the approximate, parametrized problem
(22) 
for which the entries of the stiffness matrix, that, with an abuse of notation, we still denote by , read
(23) 
By avoiding the problem of determining intersecting elements, this approach makes threedimensional implementation a much simpler task.
Remark 1
The convergence of the solution to the continuous solution depends on both the discretization parameter and the mollifying parameter . An adaptive quadrature procedure, introduced in the following section, will further contribute to the overall approximation error, as we discuss and illustrate in Section 6.
5 Adaptive quadrature rules
As already pointed out, the use of the mollifier, in place of the characteristic function, removes the difficulty of integrating discontinuous functions. The transition region of the mollifier has thickness
and it is important to choose quadrature rules that can appropriately capture this region, especially if . Note that the presence of the transition region affects the regularity of both the inner and the outer integrands. As we explain below, only one adaptive rule is necessary, applied to either the outer or inner integral. A quadrature rule with few points can be fast but is also inaccurate, one with many points can be accurate but is also expensive, especially in higher dimensions. To this end, adaptive quadrature rules have been proven to be accurate and efficient [41]. The advantage of using an adaptive scheme is that only the portion of the element overlapping with the transition region needs to be recursively refined, considerably reducing the computational time. Moreover, for fixed , each partitioning has the effect of halving the ratio . Thus, it is always possible to chose a number of adaptive refinements such that and for which a quadrature rule with few points is accurate enough. Note that the adaptive quadrature rule devised here is not standard, because the refinement criterion is controlled by the distance between points in the outer and inner integrals. The details of the algorithm are given below.We recall that we denote by the stiffness matrix corresponding to the parametrized bilinear form . It is convenient to rewrite its entries as , where each term is given by
(24)  
(25)  
(26)  
(27) 
The following proposition allows us to express only as a sum of two of the terms above, as we show in Corollary 1.
Proposition 1
Let and , and let be a symmetric function, then
(28) 
Proof 1
reversing the order  
Corollary 1
The entries of the stiffness matrix satisfy the following equality
.
Proof 2
The proof follows from the definition of and Proposition 1. Namely,
Let and denote sets of quadrature points and associated weights representing two different composite quadrature rules for the numerical integration over the region . To preserve the equality between and , and between and we use and to numerically evaluate the inner and outer integrals as follows:
(29)  
(30)  
(31)  
(32) 
According to Corollary 1, only two terms among the ones above need to be computed. We choose to compute and in (31) and (32). We adopt an adaptive scheme for the outer quadrature and Gaussian composite quadrature rules for the inner integration over the elements that intersect the ball. These choices are empirical, i.e. they have been guided by numerical experiments that showed that for and using an adaptive quadrature for the outer integral is more efficient than using it for the inner one.
For every we define
(33) 
and , i.e. the complement of in . For and let denote the subset of composed of those quadrature points and weights obtained by only considering the quadrature points (and associated weights) that lie within element . Then, integrals (31) and (32) can be rewritten as
(34)  
(35) 
Note that in the equations above the terms in the sum are nonzero only when the support of the basis functions intersects the elements. An adaptive quadrature with midpoint refinement is adopted for . The pseudocode that describes the adaptivity algorithm is reported in Algorithm 1: we employ recursive calls with input arguments , , , , and . In each call and are fixed parameters, and represent the minimum and the maximum level of refinement, with . is the current level of refinement. In the initial call and are the ones defined above and , while in the recursive calls these three arguments are subject to changes as described below.

If , then is split in subelements using the midpoint rule and for each of them the adaptive integration function is called again increasing by one and using the same index set .

If integration is performed on for the outer integral and on each element indexed by for the inner integral, without any further refinement for . The numerical integration is performed using standard Gauss Legendre quadrature rules both for the outer and inner integrals.

If , from the index set two new index sets and are extracted, for which is either integrated or further refined. For any in , the maximum distance from to is computed. If this distance is less than , then is added to , otherwise the minimum distance from to is computed. If this distance is less than then is added to the index set . If is nonempty, integration is performed on for the outer integral and on each element indexed by for the inner integral. If is nonempty, then is split in subelements and for each of them the adaptive integration function is called again increasing by one and using as index set.
Algorithm 1
5.1 Approximate maximum and minimum distances between elements
Evaluating the exact distances between two elements can be computationally expensive, especially for unstructured threedimensional meshes. Hence, in practice, we use conservative distances that are simple to compute in place of the maximum and the minimum. Namely, we first loop over the nodes of each element to find the minimum and maximum coordinates in each dimension, denote them by , and by , . Here and are the vectors containing the minimum and maximum coordinates of the bounding box containing . Similarly, and are the vectors containing the minimum and maximum coordinates of the bounding box containing . For each dimension evaluate the two quantities and