As a nonlocal generalization of the Laplacian , the spectral fractional Laplacian with a fraction arises in many areas of applications, such as anomalous diffusion [Carreras2001, Uchaikin2015], turbulent flows [Shlesinger1987], Lévy processes [Jourdain2005], quantum mechanics [Laskin2000], finance [Tankov2003, Cont2005] and pollutant transport [Vazquez2014]. In this paper, we develop a network-based Ritz method for solving fractional Laplacian equations using the Caffarelli-Silvestre extension.
Let be the dimension of the problem and be a bounded Lipschitz domain in . Also, suppose and is a function defined in , we consider the following spectral fractional Laplacian equation with homogeneous Dirichlet condition,
where is defined by the spectral decomposition of with the same boundary conditions. More precisely, we suppose the countable set
where is the standard inner product in . Then for any ,
We remark that another definition of the fractional Laplacian is formulated by integrals with non-local structures, and these two definitions do not coincide. It is difficult to solve fractional Laplacian equations of either definition directly using numerical methods for regular differential equations (e.g., finite difference method or finite element method) due to the non-local property of the fractional operator [Bonito2019_2, Acosta2017]. Instead, one effective approach is to use the Caffarelli-Silvestre extension [Caffarelli2007]. Specifically, let us introduce a scalar variable and consider the following -dimensional problem
where and . Suppose solves (4), then is a solution of (1) [Nochetto2015]. Consequently, one can solve the extended problem (4) with regular derivatives to avoid addressing spectral fractional differential operators, with the extra cost that (i) the dimension is increased from to ; (ii) the domain is extended from the bounded one to an unbounded cylinder . Several methods have been proposed for (4
), such as the tensor product finite elements[Nochetto2015] and the enriched spectral method using Laguerre functions [Chen2020]. We remark that the Caffarelli-Silvestre extension is exclusively for the spectral fractional Laplacian and not for the integral fractional Laplacian. Moreover, the extension technique can be extended to more general fractional symmetric elliptic operators of the form with being symmetric and positive definite and being positive.
However, conventional linear structures such as finite elements and polynomials are usually incapable of high-dimensional approximation in practice. For example, suppose a tensor product linear structure has
basis functions in each dimension, then the total degree of freedom is, which is a huge number if
is moderately large. Such a curse of dimensionality prevents one from using linear algebra structures in high-dimensional problems with. For the spectral fractional Laplacian, most existing methods based on the Caffarelli-Silvestre extension could solve numerical examples of dimension at most two, mainly due to the limitation of storage. Our primary target is to solve many physically relevant problems that appear in three or higher dimensional situations.
In recent years, deep neural networks (DNNs) are widely studied and utilized in applied problems. As a composition of simple neural functions, the DNN has parameters nonlinearly arrayed in the network structure. For a fully-connected DNN with depth , width and dimension of inputs , the total number of parameters is of . Therefore, the degree of freedom increases linearly with the dimension and DNNs are capable of dealing with high-dimensional approximation in practice. Theoretically, it is shown that DNNs have decent approximation properties for particular function spaces (e.g., Barron space). The seminal work of Barron [Barron1992, Barron1993] deduces -norm and -norm approximations of two-layer sigmoid networks. Recent work [Klusowski2018, E2019, E2020, E2020_2, Siegel2020_1, Siegel2020_2, Caragea2020, Lu2021] considers more variants of the network-based approximation for Barron-type spaces. Generally, given a Barron function , there exists a two-layer neural network with width and common-used activations such that
where is the Barron norm of , and can be , or norm under different hypothesis. It is worth mentioning that the above error bound is independent of the dimension of the input variable; hence the network-based approximation can overcome the curse of dimensionality.
In this paper, we solve (4) by Ritz method in which DNNs are employed to approximate the solution. More precisely, we reformulate (4) as a minimal Ritz energy functional problem and characterize the Sobolev space of the weak solution. Next, as a subset of the solution space, a class of DNNs is taken as the hypothesis space of the minimization. We design a special network structure for the DNN class such that (i) it satisfies the homogeneous boundary condition on in (4); (ii) it decays to zero exponentially as ; and (iii) it has a singularity behaves as for integers at . Note that the second and third properties mentioned above are also satisfied by the true solution. Consequently, the special DNNs have better approximation performance than generic DNNs, which is also observed in our numerical experiments.
Theoretically, under a Barron-type framework, we investigate the approximation error between the special DNN class and the solution space under the Sobolev norm, which has a similar form to (5). Based on that, we estimate the error of the proposed Ritz method using the special DNN class, assuming that the true solution has components in the Barron space.
In the implementation, a combination of stochastic and deterministic methods is employed to compute the integrals in the energy functional. Specifically, we utilize the quasi-Monte Carlo method and the sinc quadrature rule [Lund1992] to evaluate the integrals in terms of and , respectively. For the former, due to the potentially high dimension of , Monte Carlo-type methods are effective and easy to implement. For the latter, although the integrand in terms of is one-dimensional, it has a singular term when . While sinc quadrature is highly accurate for integrals with fractional powers and therefore preferred here. By numerical experiments, we demonstrate that our method can solve model problems up to with desired accuracy. To the best of our knowledge, this is the first attempt to solve high-dimensional fractional Laplacian equations by deep learning methods.
The rest of the paper is organized as follows. In Section 2, we reformulate the problem (4) as the minimization of an energy functional and show their equivalence. In Section 3, the fully connected neural networks are introduced. We characterize the special structures of the hypothesis space and discuss its approximation property. In Section 4, we derive the error estimate for the proposed method. Numerical experiments are presented to show the effectiveness of our method in Section 5. Finally, some concluding remarks are given in Section 6.
2 Minimization of Energy Functional
We solve the regular partial differential equation (4) under the framework of Ritz method. The equation can be transformed to an equivalent minimal functional, and we look for Sobolev weak solutions instead of classical solutions. Similarly, one can also solve (4) using Galerkin method by introducing appropriate test spaces such as in [Nochetto2015, Chen2020]. Since learning-based methods aim to find solutions via optimization, the use of Ritz method can be achieved for building such formulation.
2.1 The space of weak solutions
Let be any region and be a positive weight function. We define the weighted space as
equipped with the inner product
and the induced norm
The weight is omitted from the notations if .
We also define the weighted Sobolev space as
equipped with the norm
It is shown in [Miller2001] the solution of the extended problem (4) has a desirable property that it converges exponentially to zero as . Therefore we can define the solution space as
with the norm
Denote the trace for all by
Moreover, for column vectors or vector-valued functions, we useto denote their Euclidean norm.
2.2 Minimal energy functional
We aim to characterize the solution of (4) as a minimizer of an corresponding energy functional. For this, we define the functional
We have the following result. Assume solves (4). Then
To prove (10), for all , . Then using the fact that and integration by parts we have
Since and ,
Using the inequality , it leads to
On the other hand, suppose (10) holds. Fix any and write
Since for each , takes its minimum at , and thus
Using integration by parts we have
Especially, (18) holds for all , which implies
leading to in . And thus by (18)
Especially, takes over all functions in , which leads to in .
3 Neural Network Approximation
In the numerical computation, one aims to introduce a function set with a finite degree of freedom to approximate the solution space , and minimize in this appropriate set of functions. In many physical relevant problems, it is required to address , causing the dimension of no less than 4. Potentially high dimensions impede the usage of conventional linear structures. However, as a nonlinear structure, DNNs can approximate high-dimensional functions by moderately less degree of freedom. This inspires us to use classes of DNNs to approximate especially when is large.
3.1 Fully connected neural network
In our method, we employ the fully connected neural network (FNN) which is one of the most common neural networks in deep learning. Mathematically speaking, let
be an activation function which is applied entry-wise to a vectorto obtain another vector of the same size. Let and for , an FNN is the composition of simple nonlinear functions as follows
where ; with () and for . Here is called the width of the -th layer and is called the depth of the FNN. is the set of all parameters in
to determine the underlying neural network. Common types of activation functions include the sigmoid function. We remark that, when solving -th order differential equations, many existing network-based methods use the ReLU activation function , so that their networks are functions and can be applied by the differential operators. While in the minimization (21), only regularity is required and therefore ReLU networks suffice.
Denote , then it is clear that . Comparatively, the degree of freedom of linear structures such as finite elements and tensor product polynomials increases exponentially with . Hence FNNs are more practicable in high-dimensional approximations. For simplicity, we consider the architecture for all and denote as the set consisting of all FNNs with depth , width and activation function . In the following passage, all functions involving an FNN will be denoted with the superscript .
3.2 Special structures of the approximate class
Recent work [Lu2020, Shen2021] indicates that deep FNNs can approximate smooth functions in -norm within any desired accuracy as long as the depth or width is large enough. The approximation will be more accurate if the target function has higher regularity. However, it is shown in [Capella2011, Chen2020] that the solution of (4) has a singularity at which behaves as for . Therefore, it is not appropriate to naively use the class of generic FNNs. Instead, We aim to develop a special structure based on FNNs for the approximate class.
In the enriched spectral method [Chen2020], the solution of (18) is approximated by a structure consisting of two parts. One part is a linear combination of smooth basis functions, which approximates the regular component of the solution. The other part is a linear tensor product combination of smooth basis functions and a sequence of artificial terms , which is for the singular component of the solution. Following this idea, we use to denote any function in the approximate class, and build its structure as the combination of two parts,
where , are FNN-based smooth functions and the term is introduced to adapt to the singularity at .
Moreover, since the true solution of the extended problem (4) converges to zero as , the functions in the approximate class should also preserve this property. To realize it, we can introduce exponential terms concerning in the structure of . Specifically, we let
where are FNN-based functions and are two auxiliary scalar parameters ensuring that and converges to zero as exponentially.
In the end, as a subset of , the approximate class should also consist of functions satisfying the boundary condition; namely, . We achieve this by setting
where are generic FNNs and is a smooth function constructed particularly to satisfy on . For example, if is a hypercube , then can be chosen as ; if has a boundary characterized by a level set, say , for some continuous function , then can be chosen as . Generally, we can set , where is a particular analytic function satisfying and represents the distant between and . In practice, we expect to set as smooth as possible so that and preserve the same regularity as and , respectively.
To sum up, we build the following special structure for the approximate class,
where is the set of all trainable parameters. We denote as the class of all neural networks having the structure in (5), namely,
It is clear that is a subset of as long as is Lipschitz continuous and has a polynomial growth bound; namely, for all with a constant and an integer . In our Ritz method, is taken as the approximate class of .
3.3 Approximation property
We will show that functions in can be approximated by the special networks in as . To illustrate the approximation property, we first introduce the Barron space and then derive the error bounds for neural-network approximation, assuming that the target function has components in the Barron space. In this section, we specifically focus on the FNNs with ReLU activation and specify . For simplicity, we concatenate variables and by writing .
3.3.1 Barron space
Let us first quickly review the Barron space and norm. We will focus on the definition discussed in [E2020] which represents infinitely wide two-layer ReLU FNNs. Following Section 3.1, recall the set of two-layer ReLU FNNs without output bias is given by
For a probability measureon , we set the function
given this expression exists. For a function , we use to denote the set of all probability measures such that almost everywhere. Then the Barron norm is defined as
The infimum of the empty set is considered as . The set of all functions with finite Barron norm is denoted by . It is shown in [E2020] that equipped with the Barron norm is a Banach space which is called Barron space.
3.4 Error estimation
Let be a function in , and further assume that converges to zero exponentially as . In the error analysis, we make the following assumption that can be factorized explicitly by components vanishing on and decaying to zero exponentially as .
There exist functions , and some number such that
Especially, if Assumption 3.1 holds, we can normalize such that and . Assumption 3.1 is indeed satisfied in some situations. For example, in the case , and , the solution of (4) is given by . Note that by Taylor series , can be written as (10) with
Now we investigate the approximation error between and . It suffices to consider a special subset given by
Note in only the parameters of are free and trainable, while is fixed. Clearly, is a subset of . The following theorem and proof are referred to the result in [Muller2021] for deep Ritz method in a bounded domain.
Let . If Assumption 3.1 is true with and , then there exists some such that
By the definition of the Barron norm, there exsits some probability measure such that a.e. and . For all , using the facts and we have
Since , we have
While is bounded above since
On the other hand, the mapping
is continuous and hence Bochner measurable. Also, (17) leads to
which implies the integral is a Bochner integral.
We note the fact that if
are independent samples from a random variable, then
By similar argument, for independent samples from , we have
In particular, there exists such that
4 Ritz method and Error Estimation
The extended problem (4) can be solved practically by Ritz method. Thanks to the approximation property of the neural network class discussed in Section 3.3, we can replace the hypothesis space with in the minimization (21), obtaining
Then the solution of (1) will be an approximation to the solution of (21). Same as in Section 3.3, we will estimate the solution error for the two-layer ReLU networks; namely, we consider the case that and . The final error of the original problem (1) will thereafter be presented.
4.1 Error estimation
Let be a minimizer of (21); namely,
We first introduce the following Céa Lemma [Miller2001]. Let be a Hilbert space, any subset and a symmetric, continuous and -coercive bilinear form. For define the quadratic energy and denote its unique minimizer by . Then for every it holds that