# A reduced order Schwarz method for nonlinear multiscale elliptic equations based on two-layer neural networks

Neural networks are powerful tools for approximating high dimensional data that have been used in many contexts, including solution of partial differential equations (PDEs). We describe a solver for multiscale fully nonlinear elliptic equations that makes use of domain decomposition, an accelerated Schwarz framework, and two-layer neural networks to approximate the boundary-to-boundary map for the subdomains, which is the key step in the Schwarz procedure. Conventionally, the boundary-to-boundary map requires solution of boundary-value elliptic problems on each subdomain. By leveraging the compressibility of multiscale problems, our approach trains the neural network offline to serve as a surrogate for the usual implementation of the boundary-to-boundary map. Our method is applied to a multiscale semilinear elliptic equation and a multiscale p-Laplace equation. In both cases we demonstrate significant improvement in efficiency as well as good accuracy and generalization performance.


## 1. Introduction

Approximation theory plays a key role in scientific computing, including in the design of numerical PDE solvers. This theory prescribes a certain form of ansatz to approximate a solution to the PDE, allowing derivation of an algebraic problem whose solution yields the coefficients in the ansatz. Various methods are used to fine-tune the process of translation to an algebraic problem, but the accuracy of the computed solution is essentially determined by the underlying approximation theory. New approximation methods have the potential to produce new strategies for numerical solution of PDEs.

During the past decade, driven by some remarkable successes in machine learning, neural networks (NNs) have become popular in many contexts. They are extremely powerful in such areas as computer vision, natural language processing, and games [52, 42]. What kinds of functions are well approximated by NNs, and what are the advantages of using NNs in place of more traditional approximation methods? Some studies [10, 51, 27] have revealed that NNs can represent functions in high dimensional spaces very well. Unlike traditional approximation techniques, the number of NN coefficients needed to represent such functions does not increase exponentially with the dimension; in some sense, they overcome the "curse of dimensionality." This fact opens up many possibilities in scientific computing, where the discretization of high dimensional problems often plays a crucial role. One example is problems from uncertainty quantification, where many random variables are needed to represent a random field, with each random variable essentially adding an extra dimension to the PDE [69, 70, 41, 9]. Techniques that exploit intrinsic low-dimensional structures can be deployed on the resulting high-dimensional problem [38, 14, 11, 23, 45]. Another example comes from PDE problems in which the medium contains structures at multiple scales or is highly oscillatory, so that traditional discretization techniques require a large number of grid points to achieve a prescribed error tolerance. Efficient algorithms must then find ways to handle or compress the many degrees of freedom.

Despite the high dimensionality in these examples, successful algorithms have been developed, albeit specific to certain classes of problems. With the rise of NN approximations, with their advantages in high-dimensional regimes, it is reasonable to investigate whether strategies based on NNs can be developed that may even outperform classical strategies. In this paper, we develop an approach that utilizes a two-layer NN to solve multiscale elliptic PDEs. We test our strategy on two nonlinear problems of this type.

The use of NNs in numerical PDE solvers is no longer a new idea. Two approaches that have been developed are to use NNs to approximate the solutions ([25, 29, 67, 66, 71, 58, 53, 13]) or the solution map ([36, 35, 59, 37, 54, 68]). Due to the complicated and unconventional nature of approximation theory for NNs, it is challenging to perform rigorous numerical analysis, though solid evidence has been presented of the computational efficacy of these approaches.

The remainder of our paper is organized as follows. In Section 2 we formulate the multiscale PDE problem to be studied. We give an overview of our domain decomposition strategy and the general specification of the Schwarz algorithm. In Section 3, we discuss our NN-based approach in detail and justify its use in this setting. We then present our reduced-order Schwarz method based on two-layer neural networks. Numerical evidence is reported in Section 4. Two comprehensive numerical experiments for the semilinear elliptic equation and the p-Laplace equation are discussed, and the efficiency of the methods is evaluated. We make some concluding remarks in Section 5.

## 2. Domain Decomposition and the Schwarz method for multiscale elliptic PDEs

We start by reviewing some key concepts. Section 2.1 describes nonlinear multiscale elliptic PDEs and discusses the homogenization limit for highly oscillatory media. Section 2.2 outlines the domain decomposition framework and the Schwarz iteration strategy.

### 2.1. Nonlinear elliptic equation with multiscale medium

Consider the following general class of nonlinear elliptic PDEs with Dirichlet boundary conditions:

 (2.1) Fϵ(D2uϵ(x), Duϵ(x), uϵ(x), x) = 0,  x ∈ Ω,
       uϵ(x) = ϕ(x),  x ∈ ∂Ω,

where Ω ⊂ Rd is a bounded domain in d-dimensional space, ϵ ≪ 1 represents the small scale, and Fϵ: Sd × Rd × R × Ω → R (where Sd denotes the space of real symmetric d × d matrices) is a smooth function. To ensure ellipticity, we require for all (R, p, u, x) that

 Fϵ(R+Q, p, u, x) ≤ Fϵ(R, p, u, x),

for all symmetric positive semidefinite Q.

This class of problems has fundamental importance in modern science and engineering, in such areas as synthesis of composite materials, discovery of geological structures, and design of aerospace structures. The primary computational challenges behind all these problems lie in the complicated interplay between the nonlinearity and the extremely high number of degrees of freedom necessitated by the smallest scale. We assume that for an appropriately chosen boundary condition ϕ, the PDE (2.1) has a unique viscosity solution uϵ. For details on the theory of fully nonlinear elliptic equations, see for example, [15, 49].

To achieve a desired level of numerical error, classical numerical methods require refined discretization strategies with a mesh width h ≲ ϵ, leading to at least O(ϵ−d) degrees of freedom in the discretized problem. The resulting numerical cost is prohibitive when ϵ is small. The homogenization limit of (2.1) as ϵ → 0 can be specified under additional assumptions, such as when the medium is pseudo-periodic. Let

 (2.2) Fϵ(R, p, u, x) = F(R, p, u, x, x/ϵ)

for some F that is periodic in the last argument with period Y. We have the following theorem.

###### Theorem 2.1 ([34], Theorem 3.3).

Suppose that the nonlinear function F is uniformly elliptic and nondecreasing in u. Let Fϵ be pseudo-periodic as defined in (2.2). The solution uϵ to (2.1) converges uniformly as ϵ → 0 to the unique viscosity solution u∗ of the following equation

 (2.3) ¯F(D2u∗(x), Du∗(x), u∗(x), x) = 0,  x ∈ Ω,
       u∗(x) = ϕ(x),  x ∈ ∂Ω,

where the homogenized nonlinear function ¯F is defined as follows: For a fixed set of arguments (R, p, u, x), there exists a unique real number λ for which the following cell problem has a periodic viscosity solution v:

 (2.4) F(D2yv(y)+R, p, u, x, y) = λ,  y ∈ Rd,
       v(y+Y) = v(y),  y ∈ Rd,

(where Y is the period in the last argument of F). We set ¯F(R, p, u, x) = λ.

This result can be viewed as the extension of a linear homogenization result [5]. Although the medium is highly oscillatory for small ϵ, the solution approaches that of a certain limiting equation with a one-scale structure, as ϵ → 0. In practice, the form of the limit is typically unknown, but this observation has led to an exploration of numerical homogenization algorithms, in which one seeks to capture the limit numerically without resolving the fine scale ϵ. We view this problem as one of manifold reduction. The solution can be "compressed" significantly; its "information" is stored mostly in ¯F, which can be computed from (2.4) using mesh width h = O(1), in contrast to the h ≲ ϵ required to solve (2.1). In other words, the O(ϵ−d)-dimensional solution manifold can potentially be compressed into an O(1)-dimensional solution manifold, up to a small homogenization error that vanishes as ϵ → 0.

###### Remark 1.

Due to the popularity of the elliptic multiscale problem, the literature is rich. For linear elliptic PDEs, many influential methods have been developed, including the multiscale finite element method (MsFEM) [46, 33, 47], the heterogeneous multiscale method (HMM) [24, 3, 28], the generalized finite element method [8, 7], localization methods [60], methods based on random SVD [17, 16, 18, 19], and many others [2, 1, 62, 63, 61, 12, 43]. Many of these methods adopt an offline-online strategy. In the offline stage, local bases that encode the small-scale information and approximate the local solution manifold (space) with few degrees of freedom are constructed. In the online stage, the offline bases are used to compute global solutions on coarse grids, thus reducing online computation requirements drastically over naive approaches. For nonlinear problems, there is less prior work, and almost all methods can be seen as extensions of classical methods [32, 21, 31, 28, 4, 30, 44, 2, 57]. There is no counterpart on the nonlinear solution manifold for a linear basis, so most classical solvers construct local basis functions iteratively, which accounts for a large amount of overhead time. One strategy that avoids repeated online computation of local bases is to adopt an idea from manifold learning [20] based on preparing a dictionary for each local patch in the offline stage to approximate the local solution manifold. The major computational issue for classical multiscale solvers is thus greatly alleviated: Repeated basis computation is reduced to basis searching on the manifold. However, since the method is locally linear, its efficacy depends on the amount of nonlinearity of the underlying PDE. Thus far, the approach is difficult to generalize to fully nonlinear elliptic PDEs, and a more universal methodology is needed to approximate the nonlinear solution map.

### 2.2. Domain decomposition and Schwarz iteration

A popular framework for solving elliptic PDEs is domain decomposition, where the problem is decomposed and solved separately in different subdomains, with boundary conditions chosen iteratively to ensure regularity of the solution across the full domain. This approach is naturally parallelizable, with potential savings in memory and computational cost. It essentially translates the inversion of a large matrix into a composition of inversions of many smaller matrices. The many variants of domain decomposition include the Schwarz iteration strategy that we adopt in this paper. This strategy makes use of a partition-of-unity function that resolves the mismatch between two solutions in adjacent subdomains. We briefly review the method here.

For simplicity we describe the two-dimensional case, d = 2, and assume throughout the paper that Ω is a rectangular domain. The approach partitions the domain into multiple overlapping subdomains, also called patches. It starts with an initial guess of the solution on the boundaries of all subdomains, and solves the Dirichlet problem on each patch. The computed solutions then serve as the boundary conditions for neighboring patches, for purposes of computing the next iteration. The entire process is repeated until convergence.

In the current setting, the overlapping rectangular patches are defined as follows:

 (2.5) Ω=⋃m∈JΩm,withΩm=(x(1)m1,x(2)m1)×(y(1)m2,y(2)m2),

where m = (m1, m2) is a multi-index and J is the collection of the indices

 J={m=(m1,m2):m1=1,…,M1,m2=1,…,M2}.

We plot the setup in Figure 2.1. For each patch Ωm we define the associated partition-of-unity function χm, which satisfies

 (2.6) χm(x) = 0 for x ∈ Ω∖Ωm,  ∑m χm(x) = 1, ∀x ∈ Ω.

We denote by ∂Ωm the boundary of patch Ωm and by N(m) the collection of indices of the neighbors of Ωm. In this 2D case, we have

 (2.7) N(m)={(m1±1,m2)}∪{(m1,m2±1)}⊂J.

Naturally, indices that are out of range, which correspond to patches adjacent to the boundary ∂Ω, are omitted from N(m).
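As a small illustration, the neighbor set (2.7) with out-of-range indices dropped can be computed as follows (a hypothetical helper, assuming 1-based patch indices as in the definition of J):

```python
def neighbors(m, M1, M2):
    """Return N(m) for a 2-D grid of patches, omitting out-of-range indices."""
    m1, m2 = m
    candidates = [(m1 - 1, m2), (m1 + 1, m2), (m1, m2 - 1), (m1, m2 + 1)]
    return [(a, b) for a, b in candidates if 1 <= a <= M1 and 1 <= b <= M2]
```

A corner patch thus has two neighbors, an edge patch three, and an interior patch four.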

In the framework of domain decomposition, the full-domain problem is decomposed into multiple smaller problems supported on the subdomains. Define the local Dirichlet problem on patch Ωm by:

 (2.8) Fϵ(D2uϵm(x), Duϵm(x), uϵm(x), x) = 0,  x ∈ Ωm,
       uϵm(x) = ϕm(x),  x ∈ ∂Ωm.

For this local problem, we define the following operators:

• Sϵm is the solution operator that maps the local boundary condition ϕm to the local solution uϵm:

 uϵm = Sϵm ϕm.

Denoting by dm the number of grid points on the boundary ∂Ωm and by Nm the number of grid points in the subdomain Ωm, Sϵm maps Rdm to RNm.

• Il,m denotes the restriction (or trace-taking) operator that restricts the solution within Ωm to its part that overlaps with the boundary of Ωl, for all l ∈ N(m). That is,

 Il,m uϵm = uϵm|∂Ωl∩Ωm.

Denoting by dl,m the number of grid points in ∂Ωl ∩ Ωm, Il,m maps RNm to Rdl,m.

• Qϵl,m is the composition of Il,m and Sϵm. It is a boundary-to-boundary operator that maps the local boundary condition ϕm to the restricted solution uϵm|∂Ωl∩Ωm:

 Qϵl,m ϕm = Il,m Sϵm ϕm = uϵm|∂Ωl∩Ωm.

Qϵl,m maps Rdm to Rdl,m.

• Qϵm denotes the collection of all segments of boundary conditions that are computed from the local boundary condition ϕm:

 (2.9) Qϵm ϕm = ⨁l∈N(m) ψl,m = ⨁l∈N(m) Qϵl,m ϕm = ⨁l∈N(m) Il,m Sϵm ϕm.

Letting cm = ∑l∈N(m) dl,m, Qϵm maps Rdm to Rcm.

The Schwarz procedure starts by making a guess ϕ(0)m of the boundary condition on each ∂Ωm. At the nth iteration, (2.8) is solved on each subdomain (possibly in parallel) and these solutions are used to define new boundary conditions for the neighboring subdomains Ωl, l ∈ N(m). The boundary conditions for Ωm at iteration n+1 are thus:

 (2.10) ϕ(n+1)m = ψ(n)m,l = Im,l Sϵl ϕ(n)l,  on ∂Ωm ∩ Ωl, l ∈ N(m),
        ϕ(n+1)m = ϕ|∂Ωm∩∂Ω,  on ∂Ωm ∩ ∂Ω.

Note that the physical full-domain boundary condition ϕ is imposed on the points in ∂Ωm ∩ ∂Ω. Each iteration of the Schwarz procedure can be viewed as an application of the maps {Qϵm}m∈J. The procedure concludes by patching up the local solutions from the subdomains. The overall algorithm is summarized in Algorithm 1.
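To make the update (2.10) concrete, the following toy sketch runs alternating Schwarz on the 1-D Laplace problem −u″ = 0 on (0,1), u(0) = 0, u(1) = 1, whose exact solution is u(x) = x. It is only an illustration, not the paper's solver: on each 1-D subdomain the local Dirichlet solve is simply linear interpolation between the endpoint values, and the two overlapping subdomains [0, b] and [a, 1] exchange their values at the interface points a and b.

```python
def schwarz_1d(a=0.4, b=0.6, tol=1e-12, max_iter=100):
    """Alternating Schwarz for -u'' = 0, u(0)=0, u(1)=1, exact solution u(x)=x."""
    u_at_a = 0.0   # current guess of the solution at x = a (left edge of patch 2)
    u_at_b = 0.0   # current guess of the solution at x = b (right edge of patch 1)
    for n in range(max_iter):
        # Solve on [0, b]: linear between u(0)=0 and current u(b); evaluate at a.
        new_u_at_a = u_at_b * a / b
        # Solve on [a, 1]: linear between new u(a) and u(1)=1; evaluate at b.
        new_u_at_b = new_u_at_a + (1.0 - new_u_at_a) * (b - a) / (1.0 - a)
        if abs(new_u_at_a - u_at_a) + abs(new_u_at_b - u_at_b) < tol:
            break
        u_at_a, u_at_b = new_u_at_a, new_u_at_b
    return u_at_a, u_at_b, n
```

The interface values converge geometrically to the exact values u(a) = a and u(b) = b, mirroring the convergence of the full procedure.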

The convergence of classical Schwarz iteration is guaranteed for fully nonlinear elliptic equations; see, for example, [56, 55, 39]. Since the computation of the local solution uϵm can be expensive due to the nonlinearity and the oscillation of the medium at the small scale ϵ, the major computational cost of the Schwarz iteration comes from the repeated evaluation of the boundary-to-boundary maps Qϵm, each of which requires solution of an elliptic PDE on a subdomain.

## 3. Reduced order Schwarz method based on neural networks

The major numerical expense in the Schwarz iteration comes from the local PDE solves, one per subdomain per iteration. However, except at the final step where we assemble the global solution, our interest is not in the local solutions per se: It is in the boundary-to-boundary maps that share information between adjacent subdomains on each Schwarz iteration. If we can implement these maps directly, we can eliminate the need for local PDE solves. To this end, we propose an offline-online procedure. In the offline stage, we implement the boundary-to-boundary maps, and in the online stage, we call these maps repeatedly in the Schwarz framework. This approach is summarized in Algorithm 2. In this description, we replace the boundary-to-boundary map Qϵm by a surrogate QNNm(θm), which is a neural network parametrized by weights θm, whose values are found by an offline training process.

Since the online stage is self-explanatory, we focus on the offline stage, and study how to obtain the approximation QNNm to Qϵm.

### 3.1. Two observations

A rigorous approach to preparing the boundary-to-boundary map in the offline stage is not straightforward. In the case of linear PDEs, it amounts to computing all Green's functions in the local subdomains and restricting them to the adjacent subdomain boundaries; see [19]. When the PDEs are nonlinear, there would seem to be no alternative to solving the local PDEs with all possible configurations of the boundary conditions, applying the appropriate restrictions, and storing the results. At the discrete level, Qϵm would be represented as a high-dimensional function mapping Rdm to Rcm. To achieve a specified accuracy, both dm and cm need to scale as O(ϵ−1). For brute-force training, a number of local PDE solves exponential in dm would need to be performed to compute the required approximation to Qϵm. This is a large amount of computation, and it offsets whatever gains accrue in the online stage from efficient deployment of the approximation to Qϵm.

To be cost-effective, a method of the form of Algorithm 2 must exploit additional properties, intrinsic to Qϵm and to the scheme for approximating this mapping. The first such property is a direct consequence of homogenization. As argued in Section 2.1, the solution of the effective equation (2.3) can preserve the ground truth well, with the effective equation independent of ϵ. Therefore, the map Qϵm, though presented as a mapping from Rdm to Rcm, is intrinsically of low dimension and can be compressed. To visualize this property, we plot the relative singular values of the boundary-to-boundary operator of a linear multiscale elliptic equation (see (4.4)) in Figure 3.1.

With the system being of intrinsically low dimension, we expect that a compression mechanism can be deployed. Even though the data itself is represented in high dimension, the number of parameters in the compressed representation should not grow too rapidly with the order of discretization. We seek an approximation strategy that can overcome the "curse of dimensionality." These considerations lead us to the use of neural networks (NNs). NNs, unlike other approximation techniques, are powerful in learning functions supported in high dimensional space; the number of parameters that must be tuned to fit data in a high dimensional space typically does not grow exponentially with the dimension of the data.

Consider a fully connected feedforward neural network (FCNN) representing a function f: Rn → R. A 2-layer FCNN with hidden-layer width h has the form

 (3.1) fNN(x) = W2 σ(W1 x + b1) + b2,  x ∈ Rn,

where W1 ∈ Rh×n, W2 ∈ R1×h are weight matrices and b1 ∈ Rh, b2 ∈ R are biases. The activation function σ is applied component-wise to its argument. (The ReLU activation function σ(z) = max(z, 0) is especially popular.) This 2-layer FCNN can already represent high dimensional functions. A fundamental approximation result [51, 26, 10] is captured in the following theorem.

###### Theorem 3.1 (Barron’s Theorem).

Let D ⊂ Rn be a bounded domain. Suppose a generic function f: Rn → R satisfies

 (3.2) Δ(f) = ∫Rn ∥ω∥21 |ˆf(ω)| dω < ∞,

where ˆf is the Fourier transform of the zero extension of f to Rn. Then there exists a two-layer ReLU neural network fNN with h hidden-layer neurons such that

 (3.3) ∥f − fNN∥L2(D) ≲ Δ(f)/√h.

A natural high dimensional extension of the result is as follows.

###### Corollary 3.1.

Let D ⊂ Rn be a bounded domain. Suppose a generic function f = (f1, …, fm): D → Rm is such that each component fi satisfies (3.2). Then there exists a two-layer ReLU neural network fNN with h hidden-layer neurons such that

 (3.4) ∥f − fNN∥L2(D) ≲ √(∑mi=1 Δ2(fi) · m/h) ≤ m Δ(f)/√h,

where Δ(f) = maxi Δ(fi).

A nice feature of this result is that the approximation error depends only mildly on the dimension of the problem, making NNs a good fit for our purposes. In our setting, it is the high-dimensional operator Qϵm that needs to be learned. Theorem 3.1 suggests that if an FCNN is used as the representation, the number of neurons required will not depend strongly on this dimension.
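To make the form (3.1) concrete, here is a minimal NumPy sketch of a two-layer ReLU network; the dimensions n and h and the random weights are illustrative only:

```python
import numpy as np

def relu(z):
    # component-wise ReLU activation sigma(z) = max(z, 0)
    return np.maximum(z, 0.0)

def f_nn(x, W1, b1, W2, b2):
    # two-layer FCNN: f_NN(x) = W2 sigma(W1 x + b1) + b2, as in (3.1)
    return W2 @ relu(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
n, h = 4, 16                               # input dimension and hidden width
W1, b1 = rng.normal(size=(h, n)), rng.normal(size=h)
W2, b2 = rng.normal(size=(1, h)), rng.normal(size=1)
y = f_nn(rng.normal(size=n), W1, b1, W2, b2)   # scalar output, shape (1,)
```

A vector-valued output, as needed for the operator surrogate below, is obtained simply by widening W2 and b2.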

### 3.2. Offline training and the full algorithm

The two observations above suggest that using a neural-network approximation for the boundary-to-boundary operator can reduce computation costs and memory significantly. Following (3.1), we define the NN approximation QNNm(θm) to Qϵm as follows:

 (3.5) QNNm(θm)ϕm=Wm,2σ(Wm,1ϕm+bm,1)+bm,2,where ϕm∈Rdm.

Here θm = (Wm,1, Wm,2, bm,1, bm,2) denotes all learnable parameters, with weight matrices Wm,1 ∈ Rhm×dm, Wm,2 ∈ Rcm×hm and biases bm,1 ∈ Rhm, bm,2 ∈ Rcm. The number of neurons hm is a tunable parameter that captures the intrinsic dimension of Qϵm. Theorem 3.1 and the homogenizability of the elliptic equation suggest that hm can be chosen to satisfy a prescribed approximation error while being independent of both dm and cm, and thus of the small scale ϵ.

Given a fixed NN architecture and a data set, the identification of the optimal θm amounts to minimizing a loss function L(θm) that measures the misfit between the data and the prediction. One needs to prepare a set of input data Xm = {ϕm,i}Ni=1 and corresponding outputs

 (3.6) Ym = {ψm,i = Qϵm ϕm,i = (ψl,m,i)l∈N(m) = (uϵm,i|∂Ωl∩Ωm)l∈N(m)}Ni=1,

where uϵm,i solves (2.8) with boundary condition ϕm,i. The loss function to be minimized is

 (3.7) L(θm) = ∑Ni=1 l(QNNm(θm)ϕm,i, ψm,i),

where l evaluates the mismatch between its first and second arguments. (This measure could be defined using, for example, a discrete L2 norm, possibly augmented with a derivative-mismatch term.) Gradient-based algorithms for minimizing (3.7) have the general form

 (3.8) θ(t+1)m←θ(t)m−ηtGt(∇θmL(θ(t)m),…,∇θmL(θ(1)m)),

where ηt is the learning rate and Gt is based on all gradients seen so far. For example, for the Adam optimizer [50], the function Gt is a normalized exponentially decaying average of gradients:

 (3.9) Gt(at,…,a1)∝(1−βt1)−1t∑s=1βt−s1(1−β1)as,

for some parameter β1 ∈ (0, 1). The ∝ sign indicates that Gt is further normalized; in Adam, the normalization uses a similar decaying average of the squared gradients.
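As a sanity check on (3.9), the following sketch implements the bias-corrected decaying average (before Adam's additional normalization by second moments); for a constant gradient stream it returns exactly that constant:

```python
def decayed_average(grads, beta1=0.9):
    """Bias-corrected decaying average of (3.9):
    (1 - beta1**t)^{-1} * sum_{s=1}^t beta1**(t-s) * (1 - beta1) * a_s."""
    t = len(grads)
    num = sum(beta1 ** (t - s) * (1.0 - beta1) * a
              for s, a in enumerate(grads, start=1))
    return num / (1.0 - beta1 ** t)
```

The factor (1 − β1^t)^{-1} removes the bias toward zero in early iterations, which is why a constant stream is reproduced without attenuation.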

Like many optimization processes, the training and tuning of this NN depend on some prior knowledge. We propose a mechanism to select training data that represent well the information in Qϵm. We also initialize the weights according to a reduced linear problem. These mechanisms are described in the following two subsections; their effectiveness in numerical testing is demonstrated in Section 4.

#### 3.2.1. Generating training data

To learn the parameters in the NN approximation to the boundary-to-boundary map, one needs to provide a training set of examples of the map. We generate such examples by adding a boundary margin to each interior patch Ωm to obtain an enlarged patch ¯Ωm, as shown in Figure 3.2. Samples are generated by choosing Dirichlet conditions for the enlarged patch, then solving the equation, and defining the map in terms of restrictions of both input and output to the appropriate boundaries.

Specifically, following [20], we generate N i.i.d. samples ¯ϕm,i of the boundary conditions for the enlarged patch. (The distribution of the samples is uniform in angle and satisfies a power law in the radius: each sample is written as ¯ϕ = rω, with ω uniformly distributed on the unit sphere and r drawn from a power-law density, with norms measured in a discrete norm weighted by the grid step size.) We then solve the following equations for i = 1, …, N:

 (3.10) Fϵ(D2¯uϵm,i(x), D¯uϵm,i(x), ¯uϵm,i(x), x) = 0,  x ∈ ¯Ωm,
        ¯uϵm,i(x) = ¯ϕm,i(x),  x ∈ ∂¯Ωm.

The boundary-to-boundary map Qϵm maps each element of Xm = {ϕm,i}Ni=1 to the corresponding element of Ym = {ψm,i}Ni=1, where

 (3.11) ϕm,i = ¯uϵm,i|∂Ωm,  ψm,i = (ψl,m,i)l∈N(m) = (¯uϵm,i|∂Ωl∩Ωm)l∈N(m).

This pair of sets, the input set Xm and the output set Ym, serves as the training data.

#### 3.2.2. Initialization

The training problem of minimizing L(θm) in (3.7) to obtain the NN approximate operator is nonconvex, so a good initialization scheme can improve the performance of a gradient-based optimization scheme significantly. We can make use of knowledge about the PDE to obtain good starting points. Our strategy is to assign good initial weights and biases for the neural network using a linearization of the fully nonlinear elliptic equation (2.1). Denoting by QLm the boundary-to-boundary operator of a linearized version of (2.1), to be made specific below for the numerical examples in Section 4, we initialize QNNm in a way that approximately captures QLm. The linear boundary-to-boundary operator QLm has a matrix representation. Denoting by rm its approximate rank (up to a preset error tolerance), we can write

 (3.12) QLm≈Um,rmΛrmV⊤m,rm=(Um,rm√Λrm)(Vm,rm√Λrm)⊤,

where Um,rm and Vm,rm have orthonormal columns while Λrm is diagonal. As argued in [19], due to the fact that the underlying equation is homogenizable, this rank is much smaller than min(dm, cm), and is independent of the small scale ϵ.

To start the iteration (3.8), we compare (3.5) with the form of (3.12). This suggests the following settings of parameters in (3.5): hm = 2rm, bm,1 = 0, bm,2 = 0, and

 (3.13) Wm,1 =[Vm,rm√Λrm,−Vm,rm√Λrm]⊤, Wm,2 =[Um,rm√Λrm,−Um,rm√Λrm].

Note that with these values and ReLU activation, QNNm(θm)ϕm = QLm ϕm, since σ(z) − σ(−z) = z. These configurations will be used as the initial iterate in (3.8).
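The identity behind this initialization can be verified numerically: with ReLU, σ(z) − σ(−z) = z, so the network initialized by (3.13) with zero biases reproduces the linear operator exactly. A NumPy sketch, with a synthetic low-rank matrix standing in for QLm and all dimensions illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, c, r = 20, 12, 3                      # input dim, output dim, numerical rank
QL = rng.normal(size=(c, r)) @ rng.normal(size=(r, d))   # synthetic rank-r operator

# Truncated SVD factorization QL ~= U Lambda V^T, as in (3.12).
U, s, Vt = np.linalg.svd(QL, full_matrices=False)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
sqrtL = np.sqrt(s)

# Initialization (3.13): W1 = [V sqrt(Lambda), -V sqrt(Lambda)]^T, shape (2r, d);
#                        W2 = [U sqrt(Lambda), -U sqrt(Lambda)],   shape (c, 2r).
W1 = np.vstack([sqrtL[:, None] * Vt, -sqrtL[:, None] * Vt])
W2 = np.hstack([U * sqrtL, -U * sqrtL])

phi = rng.normal(size=d)
out = W2 @ np.maximum(W1 @ phi, 0.0)     # equals QL @ phi up to roundoff
```

Training then refines this linear surrogate to capture the nonlinear corrections.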

We summarize our offline training method in Algorithm 3. Integration into the full algorithm yields the reduced-order neural-network-based Schwarz iteration method.

## 4. Numerical results

We present numerical examples using our proposed method to solve a multiscale semilinear elliptic equation and a multiscale p-Laplace equation. In both examples, we use the unit square domain Ω = [0,1]2. To form the partitioning, Ω is divided into M1 × M2 equal non-overlapping rectangles, and each rectangle is then enlarged by Δxo on the sides that do not intersect ∂Ω, to create overlap. We thus have

 Ωm = [max((m1−1)/M1 − Δxo, 0), min(m1/M1 + Δxo, 1)]
    × [max((m2−1)/M2 − Δxo, 0), min(m2/M2 + Δxo, 1)],  m = (m1, m2) ∈ J.
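In code, this patch geometry reads as follows (a hypothetical helper, assuming the unit square and 1-based index m):

```python
def patch(m, M1, M2, dxo):
    """Rectangle Omega_m: non-overlapping cell (1/M1 x 1/M2), enlarged by dxo
    on sides that stay inside [0,1]^2; clipped at the domain boundary."""
    m1, m2 = m
    return (
        (max((m1 - 1) / M1 - dxo, 0.0), min(m1 / M1 + dxo, 1.0)),
        (max((m2 - 1) / M2 - dxo, 0.0), min(m2 / M2 + dxo, 1.0)),
    )
```

The max/min clipping implements the rule that no margin is added on sides touching ∂Ω.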

The loss function is defined as in (3.7). For training to obtain θm, we use PyTorch [64]. For both examples, each neural network is trained for 5,000 epochs using shuffled mini-batch gradient descent with a batch size of 5% of the training-set size. The Adam optimizer is used with default settings, and the learning rate decays by a factor of 0.9 every 200 epochs.
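The stepped learning-rate decay just described (assuming a simple multiplicative step schedule) can be expressed as a pure function of the epoch index; a sketch:

```python
def lr_at(epoch, lr0=1e-3, decay=0.9, step=200):
    """Learning rate after `epoch` epochs: multiplied by `decay` every `step`
    epochs, starting from lr0 (the base rate lr0 here is illustrative)."""
    return lr0 * decay ** (epoch // step)
```

In PyTorch this schedule corresponds to a step-decay scheduler applied to the Adam optimizer.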

### 4.1. Semilinear elliptic equations

The first example is the semilinear elliptic equation

 (4.1) −∇⋅(κϵ(x)∇uϵ(x)) + uϵ(x)3 = 0,  x ∈ Ω,
       uϵ(x) = ϕ(x),  x ∈ ∂Ω,

with oscillatory medium defined by

 (4.2) κϵ(x1, x2) = 2 + sin(2πx1)cos(2πx2) + (2 + 1.8 sin(2πx1/ϵ))/(2 + 1.8 cos(2πx2/ϵ)) + (2 + sin(2πx2/ϵ))/(2 + 1.8 cos(2πx1/ϵ)).

The medium is plotted in Figure 4.1.

The reference solution and the local PDE solves are computed using a standard finite-volume scheme on a uniform grid, with Newton's method used to solve the resulting algebraic problem. For our domain decomposition approach, we choose M1 and M2 to define the patches Ωm, with boundary margins Δxo forming the overlaps. The input and output dimensions dm and cm of QNNm follow from this discretization.

To obtain the training data, each patch Ωm is further enlarged to a buffered patch ¯Ωm by adding a margin to ∂Ωm. On each patch, N samples are generated with random boundary conditions drawn from the distribution described in Section 3.2.1. To train the NN, we use the loss function (3.7) with

 (4.3) l(QNNm(θm)ϕm,i, ψm,i) = ∥QNNm(θm)ϕm,i − ψm,i∥2 + μ ∑l∈N(m) ∥Dh QNNl,m(θm)ϕm,i − Dh ψl,m,i∥2,

where Dh is the discrete version of the derivative operator with step size h. The second term measures mismatch in the derivative.
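A minimal sketch of this per-sample loss, with Dh taken as a first-order forward difference (an assumption; the stencil is not specified above):

```python
import numpy as np

def Dh(v, h):
    # forward-difference approximation of the derivative along the boundary
    return (v[1:] - v[:-1]) / h

def sample_loss(pred, target, h=0.1, mu=1.0):
    # squared L2 mismatch plus mu-weighted derivative mismatch, as in (4.3);
    # here pred/target stand for one boundary segment rather than the full
    # collection over l in N(m)
    return (np.sum((pred - target) ** 2)
            + mu * np.sum((Dh(pred, h) - Dh(target, h)) ** 2))
```

The derivative term penalizes predictions with the right values but the wrong slope, which matters because the outputs feed back into Dirichlet data for neighboring patches.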

To initialize the neural networks, we take to be the boundary-to-boundary operator of the following linear elliptic equation

 (4.4) −∇⋅(κϵ(x)∇uϵ(x))=0,x∈Ω.

We truncate the rank representation of QLm at rank rm to preserve all singular values bigger than a preset tolerance.

#### 4.1.1. Offline training

We show the improvements in the offline training process due to the two strategies described in Section 3.2: the use of enlarged (buffered) patches, and initialization using the SVD of a matrix representation of a linearized equation. Figure 4.2 plots the training loss (4.3) against the number of epochs for one patch in four different settings: SVD initialization with and without buffered training data, and the corresponding settings without SVD initialization. The same NN model is used in all four settings. The error decays much faster during training when the buffer zone is adopted, and the SVD initialization gives a much smaller error than random initialization.

To assess the generalization performance of the resulting trained NN, we generate a test set of 1,000 samples from the same distribution as the buffered training data, for the same patch. Since the NNs trained using non-buffered data produce larger errors, we test only the NNs trained with buffered data. The test errors (4.3) during the training process are plotted in Figure 4.3 for the different models. Again, the use of buffered data along with SVD initialization yields the best performance.

To illustrate generalization further, we plot the predicted outputs for two typical examples from the test set in Figure 4.4. For comparison, we also plot the outputs produced by the randomly initialized neural network and by the linear operator QLm. The low-rank SVD-initialized neural network has the best performance among these alternatives.

We note too that the neural network models initialized by the SVD of linearized PDEs tend to be more interpretable. Figure 4.5 shows the final weight matrices for models initialized by different methods. The SVD-initialized model yields weight matrices with recognizable structure: the parameters for higher modes are near zero, and only the top modes in the positive and negative halves are nontrivial. By comparison, the trained weight matrices obtained from randomly initialized parameters do not show any pattern or structure.

#### 4.1.2. Online phase: Schwarz iteration

We show results obtained by using the NN approximation of the boundary-to-boundary map inside the Schwarz iteration. Table 1 shows the boundary conditions used for the three problems we tested. (The same medium (4.2) is used in all cases.) We fix the tolerance in Algorithm 2 and use the full-accuracy local solvers as in the generation of the training data set. In Figure 4.6, we plot the ground-truth solutions for the different boundary conditions and the absolute error of the solutions obtained by neural-network-based Schwarz iteration. (Note that the vertical scale of the error plots differs from that of the solution plots.) The relative errors obtained for the four variants of NN approximation, along with the linear approximation to the boundary-to-boundary map, can be found in Tables 2 and 3. The smallest errors are attained by the variant that uses the SVD initialization and buffered patches. To demonstrate the efficiency of our method, we compare the CPU time of the neural-network-based Schwarz method and the classical Schwarz method, using the same tolerance for the latter. The NNs used in this test are trained with SVD initialization on data generated with a buffer zone. Since the NN-produced local boundary-to-boundary map is only an approximation to the ground truth, for a fair comparison we also run the reference local solver with a relaxed accuracy requirement. CPU times, iteration counts, and errors are compared in Table 4. In all three test cases, the NN surrogate executes faster than the conventional local-solve implementation of the boundary-to-boundary map, while producing errors of the same order.

### 4.2. p-Laplace equations

The second example concerns the multiscale p-Laplace elliptic equation [6, 40, 65, 22, 57] defined as follows:

 (4.5) −∇⋅(κϵ(x)|∇uϵ|p−2∇uϵ(x)) = 0,  x ∈ Ω,
       uϵ(x) = ϕ(x),  x ∈ ∂Ω,

where the value of p used in this section is fixed, and the oscillatory medium is

 (4.6) κϵ