
# Augmented Physics-Informed Neural Networks (APINNs): A gating network-based soft domain decomposition methodology

In this paper, we propose the augmented physics-informed neural network (APINN), which adopts soft and trainable domain decomposition and flexible parameter sharing to further improve the extended PINN (XPINN) as well as the vanilla PINN methods. In particular, a trainable gate network is employed to mimic the hard decomposition of XPINN, and it can be flexibly fine-tuned to discover a potentially better partition. It weight-averages several sub-nets to form the output of APINN. APINN does not require complex interface conditions, and its sub-nets can take advantage of all training samples rather than just part of the training data in their subdomains. Lastly, each sub-net shares part of the common parameters to capture the similar components in each decomposed function. Furthermore, following the PINN generalization theory in Hu et al. [2021], we show that APINN can improve generalization through proper gate network initialization and general domain and function decomposition. Extensive experiments on different types of PDEs demonstrate how APINN improves the PINN and XPINN methods. Specifically, we present examples where XPINN performs similarly to or worse than PINN, so that APINN can significantly improve both. We also show cases where XPINN is already better than PINN, so that APINN can still slightly improve XPINN. Furthermore, we visualize the optimized gating networks and their optimization trajectories, and connect them with their performance, which helps discover the possibly optimal decomposition. Interestingly, when initialized with different decompositions, the performance of the corresponding APINNs can differ drastically. This, in turn, shows the potential to design an optimal domain decomposition for the differential equation problem under consideration.


## 1 Introduction

Deep learning has become popular in scientific computing and is widely adopted in solving forward and inverse problems involving partial differential equations (PDEs). The physics-informed neural network (PINN) raissi2019physics is one of the seminal works in utilizing deep neural networks to approximate PDE solutions by optimizing them to satisfy the data and the physical laws governed by the PDE. Furthermore, the extended PINN (XPINN) jagtap2020extended is a follow-up work of PINN, which first proposed space-time domain decomposition to partition the domain into several subdomains, where several sub-nets are employed to approximate the solution on their subdomains, while the solution continuity between them is enforced via interface losses. Its output is then the ensemble of all sub-nets. The theoretical analysis of when XPINNs can improve generalization over PINNs is of great interest. The recent work of Hu et al. hu2021extended analyzes the trade-off in XPINN generalization between the simplicity of the decomposed target function in each subdomain and the overfitting effect due to less available training data in each subdomain; these counterbalance each other to determine whether XPINN can improve generalization over PINN. However, sometimes the negative overfitting effect incurred by the less available training data in each subdomain dominates the positive effect of simpler partitioned target functions. Furthermore, XPINNs may also suffer from relatively large errors at the interfaces between subdomains, which degrades their overall performance.

In this paper, we propose the Augmented PINN (APINN), which employs a gate network for soft domain partitioning that mimics the hard XPINN decomposition and can be fine-tuned for a better decomposition. The gate network eliminates the need for interface losses and weight-averages several sub-nets to form the output of APINN, where each sub-net is able to utilize all training samples in the domain in order to prevent overfitting. Moreover, APINN adopts an efficient partial parameter sharing scheme for sub-nets, to capture the similar components in each decomposed function. To further understand the benefits of APINN, we follow the theory in hu2021extended to analyze the generalization bound of APINN, compared to those of PINN and XPINN, which justifies our intuitive understanding of the advantages of APINN. Concretely, generalization bounds for APINNs with trainable or fixed gate networks are derived, which show the advantages of soft and trainable domain and function decomposition in APINN. We also perform extensive experiments on several PDEs that validate the effectiveness of APINN. Specifically, we present examples where XPINN performs similarly to or worse than PINN, so that APINN can significantly improve both. Moreover, we present cases where XPINN is already much better than PINN, but APINN can still slightly improve XPINN. In addition to the superior performance of APINN, we also visualize the optimized gating networks and the optimization trajectories, and then relate their shapes to their performances to select the potentially best decomposition. We show that if APINN is initialized with the optimal decomposition, then it can perform even better, which suggests strategies for designing the optimal domain decomposition for a given PDE problem.

## 2 Related Work

The PINN raissi2019physics is one of the pioneering frameworks that employs deep learning techniques to solve forward and inverse problems governed by parametrized PDEs. PINN has been successfully used to solve many problems in the field of computational science since its initial publication; for more information, see raissi2018hidden; yang2019adversarial; jagtap2022deep; haghighat2021physics; jagtap2022deepKNN. The original idea of domain decomposition in the PINN method was proposed in jagtap2020conservative for nonlinear conservation laws and named Conservative PINN (CPINN). In subsequent work, the same authors proposed XPINN jagtap2020extended for general space-time domain decomposition, where there is a sub-PINN on each sub-domain for fitting the target function on that sub-domain, while the continuity between sub-PINNs is enforced via additional interface losses (penalty terms). The Parallel PINN shukla2021parallel is the follow-up work of CPINN and XPINN, where CPINN and XPINN are trained on multiple GPUs or CPUs simultaneously. Parareal PINN meng2020ppinn decomposes a long time domain into several short-time subdomains, which can be efficiently solved by a coarse-grained (CG) solver and PINN, so that Parareal PINN shows an obvious speedup over PINN for long-time integration of PDEs. The main limitation of Parareal PINN is that it cannot be applied to all types of PDEs. The hp-VPINN kharazmi2021hp proposes a variational PINN method that decomposes the domain through the definition of a new set of test functions, while the trial functions are still neural networks defined over the whole domain. DDM li2020deep uses the Schwarz method for overlapping domain decomposition and trains the sub-nets iteratively rather than in parallel like XPINNs and CPINNs. Also, mercier2021coarse extends DDM through coarse space acceleration for improved convergence across a growing number of domains. li2022deep also uses the Schwarz method, but the sub-nets are multi-Fourier feature networks instead.

The finite basis PINN (FBPINN) moseley2021finite proposes dividing the domain into several small, overlapping sub-domains, with a PINN on each of them. Although FBPINN eliminates the need for interface conditions, our model differs from it in the following aspects. First, our domain decomposition is flexible and trainable, while FBPINN fixes the decomposition. Second, FBPINN does not employ parameter sharing among sub-networks to improve parameter efficiency. Moreover, the overlapping subdomains in FBPINN can become computationally costly for multi-dimensional problems. The penalty-free neural network (PFNN) sheng2022pfnn also proposes overlapping domain decomposition, and differs from our model for the same reasons as FBPINN. The GatedPINN stiller2020large adopts the idea of a mixture of experts (MoE) shazeer2017outrageously to modify XPINNs. Although it also uses a gate network to weight-average several sub-PINNs, GatedPINN differs from our APINN because its gate function is randomly initialized, while that in our APINN is pretrained on an XPINN domain decomposition. Furthermore, GatedPINN does not consider efficient parameter sharing for sub-nets to improve model expressiveness. dong2021local; DWIVEDI2021299 also propose a similar idea of domain decomposition as in XPINNs. However, they use extreme learning machines (ELMs) to replace the neural networks in XPINNs, where only the parameters of the last layer are trained. Based on variational principles and the deep Ritz method, D3M li2019d3m further combines the Schwarz method for overlapping domain decomposition. Compared to our trainable domain decomposition, the domains in D3M are fixed during optimization. To learn optimal modifications on the interfaces of different sub-domains, taghibakhshi2022learning proposes using graph neural networks and unsupervised learning.

heinlein2021combining presents a review of domain decomposition methods for numerical PDEs.

The first comprehensive theoretical analysis of PINNs as well as XPINNs for a prototypical nonlinear PDE, the Navier-Stokes equations, is presented in Ryck2021ErrorAF. The generalization abilities of PINNs and XPINNs are theoretically analyzed in hu2021extended, while the generalization and optimization capabilities of deep neural networks have been analyzed in the general field of deep learning in kawaguchi2018generalization; kawaguchi2022robustness; kawaguchi2016deep; kawaguchi2019depth; xu2021optimization; kawaguchi2021theory; kawaguchi2022understanding.

## 3 Problem Definition and Background

### 3.1 Problem Definition

We consider partial differential equations (PDEs) defined on a bounded domain $\Omega \subset \mathbb{R}^d$, of the following form:

$$\mathcal{L}u^*(x) = f(x) \ \text{in } \Omega, \qquad u^*(x) = g(x) \ \text{on } \partial\Omega. \tag{1}$$

For matrix norms, we denote the spectral norm by $\|\cdot\|$ and the $(p,q)$ norms by $\|\cdot\|_{p,q}$. In the following, we introduce the formulations of PINN raissi2019physics and XPINN jagtap2020extended.

### 3.2 PINN and XPINN

The PINN is motivated by optimizing neural networks to satisfy the data and the physical laws governed by a PDE, in order to approximate its solution. Given a set of boundary training points $\{x_{b,i}\}_{i=1}^{n_b}$ and residual training points $\{x_{r,i}\}_{i=1}^{n_r}$, the ground truth PDE solution $u^*$ is approximated by the PINN model $u_\theta$ by minimizing the training loss, which contains a boundary loss and a residual loss:

$$\mathcal{R}_S(\theta) = \frac{1}{n_b}\sum_{i=1}^{n_b}\big|u_\theta(x_{b,i}) - g(x_{b,i})\big|^2 + \frac{1}{n_r}\sum_{i=1}^{n_r}\big|\mathcal{L}u_\theta(x_{r,i}) - f(x_{r,i})\big|^2, \tag{2}$$

where the first term makes PINN learn the boundary conditions, while the second term enforces the physical laws described by the PDE.
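To make the structure of the loss in equation (2) concrete, here is a minimal sketch (our own toy example, not the paper's code): a 1D Poisson problem $u'' = f$ with a hypothetical one-parameter "network" `u_theta`, where the operator $\mathcal{L}u_\theta$ is approximated by central finite differences in place of the automatic differentiation used in practice.

```python
import numpy as np

def u_theta(x, w):
    # Toy "network": a sine with a trainable amplitude w (stand-in for an MLP).
    return w * np.sin(np.pi * x)

def pinn_loss(w, xb, gb, xr, f, eps=1e-4):
    # Boundary term: mean squared mismatch with the boundary data g.
    boundary = np.mean((u_theta(xb, w) - gb) ** 2)
    # Residual term: L u = u'' via central differences (autodiff in practice).
    u_xx = (u_theta(xr + eps, w) - 2 * u_theta(xr, w) + u_theta(xr - eps, w)) / eps**2
    residual = np.mean((u_xx - f(xr)) ** 2)
    return boundary + residual

# Poisson problem u'' = -pi^2 sin(pi x) on [0, 1] with u(0) = u(1) = 0,
# whose exact solution u(x) = sin(pi x) corresponds to w = 1.
f = lambda x: -np.pi**2 * np.sin(np.pi * x)
xb, gb = np.array([0.0, 1.0]), np.array([0.0, 0.0])
xr = np.linspace(0.05, 0.95, 19)

print(pinn_loss(1.0, xb, gb, xr, f))  # near zero at the exact solution
print(pinn_loss(0.5, xb, gb, xr, f))  # much larger away from it
```

At the exact solution the composite loss is near zero, and it grows as the parameter moves away from it, which is exactly the signal the optimizer follows.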

The XPINN extends PINN by decomposing the domain into several subdomains on which several sub-PINNs are employed. The continuity between the sub-PINNs is maintained via interface loss functions, and the output of XPINN is the ensemble of all sub-PINNs, where each of them makes predictions on its corresponding subdomain. Concretely, the domain $\Omega$ is decomposed into $n_D$ subdomains as $\Omega = \bigcup_{k=1}^{n_D}\Omega_k$. The loss of XPINN contains the sum of the PINN losses of the sub-PINNs, including boundary and residual losses, plus the interface losses using points $\{x^{ij}_{I,k}\}_{k=1}^{n_{I,ij}}$ on the interfaces of adjacent subdomains $\Omega_i$ and $\Omega_j$ with $\partial\Omega_i \cap \partial\Omega_j \neq \emptyset$, in order to maintain the continuity between the two sub-PINNs $u_{\theta_i}$ and $u_{\theta_j}$. Specifically, the XPINN loss for the $i$-th sub-PINN is

$$\mathcal{R}^i_S(\theta_i) + \lambda_I \sum_{j:\, \partial\Omega_i \cap \partial\Omega_j \neq \emptyset} \mathcal{R}_I(\theta_i, \theta_j), \tag{3}$$

where $\lambda_I$ is the weight controlling the strength of the interface loss, $\theta_i$ denotes the parameters for subdomain $\Omega_i$, and each $\mathcal{R}^i_S(\theta_i)$ is the PINN loss for subdomain $\Omega_i$, containing boundary and residual losses, i.e.,

$$\mathcal{R}^i_S(\theta_i) = \frac{1}{n_{b,i}}\sum_{j=1}^{n_{b,i}}\big|u_{\theta_i}(x^i_{b,j}) - g(x^i_{b,j})\big|^2 + \frac{1}{n_{r,i}}\sum_{j=1}^{n_{r,i}}\big|\mathcal{L}u_{\theta_i}(x^i_{r,j}) - f(x^i_{r,j})\big|^2, \tag{4}$$

where $n_{b,i}$ and $n_{r,i}$ are the numbers of boundary and residual points in subdomain $\Omega_i$, respectively, and $x^i_{b,j}$ and $x^i_{r,j}$ are the $j$-th boundary and residual training points in subdomain $\Omega_i$, respectively. Furthermore, $\mathcal{R}_I(\theta_i,\theta_j)$ is the interface loss between the $i$-th and $j$-th subdomains, based on the interface training points $\{x^{ij}_{I,k}\}_{k=1}^{n_{I,ij}}$:

$$\mathcal{R}_I(\theta_i,\theta_j) = \frac{1}{n_{I,ij}}\sum_{k=1}^{n_{I,ij}}\Big[\big|u_{\theta_i}(x^{ij}_{I,k}) - \{\!\{u_\theta\}\!\}_{\mathrm{avg}}\big|^2 + \big|\big(\mathcal{L}u_{\theta_i}(x^{ij}_{I,k}) - f_i(x^{ij}_{I,k})\big) - \big(\mathcal{L}u_{\theta_j}(x^{ij}_{I,k}) - f_j(x^{ij}_{I,k})\big)\big|^2\Big], \tag{5}$$

where $\{\!\{u_\theta\}\!\}_{\mathrm{avg}} = \frac{1}{2}\big(u_{\theta_i}(x^{ij}_{I,k}) + u_{\theta_j}(x^{ij}_{I,k})\big)$, $n_{I,ij}$ is the number of interface points between the $i$-th and $j$-th subdomains, and $x^{ij}_{I,k}$ is the $k$-th interface point between them. The first term enforces the average solution continuity between the $i$-th and $j$-th sub-nets, while the second term enforces the residual continuity condition on the interface given by the $i$-th and $j$-th sub-nets. We will refer to the XPINN model introduced above as XPINNv1, since it is exactly the model proposed in the original work of jagtap2020extended.

In practice, XPINNv1 may exhibit relatively large errors near the interface, i.e., the interface losses in XPINNv1 cannot necessarily maintain the continuity between different sub-PINNs. This is because residual continuity conditions are difficult to enforce accurately for PDEs involving higher-order derivatives. Therefore, de2022error introduces an additional term enforcing the continuity of first-order derivatives between different sub-PINNs to resolve the issue:

$$\mathcal{R}_A(\theta_i,\theta_j) = \frac{1}{n_{I,ij}}\sum_{k=1}^{n_{I,ij}}\sum_{m=1}^{d}\left|\frac{\partial u_{\theta_i}(x^{ij}_{I,k})}{\partial x_m} - \frac{\partial u_{\theta_j}(x^{ij}_{I,k})}{\partial x_m}\right|^2, \tag{6}$$

where $d$ is the problem dimension, i.e., $\Omega \subset \mathbb{R}^d$. With this additional term on first-order derivatives, we name the corresponding XPINN model XPINNv2.

## 4 Augmented PINN (APINN)

### 4.1 Parameterization of Augmented PINN

In this section, we introduce the model parameterization of APINN, which is shown graphically in Figure 1. We consider a shared network $h: \mathbb{R}^d \to \mathbb{R}^{d_h}$ (blue), where $d$ is the input dimension and $d_h$ is the hidden dimension; $m$ sub-nets $E_i$ (red), where each $E_i: \mathbb{R}^{d_h} \to \mathbb{R}$; and a gating network $G: \mathbb{R}^d \to \Delta^{m-1}$ (green), where $\Delta^{m-1}$ is the $(m-1)$-dimensional probability simplex, for weight-averaging the outputs of the sub-nets. The output of our augmented PINN (APINN), parameterized by $\theta$, is:

$$u_\theta(x) = \sum_{i=1}^{m} [G(x)]_i \, E_i(h(x)), \tag{7}$$

where $[G(x)]_i$ is the $i$-th entry of $G(x)$, and $\theta$ is the collection of all parameters in $G$, $h$, and $\{E_i\}_{i=1}^m$. Both $h$ and the $E_i$ are trainable in our APINN, while $G$ can be either trainable or fixed. If $G$ is trainable, we name the model APINN; otherwise, we call it APINN-F.
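The forward pass of equation (7) can be sketched in a few lines (a minimal illustration with random, untrained toy weights; the one-layer networks are our own stand-ins for the tanh MLPs used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
d, dh, m = 2, 8, 2  # input dim, shared hidden dim, number of sub-nets

# Toy one-layer networks (stand-ins for the deeper tanh MLPs in the paper).
Wh = rng.normal(size=(d, dh))                      # shared network h
We = [rng.normal(size=(dh, 1)) for _ in range(m)]  # sub-nets E_i
Wg = rng.normal(size=(d, m))                       # gating network G

def apinn(x):
    gate = np.exp(x @ Wg)
    gate /= gate.sum(axis=1, keepdims=True)        # softmax: rows lie on the simplex
    hx = np.tanh(x @ Wh)                           # shared representation h(x)
    subs = np.concatenate([hx @ W for W in We], axis=1)  # E_i(h(x))
    return (gate * subs).sum(axis=1), gate         # eq. (7): weighted average

x = rng.normal(size=(5, d))
u, gate = apinn(x)
assert np.allclose(gate.sum(axis=1), 1.0)          # partition of unity
print(u.shape)  # (5,)
```

Every sub-net sees the same shared representation `hx`, and the softmax guarantees the gate outputs are nonnegative and sum to one at every input point.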

The APINN is a universal approximator. The detailed proof is as follows.

###### Proof.

(The APINN is a universal approximator.) Denote the function class of all neural networks as $\mathrm{NN}$; it is universal, i.e., for every continuous function $u$ and every $\epsilon > 0$, there exists a neural network $u_\theta \in \mathrm{NN}$ such that $\|u - u_\theta\|_\infty < \epsilon$. In addition, we denote the function class of gating networks by $\mathcal{G}$, which collects all vector-valued neural networks mapping $\mathbb{R}^d$ to the simplex $\Delta^{m-1}$.

Returning to the APINN model, denote the function class of APINN as $\mathrm{APINN}$, which is

$$\mathrm{APINN} = \left\{ f \;\middle|\; \exists\, E_1,\cdots,E_m, h \in \mathrm{NN},\; G \in \mathcal{G},\ \text{s.t.}\ f = \sum_{i=1}^{m} [G(x)]_i \, E_i(h(x)) \right\}. \tag{8}$$

If we choose $E_1 = \cdots = E_m$, then APINN degenerates to a vanilla multilayer network, since $\sum_{i=1}^{m}[G(x)]_i = 1$ implies $u_\theta(x) = E_1(h(x))$. Therefore, since multilayer neural networks are already universal approximators and they form a subset of the APINN model class, APINN is a universal approximator. ∎

In APINN, $G$ is pretrained to mimic the hard, discrete decomposition of XPINN, which will be discussed in the next subsection. If $G$ is trainable, our model can fine-tune the pretrained domain decomposition to discover an even better decomposition through optimization. If not, APINN is exactly the soft version of XPINN with the corresponding hard decomposition. APINN improves over PINN thanks to its adaptive domain decomposition and parameter efficiency.

### 4.2 Explanation of the Gating Network

In this section, we show how the gating network can be trained to mimic XPINNs for soft domain decomposition. Specifically, in Figure 2 left, XPINN decomposes the entire domain into two subdomains, an upper one and a lower one, based on a linear interface. The soft domain decomposition in APINN is shown in Figure 2 (middle and right), which visualizes the pretrained gating weights of the two sub-nets corresponding to the upper and lower subdomains. Here, the first gate weight is pretrained to be close to one on the upper subdomain, and the second to be close to one on the lower subdomain. Intuitively, the first sub-PINN focuses on the upper part, where its gate weight is larger, while the second sub-PINN focuses on the lower part, where its own gate weight is larger.

Another example is to decompose the domain into an inner part and an outer part, as shown in Figure 3. In particular, we decompose the entire domain into two subdomains, an inner one and an outer one. The soft domain decomposition is generated by gating functions pretrained such that the first sub-net concentrates on the inner part, while the second sub-net focuses on the rest of the domain.
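To make the pretraining of $G$ concrete, the following toy sketch (our own illustration; the interface $y = 0$ on $[-1,1]^2$ and the sigmoid sharpness are assumed, not taken from the paper) fits a two-way logistic gate by gradient descent to a soft indicator of an upper/lower decomposition, so that the first gate weight approaches one above the interface and zero below it, while the two weights sum to one by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))     # samples (x, y) from the domain [-1, 1]^2

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
target = sigmoid(10.0 * X[:, 1])           # soft indicator of the upper subdomain y > 0

# Two-way gate: G_1(x) = sigmoid(w.x + b), G_2(x) = 1 - G_1(x) (partition of unity).
w, b, lr = np.zeros(2), 0.0, 1.0
for _ in range(3000):
    p = sigmoid(X @ w + b)
    grad = 2 * (p - target) * p * (1 - p)  # chain rule for the MSE through the sigmoid
    w -= lr * (grad @ X) / len(X)
    b -= lr * grad.mean()

G1 = sigmoid(X @ w + b)
upper, lower = X[:, 1] > 0.2, X[:, 1] < -0.2
print(G1[upper].mean(), G1[lower].mean())  # close to 1 and close to 0, respectively
```

In the paper the gate is a small neural network pretrained in the same regression style before PINN training starts; afterwards APINN fine-tunes $G$ jointly with the sub-nets, while APINN-F keeps it frozen.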

The gating network can also be adapted for complex domains like the L-shape domain or even high-dimensional domains by properly choosing the corresponding gating function.

### 4.3 Differences in the position of the shared network h

We have three options for building the model of APINN. First, the simplest idea is that if we omit the parameter sharing in our APINN, then the model becomes:

$$u_\theta(x) = \sum_{i=1}^{m} [G(x)]_i \, E_i(x). \tag{9}$$

The proposed model in this paper is

$$u_\theta(x) = \sum_{i=1}^{m} [G(x)]_i \, E_i(h(x)). \tag{10}$$

Another option for parameter sharing is to place $h$ outside the weighted average of the sub-nets:

$$u_\theta(x) = h\left(\sum_{i=1}^{m} [G(x)]_i \, E_i(x)\right). \tag{11}$$

Compared to the first model, our model given in equation (10) adopts parameter sharing for each sub-PINN to improve parameter efficiency. Equation (10) generalizes equation (9): choosing the shared network $h$ as the identity mapping recovers equation (9). Intuitively, the functions learned by the sub-PINNs should share some similarity, since they are parts of the same target function. The prior of network sharing in our model explicitly utilizes this intuition and is therefore more parameter efficient.

Compared to the model given in equation (11), our model is more interpretable. In particular, our model in equation (10) is a weighted average of sub-PINNs $E_i \circ h$, so that we can visualize each $[G(x)]_i E_i(h(x))$ to observe what function it is learning. However, for equation (11), there is no clear function decomposition due to $h$ being outside, so visualization of each learned function component is not possible.
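The three options can be contrasted directly (a toy sketch with linear sub-nets and a fixed softmax gate; the function names are ours): `variant_9` omits sharing as in equation (9), `variant_10` is the proposed form of equation (10), and `variant_11` places the shared network outside the average as in equation (11). Setting `h` to the identity makes `variant_10` coincide with `variant_9`, illustrating the generalization claim.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 2, 2
We = [rng.normal(size=d) for _ in range(m)]   # linear sub-nets E_i

def gate(x):                                   # fixed softmax gate for the demo
    g = np.exp(x)
    return g / g.sum()

def variant_9(x):                              # eq. (9): no parameter sharing
    return sum(gate(x)[i] * (We[i] @ x) for i in range(m))

def variant_10(x, h):                          # eq. (10): shared h inside each sub-net
    return sum(gate(x)[i] * (We[i] @ h(x)) for i in range(m))

def variant_11(x, h):                          # eq. (11): shared h outside the average
    return h(sum(gate(x)[i] * (We[i] @ x) for i in range(m)))

x = rng.normal(size=d)
identity = lambda z: z
assert np.isclose(variant_10(x, identity), variant_9(x))   # (10) generalizes (9)
print(variant_10(x, np.tanh), variant_11(x, np.tanh))      # differ in general
```

Only `variant_9` and `variant_10` expose the per-sub-net components `gate(x)[i] * E_i(...)` for visualization; in `variant_11` the outer `h` mixes them before they can be inspected.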

## 5 Theoretical Analysis

### 5.1 Preliminaries

To facilitate the statement of our main generalization bound, we first define several quantities related to the network parameters. For a network $u_\theta$ with weight matrices $\{W^{(l)}\}_{l=1}^{L}$, we denote $M^{(l)} = \|W^{(l)}\|$ and $N^{(l)} = \|W^{(l)} - A^{(l)}\|_{2,1}$ for fixed reference matrices $\{A^{(l)}\}_{l=1}^{L}$, where $L$ can vary for different networks. We denote its complexity as follows:

$$\mathcal{R}_i(u_\theta) = \left(\prod_{l=1}^{L} M^{(l)}\right)^{i+1} \left(\sum_{l=1}^{L} \big(N^{(l)}\big)^{2/3}\right)^{3/2}, \quad i \in \{0,1,2\}, \tag{12}$$

where $i$ signifies the order of the derivative, i.e., $\mathcal{R}_i(u_\theta)$ denotes the complexity of the $i$-th derivative of the network. We further denote the corresponding $M^{(l)}$, $N^{(l)}$, and $\mathcal{R}_i$ quantities of the sub-PINN $E_j \circ h$ by $M_j^{(l)}$, $N_j^{(l)}$, and $\mathcal{R}_i(E_j \circ h)$. We also denote those of the gate network $G$ by $M_G^{(l)}$, $N_G^{(l)}$, and $\mathcal{R}_i(G)$.

The train loss and test loss of a model $u_\theta$ are the same as those of PINN, i.e.,

$$\begin{aligned} \mathcal{R}_S(\theta) &= \mathcal{R}_{S\cap\partial\Omega}(\theta) + \mathcal{R}_{S\cap\Omega}(\theta) = \frac{1}{n_b}\sum_{i=1}^{n_b}\big|u_\theta(x_{b,i}) - g(x_{b,i})\big|^2 + \frac{1}{n_r}\sum_{i=1}^{n_r}\big|\mathcal{L}u_\theta(x_{r,i}) - f(x_{r,i})\big|^2, \\ \mathcal{R}_D(\theta) &= \mathcal{R}_{D\cap\partial\Omega}(\theta) + \mathcal{R}_{D\cap\Omega}(\theta) = \mathbb{E}_{x\sim\mathrm{Unif}(\partial\Omega)}\big|u_\theta(x) - g(x)\big|^2 + \mathbb{E}_{x\sim\mathrm{Unif}(\Omega)}\big|\mathcal{L}u_\theta(x) - f(x)\big|^2. \end{aligned} \tag{13}$$

Since the following assumption holds for a vast variety of PDEs, we can bound the test error by the test boundary and residual losses:

###### Assumption 5.1.

Assume that the PDE satisfies the following norm constraint:

$$C_1\|u\|_{L^2(\Omega)} \leq \|\mathcal{L}u\|_{L^2(\Omega)} + \|u\|_{L^2(\partial\Omega)}, \quad \forall u \in \mathrm{NN}_L,\ \forall L, \tag{14}$$

where the positive constant $C_1$ does not depend on $u$, but rather on the domain $\Omega$ and the coefficients of the operator $\mathcal{L}$, and the function class $\mathrm{NN}_L$ contains all $L$-layer neural networks.

The following assumption is widely adopted in related works Luo2020TwoLayerNN; hu2021extended.

###### Assumption 5.2.

(Symmetry and boundedness of $\mathcal{L}$). Throughout the analysis in this paper, we assume the differential operator $\mathcal{L}$ in the PDE satisfies the following conditions. The operator $\mathcal{L}$ is a linear second-order differential operator in non-divergence form, i.e., $\mathcal{L}u = \sum_{i,j=1}^{d} a_{ij}\,\partial_{ij}u + \sum_{i=1}^{d} b_i\,\partial_i u + c\,u$, where $a_{ij}$, $b_i$, and $c$ are given coefficient functions, $\partial_i u$ are the first-order partial derivatives of the function $u$ with respect to its $i$-th argument (the variable $x_i$), and $\partial_{ij}u$ are the second-order partial derivatives of the function $u$ with respect to its $i$-th and $j$-th arguments (the variables $x_i$ and $x_j$). Furthermore, there exists a constant $C > 0$ such that for all $i, j \in \{1,\dots,d\}$, we have $a_{ij} = a_{ji}$, the coefficients $a_{ij}$, $b_i$, and $c$ are all $C$-Lipschitz, and their absolute values are not larger than $C$.

### 5.2 A Tradeoff in XPINN Generalization

In this subsection, we review the tradeoff in XPINN generalization, introduced in hu2021extended. There are two factors that counterbalance each other to affect XPINN generalization, namely the simplicity of the decomposed target function within each subdomain thanks to the domain decomposition, and the complexity and negative overfitting effect due to the lack of available training data. When the former effect is more obvious, XPINN outperforms PINN. Otherwise, PINN outperforms XPINN. When the two factors strike a balance, XPINN and PINN perform similarly.

### 5.3 APINN with Non-Trainable Gate Network

In this section, we state the generalization bound for APINN with a non-trainable gate network. Since the gate network is fixed, the only complexity comes from the sub-PINNs. The following theorem holds for any fixed gate function $G$.

###### Theorem 5.1.

Suppose that Assumption 5.2 holds. For any $\delta \in (0,1)$, with probability at least $1-\delta$ over the choice of the random samples with $n_b$ boundary points and $n_r$ residual points, we have the following generalization bound for an APINN model $u_{\theta_S}$:

$$\begin{aligned} \mathcal{R}_{D\cap\partial\Omega}(\theta_S) &\leq \mathcal{R}_{S\cap\partial\Omega}(\theta_S) + \tilde{O}\left(\frac{\sum_{j=1}^{m}\max_{x\in\partial\Omega}\|[G(x)]_j\|_\infty\, \mathcal{R}_0(E_j\circ h)}{n_b^{1/2}} + \sqrt{\frac{\log(4/\delta(E))}{n_b}}\right), \\ \mathcal{R}_{D\cap\Omega}(\theta_S) &\leq \mathcal{R}_{S\cap\Omega}(\theta_S) + \tilde{O}\left(\frac{\sum_{i=0}^{2}\sum_{j=1}^{m}\max_{x\in\Omega}\big\|\partial^i[G(x)]_j\big\|_\infty\, \mathcal{R}_{2-i}(E_j\circ h)}{n_r^{1/2}} + \sqrt{\frac{\log(4/\delta(E))}{n_r}}\right), \end{aligned} \tag{15}$$

where $\delta(E)$ is the probability $\delta$ divided for a union bound over all parameters in the sub-nets.

Intuition: The first term is the train loss, and the third is the probability term, in which we divide the probability into $\delta(E)$ for a union bound over all parameters in the sub-nets. The second term is the Rademacher complexity of the model. For the boundary loss, the network is not differentiated, so each sub-net $E_j \circ h$ contributes $\mathcal{R}_0(E_j\circ h)$, and $G$ contributes $\max_{x\in\partial\Omega}\|[G(x)]_j\|_\infty$, since it is fixed and Lipschitz. For the residual loss, the analysis of the second term is similar. Note that the second-order derivative of APINN is

$$\frac{\partial^2 u_{\theta_S}(x)}{\partial x^2} = \sum_{i=0}^{2}\sum_{j=1}^{m} \frac{\partial^i [G(x)]_j}{\partial x^i}\,\frac{\partial^{2-i} E_j(h(x))}{\partial x^{2-i}}. \tag{16}$$

Consequently, each $\partial^{2-i}(E_j \circ h)$ contributes $\mathcal{R}_{2-i}(E_j\circ h)$, while each $\partial^i G$ contributes $\max_{x\in\Omega}\|\partial^i[G(x)]_j\|_\infty$, since $G$ is fixed.

### 5.4 Explaining the Effectiveness of APINN via Theorem 5.1

In this section, we explain the effectiveness of APINNs using Theorem 5.1, which shows that the benefit of APINN comes from (1) soft domain decomposition, (2) getting rid of interface losses, (3) general target function decomposition, and (4) the fact that each sub-PINN of APINN is provided with all the training data, which prevents overfitting.

For the boundary loss of APINN, we can apply Theorem 5.1 to each of APINN's soft subdomains. Specifically, for the $k$-th sub-net in the $k$-th soft subdomain $\Omega_k$ of APINN, the bound is

$$\mathcal{R}_{D\cap\Omega_k}(\theta_S) \leq \mathcal{R}_{S\cap\Omega_k}(\theta_S) + \tilde{O}\left(\frac{\sum_{j=1}^{m}\max_{x\in\partial\Omega_k}\|[G(x)]_j\|_\infty\, \mathcal{R}_0(E_j\circ h)}{n_{b,k}^{1/2}} + \sqrt{\frac{\log(4/\delta(E))}{n_{b,k}}}\right), \tag{17}$$

where $n_{b,k}$ is the number of training boundary points in the $k$-th subdomain.

If the gate net mimics the hard decomposition of XPINN, then we may assume that the $k$-th sub-PINN focuses on $\Omega_k$; in particular, $\max_{x\in\partial\Omega_k}\|[G(x)]_j\|_\infty \leq \bar{c}$ for $j \neq k$, where $\bar{c}$ approaches zero. Note that Theorem 5.1 does not depend on any requirement on the quantity $\bar{c}$; we make this assumption only for illustration. Then, the bound reduces to

$$\begin{aligned} \mathcal{R}_{D\cap\Omega_k}(\theta_S) &\leq \mathcal{R}_{S\cap\Omega_k}(\theta_S) + \tilde{O}\left(\frac{\|[G(x)]_k\|_\infty\, \mathcal{R}_0(E_k\circ h) + \bar{c}\sum_{j\neq k}\mathcal{R}_0(E_j\circ h)}{n_{b,k}^{1/2}} + \sqrt{\frac{\log(4/\delta(E))}{n_{b,k}}}\right) \\ &\approx \mathcal{R}_{S\cap\Omega_k}(\theta_S) + \tilde{O}\left(\frac{\mathcal{R}_0(E_k\circ h)}{n_{b,k}^{1/2}} + \sqrt{\frac{\log(4/\delta(E))}{n_{b,k}}}\right), \end{aligned} \tag{18}$$

which is exactly the bound of XPINN if the domain decomposition is hard.

Therefore, APINN has the benefit of XPINN, i.e., it can decompose the target function into several simpler parts in some sub-domains. Furthermore, since APINN does not require the complex interface losses, its train loss is usually smaller than that of XPINN, and it is free from errors near the interface.

In addition to soft domain decomposition, even if the output of $G$ does not concentrate on certain sub-domains, i.e., $G$ does not mimic XPINN, APINN still enjoys the benefit of general function decomposition, and each sub-PINN of APINN is provided with all training data, which prevents overfitting. Concretely, for the boundary loss of APINN, the complexity term of the model is

$$\frac{\sum_{j=1}^{m}\max_{x\in\partial\Omega}\|[G(x)]_j\|_\infty\, \mathcal{R}_0(E_j\circ h)}{n_b^{1/2}},$$

which is a weighted average of the complexities of all sub-PINNs. Note that, similar to PINN, if we view APINN on the entire domain, then all sub-PINNs are able to take advantage of all training samples, thus preventing overfitting. Ideally, the weighted sum of the parts is simpler than the whole. To be more specific, if we train a PINN $u_\theta$, the complexity term will be $\mathcal{R}_0(u_\theta)/n_b^{1/2}$. If APINN is able to decompose the target function into several simpler parts such that the weighted sum of their complexities is smaller than the complexity of the PINN, then APINN can outperform PINN.
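As a stylized numerical illustration of this comparison (entirely our own toy example; the matrices and gate weights are invented, and the $(2,1)$ norm is taken column-wise as a simple stand-in for the norm used in the bound), the $\mathcal{R}_0$ quantity of equation (12) can be computed for one network with large weights versus the gate-weighted sum of two sub-nets with tamer weights:

```python
import numpy as np

def R0(weights, refs=None):
    # Complexity proxy of eq. (12) with i = 0; reference matrices default to zero.
    refs = refs or [np.zeros_like(W) for W in weights]
    spec = np.prod([np.linalg.norm(W, 2) for W in weights])      # product of spectral norms
    n21 = sum(np.linalg.norm(W - A, axis=0).sum() ** (2 / 3)
              for W, A in zip(weights, refs)) ** 1.5             # (sum of N^(l)^{2/3})^{3/2}
    return spec * n21

big = [2.0 * np.eye(4), 2.0 * np.eye(4)]                  # one network with large weights
sub = [[0.8 * np.eye(4), 0.8 * np.eye(4)] for _ in range(2)]  # two tamer sub-nets
gmax = [0.6, 0.6]                                         # max gate weight per sub-net

weighted = sum(g * R0(w) for g, w in zip(gmax, sub))
print(R0(big), weighted)
assert weighted < R0(big)   # the weighted sum of the parts is smaller than the whole
```

The numbers here are constructed to make the inequality hold; whether a real APINN achieves such a decomposition depends on the target function and the training dynamics, which is exactly the trade-off the theory describes.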

### 5.5 APINN with Trainable Gate Network

In this section, we state the generalization bound for APINN with a trainable gate network. In this case, both the gate network and the sub-PINNs contribute to the complexity of the APINN model, influencing generalization at the same time.

###### Theorem 5.2.

Suppose that Assumption 5.2 holds. For any $\delta \in (0,1)$, with probability at least $1-\delta$ over the choice of the random samples with $n_b$ boundary points and $n_r$ residual points, we have the following generalization bound for an APINN model $u_{\theta_S}$:

$$\begin{aligned} \mathcal{R}_{D\cap\partial\Omega}(\theta_S) &\leq \mathcal{R}_{S\cap\partial\Omega}(\theta_S) + \tilde{O}\left(\frac{\mathcal{R}_0(G) + \sum_{j=1}^{m}\mathcal{R}_0(E_j\circ h)}{n_b^{1/4}} + \sqrt{\frac{\log(4/\delta(G,E))}{n_b}}\right), \\ \mathcal{R}_{D\cap\Omega}(\theta_S) &\leq \mathcal{R}_{S\cap\Omega}(\theta_S) + \tilde{O}\left(\frac{\sum_{i=0}^{2}\big(\mathcal{R}_i(G) + \sum_{j=1}^{m}\mathcal{R}_{2-i}(E_j\circ h)\big)}{n_r^{1/4}} + \sqrt{\frac{\log(4/\delta(G,E))}{n_r}}\right), \end{aligned} \tag{19}$$

where $\delta(G,E)$ is the probability $\delta$ divided for a union bound over all parameters in both the gate network and the sub-nets.

Intuition: The argument is similar to that of Theorem 5.1. Here, we treat the APINN model as a whole. Now, $G$ contributes its complexity $\mathcal{R}_i(G)$, rather than its infinity norm, since it is trainable rather than fixed.

### 5.6 Explaining the Effectiveness of APINN via Theorem 5.2

By Theorem 5.2, besides the benefits explained by Theorem 5.1, a good initialization of the soft decomposition inspired by XPINN helps generalization. If this is the case, the trained gate network's parameters will not deviate significantly from their initialization. Consequently, the quantities $M_G^{(l)}$ and $N_G^{(l)}$ will be smaller for all $l$, and thus $\mathcal{R}_i(G)$ will be smaller, decreasing the right-hand side of the bound stated in Theorem 5.2, which implies good generalization.

## 6 Computational Experiments

### 6.1 The Burgers Equation

The one-dimensional viscous Burgers equation is given by

$$\begin{aligned} &u_t + u u_x - \frac{0.01}{\pi} u_{xx} = 0, \quad x\in[-1,1],\ t\in[0,1], \\ &u(0,x) = -\sin(\pi x), \\ &u(t,-1) = u(t,1) = 0. \end{aligned} \tag{20}$$

The difficulty of the Burgers equation lies in the steep region that develops near $x = 0$, where the solution changes rapidly and is hard to capture by PINNs. The ground truth solution is visualized in Figure 4 left. In this case, XPINN performs badly near the interface. Thus, APINN improves XPINN, especially in accuracy near the interface, both by eliminating the interface losses and by improving parameter efficiency.

#### 6.1.1 PINN and Hard XPINN

For the PINN, we use a 10-layer tanh network of width 20 with 3441 parameters, and provide 300 boundary points and 20000 residual points. We use a weight of 20 on the boundary loss and 1 on the residual loss. We train the PINN with the Adam optimizer at an 8e-4 learning rate for 100k epochs. XPINNv1 decomposes the domain into two subdomains along a fixed interface. Its weights for the boundary loss, residual loss, interface boundary loss, and interface residual loss are 20, 1, 20, 1, respectively. XPINNv2 shares the same decomposition as XPINNv1, but its weights for the boundary loss, residual loss, interface boundary loss, and interface first-order derivative continuity loss are 20, 1, 20, 1, respectively. The sub-nets are 6-layer tanh networks of width 20 with 3522 parameters in total, and we provide 150 boundary points and 10000 residual points for each sub-net in XPINN. The number of interface points is 1000. The training points of the XPINNs are visualized in Figure 4 right. We train the XPINNs with the Adam optimizer at an 8e-4 learning rate for 100k epochs. Both models are then fine-tuned by the L-BFGS optimizer until convergence after Adam optimization.

#### 6.1.2 APINN

To mimic the hard XPINN decomposition, we pretrain the gate net on a smoothed (sigmoidal) version of the corresponding indicator function, so that the first sub-PINN focuses where its gate weight is larger and the second sub-PINN focuses where it is smaller. The corresponding model is named APINN-X. In addition, we pretrain the gate net to mimic the multilevel PINN (MPINN) anonymous2022multilevel, where the first sub-net focuses on the majority part of the target, while the second one is responsible for the minority part. The corresponding model is named APINN-M. All networks have a width of 20. The numbers of layers in the gate network, sub-PINN networks, and shared network are 2, 4, and 3, respectively, with 3462 / 3543 parameters depending on whether the gate network is trainable. All models are fine-tuned by the L-BFGS optimizer until convergence after Adam optimization.

#### 6.1.3 Results

The results for the Burgers equation are shown in Table 1. The reported relative errors are averaged over 10 independent runs and are the best errors over the whole optimization process. The error plots of XPINNv1 and APINN-X are visualized in Figure 6 left and right, respectively.

• XPINN performs much worse than PINN, due to the large error near the interface, where the steep region is located.

• APINN-X performs the best because its parameters are more flexible than those of PINN, and it does not require interface conditions like in XPINN, so it can model the steep region well.

• APINN-M performs worse than APINN-X, which means that MPINN initialization is worse than the XPINN one in this Burgers problem.

• APINN-X-F with a fixed gate function performs slightly worse than PINN and APINN, which justifies the flexibility of trainable domain decomposition. However, even without fine-tuning the domain decomposition, APINN-X-F can still outperform XPINN significantly, which shows the effectiveness of soft domain partition.

#### 6.1.4 Visualization of Gating Networks

Some representative optimized gating networks after convergence are visualized in Figure 7. In the first row, we visualize two gate nets of APINN-X. Although their optimized gates differ, they retain the original left-and-right decomposition, with a change in interface position; thus, their errors are similar. In the second row, we show two gate nets of APINN-M. Their performances differ a lot, and the two optimized gates assign markedly different weights to subnet-1 and subnet-2. This means that the training of the MPINN-type decomposition is unstable, that APINN-M is worse than its XPINN-type counterpart in the Burgers problem, and that the weighting in the MPINN-type decomposition is crucial to its final performance. From these examples, we can see that initialization is crucial for APINN's success: despite the optimization, the trained gate remains similar to its initialization.

Furthermore, we visualize the optimization trajectory of the gating network for the first subnet in the Burgers equation in Figure 8, where the snapshots show the gating net at epochs 0, 1E4, 2E4, and 3E4. The trajectory for the second subnet can be obtained directly from the partition-of-unity property, $[G(x)]_1 + [G(x)]_2 = 1$. The trajectory is smooth, and the gating net gradually converges by shifting the interface from left to right.

### 6.2 Helmholtz Equation

Problems in physics including seismology, electromagnetic radiation, and acoustics are solved using the Helmholtz equation, which is given by

 uxx + uyy + k²u = q(x,y),  x ∈ [−1,1], y ∈ [−1,1], (21)
 u(−1,y) = u(1,y) = u(x,−1) = u(x,1) = 0,
 q(x,y) = (−(a1π)² − (a2π)² + k²) sin(a1πx) sin(a2πy).

The analytic solution is

 u(x,y)=sin(a1πx)sin(a2πy), (22)

and is shown in Figure 9 left.
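As a quick sanity check that Eq. (22) solves Eq. (21), one can compare a finite-difference Laplacian of u against the forcing q (the values of a1, a2, and k below are illustrative assumptions, not the paper's settings):

```python
import numpy as np

# Illustrative parameter values; the paper's exact a1, a2, k may differ.
a1, a2, k = 1.0, 4.0, 1.0

u = lambda x, y: np.sin(a1 * np.pi * x) * np.sin(a2 * np.pi * y)
q = lambda x, y: (-(a1 * np.pi) ** 2 - (a2 * np.pi) ** 2 + k ** 2) * u(x, y)

# Central finite differences for u_xx and u_yy on interior points.
h = 1e-4
x = np.linspace(-0.9, 0.9, 50)
y = np.linspace(-0.9, 0.9, 50)
X, Y = np.meshgrid(x, y)
u_xx = (u(X + h, Y) - 2 * u(X, Y) + u(X - h, Y)) / h ** 2
u_yy = (u(X, Y + h) - 2 * u(X, Y) + u(X, Y - h)) / h ** 2
residual = u_xx + u_yy + k ** 2 * u(X, Y) - q(X, Y)  # ~0 up to discretization error
```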

In this case, XPINNv1 performs worse than PINN due to large errors near the interface. With additional regularization, XPINNv2 reduces the relative error by 47% compared to PINN, but it still performs worse than our APINN because of overfitting caused by the small amount of training data available in each sub-domain.

#### 6.2.1 PINN and Hard XPINN

For PINN, we provide 400 boundary and 10,000 residual points. The XPINN decomposes the domain into two subdomains, whose training points are shown in Figure 9 right. We provide 200 boundary points, 5000 residual points, and 400 interface points for the two sub-nets in XPINN. The other settings of PINN and XPINN are the same as those for the Burgers equation.

#### 6.2.2 APINN

We pretrain the gate net to mimic XPINN, and alternatively to mimic MPINN. For the other experimental settings, please refer to the introduction of APINN for the Burgers equation.
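The pretraining objective can be sketched as regressing a soft gate onto the hard XPINN subdomain indicator. In the sketch below, a one-parameter sigmoid along an assumed horizontal interface y = 0 stands in for the paper's MLP gate; sharper sigmoids fit the hard partition better:

```python
import numpy as np

y = np.linspace(-1.0, 1.0, 201)
hard = (y > 0).astype(float)  # hard XPINN-style indicator (interface at y = 0 assumed)

mses = []
for beta in (1.0, 5.0, 25.0):
    soft = 1.0 / (1.0 + np.exp(-beta * y))    # soft sigmoid gate with sharpness beta
    mses.append(np.mean((soft - hard) ** 2))  # pretraining (mimicking) objective
```

The MSE shrinks as the gate sharpens, i.e., the soft gate can approximate the hard XPINN decomposition arbitrarily well while remaining trainable.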

#### 6.2.3 Results

The results for the Helmholtz equation are shown in Table 2. The reported relative errors are averaged over 10 independent runs, which are selected as having the lowest errors during optimization. The error plots of XPINNv1, APINN-X and XPINNv2 are visualized in Figure 11 left, middle, and right, respectively.

• XPINNv1 performs the worst, since its interface loss cannot enforce the interface continuity satisfactorily.

• XPINNv2 performs significantly better than PINN, but worse than APINN-X, because it slightly overfits in the two sub-domains due to the smaller number of training samples available to each sub-net compared with APINN-X.

• APINN-M performs worse than APINN-X due to bad initialization of the gating network.

• The errors of APINN, XPINNv2 and PINN concentrate near the boundary, which is due to the gradient pathology wang2021understanding.

#### 6.2.4 Visualization of Optimized Gating Networks

The randomness of this problem is smaller, so the final relative errors of different runs are similar. Some representative optimized gating networks of APINN-X after convergence are visualized in Figure 12. Specifically, every gating network approximately maintains the original decomposition into an upper and a lower domain, although the interfaces change slightly in each run. These observations suggest that the XPINN-type decomposition into an upper and a lower domain is already satisfactory for this problem, which is consistent with XPINN outperforming PINN here.

Furthermore, we visualize the optimization trajectory of the gating network for the first subnet in the Helmholtz equation in Figure 13, with six snapshots of the gating net between epoch 0 and epoch 5E2. The trajectory for the second subnet follows from the partition-of-unity property of the gating networks. The trajectory is similar to that for the Burgers equation, but the gating net of the Helmholtz equation converges much faster.

### 6.3 Klein-Gordon Equation

In modern physics, the Klein-Gordon equation is used in a wide variety of fields, such as particle physics, astrophysics, cosmology, and classical mechanics. It is given by

 utt − uxx + u³ = f(x,t),  x ∈ [0,1], t ∈ [0,1], (23)
 u(x,0) = x,  ut(x,0) = 0,
 u(x,t) = h(x,t),  x ∈ {0,1}, t ∈ [0,1].

Its boundary and initial conditions are given by the ground truth solution:

 u(x,t) = x cos(5πt) + (xt)³, (24)

and is shown in Figure 14 left. In this case, XPINNv1 performs worse than PINN due to the large errors near the interface induced by unsatisfactory continuity between sub-nets, while XPINNv2 performs similarly to PINN. APINN performs much better than XPINNv1 and better than PINN and XPINNv2.
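The initial data implied by the exact solution in Eq. (24) can be read off directly: u(x,0) = x·cos(0) = x and u_t(x,0) = 0. A quick numerical verification (central difference in time):

```python
import numpy as np

u = lambda x, t: x * np.cos(5 * np.pi * t) + (x * t) ** 3

x = np.linspace(0.0, 1.0, 11)
h = 1e-6
u0 = u(x, 0.0)                        # initial profile implied by the solution
ut0 = (u(x, h) - u(x, -h)) / (2 * h)  # central difference for u_t(x, 0)
```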

#### 6.3.1 PINN and Hard XPINN

The experimental settings of PINN and XPINN are identical to those for the Helmholtz equation, except that XPINN now uses a different domain decomposition and Adam optimization is performed for 200k epochs.

#### 6.3.2 APINN

We pretrain the gate net to mimic XPINN, and alternatively to mimic MPINN. For the other experimental settings, please refer to the introduction of APINN for the first equation.

#### 6.3.3 Results

The results for the Klein-Gordon equation are shown in Table 3. The reported relative errors are averaged over 10 independent runs. The error plots of XPINNv1, APINN-X, and XPINNv2 are visualized in Figure 16 left, middle, and right, respectively.

• XPINNv1 performs the worst, since the interface loss of XPINNv1 cannot enforce the interface continuity well, while XPINNv2 performs similarly to PINN, since the two factors in XPINN generalization reach a balance.

• APINN performs better than all XPINNs and PINNs, and APINN-M is slightly better than APINN-X.

### 6.4 Wave Equation

We consider a wave problem given by

 utt=4uxx,x∈[0,1],t∈[0,1]. (25)

The boundary and initial conditions are given by the ground truth solution:

 u(x,t)=sin(πx)cos(2πt), (26)

and is shown in Figure 17 left.
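One can verify numerically that Eq. (26) satisfies the wave equation (25), u_tt = 4u_xx, using central finite differences:

```python
import numpy as np

u = lambda x, t: np.sin(np.pi * x) * np.cos(2 * np.pi * t)

h = 1e-4
x = np.linspace(0.1, 0.9, 30)
t = np.linspace(0.1, 0.9, 30)
X, T = np.meshgrid(x, t)
u_tt = (u(X, T + h) - 2 * u(X, T) + u(X, T - h)) / h ** 2
u_xx = (u(X + h, T) - 2 * u(X, T) + u(X - h, T)) / h ** 2
residual = u_tt - 4 * u_xx  # ~0 up to discretization error
```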

In this example, XPINN is already significantly better than PINN, reducing the relative error by 27%. Nevertheless, APINN still performs slightly better than XPINN.

#### 6.4.1 PINN and Hard XPINN

We use a 10-layer tanh network with 3441 parameters, and 400 boundary points and 10,000 residual points for PINN. We use a weight of 20 on the boundary loss and unit weight on the residual loss. We train PINN using the Adam optimizer for 100k epochs at an 8E-4 learning rate. XPINN decomposes the domain into two subdomains. The weights for the boundary loss, residual loss, interface boundary loss, interface residual loss, and interface first-order derivative continuity loss are 20, 1, 20, 0, and 1, respectively. The sub-nets are 6-layer tanh networks of width 20 with 3522 parameters in total, and we provide 200 boundary points, 5000 residual points, and 400 interface points for all sub-nets in XPINN. The training points of XPINN are visualized in Figure 17 right. We train XPINN using the Adam optimizer for 100k epochs at a 1E-4 learning rate.
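The quoted counts (3441 for the PINN, 3522 in total for the two XPINN sub-nets) are consistent with fully connected tanh networks of width 20 mapping (x, t) to u; reading "n-layer" as n weight matrices is our assumption, and the count is easy to check:

```python
def mlp_param_count(sizes):
    """Weights + biases of a fully connected net with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# PINN: 10 linear layers of width 20, input (x, t), scalar output u.
pinn = mlp_param_count([2] + [20] * 9 + [1])       # 3441
# XPINN: two 6-layer width-20 sub-nets.
xpinn = 2 * mlp_param_count([2] + [20] * 5 + [1])  # 3522
```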

#### 6.4.2 APINN

The gate nets of the APINNs are pretrained to mimic XPINN and MPINN, respectively. For the other experimental settings, please refer to the introduction of APINN for the first equation.

#### 6.4.3 Results

The results for the wave equation are shown in Table 4. The reported relative errors are averaged over 10 independent runs, where each run's error is taken at the epoch with the smallest training loss among the last 10% of epochs. The error plots of PINN, XPINNv2, and APINN-X are visualized in Figure 19 left, middle, and right, respectively.

• Although XPINN is already much better than PINN, reducing the relative error of PINN by 27%, APINN still slightly improves over XPINN and performs the best among all models. In particular, APINN-M outperforms APINN-X.

#### 6.4.4 Visualization of Optimized Gating Networks

Some representative optimized gating networks after convergence are visualized in Figure 20. The first row shows the gate networks of optimized APINN-X, while the second row shows those of APINN-M. In this case, the variance is much smaller, and the optimized gate nets maintain their characteristics at initialization, i.e., those of APINN-X remain an upper-and-lower decomposition and those of APINN-M remain a multi-level partition. Gate nets under the same initialization are also similar across independent runs, which is consistent with their similar performances.

### 6.5 Boussinesq-Burger Equation

Here we consider the Boussinesq-Burger system, which is a nonlinear water wave model consisting of two unknowns. A thorough understanding of such a model’s solutions is important in order to apply it to harbor and coastal designs. The Boussinesq-Burger equation under consideration is given by

 ut = 2uux + (1/2)vx,  vt = (1/2)vxxx + 2(uv)x,  x ∈ [−10,15], t ∈ [−3,2], (27)

where the Dirichlet boundary condition and the ground truth solution are given in lin2022two, and shown in Figure 21 (left and middle) for the two unknowns, respectively. In this experiment, we consider a system of PDEs and test XPINN and APINN with more than two subdomains.

#### 6.5.1 PINN and Hard XPINN

For PINN, we use a 10-layer tanh network, and provide 400 boundary points and 10,000 residual points. We use a weight of 20 on the boundary loss and unit weight on the residual loss. It is trained by Adam kingma2014adam with an 8E-4 learning rate for 100k epochs.

For the domain decomposition of (hard) XPINN, we design two different strategies. First, we consider an XPINN with two subdomains. Its sub-nets are 6-layer tanh networks of width 20, and we provide 200 boundary points and 5000 residual points for every sub-net. Second, we consider an XPINN4 with four subdomains, whose training points are visualized in Figure 21 right. The sub-nets in XPINN4 are 4-layer tanh networks of width 20, and we provide 100 boundary points and 2500 residual points for every sub-net in XPINN4. The number of interface points is 400. The weights for the boundary loss, residual loss, interface boundary loss, interface residual loss, and interface first-order derivative continuity loss are 20, 1, 20, 0, and 1, respectively. We use the Adam optimizer to train XPINN and XPINN4 for 100k epochs with an 8E-4 learning rate. For a fair comparison, the parameter counts of PINN, XPINN, and XPINN4 are 6882, 7044, and 7368, respectively.

#### 6.5.2 APINN

For APINN with two subdomains, we pretrain the gate net of APINN-X to mimic XPINN, and pretrain that of APINN-M to mimic MPINN. In APINN-X and APINN-M, all networks have a width of 20. The numbers of layers in the gate network, sub-PINN networks, and shared network are 2, 4, and 5, respectively, with 6945 parameters in total. For APINN with four subdomains, we pretrain the gate net of APINN4-X to mimic XPINN, and that of APINN4-M to mimic MPINN. The pretrained gate functions of APINN4-X are visualized in Figure 23. In APINN4-X and APINN4-M, the networks have width 20, except for one of width 18. The numbers of layers in the gate network, sub-PINN networks, and shared network are 2, 4, and 3, respectively, with 7046 parameters in total.

#### 6.5.3 Results

The results for the Boussinesq-Burger equation are shown in Tables 5 and 6. The reported relative errors are averaged over 10 independent runs, where each run's error is taken at the epoch with the smallest training loss among the last 10% of epochs. The key observations are as follows.

• APINN-M performs the best.

• APINN and XPINN with four sub-nets do not perform as well as their two sub-net counterparts, which may be due to the tradeoff between target function complexity and number of training samples in XPINN generalization. Also, more subdomains do not necessarily contribute to parameter efficiency.

• The error of the best performing APINN-M is visualized in Figure 24, which is concentrated near the steep regions, where the solution changes rapidly.

#### 6.5.4 Visualization of Optimized Gating Network

We visualize several representative optimized gating networks after convergence, with similar relative errors, in Figures 25, 26 and 27 for the APINNs with two subnets, APINN4-X, and APINN4-M, respectively. Note that the variance for this Boussinesq-Burger equation is smaller, so these models perform similarly. The key observation is that the gate nets after optimization maintain their characteristics at initialization, especially for APINN-M. Specifically, for APINN-M, the optimized gate networks do not change much from the initialization. For APINN-X, although the positions and slopes of the interfaces between subdomains change, the optimized APINN-X still partitions the whole domain into four parts from top to bottom. Therefore, we draw the following conclusions.

• Initialization is crucial to the success of APINN, which is reflected in the performance gaps between APINN-M and APINN-X, since the gate networks after optimization maintain the characteristics at initialization.

• APINN with one kind of initialization can hardly be optimized into another kind. For instance, we seldom see the gate nets of APINN-M optimized to resemble the decomposition of XPINNs.

• These observations are consistent with our Theorem 5.2, which states that a good initialization of the gate net contributes to better generalization, since the gate net does not need to change significantly from its initialization.

• However, based on our extensive experiments, trainable gate nets still contribute to generalization through a positive fine-tuning effect, even though optimization cannot turn an MPINN-type APINN into an XPINN-type APINN or vice versa.

Furthermore, we visualize the optimization trajectory of the gating network for all subnets in the Boussinesq-Burger equation in Figure 28 in the Appendix, where each snapshot is the gating net at epochs = 0, 10, 20, 30, 40, and 50. The change is fast and continuous.

## 7 Summary

In this paper, we propose the Augmented Physics-Informed Neural Networks (APINN) method, which employs a gate network for soft domain partitioning that can mimic the hard eXtended PINN (XPINN) domain decomposition and is trainable and fine-tunable. The gate network, which satisfies the partition-of-unity property, weight-averages several sub-networks to form the output of APINN. Moreover, APINN adopts partial parameter sharing among sub-nets. It has the following advantages over the state-of-the-art generalized space-time domain decomposition based XPINN method:

• APINN does not require the complicated interface losses used to maintain continuity between different sub-networks (sub-PINNs), because the gate network decomposes the entire domain in a soft way; this also contributes to better convergence and lower training loss.

• The gate network can mimic the hard decomposition of XPINN, such that APINN enjoys the advantage of XPINN in that it can decompose the complicated target function into several simpler parts to reduce the complexity and improve the generalizability of each sub-network.

• The trainable gate network enables fine-tuning of the domain decomposition to discover a better function and domain decomposition into simpler parts, contributing to better generalization based on hu2021extended.

• The parameter sharing in APINN utilizes the essential idea that each sub-PINN is learning one part of the same target function, so that the commonality can be well captured by the shared part.

• Each sub-network in APINN takes advantage of all training samples in the whole domain, which prevents overfitting. By contrast, each sub-network in XPINN can only utilize the training samples in its own subdomain.

All of these benefits are justified empirically on various PDEs and theoretically, following hu2021extended, using the PINN generalization theory. More specifically, we prove generalization bounds for APINNs with both fixed and trainable gate networks. Since APINNs with certain gate networks can recover PINN and XPINN, APINN inherits the advantages of both models thanks to its trainability and flexibility. We show that APINN enjoys the benefit of general domain and function decomposition, which reduces the complexity of the optimized networks and improves generalization. In terms of parallelization, APINN shares more data points and parameters than XPINN, and thus can be more expensive than the XPINN method.

## Acknowledgment

A. D. Jagtap and G. E. Karniadakis would like to acknowledge the funding by OSD/AFOSR MURI Grant FA9550-20-1-0358, and the US Department of Energy (DOE) PhILMs project (DE-SC0019453).

## Appendix A Proof

### A.1 Preliminaries

The proof depends on Rademacher complexity and covering number defined below.

###### Definition A.1.

(Rademacher Complexity). Let S = {z_1, …, z_n} be a dataset containing n samples. The Rademacher complexity of a function class F on S is defined as

 R_S(F) = E_σ [ sup_{f∈F} (1/n) Σ_{i=1}^{n} σ_i f(z_i) ],

where σ_1, …, σ_n are independent and identically distributed (i.i.d.) random variables taking values uniformly in {−1, +1}.
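For intuition, the empirical Rademacher complexity of a small finite function class can be estimated by Monte Carlo over the sign variables (an illustrative numpy sketch, not part of the proof; the class below is an arbitrary synthetic example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                          # number of samples z_i
# A finite class of 5 "functions", each stored as its values (f(z_1), ..., f(z_n)).
F = rng.standard_normal((5, n))

# R_S(F) = E_sigma[ sup_{f in F} (1/n) sum_i sigma_i f(z_i) ]
sigmas = rng.choice([-1.0, 1.0], size=(10000, n))  # i.i.d. uniform in {-1, +1}
suprema = (sigmas @ F.T / n).max(axis=1)           # sup over the class, per draw
rad = suprema.mean()                               # Monte Carlo estimate
```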

###### Definition A.2.

(Matrix Covering). We use N(U, ε, ‖·‖) to denote the least cardinality of any subset V ⊆ U that covers U at scale ε with norm ‖·‖, i.e., sup_{A∈U} min_{B∈V} ‖A − B‖ ≤ ε.

They are related as follows.

###### Lemma A.1.

bartlett2017spectrally Let F be a real-valued function class taking values in a bounded range, and assume that 0 ∈ F. Then