1 Introduction
Deep learning has become popular in scientific computing and is widely adopted for solving forward and inverse problems involving partial differential equations (PDEs). The physics-informed neural network (PINN) raissi2019physics is one of the seminal works utilizing deep neural networks to approximate PDE solutions by optimizing them to satisfy both the data and the physical laws governed by the PDE. The extended PINN (XPINN) jagtap2020extended is a follow-up work that first proposed space-time domain decomposition: the domain is partitioned into several subdomains, several subnets approximate the solution on their respective subdomains, and solution continuity between them is enforced via interface losses. The output of XPINN is then the ensemble of all subnets. A theoretical analysis of when XPINNs improve generalization over PINNs is of great interest. The recent work of Hu et al. hu2021extended analyzes a trade-off in XPINN generalization between the simplicity of the decomposed target function in each subdomain and the overfitting effect due to the reduced training data available per subdomain; these two factors counterbalance each other to determine whether XPINN improves generalization over PINN. However, the negative overfitting effect incurred by the reduced training data in each subdomain sometimes dominates the positive effect of simpler partitioned target functions. Furthermore, XPINNs may suffer from relatively large errors at the interfaces between subdomains, which degrades their overall performance.
In this paper, we propose the Augmented PINN (APINN), which employs a gating network for soft domain partitioning that mimics the hard XPINN decomposition and can be fine-tuned toward a better decomposition. The gating network removes the need for interface losses and weight-averages several subnets to form the APINN output, where each subnet can utilize all training samples in the domain, preventing overfitting. Moreover, APINN adopts an efficient partial parameter-sharing scheme for the subnets to capture the similar components of the decomposed functions. To further understand the benefits of APINN, we follow the theory in hu2021extended to analyze its generalization bound and compare it with those of PINN and XPINN, which justifies our intuitive understanding of APINN's advantages. Concretely, we derive generalization bounds for APINNs with trainable and with fixed gating networks, which demonstrate the advantages of soft and trainable domain and function decomposition in APINN. We also perform extensive experiments on several PDEs that validate the effectiveness of APINN. Specifically, we present examples where XPINN performs similarly to or worse than PINN, and APINN significantly improves on both; we also present cases where XPINN is already much better than PINN, and APINN still slightly improves on XPINN. Beyond the superior performance of APINN, we visualize the optimized gating networks and their optimization trajectories, and relate their shapes to their performance in order to select a potentially optimal decomposition. We show that APINN performs even better when initialized with the optimal decomposition, which suggests strategies for designing the optimal domain decomposition for a given PDE problem.
2 Related Work
The PINN raissi2019physics is one of the pioneering frameworks that employ deep learning techniques to solve forward and inverse problems governed by parameterized PDEs. PINN has been successfully used to solve many problems in computational science since its initial publication; for more information, see raissi2018hidden; yang2019adversarial; jagtap2022deep; haghighat2021physics; jagtap2022deepKNN. The original idea of domain decomposition in the PINN method was proposed in jagtap2020conservative for nonlinear conservation laws under the name Conservative PINN (CPINN). In subsequent work, the same authors proposed XPINN jagtap2020extended for general space-time domain decomposition, where a sub-PINN fits the target function on each subdomain, while continuity between sub-PINNs is enforced via additional interface losses (penalty terms). The Parallel PINN shukla2021parallel is a follow-up of CPINN and XPINN in which both are trained on multiple GPUs or CPUs simultaneously. Parareal PINN meng2020ppinn decomposes a long time domain into several short-time subdomains, which can be efficiently solved by a coarse-grained (CG) solver together with PINNs, so that Parareal PINN shows a clear speedup over PINN on long-time integration PDEs. Its main limitation is that it cannot be applied to all types of PDEs. The hp-VPINN kharazmi2021hp proposes a variational PINN method that decomposes the domain through the definition of a new set of test functions, while the trial functions remain neural networks defined over the whole domain. DDM li2020deep uses the Schwarz method for overlapping domain decomposition and trains the subnets iteratively rather than in parallel as in XPINNs and CPINNs. In addition, mercier2021coarse extends DDM through coarse-space acceleration for improved convergence as the number of domains grows. li2022deep also uses the Schwarz method, but with multi-Fourier feature networks as subnets instead.
The finite basis PINN (FBPINN) moseley2021finite proposes dividing the domain into several small, overlapping subdomains, each handled by a PINN. Although FBPINN eliminates the need for interface conditions, our model differs from it in the following aspects. First, our domain decomposition is flexible and trainable, while FBPINN fixes the decomposition. Second, FBPINN does not allow parameter sharing for the efficiency of subnetworks. Moreover, the overlapping subdomains in FBPINN can become computationally costly for multidimensional problems. The penalty-free neural network (PFNN) sheng2022pfnn also proposes overlapping domain decomposition and differs from our model for the same reasons as FBPINN. The GatedPINN stiller2020large adopts the idea of a mixture of experts (MoE) shazeer2017outrageously to modify XPINNs. Although it also uses a gating network to weight-average several sub-PINNs, GatedPINN differs from our APINN because its gate function is randomly initialized, while that in APINN is pretrained on an XPINN domain decomposition. Furthermore, GatedPINN does not consider efficient parameter sharing among subnets to improve model expressiveness. dong2021local; DWIVEDI2021299 also propose a similar idea of domain decomposition as in XPINNs, but they use extreme learning machines (ELMs) in place of the neural networks, training only the parameters of the last layer. Based on variational principles and the deep Ritz method, D3M li2019d3m further combines the Schwarz method for overlapping domain decomposition; compared to our trainable domain decomposition, the domains in D3M are fixed during optimization. To learn optimal modifications on the interfaces of different subdomains, taghibakhshi2022learning proposes using graph neural networks and unsupervised learning.
heinlein2021combining presents a review of domain decomposition methods for numerical PDEs. The first comprehensive theoretical analysis of PINNs as well as XPINNs for a prototypical nonlinear PDE, the Navier-Stokes equations, is given in Ryck2021ErrorAF. The generalization abilities of PINNs and XPINNs are theoretically analyzed in hu2021extended, while the generalization and optimization capabilities of deep neural networks in general have been analyzed in kawaguchi2018generalization; kawaguchi2022robustness; kawaguchi2016deep; kawaguchi2019depth; xu2021optimization; kawaguchi2021theory; kawaguchi2022understanding.
3 Problem Definition and Background
3.1 Problem Definition
We consider partial differential equations (PDEs) defined on the bounded domain , with the following form:
(1) 
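For concreteness, a standard abstract boundary-value form consistent with the boundary and residual losses defined below (a generic sketch following the usual PINN setup of raissi2019physics; the concrete operator and data vary per problem in Section 6) is:

```latex
\begin{aligned}
\mathcal{L} u(\boldsymbol{x}) &= f(\boldsymbol{x}), && \boldsymbol{x} \in \Omega, \\
u(\boldsymbol{x}) &= g(\boldsymbol{x}), && \boldsymbol{x} \in \partial\Omega,
\end{aligned}
```

where \mathcal{L} is a differential operator, f a source term, and g the boundary (or initial) data on the bounded domain \Omega.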
For matrix norms, we denote the spectral norm by and norms by . In the following, we introduce the formulations of PINN raissi2019physics and XPINN jagtap2020extended.
3.2 PINN and XPINN
The PINN approximates the PDE solution by optimizing a neural network to satisfy both the data and the physical laws governed by the PDE. Given a set of boundary training points and residual training points , the ground truth PDE solution is approximated by the PINN model , by minimizing a training loss consisting of a boundary loss and a residual loss:
(2) 
where the first term makes PINN learn the boundary conditions, while the second makes it learn the physical laws described by the PDE.
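As a toy illustration of the loss in (2), the following sketch evaluates a PINN-style objective for a 1D Poisson-type problem. The problem, the finite-difference derivatives, and all names are our illustrative assumptions, not the paper's implementation (which would use automatic differentiation):

```python
import numpy as np

# Toy stand-in for the PINN loss in Eq. (2): boundary loss + residual loss.
# Problem (our choice, for illustration): u''(x) = f(x) on (0, 1) with
# u(0) = u(1) = 0 and f(x) = -pi^2 sin(pi*x), whose exact solution is
# u(x) = sin(pi*x). Derivatives are approximated by finite differences.

def pinn_loss(u_hat, f, x_b, u_b, x_r, w_b=20.0, w_r=1.0, eps=1e-4):
    # Boundary loss: mean squared mismatch at the boundary training points.
    boundary = np.mean((u_hat(x_b) - u_b) ** 2)
    # Residual loss: mean squared PDE residual at the residual points,
    # with u'' approximated by a central finite difference.
    u_xx = (u_hat(x_r + eps) - 2.0 * u_hat(x_r) + u_hat(x_r - eps)) / eps**2
    residual = np.mean((u_xx - f(x_r)) ** 2)
    return w_b * boundary + w_r * residual

f = lambda x: -np.pi**2 * np.sin(np.pi * x)
x_b = np.array([0.0, 1.0])          # boundary training points
u_b = np.zeros(2)                   # boundary data
x_r = np.linspace(0.05, 0.95, 50)   # residual (collocation) points

exact = lambda x: np.sin(np.pi * x)      # satisfies PDE and BCs: tiny loss
wrong = lambda x: np.sin(2 * np.pi * x)  # violates the PDE: large loss
```

A candidate that satisfies both the boundary conditions and the PDE drives this loss toward zero, which is exactly the optimization target described above.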
The XPINN extends PINN by decomposing the domain into several subdomains, on which several sub-PINNs are employed. Continuity between the sub-PINNs is maintained via interface loss functions, and the output of XPINN is the ensemble of all sub-PINNs, each making predictions on its corresponding subdomain. Concretely, the domain is decomposed into subdomains as . The XPINN loss is the sum of the PINN losses of the sub-PINNs, including boundary and residual losses, plus the interface losses using points on the interfaces of different subdomains , where such that , to maintain continuity between the two sub-PINNs and . Specifically, the XPINN loss for the th sub-PINN is (3)
where is the weight controlling the strength of the interface loss, is the parameters for subdomain , and each is the PINN loss for subdomain containing boundary and residual losses, i.e.,
(4) 
where and are the number of boundary points and residual points in subdomain respectively, and and are the th boundary and residual training points in subdomain , respectively. Furthermore, is the interface loss between the th and th subdomains based on interface training points
(5)  
where is the number of interface points between the th and th subdomains, while is the th interface point between them. The first term enforces average solution continuity between the th and th subnets, while the second term enforces the residual continuity condition on the interface given by the th and th subnets. We will refer to the XPINN model introduced above as XPINNv1, since it is exactly the model proposed in the original work jagtap2020extended.
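The XPINNv1 objective in (3)-(5) can be sketched for the same toy 1D problem: two sub-networks on [0, 0.5] and [0.5, 1] with one interface point at x = 0.5. The toy problem, finite-difference derivatives, and names are our illustrative assumptions:

```python
import numpy as np

# Toy stand-in for the XPINNv1 objective in Eqs. (3)-(5): two sub-nets on
# [0, 0.5] and [0.5, 1] for the problem u''(x) = f(x), u(0) = u(1) = 0,
# with one interface point at x = 0.5. The interface terms pull each
# sub-net toward the average interface solution and match the two
# sub-nets' PDE residuals there. All details are illustrative.

def residual(u, f, x, eps=1e-4):
    u_xx = (u(x + eps) - 2.0 * u(x) + u(x - eps)) / eps**2
    return u_xx - f(x)

def xpinn_loss(u1, u2, f, x_b, u_b, x_r1, x_r2, x_if, w_if=20.0):
    # Per-subdomain PINN losses (each sub-net sees only its own points).
    l1 = np.mean((u1(x_b[:1]) - u_b[:1]) ** 2) + np.mean(residual(u1, f, x_r1) ** 2)
    l2 = np.mean((u2(x_b[1:]) - u_b[1:]) ** 2) + np.mean(residual(u2, f, x_r2) ** 2)
    # Average solution continuity on the interface.
    avg = 0.5 * (u1(x_if) + u2(x_if))
    continuity = np.mean((u1(x_if) - avg) ** 2) + np.mean((u2(x_if) - avg) ** 2)
    # Residual continuity on the interface.
    res_match = np.mean((residual(u1, f, x_if) - residual(u2, f, x_if)) ** 2)
    return l1 + l2 + w_if * (continuity + res_match)

f = lambda x: -np.pi**2 * np.sin(np.pi * x)
exact = lambda x: np.sin(np.pi * x)
shifted = lambda x: np.sin(np.pi * x) + 0.1   # ensemble is discontinuous at x = 0.5
x_b = np.array([0.0, 1.0]); u_b = np.zeros(2)
x_r1 = np.linspace(0.05, 0.45, 25); x_r2 = np.linspace(0.55, 0.95, 25)
x_if = np.array([0.5])
```

Two sub-nets that agree with the exact solution incur (numerically) zero loss, while a pair that disagrees at the interface is penalized by the continuity terms, which is the mechanism XPINN relies on.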
In practice, XPINNv1 may exhibit relatively large errors near the interface; i.e., the interface losses in XPINNv1 cannot necessarily maintain continuity between different sub-PINNs. This is because residual continuity conditions are difficult to enforce accurately for PDEs involving higher-order derivatives. Therefore, de2022error introduces an additional term enforcing the continuity of first-order derivatives between different sub-PINNs to resolve the issue:
(6) 
where is the problem dimension, i.e., . With this additional term on first-order derivatives, we name the corresponding XPINN model XPINNv2.
4 Augmented PINN (APINN)
4.1 Parameterization of Augmented PINN
In this section, we introduce the model parameterization of APINN, shown graphically in Figure 1. We consider a shared network (blue), where is the input dimension and is the hidden dimension; subnets (red), where each ; and a gating network (green), where is the dimensional simplex, for weight-averaging the outputs of the subnets. The output of our augmented PINN (APINN) parameterized by is:
(7) 
where is the th entry of , and is the collection of all parameters in , , and . Both and are trainable in APINN, while can be either trainable or fixed. If is trainable, we name the model APINN; otherwise, we call it APINN-F.
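The parameterization in (7) can be sketched directly: a shared network, several sub-nets applied to the shared features, and a softmax gate that weight-averages the sub-net predictions. The random linear layers and shapes below are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of the APINN parameterization in Eq. (7): a shared
# network h, K sub-nets applied to the shared features, and a gating
# network whose softmax outputs weight-average the sub-net predictions.

def layer(d_in, d_out):
    W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
    return lambda x: np.tanh(x @ W)

d_in, d_h, K = 2, 8, 2
h = layer(d_in, d_h)                             # shared network
subnets = [layer(d_h, 1) for _ in range(K)]      # sub-nets acting on h(x)
gate_body = layer(d_in, K)                       # gating network body

def gate(x):
    # Softmax puts the gate outputs on the simplex: a partition of unity.
    z = np.exp(gate_body(x))
    return z / z.sum(axis=-1, keepdims=True)

def apinn(x):
    g = gate(x)                                               # (n, K)
    feats = h(x)                                              # (n, d_h)
    preds = np.concatenate([E(feats) for E in subnets], -1)   # (n, K)
    return (g * preds).sum(axis=-1)                           # Eq. (7)

x = rng.normal(size=(5, d_in))
```

Because the gate outputs sum to one at every input, each prediction is a convex combination of the sub-net outputs; training the gate deforms the soft decomposition, while fixing it gives the APINN-F variant.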
The APINN is a universal approximator. The detailed proof is as follows.
Proof.
(The APINN is a universal approximator.) Denote the function class of all neural networks by ; it is universal, i.e., for every continuous function and , there exists a neural network such that . In addition, denote the function class of gating networks by , which collects all vector-valued neural networks mapping to . Returning to the APINN model, denote the function class of APINN by , which is
(8) 
If we choose , then APINN degenerates to a vanilla multilayer network, since , i.e., . Therefore, since multilayer neural networks are already universal approximators and they form a subset of the APINN model class, APINN is a universal approximator. ∎
In APINN, is pretrained to mimic the hard and discrete decomposition of XPINN, as discussed in the next subsection. If is trainable, our model can fine-tune the pretrained domain decomposition to discover a better decomposition through optimization. If not, APINN is exactly the soft version of XPINN with the corresponding hard decomposition. APINN improves on PINN thanks to its adaptive domain decomposition and parameter efficiency.
4.2 Explanation of the Gating Network
In this section, we show how the gating network can be trained to mimic XPINNs for soft domain decomposition. Specifically, in Figure 2 left, XPINN decomposes the entire domain into two subdomains based on the interface : the upper one and the lower one . The soft domain decomposition in APINN is shown in Figure 2 (middle and right), which visualizes the pretrained gating networks for the two subnets corresponding to the upper and lower subdomains. Here, is pretrained on and on . Intuitively, the first sub-PINN focuses on where is larger, corresponding to the upper part, while the second sub-PINN focuses on where is smaller, corresponding to the lower part.
Another example is to decompose the domain into an inner part and an outer part, as shown in Figure 3. In particular, we decompose the entire domain into two subdomains: the inner one and the outer one . The soft domain decomposition is generated by the gating functions pretrained on and on , such that the first subnet concentrates on the inner part near , while the second subnet focuses on the rest of the domain.
The gating network can also be adapted for complex domains like the Lshape domain or even highdimensional domains by properly choosing the corresponding gating function.
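To make the soft-partition idea concrete, here is a minimal sketch (ours) of pretraining a logistic gate to mimic a hard split along the line y = 0; the paper's gating network and its pretraining targets are more general than this stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal sketch (ours) of pretraining a gate to mimic a hard XPINN split.
# The hard decomposition is the indicator of the upper half-plane {y > 0};
# a logistic gate g(x, y) = sigmoid(w . (x, y) + b) is fitted to that
# indicator by plain gradient descent.

pts = rng.uniform(-1.0, 1.0, size=(500, 2))
target = (pts[:, 1] > 0).astype(float)   # 1 in the upper subdomain, 0 below

w = np.zeros(2); b = 0.0; lr = 0.5
for _ in range(2000):
    g = 1.0 / (1.0 + np.exp(-(pts @ w + b)))
    err = g - target                      # cross-entropy gradient for a sigmoid
    w -= lr * (err @ pts) / len(pts)
    b -= lr * err.mean()

# After pretraining, sub-net 1's weight g is close to 1 on the upper
# subdomain and close to 0 on the lower one; sub-net 2 receives 1 - g,
# so the pair forms a smoothed version of the hard decomposition.
g = 1.0 / (1.0 + np.exp(-(pts @ w + b)))
```

Near the interface y = 0 the fitted gate transitions smoothly instead of jumping, which is exactly the "soft" decomposition that removes the need for interface losses.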
4.3 Difference in the Position of the Shared Network
We have three options for building the APINN model. The simplest is to omit parameter sharing, in which case the model becomes:
(9) 
The proposed model in this paper is
(10) 
Another method of parameter sharing is to place the shared network outside the weighted average of the subnets:
(11) 
Compared to the first model, our model in equation (10) adopts parameter sharing for each sub-PINN to improve parameter efficiency. Equation (10) generalizes equation (9), which is recovered by choosing the shared network to be the identity mapping. Intuitively, the functions learned by the sub-PINNs should share some similarity, since they are parts of the same target function. The parameter-sharing prior in our model explicitly exploits this intuition and is therefore more parameter-efficient.
Compared to the model in equation (11), our model is more interpretable. In particular, our model in equation (10) is a weighted average of sub-PINNs , so we can visualize each to observe which function it learns. For equation (11), by contrast, there is no clear function decomposition because lies outside the average, so the learned function components cannot be visualized.
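The three placements can be contrasted in a few lines; the toy gate, sub-nets, and shared net below are stand-ins (our assumptions) for the actual networks:

```python
import numpy as np

# Contrast of the three placements of the shared network:
# variant_a is Eq. (9) (no sharing), variant_b is Eq. (10) (sharing inside
# each sub-net, the model adopted in this paper), and variant_c is Eq. (11)
# (shared net outside the weighted average).

def gate(x):                    # fixed two-way soft partition of 1D inputs
    g1 = 1.0 / (1.0 + np.exp(-5.0 * x))
    return np.stack([g1, 1.0 - g1], axis=-1)

subnets = [np.sin, np.cos]      # stand-ins for the sub-nets
shared = np.tanh                # stand-in for the shared network

def variant_a(x):               # Eq. (9): sum_k G_k(x) * E_k(x)
    return (gate(x) * np.stack([E(x) for E in subnets], -1)).sum(-1)

def variant_b(x, h=shared):     # Eq. (10): sum_k G_k(x) * E_k(h(x))
    return (gate(x) * np.stack([E(h(x)) for E in subnets], -1)).sum(-1)

def variant_c(x):               # Eq. (11): h(sum_k G_k(x) * E_k(x))
    return shared(variant_a(x))

x = np.linspace(-1.0, 1.0, 11)
```

Choosing the shared network as the identity collapses variant_b to variant_a, which is the generalization claim made above; variant_b also keeps the output an explicit weighted average of per-sub-net functions, unlike variant_c.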
5 Theoretical Analysis
5.1 Preliminaries
To facilitate the statement of our main generalization bounds, we first define several quantities related to the network parameters. For a network , we denote and for fixed reference matrices , which can vary across networks. We denote its complexity as follows:
(12) 
where signifies the order of the derivative, i.e., denotes the complexity of the th derivative of the network. We further denote the corresponding , , and quantities of the sub-PINN by , and , and those of the gating network by , and .
The train loss and test loss of a model are the same as those of PINN, i.e.,
(13)  
Since the following assumption holds for a vast variety of PDEs, we can bound the test error by the test boundary and residual losses:
Assumption 5.1.
Assume that the PDE satisfies the following norm constraint:
(14) 
where the positive constant does not depend on but on the domain and the coefficients of the operators , and the function class contains all layer neural networks.
The following assumption is widely adopted in related works Luo2020TwoLayerNN; hu2021extended.
Assumption 5.2.
(Symmetry and boundedness of .) Throughout the analysis in this paper, we assume the differential operator in the PDE satisfies the following conditions. The operator is a linear second-order differential operator in non-divergence form, i.e., , where the are given coefficient functions, the are the first-order partial derivatives of the function with respect to its th argument (the variable ), and the are the second-order partial derivatives with respect to its th and th arguments (the variables and ). Furthermore, there exists a constant such that for all , and , the functions and are Lipschitz and their absolute values are bounded by .
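As a concrete instance (our illustrative example), the Laplacian satisfies Assumption 5.2:

```latex
\mathcal{L}u \;=\; \Delta u \;=\; \sum_{i=1}^{d} \frac{\partial^2 u}{\partial x_i \, \partial x_i},
```

whose second-order coefficients are the entries of the identity matrix (symmetric, constant, hence Lipschitz), with vanishing lower-order terms, so the boundedness constant can be taken as 1.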
5.2 A Trade-off in XPINN Generalization
In this subsection, we review the trade-off in XPINN generalization introduced in hu2021extended. Two factors counterbalance each other to affect XPINN generalization: the simplicity of the decomposed target function within each subdomain, thanks to the domain decomposition, and the increased complexity and negative overfitting effect due to the reduced training data available per subdomain. When the former effect dominates, XPINN outperforms PINN; otherwise, PINN outperforms XPINN. When the two factors strike a balance, XPINN and PINN perform similarly.
5.3 APINN with Non-Trainable Gate Network
In this section, we state the generalization bound for APINN with a non-trainable gating network. Since the gating network is fixed, the only complexity comes from the sub-PINNs. The following theorem holds for any gate function .
Theorem 5.1.
Assume that Assumption 5.2 holds. For any , with probability at least over the choice of random samples with boundary points and residual points, we have the following generalization bound for an APINN model : (15)
where
Intuition: The first term is the train loss, and the third is the probability term, in which we divide the probability into for a union bound over all parameters in . The second term is the Rademacher complexity of the model. For the boundary loss, the network is not differentiated, so each contributes , and contributes since it is fixed and is Lipschitz. For the residual loss, the second term behaves similarly. Note that the second-order derivative of APINN is
(16) 
Consequently, each contributes , while each contributes since it is fixed.
5.4 Explaining the Effectiveness of APINN via Theorem 5.1
In this section, we explain the effectiveness of APINNs using Theorem 5.1, which shows that the benefits of APINN come from (1) soft domain decomposition, (2) eliminating interface losses, (3) general target function decomposition, and (4) the fact that each sub-PINN of APINN has access to all training data, which prevents overfitting.
For the boundary loss of APINN, we can apply Theorem 5.1 to each of the APINN's soft subdomains. Specifically, for the th subnet in the th soft subdomain of APINN, i.e., the , the bound is
(17) 
where is the number of training boundary points in the th subdomain.
If the gating network mimics the hard decomposition of XPINN, we may assume that the th sub-PINN focuses on , in particular for , where approaches zero. Note that Theorem 5.1 does not depend on any requirement on the quantity ; we make this assumption only for illustration. The bound then reduces to
(18)  
which is exactly the bound of XPINN if the domain decomposition is hard.
Therefore, APINN inherits the benefit of XPINN, i.e., it can decompose the target function into several simpler parts on some subdomains. Furthermore, since APINN does not require the complex interface losses, its train loss is usually smaller than that of XPINN, and it is free from errors near the interface.
In addition to soft domain decomposition, even if the output of does not concentrate on certain subdomains, i.e., does not mimic XPINN, APINN still enjoys the benefit of general function decomposition, and each sub-PINN of APINN is provided with all training data, which prevents overfitting. Concretely, for the boundary loss of APINN, the complexity term of the model is
which is a weighted average of the complexities of all sub-PINNs. Note that, as with PINN, when viewing APINN over the entire domain, all sub-PINNs can take advantage of all training samples, which prevents overfitting. Ideally, the weighted sum of the parts is simpler than the whole. More specifically, if we train a PINN , the complexity term is . If APINN can decompose the target function into simpler parts whose complexity-weighted sum is smaller than the complexity of the PINN, then APINN can outperform PINN.
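As a stylized numerical illustration (the numbers are ours and purely hypothetical): suppose a single PINN needs complexity 10 to fit the whole target, while the two soft parts only need complexities 4 and 3, with gate outputs bounded by 1. Then, in the schematic form of the weighted average described above,

```latex
\sum_{k=1}^{2} \sup_{x} G_k(x) \, R_k \;\le\; 1 \cdot 4 + 1 \cdot 3 \;=\; 7 \;<\; 10,
```

so the complexity term for APINN is smaller than that of the single PINN, and the generalization bound favors APINN.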
5.5 APINN with Trainable Gate Network
In this section, we state the generalization bound for APINN with a trainable gating network. In this case, both the gating network and the sub-PINNs contribute to the complexity of the APINN model and thus influence generalization simultaneously.
Theorem 5.2.
Let Assumption 5.2 hold. For any , with probability at least over the choice of random samples with boundary points and residual points, we have the following generalization bound for an APINN model :
(19)  
where
Intuition: The proof idea is similar to that of Theorem 5.1, but here we treat the APINN model as a whole. Now, contributes its complexity, , rather than its infinity norm, since it is trainable rather than fixed.
5.6 Explaining the Effectiveness of APINN via Theorem 5.2
By Theorem 5.2, beyond the benefits explained by Theorem 5.1, a good initialization of the soft decomposition inspired by XPINN helps generalization. In that case, the trained gating network's parameters do not deviate significantly from their initialization. Consequently, the quantities for all and are smaller, and thus is smaller, decreasing the right-hand side of the bound in Theorem 5.2, i.e., improving generalization.
6 Computational Experiments
6.1 The Burgers Equation
The one-dimensional viscous Burgers equation is given by
(20)  
The difficulty of the Burgers equation lies in the steep region near , where the solution changes rapidly, which is hard for PINNs to capture. The ground truth solution is visualized in Figure 4 left. In this case, XPINN performs badly near the interface. APINN therefore improves on XPINN, especially in accuracy near the interface, both by eliminating the interface losses and by improving parameter efficiency.
6.1.1 PINN and Hard XPINN
For PINN, we use a 10-layer tanh network of width 20 with 3441 neurons, and provide 300 boundary points and 20000 residual points. We use a weight of 20 on the boundary loss and 1 on the residual loss. We train PINN with the Adam optimizer at a learning rate of 8e-4 for 100k epochs. XPINNv1 decomposes the domain based on whether . Its weights for the boundary loss, residual loss, interface boundary loss, and interface residual loss are 20, 1, 20, and 1, respectively. XPINNv2 shares the same decomposition as XPINNv1, but its weights for the boundary loss, residual loss, interface boundary loss, and interface first-order derivative continuity loss are 20, 1, 20, and 1, respectively. The subnets are 6-layer tanh networks of width 20 with 3522 neurons in total, and we provide 150 boundary points and 10000 residual points for all subnets in XPINN. The number of interface points is 1000. The training points of the XPINNs are visualized in Figure 4 right. We train the XPINNs with the Adam optimizer at a learning rate of 8e-4 for 100k epochs. Both models are fine-tuned with the L-BFGS optimizer until convergence after Adam optimization.

Table 1: Relative errors for the Burgers equation (mean ± standard deviation over 10 runs).

Model  PINN  XPINNv1  XPINNv2
Rel.  1.620E-3 ± 7.632E-4  1.490E-1 ± 6.781E-3  1.304E-1 ± 7.256E-3

Model  APINN-XF  APINN-X  APINN-MF  APINN-M
Rel.  1.293E-3 ± 4.629E-4  9.109E-4 ± 3.689E-4  1.375E-3 ± 6.732E-4  1.137E-3 ± 7.675E-4
6.1.2 APINN
To mimic the hard decomposition based on whether , we pretrain the gating network on the function , so that the first sub-PINN focuses on where is larger and the second sub-PINN focuses on where is smaller. The corresponding model is named APINN-X. In addition, we pretrain the gating network on to mimic the multilevel PINN (MPINN) anonymous2022multilevel, where the first subnet is responsible for the majority part and the second for the minority part. The corresponding model is named APINN-M. All networks have a width of 20. The numbers of layers in the gating network, sub-PINN networks, and shared network are 2, 4, and 3, respectively, giving 3462 or 3543 parameters depending on whether the gating network is trainable. All models are fine-tuned with the L-BFGS optimizer until convergence after Adam optimization.
6.1.3 Results
The results for the Burgers equation are shown in Table 1. The reported relative errors are averaged over 10 independent runs, taking for each run the best error over the whole optimization process. The error plots of XPINNv1 and APINN-X are visualized in Figure 6 left and right, respectively.

- XPINN performs much worse than PINN, due to the large error near the interface, where the steep region is located.
- APINN-X performs best because its parameterization is more flexible than PINN's, and it does not require interface conditions as XPINN does, so it can model the steep region well.
- APINN-M performs worse than APINN-X, meaning that the MPINN initialization is worse than the XPINN one for this Burgers problem.
- APINN-XF, with a fixed gate function, performs slightly worse than the APINN variants with trainable gates, which justifies the flexibility of trainable domain decomposition. However, even without fine-tuning the domain decomposition, APINN-XF still outperforms XPINN significantly, which shows the effectiveness of soft domain partitioning.
6.1.4 Visualization of Gating Networks
Some representative optimized gating networks after convergence are visualized in Figure 7. In the first row, we visualize two gating networks of APINN-X. Although their optimized gates differ, they retain the original left-and-right decomposition with a change in the interface position; thus, their errors are similar. In the second row, we show two gating networks of APINN-M. Their performances differ considerably, and they weight the two subnets differently: the third figure uses a weight of for subnet 1 and for subnet 2, while the fourth figure uses for subnet 1 and for subnet 2. This indicates that the training of the MPINN-type decomposition is unstable, that APINN-M is worse than its XPINN counterpart in the Burgers problem, and that the weighting in the MPINN-type decomposition is crucial to its final performance. From these examples, we see that initialization is crucial for APINN's success: despite optimization, the trained gate remains similar to its initialization.
Furthermore, we visualize the optimization trajectory of the gating network for the first subnet in the Burgers equation in Figure 8, where the snapshots show the gating network at epochs 0, 1E4, 2E4, and 3E4. The gate for the second subnet follows from the partition-of-unity property . The trajectory is smooth, and the gating network gradually converges by shifting the interface from left to right.
6.2 Helmholtz Equation
The Helmholtz equation is used to solve problems in physics including seismology, electromagnetic radiation, and acoustics; it is given by
(21)  
The analytic solution is
(22) 
and is shown in Figure 9 left.
In this case, XPINNv1 performs worse than PINN due to large errors near the interface. With additional regularization, XPINNv2 reduces the relative error by 47% compared to PINN, but it still performs worse than our APINN due to the overfitting caused by the limited training data in each subdomain.
Table 2: Relative errors for the Helmholtz equation (mean ± standard deviation over 10 runs).

Model  PINN  XPINNv1  XPINNv2
Rel.  2.438E-3 ± 5.196E-4  5.222E-2 ± 4.001E-3  1.297E-3 ± 1.786E-4

Model  APINN-XF  APINN-X  APINN-MF  APINN-M
Rel.  1.554E-3 ± 3.203E-4  1.275E-3 ± 4.710E-4  1.911E-3 ± 3.850E-4  1.477E-3 ± 5.679E-4
6.2.1 PINN and Hard XPINN
For PINN, we provide 400 boundary points and 10000 residual points. XPINN decomposes the domain based on whether ; its training points are shown in Figure 9 right. We provide 200 boundary points, 5000 residual points, and 400 interface points for the two subnets in XPINN. The other settings of PINN and XPINN are the same as in the Burgers equation.
6.2.2 APINN
We pretrain the gating network on the function to mimic XPINN, and on to mimic MPINN. For the other experimental settings, please refer to the APINN setup for the Burgers equation.
6.2.3 Results
The results for the Helmholtz equation are shown in Table 2. The reported relative errors are averaged over 10 independent runs, selecting for each run the lowest error during optimization. The error plots of XPINNv1, APINN-X, and XPINNv2 are visualized in Figure 11 left, middle, and right, respectively.

- XPINNv1 performs the worst, since its interface loss cannot enforce interface continuity satisfactorily.
- XPINNv2 performs significantly better than PINN but worse than APINN-X, because it slightly overfits in the two subdomains due to the smaller number of available training samples compared with APINN-X.
- APINN-M performs worse than APINN-X due to the worse initialization of the gating network.
- The errors of APINN, XPINNv2, and PINN concentrate near the boundary, which is due to the gradient pathology wang2021understanding.
6.2.4 Visualization of Optimized Gating Networks
The randomness in this problem is smaller, so the final relative errors of different runs are similar. Some representative optimized gating networks of APINN-X after convergence are visualized in Figure 12. Every gating network approximately maintains the original decomposition into an upper and a lower subdomain, although the interface changes slightly in each run. These observations suggest that the XPINN-type decomposition into an upper and a lower subdomain is already satisfactory for this problem. We also note that XPINNv2 outperforms PINN, which is consistent with this observation.
Furthermore, we visualize the optimization trajectory of the gating network for the first subnet in the Helmholtz equation in Figure 13, with six snapshots of the gating network from epoch 0 to 5E2. The gate for the second subnet follows from the partition-of-unity property of the gating networks, i.e., . The trajectory is similar to that of the Burgers equation, but here the gating network converges much faster.
Table 3: Relative errors for the Klein-Gordon equation (mean ± standard deviation over 10 runs).

Model  PINN  XPINNv1  XPINNv2
Rel.  3.565E-3 ± 9.412E-4  5.980E-1 ± 7.601E-2  3.700E-3 ± 2.741E-4

Model  APINN-XF  APINN-X  APINN-MF  APINN-M
Rel.  3.195E-3 ± 8.112E-4  3.030E-3 ± 1.474E-3  3.197E-2 ± 6.253E-3  2.846E-3 ± 8.568E-4
6.3 Klein-Gordon Equation
In modern physics, the Klein-Gordon equation is used in a wide variety of fields, such as particle physics, astrophysics, cosmology, and classical mechanics; it is given by
(23)  
Its boundary and initial conditions are given by the ground truth solution:
(24) 
and is shown in Figure 14 left. In this case, XPINNv1 performs worse than PINN due to large errors near the interface induced by unsatisfactory continuity between the subnets, while XPINNv2 performs similarly to PINN. APINN performs much better than XPINNv1, and better than PINN and XPINNv2.
6.3.1 PINN and Hard XPINN
The experimental settings of PINN and XPINN are identical to those of the Helmholtz equation, except that XPINN now decomposes the domain based on whether , and Adam optimization is performed for 200k epochs.
6.3.2 APINN
We pretrain the gate net on the function to mimic XPINN, and on to mimic MPINN. For other experimental settings, please refer to the APINN setup for the first (Burgers) equation.
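The pretraining step can be sketched as fitting the gate to a hard indicator target before the main training. The toy numpy version below uses an assumed 1D interface at t = 0.5 and a single sigmoid gate, purely for illustration of the idea, not the paper's actual gate network.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((512, 1))                        # 1D coordinate samples
target = (t > 0.5).astype(float)                # hard XPINN-style indicator

# One sigmoid gate g(t) = sigmoid(w*t + b), pretrained by MSE gradient descent.
w, b, lr = 0.0, 0.0, 1.0
for _ in range(3000):
    g = 1.0 / (1.0 + np.exp(-(w * t + b)))
    grad = 2.0 * (g - target) * g * (1.0 - g)   # d(MSE)/d(logit) per sample
    w -= lr * np.mean(grad * t)
    b -= lr * np.mean(grad)

g = 1.0 / (1.0 + np.exp(-(w * t + b)))
acc = np.mean((g > 0.5) == (target > 0.5))      # how well the soft gate mimics the hard split
assert acc > 0.9
```

After this pretraining, the gate starts from (a smoothed version of) the XPINN decomposition and is then fine-tuned jointly with the subnets.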
6.3.3 Results
The results for the Klein-Gordon equation are shown in Table 3. The reported relative errors are averaged over 10 independent runs. The error plots of XPINNv1, APINNX, and XPINNv2 are visualized in Figure 16 (left, middle, and right, respectively).

XPINNv1 performs the worst, since the interface loss of XPINNv1 cannot enforce the interface continuity well, while XPINNv2 performs similarly to PINN, since the two factors in XPINN generalization reach a balance.

APINN performs better than all XPINNs and PINNs, and APINNM is slightly better than APINNX.
Model | PINN | XPINNv2 | APINNX | APINNM
Rel. error | 1.900E-3 ± 3.375E-4 | 1.378E-3 ± 2.424E-4 | 1.492E-3 ± 7.041E-4 | 1.299E-3 ± 2.941E-4
6.4 Wave Equation
We consider a wave problem given by
(25) 
The boundary and initial conditions are given by the ground truth solution:
(26) 
and is shown in Figure 17 left.
In this example, XPINN is already significantly better than PINN, since it reduces the relative error of PINN by 27%. However, APINN still performs slightly better than XPINN, even though XPINN is already quite accurate.
6.4.1 PINN and Hard XPINN
We use a 10-layer tanh network with 3441 neurons, with 400 boundary points and 10,000 residual points for PINN. We use a weight of 20 on the boundary loss and unit weight on the residual loss. We train PINN using the Adam optimizer for 100k epochs at a learning rate of 8E-4. XPINN decomposes the domain based on whether . The weights for the boundary loss, residual loss, interface boundary loss, interface residual loss, and interface first-order derivative continuity loss are 20, 1, 20, 0, and 1, respectively. The subnets are 6-layer tanh networks of width 20, with 3522 neurons in total, and we provide 200 boundary points, 5000 residual points, and 400 interface points for all subnets in XPINN. The training points of XPINN are visualized in Figure 17 (right). We train XPINN using the Adam optimizer for 100k epochs at a learning rate of 1E-4.
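The loss weighting above amounts to a simple weighted sum of loss terms. A hypothetical helper is sketched below with the stated XPINN weights 20, 1, 20, 0, and 1; the individual loss values are placeholders, not measured quantities.

```python
def composite_loss(losses, weights):
    """Weighted sum of individual loss terms (keys must match)."""
    assert losses.keys() == weights.keys()
    return sum(weights[k] * losses[k] for k in losses)

# Placeholder per-term loss values for one training step (illustrative only).
losses = {"boundary": 0.02, "residual": 0.10,
          "iface_boundary": 0.01, "iface_residual": 0.05, "iface_grad": 0.03}
# Weights from the text: boundary 20, residual 1, interface boundary 20,
# interface residual 0, interface first-order derivative continuity 1.
weights = {"boundary": 20.0, "residual": 1.0,
           "iface_boundary": 20.0, "iface_residual": 0.0, "iface_grad": 1.0}

total = composite_loss(losses, weights)
# 20*0.02 + 1*0.10 + 20*0.01 + 0*0.05 + 1*0.03 = 0.73
assert abs(total - 0.73) < 1e-9
```

Note that the zero weight on the interface residual term effectively removes it from the XPINN objective in this experiment.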
6.4.2 APINN
The APINNs mimic XPINN by pretraining the gate network on and mimic MPINN by pretraining on . For other experimental settings, please refer to the APINN setup for the first (Burgers) equation.
6.4.3 Results
The results for the wave equation are shown in Table 4. The reported relative errors are averaged over 10 independent runs, where each run reports the error at the epoch with the smallest training loss among the last 10% of epochs. The error plots of PINN, XPINNv2, and APINNX are visualized in Figure 19 (left, middle, and right, respectively).

Although XPINN is already much better than PINN and reduces the relative error of PINN by 27%, APINN can still slightly improve over XPINN and performs the best among all models. In particular, APINNM outperforms APINNX.
6.4.4 Visualization of Optimized Gating Networks
Some representative optimized gating networks after convergence are visualized in Figure 20. The first row shows the gate networks of optimized APINNX, while the second row shows those of APINNM. In this case, the variance is much smaller, and the optimized gate nets maintain their characteristics at initialization, i.e., those of APINNX remain an upper-and-lower decomposition and those of APINNM remain a multi-level partition. Gate nets under the same initialization are also similar across independent runs, which is consistent with their similar performances.
6.5 Boussinesq-Burger Equation
Here we consider the Boussinesq-Burger system, a nonlinear water wave model with two unknowns. A thorough understanding of such a model's solutions is important for applications to harbor and coastal design. The Boussinesq-Burger equation under consideration is given by
(27) 
where the Dirichlet boundary condition and the ground truth solution are given in lin2022two and shown in Figure 21 (left and middle) for the two unknowns, respectively. In this experiment, we consider a system of PDEs and test XPINN and APINN with more than two subdomains.
6.5.1 PINN and Hard XPINN
For PINN, we use a 10-layer tanh network, and provide 400 boundary points and 10,000 residual points. We use a weight of 20 on the boundary loss and unit weight on the residual loss. It is trained by Adam kingma2014adam with a learning rate of 8E-4 for 100k epochs.
For the domain decomposition of (hard) XPINN, we design two different strategies. First, an XPINN with two subdomains decomposes the domain based on whether . The subnets are 6-layer tanh networks of width 20, and we provide 200 boundary points and 5000 residual points for every subnet in XPINN. Second, an XPINN4 with four subdomains decomposes the domain based on and into four subdomains, whose training points are visualized in Figure 21 (right). The subnets in XPINN4 are 4-layer tanh networks of width 20, and we provide 100 boundary points and 2500 residual points for every subnet in XPINN4. The number of interface points is 400. The weights for the boundary loss, residual loss, interface boundary loss, interface residual loss, and interface first-order derivative continuity loss are 20, 1, 20, 0, and 1, respectively. We use the Adam optimizer to train XPINN and XPINN4 for 100k epochs with a learning rate of 8E-4. To make a fair comparison, the parameter counts of PINN, XPINN, and XPINN4 are 6882, 7044, and 7368, respectively.
6.5.2 APINN
For APINN with two subdomains, we pretrain the gate net of APINNX on the function to mimic XPINN, and pretrain that of APINNM on the function to mimic MPINN. In APINNX and APINNM, all networks have a width of 20. The numbers of layers in the gate network, sub-PINN networks, and shared network are 2, 4, and 5, respectively, with 6945 parameters in total. For APINN with four subdomains, we pretrain the gate net of APINN4X on the function , where , to mimic XPINN. Furthermore, we pretrain that of APINN4M on the function , and , to mimic MPINN. The pretrained gate functions of APINN4X are visualized in Figure 23. In APINN4X and APINN4M, and are width 20, while is width 18. The numbers of layers in the gate network, sub-PINN networks, and shared network are 2, 4, and 3, respectively, with 7046 parameters in total.
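The wiring of gate network, sub-PINN heads, and shared network can be sketched as follows. This toy numpy forward pass uses untrained random weights, and the exact layer sizes and the shared-trunk-plus-heads arrangement are illustrative assumptions; it only shows how the gated ensemble with partial parameter sharing produces a single output.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP parameters for the given layer sizes."""
    return [(rng.normal(size=(m, n)), rng.normal(size=n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)                      # hidden layers use tanh
    return x

shared = mlp([2, 20, 20])                       # trunk shared by all subnets
heads = [mlp([20, 20, 1]) for _ in range(2)]    # per-subnet output heads
gate = mlp([2, 20, 2])                          # gate network on the input

def apinn(x):
    h = np.tanh(forward(shared, x))             # shared features (parameter sharing)
    subs = np.concatenate([forward(p, h) for p in heads], axis=1)
    logits = forward(gate, x)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = e / e.sum(axis=1, keepdims=True)        # partition-of-unity weights
    return (w * subs).sum(axis=1, keepdims=True)  # gated weight-average of subnets

x = rng.random((5, 2))
y = apinn(x)
assert y.shape == (5, 1)
```

Because every subnet sees every input point through the shared trunk, all training samples contribute to all subnets, unlike the hard XPINN split.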
Model | PINN | XPINN | XPINN4 | /
Rel. error | 1.470E-02 ± 6.297E-03 | 1.456E-02 ± 6.391E-03 | 3.254E-02 ± 1.025E-02 | /
Model | APINNM | APINNX | APINN4M | APINN4X
Rel. error | 1.091E-02 ± 4.588E-03 | 1.388E-02 ± 4.310E-03 | 1.328E-02 ± 8.099E-03 | 2.559E-02 ± 6.554E-03
Model | PINN | XPINN | XPINN4 | /
Rel. error | 1.106E-01 ± 4.498E-02 | 9.786E-02 ± 3.485E-02 | 2.706E-01 ± 9.078E-02 | /
Model | APINNM | APINNX | APINN4M | APINN4X
Rel. error | 8.185E-02 ± 2.973E-02 | 9.623E-02 ± 2.446E-02 | 9.616E-02 ± 5.397E-02 | 1.676E-01 ± 4.946E-02
6.5.3 Results
The results for the Boussinesq-Burger equation are shown in Tables 5 and 6. The reported relative errors are averaged over 10 independent runs, where each run reports the error at the epoch with the smallest training loss among the last 10% of epochs. The key observations are as follows.

APINNM performs the best.

APINN and XPINN with four subnets do not perform as well as their two-subnet counterparts, which may be due to the trade-off between target-function complexity and the number of training samples per subdomain in XPINN generalization. Moreover, more subdomains do not necessarily improve parameter efficiency.

The error of the best performing APINNM is visualized in Figure 24, which is concentrated near the steep regions, where the solution changes rapidly.
6.5.4 Visualization of Optimized Gating Network
We visualize several representative optimized gating networks after convergence with similar relative errors in Figures 25, 26, and 27, for the APINNs with two subnets, APINN4X, and APINN4M, respectively. Note that the variance for this Boussinesq-Burger equation is smaller, so these models have similar performances. The key observation is that the gate nets after optimization maintain their characteristics at initialization, especially for APINNM. Specifically, for APINNM, the optimized gate networks do not change much from the initialization. For APINNX, although the positions and slopes of the interfaces between subdomains change, the optimized APINNX still partitions the whole domain into four upper-to-bottom parts. Therefore, we draw the following conclusions.

Initialization is crucial to the success of APINN, which is reflected in the performance gaps between APINNM and APINNX, since the gate networks after optimization maintain the characteristics at initialization.

APINN with one kind of initialization can hardly be optimized into another kind. For instance, we seldom see the gate nets of APINNM optimized to resemble the decomposition of XPINN.

These observations are consistent with our Theorem 5.2, which states that a good initialization of the gate net contributes to better generalization, since the gate net does not need to change significantly from its initialization.

However, based on our extensive experiments, trainable gate nets still contribute to generalization due to the positive fine-tuning effect, even though fine-tuning cannot turn an MPINN-type APINN into an XPINN-type APINN, or vice versa.
Furthermore, we visualize the optimization trajectory of the gating networks for all subnets in the Boussinesq-Burger equation in Figure 28 in the Appendix, where the snapshots show the gating networks at epochs 0, 10, 20, 30, 40, and 50. The change is fast but continuous.
7 Summary
In this paper, we propose the Augmented Physics-Informed Neural Network (APINN) method, which employs a gate network for soft domain partitioning that can mimic the hard eXtended PINN (XPINN) domain decomposition and is trainable and fine-tunable. The gate network, satisfying the partition-of-unity property, weight-averages several subnetworks as the output of APINN. Moreover, APINN adopts partial parameter sharing for the subnets. It has the following advantages over the state-of-the-art generalized space-time domain decomposition based XPINN method:

APINN does not need the complicated interface losses that XPINN uses to maintain continuity between different subnetworks (sub-PINNs), because the gate network decomposes the entire domain in a soft way; this also contributes to better convergence and lower training loss.

The gate network can mimic the hard decomposition of XPINN, so APINN retains XPINN's advantage of decomposing a complicated target function into several simpler parts, reducing the complexity and improving the generalizability of each subnetwork.

The trainable gate network enables fine-tuning the domain decomposition to discover a better function and domain decomposition with simpler parts, contributing to better generalization based on hu2021extended.

The parameter sharing in APINN exploits the fact that each sub-PINN learns one part of the same target function, so the commonality between parts can be captured by the shared network.

Each subnetwork in APINN takes advantage of all training samples within the domain to prevent overfitting. By contrast, each subnetwork in XPINN can only utilize the training samples in its own subdomain.
All of these benefits are justified empirically on various PDEs and theoretically in hu2021extended using the PINN generalization theory. More specifically, we prove generalization bounds for APINNs with fixed and trainable gate networks. Since APINNs with certain gate networks can recover PINN and XPINN, APINN inherits the advantages of both models thanks to its trainability and flexibility. It is shown that APINN enjoys the benefit of general domain and function decomposition, which reduces the complexity of the optimized networks and thereby improves generalization. In terms of parallelization, APINN shares more data points and parameters across subnets than XPINN, and thus can be more expensive than the XPINN method.
Acknowledgment
A. D. Jagtap and G. E. Karniadakis would like to acknowledge the funding by OSD/AFOSR MURI Grant FA95502010358, and the US Department of Energy (DOE) PhILMs project (DESC0019453).
Appendix A Proof
A.1 Preliminaries
The proof depends on the Rademacher complexity and the covering number, defined below.
Definition A.1.
(Rademacher Complexity). Let $S = \{z_1, \ldots, z_n\}$ be a dataset containing $n$ samples. The Rademacher complexity of a function class $\mathcal{F}$ on $S$ is defined as
$$\mathcal{R}_S(\mathcal{F}) = \mathbb{E}_{\epsilon}\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \epsilon_i f(z_i)\right],$$
where $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed (i.i.d.) random variables taking values uniformly in $\{-1, +1\}$.
Definition A.2.
(Matrix Covering). We use $\mathcal{N}(U, \epsilon, \|\cdot\|)$ to denote the least cardinality of any subset $V \subseteq U$ that covers $U$ at scale $\epsilon$ with norm $\|\cdot\|$, i.e., $\sup_{A \in U} \min_{B \in V} \|A - B\| \le \epsilon$.
They are correlated as below.
Lemma A.1.
bartlett2017spectrally Let $\mathcal{F}$ be a real-valued function class taking values in $[0,1]$, and assume that $0 \in \mathcal{F}$. Then
$$\mathcal{R}_S(\mathcal{F}) \le \inf_{\alpha > 0} \left( \frac{4\alpha}{\sqrt{n}} + \frac{12}{n} \int_{\alpha}^{\sqrt{n}} \sqrt{\log \mathcal{N}\left(\mathcal{F}_{|S}, \epsilon, \|\cdot\|_2\right)}\, d\epsilon \right),$$
where $\mathcal{F}_{|S} = \{(f(z_1), \ldots, f(z_n)) : f \in \mathcal{F}\} \subseteq \mathbb{R}^n$.