Schwarz Waveform Relaxation Physics-Informed Neural Networks for Solving Advection-Diffusion-Reaction Equations

This paper develops a physics-informed neural network (PINN) based on the Schwarz waveform relaxation (SWR) method for solving local and nonlocal advection-diffusion-reaction equations. Specifically, we derive the formulation by constructing subdomain-dependent local solutions through the minimization of local loss functions, allowing the training process to be decomposed over the different subdomains in an embarrassingly parallel procedure. Provided that the underlying PINN method converges, the overall proposed algorithm is convergent. By constructing local solutions, one can, in particular, adapt the depth of the deep neural networks to the solution's spectral space and time complexity in each subdomain. We present some numerical experiments, based on the classical and Robin-SWR methods, to illustrate the performance of the proposed approach and comment on its convergence.




1 Introduction

This paper focuses on the derivation and analysis of a Neural-Network (NN) based Schwarz Waveform Relaxation (SWR) Domain Decomposition Method (DDM) for solving partial differential equations (PDEs) in parallel. We focus in this paper on a simple advection-diffusion-reaction equation; still, the proposed strategy applies to any other evolution equation (in particular wave-like equations) for which the convergence of SWR is proven Antoine and Lorin (2018, 2017); Antoine et al. (2018); Antoine and Lorin (2019b); Halpern and Szeftel (2010); Gander and Halpern (2007); Gander et al. (1999); Antoine and Lorin (2019a); Gander (2003, 2006); Dolean, V. et al. (2015).

We derive a combined DDM-SWR and Physics-Informed Neural Network (PINN) method for solving local and nonlocal advection-diffusion-reaction equations. The PINN approach was developed by Karniadakis and collaborators Raissi et al. (2019); Pang et al. (2019); Yang et al. (2020) and is a general strategy in scientific machine learning for solving PDEs using deep neural networks via the minimization of well-designed loss functions. Notice that a more direct DDM for solving PDEs was also proposed in Jagtap and Karniadakis (2020). Interestingly, both methods (the one in Jagtap and Karniadakis (2020) and the one presented here) could actually be combined; this was however not tested in this paper. Let us also mention a recent paper, Heinlein et al. (2021), where a combination of Schwarz DDM with NN-based solvers is proposed for stationary PDEs. Beyond the derivation of the SWR-NN method, this paper's objective is to exhibit some fundamental properties that make this methodology very promising. The general principle is to solve Initial Boundary Value Problems (IBVPs) by constructing local (subdomain-dependent) solutions obtained by minimizing local loss functions. The overall strategy is convergent (provided that the PINN method is convergent) and allows, in particular, the training process to be decomposed over different subdomains within an embarrassingly parallel procedure. The construction of local solutions also allows the depth of the deep neural network to be adapted locally, depending on the solution's spectral space and time complexity in each subdomain.

In this paper, we will primarily focus on the derivation aspects and will not necessarily detail all the computational aspects, particularly regarding the selection of the training points; these will, however, be specified in the Numerical Experiments section. For convenience, we shall recall some basic facts about PINNs, neural networks and the SWR method for evolution equations, which will be used later in the paper.

1.1 Basics on PINNs

Let us recall the principle of PINNs for solving, e.g., an evolution PDE over $\Omega \times (0,T)$,

$$\begin{cases} \partial_t u + P(u) = f, & \text{in } \Omega \times (0,T),\\ B(u) = g, & \text{on } \partial\Omega \times (0,T),\\ u(\cdot,0) = u_0, & \text{in } \Omega, \end{cases} \qquad (1)$$

where i) $P$ is a differential operator in space, and $B$ is a differential or algebraic boundary operator over the domain boundary $\partial\Omega$; ii) $f$ and $g$ are imposed functions. The PINN approach, which generalizes the DE-solver from Lagaris Lagaris et al. (1998), consists in parameterizing (by, say, $\theta$) a NN $u_\theta$ approximating the solution to (1), by minimizing (a discrete version of) the following loss function

$$\mathcal{L}(\theta) = \big\| \partial_t u_\theta + P(u_\theta) - f \big\|^2_{L^2(\Omega \times (0,T))} + \lambda_1 \big\| B(u_\theta) - g \big\|^2_{L^2(\partial\Omega \times (0,T))} + \lambda_2 \big\| u_\theta(\cdot,0) - u_0 \big\|^2_{L^2(\Omega)},$$

where $\lambda_1, \lambda_2$ are some free positive parameters, and where $\|\cdot\|_{L^2(\Omega \times (0,T))}$ (resp. $\|\cdot\|_{L^2(\partial\Omega \times (0,T))}$) denotes the $L^2$ norm over $\Omega \times (0,T)$ (resp. $\partial\Omega \times (0,T)$).

Practically, the loss functions are constructed by estimating their integrands at a very large number of space-time training points. Hence the $L^2$-norms are not exactly computed, but only approximated. Karniadakis and collaborators have developed numerous techniques to improve the efficiency and rate of convergence of PINN algorithms for different types of PDEs; we refer for instance to Raissi et al. (2019); Pang et al. (2019); Yang et al. (2020) for details. Ultimately, the goal of the PINN strategy is to provide more efficient solvers than standard methods (finite difference, finite volume, finite element, spectral, pseudospectral, etc.) for high-dimensional (stochastic or deterministic) PDEs. As far as we know, this is not yet clearly established, which justifies the active research in this field and the development of new methods.
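To make the discrete loss concrete, the following sketch evaluates a Monte Carlo PINN-style loss for the one-dimensional heat equation. The network architecture, training-point counts, penalty weight, and the initial condition are all illustrative assumptions, and central finite differences stand in for the automatic differentiation used in practice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fully-connected network u_theta(x, t) with one hidden tanh layer.
W1 = rng.normal(size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)); b2 = np.zeros(1)

def u_theta(x, t):
    z = np.tanh(np.stack([x, t], axis=-1) @ W1 + b1)
    return (z @ W2 + b2)[..., 0]

def pinn_loss(nu=0.1, n_interior=256, n_boundary=64, h=1e-3, w_bc=10.0):
    """Monte Carlo estimate of a PINN loss for u_t - nu*u_xx = 0 on
    (0,1) x (0,1), with null Dirichlet data and an illustrative
    initial condition sin(pi*x)."""
    x = rng.uniform(0, 1, n_interior); t = rng.uniform(0, 1, n_interior)
    # Central finite differences stand in for automatic differentiation.
    u_t = (u_theta(x, t + h) - u_theta(x, t - h)) / (2 * h)
    u_xx = (u_theta(x + h, t) - 2 * u_theta(x, t) + u_theta(x - h, t)) / h**2
    interior = np.mean((u_t - nu * u_xx) ** 2)
    # Boundary and initial-condition penalty terms.
    tb = rng.uniform(0, 1, n_boundary)
    boundary = np.mean(u_theta(np.zeros(n_boundary), tb) ** 2) \
             + np.mean(u_theta(np.ones(n_boundary), tb) ** 2)
    xb = rng.uniform(0, 1, n_boundary)
    initial = np.mean((u_theta(xb, np.zeros(n_boundary)) - np.sin(np.pi * xb)) ** 2)
    return interior + w_bc * (boundary + initial)

loss = pinn_loss()
```

Minimizing this scalar over the network parameters (here frozen at random values) is the training step; the random sampling of collocation points is exactly why the norms are only approximated.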

1.2 Basics of Neural Networks

We here recall the basics of neural networks. We denote the neural network by $u_\theta$, where $\theta$ denotes the unknown parameters (weights and biases). Neural networks usually read (for 1 hidden layer, shallow machine learning)

$$u_\theta(x) = \sum_{i=1}^{H} W^{(2)}_i\, \sigma\big( W^{(1)}_i \cdot x + b^{(1)}_i \big) + b^{(2)}, \qquad (2)$$

where $\sigma$ is the sigmoid transfer function, $H$ is the number of sigmoid units, the $W$'s are the weights and the $b$'s the biases. When considering several hidden layers (deep learning), we then have to compose functions of the form (2), Després (2021). That is,

$$u_\theta = A_L \circ \sigma \circ A_{L-1} \circ \cdots \circ \sigma \circ A_1,$$

where, for $1 \leq \ell \leq L$, $A_\ell$ is an affine map defined from $\mathbb{R}^{n_{\ell-1}}$ to $\mathbb{R}^{n_\ell}$ by $A_\ell(x) = W^{(\ell)} x + b^{(\ell)}$, $\sigma$ is an activation function, and $L$ layers are considered. The first layer is the input layer and the $L$-th is the output layer. In fine, $u_\theta$ maps $\mathbb{R}^{n_0}$ to $\mathbb{R}^{n_L}$.
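The layer composition above can be sketched as follows; the tanh activation, the initialization scale and the layer sizes are illustrative choices:

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Weights and biases for a fully-connected network; layer_sizes[0]
    is the input dimension and layer_sizes[-1] the output dimension."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x, sigma=np.tanh):
    """Compose the affine maps A_l(x) = x @ W_l + b_l, applying the
    activation sigma on every layer except the linear output layer."""
    for W, b in params[:-1]:
        x = sigma(x @ W + b)
    W, b = params[-1]
    return x @ W + b

params = init_params([2, 16, 16, 1])   # input (x, t), two hidden layers, scalar output
y = forward(params, np.ones((5, 2)))
```

For a PINN, the input is the space-time point $(x, t)$ and the output the approximate solution value; training adjusts `params` to minimize the loss.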

1.3 Basics on SWR methods for evolution equations

In this subsection, we recall the principle of SWR-DDM for solving evolution PDEs. Consider a $d$-dimensional, first-order-in-time evolution partial differential equation $\partial_t u + P(u) = 0$ in the spatial domain $\Omega$ and time domain $(0,T)$, where $P$ is a linear differential operator in space. The initial data is denoted by $u_0$, and we impose, say, null Dirichlet boundary conditions on $\partial\Omega$. We present the method for 2 subdomains, although in practice an arbitrary number of subdomains can be employed. We first split $\Omega$ into two open subdomains $\Omega_1^\epsilon, \Omega_2^\epsilon$, with or without overlap ($\epsilon > 0$ or $\epsilon = 0$), such that $\Omega = \Omega_1^\epsilon \cup \Omega_2^\epsilon$. The SWR algorithm consists in iteratively solving IBVPs in $\Omega_i^\epsilon \times (0,T)$, using transmission conditions at the subdomain interfaces $\Gamma_i^\epsilon$. The imposed transmission conditions are established using the preceding Schwarz iteration data in the adjacent subdomain. That is, for $i \neq j$, and denoting by $u_i^{(k)}$ the solution in $\Omega_i^\epsilon$ at Schwarz iteration $k$, we consider

$$\begin{cases} \partial_t u_i^{(k)} + P(u_i^{(k)}) = 0, & \text{in } \Omega_i^\epsilon \times (0,T),\\ u_i^{(k)}(\cdot,0) = u_0, & \text{in } \Omega_i^\epsilon,\\ \mathcal{B} u_i^{(k)} = \mathcal{B} u_j^{(k-1)}, & \text{on } \Gamma_i^\epsilon \times (0,T), \end{cases}$$

with a given initial guess $u_j^{(0)}$, where $\mathcal{B}$ denotes a boundary transmission operator and where the $\Gamma_i^\epsilon$ are internal boundaries. The Classical SWR (CSWR) method consists in taking $\mathcal{B}$ as the identity operator, while the Optimized-SWR (OSWR) method consists in taking $\mathcal{B} = \partial_{\mathbf{n}_i} + \lambda$ for some well-chosen $\lambda > 0$ (optimized from the convergence-rate point of view), with $\mathbf{n}_i$ the outward normal vector to $\Gamma_i^\epsilon$. The OSWR method is then a special case of Robin-SWR methods. In addition to providing faster convergence than the CSWR, the OSWR method is often convergent even for non-overlapping DDM; the latter property is of crucial interest from the computational-complexity point of view. We refer to Gander and Halpern (2007); Gander (2003, 2006); Dolean, V. et al. (2015); Gander et al. (1999) for details. The convergence criterion for the Schwarz DDM is typically given, for any $i \neq j$, by

$$\big\| \mathcal{B} u_i^{(k)} - \mathcal{B} u_j^{(k-1)} \big\|_{L^2(\Gamma_i^\epsilon \times (0,T))} \leq \delta,$$

with $\delta$ small enough. When the convergence of the full iterative algorithm is reached at Schwarz iteration $k^{(\mathrm{cvg})}$, one gets the converged local solutions in $\Omega_i^\epsilon \times (0,T)$. The reconstructed global solution $u$ is finally defined as $u = u_i^{(k^{(\mathrm{cvg})})}$ in $\Omega_i^\epsilon \times (0,T)$.
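The Schwarz iteration above can be illustrated on a toy problem: the 1D heat equation solved by implicit Euler on two overlapping subdomains with classical (Dirichlet) transmission conditions. All discretization parameters, the interface locations and the iteration count are illustrative:

```python
import numpy as np

nu, T, nx, nt = 0.1, 0.2, 100, 40
x = np.linspace(0.0, 1.0, nx + 1); dx = x[1] - x[0]; dt = T / nt
i_left, i_right = 40, 60              # interfaces at x=0.4 and x=0.6 (overlap 0.2)
u0 = np.sin(np.pi * x)                # initial data, null Dirichlet boundaries

def solve_subdomain(lo, hi, g_lo, g_hi):
    """Implicit Euler for u_t = nu*u_xx on grid points lo..hi, with
    time-dependent Dirichlet data g_lo, g_hi at the two ends."""
    m = hi - lo + 1
    r = nu * dt / dx**2
    A = (1 + 2 * r) * np.eye(m)
    for j in range(m - 1):
        A[j, j + 1] = A[j + 1, j] = -r
    A[0, :] = 0.0; A[0, 0] = 1.0      # Dirichlet rows
    A[-1, :] = 0.0; A[-1, -1] = 1.0
    U = np.zeros((nt + 1, m)); U[0] = u0[lo:hi + 1]
    for n in range(nt):
        rhs = U[n].copy(); rhs[0] = g_lo[n + 1]; rhs[-1] = g_hi[n + 1]
        U[n + 1] = np.linalg.solve(A, rhs)
    return U

zeros = np.zeros(nt + 1)
g1, g2 = zeros.copy(), zeros.copy()   # initial guess for the interface traces
for k in range(15):                   # Schwarz (CSWR) iterations
    U1 = solve_subdomain(0, i_right, zeros, g1)   # Omega_1 = (0, 0.6)
    U2 = solve_subdomain(i_left, nx, g2, zeros)   # Omega_2 = (0.4, 1)
    # Classical (Dirichlet) transmission: exchange the interface traces
    # computed at the previous iteration.
    g1_new = U2[:, i_right - i_left]  # trace of u2 at x = 0.6
    g2_new = U1[:, i_left]            # trace of u1 at x = 0.4
    residual = max(np.abs(g1_new - g1).max(), np.abs(g2_new - g2).max())
    g1, g2 = g1_new, g2_new

exact = np.sin(np.pi * x) * np.exp(-nu * np.pi**2 * T)
```

At convergence, the interface traces stabilize and each local solution matches the single-domain solution up to discretization error; the overlap size controls the contraction factor, as discussed above.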

1.4 Advection-diffusion-reaction equation

Rather than considering a general situation, for which the rapid convergence of the SWR method and its efficiency are not necessarily proven, we propose to focus on the advection-diffusion-reaction equation, for which both properties are established in Gander and Halpern (2007) (see also Halpern and Szeftel (2010); Antoine and Lorin (2018, 2017); Antoine et al. (2018); Antoine and Lorin (2019b) for the Schrödinger equation). Let us consider the following initial boundary-value problem: find the real function $u$ solution to the advection-diffusion-reaction equation on $\Omega \times (0,T)$,

$$\partial_t u - \nabla \cdot (\nu \nabla u) + \mathbf{a} \cdot \nabla u + r u = 0, \qquad (5)$$

with initial condition $u(\cdot,0) = u_0$, and with real-valued, space-dependent, smooth reaction term $r$, advection vector $\mathbf{a}$ and diffusion coefficient $\nu > 0$.
We recall from Gander and Halpern (2007) that, considering (5) with constant coefficients and for initial and boundary data in suitable Sobolev spaces, there exists a unique weak solution; more regular data yield a correspondingly more regular unique weak solution.

1.5 Organization of the paper

The rest of the paper is organized as follows. In Section 2, we derive the combined SWR-PINN method, and some properties are proposed in Sections 2.2 and 2.4. Section 3 is devoted to some numerical experiments illustrating the convergence of the overall SWR-PINN method. We make conclusive remarks in Section 4.

2 SWR-PINN method

In this section, we propose to combine PINN-based solvers with SWR-DDM to solve the advection-diffusion-reaction equation on a bounded domain $\Omega$, imposing null Dirichlet boundary conditions at $\partial\Omega$. For the sake of simplicity of the presentation, the derivation is proposed for two subdomains; the extension to an arbitrary number of subdomains is straightforward.

2.1 Derivation of the SWR-PINN method

The standard SWR method for two subdomains consists in solving the IBVP using the following algorithm:

$$\begin{cases} \partial_t u_i^{(k)} - \nabla \cdot (\nu \nabla u_i^{(k)}) + \mathbf{a} \cdot \nabla u_i^{(k)} + r u_i^{(k)} = 0, & \text{in } \Omega_i^\epsilon \times (0,T),\\ u_i^{(k)}(\cdot,0) = u_0, & \text{in } \Omega_i^\epsilon,\\ \mathcal{B} u_i^{(k)} = \mathcal{B} u_j^{(k-1)}, & \text{on } \Gamma_i^\epsilon \times (0,T), \end{cases} \qquad (6)$$

where $\mathcal{B}$ is a boundary operator, and where we recall that $i \neq j$. The well-posedness and the convergence of this method, as well as its rate of convergence, were established in Gander and Halpern (2007) for different types of transmission conditions. SWR algorithms can actually be reformulated as fixed-point methods (FPM), and their rate of convergence is hence determined by the contraction factor of the FPM. More specifically, it is proven in Halpern and Szeftel (2010) that, for constant coefficients, the CSWR method is convergent with an overlap-dependent contraction factor, which can in fact be refined to a superlinear convergence rate on bounded time windows. For Robin-SWR methods, with transmission conditions $\mathcal{B} = \partial_{\mathbf{n}_i} + \lambda$, the convergence rate is further improved. We notice in particular that a crucial element for the rate of convergence of SWR methods is the size of the overlapping zone; however, overlapping is not required for Robin-SWR methods to converge.
Rather than a standard approximation of (6) using finite element, finite-difference or pseudospectral methods Antoine and Lorin (2019b); Halpern and Szeftel (2010); Antoine et al. (2018), we then propose to solve this system using a PINN method. We denote by $u_\theta$ the generic NN to optimize, where $\theta$ denotes the unknown parameters. The SWR-NN hence consists in searching for an approximate solution to the SWR method by applying local PINN algorithms; that is, we now consider the system (7), obtained from (6) by replacing the local solutions $u_i^{(k)}$ by local networks $u^{(k)}_{i,\theta_i}$.

Remark 2.0

For the CSWR method ($\mathcal{B}$ is the identity operator), it is proven in Gander and Halpern (2007), among many other well-posedness results, that for sufficiently regular data satisfying some compatibility conditions, the algorithm (6) is well-posed.
Let us now denote by $e_i^{(k)}$ the local error in $\Omega_i^\epsilon$ at Schwarz iteration $k$. In Theorem 3.3 from Gander and Halpern (2007), it is stated that, for any $k$ and for some positive constant, the local errors are controlled by the interface data of the preceding iterations and converge to zero.
Now, if we assume (convergence of the PINN method) that the NN solution to (7) can be made arbitrarily close to the exact local SWR iterate, we then trivially deduce the convergence of the overall PINN-CSWR method. Similar conclusions can be reached for the PINN-OSWR algorithm; in particular, in Gander and Halpern (2007), the OSWR method is shown to be convergent.

First, we use the standard technique of Lagaris et al. (1998) to include the initial condition, by searching for a trial network of the form $u_{i,\theta_i}(x,t) = u_0(x) + t\, v_{i,\theta_i}(x,t)$, which satisfies the initial condition by construction. For the sake of simplicity of notation, we will still denote by $u_{i,\theta_i}$ the neural networks including the contribution of the initial data. Notice that this step is not essential, but it allows the loss functions to be simplified. Hence, at each Schwarz iteration $k$, we solve (7) by minimizing two local "independent" loss functions, for some positive penalty parameters $\mu_i$. In particular, we benefit from local (subdomain-dependent) training processes, which allows us to potentially avoid using the stochastic gradient method, or to improve its convergence; typically, the mini-batches would actually correspond to training points of the local loss functions under consideration. At Schwarz iteration $k$, we hence minimize

$$\mathcal{L}^{(k)}_i(\theta_i) = \big\| \partial_t u_{i,\theta_i} - \nabla \cdot (\nu \nabla u_{i,\theta_i}) + \mathbf{a} \cdot \nabla u_{i,\theta_i} + r\, u_{i,\theta_i} \big\|^2_{L^2(\Omega_i^\epsilon \times (0,T))} + \mu_i \big\| \mathcal{B} u_{i,\theta_i} - \mathcal{B} u^{(k-1)}_{j,\theta_j} \big\|^2_{L^2(\Gamma_i^\epsilon \times (0,T))},$$

where $u^{(k-1)}_{j,\theta_j}$ was computed at the previous Schwarz iteration. Recall that, practically, the loss functions are numerically evaluated by approximating the $L^2$-norms using training points, typically randomly chosen in $\Omega_i^\epsilon \times (0,T)$. This method allows for a complete spatial decoupling of the problem over 2 (or an arbitrary number of) subdomains. Finally, the reconstructed solution is defined subdomain-wise: at each Schwarz iteration $k$, we set $u^{(k)} = u^{(k)}_{i,\theta_i}$ in $\Omega_i^\epsilon \times (0,T)$, and we define the solution to the advection-diffusion-reaction equation as the reconstructed solution at the converged Schwarz iteration.
Practically, in order to evaluate the loss functions, it is necessary to evaluate the equation at a very large number of randomly chosen training points in each subdomain, as the $L^2$-norms are not exactly computed. From the point of view of the computation of the loss function (requiring the evaluation of the PDE at the training points), the algorithm is hence trivially embarrassingly parallel. From the optimization point of view, the method now requires minimizing two loss functions; naturally, the computation of the minimization problems is embarrassingly parallel, as the two IBVPs are totally decoupled. As we are now considering two IBVPs on smaller spatial domains, we can locally adapt the depth of the local networks.
It is important to mention that, unlike SWR methods combined with standard numerical (finite-difference, finite-volume, finite-element, pseudospectral) methods, for which convergence can be proven, the combination of SWR and PINN methods will not necessarily ensure convergence to zero of the residual history. This is due to the fact that, from one Schwarz iteration to the next, the reconstructed solutions may slightly differ, as the minima obtained by minimizing the local loss functions will a priori slightly differ. This fact is actually inherent to NN-based methods. However, we expect the residual history to be small from a practical point of view, provided the loss functions are sufficiently small. In addition to this argument, let us mention that the transmission condition is naturally not exactly satisfied when it is included in the loss function; a large enough weight can, for instance, be imposed on the transmission constraint to ensure that it is accurately satisfied.
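A sketch of one local loss of the kind described above, for subdomain $\Omega_1$ with a Dirichlet (CSWR) transmission penalty: the network, the interface location, the penalty weight and the frozen trace `g_prev` are illustrative, and finite differences replace automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative shallow network for the local solution u_1 in Omega_1.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8,)); b2 = 0.0

def u1(x, t):
    return np.tanh(np.stack([x, t], axis=-1) @ W1 + b1) @ W2 + b2

def local_loss(g_prev, nu=0.1, x_iface=0.6, n_pts=128, h=1e-3, w_tc=10.0):
    """Local loss for Omega_1 = (0, x_iface): PDE residual plus a penalty
    enforcing the Dirichlet (CSWR) transmission condition u1 = g_prev at
    the interface, where g_prev is the trace of the neighbouring solution
    frozen at the previous Schwarz iteration."""
    x = rng.uniform(0, x_iface, n_pts); t = rng.uniform(0, 1, n_pts)
    u_t = (u1(x, t + h) - u1(x, t - h)) / (2 * h)
    u_xx = (u1(x + h, t) - 2 * u1(x, t) + u1(x - h, t)) / h**2
    interior = np.mean((u_t - nu * u_xx) ** 2)
    ti = rng.uniform(0, 1, n_pts)
    transmission = np.mean((u1(np.full(n_pts, x_iface), ti) - g_prev(ti)) ** 2)
    return interior + w_tc * transmission

loss = local_loss(lambda t: np.zeros_like(t))
```

Because `g_prev` is frozen data from the previous Schwarz iteration, the two local losses can be minimized fully independently, which is exactly the embarrassingly parallel structure discussed above.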

2.2 About the interest of using SWR DDM for NN-based algorithms

The estimation of the loss function using the direct PINN method for solving local PDEs is trivially embarrassingly parallel, as the estimation is independently performed at any given training point. However, the associated minimization problem (batch-size related) is not locally specific, and the Stochastic Gradient Method (SGM) is hence an essential tool. In the proposed approach, the local loss functions which are evaluated have specific meanings, and allow accurate approximations of the solution in any given subdomain.

The SWR method is a domain decomposition method in space for solving PDEs. Using standard advection-diffusion-reaction equation solvers, the main algorithmic costs are the loss-function estimations and the computation of solutions to linear systems at each time iteration, as involved in implicit or semi-implicit stable schemes Gander and Halpern (2007). The latter has a polynomial complexity, whose exponent typically depends on the structure of the matrix involved in the linear system. Using a PINN approach, there are no longer linear systems to solve to estimate the solution; instead, an optimization algorithm is necessary to parameterize the NN-based solution. Denote by $N$ the (a priori very large) number of space-time training points used to construct the loss function, and by $P$ the total number of parameters. The computation of the solution using the direct PINN method is decomposed into two parts:

  • Estimation of the loss function, with a complexity linear in the number of training points $N$. This step is hence embarrassingly parallel for local PDEs (with or without combination with the SWR method).

  • Minimization of the loss function, with a complexity polynomial in the number of parameters $P$. Typically, stochastic gradient methods Robbins and Monro (1951); Bottou (2010); Sun et al. (2020) are used to deal with the possibly high dimensionality (for very accurate solutions) of this minimization problem, and allow for a relatively efficient parallelization.

Within the framework of DDM and for two subdomains, the SWR-NN indeed requires the embarrassingly parallel minimization of two independent loss functions constructed using local training points. The empirical argument which justifies the proposed methodology is as follows. The structure and complexity of the solution is expected to be "divided" between the two (much more in practice, of course) spatial subdomains. As a consequence, based on the local structure of the solution in each subdomain, the depth of the local neural networks can be adapted (reduced) compared to the one-domain PINN approach with one unique neural network. The extreme case in that matter would be a domain decomposition into small finite volumes, where the solution would be approximated by a constant (cell-centered finite-volume method), that is, by 0-depth NNs, even if the overall solution has a complex spatial structure. Naturally, the larger the subdomain, the deeper the local neural network associated to this subdomain. For two subdomains, the minimization step within the SWR-NN consists in solving, in parallel, two local minimization problems rather than one global problem (for the direct method). That is, it is possible to decompose the minimization problem into several spatial subregions, where the spectral structure of the solution can be very different from one subdomain to the next, requiring locally smaller depths than using a unique global deep NN. Hence, for the SWR-NN method we have to perform the following tasks.

  • Estimation of the local loss functions, with a complexity linear in the number of local training points. This step is hence embarrassingly parallel within the SWR method and allows dealing with local (subdomain-dependent) training points.

  • Minimization of the local loss functions, with a complexity polynomial in the number of local parameters, which is in principle much smaller than for the global problem.

The downside of SWR-NN methods is that they require repeating the computation of the uncoupled systems several times (that is, until convergence of the Schwarz iterations). Unlike standard SWR-DDM, where the gain lies in the computation of local linear systems of smaller size, the main interest here is that we locally solve local (and less complex) minimization problems, where we expect the size of the search space to be smaller.

Notice that the SWR-DDM allows for an embarrassing parallelization of the overall PINN PDE solver. Indeed, unlike the standard computation of the (local) minima of the loss function, which requires non-trivial, non-embarrassingly-parallel techniques, the proposed approach allows for the embarrassingly parallel computation of the minima of local loss functions. Three levels of parallelization are then possible:

  • Trivial parallelization of the estimation of the local loss functions.

  • Embarrassingly parallel computation of the minima of the local loss functions.

  • In addition, the minimization of a local loss function can also be performed in parallel using the domain decomposition method for PINNs proposed in Jagtap and Karniadakis (2020).

From the computational point of view, the SWR-PINN algorithm allows i) to adapt the depth of (most of) the local NNs compared to using one unique (global) NN, and ii) to estimate the local loss functions using local subdomain-dependent training points, potentially allowing the use of direct (non-stochastic) gradient methods for a sufficiently large number of subdomains. This step is the analogue of the reduction of the size of the linear systems to be solved (scaling effect) within standard SWR methods when they are applied as real-space solvers Antoine and Lorin (2018, 2017); Antoine et al. (2018).

We here summarize the overall computational complexity of SWR-PINN and direct PINN methods.

  • Direct approach: one global estimation-plus-minimization problem over the whole domain. In this case, we expect the number of parameters to be large, and the overall cost depends on the optimization algorithm used.

  • SWR approach: $k^{(\mathrm{cvg})}$ sweeps of local estimation-plus-minimization problems. In this case, we expect the local numbers of parameters to be much smaller; since the minimization complexity is superlinear in the number of parameters, there is a scaling effect which makes this approach potentially more efficient. Moreover, the prefactor is also thought to be much smaller using SWR methods. Practically, it is required for $k^{(\mathrm{cvg})}$ to be small enough. As is well known, the choice of the transmission conditions is a crucial element to minimize $k^{(\mathrm{cvg})}$: Dirichlet transmission conditions are known to provide very slow convergence; at the opposite end of the spectrum, and for wave-like equations, Dirichlet-to-Neumann-like transmission conditions are known to provide extremely fast convergence, but can be computationally complex to approximate. Another way to accelerate the convergence of the SWR algorithm consists in increasing the subdomain overlap (that is, increasing $\epsilon$). For the advection-diffusion-reaction equation, the optimized SWR method, based on optimized Robin transmission conditions, is a good compromise between convergence rate and computational complexity Halpern and Szeftel (2010). As specified above, the computation of the loss function is embarrassingly parallel, unlike for the direct approach.

2.3 Nonlocal operator

We have argued above that the use of SWR methods allows for an efficient parallel computation of the overall loss functions through the efficient estimation (using local training points) of local loss functions. We show below that, whenever nonlocal terms are present in the equation, the efficiency of the SWR-PINN method is not deteriorated by those terms. In the following, we assume that the equation contains a nonlocal operator $\mathcal{N}$, typically defined as a convolution product:

  • a fractional derivative in space, modeling nonlocal effects Lischke et al. (2020); the latter is actually defined as a convolution, and we refer to De Oliveira and Tenreiro Machado (2014) for details;

  • $\mathcal{N}(u) = V * u$, where $*$ denotes the spatial convolution product and $V$ is a nonlocal potential, for some given function $V$.

We consider the equation on a truncated domain $\Omega$ with boundary $\partial\Omega$, as follows:

$$\partial_t u - \nabla \cdot (\nu \nabla u) + \mathbf{a} \cdot \nabla u + r u + \mathcal{N}(u) = 0, \quad \text{in } \Omega \times (0,T),$$

and such that $\mathcal{N}(u)$ is defined as a convolution product in space,

$$\mathcal{N}(u)(x,t) = \int_{\Omega} V(x - y)\, u(y,t)\, dy.$$

Then the SWR-PINN scheme reads

$$\begin{cases} \partial_t u_i^{(k)} - \nabla \cdot (\nu \nabla u_i^{(k)}) + \mathbf{a} \cdot \nabla u_i^{(k)} + r u_i^{(k)} + \mathcal{N}_i^{(k)} = 0, & \text{in } \Omega_i^\epsilon \times (0,T),\\ u_i^{(k)}(\cdot,0) = u_0, & \text{in } \Omega_i^\epsilon,\\ \mathcal{B} u_i^{(k)} = \mathcal{B} u_j^{(k-1)}, & \text{on } \Gamma_i^\epsilon \times (0,T), \end{cases}$$

where the nonlocal term $\mathcal{N}_i^{(k)}$ is evaluated using $u_i^{(k)}$ in $\Omega_i^\epsilon$ and the solution $u_j^{(k-1)}$ computed at the previous Schwarz iteration in the adjacent subdomain, with some transmission operator $\mathcal{B}$. Hence, in this case, we still have to minimize local loss functions.

Practically, we can approximate the convolution product as follows. Denoting by $\{x_l\}_l$ the local spatial training points, we approximate

$$\mathcal{N}(u)(x_l, t) \approx \sum_{m} \omega_m\, V(x_l - x_m)\, u(x_m, t),$$

for some quadrature weights $\omega_m$.
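The quadrature approximation of the convolution can be sketched as follows, assuming a uniform training grid and trapezoidal weights (both illustrative choices); as a sanity check, for a Gaussian kernel and $u \equiv 1$ the convolution at the center of a wide domain is close to $\sqrt{\pi}$, since the tails are negligible:

```python
import numpy as np

def nonlocal_term(K, u_vals, x):
    """Quadrature approximation of (K * u)(x_j) = int K(x_j - y) u(y) dy
    over the training grid x, using trapezoidal weights."""
    w = np.full(x.size, x[1] - x[0])
    w[0] *= 0.5; w[-1] *= 0.5
    return np.array([np.sum(w * K(xj - x) * u_vals) for xj in x])

# Gaussian kernel convolved with u = 1 on a wide grid.
x = np.linspace(-5.0, 5.0, 401)
conv = nonlocal_term(lambda z: np.exp(-z**2), np.ones_like(x), x)
```

In the SWR-PINN setting, `u_vals` in the adjacent subdomain would come from the network frozen at the previous Schwarz iteration, so each local loss evaluation remains fully decoupled.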

As discussed above, the interest of using a DDM is to decompose the training and the search for the local solution over smaller sets of parameters. However, whenever the equation is nonlocal, it is in principle necessary to extend the search of the parameters to the global computational domain. More specifically, for local equations, the local NN solution in $\Omega_1^\epsilon$ (resp. $\Omega_2^\epsilon$) only requires parameters associated to $\Omega_1^\epsilon$ (resp. $\Omega_2^\epsilon$). However, if the equation is nonlocal, in order to construct the solution in $\Omega_1^\epsilon$, we in principle have to search the NN parameters in all of $\Omega$, containing both $\Omega_1^\epsilon$ and $\Omega_2^\epsilon$, as the solution values in $\Omega_1^\epsilon$ depend on the values of the solution in $\Omega_2^\epsilon$. This problem would also occur when constructing the loss function within the direct PINN method, through the nonlocal term.
The SWR-PINN method allows dealing with this issue: at Schwarz iteration $k$, the loss function in $\Omega_1^\epsilon$ is evaluated through the solution in $\Omega_2^\epsilon$ at the previous Schwarz iteration, thanks to the previously computed parameters.

2.4 How about a non-iterative domain decomposition in space?

The domain decomposition method derived in this paper is an iterative SWR-in-space method, allowing for the convergence of the decomposed solution towards the exact solution of the PDE under consideration. The main weakness of this DDM is the fact that the decoupled system has to be solved several times (iterative method). It is hence natural to ask whether a "similar spatial domain decomposition", but non-iterative, is possible.
To this end, we decompose the domain as above, $\Omega = \Omega_1^\epsilon \cup \Omega_2^\epsilon$, with or without overlap ($\epsilon > 0$ or $\epsilon = 0$), and consider (9). That is, we search for a solution defined subdomain-wise, each local solution solving the equation in its own subdomain with some imposed data $g_i$ at the internal interfaces. The PINN method then consists in minimizing the two corresponding local loss functions. Therefore, in this case, we still have to minimize local loss functions. However, there are 2 main issues:

  • Even if $g_i$ is taken equal to zero, the decoupling of the solution in the two subdomains naturally induces a discontinuity at the subdomain interfaces. It is possible to impose additional compatibility conditions (to be included in the loss function), in the form of continuity and differentiability conditions at the interfaces, but the reconstructed global solution will obviously not be an approximate solution to the equation under consideration. Moreover, the compatibility conditions will induce a re-coupling of the two systems, in the spirit of the following item.

  • The two systems, in $\Omega_1^\epsilon$ and $\Omega_2^\epsilon$, are actually also coupled through the nonlocal term. This effect is similar to the addition of a compatibility condition described above. Hence, the computation of the loss functions would no longer be embarrassingly parallel. This is not an issue in the SWR framework, as in the latter case, say at Schwarz iteration $k$, the nonlocal term uses the approximate solution at Schwarz iteration $k-1$, which is a known quantity.

Hence, unlike the SWR-PINN method, for which the convergence property (8) holds, such a non-iterative decomposition does not, in general, provide a convergent approximation of the global solution.
3 Numerical Experiments

In this section, we propose basic experiments in order to numerically illustrate the convergence of the overall method. The PINN algorithm was implemented using deep-learning and optimization toolboxes from Matlab, DeepXDE Lu et al. (2020) and TensorFlow Abadi et al. (2015). Although relatively simple, these experiments are meant to illustrate the proof of concept of the proposed strategy, not to provide the best convergence possible (which will be the purpose of a future work).
Experiment 1. We consider the standard advection-diffusion-reaction equation

$$\partial_t u - \nu \partial_{xx} u + a\, \partial_x u + r u = 0, \qquad (13)$$

on $\Omega \times (0,T)$ with Dirichlet boundary conditions at $\partial\Omega$, with constant coefficients $\nu$, $a$, $r$ and a smooth initial condition. We decompose the domain into two overlapping subdomains $\Omega_1^\epsilon$ and $\Omega_2^\epsilon$. We here use the Classical Schwarz Waveform Relaxation method, based on Dirichlet transmission conditions, where the two local NNs are defined in $\Omega_i^\epsilon \times (0,T)$. We consider the following data: the two NNs have the same numbers of layers and of neurons per layer. We select internal collocation points and use a local SGM with a fixed number of epochs and a fixed mini-batch size; in the gradient method, the learning rate follows a decay schedule. We reconstruct the overall solution on a set of prediction points, starting from a given initial guess. We report the reconstructed solution after the first SWR iteration (resp. at convergence of the SWR algorithm) in Fig. 1 (Left) (resp. Fig. 1 (Right)), from the two local solutions at final time, with a nonzero overlapping zone.

Figure 1: Experiment 1. (Left) Reconstructed solution after the first Schwarz iteration. (Right) Reconstructed solution at convergence of the SWR method.

The SWR convergence rate is defined as the slope of the logarithm of the residual history as a function of the Schwarz iteration number, where (for 2 subdomains) the residual at iteration $k$ measures, in the overlapping zone, the difference between the two local solutions at successive Schwarz iterations, the iteration being stopped once the residual falls below a small parameter $\delta$.
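The convergence rate defined above can be estimated from a residual history by a least-squares line fit in log scale; the synthetic, geometrically contracting history below is illustrative:

```python
import numpy as np

def swr_rate(residuals):
    """Estimate the SWR convergence rate as the slope of log(residual)
    versus the Schwarz iteration number, via a least-squares line fit."""
    k = np.arange(1, len(residuals) + 1)
    slope, _ = np.polyfit(k, np.log(np.asarray(residuals)), 1)
    return slope

# Synthetic residual history contracting by a factor 0.3 per iteration.
history = [0.7 * 0.3**k for k in range(6)]
rate = swr_rate(history)
```

A more negative slope means faster Schwarz convergence; in practice the fit would be applied to the measured residual history, which (as noted below) may plateau at the level of the local loss minima rather than decay to zero.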
We report in Fig. 2 (Left) the convergence graph of the stochastic gradient methods applied to each local loss function; notice that each "oscillation" corresponds to a new Schwarz iteration. We report in Fig. 2 (Right) the convergence graph of the SWR method in the form of the residual history in the overlapping zone. Recall that the combination of convergent SWR methods with standard numerical (finite-element, finite-difference) methods yields a uniform convergence to zero of the residual history as a function of the Schwarz iterations.

Figure 2: Experiment 1. (Left) Converged reconstructed solutions (Right) Graph of local loss functions.

Notice that, unlike the converged SWR method combined with standard numerical methods where the residual goes to zero when goes to infinity, the residual does not exactly go to zero. This is due to the fact that from one Schwarz iteration to the next, the (local) solution are obtained by constructing “new” local minima as the local loss functions are small but not null, and hence change from one iteration to the next.

Experiment 2. In the following experiment, we implement a Robin-SWR method for solving (13), which is expected to provide better convergence than the CSWR Halpern and Szeftel (2010). As discussed in Halpern and Szeftel (2010) and recalled above, the optimized SWR (and more generally Robin-SWR) method is convergent even without overlap, that is, when $\epsilon$ is null. We consider the same equation as above, now decomposing the domain into two non-overlapping subdomains $\Omega_1$ and $\Omega_2$. The Robin transmission conditions, with some chosen $\lambda > 0$, are applied to the two local NNs defined in the two subdomains. We consider the following data: the two NNs have the same numbers of layers and of neurons per layer. We select internal collocation points and use a local SGM with a fixed number of epochs and a fixed mini-batch size; in the gradient method, the learning rate follows a decay schedule. We reconstruct the overall solution on a set of prediction points, starting from a given initial guess. We report the reconstructed solution after the first SWR iteration (resp. at convergence of the SWR algorithm) in Fig. 3 (Left) (resp. Fig. 3 (Right)), from the two local solutions at final time.

Figure 3: Experiment 2. (Left) Reconstructed solution after the first Schwarz iteration. (Right) Reconstructed solution at convergence of the SWR method.

We next report in Fig. 4 (Left) the convergence graph of the stochastic gradient methods applied to each local loss function. We report in Fig. 4 (Right) the convergence graph of the SWR method in the form of the residual history at the subdomain interface.

Importantly, we observe that Robin-SWR-PINN still converges even if the two subdomains do not overlap.

Figure 4: Experiment 2. (Left) Converged reconstructed solutions (Right) Graph of local loss functions.

Experiment 2bis. In the following non-overlapping, 2-domain Robin-SWR experiment, we now consider a space-dependent diffusion coefficient, taking two different constant values in the two subdomains. The rest of the data are as in Experiment 2, and the initial condition is chosen such that the solution has a very different structure in the two subdomains. We want here to illustrate the ability of the derived approach to select different depths for the local neural networks, depending on the structure of the solution: in $\Omega_1$ (resp. $\Omega_2$) the solution is mainly null (resp. oscillatory), except close to the interface. The two local NNs hence have different numbers of layers and of neurons, so that the minimization process in $\Omega_1$ is performed much more efficiently than in $\Omega_2$, with a relatively similar accuracy. As above, we select internal collocation points and use a local SGM with a fixed number of epochs and a fixed mini-batch size, with a decaying learning rate. We reconstruct the overall solution on a set of prediction points, starting from a given initial guess. We report the reconstructed solution after the first SWR iteration (resp. at convergence of the SWR algorithm) in Fig. 5 (Top-Left) (resp. Fig. 5 (Top-Right)), from the two local solutions at final time. We also zoom in, in Fig. 5 (Bottom-Left), on the interface region, to better observe the SWR convergence. The local loss functions are represented in Fig. 5 (Bottom-Right). We observe that, roughly, the computation of the solution in $\Omega_1$ was several times faster than in $\Omega_2$.

Figure 5: Experiment 2bis. (Top-Left) Converged solutions after the first Schwarz iteration. (Top-Right) Reconstructed converged solution. (Bottom-Left) Zoom in of the converged solution in the interface region. (Bottom-Right) Local loss function convergence.

Experiment 3. In this last experiment, we consider a two-dimensional advection-diffusion equation on a square .

with , , and . The two subdomains are and , where ; hence the interfaces are located at . The initial data is a Gaussian function, and the final computational time is . A classical SWR algorithm is here combined with the PINN method. On the other subdomain boundaries, we impose null Dirichlet boundary conditions. The equation is solved using the DeepXDE library Lu et al. (2020) combined with TensorFlow Abadi et al. (2015). In each subdomain, we use a neural network with layers and neurons; the Adam optimizer is used (learning rate , epochs ) along with the activation function. In Fig. 6 (Top), we report the initial data in . In Fig. 6, we represent the solution at the end of the first Schwarz iteration (Left) and the fully converged solution (Right) at final time . In future works, we will propose more advanced simulations. The corresponding code is available on GitHub, where the interested reader can find all the relevant information regarding the code.

Figure 6: Experiment 3. (Top) Initial data in . (Bottom-Left) Solutions after the first Schwarz iteration at time . (Bottom-Right) Reconstructed converged solution at time .

4 Conclusion

In this paper, we have derived a Schwarz Waveform Relaxation Physics-Informed Neural Network (SWR-PINN) method for solving advection-diffusion-reaction equations in parallel. Some preliminary illustrative experiments are presented to validate the approach.

4.1 Pros and cons of the SWR-PINN method

We summarize below the pros and cons of the proposed method.

  • Embarrassingly parallel training of the local loss functions.

  • Parallel construction of local neural networks with adaptive depth and complexity.

  • Provided the underlying PINN algorithm is convergent, the SWR-PINN algorithm is convergent.

  • Flexible choice of the transmission conditions.


  • As a fixed point method, SWR methods require several iterations.

  • The transmission conditions must be accurately satisfied through a penalization term in the loss function in order to accurately implement the SWR algorithm. Ideally, the transmission conditions should be embedded directly in the definition of the NN; this is possible with the CSWR method (Dirichlet-based transmission conditions), by constructing the local NN ansatz so that it satisfies the Dirichlet data at the subdomain interfaces exactly.

  • Convergence, or high precision, of the overall algorithm can be hard to reach if the PINN algorithm is not used with sufficiently high precision. Unstable numerical behavior can also be observed with the CSWR method.
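The fixed-point structure of SWR, independently of the PINN setting, can be illustrated with a self-contained finite-difference sketch: a 1D heat equation solved by backward Euler on two overlapping subdomains with classical Dirichlet transmission (all grid sizes, coefficients, and tolerances below are hypothetical):

```python
import numpy as np

def be_heat(u0, left, right, nu, dx, dt):
    """Backward-Euler march of u_t = nu*u_xx on one subdomain, with
    time-dependent Dirichlet traces `left`/`right` (one value per step)."""
    n, r = len(u0), nu * dt / dx**2
    A = np.eye(n)
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = -r, 1 + 2 * r, -r
    traj, u = [u0.copy()], u0.copy()
    for lb, rb in zip(left, right):
        rhs = u.copy()
        rhs[0], rhs[-1] = lb, rb       # boundary rows of A are identity
        u = np.linalg.solve(A, rhs)
        traj.append(u)
    return np.array(traj)              # shape (steps + 1, n)

nu, dt, steps = 0.1, 1.0e-3, 50
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
i_lm, i_rp = 110, 90                   # Omega1 = [0, x[110]], Omega2 = [x[90], 1]
u0 = np.sin(np.pi * x)                 # initial datum, zero exterior Dirichlet BCs
g1 = np.zeros(steps)                   # trace guess at the right interface x[i_lm]
g2 = np.zeros(steps)                   # trace guess at the left interface x[i_rp]
zero = np.zeros(steps)
for k in range(20):                    # SWR fixed-point loop
    t1 = be_heat(u0[:i_lm + 1], zero, g1, nu, dx, dt)   # local solve on Omega1
    t2 = be_heat(u0[i_rp:], g2, zero, nu, dx, dt)       # local solve on Omega2
    g1_new = t2[1:, i_lm - i_rp]       # trace of u2 at x[i_lm]
    g2_new = t1[1:, i_rp]              # trace of u1 at x[i_rp]
    res = max(np.abs(g1_new - g1).max(), np.abs(g2_new - g2).max())
    g1, g2 = g1_new, g2_new
    if res < 1e-12:                    # interface traces have converged
        break
# at the fixed point, the two local solutions coincide on the overlap
overlap_err = np.abs(t1[-1, i_rp:] - t2[-1, :i_lm - i_rp + 1]).max()
```

In SWR-PINN, the two local solves above are replaced by local PINN trainings, with the interface traces exchanged between Schwarz iterations in the same way.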

4.2 Concluding remarks and future investigations

As far as we know, this paper is the first attempt to combine the SWR and PINN methods. Although the theory of SWR-DDM is now well developed in terms of convergence and convergence rate for different types of evolution PDEs and their approximation by finite difference and finite element methods, the theory of convergence of PINNs is not yet complete. Consequently, the convergence of the overall SWR-PINN method is still subject to a proof of convergence of the latter, which is so far largely established empirically. In future works, we plan to focus on “real-life” experiments where the main benefits of the SWR-PINN method will be exhibited and illustrated.


  • [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
  • [2] X. Antoine, F. Hou, and E. Lorin (2018) Asymptotic estimates of the convergence of classical Schwarz waveform relaxation domain decomposition methods for two-dimensional stationary quantum waves. ESAIM Math. Model. Numer. Anal. 52 (4), pp. 1569–1596.
  • [3] X. Antoine and E. Lorin (2017) An analysis of Schwarz waveform relaxation domain decomposition methods for the imaginary-time linear Schrödinger and Gross-Pitaevskii equations. Numer. Math. 137 (4), pp. 923–958.
  • [4] X. Antoine and E. Lorin (2018) Multilevel preconditioning technique for Schwarz waveform relaxation domain decomposition method for real- and imaginary-time nonlinear Schrödinger equation. Appl. Math. Comput. 336, pp. 403–417.
  • [5] X. Antoine and E. Lorin (2019) Asymptotic convergence rates of Schwarz waveform relaxation algorithms for Schrödinger equations with an arbitrary number of subdomains. Multiscale Science and Engineering 1 (1), pp. 34–46.
  • [6] X. Antoine and E. Lorin (2019) On the rate of convergence of Schwarz waveform relaxation methods for the time-dependent Schrödinger equation. J. Comput. Appl. Math. 354, pp. 15–30.
  • [7] L. Bottou (2010) Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pp. 177–186.
  • [8] E.C. De Oliveira and J.A. Tenreiro Machado (2014) A review of definitions for fractional derivatives and integral. Mathematical Problems in Engineering 2014.
  • [9] B. Després (2021) Analyse numérique et neural networks. Technical report, Université de Paris.
  • [10] V. Dolean, P. Jolivet, and F. Nataf (2015) An introduction to domain decomposition methods: theory and parallel implementation. SIAM.
  • [11] M. Gander and L. Halpern (2007) Optimized Schwarz waveform relaxation methods for advection reaction diffusion problems. SIAM J. Numer. Anal. 45 (2).
  • [12] M.J. Gander, L. Halpern, and F. Nataf (1999) Optimal convergence for overlapping and non-overlapping Schwarz waveform relaxation. In Proceedings of the 11th International Conference on Domain Decomposition, pp. 27–36.
  • [13] M.J. Gander (2003) Optimal Schwarz waveform relaxation methods for the one-dimensional wave equation. SIAM J. Numer. Anal. 41, pp. 1643–1681.
  • [14] M.J. Gander (2006) Optimized Schwarz methods. SIAM J. Numer. Anal. 44, pp. 699–731.
  • [15] L. Halpern and J. Szeftel (2010) Optimized and quasi-optimal Schwarz waveform relaxation for the one-dimensional Schrödinger equation. Math. Models Methods Appl. Sci. 20 (12), pp. 2167–2199.
  • [16] A. Heinlein, A. Klawonn, M. Lanser, and J. Weber (2021) Combining machine learning and domain decomposition methods for the solution of partial differential equations: a review. GAMM-Mitteilungen 44 (1).
  • [17] A. D. Jagtap and G. E. Karniadakis (2020) Extended physics-informed neural networks (XPINNs): a generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun. Comput. Phys. 28 (5), pp. 2002–2041.
  • [18] I.E. Lagaris, A. Likas, and D.I. Fotiadis (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks 9 (5), pp. 987–1000.
  • [19] A. Lischke, G. Pang, M. Gulian, F. Song, C. Glusa, X. Zheng, Z. Mao, W. Cai, M.M. Meerschaert, M. Ainsworth, and G. E. Karniadakis (2020) What is the fractional Laplacian? A comparative review with new results. J. Comput. Phys. 404.
  • [20] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis (2020) DeepXDE: a deep learning library for solving differential equations. arXiv:1907.04502.
  • [21] G. Pang, L. Lu, and G.E. Karniadakis (2019) fPINNs: fractional physics-informed neural networks. SIAM J. Sci. Comput. 41 (4), pp. A2603–A2626.
  • [22] M. Raissi, P. Perdikaris, and G.E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
  • [23] H. Robbins and S. Monro (1951) A stochastic approximation method. Ann. Math. Statistics 22, pp. 400–407.
  • [24] S. Sun, Z. Cao, H. Zhu, and J. Zhao (2020) A survey of optimization methods from a machine learning perspective. IEEE Transactions on Cybernetics 50 (8), pp. 3668–3681.
  • [25] L. Yang, D. Zhang, and G. E. Karniadakis (2020) Physics-informed generative adversarial networks for stochastic differential equations. SIAM J. Sci. Comput. 42 (1), pp. A292–A317.