1 Introduction
Time-dependent partial differential equations (PDEs) are an essential mathematical tool for describing numerous physical processes across disciplines, such as wave propagation (Zhou and others, 2004), quantum transport (Manzano and others, 2012; Ezhov and others, 2016), and cell diffusion (Hinderliter and others, 2010; Lipkova and others, 2019). Solving the associated initial-value and boundary-value problems accurately and in a computationally efficient way is the primary research interest for these PDE problems.
The numerical solution of time-dependent PDEs relies on an appropriate spatio-temporal discretization. Spatial discretization can be implemented using a finite difference, finite element, or finite volume method. For temporal discretization in an Eulerian setting, explicit, implicit, or semi-implicit methods can be used. Explicit temporal update rules generally consist of a single or a few forward computation steps. In contrast, implicit or semi-implicit update rules, such as the Crank–Nicolson scheme, require a fixed-point iterative solver. In all of these methods, smaller time steps and finer spatial resolution yield a more accurate solution but substantially increase the computational burden. Moreover, the admissible time step is upper bounded by numerical stability criteria. In contrast to explicit methods, implicit and semi-implicit methods offer relaxed stability constraints (they are sometimes unconditionally stable) at the expense of the increased computational cost incurred by the iterative solver.
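This stability trade-off can be sketched on a 1D heat equation u_t = D u_xx: with a time step far above the explicit stability limit (roughly dx²/(2D)), the forward-Euler update blows up, while the backward-Euler (implicit) update stays bounded. All sizes and coefficients below are illustrative choices, not values from the paper.

```python
import numpy as np

def laplacian_1d(n, dx):
    """Periodic second-difference (circulant Toeplitz) matrix."""
    L = np.eye(n, k=1) + np.eye(n, k=-1) - 2.0 * np.eye(n)
    L[0, -1] = L[-1, 0] = 1.0
    return L / dx**2

def explicit_step(u, L, D, dt):
    # forward Euler: u_next = u + dt * D * L @ u
    return u + dt * D * (L @ u)

def implicit_step(u, L, D, dt):
    # backward Euler: solve (I - dt * D * L) u_next = u
    return np.linalg.solve(np.eye(len(u)) - dt * D * L, u)

n, dx, D = 64, 1.0 / 64, 1e-3
L = laplacian_1d(n, dx)
u0 = np.random.default_rng(0).standard_normal(n)

dt = 0.5  # far above the explicit stability limit dx**2 / (2 * D) ~ 0.12
ue, ui = u0.copy(), u0.copy()
for _ in range(50):
    ue = explicit_step(ue, L, D, dt)
    ui = implicit_step(ui, L, D, dt)

unstable = np.abs(ue).max() > 1e3                       # explicit scheme diverges
bounded = np.abs(ui).max() <= np.abs(u0).max() + 1e-9   # implicit step obeys a maximum principle
```

The implicit step is unconditionally stable here, but each step requires a linear solve, which is exactly the cost that an iterative (and, later, learned) solver targets.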
In recent times, the use of neural networks, e.g., as proposed by Raissi and others (2019), has gained significant attention for supporting numerical computations. Superior performance has been achieved in solving forward simulations (Magill and others, 2018; Li and others, 2020; Tompson and others, 2017; Ezhov and others, 2020; Kochkov and others, 2021; Greenfeld and others, 2019; Shit and others, 2021) and inverse problems (Papamakarios and Murray, 2016; Lueckmann and others, 2017; Long and others, 2018; Ezhov and others, 2019; Greenberg and others, 2019). In contrast to the well-understood and theoretically grounded classical methods, deep learning-based approaches rely mainly on empirical validity. Recently, Hsieh and others (2019) developed a promising method to learn numerical solvers with a theoretical convergence guarantee. They demonstrate that a feed-forward fully convolutional network (FCN), trained to correct the error of a single iteration of a linear solver, can deliver a faster solution than the hand-crafted solver. Astonishingly, for time-dependent PDEs, the temporal update step of all previously mentioned neural schemes relies on an explicit forward Euler method; none of them exploits the powerful implicit and semi-implicit methods. Incorporating implicit iterative schemes into neural solvers would potentially expand the applicability of neural architectures in real-world applications by solving time-dependent PDEs much more efficiently.
1.1 Our Contribution
In this paper, we introduce the first neural solver for time-dependent PDEs. First, we construct a neural iterator from a semi-implicit update rule for linear PDEs with the Dirichlet boundary condition. Then, we replace a single iteration of the semi-implicit scheme with a learnable parameterized function such that the fixed point of the algorithm is preserved. We provide theoretical guarantees, similar to Hsieh and others (2019), proving that our proposed solution converges to the correct solution and generalizes over parameter settings very different from the ones seen at training time. Subsequently, we show how the method can be extended to other types of boundary conditions through the diffuse domain approximation (Li and others, 2009), which we illustrate on the Neumann boundary condition. Next, we extend our learnable framework beyond linear PDEs to a class of nonlinear PDEs. We validate our method on 2D and 3D experiments with Dirichlet and Neumann boundary conditions and with nonlinear PDEs. Empirically, we show that our model also generalizes well across different complex geometries and produces a more accurate solution than the existing semi-implicit solver at a much lower computational cost.
2 Method
In the following, we present the iterative semi-implicit scheme (Section 2.1), our proposed learnable approximation of the iterative update (Section 2.2), and how to implement the Neumann boundary condition (Section 2.3); we then demonstrate applicability to a broad class of nonlinear PDEs (Section 2.4) and describe the neural solver learning setup (Section 2.5).
2.1 Time-dependent Linear PDEs with Dirichlet Boundary Condition
First, we consider the initial value problem for the variable of interest governed by a time-dependent linear PDE with a Dirichlet boundary condition of the following form,
(1) 
where is the domain (e.g., or ), is the boundary with boundary values, and is the initial value at time . is a linear combination of spatial partial differential operators with corresponding parameters . Without loss of generality, we choose a uniform discretization step for all spatial dimensions, and we can discretize the equation in the following matrix form:
(2) 
where , , and is a diagonal matrix consisting of the values of corresponding to the discrete differential operator of order , which is a Toeplitz matrix. We denote the value of at time as . A first-order semi-implicit update rule to obtain from (with time step ) is given by
(3) 
To obtain , one needs to solve the following linear system of equations
(4) 
where is independent of and is the central element of the central-difference discretization of . Note that for the central-difference scheme, is a real, zero-diagonal, and either circulant or skew-circulant matrix.
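To make the structure of this implicit step concrete, here is a minimal numerical sketch of solving such a system, split into the diagonal (central stencil element) part and the off-diagonal remainder, by a simple fixed-point (Jacobi-type) iteration. The grid size, diffusivity, and time step are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dx, D, dt = 32, 1.0 / 32, 1e-3, 0.1

# Tridiagonal Laplacian (homogeneous Dirichlet) and the implicit system matrix
L = (np.eye(n, k=1) + np.eye(n, k=-1) - 2.0 * np.eye(n)) / dx**2
A = np.eye(n) - dt * D * L
b = rng.standard_normal(n)          # plays the role of the previous time step

diag = np.diag(A)                   # the central stencil element, strictly positive
R = A - np.diag(diag)               # off-diagonal remainder

u = np.zeros(n)                     # arbitrary initialization
for _ in range(500):
    u = (b - R @ u) / diag          # one Jacobi-style fixed-point update

converged = np.allclose(u, np.linalg.solve(A, b), atol=1e-8)
```

Each update is a local stencil operation, which is what allows a convolutional network to mimic (and later improve) one iteration of the solver.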
One can use an iterative scheme to compute from an arbitrary initialization on the right-hand side of Eq. 4. We denote the updated value of as , and for ease of notation, we introduce and as follows:
(5)  
(6) 
Using the and notations, the iterator can be written as:
(7) 
and by enforcing the Dirichlet boundary condition using a projection step with a binary boundary mask , the iterator becomes:
(8) 
where is the boundary value. Notice that is independent of . In Hsieh and others (2019), a formal definition of a valid iterator for solving a linear system of equations is provided, invoking convergence and fixed-point criteria. Using the same definition, we show that with an appropriate choice of and , Eq. 8 is a valid iterator.
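The boundary projection step can be illustrated with a toy sketch: after each linear update, a binary mask keeps the interior values and resets the boundary entries to the prescribed Dirichlet values, so the fixed point automatically satisfies the boundary condition. A plain Jacobi sweep for the Laplace equation serves as a stand-in for the update; grid size and boundary value are illustrative.

```python
import numpy as np

n = 16
u = np.zeros((n, n))
mask = np.zeros((n, n))
mask[1:-1, 1:-1] = 1.0          # 1 in the interior, 0 on the boundary
b_val = np.ones((n, n))         # Dirichlet boundary value (here constant 1)

def project(u):
    # keep interior, overwrite boundary with the prescribed values
    return mask * u + (1.0 - mask) * b_val

def jacobi_laplace(u):
    """One Jacobi sweep for the discrete Laplace equation."""
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0)
            + np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0

for _ in range(2000):
    u = project(jacobi_laplace(u))

# The Laplace equation with constant boundary value 1 converges to u = 1 everywhere
ok = np.allclose(u, 1.0, atol=1e-3)
```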
Theorem (validity). For an appropriate choice of and , the linear iterator is valid. (See Appendix for the proof.)

For simplicity of notation, we drop the superscript of for a single update. Hence, the right-hand side of Eq. 8 can be seen as a linear operator
(9) 
where and .
2.2 Neural Solver
We propose the following end-to-end trainable iterator to replace the analytically derived iterator in Eq. 9, following a structure similar to Hsieh and others (2019):
(10) 
where and are the learnable operators, which satisfy . Substituting in Eq. 10, we get the linear neural operator
(11) 
where is independent of , and . In the following, we show that the neural iterator has the same fixed point () as the hand-designed iterator in Eq. 8.
Lemma (fixed point). For a given linear PDE problem and any choice of that satisfies , a fixed point of is also a fixed point of . The proof follows trivially from the definition.
By construction, the neural iterator in Eq. 10 inherits the properties of the iterator proposed by Hsieh and others (2019). The most notable one is that if , then . Furthermore, if , then , since
(12) 
which is equal to two iterations of . Computing requires two separate convolutions: i) , and ii) . Since a convolution operation needs roughly the same computation time as a differential operator, one iteration of has the same order of complexity as two iterations of . This shows that we can learn a set of such that the proposed iterator performs at least as well as the standard solver .
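The fixed-point-preserving construction described above can be sketched numerically (in the spirit of Hsieh and others, 2019): a hand-designed linear iterator is wrapped with a learned linear correction applied to its residual, so whatever the learned operator is, any fixed point of the original iterator is unchanged, because the residual vanishes there. The random matrix below is a stand-in for the trained operator, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
A = np.eye(n) * 3.0 + rng.standard_normal((n, n)) * 0.1  # diagonally dominant system
b = rng.standard_normal(n)

def phi(u):
    """One Jacobi sweep for A u = b (the hand-designed iterator)."""
    d = np.diag(A)
    return (b - (A - np.diag(d)) @ u) / d

C = rng.standard_normal((n, n)) * 0.05   # stand-in for the learned operator

def phi_learned(u):
    v = phi(u)
    return v + C @ (v - u)   # the correction vanishes at the fixed point

u_star = np.linalg.solve(A, b)           # the true fixed point of phi
preserved = np.allclose(phi_learned(u_star), u_star)
```

Training can therefore only change how fast the iterator converges, never where it converges to.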
Following the theoretical framework of Hsieh and others (2019), we extend their results to the case of the proposed neural iterator. The usefulness of any trainable iterator naturally depends on its training efficacy and its ability to generalize. Thus, for trainability, we first show that the spectral norm of is a convex function of and that the relevant space of is a convex open set. Subsequently, we show that the iterator generalizes over arbitrary initializations, boundary values, and PDE parameters.
Lemma (convexity). For fixed , the spectral norm of is a convex function of , and the set of such that the spectral norm of is a convex open set. (See Appendix for the proof.)
To prove generalizability, we first need an upper bound on the spectral norm of .
Lemma (validity). For a choice of and , if is a valid iterator, then is also a valid iterator. (See Appendix for the proof.)

In stark contrast to the previous work by Hsieh and others (2019), we have several sets of parameters , and attached to the governing equation of the PDE. The following theorem allows us to train the proposed model on finitely many parameter settings while still generalizing well.
Theorem (generalization). For fixed and , and some and , if is a valid iterator for the PDE problem , then for all and , the iterator is a valid iterator for the PDE problem , if and are chosen such that the constraints given in the Appendix hold. (See Appendix for the proof.)
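The validity condition used throughout — convergence for every initialization and right-hand side exactly when the spectral radius of the linear part is below one, with the spectral norm as a convenient convex upper bound — can be checked numerically. The sketch below applies this check to a Jacobi-style iterator for an implicit diffusion step; all coefficients are illustrative.

```python
import numpy as np

n, dx, D, dt = 32, 1.0 / 32, 1e-3, 0.1
L = (np.eye(n, k=1) + np.eye(n, k=-1) - 2.0 * np.eye(n)) / dx**2
A = np.eye(n) - dt * D * L

d = np.diag(A)
M = -(A - np.diag(d)) / d[:, None]        # linear part of the Jacobi-style iterator

rho = np.abs(np.linalg.eigvals(M)).max()  # spectral radius: governs convergence
norm2 = np.linalg.norm(M, 2)              # spectral norm: upper bounds the radius
valid = (rho < 1.0) and (rho <= norm2 + 1e-9)
```

In a training loop, bounding the (convex) spectral norm of the learned linear part is the practical proxy for keeping the iterator valid.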
2.3 Neumann Boundary Condition
Li and others (2009) proposed the diffuse domain approach to accurately approximate a Neumann boundary condition on a complex geometry by a Dirichlet boundary condition on a simple geometry. They introduce a phase-field function that provides a smooth transition between the domain and the boundary. We consider the following Neumann boundary condition
(13) 
where is the unit outward surface normal at the boundary.
As shown in Fig. 1, the computational domain can be realized as a phase-field function , where inside the domain and outside the domain. At the boundary, this creates a smooth transition from .
Now, we solve the following approximated PDE problem
(14) 
It can be shown that, without the smoothness of the phase-field function at the boundary, Eq. 14 reduces to Eq. 13 (since within the domain we recover the original PDE, and at the boundary gives the original boundary condition). The phase-field approximation addresses the following difficulty: for voxel-based computation, the exact surface normal orientation can only be approximated up to an accuracy that depends on the discretization resolution. The resulting noisy surface normal orientation leads to an unstable boundary condition for highly irregular boundary shapes, which are a common feature of many applications. When we instead use the smooth transition at the boundary, we obtain a consistent orientation of the resulting surface normals at the cost of a reduced surface normal magnitude. Eq. 14 is also of linear form and can easily be converted to the solution formulation of Eq. 8; hence, we can apply the proposed neural solver.
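The phase-field idea can be illustrated on a voxel grid: smoothing a binary domain mask spreads the mask's gradient (the diffuse surface normal) over a thin band at the boundary, giving a consistent normal direction at a reduced magnitude. The circular domain and the box-smoothing construction below are illustrative stand-ins, not the construction used in the paper.

```python
import numpy as np

n = 64
y, x = np.mgrid[0:n, 0:n]
# binary mask of a circular domain of radius n/4
phi = (((x - n / 2) ** 2 + (y - n / 2) ** 2) < (n / 4) ** 2).astype(float)

for _ in range(10):  # simple box smoothing as a stand-in phase-field construction
    phi = (phi
           + np.roll(phi, 1, 0) + np.roll(phi, -1, 0)
           + np.roll(phi, 1, 1) + np.roll(phi, -1, 1)) / 5.0

gy, gx = np.gradient(phi)
gmag = np.hypot(gx, gy)                  # magnitude of the diffuse normal field
r = np.hypot(x - n / 2, y - n / 2)

# the gradient concentrates in a band around the circle r = n/4,
# while deep inside the domain phi stays ~1 and the gradient ~0
concentrated = gmag[np.abs(r - n / 4) < 3].mean() > 10 * gmag[r < n / 8].mean()
```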
2.4 Extending to a Class of Nonlinear PDEs
We consider the following class of nonlinear PDEs described in Boscarino and others (2016):
(15) 
where is any linear operator and is a non-stiff nonlinear operator. Since is non-stiff, we can use an explicit scheme for the nonlinear component, while we continue to use the semi-implicit formulation for . This leads to an implicit-explicit scheme (Boscarino and others, 2016), which is used in many problems, including convection-diffusion equations, reaction-diffusion equations, and collisional kinetic equations. Because an explicit scheme is used for , the nonlinear term enters the iterative update rule only as a constant. Thus, the theoretical guarantees remain valid for this particular class of nonlinear PDEs.
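A minimal sketch of such an implicit-explicit (IMEX) step for a 1D reaction-diffusion equation u_t = D u_xx + ρ u(1 − u): the non-stiff logistic reaction is evaluated explicitly and enters the implicit diffusion solve only through the right-hand side, mirroring the structure of the Fisher–Kolmogorov equation used later. All parameters are illustrative.

```python
import numpy as np

n, dx, D, rho, dt = 64, 1.0 / 64, 1e-3, 1.0, 0.1
L = (np.eye(n, k=1) + np.eye(n, k=-1) - 2.0 * np.eye(n)) / dx**2
A = np.eye(n) - dt * D * L                   # implicit operator for the stiff diffusion

x = np.linspace(0.0, 1.0, n)
u = 0.01 * np.exp(-((x - 0.5) / 0.1) ** 2)   # small initial seed

for _ in range(200):
    b = u + dt * rho * u * (1.0 - u)   # explicit, non-stiff reaction term (a constant in the solve)
    u = np.linalg.solve(A, b)          # implicit diffusion solve

# logistic growth saturates toward the carrying capacity without overshoot
saturated = (u.max() > 0.5) and (u.max() <= 1.0 + 1e-9)
```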
2.5 Learning Setup
We model each of with a separate three-layer convolutional neural network without any bias or nonlinear activation function (c.f. Fig. 2). For each time step, the network takes the initial variable, the PDE parameters, and the hyperparameters as input and produces the solution for the next time point using a fixed number of iterative forward passes. We denote a single iteration for a single time step as . Starting from , we denote the solution at after iterations, i.e., , as . This process repeats for time steps, yielding the solutions . The reference solutions can easily be obtained by running the iterator with a sufficiently large until machine-precision convergence. We minimize the following mean squared loss function,
(16) 
where is the maximum number of time steps. We select the number of iterations from a uniform distribution between to avoid overfitting during training.

3 Experiments
In this section, we present experimental validation for the proposed neural solver. We aim to answer the following questions:

Q1. How well does the neural solver converge for different sets of PDE parameters?

Q2. How well does the proposed solver perform for different classes of linear PDEs in different dimensionalities?

Q3. Does the neural solver generalize over computation domains that are not seen during training?

Q4. How does it perform for the class of nonlinear PDEs described in the method section?

Q5. What are the accuracy and runtime gains of the neural solver compared to traditional numerical schemes?

To answer these questions, we design two experiments, in 2D and 3D space, with linear and nonlinear PDEs, with different PDE parametrizations, in different simulation domains, and with different boundary conditions. In Sec. 3.1 and 3.2, we detail the two experiments, from which we draw precise answers to Q1–Q5; a brief discussion of the results follows in Sec. 3.3.
3.1 2D Linear PDE with Dirichlet Boundary
We consider a 2D advectiondiffusion equation of the following form
(17) 
where and are the advection velocity and diffusivity, respectively.
3.1.1 Data Generation
We follow the experimental setup of Long and others (2018) and consider a rectangular domain of . The elements of and are drawn from uniform distributions of and , respectively. The computational domain is discretized by a 64 × 64 regular mesh. We assume a zero Dirichlet boundary condition, and the initial value is generated according to Long and others (2018) as , where and are drawn from a normal distribution of , and and have random values from a uniform distribution of . We generate 200 simulations, each with 50 time steps, using FEniCS (Alnæs and others, 2015) for . FEniCS uses a finite element discretization, and we found that the FEniCS solution is identical to the one we obtain from our semi-implicit solver with a sufficiently large number of iterations per time step. This confirms that there is no discrepancy in the discretization and the implementation of the boundary conditions between our semi-implicit and neural solvers. An exemplary time series from the test data is shown in Fig. 3.

3.1.2 Experimental Details
We split the simulated time series into train, test, and validation sets in the ratio . During training, we fix the following parameters: . The elements of and are drawn from the same distributions as before. To validate the generalizability of the neural scheme, we investigate parameter settings different from those used during training. To study the effect of a different , we use the original test set. We generate two additional test cases, varying one parameter at a time: a) , and b) . We implement all differential operators using convolutional kernels. The convolutions have kernel size 3 × 3 and are unique for each differential operator. Following Hsieh and others (2019), we use a three-layer convolutional neural network to model each of the , with constant padding of one in between. The network is trained using the Adam optimizer with a learning rate of , betas 0.9 and 0.99, for 20 epochs. The total training time is ~6 hours on a Quadro P6000 GPU.

3.1.3 Results
From Fig. 4, we see that the error of the neural scheme is lower than that of the semi-implicit solution for all three test sets, with varying , , and , respectively. Note that we sample the values of from the same ranges as in the training set; however, the exact values are different. We observe that one neural solver iteration takes approximately twice the time of one semi-implicit iteration. Hence, for a fair comparison, we use 10 iterations per time step for our neural solver versus 25 iterations for the semi-implicit solver. This experiment thus affirms our hypothesis that the neural solver is more accurate than the semi-implicit solution while at the same time generalizing to other PDE hyperparameter settings.
Fig. 3 shows a typical test sample solution from the neural and semi-implicit schemes against our reference FEniCS solution. Qualitatively, we find that the neural solver produces an accurate solution with fewer iterations, which suggests that the learned CNN achieves faster convergence.
For the runtime comparison, we chose CPU time because we also want to compare against FEniCS, which runs on the CPU. We compare the runtime of the neural solver (10 iterations per time step) and the semi-implicit scheme (25 iterations per time step). Note that here we only measure the test time of the neural solver after it is trained. The experiments are conducted on an Intel Xeon W-2123 CPU @ 3.60 GHz, with the code running on one of the four cores. Fig. 5 shows the ratios of runtime and error between the neural and semi-implicit solvers. We performed two experiments: we run both solvers for the same time and compare their errors, and we set the same average error and compare their runtimes. From Fig. 5a, we can clearly see that for a fixed runtime the neural solver produces a 49.3% error reduction, whereas for a given error acceptance it is 19.2% faster (Fig. 5b). We also observe that the trained neural solver takes circa 0.0148 s compared to 0.0141 s for the semi-implicit scheme, whereas the FEniCS solution takes 3.19 s for machine-precision convergence.
3.2 3D Nonlinear PDE with Neumann Boundary
We consider a Fisher–Kolmogorov equation for modelling tumor cell density in the human brain, as used, e.g., in Lipkova and others (2019). This equation consists of a reaction term and a diffusion term, as follows
(18) 
where is given by:
(19) 
and denote the percentages of white and gray matter tissue at each voxel, respectively. The constants and describe the tumor infiltration rate in white and gray matter, respectively, and it is assumed that . The unit of is []. parameterizes the proliferation rate (the number of cells that divide per day); its unit is .
This equation differs in several ways from the previously considered 2D equation and from other equations commonly used in the existing literature on leveraging neural networks in solver strategies. The Fisher–Kolmogorov equation is a nonlinear time-dependent PDE with spatially varying PDE parameters, subject to a Neumann boundary condition; to the best of our knowledge, it is used here for the first time with a neural solver. It is also important to point out that the human brain possesses a very irregular geometry, which offers a wide variety of simulation domains. Although there is no theoretical guarantee that the learned neural solver will generalize to an unseen domain, e.g., one arising from the brain anatomy of a new patient, we show experimentally that it does.
Following the diffuse domain approach in Sec. 2.3, we reformulate the problem as
(20)  
(21) 
This allows us to recast the Neumann boundary condition as a Dirichlet boundary condition given by:
(22) 
where is the entire voxel volume. To adapt this reformulation to our update rule, we write
(23) 
3.2.1 Data Generation
For our training data, we randomly generate 72 initial conditions with different and at random initial tumor locations. To acquire the reference data, we let a semi-implicit solver run to convergence for each time step; the reference solution is, in this case, the fixed point of the iterator on which the neural solver is built. We obtain the reference solution by running the semi-implicit solver in implicit mode ( for maximum convergence speed) until convergence. This way, the ground truth data need not be simulated and stored beforehand and can be computed on the fly during training. Per initial condition, we run the training for 19 time steps with days and mm. Thus, in the end, we have 20 time points for each sample.
3.2.2 Experimental Details
We randomly assign 49 samples to the training set and 23 samples to the test set. After generating the training data and computing the reference, we run the neural solver for a random number of iterations (between 5 and 10) per time step. The cost is evaluated as the mean squared error between the previously calculated fixed point (the reference solution) and the neural output, as given in Eq. 16. We implement all differential operators using convolutional kernels. The convolutions have kernel size 3 × 3 × 3 and are unique for each differential operator. The network is trained using the Adam optimizer with a learning rate of 1e-3 and betas 0.9 and 0.99. The training error converged after 13 epochs.
3.2.3 Results
To better understand the mechanism of the solver, we visually examine a few iterations of an example solution of the PDE in Fig. 7. Importantly, Fig. 7 highlights that the neural part works as a correction rather than a standalone prediction. The neural net has learned to correct the errors of the semi-implicit iterator so that the fixed point is reached faster. Note, however, that the speed of growth naturally depends significantly on the parameters. Another important observation is that the correction terms are stronger in the initial iterations, since the discrepancy between the fixed point and the current solution is larger there.
Fig. 6 shows a qualitative visual comparison of the neural solver and the semi-implicit solver against the reference solution. While both solvers took the same runtime, the neural solver shows faster convergence as the time step increases.
Fig. 8 shows the ratios between the semi-implicit solver and the neural solver in terms of computation time and mean absolute error on the test data. As in the 2D case, we performed two experiments, testing the solvers' error at a fixed running time and their running time at a fixed error level. We observe that for a given runtime, the neural solver reduces the error by 87% (Fig. 8a). In addition, for a given error acceptance, the neural solver is 35% faster (Fig. 8b).
In our experiments, the test set consists of parameters and computation domains that were not seen during training. This confirms that the solver is able to generalize over unseen parameters and computation domains.
3.3 Discussion
From the two experiments, we draw the following observations.

- Both experiments confirm the theoretical convergence property of the neural solver in different PDE parameter settings.

- Our solver applies well to the given advection-diffusion and reaction-diffusion equations in 2D and 3D, which indicates its generalizability to other linear PDEs as well.

- In the 3D experiment, we have very different computation domains in different regions of brain tissue. Hence, we empirically show that our solver generalizes to arbitrary boundary shapes.

- The reaction-diffusion experiment shows that our model handles the specific class of non-stiff nonlinear PDEs very well.

- We observe a significant speedup compared to the traditional solver in both experiments, more pronounced in the 3D setting than in 2D. We hypothesize that in 3D, the neural solver can leverage the domain-specific, spatially varying PDE parameters more efficiently in the learned PDE solutions.
4 Conclusion
This work introduces a novel implicit neural scheme to solve time-dependent PDEs with arbitrary geometries and boundary conditions. We leverage an existing semi-implicit update rule to design a learnable iterator that provides theoretical guarantees. The learned iterator achieves faster convergence than the existing semi-implicit solver and produces a more accurate solution for a fixed computation budget. Importantly, we demonstrate empirically that training on a single parameter setting is enough to generalize over other parameter settings, which confirms our theoretical results. The learned neural solver offers a computationally efficient and scalable alternative to standard numerical approaches. The increased computational efficiency expands computational possibilities and enables simulations of complex real-world systems.
Acknowledgement
S. Shit and I. Ezhov are supported by the Translational Brain Imaging Training Network under the EU Marie Skłodowska-Curie programme (Grant ID: 765148). We thank Prof. Dr. Elisabeth Ullmann for giving critical feedback on this work.
Appendix A Proofs
Proof of the validity theorem (Section 2.1). The spectral radius of can be bounded by:
Thus, given , we have . We can therefore conclude that the newly introduced iterator from Eq. 8 is a valid fixed-point iterator for a proper choice of and .
Proof of the convexity lemma (Section 2.2). The proof is similar to that of the validity theorem. The spectral norm is convex by the subadditivity property and is linear in . To prove openness, observe that is a continuous function, so is continuous in . Given , the set of is the preimage under this continuous function of for some , and the inverse image of an open set must be open.
Proof of the validity lemma (Section 2.2). Considering the spectral norm of and invoking the product and triangle inequalities for norms, we obtain the following tight bound:
Given , we have , hence
Proof of the generalization theorem (Section 2.2). From the validity theorem and the convexity lemma, we know that our iterator is valid if and only if . From the validity lemma, the upper bound on the spectral norm of the iterator depends only on and , given . Moreover, for any matrix, the spectral radius is upper bounded by its spectral norm. Thus, if the iterator is valid for some and , then it is valid for any choice of and that satisfies the constraints.
References
- The FEniCS project version 1.5. Archive of Numerical Software 3 (100).
- High order semi-implicit schemes for time dependent partial differential equations. Journal of Scientific Computing 68 (3), pp. 975–1001.
- Influence of screening on longitudinal-optical phonon scattering in quantum cascade lasers. Journal of Applied Physics 119 (3), pp. 033102.
- Neural parameters estimation for brain tumor growth modeling. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention.
- Real-time bayesian personalization via a learnable brain tumor growth model. arXiv preprint arXiv:2009.04240.
- Automatic posterior transformation for likelihood-free inference. In International Conference on Machine Learning, pp. 2404–2414.
- Learning to optimize multigrid PDE solvers. In International Conference on Machine Learning, pp. 2415–2423.
- ISDD: a computational model of particle sedimentation, diffusion and target cell dosimetry for in vitro toxicity studies. Particle and Fibre Toxicology 7 (1), pp. 1–20.
- Learning neural PDE solvers with convergence guarantees. In Proceedings of the International Conference on Learning Representations.
- Machine learning–accelerated computational fluid dynamics. Proceedings of the National Academy of Sciences 118 (21).
- Solving PDEs in complex geometries: a diffuse domain approach. Communications in Mathematical Sciences 7 (1), pp. 81.
- Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895.
- Personalized radiotherapy design for glioblastoma: integrating mathematical tumor models, multimodal scans, and bayesian inference. IEEE Transactions on Medical Imaging 38 (8), pp. 1875–1884.
- PDE-net: learning PDEs from data. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 3208–3216.
- Flexible statistical inference for mechanistic models of neural dynamics. In NeurIPS, pp. 1289–1299.
- Neural networks trained to solve differential equations learn general representations. In Advances in Neural Information Processing Systems, pp. 4071–4081.
- Quantum transport efficiency and Fourier's law. Physical Review E 86 (6), pp. 061118.
- Fast free inference of simulation models with bayesian conditional density estimation. In NeurIPS, pp. 1028–1036.
- Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
- Velocity-to-pressure (v2p)-net: inferring relative pressures from time-varying 3d fluid flow velocities. In International Conference on Information Processing in Medical Imaging, pp. 545–558.
- Accelerating Eulerian fluid simulation with convolutional networks. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 3424–3433.
- The periodic wave solutions and solitary wave solutions for a class of nonlinear partial differential equations. Physics Letters A 323 (1-2), pp. 77–88.