I. Introduction
Dynamical systems are typically modeled using differential equations that capture their input-output behavior. Particularly, for spatiotemporal systems the underlying dynamics are modeled using partial differential equations (PDEs) whose states live in infinite-dimensional spaces. Except for some special cases, only approximate solutions to such systems can be obtained using discretization-based numerical methods. These methods typically belong to one of three main categories, namely, finite difference (FD) methods, finite element (FE) methods, and finite volume (FV) methods [1]. The FD methods are often restricted to simple domains, whereas the FE and FV methods can be applied to more general problems. These methods return solutions defined over a set of grid points that sample the spatiotemporal domain; the solution at any other point can be obtained by interpolating the nearby nodal values. This local treatment of the solution makes the models obtained from these methods extremely expressive but also very expensive to evaluate and store.
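To make the storage and evaluation trade-off concrete, the following minimal sketch (illustrative only; the equation, grid, and coefficients are hypothetical choices, not taken from the methods surveyed here) advances the 1D diffusion equation with an explicit finite-difference step; the "model" is simply the vector of nodal values, so cost grows with grid resolution and off-grid values need interpolation:

```python
# Explicit finite-difference step for the 1D diffusion equation
# u_t = kappa * u_xx with zero Dirichlet boundaries (illustrative sketch).

def fd_step(u, kappa, dx, dt):
    """One explicit FD time step; requires kappa*dt/dx**2 <= 0.5 for stability."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + kappa * dt / dx**2 * (u[i-1] - 2 * u[i] + u[i+1])
    return new

def interpolate(u, dx, x):
    """Recover an off-grid value by linear interpolation of nodal values."""
    i = min(int(x / dx), len(u) - 2)
    w = x / dx - i
    return (1 - w) * u[i] + w * u[i+1]

# Example: an initial spike diffuses and flattens over time.
nx, dx, dt, kappa = 21, 0.05, 0.0005, 1.0   # kappa*dt/dx**2 = 0.2, stable
u = [0.0] * nx
u[nx // 2] = 1.0
for _ in range(50):
    u = fd_step(u, kappa, dx, dt)
```

Note that the solution is only known at the `nx` nodes; every query between nodes pays the interpolation step shown above.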
In this paper we follow a different approach that relies on deep neural networks (NNs) to capture the solutions of PDEs using trainable parameters that are considerably fewer than the number of grid points used in discretization-based methods. The proposed framework, called VarNet for variational neural networks, employs a novel loss function that relies on the variational (integral) form of PDEs and is discretization-free and highly parallelizable. Since, compared to the differential form of the PDE, the variational form contains lower-order derivatives of the solution and considers segments of space-time as opposed to single points, the proposed loss function makes it possible to learn solutions of PDEs using significantly fewer training points. Moreover, this loss function allows the NN to learn solutions of PDEs in a model-based and unsupervised way, where training of the NN parameters is guided by the PDE itself without a need for prior labeling. We also propose a way to optimally select the points needed to train the NN, based on the feedback provided by the residual of the PDE. Using NNs to approximate solutions of PDEs makes it trivial to solve these PDEs parametrically; the parameters need only be included as additional inputs to the NN. Consequently, a great strength of the VarNet algorithm is that it can also be used as a powerful model order reduction (MOR) tool for PDEs, since the resulting models are very fast to evaluate.
I-A. Relevant Literature
Using NNs to approximate solutions of PDEs can be beneficial for the following reasons: (i) their evaluation is extremely fast and thus, unlike currently available MOR methods, there is no need to compromise accuracy for speed, (ii) parallelization of the training is trivial, and (iii) the resulting model is smooth and differentiable and thus can be readily used in PDE-constrained optimization problems, e.g., for source identification [2] or control of PDEs [3]. The authors in [4, Ch. 04]
provide a review of different approaches for solving PDEs using NNs. One group of approaches utilizes NNs to memorize the solution of PDEs. Particularly, they solve the PDE using a numerical method to obtain labeled training data and often utilize convolutional NNs (CNNs), which are powerful image processing tools, to capture the numerical solution in a supervised learning way
[5]. For instance, [6] proposes a fluid dynamics solver that utilizes a CNN, consisting of encoding and decoding layers, to predict the velocity field in steady-state problems; it reports a remarkable speedup of four orders of magnitude in computation time. The authors in [7] propose PDE-NET, which is capable of identifying partial differential operators as well as approximating the solution of PDEs. Note that these approaches do not replace numerical methods but rather rely on them and introduce an extra layer of approximation. Another class of works directly discretizes the domain and constructs a model based on this discretization. For instance, [8]
utilizes radial basis functions to construct a linear system of equations that is solved to obtain an approximate solution of the desired PDE. There also exists another group of methods, called FENNs, which represent the governing FE equations at the element level using artificial neurons
[9, 10, 11]. Note that both of these approaches scale with the number of discretization points and are similar in spirit to numerical methods like the FE method. Most relevant to the approach proposed in this paper is a set of works that also directly train a NN to approximate the solution of PDEs in an unsupervised learning way. One of the early works of this kind is [12], which uses the residual of the PDE to define the required loss function. In order to remove the constraints from the training problem, the authors only consider simple domains for which the boundary conditions (BCs) can be manually enforced by a change of variables. The work in [13] elaborates more on this technique. Although these approaches attain accuracy comparable to numerical methods, they are impractical since, in general, enforcing the BCs is as difficult as solving the original PDE. Following a different approach, the work in [14] utilizes a constrained backpropagation algorithm to enforce the initial and boundary conditions during training. In order to avoid solving a constrained training problem, the authors in [15] add the constraints corresponding to the BCs to the objective as penalty terms. Similarly, in [16]
the authors focus on the solution of high-dimensional PDEs using a long short-term memory architecture and prove a convergence result for the solution as the number of trainable parameters of the NN is increased. Although a large number of samples is required to train the model, this training data is sequentially provided to the NN. A similar approach is proposed in
[17], which utilizes the physical laws modeled by PDEs as regularizing terms to guide the learning process, while [18] uses the PDEs to define energy fields that are minimized to train CNNs to predict the PDE solutions at discrete sets of points. Alternatively, reinforcement learning can be used to train NNs that approximate the solutions of PDEs. For instance, in
[19] the actor-critic algorithm is used, where the PDE residual acts as the critic. The literature discussed above uses the PDE residual as the loss function, which contains high-order derivatives and trains over measure-zero points in space-time. Consequently, adequate learning of the PDE solution requires an extremely large number of training points. Moreover, these training points are selected arbitrarily and not based on their information content. The optimal selection of training points has been extensively studied in the context of active learning to alleviate the cost of labeling data in supervised learning. The work in [20] presents an extensive survey of this literature, including various strategies to determine the most informative queries to be made to an oracle. Often samples with the highest ambiguity are preferred; this is closely related to informative planning methods in the robotics literature [21, 22] and to optimal experiment design [23]. Here we propose an active learning method for unsupervised learning that relies on the feedback obtained from the PDE residual to optimally select the training points themselves instead of inquiring about labels for them. This is closely related to mesh adaptivity methods in the FE literature, where the underlying mesh is locally refined in regions with large error values [24].
I-B. Contributions
The contributions of this paper can be summarized as follows: (i) Compared to methods that use NNs to approximate the solutions of PDEs, we propose a novel loss function that relies on the variational form of the PDE. The advantages of this loss function, compared to existing approaches that use the PDE residual, are twofold. First, it contains lower order derivatives so that the solution of the PDE can be estimated more accurately. Note that it becomes progressively more difficult to estimate a function from its higher order derivatives since differential operators are agnostic to translation. Second, it utilizes the variational (integral) form of the PDE that considers segments of spacetime as opposed to single points and imposes fewer smoothness requirements on the solution. Note that variational formulations have been successfully used in the FE method for a long time
[25]. (ii) We propose a new way to optimally select the points needed to train the NNs that is informed by the feedback obtained from the PDE residual. Our optimal sampling method considerably improves the efficiency of the training and the accuracy of the resulting models. (iii) We develop the VarNet library [26] that uses the proposed deep learning framework to solve PDEs. We present thorough numerical experiments using this library to demonstrate our proposed approach. Particularly, we focus on the advection-diffusion (AD) equation, although the presented approach applies to any PDE that can be solved using the FE method. Note that discretization-based solutions to the AD-PDE often have stability issues for highly advective problems [27], which are magnified when the model is reduced. This is because this reduction typically amounts to losing high-frequency content of the original model. Through simulations we demonstrate that, unlike traditional MOR methods, our approach does not suffer from such instability issues. The proposed loss function was first introduced and used to solve a robotic PDE-constrained optimal control problem in our short paper [3]. Compared to [3], here we introduce the VarNet library that employs the same loss function but is additionally equipped with the proposed adaptive method to optimally select the NN training points and can be used not only to solve PDEs but also for MOR. Finally, compared to [3], this paper presents extensive numerical experiments to validate the VarNet algorithm.
The remainder of this paper is organized as follows. In Section II we discuss the AD-PDE and formulate the desired training problem. Section III is devoted to the solution of PDEs using NNs, where we discuss the loss function, the optimal selection of the training points, and the steps to solve an AD-PDE using the VarNet library. We present numerical experiments in Section IV and, finally, Section V concludes the paper.
II. Problem Definition
II-A. Advection-Diffusion PDE
Let $\Omega \subset \mathbb{R}^d$ denote a domain of interest, where $d$ is its dimension, let $\mathbf{x} \in \Omega$ denote a location in this domain, and let $t \in [0, T]$ denote the time variable. Furthermore, consider a velocity vector field $\mathbf{v}(t, \mathbf{x})$ and its corresponding diffusivity field $\kappa(t, \mathbf{x})$ (see [28] for more details on the relation between the velocity and diffusivity fields). Then, the transport of a quantity of interest $u(t, \mathbf{x})$, e.g., a chemical concentration, in this domain is described by the advection-diffusion (AD) PDE [29]

(1)   $\dot{u} + \nabla \cdot (-\kappa \nabla u + u \mathbf{v}) = s,$

where $\dot{u}$ denotes the time derivative of the concentration and $s(t, \mathbf{x})$ is the time-dependent source field.
Given an appropriate initial condition (IC) and a set of boundary conditions (BCs), it can be shown that the AD-PDE (1) is well-posed and has a unique solution [30]. In this paper, we use the following IC and BCs (extending the results that follow to include more general BCs is straightforward and is skipped here for simplicity of presentation):

(2a)   $u(0, \mathbf{x}) = u_0(\mathbf{x})$ for $\mathbf{x} \in \Omega$,
(2b)   $u(t, \mathbf{x}) = g_i(t, \mathbf{x})$ for $\mathbf{x} \in \Gamma_i$,

where $\Gamma_i$ for $1 \le i \le n_b$ denote the boundaries of $\Omega$, and $u_0$ and $g_i$ possess appropriate regularity conditions [30]. Equation (2a) describes the initial state of the field at time $t = 0$, whereas equation (2b) prescribes the field value along the boundary as a time-varying function.
We refer to the parameters that appear in the AD-PDE (1), i.e., the velocity, diffusivity, and source fields as well as the IC and BCs, as the input data. Any of these fields can depend on secondary parameters, which consequently means that the solution of the AD-PDE will depend on those secondary parameters. For instance, in source identification problems, a parametric solution of the AD-PDE is sought as the properties of the source term vary. This parametric solution is then used to solve a PDE-constrained optimization problem [2]. In general, secondary parameters can also appear in MOR of PDEs. In the following, we denote by $\mathbf{q} \in \mathcal{Q}$ the set of secondary parameters that the PDE input data depend on, where $\mathcal{Q}$ is the set of feasible values.
Remark (Alternative PDEs). In this paper we focus on the AD-PDE as an important case study that models a wide range of physical phenomena from mass transport to population dynamics [31]. Nevertheless, the method developed in this paper applies to any other PDE that can be solved using the FE method.
II-B. Training Problem
Discretization-based numerical methods, e.g., the FE method, can be used to approximate the solution of the AD-PDE (1) given an appropriate set of boundary-initial conditions (BICs) (2). As discussed earlier, these methods, although very expressive, are computationally demanding to store and evaluate. Furthermore, it is not straightforward to obtain parametric solutions of PDEs using these methods. As a result, a wide range of MOR methods have been proposed to obtain such parametric solutions. These MOR methods typically rely on numerical methods to obtain the desired reduced models. Utilizing neural networks (NNs) to solve PDEs allows us to directly obtain differentiable parametric solutions of PDEs that are computationally more efficient to evaluate and require considerably less memory to store.
Let $\theta \in \mathbb{R}^n$ denote the weights and biases of the NN, a total of $n$ trainable parameters. Then the solution of the AD-PDE can be approximated by the nonlinear output function $f(t, \mathbf{x}, \mathbf{q}; \theta)$ of this NN, which maps the spatiotemporal coordinates $t$ and $\mathbf{x}$ to the scalar output field of the PDE, given a value for the secondary parameters $\mathbf{q}$. Note that the input of the NN consists of the coordinates $t$ and $\mathbf{x}$ as well as the parameters $\mathbf{q}$, so its dimension equals $1 + d$ plus the number of secondary parameters. See [32] for a detailed introduction to NNs. In the following, we drop the arguments $\theta$ and $\mathbf{q}$ whenever they are not explicitly needed.
Problem (Training Problem). Given the PDE input data and the NN function $f$ with $n$ trainable parameters, find the optimal value $\theta^*$ such that

(3)   $\theta^* = \operatorname{argmin}_{\theta} \, \| f(t, \mathbf{x}, \mathbf{q}; \theta) - u^*(t, \mathbf{x}, \mathbf{q}) \|,$
where $u^*$ denotes the true solution of the PDE. As discussed in Section I-A, the training problem can be solved in different ways. The simplest approach is to use supervised learning to obtain an approximate solution of the PDE using labeled data collected using numerical methods and a loss function defined by the objective (3). In this approach, the NN acts merely as a memory cell and an interpolator to store the solution more efficiently than storing the values at all grid points. Note that this approach does not alleviate the need for discretization-based numerical methods but instead relies on them. Alternative methods exist that train the NN in an unsupervised way. A typical approach is to consider a loss function defined by the PDE residual obtained when the approximate solution is substituted in (1). This approach has two problems. First, for the AD-PDE (1) it requires the evaluation of second-order derivatives of the approximate solution. Estimating a function from its higher-order derivatives is inefficient since differentiation only retains slope information and is agnostic to translation. Second, training with the differential form of the PDE amounts to learning a complicated field by only considering values at a limited set of points, i.e., a measure-zero set, ignoring correlations in space-time. In the following, we propose a different approach that addresses these issues by defining a loss function that relies on the variational form of the PDE.
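As an illustration of the residual-based loss criticized above, the following sketch evaluates the point-wise residual of a 1D AD-PDE at a handful of sample points. Central finite differences stand in for automatic differentiation, and the coefficients and sample points are hypothetical; note that the second spatial derivative is required:

```python
# Point-wise residual loss for the 1D AD-PDE u_t + V u_x - KAPPA u_xx = 0
# (illustrative stand-in; real implementations use automatic differentiation).
import math

V, KAPPA, H = 1.0, 0.1, 1e-3

def residual_loss(f, pts):
    """Mean squared PDE residual over a (measure-zero) set of points."""
    total = 0.0
    for t, x in pts:
        f_t  = (f(t + H, x) - f(t - H, x)) / (2 * H)
        f_x  = (f(t, x + H) - f(t, x - H)) / (2 * H)
        f_xx = (f(t, x + H) - 2 * f(t, x) + f(t, x - H)) / H**2  # 2nd derivative needed
        r = f_t + V * f_x - KAPPA * f_xx
        total += r * r
    return total / len(pts)

# Exact traveling, decaying wave solution of the PDE above.
def exact(t, x):
    return math.exp(-KAPPA * t) * math.sin(x - V * t)

pts = [(0.1 * i, 0.2 * j) for i in range(1, 5) for j in range(1, 5)]
```

The exact solution drives this loss to (numerically) zero, while a candidate that ignores the dynamics does not; either way, the loss only sees the isolated points in `pts`.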
III. Neural Networks for the Solution of PDEs
III-A. Variational Loss Function
The goal of the training problem is to learn the parameters $\theta$ of the NN so that, for all values of the secondary parameters, the function $f$ approximates the solution of the AD-PDE (1) as well as possible. To capture this, we need to define a loss function that reflects how well the function $f$ approximates the solution of (1). To do so, we rely on the variational form of the AD-PDE (1). Particularly, let $\varphi(t, \mathbf{x})$ be an arbitrary compactly supported test function. Multiplying (1) by $\varphi$ and integrating over the spatial and temporal coordinates we get

$\int_0^T \int_\Omega \left( \dot{u} + \nabla \cdot (-\kappa \nabla u + u \mathbf{v}) - s \right) \varphi \, d\mathbf{x} \, dt = 0.$

Performing integration by parts for the temporal term we have

$\int_0^T \dot{u} \, \varphi \, dt = - \int_0^T u \, \dot{\varphi} \, dt.$

Note that the boundary term vanishes since $\varphi$ is compactly supported over the temporal coordinate. Similarly,

$\int_\Omega \nabla \cdot (-\kappa \nabla u) \, \varphi \, d\mathbf{x} = \int_\Omega \kappa \, \nabla u \cdot \nabla \varphi \, d\mathbf{x},$

where again the boundary terms vanish since $\varphi$ is compactly supported. Finally, note that for velocities far below the speed of sound the incompressibility assumption holds and $\nabla \cdot \mathbf{v} = 0$, so that $\nabla \cdot (u \mathbf{v}) = \mathbf{v} \cdot \nabla u$; see [33]. Putting together all of these pieces, we get the variational form of the AD-PDE as

(4)   $\ell_v(u; \varphi) = \int_0^T \int_\Omega \left( -u \dot{\varphi} + \kappa \, \nabla u \cdot \nabla \varphi + (\mathbf{v} \cdot \nabla u) \, \varphi - s \, \varphi \right) d\mathbf{x} \, dt = 0.$
This variational form only requires the first-order spatial derivatives of the solution and involves an integration over a set of nonzero measure as opposed to a single point. The test function acts as a weight on the PDE residual, and the idea is that if (4) holds for a reasonable number of test functions with their compact supports located at different regions in the space-time, the candidate function has to satisfy the PDE. A very important feature of the test function is that it is compactly supported. This allows local treatment of the PDE as opposed to considering the whole space-time at once and is the basis of the FE method [25]. We discuss the explicit form of the test function in Appendix A-A.
Given the variational form (4), we can now define the desired loss function. Consider a set of test functions $\{\varphi_k\}_{k=1}^{N_v}$ sampling the space-time, a set of $N_0$ points corresponding to the IC, and sets of $N_i$ points for the enforcement of the BCs. Then, for a given set of PDE input data, specified by $\mathbf{q}$, we define the loss function as

(5)   $\ell(\theta; \mathbf{q}) = \sum_{k=1}^{N_v} |\ell_v(f; \varphi_k)|^2 + \frac{w_1}{N_0} \sum_{j=1}^{N_0} |f(0, \mathbf{x}_j) - u_0(\mathbf{x}_j)|^2 + \frac{w_2}{N_b} \sum_{i=1}^{n_b} \sum_{j=1}^{N_i} |f(t_{ij}, \mathbf{x}_{ij}) - g_i(t_{ij}, \mathbf{x}_{ij})|^2,$

where $\ell_v$ is defined by (4), $\mathbf{w} = (w_1, w_2)$ stores the penalty weights corresponding to each term, and $N_b = \sum_i N_i$ is the total number of training points for the BCs. Note that in the first term in (5), the integration is limited to the support of $\varphi_k$, which is computationally very advantageous. In the next terms, normalizing by the number of training points makes the weights independent of the number of training points.
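A minimal sketch of a single variational term follows, for a steady 1D diffusion problem rather than the full AD-PDE; the bump test function, midpoint quadrature, and coefficients are illustrative assumptions, not the library's implementation. The key points it demonstrates are that only the first derivative of the candidate solution enters and that the integral is restricted to the support of the test function:

```python
# One variational term ell_v(f; phi) for steady 1D diffusion, kappa*f'' + s = 0,
# in weak form: integral( kappa * f'(x) * phi'(x) - s(x) * phi(x) ) dx = 0
# over the compact support of a smooth bump test function phi.

def bump(y):
    """Smooth test function with compact support on the reference interval [-1, 1]."""
    return (1 - y * y) ** 2 if abs(y) < 1 else 0.0

def bump_d(y):
    """Derivative of the bump with respect to the reference coordinate."""
    return -4 * y * (1 - y * y) if abs(y) < 1 else 0.0

def ell_v(f_d, source, kappa, center, radius, n=200):
    """Midpoint quadrature of the weak form over the support of phi.

    f_d is the derivative of the candidate solution; only f' is needed."""
    h = 2 * radius / n
    total = 0.0
    for i in range(n):
        x = center - radius + (i + 0.5) * h
        y = (x - center) / radius            # map to the reference support [-1, 1]
        total += (kappa * f_d(x) * bump_d(y) / radius - source(x) * bump(y)) * h
    return total
```

For a linear candidate (an exact solution when the source vanishes) the term is numerically zero, while a quadratic candidate leaves a nonzero weak residual; summing squared terms like this over many test functions gives the first part of (5).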
Next, consider a set of $N_q$ points $\{\mathbf{q}_m\}$ sampling the space $\mathcal{Q}$ of the secondary parameters. Then, summing (5) over these samples of the secondary parameters, we obtain the desired total loss function as

(6)   $L(\theta) = \sum_{m=1}^{N_q} \ell(\theta; \mathbf{q}_m).$
This loss function is lower-bounded by zero. Since this bound is attainable for the exact solution of the AD-PDE (1), the value of the loss function is an indicator of how well the NN approximates the solution of the PDE. Training using the loss function (6) is an instance of unsupervised learning since the solution is not learned from labeled data. Instead, the training data here are unlabeled samples of the space-time, and the physics captured by the AD-PDE (1) guides the learning of the parameters $\theta$. In that sense, our approach is a model-based method as opposed to a merely statistical method that automatically extracts features and cannot be easily interpreted.
Remark (Numerical Computation). The derivatives that appear in the variational form (4) can be evaluated in a variety of ways. Here we use the automatic differentiation method available in TensorFlow. Furthermore, to evaluate the integral over the support of the test functions, we utilize the Gauss-Legendre quadrature rule, as is common in the FE literature [25]. See Appendix A-B for more details on the computation of the loss function (6).
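As a concrete sketch of the quadrature ingredient, the 2-point Gauss-Legendre rule below is a low-order stand-in for the higher-order rules typically used; with nodes $\pm 1/\sqrt{3}$ and unit weights it integrates polynomials up to degree 3 exactly on $[-1, 1]$, and an affine map extends it to an arbitrary interval:

```python
# 2-point Gauss-Legendre quadrature (illustrative; production codes use
# higher-order rules and tensor products of them in several dimensions).
import math

GL_NODES = (-1 / math.sqrt(3), 1 / math.sqrt(3))
GL_WEIGHTS = (1.0, 1.0)

def gauss_legendre_2(f, a, b):
    """Integrate f over [a, b]; exact for polynomials of degree <= 3."""
    mid, half = (a + b) / 2, (b - a) / 2
    return half * sum(w * f(mid + half * z) for z, w in zip(GL_NODES, GL_WEIGHTS))
```

With only two function evaluations per interval, the rule is exact on cubics, which is why Gauss rules are the default choice for integrating the polynomial-like integrands that arise over test-function supports.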
III-B. Optimal Training Points
As discussed in Section I-A, existing NN methods to solve PDEs use arbitrary training points. Instead, here we optimally select the training points over the space-time to expedite the training process and increase the accuracy of the resulting solutions. This is very similar to the idea of local adaptive mesh refinement in the FE literature [24]. Note that the PDE solution often varies more severely in certain regions of the space-time, and adding more training points in those regions ensures that the NN adequately captures these variations. In order to prioritize more informative training points, we employ the residual field of the PDE. For a given set of input data specified by $\mathbf{q}$, plugging the NN function $f$ into the PDE (1), we obtain the PDE residual field as

(7)   $r(t, \mathbf{x}; \mathbf{q}) = \dot{f} + \nabla \cdot (-\kappa \nabla f + f \mathbf{v}) - s.$
Given this metric, we sample the space-time using the rejection sampling algorithm [34] to generate training points that are proportionally more concentrated in regions with higher residual values. To this end, let $p(t, \mathbf{x}) \propto |r(t, \mathbf{x})|$ denote the normalized metric defining a distribution over the space-time. Furthermore, let $p_u$ denote the uniform distribution over the space-time from which we draw the candidate training points, and let $M$ be a constant such that $M p_u \ge p$ over the space-time. Then, for a given candidate point $(\bar{t}, \bar{\mathbf{x}})$, the rejection sampling algorithm consists of drawing a random variable $z \sim U[0, 1]$ and accepting the candidate point if $z \le p(\bar{t}, \bar{\mathbf{x}}) / (M p_u)$. Define $\bar{r} = \max |r|$ over the space-time. Then, the acceptance condition simplifies to

(8)   $z \le |r(\bar{t}, \bar{\mathbf{x}})| \, / \, \bar{r}.$
Note that according to (8), candidate points that are closer to regions with higher PDE residual values have higher chances of being selected. We approximately determine the maximum residual value by sampling the PDE residual over a uniform grid.
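The rejection-sampling step can be sketched as follows; the residual function here is a toy placeholder (not the AD-PDE residual (7)), and the 1D domain and sample counts are illustrative:

```python
# Rejection sampling of training points: candidates are drawn uniformly and
# accepted with probability proportional to the residual magnitude, so the
# accepted points cluster where the residual is large.
import random

def sample_training_points(residual, lo, hi, n, r_max, rng):
    """Draw n points on [lo, hi] with density proportional to residual."""
    points = []
    while len(points) < n:
        x = rng.uniform(lo, hi)                  # candidate from uniform proposal
        if rng.random() <= residual(x) / r_max:  # acceptance test, as in (8)
            points.append(x)
    return points

# Toy residual metric: large near x = 1 (think: a boundary layer).
toy_residual = lambda x: x ** 4
rng = random.Random(0)
pts = sample_training_points(toy_residual, 0.0, 1.0, 500, 1.0, rng)
```

With this toy metric, the accepted density is proportional to $x^4$, so the vast majority of training points land in the right half of the interval.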
Remark (Boundary-Initial Conditions). A similar procedure is utilized for the optimal selection of training points for the BICs. Particularly, we replace the metric (7) with the corresponding least-squares error values between the prescribed IC and BCs and the NN output, as given by the second and third terms in the loss function (5).
Remark (Secondary Parameters). We take into account the effect of the secondary parameters in sampling from the space-time by integrating the residual field over them. Note that a similar approach can be used to also optimally select the samples of the secondary parameters.
III-C. VarNet Deep Learning Library
The proposed VarNet algorithm combines the loss function (6) with the optimal selection method for the training points to approximate the solution of AD-PDEs.
The algorithm begins by requiring the properties of the spatial domain and the time horizon as well as the input data to the AD-PDE, i.e., the velocity and corresponding diffusivity fields, the source field, and the BICs, as functions of the spatial and temporal coordinates and possibly the secondary parameters. These inputs define an AD-PDE instance that is implemented in the VarNet library using the ADPDE(domain, diff, vel, source, tInterval, IC, BCs, MORvar) class, where, depending on the dimension of the spatial domain, domain is defined by an instance of the Domain1D or PolygonDomain2D class, and MORvar is an instance of the MOR class for parametric problems. Next, the VarNet Algorithm 1
requires the widths of the layers of the multilayer perceptron (MLP) NN, used to capture the solution of the AD-PDE, as well as the numbers of training points. This information defines the training problem as an instance of the VarNet(layerWidth, discNum, bDiscNum, tDiscNum) class. Finally, the number of training epochs as well as the penalty weights for the terms appearing in the loss function (5) must be given. Given the numbers of training points, in line 1 the algorithm generates a uniform grid over the space-time, its boundaries, and possibly the space of secondary parameters. The training process begins in line 2, where in each epoch the algorithm iterates through all training data. The loop over the samples of the secondary parameters is performed in line 3. Training on a subset of data, corresponding to one value of the secondary parameters at a time, amounts to the idea of batch optimization and computationally facilitates the inclusion of an arbitrarily large number of samples of the secondary parameters by breaking the training points down into smaller batches [32]. In line 4, the algorithm performs the optimization iteration at the current epoch, updating the parameters of the NN for all training samples of the space-time and a given set of PDE input data. A variety of optimization algorithms, included in TensorFlow, can be used for this purpose; see [32] for details. In line 6, the algorithm checks the convergence of the training process, and in line 7 it adds new optimal training points over the space-time according to the procedure outlined in Section III-B. The number of these new training points is given as a fraction of the original numbers. The training process is performed in the member function VarNet.train(epochNum, weight, frac, batchNum).
The VarNet library [26] implements Algorithm 1 and contains additional functionalities including data parallelism and extensive tools for report generation during training and postprocessing of the trained NNs. In Appendix A we present more details on the implementation of Algorithm 1. For more details on the VarNet library [26] and its functionalities refer to the code documentation.
Remark (Convergence). By the universal approximation theorem, for a large enough number of trainable parameters $n$, the NN can approximate any smooth function [35]. Given that the AD-PDE (1) has a unique smooth solution for an appropriate set of input data, the NN should converge to this solution as $n \to \infty$; see [16] for details.
Remark (Parallelization). Referring to Algorithm 1, parallelization of the training process is trivial. We can choose the number of samples in accordance with the available computational resources and decompose and assign the summations in the loss function to different processing units.
IV. Numerical Experiments
In this section we present numerical experiments to demonstrate the performance of the VarNet Algorithm 1. Particularly, we study four different types of problems with an increasing order of difficulty: (i) 1D and 2D time-dependent AD problems with analytical solutions that are used as benchmarks in the FE literature. For highly advective cases with large Peclet numbers (the Peclet number is a measure of the relative dominance of advection over diffusion and is defined as Pe = $\bar{v} L / \bar{\kappa}$, where $L$ is a characteristic length of the domain, $\bar{v}$ is a characteristic velocity, and $\bar{\kappa}$ is the average diffusivity of the medium), a boundary layer is formed whose prediction becomes very challenging for discretization-based numerical methods. The objective here is to demonstrate the capability of our algorithm to solve problems that require very fine grids when solved by these methods. These two problems have constant input data, which often is the only case studied in the relevant literature. (ii) A 2D AD problem with an analytical solution and non-constant input data. This is a fabricated problem where we find the source field that generates a desired concentration field given a set of input data; this is a standard method used to check the convergence of FE methods. (iii) A 2D AD problem with compactly supported sources. In this case, we cannot obtain an analytical solution for the AD-PDE and only an approximation from numerical methods, e.g., the FE method, is available. The objective here is to examine the effect of localized source terms. (iv) A 2D AD problem whose input data correspond to realistic mass transport phenomena in turbulent flows. In fluid dynamics applications the input data are often obtained for turbulent conditions and are non-constant, so an analytical solution is not an option. This is the most general case that one encounters in using the AD-PDE.
We solve all cases using the VarNet library [26] that was developed as part of this paper. This library is written in Python using the TensorFlow module; see Appendix A for details about its implementation and the code documentation [26] for more details on its capabilities. We use the ADAM stochastic optimization algorithm to train the NNs [36]. The VarNet Algorithm 1 is run on a single NVIDIA GeForce RTX 2080 Ti processor for each problem instance. In each case, we use a multilayer perceptron (MLP) to capture the solution of the AD-PDE and, unless otherwise specified, we use a sigmoid activation function for all neurons. For a given uniform grid over the space-time, we compute the approximation error as
(9)   $\text{err} = \| \bar{\mathbf{u}} - \mathbf{u}^* \| \, / \, \| \mathbf{u}^* \|,$

where the vectors $\bar{\mathbf{u}}$ and $\mathbf{u}^*$ stack the outputs of the NN and the exact solution evaluated at the grid points, respectively. We use an analytical solution or the FE approximate solution to compute the vector $\mathbf{u}^*$. (When the reported error values are biased against the NN solution due to, e.g., error in the analytical solution, we add an asterisk to the reported value.) To generate the required FE models, we use an in-house code based on the DiffPack library [37]. We set the fraction of the optimal training points in the VarNet Algorithm 1 to a fixed value and use the shorthand notation '1Dt' to denote a one-dimensional time-dependent AD-PDE, and so on for the other cases. When reporting the number of training points, we may only give the numbers of space-time training points; the numbers of training points for the BICs are then deduced from them. For instance, for a 2Dt problem the first entry refers to the discretization of the temporal coordinate and the last two entries refer to the discretization of the spatial coordinates, from which the numbers of boundary and initial training points follow given the lengths of the boundaries.
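A sketch of the error metric (9), assuming it is the relative 2-norm of the stacked difference (the exact norm used in the experiments may differ):

```python
# Relative error between NN outputs and the reference solution, both stacked
# over a uniform space-time grid (illustrative implementation of metric (9)).
import math

def rel_error(nn_vals, exact_vals):
    """||u_bar - u*|| / ||u*|| for two equal-length sequences of grid values."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(nn_vals, exact_vals)))
    ref = math.sqrt(sum(b * b for b in exact_vals))
    return diff / ref
```

Normalizing by the reference norm makes the metric comparable across problems with different solution magnitudes.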
IV-A. 1Dt AD-PDE with Analytical Solution
In this section we study the performance of the VarNet Algorithm 1 for the benchmark problem presented in [38]. This problem is defined as
(10) 
Similar to [38], we fix the velocity and study the performance of our approach for two diffusivity values; for the latter value, the analytical solution becomes numerically unstable.
Figure 1 shows the solution provided by the VarNet Algorithm 1 for the larger diffusivity value, overlaid on the analytical solution, together with the final error (9).
We use a single-layer MLP with sigmoid activation and appropriately chosen numbers of neurons and training points and penalty weights. Note that the number of temporal training points should be sufficiently large to ensure that the dimensionless Courant number, defined as
(11)   $C = \bar{v} \, \Delta t / \Delta x,$
is upper-bounded, which guarantees the stability of the solution across time [39]. An animation of the approximate solution for this case, compared to the analytical solution, is given in [40]. Note that the solution is continuous both in space and time and does not require interpolation to generate this video.
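A minimal sketch of the Courant-number bookkeeping follows; the threshold of 1 is an assumed illustrative value, not a bound taken from the paper:

```python
# Courant number (11) and the implied minimum number of temporal samples
# (illustrative; the threshold c_max = 1 is an assumption for this sketch).
import math

def courant(velocity, dt, dx):
    """Dimensionless Courant number C = v * dt / dx."""
    return velocity * dt / dx

def min_time_points(velocity, horizon, dx, c_max=1.0):
    """Smallest number of time samples over [0, horizon] keeping C <= c_max."""
    return math.ceil(velocity * horizon / (c_max * dx)) + 1
```

This makes explicit why the temporal and spatial resolutions of the training grid must be increased proportionally: halving the spatial spacing doubles the required number of time samples for the same velocity.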
As mentioned earlier, one of the important contributions of this paper is the novel loss function that is based on the variational form of the AD-PDE as opposed to its differential form. To demonstrate the effectiveness of our loss function, we solve this case using the PDE residual (7) as well; see also the discussion following the training problem in Section II-B. Using the same architecture and settings, with a number of training points equivalent to the number of integration points used above, the final error is considerably larger; after a set of simulations with different settings, the best result still exhibits a considerable error. An animation of the approximate solution for this case is compared to the analytical solution in [41]. Observe that there is considerable error in capturing the solution and its BICs when the PDE residual is used, although the number of training points is considerably larger.
Next, we consider the smaller diffusivity value, which corresponds to an order-of-magnitude higher Peclet number and is much more challenging to solve. Figure 2 shows snapshots of the solution from a two-layer MLP; we train the NN with appropriately chosen numbers of training points and penalty weights.
As mentioned earlier, for this diffusivity value the analytical solution is numerically unstable and we only compare to a set of point values reported in [38]. Note that the resulting error value (err*) is conservative since it only considers the boundary layer, which is the most challenging part to approximate, as opposed to the whole space-time. Discretization-based methods have difficulty capturing the boundary layer and require an extremely fine grid and consequently a large model [38], whereas our approach can capture the solution with a small number of trainable parameters. An animation of the approximate solution for this case is given in [42].
Figure 3 shows the evolution of the individual terms and the total value of the loss function (5), the evolution of the PDE residual (7), and the corresponding final residual snapshots for this case.
Referring to Figure 3(a), note that the penalty weights on the BICs in (5) must be selected as high as possible without making the variational term insignificant. Using small weights on the BICs will result in inaccurate solutions since the AD-PDE does not have a unique solution without properly enforced BICs. For a properly selected set of weights, the values of the BIC terms in the loss function (5) drop considerably fast to small values while the value of the variational term does not. This amounts to an increase in the PDE residual (7) at the early training epochs, as can be seen from Figure 3(b). Once the BICs are approximately enforced, the contribution of the BIC terms in the loss function (5) decreases, the variational term starts to drop, and the output of the NN converges to the unique solution of the 1Dt AD-PDE (10); this amounts to a drop in the PDE residual at the later epochs; see Figure 3(b). The sharp increase partway through training corresponds to the addition of the optimal training points. Furthermore, from Figure 3(c) it can be observed that the value of the residual is considerably small across the space-time except near the boundary layer.
Table I shows the performance of the VarNet Algorithm 1 in solving the AD problem (IVA) with diffusivity , for different activation functions and network capacities, where the number of training points and the weights are tuned for the best performance in each case.
No.  activation  optimal  err*  
1  tanh  no  0.83  
2  sigmoid  no  0.49  
3  sigmoid  no  0.44  
4  sigmoid  yes  0.42  
5  sigmoid  scaled  0.39  
6  sigmoid  no  0.08  
7  sigmoid  yes  0.09  
8  sigmoid  no  0.09  
9  sigmoid  no  0.08 
Specifically, we examine three MLP structures with neurons in the layers. As mentioned before, and can be deduced from and are not reported. Comparing the first two cases, observe that the tanh activation is incapable of capturing the solution. This behavior was observed on other occasions as well, and thus in the rest of the simulations we only use the sigmoid activation. The next three cases study the effect of using optimal training points, as proposed in Section IIIB, against uniform selection of the training points. More specifically, case 3 reports the error values for a number of training points equivalent to the addition of optimal points in cases 4 and 5. In case 5, the support of the test functions used at the optimal training points is scaled down by a factor of 10 in each dimension, decreasing the error further. This is motivated by the argument in Section IIIB that the optimal training points should be placed in regions of high variability. Decreasing the support allows the NN to better capture the curvature at the boundary layer. The white dots in Figure 4 show the optimally selected training points for this case, overlaid on the PDE residual field (7).
Note that, as expected, most of the optimal training points are placed close to where the boundary layer forms. This local refinement decreases the final error without having to uniformly increase the number of training points across spacetime. Further increasing the number of trainable parameters and the number of training points dramatically decreases the error, as can be seen in case 6. Note however that increasing without increasing destabilizes the training process. Also, the number of training points across the temporal and spatial coordinates needs to be increased proportionally to ensure that the Courant number (11) stays upperbounded. Comparing case 7 to case 6, observe that once the number of training points is increased enough that the whole spacetime is adequately sampled, the addition of optimal training points is no longer beneficial. Finally, considering cases 8 and 9, observe that a further increase of the network capacity does not seem to decrease the error. Referring to Figure 2, note that the error values reported in Table I effectively capture the error in the boundary layer . To capture this layer even better, the number of spatial samples and thus, to maintain the Courant number, the number of temporal samples would need to be considerably increased.
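The residual-guided refinement described above can be sketched as follows. This is a minimal illustration only: the residual function, the candidate pool, and the batch size `k` are hypothetical stand-ins for the quantities defined in Section IIIB, not the VarNet implementation itself.

```python
import random

def select_optimal_points(residual, candidates, k):
    """Pick the k candidate points where the PDE residual magnitude is
    largest; these become the additional 'optimal' training points.

    residual:   callable mapping a (t, x) point to |PDE residual|
    candidates: list of candidate (t, x) training points
    k:          number of points to add to the training set
    """
    scored = sorted(candidates, key=residual, reverse=True)
    return scored[:k]

# Toy residual that peaks near x = 1, mimicking a boundary layer at the
# outlet; the time coordinate is ignored in this illustration.
def toy_residual(p):
    t, x = p
    return abs(x) ** 8

random.seed(0)
pool = [(random.random(), random.random()) for _ in range(1000)]
new_points = select_optimal_points(toy_residual, pool, k=10)
# The selected points cluster near x = 1, where the residual is largest.
print(min(x for _, x in new_points))
```

Placing the extra points where the residual peaks is exactly the local refinement observed in Figure 4: the training effort concentrates at the boundary layer instead of being spread uniformly over spacetime.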
Finally, we study the performance of the VarNet Algorithm 1 for MOR. Specifically, we train NNs that parametrically solve the ADPDE (IVA) as a function of the diffusivity for , which amounts to a range of more than one order of magnitude. We use geometric samples for training. Table II shows the final errors for diffusivity values other than the ones used for training, and for different numbers of training points.
No.  err  

1  0.33*  
2  0.40*  
3  0.14  
4  0.09*  
5  0.15*  
6  0.00 
For the two smaller values of the diffusivity, the analytical solution is unstable and only the point values reported in [38] are compared, which results in biased (larger) error values for these cases. Note that the parametric solutions are as good as the ones reported for individual diffusivity values above, meaning that MOR has no major effect on the solution predicted by the NNs. This is in contrast to traditional MOR methods, which often introduce a large error in the estimation. Note also that the VarNet Algorithm 1 is able to capture the solution of the ADPDE (IVA) with only trainable parameters. The continuous, parametric representation of the solution provides valuable tools for the analysis of dynamical systems. For instance, we can study the sensitivity of the solution of the AD problem (IVA) to a change in the diffusivity of the medium. Figure 5 shows the solution snapshots at as a function of .
Note that as decreases, the peak value is better preserved and carried downstream with less diffusion and as a result, the boundary layer becomes sharper.
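The mechanism behind this parametric MOR is simply that the PDE parameter is appended to the NN input. The sketch below illustrates the idea with a tiny, untrained, hand-rolled network; the layer sizes, weights, and the name `surrogate` are hypothetical and stand in for the trained VarNet model.

```python
import math
import random

# Hypothetical parametric surrogate: the diffusivity D is an ordinary
# input alongside (t, x), so one network covers a range of diffusivities.
random.seed(0)
n_in, n_hidden = 3, 5  # inputs: (t, x, D); 5 hidden sigmoid units
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [random.uniform(-1, 1) for _ in range(n_hidden)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def surrogate(t, x, D):
    """Forward pass of the (untrained) parametric model."""
    inp = (t, x, D)  # the PDE parameter enters as a plain input feature
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inp)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden))

# Evaluating the reduced model for a new diffusivity is just a forward
# pass, which is why the resulting models are so cheap to query.
print(surrogate(0.5, 0.3, 0.01))
```

This is why a sensitivity study such as Figure 5 is essentially free once training is done: sweeping the diffusivity only requires repeated forward passes.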
IVB 2Dt ADPDE with Analytical Solution
In this section we consider a 2D timedependent AD problem with an analytical solution presented in [43]; this problem is used as a benchmark to evaluate the FE methods proposed in [44]. Specifically, a constant, onedirectional flow with is considered and we adjust the constant diffusivity to study the effect of the Peclet number on the solution. We consider a convex 2D domain where we assign a constant, nonzero Dirichlet condition to boundary 2 on the left side of the domain, i.e., ; see Figure 6.
In other words, the concentration on the boundary is transported downstream via advection and in the transverse direction via diffusion and at any given time , a steplike concentration front exists at that travels with a speed of . At there is no concentration in the domain and we assume the boundaries are far enough so that they stay concentrationfree for the whole time, i.e., in (2) we set and . Noting that the value of is immaterial as long as it is set large enough, we use .
In Table III we report the error values compared to the analytical solution provided in [43] for different diffusivity values where we set weights .
No.  optimal  err  
1  no  0.24  
2  no  0.20  
3  yes  0.16  
4  scaled  0.18  
5  no  0.15  
6  no  0.04  
7  no  0.07* 
Comparing the first 6 cases, observe that generally the simultaneous addition of more training points and increase of the capacity of the NN, i.e., the number of trainable parameters , lead to a decrease in the error value. Comparing cases 2, 3, and 4, it can be seen that using optimal training points results in a smaller error than a uniform grid with the same number of training points. Note that since in this problem, unlike the 1Dt AD problem (IVA), the high variations are not concentrated at the boundary, scaling the support of the optimal test functions in case 4 seems to have an adverse effect on the final error. Case 7 shows the performance of the proposed method when the Peclet number is increased by three orders of magnitude. Note that this increase has no effect on the performance of the VarNet Algorithm 1. This is in contrast to FE methods, which are known to become unstable for higher Peclet numbers [45]. Figure 7 shows the snapshots of the concentration field corresponding to case 7. Note the traveling steplike concentration front.
IVC 2D ADPDE with Analytical Solution and Field InputData
The problems that we have studied so far have constant input data, i.e., the diffusivity and velocity fields as well as the BICs are constant and the source function is set to zero. Often, this is the only case studied in the relevant literature. When the PDE inputdata are not constant, it is often impossible to obtain analytical solutions. In this section we construct a problem with nonconstant input data that still admits an analytical solution. Specifically, we start with given concentration, diffusivity, and velocity fields, and use the ADPDE (1) to derive the corresponding source that generates this desired steadystate concentration field. We consider a domain with the following concentration field
(12) 
where the integer adjusts the number of zerocrossings of the trigonometric function in the field. The larger , the more complicated the resulting field. Plugging this field into the ADPDE (1), we obtain the corresponding source field that generates the concentration field (12) for a given set of PDE inputdata. Note that this concentration field attains a value of zero on the boundaries, so we use zerovalued Dirichlet BCs in (2b). Since the problem is in steady state, the IC (2a) is irrelevant here.
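This manufactured-solution construction can be made concrete as follows. Since the exact field in (12) and the input data are not reproduced here, the sketch assumes a stand-in trigonometric field with zero-crossings controlled by an integer k, constant diffusivity D, constant velocity (u1, u2), and the steady AD-PDE convention u . grad(c) - D * laplacian(c) = s; all of these are illustrative assumptions.

```python
import math

# Hypothetical stand-ins for the elided quantities in (12).
k, D, u1, u2 = 2, 0.1, 1.0, 0.5

def c(x, y):
    """Assumed concentration field with k zero-crossings per coordinate."""
    return math.sin(k * math.pi * x) * math.sin(k * math.pi * y)

def source(x, y):
    """Source obtained by plugging c into the steady AD-PDE:
    s = u . grad(c) - D * laplacian(c)."""
    a = k * math.pi
    cx = a * math.cos(a * x) * math.sin(a * y)
    cy = a * math.sin(a * x) * math.cos(a * y)
    lap = -2 * a ** 2 * c(x, y)
    return u1 * cx + u2 * cy - D * lap

def residual(x, y, h=1e-4):
    """Finite-difference check that the AD-PDE residual vanishes for the
    derived source, i.e., that c is indeed the exact solution."""
    cx = (c(x + h, y) - c(x - h, y)) / (2 * h)
    cy = (c(x, y + h) - c(x, y - h)) / (2 * h)
    lap = (c(x + h, y) + c(x - h, y) + c(x, y + h) + c(x, y - h)
           - 4 * c(x, y)) / h ** 2
    return u1 * cx + u2 * cy - D * lap - source(x, y)

print(abs(residual(0.3, 0.7)))  # close to zero
```

The same recipe works for nonconstant diffusivity and velocity fields, as in cases 5 and 6 below; only the symbolic differentiation becomes more tedious.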
In Table IV, we report the performance of the VarNet Algorithm 1 in solving this problem instance where we set the weights for the reported results.
No.  err  
1  0.03  
2  0.04  
3  0.03  
4  0.00  
5  field (13b)  0.04  
6  field (13a)  field (13b)  0.03 
Note that since the AD PDEs studied here and in the following sections are timeindependent, in (IIIA) is trivially set to zero and is not reported. In the first four cases in Table IV, , and we use constant diffusivity and velocity fields, as reported in columns four and five. A single layer NN with 20 neurons can easily capture the concentration field (12) in these cases, regardless of the Peclet number. Comparing the first two cases, observe that increasing the number of training points does not improve the error. The reason could be that the current network capacity is insufficient to better capture the concentration field (12). In cases 5 and 6, we use , which results in the concentration field depicted in Figure 8(a). More specifically, in case 5 we use a nonconstant velocity field, and in case 6 we also use a nonconstant diffusivity field. These two fields are defined as
(13a)  
(13b) 
where is the maximum diffusivity value in the domain which we set to . These two fields are depicted in Figures 8(b) and 8(c), respectively. Figure 8(d) shows the resulting source field corresponding to case 6.
The VarNet Algorithm 1 successfully solves these two cases using only a single layer with trainable parameters.
IVD 2D ADPDE with Compactly Supported Source
The source field used in the previous section is not compactly supported; see Figure 8(d). Often in practice, the source term in the ADPDE (1) is compactly supported and occupies a very small part of the domain. The existence of such source terms translates to a more localized and nonlinear concentration field that is more challenging to capture. To investigate this point further, in this section we solve the ADPDE (1) over a convex domain with constant input data , , and a set of Dirichlet BCs, for two localized sources. Specifically, we first consider a Gaussian source given by
where is the source intensity, is the source center and is its characteristic length. This source technically is not compactly supported but for practical purposes it decays to almost zero beyond distance from the center . Because of its smoothness, the corresponding concentration field is easier to solve for. We also consider a compactly supported tower source function given by
(14) 
where delineates the support and is set to for the following results. For consistency, we define the characteristic length for this source as the sidelength of its support. We adjust the source intensity for both sources to ensure that the resulting maximum concentration levels are close to unity. Note that for these arbitrary sources, we cannot solve the ADPDE (1) analytically. Instead, we use the FE solution as the groundtruth to calculate the error values according to (9).
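The two source models can be sketched as follows. The intensities, centers, and characteristic lengths used below are hypothetical placeholders for the elided values in the text; only the functional forms (a smooth Gaussian versus a compactly supported tower) are taken from the discussion above.

```python
import math

def gaussian_source(x, y, s0=1.0, xc=0.5, yc=0.5, ell=0.1):
    """Smooth Gaussian source with intensity s0, center (xc, yc), and
    characteristic length ell. Not compactly supported, but it decays
    to almost zero a few ell away from the center."""
    r2 = (x - xc) ** 2 + (y - yc) ** 2
    return s0 * math.exp(-r2 / (2 * ell ** 2))

def tower_source(x, y, s0=1.0, xc=0.5, yc=0.5, ell=0.1):
    """Compactly supported 'tower' source: constant s0 on a square of
    sidelength ell centered at (xc, yc), exactly zero elsewhere."""
    if abs(x - xc) <= ell / 2 and abs(y - yc) <= ell / 2:
        return s0
    return 0.0

print(gaussian_source(0.5, 0.5))  # s0 at the center
print(tower_source(0.9, 0.9))     # exactly zero outside the support
```

The discontinuity of the tower source at the edge of its support is what makes the corresponding concentration field harder to capture than the smooth Gaussian case.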
In Table V, we report the performance of the VarNet Algorithm 1 in solving the ADPDE (1) for these localized sources.
No.  source  err  
1  Gaussian  0.00  
2  Gaussian  0.02  
3  Gaussian  0.02  
4  tower  0.17 
Comparing the first three cases, observe that for smooth Gaussian sources, our proposed algorithm is capable of obtaining solutions identical to the FE method using a single layer MLP with 20 neurons. We set the weights to for this case. Nevertheless, the problem is more challenging when the compactly supported tower source is used. Using a larger network with three layers amounting to trainable parameters, training points, and weights set to , the algorithm solves the problem with err . Figure 10 shows the NN and FE approximations of the concentration field corresponding to case 4.
Observe that the general shape of the concentration field is recovered almost exactly but the peak value is not. Using larger network capacities and numbers of training points would obviously decrease the error further.
IVE 2D ADPDE with Turbulent Input Data
So far in the simulations, we have focused on simple flow fields that do not often occur in realistic problems. For instance, when the ADPDE (1) is used to model mass transport in ‘air’, the resulting flow field is turbulent even at very low velocities [22], causing an excessive amount of mixing due to instantaneous changes in the velocity vector. This is often captured with an equivalent turbulent diffusivity that is added to the laminar diffusivity of the medium; see [2] for more details. In the following, we consider velocity and diffusivity fields that are obtained using computational fluid dynamics (CFD) methods via ANSYSFluent. Specifically, we consider the problem that we studied in Section IVB with the domain depicted in Figure 6, but in steadystate. We assume that air flows into this domain through boundary 2 with a velocity of 1m/s and leaves the domain through the opposite side. The corresponding velocity magnitude and diffusivity fields are given in Figure 11.
Similar to Section IVB, we assume a constant Dirichlet BC on boundary 2, i.e., we set (the concentration unit is arbitrary). The rest of the BCs are set to zero.
First, we study the case of a fixed diffusivity field by setting and using the nonconstant velocity field of Figure 11(a). For this case, we use an MLP with three layers and neurons in the layers, which amounts to trainable parameters. We set and use training points. The approximation error for this case is err . Figure 12 shows the approximate solutions obtained from the VarNet Algorithm 1 and the FE method.
Figure 13 shows the loss value as a function of training epochs.
Note that, after an initial sharp decrease, the iterations get trapped in a local minimum for a large number of epochs and then start to drop again. In fact, when the maximum number of epochs is reached, the solution is still improving. As more realistic, nonconstant fields are used, the training process becomes more challenging since the loss function becomes more nonlinear. In such cases, a larger number of epochs might be necessary to ensure convergence to an acceptable local minimum. Regular shuffling of the training data can also help with escaping such local minima.
Next, we consider the nonconstant diffusivity field given in Figure 11(b). This results in an even more nonlinear loss function. To solve the ADPDE (1) in this case, we use an MLP with three layers and neurons in the layers, which amounts to trainable parameters. We set and use training points. The resulting approximation error for this case is err . Note that, similar to Figure 10 in Section IVD, the shape of the concentration field is reconstructed here but the peak value is inaccurate. Throughout the simulations, we have used very small MLP networks with small capacities. Using larger networks along with larger numbers of training points, and training for longer periods, would further improve the solutions. This is particularly true for this case, since the loss curves clearly show that the NN is still improving when the maximum number of epochs is reached.
V Conclusion
In this paper we proposed a new modelbased unsupervised learning method, called VarNet, for the solution of partial differential equations (PDEs) using deep neural networks (NNs). Particularly, we proposed a novel loss function that relies on the variational (integral) form of PDEs as opposed to their differential form, which is commonly used in the literature. Our loss function is discretizationfree, highly parallelizable, and more effective in capturing the solution of PDEs, since it employs lowerorder derivatives and trains over regions of spacetime with nonzero measure. Given this loss function, we also proposed an approach to optimally select the spacetime samples used to train the NN, based on the feedback provided by the PDE residual. The models obtained using our algorithm are smooth and do not require interpolation. They are also easily differentiable and can directly be used for control and optimization of PDEs. Finally, the VarNet algorithm can straightforwardly incorporate parametric PDE models, making it a natural tool for model order reduction of PDEs. We demonstrated the performance of our method through extensive numerical experiments for the advectiondiffusion PDE as an important casestudy.
Appendix A Variational Loss Function
AA Test Functions
As discussed in Section IIIA, the only requirement on the test function is that it must be compactly supported. Most of the FE shape functions satisfy this requirement and can be used with our proposed algorithm; see [25]. Often in FE schemes, standard elements are defined in the socalled mathematical domain; these elements can then be mapped to the actual elements in the physical domain under an isotropy assumption. In this paper we use a simple 1D standard element with linear shape functions as the building block to construct the desired test functions. Particularly, let denote a generalized coordinate in the mathematical domain, which can later be mapped to a spatial or temporal coordinate in the physical domain, and define the linear basis functions given by
(15) 
Figure 14 shows the variation of these two shape functions along the length of the 1D element.
Given this 1D element, corresponding standard, higherdimensional elements can easily be constructed. Let denote the desired dimension of these elements where for timedependent problems and otherwise. Furthermore, let denote the coordinate along the th dimension, denote the corresponding vector of coordinates, and the multiindex specify one of the corners of the higherdimensional element where . Given this multiindex, the shape function corresponding to the respective corner of the element is defined by the multiplication of the 1D shape functions (15) as
(16) 
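The tensor-product construction in (16) can be sketched as follows. Since the body of (15) is not reproduced here, the sketch assumes the common linear form on the unit interval, phi_1(xi) = 1 - xi and phi_2(xi) = xi; this choice is an assumption, consistent with the convex-combination mapping in (17).

```python
from itertools import product

def phi(i, xi):
    """1D linear shape functions on [0, 1] (an assumed form of (15)):
    phi_1 = 1 - xi, phi_2 = xi."""
    return 1.0 - xi if i == 1 else xi

def shape(multi_index, xi_vec):
    """Higher-dimensional shape function (16): the product of 1D shape
    functions, one per coordinate, selected by the corner multi-index."""
    n = 1.0
    for i, xi in zip(multi_index, xi_vec):
        n *= phi(i, xi)
    return n

# Partition of unity: at any point of the element, the shape functions
# of all 2^d corners sum to one.
d = 3
point = (0.2, 0.7, 0.4)
total = sum(shape(idx, point) for idx in product((1, 2), repeat=d))
print(total)
```

Each shape function equals one at its own corner and zero at every other corner, which is what makes patching them into compactly supported test functions straightforward.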
For simplicity of presentation, let the augmented coordinate be defined as for timedependent problems and , otherwise. Then, we denote the shape functions in the physical domain by . Assuming isotropic elements, we can write the coordinates in the physical domain in terms of the coordinates in the mathematical domain as
(17) 
In words, given the coordinates in the mathematical domain, the corresponding th coordinate in the physical domain is given as a convex combination of the th coordinates of the corners of the element, specified by the multiindex . This mapping from mathematical to physical domain, implicitly defines the shape functions as
Without loss of generality, we use cubical elements so that this mapping amounts to simple scalings of the coordinates where . Note that this simple choice of elements considerably facilitates the implementation of the VarNet Algorithm 1. Nevertheless, the type, shape, and size of the elements are arbitrary and can be as general as the ones used in different FE schemes [25].
Given the above shape functions, we are ready to define the test functions that we use in the loss function (IIIA). Specifically, given a training point in the spacetime, we consider a test function centered at that point that belongs to a set of elements that share the point. Then, we define the desired test function as over each of these elements where the multiindex corresponds to the shared corner of that element. As an example, the test function for the 1D (physical) domain corresponding to Figure 14, centered at the training point with element scaling of , is depicted in Figure 15.
Over element 1, the test function is defined as whereas over element 2, it is defined as . Note that the support of is obtained by patching together elements 1 and 2 and is compact. During training, we arbitrarily select the location of the test functions over the spacetime while fixing the shape and size of the elements. As mentioned earlier, this assumption can be relaxed at the expense of more computational cost.
AB Numerical Computation of Loss Function
In order to compute the loss function (IIIA), we need to compute the variational form (4) which in turn requires the calculation of the derivatives of the test function . To this end, we first derive the derivatives of the shape function for one of the elements within the support of . Then, the desired derivatives of are obtained from the derivatives of these shape functions over each element. The decomposed form of the shape function (16) makes its differentiation straightforward. Specifically, using (15) for coordinate in the mathematical domain we have
(18) 
where minus sign corresponds to . Let the matrix collect these derivatives where each column corresponds to one shape function, specified by the multiindex , and is equal to the gradient whose entries are given by (18). Furthermore, define the Jacobian matrix whose column is the gradient of the coordinate with respect to the mathematical coordinates. Let also the matrix collect in its column , the th coordinate of the corners of the element in the order specified by the multiindex . Then, we have
This expression can be checked by direct differentiation of (17). For the cubical elements that we use in the physical domain, this Jacobian matrix is given explicitly by
(19) 
where denotes the scaling along coordinate from mathematical to physical domain. Finally, let the matrix collect the desired derivatives , with respect to the coordinates , in its columns. Then, we have
The FE class in the VarNet tool [26] computes the test functions, presented in the previous section, as well as their derivatives as discussed above.
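For the cubical elements used here, the chain rule above collapses to a particularly simple form, since the Jacobian (19) is diagonal with the element scalings. The sketch below illustrates this; the linear form of the 1D shape functions is an assumption, as before.

```python
def phi(i, xi):
    """1D linear shape functions on [0, 1] (assumed form of (15))."""
    return 1.0 - xi if i == 1 else xi

def dphi(i):
    """Their constant derivatives (18): -1 for the first, +1 for the second."""
    return -1.0 if i == 1 else 1.0

def grad_physical(multi_index, xi_vec, scales):
    """Gradient of a shape function (16) in physical coordinates. For
    cubical elements the Jacobian (19) is diagonal with the element
    scalings, so each mathematical derivative is simply divided by the
    scaling of its coordinate."""
    grad = []
    for d in range(len(xi_vec)):
        g = 1.0
        for j, (i, xi) in enumerate(zip(multi_index, xi_vec)):
            g *= dphi(i) if j == d else phi(i, xi)
        grad.append(g / scales[d])
    return grad

g = grad_physical((1, 2), (0.25, 0.5), scales=(0.1, 0.2))
print(g)
```

For general (non-cubical) elements, the same computation requires solving with the full Jacobian matrix rather than dividing by the scalings.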
The standard elements, defined in the mathematical domain, make the numerical computation of the variational form (4) and the loss function (IIIA) very efficient. This is because the GaussLegendre quadrature rules can be used for integration over these elements. Specifically, let denote the number of integration points for the 1D element introduced in Section AA, denote the values of the coordinate at these integration points for , and denote the corresponding weights for a 1D point quadrature rule. For instance for , we have and . Let denote the integrand in (4). Given the 1D integration point information, we can approximate the variational form of the PDE (4) by a dimensional quadrature rule as
where denotes the determinant of the Jacobian matrix (19) which is constant for the cubical elements used here, denotes the compact support of the test function , and for . For small element sizes, the integrand can accurately be estimated by low order polynomials making the numerical integration very accurate. Finally, note that the integrand involves the PDE input data that need to be evaluated at the integration points. Since these integration points are defined in the mathematical coordinates, we map them to the physical domain using (17).
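Since the numerical values in the quadrature example above are elided, the sketch below assumes the standard element is [-1, 1] per dimension with the classical two-point Gauss-Legendre rule (nodes at plus or minus 1/sqrt(3), unit weights); the function `integrate` and its arguments are illustrative names.

```python
from itertools import product
import math

# Two-point Gauss-Legendre rule on [-1, 1]: exact for polynomials of
# degree up to three in each coordinate.
points = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))
weights = (1.0, 1.0)

def integrate(f, d, scales):
    """Tensor-product quadrature over a cubical element with sidelengths
    'scales', mapped from the standard element. The constant Jacobian
    determinant is the product of the half sidelengths."""
    detJ = 1.0
    for s in scales:
        detJ *= s / 2.0
    total = 0.0
    for idx in product(range(2), repeat=d):
        w = 1.0
        x = []
        for k, i in enumerate(idx):
            w *= weights[i]
            # map the quadrature point from [-1, 1] to [0, scales[k]]
            x.append((points[i] + 1.0) * scales[k] / 2.0)
        total += w * f(x)
    return detJ * total

# The integral of x*y over [0, 1] x [0, 2] is 0.5 * 2 = 1, and the
# rule recovers it exactly since the integrand is bilinear.
val = integrate(lambda x: x[0] * x[1], d=2, scales=(1.0, 2.0))
print(val)
```

Because the integrand in (4) is smooth over each small element, such a low-order rule already integrates it accurately, which is what makes the loss evaluation cheap.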
References
 [1] W. F. Ames, Numerical methods for partial differential equations. Academic press, 2014.
 [2] R. Khodayimehr, W. Aquino, and M. M. Zavlanos, “Modelbased active source identification in complex environments,” IEEE Transactions on Robotics, 2019.
 [3] R. Khodayimehr and M. M. Zavlanos, “Deep learning for robotic mass transport cloaking,” IEEE Transactions on Robotics, 2018. (submitted). [Online]. Available: https://arxiv.org/pdf/1812.04157.pdf.
 [4] N. Yadav, A. Yadav, and M. Kumar, An introduction to Neural Network methods for differential equations. Springer, 2015.
 [5] Y. Khoo, J. Lu, and L. Ying, “Solving parametric PDE problems with artificial Neural Networks,” arXiv preprint arXiv:1707.03351, 2017.

 [6] X. Guo, W. Li, and F. Iorio, “Convolutional Neural Networks for steady flow approximation,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 481–490, ACM, 2016.
 [7] Z. Long, Y. Lu, X. Ma, and B. Dong, “PDENet: Learning PDEs from data,” arXiv preprint arXiv:1710.09668, 2017.

 [8] N. MaiDuy and T. TranCong, “Numerical solution of differential equations using multiquadric radial basis function networks,” Neural Networks, vol. 14, no. 2, pp. 185–199, 2001.
 [9] J. Takeuchi and Y. Kosugi, “Neural Network representation of Finite Element method,” Neural Networks, vol. 7, no. 2, pp. 389–395, 1994.
 [10] C. Xu, C. Wang, F. Ji, and X. Yuan, “FiniteElement Neural Networkbased solving 3D differential equations in MFL,” IEEE Transactions on Magnetics, vol. 48, no. 12, pp. 4747–4756, 2012.
 [11] P. Ramuhalli, L. Udpa, and S. S. Udpa, “FiniteElement Neural Networks for solving differential equations,” IEEE Transactions on Neural Networks, vol. 16, no. 6, pp. 1381–1392, 2005.
 [12] I. E. Lagaris, A. Likas, and D. I. Fotiadis, “Artificial Neural Networks for solving ordinary and partial differential equations,” IEEE Transactions on Neural Networks, vol. 9, no. 5, pp. 987–1000, 1998.
 [13] V. Avrutskiy, “Neural Networks catching up with finite differences in solving partial differential equations in higher dimensions,” arXiv preprint arXiv:1712.05067, 2017.
 [14] K. Rudd, Solving Partial Differential Equations Using Artificial Neural Networks. PhD thesis, Duke University, 2013.

 [15] Y. Shirvany, M. Hayati, and R. Moradian, “Multilayer perceptron Neural Networks with novel unsupervised training method for numerical solution of the partial differential equations,” Applied Soft Computing, vol. 9, no. 1, pp. 20–29, 2009.
 [16] J. Sirignano and K. Spiliopoulos, “DGM: A deep learning algorithm for solving partial differential equations,” arXiv preprint arXiv:1708.07469, 2017.
 [17] M. Raissi, P. Perdikaris, and G. Karniadakis, “Physicsinformed Neural Networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019.
 [18] Y. Zhu, N. Zabaras, P.S. Koutsourelakis, and P. Perdikaris, “Physicsconstrained deep learning for highdimensional surrogate modeling and uncertainty quantification without labeled data,” arXiv preprint arXiv:1901.06314, 2019.
 [19] S. Wei, X. Jin, and H. Li, “General solutions for nonlinear differential equations: a deep reinforcement learning approach,” arXiv preprint arXiv:1805.07297, 2018.
 [20] B. Settles, “Active learning literature survey,” computer sciences technical report, University of WisconsinMadison, 2009.
 [21] R. Khodayimehr, Y. Kantaros, and M. M. Zavlanos, “Distributed state estimation using intermittently connected robot networks,” IEEE Transactions on Robotics, 2019.
 [22] R. Khodayimehr and M. M. Zavlanos, “Modelbased learning of turbulent flows using mobile robots,” IEEE Transactions on Robotics, 2018. (submitted). [Online]. Available: https://arxiv.org/pdf/1812.03894.pdf.
 [23] D. Ucinski, Optimal measurement methods for distributed parameter system identification. CRC Press, 2004.
 [24] T. J. Hughes, J. A. Cottrell, and Y. Bazilevs, “Isogeometric analysis: CAD, Finite Elements, NURBS, exact geometry and mesh refinement,” Computer methods in applied mechanics and engineering, vol. 194, no. 3941, pp. 4135–4195, 2005.
 [25] T. J. Hughes, The Finite Element method: linear static and dynamic finite element analysis. Courier Corporation, 2012.
 [26] “VarNet deep learning library for the solution of partial differential equations.” https://github.com/RizaXudayi/VarNet.
 [27] T. J. Hughes and G. N. Wells, “Conservation properties for the Galerkin and stabilised forms of the advection–diffusion and incompressible Navier–Stokes equations,” Computer methods in applied mechanics and engineering, vol. 194, no. 911, pp. 1141–1159, 2005.
 [28] P. J. Roberts and D. R. Webster, Turbulent diffusion. ASCE Press, Reston, Virginia, 2002.
 [29] W. Hundsdorfer, “Numerical solution of advectiondiffusionreaction equations,” Centrum voor Wiskunde en Informatica, vol. 24, pp. 30–41, 1996.
 [30] B. D. Reddy, Introductory functional analysis: with applications to boundary value problems and finite elements, vol. 27. Springer, 2013.
 [31] C. Kuttler, “Reactiondiffusion equations with applications,” 2011.
 [32] F. Chollet, Deep learning with Python. Manning Publications Co., 2017.
 [33] G. K. Batchelor, An introduction to fluid dynamics. Cambridge university press, 2000.
 [34] C. P. Robert, G. Casella, and G. Casella, Introducing Monte Carlo methods with R, vol. 18. Springer, 2010.
 [35] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
 [36] D. P. Kingma and J. Ba, “ADAM: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [37] H. P. Langtangen, Computational partial differential equations: numerical methods and Diffpack programming, vol. 2. Springer Science & Business Media, 2013.
 [38] A. Mojtabi and M. O. Deville, “Onedimensional linear AdvectionDiffusion equation: Analytical and Finite Element solutions,” Computers & Fluids, vol. 107, pp. 189–195, 2015.
 [39] J. Donea and A. Huerta, Finite Element methods for flow problems. John Wiley & Sons, 2003.
 [40] “1D timedependent AdvectionDiffusion PDE  low Pe number.” https://vimeo.com/328756962.
 [41] “1D timedependent AdvectionDiffusion PDE  low Pe number  residual loss function.” https://vimeo.com/350138714.
 [42] “1D timedependent AdvectionDiffusion PDE  high Pe number.” https://vimeo.com/328759472.
 [43] F. J. Leij and J. Dane, “Analytical solution of the onedimensional advection equation and two or threedimensional dispersion equation,” Water Resources Research, vol. 26, pp. 1475–1482, 07 1990.
 [44] P. Siegel, R. Mosé, P. Ackerer, and J. Jaffré, “Solution of the advection–diffusion equation using a combination of discontinuous and mixed finite elements,” International journal for numerical methods in fluids, vol. 24, no. 6, pp. 595–613, 1997.
 [45] A. N. Brooks and T. J. Hughes, “Streamline upwind/PetrovGalerkin formulations for convection dominated flows with particular emphasis on the incompressible NavierStokes equations,” Computer methods in applied mechanics and engineering, vol. 32, no. 13, pp. 199–259, 1982.
 [46] “2D timedependent AdvectionDiffusion PDE.” https://vimeo.com/350221337.