1. Introduction
In recent times deep neural networks, ([8]
), have become the method of choice in solving state of the art machine learning problems, such as classification, clustering, pattern recognition, and prediction with enormous impact in many applied areas. There is also an increasing trend in scientific computing to take advantage of the potential of DNNs as nonlinear approximation tool, (
[4]). This goes both for using DNNs in devising new approximation algorithms as well as for trying to develop mathematical theories that explain and quantify the ability of DNNs as universal approximation methodology, with results originating in ([6]) to many recent works, especially in the area of convolutional NN, ([16], [21], [7]). At any rate, this is still a very active area of research with no ultimate theoretical result available yet.Recently, the deep neural networks have been also utilized in the field of numerical solution of PDEs ([1], [5], [9], [11], [12]), and for convolutional ones, see ([10]).
In this work, we investigate the ability of fully connected DNNs to provide an accurate enough approximation of the nonlinear mappings that arise in the finite element discretization of nonlinear PDEs. The finite element method applied to a nonlinear PDE posed variationally, basically requires the evaluation of integrals over each finite element which involves nonlinear functions that can be evaluated pointwise (generally, using quadratures). The unknown function which approximates the solution of the PDE is a linear combination of piecewise polynomials and the individual integrals for any given value of can be evaluated accurately enough (in general, approximately, using quadrature formulas). Typically, in a finite element discretization procedure, we use refinement. That is, we can have a fine enough final mesh, a set of elements , obtained from a previous level coarse mesh . One way to solve the finelevel nonlinear discretization problem, is to utilize the existing hierarchy of discretizations. One approach to maintain high accuracy of the coarse operators while evaluating their actions is to utilize the accurate fine level nonlinear operator. That is, for any given coarse finite element function, we expand it in terms of the fine level basis, apply the action of the fine level nonlinear operator and then restrict the result back to the coarse level, i.e., we apply a Galerkin procedure. This way of defining the coarse nonlinear operator provides better accuracy, however its evaluation requires finelevel computations. In the linear case, one can actually precompute the coarse level operators (matrices) explicitly, which is not the case in the nonlinear case. This study offers a way to generate a coarse operator, that can approximate the variationally defined finite element one on coarse levels by training a fully connected DNN. We do not do this globally, but rather construct the desired nonlinear mapping based on actions of locally trained DNNs associated with each coarse element from a coarse mesh
. This is much more feasible than training the actions of the global coarse nonlinear mapping; these will have as input many coarse functions (i.e., their coefficient vectors) and are thus much bigger and hence much more expensive than their restrictions to the individual coarse elements.
The remainder of this paper is structured as follows. In Section 2, we introduce the problem in a general setting. Then in the following Section 2.3
, we present a computational study of the training of the coarse nonlinear operators by varying the domain (a box in a highdimensional space with size equal to the number of local coarse degrees of freedom). The purpose of the study is to assess the complexity of the local DNNs depending on the desired approximation accuracy. We also show the approximation accuracy of the global coarse nonlinear mapping (of our main interest) which depends on the coarsening ratio
. In Section 3, we introduce the FAS (full approximation scheme) solver ([2]), and in the following section 3.5, we apply the trained DNNs to replace the true coarse operator in a twolevel FAS for a model nonlinear diffusionreaction PDE discretized by piecewise linear elements. Finally, in Section 4, we draw some conclusions and outline few directions for possible future work.2. Approximation for nonlinear mappings using DNNs
2.1. Problem setting
We are given the system of nonlinear equations
(2.1) 
Here, is a mapping from , and we have access only at its actions (by calling some function).
We assume that the solution belongs to a box , e.g., for some value of . Typically, nonlinear problems like (2.1) are solved by iterations, and for any given current iterate , we look for a correction such that gives a better accuracy. This motivates us to rewrite (2.1) as
where
Our goal is to train a DNN where and are the inputs and is the output. The input is drawn from the box , whereas the correction is drawn from a small ball . In our study to follow, we vary the parameters and for a particular mapping (and respective ) to assess the complexity of the resulting DNN and examine the approximation accuracy. The general strategy is as follows. We draw vectors from the box using Sobol sequence ([17, 18]) and vectors from the ball , also using Sobol sequence. The alternative would be to simply use random points in and , however Sobol sequence is better in terms of cost versus approximation ability (at least for smooth mappings). Once we have built the DNN with a desired accuracy on the training data, we test its approximation quality on a number of randomly selected points from and .
Our results are documented in the next subsection for a particular example of a finite element mapping; first on each individual subdomain (coarse element) and then for its global action composed from all locally trained DNNs.
2.2. Training DNNs for model nonlinear finite element mapping
We consider the nonlinear PDE
(2.2) 
Here, is a polygon in and is a given positive nonlinear function of .
The variational formulation for (2.2) is: find such that
The above problem is discretized by piecewise linear finite elements on triangular mesh that yields a system of nonlinear equations.
In this section, we consider the coarse nonlinear mapping that corresponds to a coarse triangulation which after refinement gives the fine one . The coarse finite element space is and the fine one is . By construction, we have . Let be the basis of and be the basis of . These are piecewise linear functions associated with their respective triangulations and . More specifically, we use Lagrangian bases, i.e., and are associated with the sets of vertices, and , of the elements of their respective triangulations.
The coarse nonlinear operator is then defined as follows. Let be a coarse finite element function. Since , we can expand in terms of the basis of , i.e.,
We can also expand in terms of the basis of , i.e., we have
In the actual computations we use their coefficient vectors
These coefficient vectors are related by an interpolation mapping
(which is piecewise linear), i.e., we haveFirst we define the local nonlinear mappings , associated with each .
In terms of finite element functions, we have as input restricted to , and we evaluate the integrals
Each integral over is computed as a sum of integrals over the finelevel elements , , using fine level computations, i.e., and are linear on each and these finelevel integrals are assumed computable (by the finite element software used to generate the fine level discretization, which possibly employs high order quadratures).
In terms of linear algebra computations, we have as an input vector, and have as an output a vector of the same size, i.e., equal to the number of vertices of . Note that we will be training the DNN for the mapping of two variables, and , i.e.,
That is, the input vectors will have size two times bigger than the output vectors. Once we have trained the local actions of the nonlinear mapping, the global action is obtained by standard assembly, using the fact that
(2.3) 
Here, stands for the mapping that extends a local vector defined on to a global vector by zero values outside . For a global vector , denotes its restriction to .
In the following section, we provide actual test results for training DNNs, first for the local mappings , and then for the respective global one .
2.3. Training local DNNs for the model finite element mapping
), a Python Deep Learning library to approximate nonlinear mappings.
Kerasis a highlevel neural networks API (Application Programming Interface), written in Python and capable of running on top of TensorFlow (
[15]). In this work, we use a fully connected network. The Sequential model in Keras provides a way to implement such a network.The input vectors are of size ( stacked on top of ) and the desired outputs are the actions () represented as vectors of size for any given input. The network consists of few fully connected layers with tanh
activation at each layer. We use the standard mean squared error as the loss function.
In the tests to follow, we use data from the boxes , and from balls of various sizes. Specifically, we performed the numerical tests with vectors each taken from
and vectors drawn from balls with radii , respectively (i.e., the first is paired with the first ball , the second box is paired with the second ball , and so on). In our test we have chosen , that is, the local sets have four coarse dofs. Also, we vary the ratio , which implies that we have and fine dofs, respectively (while keeping fixed the number of coarse dofs in ).
The network was trained with layers, each with neurons. The training algorithm was provided by TensorFlow using the ADAM optimizer ([13]
) which is a variant of the SGD (stochastic gradient descent) algorithm, see, e.g., (
[19]). We used epochs, , learning rate along with and . For more details on the meaning of these parameters, we refer to ([14]) and ([19]).Let be the action after the training. Tables 2.3, 2.3, and 2.3 show the average of the relative errors using and of and over examples consisting of within the box and within the ball .
The approximation of the global coarse nonlinear operator is presented in Table 2.3. As it is seen from formula (2.3), we can approximate each independently of each other, and after combining the individual approximations (using the same assembly formula with each replaced by its DNN approximation), we define the approximation to the global . We use the same setting for training the individual neural networks for each . We have decomposed into several subdomains so that which each has . In this test, we chose .
In Table 2.3, we show how the accuracy varies for different local boxes and respective balls . As before, we present the average of the relative and errors. One can notice that for finer (and ), i.e., more local subdomains , we get somewhat better approximations of the global . It should be expected since with smaller the finite element problem approximates better the continuous one. Some visual illustration of this fact is presented in Figures 1, 2, 3, 4, 5, and 6. More specifically, we provide plots of and for the data that achieves the min and max errors when the number of subdomains are 4, 16 and 64 corersponding to box and and ball with .
For the next test, we have and . The same settings for neural networks are used along with and . The results in Table 2.3 show the average of the relative and errors for different number of subdomains.
As expected, since smaller and fixed gives better approximation of the nonlinear operator, for the ratio we do see better approximation than for .
3. Application of DNN approximate coarse mappings in twolevel FAS
3.1. The FAS algorithm
A standard approach for solving (2.1) is to use Newton’s method. The latter is an iterative process in which given a current iterate , we compute the next one, , by solving the Jacobian equation
(3.1) 
and then .
Typically, the Jacobian problem (3.1) is solved by an iterative method such as GMRES (generalized minimal residual).
To speedup the convergence, for nonlinear equations coming from finite element discretizations of elliptic PDEs, such as (2.2), we can exploit hierarchy of discretizations. A popular method is the twolevel FAS (full approximation scheme) proposed by Achi Brandt, ([2]) (see also ([20])). For a recent convergence analysis of FAS, we refer to ([3]).
To define the twolevel FAS, we need a coarse version of , which is another nonlinear mapping , for some . We also need its Jacobian . Again, we assume that both are available via their actions (by calling some appropriate functions). To communicate data between the original, fine level, and the coarse level, , we are given two linear mappings (matrices):

Coarsetofine (interpolation or prolongation) mapping .

Finetocoarse projection , more precisely, we assume that (then is a projection). In our finite element setting, is simply the restriction of a finegrid vector to the coarse dofs, i.e., , where the columns in correspond to the coarse dofs viewed as subset of the the fine dofs.
Then, the twolevel FAS (TLFAS) can be formulated as follows.
Algorithm 3.1 (Twolevel FAS).
For problem (2.1), with a current approximation , the twolevel FAS method performs the following steps to compute the next approximation .

For a given apply steps of (inexact) Newton algorithm, (3.1), to compute and let .

Form the coarselevel nonlinear problem for
(3.2) 
Solve (3.2) accurately enough using Newton’s method based on the coarse Jacobian and initial iterate . Here we use enough iterations of GMRES for solving the resulting coarse Jacobian residual equations
until a desired accuracy is reached.

Update finelevel approximation

Repeat the FAS cycle starting with until a desired accuracy is reached.
In what follows, we use the following equivalent form of the TLFAS. At the coarse level, we will represent , where is the initial coarse iterate coming from the fine level, and we will be solving for the correction . That is, the coarse problem in terms of the correction reads
where
The rest of the algorithm does not change, in particular, we have
and
3.1.1. The choice for using DNN
In our finite element setting, a true (Galerkin) coarse operator is , where is the piecewise linear interpolation mapping and is the fine level nonlinear finite element operator. We train the global coarse actions based on the fact that the actions of and can be computed subdomainbysubdomain employing standard finite element assembly procedure, as described in the previous section. That is, can be assembed by local s and the coarse can be assembled from local coarse actions based on local versions of .
More specifically, we train for each subdomain a DNN which takes as input any pair of coarse vectors and produces as our desired output. The global action is computed by assembling all local actions, and we use the same assembling procdeure for the approximations obtained using the trained local DNNs.
The trained this way DNN gives the actions of our coarse nonlinear mapping .
3.2. Some details on implementing TLFAS algorithm
We are solving the nonlinear equation where the actions of are available. Also, we assume that we can compute its Jacobian matrix for any given , .
We are also given an interpolation matrix . Finally, we have a mapping such that on . This implies that is a projection, i.e, .
Consider the coarse nonlinear mapping .
We assume that an approximation to the mapping
is available through its actions for a set of input vectors varying in a given box and for another input vector varying in a small ball about the origin. For a fixed , we denote .
We are interested in the following twolevel FAS algorithm for solving .
Algorithm 3.2 (TlFas).
Input parameters:

Initial approximation sufficiently close to the exact solution . For a problem with a known solution, we choose , where is an input (e.g., ). The random vector has as components random numbers in .

(e.g., , or )  tolerance used in GMRES to solve approximately the finelevel Jacobian equations.

Maximal number of finelevel inexact Newton iterations

Additionally, maximal number of GMRES iterations, allowed in solving the finelevel Jacobian equations.

(e.g., ), tolerance used in GMRES for solving coarselevel Jacobian equations.

(equal to or , or )  step length in coarselevel inexact Newton iterations.

Maximal number of coarselevel GMRES iterations, .

Maximal number of inexact coarselevel Newton iterations , .

Maximal number of FAS iterations, .

Tolerance for FAS iterations, .
With the above input specified, a TLFAS algorithm takes the form:

FASloop: If visited times perform the steps below. Otherwise exit.

Perform finelevel inexact Newton iterations, which involve the following steps:

For the current iterate (the initial one, , is given as input) compute residual

Compute Jacobian .

Solve approximately the Jacobian equation
using at most GMRES iterations or until reaching tolerance , i.e.,

Update .


Compute finelevel Jacobian and the coarselevel one .

Compute .

Coarseloop for solving
with initial guess where we keep and the coarse Jacobian fixed. The coarselevel loop reads:

Compute coarse residual

Solve by GMRES, using at most iterations or until we reach

Update

Repeat at most times the above three steps of the coarselevel loop.


Update finelevel iterate

If , go to the beginning of FASloop. Otherwise exit.

3.3. Local tools for FAS
We stress again that all global actions of the coarse operator (exact and approimate via DNNs) are realized by assembly of local actions. All this is possible due to the inherent nature of the finite element method. We illustrate the local subdomains in Fig. 7 and Fig. 8.
3.4. Some details on implementing TLFAS with DNNs
Using the tools and algorithms from the previous sections, we first outline an implementation that involves training inside FAS and then present the algorithm for training outside the FAS. Training on the outside is useful because the training is then only part of a one time set up cost and independent of the r.h.s. The twolevel training inside FAS is formulated as follows.
Algorithm 3.3 (Training inside FAS).
With the inputs specified in algorithm 3.2, the training inside takes the following steps

Perform finelevel inexact Newton iterations, which involve steps in algorithm 3.2 and update .

Compute finelevel Jacobian and the coarselevel one .

Compute .

We generate samples which are in the neighborhood of . More specifically, we use a shifted box and draw to vectors from and to vectors from the ball with radius . We use these sets as inputs for training the DNNs to get the approximations of for each subdomain and assemble them into a global coarse action, . Next, we enter the coarselevel loop as in Algorithm 3.2 and the rest of the algorithm remains the same. Note that the training is done only for the first finelevel iterate .
The following implementation gives the approximations of for each subdomain and can be performed outside the Algorithm 3.2.
Algorithm 3.4 (Training outside FAS).
Given the inputs specified in algorithm 3.2,in order to get the approximations of we proceed the training outside as follows

We take inputs from the box , and corrections are drawn from the ball of radius . This gives a training set with vectors.

We then use these vectors as inputs and train the neural networks which provide approximations of for each subdomain . The latter after assembly give the global coarse action, .

Next, we enter Algorithm 3.2 with this approximate and the rest of the algorithm remains the same.
3.5. Comparative results for FAS with exact and approximate coarse operators using DNNs
In this section, we present some results for two level FAS using the exact operators and its approximation from the training inside and outside as mentioned in the previous section.
We perform the tests with the neural networks using the same settings in section 2.3 and test our algorithm for problem 2.1 with exact solution is on the unit square in with . For the FAS algorithm, we use the following parameters:

[topsep=2pt]

Initial approximation , where and the random vector has as components random numbers in .

 tolerance used in GMRES to approximately solve the finelevel Jacobian equations.

Maximal number of finelevel inexact Newton iterations and tolerance is .

Maximal number of GMRES iterations, allowed in solving the finelevel Jacobian equations.

, tolerance used in GMRES for solving coarselevel Jacobian equations.

 step length in coarselevel inexact Newton iterations for FAS two levels with true operators, FAS training inside, and for FAS training outside.

Maximal number of coarselevel GMRES iterations, .

Maximal number of inexact coarselevel Newton iterations , with tolerance .

Maximal number of FAS iterations, .

Tolerance for FAS iterations, .
We present the results in Table 3.5 (TLFAS with true coarse operator), Table 3.5 (TLFAS with training inside), and Table 3.5 (TLFAS with training outside). It can be seen that the training inside does give similar results to the true operators and even better in terms of relative residuals. For four subdomains, the training outside reached the same number of iterations in FAS as in Tables 3.5 and 3.5. However, we did not achieve as small residuals as we could in the previous two cases. When we have more subdomains, the training outside requires more iterations on the coarse level than the other two approaches, but nevertheless it still meets the desired tolerance.
We also studied the case with the same setting as above for the neural networks, the same and the exact solution. The following tables (Tables 3.5, 3.5, and 3.5) represent the results for 4 and 16 subdomains. They show similar behaviour as in the previous case of .
3.5.1. Cost of training
To get a sense of the cost, for the case of 4 subdomains and , we display the accuracy and the loss values, characteristics provided by Keras, commonly used in training DNNs. Here, we only present the results for one of the four subdomains since the outcomes are similar for all other subdomains. Figure 9 shows plots for training inside whereas Figure 10 for the training outside FAS.
3.5.2. Results for different nonlinear coefficients
Next, we consider 4 subdomains with for different coefficients . We use the same settings for the initial inputs and the neural networks as specified at the beginning of this section. We see in Table 3.5.2, Table 3.5.2, Table 3.5.2, Table 3.5.2, Table 3.5.2 and Table 3.5.2 that the results are comparable for all of the used coefficients .
3.5.3. Illustration of the computed approximate solutions
We also provide illustration for the approximate solutions obtained from Newton method using the nonlinear operators from the training outside FAS and project them back to the fine level along with the true solutions in the following figures 11, 12 and 13 for several subdomain cases. Here, we use the true solution . These illustrations demonstrate the potential for using the trained DNNs as accurate discretization tool.
Similarly, we perform the test with the exact solution . Figure 14, 15, and 16 show the plots of approximate solutions and true solutions for different number of subdomains.
4. Conlusions and future work
The paper presented first encouraging results for approximating coarse finite element nonlinear operators for model diffusion reaction PDE in two dimensions. These operators were successfully employed in a twolevel FAS for solving the resulting system of nonlinear algebraic equations. The resulting DNNs are quite expensive to replace the true Galerkin coarse nonlinear operators, however once constructed, one could in principle use them for solving the same type nonlinear PDEs with different r.h.s. Upto a certain extent we can control the DNN complexity by choosing larger ratio and finally, it is clear that the training since it is local, subdomainbysubdomain, and independent of each other, one can exploit parallelism in the training. Another viable option is to use convolutional DNNs instead of the currently employed fully connected ones. Also, a natural next step is to apply recursion, thus ending up with a hierarchy of trained coarse DNNs for use as coarse nonlinear discretization operators. There is one more part where we can apply DNNs, namely to get approximations to the coarse Jacobians, (also done locally). Here, the input is the local coarse vector and the output will be a local matrix . It is also of interest to consider more general nonlinear, including stochastic, PDEs, which is a widely open area for future research.
Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accu racy, completeness, or usefulness of any information, apparatus, product, or pro cess disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or im ply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States gov ernment or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.
References
 [1] L. Bar and N. Sochen, Unsupervised deep learning algorithm for pdebased for ward and inverse problems, 2019. arXiv: 1904.05417.
 [2] A. Brandt and O. Livne, Multigrid techniques, Society for Industrial and Applied Mathematics, 2011. DOI: 10.1137/1.9781611970753.
 [3] L. Chen, X. Hu, and S. M. Wise, Convergence analysis of a generalized full approximation storage scheme for convex optimization problems, submitted. 2018.

[4]
T. Chen and H. Chen,
Universal approximation to nonlinear operators by neu ral networks with arbitrary activation functions and its application to dynamical systems
, Ieee transactions on neural networks, vol. 6, no. 4, pp. 911917, 1995.  [5] M. M. Chiaramonte and M. Kiener, Solving differential equations using neural networks, http://cs229.stanford.edu/proj2013/, 2017.

[6]
G. Cybenko,
Approximation by superpositions of a sigmoidal function
, Math ematics of control, signals and systems, vol. 2, no. 4, pp. 303314, Dec. 1989. [Online]. Available: https://doi.org/10.1007/BF02551274.  [7] I. Daubechies, R. DeVore, S. Foucart, B. Hanin, and G. Petrova, Nonlinear approximation and (deep) relu networks, 2019. arXiv: 1905.02199.
 [8] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, The MIT Press, 2016.
 [9] J. Han, A. Jentzen, and W. E, Solving highdimensional partial differential equations using deep learning, 2018. arXiv: 1707.02568.

[10]
J. He and J. Xu,
Mgnet: A unified framework of multigrid and convolutional neural network
, Science China Mathematics, vol. 62, no. 7, pp. 13311354, Jul. 2019. DOI: 10.1007/s1142501995472.  [11] J.T. Hsieh, S. Zhao, S. Eismann, L. Mirabella, and S. Ermon, Learning neural PDE Solvers with convergence guarantees, 2019. arXiv: 1906.01200.
 [12] T.W. Ke, M. Maire, and S. X. Yu, Multigrid neural architectures, 2016. arXiv:1611.07661.
 [13] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, Corr, vol. abs/1412.6980, 2014.
 [14] S. Pal and S. Gulli, Deep learning with keras: Implementing deep learning models and neural networks with the power of python. Packt Publishing, 2017.
 [15] N. Shukla, Machine learning with tensorflow. Manning Publications Co., 2018.
 [16] J. W. Siegel and J. Xu, On the approximation properties of neural networks, 2019. arXiv:1904.02311.
 [17] I. M. Sobol’, On the distribution of points in a cube and the approximate eval uation of integrals, USSR computational mathematics and mathematical physics, vol. 7, no. 4, pp. 86112, 1967.
 [18] I. M. Sobol’, D. Asotsky, A. Kreinin, and S. Kucherenko, Construction and comparison of highdimensional sobol’ generators, Wilmott, vol. 2011, no. 56, pp. 6479, 2011.
 [19] G. Strang, Linear algebra and learning from data. WellesleyCambridge Press, Feb. 2019.
 [20] P. S. Vassilevski, Multilevel block factorization preconditioners: Matrixbasedanalysis and algorithms for solving finite element equations. Springer Science and Business Media, 2008.
 [21] D.X. Zhou, Universality of deep convolutional neural networks, 2018. arXiv:1805.10769.