Solving PDEs by Variational Physics-Informed Neural Networks: an a posteriori error analysis

by   Stefano Berrone, et al.
Politecnico di Torino

We consider the discretization of elliptic boundary-value problems by variational physics-informed neural networks (VPINNs), in which test functions are continuous, piecewise linear functions on a triangulation of the domain. We define an a posteriori error estimator, made of a residual-type term, a loss-function term, and data oscillation terms. We prove that the estimator is both reliable and efficient in controlling the energy norm of the error between the exact and VPINN solutions. Numerical results are in excellent agreement with the theoretical predictions.



page 11


Variational Physics Informed Neural Networks: the role of quadratures and test functions

In this work we analyze how Gaussian or Newton-Cotes quadrature rules of...

An energy-based error bound of physics-informed neural network solutions in elasticity

An energy-based a posteriori error bound is proposed for the physics-inf...

Notes on Exact Boundary Values in Residual Minimisation

We analyse the difference in convergence mode using exact versus penalis...

Residual-type a posteriori error analysis of HDG methods for Neumann boundary control problems

We study a posteriori error analysis of linear-quadratic boundary contro...

Error estimates of residual minimization using neural networks for linear PDEs

We propose an abstract framework for analyzing the convergence of least-...

Adaptive VEM: Stabilization-Free A Posteriori Error Analysis

In the present paper we initiate the challenging task of building a math...

Pointwise A Posteriori Error Analysis of a Discontinuous Galerkin Method for the Elliptic Obstacle Problem

We present a posteriori error analysis in the supremum norm for the symm...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The possibility of using deep-learning tools for solving complex physical models has attracted the attention of many scientists over the last few years. We have in mind in this paper models that are mathematically described by partial differential equations, supplemented by suitable boundary and initial conditions. In the most general setting, if no information on the model is available except the knowledge of some of its solutions, the model may be completely surrogated by one or more neural network, trained by data (i.e., by the known solutions). However, in most situations of interest, the mathematical model is known (e.g., the Navier-Stokes equations describing an incompressible flow), and such information may be suitably exploited in training the network(s): one gets the so-called Physics Informed Neural Networks (PINNs). This approach was first proposed in

[raissi2019physics], and it inspired further works such as e.g. [tartakovsky2018learning] or [yang2019adversarial], until the recent paper [LanthalerMishraKarniadakis2021] which presents a very general framework for the solution of operator equations by deep neural networks. PINNs are trained by using the strong form of the differential equations, which are enforced at a set of points in the domain by suitably defining the loss function. In this sense, PINNs can be viewed as particular instances of least-square/collocations methods.

Based on the weak formulation of the differential model, the so-called Variational Physics-Informed Neural Networks (VPINNs), proposed in [kharazmi2019variational], enforce the equations by means of suitably chosen test functions, not necessarily represented by neural networks [khodayi2020varnet]; they are instances of least-square/Petrov-Galerkin methods. While the construction of the loss function is generally less expensive for PINNs than for VPINNs, the latter allow for the treatment of models with less regular solutions, as well as an easier enforcement of boundary conditions. In addition, the error analysis for VPINNs takes advantage of the available results for the discretization of variational problems, in fulfilling the assumptions of Lax-Richmyer’s theorem ‘stability plus consistency imply convergence’. Actually, consistency results follow rather easily from the recently established approximation properties of neural networks in Sobolev spaces (see, e.g., [elbrachter2021deep], [guhring2020error], [opschoor2020deep], [kutyniok2021theoretical], [opschoor2021exponential], [gonon2021deep]), whereas the derivation of stability estimates for the neural network solution appears to be a less trivial task: indeed, a neural network is identified by its weights, which are usually much more than the conditions enforced in its training. In other words, the training of a neural network is functionally an ill-posed problem.

To this respect, we considered in [BeCaPi2021]

a Petrov-Galerkin framework in which trial functions are defined by means of neural networks, whereas test functions are made of continuous, piecewise linear functions on a triangulation of the domain. Relying on an inf-sup condition between spaces of piecewise polynomial functions, we derived an a priori error estimate in the energy norm between the exact solution of an elliptic boundary-value problem and a high-order interpolant of a deep neural network, which minimizes the loss function. Numerical results indicate that the error follows a similar behavior when the interpolation operator is turned off.

The purpose of the present paper is to perform an a posteriori error analysis for VPINNs, i.e., to get estimates on the error which only depend on the computed VPINN solution, rather than the unknown exact solution. This is important to get a practical and quantitative information on the quality of the approximation. After setting the model elliptic boundary-value problem in Sect. 2, and the corresponding VPINN discretization in Sect. 2.1, we define in Sect. 3 a computable residual-type error estimator, and prove that it is both reliable and efficient in controlling the energy error between the exact solution and the VPINN solution. Reliability means that the global error is upper bounded by a constant times the estimator, efficiency means that the estimator cannot over-estimate the energy error, since the latter is lower bounded by a constant times the former up to data oscillation terms. The proposed estimator is obtained by summing up several terms: one is the classical residual-type estimator in finite elements, measuring the bulk error inside each element of the triangulation as well as the inter-element gradient jumps; another term accounts for the magnitude of the loss function after minimization is performed; the remaining terms measure data oscillations, i.e., the errors committed by locally projecting the equation’s coefficients and right-hand side upon suitable polynomial spaces. The estimator can be written as a sum of elemental contributions, thereby allowing its use within an adaptive discretization strategy which refines the elements carrying the largest contributions to the estimator.

2 The model boundary-value problem

Let be a bounded polygonal/polyhedral domain with Lipschitz boundary .

Let us consider the model elliptic boundary-value problem


where , satisfy , in for some constant , whereas .

Setting , define the bilinear and linear forms


denote by the coercivity constant of the form , and by , the continuity constants of the forms and . Problem (1) is formulated variationally as follows: Find such that


[Other boundary conditions] The forthcoming formulation of the discretized problem and the a posteriori error analysis can be extended without pain to cover the case of mixed Dirichlet-Neumann boundary conditions, namely on , on , with . We just consider homogeneous Dirichlet conditions to avoid an excess of technicalities.

2.1 The VPINN discretization

We aim at approximating the solution of Problem (1) by a generalized Petrov-Galerkin strategy.

To define the subset of

of trial functions, let us choose a fully-connected feed-forward neural network structure

, with input variables and 1 output variable, identified by the number of layers , the layer widths ,

, and the activation function

. Thus, each choice of the weights defines a mapping , which we think as restricted to the closed domain ; let us denote by the manifold containing all functions that can be generated by this neural network structure. We enforce the homogeneous Dirichlet boundary conditions by multiplying each by a fixed smooth function (we refer to [sukumar2022] for a general strategy to construct this function); we assume that belongs to for any . In conclusion, our manifold of trial functions will be

To define the subspace of of test functions, let us introduce a conforming, shape-regular triangulation of with meshsize and let be the linear subspace formed by the functions which are piecewise linear polynomials over the triangulation . Furthermore, let us introduce computable approximations of the forms and by numerical quadratures. Precisely, for any , let be the nodes and weights of a quadrature formula of precision on . Then, assuming that all data , , , are continuous in each element of the triangulation, we define the approximate forms


With these ingredients at hand, we would like to approximate the solution of Problem (4) by some satisfying


In order to handle this problem by the neural network, let us introduce a basis in , say , and for any let us define the residuals


as well as the loss function


Then, we search for a global minimum of the loss function in , i.e., we consider the following minimization problem: Find such that


Note that any solution of (7) annihilates the loss function, hence it is a solution of (10); such a solution may not be unique, since the set of equations (7) may be underdetermined (in particular, for one may obtain a non-zero , see [BeCaPi2021, Sect. 6.3]). On the other hand, system (7) may be overdetermined, and admit no solution; in this case, the loss function will have strictly positive minima.

[Discretization with interpolation] In order to reduce and control the randomic effects related to the use of a network depending upon a large number of weights, in [BeCaPi2021] we proposed to locally project the neural network upon a space of polynomials, before computing the loss function.

To be precise, we have considered a conforming, shape-regular partition of , which is equal to or coarser than (i.e., each element is contained in an element ) but compatible with (i.e., its meshsize satisfies ). Let be the linear subspace formed by the functions which are piecewise polynomials of degree over the triangulation , and let be the associated element-wise Lagrange interpolation operator.

Given a neural network , let us denote by its piecewise polynomial interpolant. Then, the definition (8) of local residuals is modified as


consequently, the loss function takes the form


and we define a new approximation of the solution of Problem (4) by setting


In [BeCaPi2021] we derived an a priori error estimate for the error , and we documented the error decay as , which turns out to have a more regular behavior that the error , although the latter is usually smaller.

The subsequent a posteriori error analysis could be extended to give a control on the error produced by as well. For the sake of simplicity, we do not pursue such a task here.

3 The a posteriori error estimator

In order to build an error estimator, let us first choose, for any and any , a projection operator satisfying


This allows us to introduce approximate bilinear and linear forms


which are useful in the forthcoming derivation. Indeed, the coercivity of the form allows us to bound the -norm of the error as follows:


We split the numerator as


and we proceed to bound each term on the right-hand side.

The terms and account for the element-wise projection error upon polynomial spaces; they are estimated in the next two Lemmas.

The quantity defined in (18) satisfies




Setting and using (14), we get

and we conclude using the bound . ∎

The quantity defined in (18) satisfies




It holds

where we have used again (14). We conclude as in the proof of Lemma 3.∎

Let us now focus on the quantity , which can be written as


in turn, the quantity can be written as


The bound of is standard in finite-element a posteriori error analysis: it involves the local bulk residuals


and the interelement jumps at each edge shared by two elements, say and

with opposite normal unit vectors

and , namely


in addition, one defines if .

To derive the bound, the test function in (23) is chosen as , the Clément interpolant of on [clement1975], which satisfies


where .

The quantity defined in (23) satisfies




with defined in (25) and defined in (26).


We refer e.g. to [verfurth1996] for more details. ∎

Before considering the quantity , let us state a useful result of equivalence of norms. For any , let be the vector of its coefficients. There exist constants , possibly depending on such that


where .


The result expresses the equivalence of norms in finite dimensional spaces. If the triangulation is quasi uniform, then one can prove by a standard reference-element argument that whereas . ∎

We are now able to bound the quantity in terms of the loss function introduced in (9), as follows. The quantity defined in (24) satisfies




and the constant is defined in (30).


Writing , it holds


We conclude by using (30) and observing that


since we have chosen and (27) holds. ∎

We are left with the problem of bounding the terms and in (24). They are similar to the terms and , respectively, but reflect the presence of the quadrature formula introduced in (5) and (6). In the forthcoming analysis, it will be useful to introduce the following notation for the quadrature-based discrete (semi-)norm on :


Let us start with the quantity . Recalling that the adopted quadrature rule has precision and test functions are piecewise linear polynomials, it holds


On the one hand, recalling the assumption and inequality (33) one has


On the other hand, we first observe that, by the exactness of the quadrature rule and (14), we get



Summarizing, we obtain the following result, which is anologous to that in Lemma 3. The quantity defined in (24) satisfies




The last term in (24), , can be written as


Concerning , by the exactness of the quadrature rule and the fact that is piecewise constant, one has

which easily gives

The terms and are similar to the term above, in which is replaced by and , respectively. Hence, they can be bounded as done for . Summarizing, we obtain the following result, which is anologous to that in Lemma 3. The quantity defined in (24) satisfies




At this point, we are ready to derive the announced a posteriori error estimates. In order to get an upper bound of the error, we concatenate (17), (18), (23), (24), and use the bounds given in Lemmas 3 to 3, arriving at the following result.

[a posteriori upper bound of the error] Let satisfy (10). Then, the error can be estimated from above as follows:




We realize that the global estimator is the sum of four contributions: is the classical residual-based estimator, measures how small the minimized loss function is, i.e., how well the discrete variational equations (7) are fulfilled, whereas and reflect the error in approximating elementwise the coefficients of the operator and the right-hand side by polynomials of degrees related to the precision of the quadrature formula.

It is possible to derive from (43) an element-based a posteriori error estimator, which can be used to design an adaptive strategy of mesh refinement (see, e.g. [NochettoCIME2012]). To this end, from now on we assume that the basis of , introduced to define (8), is the canonical Lagrange basis associated with the nodes of the triangulation . Given any , we introduce the elemental index set , where is the support of , and we define a local contribution to the term as follows:


which satisfies

With this definition at hand, we can introduce the following elemental error estimator. [elemental error estimator] For any , let us set


where the addends in this sum are defined, respectively, in (29), (45), (22) and (42), (20) and (39).

Then, Theorem 3 can be re-formulated in terms of these quantities. [localized a posteriori error estimator] The error can be estimated as follows:


Inequality (47) guarantees the reliability of the proposed error estimator, namely the estimator does provide a computable upper bound of the discretization error. Next result assures that the estimator is also efficient, namely it does not overestimate the error. [a posteriori lower bound of the error] Let satisfy (10). Then, the error can be locally estimated from below as follows: for any it holds


To derive (48), let us first consider the bulk contribution to the estimator. We apply a classical argument in a posteriori analysis, namely we introduce a non-negative bubble function with support in and such that and for all .

Let us set . Then,


we obtain