
# An artificial neural network approximation for Cauchy inverse problems

A novel artificial neural network method is proposed for solving Cauchy inverse problems. It allows multiple hidden layers of arbitrary width and depth, which theoretically yields better approximations to the inverse problems. In this research, existence and convergence results are shown to establish the well-posedness of the neural network method for Cauchy inverse problems, and various numerical examples are presented to illustrate its accuracy and stability. The numerical examples cover different points of view, including time-dependent and time-independent cases, high spatial dimensions up to 8D, and cases with noisy boundary data and singular computational domains. Moreover, the numerical results also show that neural networks with wider and deeper hidden layers can lead to better approximations for Cauchy inverse problems.

12/27/2017


## 1. Introduction

The approximation of Cauchy inverse problems has been an important objective over the last few decades. Let us consider the following two classical cases: (I) a time-independent problem and (II) a time-dependent problem. Let $\Omega\subset\mathbb{R}^d$ be a domain with continuous boundary $\partial\Omega$, where $d$ is the spatial dimension. It is worth mentioning that $\Gamma$ is part, but not all, of $\partial\Omega$, and the aim of the Cauchy inverse problem is to recover the solution on the rest of the boundary $\partial\Omega\setminus\Gamma$, with proper initial and boundary conditions:

$$
\text{(I)}\quad
\begin{cases}
\mathcal{L}u(x)=0, & x\ \text{in}\ \Omega,\\
u(x)=f(x), & x\ \text{on}\ \Gamma,\\
\dfrac{\partial u(x)}{\partial n}=g(x), & x\ \text{on}\ \Gamma,
\end{cases}
\tag{1.1}
$$

and

$$
\text{(II)}\quad
\begin{cases}
\dfrac{\partial u(x,t)}{\partial t}+\mathcal{L}u(x,t)=0, & (x,t)\ \text{in}\ \Omega\times T,\\
u(x,t)=f(x,t), & (x,t)\ \text{on}\ \Gamma\times T,\\
\dfrac{\partial u(x,t)}{\partial n}=g(x,t), & (x,t)\ \text{on}\ \Gamma\times T,\\
u(x,0)=h(x), & x\ \text{in}\ \Omega,
\end{cases}
\tag{1.2}
$$

where $n$ is the outer unit normal with respect to $\Gamma$, $\mathcal{L}$ is a linear operator, and $t$ represents time.

There are many works concerning the implementation and analysis of numerical methods for these problems. For the time-independent case (1.1), Carleman-type estimates for the discrete scheme were used for the Laplace case in [Santosa1991]. Subsequently, many authors proposed algorithms for the Cauchy inverse problem for Laplace's equation, such as the conjugate gradient method [Lesnic2000], the Backus-Gilbert algorithm [Hon_2001], regularization methods [Reinhardt1999], and methods from linear algebra [Nachaoui2002, Nachaoui2004]. Meanwhile, convergence and stability analyses were developed: Chakib et al. [Chakib2001] proved the existence of a solution to the Cauchy problem, and it was first proved in [Cimeti_re_2001] that the desired solution is the unique fixed point of an appropriate operator. For the time-dependent case (1.2), Besala et al. [Besala1966] proved the uniqueness of solutions for the direct case in 1966. Since then, many authors have considered stable numerical algorithms in different settings, such as the heat equation [Cannon1967, Elden1987], the Helmholtz equation [Berntsson2014], and time-space fractional diffusion equations [Sakamoto2011, SLi2018].

The key point of numerical methods for Cauchy inverse problems is the way the ill-posedness is treated. It is well known that there exists at most one solution to each of the two Cauchy problems above. However, they are typically ill-posed, which means that a small change in the initial data may induce large changes in the solution. We refer to [Isakov1998] and the references therein for more details on this issue. Regularization is an effective and general technique to deal with the ill-posedness of inverse problems [Ito2014]. During the last few decades, different regularization methods have been proposed to solve various PDE inverse problems, such as the Tikhonov regularization method [Jin_2008, Tomoya2008], the boundary element method [Cheng_2014, Marin2003], variational methods [Jin2010], and the dynamical regularization algorithm [Zhang_2018]. As for the Cauchy problem, numerical analysis and experiments on regularization for different equations have been reported, for instance for Laplace's equation [Bourgeois_2005, Wei2013], elliptic equations [Feng_2013], and the Helmholtz equation [Qin2010, Berntsson2017].

Artificial neural network (ANN) methods for approximating physical models described by PDE systems, as well as other kinds of nonlinear problems, have attracted significant interest. Lagaris et al. [Lagaris1998, Lagaris2000] used this idea as early as 1998 for low-dimensional solutions. The idea was then extended in several follow-up works on various direct problems, including high-order differential equations [Malek2006, Aarts2001]. These approaches essentially rely on the universal approximation property of ANNs, which was proved in the pioneering work [Cybenko1989] for a single hidden layer, and then extended and refined in [Kurt1989, Kurt1991].

It is well known in the field of deep learning that deeper networks yield better approximations. In this sense, deep neural networks (DNNs) are also popular for solving PDEs, especially high-dimensional PDE problems like the Hamilton-Jacobi-Bellman equation [Justin2018, Giuseppe2017]. Recently, physics-informed neural network models [Raissi2019] were developed to solve PDEs by exploiting the nature and arrangement of the available data; they are effective for various problems, including fractional ADEs [Pang2019] and stochastic problems [Zhang2019]. A method to solve unknown governing equations with DNNs is proposed in [Xiu2019]. Long et al. [Dong2019] propose a new deep neural network, named PDE-Net 2.0, to discover (time-dependent) PDEs.

There are other interesting works combining ANNs with traditional methods to enhance their performance. Mishra [Mishra2018] combined an existing finite difference method with ANNs, and White et al. [White2019] used a neural network surrogate in topology optimization. Li et al. [Li2018] recast training in deep learning as a control problem, which allows formulating necessary optimality conditions in continuous time using Pontryagin's maximum principle (PMP). Moreover, Yan and Zhou [Yan2019] propose an adaptive procedure to construct a multi-fidelity polynomial chaos surrogate model for inverse problems.

In this paper, we propose a novel numerical method for solving Cauchy inverse problems using artificial neural networks. As the spatial dimension grows, the computational cost of the ANN method grows much more slowly than that of traditional numerical methods. Within the proposed approach, we use a neural network instead of a linear combination of Lagrangian basis functions to represent the solution of the PDE, and impose the PDE constraint and boundary conditions via a collocation-type method.

The rest of this paper is organized as follows. In Section 2, we describe the neural network model for solving PDEs with linear operators and given initial and boundary conditions. In Section 3, the convergence theorems are discussed in detail: we prove the denseness and m-denseness of multi-hidden-layer networks, which ensures their approximation capabilities, and then prove a theorem about the convergence of the ANN approximation of Cauchy inverse problems. Numerical examples are presented in Section 4; we use only the physical model's information (the operator and initial/boundary data with noise), rather than any exact or experimental solutions, to train the neural networks. Finally, conclusions are given in Section 5.

## 2. Artificial neural network(ANN) method for Cauchy inverse problem

Let us consider deep, fully connected feedforward ANNs to solve the Cauchy inverse problem. Consider a network consisting of $L$ hidden layers; for convenience, the input and output layers are denoted as layer $0$ and layer $L+1$, respectively. Nonlinear functions, called activation functions $\sigma$, are used in the hidden layers. The network defined above can mathematically be regarded as a mapping from the input space to the output space. Fig. 1 shows the structure of such a network.

In layer $l+1$, let $w^{l+1}$, $b^{l+1}$ denote the weights and biases and $\sigma^{l+1}$ the activation function. With the above definitions, $z^{l+1}$, the input of layer $l+1$, can be represented as

$$ z^{l+1}=w^{l+1}y^{l}+b^{l+1},\qquad y^{l+1}=\sigma^{l+1}\bigl(z^{l+1}\bigr). $$

We use the notation $y^{0}=x$ for the input. For simplicity, the output is denoted by

$$ y^{L+1}:=\mathrm{NET}(x;w,b), $$

which indicates that the network takes $x$ as input and is parametrized by the weights $w$ and biases $b$.
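To make the forward-pass notation concrete, here is a minimal NumPy sketch of $\mathrm{NET}(x;w,b)$ with sigmoid activations on the hidden layers and a linear output layer. The layer sizes, the `net` helper, and the random initialization are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(x, weights, biases):
    """Forward pass of a fully connected network: y^0 = x,
    z^{l+1} = w^{l+1} y^l + b^{l+1}, y^{l+1} = sigma(z^{l+1}),
    with a linear (no activation) output layer."""
    y = x
    for l, (w, b) in enumerate(zip(weights, biases)):
        z = w @ y + b
        # apply the activation on hidden layers only
        y = sigmoid(z) if l < len(weights) - 1 else z
    return y

# Example: d = 2 inputs, two hidden layers of width 5, scalar output.
rng = np.random.default_rng(0)
sizes = [2, 5, 5, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
u_bar = net(np.array([0.3, 0.7]), weights, biases)  # scalar network output
```

The pair `(weights, biases)` plays the role of $(w,b)$ in the optimization problems below.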

### 2.1. Network model for time-independent problem (1.1)

The main idea of this method is to find a solution of Problem (1.1) in the form of a network output $\bar u=\mathrm{NET}(x;w,b)$. Defining the cost function

$$ J(\bar u)=\|\mathcal{L}\bar u\|^{2}_{L^{2}(\Omega)}+\|\bar u-f\|^{2}_{L^{2}(\Gamma)}+\left\|\frac{\partial\bar u}{\partial n}-g\right\|^{2}_{L^{2}(\Gamma)}, \tag{2.1} $$

then ANN approach for problem (1.1) can be written as

$$ \min_{w,b}\ J(\bar u)\quad\text{s.t. }\ \bar u=\mathrm{NET}(x;w,b). \tag{2.2} $$

The equivalence of problems (1.1) and (2.2) will be shown in the next section. Here, let us first introduce the back propagation algorithm (a gradient-based method) to solve problem (2.2). Denote by $\{x_i\}_{i=1}^{N}$ random samples in the space $\bar\Omega$, among which $N_d$ and $N_n$ sampling points belong to $\Gamma$ for the Dirichlet and Neumann boundary conditions, respectively, and $N_o$ points lie in $\Omega$. For the purpose of verifying the stability of the approximation, certain statistical noise is added manually to the label data $f$, $g$, such that

$$ \|f^{\delta}-f\|_{\Gamma}\le\delta,\qquad \|g^{\delta}-g\|_{\Gamma}\le\delta, $$

where $\delta$ represents the level of statistical noise. For ease of representation, the cost function (2.1) is written in discrete form as

$$ J(\bar u)=J_{o}(\bar u)+J_{d}(\bar u)+J_{n}(\bar u)=\sum_{i=1}^{N_{o}}(\mathcal{L}\bar u_{i})^{2}+\sum_{i=1}^{N_{d}}(\bar u_{i}-f^{\delta}_{i})^{2}+\sum_{i=1}^{N_{n}}\left(\frac{\partial\bar u_{i}}{\partial n}-g^{\delta}_{i}\right)^{2}, \tag{2.3} $$

where $\bar u_{i}$ denotes the network output at the sampling point $x_{i}$ and $f^{\delta}_{i}$, $g^{\delta}_{i}$ the corresponding noisy labels. To this point, the back propagation can be formulated as

$$ \frac{\partial J(\bar u)}{\partial w}=\frac{\partial J_{o}(\bar u)}{\partial w}+\frac{\partial J_{d}(\bar u)}{\partial w}+\frac{\partial J_{n}(\bar u)}{\partial w}=2\left(\sum_{i=1}^{N_{o}}\mathcal{L}\bar u_{i}\frac{\partial\mathcal{L}\bar u_{i}}{\partial w}+\sum_{i=1}^{N_{d}}(\bar u_{i}-f^{\delta}_{i})\frac{\partial\bar u_{i}}{\partial w}+\sum_{i=1}^{N_{n}}\left(\frac{\partial\bar u_{i}}{\partial n}-g^{\delta}_{i}\right)\frac{\partial^{2}\bar u_{i}}{\partial n\,\partial w}\right) \tag{2.4} $$

Similarly,

$$ \frac{\partial J(\bar u)}{\partial b}=\frac{\partial J_{o}(\bar u)}{\partial b}+\frac{\partial J_{d}(\bar u)}{\partial b}+\frac{\partial J_{n}(\bar u)}{\partial b}=2\left(\sum_{i=1}^{N_{o}}\mathcal{L}\bar u_{i}\frac{\partial\mathcal{L}\bar u_{i}}{\partial b}+\sum_{i=1}^{N_{d}}(\bar u_{i}-f^{\delta}_{i})\frac{\partial\bar u_{i}}{\partial b}+\sum_{i=1}^{N_{n}}\left(\frac{\partial\bar u_{i}}{\partial n}-g^{\delta}_{i}\right)\frac{\partial^{2}\bar u_{i}}{\partial n\,\partial b}\right) \tag{2.5} $$

We supply the details of computing the derivatives above and their corresponding back propagation in Appendix B.
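As an illustration of evaluating the discrete cost (2.3), the sketch below computes $J_o+J_d+J_n$ for the special case $\mathcal{L}=\Delta$ given a candidate solution $\bar u$. The sample points, the helper name, and the finite-difference approximation of the derivatives (used here in place of back propagation through the network) are our own assumptions:

```python
import numpy as np

def discrete_cost(u, interior, dirichlet, neumann, f_delta, g_delta, normals, h=1e-4):
    """Discrete cost (2.3), J = J_o + J_d + J_n, for L = Laplacian.
    `u` maps an (N, d) array of points to (N,) values; derivatives are
    approximated by finite differences purely for illustration."""
    d = interior.shape[1]
    # J_o: PDE residual (Laplacian) summed over interior sample points
    lap = np.zeros(len(interior))
    for k in range(d):
        e = np.zeros(d)
        e[k] = h
        lap += (u(interior + e) - 2.0 * u(interior) + u(interior - e)) / h**2
    J_o = np.sum(lap**2)
    # J_d: Dirichlet mismatch against (possibly noisy) labels f_delta
    J_d = np.sum((u(dirichlet) - f_delta)**2)
    # J_n: Neumann mismatch, du/dn by a one-sided difference along the normal
    dudn = (u(neumann + h * normals) - u(neumann)) / h
    J_n = np.sum((dudn - g_delta)**2)
    return J_o + J_d + J_n

# Harmonic candidate u(x) = x_1^2 - x_2^2 with exact boundary labels:
u = lambda X: X[:, 0]**2 - X[:, 1]**2
interior = np.array([[0.2, 0.3], [0.5, 0.5], [0.7, 0.1]])
dirichlet = np.array([[0.0, 0.1], [0.0, 0.9]])   # points on the side x_1 = 0
neumann = np.array([[1.0, 0.2], [1.0, 0.8]])     # points on the side x_1 = 1
normals = np.array([[1.0, 0.0], [1.0, 0.0]])
f_exact = dirichlet[:, 0]**2 - dirichlet[:, 1]**2
g_exact = 2.0 * neumann[:, 0]                    # du/dn = 2 x_1 at x_1 = 1
J = discrete_cost(u, interior, dirichlet, neumann, f_exact, g_exact, normals)
```

For a harmonic candidate with exact labels the cost is close to zero; replacing `f_exact`, `g_exact` by noisy labels $f^{\delta}$, $g^{\delta}$ raises it by roughly the squared noise level.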

To summarize, the structure of the ANN method for solving the time-independent problem is shown in Figure 2.

### 2.2. Network model for time-dependent problem

The main idea of this method is to find a solution of Problem (1.2) in the form of a network output $\bar u=\mathrm{NET}(x,t;w,b)$. Defining the cost function

$$ J(\bar u)=\left\|\frac{\partial\bar u}{\partial t}+\mathcal{L}\bar u\right\|^{2}_{L^{2}(\Omega\times T)}+\|\bar u-f\|^{2}_{L^{2}(\Gamma\times T)}+\left\|\frac{\partial\bar u}{\partial n}-g\right\|^{2}_{L^{2}(\Gamma\times T)}+\|\bar u(\cdot,0)-h\|^{2}_{L^{2}(\Omega)}, \tag{2.6} $$

then ANN approach for problem (1.2) can be written as

$$ \min_{w,b}\ J(\bar u)\quad\text{s.t. }\ \bar u=\mathrm{NET}(x,t;w,b). \tag{2.7} $$

The equivalence of problems (1.2) and (2.7) will be shown in the next section. Here, let us first introduce the back propagation algorithm to solve problem (2.7). Denote by $\{(x_i,t_i)\}_{i=1}^{N}$ random samples in the space $\bar\Omega\times T$, in which $N_d$ and $N_n$ sampling points belong to $\Gamma\times T$ for the Dirichlet and Neumann boundary conditions, $N_t$ points belong to $\Omega\times\{0\}$, and $N_o$ points lie in $\Omega\times T$. For the purpose of verifying the stability of the approximation, certain statistical noise is added manually to the label data $f$, $g$, $h$, such that

$$ \|f^{\delta}-f\|_{\Gamma}\le\delta,\qquad\|g^{\delta}-g\|_{\Gamma}\le\delta,\qquad\|h^{\delta}-h\|_{\Omega}\le\delta, $$

where $\delta$ represents the level of statistical noise. For ease of representation, the cost function (2.6) is written in discrete form as

$$ J(\bar u)=J_{o}(\bar u)+J_{d}(\bar u)+J_{n}(\bar u)+J_{t}(\bar u)=\sum_{i=1}^{N_{o}}\left(\frac{\partial\bar u_{i}}{\partial t}+\mathcal{L}\bar u_{i}\right)^{2}+\sum_{i=1}^{N_{d}}(\bar u_{i}-f^{\delta}_{i})^{2}+\sum_{i=1}^{N_{n}}\left(\frac{\partial\bar u_{i}}{\partial n}-g^{\delta}_{i}\right)^{2}+\sum_{i=1}^{N_{t}}(\bar u_{i}-h^{\delta}_{i})^{2}, \tag{2.8} $$

where $\bar u_{i}$ denotes the network output at the sampling point $(x_{i},t_{i})$. To this point, the back propagation can be formulated as

$$ \frac{\partial J(\bar u)}{\partial w}=\frac{\partial J_{o}(\bar u)}{\partial w}+\frac{\partial J_{d}(\bar u)}{\partial w}+\frac{\partial J_{n}(\bar u)}{\partial w}+\frac{\partial J_{t}(\bar u)}{\partial w}=2\Biggl(\sum_{i=1}^{N_{o}}\Bigl(\frac{\partial\bar u_{i}}{\partial t}+\mathcal{L}\bar u_{i}\Bigr)\Bigl(\frac{\partial^{2}\bar u_{i}}{\partial t\,\partial w}+\frac{\partial\mathcal{L}\bar u_{i}}{\partial w}\Bigr)+\sum_{i=1}^{N_{n}}\Bigl(\frac{\partial\bar u_{i}}{\partial n}-g^{\delta}_{i}\Bigr)\frac{\partial^{2}\bar u_{i}}{\partial n\,\partial w}+\sum_{i=1}^{N_{d}}(\bar u_{i}-f^{\delta}_{i})\frac{\partial\bar u_{i}}{\partial w}+\sum_{i=1}^{N_{t}}(\bar u_{i}-h^{\delta}_{i})\frac{\partial\bar u_{i}}{\partial w}\Biggr) \tag{2.9} $$

Similarly,

$$ \frac{\partial J(\bar u)}{\partial b}=\frac{\partial J_{o}(\bar u)}{\partial b}+\frac{\partial J_{d}(\bar u)}{\partial b}+\frac{\partial J_{n}(\bar u)}{\partial b}+\frac{\partial J_{t}(\bar u)}{\partial b}=2\Biggl(\sum_{i=1}^{N_{o}}\Bigl(\frac{\partial\bar u_{i}}{\partial t}+\mathcal{L}\bar u_{i}\Bigr)\Bigl(\frac{\partial^{2}\bar u_{i}}{\partial t\,\partial b}+\frac{\partial\mathcal{L}\bar u_{i}}{\partial b}\Bigr)+\sum_{i=1}^{N_{n}}\Bigl(\frac{\partial\bar u_{i}}{\partial n}-g^{\delta}_{i}\Bigr)\frac{\partial^{2}\bar u_{i}}{\partial n\,\partial b}+\sum_{i=1}^{N_{d}}(\bar u_{i}-f^{\delta}_{i})\frac{\partial\bar u_{i}}{\partial b}+\sum_{i=1}^{N_{t}}(\bar u_{i}-h^{\delta}_{i})\frac{\partial\bar u_{i}}{\partial b}\Biggr) \tag{2.10} $$

We supply the details of computing the derivatives above and their corresponding back propagation in Appendix B. To summarize, the structure of the ANN method for solving the time-dependent problem is shown in Figure 3.
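The noisy labels $f^{\delta}$, $g^{\delta}$, $h^{\delta}$ used above can be generated, for example, by adding bounded uniform perturbations. The helper name and the uniform noise model are our own assumptions; any perturbation with norm at most $\delta$ would serve the same purpose:

```python
import numpy as np

def add_bounded_noise(labels, delta, rng):
    """Return noisy labels whose sup-norm distance from `labels` is at most
    delta, using i.i.d. uniform perturbations in [-delta, delta]."""
    noise = rng.uniform(-delta, delta, size=np.shape(labels))
    return np.asarray(labels) + noise

rng = np.random.default_rng(1)
f = np.sin(np.linspace(0.0, np.pi, 50))   # clean Dirichlet labels on Gamma x T
f_delta = add_bounded_noise(f, delta=0.01, rng=rng)
```

The same helper applies to $g$ and $h$; the noise level $\delta$ then controls the data-fidelity terms of the discrete cost in the stability experiments.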

### 2.3. Training algorithm for networks

The original algorithm for training neural networks is the gradient descent (GD) method. In this setting it can be formulated as

$$ w^{n+1}=w^{n}-\Delta t\,\frac{\partial J(\bar u)}{\partial w^{n}},\qquad b^{n+1}=b^{n}-\Delta t\,\frac{\partial J(\bar u)}{\partial b^{n}}, $$

where $n$ is the iteration index and $\Delta t$ is the step size. It is well known that the ADAM algorithm is a stable and fast stochastic algorithm in the field of optimization. For this reason, the ADAM algorithm is used in this research; we remark that the GD method could not reach a satisfying result in our numerical experiments. The main formulas for the weights are shown below; the formulas for the biases are similar.

$$
\begin{cases}
w^{n+1}=w^{n}-\dfrac{\Delta t\,v^{n+1}_{w}}{\sqrt{s^{n+1}_{w}}+\epsilon},\\[6pt]
v^{n+1}_{w}=\Bigl(\beta_{1}v^{n}_{w}+(1-\beta_{1})\dfrac{\partial J(\bar u)}{\partial w^{n}}\Bigr)\Big/\bigl(1-\beta_{1}^{n}\bigr),\\[6pt]
s^{n+1}_{w}=\Bigl(\beta_{2}s^{n}_{w}+(1-\beta_{2})\Bigl(\dfrac{\partial J(\bar u)}{\partial w^{n}}\Bigr)^{2}\Bigr)\Big/\bigl(1-\beta_{2}^{n}\bigr),
\end{cases}
\tag{2.11}
$$

where $w$ is the matrix of parameters, $\beta_{1}$ and $\beta_{2}$ are constants close to $1$, and $\epsilon$ is a small constant. To summarize, the ANN algorithm to solve the Cauchy problem is constructed as follows:
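A single update of (2.11) can be sketched as follows, in the standard ADAM formulation where the bias correction $1/(1-\beta^{n})$ is applied to the moment estimates before the step. The function name, parameter values, and the toy objective are our own choices:

```python
import numpy as np

def adam_step(w, grad, v, s, n, dt=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update following (2.11): exponential moving averages of the
    gradient (v) and squared gradient (s), bias-corrected, then a scaled step."""
    v = beta1 * v + (1.0 - beta1) * grad
    s = beta2 * s + (1.0 - beta2) * grad**2
    v_hat = v / (1.0 - beta1**n)   # bias-corrected first moment
    s_hat = s / (1.0 - beta2**n)   # bias-corrected second moment
    w = w - dt * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s

# Toy usage: minimize J(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
s = np.zeros_like(w)
for n in range(1, 2001):
    w, v, s = adam_step(w, 2.0 * w, v, s, n, dt=0.05)
```

With the same gradients, the plain GD update above would simply be `w -= dt * grad`; the adaptive scaling by $\sqrt{s^{n+1}_w}$ is what makes ADAM markedly more robust in our setting.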

## 3. Convergence of the neural network approximation

In this section, we discuss the equivalence between the PDE problem (1.2) and the optimization problem (2.7). To fulfill this, the definitions of dense and m-dense networks following [Kurt1991] are necessary.

###### Definition 1 (denseness).

A network is dense if, for every $\epsilon>0$, it satisfies

$$ \|\mathrm{Net}(x;w,b)-f(x)\|\le\epsilon,\qquad\forall f\in C(\bar\Omega). \tag{3.1} $$
###### Definition 2 (m-denseness).

A network is m-dense if, for every $\epsilon>0$, it satisfies

$$ \max_{|\alpha|\le m}\|\nabla^{\alpha}\mathrm{Net}(x;w,b)-\nabla^{\alpha}f(x)\|\le\epsilon,\qquad\forall f\in C^{m}(\bar\Omega). \tag{3.2} $$

The proof is carried out in two steps. In the first step (Sections 3.1 and 3.2), we show that networks with multiple hidden layers are dense and m-dense in the design domain. In the second step (Section 3.3), the equivalence between the PDE problem (1.2) and the optimization problem (2.7) is given. It is worth mentioning that all the proofs in Sections 3.1 and 3.2 depend only on properties of the networks. Networks with $L$ hidden layers can be regarded as mappings in the class

$$ A^{d}_{n}(\sigma)=\Bigl\{\xi(x;t):\mathbb{R}^{d+1}\to\mathbb{R}\;\Big|\;\xi(x;t)=\sum_{i=1}^{n}\xi_{i}\,\sigma\bigl(w^{L}_{i}z^{L}(x;t)+b^{L}_{i}\bigr)\Bigr\}, $$

where $\sigma$ is the sigmoid function, $d$ is the spatial dimension, and $n$ is the number of units in the last hidden layer. It is straightforward to define $A^{d}(\sigma)=\bigcup_{n}A^{d}_{n}(\sigma)$ for brevity.

### 3.1. The denseness of Ad(σ)

Consider a bounded set $\Omega$ in $\mathbb{R}^{d}$ with boundary $\partial\Omega$. Kurt Hornik proved the denseness and m-denseness of single-hidden-layer networks in [Kurt1991], and this result will be extended to the multi-hidden-layer case in Theorem 1. Let us first consider an important lemma.

###### Lemma 1.

Define the $l$-hidden-layer neural network function as

$$ z^{l}(x;t)=w^{l}\sigma\bigl(z^{l-1}(x;t)\bigr)+b^{l},\qquad z^{l}\in\mathbb{R}^{n_{l}}; $$

then for all $\epsilon>0$ there exists $A^{l}=(a^{l}_{1},\dots,a^{l}_{d})$ such that

$$ \bigl\|A^{l}\sigma(z^{l}(x;t))-x\bigr\|_{\bar\Omega}:=\sup_{1\le i\le d}\bigl|a^{l}_{i}\sigma(z^{l}(x;t))-x_{i}\bigr|<\epsilon. \tag{3.3} $$

We verify this lemma by induction.

I. Verify that equation (3.3) holds when $l=1$.

Following Theorem 2 in [Kurt1991], it is clear that for any $\epsilon>0$ and $x\in\bar\Omega$ there exists $A^{1}$ such that

$$ \bigl\|A^{1}\sigma(z^{1}(x;t))-x\bigr\|_{\bar\Omega}=\sup_{1\le i\le d}\bigl|a^{1}_{i}\sigma(z^{1}(x;t))-x_{i}\bigr|<\epsilon, $$

which verifies equation (3.3).

II. Assuming equation (3.3) is true for $l=k$, verify that it also holds for $l=k+1$.

Fix $x\in\bar\Omega$; since the sigmoid function is Lipschitz continuous, it follows that

$$
\begin{aligned}
\sup_{i}\bigl|a^{k+1}_{i}\sigma(z^{k+1}(x;t))-x_{i}\bigr|
&=\sup_{i}\bigl|a^{k+1}_{i}\sigma\bigl(w^{k+1}\sigma(z^{k}(x;t))+b^{k+1}\bigr)-x_{i}\bigr|\\
&=\sup_{i}\Bigl|\sum_{j}a^{k+1}_{ij}\sigma\bigl(w^{k+1}_{j}\sigma(z^{k}(x;t))+b^{k+1}_{j}\bigr)-x_{i}\Bigr|\\
&\le\sup_{i}\Bigl|\sum_{j}a^{k+1}_{ij}\Bigl(\sigma\bigl(w^{k+1}_{j}\sigma(z^{k}(x;t))+b^{k+1}_{j}\bigr)-x_{i}\Bigr)\Bigr|+\sup_{i}\Bigl|\sum_{j}a^{k+1}_{ij}x_{i}-x_{i}\Bigr|\\
&\le\sup_{i}\Bigl|\sum_{j}a^{k+1}_{ij}\,\epsilon\Bigr|+\sup_{i}\Bigl|\sum_{j}a^{k+1}_{ij}x_{i}-x_{i}\Bigr|\le\epsilon,\qquad\text{by choosing }\sum_{j}a^{k+1}_{ij}=1,
\end{aligned}
$$

which completes the proof of lemma 1. ∎

With the above lemma, we can extend the theorem in [Kurt1991] to multi-hidden-layer neural networks, as stated in the following theorem:

###### Theorem 1.

For the sigmoid function $\sigma$, the network class $A^{d}(\sigma)$ is dense in $C(\bar\Omega\times T)$.

According to theorem 1 in [Kurt1991], it follows that

$$ \bigl\|A^{1}\sigma(w^{1}x+b^{1})-f(x)\bigr\|\le\epsilon,\qquad\forall f\in C(\bar\Omega\times T). \tag{3.4} $$

It is obvious that

$$ \bigl\|A^{L+1}\sigma(z^{L+1}(x;t))-f(x)\bigr\|\le\bigl\|A^{L+1}\sigma(z^{L+1}(x;t))-A^{1}\sigma(w^{1}x+b^{1})\bigr\|+\bigl\|A^{1}\sigma(w^{1}x+b^{1})-f(x)\bigr\|\le(L+1)\epsilon,\qquad\forall f\in C(\bar\Omega\times T). \tag{3.5} $$

Hence the statements in theorem 1 are proved. ∎

### 3.2. The m-denseness of Ad(σ)

Let us consider an important lemma at first.

###### Lemma 2.

Define the $l$-hidden-layer neural network function as

$$ z^{l}(x;t)=w^{l}\sigma\bigl(z^{l-1}(x;t)\bigr)+b^{l},\qquad z^{l}\in\mathbb{R}^{n_{l}}; $$

then for all $\epsilon>0$ there exists $A^{l}$ such that

$$ \max_{|\alpha|\le m}\sup_{x\in\bar\Omega}\bigl|\nabla^{\alpha}A^{l}\sigma(z^{l}(x;t))-\nabla^{\alpha}A^{1}\sigma(w^{1}x+b^{1})\bigr|<\epsilon. \tag{3.6} $$

We verify this lemma by induction.

I. Verify that equation (3.6) holds when $l=1$.

It is clear that for any $\epsilon>0$ and $x\in\bar\Omega$ we have

$$ \max_{|\alpha|\le m}\sup_{x\in\bar\Omega}\bigl|\nabla^{\alpha}A^{1}\sigma(z^{1}(x;t))-\nabla^{\alpha}A^{1}\sigma(w^{1}x+b^{1})\bigr|=0<\epsilon, \tag{3.7} $$

which verifies equation (3.6).

II. Assuming equation (3.6) is true for $l=k$, verify that it also holds for $l=k+1$.

Fix $x\in\bar\Omega$; since the sigmoid function is Lipschitz continuous and bounded, it follows that

$$
\begin{aligned}
&\max_{|\alpha|\le m}\sup_{x\in\bar\Omega}\bigl|\nabla^{\alpha}A^{k+1}\sigma(z^{k+1}(x;t))-\nabla^{\alpha}A^{1}\sigma(w^{1}x+b^{1})\bigr|\\
&\quad=\max_{|\alpha|\le m}\sup_{x\in\bar\Omega}\sup_{i}\Bigl|\nabla^{\alpha}\Bigl(\sum_{j}a^{k+1}_{ij}\sigma\bigl(w^{k+1}_{j}\sigma(z^{k}(x;t))+b^{k+1}_{j}\bigr)-\sum_{j}a^{1}_{ij}\sigma\bigl(w^{1}_{j}x+b^{1}_{j}\bigr)\Bigr)\Bigr|\\
&\quad\le\max_{|\alpha|\le m}\sup_{x\in\bar\Omega}\sup_{i}\biggl(\Bigl|\nabla^{\alpha}\sum_{j}a^{k+1}_{ij}\Bigl(\sigma\bigl(w^{k+1}_{j}\sigma(z^{k}(x;t))+b^{k+1}_{j}\bigr)-\sigma\bigl(w^{1}_{j}x+b^{1}_{j}\bigr)\Bigr)\Bigr|+\Bigl|\nabla^{\alpha}\sum_{j}\bigl(a^{k+1}_{ij}-a^{1}_{ij}\bigr)\sigma\bigl(w^{1}_{j}x+b^{1}_{j}\bigr)\Bigr|\biggr)\\
&\quad\le L\epsilon,\qquad\text{by choosing }\sum_{j}a^{k+1}_{ij}=\sum_{j}a^{1}_{ij}=1,
\end{aligned}
$$

which completes the proof of lemma 2. ∎

With the above lemma, we can extend Theorem 3 in [Kurt1991] to multi-hidden-layer neural networks, which yields the following theorem:

###### Theorem 2.

For the sigmoid function $\sigma$, the class $A^{d}(\sigma)$ is uniformly m-dense on $C^{m}(\bar\Omega\times T)$.

According to theorem 3 in [Kurt1991], it follows that

$$ \max_{|\alpha|\le m}\bigl\|\nabla^{\alpha}A^{1}\sigma(w^{1}x+b^{1})-\nabla^{\alpha}f(x)\bigr\|\le\epsilon,\qquad\forall f\in C^{m}(\bar\Omega). \tag{3.8} $$

It is obvious that

$$
\begin{aligned}
\max_{|\alpha|\le m}\bigl\|\nabla^{\alpha}A^{L+1}\sigma(z^{L+1}(x;t))-\nabla^{\alpha}f(x)\bigr\|
&\le\max_{|\alpha|\le m}\Bigl(\bigl\|\nabla^{\alpha}A^{L+1}\sigma(z^{L+1}(x;t))-\nabla^{\alpha}A^{1}\sigma(w^{1}x+b^{1})\bigr\|+\bigl\|\nabla^{\alpha}A^{1}\sigma(w^{1}x+b^{1})-\nabla^{\alpha}f(x)\bigr\|\Bigr)\\
&\le(L+1)\epsilon,\qquad\forall f\in C^{m}(\bar\Omega\times T).
\end{aligned}
\tag{3.9}
$$

Hence the statements in theorem 2 are proved. ∎

### 3.3. Equivalence between PDE problem (1.2) and optimization problem (2.7)

Let us assume initially that problem (1.2) satisfies the following conditions.

###### Condition 1.
• There exists a unique solution $u$ to Problem (1.2); moreover, $u\in C^{m}(\bar\Omega\times T)$.

• $\mathcal{L}$ is Lipschitz continuous on $\bar\Omega\times T$.

• $\sigma$ and its first derivative are bounded in $\bar\Omega\times T$.

• .

Two important theorems follow from these conditions:

###### Theorem 3.

For all $\epsilon>0$, there exists a neural network approximation $\psi$ such that $J(\psi)<\epsilon$, where $J$ is the cost functional in the case of equation (2.6).

Let $\psi\in A^{d}(\sigma)$ be a neural network approximation. Assume that the operator $\mathcal{L}$ satisfies Condition 1 and the sigmoid function $\sigma$ is non-constant and bounded. It is clear that $\bar\Omega\times T$ is a compact subset of $\mathbb{R}^{d+1}$. According to Theorem 2, it follows that

$$ \max_{|\alpha|\le m}\sup_{x\in\bar\Omega}\bigl|\nabla^{\alpha}u(x;t)-\nabla^{\alpha}\psi\bigr|<\frac{\epsilon}{4},\qquad\text{for all }u\in C^{m}(\mathbb{R}^{d}\times T), \tag{3.10} $$

which yields

$$ \sup_{x\in\Gamma,t\in T}|u-\psi|+\sup_{x\in\Gamma,t\in T}\Bigl|\frac{\partial u}{\partial n}-\frac{\partial\psi}{\partial n}\Bigr|+\sup_{x\in\Omega,t\in T}\Bigl|\frac{\partial u}{\partial t}+\mathcal{L}u-\Bigl(\frac{\partial\psi}{\partial t}+\mathcal{L}\psi\Bigr)\Bigr|+\sup_{x\in\Omega,t=0}|u-\psi|<4\sum_{\alpha=0}^{m}\sup_{x\in\bar\Omega}\bigl|\nabla^{\alpha}u(x;t)-\nabla^{\alpha}\psi\bigr|<\epsilon. \tag{3.11} $$

Let $u$ be a solution of Problem (1.2); using the conclusion of equation (3.11) and the Hölder inequality, we obtain

 (3.12)

which completes the proof of theorem 3. ∎

###### Theorem 4.

The sequence of neural network approximations $\psi_{n}$ converges to the solution of (1.2) as $n\to\infty$.

Since $\psi_{n}$ is a solution of (2.7), it is clear that $J(\psi_{n})\le J(\psi)$ for all $\psi\in A^{d}(\sigma)$; in particular, $J(\psi_{n})\to 0$. When $n\to\infty$, it follows that

$$ \Bigl(\frac{\partial}{\partial t}+\mathcal{L}\Bigr)\psi_{n}=g_{n}\ \text{in }\Omega\times T,\quad \psi_{n}-f=g_{n}^{f}\ \text{on }\Gamma\times T,\quad \frac{\partial\psi_{n}}{\partial n}-g=g_{n}^{g}\ \text{on }\Gamma\times T,\quad \psi_{n}(x;0)-h=g_{n}^{h}\ \text{in }\Omega, $$

for some $g_{n}$, $g_{n}^{f}$, $g_{n}^{g}$, $g_{n}^{h}$ such that

$$ \|g_{n}\|^{2}_{L^{2}(\Omega\times T)}+\|g_{n}^{f}\|^{2}_{L^{2}(\Gamma\times T)}+\|g_{n}^{g}\|^{2}_{L^{2}(\Gamma\times T)}+\|g_{n}^{h}\|^{2}_{L^{2}(\Omega\times\{t=0\})}\to 0,\qquad\text{as }n\to\infty. \tag{3.13} $$

Assume Condition 1 holds. It is clear that $\{\psi_{n}\}$ is uniformly bounded with respect to $n$, which implies that there exists a subsequence, still denoted by $\{\psi_{n}\}$, converging to some $\psi$ in the weak-* sense.

Next, following Condition 1 and Theorem 7.3 in [Justin2018], there exists a constant $C$ such that

$$ \int_{\bar\Omega\times T}\Bigl(\frac{\partial}{\partial t}+\mathcal{L}\Bigr)\psi_{n}\,dx\,dt\le C, $$

which implies that $\psi_{n}$ converges almost everywhere to $\psi$ in $\Omega\times T$. It can then be proved that $\psi$ is the solution of problem (1.2) as $n\to\infty$, which completes the proof of Theorem 4. ∎

To this end, we have proved that problem (1.2) is equivalent to problem (2.7) (and likewise problem (1.1) to problem (2.2)) if Condition 1 holds. In addition, the neural network solution converges to the exact solution.

## 4. Numerical examples

In this section, we present extensive numerical results to demonstrate the ANN method for Cauchy problems. First, examples of low- and high-dimensional problems are displayed to verify the accuracy of this method in both time-dependent and time-independent cases.

### 4.1. Numerical validation of time-dependent case

Let the operator $\mathcal{L}$ in problem (1.2) be $\Delta$ (the Laplace operator). The structure of the ANN is chosen with several hidden layers, where the input width equals the dimension $d$ of the input data. Setting the initial weights and biases randomly and choosing the parameters in the ADAM algorithm as described above, an example of a 2D time-dependent problem is illustrated below.

###### Example 1 (parabolic case).

The equation of the time-dependent problem is given as

$$
\begin{cases}
\dfrac{\partial u(x;t)}{\partial t}+\Delta u(x;t)=0, & (x,t)\ \text{in}\ \Omega\times T,\\
u(x;t)=e^{x_{1}}\sin(x_{2})\cos(t), & (x,t)\ \text{on}\ \Gamma\times T,
\end{cases}
\tag{4.1}
$$