1. Introduction
Approximating solutions of Cauchy inverse problems has been an important objective over the last few decades. Let us consider the following two classical cases: (I) the time-independent problem, and (II) the time-dependent problem. Let $\Omega \subset \mathbb{R}^d$ be a domain with continuous boundary $\partial\Omega$, where $d$ is the spatial dimension. It is worth mentioning that $\Gamma$ is part, but not all, of $\partial\Omega$, and the aim of the Cauchy inverse problem is to recover the solution on the rest of the boundary $\partial\Omega \setminus \Gamma$ from proper initial and boundary conditions given on $\Gamma$:
(1.1)
$$
\begin{cases}
\mathcal{L}u = 0 & \text{in } \Omega,\\
u = g & \text{on } \Gamma,\\
\dfrac{\partial u}{\partial \bm{n}} = h & \text{on } \Gamma,
\end{cases}
$$
and
(1.2)
$$
\begin{cases}
u_t + \mathcal{L}u = 0 & \text{in } \Omega \times (0,T],\\
u = g & \text{on } \Gamma \times (0,T],\\
\dfrac{\partial u}{\partial \bm{n}} = h & \text{on } \Gamma \times (0,T],\\
u(\cdot,0) = u_0 & \text{in } \Omega,
\end{cases}
$$
where $\bm{n}$ is the outer unit normal with respect to $\partial\Omega$, $\mathcal{L}$ is a linear operator and $t$ represents time.
There are many works concerning the implementation and analysis of numerical methods for these problems. For the time-independent case (1.1), Carleman-type estimates for the discrete scheme are used for Laplace's equation in [Santosa1991]. Since then, many authors have proposed various algorithms for the Cauchy inverse problem for Laplace's equation, such as the conjugate gradient method [Lesnic2000], the Backus-Gilbert algorithm [Hon_2001], regularization methods [Reinhardt1999], and some methods from linear algebra [Nachaoui2002, Nachaoui2004]. Meanwhile, convergence and stability analyses have been established. Chakib et al. [Chakib2001] proved the existence of a solution to the Cauchy problem, and it was first proved in [Cimeti_re_2001] that the desired solution is the unique fixed point of an appropriate operator. For the time-dependent case (1.2), Besala et al. [Besala1966] proved the uniqueness of solutions for the direct case in 1966. Since then, many authors have considered various types of stable numerical algorithms in different settings, such as the heat equation [Cannon1967, Elden1987], the Helmholtz equation [Berntsson2014] and time-space fractional diffusion equations [Sakamoto2011, SLi2018].
The key point of numerical methods for the Cauchy inverse problem is the treatment of its ill-posedness. It is well known that there exists at most one solution to each of the above two Cauchy problems. However, they are typically ill-posed, which means that a small change in the data may induce large changes in the solution. We refer to [Isakov1998] and the references therein for more details on this issue. Regularization is an effective and general technique to deal with the ill-posedness of inverse problems [Ito2014]. During the last few decades, different regularization methods have been proposed to solve various PDE inverse problems, such as the Tikhonov regularization method [Jin_2008, Tomoya2008], the boundary element method [Cheng_2014, Marin2003], variational methods [Jin2010] and the dynamical regularization algorithm [Zhang_2018]. As for the Cauchy problem, numerical analyses and experiments on regularization for different equations have been presented, e.g. for Laplace's equation [Bourgeois_2005, Wei2013], elliptic equations [Feng_2013], the Helmholtz equation [Qin2010, Berntsson2017] and so on.
Artificial neural network (ANN) methods for approximating physical models described by PDE systems, as well as other kinds of nonlinear problems, have attracted significant interest. Lagaris et al. [Lagaris1998, Lagaris2000] used this idea as early as 1998 for low-dimensional solutions. The idea was then extended in several follow-up works on various direct problems, including high-order differential equations [Malek2006] and partial differential equations [Aarts2001]. These approaches essentially rely on the universal approximation property of ANNs, which was proved in the pioneering work [Cybenko1989] for a single hidden layer, and then extended and refined in [Kurt1989, Kurt1991]. It is well known in deep learning that deeper networks approximate better. In this sense, deep neural networks (DNN) are also popular for solving PDEs, especially high-dimensional PDE problems like the Hamilton-Jacobi-Bellman equation [Justin2018, Giuseppe2017]. Recently, physics-informed neural network models [Raissi2019] have been developed to solve PDEs by exploiting the nature and arrangement of the available data; they are effective for various problems, including fractional ADEs [Pang2019], stochastic problems [Zhang2019] and so on. A method to recover unknown governing equations with DNNs is proposed in [Xiu2019]. Long et al. [Dong2019] proposed a new deep neural network, named PDE-Net 2.0, to discover (time-dependent) PDEs.
There are some other interesting works combining ANNs with traditional methods to enhance their performance. Mishra [Mishra2018] combined existing finite difference methods with ANNs, and White et al. [White2019] used a neural network surrogate in topology optimization. Li et al. [Li2018] recast training in deep learning as a control problem, which allows formulating necessary optimality conditions in continuous time using Pontryagin's maximum principle (PMP). Moreover, Yan and Zhou [Yan2019] proposed an adaptive procedure to construct a multi-fidelity polynomial chaos surrogate model for inverse problems.
In this paper, we propose a novel numerical method for solving Cauchy inverse problems using artificial neural networks. As the spatial dimension grows, the computational cost of the ANN method grows much more slowly than that of traditional numerical methods. Within the proposed approach, we use a neural network instead of a linear combination of Lagrangian basis functions to represent the solution of the PDE, and impose the PDE constraint and boundary conditions via a collocation-type method.
The rest of this paper is organized as follows. In Section 2, we describe the neural network model for solving PDEs with linear operators and given initial and boundary conditions. In Section 3, the convergence theorems are discussed in detail. We prove the denseness and m-denseness of networks with multiple hidden layers, which ensures their approximation capability. Then we prove a theorem about the convergence of the ANN approximation to the solution of the Cauchy inverse problem. Numerical examples are presented in Section 4. We use the physical model's information (the operator and initial/boundary data with noise) rather than any exact or experimental solutions to train the neural networks. Finally, some conclusions are given in Section 5.
2. Artificial neural network (ANN) method for the Cauchy inverse problem
Let us consider deep, fully connected feedforward ANNs to solve the Cauchy inverse problem. Consider a network consisting of $L$ hidden layers. For convenience, the input and output layers are denoted as layer $0$ and layer $L+1$, respectively. Nonlinear functions, called activation functions $\sigma$, are used in the hidden layers. The network defined above can mathematically be regarded as a mapping $\mathbb{R}^{d} \to \mathbb{R}$ (with $d$ replaced by $d+1$ in the time-dependent case). Fig. 1 shows the structure of such a network. As can be seen, in layer $l$, let $W^{(l)}$ and $b^{(l)}$ denote the weights and biases and $\sigma$ be the activation function. With the above definitions, $y^{(l)}$, the input of layer $l$, can be represented as
$$
y^{(l)} = \sigma\!\left(W^{(l)} y^{(l-1)} + b^{(l)}\right), \qquad l = 1,\dots,L.
$$
We use the notation $y^{(0)} = \bm{x}$ for the inputs. For simplicity, the notation for the output is defined as
$$
u_N(\bm{x};\, W, b) := y^{(L+1)} = W^{(L+1)} y^{(L)} + b^{(L+1)},
$$
which is used to indicate that the network takes $\bm{x}$ as input and is parametrized by the weights and biases $W^{(l)}$, $b^{(l)}$, $l = 1,\dots,L+1$.
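As an illustration, this forward propagation can be written in a few lines of code. The following minimal sketch (Python/NumPy; the function and variable names are our own, not from the paper) evaluates a fully connected network with sigmoid activations in the hidden layers and a linear output layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Evaluate u_N(x; W, b) for a network with L hidden layers.

    weights[l], biases[l] map layer l to layer l+1; the last pair
    produces the (linear) output layer L+1.
    """
    y = x
    for W, b in zip(weights[:-1], biases[:-1]):
        y = sigmoid(W @ y + b)              # hidden layers: affine map + activation
    return weights[-1] @ y + biases[-1]     # output layer: affine map only

# Example: d = 2 inputs, two hidden layers of width 20, scalar output.
rng = np.random.default_rng(0)
shapes = [(20, 2), (20, 20), (1, 20)]
weights = [rng.uniform(0, 1, s) for s in shapes]
biases = [rng.uniform(0, 1, s[0]) for s in shapes]
print(forward(np.array([0.5, 0.5]), weights, biases))
```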
2.1. Network model for the time-independent problem (1.1)
The main idea of this method is to find a solution of Problem (1.1) in the form of the network output $u_N(\bm{x};\, W, b)$. Defining the cost function
(2.1)
$$
J(W,b) = \big\|\mathcal{L}u_N\big\|^2_{L^2(\Omega)} + \big\|u_N - g\big\|^2_{L^2(\Gamma)} + \Big\|\frac{\partial u_N}{\partial \bm{n}} - h\Big\|^2_{L^2(\Gamma)},
$$
then the ANN approach for problem (1.1) can be written as
(2.2)
$$
(W^*, b^*) = \operatorname*{arg\,min}_{W,\,b}\; J(W,b).
$$
The equivalence of problems (1.1) and (2.2) will be shown in the next section. Here, let us first introduce the back propagation algorithm (a gradient-based method) to solve problem (2.2). Denote by $\{x_i\}_{i=1}^{n}$ random sampling points in $\overline{\Omega}$, among which there are $n_1$ and $n_2$ sampling points belonging to $\Gamma$ with Dirichlet and Neumann data, respectively, and it is required that $n_1 + n_2 < n$. For the purpose of verifying the stability of the approximation, a certain amount of statistical noise is added manually to the label data $g$ and $h$, such that
$$
\tilde{g} = g\,(1 + \delta\,\xi), \qquad \tilde{h} = h\,(1 + \delta\,\xi), \qquad \xi \sim \mathcal{U}(-1,1),
$$
where $\delta$ represents the level of statistical noise.
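A sketch of this sampling and noise model is given below (Python/NumPy; for concreteness we assume the unit square $\Omega = (0,1)^2$ with accessible boundary $\Gamma = \{x_1 = 0\}$, and the sample sizes and Dirichlet data are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

n, n1, n2 = 1000, 100, 100                 # interior / Dirichlet / Neumann sample sizes
x_interior = rng.uniform(0.0, 1.0, (n, 2))  # collocation points in Omega
x_gamma_d = np.column_stack([np.zeros(n1), rng.uniform(0.0, 1.0, n1)])  # on Gamma
x_gamma_n = np.column_stack([np.zeros(n2), rng.uniform(0.0, 1.0, n2)])  # on Gamma

def add_noise(labels, delta, rng):
    """Multiplicative noise: label * (1 + delta * xi), xi ~ U(-1, 1)."""
    xi = rng.uniform(-1.0, 1.0, labels.shape)
    return labels * (1.0 + delta * xi)

g = np.sin(np.pi * x_gamma_d[:, 1])        # placeholder Dirichlet data
g_noisy = add_noise(g, delta=0.01, rng=rng)
```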
For ease of representation, the cost function (2.1) is written in discrete form as
(2.3)
$$
J_n(W,b) = \frac{1}{n}\sum_{i=1}^{n}\big|\mathcal{L}u_N(x_i)\big|^2
+ \frac{1}{n_1}\sum_{i=1}^{n_1}\big|u_N(x_i^{D}) - \tilde{g}(x_i^{D})\big|^2
+ \frac{1}{n_2}\sum_{i=1}^{n_2}\Big|\frac{\partial u_N}{\partial \bm{n}}(x_i^{N}) - \tilde{h}(x_i^{N})\Big|^2,
$$
where $\{x_i^{D}\}_{i=1}^{n_1}$ and $\{x_i^{N}\}_{i=1}^{n_2}$ denote the sampling points carrying Dirichlet and Neumann data, respectively. To this point, the back propagation can be formulated as
(2.4)
$$
\frac{\partial J_n}{\partial W^{(l)}} = \frac{2}{n}\sum_{i=1}^{n} \mathcal{L}u_N(x_i)\,\frac{\partial\,\mathcal{L}u_N(x_i)}{\partial W^{(l)}}
+ \frac{2}{n_1}\sum_{i=1}^{n_1}\big(u_N(x_i^{D}) - \tilde{g}(x_i^{D})\big)\frac{\partial u_N(x_i^{D})}{\partial W^{(l)}}
+ \frac{2}{n_2}\sum_{i=1}^{n_2}\Big(\frac{\partial u_N}{\partial \bm{n}}(x_i^{N}) - \tilde{h}(x_i^{N})\Big)\frac{\partial}{\partial W^{(l)}}\frac{\partial u_N}{\partial \bm{n}}(x_i^{N}).
$$
Similarly,
(2.5)
$$
\frac{\partial J_n}{\partial b^{(l)}} = \frac{2}{n}\sum_{i=1}^{n} \mathcal{L}u_N(x_i)\,\frac{\partial\,\mathcal{L}u_N(x_i)}{\partial b^{(l)}}
+ \frac{2}{n_1}\sum_{i=1}^{n_1}\big(u_N(x_i^{D}) - \tilde{g}(x_i^{D})\big)\frac{\partial u_N(x_i^{D})}{\partial b^{(l)}}
+ \frac{2}{n_2}\sum_{i=1}^{n_2}\Big(\frac{\partial u_N}{\partial \bm{n}}(x_i^{N}) - \tilde{h}(x_i^{N})\Big)\frac{\partial}{\partial b^{(l)}}\frac{\partial u_N}{\partial \bm{n}}(x_i^{N}).
$$
We supply the details of the computation of $\partial\,\mathcal{L}u_N/\partial W^{(l)}$, $\partial\,\mathcal{L}u_N/\partial b^{(l)}$ and their corresponding back propagation in B.
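In practice, the derivatives appearing in (2.3)-(2.5) can also be obtained by automatic differentiation instead of hand-derived formulas. The sketch below (PyTorch, our choice for illustration; the paper does not prescribe a framework, and the widths are illustrative) assembles the discrete cost for the Laplace case $\mathcal{L} = -\Delta$ in two dimensions:

```python
import torch

# A small fully connected network u_N: R^2 -> R.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 20), torch.nn.Sigmoid(),
    torch.nn.Linear(20, 20), torch.nn.Sigmoid(),
    torch.nn.Linear(20, 1),
)

def laplacian(x):
    """Laplacian of u_N at points x, via two nested automatic differentiations."""
    x = x.requires_grad_(True)
    grad = torch.autograd.grad(net(x).sum(), x, create_graph=True)[0]
    return sum(
        torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i]
        for i in range(x.shape[1])
    )

def normal_derivative(x, normals):
    """Normal derivative of u_N at boundary points x with outer unit normals."""
    x = x.requires_grad_(True)
    grad = torch.autograd.grad(net(x).sum(), x, create_graph=True)[0]
    return (grad * normals).sum(dim=1)

def cost(x_in, x_d, g_tilde, x_n, normals, h_tilde):
    """Discrete cost (2.3): PDE residual + Dirichlet + Neumann misfits.

    For L = -Laplacian, the residual term |L u_N|^2 equals (Laplacian u_N)^2.
    """
    return (laplacian(x_in) ** 2).mean() \
        + ((net(x_d).squeeze(1) - g_tilde) ** 2).mean() \
        + ((normal_derivative(x_n, normals) - h_tilde) ** 2).mean()
```

Calling `loss.backward()` on this cost then produces exactly the gradients (2.4)-(2.5) without coding them by hand.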
To summarize, the structure of the ANN method to solve the time-independent problem is shown in Figure 2.
2.2. Network model for the time-dependent problem
The main idea of this method is to find a solution of Problem (1.2) in the form of the network output $u_N(\bm{x}, t;\, W, b)$. Defining the cost function
(2.6)
$$
J(W,b) = \big\|u_{N,t} + \mathcal{L}u_N\big\|^2_{L^2(\Omega_T)} + \big\|u_N - g\big\|^2_{L^2(\Gamma_T)} + \Big\|\frac{\partial u_N}{\partial \bm{n}} - h\Big\|^2_{L^2(\Gamma_T)} + \big\|u_N(\cdot,0) - u_0\big\|^2_{L^2(\Omega)},
$$
with $\Omega_T = \Omega \times (0,T]$ and $\Gamma_T = \Gamma \times (0,T]$,
then the ANN approach for problem (1.2) can be written as
(2.7)
$$
(W^*, b^*) = \operatorname*{arg\,min}_{W,\,b}\; J(W,b).
$$
The equivalence of problems (1.2) and (2.7) will be shown in the next section. Here, let us first introduce the back propagation algorithm to solve problem (2.7). Denote by $\{(x_i,t_i)\}_{i=1}^{n}$ random sampling points in $\overline{\Omega}_T$, in which there are $n_1$, $n_2$ and $n_3$ sampling points belonging to $\Gamma_T$ with Dirichlet data, $\Gamma_T$ with Neumann data, and $\Omega \times \{0\}$, respectively, and it is required that $n_1 + n_2 + n_3 < n$. For the purpose of verifying the stability of the approximation, a certain amount of statistical noise is added manually to the label data $g$, $h$ and $u_0$, such that
$$
\tilde{g} = g\,(1 + \delta\,\xi), \qquad \tilde{h} = h\,(1 + \delta\,\xi), \qquad \tilde{u}_0 = u_0\,(1 + \delta\,\xi), \qquad \xi \sim \mathcal{U}(-1,1),
$$
where $\delta$ represents the level of statistical noise. For ease of representation, the cost function (2.6) is written in discrete form as
(2.8)
$$
J_n(W,b) = \frac{1}{n}\sum_{i=1}^{n}\big|u_{N,t}(x_i,t_i) + \mathcal{L}u_N(x_i,t_i)\big|^2
+ \frac{1}{n_1}\sum_{i=1}^{n_1}\big|u_N(x_i^{D},t_i^{D}) - \tilde{g}(x_i^{D},t_i^{D})\big|^2
+ \frac{1}{n_2}\sum_{i=1}^{n_2}\Big|\frac{\partial u_N}{\partial \bm{n}}(x_i^{N},t_i^{N}) - \tilde{h}(x_i^{N},t_i^{N})\Big|^2
+ \frac{1}{n_3}\sum_{i=1}^{n_3}\big|u_N(x_i^{0},0) - \tilde{u}_0(x_i^{0})\big|^2,
$$
where $\{(x_i^{D},t_i^{D})\}_{i=1}^{n_1}$, $\{(x_i^{N},t_i^{N})\}_{i=1}^{n_2}$ and $\{x_i^{0}\}_{i=1}^{n_3}$ denote the sampling points carrying Dirichlet, Neumann and initial data, respectively. To this point, the back propagation can be formulated as
(2.9)
$$
\frac{\partial J_n}{\partial W^{(l)}} = \frac{2}{n}\sum_{i=1}^{n}\big(u_{N,t}+\mathcal{L}u_N\big)\frac{\partial\big(u_{N,t}+\mathcal{L}u_N\big)}{\partial W^{(l)}}
+ \frac{2}{n_1}\sum_{i=1}^{n_1}\big(u_N-\tilde g\big)\frac{\partial u_N}{\partial W^{(l)}}
+ \frac{2}{n_2}\sum_{i=1}^{n_2}\Big(\frac{\partial u_N}{\partial\bm n}-\tilde h\Big)\frac{\partial}{\partial W^{(l)}}\frac{\partial u_N}{\partial\bm n}
+ \frac{2}{n_3}\sum_{i=1}^{n_3}\big(u_N-\tilde u_0\big)\frac{\partial u_N}{\partial W^{(l)}},
$$
with the evaluation points suppressed for brevity.
Similarly,
(2.10)
$$
\frac{\partial J_n}{\partial b^{(l)}} = \frac{2}{n}\sum_{i=1}^{n}\big(u_{N,t}+\mathcal{L}u_N\big)\frac{\partial\big(u_{N,t}+\mathcal{L}u_N\big)}{\partial b^{(l)}}
+ \frac{2}{n_1}\sum_{i=1}^{n_1}\big(u_N-\tilde g\big)\frac{\partial u_N}{\partial b^{(l)}}
+ \frac{2}{n_2}\sum_{i=1}^{n_2}\Big(\frac{\partial u_N}{\partial\bm n}-\tilde h\Big)\frac{\partial}{\partial b^{(l)}}\frac{\partial u_N}{\partial\bm n}
+ \frac{2}{n_3}\sum_{i=1}^{n_3}\big(u_N-\tilde u_0\big)\frac{\partial u_N}{\partial b^{(l)}}.
$$
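For the time-dependent model, time simply becomes one more input coordinate of the network, and only the residual term of the cost changes. A hedged sketch of that residual (again PyTorch, reusing the pattern above; names and widths are ours, not the paper's):

```python
import torch

# Network u_N: (x1, x2, t) -> u.
net_t = torch.nn.Sequential(
    torch.nn.Linear(3, 20), torch.nn.Sigmoid(),
    torch.nn.Linear(20, 20), torch.nn.Sigmoid(),
    torch.nn.Linear(20, 1),
)

def residual_parabolic(xt):
    """Residual u_t - Laplacian(u) of (1.2) with L = -Laplacian,
    at space-time collocation points xt = (x1, x2, t)."""
    xt = xt.requires_grad_(True)
    grad = torch.autograd.grad(net_t(xt).sum(), xt, create_graph=True)[0]
    u_t = grad[:, -1]                        # time derivative
    lap = sum(
        torch.autograd.grad(grad[:, i].sum(), xt, create_graph=True)[0][:, i]
        for i in range(xt.shape[1] - 1)      # spatial coordinates only
    )
    return u_t - lap
```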
2.3. Training algorithm for networks
The original training algorithm for neural networks is the gradient descent (GD) method. In this setting it can be formulated as
$$
W_{k+1} = W_k - \eta\,\frac{\partial J_n}{\partial W}\Big|_{W_k}, \qquad b_{k+1} = b_k - \eta\,\frac{\partial J_n}{\partial b}\Big|_{b_k},
$$
where $k$ is the iteration index and $\eta$ is the step size. It is well known that the ADAM algorithm is a stable and fast stochastic algorithm in the field of optimization. For this reason, the ADAM algorithm is used in this work, and we would like to remark here that the GD method cannot reach a satisfying result in our numerical experiments. The main formulas for the weights are shown in the following; the formulas for the biases are similar.
(2.11)
$$
\begin{aligned}
m_k &= \beta_1 m_{k-1} + (1-\beta_1)\,\nabla_W J_n(W_k), \qquad &
v_k &= \beta_2 v_{k-1} + (1-\beta_2)\,\big(\nabla_W J_n(W_k)\big)^2,\\
\hat m_k &= \frac{m_k}{1-\beta_1^{\,k}}, \qquad &
\hat v_k &= \frac{v_k}{1-\beta_2^{\,k}},\\
W_{k+1} &= W_k - \eta\,\frac{\hat m_k}{\sqrt{\hat v_k} + \epsilon}, & &
\end{aligned}
$$
where $W$ is the matrix of parameters, $\beta_1$ and $\beta_2$ are constants close to $1$, and $\epsilon$ is a small constant. To summarize, the ANN algorithm to solve the Cauchy problem is constructed as follows:
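A minimal executable sketch of this procedure (assuming PyTorch, whose built-in Adam optimizer implements the update (2.11); the network `net`, the `cost` function and the sampled tensors are carried over from the earlier sketches, converted to torch tensors, and the hyperparameters are illustrative rather than the paper's):

```python
import torch

opt = torch.optim.Adam(net.parameters(), lr=1e-3,
                       betas=(0.9, 0.999), eps=1e-8)  # beta1, beta2 close to 1, small eps

for k in range(10_000):
    opt.zero_grad()
    loss = cost(x_in, x_d, g_tilde, x_n, normals, h_tilde)
    loss.backward()   # back propagation: eqs. (2.4)-(2.5)
    opt.step()        # ADAM update: eq. (2.11)
    if k % 1000 == 0:
        print(f"iter {k}: cost {loss.item():.3e}")
```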
3. Convergence of the neural network approximation
In this section, we discuss some conclusions on the equivalence between the PDE problem (1.2) and the optimization problem (2.7). To this end, the definitions of dense and m-dense networks following [Kurt1991] are necessary.
Definition 1 (denseness).
A class of networks $\Sigma$ is dense in $C(K)$ if, for every $f \in C(K)$ and every $\varepsilon > 0$, there exists $u_N \in \Sigma$ satisfying
(3.1)
$$
\sup_{x\in K}\,\big|f(x) - u_N(x)\big| < \varepsilon.
$$
Definition 2 (mdenseness).
A class of networks $\Sigma$ is m-dense in $C^m(K)$ if, for every $f \in C^m(K)$ and every $\varepsilon > 0$, there exists $u_N \in \Sigma$ satisfying
(3.2)
$$
\max_{|\alpha|\le m}\ \sup_{x\in K}\,\big|D^{\alpha} f(x) - D^{\alpha} u_N(x)\big| < \varepsilon.
$$
The proof is carried out in two steps. In the first step (Section 3.1 and Section 3.2), we show that networks with $k$ hidden layers are dense and m-dense on the design domain. In the second step (Section 3.3), the equivalence between the PDE problem (1.2) and the optimization problem (2.7) is given. It is worth mentioning that all the proofs in Section 3.1 and Section 3.2 only depend on properties of the networks. Suppose networks with $k$ hidden layers can be regarded as a mapping of the form
$$
u^{(k)}(x) = W^{(k+1)}\,\sigma\Big(W^{(k)}\,\sigma\big(\cdots\,\sigma\big(W^{(1)} x + b^{(1)}\big)\cdots\big) + b^{(k)}\Big) + b^{(k+1)},
$$
where
$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$
is the sigmoid function (applied componentwise), $d$ is the spatial dimension and $N_l$ is the number of units in the $l$-th layer. It is straightforward to denote the class of all such mappings with $k$ hidden layers by $\Sigma^k$ for brevity.
3.1. The denseness of $\Sigma^k$
Consider a bounded set $K$ in $\mathbb{R}^d$ with boundary $\partial K$. Kurt Hornik proved the denseness and m-denseness of single-hidden-layer networks in [Kurt1991], and this will be extended to the multi-hidden-layer case in Theorem 1. Let us first consider an important lemma.
Lemma 1.
Define the $k$-hidden-layer neural network function $u^{(k)} \in \Sigma^k$ as above. Then for every single-hidden-layer network $u^{(1)} \in \Sigma^1$ and every $\varepsilon > 0$ there exists $u^{(k)} \in \Sigma^k$ such that
(3.3)
$$
\sup_{x\in K}\,\big|u^{(1)}(x) - u^{(k)}(x)\big| < \varepsilon.
$$
Let us use the method of induction to verify this lemma.
I. Verify that equation (3.3) holds when $k = 2$.
Following theorem 2 in [Kurt1991], it is clear that for any $u^{(1)} \in \Sigma^1$ and $\varepsilon > 0$, there exists $u^{(2)} \in \Sigma^2$ such that
$$
\sup_{x\in K}\,\big|u^{(1)}(x) - u^{(2)}(x)\big| < \varepsilon,
$$
which verifies equation (3.3).
II. Assume equation (3.3) is true for $k$, and verify that it also holds for $k+1$.
Fix $\varepsilon > 0$. Since the sigmoid function satisfies the Lipschitz continuity, the error of the inner $k$-hidden-layer approximation is amplified by at most a constant factor when propagated through the additional layer; choosing the inner accuracy accordingly, it yields that
$$
\sup_{x\in K}\,\big|u^{(1)}(x) - u^{(k+1)}(x)\big| < \varepsilon,
$$
which completes the proof of lemma 1. ∎
With the above lemma, we can extend theorem 1 in [Kurt1991] to multi-hidden-layer neural networks, as in the following theorem:
Theorem 1.
For the sigmoid function $\sigma$, the network class $\Sigma^k$ is dense in $C(K)$.
According to theorem 1 in [Kurt1991], for every $f \in C(K)$ and $\varepsilon > 0$ there exists $u^{(1)} \in \Sigma^1$ such that
(3.4)
$$
\sup_{x\in K}\,\big|f(x) - u^{(1)}(x)\big| < \frac{\varepsilon}{2}.
$$
It is obvious from lemma 1 and the triangle inequality that
(3.5)
$$
\sup_{x\in K}\,\big|f(x) - u^{(k)}(x)\big| \le \sup_{x\in K}\,\big|f(x) - u^{(1)}(x)\big| + \sup_{x\in K}\,\big|u^{(1)}(x) - u^{(k)}(x)\big| < \varepsilon.
$$
Hence the statements in theorem 1 are proved. ∎
3.2. The m-denseness of $\Sigma^k$
Let us first consider an important lemma.
Lemma 2.
Define the $k$-hidden-layer neural network function $u^{(k)} \in \Sigma^k$ as above. Then for every single-hidden-layer network $u^{(1)} \in \Sigma^1$ and every $\varepsilon > 0$ there exists $u^{(k)} \in \Sigma^k$ such that
(3.6)
$$
\max_{|\alpha|\le m}\ \sup_{x\in K}\,\big|D^{\alpha}u^{(1)}(x) - D^{\alpha}u^{(k)}(x)\big| < \varepsilon.
$$
Let us use the method of induction to verify this lemma.
I. Verify that equation (3.6) holds when $k = 2$. This follows from theorem 3 in [Kurt1991] in the same way as in lemma 1.
II. Assume equation (3.6) is true for $k$, and verify that it also holds for $k+1$.
Fix $\varepsilon > 0$. Since the sigmoid function satisfies the Lipschitz continuity and is bounded together with its derivatives, the same argument as in lemma 1, applied to each derivative $D^{\alpha}$ with $|\alpha| \le m$, yields
$$
\max_{|\alpha|\le m}\ \sup_{x\in K}\,\big|D^{\alpha}u^{(1)}(x) - D^{\alpha}u^{(k+1)}(x)\big| < \varepsilon,
$$
which completes the proof of lemma 2. ∎
With the above lemma, we can extend theorem 3 in [Kurt1991] to multi-hidden-layer neural networks, which yields the following theorem:
Theorem 2.
For the sigmoid function $\sigma$, the network class $\Sigma^k$ is uniformly m-dense on $K$.
According to theorem 3 in [Kurt1991], for every $f \in C^m(K)$ and $\varepsilon > 0$ there exists $u^{(1)} \in \Sigma^1$ such that
(3.8)
$$
\max_{|\alpha|\le m}\ \sup_{x\in K}\,\big|D^{\alpha}f(x) - D^{\alpha}u^{(1)}(x)\big| < \frac{\varepsilon}{2}.
$$
It is obvious from lemma 2 and the triangle inequality that
(3.9)
$$
\max_{|\alpha|\le m}\ \sup_{x\in K}\,\big|D^{\alpha}f(x) - D^{\alpha}u^{(k)}(x)\big| < \varepsilon.
$$
Hence the statements in theorem 2 are proved. ∎
3.3. Equivalence between PDE problem (1.2) and optimization problem (2.7)
Let us assume initially that problem (1.2) satisfies the following conditions:
Condition 1.
1. There exists a unique solution $u$ of Problem (1.2); moreover, $u \in C^2(\overline{\Omega}_T)$.
2. $\mathcal{L}$ is Lipschitz continuous on $C^2(\overline{\Omega}_T)$.
3. $\sigma$ and its first derivative are bounded in $\mathbb{R}$.
4. $g, h \in L^2(\Gamma_T)$ and $u_0 \in L^2(\Omega)$.
Two important theorems follow from these conditions:
Theorem 3.
For all $\varepsilon > 0$, there exists a sequence of neural network approximations $u_k$ such that $J(u_k) < \varepsilon$ for $k$ large enough, where $u_k \in \Sigma^k$ and $J$ is the cost function in the case of equation (2.6).
Let us define $u_k \in \Sigma^k$ as a neural network approximation. Assume that the operator $\mathcal{L}$ is bounded and the sigmoid function $\sigma$ is nonconstant and bounded. It is clear that $\overline{\Omega}_T$ is a compact subset of $\mathbb{R}^{d+1}$. According to theorem 2, it follows that there exist $u_k \in \Sigma^k$ such that
(3.10)
$$
\max_{|\alpha|\le 2}\ \sup_{(x,t)\in \overline{\Omega}_T}\,\big|D^{\alpha}u(x,t) - D^{\alpha}u_k(x,t)\big| < \varepsilon_k, \qquad \varepsilon_k \to 0,
$$
which yields
(3.11)
$$
\sup_{(x,t)\in \overline{\Omega}_T}\,\big|\big(\partial_t + \mathcal{L}\big)u_k(x,t) - \big(\partial_t + \mathcal{L}\big)u(x,t)\big| \le C\,\varepsilon_k.
$$
Let $u$ be a solution of Problem (1.2); using the conclusion of equation (3.11) and the Hölder inequality, we establish
(3.12)
$$
J(u_k) \le C\big(|\Omega_T| + |\Gamma_T| + |\Omega|\big)\,\varepsilon_k^2 \longrightarrow 0,
$$
which completes the proof of theorem 3. ∎
Theorem 4.
The sequence of neural network approximations $u_k$ converges to the solution of (1.2) as $k \to \infty$.
Since $u_k$ is a solution of (2.7), it is clear that $J(u_k) \le J(v_k)$ for all $v_k \in \Sigma^k$. In particular, combining this with theorem 3 establishes $J(u_k) \to 0$. When $k \to \infty$, it follows that
$$
J(u_k) \le \varepsilon_k
$$
for some $\varepsilon_k \to 0$ such that
(3.13)
$$
\lim_{k\to\infty} J(u_k) = 0.
$$
Assume Condition 1 holds and $\delta = 0$. It is clear that $\{u_k\}$ is uniformly bounded with respect to $k$ in $L^{\infty}(\Omega_T)$, which implies that there exists a subsequence, still denoted by $\{u_k\}$, converging to some $u^*$ in the weak* sense in $L^{\infty}(\Omega_T)$.
4. Numerical examples
In this section, we present extensive numerical results to demonstrate the ANN method for the Cauchy problem. First, examples of low- and high-dimensional problems are displayed to verify the accuracy of the method in both the time-dependent and the time-independent cases.
4.1. Numerical validation of the time-dependent case
Let the operator $\mathcal{L}$ in problem (1.2) be $-\Delta$ (the Laplace operator). The structure of the ANN is chosen with layers $[d, N_1, \dots, N_L, 1]$, where $d$ is the dimension of the input data. By setting the initial $W$ and $b$ randomly and choosing standard values for the parameters in the ADAM algorithm, an example of the 2D time-dependent problem is illustrated in the following.
Example 1 (parabolic case).
The equation of the time-dependent problem is given as
(4.1)
$$
\begin{cases}
u_t - \Delta u = 0 & \text{in } \Omega \times (0,T],\\
u = g & \text{on } \Gamma \times (0,T],\\
\dfrac{\partial u}{\partial \bm{n}} = h & \text{on } \Gamma \times (0,T],\\
u(\cdot,0) = u_0 & \text{in } \Omega.
\end{cases}
$$