 # A numerical approach to Kolmogorov equation in high dimension based on Gaussian analysis

For Kolmogorov equations associated to finite dimensional stochastic differential equations (SDEs) in high dimension, a numerical method alternative to Monte Carlo simulations is proposed. The structure of the SDE is inspired by stochastic Partial Differential Equations (SPDE) and thus contains an underlying Gaussian process which is the key of the algorithm. A series development of the solution in terms of iterated integrals of the Gaussian process is given, it is proved to converge - also in the infinite dimensional limit - and it is numerically tested in a number of examples.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Kolmogorov equations are parabolic equations with a structure directly related to stochastic differential equations (SDEs). The SDEs considered here are in a finite dimensional space but they are inspired by the spatial discretization of stochastic Partial Differential Equations (SPDE). When the noise is additive and the nonlinearity is time-independent, a general form of such SDEs is

 {dXt=(AXt+B(Xt))dt+σ√QdWt,X0=x, (1.1)

where , is a Brownian motion in (namely where the

’s are independent real valued Brownian motions), defined on a probability space

with a filtration , is a positive real number measuring the strength of the noise, is a positive definite symmetric matrix (the so called covariance matrix of the noise) describing the spatial structure of the noise and is its square root, is a matrix and is a function with the degree of regularity specified below. Obviously we could include the scalar inside the matrix but for certain practical arguments it is useful to distinguish between them. The solution is a continuous adapted process in . The associated Kolmogorov equation is

 (1.2)

where , and

denote respectively the vector of first partial derivatives and the matrix of second partial derivatives,

is the trace of the matrix and denotes the scalar product in . Both for the SDE and the Kolmogorov equation we have used notations which may be adapted to the infinite dimensional case, when is replaced by a Hilbert space (see Section 2 for the general theory); however, the aim of this work is numerical and all objects in the introduction will belong to . The link between the Kolmogorov equation and the SDE is

 u(t,x)=E[u0(Xxt)],

where  denotes the mathematical expectation on and is the solution of the SDE above, where the initial condition is explicitly indicated. Several elements of theory both in finite and infinite dimensions for SDEs and associated Kolmogorov equations can be found in many books, like [6, 8, 9, 16, 17].

Solving the Kolmogorov equation with suitable initial condition is a way to compute relevant expected values and probabilities associated to the solution of an SDE. For instance, when , is the probability that the solution exceeds a threshold :

 u(t,x)=E[1{∥x∥>R}(Xxt)]=P(∥∥Xxt∥∥>R).

The classical method of computing these expected values is the Monte Carlo method (with important variants, see for instance [13, 19]): several realizations of the process are simulated by solving the SDE – typically by Euler method – and then the corresponding values of are averaged. Going beyond this strategy is a fundamental issue, due to its limitations in relevant applications like Geophysics and Climate change projections , especially concerning extreme events. The question is whether Kolmogorov equation can be efficiently solved numerically without using the simulation of the SDE. But the problem is that the dimension is extremely high in these examples and common numerical methods for solution of parabolic equations already require strong computational power when , [5, 18]. A grid of points in , repeated for all dimensions, give rise to grid points, numerically impossible when, for instance, (which still would be an extremely poor approximation). Spectral methods seem to meet the same restrictions:

is the cardinality of basis elements obtained by tensorization of

basis elements for each space variable.

The problem of dimensionality, the limitations of present methodologies and several motivations are recalled in two recent works [2, 14]

which also aim to go beyond Monte Carlo and propose a method based on deep artificial neural networks. We address to these brilliant works for other comments on the problem, see also

[7, Introduction]. The approach developed here is however completely different.

Our aim is to take advantage of the probabilistic structure of the problem to devise numerical schemes for the Kolmogorov equation, in particular using Gaussian analysis. We implement a perturbative scheme which links the solution of Kolmogorov equation to a Gaussian process, the solution of the linear stochastic equation

 {dZt=AZtdt+√QdWt,Z0=0.

The idea comes from the theoretical investigations of infinite dimensional Kolmogorov equations associated to SPDEs, see for instance [9, 10]. We modify and adapt that idea giving an explicit formula in terms of a series of Gaussian integrals. We provide here a first glance at the strategy by writing the final formula:

 u(t,x)=∞∑n=0vn(t,x),

where

 v0(t,x)=E[u0(etAx+σZt)]

and for

 vn(t,x)= ∫t0drn∫rn0drn−1⋯∫r20dr1 E[u0(etAx+σZt)n∏i=1⟨Ξσ(ri+1−ri)B(eriAx+σZri),Zri+1−e(ri+1−ri)AZri⟩].

The matrix will be defined in the next sections, see (2.3); it is easily computed by and , and it depends on the parameters and . A theoretical analysis of this series is made, proving the following result.

###### Theorem 1.1.

Assume that and are bounded. Then, under suitable conditions on and (see Hypothesis 2.1

for details), we have the following uniform estimate:

 ∥vn(t)∥∞≤∥u0∥∞∥B∥n∞Cnδtn(1−δ)Γ(1−δ)nΓ(1+n(1−δ)),t>0,

where is the Gamma function, is a constant and the parameter in (iv) of Hypothesis 2.1.

This theorem sustains the numerical method and stresses the independence on the dimension of certain issues of the method (obviously others, like getting a sample of , have a cost which increases with ). When is replaced by a Hilbert space (and below we shall formulate the theorem with assumptions in a Hilbert space) it contains also some theoretical novelties with respect to the literature, especially because it provides an explicit formula.

The numerical evaluation of the terms is made here, in this paper, by Monte Carlo method based on a sample of the process obtained by solving the linear SDE by Euler method. These are the most obvious choices, but other possibilities exist, since is a centered Gaussian process with known covariance function. A main strategy invoked here is to store once for ever a large and accurate sample of (this requires the pair to be given) and use it later in the formula for different values of the other parameters, and even .

This new method is aimed to replace direct Monte Carlo simulations. We should therefore accurately compare them. If the purpose is to make one single computation, classical Monte Carlo wins: the Gaussian method above still requires Monte Carlo simulations of the linear problem, which is less expensive than the nonlinear one but then one has to compute possibly several terms ; some experiments clearly show that classical Monte Carlo is less expensive for a comparable degree of precision. The advantage comes when we want to vary parameters, since the Gaussian method for given allows to store a possibly expensive sample of the process and reuse it for several values of the parameters, just having to compute the averages over the Gaussian sample which give us the terms . On the contrary, classical Monte Carlo method requires to repeat the simulation of the nonlinear problem for each new value of the parameters. By “parameters”, as we have already mentioned above, we mean . Let us comment on the interest in changing them.

The interest in changing is obvious. In certain applications it is necessary to change the initial condition and compare or collect the results. We have in mind for instance the ensemble methods used in weather prediction where the initial condition is uncertain, a first guess is made on the basis of physical observations, but then the initial condition is perturbed in various directions and the final results averaged by suitable methods. See also [2, 14], where the need to change is stressed.

Changing the strength of the noise is a very important issue, related also to Large Deviation Theory. We have to advise that the precision of our simulations degenerates as , or the number of iterates needed to maintain a reasonable precision blows-up, but at least one can detect some tendency by moving in a finite range without arriving to too small values.

Concerning the change of function , unfortunately the main comment is in favor of Monte Carlo: having at disposal a sample of the process immediately gives a way to compute for different functions . Hence the best we can say on this issue is that our formula allows for such computations with a moderate additional effort – but not with an improvement over Monte Carlo.

Finally, changing the nonlinearity is of theoretical interest for the investigation of the performances of the method, and in applications it may be of interest in those – very common – cases when some parameters of are not precisely known and different simulations may be useful for comparison or for ensemble averaging methods performed over the range of those parameters.

Let us finally come to a brief description of numerical results. In Section 3, we present some numerical results based on the method proposed here in the finite dimensional settings with . The results, even if not fully satisfactory yet, should be compared with the fact that the innovative attempts to solve the Kolmogorov equation in by direct methods, see , are often restricted to dimensions smaller than . Large dimension is therefore a very difficult problem that deserves strong effort for improvement, and some of our results – although not in all examples – are quite promising.

As a final comment, let us explicitly mention that the class of Kolmogorov equations studied here is particular, because of the additive and very non-degenerate noise and because we have treated only relatively mild nonlinearities. We have not considered relevant cases from fluid mechanics which have more severe nonlinearities and activation of more scales; after a few initial tests on dyadic models – we point in particular to the recent models on trees which may be very relevant for turbulence theory, see [1, 3, 4] – it was clear that covering these examples with this approach requires further research and improvements. Extension to multiplicative transport noises [11, 12] is another challenging open question.

## 2 The iteration scheme for Kolmogorov equations on Hilbert spaces

In this section we work in an infinite dimensional separable Hilbert space and study the iteration scheme for the Kolmogorov equation:

 ∂tu(t,x)=12Tr(QD2u(t,x))+⟨Ax+B(x),Du(t,x)⟩,u(0,⋅)=u0. (2.1)

Here is an unbounded linear operator, is a nonnegative self-adjoint bounded linear operator on , is a nonlinear measurable mapping and is a real valued measurable function. In this section plays the role of to simplify notation. In the following we write for the Banach space of bounded linear operators on with the norm .

Throughout this section we assume the following conditions:

###### Hypothesis 2.1.
• is the infinitesimal generator of a strongly continuous semigroup .

• is a nonnegative self-adjoint operator in satisfying , and for any the linear operator

 Qt=∫t0esAQesA∗ds (2.2)

is of trace class.

• We have for any .

• Letting , we assume there exist and such that

 ∥Λ(t)∥L(H)≤Cδ/tδ,t>0.

The assumptions (i)–(iii) are quite standard in the literature, see for instance [8, Hypothesis 2.1 and 2.24]. The operator appeared in the introduction has the form

 Ξσ(t)=σQ−1/2tΛ(t)=σQ−1tetA; (2.3)

we remark that, in the setting of the introduction, the operator in (2.2) should be replaced by when computing . The following example is taken from [8, Example 2.5] which verifies all the assumptions.

###### Example 2.2.

Let with . We choose , and

 Ax=Δx,x∈D(A)=H2(O)∩H10(O),

where is the Laplacian operator with Dirichlet boundary condition. is a self-adjoint negative operator in , and

 Aek=−|k|2ek,k∈Nd,

where for , and

 ek(ξ)=(2/π)d/2sin(k1ξ1)⋯sin(kdξd),ξ∈[0,π]d.

Choose , so that

 Qx=∑k∈Nd|k|−2α⟨x,ek⟩ek,x∈H.

For any , if , then

 Tr(Qt)=∑k∈Nd12|k|2+2α(1−e−2t|k|2)<∞.

So (ii) is satisfied.

Next, (iii) can be checked by explicit computations. Moreover,

 Λ(t)x=∑k∈Nd√2|k|1+α√e2t|k|2−1⟨x,ek⟩ek,x∈H.

From this we deduce that

 ∥Λ(t)∥L(H)≤√2Cαt(1+α)/2,

where

 Cα=supθ>0θ1+αe2θ−1<+∞.

Thus (iv) holds with .

We also need the following technical conditions.

###### Hypothesis 2.3.

The initial datum and the nonlinear part in (2.1) are bounded and measurable.

This section is organized as follows. In Subsection 2.1, we recall some basic facts in Gaussian analysis on Hilbert space and give the formula for the first term of the iteration (2.8). We give in Section 2.2 the details for calculating the second term , which will help us to guess and prove the formula for general terms in Section 2.3. In the last part, we estimate the uniform norm of and show the convergence of the iteration scheme. The limit is the unique mild solution of (2.1), see Theorem 2.15.

### 2.1 Some preparations

Let be a cylindrical Brownian motion on :

 Wt=∞∑k=1Wktek,t≥0,

where is a complete orthonormal basis of and is a family of independent one dimensional standard Brownian motions defined on some probability space . Under the conditions (i) and (ii) in Hypothesis 2.1, the linear SDE

 dZxt=AZxtdt+√QdWt,Zx0=x∈H (2.4)

has a unique solution with the expression

 Zxt=etAx+WA(t),t>0,

where is the stochastic convolution:

 WA(t)=∫t0e(t−s)A√QdWs.

For any , is a centered Gaussian variable on with covariance operator . We denote its law by . Accordingly, the law of is denoted as . Recall that for any ,

is a centered real Gaussian variable with variance

 E⟨h,Q−1/2tWA(t)⟩2=|h|2H.

We shall write for the space of bounded measurable functions on and the space of Fréchet differentiable functions, bounded with bounded derivatives. When , its Fréchet derivative will be denoted by . For any and , let

 Stf(x):=Ef(Zxt)=∫Hf(y)NetAx,Qt(dy)=∫Hf(etAx+y)NQt(dy).

This defines a Markov semigroup on . We have the following important result which implies is strong Feller (see [8, Proposition 2.28] for a proof).

###### Proposition 2.4.

Assume the conditions (i)–(iii) in Hypothesis 2.1. Then for all and , we have and for any ,

 ⟨h,DStf(x)⟩=E[f(Zxt)⟨Λ(t)h,Q−1/2t(Zxt−etAx)⟩]. (2.5)

Moreover,

 ∥DStf∥∞≤∥f∥∞∥Λ(t)∥L(H). (2.6)

Using the semigroup , the mild formulation of the Kolmogorov equation (2.1) is

 u(t,x)=(Stu0)(x)+∫t0(St−s⟨B,Du(s)⟩)(x)ds. (2.7)

This suggests us to consider the iterative scheme:

 un+1(t,x)=(Stu0)(x)+∫t0(St−s⟨B,Dun(s)⟩)(x)ds

with . We define and

 vn(t,x)=un(t,x)−un−1(t,x),n≥1,

then the new functions satisfy the iteration procedure:

 (2.8)

Before concluding this section, we show how to obtain the first term . Since , Proposition 2.4 implies for any , and thus . Denote by the filtration generated by the cylindrical Brownian motion .

It holds that

###### Proof.

Use the property of conditional expectation:

 E[u0(Zxt)⟨Λ(s)B(Zxt−s),Q−1/2s(Zxt−esAZxt−s)⟩] = =

where the second step follows from the Markov property. Again by the Markov property,

 E[u0(Zxt)⟨Λ(s)B(Zxt−s),Q−1/2s(Zxt−esAZxt−s)⟩∣∣Zxt−s] = E[u0(Zys)⟨Λ(s)B(y),Q−1/2s(Zys−esAy)⟩]y=Zxt−s = k0s(y)∣∣y=Zxt−s=k0s(Zxt−s),

where the second step is due to (2.5). Substituting this equality into the previous one we obtain the identity. ∎

The above lemma implies

###### Corollary 2.6.

For any and ,

 (2.9)

Moreover,

 ∥v1(t)∥∞≤∥u0∥∞∥B∥∞∫t0∥Λ(s)∥L(H)ds

and

 ∥Dv1(t)∥∞≤∥u0∥∞∥B∥∞∫t0∥Λ(t−s)∥L(H)∥Λ(s)∥L(H)ds.
###### Proof.

The formula (2.9) follows directly from Lemma 2.5. Next, by the definition (2.8) of the iteration, for any and ,

 ∣∣k0s(y)∣∣≤|B(y)||Dv0(s,y)|≤∥B∥∞|DSsu0(y)|≤∥B∥∞∥u0∥∞∥Λ(s)∥L(H), (2.10)

where the last inequality follows from (2.6). Therefore,

 |v1(t,x)|≤∫t0∣∣(St−sk0s)(x)∣∣ds≤∫t0∥∥k0s∥∥∞ds≤∥u0∥∞∥B∥∞∫t0∥Λ(s)∥L(H)ds

which yields the estimate on . The inequality (2.10) implies that for all , hence by Proposition 2.4, and

 Dv1(t,x)=∫t0D(St−sk0s)(x)ds.

Finally, by (2.6),

 ∥Dv1(t)∥∞≤∫t0∥∥D(St−sk0s)∥∥∞ds≤∫t0∥∥k0s∥∥∞∥Λ(t−s)∥L(H)ds,

which, together with (2.10), gives us the last estimate. ∎

### 2.2 The term v2(t,x)

In this part, we compute the second term in the iteration to illustrate the ideas. First we prove

###### Lemma 2.7.

One has

 k1t(x) =∫t0E[u0(Zxt)⟨Λ(s)B(Zxt−s),Q−1/2s(Zxt−esAZxt−s)⟩ ×⟨Λ(t−s)B(x),Q−1/2t−s(Zxt−s−e(t−s)Ax)⟩]ds.
###### Proof.

By Corollary 2.6, for any , and

 k1t(x)

Recall that (2.10) implies , thus by Proposition 2.4,

According to the proof of Lemma 2.5, we have

 k0s(Zxt−s)=E[u0(Zxt)⟨Λ(s)B(Zxt−s),Q−1/2s(Zxt−esAZxt−s)⟩∣∣Ft−s].

Note that is -measurable. Substituting this equality into the one above and using the property of conditional expectation, we obtain the desired result. ∎

Now we are ready to present the expression and estimates for the second iteration.

###### Proposition 2.8.

For any and ,

 v2(t,x) ×⟨Λ(s−r)B(Zxt−s),Q−1/2s−r(Zxt−r−e(s−r)AZxt−s)⟩]drds.

Furthermore,

 ∥v2(t)∥∞≤∥u0∥∞∥B∥2∞∫t0∫s0∥Λ(s−r)∥L(H)∥Λ(r)∥L(H)drds

and

 ∥Dv2(t)∥∞≤∥u0∥∞∥B∥2∞∫t0∫s0∥Λ(t−s)∥L(H)∥Λ(s−r)∥L(H)∥Λ(r)∥L(H)drds.
###### Proof.

By Lemma 2.7, for any and ,

 k1s(y) =∫s0E[u0(Zys)⟨Λ(r)B(Zys−r),Q−1/2r(Zys−erAZys−r)⟩ ×⟨Λ(s−r)B(y),Q−1/2s−r(Zys−r−e(s−r)Ay)⟩]dr.

We have

 E[k1s(Zxt−s)] =E{∫s0E[u0(Zys)⟨Λ(r)B(Zys−r),Q−1/2r(Zys−erAZys−r)⟩ ×⟨Λ(s−r)B(y),Q−1/2s−r(Zys−r−e(s−r)Ay)⟩]y=Zxt−sdr} =∫s0E[u0(Zxt)⟨Λ(r)B(Zxt−r),Q−1/2r(Zxt−erAZxt−r)⟩ ×⟨Λ(s−r)B(Zxt−s),Q−1/2s−r(Zxt−r−e(s−r)AZxt−s)⟩]dr,

where the second step follows from the Markov property. Therefore,

 v2(t,x) =∫t0(St−sk1s)(x)ds=∫t0E[k1s(Zxt−s)]ds ×⟨Λ(s−r)B(Zxt−s),Q−1/2s−r(Zxt−r−e(s−r)AZxt−s)⟩]drds.

Next, by the definition of and the last inequality in Corollary 2.6,

 ∥∥k1s∥∥∞≤∥B∥∞∥Dv1(s)∥∞≤∥u0∥∞∥B∥2∞∫s0∥Λ(s−r)∥L(H)∥Λ(r)∥L(H)dr. (2.11)

This immediately implies

 |v2(t,x)| ≤∫t0∥∥k1s∥∥∞ds≤∥u0∥∞∥B∥2∞∫t0∫s0∥Λ(s−r)∥L(H)∥Λ(r)∥L(H)drds,

and we obtain the estimate on . Moreover, by Proposition 2.4,

 |Dv2(t,x)|≤∫t0∣∣D(St−sk1s)(x)∣∣ds≤∫t0∥∥k1s∥∥∞∥Λ(t−s)∥L(H)ds,

which, combined with (2.11), gives us the second estimate. ∎

### 2.3 The general terms vn(t,x)

In order to do further iteration, we rewrite the formula in Proposition 2.8 as

 v2(t,x) =∫t0ds2∫s20ds1E[u0(Zxt)⟨Λ(s1)B(Zxt−s1),Q−1/2s1(Zxt−es1AZxt−s1)⟩ ×⟨Λ(s2−s1)B(Zxt−s2),Q−1/2s2−s1(Zxt−s1−e(s2−s1)AZxt−s2)⟩].

Moreover, denoting by , then we have

 v2(t,x)= ∫t0ds2∫s20ds1 E[u0(Zxt)2∏i=1⟨Λ(si−si−1)B(Zxt−si),Q−1/2si−si−1(Zxt−si−1−e(si−si−1)AZxt−si)⟩].

From this we can guess the general formulae.

###### Theorem 2.9.

Let . For any ,

 vn(t,x)= ∫t0dsn∫sn0dsn−1⋯∫s20ds1 (2.12) E[u0(Zxt)n∏i=1⟨Λ(si−si−1)B(Zxt−si),Q−1/2si−si−1(Zxt−si−1−e(si−si−1)AZxt−si)⟩].

Moreover,

 ∥vn(t)∥∞≤∥u0∥∞∥B∥n∞∫t0dsn∫sn0dsn−1⋯∫s20ds1n∏i=1∥Λ(si−si−1)∥L(H)

and, letting ,

 ∥Dvn(t)∥∞≤∥u0∥∞∥B∥n∞∫t0dsn∫sn0dsn−1⋯∫s20ds1n+1∏i=1∥Λ(si−si−1)∥L(H).
###### Proof.

We proceed by induction. Indeed, in view of the proofs in Section 2.2, we shall also prove inductively the formula

 knt(x)= ∫t0dsn∫sn0dsn−1⋯∫s20ds1

where and . The discussions in Sections 2.1 and 2.2 show that the assertions on hold for , and the above formula of holds with . Now we assume the assertions on (resp. on ) hold for (resp. for ), and try to prove them in the next iteration.

By the induction hypotheses, we have for all and thus, by the definition of the iteration (2.8), with

 ∥∥kns∥∥∞ ≤∥B∥∞∥Dvn(s)∥∞ ≤∥u0∥∞∥B∥n+1∞∫s0dsn∫sn0dsn−1⋯∫s20ds1n+1∏i=1∥Λ(si−si−1)∥L(H),

where . Proposition 2.4 implies for all , and from the formula

 vn+1(t,x)=∫t0(St−skns)(x)ds

we deduce readily the estimates on and .

Next we prove the formula for (note that the induction hypothesis gives us the expression of