 # Error estimates of the backward Euler-Maruyama method for multi-valued stochastic differential equations

In this paper, we derive error estimates of the backward Euler-Maruyama method applied to multi-valued stochastic differential equations. An important example of such an equation is a stochastic gradient flow whose associated potential is not continuously differentiable, but assumed to be convex. We show that the backward Euler-Maruyama method is well-defined and convergent of order at least 1/4 with respect to the root-mean-square norm. Our error analysis relies on techniques for deterministic problems developed in [Nochetto, Savaré, and Verdi, Comm. Pure Appl. Math., 2000]. We verify that our setting applies to an overdamped Langevin equation with a discontinuous gradient and to a spatially semi-discrete approximation of the stochastic p-Laplace equation.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

In this paper, we investigate the numerical approximation of multi-valued stochastic differential equations (MSDE). An important example of such equations is provided by stochastic gradient flows with a convex potential. More precisely, let and

be a filtered probability space satisfying the usual conditions. By

, , we denote a standard -adapted Wiener process. For instance, let us consider the numerical treatment of nonlinear, overdamped Langevin-type equations of the form

 (1.1) {dX(t)=−∇Φ(X(t))dt+g0dW(t),t∈(0,T],X(0)=X0,

where , , , and

are given. These equations have many important applications, for example, in Bayesian statistics and molecular dynamics. We refer to

[10, 22, 23, 44, 49] and the references therein.

We recall that, if the gradient is of superlinear growth, then the classical forward Euler–Maruyama method is known to be divergent in the strong and weak sense, see . This problem can be circumvented by using modified versions of the explicit Euler–Maruyama method based on techniques such as taming, truncating, stopping, projecting, or adaptive strategies, cf. [4, 6, 17, 19, 29, 48].

In this paper, we take an alternative approach by considering the backward Euler–Maruyama method. Our main motivation for considering this method lies in its good stability properties, which allow its application to stiff problems arising, for instance, from the spatial semi-discretization of stochastic partial differential equations. Implicit methods have also been studied extensively in the context of stochastic differential equations with superlinearly growing coefficients. For example, see

[1, 15, 16, 30, 31].

The error analysis in the above mentioned papers on explicit and implicit methods typically requires a certain degree of smoothness of such as local Lipschitz continuity. The purpose of this paper is to derive error estimates of the backward Euler–Maruyama method for equations of the form (1.1), where the associated potential is not necessarily continuously differentiable, but assumed to be convex.

For the formulation of the numerical scheme, let be the number of temporal steps, let be the step size, and let

 (1.2) π={0=t0<…

be an equidistant partition of the interval , where for . The backward Euler–Maruyama method for the Langevin equation (1.1) is then given by the recursion

 (1.3) {Xn=Xn−1−k∇Φ(Xn)+g0ΔWn,n∈{1,…,N},X0=X0,

where .

An example of a non-smooth potential is found by setting and , , for . Evidently, the gradient of is not locally Lipschitz continuous at for . Moreover, if , then the gradient

has a jump discontinuity of the form

 (1.4) ∇Φ(x)=⎧⎨⎩−1, if x<0,c, if x=0,1, if x>0.

Here, the value at is not canonically determined. We have to solve a nonlinear equation of the form in each step of the backward Euler method (1.3). However, if , then the sole candidate for a solution is , since otherwise . But is only a solution if . Therefore, the mapping is not surjective for any single-valued choice of .

This problem can be bypassed by considering the multi-valued subdifferential of a convex potential , which is given by

 ∂Φ(x)={v∈Rd:Φ(x)+⟨v,y−x⟩≤Φ(y)for all y∈Rd}.

Recall that if the gradient exists at in the classical sense. See [45, Section 23] for further details.

In the above example, one easily verifies that

 ∂Φ(x)=⎧⎨⎩{−1}, if x<0,, if x=0,{1}, if x>0.

This allows us to solve the nonlinear inclusion where we want to find with for any .

For this reason, we study the more general problem of the numerical approximation of multi-valued stochastic differential equations (MSDE) of the form

 (1.5) {dX(t)+f(X(t))dt∋b(X(t))dt+g(X(t))dW(t),t∈(0,T],X(0)=X0.

Here, we assume that the mappings and are globally Lipschitz continuous. Moreover, the multi-valued drift coefficient function is assumed to be a maximal monotone operator, cf. Definition 2.1 below. See also Section 4 for a complete list of all imposed assumptions on the MSDE (1.5). Let us emphasize that the subdifferential of a proper, lower semi-continuous and convex potential is an important example of a possibly multi-valued and maximal monotone mapping , cf. [45, Corollary 31.5.2].

The backward Euler–Maruyama method for the approximation of the MSDE (1.5) on the partition is then given by the recursion

 (1.6) {Xn∈Xn−1−kf(Xn)+kb(Xn)+g(Xn−1)ΔWn,n∈{1,…,N},X0=X0.

We discuss the well-posedness of this method (1.6) under our assumptions on , , and in Section 5. In particular, it will turn out that both problems, (1.5) and (1.6), admit single-valued solutions and , respectively.

The main result of this paper, Theorem 6.4, then states that the backward Euler–Maruyama method is convergent of order at least with respect to the norm in . For the error analysis we rely on techniques for deterministic problems developed in . An important ingredient is the additional condition on that there exists with

 ⟨fv−fz,z−w⟩≤γ⟨fv−fw,v−w⟩

for all and , , . This assumption is easily verified for a subdifferential of a convex potential, cf. Lemma 3.2. As already noted in  for deterministic problems, this inequality allows us to avoid Gronwall-type arguments in the error analysis for terms involving the multi-valued mapping .

Before we give a more detailed outline of the content of this paper let us mention that multi-valued stochastic differential equations have been studied in the literature before. The existence of a uniquely determined solution to the MSDE (1.5) has been investigated, e.g., in [7, 21, 41]. We also refer to the more recent monograph  and the references therein. In [14, 51] related results have been derived for multi-valued stochastic evolution equations in infinite dimension. The numerical analysis for MSDEs has also been considered in [3, 26, 42, 53, 55]. However, these papers differ from the present paper in terms of the considered numerical methods, the imposed conditions, or the obtained order of convergence.

Further, we also mention that several authors have developed explicit numerical methods for SDEs with discontinuous drifts in recent years. For instance, we refer to [9, 24, 25, 33, 34, 35, 36]. While these results often apply to more irregular drift coefficients, which are beyond the framework of maximal monotone operators, the authors have to employ more restrictive conditions such as the global boundedness of the drift, which is not required in our framework.

This paper is organized as follows: In Section 2 we fix some notation and recall important terminology for multi-valued mappings. In Section 3 we demonstrate how to apply the techniques from  to the simplified setting of the Langevin equation (1.1). In addition, we also show that if the gradient is more regular, say Hölder continuous with exponent , then the order of convergence increases to . Moreover, it turns out that the error constant does not grow exponentially with the final time . This is an important insight if the backward Euler method is used within an unadjusted Langevin algorithm , which typically requires large time intervals. See Theorem 3.7 and Remark 3.8 below.

In Section 4 we turn to the more general multi-valued stochastic differential equation (1.5). We state all assumptions imposed on the appearing drift and diffusion coefficients and collect some properties of the exact solution. In Section 5 we show that the backward Euler–Maruyama method (1.6) is well-posed under the assumptions of Section 4. In Section 6 we prove the already mentioned convergence result with respect to the root-mean-square norm. Finally, in Section 7 we verify that the setting of Section 4 applies to a Langevin equation with the discontinuous gradient (1.4). Further, we also show how to apply our results to the spatial discretization of the stochastic -Laplace equation which indicates their usability for the numerical analysis of stochastic partial differential equations. However, a complete analysis of the latter problem will be deferred to a future work.

## 2. Preliminaries

In this section, we collect some notation and introduce some background material. First we recall some terminology for set valued mappings and (maximal) monotone operators. For a more detailed introduction we refer, for instance, to [47, Abschn. 3.3] or [39, Chapter 6].

By , , we denote the Euclidean space with the standard norm and inner product . Let be a set. A set-valued mapping maps each to an element of the power set , that is, . The domain of is given by

 D(f)={x∈M:f(x)≠∅}.
###### Definition 2.1.

Let be a non-empty set. A set-valued mapping is called monotone if

 ⟨fu−fv,u−v⟩≥0

for all , , and .

Moreover, a set-valued mapping is called maximal monotone if is monotone and for all and satisfying

 ⟨y−fv,x−v⟩≥0 for all v∈D(f),fv∈f(v),

it follows that and .

Next, we recall a Burkholder–Davis–Gundy-type inequality. For a proof we refer to [28, Chapter 1, Theorem 7.1]. For its formulation we take note that the Frobenius or Hilbert–Schmidt norm of a matrix is also denoted by .

###### Lemma 2.2.

Let and be stochastically integrable. Then, for every with , the inequality

 E[∣∣∫tsg(τ)dW(τ)∣∣p] ≤(p(p−1)2)p2(t−s)p−22E[∫ts|g(τ)|pdτ]

holds.

Let us also recall a stochastic variant of the Gronwall inequality. A proof that can be modified to this setting can be found in . Compare also with .

###### Lemma 2.3.

Let be -adapted and almost surely continuous stochastic processes such that is a local -martingale with . Moreover, suppose that and are nonnegative. In addition, let be integrable and nonnegative. If, for all , we have

 Z(t)≤ξ(t)+∫t0φ(s)Z(s)ds+M(t),P-almost surely,

then, for every , the inequality

 E[Z(t)]≤exp(∫t0φ(s)ds)E[sups∈[0,t]ξ(s)]

holds.

Moreover, we often make use of generic constants. More precisely, by we denote a finite and positive quantity that may vary from occurrence to occurrence but is always independent of numerical parameters such as the step size and the number of steps .

## 3. Application to the Langevin equation with a convex potential

In order to illustrate our approach, we first consider a more regular stochastic differential equation with single-valued (Hölder) continuous drift term. More precisely, we consider the overdamped Langevin equation [23, Section 2.2]

 (3.1) {dX(t)=−∇Φ(X(t))dt+g0dW(t),t∈[0,T],X(0)=X0,

where , , and is a standard -valued Wiener process. In addition, we impose the following assumption on the potential .

###### Assumption 3.1.

Let be a convex, nonnegative, and continuously differentiable function.

In the following, we denote by the gradient of , that is . It is well-known that the convexity of implies the variational inequality

 (3.2) ⟨f(v),w−v⟩≤Φ(w)−Φ(v),v,w∈Rd,

see, for example, [45, § 23].

In the following lemma, we collect some properties of , which are direct consequences of Assumption 3.1. Both inequalities are well-known. The proof of (3.4) is taken from .

###### Lemma 3.2.

Under Assumption 3.1 and with , the inequalities

 (3.3) ⟨f(v)−f(w),v−w⟩≥0

and

 (3.4) ⟨f(v)−f(z),z−w⟩≤⟨f(v)−f(w),v−w⟩

are fulfilled for all .

###### Proof.

The first inequality follows directly from (3.2) since

 ⟨f(v)−f(w),v−w⟩ =−⟨f(v),w−v⟩−⟨f(w),v−w⟩ ≥−(Φ(w)−Φ(v))−(Φ(v)−Φ(w))=0

for all . For the proof of the second inequality we start by rewriting its left-hand side. For arbitrary we rearrange the terms to obtain

 ⟨f(v)−f(z),z−w⟩ =⟨f(v),z⟩−⟨f(v),w⟩+⟨f(z),w−z⟩±⟨f(v),v⟩ =⟨f(v),z−v⟩+⟨f(v),v−w⟩+⟨f(z),w−z⟩±⟨f(w),v−w⟩ =⟨f(v),z−v⟩+⟨f(v)−f(w),v−w⟩+⟨f(z),w−z⟩+⟨f(w),v−w⟩.

Setting for all , we see that

 ⟨f(v)−f(z),z−w⟩ =⟨f(v)−f(w),v−w⟩+Φ(z)−Φ(v)−σ(v,z) +Φ(w)−Φ(z)−σ(z,w)+Φ(v)−Φ(w)−σ(w,v) =⟨f(v)−f(w),v−w⟩−σ(v,z)−σ(z,w)−σ(w,v).

But (3.2) says that for all , which completes the proof. ∎

It follows from Assumption 3.1 and Lemma 3.2 that the drift of the stochastic differential equation (3.1) is continuous and monotone. Therefore, by [43, Thm. 3.1.1] the stochastic differential equation (3.1) has a solution in the strong (probabilistic) sense satisfying -a.s. for all

 (3.5) X(t)=X0−∫t0f(X(s))ds+g0W(t).

Moreover, the solution is unique up to -indistinguishability and it is square-integrable with

 supt∈[0,T]E[|X(t)|2]≤C(1+E[|X0|2]).

Next, we turn to the numerical approximation of the solution of (3.1). Recall that for a single-valued drift the backward Euler–Maruyama method is given by the recursion

 (3.6) {Xn=Xn−1−kf(Xn)+g0ΔWn,n∈{1,…,N},X0=X0,

where , and .

The next lemma contains some a priori estimates for the backward Euler–Maruyama method (3.6).

###### Lemma 3.3.

Let be given and let Assumption 3.1 be satisfied. For an arbitrary step size , , let be a family of

3.6). If , then

 (3.7) maxn∈{1,…,N}E[|Xn|2]≤E[|X0|2]+2T(Φ(0)+|g0|2)

and

 (3.8) N∑n=1E[|Xn−Xn−1|2]+4kN∑n=1E[Φ(Xn)]≤2E[|X0|2]+4T(Φ(0)+|g0|2).
###### Proof.

First, we recall the identity

 ⟨Xn−Xn−1,Xn⟩=12(|Xn|2−|Xn−1|2+|Xn−Xn−1|2).

Using also (3.6), we then get

 |Xn|2−|Xn−1|2+|Xn−Xn−1|2 =2⟨Xn−Xn−1,Xn⟩ =−2k⟨f(Xn),Xn⟩+2⟨g0ΔWn,Xn⟩,

for every . Hence, an application of (3.2) yields

 |Xn|2−|Xn−1|2+|Xn−Xn−1|2 ≤2k(Φ(0)−Φ(Xn))+2⟨g0ΔWn,Xn⟩,

for every . From applications of the Cauchy–Schwarz inequality and the weighted Young inequality we then obtain

 |Xn|2−|Xn−1|2+|Xn−Xn−1|2+2kΦ(Xn) ≤2kΦ(0)+2⟨g0ΔWn,Xn−Xn−1⟩+2⟨g0ΔWn,Xn−1⟩ ≤2kΦ(0)+2∣∣g0ΔWn∣∣2+12|Xn−Xn−1|2+2⟨g0ΔWn,Xn−1⟩,

for every .

The third term on the right-hand side is absorbed in the third term on the left-hand side. Summation then yields

 |Xn|2+12n∑j=1|Xj−Xj−1|2+2kn∑j=1Φ(Xj) ≤|X0|2+2tnΦ(0)+2n∑j=1∣∣g0ΔWj∣∣2+2n∑j=1⟨g0ΔWj,Xj−1⟩.

An inductive argument over then yields that is square-integrable due to the assumption . Therefore, after taking expectation the last sum vanishes. Moreover, an application of the Itō isometry then gives

 ≤E[|X0|2]+2tnΦ(0)+2n∑j=1E[∣∣g0ΔWj∣∣2] =E[|X0|2]+2tn(Φ(0)+|g0|2).

Since this is true for any the assertion follows. ∎

As the next theorem shows, Assumption 3.1 is also sufficient to ensure the well-posedness of the backward Euler–Maruyama method. The result follows directly from the fact that is continuous and monotone due to (3.3). For a proof we refer, for instance, to [4, Sect. 4], [38, Chap. 6.4], and [52, Theorem C.2]. The assertion also follows from the more general result in Theorem 5.3 below.

###### Theorem 3.4.

Let as well as be given and let Assumption 3.1 be satisfied. Then, for every equidistant step size , , there exists a uniquely determined family of square-integrable and -adapted random variables satisfying (3.6).

We now turn to an error estimate with respect to the -norm. Since we do not impose any (local) Lipschitz condition on the drift , classical approaches based on discrete Gronwall-type inequalities are not applicable. Instead we rely on an error representation formula, which was introduced for deterministic problems in .

For the formulation of this, we introduce the following notation: For a given equidistant partition with step size , we denote by

the piecewise linear interpolant of the sequence

generated by the backward Euler method (3.6). It is defined by and

 (3.9) X(t)=t−tn−1kXn+tn−tkXn−1,for all t∈(tn−1,tn],n∈{1,…,N}.

In addition, we introduce the processes , which are piecewise constant interpolants of and defined by and

 (3.10) ¯¯¯¯X(t)=XnandX––(t)=Xn−1, for all t∈(tn−1,tn],n∈{1,…,N}.

Analogously, we define the piecewise linear interpolated process by and

 (3.11) W(t)=t−tn−1kW(tn)+tn−tkW(tn−1)=W(tn−1)+t−tn−1kΔWn,

for all , .

We are now prepared to state Lemma 3.5. The underlying idea of this lemma was introduced in , where it is used to derive a posteriori error estimates for the backward Euler method. In fact, in the absence of noise, only the first term on the right-hand side of (3.12) is non-zero. In  this term is used as an a posteriori error estimator, since it is explicitly computable by quantities generated by the numerical method.

###### Lemma 3.5.

Let as well as be given and let Assumption 3.1 be satisfied. Let , , be an arbitrary equidistant step size and let , . Then, for every the estimate

 (3.12) E[|X(tn)−Xn|2]≤kn∑i=1E[⟨f(Xi)−f(Xi−1),Xi−Xi−1⟩]+2∫tn0E[⟨f(¯¯¯¯X(t))−f(X(t)),g0(W(t)−W(t))⟩]dt

holds, where and are the solutions of (3.1) and (3.6), respectively.

###### Proof.

From (3.6) we directly deduce that for every

 Xn=X0−kn∑i=1f(Xi)+g0W(tn).

Then, one easily verifies for all , , that

 X(t)=X0−∫t0f(¯¯¯¯X(s))ds+g0W(t).

Hence, due to (3.5) the error process fulfills

 (3.13)

for all . Here, we have , since is an interpolant of . Hence, for all ,

 (3.14) |E(tn)|2=|E1(tn)|2.

To estimate the norm of , we first note that has absolutely continuous sample paths with . Hence,

 12ddt|E1(t)|2=⟨˙E1(t),E1(t)⟩

is fulfilled for almost all . Therefore, by integration with respect to , we get

 (3.15) 12|E1(tn)|2=∫tn0⟨˙E1(t),E1(t)⟩dt=∫tn0⟨˙E1(t),E(t)⟩dt−∫tn0⟨˙E1(t),E2(t)⟩dt.

Next, we write

 X(t)=t−tn−1k¯¯¯¯X(t)+tn−tkX––(t),t∈(tn−1,tn],

and use (3.3) and (3.4) to obtain, for almost every , that

 ⟨˙E1(t),E(t)⟩ =⟨f(¯¯¯¯X(t))−f(X(t)),X(t)−X(t)⟩ =t−tn−1k⟨f(¯¯¯¯X(t))−f(X(t)),X(t)−¯¯¯¯X(t)⟩ +tn−tk⟨f(¯¯¯¯X(t))−f(X(t)),X(t)−X––(t)⟩ =tn−tk⟨f(Xn)−f(Xn−1),Xn−Xn−1⟩.

Furthermore, the expectation of the second integral on the right-hand side of (3.15) is equal to

 E[∫tn0⟨˙E1(t),E2(t)⟩dt]

Therefore,

 E[|E1(tn)|2] =2∫tn0E[⟨˙E1(t),E(t)⟩]dt−2∫tn0E[⟨˙E1(t),E2(t)⟩]dt ≤2n∑i=1∫titi−1ti−tkdtE[⟨f(Xi)−f(Xi−1),Xi−Xi−1⟩] +2∫tn0E[⟨f(¯¯¯¯X(t))−f(X(t)),g0(W(t)−W(t))⟩]dt.

Since the assertion follows. ∎

The next lemma contains an estimate of the difference between the Wiener process and its piecewise linear interpolant .

###### Lemma 3.6.

For every and every step size , , the equality

 (3.16) (∫T0E[|g0(W(t)−W(t))|2]dt)12=1√6T12|g0|k12

holds.

###### Proof.

From the definition (3.11) of it follows that

 ∫T0E[|g0(W(t)−W(t))|2]dt =1k2N∑n=1(∫tntn−1|g0|2(tn−t)2(t−tn−1)dt+∫tntn−1|g0|2(t−tn−1)2(tn−t)dt),

where we used that the two increments of the Wiener process are independent for every , , and we also applied Itō’s isometry. By symmetry of the two terms it then follows that

 ∫T0E[|g0(W(t)−W(t))|2]dt =16T|g0|2k,

and the proof is complete. ∎

The error estimates in Lemma 3.5 and Lemma 3.6 allow us to determine the order of convergence of the backward Euler–Maruyama method without relying on discrete Gronwall-type inequalities. The following theorem imposes the additional assumption that the drift is Hölder continuous. We include the parameter value , which simply means that is continuous and globally bounded. The case of less regular is treated in Section 6.

Observe that we recover the standard rate if , that is, if the drift is assumed to be globally Lipschitz continuous. Compare also with the standard literature, for example, [20, Chap. 12] or [32, Sect. 1.3].

For processes and exponents , we define the family of Hölder semi-norms by

###### Theorem 3.7.

Let as well as be given, let Assumption 3.1 be fulfilled and let be Hölder continuous with exponent , i.e., there exists such that

 |f(x)−f(y)|≤Lf|x−y|α, for all x,y∈Rd.

Then there exists such that for every step size , , the estimate

 maxn∈{0,…,N}∥X(tn)−Xn∥L2(Ω;Rd)≤Ck1+α4

holds, where and are the solutions to (3.1) and (3.6), respectively.

###### Proof.

Since is assumed to be -Hölder continuous it follows that

 |f(x)|≤max(Lf,|f(0)|)(1+|x|α),for all x∈Rd.

In particular, grows at most linearly. Therefore, as stated in [28, Chap. 2, Thm 4.3], the solution of (3.1) satisfies .

We will use Lemma 3.5 to prove the error bound. To this end, we first show that

 (3.17) kN∑i=1E[⟨f(Xi)−f(Xi−1),Xi−Xi−