 # Deep Euler method: solving ODEs by approximating the local truncation error of the Euler method

In this paper, we propose a deep learning-based method, deep Euler method (DEM) to solve ordinary differential equations. DEM significantly improves the accuracy of the Euler method by approximating the local truncation error with deep neural networks which could obtain a high precision solution with a large step size. The deep neural network in DEM is mesh-free during training and shows good generalization in unmeasured regions. DEM could be easily combined with other schemes of numerical methods, such as Runge-Kutta method to obtain better solutions. Furthermore, the error bound and stability of DEM is discussed.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Many problems in science and engineering can be modeled into a set of ordinary differential equations (ODEs)

 G(x,y,y′,y′′,⋯)=0,x∈[a,b]⊂R.

In most cases, it can not be easy to obtain the analytic solution and so one must typically rely on a numerical scheme to accurately approximate the solution. The important issues confronting the numerical study appear in the initial value problems since higher-order ODEs can be converted into the system of the first-order ODEs. Basic methods for initial value problems are the extremely popular Euler method or the Runge-Kutta method. However, numerical methods have often to balance the discretization step size and computation time. Furthermore, the class of stiff ordinary differential equations may still present a more serious challenge to numerical computation.

In recent years, there has been a growing interest in solving the differential equations and the inverse problems by deep learning. The works include numerical solutions of ODEs and PDEs ([rudy2019deep], [raissi2018deep], [sun2019neupde], [farimani2017deep]), recovery of the involving systems ([both2019deepmod], [khoo2017solving], [khoo2019switchnet]), overcoming the curse of dimension of high-dimensional PDEs ([han2017overcoming], [hutzenthaler2018overcoming]), uncertainty quantification ([tripathy2018deep], [winovich2019convpde]) etc. Besides, several works have focused on the combination of traditional numerical methods and deep neural networks. ([sirignano2018dgm]

) proposed a merger of Galerkin methods and deep neural networks (DNNs) to solve high-dimensional partial differential equations (PDEs). They trained DNNs to satisfy the differential operator, initial condition, and boundary conditions. (

[raissi2019physics]) introduced physics-informed neural networks (PINNs), which is a deep learning framework for the synergistic combination of mathematical models and data. Following the physical laws of the control dynamics system, PINNs can deduce the solution of PDE and obtain the surrogate model. ([weinan2018deep]) presented a deep Ritz method for the numerical solution of variational problems based on the Ritz method. ([he2018relu]

) theoretically analyzed the relationship between DNN and finite element method(FEM). They explored the ReLU DNN representation of a continuous piecewise linear basis function in the finite element method. (

[long2019pde]) proposed PDE-Net to predict the dynamics of complex systems. The underlying PDEs can be discovered from the observation data by establishing the connections between convolution kernels in CNNs and differential operators. Based on the integral form of the underlying dynamical system, ([qin2019data]) considered ResNet block as a one-step method and recurrent ResNet and recursive ResNet as multi-step methods. ([wu2020data]) approximated the evolution operator by a residual network to solve and recover unknown time-dependent PDEs. ([raissi2018multistep]) blended the multi-step schemes with deep neural networks to identify and forecast nonlinear dynamical systems from data. ([regazzoni2019machine]) proposed neural networks based Model Order Reduction technique to solve dynamical systems arising from differential equations. ([wang2019learning]

) used reinforcement learning to empower Weighted Essentially Non-Oscillatory Schemes(WENO) for solving 1D scalar conservation laws.

It is well known that the forward Euler method is very easy to implement but it can’t give accurate solutions. The main reason is that the Euler method has only one order approximation accuracy, which requires a very small step size for any meaningful result. This makes the Euler method rarely used in practical applications and motivates us to propose a new Euler method combined with DNNs. We call the new method as deep Euler method (DEM). DEM only has the most general structure of a fully connected neural network, without any special designs in its structure, such as residual connections. As with some other deep neural network models, DEM also learns its representation using supervised pre-training. After the neural network gets trained satisfactorily, we post-process it to predict the solution of the ODE. The key difference is that in DEM, we explicitly capture information of the local truncation error of the Euler method instead of directly approaching the solution of the ODE. DEM has achieved state-of-the-art performance in solving ODEs, which is much better than the conventional numerical method, especially than the classical Euler method. This success can be attributed to the ability of the deep neural network in learning very strong hierarchical nonlinear representation. In particular, breakthroughs in supervised learning training are essential for deep neural networks to effectively and robustly predict.

The paper is organized as follows. We introduce the main idea of DEM in section 2 and give theoretical results of DEM in section 3. Based on DEM, we also derive other schemes of the single-step method for solving ODEs in section 4. In Section 5, numerical examples are given to demonstrate the capability and effectiveness of DEM. Finally, we conclude the paper in section 6.

## 2 Deep Euler Method

### 2.1 Formulation

Considering the following initial value ordinary differential equation:

 {dydx=f(x,y),x∈I=[a,b],y(a)=c, (1)

where the solution and satisfies the Lipschitz condition in , i.e.,

 ∥f(x,y1)−f(x,y2)∥

We introduce the discretization mesh (or sampling points) in ,

 a=x0

Let be the mesh size and be the numerical approximation of . The forward Euler method for (1) is

 {ym=ym−1+hm−1f(xm−1,ym−1),m=1,⋯,My0=c. (2)

Note that in most cases, we always adopt the uniform mesh for the forward Euler method, and , . The local truncation error and the global error are defined as

 Rm=y(xm+1)−y(xm)−hf(xm,y(xm))=∫xm+1xmf(x,y(s))ds−hf(xm,y(xm)). (3)

and

 e=|y(xm)−ym|,

respectively. It is well known that and ([Stig2008book]).

To obtain higher accuracy than the Euler method, a direct scheme is to separate a part of to improve the Euler step , so as to reduce the local truncation error and the global error of Euler method. To this end, we introduce a feedforward neural network in DEM that infers the update of an Euler step. As universal approximators, multilayer fully connected feedforward neural networks can approximate any continuous function arbitrarily ([leshno1993multilayer]). From (3), we could consider as a continuous function of variables . Thus, we utilize the fully connected neural network trained with enough measured data to approximate .

Let be the nonlinear operator defined by a multilayer fully connected neural network. The parameter includes all the weights and the biases in the neural network. DEM for (1) can be written as

 {ym+1=ym+hmf(xm,ym)+h2mN(xm,xm+1,ym;θ),m=0,⋯,M−1,y0=c. (4)

Formula (4) consists of Euler approximation and neural network approximation. The first part makes full use of the information of to express the linearity of ODE. The latter corrects the results of Euler approximation to obtain higher accuracy and express nonlinear features. Abstractly, can be thought of as a parametric function that learns how to represent the local truncation error so that their most salient characteristics can be reconstructed from its inputs and outputs. The output of

contains all features extracted in training to update the formula of the Euler method. Compared with using a neural network to approximate the solution of ODE directly, DEM separates the nonlinear part from the numerical scheme and then makes full use of neural networks to approximate the local truncation error of Euler method. Moreover,

has the same function as the nonlinear denoising process. This provides a very powerful and flexible method for solving ODEs because we can impose high order error correction and reduce the constrain of the step size in the Euler method. Hence, DEM can either improve the accuracy of the Euler method or speed up the computations of ODEs.

There are underlying principles for designing neural network architecture. In fact, we have another design of neural network . The output of the neural network is still an approximation of the local truncation error, while the input only has and . In this case, if , the neural network becomes very difficult to train. Because the dimension of the output is much larger than that of the input, it is almost impossible for the neural network to predict the sophisticated target with such few features.

### 2.2 Details of DEM

DEM is only a standard multilayer fully connected neural network, without any special designs in its structure, such as residual connections. With the input , the neural network in DEM can be written as

 N(x;θ)=LK∘σ∘LK−1⋯σ∘L1(x).

The nonlinear activation function

is rectified linear units (ReLU) function. The

-th hidden layer has the following form

 Lk(z)=Wkz+bk,1≤k≤K,

where the weight matrix , the bias ,

is the number of neurons in the

-th layer.

We assume that the measurement data is contaminated by noise, so that the training dataset has the form and satisfies

 1NN∑j=0δ2j≤δ2,

where the scalar is called the noise level.

For any pair of measurements (), we introduce the local truncation error function

 R(xi,xj,zi,zj)=1(Δx)2[zj−zi−Δxf(xi,zi)], (5)

where . With the following supervised loss

 J(θ)=2N(N−1)∑1≤i,j≤N∥N(xi,xj,zi;θ)−R(xi,xj,zi,zj)∥L1, (6)

DEM learns to approximate the local truncation error of the Euler method. The coefficient comes from the number of pairs in dataset .

Note that for any input pair , each of the training captures the features in the local truncation error, which have close relations with . Once features extractors corresponding to all pairs are trained and the strong hierarchical non-linear representations are generated, any new () is then represented by (4). On the other hand, DEM is mesh-free because all training data can be generated randomly and not necessary to locate at mesh points. Moreover, recalling the local truncation error of the Euler method is proportional to the square of the step size, we have . Hence, the neural network of DEM approaches a non-linear continuous function of , which is much easier than directly approximating the solution of the ODE. This makes DEM easier to train and requires fewer data in training.

## 3 Theoretical Analysis

### 3.1 Error Bound

###### Lemma 1.

Assume that the trained neural network satisfies

 ∣∣N(xm,xm+1,zm;θ)−R(xm,xm+1,zm,zm+1)∣∣

If and , then

###### Proof.

From Lemma 4 in [xu2012robustness], we have the conclusion that the neural network in DEM is Lipschitz continuous. That is, for any ,

 |N(x1;θ)−N(x2;θ)|≤LN∥x1−x2∥,

where , , , is the number of layers of the neural network. From

 ∣∣N(xm,xm+1,y(xm);θ)−N(xm,xm+1,zm;θ)∣∣ ≤ LN|y(xm)−zm| ≤ CLNδ

and

 ∣∣∣R(xm,xm+1,zm,zm+1)−1h2Rm∣∣∣ = 1h2∣∣[zm+1−y(xm+1)]−[zm−y(xm)]−h[f(xm,zm)−f(xm,y(xm))]∣∣ ≤ 2Cδh2+Lδh ≤ O(η),

we have

 ∣∣∣N(xm,xm+1,y(xm);θ)−1h2Rm∣∣∣ < ∣∣N(xm,xm+1,y(xm);θ)−N(xm,xm+1,zm;θ)∣∣ + ∣∣∣R(xm,xm+1,zm,zm+1)−1h2Rm∣∣∣ + ∣∣N(xm,xm+1,zm;θ)−R(xm,xm+1,zm,zm+1)∣∣ ≤ O(η)

For each pair of measurements , we use (5) to construct the training samples of the neural network. In the proof of Lemma 1, we have known that is smaller than a quantity which contains a factor . This indicates the big is a good choice. Therefore, we will only select the measurement pair with the large to construct training samples.

###### Theorem 1.

Under the assumptions of Lemma 1, the local truncation error of DEM is and the global truncation error is .

###### Proof.

From (4), the local truncation error (LTE) of DEM is

 LTE = ∣∣y(xm+1)−y(xm)−hf(xm,y(xm))−h2N(xm,xm+1,y(xm);θ)∣∣ ≤ h2∣∣∣N(xm,xm+1,y(xm);θ)−1h2Rm∣∣∣ < O(ηh2).

Hence, we can also conclude that the global truncation error is . ∎

Compared with the Euler method, the errors of DEM are reduced by times. The solution with high accuracy can be obtained. Besides, the size constrains of in the Euler method can be relaxed to speed up the computation of the solutions. For example, if the global error should be , the step size in the Euler method is at most . Starting from the initial , it takes Euler steps to get the solution . If , the step size in DEM can be and the number of steps can be reduced to , which is much smaller than the Euler method.

### 3.2 Numerical stability

###### Theorem 2.

Under the assumptions of Theorem 1, DEM is stable.

###### Proof.

For the initial values and (), DEM generates two approaches and with

 (7)

Since and , we have

 |ym−zm| ≤ |ym−1−zm−1|+hm−1|f(xm−1,ym−1)−f(xm−1,zm−1)| +h2m−1LN|ym−1−zm−1| ≤ (1+hm−1L+h2m−1LN)|ym−1−zm−1| ≤ m−1∏n=0(1+hnL+h2nLN)|y0−z0| ≤ C|y(0)−z(0)|

where is constant.

Considering the stiff equation , where and the initial value . The stability domain of DEM is , while the Euler method is . Although it can not be proved theoretically that the former must be larger than the latter, DEM can use a large step in numerical experiments. Under such a step size, the forward Euler method is certainly unstable. On the other hand side, a large stability domain can be obtained by adjusting . Recall that , where is determined by the activation function. If ReLU activation function is adopted then . We can change the norm of weight matrix by using the techniques of weight clipping ([salimans2016weight]) and weight normalization ([arjovsky2017wasserstein]), to adjust . For example, when and , The stability domain of DEM is , while the Euler method is . Hence we can use a larger step size to solve the equation than the Euler method.

## 4 Single Step methods

Based on the idea of approximating the local truncation error with a deep neural network, DEM could be generalized to other linear single-step methods. For instance, Heun’s method

 ym+1=ym+h2[f(xm,ym)+f(xm+1,ym+hf(xm,ym))]

is a second-order Runge-Kutta method. We can also add a deep neural network in it and get

 ym+1=ym+h2[f(xm,ym)+f(xm+1,ym+hf(xm,ym))]+h3mN(xm,xm+1,ym;θ). (8)

More generally, a order single-step method of (1) can be written as:

 {ym+1=ym+hψ(xm,ym,h),y0=c. (9)

The local truncation error is

 Rm=ym+1−ym−hψ(xm,ym,h)=O(hp+1).

The method (9) can also be modified as

 {ym+1=ym+hψ(xm,ym,h)+hp+1N(xm,xm+1,ym;θ),y0=c. (10)

For (8) and (10), the local truncation errors are and , respectively. The corresponding global truncation errors are and .

## 5 Numerical Example

### 5.1 Example 1

Considering the following initial value problem:

 {dydx=32yx+1+√x+1,x∈[0,10]y(0)=0 (11)

where the exact solution is .

At first, we highlight the performance of DEM with different step sizes with noise-free measured data. We generate 200 random noise-free measured data

, by sampling from a uniform distribution

, then we train the deep neural network

by minimizing the loss function of (

6). All norms used in this paper are norm. The neural network, with layers,

neurons and the ReLU activation function in each layer, is trained for 50 epochs, optimized with Adam. All the learning rate in this paper is set to be

. The same neural network architecture is used in Deep Heun’s method (8) for comparison. Figure 1 shows the evolution of the trained in DEM and the local truncation error function in the Euler method. The four different step sizes, that is , and are displayed. Since it is trained in , almost coincides with in .

Figure 2 shows the exact solution and four approximations of (11). The four different step sizes are also displayed. It is observed that only in the small step can the Euler method and the Huen obtain a more accurate approximation of the solution. We also note the fact that is only trained in . However, in , DEM and DHM can get the accurate approximation of the solution even for the bigger step size . This indicates the efficient prediction of the deep neural network.

In Table 1, we discuss the results of the comparison among four methods for the different step sizes. The Euler method, the Heun’s method, DEM and DHM are adopted for solving (11). The first four columns of Table 1 show the prediction errors between the exact solution and the approximation in norm, i.e. . Since the global truncation error of Deep Euler Method and Deep Heun’s method are and . That is the reason that when

, DEM and DHM get more accuracy than the Euler method and the Heun method. Our goal is to get the estimates of

. In view of Lemma 1, . In the column of , we present the mean of the difference between and , which is . In the last column, we present the results of DEM (the fourth column) divided by the result of the Euler method (the second column), i.e., . From the last two columns, we can conclude that , which also indicates the efficiency of DEM.

Table 2 shows for different network architectures (the number of hidden layers and neurons) and the different number of random measured points. We evolution the neural networks with . To avoid uncertainty during the training process, we simulate each case ten times and take the average value of them. It can be observed that the prediction accuracy increase with the number of measured points. If the networks with too small layers and neurons per layer (such as layers and neurons per layer), it is not suitable in the case of a small number of measured data since it has a high bias in the training region

and a high variance in the testing region

. If the networks with the more layers and neurons per layer (such as layers and neurons per layer), it may be overfitted in the case of the small number of measured data since it has low bias and high variance. More measured data can reduce the variance. We choose the networks with layers and neurons per layer for all experiments. Because no matter in a small amount or a large number of measured data, it has low variance and low bias than others.

Figure 3 shows the DEMs and the local truncation error function together for comparison. Each of DEM is trained for one case of noise levels (). The four different step sizes, that is , , are displayed. It can be observed from the subfigures that the change of step size has little effect on the result. In fact, from Table 3, we also noted that. When the step size changes from to , under various noise levels () are almost the same. However, obviously, is the smallest in the case of the smallest and the lest noise level (noise-free).

### 5.2 Example 2

We now consider a system of first-order nonlinear differential equations, the Lotka-Volterra equation ([lotka1925elements]),

 ⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩dy1dx=αy1−βy1y2,dy2dx=−γy2+δy1y2. (12)

This equation describes the dynamics of the populations of two species, one as a predator and the other as prey. In (12), and are the number of prey and predator, respectively. Let , represent time and be the parameters describing the relationship of two species. In this work, we take , and the initial conditions

For comparison, the exact solution is gotten by the numerical method of in scipy.integrate [2020SciPy-NMeth] with relative tolerances. We sample 1000 random points from a uniform distribution to construct the noise-free training dataset . The deep neural network with hidden layers and neurons per layer is trained with super parameters the same as example 1.

Figure 4 shows the neural network of DEM and the local truncation error function in both the training region and the testing region . The four different cases of are displayed. It shows that DEM can accurately approximate the solution in the whole region even for . In Figure 5, we show the evolutions of the Euler method, the Huen method and DEM for comparison. For the cases of and , the results of the Euler method over are displayed in the left sub-figure. For the case of , the solutions of the Euler method diverge from the exact value very much near . When , the solution of the Euler method is close to , but the solution of is near , both of which are far away from the exact value. The Huen method has similar defects. When , the solutions of the Huen method are close to , which is far away from the exact value. It can be observed that no matter the small or the big , the results of DEM and the curve of the exact solution almost coincide.

### 5.3 Example 3

Considering the Kepler problem :

 ⎧⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩dy1dx=y3,dy2dx=y4,dy3dx=−y1(y21+y22)3/2,dy4dx=−y2(y21+y22)3/2. (13)

It describes the motion of the sun and a single planet which is a special case of the two-body problem. We denote the time by. Let be the positions of the planet in rectangular coordinates centered at the sun and be the velocity components in the and directions. The initial value are , and the exact solution is .

Similar to example 2, we use the uniform distribution to generate noise-free data points and select the same values of the super parameters as in Example 1. In Figure 6, four different cases of are displayed. It can be observed that of DEM well approximates for all of . We can also note that the approximations of DEM (circles in the figure) almost coincides with the curve of the exact solution in Figure 7, even for .

## 6 Conclusion

In this work, we proposed a Deep Euler Method with the idea of approximating the truncation error in the Euler method via deep learning. When deep neural network is trained to approximate the local truncation error function with accuracy , the global truncation error of DEM with the step size would be while the Euler method is only . Since can be small enough, it could achieve high accuracy solutions even with a big step size (). DEM significantly improves the accuracy of the Euler method and reduces the constrain of the step size in the Euler method. On the other hand, since the training objective function of the deep neural network in DEM is always , the deep neural network can be easily trained and fast to converge, even if only the simplest architecture of the fully connected network and only a few training data are used. Moreover, DEM shows good robustness with the noise of the measured data.