Matrix Lie Maps and Neural Networks for Solving Differential Equations

08/16/2019 · by Andrei Ivanov, et al.

The correspondence between polynomial neural networks and matrix Lie maps is discussed in the article. The matrix form of the Lie transform is an approximation of the general solution of a nonlinear system of ordinary differential equations. It can be used for solving systems of differential equations more efficiently than traditional step-by-step numerical methods. Implementation of the Lie map as a polynomial neural network provides a tool for both simulation and data-driven identification of dynamical systems. If the differential equation is provided, training the neural network is unnecessary: the weights of the network can be calculated directly from the equation. On the other hand, for data-driven system learning, the weights can be fitted without any assumptions about the underlying differential equations. The proposed technique is discussed with examples of both ordinary and partial differential equations. The building of a polynomial neural network that simulates the Van der Pol oscillator is discussed; for this example, we consider learning the dynamics from a single solution of the system. We also demonstrate the building of a neural network that describes the solution of Burgers' equation, a fundamental partial differential equation.


1 Introduction

Traditional methods for solving systems of differential equations imply numerical step-by-step integration of the system. For some problems, this integration leads to time-consuming algorithms because of the limitations on the time step that must be respected to achieve the necessary accuracy of the solution. From this perspective, neural networks, as universal function approximators, can be applied for the construction of the solution in a more performant way.

In the article [1], a method to solve initial and boundary value problems using feedforward neural networks is proposed. The solution of the differential equation is written as a sum of two parts. The first part satisfies the initial/boundary conditions; the second part corresponds to the neural network output. The same technique is applied to the Stokes problem in [2, 3] and implemented in code in [4].

In the article [5], the neural network is trained to satisfy the differential operator, the initial condition, and the boundary conditions of the partial differential equation (PDE). The authors in [6] translate a PDE into a stochastic control problem and use deep reinforcement learning for an approximation of the derivative of the solution with respect to the space coordinate.

Other approaches rely on the implementation of a traditional step-by-step integration method in a neural network basis [7, 8]. In the article [8], the author proposes such an architecture; after fitting, the neural network produces an optimal finite difference scheme for a specific system. A backpropagation technique through an ordinary differential equation (ODE) solver is proposed in [9], where the authors construct a certain type of neural network that is analogous to a discretized differential equation. This group of methods requires a traditional numerical method to simulate dynamics.

Polynomial neural networks are also widely presented in the literature [10, 11, 12]. In the article [10], a polynomial architecture that approximates differential equations is proposed. The Legendre polynomial is chosen as a basis function of hidden neurons in [11]. In these articles, the polynomial architectures are used as black-box models, and the authors do not explain their connection to the theory of ODEs.

In all the described approaches, the neural networks are trained to account for the initial conditions of the differential equations. This means that the neural network must be retrained each time the initial conditions change. The above-described techniques are applicable to differential equations of general form but are able to provide only a particular solution of the system.

In this article, we consider polynomial differential equations. Such nonlinear systems arise in fields such as automated control, robotics, mechanical and biological systems, chemical reactions, drug development, and molecular dynamics. Moreover, it is often possible to transform a nonlinear equation into polynomial form with some level of accuracy.

For polynomial differential equations, it is possible to build a polynomial neural network that is based on the matrix Lie transform and approximates the general solution of the system of equations. Having a Lie transform–based neural network for such a system, dynamics for different initial conditions can be estimated without refitting the neural network. Additionally, we completely avoid numerical ODE solvers in both simulation and data-driven system learning by describing the dynamics with maps instead of step-by-step integration.

2 Proposed Neural Network

The proposed architecture is a neural network representation of a Lie propagator for dynamical systems integration that was introduced in [13] and is commonly used in charged particle dynamics simulations [13, 14]. We consider dynamical systems that can be described by nonlinear ordinary differential equations,

$$\frac{d}{dt}X = F(t, X) = \sum_{k=0}^{\infty} P^{1k}(t)\, X^{[k]}, \qquad (1)$$

where $t$ is an independent variable, $X$ is a state vector, and $X^{[k]}$ means the $k$-th Kronecker power of vector $X$. There is an assumption that the function $F$ can be expanded in a Taylor series with respect to the components of $X$.

The solution of (1) in its convergence region can be presented in the series [15, 16],

$$X(t) = \sum_{k=0}^{\infty} M^{1k}(t)\, X_0^{[k]}, \qquad (2)$$

where $X_0 = X(t_0)$. In [14], it is shown how to calculate the matrices $M^{1k}$ by introducing new matrices $M^{ik}$. The main idea is replacing (1) by the equation

$$\frac{d}{dt} M^{ik} = \sum_{j} P^{ij}(t)\, M^{jk}. \qquad (3)$$

This equation should be solved with the initial condition $M(t_0) = I$, where $I$ is the identity matrix. Theoretical estimations of the accuracy and convergence of the truncated series in solving ODEs can be found in [17].

The transformation $\mathcal{M}: X_0 \mapsto X(t_0 + T)$ can be considered as a discrete approximation of the evolution operator of (1) for initial time $t_0$ and time interval $T$. This means that the evolution of the state vector during time $T$ can be approximately calculated as $X(t_0 + T) \approx \mathcal{M} \circ X_0$. Hence, instead of solving the system of ODEs numerically, one can apply a calculated map and avoid step-by-step integration.
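To make this concrete, here is a minimal NumPy sketch of how a truncated polynomial map acts on a state vector. The function names and the use of full (non-reduced) Kronecker powers are our illustrative choices, not the authors' implementation:

```python
import numpy as np

def kron_power(x, k):
    """k-th Kronecker power of a vector: x^[0] = [1], x^[1] = x, x^[2] = x ⊗ x, ..."""
    out = np.ones(1)
    for _ in range(k):
        out = np.kron(out, x)
    return out

def apply_lie_map(weights, x):
    """Apply a truncated matrix Lie map: X -> W_0 + W_1 X + W_2 X^[2] + ...

    `weights[k]` plays the role of the matrix acting on the k-th Kronecker power."""
    return sum(W @ kron_power(x, k) for k, W in enumerate(weights))
```

For example, with `weights = [np.zeros((2, 1)), np.eye(2)]` the map reduces to the identity transformation of a two-dimensional state.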

2.1 Neural Network Representation of Matrix Lie Transform

The proposed neural network implements the map $\mathcal{M}$ in the form

$$\mathcal{M} \circ X = W_0 + W_1\, X + W_2\, X^{[2]} + \ldots + W_m\, X^{[m]}, \qquad (4)$$

where the $W_i$ are weight matrices and $X^{[k]}$ means the $k$-th Kronecker power of vector $X$. For a given system of ODEs (1), one can compute the matrices $W_i$ in accordance with (3) up to the necessary order of nonlinearity.

Fig. 1 presents a neural network for map (4) up to the third order of nonlinearity for a two-dimensional state. In each layer, the input vector $X$ is consecutively transformed into $X^{[2]}$ and $X^{[3]}$, and a weighted sum is applied. The output $Y$ equals the sum of the results from every layer. In the example, we use reduced Kronecker powers to decrease the dimension of the weight matrices (e.g., the duplicate monomials $x_1 x_2$ and $x_2 x_1$ are merged into a single component).

Figure 1: Neural network representation of third order matrix Lie map.
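The reduced Kronecker power used in Fig. 1 can be sketched in a few lines of NumPy; `reduced_kron_power` is a hypothetical helper, not part of the authors' repository:

```python
import itertools
import numpy as np

def reduced_kron_power(x, k):
    """Reduced k-th Kronecker power: keep one copy of each distinct monomial
    (e.g. x1*x2 and x2*x1 collapse into a single component)."""
    return np.array([np.prod(c)
                     for c in itertools.combinations_with_replacement(x, k)])
```

For a two-dimensional state this shrinks the second power from 4 components to the 3 distinct monomials $x_1^2$, $x_1 x_2$, $x_2^2$.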

2.2 Fitting Neural Network

If the differential equation is provided, training the neural network is not necessary: the weights of the network can be calculated directly from the equation following relation (3). On the other hand, for data-driven system learning, the weights in the form (4) can be fitted without any assumptions about the underlying differential equations.

To fit the proposed neural network, the training data is presented as a multivariate time series (Table 1) that describes the evolution of the state vector of the dynamical system in discrete time. In the general case, each step should be described by its own map $\mathcal{M}_i$, but if the system (1) is time independent, then the weights depend only on the time interval $\Delta t = t_{i+1} - t_i$.

INPUT:  $X(t_0)$, $X(t_1)$, …, $X(t_{m-1})$
OUTPUT: $X(t_1)$, $X(t_2)$, …, $X(t_m)$

Table 1: Discrete states of a dynamical system for training the proposed network.
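Because map (4) is linear in its weights, noise-free input/output pairs like those in Table 1 even admit a direct least-squares fit; an iterative optimizer (as used by the authors in Keras) is only one option. The sketch below is ours, under the assumption of full Kronecker-power features:

```python
import numpy as np

def features(x, order):
    """Concatenated Kronecker powers [1, X, X^[2], ..., X^[order]] of one state."""
    p = np.ones(1)
    feats = [p]
    for _ in range(order):
        p = np.kron(p, x)
        feats.append(p)
    return np.concatenate(feats)

def fit_map(X_in, X_out, order):
    """Least-squares fit of the stacked weight matrices in (4)
    from (state, next state) training pairs."""
    Phi = np.stack([features(x, order) for x in X_in])
    W, *_ = np.linalg.lstsq(Phi, X_out, rcond=None)
    return W

def predict(W, x, order):
    """One step of the fitted map: feature vector times stacked weights."""
    return features(x, order) @ W
```

On data generated by an exactly polynomial map, this recovers the weights to numerical precision; for noisy data a regularized or gradient-based fit would be the natural replacement.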

3 Ordinary Differential Equations

In this section, we consider the Van der Pol oscillator. The equation is widely used in the physical sciences and engineering and can describe, for example, the pneumatic hammer, the steam engine, the periodic occurrence of epidemics, economic crises, depressions, and the heartbeat. The equation has well-studied dynamics and is widely used for testing numerical methods (e.g., [18]).

3.1 Simulation of the Van der Pol Oscillator

The Van der Pol oscillator is defined by a system of ODEs that can be presented in the form

$$\frac{dx_1}{dt} = x_2, \qquad \frac{dx_2}{dt} = \mu\,(1 - x_1^2)\, x_2 - x_1. \qquad (5)$$

The results of numerical integration of the system with the implicit Adams method of eighth order are presented in Fig. 2 with red lines. Four different particular solutions, each starting from its own initial condition, were calculated.
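Such reference trajectories can be reproduced with any standard integrator. Below is a self-contained sketch using a classical fixed-step RK4 scheme as a stand-in for the eighth-order implicit Adams method used in the paper; the parameter values are illustrative:

```python
import numpy as np

def vdp(x, mu=1.0):
    """Van der Pol oscillator (5) in first-order polynomial form."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - x[0]])

def rk4(f, x0, dt, n_steps):
    """Classical fixed-step Runge-Kutta 4 integrator (a stand-in for the
    implicit Adams method of the paper)."""
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x)
    return np.array(traj)

# One particular solution, e.g. starting from (1, 1):
trajectory = rk4(vdp, [1.0, 1.0], dt=0.01, n_steps=2000)
```

With $\mu = 0$ the system degenerates to a harmonic oscillator, which gives a convenient sanity check: the solution returns to its initial state after a period of $2\pi$.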

Figure 2: Simulation of the Van der Pol oscillator. Red lines for the implicit Adams method of eighth order, blue dots for matrix Lie map of third order.

Another method for simulating the dynamics is the mapping approach. The weights of the matrix Lie map can be calculated up to the necessary order of nonlinearity based on the equation (3). For instance, for the third order and the same time interval, this yields the weight matrices $W_0, \ldots, W_3$, and the corresponding polynomial neural network implements the transformation (4) with these matrices. The results of the numerical integration of the system with the neural network are presented in Fig. 2 with blue dots. Note that for the matrix Lie maps, the accuracy of the truncation of the series (the order of nonlinearity of the transformation) and the accuracy of the weights calculation should be considered separately. The theory of the accuracy and convergence of the truncated series (2) in solving ODEs can be found in [13, 17].

From a practical perspective, the accuracy of the simulation provided by a polynomial neural network can be estimated with respect to a traditional numerical solver. For example, one can compute the mean relative errors between the predictions of the Lie map–based networks of the third, fifth, and seventh orders of nonlinearity and the numerical solution calculated with the Adams method of eighth order.
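One reasonable definition of this metric, averaged over the trajectory, is sketched below; the paper does not spell out the exact formula, so this is our assumption:

```python
import numpy as np

def mean_relative_error(pred, ref):
    """Mean over time steps of ||prediction - reference|| / ||reference||,
    where each row of `pred`/`ref` is one state of the trajectory."""
    num = np.linalg.norm(pred - ref, axis=1)
    den = np.linalg.norm(ref, axis=1)
    return np.mean(num / den)
```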

3.2 Learning of the Van der Pol Oscillator

In the previous section, we described how the weights of the proposed polynomial neural network can be calculated based on the equation. On the other hand, when the equation is not known but a particular solution is provided, the weights can be fitted without any assumptions about the underlying differential equations.

A particular solution of the system can be generated by numerically integrating system (5) with a fixed time step. Having this training data set, the proposed neural network can be fitted with the mean squared error (MSE) between predicted and true states as a loss function.

We implemented the above-described technique in Keras/TensorFlow and fitted a third-order Lie transform–based neural network with an Adamax optimizer.

Figure 3: Training data for the neural network (red dots) and provided predictions (lines).

The generalization property of the network can be investigated by examining predictions not only on the training data set but also for new initial conditions. Fig. 3 (a) shows the training set as a particular solution of the system. Fig. 3 (b) demonstrates predictions calculated starting both from the same initial condition and from new points. The mean relative error of the predictions was estimated both for the training initial condition and for the new ones.

4 Partial Differential Equations

Burgers’ equation is a fundamental partial differential equation that occurs in various areas, such as fluid mechanics, nonlinear acoustics, gas dynamics, and traffic flow. This equation is also often used as a benchmark for numerical methods. For example, one of the problems proposed in the Airbus Quantum Computing Challenge [19] is building a neural network that solves Burgers’ equation with at least the same level of accuracy as, and higher computational performance than, traditional numerical methods. In the article [20], a feedforward neural network is trained to satisfy Burgers’ equation and certain initial conditions, but the computational performance of the approach is not estimated. In this section, we demonstrate how to build a Lie transform–based neural network that solves Burgers’ equation.

4.1 The Finite Difference Method for Burgers’ Equation

Burgers’ equation has the form

$$\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = \nu\,\frac{\partial^2 u}{\partial x^2}. \qquad (6)$$

Following [19] for benchmarking, we use an analytic solution of (6) and the traditional numerical method

$$u_i^{n+1} = u_i^n - \frac{\Delta t}{\Delta x}\, u_i^n \left(u_i^n - u_{i-1}^n\right) + \nu\,\frac{\Delta t}{\Delta x^2} \left(u_{i+1}^n - 2 u_i^n + u_{i-1}^n\right), \qquad (7)$$

where $n$ stands for the time step and $i$ stands for the grid node.

The equation (7) presents a finite difference method (FDM) that consists of an explicit Euler scheme for the temporal derivative, an upwind first-order scheme for the nonlinear term, and a centered second-order scheme for the diffusion term. The time step for benchmarking is fixed, with uniform spacing of the grid. Thus, for the numerical solution up to the final time of 5, the method requires a mesh with 1000 steps in the space coordinate and 2000 time steps.
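A single step of scheme (7) can be sketched in NumPy as follows. Periodic boundary conditions and non-negative $u$ (so that the upwind direction is fixed) are our simplifying assumptions for this sketch:

```python
import numpy as np

def fdm_burgers_step(u, dt, dx, nu):
    """One step of scheme (7): explicit Euler in time, first-order upwind
    for the nonlinear term (assuming u >= 0), centered second-order
    differences for diffusion. Periodic boundaries via np.roll."""
    um = np.roll(u, 1)    # u_{i-1}
    up = np.roll(u, -1)   # u_{i+1}
    return u - dt / dx * u * (u - um) + nu * dt / dx**2 * (up - 2 * u + um)
```

A constant profile is a fixed point of the scheme, which gives a quick consistency check before running the full benchmark.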

It is indicated in [19] that the FDM introduces a dispersion error in the solution (see Fig. 4, a). Such error can be reduced by increasing the mesh resolution, but then the time step should be decreased to respect the stability constraints of the numerical scheme.

Figure 4: The FDM benchmark (a) and the Lie transform–based neural network solution (b).

4.2 Lie Transform–Based Neural Network

Though there are Lie group methods that directly apply Lie theory to PDEs [21, 22], we utilize a different approach. We convert the equation (6) to a system of ODEs and build a matrix Lie map in accordance with Section 2 for this new system.

Assuming that the right-hand side of the equation (6) can be approximated by a function $R$ and considering this approximated equation as a hyperbolic one, it is possible to derive the system of ODEs

$$\frac{dx}{dt} = u, \qquad \frac{du}{dt} = R, \qquad (8)$$

where $u$ is the vector of solution values, $R$ approximates the diffusion term, and $x$ is the vector of discrete stamps on space. This transformation from a PDE to ODEs is well known and can be derived using the method of characteristics and the direct method [23]. If the space discretization is the same as in (7), then the equation (8) leads to a system of 2000 ODEs, which can be easily expanded into a Taylor series with respect to the components of $x$ and $u$ up to the necessary order of nonlinearity.
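The right-hand side of such a characteristic system can be sketched as follows; the centered-difference approximation of the diffusion term and the periodic boundaries are our assumptions for this illustration:

```python
import numpy as np

def burgers_characteristic_rhs(state, nu, dx):
    """Right-hand side of an ODE system in the spirit of (8):
    dx/dt = u along characteristics, du/dt ≈ nu * u_xx, with the
    diffusion term approximated by centered differences on the grid
    (periodic boundaries assumed). `state` stacks x and u."""
    n = state.size // 2
    x, u = state[:n], state[n:]
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    return np.concatenate([u, nu * u_xx])
```

For 1000 space nodes the stacked state $(x, u)$ has 2000 components, matching the size of the ODE system quoted above.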

Using this system of ODEs, we have built a Lie transform–based neural network for a time step five times larger than that used in the benchmarking (see Fig. 5). The numerical solution provided by the neural network is presented in Fig. 4 (b), and the accuracy and performance are compared in Table 2.

Method | Time step | Mesh size | Elapsed time | MSE
FDM | | | 0.055 sec |
Lie transform–based neural network | | | 0.016 sec |

Table 2: Comparison of simulation by the FDM and the matrix Lie map.

The built polynomial neural network provides better accuracy in less computational time. If the FDM scheme were adjusted to a higher accuracy, its computational time would increase even further. Accuracy is calculated as the MSE between the numerical solution and the analytic one at the final time.

Figure 5: Numerical schemes for FDM and Lie transform–based neural network for (6).

5 Code

The implementation of the Lie transform–based neural network in Keras/TensorFlow and the algorithm for map building for autonomous systems are provided in the GitHub repository: https://github.com/andiva/DeepLieNet. The notebook https://github.com/andiva/DeepLieNet/tree/master/demo/VanderPol.ipynb corresponds to Section 3 and covers simulation, the definition of accuracy metrics, neural network configuration, and fitting. The notebook https://github.com/andiva/DeepLieNet/tree/master/demo/Burgers.ipynb reproduces the results presented in Section 4.

6 Conclusion

In the article, we demonstrate solving differential equations with polynomial neural networks that are based on matrix Lie maps. Since the weights of the proposed neural network can be calculated directly from the equations, it does not require fitting with respect to the initial condition. Once built, the neural network can be considered as a model of the system and can be used for simulation with different initial conditions.

When the map is calculated for a large time step, the proposed approach can significantly outperform traditional numerical methods. For Burgers’ equation, the computational performance is increased several times at the same level of accuracy. For some problems in charged particle dynamics simulation, the performance is increased a thousand times with appropriate accuracy in comparison with traditional step-by-step integration [24, 25].

The proposed neural network can be used for data-driven identification of dynamical systems. It may provide a high level of generalization when learning dynamical systems from data. As shown with the Van der Pol oscillator, learning the dynamics of the system from only a particular solution is possible. The neural network presented in Section 4 can be additionally fitted to satisfy the initial conditions. In this sense, the training will provide an optimal numerical approach with respect to certain initial conditions.

The limitations of the data-driven approach for large-scale systems, optimal network configuration, and noisy data consideration should be examined in further research.

References