Physics-based polynomial neural networks for one-shot learning of dynamical systems from one or a few samples

05/24/2020 ∙ by Andrei Ivanov, et al. ∙ Bosch ∙ Saint-Petersburg State University

This paper discusses an approach for incorporating prior physical knowledge into a neural network to improve data efficiency and the generalization of predictive models. If the dynamics of a system approximately follows a given differential equation, the Taylor mapping method can be used to initialize the weights of a polynomial neural network. This allows fine-tuning of the model from one training sample of the real system dynamics. The paper describes practical results on real experiments with both a simple pendulum and one of the largest X-ray sources worldwide. It is demonstrated in practice that the proposed approach allows recovering complex physics from noisy, limited, and partial observations and provides meaningful predictions for previously unseen inputs. The approach mainly targets the learning of physical systems when state-of-the-art models are difficult to apply given the lack of training data.


1 Introduction

The traditional approach for representing the behavior of a dynamical system is physics-based models derived from conservation laws, e.g. of mass, momentum, and energy. These laws encode the system behavior implicitly rather than as explicit data. To solve real problems, additional approximate models (drag or friction coefficients, boundary conditions, etc.) are used. Such models introduce some level of simplification to meet the scale-accuracy trade-off. Also, this approach is generally computationally expensive and requires highly accurate numerical solvers. Replacing such physics-based models with high-performance approximate ones can be based on machine learning (ML) approaches for the modeling and identification of dynamical systems.

Another very important approach in dynamical system identification is gray-box models. They come in various flavors, using black-box models either to infer parameters of a system derived from first principles or as error models for such systems. This group of methods requires both statistical data and numerical solvers to build the model and estimate its parameters.

Applying machine learning (ML) methods to dynamical systems learning can avoid numerical solvers completely and extract the complex behaviour of the systems, but this generally requires lots of data for training. A model trained with limited observations is highly likely to perform unsatisfactorily. Moreover, no guarantee exists that a black-box model can correctly predict the system dynamics for completely new and unseen inputs.

Though some studies demonstrate the application of neural networks (NNs) for physical systems learning [Mohajerin, Jia, Koppe, Bieker, Yu], the described methods require large volumes of measured or simulated data for NN training. The idea of these methods is to build surrogate models that can replace physics-based models. Some authors suggest gray-box models that incorporate NNs into the differential equation to approximate unknown terms. For example, the authors in [Ayed] propose dynamical systems learning from partial observations with ordinary differential equations (ODEs), while the authors in [raissi2017physicsII] add NNs to partial differential equations. A back-propagation technique through an ODE solver is proposed in [ref9] but requires traditional numerical solvers to simulate the dynamics.

By physics-inspired NNs [Thuerey], authors generally mean either incorporating domain knowledge into traditional NN architectures or providing additional loss functions that penalize the physical inconsistency of the predictions [PINN]. All these papers use various NN architectures for model-free systems learning and control [Nagabandi, ChenY]. Moreover, the authors do not consider predictions for inputs that significantly differ from the training data. The physical inconsistency term is estimated only for the training data without generalization to unseen inputs.

Figure 1: Various problems for learning of a dynamical system: a) forecasting, b) surrogate models, c) one-shot learning. Solid lines represent training data; dashed lines correspond to predictions that are required to be produced by a model.

Traditional and state-of-the-art ML/NN models are suitable for either forecasting or building surrogate models (see Fig. 1a, 1b). Forecasting means the extrapolation of dynamics in time, while a surrogate model extrapolates dynamics to new inputs that stem from the same distribution as the training data. In this paper, we focus on the problem of one-shot learning of dynamical systems from only one training sample (Fig. 1c). The model is required to predict the dynamics for new inputs beyond the training sample. Since the proposed approach is not an incremental extension of previously studied problems but formulates a new problem, we do not compare it with state-of-the-art models or the gray-box approach, which are difficult to apply given the lack of training data.

To solve the problem of one-shot learning of dynamical systems, we suggest incorporating prior physical knowledge into the NN to improve data efficiency and the generalization of predictive models. We also completely avoid numerical solvers for dynamical systems learning. The approach presented in the paper is based on [MLM, TMPNN], where the authors demonstrate how to construct a polynomial neural network (PNN) that approximates the exact system of ODEs and use it for solving differential equations. In contrast to this, we do not target solving exact equations but rely only on an approximate form of the ODEs and the identification of the dynamical system from one training sample.

In the next section, we give a brief description of the Taylor mapping approach that is used to translate differential equations into a PNN. Sec. 3 considers simple examples of free fall and nonlinear oscillation to demonstrate the limitations of existing ML and NN models for learning the simplest dynamical systems when only limited data is available. Sec. 4 and 5 describe the training of the Taylor map-based PNN (TM-PNN) with one sample. The example of one-shot learning of a real pendulum is described in Sec. 4 in detail. The same technique is directly applied to practical experiments with one of the largest X-ray sources worldwide, and the symplectic regularization is introduced, in Sec. 5. Sec. 6 briefly describes the application of the discussed methods to other domains where problems of identification of dynamical systems arise widely.

2 Taylor maps for solving ODEs

The transformation $\mathcal{M}: X \mapsto Y$ defines a Taylor map in the form

$$Y = W_0 + W_1\,X + W_2\,X^{[2]} + \dots + W_k\,X^{[k]}, \qquad (1)$$

where $X \in \mathbb{R}^n$, the matrices $W_i$ are weights, and $X^{[k]}$ means the $k$-th Kronecker power of the vector $X$ with the same-terms reduction. For example, if $X = (x_1, x_2)$, then $X^{[2]} = (x_1^2,\; x_1 x_2,\; x_2^2)$ and $X^{[3]} = (x_1^3,\; x_1^2 x_2,\; x_1 x_2^2,\; x_2^3)$. The transformation (1) is linear in the weights and nonlinear with respect to $X$. In the literature, the transformation (1) is referred to as Taylor maps and models [ref_tm1], tensor decomposition [ref_tm2], matrix Lie transform [ref16], exponential machines [ref201], and others. In fact, transformation (1) is just a polynomial regression with respect to the components of $X$ that directly defines the polynomial neuron (see Fig. 2).


Figure 2: Polynomial neuron of third-order nonlinearity.
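To make the notation concrete, the following minimal Python sketch (all helper names are illustrative, not from the paper's implementation) computes the reduced Kronecker powers and evaluates the transformation (1) as a polynomial neuron:

```python
import numpy as np
from itertools import combinations_with_replacement

def kron_power(x, k):
    """k-th Kronecker power of vector x with the same-terms reduction:
    all monomials of degree k, e.g. (x1^2, x1*x2, x2^2) for k = 2."""
    return np.array([np.prod(c) for c in combinations_with_replacement(x, k)])

def taylor_map(x, weights):
    """Transformation (1): Y = W0 + W1 X + W2 X^[2] + ... + Wk X^[k]."""
    y = weights[0].copy()                      # W0, shape (n,)
    for k, W in enumerate(weights[1:], start=1):
        y += W @ kron_power(x, k)              # Wk has one column per degree-k monomial
    return y

# Example: a third-order polynomial neuron for a 2-D state (cf. Fig. 2).
rng = np.random.default_rng(0)
weights = [rng.normal(size=2), rng.normal(size=(2, 2)),
           rng.normal(size=(2, 3)), rng.normal(size=(2, 4))]
print(taylor_map(np.array([0.5, -0.1]), weights))
```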

The Taylor map (1) approximates the general solution of a system of differential equations [ref13, ref15]. If the system of ODEs is known, the weights in (1) can be calculated directly. The Taylor map initialized from the ODEs accurately represents the dynamics of the system without the necessity of using a numerical solver [TMPNN]. Indeed, for the differential equation with polynomial right-hand side

$$\frac{dX}{dt} = P_0(t) + P_1(t)\,X + P_2(t)\,X^{[2]} + \dots + P_k(t)\,X^{[k]}, \qquad (2)$$

one can find the solution in the form of (1) by differentiating it with respect to $t$:

$$\frac{dX(t)}{dt} = \frac{d}{dt}\left(\sum_{i=0}^{k} W_i(t)\,X_0^{[i]}\right) = \sum_{i=0}^{k} \frac{dW_i}{dt}\,X_0^{[i]},$$

where $t$ is an independent variable and $X$ is a state vector. The last formula combined with (2) yields a new system of ODEs with respect to the weight matrices $W_i$:

$$\frac{dW_i}{dt} = f_i\!\left(P_0, \dots, P_k,\; W_0, \dots, W_k\right), \qquad (3)$$

where the $f_i$ are functions of the matrices $P_j$ and $W_j$. For instance, $dW_0/dt = P_0 + P_1 W_0 + P_2 W_0^{[2]} + \dots$ Solving (3) for the $W_i(t)$ gives the dynamics of the system for all initial conditions $X_0$ by means of (1). Since the transformation (1) is assumed to be valid for any initial value $X_0$, the system (3) for the $W_i$ does not depend on $X_0$. Moreover, it can be solved only once with the unified initial condition $W_1(t_0) = I$ and $W_i(t_0) = 0$ for $i \neq 1$, with $I$ as the identity matrix. A more detailed description of the Taylor mapping approach, along with theoretical estimations of the accuracy and convergence of the truncated series (1) for solving systems of ODEs, can be found in [ref13, ref15].


Figure 3: Numerical solutions by the Euler method and a second-order Taylor map.

Figure 4: MSE between the numerical solutions and the analytical one depending on the integration time step.

For example, let us consider the model of the free fall of a body with air resistance

$$\dot v = g - \frac{k}{m}\,v^2, \qquad (4)$$

with velocity $v$, mass $m$, gravitational acceleration $g$, resistance coefficient $k$, and the overdot denoting the derivative with respect to time. The traditional approach for solving (4) is using numerical step-by-step solvers. For instance, the Euler method with time step $\Delta t$ results in

$$v_{i+1} = v_i + \Delta t \left(g - \frac{k}{m}\,v_i^2\right). \qquad (5)$$

The Euler method produces a Taylor map (1) of the second order that is calculated from a first-order discretization of (4). Using the algorithm described in (1)-(3), one can estimate the second-order Taylor map for (4) more precisely:

$$v_{i+1} = w_0(\Delta t) + w_1(\Delta t)\,v_i + w_2(\Delta t)\,v_i^2, \qquad (6)$$

where the weights $w_j(\Delta t)$ are obtained by integrating (3) over the time step $\Delta t$.

To compare the accuracy of the maps (5) and (6), one can use the analytical solution [Lindemuth] of the equation (4):

$$v(t) = \sqrt{\frac{mg}{k}}\,\tanh\!\left(\sqrt{\frac{gk}{m}}\;t + C_0\right), \qquad (7)$$

where the constant $C_0$ is determined by the initial velocity.

Fig. 3 shows the numerical solution of eq. (4) calculated with the Euler method and with Taylor mapping. Both methods yield a second-order approximation of the exact solution for the time step $\Delta t$. Fig. 4 presents the mean squared error (MSE) between the numerical and analytical solutions for both methods depending on the time step. One of the advantages of the Taylor mapping approach is the possibility of solving an ODE with a larger time step at the necessary level of accuracy in comparison to traditional numerical schemes. For example, the fourth-order Runge–Kutta method yields a 16th-order Taylor map for eq. (4), while a true Taylor map of 16th order can be calculated more accurately for the given equation.
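As an illustration of the algorithm (1)-(3) for this example, the following sketch (assuming scipy; the values of g and k/m are illustrative) integrates the weight ODEs once over the step and then iterates the resulting map, comparing it with the Euler map (5) and the analytical solution (7):

```python
import numpy as np
from scipy.integrate import solve_ivp

g, c = 9.81, 0.392     # gravitational acceleration and k/m (illustrative values)
dt = 0.1               # time step of the map

# System (3) for the weights of the second-order map (6), obtained by
# substituting v(t) = w0 + w1*v0 + w2*v0**2 into (4) and collecting
# terms up to second order in the initial velocity v0.
def weight_odes(t, w):
    w0, w1, w2 = w
    return [g - c * w0**2,
            -2.0 * c * w0 * w1,
            -c * (w1**2 + 2.0 * w0 * w2)]

# Solve (3) once with the unified initial condition w0 = 0, w1 = 1, w2 = 0.
w0, w1, w2 = solve_ivp(weight_odes, (0.0, dt), [0.0, 1.0, 0.0],
                       rtol=1e-12, atol=1e-12).y[:, -1]

taylor_step = lambda v: w0 + w1 * v + w2 * v**2        # map (6)
euler_step = lambda v: v + dt * (g - c * v**2)         # map (5)

v_tm = v_eu = 0.0
for _ in range(50):                                    # propagate 5 s of free fall
    v_tm, v_eu = taylor_step(v_tm), euler_step(v_eu)

v_exact = np.sqrt(g / c) * np.tanh(np.sqrt(g * c) * 5.0)   # solution (7) for v(0) = 0
print(f"Taylor map: {v_tm:.5f}  Euler: {v_eu:.5f}  exact: {v_exact:.5f}")
```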


Figure 5: The left plot (a) shows two solutions for the free fall of a body with two different masses used as training samples. The right plot (b) demonstrates memorization of the dynamics by a surrogate model based on a Random Forest Regressor (RFR) trained with these two solutions.

Figure 6: Generalization of the dynamics with a Taylor map trained from data.

3 Training TM-PNN from scratch for simple physical systems

In the paper [TMPNN], it is demonstrated how to use the Taylor mapping technique to construct a PNN that solves a given system of ODEs in various domains. In the rest of the paper, we focus on inverse problems, where the physical laws and equations are not known but measurements from the system are available. This section corresponds to examples with virtual measurement data that are generated from simple dynamical models. We use such examples only to demonstrate the problem of one-shot learning of dynamical systems with one or a few training samples.

3.1 Free fall of a body

Fig. 5a presents two solutions of the system (4) with the same initial velocity and two different masses. These solutions are obtained by integrating the equation with a constant time step over a fixed time interval, so each solution is represented by a univariate time series of velocities. Let us use these data for training an ML-based model and validate its predictions for unseen masses. For example, for Random Forest Regression, which is commonly used in practice, the behavior is presented in Fig. 5b. The model accurately predicts the known dynamics but cannot represent the dynamics for unseen masses. The traditional ML-based model just memorizes the two solutions and attempts to reproduce them whatever masses are used as inputs. To perform well, such ML models require lots of training solutions with different masses. This questions the ability of such a surrogate model to generalize physics rather than to predict some average behavior based on the presented samples.

Since the Taylor map (1) corresponds to some ODEs, one can instead estimate the weights of a Taylor map directly from the data and achieve more physical predictions for unseen inputs. Fig. 6 demonstrates the predictions of a second-order Taylor map

$$v_{i+1} = \tilde w_0 + \tilde w_1\,v_i + \tilde w_2\,v_i^2 \qquad (8)$$

that is fitted with the same training solutions. In contrast to traditional ML models, this simple Taylor map can predict the dynamics for new masses with fair accuracy. In data-driven training, we do not know the equation and therefore cannot calculate the weights. Instead, we estimate the weights statistically from the given dataset. Though the map (8) estimated from the training data differs from the true map (6) calculated from the equations, it still represents the dynamics of the system and, more importantly, corresponds to some unknown ODE.
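Because (1) is linear in its weights, estimating a map such as (8) from trajectory data reduces to ordinary least squares over consecutive pairs of states. A minimal sketch for a scalar state follows (function names are illustrative; in the free-fall example, the mass could additionally be appended to the state vector so that one map covers several masses, which is not shown here):

```python
import numpy as np

def fit_taylor_map(trajectories, order=2):
    """Fit the weights of a scalar Taylor map v_{i+1} = w0 + w1*v + ... + wk*v**k
    from one or several trajectories sampled with a constant time step."""
    X, y = [], []
    for v in trajectories:
        for a, b in zip(v[:-1], v[1:]):
            X.append([a**k for k in range(order + 1)])   # features 1, v, v^2, ...
            y.append(b)
    # The map (1) is linear in the weights, so fitting is ordinary least squares.
    w, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
    return w

def propagate(w, v0, n_steps):
    """Iterate the fitted map starting from the initial condition v0."""
    v, out = v0, [v0]
    for _ in range(n_steps):
        v = sum(wk * v**k for k, wk in enumerate(w))
        out.append(v)
    return np.array(out)

# Usage: w = fit_taylor_map([training_solution_1, training_solution_2])
#        prediction = propagate(w, v0=0.0, n_steps=100)
```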


Figure 7: Training (solid line) and unseen testing (dashed lines) data in phase space (left) and time space (right) for the Lotka–Volterra system.

3.2 Identification of Lotka–Volterra system

Let us now demonstrate that traditional neural networks are also difficult to apply to one-shot dynamics learning, using the example of the Lotka–Volterra system

$$\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y,$$

which describes predator-prey population dynamics by nonlinear oscillations (see Fig. 7). Starting with an initial point, we calculate discrete states by numerically integrating the differential equations with a constant time step. After the data is generated, the system of ODEs is not used in further training.

We generate four different particular solutions that are presented as time series. As a training set, we use only one solution, while the three other solutions with different initial coordinates are used for testing (see Fig. 7). For clarity, we generate the solutions by integrating the equations over a fixed time interval with a constant time step. The problem we address is the following: is it possible to recover the dynamics of the whole system for unseen inputs knowing only one particular training solution? We consider three neural network architectures: the proposed PNN with a third-order map (1), a multilayer perceptron (MLP) with sigmoid activation functions and 5 hidden states, and a long short-term memory network (LSTM) with 5 inner cells.

After fitting the NNs to the particular training solution, both the MLP and LSTM networks tend to memorize the given solution without any kind of generalization (see Fig. 8). They are not able to predict anything that has not been presented in the training data. At the same time, the proposed PNN predicts the unknown dynamics both in the nonlinear areas and in the near-linear oscillation around the stationary point. Moreover, it can even predict the stationary point without oscillation. Perhaps it is possible to achieve the same level of generalization as the PNN by applying more intense training or slightly different settings of a state-of-the-art NN, but it is not clear how to do so.

For example, we focus on the LSTM architecture and try different parameters and numbers of training epochs. Namely, we vary the number of inner cells from 1 to 100 along with different regularization parameters (see Table 1) and have not achieved generalization of the dynamics from one sample. Fig. 9 shows that the LSTM either memorizes the data or is under-fitted if the number of training epochs is too small or the regularization terms are too large. The regularization rate range from 0.0 to 1.0 is scanned with a simple grid search procedure, but this does not yield any improvement in generalization ability.


Figure 8: Results of training and prediction for new initial conditions by the PNN (left), MLP (center), and LSTM (right). Dashed lines are predictions provided by the models. The first row shows the phase space, the second row the state space. The PNN generalizes the dynamics from one sample, while the MLP and LSTM memorize the training sample.

Training epochs | 1 cell, no regularization | 1 cell, kernel L1L2(0.1, 0.1) | 5 cells, no regularization | 5 cells, recurrent L1L2(0.9, 0.9) | 10 cells, no regularization | 100 cells, no regularization
100 | underfitting | underfitting | underfitting | underfitting | underfitting | underfitting
1000 | underfitting | underfitting | memorization | underfitting | underfitting | underfitting
5000 | underfitting | underfitting | memorization | underfitting | memorization | underfitting
Table 1: Training of the LSTM with different hyper-parameters from one solution of the Lotka–Volterra system.

Traditional neural networks have to be trained with lots of different solutions in order to perform well. It is still an open question whether a state-of-the-art neural network can be trained with only one solution of a dynamical system and achieve generalization to other ones. On the other hand, the Taylor map-based PNN is strongly associated with the theory of differential equations and is more suitable for dynamical systems learning. Fig. 10 shows the mean squared error (MSE) between the true solutions and the predictions provided by the TM-PNN trained with the one training sample. Increasing the number of training epochs decreases the MSE for unseen initial conditions.

This section demonstrates from both theoretical and practical points of view that the PNN is a more suitable architecture for dynamical systems learning than traditional ML models and neural networks. In this section, we consider virtual noise-free measurements just to demonstrate the limitations of traditional ML and NN models in the formulated problem of one-shot learning. The ODEs are used only for data generation, while the weights of the PNN are estimated from the data without a priori knowledge of the physical laws. On the other hand, if the system dynamics approximately follows a system of ODEs, the PNN can be initialized from these ODEs using the Taylor mapping approach (TM-PNN) and additionally fine-tuned with the data. The next sections consider this use case and correspond to real measurements with noisy and partial observations. We also do not provide a comparison with traditional ML and NN models because of their inapplicability to the problem of learning dynamical systems from one sample.


Figure 9: Examples of memorization (a) and underfitting (b) of the LSTM with different hyper-parameters.

Figure 10: Convergence of the TM-PNN over different numbers of training epochs for predictions with unseen inputs in one-shot learning of the Lotka–Volterra system.

4 Fine-tuning of the real pendulum from an idealized mathematical description

Let us explain the proposed approach with a simple example of a real pendulum. Having measured the oscillation of the pendulum for one initial angle, we would like to predict the oscillations for new initial angles. In other words, we aim at a generalized model of a real pendulum from only one observation. To measure the data, we created a pendulum with a targeted length, but the true length deviates from it because of manufacturing uncertainties. Instead of operating with the exact length, we consider the initial design assumption as the only available a priori knowledge about the system.

To measure the pendulum oscillations, we recorded video streams. To introduce additional noise, the camera is not calibrated, and the measurements are not filtered. The angle of the pendulum is estimated from the video frames, while the angular velocity remains non-observable. In this way, each sample of the pendulum oscillation is represented by a time series of angles with a constant time step. The oscillations are damped, which leads to a decay of the oscillation amplitude.

4.1 Translating the ODE of an ideal pendulum into a Taylor map

The simple physics-based model of a pendulum can be described by the differential equation $\ddot\theta + \frac{g}{l}\sin\theta = 0$, where the overdot denotes the derivative with respect to time, $\theta$ is the angle of the pendulum, and $g$ and $l$ are parameters. With $\sin\theta \approx \theta - \theta^3/6$, this equation can be written in a matrix form up to third-order nonlinearities:

$$\frac{d}{dt}\begin{pmatrix}\theta\\ \dot\theta\end{pmatrix} = P_1 \begin{pmatrix}\theta\\ \dot\theta\end{pmatrix} + P_3 \begin{pmatrix}\theta\\ \dot\theta\end{pmatrix}^{[3]}, \qquad P_1 = \begin{pmatrix}0 & 1\\ -\frac{g}{l} & 0\end{pmatrix}, \quad P_3 = \begin{pmatrix}0 & 0 & 0 & 0\\ \frac{g}{6l} & 0 & 0 & 0\end{pmatrix}. \qquad (9)$$

The mathematical pendulum (9) theoretically continues the oscillation with the same amplitude and frequency indefinitely. Though this behaviour differs from the real damped oscillation, this simplified physics-based model can still initialize the PNN with some level of accuracy. Let us calculate a Taylor map for the time step $\Delta t$ following the algorithm presented in [TMPNN]:

$$\begin{pmatrix}\theta\\ \dot\theta\end{pmatrix}_{t+\Delta t} = W_1(\Delta t)\begin{pmatrix}\theta\\ \dot\theta\end{pmatrix}_{t} + W_3(\Delta t)\begin{pmatrix}\theta\\ \dot\theta\end{pmatrix}^{[3]}_{t}. \qquad (10)$$

By denoting $X = (\theta, \dot\theta)^T$ and substituting (10) into (9), one can write

$$\frac{dX}{dt} = P_1 X + P_3 X^{[3]}, \qquad \frac{dX^{[3]}}{dt} = P^{(3)} X^{[3]} + \dots, \qquad (11)$$

where $P^{(3)}$ can be calculated from the relation $dX^{[3]}/dt = \dot X \otimes X \otimes X + X \otimes \dot X \otimes X + X \otimes X \otimes \dot X$ with the same-terms reduction. For instance, $d\theta^3/dt = 3\theta^2\dot\theta$.

Taking the derivative of (10) and comparing it with (11), one can obtain a system of ODEs that does not depend on $\theta$ and $\dot\theta$ and represents the dynamics of the matrices $W_1$ and $W_3$:

$$\frac{dW_1}{dt} = P_1 W_1, \qquad \frac{dW_3}{dt} = P_1 W_3 + P_3 W_1^{[3]}, \qquad (12)$$

where $W_1^{[3]}$ denotes the matrix that maps $X_0^{[3]}$ to $X^{[3]}$ for $X = W_1 X_0$. Solving (12) for the time interval $[0, \Delta t]$ with the initial conditions $W_3 = 0$ and $W_1 = I$ as the identity matrix results in a Taylor map that describes the dynamics of the ideal pendulum during $\Delta t$. For instance, for the assumed pendulum length and the chosen time step, the numerical solution of (12) up to two digits yields concrete weight matrices $W_1$ and $W_3$; we refer to this pair of matrices as (13).
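A minimal numerical sketch of this derivation follows (assuming scipy; the values of the length and time step are illustrative placeholders for the concrete ones used above):

```python
import numpy as np
from scipy.integrate import solve_ivp

g, l, dt = 9.81, 0.30, 0.05   # illustrative parameters and time step
P1 = np.array([[0.0, 1.0], [-g / l, 0.0]])                        # linear part of (9)
P3 = np.array([[0.0, 0.0, 0.0, 0.0], [g / (6 * l), 0.0, 0.0, 0.0]])  # cubic part of (9)

def cubic_power_matrix(W):
    """Matrix M3 with (W x)^[3] = M3 x^[3], where x^[3] = (x^3, x^2 y, x y^2, y^3)."""
    M3 = np.zeros((4, 4))
    for row, (i, j, k) in enumerate([(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]):
        for a in range(2):
            for b in range(2):
                for c in range(2):
                    # a + b + c counts the y-factors in the monomial x^{3-d} y^d
                    M3[row, a + b + c] += W[i, a] * W[j, b] * W[k, c]
    return M3

def weight_odes(t, w):        # system (12) for W1 and W3
    W1, W3 = w[:4].reshape(2, 2), w[4:].reshape(2, 4)
    dW1 = P1 @ W1
    dW3 = P1 @ W3 + P3 @ cubic_power_matrix(W1)
    return np.concatenate([dW1.ravel(), dW3.ravel()])

w_init = np.concatenate([np.eye(2).ravel(), np.zeros(8)])   # W1 = I, W3 = 0
sol = solve_ivp(weight_odes, (0.0, dt), w_init, rtol=1e-12, atol=1e-12)
W1, W3 = sol.y[:4, -1].reshape(2, 2), sol.y[4:, -1].reshape(2, 4)

def pendulum_map(state):      # one Taylor-map step (10) with the weights (13)
    th, om = state
    return W1 @ state + W3 @ np.array([th**3, th**2 * om, th * om**2, om**3])
```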

Figure 11: Predictions of the TM-PNN initialized from the ODE (9) for different initial angles. The TM-PNN initially represents a rough assumption about the pendulum dynamics.

Fig. 11 shows that the map (10) with the weights (13) represents the theoretical oscillation of the mathematical pendulum with the assumed length. Though the PNN initialized with (13) only roughly approximates the real pendulum, it can be used for physically consistent predictions of the dynamics starting from arbitrary angles.


Figure 12: Multi-output TM-PNN architecture with shared weights and partial observations.

4.2 One-shot learning of the real pendulum from one observation

Instead of fitting the ODE for the real pendulum, we fine-tune the Taylor map-based PNN (TM-PNN). Since the angles are measured with a constant time step over the whole observation interval, we constructed a TM-PNN with 49 layers with shared weights initialized with (13). Since the angular velocities are not observable, the loss function is the MSE for the angles only. The TM-PNN propagates the initial angle along the layers and recovers the angular velocities as latent variables.

The TM-PNN presented in Fig. 12 is implemented as a multi-output model in Keras with a TensorFlow backend. The Adam optimizer with controlled gradient clipping is used for training during 1000 epochs. The TM-PNN with the initial weights (13) is fine-tuned with one oscillation of the real pendulum for a single initial angle. Fig. 13 shows the oscillation provided by the initial TM-PNN (13) as a blue solid curve. This infinite oscillation represents the theoretical assumption about the dynamics of a pendulum with the designed length. The orange solid line represents the real pendulum oscillation with the true length and amplitude damping that is used for training. The prediction of the fine-tuned TM-PNN is shown by the blue circles.
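A sketch of this architecture in Keras/TensorFlow 2 follows (an illustrative implementation, not the authors' code; W1_ode and W3_ode stand for the weights (13) obtained from (12)):

```python
import tensorflow as tf

class TaylorMapLayer(tf.keras.layers.Layer):
    """One Taylor-map step (10): X -> W1 X + W3 X^[3] for a 2-D state (angle, velocity)."""
    def __init__(self, W1_init, W3_init, **kwargs):
        super().__init__(**kwargs)
        self.W1 = tf.Variable(W1_init, dtype=tf.float32, name="W1")
        self.W3 = tf.Variable(W3_init, dtype=tf.float32, name="W3")

    def call(self, x):
        theta, omega = x[:, 0:1], x[:, 1:2]
        x3 = tf.concat([theta**3, theta**2 * omega, theta * omega**2, omega**3], axis=1)
        return tf.matmul(x, self.W1, transpose_b=True) + tf.matmul(x3, self.W3, transpose_b=True)

def build_tm_pnn(W1_init, W3_init, n_steps=49):
    # Input is (initial angle, initial velocity); the velocity is latent and
    # can be fed as zero, since only the angles enter the loss.
    inp = tf.keras.Input(shape=(2,))
    layer = TaylorMapLayer(W1_init, W3_init)   # one layer, weights shared across all steps
    outputs, state = [], inp
    for _ in range(n_steps):
        state = layer(state)
        outputs.append(state[:, 0:1])          # only the angle is observable
    return tf.keras.Model(inp, outputs)

# model = build_tm_pnn(W1_ode, W3_ode)         # initialization with the weights (13)
# model.compile(tf.keras.optimizers.Adam(1e-3, clipnorm=1.0), loss="mse")
# model.fit(x0, [theta_1, ..., theta_49], epochs=1000)   # one measured oscillation
```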

Figure 13: One-shot tuning of the TM-PNN for real pendulum with initial weights obtained from the theoretical ideal ODEs.

The fine-tuning of the TM-PNN with one oscillation not only increases the accuracy of the prediction for the given training oscillation but also, more importantly, preserves physical consistency for predictions starting from unseen angles. Fig. 14 compares the predictions provided by the fine-tuned TM-PNN for unseen angles with the measured oscillations of the real pendulum. As a result, we have a TM-PNN model that has been trained with only one initial angle and predicts the dynamics for other angles.

Figure 14: Predictions of the fine-tuned TM-PNN with unseen initial angles.

5 One-shot recovery of complex physics in charged particle accelerators

Since Taylor mapping is commonly used in accelerator physics [ref13, SLAC-PUB-9574], demonstrating the advantages of the proposed TM-PNN is beneficial in this field. In this section, a deep TM-PNN is constructed to recover the complex physics of one of the largest X-ray sources worldwide, PETRA III [article]. We initialize the deep TM-PNN with the theoretical ODEs that describe the dynamics of the particles and then fine-tune the TM-PNN with noisy, limited, and partial observations from the real machine.

5.1 Problem formulation

The PETRA III storage ring consists of 1519 magnets and provides the transportation of the electron beam along the 2.3 km of ring length. For simplicity, we consider the particle motion only in the horizontal and vertical planes without considering the energy deviation. Thus, the state vector represents the transverse locations and velocities of a particle beam and is propagated consecutively through all the magnets.

The circular particle accelerator transfers the particle beam with initial coordinates $X_0$ at the beginning of the ring to the state $X_1$ at the end of the ring during the first turn. The multi-turn dynamics is represented by consecutively transferring the beam with the coordinates received at the previous turn. For example, denoting the one-turn transformation of the whole ring as $\mathcal{M}$, for $n$ turns one can write

$$X_k = \mathcal{M}(X_{k-1}), \qquad k = 1, \dots, n. \qquad (14)$$

One of the most important characteristics of charged particle motion is the main frequencies of the multi-turn oscillation. Having the beam coordinates $X_k$ at each turn, the main frequencies of the multi-turn oscillation in the horizontal and vertical planes can be calculated. Since these main frequencies play an important role in accelerator design and can be considered an operational regime, it is important to know the true frequencies in the real accelerator.

The dynamics of the beam in the real accelerator differs from that in the theoretical design because of the many imperfections in the construction and operating conditions. For example, Fig. 15 presents the theoretical and real beam tracks during one turn of the ring in an experiment at PETRA III. In this example, we demonstrate how the proposed approach can be used to recover the multi-turn dynamics of the real accelerator with the help of one measured beam track from only the first turn.

The measurements of the beam during the first turn define one training sample. A total of 246 beam position monitors (BPMs) measure the locations of the beam along the ring. So the training sample is represented by a time series of two variables with 246 stamps that defines the first turn of the beam around the ring (Fig. 15).

The main idea of the approach is to train the TM-PNN with this one training sample and estimate the multi-turn frequencies by replacing the measurements from the real accelerator in (14) with the predictive model, as sketched below. This means that the TM-PNN has to provide accurate predictions not only for a single initial condition but also for the new coordinates of the beam at each turn.
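A minimal sketch of this frequency-estimation step follows (assuming numpy, a trained one-turn model `one_turn` that maps the 4-D state to itself, and the state ordering (x, x', y, y'); the name and the ordering are illustrative assumptions):

```python
import numpy as np

def multi_turn_frequencies(one_turn, x0, n_turns=500):
    """Iterate the learned one-turn map as in (14) and estimate the main
    horizontal and vertical frequencies from the turn-by-turn coordinates."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_turns):
        states.append(one_turn(states[-1]))
    track = np.array(states)                    # shape (n_turns + 1, 4)
    freqs = np.fft.rfftfreq(len(track))         # in units of 1/turn
    # Dominant spectral line of the centered horizontal/vertical signals.
    qx = freqs[np.argmax(np.abs(np.fft.rfft(track[:, 0] - track[:, 0].mean())))]
    qy = freqs[np.argmax(np.abs(np.fft.rfft(track[:, 2] - track[:, 2].mean())))]
    return qx, qy
```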

Figure 15: Theoretical one-turn beam trajectory (orange line) and measured trajectory (blue line). Each trajectory is represented by 246 stamps at the locations of the BPMs.

5.2 TM-PNN architecture of the particle accelerator

Each of the 1519 magnets is described by a system of ODEs. For instance, for the motion in the horizontal plane, one can write an equation in the general form [ref13]

$$x'' = f\!\left(x, x', B(s)\right), \qquad (15)$$

where $x$ means the particle location, the prime is the derivative with respect to the length $s$ along the lattice, $B$ represents the magnetic field, the particle parameters enter as constants, and the coefficients of the right-hand side are functions of $s$.

To represent these ODEs as Taylor maps, we limit ourselves to the second order of nonlinearities and build 1519 Taylor maps, one for each magnet, with the help of the OCELOT framework [Agapov:2014yku]. The architecture of the TM-PNN is presented in Fig. 16. There are 1519 layers with unique weights that are not shared. Since there are 246 BPMs located along the ring, the TM-PNN has 246 outputs. Each output represents the beam locations in the horizontal and vertical planes; the velocities are not observable and are considered latent variables. The TM-PNN initialized from the ODEs accurately represents the theoretical assumption about the beam dynamics.

5.3 One-shot tuning of the TM-PNN

To represent system uncertainty in the experiments, we decreased the strength of only one of the 1519 magnets by 20% and measured one beam trajectory for the first turn (see Fig. 15). This trajectory represents a time series with 246 stamps of BPM measurements of the first turn around the ring. For fine-tuning the TM-PNN, we use the loss function

$$L = \sum_{i=1}^{246} \left(\hat X_i - X_i\right)^2 + \alpha \sum_{j=1}^{1519} R_j, \qquad (16)$$

where $X_i$ is the measurement of the $i$-th BPM in the first turn, $\hat X_i$ is the $i$-th output of the TM-PNN for the input from the training data, $\alpha$ is the regularization rate, and $R_j$ is the symplectic penalty for each layer that is defined by the symplectic property (17).

The symplectic property [arnold1989mathematical] is an essential invariant that has to be preserved for the physical consistency of a Hamiltonian system. Since the particle motion can be represented by Hamiltonian dynamics, the symplecticity of each hidden layer has to be preserved during training:

$$M^T J M = J, \qquad J = \begin{pmatrix} 0 & I\\ -I & 0 \end{pmatrix}, \qquad (17)$$

where $M$ is the Jacobian of the layer transformation (1), $I$ is an identity matrix, and $T$ means the transpose. The symplectic property (17) for the TM-PNN leads to algebraic constraints on the weights that do not depend on the inputs $X$. It guarantees the physical property of the trained model whatever inputs are used. For example, for a second-order Taylor map, writing out (17) and collecting the coefficients of the monomials in $X$ yields a system of algebraic constraints on the elements of the weight matrices

$$(18)$$

with the penalty $R$ as the sum of squares of all left-hand terms in (18). Since this penalty does not depend on the inputs, the physical structure of the layers is preserved for all new inputs, which has a large impact on generalization. If the symplectic regularization is not considered during the training, or if a traditional nonphysical L1L2 regularization is used instead, the tuning of the maps leads to overfitting of the model, which causes nonphysical predictions.
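The penalty $R_j$ in (16) is defined by the input-independent algebraic constraints (18) on the weights. As an illustration only, the following TensorFlow sketch approximates the same condition by penalizing the residual of (17) at sampled probe states; unlike (18), this approximation depends on the chosen probes:

```python
import tensorflow as tf

# Symplectic unit for a 2-D state (x, x'); use the block form of (17) for 4-D states.
J = tf.constant([[0.0, 1.0], [-1.0, 0.0]])

def symplectic_penalty(layer, probe_states):
    """Approximate penalty || M^T J M - J ||^2 averaged over probe states,
    where M is the Jacobian of the layer map (e.g. a TaylorMapLayer from Sec. 4.2)."""
    with tf.GradientTape() as tape:
        tape.watch(probe_states)
        out = layer(probe_states)
    M = tape.batch_jacobian(out, probe_states)        # shape (batch, dim, dim)
    res = tf.einsum('bji,jk,bkl->bil', M, J, M) - J   # M^T J M - J per probe state
    return tf.reduce_mean(tf.reduce_sum(res**2, axis=[1, 2]))
```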

Figure 16: Multi-output deep TM-PNN for the PETRA III storage ring.

Figure 17: Frequencies in the horizontal (Qx) and vertical (Qy) planes.

5.4 Physics recovery with the fine-tuned TM-PNN

The fine-tuned TM-PNN accurately represents the real beam track during the first turn from the single measured initial beam coordinates and preserves the physical consistency of the predictions for arbitrary inputs via the symplectic regularization. So, the TM-PNN can simulate the multi-turn dynamics in the accelerator by replacing $\mathcal{M}$ in (14) with the TM-PNN model and predict the main oscillation frequencies.

Since we know exactly which magnet was affected, we can also estimate the main frequencies with the physics-based model based on Equation (15). Fig. 17 shows that the true frequencies calculated with OCELOT coincide with the TM-PNN predictions with fair accuracy. The main horizontal frequency predicted by the TM-PNN over 500 virtual turns has a relative error of less than 1%, and the vertical one of less than 5%. Note that to calculate the true frequencies, one has to know exactly which magnet was affected. In real operating conditions, this information is not available, but the fine-tuned TM-PNN recovers the physical properties from partial, noisy, and limited observations of the accelerator.

6 The representation capacity of the TM-PNN

Though the presented approach has some limitations and mostly targets the specific task of one-shot learning of complex dynamical systems, it can be widely used for dynamical system identification. In science and engineering, there are tasks where it is impossible, for some reason, to collect or simulate enough data to train black-box models. The presented approach also targets situations when physics-based and gray-box models are computationally ineffective given the complexity of the considered systems. Otherwise, it would be easier to parametrize the system of ODEs and estimate the parameters with statistical methods or gray-box modeling.

Since the Taylor maps (1) entail the calculation of Kronecker products, limitations arise in the scalability of the direct application of the technique to extremely high orders. Further research on this topic should be done, but it can hopefully build on existing works. For example, in [Yu], a tensor-train decomposition is adopted to learn low-dimensional representations of the higher-order weight tensors obtained from Kronecker products, while the authors in [ref201] suggest a stochastic Riemannian optimization procedure to train models based on Kronecker products.

On the other hand, the presented technique can be directly adopted for physical systems where high orders do not arise or have negligible influence. For example, third-order nonlinearities along with a deep architecture of one thousand layers are often enough in charged particle accelerators. Moreover, even the complex behavior of dynamical chaos can be described by a second-order polynomial map [Henon].

6.1 Dynamical systems with polynomial ODEs of low orders

Systems of ODEs with polynomial time-independent right-hand sides are widely used in science and engineering. For example, in [6MP], the authors consider modelling the cell metabolism of 6-mercaptopurine, one of the most important chemotherapy drugs. The paper introduces a system of ten ODEs with polynomial nonlinearities up to the third order. The authors indicate that the physics-based model is overcomplicated and requires knowledge of multiple kinetic parameters. They suggest considering a Boolean network instead but at the same time point to its over-simplicity. Since the TM-PNN has the same representation capacity as the ODEs and is represented simply by weights, it can potentially resolve this complexity-accuracy trade-off.

The paper [AC] presents a system of ODEs for the stability analysis and control of the nonlinear dynamics of an articulated car-trailer system. The authors point out possible instabilities in motion and control. The TM-PNN could be used for predictive control in real operating conditions. After initialization from the equations of vehicle motion that are known by design, the TM-PNN can be fine-tuned with sensor-based observations continuously in time while guaranteeing the physical consistency of the model. The symplectic regularization is also applicable to this example of a Hamiltonian system.

6.2 Cavitation as an example of non-polynomial and time-dependent ODEs

This example briefly demonstrates the application of the TM-PNN to dynamical systems that are described by ODEs with non-polynomial and time-dependent right-hand sides. Cavitation is the formation of gas or vapor bubbles in a liquid [cavitation1989]. The growth, collapse, and rebound of a cavitation bubble traveling along the flow is governed by the Rayleigh–Plesset equation, here in a simplified driven form:

$$R\ddot R + \frac{3}{2}\dot R^2 = \frac{1}{\rho}\left(p_a \sin(\omega t) - \frac{2S}{R} - \frac{4\mu \dot R}{R}\right), \qquad (19)$$

where $R$ is the bubble radius, the overdot is the derivative with respect to time, $S$ is the surface tension, $\mu$ is the viscosity, $\omega$ is the driving frequency, $p_a$ is the amplitude of the driving pressure, and $\rho$ is the density of the liquid. For simplicity, we consider $S$, $\mu$, and $\rho$ as constant parameters. Though Equation (19) contains non-polynomial nonlinear functions and even depends on time directly, it can be represented in a form that allows one to apply the Taylor mapping technique. Indeed, after the introduction of the new variables $u_1 = R$, $u_2 = \dot R$, $u_3 = 1/R$, $u_4 = \sin(\omega t)$, $u_5 = \cos(\omega t)$, Equation (19) can be presented in polynomial form with a five-dimensional state vector:

$$\dot u_1 = u_2, \quad \dot u_2 = -\frac{3}{2}u_2^2 u_3 + \frac{1}{\rho}\left(p_a u_3 u_4 - 2S u_3^2 - 4\mu u_2 u_3^2\right), \quad \dot u_3 = -u_2 u_3^2, \quad \dot u_4 = \omega u_5, \quad \dot u_5 = -\omega u_4. \qquad (20)$$

The system (20) is equivalent to (19) and allows the translation to the TM-PNN directly. The new variables play the role of new features that should be additionally constructed and incorporated into the TM-PNN. This means that, given an oscillation from an experimental analysis, the TM-PNN can be used for identifying a true model that represents the real bubble oscillation for new unseen conditions.
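A minimal sketch of the polynomial system (20) follows (assuming scipy; all parameter values are illustrative placeholders, not the paper's). Since the right-hand side is polynomial and autonomous up to the harmonic driving variables, it can be translated to a Taylor map by the algorithm of Sec. 2:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters: density, surface tension, viscosity,
# driving frequency and driving pressure amplitude.
rho, S, mu, omega, p_a = 1000.0, 0.0725, 1e-3, 2 * np.pi * 20e3, 1e4

def rp_polynomial(t, u):
    """Polynomial form (20) of the Rayleigh-Plesset equation with
    state u = (R, dR/dt, 1/R, sin(wt), cos(wt))."""
    u1, u2, u3, u4, u5 = u
    du2 = -1.5 * u2**2 * u3 + (p_a * u4 * u3 - 2 * S * u3**2 - 4 * mu * u2 * u3**2) / rho
    return [u2, du2, -u2 * u3**2, omega * u5, -omega * u4]

R0 = 1e-4                                    # initial bubble radius (illustrative)
u0 = [R0, 0.0, 1.0 / R0, 0.0, 1.0]
sol = solve_ivp(rp_polynomial, (0.0, 5e-4), u0, rtol=1e-9, atol=1e-12, max_step=1e-6)
```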

7 Results and Further Work

Since Taylor maps can be used for solving systems of ODEs with the necessary level of accuracy, the TM-PNN model is suitable for solving inverse problems of system identification from measured data. We first demonstrate that the TM-PNN can successfully recover the general solution of an ODE from a particular solution, with the examples of free fall and Lotka–Volterra oscillations. For the problem of the free fall of a body, we demonstrate that surrogate models require lots of data to represent the dynamics of the system. The existing and well-developed NN models are also difficult to apply to recovering physics from one solution of a dynamical system. In both cases, traditional and state-of-the-art ML and NN models provide unsatisfactory performance for unseen inputs beyond the training data. For the considered examples, traditional ML and NN models are difficult to apply, while the proposed TM-PNN can be trained from scratch with limited data.

On the other hand, if the dynamics of a system approximately follows a given differential equation, the Taylor mapping technique can be used to initialize the weights of the TM-PNN. This allows fine-tuning of the model from one training sample of the real system dynamics. We demonstrate in practice for the real pendulum and the X-ray source that the proposed technique allows recovering the physical properties of the systems from noisy and partial observations. In these examples, we use prior but inaccurate physical knowledge about the system to initialize the TM-PNN. This initial approximation of the weights is then fine-tuned with one measurement to represent the real system dynamics.

The symplectic regularization is suggested for Hamiltonian systems. The symplectic property is utilized to preserve the physical properties of the TM-PNN during training. The considered regularization penalties differ significantly from the traditional L1 or L2 norms used in the ML field. Further comparison and investigation of this topic are required for the learning of dynamical systems.

One of the directions for the development of the proposed model is its utilization for the automatic derivation of physics-based models. The key point is that the TM-PNN tuned with data corresponds to some unknown system of ODEs. If it is possible to translate the TM-PNN to ODEs, this can result in a new physics-based model. Using the algorithm described in (1)-(3), one can potentially solve a boundary problem and identify the system of ODEs that is approximately equivalent to the Taylor maps extracted from the trained TM-PNN. The translation of a system of ODEs into the TM-PNN implies truncation of the map (1) and requires solving a new differential equation (3) for the weights. In contrast to this, the inverse problem will probably involve an integral equation along with the truncation of the right-hand side of the ODE system (2).

8 Conclusion

The paper proposes an approach to incorporate physics-based models into the TM-PNN architecture. This allows preserving a priori physical knowledge in the NN and fine-tuning it with one sample. The physics-based structure of the TM-PNN also provides the possibility of easily introducing physical constraints for the model, which was demonstrated with the example of symplectic regularization.

If nothing is known about the dynamical system in terms of even approximate ODEs, the option of collecting data and training a state-of-the-art black-box model on large data sets remains available. The paper does not compare the proposed approach with such methods and NN architectures because of differences in the problem formulations. Also, while training a state-of-the-art NN model with only one time series of 246 stamps of particle accelerator dynamics and achieving physically accurate long-term predictions for 500 new unseen time series may be possible in theory, such tasks have not been solved before in the community and require additional effort in a separate study.

In contrast to this, a huge number of physics-based models in the form of ODEs have been developed over the years to describe processes in mechanics, robotics, thermodynamics, fluid mechanics, and other fields. The paper discusses a clear approach for transferring these physics-based models to NNs while avoiding time-consuming numerical solvers and big training data sets. To process noisy data, the multi-step architecture of the TM-PNN is used: the input of the TM-PNN is only an initial state vector, while the dynamics in time is predicted based on previous predictions.

Based on the theoretical equivalence of ODE systems and the TM-PNN, the paper demonstrates in practice that fine-tuning the TM-PNN from one sample works not only for a simple pendulum but also for a complex particle accelerator with noisy, limited, and partial observations. The cavitation example demonstrates that the TM-PNN architecture can be widely applied to physical systems with non-polynomial and time-dependent ODEs in areas other than accelerator physics.

Since the connection between ODE systems and the TM-PNN is presented here for the first time in terms of one-shot learning of dynamical systems, further research on the accuracy, performance, and limitations of the proposed method should be conducted. Also, the symplectic regularization for Hamiltonian systems should be investigated in more detail and compared with other regularization penalties, as this may be helpful for solving and speeding up real physics problems. How accurate the initial assumption on the system of ODEs must be, and how complex the TM-PNN architecture must be in terms of nonlinear orders or the number of hidden layers for a given physical problem, also remain open questions.

References