1 Introduction
Understanding the behavior of dynamical systems is a fundamental problem in science and engineering. Classical approaches to modeling dynamical systems involve formulating ordinary or partial differential equations (ODEs or PDEs) from physical principles such as conservation laws, together with reasoning, intuition, and domain knowledge, and then verifying them with experiments and observations. Recent successes of machine learning methods in complex sequence prediction tasks, along with developments in sensor technologies and computing systems, motivate predicting the evolution of dynamical systems directly from observation data, without rigorous formalization and experiments by human experts Sutskever et al. (2014); Chung et al. (2014); Xingjian et al. (2015); Finn et al. (2016); Kumar et al. (2016); Lee et al. (2017); Minderer et al. (2019). Consequently, a number of machine learning models that incorporate knowledge from physics or applied mathematics have been introduced for modeling complex dynamical systems Schaeffer (2017); Rudy et al. (2017); Raissi (2018); Long et al. (2018b, a); de Bezenac et al. (2019).
Real-world dynamical systems are often subjected to perturbation from time-varying external sources. Figure 1(a) shows an example where an elastic membrane under tension is perturbed by an independent time-varying pressure. The undulation of the membrane can be observed as a regularly sampled spatiotemporal sequence. However, the spatiotemporal variation in the source term is often neither known nor observable; yet it couples with the wave propagation system to determine the undulation of the membrane. Moreover, although the basic governing dynamics (the wave equation) of the system is known, physical parameters such as the propagation speed in that particular medium are often unknown. In such scenarios, we need to be able to predict the spatiotemporal evolution of dynamical systems from partial knowledge of the governing dynamics and limited observability of the factors that influence the system behavior.
In this paper, we consider modeling a generic PDE-based dynamical system that is perturbed by external sources following another, independent dynamics. Our goal is to design a neural network model that can be used for long-term prediction of the entire system, as well as of the source dynamics. We assume the physical quantity that follows the combined dynamics can be observed as a regularly sampled spatiotemporal sequence, but the source or perturbation is not separately observable. It is further assumed that we know what type of system we are observing, and therefore scientific knowledge of such a system can be incorporated in the model. In particular, we assume that the analytical form of the underlying PDE is known a priori, but the physical parameters of the system are unknown.
We propose Physics-Incorporated Convolutional Recurrent Neural Networks (PhICNet) that combine physical models with data-driven models to learn the behavior of dynamical systems with an unobservable time-varying external source term. Figure 1(b) shows the high-level diagram of a PhICNet cell. The basic concept of PhICNet rests on two key contributions. First, a generic homogeneous PDE of any temporal order can be mapped into a recurrent neural network (RNN) structure with trainable physical parameters. Second, the basic RNN structure for the homogeneous PDE can be modified and integrated with a residual encoder-decoder network to identify and learn the source dynamics. The RNN structure stores the homogeneous solution of the underlying PDE as an internal state, which is then compared with the input (observation) at the next step to determine whether some source term or perturbation exists. The residual encoder-decoder network learns to propagate the source term as-is in the case of constant perturbation, or to predict its progression in the case of dynamic perturbation. The PhICNet cell stores the estimated perturbation or source term as an internal state which can be used to understand the behavior of the source dynamics. The integrated model can be trained end-to-end using the stochastic gradient descent (SGD) based backpropagation through time (BPTT) algorithm.
A few existing models in the literature can be used or extended to perform spatiotemporal sequence prediction for dynamical systems with a time-varying independent source. Purely data-driven models like ConvLSTM Xingjian et al. (2015) and residual networks He et al. (2016); Mao et al. (2016) can be used directly, but these models do not account for the underlying physical dynamics, resulting in limited accuracy. Furthermore, these purely data-driven models cannot identify the source dynamics separately. Deep hidden physics models (DHPM) Raissi (2018) can model the underlying homogeneous PDE, but do not consider any nonlinear source term. A polynomial approximation can be added to DHPM to model a nonlinear internal source; this strategy is used in PDENet Long et al. (2018b) to incorporate a nonlinear source term. Although they consider only an internal source term that is a nonlinear function of the observed physical quantity, we discuss later how a minor modification allows us to extend the model to consider independent source dynamics. DHPM and PDENet consider only PDEs that are first-order in time, although they can be extended to model systems of higher temporal order if the temporal order is known a priori.
Our approach fundamentally differs from past approaches: we model the system as an RNN that couples the known and unknown dynamics within the hidden cells and enables end-to-end training. We evaluate our model along with other baselines on two types of dynamical systems: a heat diffusion system and a wave propagation system. Experiments show that our model provides more accurate predictions than the baseline models. Furthermore, we show that our model can predict the source dynamics separately, which is not possible with the other models.
2 Related Work
2.1 RNNs for Dynamical Systems
Several studies have interpreted RNNs as approximation of dynamical systems Funahashi and Nakamura (1993); Li et al. (2005); Liao and Poggio (2016); Chen et al. (2018). Recently, a number of RNN architectures have been proposed for datadriven modeling of dynamical systems. Trischler and D’Eleuterio (2016)
proposed an algorithm for efficiently training RNNs to replicate dynamical systems and demonstrated its capability to approximate attractor dynamical systems. A class of RNNs, namely TensorRNNs, has been proposed in
Yu et al. (2017a, b) for long-term prediction of nonlinear dynamical systems. Yeo and Melnyk (2019) use LSTM for long-term prediction of nonlinear dynamics.
2.2 Learning PDEs from Data
Recently, numerous attempts have been made at data-driven discovery of PDE-based dynamical systems. Schaeffer (2017); Rudy et al. (2017) use sparse optimization techniques to choose the best candidates for the unknown governing equations from a library of possible partial derivatives. Raissi and Karniadakis (2018) proposed a method to learn scalar parameters of PDEs using Gaussian processes. A deep neural network is introduced in Raissi et al. (2017) to approximate the solution of a nonlinear PDE. The predicted solution is then fed to a physics-informed neural network to validate that solution. The physics-informed neural network is designed based on the explicit form of the underlying PDE, which is assumed to be known except for a few learnable scalar parameters. Raissi (2018) extended Raissi et al. (2017) by replacing the known-PDE-based neural network with a generalized neural network which discovers the dynamics of the underlying PDE using the predicted solution and its derivatives. The inputs of this neural network are the partial derivatives up to a maximum order, which is treated as a hyperparameter.
Long et al. (2018b) introduced PDENet, which uses trainable convolutional filters to perform differentiation. Filters are initialized as differentiating kernels of the corresponding orders and trained under constraints that maintain the differentiating property. They assumed that the maximum order of derivative is known a priori. In PDENet 2.0 Long et al. (2019), a symbolic neural network is integrated with the original PDENet to uncover more complex analytical forms. de Bezenac et al. (2019) proposed a convolutional neural network (CNN) that incorporates prior scientific knowledge for the problem of forecasting sea surface temperature. They design their model based on the general solution of the advection-diffusion equation.
Long et al. (2018a) studied a problem similar to ours where the source or perturbation term of the PDE follows another dynamics. They mapped the known PDE into a cellular neural network with trainable physical parameters and integrated it with a ConvLSTM Xingjian et al. (2015) that models the source dynamics. However, they assumed that the source or perturbation is observable, and they trained the two networks separately.
3 Problem Description
We consider dynamical systems governed by the following generic inhomogeneous PDE:

(1) \frac{\partial^n u}{\partial t^n} = F\left(u, \nabla u, \nabla^2 u, \ldots; \Theta\right) + s(x, t)

where u(x, t) is the observed physical quantity at spatial location x at time t, n is the temporal order of the system, and \Theta corresponds to the physical parameters of the system. For example, if we are studying a diffusive system, then \Theta corresponds to the diffusivity of the medium. s(x, t) is the source term or perturbation at location x at time t, which is governed by another independent first-order dynamics delineated by:

(2) \frac{\partial s}{\partial t} = G\left(s, \nabla s, \nabla^2 s, \ldots\right)
Our goal is to predict the spatiotemporal evolution of the system jointly defined by equation 1 and equation 2.
Assumptions We make the following assumptions about the problem:

We have a priori knowledge about what type of physical quantities we are observing and how such a system behaves in the absence of any external perturbation or source. In other words, we know the analytical form of the governing function in equation 1 and the temporal order of the system.

Physical parameters of the system are not known to us.

The source term is constant or follows an unknown first-order (temporal) dynamics. The perturbation is not observable and cannot be computed directly from the observed quantity, since the physical parameters are unknown.
This problem can be formulated as a spatiotemporal sequence prediction problem. Suppose the observation space is discretized into an H × W grid, and U_t denotes the observed map at timestep t. We aim to design a physics-incorporated convolutional RNN M:
(3) \hat{U}_{t+1} = M\left(U_t, U_{t-1}, \ldots, U_{t-n}\right)

such that the loss between the observed map U_{t+1} and the predicted map \hat{U}_{t+1} is minimized, where n is the temporal order of the system. It is noteworthy that we need (n+1) previous maps, instead of just n, to predict the next map in an n-th order (temporal) system, because the source/perturbation is unknown and follows an independent first-order dynamics.

4 The PhICNet Model: Foundation and Design
4.1 Background on Recurrent Neural Networks
The recurrent neural network (RNN) is an elegant generalization of feedforward neural networks that incorporates the temporal dynamics of data Rumelhart et al. (1986); Werbos (1990); Schuster and Paliwal (1997). The RNN and its various evolved topologies have proven efficacious in numerous sequence modeling tasks Pearlmutter (1989); Robinson (1994); Hochreiter and Schmidhuber (1997); Sutskever et al. (2014); Xingjian et al. (2015). At each time step, an input vector x_t is fed to the RNN. The RNN modifies its internal state h_t based on the current input and the previous internal state. The updated internal state is then used to predict the output y_t. The following set of equations (equation 4) delineates the computation inside a standard RNN:

(4) h_t = \phi\left(W_{hh} h_{t-1} + W_{xh} x_t + b_h\right), \qquad y_t = \sigma\left(W_{hy} h_t + b_y\right)
W_{hh}, W_{xh}, and W_{hy} are the weight matrices of the RNN, and b_h and b_y are bias vectors. \phi and \sigma are activation functions such as tanh. The temporal update of the internal state allows the RNN to make use of past information while predicting the current output. The input x_t, internal state h_t, and output y_t of the standard RNN are all 1D vectors, and the operations are fully-connected. To deal with 2D image data, Xingjian et al. proposed convolutional LSTM Xingjian et al. (2015), which uses convolutional operations in place of the fully-connected operations of standard LSTM Hochreiter and Schmidhuber (1997), an evolved variant of the RNN. Incorporating convolutional operations in the RNN, we can write the computation inside a convolutional RNN as follows:
(5) H_t = \phi\left(W_{hh} * H_{t-1} + W_{xh} * X_t + b_h\right), \qquad Y_t = \sigma\left(W_{hy} * H_t + b_y\right)

where X_t, H_t, and Y_t are the input, internal state, and output, respectively, of the convolutional RNN at time step t, and are all 2D images. ‘*’ denotes the convolution operator.
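To make equation 5 concrete, the update of a single-channel convolutional RNN cell can be sketched in a few lines of NumPy. The 3×3 kernels, tanh internal activation, identity output activation, and all function names below are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np

def conv2d(image, kernel):
    """'Same' 2D convolution with zero padding (implemented as
    cross-correlation, which suffices when kernels are trainable)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def conv_rnn_step(X_t, H_prev, W_hh, W_xh, W_hy, b_h=0.0, b_y=0.0):
    """One update of the convolutional RNN in equation 5:
    H_t = tanh(W_hh * H_{t-1} + W_xh * X_t + b_h)
    Y_t = W_hy * H_t + b_y   (identity output activation)."""
    H_t = np.tanh(conv2d(H_prev, W_hh) + conv2d(X_t, W_xh) + b_h)
    Y_t = conv2d(H_t, W_hy) + b_y
    return H_t, Y_t
```

Note that, unlike the fully-connected RNN of equation 4, the spatial dimensions of the maps are preserved through every step, which is what lets the internal state be interpreted as a field over the grid.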
4.2 Proposed RNN Model for Generic Homogeneous PDE
Figure 2(a) illustrates the structure of the RNN we propose for modeling a generic homogeneous PDE (i.e., zero source term). Inputs to the RNN cell are 2D observation maps. The RNN cell keeps a memory that stores the past information required for the current-step prediction. The concept of cell memory was introduced in LSTM Hochreiter and Schmidhuber (1997). Cell memory in LSTM is controlled by several self-parameterized gates that learn what information to store and what to forget. In contrast, the past information to be stored in the cell memory of our physics-incorporated RNN is completely determined beforehand based on the known temporal order (in equation 1) of the observed system. For an n-th order (temporal) system, the cell memory stores the current and past observed maps. At time step t, the state of the cell memory (we will call it the cell state from now on) is defined by the following equation:
(6) 
where ⊕ denotes the concatenation operation along a new dimension. The cell state can be seen as a 3D tensor. The cell state at the current time step can be written as a function of the previous cell state and the current input:
(7) 
The matrix in equation 7 is a 2D square matrix whose elements are defined as follows.
(8) 
This operator denotes a matrix-tensor product resulting in a tensor such that
(9) 
The operator between a 2D observation map and a 1D vector performs a vector-matrix product to yield a tensor such that
(10) 
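Conceptually, the cell-state update of equations 7–10 maintains a buffer of the (n+1) most recent observation maps: the oldest map is dropped and the newest input is appended at each step. A minimal NumPy sketch of such a shift-and-append buffer follows; the function name and the newest-first ordering are assumptions for illustration.

```python
import numpy as np

def update_cell_state(C_prev, U_t):
    """Shift-and-append update of the cell memory: drop the oldest
    observed map and prepend the newest one, so the state always
    holds the (n+1) most recent maps (newest first)."""
    return np.concatenate([U_t[None], C_prev[:-1]], axis=0)
```

Because the contents of the memory are fixed by the temporal order of the PDE, this update needs no learned gates, unlike the LSTM cell memory mentioned above.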
The cell state and the input are used to compute the homogeneous solution as follows.
(11) 
The vector in equation 11 is determined by the temporal order of the dynamics. Its elements are the coefficients of the past observation maps in the finite difference approximation of the temporal derivative, and are given by the following equation.
(12) 
This operator denotes a vector-tensor product resulting in a 2D matrix such that
(13) 
The function in equation 11 is the implementation of F (in equation 1) for discretized observation maps. As mentioned in section 3, the analytical form of F is known to us, but the physical parameters are unknown and trainable. Spatial derivatives of the observation map are computed as convolutions with differential kernels, where each differential kernel corresponds to one spatial derivative in F. The size and elements of a differential kernel are determined by the finite difference approximation of the corresponding derivative. The output of equation 11 represents the solution of the system when it is governed by a homogeneous PDE; in other words, it corresponds to the predicted map at the next timestep when there is no source term or perturbation.
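As a worked example of this homogeneous prediction, consider a second-order (wave-type) system on a unit-spaced grid: the temporal coefficients (2, −1) of the two stored maps come from the finite difference approximation of the second time derivative, and the spatial term is a convolution with the 5-point Laplacian kernel. The step size, propagation speed, and zero-padded (Dirichlet) boundary handling below are illustrative assumptions.

```python
import numpy as np

def laplacian(u):
    """5-point finite-difference Laplacian with zero (Dirichlet) padding."""
    p = np.pad(u, 1)
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u

def homogeneous_step_order2(u_t, u_tm1, c=1.0, dt=0.1):
    """Homogeneous prediction for a 2nd-order (wave-type) system:
    u_{t+1} = 2*u_t - u_{t-1} + (c*dt)^2 * lap(u_t),
    i.e. past-map coefficients from the temporal finite difference
    plus a differential-kernel convolution for the spatial part."""
    return 2.0 * u_t - u_tm1 + (c * dt) ** 2 * laplacian(u_t)
```

In PhICNet the coefficient c is a trainable physical parameter, whereas the Laplacian kernel itself is fixed by the finite difference scheme.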
4.3 Approach to Incorporate Source Dynamics
The basic RNN structure for the homogeneous PDE needs to be modified to incorporate a dynamic source term. Figure 2(b) shows the modified structure (PhICNet). An internal state is added in the cell to estimate the perturbation from the homogeneous solution predicted at the previous step and the current input; it is computed as follows.
(14) 
Finally, the predicted map is computed by the following equation.
(15) 
The function here, the implementation of G (in equation 2) for discretized source maps, captures the source dynamics. We use a residual convolutional network for this purpose such that
(16) 
The predicted source map is added to the homogeneous solution to obtain the predicted map (equation 15).
The motivation behind using a residual network to model the source dynamics is that several studies have established the connection between residual networks and differential equations Weinan (2017); Lu et al. (2018); Chang et al. (2018); Chen et al. (2018). We use an architecture similar to the residual encoder-decoder network (REDNet) Mao et al. (2016). Figure 3 shows the architecture with convolutional and transposed convolutional blocks. Each convolutional block consists of two convolutional layers; similarly, each transposed convolutional block comprises two transposed convolutional layers. The convolutional encoder extracts features at different scales. These feature maps are used by the transposed convolutional decoder, with symmetric skip connections from the corresponding convolutional blocks, to capture dynamics at different scales. Skip connections also allow using a deeper network for complex dynamics without encountering the vanishing gradient problem.
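The identity property that skip connections provide can be seen in a toy sketch: with the residual branch weight at zero, the block reduces to the identity map, which is what keeps gradients flowing through very deep stacks. The scalar "convolutions" below stand in for the real convolutional layers purely for illustration.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: a two-layer residual branch (scalar weights
    w1, w2 standing in for convolutions, ReLU in between) plus a skip
    connection. With w2 = 0 the block is exactly the identity map."""
    relu = lambda z: np.maximum(z, 0.0)
    return x + w2 * relu(w1 * x)
```

Stacking many such blocks never loses the input signal, because each block perturbs rather than replaces its input; the same reasoning underlies the residual-network/ODE connection cited above.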
The computation at a convolutional block is given by the following equation.
(17) 
The computation at a transposed convolutional block, counted from the end, is delineated by the following equation.
(18) 
The output of the final transposed convolutional block gives the predicted source map; the operators in equation 18 denote the transposed convolution operation and the activation function.
4.4 Training Loss
For a sequence of observation maps and an n-th order (temporal) system, the prediction loss is defined as follows.
(19) 
The source map estimated after observing the map at timestep t should match the source map predicted at the previous timestep. Accordingly, we add a source prediction loss to the training objective.
(20) 
Furthermore, the source map can be densely distributed or sparse (it may contain only a single source). To deal with source map sparsity, we add a penalty:
(21) 
The overall loss for training is the sum of the prediction loss, the source prediction loss, and the weighted sparsity penalty, where the penalty weight is a hyperparameter.
5 Experimental Evaluation
We evaluate our model on two dynamical systems: a heat diffusion system and a wave propagation system. The heat diffusion system has a temporal order of 1, while the wave propagation system is a second-order system. We compare our model with ConvLSTM Xingjian et al. (2015), PDENet Long et al. (2018b), and a residual encoder-decoder network. As mentioned before, PDENet cannot be used directly for this problem because of the dynamic source term. In the original PDENet, the authors use a polynomial approximation of the observed physical quantity to model an internal source term. As the source term in our problem is external and dynamic, we instead perform the polynomial approximation on the difference between two consecutive observation maps; we call this modified model PDENet*. In contrast to the residual encoder-decoder used in our model to predict only the source dynamics, the baseline residual encoder-decoder network (REDNet) models the combined dynamics. Accordingly, the inputs and outputs of the baseline residual encoder-decoder network are observation maps. We use for residual encoder-decoder networks. For the PDENet* and REDNet baselines, we assume the temporal order of the system is known a priori, i.e., for an n-th order system, the input to the models is the sequence of the (n+1) most recent observation maps when predicting the next map. We employ the Adam optimizer (with an initial learning rate of 0.001) to train the models.
All the models are implemented in the PyTorch framework Paszke et al. (2019). We run all the experiments on a computer equipped with an NVIDIA GTX 1080 Ti GPU.
5.1 Heat Diffusion System
Heat diffusion at the surface of a material is described by:

(22) \frac{\partial u}{\partial t} = \alpha \nabla^2 u + s

where u(x, y, t) is the heat density at location (x, y) at time t and s(x, y, t) is the perturbation due to heat source(s). \alpha is the thermal diffusivity of the material. Equation 22 is one of the fundamental PDEs and is used to describe the diffusion of heat and chemicals, Brownian motion, diffusion models of population dynamics, and many other phenomena Strauss (2007).
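For reference, one explicit (forward-Euler) step of a diffusion equation of this form on a unit-spaced grid with homogeneous Dirichlet boundaries can be sketched as follows; the step sizes and the diffusivity value are illustrative, and the exact numerical scheme used to generate the dataset is not specified here.

```python
import numpy as np

def heat_step(u, s, alpha=0.1, dt=0.1):
    """One forward-Euler step of equation 22 on a unit-spaced grid:
    u_{t+1} = u_t + dt * (alpha * lap(u_t) + s_t),
    with zero (homogeneous Dirichlet) boundary values."""
    p = np.pad(u, 1)
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    return u + dt * (alpha * lap + s)
```

Note that the source term s enters additively at every step, which is exactly why a model that only fits the homogeneous dynamics cannot reproduce such sequences.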
The computation space is discretized into a regular mesh. For the heat source, we consider a source map divided into 16 equal-sized blocks initialized with random values. All grid points belonging to a block take the same value at any time step, and the evolution of the source map happens at the block level. Each block follows a dynamics given by:
(23) 
where the terms in equation 23 denote, respectively, the value of a block at timestep t, a constant coefficient, and the 4-connected neighborhood of the block. Figure 4 shows an example of the source map at the initial and final time steps. Training and test datasets are generated using a numerical solution method, starting from the initial condition and assuming homogeneous Dirichlet boundary conditions. Each sequence comprises frames, and the training set contains such sequences while the test set contains .
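One plausible reading of the block-level dynamics in equation 23 is diffusive coupling, where each block moves toward its 4-connected neighbors at a constant rate. The sketch below is an assumption for illustration, not the exact update used in the paper.

```python
import numpy as np

def source_block_step(W, lam=0.05):
    """One update of a block-level source dynamics (an assumed reading
    of equation 23): each block moves toward its 4-connected neighbors
    at rate lam. W is the 4x4 grid of block values; boundary blocks
    simply have fewer neighbors."""
    H, V = W.shape
    W_next = W.copy()
    for i in range(H):
        for j in range(V):
            nbrs = [W[a, b]
                    for a, b in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
                    if 0 <= a < H and 0 <= b < V]
            W_next[i, j] = W[i, j] + lam * sum(v - W[i, j] for v in nbrs)
    return W_next
```

A diffusive coupling of this kind has two easy-to-check properties: a uniform map is a fixed point, and the total of the block values is conserved, since each pairwise exchange is antisymmetric.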
In this system, the trainable parameters are the diffusivity and the parameters of the residual encoder-decoder network used to model the source dynamics. Since we need second-order spatial derivatives (equation 22), the minimum size of the corresponding differential kernels is 3 × 3. Specifically, the following two differential kernels are used to compute the spatial derivatives in equation 11:

(24) K_{xx} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & -2 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad K_{yy} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & -2 & 0 \\ 0 & 1 & 0 \end{bmatrix}
We use Signal-to-Noise Ratio (SNR), defined in equation 25, as the metric for quantitative comparison between the different models and ground truth:

(25) \mathrm{SNR} = 10 \log_{10} \frac{\sum_{i,j} U_t(i,j)^2}{\sum_{i,j} \left( U_t(i,j) - \hat{U}_t(i,j) \right)^2}
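Assuming the standard form of SNR, equation 25 measures the power of the ground-truth signal relative to the power of the prediction error, in decibels. A direct NumPy implementation:

```python
import numpy as np

def snr_db(U_true, U_pred):
    """Signal-to-noise ratio (in dB) between a ground-truth map and a
    predicted map: power of the true signal over the power of the
    prediction error."""
    signal = np.sum(U_true ** 2)
    noise = np.sum((U_true - U_pred) ** 2)
    return 10.0 * np.log10(signal / noise)
```

Higher values indicate better predictions; every 10 dB corresponds to a tenfold reduction in relative error power.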
Figure 5(a) shows the SNR comparison over time steps for the heat diffusion system. A qualitative comparison of the heat maps predicted by the different models, along with ground truth, is depicted in Figure 6(a). PhICNet outperforms all the baselines; REDNet is the best-performing baseline. Effective modeling of dynamics by REDNet is a key factor in the performance of our model as well, since we use it for source dynamics modeling. The source maps predicted by our model are compared with ground truth in Figure 7(a).
5.2 Wave Propagation System
Undulation in a stretched elastic membrane due to some perturbation can be described by:

(26) \frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u + s

where u(x, y, t) is the deflection at location (x, y) at time t and s(x, y, t) is the external perturbation. c is the wave propagation speed. We assume a circular perturbation at the center of the membrane which follows a dynamics given by:
(27) 
where the constant in equation 27 is positive.
As with the heat diffusion system, the computation space is discretized into a regular mesh. Unlike the source map considered for the heat system, the source map for this wave system is sparse, as the perturbation is applied only over a small area at the center. The perturbation is initialized with random values. Training and test datasets are generated using a numerical solution method, starting from the initial condition and assuming homogeneous Dirichlet boundary conditions. Each sequence comprises frames, and the training set contains such sequences while the test set contains . The trainable parameters for this system include the propagation speed and the parameters of the residual encoder-decoder network that models the source dynamics. The differential kernels used for this system are the same as in equation 24. Quantitative and qualitative performance of the different models on the wave propagation system are shown in Figure 5(b) and Figure 6(b), respectively. Figure 7(b) compares the source maps predicted by our model with ground truth.
5.3 Online Learning of Timevarying Physical Parameters
In all the aforementioned experiments, we have considered unknown but constant physical parameters, which are learned in conjunction with the unknown source dynamics. However, the physical parameters of real-world physical systems are often not fixed and can change over time. In this experiment, we vary a physical parameter of the system over time and investigate whether the trained model can adapt to new values of the physical parameter online. At each time step, the current observation is used to re-tune the physical parameters of the system, while the other parameters are kept frozen. Figure 8 shows that our model can quickly adapt to changes in physical parameters.
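The online re-tuning step can be illustrated for the heat system: with the source network frozen, the single physical parameter (the diffusivity) is re-fit by gradient descent on the one-step prediction error of the latest observed transition. The learning rate, step count, and explicit-Euler forward model below are illustrative assumptions.

```python
import numpy as np

def retune_alpha(u_t, u_next, alpha0, dt=0.1, lr=0.5, steps=500):
    """Online re-tuning of one physical parameter (diffusivity alpha),
    all other parameters frozen: gradient descent on the one-step
    squared error of the explicit heat update u_t + dt*alpha*lap(u_t)."""
    p = np.pad(u_t, 1)
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u_t
    alpha = alpha0
    for _ in range(steps):
        err = u_t + dt * alpha * lap - u_next
        alpha -= lr * np.mean(2.0 * err * dt * lap)  # d(MSE)/d(alpha)
    return alpha
```

Because the one-step error is quadratic in alpha, this descent converges to the value that generated the transition whenever the forward model matches the data.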
6 Conclusion
We developed a physics-incorporated recurrent neural network, PhICNet, for spatiotemporal forecasting of dynamical systems with a time-varying independent source. Besides forecasting the combined dynamics, our model is also capable of predicting the evolution of the source dynamics separately. PhICNet generalizes to a class of generic PDEs perturbed with a source that follows an independent first-order dynamics. It is possible to incorporate higher-order source dynamics by combining estimated source maps from the past few time steps for the current-step prediction of the source map; we aim to investigate this in future work. Furthermore, the assumption of knowing the analytical form of the underlying PDE can be relaxed by allowing trainable convolution kernels, as in PDENet.
References

Reversible architectures for arbitrarily deep residual neural networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pp. 6571–6583.
Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
Deep learning for physical processes: incorporating prior scientific knowledge. Journal of Statistical Mechanics: Theory and Experiment 2019 (12), pp. 124009.
Unsupervised learning for physical interaction through video prediction. In Advances in Neural Information Processing Systems, pp. 64–72.
Approximation of dynamical systems by continuous time recurrent neural networks. Neural Networks 6 (6), pp. 801–806.
Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
Ask me anything: dynamic memory networks for natural language processing. In International Conference on Machine Learning, pp. 1378–1387.
DESIRE: distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 336–345.
Approximation of dynamical time-variant systems by continuous-time recurrent neural networks. IEEE Transactions on Circuits and Systems II: Express Briefs 52 (10), pp. 656–660.
Bridging the gaps between residual learning, recurrent neural networks and visual cortex. arXiv preprint arXiv:1604.03640.
HybridNet: integrating model-based and data-driven learning to predict evolution of dynamical systems. In Conference on Robot Learning, pp. 551–560.
PDENet 2.0: learning PDEs from data with a numeric-symbolic hybrid deep network. Journal of Computational Physics 399, pp. 108925.
PDENet: learning PDEs from data. In International Conference on Machine Learning, pp. 3208–3216.
Beyond finite layer neural networks: bridging deep architectures and numerical differential equations. In International Conference on Machine Learning, pp. 3282–3291.
Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, pp. 2802–2810.
Unsupervised learning of object structure and dynamics from videos. In Advances in Neural Information Processing Systems, pp. 92–102.
PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pp. 8024–8035.
Learning state space trajectories in recurrent neural networks. Neural Computation 1 (2), pp. 263–269.
Hidden physics models: machine learning of nonlinear partial differential equations. Journal of Computational Physics 357, pp. 125–141.
Physics informed deep learning (part II): data-driven discovery of nonlinear partial differential equations.
Deep hidden physics models: deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research 19 (1), pp. 932–955.
An application of recurrent nets to phone probability estimation. IEEE Transactions on Neural Networks 5 (2).
Data-driven discovery of partial differential equations. Science Advances 3 (4), pp. e1602614.
Learning representations by back-propagating errors. Nature 323 (6088), pp. 533–536.
Learning partial differential equations via data discovery and sparse optimization. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473 (2197), pp. 20160446.
Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45 (11), pp. 2673–2681.
Partial differential equations: an introduction. John Wiley & Sons.
Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems.
Synthesis of recurrent neural networks for dynamical system simulation. Neural Networks 80, pp. 67–78.
A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics 5 (1), pp. 1–11.
Backpropagation through time: what it does and how to do it. Proceedings of the IEEE 78 (10), pp. 1550–1560.
Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems, pp. 802–810.
Deep learning algorithm for data-driven simulation of noisy dynamical system. Journal of Computational Physics 376, pp. 1212–1231.
Long-term forecasting using tensor-train RNNs. arXiv preprint arXiv:1711.00073.
Learning chaotic dynamics using tensor recurrent neural networks. In ICML Workshop on Deep Structured Prediction, Vol. 17.