1 Introduction
Nowcasting refers to the process of producing near-term predictions of the complex dynamical systems driving our natural world. Such systems have a significant impact on the daily lives and safety of our planet's population, with important examples including precipitation and sea surface temperature; variables crucial for flood warning and prevention, prediction of cyclogenesis, and maritime safety. A common theme is the dominance of short-term dynamics by convection – the transport of material along the streamlines of a vector field. Transport, and the reduced role of diffusion at these timescales, leads to temporal feature preservation, and consequently these systems have received increasing interest from the machine learning (ML) community [Shi et al., 2015, 2017, Mathieu et al., 2016, Vondrick et al., 2016, Agrawal et al., 2019].
The classical approach to solving these problems in the natural sciences is to specify a model of the underlying physical properties. A properly calibrated mechanistic model, described by parameterised differential operators, is able to provide future predictions and generalise to new scenarios, provided this change of physics can be encoded by some parameter. In the lingua franca of ML we would comment on the remarkable generalisation and transferability of fundamental physical theories specified by mathematical models, a demonstration of "the unreasonable effectiveness of mathematics" [Wigner, 1960].
Nevertheless, a successful physical theory often requires multiple simplifying assumptions, removing many of the complexities of real-world phenomena; done correctly, this reduction allows the theory to focus on the most salient qualities of the system. However, this reduction can be insufficient for detailed prediction in the most complex open-world systems we experience in nature, or else require expensive numerical models [Madec, 2008]. This complexity, combined with the existence of the persistent, informative features that characterise convection, has led to the development of deep learning (DL) approaches, notably the dynamics-based approaches of [Shi et al., 2015, 2017, Mathieu et al., 2016, Vondrick et al., 2016], or the images-to-images type approaches adopted by [Ayzel et al., 2019, Lebedev et al., 2019, Agrawal et al., 2019]. As DL methods continue to advance, it becomes an open question whether one should still attempt to embed mechanistic structure into our models, or whether one ought to go "physics free" [Agrawal et al., 2019]. We suggest this stance is information inefficient, ignoring the vast prior scientific knowledge acquired by human endeavour. This viewpoint has led to the emergence of physics-informed ML [Raissi et al., 2017, de Bezenac et al., 2018], which aims to combine the expressive power of modern ML methods with the representative power of fundamental physical models. Typically these employ DL surrogates of the input–output map, ensuring this map is either constructed to obey some physical principle [Cohen et al., 2019, Weiler and Cesa, 2019], or else regularised by a guiding mechanistic model [Raissi et al., 2017, Sirignano and Spiliopoulos, 2018, Berg and Nyström, 2017, Raissi et al., 2019, Yazdani et al., 2019, Zhu et al., 2019].
In this work we add to the argument that physical principles still have an important role to play in modelling, by introducing a new physically informed method motivated by two of the guiding principles behind a successful mathematical analysis of a complex dynamical problem: (i) the projection of a high-dimensional model onto a lower-dimensional representation, and (ii) the existence of a transformation that brings the dynamics closer to linear. We use the feature extraction prowess of DL to perform our reduction, while using physical principles to guide the evolution of these features, constructing a flexible system that is well suited to modelling convective phenomena by ensuring that deep nonlinear components are strongly linked with the minimal physical dynamics.
In summary, we use the powerful feature extraction capabilities of modern ML to perform localised model order reduction by projection onto a subspace spanned by deeply-extracted orthogonal features. We then show how one can combine flexible methods with minimal physical representations to advance the dynamics in this reduced space; crucially, we construct our method so as to enable a tighter link between the DL-based encoding and the dynamics than previous approaches. Finally we demonstrate that this synthesis of DL feature extraction and minimal physics outperforms both existing model-free approaches and state-of-the-art hybrid approaches on complex real-world datasets.
2 Background
The classical approach to model-based forecasting predicts the evolution of a variable using past records to calibrate a mechanistic model. These models are typically complex systems of partial differential equations (PDEs), which must be sufficiently rich to embed all relevant physical knowledge. These systems can be used to describe the change in intensity of some target variable, for example sea surface temperature or precipitation rate. Two of the most important physical processes describing these changes are diffusion, a smoothing transformation, and transport, volume-preserving translations; we now provide a brief review of the mathematical models encoding these effects.
∂u/∂t + w · ∇u = D Δu    (2a)
∂w/∂t + (w · ∇)w = ν Δw − ∇p + f(u) ê_z    (2b)
∇ · w = 0    (2c)
2.1 Models for transport and diffusion
Our goal is to introduce a flexible family of models for the evolution of a field variable u(x, t), indexed by spatial coordinates x in a region Ω, and temporal coordinate t. A starting point for describing diffusion and transport is the PDE

∂u/∂t + w · ∇u = D Δu,    (1)

which is parameterised by a diffusion coefficient D, governing smoothing over time, and a transport vector field w, describing the direction of bulk motion. This is a (pseudo-)linear PDE and so in principle easy to solve. However, this only defers modelling complexity, because one must still motivate an appropriate transport vector field. Mathematically, the most well-studied models jointly describing convective transport are the system (2) of Rayleigh–Bénard equations [Saltzman, 1962, Getling and Spiegel, 1998]. However, solving this system across multiple spatial and temporal scales, even for known parameters, presents a formidable challenge.
Nevertheless, this motivating idea of forming a conditionally linear PDE has been successfully applied by [de Bezenac et al., 2018]. Their approach uses data to inform the vector field w, and then uses the Gaussian integral kernel obtained by solving (1) to motivate an integral-convolution update of the current image of the field variable. They propose to discard the nonlinear components (2b)–(2c) of the system (2) when advancing the solution, then reincorporate them in the supervised per-time-step loss function
L(û, u) = Σ_t d(û_t, u_t) + λ ‖∇ · w_t‖²    (3)
where d(·, ·) is a loss function on the space of pixel images, û_t is the predicted surface and u_t is the actual surface. Ultimately, this assumes the linear PDE (2a) is sufficient to capture the dynamics in the original space; instead we shall attempt to project the dynamics onto a space where this assumption becomes more plausible. First, however, we briefly review how to numerically solve a PDE like (1) or (2).
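To make the integral-convolution idea concrete, the sketch below (our own toy illustration, not the authors' released code) advances a small 2-D field one step under the advection-diffusion PDE (1): each output pixel averages the field against a Gaussian kernel centred at the departure point x − w(x), with kernel width set by the diffusion coefficient D. The function name and discretisation are assumptions.

```python
import numpy as np

def gaussian_warp_step(u, w, D, dt=1.0):
    """Advance a 2-D field one step of the advection-diffusion PDE by
    averaging against a Gaussian (heat) kernel shifted along the motion field.
    u : (H, W) field; w : (H, W, 2) displacement field (y, x); D : diffusion."""
    H, W = u.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    out = np.empty_like(u)
    var = 2.0 * D * dt                       # heat-kernel variance 2*D*dt
    for i in range(H):
        for j in range(W):
            # kernel centred at the departure point x - w(x) * dt
            cy, cx = i - w[i, j, 0] * dt, j - w[i, j, 1] * dt
            k = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * var))
            out[i, j] = (k * u).sum() / k.sum()   # normalised average
    return out
```

With a sharply peaked kernel (small D) and a constant unit x-displacement, a point mass simply translates one pixel, which matches the intuition of transport dominating diffusion on short horizons.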
2.2 RitzGalerkin discretisation of PDEs
It is typically necessary to solve a PDE problem like (1) numerically via discretisation. One powerful method of doing so is to first multiply the problem by a test function φ, and then integrate to form a variational problem. It is typical to choose test functions which vanish on the boundary, and so one arrives at the classical variational form of the PDE problem: find u ∈ V such that
∫_Ω (∂u/∂t) φ dx + ∫_Ω (w · ∇u) φ dx + D ∫_Ω ∇u · ∇φ dx = 0    (4)
for any test function φ; this is the weak form of the classical problem (1). In order to numerically implement this idea one must also replace the trial and test spaces by finite-dimensional subspaces, and we take V_n = span{φ_1, …, φ_n}. While there are multiple choices of basis functions, for example the nodal basis functions of the FEM [Reddy, 2005], in this work we shall consider sets of orthonormal basis functions. That is, we specify a collection of basis functions such that ⟨φ_i, φ_j⟩ = δ_ij, where ⟨·, ·⟩ is the inner product. We then search for solutions to (4) with representation u(x, t) = Σ_i z_i(t) φ_i(x), where z = (z_1, …, z_n) is a vector of unknown coefficients to be determined. Inserting this representation into (4) we achieve a finite-dimensional projection of the dynamical problem as an ordinary differential equation (ODE)
dz/dt = A_Φ(D, w) z    (5)
This is the Ritz–Galerkin projection of the dynamics onto the subspace V_n. We use the notation A_Φ to make it clear that, whereas the classical operator on the RHS of (1) depended only upon the coefficient functions parameterising it, the projected problem further depends on the ability of the basis functions Φ = {φ_i} to capture information about the problem, entwining the encoding and the dynamics. The less informative the basis functions, the larger the dimension n needed to faithfully reproduce the dynamics in the function space. In what follows we refer to the process of forming the state dynamic matrix in (5) as the Assembly operation, which involves performing quadratures to compress the parameter fields and basis functions into an n × n matrix.
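As a concrete illustration of the Assembly operation, the sketch below (a finite-difference toy of our own, not the paper's implementation) applies a discretisation of the operator L u = D Δu − w · ∇u to each basis image and forms A_ij = ⟨φ_i, L φ_j⟩ by quadrature on a unit-volume domain.

```python
import numpy as np

def assemble(phi, w, D, h=1.0):
    """Assemble the n x n state-dynamic matrix A of (5), A_ij = <phi_i, L phi_j>,
    for L u = D*Laplacian(u) - w . grad(u), with the basis stored as
    phi[k] : (H, W) images and derivatives taken by finite differences."""
    n = len(phi)
    A = np.zeros((n, n))
    for j in range(n):
        gy, gx = np.gradient(phi[j], h)                       # grad components
        lap = np.gradient(gy, h, axis=0) + np.gradient(gx, h, axis=1)
        Lphi = D * lap - (w[..., 0] * gy + w[..., 1] * gx)    # apply operator
        for i in range(n):
            A[i, j] = (phi[i] * Lphi).mean()   # quadrature inner product
    return A
```

On a sine basis with zero transport the diagonal of A is negative, reflecting the dissipative character of pure diffusion.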
2.3 Proper orthogonal decompositions
Given an ensemble of field variables over a domain Ω, the proper orthogonal decomposition (POD) is a technique for extracting an informative modal basis, that is a low-dimensional basis which captures most of the information or "energy" of the ensemble. The decomposition in POD is more familiar to the machine learning community under the name principal components analysis (PCA), or alternatively the Karhunen–Loève decomposition [Karhunen, 1946, Loeve, 1945], and involves reconstructing elements of the ensemble as u ≈ Σ_i ⟨u, φ_i⟩ φ_i, where {φ_i} are the first n eigenfunctions of the empirical covariance matrix of the ensemble, ordered by largest eigenvalue.
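In matrix form the POD/PCA construction can be sketched as follows; this is a generic implementation for flattened field snapshots, and the helper names are ours.

```python
import numpy as np

def pod_basis(snapshots, n):
    """Extract the first n POD modes from an ensemble of flattened fields.
    snapshots : (m, d) array of m observations; returns (d, n) orthonormal modes."""
    X = snapshots - snapshots.mean(axis=0)        # centre the ensemble
    # right singular vectors of X = eigenvectors of the empirical covariance
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n].T

def pod_reconstruct(u, modes, mean):
    """Project u onto the modes and reconstruct: mean + modes @ coefficients."""
    c = modes.T @ (u - mean)
    return mean + modes @ c
```

For an ensemble that is exactly low-rank, retaining that many modes reconstructs every snapshot, which is the optimal linear-reconstruction property referred to below.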
The idea of using the POD eigenfunctions as an "empirical basis" onto which to perform the above Ritz–Galerkin projection for modelling turbulent flows was introduced in [Lumley, 1981]. However, the optimality results concern POD as a linear reconstructor, and do not transfer to any optimality on the dynamic prediction problem [Berkooz et al., 1993]. Therefore, in this work we shall instead use deep networks to extract our orthogonal subspace; motivated by the POD idea, however, we shall still attempt to include a version of this optimal reconstruction property to motivate the regulariser in our supervised loss.
3 Methodology
Our objective will be to estimate a future length-τ sequence of realisations of the process starting from time t, using only information coming from the length-ℓ history process. In fact, we shall also discretise the domain as a pixelated grid, and instead aim to estimate the vectorised field variable. That is, we aim to learn an images-to-images map which also embodies some minimal dynamical representation. To do so we introduce a DL approach to building a localised version of the POD basis discussed above.

Deep Galerkin Features
In order to project the inputs onto a reduced space it is required to construct a set of functions {φ_i} such that ⟨φ_i, φ_j⟩ = δ_ij; again we shall work with the vectorised images of these functions and replace the orthogonality condition with a quadrature approximation. Furthermore, and without loss of generality, we can rescale the domain to have unit volume, so that we seek a collection of vectors φ_i such that the orthonormality holds under the Euclidean inner product on the space of vectorised images.
Any given input u can then be projected onto an n-dimensional space by the Galerkin projection u ↦ z, where z is the vector with coefficients z_i = ⟨u, φ_i⟩, for i = 1, …, n. While the POD method discussed in the previous section constructs this basis using the complete set of inputs, we wish to perform our projection using only the ℓ most recent inputs, relying on the power of DL approaches to extract sufficient information from this reduced set. Our chosen feature extractor will be the ubiquitous U-Net [Ronneberger et al., 2015], owing to its demonstrated successes in performing feature extraction from image data. Since we seek n basis functions, we shall consider architectures which take as input a length-ℓ image sequence and output a stack of n feature images. Our GalerkinBasis model is then a composition of this map with an orthogonalisation of the outputs, and we write
Φ = Orthonormalise(UNet(u_1, …, u_ℓ))    (6)
Functionally, we have constructed a basis of vectors {φ_i} which depend only on temporally local values; an overview of the composed process is represented in Fig. 1.
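The orthogonalisation step can be implemented with a QR (Gram–Schmidt) factorisation on the flattened feature channels; the sketch below is our own minimal version, and the unit-volume rescaling convention is an assumption.

```python
import numpy as np

def orthonormalise(features):
    """Map n raw feature images (e.g. channels output by a U-Net) to an
    orthonormal set under the unit-volume quadrature inner product.
    features : (n, H, W) -> (n, H, W) with mean(phi_i * phi_j) = delta_ij."""
    n, H, W = features.shape
    F = features.reshape(n, H * W).T      # columns = flattened feature images
    Q, _ = np.linalg.qr(F)                # thin QR = Gram-Schmidt on columns
    Q *= np.sqrt(H * W)                   # rescale so mean(phi_i*phi_j) = delta_ij
    return Q.T.reshape(n, H, W)
```

Since QR is differentiable (away from rank deficiency), gradients can flow back through this step to the feature extractor during training.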
Linear latent convection model
The GalerkinBasis provides our desired nonlinear transformation of the input sequence onto a linear space parameterised by z, using only information up to time t. For the remainder of the prediction horizon we wish to forward-solve the problem using the physically informed convection dynamics on a linear space determined by the extracted features. We shall assume that in this space the dynamics are reasonably well described by the linear PDE (1), and use our learned features to carry out a projection of the dynamics onto a local process obtained via (5).
It remains to parameterise the dynamics with a diffusion coefficient and a transport vector field. We shall refer to the model component performing this parameterisation as the Convection model, and we shall also allow it to be informed by the previous observations. Once this component has been specified, we shall advance the latent state according to an updating scheme of the form
(D_t, w_t) = Convection(z_{t−k}, …, z_{t−1})    (7a)
A_t = Assemble(Φ, D_t, w_t)    (7b)
z_t = ODESolve(A_t, z_{t−1})    (7c)
for t = ℓ+1, …, ℓ+τ. First, the previous values of the latent process are used to build new parameters of the transport model (7a); these are then combined with the features to assemble the state dynamic matrix (7b); and finally the linear ODE (5) is solved to give the new state (7c). This process is repeated until a complete set of latent states is obtained. Our general approach to parameterisation of the transporting physical model is therefore similar to that in [de Bezenac et al., 2018], but crucially our approach propagates the dynamics in the latent space by way of the Galerkin projection, rather than applying them to the complete surface. This leads to an important difference: since any surface now has a finite-dimensional representation, our convection step (7a) may be taken as a function of the low-dimensional latent states. That is, we can consider a decoder which takes the lagged latent states and outputs the diffusion and transport field over Ω; this is the top-right block of Fig. 1(b).
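The three-step loop just described can be sketched end-to-end. In the sketch below, `convection` is a stand-in callable (in the model it is a learned network), the local `_assemble` helper is a finite-difference toy of our own, and a simple explicit Euler step stands in for an exact linear-ODE solve; all names are illustrative.

```python
import numpy as np

def _assemble(phi, w, D, h=1.0):
    # project L u = D*Lap(u) - w.grad(u) onto the feature images phi[k]
    n = len(phi)
    A = np.zeros((n, n))
    for j in range(n):
        gy, gx = np.gradient(phi[j], h)
        Lphi = D * (np.gradient(gy, h, axis=0) + np.gradient(gx, h, axis=1)) \
               - (w[..., 0] * gy + w[..., 1] * gx)
        for i in range(n):
            A[i, j] = (phi[i] * Lphi).mean()
    return A

def rollout(z0, phi, convection, horizon, dt=0.1):
    """Advance the latent state for `horizon` steps: parameterise the
    dynamics, assemble the state matrix, then integrate the linear ODE."""
    z, states = z0, []
    for _ in range(horizon):
        D, w = convection(z)                 # step one: new (D_t, w_t)
        A = _assemble(phi, w, D)             # step two: state dynamic matrix
        z = z + dt * A @ z                   # step three: Euler step of the ODE
        states.append(z)
    return np.stack(states)
```

Because only the small convection component and an n x n linear solve are called inside the loop, the per-step cost scales with the latent dimension rather than the image size.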
Conversely, the approach taken in [de Bezenac et al., 2018] requires one to first encode the surfaces, and then decode them again to estimate the motion field; see Fig. 1(a). While this necessarily creates an implicit latent representation as a byproduct of the encode/decode step, this representation has no further role to play in the model dynamics. In contrast, since we have already performed encoding through the Galerkin projection, we require no further compression, allowing the latent representations produced by our approach to feature more directly in the dynamical description; see Fig. 1(b).
In summary, our model first uses the output of the GalerkinBasis to construct a set of orthogonal feature images, and then uses the Ritz–Galerkin method to project the dynamics onto this space. The encoded variables are then passed to the Convection model to determine the dynamics; notably, while the convection model in Fig. 1(a) must account for all of the transformation applied to an input image, our model is able to share the dynamical description with the features through the projection (5). This allows for a more parsimonious parameterisation of the Convection model, and since only this smaller model is called repeatedly during the loop procedure (7) we achieve a more memory-efficient implementation. Recognising the partition of our method into an orthogonal decomposition, and then a convection-informed forward integration, we shall refer to it as a Deep Orthogonal Decomposition for Convection model, or simply DOD-Convection.
Reconstruction and supervised loss
Finally, having created a full set of latent states, we may use our Galerkin features to reconstruct the process. Note that we have both the historical states z_1, …, z_ℓ, and the future states z_{ℓ+1}, …, z_{ℓ+τ}; we further recall from our discussion of the POD method in Section 2.3 that, when combined with the Galerkin features, the historical states should do a good job of reconstructing the inputs, but that the POD method produces the optimal such reconstruction. Together, this partition into historical and future states suggests the following decomposition of our supervised loss function
û_t = Φ z_t,  t = 1, …, ℓ + τ    (8a)
L = Σ_{t=ℓ+1}^{ℓ+τ} [ d(û_t, u_t) + λ_div ‖∇ · w_t‖² ] + λ_rec Σ_{t=1}^{ℓ} d(û_t, u_t)    (8b)
The first two terms of (8b) are analogous to those in (3), playing a similar role with regards to measuring the accuracy of the prediction through a metric d, while regularising the divergence of the output of the Convection model to encourage a valid divergence-free vector field.
The final term is specific to our method, and addresses the optimality condition of the POD basis as discussed in Section 2.3; in particular, we commented that when used to reconstruct an ensemble the POD basis gives the minimal reconstruction error for a given dimension of the subspace. The corresponding coefficient therefore controls the extent to which our local basis decomposition attempts to recover a version of this optimality condition. This trade-off helps ensure that the GalerkinBasis continues to have an identifiable role as an encoder/decoder of the surfaces, given its additional role in assembling the discretised differential operator. This dual role of the features is one of the principal strengths of our method, allowing dynamical components lost by the reduction to the overly simplistic PDE (1) to be recaptured by the DL features, at the expense of the optimality of these same features as reconstructors of the surface.
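A hedged sketch of this loss decomposition follows, with mean-squared error standing in for the metric d and finite differences for the divergence penalty; the function name, weights, and array layout are illustrative assumptions.

```python
import numpy as np

def dod_loss(pred, target, w_fields, recon, inputs, lam_div=1.0, lam_rec=1.0):
    """Prediction term + divergence penalty + POD-style reconstruction term.
    pred/target : (T, H, W) future surfaces; w_fields : (T, H, W, 2) transport
    fields (y, x components); recon/inputs : (L, H, W) historical surfaces."""
    pred_term = ((pred - target) ** 2).mean()
    # divergence of each transport field: d(w_y)/dy + d(w_x)/dx
    div = (np.gradient(w_fields[..., 0], axis=1)
           + np.gradient(w_fields[..., 1], axis=2))
    div_term = (div ** 2).mean()                 # encourage divergence-free flow
    rec_term = ((recon - inputs) ** 2).mean()    # optimal-reconstruction term
    return pred_term + lam_div * div_term + lam_rec * rec_term
```

A perfect forecast with a constant (hence divergence-free) transport field and exact historical reconstruction attains zero loss, so each term penalises exactly one of the three failure modes discussed above.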
4 Experiments
We now apply our method to the large-scale convective systems encountered in the climate sciences, in order to assess the accuracy and computational efficiency of our method compared to alternatives. All experiments were conducted on a single Nvidia GeForce RTX 2060 SUPER GPU with 8GB of memory. For additional experiments and visualisations see the supplementary material.
4.1 Convective sea-surface temperature
As in [de Bezenac et al., 2018] we demonstrate our model on the SST data obtained from the NEMO (Nucleus for European Modelling of the Ocean) engine [Madec, 2008]. We extract the 64 by 64 pixel subregions shown in Fig. 3, and use data from 2016–2018 to train the model, giving a total of 11935 acquisitions. We then test on data from the year 2019 using the same regions, giving 3894 test instances. The regions were selected to capture the complicated nonlinear mixing that occurs as warm water is transported along the Gulf Stream and meets the colder water of the Arctic regions. We assume that on the short-term horizons we are interested in, the information within a region is sufficient for prediction, and do not incorporate information from adjacent regions. Data is normalised to remove trends on a yearly timescale by taking the mean and standard deviation over the day of the year.
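This day-of-year de-trending can be sketched as follows; it is a minimal version assuming per-day climatological statistics computed over the training years, and the function names are ours.

```python
import numpy as np

def day_of_year_stats(frames, days):
    """Per-day-of-year mean and standard deviation over a training set.
    frames : (N, H, W) fields; days : (N,) integer day-of-year labels."""
    stats = {}
    for d in np.unique(days):
        sel = frames[days == d]
        # small epsilon guards against zero variance on sparsely observed days
        stats[int(d)] = (sel.mean(), sel.std() + 1e-8)
    return stats

def normalise(frame, day, stats):
    """Remove the climatological trend for the frame's day of the year."""
    mu, sd = stats[int(day)]
    return (frame - mu) / sd
```

At test time the statistics computed on the training years would be reused, so the normalisation leaks no information from the test period.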
We compare the method we have introduced to the physics-informed model of [de Bezenac et al., 2018]; to a convolutional LSTM (ConvLSTM) introduced by [Shi et al., 2015, 2017], which enhances LSTM networks with a spatial convolution to better model spatiotemporal processes; and finally, to compare with "physics-free" approaches, we use a U-Net as proposed in [Ayzel et al., 2019, Lebedev et al., 2019, Agrawal et al., 2019], which treats prediction as an images-to-images problem and makes no attempt to embed recurrent dynamical or physical structure. All models were implemented in TensorFlow
[Abadi et al., 2015], apart from [de Bezenac et al., 2018] for which we use publicly available code.¹

¹A PyTorch [Paszke et al., 2019] implementation for the model of [de Bezenac et al., 2018] is available at https://github.com/emited/flow

Table 1:

Model                                              | Average Score (MSE) | No. of Parameters | Run-Time [s]
ConvLSTM [Shi et al., 2017]                        | 0.2179              | 1,779,073         | 0.43
UNet [Lebedev et al., 2019, Agrawal et al., 2019]  | 0.1452              | 31,032,446        | 0.79
Flow [de Bezenac et al., 2018]                     | 0.1344              | 22,197,736        | 0.60
DODConvec (ours)                                   |                     | 10,106,339        | 0.48
Results are displayed in Table 1. Examining the test error for each method, we see that our method demonstrates superior performance, with a lower test error than both the purely data-driven approaches and the more dynamic alternatives. As discussed in Sec. 3, our approach leads to a more parsimonious parameterisation of the convection field than [de Bezenac et al., 2018]. With regards to run-time, the U-Net model is faster and more memory-efficient than the alternatives, including our own, which make use of a recurrent structure to advance the predictions. In spite of this, our method still demonstrates a competitive run-time, with the increased accuracy a more than acceptable trade-off.
In Fig. 4 we plot a representative sequence from the test set, in which we observe that our method (row three) seems to be better at identifying the "whorl" pattern formations that characterise turbulence. The U-Net feature extraction also does a good job of identifying these features, but only on a limited time horizon, with the structure degrading as the sequence continues; this is most noticeable in the loss of structure of the inflowing cold flux in the top-right of the images in row four. On the other hand, the method of [de Bezenac et al., 2018] does a better job of capturing and preserving linear features; this is likely because that method is ultimately solving a linear PDE (2a), and identifying a convection model that will do the "violence" [Berkooz et al., 1993] of a nonlinear model from data alone is hard. By projecting to a latent space, and allowing this linear space to be adaptively determined by nonlinear transformations of the local inputs, we are better able to recover nonlinear features with a simpler convection model; and while we do pay a visible price in the smoothness of our reconstructions, overall we achieve better performance, as presented in Tab. 1.
5 Related work
Spatiotemporal dynamical modelling. Traditional dynamical modelling approaches have been based either on optical flows [Fleet and Weiss, 2006, Woo and Wong, 2017], which estimate the transport vector field by examining the differenced input sequence, or else on numerical models [Béréziat and Herlin, 2015], where the transporting field emerges as the output of a calibrated physical model. More principled physical models have the ability to display complex long-range spatiotemporal dependencies, but deriving and then calibrating such models is significantly harder than the DL and hybrid methods we consider.
Physics free deep learning. As noted by [Agrawal et al., 2019], "physics free" deep learning approaches can be broadly divided into attempts to model the time evolution as a temporal sequence, or else those which treat the problem as learning an unknown "image(s)-to-images" map. The former include the convolutional LSTM model of [Shi et al., 2015, 2017], as well as video generation methods [Mathieu et al., 2016, Vondrick et al., 2016]. For the latter option, U-Net-based approaches have been popular [Ayzel et al., 2019, Lebedev et al., 2019], with [Agrawal et al., 2019] arguing that the overly simplistic assumptions in optical flow, or poorly specified numerical models, are often unviable, so that free-form DL methods display superior performance to methods tied to a misspecified physical model. While our work does use an overly simplistic model of the dynamics, by projecting onto the latent space we are able to ensure this simplification does not overly constrain the resulting predictions.
Physics informed deep learning. Aside from the work of [de Bezenac et al., 2018], which we discussed in Section 2, we also mention the more complicated U-Net architecture inspired by multiscale approximations to the Navier–Stokes equations in [Wang et al., 2020]; since this model uses physical information to design a specific U-Net architecture there is some overlap with the already mentioned images-to-images methods. Finally, we note that we have advanced the solution to the problem by solving something which closely approximates the solution of a PDE. Alternative approaches [Raissi et al., 2017, Berg and Nyström, 2017, Zhu et al., 2019] have avoided this explicit solve step and instead use flexible DL surrogates of the output map, with the PDE operators reappearing as regularisers. This is a much more dramatic regularisation than the loss we have considered, and we are unaware of this complete operator regularisation scheme having been applied to convective nonlinear systems at the scale considered in our experiments.
6 Discussion
In this work we have combined the powerful feature extraction capabilities of DL with minimal physical representations of the dynamics, introducing a physically informed model demonstrating superior predictive performance on the convective physical processes that dominate the atmospheric and fluid transport problems pervading our natural world. To the extent that this model is "physically informed", we have a priori specified only a minimal representation of the dynamics; we justify this by our desire to avoid overly strong, and so impossible to justify, assumptions on the complex generating mechanism, thereby maintaining as much as possible the ability of the model to learn the underlying dynamics from data. Crucial to this has been our effort to ensure that the latent representations formed by DL encoding are more strongly entwined with the dynamics of the process than in previous approaches, leading to a model that demonstrates superior predictive performance, and improved recovery of temporally persistent nonlinear features.
Broader Impact
Climate change manifests not only as long-term trends, but also as increased short-term volatility of these processes, and as such the development of near-term prediction methods which can be rapidly evaluated and recalibrated is only increasing in importance. Accurate near-term prediction of rapidly changing events is vital for public safety in situations including extreme flooding, cyclone formation, and forest fires – all systems possessing the informative transport features which would allow our method to be successfully applied. These applications will continue to add to the evidence we have already provided that physical principles still have an important role to play in designing models, and argue against the "physics free" philosophy. We demonstrate that there is much to be gained from strongly connecting learned latent representations with global model dynamics, achieving not only interpretable latent spaces, but also interpretable dynamics on these spaces. The resulting models demonstrate better performance and have a clearer physical picture, and we envision our work continuing to invite methodological study of the generalisation benefits to be obtained by this stronger entwining between encoded representations and the original data-space dynamics.
Despite the progress we have made, predicting these systems remains a hard problem, and it is unlikely that any one method will work best in all scenarios, so appropriate aggregation strategies should be adopted. However, our discussion of the sea surface predictions in Sec. 4 demonstrates that our model does a better job of predicting certain nonlinear patterns in convective flows, and so will be an effective complement to those existing methods which perform well on more linear flows. Finally, while we have presented an effective supervised learning method, it will be important to extend the work we have done to a generative probabilistic framework to provide further uncertainty quantification.
References
M. Abadi et al. (2015). TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
S. Agrawal et al. (2019). Machine learning for precipitation nowcasting from radar images. arXiv:1912.12132.
G. Ayzel et al. (2019). All convolutional neural networks for radar-based precipitation nowcasting. Procedia Computer Science 150, pp. 186–192.
D. Béréziat and I. Herlin (2015). Coupling dynamic equations and satellite images for modelling ocean surface circulation. Communications in Computer and Information Science 550, pp. 191–205.
J. Berg and K. Nyström (2017). A unified deep artificial neural network approach to partial differential equations in complex geometries. arXiv:1711.06464.
G. Berkooz, P. Holmes, and J. L. Lumley (1993). The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics 25(1), pp. 539–575.
T. Cohen et al. (2019). Gauge equivariant convolutional networks and the icosahedral CNN. In Proceedings of the 36th International Conference on Machine Learning, PMLR 97, pp. 1321–1330.
E. de Bézenac, A. Pajot, and P. Gallinari (2018). Deep learning for physical processes: incorporating prior scientific knowledge. In International Conference on Learning Representations.
D. Fleet and Y. Weiss (2006). Optical flow estimation. In Handbook of Mathematical Models in Computer Vision, N. Paragios, Y. Chen, and O. D. Faugeras (Eds.).
A. V. Getling (1998). Rayleigh–Bénard convection: structures and dynamics. Columbia University, New York.
K. Karhunen (1946). Zur Spektraltheorie stochastischer Prozesse. Annales Academiæ Scientiarum Fennicæ, Series A.
V. Lebedev et al. (2019). Precipitation nowcasting with satellite imagery. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '19, pp. 2680–2688.
M. Loève (1945). Fonctions aléatoires de second ordre. Comptes Rendus de l'Académie des Sciences 220.
J. L. Lumley (1981). Coherent structures in turbulence. In Transition and Turbulence, R. E. Meyer (Ed.), pp. 215–242.
G. Madec (2008). NEMO ocean engine. Technical report, Institut Pierre-Simon Laplace (IPSL).
M. Mathieu, C. Couprie, and Y. LeCun (2016). Deep multi-scale video prediction beyond mean square error. In 4th International Conference on Learning Representations, ICLR 2016.
A. Paszke et al. (2019). PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8026–8037.
M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019). Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
M. Raissi, P. Perdikaris, and G. E. Karniadakis (2017). Physics informed deep learning (part I): data-driven solutions of nonlinear partial differential equations. arXiv:1711.10561.
J. N. Reddy (2005). An introduction to the finite element method. McGraw-Hill Education, New York.
O. Ronneberger, P. Fischer, and T. Brox (2015). U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pp. 234–241.
B. Saltzman (Ed.) (1962). Selected papers on the theory of thermal convection, with special application to the Earth's planetary atmosphere. Dover.
X. Shi et al. (2015). Convolutional LSTM network: a machine learning approach for precipitation nowcasting. CoRR.
X. Shi et al. (2017). Deep learning for precipitation nowcasting: a benchmark and a new model. In Advances in Neural Information Processing Systems 30, pp. 5617–5627.
J. Sirignano and K. Spiliopoulos (2018). DGM: a deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, pp. 1339–1364.
C. Vondrick, H. Pirsiavash, and A. Torralba (2016). Generating videos with scene dynamics. In Advances in Neural Information Processing Systems 29, pp. 613–621.
R. Wang et al. (2020). Towards physics-informed deep learning for turbulent flow prediction.
M. Weiler and G. Cesa (2019). General E(2)-equivariant steerable CNNs. In Advances in Neural Information Processing Systems 32, pp. 14334–14345.
E. P. Wigner (1960). The unreasonable effectiveness of mathematics in the natural sciences. Communications on Pure and Applied Mathematics 13(1), pp. 1–14.
W. Woo and W. Wong (2017). Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere 8(5), pp. 48–68.
A. Yazdani et al. (2019). Systems biology informed deep learning for inferring parameters and hidden dynamics. bioRxiv.
 Multiradar multisensor (mrms) quantitative precipitation estimation: initial operating capabilities. Bulletin of the American Meteorological Society 97 (4), pp. 621–638. External Links: Document, Link, https://doi.org/10.1175/BAMSD1400174.1 Cited by: §B.
 Physicsconstrained deep learning for highdimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 394, pp. . External Links: Document Cited by: §1, §5.
Appendices
Appendix A Ritz–Galerkin method for PDEs
Since the forward map is unavailable in closed form it becomes necessary to solve the PDE numerically. To accompany the necessarily brief review in the main paper we provide an extended presentation of how to discretise a PDE using the model Poisson problem; see also [Reddy, 2005] for more information:

(9a) $-\Delta u = f \quad \text{in } \Omega$,
(9b) $u = 0 \quad \text{on } \partial\Omega$.

To discretise, one first multiplies (9a) by some test function $v$ and then performs an integration by parts; it is standard to further assume that the test functions vanish on the boundary, so one arrives at the classical form of the variational problem: find $u \in V$ such that

(10) $\int_\Omega \nabla u \cdot \nabla v \,\mathrm{d}x = \int_\Omega f v \,\mathrm{d}x$

for any test function $v \in \hat{V}$; this is the weak form of the classical problem (9).
To numerically implement this idea one replaces the trial and test spaces by finite-dimensional subspaces $V_h \subset V$ and $\hat{V}_h \subset \hat{V}$ respectively, with corresponding basis functions denoted by $\{\phi_j\}_{j=1}^N$ and $\{\hat{\phi}_i\}_{i=1}^N$.
Once we have specified this basis we can then search for solutions with a representation of the form $u_h = \sum_{j=1}^N w_j \phi_j$, where $\mathbf{w} = (w_1, \ldots, w_N)^\top$ is the vector of coefficients parameterising the finite-dimensional representation of $u$. Inserting this representation into (10) we arrive at the finite-dimensional version of the weak form, given by the system of equations

$\sum_{j=1}^N w_j \int_\Omega \nabla \phi_j \cdot \nabla \hat{\phi}_i \,\mathrm{d}x = \int_\Omega f \hat{\phi}_i \,\mathrm{d}x,$

which must hold for each basis vector $\hat{\phi}_i$. We may represent this in equivalent matrix–vector notation as $A\mathbf{w} = \mathbf{b}$, where the stiffness matrix and load vector are given by

(11a) $A_{ij} = \int_\Omega \nabla \phi_j \cdot \nabla \hat{\phi}_i \,\mathrm{d}x$,
(11b) $b_i = \int_\Omega f \hat{\phi}_i \,\mathrm{d}x$.
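As a concrete illustration of the assembly in (11a)–(11b), the following minimal sketch solves the one-dimensional model problem $-u'' = f$ on $(0, 1)$ with homogeneous Dirichlet boundaries, using piecewise-linear hat functions on a uniform mesh and a simple nodal quadrature for the load. This is for illustration only and is not the discretisation used in the paper:

```python
import numpy as np

def assemble_poisson_1d(n, f):
    """Assemble stiffness matrix A and load vector b for -u'' = f on (0, 1)
    with u(0) = u(1) = 0, using n interior piecewise-linear hat functions."""
    h = 1.0 / (n + 1)                       # uniform mesh width
    nodes = np.linspace(h, 1.0 - h, n)      # interior mesh nodes
    # Stiffness matrix A_ij = \int phi_j' phi_i' dx -> tridiag(-1, 2, -1) / h
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h
    # Load vector b_i = \int f phi_i dx, approximated by h * f(x_i)
    b = h * f(nodes)
    return A, b, nodes

# Solve -u'' = pi^2 sin(pi x); the exact solution is u(x) = sin(pi x)
A, b, x = assemble_poisson_1d(200, lambda x: np.pi**2 * np.sin(np.pi * x))
w = np.linalg.solve(A, b)                   # coefficients in u_h = sum_j w_j phi_j
err = np.max(np.abs(w - np.sin(np.pi * x)))
```

The nodal error decreases at second order in the mesh width, so refining the mesh quickly drives `err` towards zero.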
Following the same procedure we can arrive at a weak-form discretisation of the transport equation (1). In this case the discrete representation analogous to (11a) is given by the time-dependent matrix $A(t)$, where

(12) $A(t)_{ij} = \int_\Omega \big( \mathbf{v}(t) \cdot \nabla \phi_j \big) \hat{\phi}_i + D \, \nabla \phi_j \cdot \nabla \hat{\phi}_i \,\mathrm{d}x.$
Appendix B Network architectures
UNet architectures
We use a UNet [Ronneberger et al., 2015] both for our basis feature extractor, and in its own right as a benchmark DL method. These take the standard UNet form, being divided into an encoder and a decoder: the encoder is a sequence of computational units called Downsample Blocks and the decoder a sequence of Upsample Blocks; long-range skip connections then connect each downsample block with the corresponding upsample block in the decoding phase. The basic form of these blocks is given by

Downsample Block:

Upsample Block:
where the upsampling operation doubles the pixel resolution in both directions using nearest neighbour interpolation.
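The upsampling operation described above can be sketched in a few lines of NumPy (a minimal stand-in for a framework's nearest-neighbour upsampling layer):

```python
import numpy as np

def upsample_nearest(x):
    """Double the pixel resolution of an (H, W, C) array in both directions
    using nearest-neighbour interpolation: each pixel becomes a 2x2 block."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.arange(4.0).reshape(2, 2, 1)   # a tiny 2x2 single-channel image
y = upsample_nearest(x)               # -> shape (4, 4, 1)
```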
The UNet used in the experiments has four downsample and upsample blocks; the smaller UNet used as our feature extractor has three.
Convolution decoder
As discussed in Sec. 3, our method estimates the transport vector field and diffusion coefficient from the latent variables; this is done in two phases, first using a sequence of dense networks, and then convolutions. For the image patches used in the sea surface temperature experiment this network takes the form

Dense phase:

Convolution phase:
where Dense denotes a densely connected layer with the indicated number of features and a ReLU activation function. Similarly, Conv denotes a 2D convolution layer with the indicated number of features and kernel size. The Upsample operation is the same as for the UNet architecture just described. The Flatten operation takes the sequence of latent states and flattens them into a single vector. The output of this network is an image, which we finally split on the last dimension to produce the transport vector field and the diffusion field; the diffusion field is passed through an additional softplus layer to ensure positivity.
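The final split-and-softplus step can be sketched as follows. The three-channel layout (two channels for the transport vector field, one for the diffusion field) and the channel ordering are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def split_decoder_output(out):
    """Split an (H, W, 3) decoder output on the last dimension into a
    transport vector field (H, W, 2) and a positive diffusion field (H, W)."""
    velocity = out[..., :2]                  # transport vector field
    raw_diffusion = out[..., 2]
    # softplus ensures the diffusion coefficient is strictly positive
    diffusion = np.log1p(np.exp(raw_diffusion))
    return velocity, diffusion

out = np.random.randn(64, 64, 3)             # e.g. a decoded 64x64 image patch
v, D = split_decoder_output(out)
```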
Appendix C Further experimental details
A NEMO Sea Surface Temperature data
The NEMO (Nucleus for European Modelling of the Ocean) engine [Madec, 2008] is a state-of-the-art framework for modelling the convective processes that drive the world's oceans, including temperature, salinity, sea ice area and thickness, etc., and is provided by the Copernicus Marine Service portal with product identifier GLOBAL_ANALYSIS_FORECAST_PHY_001_024. While it is a simulation engine, it accumulates historical data to produce a synthesised estimate of the state of the system using data assimilation methods; as such the data does follow the true temperature, making this an ideal dataset on which to apply our method.
Our dataset is formed by first decomposing a larger study region into 64 × 64 pixel image patches; these are displayed in Figure 3. Our regions differ from those used in [de Bezenac et al., 2018] in that we have selected our grouping to more closely follow the movement of the Gulf Stream up the eastern seaboard of North America and then across the Atlantic, mixing with the colder water from the Arctic regions. This choice focuses the model on the regions with the complex nonlinear mixing dynamics we are most interested in.
Due to a recalibration of the GLOBAL_ANALYSIS_FORECAST_PHY_001_024 product we only have access to data from January 1st, 2016 onwards, and so cannot use the same temporal range as considered in [de Bezenac et al., 2018]. Instead we use data from 2016–2018 for training, and the data from 2019 for testing. We also partitioned our data into length-4 input sequences, predicting on the next 8 frames, leading to 11935 training examples and 3984 test examples; we test on the same set of regions as used for training.
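The partitioning into input sequences and prediction targets can be sketched as a sliding window over the frame sequence (a minimal illustration, not the paper's data pipeline):

```python
import numpy as np

def make_sequences(frames, n_in=4, n_out=8):
    """Partition a (T, H, W) array of frames into overlapping examples of
    n_in input frames and the n_out subsequent frames to be predicted."""
    T = frames.shape[0]
    inputs, targets = [], []
    for t in range(T - n_in - n_out + 1):
        inputs.append(frames[t:t + n_in])
        targets.append(frames[t + n_in:t + n_in + n_out])
    return np.stack(inputs), np.stack(targets)

frames = np.zeros((20, 64, 64))     # e.g. 20 daily 64x64 SST patches
X, Y = make_sequences(frames)       # X: (9, 4, 64, 64), Y: (9, 8, 64, 64)
```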
Additional figures. In Fig. 5 we display the predictions from our model first presented in Fig. 4, but now also including the convection field used to advance our model. This vector field displays both long-range stream lines and localised regions in which particles rotate around a centre; furthermore, the richness of all of these fields was obtained by decoding a low-dimensional vector, and not via an image-to-image transform. This parsimonious deconvolution from the latent space, combined with the ability of our method to learn complex vector fields, is the principal driver of its demonstrated success on convective fields such as the sea surface temperature data.
B Precipitation nowcasting
For this experiment we consider the problem of providing short-term precipitation forecasts from Doppler radar images, as considered in [Agrawal et al., 2019]. The data come from the Multi-Radar Multi-Sensor (MRMS) system developed by the NOAA National Severe Storms Laboratory, which provides precipitation-rate updates at 2 minute intervals at a spatial resolution of 1 km × 1 km. We use the MRMS dataset [Zhang et al., 2016] resampled to 10 minute intervals. The dataset covers the whole of the United States, but we restrict our attention to sixteen regions in the Pacific Northwest, and then further to the two wettest regions in the training set. We train on data from January 2018 and then test on the first week of November 2017, giving 8904 training instances and 2568 test instances. Following [Agrawal et al., 2019] we frame the problem as a classification problem and discretise the precipitation data using three thresholds in units of millimetres of rain per hour; the model is then trained using a cross-entropy loss targeting these pixel values.
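The thresholding into rain-rate classes can be sketched as follows. The threshold values below are hypothetical placeholders for illustration only; the actual values used in the experiment are not reproduced here:

```python
import numpy as np

def discretise_rain(rate_mm_per_hr, thresholds):
    """Map a precipitation-rate image to integer class labels using the
    given thresholds (mm/h); class k means rate in [t_{k-1}, t_k)."""
    return np.digitize(rate_mm_per_hr, thresholds)

# Hypothetical thresholds, for illustration only
thresholds = np.array([0.1, 1.0, 2.5])
rates = np.array([[0.0, 0.5],
                  [1.7, 6.0]])
labels = discretise_rain(rates, thresholds)   # classes in {0, 1, 2, 3}
```

With three thresholds the pixels fall into four ordered classes, which is what the cross-entropy target operates over.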
The resulting accuracy on the classification problem is displayed in Table 2, which demonstrates that there is little to choose between the two approaches in this instance. It would appear that once the process has been coarsened in this way there is no significant difference between the physics-free method and our physics-informed method on the precipitation data; the implications of this are the subject of further study. However, it is ultimately reassuring that our method proves no worse on this problem, and indeed significantly better on the “full-image” temperature data reported in Sec. 4.