Deep Orthogonal Decompositions for Convective Nowcasting

06/28/2020 ∙ by Daniel J. Tait, et al. ∙ The Alan Turing Institute

Near-term prediction of the structured spatio-temporal processes driving our climate is of profound importance to the safety and well-being of millions, but the pronounced nonlinear convection of these processes makes a complete mechanistic description even of the short-term dynamics challenging. However, convective transport provides not only a principled physical description of the problem, but is also indicative of the transport in time of informative features, which has led to the recent successful development of “physics free” approaches to the nowcasting problem. In this work we demonstrate that there remains an important role to be played by physically informed models, which can successfully leverage deep learning (DL) to project the process onto a lower dimensional space on which a minimal dynamical description holds. Our approach synthesises the feature extraction capabilities of DL with physically motivated dynamics to outperform existing model free approaches, as well as state of the art hybrid approaches, on complex real world datasets including sea surface temperature and precipitation.




1 Introduction

Nowcasting refers to the process of producing near-term predictions of the complex dynamical systems driving our natural world. Such systems have a significant impact on the daily lives and safety of our planet's population, with important examples including precipitation and sea surface temperature; variables crucial for flood warning and prevention, prediction of cyclogenesis, and maritime safety. A common theme is the dominance of short-term dynamics by convection – the transport of material along the streamlines of a vector field. Transport, and the reduced role of diffusion at these time-scales, leads to temporal feature preservation, and consequently these systems have received increasing interest from the machine learning (ML) community

[Shi et al., 2015, 2017, Mathieu et al., 2016, Vondrick et al., 2016, Agrawal et al., 2019].

The classical approach to solving these problems in the natural sciences is to specify a model of the underlying physical properties. A properly calibrated mechanistic model described by parameterised differential operators is able to provide future prediction and generalise to new scenarios, provided this change of physics can be encoded by some parameter. In the lingua franca of ML we would comment on the remarkable generalisation and transferability of fundamental physical theories specified by mathematical models, a demonstration of “the unreasonable effectiveness of mathematics” [Wigner, 1960].

Nevertheless, a successful physical theory often requires multiple simplifying assumptions, removing many of the complexities of real-world phenomena; done correctly, this reduction allows the theory to focus on the most salient qualities of the system. However, this reduction can be insufficient for detailed prediction in the most complex open world systems we experience in nature, or else requires expensive numerical models [Madec, 2008]. This complexity, combined with the existence of the persistent, informative features that characterise convection, has led to the development of deep learning (DL) approaches, notably the dynamics based approaches of [Shi et al., 2015, 2017, Mathieu et al., 2016, Vondrick et al., 2016], or the images-to-images type approaches adopted by [Ayzel et al., 2019, Lebedev et al., 2019, Agrawal et al., 2019]. As DL methods continue to advance it becomes an open question as to whether one should still attempt to embed mechanistic structure into our models, or if one ought to go “physics free” [Agrawal et al., 2019]. We suggest this stance is information inefficient, ignoring the vast prior scientific knowledge acquired by human endeavour. This viewpoint has led to the emergence of physics informed ML [Raissi et al., 2017, de Bezenac et al., 2018], which aims to combine the expressive power of modern ML methods with the representative power of fundamental physical models. Typically these employ DL surrogates of the input-output map, ensuring this map is either constructed to obey some physical principle [Cohen et al., 2019, Weiler and Cesa, 2019], or else regularised by a guiding mechanistic model [Raissi et al., 2017, Sirignano and Spiliopoulos, 2018, Berg and Nyström, 2017, Raissi et al., 2019, Yazdani et al., 2019, Zhu et al., 2019].

In this work we add to the argument that physical principles still have an important role to play in modelling by introducing a new physically informed method motivated by two of the guiding principles behind a successful mathematical analysis of a complex dynamical problem: (i) the projection of a high dimensional model onto a lower dimensional representation, and (ii) the existence of a transformation that brings the dynamics closer to linear. We use the feature extraction prowess of DL to perform our reduction, while using physical principles to guide the evolution of these features, constructing a flexible system that is well suited to modelling convective phenomena by ensuring that deep nonlinear components are strongly linked with the minimal physical dynamics.

In summary, we use the powerful feature extraction capabilities of modern ML to perform localised model order reduction by projection onto a subspace spanned by deeply-extracted orthogonal features. We then show how one can combine flexible methods with minimal physical representations to advance the dynamics in this reduced space; crucially, we construct our method so as to enable a tighter link between the DL-based encoding and the dynamics than previous approaches. Finally, we demonstrate that this synthesis of DL feature extraction and minimal physics outperforms both existing model free approaches and state of the art hybrid approaches on complex real world datasets.

2 Background

The classical approach to model based forecasting predicts the evolution of a variable using past records to calibrate a mechanistic model. These models are typically complex systems of partial differential equations (PDEs), which must be sufficiently rich to embed all relevant physical knowledge. These systems can be used to describe the change in intensity of some target variable, for example sea surface temperature or precipitation rate. Two of the most important physical processes describing these changes are diffusion – a smoothing transformation, and transport – volume preserving translations, and we now provide a brief review of the mathematical models encoding these effects.

(a) Pseudo-linear transport
(b) Rayleigh-Bénard convection

2.1 Models for transport and diffusion

Our goal is to introduce a flexible family of models for the evolution of a field variable indexed by spatial coordinates in a region and a temporal coordinate. A starting point for describing diffusion and transport is the PDE (1), which is parameterised by a diffusion coefficient governing smoothing over time, and a transport vector field describing the direction of bulk motion; this is a (pseudo-)linear PDE and so in principle easy to solve. However, this only defers modelling complexity, because one must still motivate an appropriate transport vector field. Mathematically, the most well studied models jointly describing convective transport are the system (2) of Rayleigh-Bénard equations [Saltzman, 1962, Getling and Spiegel, 1998]. However, solving this system across multiple spatial and temporal scales, even for known parameters, presents a formidable challenge.
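The extracted text omits equation (1) itself; a standard advection-diffusion form consistent with the description above would read (our reconstruction — the symbols $I$ for the field variable, $D$ for the diffusion coefficient, and $w$ for the transport vector field are assumptions):

```latex
\frac{\partial I}{\partial t} + w \cdot \nabla I = D \, \Delta I
```

Setting $D = 0$ recovers pure transport along the streamlines of $w$, while $w = 0$ recovers the heat equation.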

Nevertheless, this motivating idea of forming a conditionally linear PDE has been successfully applied by [de Bezenac et al., 2018]. Their approach uses data to inform the vector field, and then uses the Gaussian integral kernel obtained by solving (1) to motivate an integral-convolution update of the current image of the field variable. They propose to discard the nonlinear components (2b)–(2c) of the system (2) when advancing the solution, then reincorporate them in the supervised per-time-step loss function


in which the loss is a function on the space of pixel images, comparing the predicted surface with the actual surface. Ultimately, this assumes the linear PDE (2a) is sufficient to capture the dynamics in the original space; instead we shall attempt to project the dynamics onto a space where this assumption becomes more plausible. But first we briefly review how to numerically solve a PDE like (1) or (2).
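The loss itself is missing from the extracted text; a minimal form consistent with the surrounding description would be (our notation throughout — $\ell$ a pixel-image loss, $\hat{I}_t$ the predicted surface, $I_t$ the observed surface, $w_t$ the estimated transport field, and $\lambda$ a regularisation weight):

```latex
\mathcal{L}_t = \ell\!\left(\hat{I}_t, I_t\right) + \lambda \left\lVert \nabla \cdot w_t \right\rVert^2
```

The second term penalises the divergence of the estimated field, reincorporating the incompressibility constraint discarded from the full system.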

2.2 Ritz-Galerkin discretisation of PDEs

It is typically necessary to solve a PDE problem like (1) numerically via discretisation. One powerful method is to first multiply the problem by a test function and then integrate to form a variational problem. It is typical to choose test functions which vanish on the boundary, and so one arrives at the classical variational form of the PDE problem: find a solution such that


for any test function; this is the weak form of the classical problem (1). In order to numerically implement this idea one must also replace the trial and test spaces by finite dimensional subspaces. While there are multiple choices of basis functions, for example the nodal basis functions of the FEM [Reddy, 2005], in this work we shall consider sets of orthonormal basis functions; that is, we specify a collection of basis functions which are mutually orthonormal under the -inner product. We then search for solutions to (10) represented as a linear combination of these basis functions, with a vector of unknown coefficients to be determined. Inserting this representation into (10) we achieve a finite-dimensional projection of the dynamical problem as an ordinary differential equation (ODE)


This is the Ritz-Galerkin projection of the dynamics onto the subspace. We use this notation to make it clear that, whereas the classical operator on the RHS of (1) depended only upon the coefficient functions parameterising it, the projected problem further depends on the ability of the basis functions to capture information about the problem, entwining the encoding and the dynamics. The less informative the basis functions, the higher the dimension needed to faithfully reproduce the dynamics in the function space. In what follows we refer to the process of forming the state dynamic matrix in (5) as the Assembly operation, which involves performing quadratures to compress the parameter fields and basis functions into a matrix.
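As a concrete sketch of the Assembly operation and the resulting ODE, the following illustrates a Ritz-Galerkin projection of a one-dimensional advection-diffusion operator onto an orthonormal Fourier basis. The grid size, basis, diffusion coefficient, and transport field below are illustrative stand-ins, not the paper's configuration.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative Ritz-Galerkin assembly on a 1-D periodic grid.
n, K = 256, 8
xgrid = np.linspace(0.0, 1.0, n, endpoint=False)
dx = 1.0 / n

# Orthonormal Fourier basis images used as stand-in basis functions.
basis = [np.ones(n)]
for k in range(1, K // 2 + 1):
    basis.append(np.sqrt(2) * np.cos(2 * np.pi * k * xgrid))
    basis.append(np.sqrt(2) * np.sin(2 * np.pi * k * xgrid))
Phi = np.stack(basis[:K])                      # shape (K, n)

nu = 1e-3                                      # assumed diffusion coefficient
w = 0.5 + 0.1 * np.sin(2 * np.pi * xgrid)      # assumed transport field

def apply_operator(phi_k):
    """L[phi] = nu * phi'' - w * phi', periodic central differences."""
    dphi = (np.roll(phi_k, -1) - np.roll(phi_k, 1)) / (2 * dx)
    d2phi = (np.roll(phi_k, -1) - 2 * phi_k + np.roll(phi_k, 1)) / dx ** 2
    return nu * d2phi - w * dphi

# Assembly: quadrature compresses parameters and basis into a K x K matrix,
# A[j, k] = <phi_j, L[phi_k]>, approximated here by a grid sum.
A = np.array([[dx * np.sum(Phi[j] * apply_operator(Phi[k]))
               for k in range(K)] for j in range(K)])

# Galerkin projection of an initial field, then advance dc/dt = A c.
u0 = np.exp(-100.0 * (xgrid - 0.3) ** 2)
c0 = dx * (Phi @ u0)                           # Galerkin coefficients
c1 = expm(0.1 * A) @ c0                        # state after time 0.1
u1 = Phi.T @ c1                                # reconstruct on the grid
```

The matrix exponential solve is exact for the frozen-coefficient linear ODE; a richer basis (larger K) reproduces the grid dynamics more faithfully, at the cost of a larger Assembly.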

2.3 Proper orthogonal decompositions

Given an ensemble of field variables over a domain, the proper orthogonal decomposition (POD) is a technique for extracting an informative modal basis, that is, a low-dimensional basis which captures most of the information or “energy” of the ensemble. The decomposition in POD is more familiar to the machine learning community under the name principal components analysis (PCA), or alternatively the Karhunen-Loève decomposition [Karhunen, 1946, Loeve, 1945], and involves reconstructing elements of the ensemble using the first eigenfunctions of the empirical covariance matrix of the ensemble, ordered by largest eigenvalue.

The idea of using the POD eigenfunctions as an “empirical basis” onto which to perform the above Ritz-Galerkin projection for modelling turbulent flows was introduced in [Lumley, 1981]. However, the optimality results concern POD as a linear reconstructor, and do not transfer to any optimality on the dynamic prediction problem [Berkooz et al., 1993]. Therefore, in this work we shall instead use deep networks to extract our orthogonal subspace; however, motivated by the POD idea, we shall still attempt to include some version of this optimal reconstruction to motivate the regulariser in our supervised loss.
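The POD/PCA construction can be sketched in a few lines via an SVD of the centred snapshot matrix; the synthetic ensemble below is purely illustrative — a real application would use observed surfaces.

```python
import numpy as np

# Minimal POD/PCA sketch: extract an orthonormal modal basis from an
# ensemble of snapshot fields (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n_snapshots, n_pixels, K = 200, 64 * 64, 10

# Synthetic ensemble: a few latent modes plus small observation noise.
latent = rng.normal(size=(n_snapshots, 3))
modes = rng.normal(size=(3, n_pixels))
ensemble = latent @ modes + 0.01 * rng.normal(size=(n_snapshots, n_pixels))

# Centre the snapshots; the leading right singular vectors of the centred
# snapshot matrix are the POD eigenfunctions, ordered by "energy".
mean_field = ensemble.mean(axis=0)
_, _, Vt = np.linalg.svd(ensemble - mean_field, full_matrices=False)
pod_basis = Vt[:K]                    # (K, n_pixels), orthonormal rows

# Reconstruct a snapshot by projecting onto the POD subspace.
snap = ensemble[0] - mean_field
coeffs = pod_basis @ snap
reconstruction = mean_field + pod_basis.T @ coeffs
```

Among all rank-K linear reconstructions, this basis minimises the mean squared reconstruction error over the ensemble — the optimality property referred to above.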

3 Methodology

Our objective will be to estimate a future sequence of realisations of the process, using only information coming from a finite history of past observations. In fact, we shall also discretise the domain as a pixelated grid, and instead aim to estimate the vectorised field variable. That is, we aim to learn an images-to-images map which also embodies some minimal dynamical representation. To do so we introduce a DL approach to building a localised version of the POD basis discussed above.

Deep Galerkin Features

(c) GalerkinBasis feature extraction
(d) Reconstruction from basis features
Figure 1: (a) A local input sequence is passed through the GalerkinBasis feature extraction to produce a collection of orthogonal images. (b) An image can be projected onto the subspace and then reconstructed as a linear combination of these basis images.

In order to project the inputs onto a reduced space it is necessary to construct a set of orthonormal functions; again we shall work with the vectorised images of these functions and replace the orthogonality condition with a quadrature approximation. Furthermore, and without loss of generality, we can rescale the domain to have unit volume, so that we seek a collection of vectors which are orthonormal under the Euclidean inner product.

Any given input can then be projected onto a reduced-dimensional space by the Galerkin projection, producing a vector of coefficients given by inner products with the basis vectors. While the POD method discussed in the previous section constructs this basis using the complete set of inputs, we wish to perform our projection using only the most recent inputs, relying on the power of DL approaches to extract sufficient information from this reduced set. Our chosen feature extractor will be the ubiquitous U-Net [Ronneberger et al., 2015], owing to its demonstrated successes in performing feature extraction from image data. Since we seek a set of basis functions, we shall consider architectures which take as input an image sequence and output an image sequence of the same spatial dimensions. Our GalerkinBasis model is then a composition of this map with an orthogonalisation of the outputs, and we write
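The orthogonalisation step closing the GalerkinBasis can be sketched with a QR decomposition — one possible choice; the paper does not specify the scheme, and here random arrays stand in for the U-Net outputs.

```python
import numpy as np

# Sketch of orthogonalising K raw feature images emitted by a backbone
# network so that the discrete inner products satisfy <phi_j, phi_k> =
# delta_jk. Shapes and the QR choice are illustrative assumptions.
rng = np.random.default_rng(1)
H, W, K = 64, 64, 8

raw_features = rng.normal(size=(K, H * W))     # stand-in for U-Net outputs

# QR of the transposed feature matrix gives orthonormal columns; the rows
# of phi are then an orthonormal image basis under the Euclidean product.
Q, _ = np.linalg.qr(raw_features.T)            # Q has shape (H*W, K)
phi = Q.T

# Galerkin projection of an input image onto the K-dimensional subspace.
image = rng.normal(size=H * W)
coeffs = phi @ image                           # coefficient k = <phi_k, image>
projection = phi.T @ coeffs                    # reconstruction in image space
```

Because the projection is orthogonal, the reconstruction never has larger norm than the input, and the coefficients give an exact finite dimensional representation of the projected image.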


Functionally, we have constructed a basis of vectors which depend only on temporally local values; an overview of the composed process is represented in Fig. 1.

Linear latent convection model

The GalerkinBasis provides our desired nonlinear transformation of the input sequence onto a linear space parameterised by the extracted features, using only information up to the present time. For the remainder of the prediction horizon we wish to forward solve the problem using the physically informed convection dynamics on a linear space determined by the extracted features. We shall assume that in this space the dynamics are reasonably well described by the linear PDE (1), and use our learned features to carry out a projection of the dynamics onto a local process obtained via (5).

It remains to parameterise the dynamics with a diffusion coefficient, and a transport vector field. We shall refer to the model component performing this parameterisation as the Convection model, and we shall also allow it to be informed by the previous observations. Once this component has been specified we shall advance the latent state according to an updating scheme of the form


for each step of the prediction horizon. First, the previous values of the latent process are used to build new parameters of the transport model (7a); these are then combined with the features to assemble the state dynamic matrix (7b); and finally the linear ODE (5) is solved to give the new state (7c). This process is repeated until a complete set of latent states is obtained. Our general approach to parameterisation of the transporting physical model is therefore similar to that in [de Bezenac et al., 2018], but crucially our approach propagates the dynamics in the latent space by way of the Galerkin projection, rather than on the complete surface. This leads to an important difference: since any surface now has a finite dimensional representation, our convection step (7a) may be taken as a function of the low dimensional latent states; that is, we can consider a decoder which takes the lagged latent states and outputs the diffusion coefficient and transport field. This is the top-right block of Fig. 2(b).
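The rollout (7a)-(7c) can be sketched schematically as follows, with every learned component replaced by a simple stand-in: a "convection" map producing parameters from lagged latent states, an assembly step building the state matrix, and a linear ODE solve. All names, shapes, and the skew-symmetric assembly are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm

# Schematic rollout of the update scheme (7a)-(7c).
rng = np.random.default_rng(2)
K, horizon, dt = 8, 6, 0.1
# Orthonormal basis images (rows), standing in for GalerkinBasis output.
phi = np.linalg.qr(rng.normal(size=(256, K)))[0].T

def convection_model(z_lags):
    # (7a) stand-in: map lagged latent states to a parameter vector.
    return np.tanh(z_lags[-1])

def assemble(params):
    # (7b) stand-in: compress parameters into a K x K state matrix. A real
    # Assembly performs quadratures of the PDE operator against the basis;
    # here a skew-symmetric (energy-preserving) matrix is used to illustrate.
    M = np.outer(params, params[::-1])
    return M - M.T

z = [rng.normal(size=K)]                      # encoded history state
for t in range(horizon):
    params = convection_model(z)              # (7a)
    A = assemble(params)                      # (7b)
    z.append(expm(dt * A) @ z[-1])            # (7c) linear ODE solve

latents = np.stack(z)                         # (horizon + 1, K)
frames = latents @ phi                        # reconstruct image sequence
```

Because only the small K-dimensional state is advanced in the loop, each step is cheap; the full-resolution frames are recovered once at the end via the basis.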

Conversely, the approach taken in [de Bezenac et al., 2018] requires one to first encode the surfaces, and then decode them again to estimate the motion field, see Fig. 2(a). While this step necessarily creates an implicit latent representation as a byproduct of the encode/decode step, this representation has no further role to play in the model dynamics. In contrast, since we have already performed encoding through the Galerkin projection, we require no further compression, allowing the latent representations produced by our approach to feature more directly in the dynamical description, see Fig. 2(b).

In summary, our model first uses the output of the GalerkinBasis to construct a set of orthogonal feature images, and then uses the Ritz-Galerkin method to project the dynamics onto this space. The encoded variables are then passed to the Convection model to determine the dynamics; notably, while the convection model in Fig. 2(a) must account for all of the transformation applied to an input image, our model is able to share the dynamical description with the basis features through the projection (5). This allows for a more parsimonious parameterisation of the Convection model, and since only this smaller model is called repeatedly during the loop procedure (7), we achieve a more memory efficient implementation. Recognising the partition of our method into an orthogonal decomposition followed by a convection informed forward integration, we shall refer to it as a Deep Orthogonal Decomposition for Convection model, or simply DOD-Convection.

(a) Single time step of [de Bezenac et al., 2018]
(b) Single time-step of our DOD-Convection model
Figure 2: (a) The method of [de Bezenac et al., 2018] uses DL to estimate a motion field and then transport the solution in the original data-space; no further use is made of the latent representations. (b) Our method uses the GalerkinBasis network to encode the inputs to a latent space, and then advances the solution in the latent space using only information from the encoded variables to parameterise the Convection model, before finally reconstructing in the data-space.

Reconstruction and supervised loss

Finally, having created a full set of latent states, we may use our Galerkin features to reconstruct the process. Note that we have both the historical states and the future states. We further recall from our discussion of the POD method in Section 2.3 that, when combined with the Galerkin features, the historical states should do a good job of reconstructing the inputs, though the POD method produces the optimal such reconstruction. Together, this partition into historical and future states suggests the following decomposition of our supervised loss function


The first two terms of (8b) are analogous to those in (3), playing a similar role with regards to measuring the accuracy of the prediction through a metric, while regularising the divergence of the output of the Convection model to encourage a valid divergence free vector field.

The final term is specific to our method, and addresses the optimality property of the POD basis discussed in Section 2.3; in particular, we commented that when used to reconstruct an ensemble, the POD basis gives the minimal reconstruction error for a given dimension of the subspace. The coefficient therefore controls the extent to which our local basis decomposition attempts to recover a version of this optimality condition. This trade-off helps ensure that the GalerkinBasis continues to have an identifiable role as an encoder/decoder of the surfaces, given its additional role in assembling the discretised differential operator. This dual role of the basis features is one of the principal strengths of our method, allowing dynamical components lost by the reduction to the overly simplistic PDE (1) to be recaptured by the DL features, at the expense of the optimality of these same features as reconstructors of the surface.
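The three-term structure of (8) can be sketched as follows; the weights lam_div and lam_rec, the simple MSE metrics, and all shapes are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

# Schematic of the three-term supervised loss: forecast accuracy, a
# divergence penalty on the estimated transport field, and a POD-style
# reconstruction term tying the historical latents back to the inputs.

def divergence(wx, wy, h=1.0):
    """Central-difference divergence of a 2-D vector field (wx, wy)."""
    dwx = (np.roll(wx, -1, axis=1) - np.roll(wx, 1, axis=1)) / (2.0 * h)
    dwy = (np.roll(wy, -1, axis=0) - np.roll(wy, 1, axis=0)) / (2.0 * h)
    return dwx + dwy

def dod_loss(pred, target, wx, wy, recon, inputs, lam_div=0.1, lam_rec=0.1):
    prediction = np.mean((pred - target) ** 2)     # accuracy on the horizon
    div_pen = np.mean(divergence(wx, wy) ** 2)     # encourage divergence-free w
    recon_pen = np.mean((recon - inputs) ** 2)     # reconstruction of inputs
    return prediction + lam_div * div_pen + lam_rec * recon_pen
```

Increasing lam_rec pushes the learned basis towards the POD-style optimal reconstructor, at the cost of its freedom to absorb dynamics lost by the simplified PDE.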

4 Experiments

We now apply our method to the large-scale convective systems encountered in the climate sciences in order to assess the accuracy and computational efficiency of our method compared to alternatives. All experiments were conducted on a single Nvidia GeForce RTX 2060 SUPER GPU with 8GB of memory. For additional experiments and visualisations see the supplementary material.

4.1 Convective sea surface temperature

Figure 3: Regions used for the sea surface temperature experiment

As in [de Bezenac et al., 2018] we demonstrate our model on the SST data obtained from the NEMO (Nucleus for European Modelling of the Ocean) engine [Madec, 2008]. We extract the 64 by 64 pixel sub-regions shown in Fig. 3, and use data from 2016-2018 to train the model, giving a total of 11935 acquisitions. We then test on data from the year 2019 using the same regions, giving 3894 test instances. The regions were selected to capture the complicated nonlinear mixing that occurs as warm water transported along the Gulf Stream meets the colder water of the Arctic regions. We assume that on the short-term horizons we are interested in, the information within a region is sufficient for prediction, and do not incorporate information from adjacent regions. Data is normalised to remove trends on a yearly timescale using the mean and standard deviation over each day of the year.

We compare the method we have introduced to the physics informed model of [de Bezenac et al., 2018]; to a convolutional LSTM (Conv-LSTM) introduced by [Shi et al., 2015, 2017], which enhances LSTM networks with a spatial convolution to better model spatio-temporal processes; and finally, to compare with “physics-free” approaches, we use a U-Net as proposed in [Ayzel et al., 2019, Lebedev et al., 2019, Agrawal et al., 2019], which treats prediction as an images-to-images problem and makes no attempt to embed recurrent dynamical or physical structure. All models were implemented in TensorFlow [Abadi et al., 2015], apart from [de Bezenac et al., 2018], for which we use the publicly available PyTorch [Paszke et al., 2019] implementation.
Method                                               Average Score (MSE)   No. of Parameters   Run-Time [s]
Conv-LSTM [Shi et al., 2017]                         0.2179                1,779,073           0.43
U-Net [Lebedev et al., 2019, Agrawal et al., 2019]   0.1452                31,032,446          0.79
Flow [de Bezenac et al., 2018]                       0.1344                22,197,736          0.60
DOD-Convec (ours)                                                          10,106,339          0.48

Table 1: Comparison of methods on the SST data. Average score is the mean squared error (MSE), No. of parameters is the total number of trainable parameters in each model, and run-time is the mean time per batch with the maximum batch size that can fit in memory. We fit our model with -features.

Results are displayed in Table 1. Examining the test error for each method, we see that our method demonstrates superior performance, with a lower test error than both the purely data-driven approaches and the more dynamic alternatives. As discussed in Sec. 3, our approach leads to a more parsimonious parameterisation of the convection field than [de Bezenac et al., 2018]. With regards to run-time, the U-Net model is faster and more memory efficient than the alternatives, including our own, which make use of a recurrent structure to advance the predictions. In spite of this our method still demonstrates a competitive run time, with the increased accuracy a more than acceptable trade-off.

In Fig. 4 we plot a representative sequence from the test set, in which we observe that our method (row three) seems to be better at identifying the “whorl” pattern formations that characterise turbulence. The U-Net feature extraction also does a good job of identifying these features, but only on a limited time horizon, with the structure degrading as the sequence continues; this is most noticeable in the loss of structure of the in-flowing cold flux in the top-right of the images in row four. On the other hand, the method of [de Bezenac et al., 2018] does a better job capturing and preserving linear features. This is likely because that method is ultimately solving a linear PDE (2a), and identifying a convection model from data alone that will do the “violence” [Berkooz et al., 1993] of a nonlinear model is hard. By projecting to a latent space, and allowing this linear space to be adaptively determined by nonlinear transformations of the local inputs, we are better able to recover nonlinear features with a simpler convection model; while we do pay a visible price in the smoothness of our reconstructions, overall we achieve better performance, as presented in Tab. 1.

Figure 4: Predicted surfaces from an input sequence of length four (top row), compared to a target output sequence of length six (second row), from a test set sequence in Region 2 of Fig. 3.


5 Related work

Spatio-temporal dynamical modelling Traditional dynamical modelling approaches have been based on either optical flows [Fleet and Weiss, 2006, Woo and Wong, 2017], which estimate the transport vector field by examining the differenced input sequence, or else numerical models [Béréziat and Herlin, 2015], where the transporting field emerges as the output of a calibrated physical model. More principled physical models have the ability to display complex long-range spatio-temporal dependencies, but deriving and then calibrating such models is significantly harder than the DL and hybrid methods we consider.

Physics free deep learning As noted by [Agrawal et al., 2019], “physics free” deep learning approaches can be broadly divided into attempts to model the time evolution as a temporal sequence, or those which treat the problem as learning an unknown “image(s)-to-images” map. The former include the convolutional LSTM model of [Shi et al., 2015, 2017], as well as video generation methods [Mathieu et al., 2016, Vondrick et al., 2016]. For the latter option, U-Net based approaches have been popular [Ayzel et al., 2019, Lebedev et al., 2019], and [Agrawal et al., 2019] argue that the overly simplistic assumptions in optical flow or poorly specified numerical models are often unviable, so that free-form DL methods display superior performance to methods tied to a misspecified physical model. While our work does use an overly simplistic model of the dynamics, by projecting onto the latent space we are able to ensure this simplification does not overly constrain the resulting predictions.

Physics informed deep learning Aside from the work of [de Bezenac et al., 2018], which we have discussed in Section 2, we also mention the more complicated U-Net architecture inspired by multiscale approximations to the Navier-Stokes equations in [Wang et al., 2020]; since this model uses physical information to design a specific U-Net architecture, there is some overlap with the already mentioned images-to-images methods. Finally, we note that we have advanced the solution to the problem by solving something which closely approximates the solution of a PDE. Alternative approaches [Raissi et al., 2017, Berg and Nyström, 2017, Zhu et al., 2019] have avoided this explicit solve step and instead use flexible DL surrogates of the output map, with the PDE operators re-appearing as regularisers. This is a much more dramatic regularisation than the loss we have considered, and we are unaware of this complete operator regularisation scheme having been applied to convective nonlinear systems at the scale considered in our experiments.

6 Discussion

In this work we have combined the powerful feature extraction capabilities of DL with minimal physical representations of the dynamics, introducing a physically informed model that demonstrates superior predictive performance on the convective physical processes dominating the atmospheric and fluid transport problems pervading our natural world. To the extent that this model is “physically informed”, we have a priori specified only a minimal representation of the dynamics; we justify this by our desire to avoid overly strong, and so impossible to justify, assumptions on the complex generating mechanism, and thereby maintain as much as possible the ability of the model to learn the underlying dynamics from data. Crucial to this has been our effort to ensure that the latent representations formed by DL encoding are more strongly entwined with the dynamics of the process than in previous approaches, leading to a model that demonstrates superior predictive performance and improved recovery of temporally persistent nonlinear features.

Broader Impact

Climate change manifests not only as long-term trends, but also as increased short-term volatility of these processes; as such, the development of near-term prediction methods which can be rapidly evaluated and re-calibrated is only increasing in importance. Accurate near-term prediction of rapidly changing events is vital for public safety in situations including extreme flooding, cyclone formation, and forest fires, all of these systems possessing the informative transport features which would allow our method to be successfully applied. These applications will continue to add to the evidence we have already provided that physical principles still have an important role to play in designing models, and argue against the “physics free” philosophy. We demonstrate that there is much to be gained from strongly connecting learned latent representations with global model dynamics, achieving not only interpretable latent spaces, but also interpretable dynamics on these spaces. The resulting models demonstrate better performance and have a clearer physical picture, and we envision our work continuing to invite methodological study of the generalisation benefits obtained by this stronger entwining between encoded representations and the original data-space dynamics.

Despite the progress we have made, predicting these systems remains a hard problem, and it is unlikely that any one method will work best in all scenarios, so appropriate aggregation strategies should be adopted. However, our discussion of the sea surface predictions in Sec. 4 demonstrates that our model does a better job of predicting certain nonlinear patterns in convective flows, and so will be an effective complement to existing methods which perform well on more linear flows. Finally, while we have presented an effective supervised learning method, it will be important to extend this work to a generative probabilistic framework to provide further uncertainty quantification.


  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available online.
  • S. Agrawal, L. Barrington, C. Bromberg, J. Burge, C. Gazen, and J. Hickey (2019) Machine learning for precipitation nowcasting from radar images. arXiv:1912.12132.
  • G. Ayzel, M. Heistermann, A. Sorokin, O. Nikitin, and O. Lukyanova (2019) All convolutional neural networks for radar-based precipitation nowcasting. Procedia Computer Science 150, pp. 186–192. Proceedings of the 13th International Symposium "Intelligent Systems 2018" (INTELS'18), 22–24 October 2018, St. Petersburg, Russia.
  • D. Béréziat and I. Herlin (2015) Coupling dynamic equations and satellite images for modelling ocean surface circulation. Communications in Computer and Information Science 550, pp. 191–205.
  • J. Berg and K. Nyström (2017) A unified deep artificial neural network approach to partial differential equations in complex geometries. arXiv:1711.06464.
  • G. Berkooz, P. Holmes, and J. L. Lumley (1993) The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics 25 (1), pp. 539–575.
  • T. Cohen, M. Weiler, B. Kicanaoglu, and M. Welling (2019) Gauge equivariant convolutional networks and the icosahedral CNN. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 1321–1330.
  • E. de Bezenac, A. Pajot, and P. Gallinari (2018) Deep learning for physical processes: incorporating prior scientific knowledge. In International Conference on Learning Representations.
  • D. J. Fleet and Y. Weiss (2006) Optical flow estimation. In Handbook of Mathematical Models in Computer Vision, N. Paragios, Y. Chen, and O. D. Faugeras (Eds.).
  • A. V. Getling and E. A. Spiegel (1998) Rayleigh–Bénard convection: structures and dynamics. Columbia University, New York.
  • K. Karhunen (1946) Zur Spektraltheorie stochastischer Prozesse. Annales Academiæ Scientiarum Fennicæ, Series A.
  • V. Lebedev, V. Ivashkin, I. Rudenko, A. Ganshin, A. Molchanov, S. Ovcharenko, R. Grokhovetskiy, I. Bushmarinov, and D. Solomentsev (2019) Precipitation nowcasting with satellite imagery. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '19, New York, NY, USA, pp. 2680–2688.
  • M. Loève (1945) Fonctions aléatoires de second ordre. Comptes Rendus de l'Académie des Sciences 220.
  • J. L. Lumley (1981) Coherent structures in turbulence. In Transition and Turbulence, R. E. Meyer (Ed.), pp. 215–242.
  • G. Madec (2008) NEMO ocean engine. Technical report, Institut Pierre-Simon Laplace (IPSL).
  • M. Mathieu, C. Couprie, and Y. LeCun (2016) Deep multi-scale video prediction beyond mean square error. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.).
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8026–8037.
  • M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
  • M. Raissi, P. Perdikaris, and G. E. Karniadakis (2017) Physics informed deep learning (part I): data-driven solutions of nonlinear partial differential equations. arXiv:1711.10561.
  • J. Reddy (2005) An introduction to the finite element method. McGraw-Hill Education, New York.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Cham, pp. 234–241.
  • B. Saltzman (1962) Selected papers on the theory of thermal convection with special application to the Earth's planetary atmosphere. Dover.
  • X. Shi, Z. Chen, H. Wang, D. Yeung, and W. Woo (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. CoRR.
  • X. Shi, Z. Gao, L. Lausen, H. Wang, D. Yeung, W. Wong, and W. Woo (2017) Deep learning for precipitation nowcasting: a benchmark and a new model. In Advances in Neural Information Processing Systems 30, pp. 5617–5627.
  • J. Sirignano and K. Spiliopoulos (2018) DGM: a deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375, pp. 1339–1364.
  • C. Vondrick, H. Pirsiavash, and A. Torralba (2016) Generating videos with scene dynamics. In Advances in Neural Information Processing Systems 29, pp. 613–621.
  • R. Wang, K. Kashinath, M. Mustafa, A. Albert, and R. Yu (2020) Towards physics-informed deep learning for turbulent flow prediction.
  • M. Weiler and G. Cesa (2019) General E(2)-equivariant steerable CNNs. In Advances in Neural Information Processing Systems 32, pp. 14334–14345.
  • E. P. Wigner (1960) The unreasonable effectiveness of mathematics in the natural sciences. Richard Courant Lecture in Mathematical Sciences delivered at New York University, May 11, 1959. Communications on Pure and Applied Mathematics 13 (1), pp. 1–14.
  • W.-C. Woo and W.-K. Wong (2017) Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere 8 (5), pp. 48–68.
  • A. Yazdani, M. Raissi, and G. E. Karniadakis (2019) Systems biology informed deep learning for inferring parameters and hidden dynamics. bioRxiv.
  • J. Zhang, K. Howard, C. Langston, B. Kaney, Y. Qi, L. Tang, H. Grams, Y. Wang, S. Cocks, S. Martinaitis, A. Arthur, K. Cooper, J. Brogden, and D. Kitzmiller (2016) Multi-radar multi-sensor (MRMS) quantitative precipitation estimation: initial operating capabilities. Bulletin of the American Meteorological Society 97 (4), pp. 621–638.
  • Y. Zhu, N. Zabaras, P. Koutsourelakis, and P. Perdikaris (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of Computational Physics 394.


Appendix A Ritz-Galerkin method for PDEs

Since the forward map is unavailable in closed form, it becomes necessary to solve the PDE numerically. To accompany the necessarily brief review in the main paper, we provide an extended presentation of how to discretise a PDE on the model Poisson problem; see also [Reddy, 2005] for more information.


To discretise, one first multiplies (9a) by a test function \(v\) and then performs an integration by parts; it is standard to further assume that the test functions vanish on the boundary, so one arrives at the classical form of the variational problem: find \(u \in V\) such that

\[ \int_\Omega \nabla u \cdot \nabla v \, \mathrm{d}x = \int_\Omega f v \, \mathrm{d}x \tag{10} \]

for any test function \(v\); this is the weak form of the classical problem (9).

To implement this idea numerically one replaces the trial and test spaces by finite-dimensional subspaces \(V_h \subset V\) and \(\hat{V}_h \subset \hat{V}\) respectively, with corresponding basis functions denoted by \(\{\phi_j\}_{j=1}^N\) and \(\{\psi_i\}_{i=1}^N\).

Once we have specified this basis we can then search for solutions with a representation of the form \(u_h = \sum_{j=1}^N u_j \phi_j\), where \(\mathbf{u} = (u_1, \ldots, u_N)^\top\) is the vector of coefficients parameterising the finite-dimensional representation of \(u\). Inserting this representation into (10) we arrive at the finite-dimensional version of the weak form, given by the system of equations

\[ \sum_{j=1}^N u_j \int_\Omega \nabla \phi_j \cdot \nabla \psi_i \, \mathrm{d}x = \int_\Omega f \psi_i \, \mathrm{d}x, \tag{11a} \]

which must hold for each basis function \(\psi_i\). We may represent this in equivalent matrix-vector notation as \(A\mathbf{u} = \mathbf{f}\), where the stiffness matrix and load vector are given by

\[ A_{ij} = \int_\Omega \nabla \phi_j \cdot \nabla \psi_i \, \mathrm{d}x, \qquad f_i = \int_\Omega f \psi_i \, \mathrm{d}x. \]


Following the same procedure we can arrive at a weak-form discretisation of the transport equation (1). In this case the discrete representation analogous to (11a) is a linear system of ordinary differential equations for the time-dependent coefficient vector, in which the matrices multiplying \(\mathbf{u}(t)\) are assembled from the convection and diffusion terms of (1) in the same manner as the stiffness matrix above, together with a mass matrix with entries \(\int_\Omega \phi_j \psi_i \, \mathrm{d}x\) multiplying \(\dot{\mathbf{u}}(t)\).
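To make the assembly above concrete, the following is a minimal sketch (not the paper's implementation) of the Ritz-Galerkin method for the one-dimensional Poisson problem \(-u'' = f\) on \((0,1)\) with homogeneous Dirichlet boundary conditions, using piecewise-linear "hat" basis functions on a uniform mesh; the load integral is approximated by a one-point quadrature.

```python
import numpy as np

def solve_poisson_1d(f, n):
    """Galerkin solution of -u'' = f on (0, 1) with u(0) = u(1) = 0,
    using n interior piecewise-linear hat functions on a uniform mesh."""
    h = 1.0 / (n + 1)                      # mesh width
    x = np.linspace(0.0, 1.0, n + 2)       # node coordinates
    # Stiffness matrix A_ij = \int phi_i' phi_j' dx: tridiagonal (-1, 2, -1)/h
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    # Load vector f_i = \int f phi_i dx, approximated as h * f(x_i)
    b = h * f(x[1:-1])
    u = np.zeros(n + 2)                    # boundary values stay zero
    u[1:-1] = np.linalg.solve(A, b)        # solve A u = f for interior nodes
    return x, u

# -u'' = pi^2 sin(pi x) has exact solution u(x) = sin(pi x)
x, u = solve_poisson_1d(lambda s: np.pi**2 * np.sin(np.pi * s), n=99)
err = np.max(np.abs(u - np.sin(np.pi * x)))
```

With 99 interior nodes the maximum nodal error is well below 1e-2, illustrating the expected second-order convergence of the piecewise-linear discretisation.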


Appendix B Network architectures

U-Net architectures

We use a U-Net [Ronneberger et al., 2015] both for our basis feature extractor, and in its own right as a benchmark DL method. These take the standard U-Net form, being divided into an encoder and a decoder: the encoder is a sequence of computational units called Downsample Blocks and the decoder a sequence of Upsample Blocks, with long range skip connections connecting each downsample block to the corresponding upsample block in the decoding phase. The basic form of these blocks is given by

  • Downsample Block:

  • Upsample Block:

where the upsampling operation doubles the pixel resolution in both directions using nearest neighbour interpolation.
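The nearest-neighbour upsampling operation can be sketched as follows; this is a minimal NumPy illustration assuming channels-last (H, W, C) feature maps, not the framework operation used in the experiments.

```python
import numpy as np

def upsample_nearest(x):
    """Double the spatial resolution of a (H, W, C) feature map by
    nearest-neighbour interpolation: each pixel becomes a 2x2 block."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.arange(4, dtype=float).reshape(2, 2, 1)   # a 2x2 single-channel map
y = upsample_nearest(x)                          # a 4x4 single-channel map
```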

The U-Net used in the experiments has four down-sample and up-sample blocks; the smaller U-Net used as our feature extractor has three.

Convolution decoder

As discussed in Sec. 3, our method estimates the transport vector field and diffusion coefficient from the latent variables. This is done in two phases: first using a sequence of dense networks, and then convolutions. For the image patches used in the sea surface temperature experiment this network takes the form

  • Dense phase:

  • Convolution phase:

where a dense layer denotes a densely connected layer with the stated number of features and ReLU activation function, and a convolution layer a 2D convolution with the stated number of features and kernel size. The Upsample operation is the same as for the U-Net architecture just described. The Flatten operation takes the sequence of latent states and flattens them into a single vector.

The output of this network is an image; finally, we split on the last dimension to produce the diffusion field and the transport vector field, and the diffusion field is passed through an additional softplus layer to ensure positivity.
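The final split-and-constrain step can be sketched as follows. This is an illustrative NumPy sketch only: the three-channel layout (two velocity channels followed by one diffusion channel) is an assumption for illustration, not taken from the paper.

```python
import numpy as np

def split_fields(decoder_out):
    """Split a (H, W, 3) decoder output into a transport vector field
    (two channels) and a strictly positive diffusion field (one channel
    passed through softplus). Channel ordering is assumed."""
    velocity = decoder_out[..., :2]         # (H, W, 2) transport vector field
    raw = decoder_out[..., 2]               # (H, W) unconstrained channel
    diffusion = np.log1p(np.exp(raw))       # softplus guarantees positivity
    return velocity, diffusion

out = np.zeros((4, 4, 3))
v, d = split_fields(out)
```

Note that softplus(0) = log 2, so even a zero-initialised network produces a strictly positive diffusion field.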

Appendix C Further experimental details

A NEMO Sea Surface Temperature data

The NEMO (Nucleus for European Modelling of the Ocean) engine [Madec, 2008] is a state-of-the-art framework for modelling the convective processes that drive the world's oceans, including temperature, salinity, sea ice area and thickness, and is provided by the Copernicus Marine Service portal with product identifier GLOBAL_ANALYSIS_FORECAST_PHY_001_024. While it is a simulation engine, it assimilates historical observations to produce a synthesised estimate of the state of the system; as such the data tracks the true temperature field, making this an ideal dataset to apply our method to.

Our dataset is formed by first decomposing a larger study region into 64 by 64 pixel image patches, which are displayed in Figure 3. Our regions differ from those used in [de Bezenac et al., 2018] in that we have selected our grouping to more closely follow the movement of the Gulf Stream up the eastern seaboard of North America and then across the Atlantic, mixing with the colder water from the Arctic regions. This choice allows us to model the most interesting regions, exhibiting the complex nonlinear mixing dynamics we are most interested in.

Due to a re-calibration of the GLOBAL_ANALYSIS_FORECAST_PHY_001_024 product we only have access to data from January 1st, 2016 onwards, and so cannot use the same temporal range as considered in [de Bezenac et al., 2018]. Instead we use the data from 2016–2018 for training and the data from 2019 for testing. We partitioned the data into input sequences of length 4, predicting the following 8 frames, leading to 11935 training examples and 3984 test examples; we test on the same set of regions as used for training.
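A length-4-input / length-8-target partitioning of this kind can be sketched as a sliding window over the frame sequence; this is a minimal sketch with a unit stride, an assumption for illustration, since the stride used to build the dataset is not specified above.

```python
import numpy as np

def make_windows(frames, n_in=4, n_out=8):
    """Partition a (T, H, W) frame sequence into (input, target) pairs:
    n_in consecutive input frames followed by the n_out frames to predict."""
    T = frames.shape[0]
    pairs = []
    for t in range(T - n_in - n_out + 1):
        pairs.append((frames[t:t + n_in],
                      frames[t + n_in:t + n_in + n_out]))
    return pairs

frames = np.arange(20, dtype=float).reshape(20, 1, 1)  # toy 1x1 "images"
pairs = make_windows(frames)
```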

Additional figures. In Fig. 5 we display the predictions from our model first presented in Fig. 4, now also including the convection field used to advance our model. This vector field displays both long range stream lines and localised regions in which particles rotate around a centre; furthermore, the richness of all of these fields was obtained by decoding a low-dimensional latent vector, not by an image-to-image transform. This parsimonious deconvolution from the latent space, combined with the ability of our method to learn complex vector fields, is the principal component of its demonstrated success on convective fields such as the sea surface temperature data.

Figure 5: Estimated convection field for our model.

B Precipitation nowcasting

For this experiment we consider the problem of providing short term precipitation forecasts from Doppler radar images, as considered in [Agrawal et al., 2019]. The data comes from the multi-radar multi-sensor (MRMS) system developed by the NOAA National Severe Storms Laboratory, which provides precipitation rate updates at 2 minute intervals at a spatial resolution of 1 km × 1 km. We use the MRMS dataset [Zhang et al., 2016] re-sampled to 10 minute intervals. The dataset covers the whole of the United States, but we restrict our attention to sixteen regions in the Pacific Northwest, and then further to the two wettest regions in the training set. We train on data from January 2018 and then test on the first week of November 2017, giving 8904 training instances and 2568 test instances. Following [Agrawal et al., 2019] we frame the problem as a classification problem and discretise the precipitation data using three thresholds, in units of millimetres of rain per hour, and the model is then trained using a cross-entropy loss targeting these pixel classes.
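Discretising a precipitation field against a set of thresholds can be done in a single vectorised call; the threshold values below are illustrative only, since the paper's exact bin edges are not reproduced above.

```python
import numpy as np

def discretise_rain(rate_mm_per_hr, thresholds=(0.1, 1.0, 2.5)):
    """Map precipitation rates (mm/hr) to integer class labels 0..3 by
    binning against three thresholds. Threshold values are illustrative."""
    return np.digitize(rate_mm_per_hr, thresholds)

rates = np.array([0.0, 0.5, 1.5, 3.0])
labels = discretise_rain(rates)   # one class label per pixel
```

The resulting integer labels are then the per-pixel targets of a standard cross-entropy classification loss.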

The resulting accuracy on the classification problem is displayed in Table 2, which demonstrates that there is little to choose between the two approaches in this instance. It would appear that once the process has been coarsened in this way there is no significant difference between the physics free method and our physics informed method on the precipitation data; the implications of this are the subject of further study. However, it is ultimately reassuring that our method proves no worse on this problem, and indeed significantly better on the "full-image" temperature data reported in Sec. 4.

Model                                              | Training Accuracy | Test Accuracy
U-Net [Lebedev et al., 2019; Agrawal et al., 2019] | 0.9353            | 0.8047
DOD-Convec                                         | 0.8445            | 0.8052
Table 2: One hour ahead precipitation nowcasting accuracy.