1 Introduction
Physical modeling is still one of the most striking examples of a domain where humans remain far ahead of pure Machine Learning (ML) systems. Recently, numerous research efforts have been directed at designing ML algorithms, especially deep neural networks, that can learn the basic laws of physics from data (Wu and Tegmark, 2018; Iten et al., 2018; Greydanus et al., 2019; Wang et al., 2019). These works mostly agree on the importance of interpretability and of respecting physical constraints, which is still not straightforward when using black-box regressors such as neural networks (Carleo et al., 2019). In particular, many works focus on the case where the data describing the dynamical system of interest is assumed to be governed by a system of partial differential equations (PDEs) and where some physical a priori knowledge is available (Brunton et al., 2016; Raissi, 2018; Long et al., 2019; Lu et al., 2019). From the perspective of ocean sciences, many of the results in the aforementioned references were obtained on toy models, and more investigation is needed on real ocean satellite-derived or model simulation data. Few works have set foot in this direction: in de Bezenac et al. (2017), ideas from optical-flow video prediction were linked to an advection-diffusion model and used to forecast Sea Surface Temperature (SST), while in Ayed et al. (2019) and Ouala et al. (2019) the goal was to infer the dynamics of a latent variable from partial and noisy observations of SST and Sea Level Anomaly (SLA), respectively.
The general idea behind the present work consists in standing on the shoulders of physical oceanographers' current understanding of ocean variables, and including as much of their knowledge as we can in the design of our neural network (NN) architecture. In this paper, we demonstrate this strategy on upper ocean dynamics, and more precisely the dynamics of the Sea Surface Height (SSH). We present a fully-differentiable advection-diffusion architecture which generalizes the Quasi-Geostrophy (QG) theory, one of the main baselines for forecasting SSH (Ubelmann et al., 2015). To ensure numerical stability and stay close to realistic solutions, ideas from numerical schemes were considered in developing the architecture, at the expense of depth and memory load.
2 Model
Deep Neural Numerical Models
Automatic differentiation (AD) has a long history in the numerical modeling community. In ocean sciences for instance, OpenAD, an open-source code for AD, has been used to calculate adjoints of popular general circulation models such as MITgcm (Naumann et al., 2006; Utke et al., 2008). However, these tools do not handle the training of neural networks straightforwardly, and are in general designed for Fortran-based codes. We refer by Deep Neural Numerical Models (DNNMs) to fully-differentiable numerical models that can easily incorporate trainable NNs. Depending on the complexity of the PDEs involved, the corresponding DNNMs require a considerable amount of technical and engineering work to rewrite standard numerical model codes into fully-differentiable architectures that allow training NNs by backpropagation. This would not be possible without the ongoing surge of activity in the Deep Learning community, and in particular in AD libraries such as PyTorch and TensorFlow (Paszke et al., 2017; Abadi et al., 2016; Baydin et al., 2018). DNNMs can range from fully NN-based architectures such as ResNets (Weinan, 2017; Rousseau et al., 2019) to complex physically-constrained architectures such as those in Raissi (2018), de Bezenac et al. (2017) and Long et al. (2019). Here, we propose an advection-diffusion DNNM that solves the following equations:

\[ q = G(I) \quad (1a) \]
\[ \partial_t q + u\,\partial_x q + v\,\partial_y q = \nu\,\Delta q \quad (1b) \]
where Δ is the 2D Laplacian operator, u and v are the components of the non-divergent velocity field, and ν is the diffusion coefficient. These equations describe the evolution of the flow field I through the advection-diffusion of a proxy variable q obtained by a given transformation G. In case G is the identity, we fall back to classical models such as the one studied in de Bezenac et al. (2017). Using PyTorch, we develop a DNNM where the discretization of the PDE involves a first-order upwind scheme in space and a first-order Euler scheme in time. This scheme is stable as long as the Courant–Friedrichs–Lewy (CFL) condition is satisfied, implying that model integration is done in small steps δt. The field I at each time step is obtained through the inversion of Equation (1a); for example, if G is linear, we use the Conjugate Gradient (CG) method with constant boundary conditions. In practice, given a good initial guess, the CG is stopped after a few iterations (less than 5) to permit real-time execution and to avoid computational burden.
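As a concrete illustration, one integration step of the scheme above can be sketched as follows (a minimal NumPy sketch, not the QGNet code itself: it assumes periodic boundaries, a uniform grid spacing dx, and function names of our own choosing):

```python
import numpy as np

def upwind_step(q, u, v, nu, dt, dx):
    """One first-order Euler step of the advection-diffusion equation
    dq/dt + u dq/dx + v dq/dy = nu * Laplacian(q),
    with a first-order upwind scheme for advection (periodic boundaries)."""
    # Upwind differences: pick the one-sided difference facing the flow.
    dq_xm = (q - np.roll(q, 1, axis=1)) / dx   # backward difference in x
    dq_xp = (np.roll(q, -1, axis=1) - q) / dx  # forward difference in x
    dq_ym = (q - np.roll(q, 1, axis=0)) / dx   # backward difference in y
    dq_yp = (np.roll(q, -1, axis=0) - q) / dx  # forward difference in y
    adv_x = np.where(u > 0, u * dq_xm, u * dq_xp)
    adv_y = np.where(v > 0, v * dq_ym, v * dq_yp)
    # Five-point Laplacian for the diffusion term.
    lap = (np.roll(q, 1, 0) + np.roll(q, -1, 0) +
           np.roll(q, 1, 1) + np.roll(q, -1, 1) - 4.0 * q) / dx**2
    return q + dt * (nu * lap - adv_x - adv_y)
```

The CFL constraint is the caller's responsibility: δt must be small relative to dx/max(|u|+|v|) (and to dx²/ν for the diffusion term), which is why the model integration proceeds in many small steps.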
QGNet
The 1-layer quasi-geostrophic (QG) model is a reduced model that describes the evolution of oceanic flows close to geostrophic balance (Vallis, 2006). Recently, this model was found to be a good baseline to dynamically interpolate SSH fields through temporal gaps (Ubelmann et al., 2015). To forecast SSH dynamics, the numerical model uses the Potential Vorticity (PV) as a proxy variable that is advected by the Geostrophic Velocities (GV). This process is governed by the following equations (Smith and Vallis, 2001; Hua and Haidvogel, 1986; Fu and Flierl, 1980):

\[ q = \frac{g}{f}\,\Delta h - \frac{g}{f L_R^2}\, h \quad (2a) \]
\[ \partial_t q + u_g\,\partial_x q + v_g\,\partial_y q + \beta\, v_g = 0 \quad (2b) \]
\[ u_g = -\frac{g}{f}\,\partial_y h, \qquad v_g = \frac{g}{f}\,\partial_x h \quad (2c) \]
where h is the SSH field, g is the gravity constant, f is the Coriolis parameter, and L_R is the first Rossby deformation radius. u_g and v_g are the Geostrophic Velocities, and β v_g is a term that accounts for the meridional advection of PV. This model fits the DNNM framework presented above: we identify the corresponding G, u, v and ν, and call the resulting architecture QGNet. Gradients and Laplacians were rewritten as filter convolutions. One CG iteration can already give an acceptable solution if the guess field respects the following pattern:
(3) 
We therefore used the equations of the CG algorithm (Shewchuk, 1994) to write the equivalent of one CG iteration. The resulting architecture is illustrated in Fig. 1(a,b). It benefits from high flexibility, since several building blocks can be interchanged with ConvNets, making it a playground for several modeling choices, such as the ones exposed in the next section. Yet, due to the high memory cost of using float64 precision and the low values of δt needed for the integration step (which result in a high number of iterations), training QGNet in an acceptable time requires several high-end Graphics Processing Units (GPUs).
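To make the SSH-from-PV inversion concrete, here is a matrix-free sketch in NumPy (a toy version under assumptions of our own: periodic boundaries, unit-free toy parameters, and the standard 1-layer QG relation between SSH and PV; in QGNet the same operations are float64 PyTorch convolutions with a single unrolled CG iteration):

```python
import numpy as np

def laplacian(h, dx):
    # Five-point Laplacian written as a (periodic) filter convolution,
    # as in QGNet where gradients and Laplacians are conv filters.
    return (np.roll(h, 1, 0) + np.roll(h, -1, 0) +
            np.roll(h, 1, 1) + np.roll(h, -1, 1) - 4.0 * h) / dx**2

def invert_pv(q, h0, g, f, Lr, dx, n_iter=5):
    """Recover SSH h from PV q by solving the symmetric positive-definite
    Helmholtz problem (-Lap + 1/Lr^2) h = -(f/g) q with a few
    conjugate-gradient iterations, starting from the guess h0."""
    A = lambda x: -laplacian(x, dx) + x / Lr**2
    b = -(f / g) * q
    h = h0.copy()
    r = b - A(h)        # residual
    p = r.copy()        # search direction
    rs = (r * r).sum()
    for _ in range(n_iter):
        Ap = A(p)
        alpha = rs / (p * Ap).sum()
        h += alpha * p
        r -= alpha * Ap
        rs_new = (r * r).sum()
        p = r + (rs_new / rs) * p
        rs = rs_new
    return h
```

With a good initial guess h0, a handful of iterations suffices, which is what makes the single unrolled CG block in the architecture viable.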
3 Experiments
Data and Experimental details
We use NATL60, a dataset from a comprehensive realistic ocean model simulation based on the NEMO ocean engine, run at kilometric resolution over the North Atlantic basin (Molines, 2018). The study region is a box located in the Gulf Stream, a region with challenging physics. Four NVIDIA Tesla V100 GPUs were used for the computations described in the experiments.
Discovering insights about hidden laws from data
The NATL60 ocean circulation model is governed by complex physics not entirely covered by the QG theory. Yet, we want to investigate to what extent QGNet can reveal the limits of this theory. A simple illustrative example consists in assuming that the PV is advected by some unknown fields ũ and ṽ which are first-order derivatives of SSH obtained through a linear operator. Concretely, given that the 2D gradient filters used to calculate the geostrophic velocities u_g and v_g in the original Python code are respectively a fixed filter and its transpose, we replace this filter in QGNet by a 6-parameter trainable filter K_θ and retrieve the resulting filter from a training procedure using NATL60 data. This amounts to:

\[ \tilde{u} = K_\theta * h, \qquad \tilde{v} = K_\theta^{\top} * h \quad (4) \]
We set up a 1-day SSH forecasting experiment and use a time step δt of 10 minutes for QGNet, meaning that 144 blocks are needed (Fig. 1(b,c)). Note that we use shared weights across the blocks; QGNet in this experiment then has only 6 parameters, which are the weights of K_θ. 18 SSH images are used for training (one every 20 days from June 2012 to June 2013), using the BFGS algorithm and a loss function composed of three terms: i) the mean square error between the QGNet 1-day forecast and the NATL60 target 1-day forecast, scaled by the variance of the target; ii) a loss penalizing velocity fields with high divergence; iii) L2-regularization of the weights:

\[ \mathcal{L}(\theta) = \frac{\|\hat{h}_{t+1} - h_{t+1}\|_2^2}{\operatorname{Var}(h_{t+1})} + \lambda_1 \|\partial_x \tilde{u} + \partial_y \tilde{v}\|_2^2 + \lambda_2 \|\theta\|_2^2 \quad (5) \]

The optimization converges to a filter close to the original gradient filter; keep in mind that we are not expecting to recover it exactly, due to the complex dynamics of NATL60. Hence, starting from a completely random filter, QGNet recovered a filter close to the gradient operator, perhaps an indication that the capacity of this type of model is reached and that the PV is best advected by the GV, as QG theory claims.
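The experiment can be sketched in a toy PyTorch version as follows (assumptions of ours: a hypothetical 2x3 filter shape, a synthetic target filter standing in for the NATL60 signal, and torch's LBFGS as a stand-in for BFGS):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical fixed "gradient-like" filter playing the role of the one in
# the original QG code; in the real experiment the target is implicit in
# the NATL60 forecasts, not given explicitly.
K_true = torch.tensor([[[[-1., 0., 1.],
                         [-1., 0., 1.]]]]) / 4.0

h = torch.randn(32, 1, 24, 24)       # batch of SSH-like fields
with torch.no_grad():
    u_target = F.conv2d(h, K_true)   # velocities induced by the target filter

# The 6 trainable weights, starting from a completely random filter.
K = torch.nn.Parameter(0.1 * torch.randn(1, 1, 2, 3))

opt = torch.optim.LBFGS([K], max_iter=100, line_search_fn="strong_wolfe")

def closure():
    opt.zero_grad()
    loss = ((F.conv2d(h, K) - u_target) ** 2).mean()
    loss.backward()
    return loss

opt.step(closure)  # fit the 6-parameter filter to the target velocities
```

Because the problem is a small convex least-squares fit, the quasi-Newton optimizer recovers the filter essentially exactly here; on NATL60 data the recovered filter is only close to the gradient operator, as discussed above.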
Supplementing known physics with nonlinear learnable components
In this experiment, we consider the same 1-day SSH forecast setting, but we assume that at each time step the PV is advected by the GV plus a nonlinear transformation of SSH computed by a ConvNet, denoted NN (Fig. 1(c)). NN is a 2-layer ConvNet with 16 filters, Batch Normalization and leaky ReLU activations; the output layer is a linear layer with 2 channels that are added to u_g and v_g respectively to form the new velocity field W. QGNet then has 2545 trainable parameters. We split our data into 122 SSH images for training (1 SSH map every 3 days from 14 Jun 2012 to 13 Jun 2013); then, after a 10-day gap, we take 32 snapshots as our test dataset (1 SSH map every 3 days from 24 Jun to 29 Sep 2013). Our network is trained using the Adam optimizer, with an initial learning rate that is multiplied by 0.1 every 100 epochs. The batch size is 4 samples, distributed over the 4 GPU cards. The loss function considered here is the scaled mean square error used in Eq. 5. To ensure a stable gradient flow at the beginning of the optimization, a scalar parameter initialized at zero is multiplied with the ConvNet velocities.
At the end of training, we unplug the ConvNet from QGNet, resulting in a NN component that takes SSH as input and yields a deterministic "perturbation" of the GV. A clear benefit of this setup is that the trained component can be plugged back into the original Python code, avoiding computational load at test time. Fig. 2(b) presents the RMSE distributions of the standard QG model, our QGNet and a naive constant model (persistence). Adding the ConvNet component to the GV slightly improves on the standard QG model, an indication that a nonlinear velocity term can model SSH dynamics beyond the standard QG. Fig. 2(a) shows an example of the additional velocities produced by the ConvNet, along with the GV, for the same SSH input. We observe that the output of the ConvNet has a significant amplitude along SSH contours (ocean fronts), and that the fields follow a particular pattern that depends on the GV and is not completely random. Interpreting the ConvNet in this experiment is not straightforward and calls for more investigation to convert it into tangible equations that could be inspected by physical oceanographers.
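A sketch of the learnable velocity component described above (the kernel sizes and exact layer layout are assumptions of ours; the text only specifies 16 filters, Batch Normalization, leaky ReLU, a 2-channel linear output layer, and the zero-initialized scalar gate):

```python
import torch
import torch.nn as nn

class VelocityPerturbation(nn.Module):
    """Learnable velocity perturbation: maps SSH to a 2-channel field
    (du, dv) that is added to the geostrophic velocities."""
    def __init__(self, hidden=16, kernel=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, hidden, kernel, padding=kernel // 2),
            nn.BatchNorm2d(hidden),
            nn.LeakyReLU(),
            nn.Conv2d(hidden, hidden, kernel, padding=kernel // 2),
            nn.BatchNorm2d(hidden),
            nn.LeakyReLU(),
            nn.Conv2d(hidden, 2, 1),  # linear 2-channel output: (du, dv)
        )
        # Scalar gate initialized at zero so training starts from pure QG
        # velocities, keeping the gradient flow stable early on.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, ssh, u_g, v_g):
        duv = self.gate * self.net(ssh)
        return u_g + duv[:, :1], v_g + duv[:, 1:]
```

At initialization the module returns exactly the geostrophic velocities; the gate then lets the optimizer grow the nonlinear correction gradually. After training, this module is what gets unplugged and reused in the original Python code.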
4 Conclusion
We showed that combining deep learning automatic differentiation libraries and numerical models can help design hybrid models with trainable parameters, and can provide a test bed to evaluate established physical theories or to seek intuition for developing new ones. We believe that this work represents a modest step toward helping physicists develop innovative physical models.
Acknowledgments
The authors would like to thank Clément Ubelmann from CLS for the 1-layer QG Python code that can be found at https://github.com/redouanelg/qgswDI. Most of the computations presented in this paper were performed using the GRICAD infrastructure (https://gricad.univ-grenoble-alpes.fr), which is partly supported by the Equip@Meso project (reference ANR-10-EQPX-29-01) of the programme Investissements d'Avenir supervised by the Agence Nationale pour la Recherche.
R. Lguensat is funded through a postdoctoral grant from Centre National d’Etudes Spatiales (CNES), he also acknowledges the support of NVIDIA Corporation under the NVIDIA GPU Grant program.
S. Metref is funded by ANR through contract number ANR-17-CE01-0009-01.
R. Fablet was supported by Labex Cominlabs (grant SEACS), CNES (grant OSTST-MANATEE) and ANR (EUR Isblue and Melody).
References
 [1] Abadi et al. (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
 [2] Ayed et al. (2019) Learning dynamical systems from partial observations. arXiv preprint arXiv:1902.11136.
 [3] Baydin et al. (2018) Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research 18 (153).
 [4] Brunton et al. (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), pp. 3932–3937.
 [5] Carleo et al. (2019) Machine learning and the physical sciences. arXiv preprint arXiv:1903.10563.
 [6] de Bezenac et al. (2017) Deep learning for physical processes: incorporating prior scientific knowledge. arXiv preprint arXiv:1711.07970.
 [7] Fu and Flierl (1980) Nonlinear energy and enstrophy transfers in a realistically stratified ocean. Dynamics of Atmospheres and Oceans 4 (4), pp. 219–246.
 [8] Greydanus et al. (2019) Hamiltonian neural networks. arXiv preprint arXiv:1906.01563.
 [9] Hua and Haidvogel (1986) Numerical simulations of the vertical structure of quasi-geostrophic turbulence. Journal of the Atmospheric Sciences 43 (23), pp. 2923–2936.
 [10] Iten et al. (2018) Discovering physical concepts with neural networks. arXiv preprint arXiv:1807.10300.
 [11] Long et al. (2019) PDE-Net 2.0: learning PDEs from data with a numeric-symbolic hybrid deep network. Journal of Computational Physics, pp. 108925.
 [12] Lu et al. (2019) Extracting interpretable physical parameters from spatiotemporal systems using unsupervised learning. arXiv preprint arXiv:1907.06011.
 [13] Molines (2018) meom-configurations/NATL60-CJM165: NATL60 code used for CJM165 experiment.
 [14] Naumann et al. (2006) Adjoint code by source transformation with OpenAD/F. In Proceedings of the European Conference on Computational Fluid Dynamics (ECCOMAS CFD 2006). TU Delft.
 [15] Ouala et al. (2019) Learning latent dynamics for partially-observed chaotic systems. arXiv preprint arXiv:1907.02452.
 [16] Paszke et al. (2017) Automatic differentiation in PyTorch. In NIPS Autodiff Workshop.
 [17] Raissi (2018) Deep hidden physics models: deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research 19 (1), pp. 932–955.
 [18] Rousseau et al. (2019) Residual networks as flows of diffeomorphisms. Journal of Mathematical Imaging and Vision, pp. 1–11.
 [19] Shewchuk (1994) An introduction to the conjugate gradient method without the agonizing pain. Carnegie Mellon University, Department of Computer Science.
 [20] Smith and Vallis (2001) The scales and equilibration of midocean eddies: freely evolving flow. Journal of Physical Oceanography 31 (2), pp. 554–571.
 [21] Ubelmann et al. (2015) Dynamic interpolation of sea surface height and potential applications for future high-resolution altimetry mapping. Journal of Atmospheric and Oceanic Technology 32 (1), pp. 177–184.
 [22] Utke et al. (2008) OpenAD/F: a modular open-source tool for automatic differentiation of Fortran codes. ACM Transactions on Mathematical Software (TOMS) 34 (4), pp. 18.
 [23] Vallis (2006) Atmospheric and Oceanic Fluid Dynamics. Cambridge University Press, Cambridge, U.K.
 [24] Wang et al. (2019) Emergent Schrödinger equation in an introspective machine learning architecture. Science Bulletin.
 [25] Weinan (2017) A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics 5 (1), pp. 1–11.
 [26] Wu and Tegmark (2018) Toward an AI physicist for unsupervised learning. arXiv preprint arXiv:1810.10525.