1 Introduction
The temporal evolution of the atmosphere or the ocean can be represented by a set of equations based on the fundamental laws of physics and on empirical relations deduced from observations. These equations are complex, and numerical methods have been developed to solve them. The quality and complexity of these numerical models have iteratively increased, culminating in today's ocean and atmosphere general circulation models (Eyring et al., 2016). In oceanography, the most complex models currently use a grid with a horizontal resolution of about 10 km (e.g., Madec et al., 2015) with 50 vertical levels on a square basin with sides of 5000 km, corresponding to
grid points. When the model is run to simulate a year with 1-hour time steps, the information brought by the numerical experiment, for a single variable, is contained in a vector of around
values. This numerical approach can be described with formal mathematical notation in the following way. A vector (typically ) defines the discretized state of the system. A dynamical model is a system of differential equations which can be solved to compute the state vectors for time index using the information already known: where is a highly complex function obtained by discretizing the basic equations of the model. If is derived from physical principles, the model is said to be "physically driven". The misrepresentation of sub-scale phenomena and the imperfect resolution of some nonlinear processes are inherent to the numerical approach, which always has to make a trade-off between accuracy and computation cost. To remedy these deficiencies, the ill-resolved parts of the models are represented through parameterization schemes and simplified or empirical models (Randall et al., 2007).
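The inline symbols of this recurrence were lost in the present copy; under standard notation, writing the discretized state at time index $k$ as $\mathbf{x}_k$, the relation described above can be sketched as (a reconstruction, the exact symbols used by the authors may differ):

```latex
\mathbf{x}_{k+1} = \mathcal{M}(\mathbf{x}_k),
```

where $\mathcal{M}$ denotes the discretized model operator.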
In this paper we use machine learning and a data-driven approach to build a hybrid model, which combines physically driven and data-driven terms, in order to reproduce the processes that are inaccurately represented in the original model. Several studies have already used data to derive models based on machine-learning and deep-learning algorithms. Approaches aiming at emulating an entire model given a set of perfect (noise-free) observations
(Pathak et al., 2017, 2018; Fablet et al., 2017) were applied to chaotic low-dimensional dynamical systems (such as the 3-variable Lorenz model (Lorenz, 1963), the 40-variable Lorenz model (Lorenz and Emanuel, 1998) and the Kuramoto-Sivashinsky model (Kuramoto and Tsuzuki, 1976)). "Physics-oriented" deep-learning architectures have also been proposed for forecasting or nowcasting with no explicit numerical model (De Bézenac, 2017; Shi et al., 2015). For perfect data, it has been proposed that machine-learning approaches can help estimate unknown parameters or unknown parameterizations of a numerical model
(Schneider et al., 2017). Deep learning has also been applied to infer subgrid parameterizations (Bolton and Zanna, 2019; Rasp et al., 2018). Our approach aims at representing the ill-known part of a numerical model. The main contributions of this paper are to address the case of noisy data, to conduct long-term simulations that conserve the properties of the underlying dynamics, and to evaluate the generalization skill of the hybrid model. We show that it is necessary to add physical constraints to the hybrid model in order to achieve these objectives.
In the machine learning approach, we rewrite the numerical model
(1) 
where represents the ill-known part of the numerical model at iteration . It can be estimated using an empirical model; for this, we suppose that the estimator of depends on the state of the system:
(2) 
where
is an unknown function that the machine-learning algorithm must determine by means of nonlinear regressions in a high-dimensional space. In the following we will consider the particular case in which
depends only on the previous state: The function , and thus , can be estimated from external data coming from two sources:
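With the same hedged notation as before (our reconstruction of the elided symbols), the decomposition of Eqs. (1) and (2) can be written: the unknown term is appended to the physical model and its estimator depends only on the previous state,

```latex
\mathbf{x}_{k+1} = \mathcal{M}(\mathbf{x}_k) + \mathbf{F}_k,
\qquad
\widehat{\mathbf{F}}_k = G(\mathbf{x}_k),
```

where $G$ is the unknown function to be determined by the machine-learning algorithm.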
Direct or indirect observations of the system (e.g., in situ observations or satellite data).

High-resolution simulations (which can be carried out to resolve processes that are ill-resolved at coarser resolutions).
Note that access to these data is limited in space and time. In this paper, we first use external data (see Section 2.2) to calibrate the function , and then integrate the neural-network model in the absence of these external data.
In recent years, big data and machine-learning techniques (deep learning, neural networks, …) have shown impressive skill in forecasting and classifying complex behaviors
(LeCun et al., 2015). In this work, we use a convolutional neural network
(Goodfellow et al., 2016) to compute the function (see Eq. 4). We evaluate the capacity of the neural-network model to produce long-term simulations similar to those produced by a medium-complexity shallow-water model (described in Section 2). The computations are constrained by weakly noisy observations of model outputs. The paper is organized as follows. Section 2 presents the physical model and Section 3 the method. The last section shows the results, based on synthetic data generated with or without noise and with two different wind forcings.
2 The model and the reference experiment
2.1 The shallow-water model
We consider a one-and-a-half-layer shallow-water model on the plane. Denoting by and the zonal and meridional coordinates, we have
(3)  
The vector contains the dissipation, the diffusion, and the forcing, and is assumed to be unknown; and are the zonal and meridional velocities, is the thickness anomaly of the active layer, is the Coriolis parameter (, , =), is the vorticity, is the reduced gravity, and is the mean thickness of the active layer. The domain size is , with a grid spacing of .
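Equation (3) did not survive extraction in this copy. A standard vector-invariant one-and-a-half-layer shallow-water system consistent with the variable list above would read (our reconstruction; the exact form and signs used by the authors may differ):

```latex
\begin{aligned}
\partial_t u - (f+\zeta)\,v + \partial_x\!\left( g'h + \tfrac{1}{2}(u^2+v^2) \right) &= F_u, \\
\partial_t v + (f+\zeta)\,u + \partial_y\!\left( g'h + \tfrac{1}{2}(u^2+v^2) \right) &= F_v, \\
\partial_t h + \partial_x\!\big( (H+h)\,u \big) + \partial_y\!\big( (H+h)\,v \big) &= 0,
\end{aligned}
```

where $(F_u, F_v)$ gathers the unknown forcing, dissipation and diffusion terms.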
The equations are discretized on a C-grid (Arakawa et al., 1981) with a resolution of 20 km, much smaller than the mean Rossby deformation radius of the model, around 620 km. A leapfrog scheme is used for the temporal discretization, combined with an Asselin filter (Asselin, 2007), with a time step of . This produces a recurrence relation similar to the one summarized in equation (1).
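The leapfrog scheme with a Robert-Asselin filter mentioned above can be sketched as follows. This is a minimal illustration on a scalar toy problem, not the paper's shallow-water code; the filter coefficient `gamma` is an illustrative value (the paper's value is not given here). The filter damps the spurious computational mode that plain leapfrog would otherwise let grow:

```python
import numpy as np

def leapfrog_asselin_step(x_prev, x_curr, tendency, dt, gamma=0.1):
    """One leapfrog step with a Robert-Asselin time filter.

    x_prev, x_curr : state at time indices k-1 and k
    tendency       : function returning dx/dt for a given state
    gamma          : filter coefficient (illustrative value)
    Returns the filtered state at k and the new state at k+1.
    """
    x_next = x_prev + 2.0 * dt * tendency(x_curr)               # leapfrog
    x_filt = x_curr + gamma * (x_prev - 2.0 * x_curr + x_next)  # Asselin filter
    return x_filt, x_next

# Toy example: dx/dt = -x decays; the filtered scheme stays bounded.
x0, x1, dt = 1.0, float(np.exp(-0.01)), 0.01
for _ in range(1000):
    x0, x1 = leapfrog_asselin_step(x0, x1, lambda x: -x, dt)
```

After 1000 steps (t ≈ 10), the solution of this toy equation is close to exp(-10), so the integrated state should be very small; an unfiltered leapfrog would accumulate a visible computational mode instead.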
2.2 Reference experiment
In order to train our neural network (see Section 3.1) to determine the function , and thus to reproduce , we analytically define a simulated "truth":
(4) 
where is the Laplace operator, , , . The meridional wind stress is null, and the zonal wind stress is defined by:
(5) 
where , in the standard case, is equal to ( km and ).
The fully specified model (Eqs. 3 and 4) is the same as in Krysta et al. (2011), with changes in parameter values to ensure the presence of typical dynamical structures such as eddies. No-slip boundary conditions are applied.
Note that the unknown terms in our model are diverse; each has a different impact on the evolution of the system's state. On the one hand, the wind forcing varies in space and structures the mean state of the shallow-water model. On the other hand, the dissipation and diffusion depend on the state variable and dissipate energy. A quasi-stationary state of the system is obtained after a spin-up period, thanks to a balance between these terms. Assessing the capacity of a neural network to represent these heterogeneous processes in a model is one of the main objectives of this work.
The observations used to constrain the neural network are generated by the fully specified model. In a real test case, we would have to rely on external data, as discussed in the introduction.
3 Methods
3.1 Neural Network
A convolutional neural network (CNN) is used; it has shown the ability to represent high-level patterns in an image (Goodfellow et al., 2016). In our case, this skill helps us represent differential operators (e.g., the Laplace operator) and nonlinear functions. It consists of successive layers; each layer computes convolutions:
(6) 
where:

is the index of a grid point in the domain

is the index of the convolution performed by layer ().

is the computed result of the convolution of layer ;

is a set of index points in the vicinity of point . It corresponds to the size of the convolution kernel (typically );

contains the values (the so-called weights) of the convolution kernel;

is a function that introduces nonlinearities into the computation. Except for the last layer, where is the identity function , the chosen function is the so-called rectifier (ReLU) function :
The last layer (the output layer) gives an estimation of and . Each layer takes its inputs from the preceding layer , except for the first layer, which takes as input the state variables of the model (see Eq. 4).
A CNN can thus be represented as a function , where represents the weights of all the convolution kernels. The learning phase of the neural network consists in adjusting in order to minimize the discrepancy between and (see Eq. 4). The minimization is performed iteratively given a training dataset of matching examples (, ).
The selection of the architecture (number and size of layers, size of the kernels, activation functions, …) of our CNN is the result of a cross-validation process. Its parameters are summarized in Table
1. Note that the first layer of the neural network uses kernels of size . This makes our model able to mimic first-order finite differences. Had the scheme we were trying to approximate corresponded to higher-order finite differences, we would have needed a larger convolution kernel size, or extra convolution layers. The last layer is equivalent to a densely connected layer with shared weights at each grid point. In our case, we have used the same CNN architecture for estimating and .
Input size  

Number of layers  2 
Number of units in each layer  32, 1 
Size of the kernel in each layer  , 
Activation function in each layer  ReLU, linear 
Output size  
Total number of weights  929 
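The total of 929 weights reported in Table 1 can be cross-checked. Assuming 3 input fields (the state variables u, v, h), a first layer of 32 filters with 3×3 kernels, and a final 1×1 linear layer with one output, the count matches exactly. The kernel sizes are our assumption (the values were elided in this copy), chosen because they reproduce the reported total:

```python
def conv_params(in_ch, out_ch, k):
    """Number of weights + biases of a 2-D convolution layer."""
    return k * k * in_ch * out_ch + out_ch

# Hypothetical architecture consistent with Table 1:
#   layer 1: 3 input fields -> 32 filters, 3x3 kernel, ReLU
#   layer 2: 32 channels    -> 1 filter,  1x1 kernel, linear
total = conv_params(3, 32, 3) + conv_params(32, 1, 1)
```

Here `conv_params(3, 32, 3)` gives 896 and `conv_params(32, 1, 1)` gives 33, summing to 929 as in the table.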
3.2 Datasets
Several datasets were generated using the fully specified shallow-water numerical model in order to train the neural networks and test their generalization capacity. These datasets are summarized in Table 2. Each dataset contains two subsets: (i) a subset representing matches between and , and (ii) a subset representing matches between and .
Two datasets are used for training. Dataset train_nonoise represents the ideal case in which the matching between and is known with perfect accuracy. The examples are 480 complete fields of (, ) computed from Eq. 4 during a 40-year simulation, by extracting snapshots at a frequency of approximately one month. Dataset train_noise01 represents a case in which the matching is imperfectly known. The snapshots are the same as for train_nonoise, with noise added to each variable of the dataset:
(7) 
where stands for a variable in the dataset ,
is a random value drawn from a normal distribution with mean 0 and standard deviation 0.1, and
is the standard deviation of the corresponding variable in the dataset. We aim to test the skill of the CNN against a fully known numerical model. Therefore, the test datasets do not contain noise. The first dataset, test_windstd, is produced under exactly the same conditions as train_nonoise, except that the initial conditions of the simulation are the final state of the simulation used to generate train_nonoise. This ensures that the model states used for testing are different from those used for training.
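The perturbation of Eq. (7) scales a standard normal draw by the variable's own standard deviation, so the noise level is relative to each field's variability. A minimal sketch (the field and seed are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, level=0.1, rng=rng):
    """Perturb a field as in Eq. (7): x + eps * std(x), eps ~ N(0, level)."""
    return x + rng.normal(0.0, level, size=x.shape) * x.std()

# Illustrative smooth field standing in for a model snapshot.
field = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))
noisy = add_noise(field)
# Ratio of the added-noise amplitude to the field's own variability:
noise_ratio = (noisy - field).std() / field.std()
```

The measured `noise_ratio` should be close to the prescribed level of 0.1, since the perturbation is drawn relative to the field's standard deviation.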
In order to test the generalization capacity of the hybrid model, a second dataset, test_windlow, was produced. It was obtained by changing the wind forcing compared to the first dataset: the wind intensity in Eq. 5 is set to . In this case, the hybrid model is forced by a wind forcing that was never encountered during the training. Note that, among the parameters, has the strongest impact on the system.
3.3 Validation diagnostics
Due to the large variability and the chaotic nature of the model, we cannot compare a reference simulation with the neural-network model on a term-by-term basis. To validate a model run, we therefore compute mean conservative quantities that represent the physical characteristics of the system being modeled: the kinetic energy (KE), the potential energy (PE) and the potential vorticity (PV):
(8)  
where is the surface of the domain. The evolution of these quantities in the fully specified model (using Eq. 4) is compared with their evolution in the hybrid model.
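The diagnostics above can be sketched with standard one-and-a-half-layer definitions. The constants below (H, reduced gravity, Coriolis parameter, grid spacing) are illustrative placeholders, since the paper's values were elided in this copy, and the vorticity stencil is a generic finite difference, not the model's C-grid operator:

```python
import numpy as np

def diagnostics(u, v, h, H=500.0, g_red=0.02, f=1e-4, dx=20e3):
    """Domain-averaged KE, PE and PV for a 1.5-layer shallow-water state.

    Standard textbook definitions; all constants are illustrative only.
    Arrays are indexed [y, x].
    """
    ke = np.mean(0.5 * (H + h) * (u ** 2 + v ** 2))     # kinetic energy
    pe = np.mean(0.5 * g_red * h ** 2)                  # potential energy
    zeta = np.gradient(v, dx, axis=1) - np.gradient(u, dx, axis=0)
    pv = np.mean((f + zeta) / (H + h))                  # potential vorticity
    return ke, pe, pv

# A state at rest: KE and PE vanish, PV reduces to f / H.
u = np.zeros((50, 50)); v = np.zeros((50, 50)); h = np.zeros((50, 50))
ke, pe, pv = diagnostics(u, v, h)
```

For the rest state, `pv` equals f/H = 2e-7, a quick sanity check on the implementation.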
4 Results
4.1 Training and testing the neural networks
For each target parameter , , one training was performed for each of the training datasets described in Table 2. Thus, a total of 4 neural networks were trained. The performances, calculated on the test datasets, are shown in Table 3.
For each training, the a priori performances show that the neural network is able to reproduce and . Note that is easier to reproduce than because the meridional component of the wind stress is null. This leads to a significantly lower RMSE for than for . Neither the addition of noise in the training set nor the test with a different wind intensity significantly degrades the accuracy of the CNN.
Train set  Test set  parameter  RMSE  corr. 
train_nonoise  test_windstd  1.000  
train_nonoise  test_windstd  1.000  
train_noise01  test_windstd  0.9999  
train_noise01  test_windstd  0.9992  
train_nonoise  test_windlow  0.9998  
train_nonoise  test_windlow  1.000  
train_noise01  test_windlow  0.9997  
train_noise01  test_windlow  0.9987  
Training set used for the neural network  
predicted parameter  
root-mean-square error  
correlation coefficient 
4.2 Simulation using the hybrid model
Using the neural network trained with dataset train_noise01, two 10-year simulations with different wind forcings ( = 0.1 and = 0.15 in Eq. 5) were run with the hybrid model (Eq. 1). We recall that the forcing ( = 0.1) was never encountered during the training phase of the CNN; it thus allows us to test the generalization skill of the hybrid model.
As a comparison, the fully specified numerical model was also run using the analytical expression in Eq. 4; this is considered our "true" reference simulation.
To ensure that the simulation does not diverge, it is essential to impose the boundary conditions on the data-driven terms of the hybrid model. This essential point is illustrated in Fig. 1. In one case (red curve), the hybrid model is used over the whole domain (including the boundaries) to predict and . In the other case, the velocity orthogonal to the border of the domain is set to zero. Even though the unconstrained neural network gives a very small error on the boundary velocity (root-mean-square error of ), this error suffices to damage mass conservation and makes the CNN diverge after 2-3 months of simulation. Fig. 1 also shows that the hybrid model has forecast skill (up to 1 month for the unconstrained model, and up to 5-6 months for the constrained model). In the following, only the constrained model is used.
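The boundary constraint described above amounts to overwriting the CNN prediction so that no flux crosses the walls. A minimal sketch (array layout `[y, x]` and the closed-basin geometry are our assumptions):

```python
import numpy as np

def constrain_boundaries(du, dv):
    """Zero the velocity component normal to each wall in the CNN output,
    enforcing the no-flux condition of the closed basin (a sketch of the
    constraint described in the text, not the paper's exact code)."""
    du = du.copy()
    dv = dv.copy()
    du[:, 0] = du[:, -1] = 0.0   # east/west walls: zonal component
    dv[0, :] = dv[-1, :] = 0.0   # north/south walls: meridional component
    return du, dv

# Example: a uniform (unconstrained) prediction gets its walls zeroed.
du, dv = constrain_boundaries(np.ones((8, 8)), np.ones((8, 8)))
```

Interior values are untouched; only the wall-normal components are reset, which is enough to restore mass conservation in the long-term integration.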
For the different experiments, the thickness averaged over 10 years is shown in Figure 2. The conservative quantities defined in Section 3.3
were computed every 5 months over the 10-year simulations, and their time averages (with 99% confidence intervals) are shown in Table
4. First, the two simulations with different wind forcings exhibit significant differences in all the conservative quantities (at the 99% confidence level) and in the mean state. In comparison, the differences between the reference simulation and the hybrid-model simulation are not significant (at the 99% confidence level). Therefore, all the properties considered to characterize the simulation are well reproduced by the neural-network simulations. Even though the hybrid model was forced with a wind intensity that was never "seen" during the training phase, the diagnostic properties are fairly well reproduced. This shows that the neural-network model has generalization skill.
Parameter  ref  hybrid  ref  hybrid 

KE  
PV  
PE 
5 Conclusion
In this work, it has been shown that it is possible to represent missing parts of a numerical model using a neural network. This was done by creating a hybrid model that emulates a shallow-water model. We showed that the hybrid model, used for a long-term simulation, produced outputs with the same physical properties as the reference physical model. The focus of this paper was on long-term simulations, but it was shown that the hybrid model also has some short-term predictive skill, up to a few months.
One important feature to be stressed is the necessity of adding physical constraints to the neural-network output (in our case, boundary conditions) to ensure the convergence of the neural-network model in long-term simulations. The addition of physical constraints is the only point of the methodology that is specific to our shallow-water application. The approach can thus be seen as a more general framework to produce hybrid models. Our experiments suggest that if we aim at short-term simulations, the definition of such constraints may not be necessary.
The possibility of merging a physics-based and a data-based model depends strongly on the availability of data and on the uncertainties in the numerical model. In some cases, observations are indirect or incomplete, so it can be necessary to apply inverse methods or interpolation techniques to produce the data used to train the neural network. These methods have their own inaccuracies, but it has been shown here that neural networks are able to learn the unknown processes in the presence of noise.
The hybrid model has demonstrated its ability to reproduce long-term simulations under conditions similar to those encountered during the training phase. More importantly, even when the external forcing (in our case, the wind stress) differs from the one used in the training data, the neural-network model still reproduces, with no significant difference, the characteristics of the physical system.
Acknowledgements.
JB is grateful to M. Bocquet, A. Carrassi and L. Bertino for useful discussions. This work was funded by the APPLEDOM project PNTS20182. JB has been funded by the project REDDA (#250711) of the Norwegian Research Council.
References
 Arakawa et al. (1981) Arakawa, A., and V. R. Lamb (1981), A Potential Enstrophy and Energy Conserving Scheme for the Shallow Water Equations, Monthly Weather Review, 109(1), 18–36, doi:10.1175/1520-0493(1981)109<0018:APEAEC>2.0.CO;2.
 Asselin (2007) Asselin, R. (2007), Frequency Filter for Time Integrations, Monthly Weather Review, 100(6), 487–490, doi:10.1175/1520-0493(1972)100<0487:fffti>2.3.co;2.
 Bolton and Zanna (2019) Bolton, T., and L. Zanna (2019), Applications of Deep Learning to Ocean Data Inference and SubGrid Parameterisation, Journal of Advances in Modeling Earth Systems, doi:10.31223/OSF.IO/T8UHK.
 De Bézenac (2017) De Bézenac, E. (2017), Deep learning for physical processes: Application to predicting ocean dynamics, Master’s thesis, LIP6, University Pierre and Marie Curie.
 Eyring et al. (2016) Eyring, V., S. Bony, G. A. Meehl, C. A. Senior, B. Stevens, R. J. Stouffer, and K. E. Taylor (2016), Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geoscientific Model Development, 9(5), 1937–1958, doi:10.5194/gmd919372016.
 Fablet et al. (2017) Fablet, R., S. Ouala, and C. Herzet (2017), Bilinear residual Neural Network for the identification and forecasting of dynamical systems.
 Goodfellow et al. (2016) Goodfellow, I., Y. Bengio, and A. Courville (2016), Deep Learning, MIT Press, http://www.deeplearningbook.org.
 Krysta et al. (2011) Krysta, M., E. Blayo, E. Cosme, and J. Verron (2011), A Consistent Hybrid VariationalSmoothing Data Assimilation Method: Application to a Simple ShallowWater Model of the Turbulent Midlatitude Ocean, Monthly Weather Review, 139(11), 3333–3347, doi:10.1175/2011mwr3150.1.
 Kuramoto and Tsuzuki (1976) Kuramoto, Y., and T. Tsuzuki (1976), Persistent propagation of concentration waves in dissipative media far from thermal equilibrium, Progress of theoretical physics, 55(2), 356–369.
 LeCun et al. (2015) LeCun, Y., Y. Bengio, and G. Hinton (2015), Deep learning, Nature, 521(7553), 436–444.
 Lorenz (1963) Lorenz, E. N. (1963), Deterministic nonperiodic flow, Journal of the atmospheric sciences, 20(2), 130–141.
 Lorenz and Emanuel (1998) Lorenz, E. N., and K. A. Emanuel (1998), Optimal sites for supplementary weather observations: Simulation with a small model, Journal of the Atmospheric Sciences, 55(3), 399–414.
 Madec et al. (2015) Madec, G., et al. (2015), NEMO ocean engine, Tech. rep., Institut Pierre-Simon Laplace.
 Pathak et al. (2017) Pathak, J., Z. Lu, B. R. Hunt, M. Girvan, and E. Ott (2017), Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data, Chaos, 27(12), 121,102, doi:10.1063/1.5010300.
 Pathak et al. (2018) Pathak, J., B. Hunt, M. Girvan, Z. Lu, and E. Ott (2018), ModelFree Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach, Physical Review Letters, 120(2), 024,102, doi:10.1103/PhysRevLett.120.024102.
 Randall et al. (2007) Randall, D. A., R. A. Wood, S. Bony, R. Colman, T. Fichefet, J. Fyfe, V. Kattsov, A. Pitman, J. Shukla, J. Srinivasan, R. J. Stouffer, A. Sumi, K. E. Taylor, E. Manzini, T. Matsuno, B. McAvaney, R. Wood, S. Bony, R. Colman, T. Fichefet, J. Fyfe, V. Kattsov, A. Pitman, J. Shukla, J. Srinivasan, R. Stouffer, A. Sumi, and K. Taylor (2007), Climate models and their evaluation, in Climate change 2007: The physical science basis. Contribution of Working Group I to the Fourth Assessment Report of the IPCC (FAR), pp. 589–662, Cambridge University Press.
 Rasp et al. (2018) Rasp, S., M. S. Pritchard, and P. Gentine (2018), Deep learning to represent subgrid processes in climate models., Proceedings of the National Academy of Sciences of the United States of America, 115(39), 9684–9689, doi:10.1073/pnas.1810286115.
 Schneider et al. (2017) Schneider, T., S. Lan, A. Stuart, and J. Teixeira (2017), Earth System Modeling 2.0: A Blueprint for Models That Learn From Observations and Targeted HighResolution Simulations, Geophysical Research Letters, 44(24), 396–12, doi:10.1002/2017GL076101.
 Shi et al. (2015) Shi, X., Z. Chen, H. Wang, D.Y. Yeung, W.K. Wong, and W.C. Woo (2015), Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting, doi:10.1074/jbc.M200827200.