Representing ill-known parts of a numerical model using a machine learning approach

by Julien Brajard, et al.

In numerical modeling of the Earth system, many processes remain unknown or ill-represented (e.g., sub-grid processes, dependence on unknown latent variables, or complex dynamics not included in the numerical model), yet they can sometimes be observed. This paper proposes a methodology for producing a hybrid model that combines a physics-based model (forecasting the well-known processes) with a neural-network model trained from observations (forecasting the remaining processes). The approach is applied to a shallow-water model in which the forcing, dissipative and diffusive terms are assumed to be unknown. We show that the hybrid model reproduces the unknown terms with great accuracy (correlation close to 1). In long-term simulations it reproduces, with no significant difference, the mean state, the kinetic energy, the potential energy and the potential vorticity of the system. Lastly, it is able to function with new forcings that were not encountered during the training phase of the neural network.




1 Introduction

The temporal evolution of the atmosphere or ocean can be represented by a set of equations based on the fundamental laws of physics and on empirical relations deduced from observations. These equations are complex, and numerical methods have been developed to solve them. The quality and complexity of these numerical models have steadily increased, culminating in the current ocean and atmosphere general circulation models (Eyring et al., 2016). In oceanography, the most complex models currently use a grid with a horizontal resolution of about 10 km (e.g. Madec et al. (2015)) with 50 vertical levels; on a square basin with sides of 5000 km, this corresponds to 500 × 500 × 50 ≈ 10^7 grid points. When the model is run to simulate a year, with 1-hour time steps, the information brought by the numerical experiment, for a single variable, is contained in a vector of around 10^11 values.


This numerical approach can be described with formal mathematical notation in the following way. A vector x (typically x ∈ R^n, with n of the order of 10^7) defines the discretized state of the system. A dynamical model is a system of differential equations which can be solved to compute the vector x_{k+1} for time index k+1 using the information already known: x_{k+1} = M(x_k) (equation 1), where M is a highly complex function obtained by discretizing the basic equations of the model. If M is derived from physical principles, the model is said to be "physically driven". The misrepresentation of sub-scale phenomena and the imperfect resolution of some non-linear processes are an inherent part of the numerical approach, which always has to make a trade-off between accuracy and computation cost. To remedy these deficiencies, the ill-resolved parts of the models are represented through parameterization schemes and simplified or empirical models (Randall et al., 2007).

In this paper we use machine learning and a data-driven approach to build a hybrid model, which combines physically driven and data-driven terms, in order to reproduce the processes that are inaccurately represented in the original model. Several studies have already used data to derive models based on machine learning and deep-learning algorithms. Approaches aiming at emulating an entire model given a set of perfect (noise-free) observations (Pathak et al., 2017, 2018; Fablet et al., 2017) were applied to chaotic low-dimensional dynamical systems (such as the 3-variable Lorenz model (Lorenz, 1963), the 40-variable Lorenz model (Lorenz and Emanuel, 1998) and the Kuramoto-Sivashinsky model (Kuramoto and Tsuzuki, 1976)). "Physics-oriented" deep-learning architectures have also been proposed for forecasting or nowcasting without explicit numerical models (De Bézenac, 2017; Shi et al., 2015). For perfect data, it has been proposed that machine learning approaches can help estimate unknown parameters or unknown parameterizations of a numerical model (Schneider et al., 2017). Deep learning was also applied to infer sub-grid parameterizations (Bolton and Zanna, 2019; Rasp et al., 2018).

Our approach aims at representing the ill-known part of a numerical model. The main contributions of this paper are to address the case of noisy data, to conduct long-term simulations that conserve the properties of the underlying dynamics, and to evaluate the generalization skill of the hybrid model. We show that it is necessary to add physical constraints to the hybrid model in order to achieve these objectives.

In the machine learning approach, we rewrite the numerical model as

x_{k+1} = M_phys(x_k) + F_k,     (2)

where F_k represents the ill-known part of the numerical model at iteration k and M_phys the known, physics-based part. F_k can be estimated using an empirical model; for this we suppose that the estimator F̂_k of F_k depends on the state of the system:

F̂_k = G(x_k, x_{k−1}, …)

where G is an unknown function that the machine learning algorithm must determine by means of non-linear regressions in a high-dimensional space. In the following we will consider the particular case in which G depends only on the previous state: F̂_k = G(x_k). The function G, and thus F_k, can be estimated thanks to external data coming from two sources:

  • Direct or indirect observations of the system (e.g. in-situ observations or satellite data)

  • High-resolution simulations (they can be carried out to resolve the ill-resolved processes on coarser resolutions).

Note that access to these data is limited in space and time. In this paper we first use external data (see section 2.2) to calibrate the function G, and then integrate the neural-network model in the absence of these external data.
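Schematically, once calibrated, the hybrid model advances the state by adding the neural-network estimate of the ill-known term to the physics-based update. A minimal sketch, with toy placeholder functions standing in for the discretized physical model and the trained network (neither is the paper's actual model):

```python
import numpy as np

def m_phys(x):
    """Placeholder for the known, physics-based part of the model (toy update)."""
    return x + 0.01 * np.roll(x, 1)

def g_nn(x):
    """Placeholder for the trained network estimating the ill-known term (toy damping)."""
    return -0.005 * x

def hybrid_step(x):
    # Hybrid update: physics-based forecast plus the data-driven correction.
    return m_phys(x) + g_nn(x)

x = np.ones(8)
for _ in range(3):
    x = hybrid_step(x)
# for a uniform field, each toy step multiplies the state by 1.005
```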

In recent years, big-data and machine learning techniques (deep learning, neural networks, …) have shown impressive skill in forecasting and classifying complex behaviors (LeCun et al., 2015). In this work, we use a convolutional neural network (Goodfellow et al., 2016) to compute the estimator of the unknown term F (see Eq. 4). We evaluate the capacity of the neural-network model to produce long-term simulations similar to those produced by a medium-complexity shallow-water model (described in section 2). The computations are constrained by weakly noised observations of model outputs.

The paper is organized as follows. Section 2 presents the physical model and section 3 the method. The last section shows the results based on synthetic data generated with or without noise, with two different wind forcings.

2 The model and the reference experiment

2.1 The shallow-water model

We consider a one-and-a-half-layer shallow-water model in the β-plane. Denoting x and y the zonal and meridional coordinates, we have

∂u/∂t = (f + ζ)v − ∂/∂x [g'h + (u² + v²)/2] + F_u
∂v/∂t = −(f + ζ)u − ∂/∂y [g'h + (u² + v²)/2] + F_v
∂h/∂t = −∂/∂x [(H + h)u] − ∂/∂y [(H + h)v]

The vector F = (F_u, F_v) contains the dissipation, the diffusion and the forcing and is assumed to be unknown; u and v are the zonal and meridional velocities, h the thickness anomaly of the active layer, f the Coriolis parameter (β-plane approximation, f = f₀ + βy), ζ = ∂v/∂x − ∂u/∂y the vorticity, g' the reduced gravity, and H the mean thickness of the active layer. The domain is a closed rectangular basin.

The equations are discretized on a C-grid (Arakawa et al., 1981) with a resolution of 20 km, much smaller than the mean Rossby deformation radius of the model, around 620 km. A leap-frog scheme is used for the temporal discretization, combined with an Asselin filter (Asselin, 2007) and a fixed time step. This produces a recurrence relation similar to the one summarized in equation (1).
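The time stepping can be illustrated as follows (a generic sketch, not the authors' code; the filter coefficient value is illustrative):

```python
import numpy as np

def leapfrog_asselin(x_prev, x_curr, tendency, dt, gamma=0.1):
    """One leap-frog step x(t+dt) = x(t-dt) + 2*dt*T(x(t)), followed by an
    Asselin filter that damps the computational mode of the leap-frog scheme."""
    x_next = x_prev + 2.0 * dt * tendency(x_curr)
    # Asselin filter: smooth the central time level
    x_curr_filtered = x_curr + gamma * (x_prev - 2.0 * x_curr + x_next)
    return x_curr_filtered, x_next

# usage on a toy linear system dx/dt = -x
xs = [1.0, 1.0]
for _ in range(10):
    x_filt, x_next = leapfrog_asselin(xs[-2], xs[-1], lambda x: -x, dt=0.01)
    xs[-1] = x_filt
    xs.append(x_next)
```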

2.2 Reference experiment

In order to train our neural network (see Section 3.1) to determine the function G and thus to reproduce F, we analytically define a simulated "truth":

F_u = τ_x/(ρ₀H) − γu + νΔu,   F_v = τ_y/(ρ₀H) − γv + νΔv,     (4)

where Δ is the Laplace operator and γ (dissipation), ν (diffusion) and ρ₀ (reference density) are fixed parameters. The meridional wind stress τ_y is null and the zonal wind stress τ_x is a fixed sinusoidal profile of the meridional coordinate y (Eq. 5), whose intensity τ₀ is in the standard case equal to 0.15.

The fully specified model (the equations of Section 2.1 together with Eq. 4) is the same as in Krysta et al. (2011), with changes in parameter values to ensure the presence of typical dynamical structures such as eddies. No-slip boundary conditions are applied.

Note that the unknown terms in our model are diverse; each has a different impact on the evolution of the system’s state. On the one hand the wind forcing varies in space and structures the mean state of the shallow-water model. On the other hand, the dissipation and diffusion depend on the state variable and dissipate energy. A quasi-stationary state of the system is obtained because of a balance between these terms after a spin-up period. Addressing the problem of the capacity of a neural network to represent these heterogeneous processes in a model is one of the main objectives of this work.

The observations used to constrain the neural network are generated by the fully-specified model. In a real test case, we would have to rely on external data, as discussed in the introduction.

3 Methods

3.1 Neural Network

A convolutional neural network (CNN) is used; it has shown the ability to represent high-level patterns in an image (Goodfellow et al., 2016). In our case, this skill helps us represent differential operators (e.g. the Laplace operator) and non-linear functions. The network consists of successive layers; each layer l computes convolutions:

z_{l,i}(m) = φ_l( Σ_j Σ_{p ∈ V(m)} w_{l,i,j}(p) z_{l−1,j}(p) + b_{l,i} )

where:

  • m is the index of a grid point in the domain;

  • i is the index of the convolution performed by layer l;

  • z_{l,i} is the computed result of the i-th convolution of layer l;

  • V(m) is a set of index points in the vicinity of the point m; it corresponds to the size of the convolution kernel (typically 3 × 3);

  • w_{l,i,j} are the values (or so-called weights) of the convolution kernel, and b_{l,i} is a bias;

  • φ_l is a function that introduces non-linearities in the computation. Except for the last layer, where φ is the identity function φ(x) = x, the chosen function is the so-called rectifier (ReLU): φ(x) = max(0, x).

The last layer (the output layer) gives an estimation of the target term (F_u or F_v). Each layer l takes its inputs from the preceding layer l−1, except for the first layer, which takes as input the state variables of the model (see Eq. 4).

A CNN can thus be represented as a function G_W, where W represents the weights of all the convolution kernels. The learning phase of the neural network consists in adjusting W in order to minimize the discrepancy between G_W(x) and F (see Eq. 4). The minimization is performed iteratively, given a training dataset of matching examples (x, F).

The selection of the architecture (number and size of layers, size of the kernels, activation functions, …) of our CNN is the result of a cross-validation process. Its parameters are summarized in Table 1. Note that the first layer of the neural network uses kernels of size 3 × 3, which makes our model able to mimic first-order finite differences. Had the scheme we were trying to approximate corresponded to a higher order of finite differences, we would have used a larger convolution kernel or added extra convolution layers. The last layer is equivalent to a densely connected layer with shared weights at each grid point. In our case, we have used the same CNN architecture for estimating F_u and F_v.

Input size: the 3 state fields (u, v, h)
Number of layers: 2
Number of units in each layer: 32, 1
Size of the kernel in each layer: 3 × 3, 1 × 1
Activation function in each layer: ReLU, linear
Output size: 1 field
Total number of weights: 929
Table 1: Architecture of the neural network.
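To make Table 1 concrete, the forward pass of this two-layer CNN can be reconstructed in plain NumPy (an illustrative sketch: the input is assumed to be the three state fields u, v and h stacked as channels, and the weights here are random, not trained). Counting the weights of a 3-channel 3 × 3 → 32 layer plus a 1 × 1 → 1 layer, with biases, indeed gives 929:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer 1: 32 filters of size 3x3 over 3 input channels (u, v, h), plus biases
W1 = rng.normal(size=(32, 3, 3, 3)) * 0.1
b1 = np.zeros(32)
# Layer 2: a 1x1 convolution (a shared dense layer at each grid point) down to 1 field
W2 = rng.normal(size=(1, 32, 1, 1)) * 0.1
b2 = np.zeros(1)

n_weights = W1.size + b1.size + W2.size + b2.size  # = 929, as in Table 1

def conv2d(x, W, b):
    """Naive 'same'-padding 2-D convolution; x has shape (C_in, H, W_dom)."""
    c_out, c_in, kh, kw = W.shape
    _, H, Wd = x.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, H, Wd))
    for o in range(c_out):
        for i in range(c_in):
            for a in range(kh):
                for c in range(kw):
                    out[o] += W[o, i, a, c] * xp[i, a:a + H, c:c + Wd]
        out[o] += b[o]
    return out

def cnn_forward(state):
    hidden = np.maximum(conv2d(state, W1, b1), 0.0)  # ReLU layer
    return conv2d(hidden, W2, b2)                    # linear output layer

state = rng.normal(size=(3, 16, 16))  # stacked (u, v, h) on a toy grid
out = cnn_forward(state)              # one estimated unknown-term field
```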

3.2 Datasets

Several datasets were generated with the fully-specified shallow-water numerical model in order to train the neural networks and test their generalization capacity. These datasets are summarized in Tab. 2. Each dataset contains two subsets: (i) a subset of matches between the state x and F_u and (ii) a subset of matches between x and F_v.

Two training datasets are used. Dataset train_nonoise represents the ideal case in which the matching between x and F is known with perfect accuracy. The examples are 480 complete fields of (F_u, F_v) computed from Eq. 4 during a 40-year simulation, extracting snapshots at a frequency of approximately one month. Dataset train_noise01 represents a case in which the matching is imperfectly known. The snapshots are the same as for train_nonoise, with noise added to each parameter of the dataset:

x̃ = x + ε · σ_x     (7)

where x stands for a variable in the dataset, ε is a random value drawn from a Normal distribution of mean 0 and standard deviation 0.1, and σ_x is the standard deviation in the dataset of the corresponding variable x.
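In code, this perturbation amounts to scaling a Gaussian draw of standard deviation 0.1 by the dataset standard deviation of each variable (a sketch; the array layout is assumed):

```python
import numpy as np

def add_relative_noise(snapshots, rel_std=0.1, seed=0):
    """snapshots: array (n_snapshots, ny, nx) of one dataset variable.
    Returns snapshots + eps * sigma_x, with eps ~ N(0, rel_std) drawn
    pointwise and sigma_x the standard deviation over the whole dataset (Eq. 7)."""
    rng = np.random.default_rng(seed)
    sigma_x = snapshots.std()
    eps = rng.normal(0.0, rel_std, size=snapshots.shape)
    return snapshots + eps * sigma_x
```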

We aim at testing the skill of the CNN against a fully-known numerical model; therefore the test datasets do not contain noise. The first dataset, test_windstd, is produced under exactly the same conditions as train_nonoise, except that the initial conditions of the simulation are taken to be the final state of the simulation used to generate train_nonoise. This ensures that the model states used for testing are different from those used for training.

In order to test the generalization capacity of the hybrid model, a second dataset, test_windlow, was produced. It was obtained by changing the wind forcing compared with the first dataset: the wind intensity τ₀ in Eq. 5 is set to 0.1. In this case, the hybrid model is forced by a wind forcing that was never encountered during the training. Note that, among the parameters, τ₀ has the strongest impact on the system.

Name Size* Notes
train_nonoise 480 Eq. 4
train_noise01 480 Eq. 7
test_windstd 120 Eq. 4
test_windlow 120 Eq. 5 with τ₀ = 0.1
*Number of snapshots contained in the dataset.
Table 2: Summary of the datasets used for training and testing the neural networks.

3.3 Validation diagnostics

Due to the large variability and the chaotic nature of the model, we cannot compare a reference simulation with the neural-network model on a term-by-term basis. To validate a model run, we therefore compute mean conservative quantities that represent the physical characteristics of the system being modeled: the kinetic energy (KE), the potential energy (PE) and the potential vorticity (PV):

KE = (1/2S) ∫∫ (H + h)(u² + v²) dx dy,
PE = (g'/2S) ∫∫ h² dx dy,
PV = (1/S) ∫∫ (f + ζ)/(H + h) dx dy,

where S is the surface of the domain. The evolution of these quantities in the fully specified model (using Eq. 4) is compared with their evolution in the hybrid model.
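Assuming the standard shallow-water forms above, these domain-averaged diagnostics are a few lines of code (a sketch; the values of H, g', f and the grid spacing are placeholders, not the paper's parameters):

```python
import numpy as np

def diagnostics(u, v, h, H=500.0, g_red=0.02, f0=1e-4, dx=20e3):
    """Domain-averaged KE, PE and PV for a 1.5-layer shallow-water state.
    All fields are assumed co-located on a uniform grid of spacing dx."""
    ke = 0.5 * np.mean((H + h) * (u**2 + v**2))
    pe = 0.5 * g_red * np.mean(h**2)
    zeta = np.gradient(v, dx, axis=1) - np.gradient(u, dx, axis=0)  # vorticity
    pv = np.mean((f0 + zeta) / (H + h))
    return ke, pe, pv
```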

4 Results

4.1 Training and testing the neural networks

For each target parameter (F_u, F_v), one training was performed with each of the training datasets described in Table 2. Thus, a total of 4 neural networks were trained. Their performances, calculated on the test datasets, are shown in Table 3.

For each training, the a priori performances show that the neural network is able to reproduce F_u and F_v. Note that F_v is easier to reproduce than F_u because the meridional component of the wind stress is null; this leads to a significantly lower RMSE for F_v than for F_u. Neither the addition of noise in the training set nor testing with a different wind intensity significantly degrades the accuracy of the CNN.

Train set Test set parameter RMSE corr.
train_nonoise test_windstd 1.000
train_nonoise test_windstd 1.000
train_noise01 test_windstd 0.9999
train_noise01 test_windstd 0.9992
train_nonoise test_windlow 0.9998
train_nonoise test_windlow 1.000
train_noise01 test_windlow 0.9997
train_noise01 test_windlow 0.9987
Train set: training dataset used for the neural network; parameter: predicted parameter (F_u or F_v); corr.: correlation coefficient.
Table 3: A priori performance of the neural networks on the test datasets.

4.2 Simulation using the hybrid model

Using the neural network trained with dataset train_noise01, two 10-year simulations with different wind forcings (τ₀ = 0.1 and τ₀ = 0.15 in Eq. 5) were run with the hybrid model (Eq. 1). We recall that the forcing τ₀ = 0.1 was never encountered during the training phase of the CNN; it thus allows us to test the generalization skill of the hybrid model.

As a comparison, the fully-specified numerical model was also run using the analytical expression in Eq. 4; this run is considered our "true" reference simulation.

Figure 1: Evolution in time of the potential energy for the reference model (blue), the hybrid model with no boundary constraints (red) and the hybrid model with a strictly null velocity imposed orthogonally to the border. Note that the scale of the y-axis is not regular, in order to visualize the different orders of magnitude. The unconstrained simulation is not shown after 8 months because NaN (Not a Number) values were encountered.

To ensure that the simulation does not diverge, it is essential to impose the boundary conditions on the data-driven terms of the hybrid model. This essential point is illustrated in Fig. 1. In one case (red curve), the hybrid model is used over the whole domain (including the boundaries) to predict F_u and F_v. In the other case, the velocity orthogonal to the border of the domain is set to zero. Even though the unconstrained neural net gives only a very small root-mean-square error on the boundary velocity, this error suffices to damage mass conservation and makes the hybrid model diverge after 2–3 months of simulation. Fig. 1 also shows that the hybrid model has forecast skill (up to 1 month for the unconstrained model, and up to 5–6 months for the constrained model). In the following, only the constrained model is used.
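Enforcing the constraint after each model step is simple: the predicted velocity normal to each wall is overwritten with zero (a sketch; the exact C-grid staggering of the arrays is assumed, not taken from the paper):

```python
import numpy as np

def enforce_no_normal_flow(u, v):
    """Zero the velocity component orthogonal to each wall, overriding
    the neural-network prediction on the boundary."""
    u = u.copy(); v = v.copy()
    u[:, 0] = 0.0; u[:, -1] = 0.0   # no zonal flow through east/west walls
    v[0, :] = 0.0; v[-1, :] = 0.0   # no meridional flow through north/south walls
    return u, v
```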

For the different experiments, the thickness h averaged over 10 years is shown in Figure 2. The conservative parameters defined in Eq. 3.3 were computed every 5 months of the 10-year simulation, and their time averages (with confidence intervals at 99%) are shown in Table 4. First, the two simulations with different wind forcings exhibit significant differences, at the 99% confidence level, for all the conservative parameters and for the mean state. In comparison, the differences between the reference simulation and the hybrid-model simulation are not significant (at the 99% confidence level). Therefore, all the properties considered to characterize the simulation are well reproduced by the neural-network simulations.

Even though the hybrid model was forced with a wind intensity that was never "seen" during the training phase, the diagnostic properties are well reproduced. This shows that the neural-network model has generalization skills.

Figure 2: Mean state of h for the hybrid-model simulation (left panel), the reference simulation (central panel) and the relative difference between the two (right panel). The upper row is the simulation performed with τ₀ = 0.15 (also used during the training) and the lower row is the simulation performed with τ₀ = 0.1.
Parameter ref hybrid ref hybrid
Table 4: Long-term diagnostics of the simulations for the reference model (ref) and the hybrid model (hybrid). For the definitions of the parameters, see Eq. 3.3. The confidence intervals are computed at a confidence level of 99%.

5 Conclusion

In this work, it has been shown that it is possible to represent missing parts of a numerical model using a neural network. This was done by creating a hybrid model that emulates a shallow-water model. We showed that the hybrid model, used for a long-term simulation, produced outputs with the same physical properties as the reference physical model. The focus of this paper was on long-term simulations, but it was shown that the hybrid model also has some short-term predictive skill, up to a few months.

One important feature to be stressed is the necessity of adding physical constraints to the neural-network output (in our case, boundary conditions) to ensure the convergence of the hybrid model in long-term simulations. The addition of physical constraints is the only point of the methodology that is specific to our shallow-water application; the approach can otherwise be seen as a general framework for producing hybrid models. Our experiments suggest that, if we aim only at short-term simulations, the definition of such constraints may not be necessary.

The possibility of merging a physics-based and a data-based model depends strongly on the availability of data and on the uncertainties in our numerical model. In some cases, observations are indirect or incomplete, so it can be necessary to apply inverse methods or interpolation techniques to produce the data used to train the neural network. These methods have their own inaccuracies, but it has been shown here that neural networks are able to learn the unknown processes in the presence of noise.

The hybrid model has demonstrated its ability to reproduce long-term simulations in conditions similar to those encountered during the training phase. More importantly, even when the external forcing (in our case, the wind stress) differs from the one used in the training data, the neural-network model still reproduces, with no significant difference, the characteristics of the physical system.

JB is grateful to M. Bocquet, A. Carrassi and L. Bertino for useful discussions. This work was funded by the APPLE-DOM project PNTS-2018-2. JB has been funded by the project REDDA (#250711) of the Norwegian Research Council.


  • Arakawa et al. (1981) Arakawa, A., and V. R. Lamb (1981), A Potential Enstrophy and Energy Conserving Scheme for the Shallow Water Equations, Monthly Weather Review, 109(1), 18–36, doi:10.1175/1520-0493(1981)109<0018:APEAEC>2.0.CO;2.
  • Asselin (2007) Asselin, R. (2007), Frequency Filter for Time Integrations, Monthly Weather Review, 100(6), 487–490, doi:10.1175/1520-0493(1972)100<0487:fffti>2.3.co;2.
  • Bolton and Zanna (2019) Bolton, T., and L. Zanna (2019), Applications of Deep Learning to Ocean Data Inference and Sub-Grid Parameterisation, Journal of Advances in Modeling Earth Systems, doi:10.31223/OSF.IO/T8UHK.
  • De Bézenac (2017) De Bézenac, E. (2017), Deep learning for physical processes: Application to predicting ocean dynamics, Master’s thesis, LIP6, University Pierre and Marie Curie.
  • Eyring et al. (2016) Eyring, V., S. Bony, G. A. Meehl, C. A. Senior, B. Stevens, R. J. Stouffer, and K. E. Taylor (2016), Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geoscientific Model Development, 9(5), 1937–1958, doi:10.5194/gmd-9-1937-2016.
  • Fablet et al. (2017) Fablet, R., S. Ouala, and C. Herzet (2017), Bilinear residual Neural Network for the identification and forecasting of dynamical systems.
  • Goodfellow et al. (2016) Goodfellow, I., Y. Bengio, and A. Courville (2016), Deep Learning, MIT Press.
  • Krysta et al. (2011) Krysta, M., E. Blayo, E. Cosme, and J. Verron (2011), A Consistent Hybrid Variational-Smoothing Data Assimilation Method: Application to a Simple Shallow-Water Model of the Turbulent Midlatitude Ocean, Monthly Weather Review, 139(11), 3333–3347, doi:10.1175/2011mwr3150.1.
  • Kuramoto and Tsuzuki (1976) Kuramoto, Y., and T. Tsuzuki (1976), Persistent propagation of concentration waves in dissipative media far from thermal equilibrium, Progress of theoretical physics, 55(2), 356–369.
  • LeCun et al. (2015) LeCun, Y., Y. Bengio, and G. Hinton (2015), Deep learning, Nature, 521(7553), 436–444.
  • Lorenz (1963) Lorenz, E. N. (1963), Deterministic nonperiodic flow, Journal of the atmospheric sciences, 20(2), 130–141.
  • Lorenz and Emanuel (1998) Lorenz, E. N., and K. A. Emanuel (1998), Optimal sites for supplementary weather observations: Simulation with a small model, Journal of the Atmospheric Sciences, 55(3), 399–414.
  • Madec et al. (2015) Madec, G., et al. (2015), NEMO ocean engine, Tech. rep., Institut Pierre-Simon Laplace.
  • Pathak et al. (2017) Pathak, J., Z. Lu, B. R. Hunt, M. Girvan, and E. Ott (2017), Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data, Chaos, 27(12), 121,102, doi:10.1063/1.5010300.
  • Pathak et al. (2018) Pathak, J., B. Hunt, M. Girvan, Z. Lu, and E. Ott (2018), Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach, Physical Review Letters, 120(2), 024,102, doi:10.1103/PhysRevLett.120.024102.
  • Randall et al. (2007) Randall, D. A., R. A. Wood, S. Bony, R. Colman, T. Fichefet, J. Fyfe, V. Kattsov, A. Pitman, J. Shukla, J. Srinivasan, R. J. Stouffer, A. Sumi, and K. E. Taylor (2007), Climate models and their evaluation, in Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the IPCC, pp. 589–662, Cambridge University Press.
  • Rasp et al. (2018) Rasp, S., M. S. Pritchard, and P. Gentine (2018), Deep learning to represent subgrid processes in climate models., Proceedings of the National Academy of Sciences of the United States of America, 115(39), 9684–9689, doi:10.1073/pnas.1810286115.
  • Schneider et al. (2017) Schneider, T., S. Lan, A. Stuart, and J. Teixeira (2017), Earth System Modeling 2.0: A Blueprint for Models That Learn From Observations and Targeted High-Resolution Simulations, Geophysical Research Letters, 44(24), 396–12, doi:10.1002/2017GL076101.
  • Shi et al. (2015) Shi, X., Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-C. Woo (2015), Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting, in Advances in Neural Information Processing Systems 28.