1 Introduction and motivations
Model selection (Baum et al., 1970) is the problem of defining and using metrics to compare different models, or different versions of the same model that may differ by the model’s parameters, its settings or its boundary conditions. Model selection, though, has received little attention from the geoscientific community, as compared to data assimilation (DA), which is by now a wellestablished field in the geosciences (Daley, 1991; Ghil and MalanotteRizzoli, 1991; Bennett, 1992; Kalnay, 2003; Asch et al., 2016), and which has focused so far mainly on estimating highdimensional states of the atmosphere (Whitaker et al., 2008; Buehner et al., 2010), of the ocean (Lermusiaux, 2006; Sakov et al., 2012) and of the climate system as a whole (Saha et al., 2010; Saha et al., 2014). Even nowadays, a common practice to evaluate models at operational centers is to compare visually model outputs to observations or, more quantitatively, to compute the modeltodata rootmeansquare error (RMSE).
Some attempts to analytically address the model selection problem have been successful (e.g., Winiarek et al., 2011) but only under specific assumptions that may not be realistic for largedimensional geoscientific applications. Numerical methodologies that may be computationally affordable for highdimensional problems have also been proposed in recent years (Elsheikh et al., 2014a, b; Otsuka and Miyoshi, 2015; Reich and Cotter, 2015; Carson et al., 2017; Carrassi et al., 2017). Several of these approaches advocate the use of the marginal likelihood of the data — also called model evidence — for model selection. Along these lines, the contextual formulation of model evidence (CME), first proposed by Hannart et al. (2016b) and fleshed out by Carrassi et al. (2017), exploits DA techniques to approach the model selection problem.
Carrassi et al. (2017) provide the analytic formulae to compute the CME using several stateoftheart Gaussian ensemble DA methods and demonstrate the benefits of CME as a model selecting tool. Their study thus started to answer the first theoretical and practical research questions that had to be addressed to make the CME suitable for model selection operationally in weather forecasting centers.
Nevertheless, a number of issues still remain to be tackled in order to proceed to the operational implementation of the CME. Among these, one key issue is the use of spatial localization — called henceforth simply localization — in ensemblebased DA. Indeed, when applied to largedimensional models, the ensemble based DA methods of Carrassi et al. (2017) to compute CME would fail unless localization is properly incorporated. Even though localization is a widely established practice in ensemble DA, and has also generated a substantial literature (e.g., Houtekamer and Mitchell, 2001; Hamill et al., 2001; Ott et al., 2004 and Sakov and Bertino, 2011), it is still matter of intense research.
Localization originally arisen as a countermeasure to the sampling issue due to our attempt to describe uncertainty in highdimension using a very limited ensemble. Help in overcoming this problem comes from the fact that many physical systems, and notably the atmosphere and ocean, tend to have small longdistance correlations, i.e., the actual error covariances are sparse (e.g., Ghil et al., 1979; Balgovind et al., 1983). For such systems, localization can – without degrading too severely the physical systems’ representation – set longdistance correlations to zero and only model the closetodiagonal terms in the covariance matrix.
Many localization techniques exist to achieve sufficiently accurate approximations of the covariance matrix along these lines. These techniques, though, do lead to a change in the structure of the matrices manipulated during the DA process. Hence, in the CME context, the challenge is to adapt its original formulation to the need for localization in the underlying DA process. This study aims, therefore, to extend CME theory, as presented in Carrassi et al. (2017), to include localization and to apply it to a quasirealistic atmospheric model of intermediate complexity.
The structure of the paper is as follow. Section 2 recalls the benefits of approaching model evidence using the DA framework. A more extensive discussion of these benefits can be found in Hannart et al. (2016b). Section 3 first introduces the notion of localization in general, and the domain localization technique in particular. The application of domain localization in the CME computation is hereafter referred to as the domainlocalized CME (DLCME), and it is also presented in Section 3. Section 4 implements the two CME formulations (with and without localization) and compares them to the RMSE as model selection indicators with the RMSE in various numerical experiments based on the 40dimensional Lorenz95 model. Section 5 applies these indicators to a parameter selection problem using the simplified global atmospheric model called SPEEDY (Molteni, 2003; Kucharski, 2006). Section 6 finally provides a summary, conclusions and future directions.
2 Model evidence and data assimilation
Let us assume that a model describes an unknown process at time , as a dimensional state , that we only perceive via a set of partial, dimensional observations .
The model .
The model simulates the unknown process
(1) 
where propagates the state from time to time . The dimension of the model state can be decomposed as , where is the space of the physical variables and is the spatial resolution of the model.
In general,
is the model error — represented most often by an additive stochastic white noise — but here we shall assume that the model is perfect, i.e.,
. The implications of this assumption on the computation of the CME are substantial, cf. Carrassi et al. (2017), but we have reason to believe that it does not affect very much our conclusions about the impact of localization on the CME and its comparison with the RMSE. . This perfect model assumption will be removed in subsequent studies.The set of observations y.
The observation vector
is related to the state vector according to the observation model(2) 
Here is the observational operator at time and is the observation error, represented here as an additive stochastic white noise with zero mean and covariance , which is assumed to be constant in time.
2.1 CME for model selection
Using a model of type and the set of observations , the goal of model selection is to define a metric based on them, and apply it to identify the best model. Several metrics for model selection are defined in the literature (Akaike, 1974; Schwarz, 1978; Burnham and Anderson, 2002). In the present article, we focus on model evidence as the metric of choice (Elsheikh et al., 2014a, b; Otsuka and Miyoshi, 2015; Reich and Cotter, 2015; Carson et al., 2017; Carrassi et al., 2017).
2.1.1 Model evidence
Let us define the ideal infinite set of observations as . The marginal likelihood of , given , is calculated as
(3) 
This likelihood is also referred to as model evidence.
Note that, in the context of model evaluation, one should rather be interested in estimating the probability of the model conditioned to the observations
. Yet, model evidence can be used to construct a metric for model selection or model comparison between two models and . Too see this, let us write the ratio of the two posterior distributions, each one relative to models and respectively, conditioned on the same set of observations(4) 
Equation (4
) computes the ratio of the posterior model probability density functions (pdfs) and we see that it is proportional to the ratio of the model evidences – the Bayesian factor – times the ratio of the prior model pdfs. The latter are difficult to evaluate in practice, but, it is often legitimate to assume that the prior pdfs of the two models are the same, provided that we do not possess enough information to favor one model over the other. In this case, we see how the Bayesian factor provides an estimate of the model pdf ratio. In any case, the Bayesian factor is the scalar that updates the ratio of prior model pdfs to the ratio of posterior model pdfs.
In this sense, the ratio of the two model evidences can be considered as a selection indicator that favors one model or the other, depending on . More conveniently, one can look at the difference between their respective logarithms, and write the model selection indicator as
(5) 
We use the indicator in the following experiments, and the model selection criterion becomes .
2.1.2 Contextual model evidence
If the model is autonomous, one can condition model evidence to the initial observation of the set, and write
(6) 
where the explicit dependence on has been dropped on the righthand side, for the sake of clarity.
An alternative formulation of the model evidence, with also a potential computational gain, is to no longer try to estimate the climatological model evidence – which is difficult to estimate accurately and which does not inform on the present context of the system – the so called contextual model evidence (CME), (Carrassi et al., 2017). This new contextual view of model evidence implicitly assumes that all information from past observations is conveyed by conditioning on .
Marginalizing over yields
(7) 
Therefore, the two terms needed to compute the CME are:

the conditional pdf, , and

the likelihood of the observations, .
The conditional pdf is the posterior pdf, or analysis, produced by the state estimation DA process. The observational likelihood is often a given quantity (or at least considered as such) of the state estimation DA problem. Hence, as shown in Carrassi et al. (2017), the CME can be obtained as a byproduct of a DA algorithm designed to assimilate the data y into the model .
2.2 The CME’s EnKFformulation
We present here the CME formulation based on the ensemble Kalman filter (EnKF, Evensen, 2009). The choice of the EnKF among other, possibly even more accurate, methods to compute CME, is motivated by its great ratio cost/benefit. It was shown in Carrassi et al. (2017), that it achieves very high skill in evaluating the CME, only slightly lower than other more sophisticated methods, but at a much lower computational cost.
The EnKFformulation of the CME comes as a byproduct of the EnKF DA process that gives an approximation of model evidence . Over the evidencing window , the CME is calculated as
(8) 
with , where is the prior error covariance matrix at time .
Two different proofs of the EnKFbased CME, Eq. 8, are proposed using a stochastic version of the EnKF in Hannart et al. (2016b) and a deterministic one in Carrassi et al. (2017), and are both refereed to as the global CME (GCME) hereafter.
An efficient alternative to compute the GCME is presented in Appendix A Alternative implementation of the global CME (GCME) and is used throughout the present article. This implementation stands on the assumption that R is diagonal, i.e., that the observations are mutually uncorrelated, an assumption, nonetheless, often made in the geosciences. A significant improvement in this alternative GCME implementation is its reducing the computational cost from depending on the size of the state vector to depending on the ensemble size , which is generally much lower, .
3 CME and domain localization
This section presents a new formulation of the CME that takes into account domain localization. We describe first, in section 3.1, how domain localization is implemented within the DA process, and then the domainlocalized CME (DLCME) in section 3.2.
3.1 Domain localization
3.1.1 The basic idea
Domain localization consists in updating separately, at each spatial gridpoint , all the physical variables , with , by assimilating only neighboring observations. This idea goes back to earlier forms of DA, like the successive correction method (Cressman, 1959; Ghil et al., 1979; Ghil and MalanotteRizzoli, 1991). We describe here the implementation known as local analysis (Houtekamer and Mitchell, 2001; Ott et al., 2002, 2004; Hunt et al., 2007).
The first step of local analysis consists in selecting the observations within a disk centered on the spatial gridpoint we wish to update; the radius of this disk is a predefined parameter , called the cutoff radius. This type of selection process is often called the boxcar scheme, as opposed to schemes that gradually taper off the weights given to observations, from to . Note that in general the cutoff radius depends on the physical variable to be analyzed.
We call the resulting reduced observation vector , and its size . It is then also necessary to restrict the observation error covariance matrix R to a local, and hence of smaller dimension, matrix . The resulting analysis after the first step alone may, however, present abrupt discontinuities when crossing the boundary of the observation disk and have, therefore, a deleterious sideeffect, whose elimination or mitigation requires a second analysis step.
The second step in local analysis applies a localization function to the inverse matrix , called precision matrix, and it produces a smoother global update. The localization function is in fact the Schur product (Hunt et al., 2007) of by the diagonal localization matrix giving
(9) 
where is equal to if and it decreases gradually to outside of the disk of radius , called the localization radius, where . Hence, the further an observation lies from
, the more its observation error variance is increased and its impact on the local analysis is decreased.
3.1.2 The LETKF
The ensemble transform Kalman filter (ETKF: Bishop et al., 2001) is a deterministic EnKF performing the analysis in the ensemble space. The ETKF computes the updated ensemble , with the ensemble members, such that
(10) 
Here and , and the analysis ensemble is given by a linear combination of the forecast ensemble mean and the forecast ensemble anomalies in . Finally, and is the identity matrix.
The domain localization applied to the ETKF gives the local ensemble transform Kalman filter (LETKF, Hunt et al., 2007). In the LETKF, a separate analysis is performed for each model gridpoint of the spatial grid and the resulting update affects each physical variable at . The DA is performed using localized observations, i.e., the observation precision matrix is .
3.2 Domainlocalized CME
In order to formulate the CME with domain localization, we start with the general case of an observation subvector of generic nature. In section 3.2.1, the case of the trivial evidencing window is treated first. Second, we address the inconsistency that arises when and show how to remove it by using realistic approximations. In section 3.2.2, we derive from the general case a formulation of the local CME. Finally, in section 3.2.3
, we propose a heuristic global model selection indicator based on local CMEs, the DLCME.
3.2.1 CME of an observation subvector
Let us consider first the problem of estimating the CME of , a subvector of the original observation vector . Such a subvector may be obtained either by selecting a subset of observation types or by a spatial localization or both. The corresponding CME is just a particular case of the original DA, i.e., of the DA for the complete observation vector, as given by Eq. ((7)), and can be obtained from it. Indeed, if the evidencing window has length , the contextual evidence of becomes
(11) 
Both pdfs in the integrand above are byproducts of the original DA: is the original posterior pdf at time and the subsampled likelihood also stems from the original DA process. Using the EnKF formalism, one can compute the CME for this subvector using Eq. (8) which in this case leads to
(12) 
Here , where , and are the observation operator, its linearization and the observation error covariance matrix corresponding to the observation subvector, respectively. Note that are the forecast products of the original DA.
If the evidencing window is longer, , we decompose the contextual evidence — as shown in Carrassi et al., 2017 — as
(13) 
The factor in Eq. (13) corresponds to the case and it can be computed from the original DA process using Eq. (12). The subsampled likelihoods are also easily computable since the likelihood of the observation is usually given in a stateestimation process.
The pdf of the system’s state, conditional on the observation subvectors for , however, requires a dedicated DA of the observation subvectors. It seems desirable to estimate these exact pdfs but the forecastassimilation process required to do so may — depending on the nature of the observation subvectors — involve an insufficiently observed system and the DA algorithm may, therefore, fail to converge (cf. section 5.1.3).
In such a situation, approximating by the original posterior pdf, , is a possibility when the evidencing window is short enough. In this case, the EnKFformulation can be used to compute the CME as
(14) 
where . Using instead of the above approximation the correct pdfs of the system’s state, conditional on the observation subvector , would require computing the adhoc in , instead of .
3.2.2 Local CME
In the same spirit as the local analysis above, we want to locally compute the CME at each spatial gridpoint . The reduced observation vectors can be considered as observation subvectors and we can use, therewith, the formalism of section 3.2.1.
Hence, for each spatial gridpoint , the local CME can be obtained from Eq. (14), yielding
(15) 
here ; and are, respectively, the restrictions of the nonlinear and the linearized observation operator to the neighborhood of . In other words, at the spatial point , the local CME returns the CME of the observations that have affected its analysis.
Remark that, in the case of domain localization, it is not necessary to approximate the conditional pdf of the system’s state on the observation subvectors by the original posterior pdf, since is already the local approximation of . Nonetheless, the observation vectors represent a tube of observations spread in time over the evidencing window but confined to the same local domain. Due, however, to the physical evolution of the error covariances, such a static tube may not be as relevant for DA and evidencing as would be a dynamic tube that follows local domains that are covariant with the flow (Bocquet, 2016). The implementation of such covariant localization techniques is not required, though, for the short evidencing windows used in the following, so that we apply Eq. (15) without such corrections.
3.2.3 The DLCME
Creating a global indicator from local CMEs is not theoretically straightforward. We propose instead a global heuristic indicator for model selection , that we call the DLCME,
(16) 
here are positive, model grid–dependent coefficients, inversely proportional to the localization radius, that weight the logarithms of the local CMEs, such that . For instance, in the L95 experiments of section 4, all the weights are set to be equal since the localization radius is constant throughout the domain. However, in the SPEEDY experiments (section 5), vary with the latitude as the number of gridpoints per local domain does.
3.3 Brief review of covariance localization.
Another widely used localization technique is covariance localization, also called the Schur localization (Houtekamer and Mitchell, 2001). This localization consists in restricting the spatial impact of the unrealistic correlations due to the subsampling of the ensemblebased covariances, by directly applying a smoothing function onto the forecast error covariance matrix.
This technique can be applied to the computation of the CME in the same vein as DLCME above, in order to take into account the actual covariances used by the DA. Details of the derivation of the CME formulation using covariance localization are presented in Appendix B Derivation of the CME formulation using covariance localization. The computational cost remains the same as for the original EnKFformulation plus the significant cost of a Schur product.
This alternative approach to CME localization can also be applied to largedimensional systems, but only if combined with sophisticated numerical techniques to make it computationally feasible. For this reason, covariance localization will not be discussed furthermore in the present article. Further studies investigating this matter should follow.
4 Numerical experiments with a toy model
The numerical experiments are designed to: (i) study numerically the performance of the global and local CME as a model selection indicator and to compare it with the RMSE that is widely used for the same purposes; and (ii) analyze the impact of domain localization in the computation of the CME. These experiments use the Lorenz95 (L95: Lorenz, 1996; Lorenz and Emanuel, 1998) model.
4.1 Experimental setup
4.1.1 The Lorenz95 model
The onedimensional L95 model is a minimal representation of atmospheric flow along a midlatitude zonal circle with variables . In this case, and , i.e., one single physical variable, such as temperature or vorticity, is defined at each grid point.
4.1.2 The DA implementation
Observations are generated by sampling the true trajectory , every DA interval , using Eq. (2) with and , so that the full system is observed. The correct observation error covariance, , is used in the DA. The experiments are run over DA cycles, after a spinup of DA cycles.
The DA method used throughout the present section is the LETKF described in section 3.1.2. The localization matrix, , is diagonal with coefficients defined by , where is the size of the reduced observation error covariance matrix, is the localization radius and is the GaspariCohn function (Gaspari and Cohn, 1999) displayed in Eq. (27) of Appendix B Derivation of the CME formulation using covariance localization.
The performance of the DA algorithm is evaluated using the RMSE of the analysis with respect to the truth
(18) 
with and being the true and estimated state. Here and elsewhere we follow the superscript notation proposed by Ide et al. (1997) for the truth, the analysis (i.e. the estimated state), the forecast and the observations.
The accuracy of the model selection is assessed based on the RMSE of the forecast with respect to the observations
(19) 
Model evidence, for both the GCME and the DLCME formulations, is evaluated over the evidencing window . Unless otherwise stated, we set here , which means that the model evidence is computed only for one DA cycle.
4.2 Numerical results
4.2.1 CME versus RMSE
In this subsection, we focus on the performance of the GCME (Eq. 8) — i.e., without localization — as the model selection indicator, and compare it with that of the RMSE.
For each model configuration, i.e. for each choice of the forcing F, a DA is performed using the setup described in section 4.1.2, with a 40member ensemble (), without localization and with a coefficient of inflation tuned to provide the smallest RMSE (Eq. 18) for each DA. At each cycle of the experiment, the GCME and RMSE indicators are computed.
Probability of selection.
We first look at the probability of selection of the correct model for each indicator; that is, where is computed by counting how many times each indicator chooses the correct model over the incorrect one, out of the total of
cycles. Thus, a random classifier will produce a probability of selection equal to
and a perfect classifier will produce one equal to . This allows us to evaluate the reliability of the indicators in terms of model selection. Then, by repeating the computation for different values of , we deduce the sensitivity of either indicator to the discrepancy  between correct and incorrect models.The probabilities of selection are given in Figure 1. The incorrect models are simulated with the forcings plotted on the axis. Note that the larger the forcing , the easier it should be to select the correct model over the incorrect one. Therefore, a good model selection indicator is the one that increases the fastest when is increased.
From Figure 1, we see that the RMSE selects the correct model over the incorrect one given by with a probability of that is slightly inferior to the GCME probability of . As the forcing of the incorrect model increases, both indicators increase their probabilities of selection but the growth of the GCME indicator is faster.
ROC curves and the Gini coefficient.
The probabilities of selection as computed in Figure 1 only evaluate the ability of the indicators to select one model over the other. Yet the values of these indicators also provide distances between the two models. The greater these distances, the greater the confidence in the corresponding model selection.
We define the confidence value in the model selection for the GCME indicator as the difference between the logarithms of the model evidences, introduced in section 2, namely
(20) 
with being the GCME value for the correct model and the GCME value for the incorrect one. Similarly, a confidence value is defined for the RMSE, as
(21) 
with RMSE, defined in Eq. (19), for the correct model and RMSE for the incorrect one.
In order to evaluate the ability of each indicator to select the correct model with various levels of confidence, we compute a Receiver Operating Characteristic (ROC) curve. A ROC curve measures the performance of a binary classifier; see, for instance, Metz (1978) for a review of ROC analysis.
In our study, we define the ROC curve as follows. If the confidence value (either or ) is positive, we consider the correct model to have been selected and the incorrect one if it is negative. Knowing that there is only one correct model, we define the selection of the latter as a true positive instance and the selection of the incorrect model as a false positive instance.
Note that, with this definition of the classifier, there is neither a true negative nor a false negative for which a confidence value is obtained. Still, this configuration allows us to define the True Positive Rate (TPR) and the False Positive Rate (FPR), two parametric functions of the threshold , giving the proportion of positives that are correctly identified as such (i.e., when ) and, respectively, the proportion of negatives wrongly identified as positives (i.e., when ). The ROC curve is then displayed by plotting parametrically the TPR() against the FPR() for varying thresholds .
The random classifier has a chance of selection of each model for every threshold and it will produce a diagonal ROC curve (black dashed line in Figure 2, 4 and 6), namely a straight line between the origin and the point , while a perfect classifier that always selects the correct model corresponds to a ROC curve equal to for and to for .
Figure 2 shows the ROC curves of the model selection indicator RMSE (solid red line with red circles) and the model selection indicator GCME (solid green line with squares), for two incorrect models: (a) the one closest to the correct value, i.e. ; and (b) and the one farthest from it, i.e. . Both indicators outperform random selection by exhibiting a higher TPR/FPR ratio throughout, for every threshold . Similarly to the probability of selection in Figure 1, the RMSE and the GCME are almost equally selective for the more difficult case of the incorrect model in panel (a), while the GCME indicator slightly outperforms the RMSE for in panel (b).
The Gini coefficient (Gini, 1921) provides a simple scalar that quantifies the improvement of a given classifier over the random one. This coefficient equals the area between the classifier’s ROC curve and the diagonal of the graph — which is the ROC curve of the random classifier — divided by the area between the latter and the axis. In other words, . Hence, the Gini coefficient for the random classifier is and the Gini coefficient for a perfect classifier is .
The Gini coefficients corresponding to the ROC curves in Figure 2 are reported in Table 1. In the case of the incorrect model with , the Gini coefficients of the RMSE and the GCME are very close, with only a difference in favor of the RMSE. The GCME outperforms the RMSE in the case of the incorrect model with : its Gini coefficient is higher by .
To summarize, results so far suggest that the GCME and the RMSE present comparable selection skills for similar models, such as in the case and , but the GCME converges faster to a perfect classifier as is increased.
4.2.2 CME with domain localization
In the previous subsection, a 40member ensemble was used to compute the model selection indicators. However, in geosciences applications, such a big ensemble, with comparable to the state vector size , is quite unrealistic. To mimic a more realistic scenario, we use here a 10member ensemble, i.e. , in conjunction with localization. Bocquet and Carrassi (2017) have shown that is smaller than the dimension of the unstable–neutral subspace below which localization is mandatory.
Here we use the localization radius , as it produces the smallest RMSE for the DA process applied to the incorrect models (not shown here). Since the localization is tuned for the incorrect model–DAs, but left untuned for the correct–model DA, the analysis errors of the two DAs should be closer to each other than if the localization were tuned independently for each model. This choice is expected to render the model selection task harder.
We compare then the selectivity of the DLCME to the GCME and the RMSE. Note that unless stated otherwise the same 10member ensemble is used to compute the three model selection indicators: DLCME, GCME and RMSE. As before, these three model selection indicators are computed at each of the cycles of the numerical experiment.
Probability of selection.
Figure 3 shows the probabilities of selection as a function of the incorrect models, with , for three model selection indicators: RMSE, GCME and DLCME. The probabilities for the RMSE and the GCME using the 40member ensemble, and displayed in Figure 1, are also reported in Figure 3 for comparison.
First, we note that the probabilities of selection for both RMSE and GCME are lower for than for . This means that the use of a smaller ensemble and no localization in the DA reduces the probabilities of a correct selection by the RMSE and the GCME and is a direct consequence of the reduced accuracy of the DA in performing the state estimate. However, the convergence rate of the GCME with is faster than for GCME and, when , they select the correct model with almost the same accuracy.
The key result, though, is related to the effect of localization. We see that for the same ensemble size , the DLCME outperforms both the RMSE and the GCME indicators. Notably, the DLCME with selects the correct model with a higher probability than the GCME with , except for the case of smallest error in model definition, i.e. for .
ROC curves and the Gini coefficient.
The previous conclusion is confirmed by the ROC curves in the cases of the incorrect models with and that are plotted, respectively, in Figures 4(a, b). The ROC curves of GCME have also been plotted in Figure 4 for reference and benchmark. The DLCME has a TPR/FPR ratio that is larger than the GCME for both incorrect models. This result represents a substantial improvement in the computation of the CME.
In the difficult case of , the DLCME with is not as accurate in model selection as the GCME but, as previously stated, the use of large ensembles is most often not feasible for models of a realistic size. Interestingly, in the case of , the DLCME still provides a more accurate model selection than the GCME.
These results are confirmed by the corresponding Gini coefficient listed in Table 2. In the case , the DLCME model selection skill outperforms the GCME, even though the DA performance for state estimation in terms of RMSE is better when with no localization than with localization (not shown).
4.2.3 Sensitivity to the localization radius
Even though only the DLCME takes into account the localization, the GCME and the RMSE are also potentially sensitive to the localization radius since they are based on the output of a DA cycle that uses localization. The way the three model selection indicators respond to the localization radius is examined in Figure 5.
The experimental setup is identical to that in the previous subsections and the ensemble size is . The localization radius is shown on the axis; while the Gini coefficient for each model selection indicator is plotted on the axis. Superimposed on Figure 5 are the RMSE curves for the DA with the correct and the incorrect model, in solid and dashed black lines, respectively.
The results in Figure 5 confirm, once more, the superiority of the two CME indicators over the RMSE. They also confirm the improvement achieved by taking into account domain localization in the CME computation, i.e., the better model selection by the DLCME with respect to the GCME; both improvements are present across all localization radii.
It is, moreover, worth noting that the sensitivity of all three indicators to the localization radius is small. For all the indicators, the Gini coefficient increases slowly with . This increase can be explained by the difference in performance of the DA’s state estimation. Indeed the difference between the two RMSE curves also increases with the localization radius. Hence, the selection between the two DAproduced ensembles becomes easier as increases. The key result, however, is that the sensitivities of the two CME indicators are approximately the same. Thus, taking the localization into account in the CME computation should improve the CME’s model selection skill, whatever the value of the localization radius.
4.2.4 Dependence on the evidencing window
In all previous experiments, the model selection indicators are computed over an evidencing window , i.e., using only one DA cycle. This choice was made to assess the indicators’ performance in the most difficult setting. Indeed, selecting a model based on static information alone is a difficult task that is not usually required in geosciences applications. In the latter, one may want to select the most accurate model representation of a meteorological event or a climatological time span, seen as a “video loop” or a “fulllength movie”, respectively (Ghil, 1997). We thus want now to evaluate the indicators in a more realistic setting, by comparing the correct model, with , and the most incorrect one being considered, with , over larger evidencing windows, namely and .
The ROC curves obtained for these two experiments, with and , are displayed in Figures 6(a) and (b), respectively. Comparing these curves with these for K=1 of Figure 4(b), we see that increasing the evidencing window improves the selection skills of all the indicators as they converge to the perfect classifier. This behavior is consistent with the results obtained by Carrassi et al. (2017) in the absence (of the need) of localization.
Moreover, in the case , the DLCME still outperforms the RMSE and the GCME. In the case , both CME indicators remain more accurate than the RMSE, while the DLCME is almost achieving a perfect selection. Altogether, the results in Figures 4(b), 6(a) and 6(b) indicate that when is increased all indicators converge to the perfect classifier. It follows that, for small and medium evidencing windows, the DLCME outperforms the CME with no localization and the RMSE, but larger evidencing windows, when feasible computationally, may allow all indicators to show similar selection skills.
5 Numerical experiments with an atmospheric model
This section presents the first implementation of the CME approach for an intermediatecomplexity model. The numerical experiments herein confirm the benefits of applying CME in general, and DLCME in particular, as a model selection indicator for larger models.
5.1 Experimental setup
5.1.1 The SPEEDY model
We use an atmospheric general circulation model (AGCM) based on a spectral primitiveequation dynamic core that resolves the largescale dynamics of the atmosphere. The model was developed at the International Center for Theoretical Physics (ICTP) and it is referred to as the ICTP AGCM or the Simplified Parameterizations, primitivEEquation DYnamics (SPEEDY) model (Molteni, 2003; Kucharski, 2006). With its dynamical core and its simplified physical parameterization schemes, SPEEDY simulates successfully some of the complex physics described by stateoftheart AGCMs, while maintaining a low computational cost (Neelin et al., 2010). Hence, its intermediatelevel complexity — between loworder models and highend AGCMs — has made SPEEDY an important test bed for model development studies (e.g., Kucharski, 2003; Haarsma, 2003; Bracco, 2004; Kucharski, 2006) and for evaluating DA methodology (Miyoshi, 2005; Miyoshi and Yamane, 2007).
The model’s spectral dynamical core (Molteni, 2003) uses a hydrostatic, coordinate, spectraltransform formulation in the vorticitydivergence form described by Bourke (1974). The configuration we use here is described by Miyoshi (2005) and it has a spectral resolution of T30L7, i.e., a horizontal spectral truncation at spherical wavenumber 30 and 7 vertical layers. It has a vertical coordinate and it computes four physical variables in every vertical layer: zonal wind , meridional wind , temperature , and relative humidity , as well as one surface variable, in the lowest vertical layer, namely surface pressure .
The simulated values for each vertical field are given on a 96 long. 48 lat. horizontal grid. A summary description of SPEEDY, including its simplified physical parametrizations, is available in the appendix of Molteni (2003). We provide here a detailed description of its convection parametrization, since the purpose of the present section is to perform parameter selection between two values of a convective parameter.
5.1.2 The model’s convection parametrization
SPEEDY’s convection scheme is a simplified version of the massflux scheme developed by Tiedke (1993). The scheme simulates the balance of the updraft of the saturated air and the largescale downward motion in between the planetary boundary layer (PBL) and the topofconvection (TCN) layer. While the updraft of saturated air is triggered above the PBL, the detrainment only takes place in the TCN layer. In addition, the convection scheme simulates an exchange of moisture between the PBL and the layers above it.
SPEEDY’s simplified convection parametrization may roughly be summarized by three main steps: an activation of the convection scheme, its closure and an exchange mechanism between the intermediate layers. In particular, the closure of the scheme requires the convective moisture tendency in the PBL to be equivalent to a relaxation of humidity towards a prescribed threshold with a relaxation time equal to ; in our study, the parameter of interest is precisely . This parameter has a straightforward physical interpretation since it controls the speed of the vertical mixing associated with moist convection, and its interest lies in its impact on the global–scale circulation.
The correct parameter value, namely the value set in SPEEDY to create a ‘true’ trajectory — in the sense defined in sections 2 and 4 — is hr. Meanwhile, the incorrect parametrization here will use the relaxation time hr. The objective is to be able to select the correct parameter over the incorrect one, based on the observation of what is deemed to be the true trajectory in the present, “dizygotoustwin” setting; see, for instance, Ghil and MalanotteRizzoli (1991) for an explanation of the distinctions between identicaltwin, dizygotoustwin and realobservations setup in DA studies. The DA formulation needed to do the parameter selection in this setup is presented in the next subsection.
5.1.3 The DA implementation
As in the L95 experiments of section 2, a true trajectory is generated using the correct parameter value of hr. From this trajectory a set of observations are derived and made available every 6 hr. This DA configuration was adapted for SPEEDY by Miyoshi (2005).
The observed physical quantities are the 5 prognostic variables , , , , throughout the model’s 7 layers, and the surface pressure . All variables are observed every four gridpoint, yielding observations for , , , and observations for the surface pressure. The observation error variances, used to create the observations in the DA process, are:
The experiment lasted 6 months, from 1 January to 30 June 1983. The results for the spinup time of DA cycles (i.e., the 1st month) were discarded, and the diagnostics were calculated for the DA cycles of 1 February through 30 June.
The DA is performed using the LETKF (cf. Sect. 3.1.2) with N=50 ensemble members and with the domain localization introduced by Hunt et al. (2007), Miyoshi and Yamane (2007), and Greybush et al. (2011): localization domains are selected and the observation error covariance submatrix is multiplied, for each , by the inverse of a smoothing function, as explained in section 3.1. In this study, we use the implementation proposed by Miyoshi et al. (2007) and apply as a smoothing function, in the three–dimensional physical space, the Gaussian function
(22) 
here is the distance between the localization domain’s center and the observations, while is the localization radius, and both quantities have a horizontal and a vertical component. The horizontal component of is taken equal to 700 km, while its vertical component is defined in terms of logpressure coordinates and corresponds to approximately 4 km.
Note that, in the following, the DLCME results are computed for specific variables, i.e., we assume that only these variables have been observed. This computation is identical to the computation of the CME for an observation subvector, as discussed in section 3.2.1.
For an evidencing window that has length , the required computation of the pdf of the system’s state, conditioned on this particular observation subvector leads to a DA run that does not converge. Hence, we approximate once again this conditional pdf by the original posterior pdf which consists in directly applying the EnKFformulation to the observation vector reduced to a specific variable using Eq. (14).
5.2 Model selection problem
We now focus, as explained in section 5.1.2, on determining which value of the relaxation parameter (6 or 9 hrs) is more appropriate to describe the atmospheric evolution simulated during the 5 months of the experiment. Recall that the value hrs is the one used to simulate the reference truth from where the observations are sampled. The model selection is here a parameter identification problem and, again, is carried out with the RMSE, the GCME, and the DLCME as indicators.
The indicators are first computed on the evidencing window , i.e., on a single DA cycle then averaged over the 5–month interval. The results in terms of selection probability obtained on the various variables are shown in Figure 7(a).
As mentioned in section 5.1.2, the value of the relaxation time plays an important role in the globalscale circulation: it intensifies, for instance, the mesoscale circulations associated with deep convection which then affects the extent and intensity of the Hadley and Walker cells (see for instance Donner et al., 2001). This physical connection is confirmed by the results of the three model selection indicators when using observations of the variables or alone. Indeed, for either of these two variables, the indicators select the correct parameter with high probability. Though, it is to be noted that the RMSE outperforms the GCME selecting skills on these two variables. However, taking into account localization, i.e., using the DLCME, improves the selection skills of the CME and provides equally high selection probabilities as the RMSE does. Moreover, when only Tdata are used, the DLCME identifies the correct parameter with the highest probability. The humidity variable, , is the variable least impacted by the variation of . Yet, the DLCME is the best to emphasize the improvement of the correct parameter on the humidity representation. Finally, when using all observations, both CMEs fully discriminate between the two models whereas the RMSE only partially does. A more realistic setting is to use a longer evidencing window, taken now as , which corresponds to 3 days. The results of this experiment are shown in Figure 7(b), and they confirm that increasing improves the selection skills of all indicators. The humidity is still the most difficult variable to discriminate the two models from. Yet, the DLCME remains the best selector, strongly selective for all variables, both separately and together. Moreover, when using all observations, the RMSE fails to systematically recognize the correct model from the incorrect one.
From a practical standpoint, the latter result indicates that if, for instance, one’s goal is to determine which of two models is most likely responsible for a threeday weather event that was fully observed, the DLCME is more likely than the RMSE to provide the correct answer.
5.3 Mapping model evidence
Local CMEs, defined by Eq. (14), also provide a spatial representation of model evidence. This is exploited in Figure 8 that shows the spatial differences between the correct and the incorrect parametrization using all observations.
Note that the distinct information conveyed by these two maps — the local RMSE differences in panel (a) and the local CME differences in panel (b) — is not necessarily contradictory but rather complementary since the RMSE is the Euclidean distance between the DA forecast and the observations while the local CME can be seen as a distance between the same two quantities but smoothed by the uncertainties that surround them.
Indeed, the local RMSE reveals a significant difference between the two models all over the globe while the local CME shows that this difference mainly impacts the low latitude regions. Also, some of the regions that are strongly impacted by the incorrect parameter according to the RMSE, are not so according to the local CME. For instance, the differences emphasized by the RMSE in the region off the coast of China are sensibly smaller according to the local CMEs. This would suggest that, although the DA forecast produced by the incorrect model does not fit the observations as well as the one produced by the correct model, the uncertainties generated in this region mitigate the quality of the correct model’s representation of this region.
6 Concluding remarks
6.1 Summary and conclusions
Carrassi et al. (2017) have introduced the contextual model evidence (CME) as a powerful indicator for model selection and for attribution of climate related events. They have also have provided analytic ways to compute this indicator by using a hierarchy of ensemblebased data assimilation (DA) algorithms and have shown its efficiency.
When applied to largedimensional models, ensemblebased DA requires localization techniques in order to mitigate the spurious effect of undersampling. The present study focused on adapting the CME’s EnKFformulation to the use of a specific localization technique, namely domain localization.
In section 3.2.1, we have shown how to compute local CMEs as the CMEs of an observation subvector. This idea led to an EnKFformulation of the local CME, given by Eq. (14), subject to very few simplifying assumptions. Combining local CMEs for each grid point, we introduced in Eq. (15) a heuristic model selection indicator called the domainlocalized CME (DLCME). This DLCME formulation allows one to estimate model evidence in a consistent and routine way by using localized DA with largedimensional problems.
We implemented the DLCME formulation — along with the global CME’s EnKFformulation called global CME (GCME) and the root mean square error (RMSE), defined by Eq. (19) — on two atmospheric models: the Lorenz 40variable midlatitude atmospheric dynamics model (L95: Lorenz and Emanuel, 1998) and the simplified global atmospheric SPEEDY model (Molteni, 2003), in sections 4 and 5, respectively. The main objective of the numerical experiments in these two sections was to assess the performance of the three approaches so defined — the RMSE, the GCME and the DLCME — in increasingly harder model selection problems.
In section 4, the numerical experiments based on the L95 model were used to compare the three model selection indicators for varying model forcings , as well as with respect to the localization radius and to the length of the evidencing window. In general, both CME indicators, GCME and DLCME, are more accurate than the RMSE; see Figures 1–6. Moreover, the DLCME outperforms systematically the GCME regardless of the length of the localization radius, cf. Figure 5.
In the 40ensemble member localizationfree experiment, the GCME increasingly outperforms the RMSE when the difference between the model forcings is increased; see Figures 12. When the number of ensemble members is reduced to 10, hence making localization necessary, the GCME is still more accurate than the RMSE. With the same ensemble size, the DLCME is even more accurate than the GCME which indicates the impact of consistently using localization in both the DA state estimation procedure and in the CME computation. Remarkably, DLCME, with a 10member ensemble and localization, produces better selection probabilities than GCME with a 40member ensemble and no localization, at least when the discrepancy between the correct and incorrect forcing is large enough, cf. Figures 34.
The sensitivity of the model selection skills to the localization radius is approximately the same for all indicators, meaning that the improvement of the DLCME over the two other indicators is almost independent of the localization radius, cf. Figure 5 . This independence is highly desirable, given the computational cost of tuning the radius. In particular, the DL CME is clearly the most accurate indicator for small evidencing windows. For larger windows, the performances of the DLCME and the GCME become almost identical and do remain higher than the RMSE’s performance, see Figure 6.
In section 5, model selection was carried out for a more realistic problem. We formulated two parametrizations of the convection in the SPEEDY model by varying the convective relaxation time . The selecting skills of the RMSE, the GCME and the DLCME were assessed on all physical model variables, both separately and together.
Over a single DAcycle, the DLCME and the GCME are fully selective on all variables combined. The DLCME, though, is the best selection indicator on all four model variables — , , and — separately, whether for a single DAcycle or a large evidencing window, , as shown in Figure 7.
Local CMEs provide, in addition, a spatial view of model evidence, cf. Figure 8. This spatial view can provide precious information on the impacts of the correct and incorrect parametrizations on the physical evolution of the largescale atmospheric flow.
In summary, while the DLCME displays the best performance in terms of model selection, the spatial plots of local CME can help identifying the spatial regions and the physical variables where the incorrect model misfits the observations. Beyond model selection problems, this feature can be used to bring precious information in order to help understanding the causal chain leading to meteorological or climatological events as well as designing target observations/areas for that purpose.
6.2 Future directions
A first line of future investigations should focus on strengthening the theoretical foundation of the CME. Indeed, several theoretical issues are yet to be addressed. For instance, simulating model error and taking it into account in both the DA process and in the CME computation could improve significantly the CME’s model selection skill. The DA community is still conducting active research on the best ways to incorporate model error in DA methodology (e.g., Dee, 1995; Raanes et al., 2015; Carrassi and Vannitsem, 2016; Harlim, 2017). This research has the potential to improve significantly the accuracy of the DA product improving therewith the CME estimation. Moreover, taking into account model error in the CME computation itself could also have an impact in the CME’s model selection abilities.
Another theoretical issue requiring investigation is the derivation of an exact global indicator based on local CMEs that could outperform the global heuristic indicator proposed in this article, the DLCME. These theoretical challenges are part of the authors’ current research program.
A second line of investigations should focus on broadening the applications of the CME approach by increasing the realism and the complexity of the models considered, on the one hand, and by widening the use of the CME from model selection to other purposes like event attribution for example, on the other hand. In particular, it seems worth investigating the potential of the spatial representation of local CMEs for diagnosing causality in a geoscientific system, as proposed by Hannart et al. (2016b).
Acknowledgments
We acknowledge discussions with the other members of the DADA team (Philippe Naveau, Aurélien Ribes and Manuel Pulido). We also wish to acknowledge the personal communications with Pr. P.J. Van Leeuwen which led to significant clarifications of the manuscript.
This research was supported by grant DADA from the Agence Nationale de la Recherche (ANR, France: All authors), and by Grant N000141612073 of the Multidisciplinary University Research Initiative (MURI) of the US Office of Naval Research (MG). A. Carrassi has been funded by the Nordic Centre of Excellence EmblA of the Nordic Countries Research Council, NordForsk, and by the project REDDA of the Norwegian Research Council (contract number 250711). CEREA is a member of the Institute Pierre Simon Laplace (IPSL).
Appendix A
Alternative implementation of the global CME (GCME)
The GCME implementation was first presented in Carrassi et al. (2017) for the CME formulation based on the En4DVar DA method. This implementation makes the assumption that the observation error covariance matrix R is diagonal. This assumption is strong but often verified in geosciences applications.
Implementing Eq. (8) requires to derive: (i) the determinant term ; (ii) the weighted sum of squares . We deal with each two terms separately.
First, the computation of the determinant term can be greatly simplified by application of Sylvester’s determinant rule (i.e., for any two rectangular matrices A and B of conformable sizes). We have
(23) 
where . with , the normalized forecast anomalies such that . Note that R is diagonal in general thus its determinant is easy to compute.
Under Eq. (2), one sees that boils down (after the one off computation of ) to the computation of which is a much smaller determinant of dimension (size of the ensemble) instead of dimension (size of the observation vector).
Second, the computation of the weighted sum of squares can also be greatly simplified by application of the ShermanMorrisonWoodbury formula. We have
(24) 
It follows
(25) 
with .
Appendix B
Derivation of the CME formulation using covariance localization
Covariance localization (also called Schur localization) consists in restraining the spatial impact of the empirical forecast error covariances. Indeed, the ensemble size is smaller than the model’s state dimension and thus the ensemblebased covariance is subsampled by construction. This subsampling may lead to spurious correlations. This issue can be partly overcome by removing the unrealistic correlations. Covariance localization performs this removal by applying to the forecast error covariance matrix a Schur product (i.e., a pointwise multiplication) with a smoothing correlation matrix
(26) 
where the correlation matrix coefficients are defined by , for a smoothing function.
The smoothing function we use is a shortranged predefined correlation function , based on the GaspariCohn function (Gaspari and Cohn, 1999), for the localization radius ,
(27) 
Based on the covariance localization, a new CME formulation can be derived. We call this formulation: the Global Localized CME (GLCME). The GLCME respects the original formulation Eq. (8), i.e., is computed for the entire observation vector, while taking into account the regularized forecast error covariance matrix in the term
(28) 
In this case, the CME computation remains quite straightforward. However, concerning the computational cost, the regularization does not change the size of . The computational cost is as heavy as the original EnKFformulation which adds to the cost of the application of a Schur product, making this formulation not directly tractable in high dimensions. To do so, additional sophisticated techniques are required.
References
 Akaike (1974) Akaike H (1974). A new look at the statistical model identification. Autom. Contr., IEEE Transac. on 19: 716723
 Asch et al. (2016) Asch, M., M. Bocquet and M. Nodet (2016). Data assimilation: methods, algorithms, and applications. Fundamentals of Algorithms. SIAM, Philadelphia.
 Balgovind et al. (1983) Balgovind, R., A. Dalcher, M. Ghil, and E. Kalnay (1983) A stochasticdynamic model for the spatial structure of forecast error statistics. Mon. Wea. Rev., 111, 701–722.

Baum et al. (1970)
Baum LE., T. Petrie, G. Soules and N. Weiss (1970).
A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains.
The Annals of Mathematical Statistics, 41:164171.  Bennett (1992) Bennett AF. (1992). Inverse methods in physical oceanography. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
 Bishop et al. (2001) Bishop CH., BJ. Etherton and SJ. Majumdar (2001). Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129: 420436.
 Bocquet (2016) Bocquet M. (2016). Localization and the iterative ensemble Kalman smoother. Q. J. R. Meteorol. Soc., 142: 10751089.
 Bocquet and Carrassi (2017) Bocquet M. and A. Carrassi (2017). Fourdimensional ensemble variational data assimilation and the unstable subspace. Tellus A, 69: 1304504.
 Bourke (1974) Bourke, W. (1974). A multilevel spectral model. I. Formulation and hemispheric integrations. Mon. Wea. Rev., 102: 687701.
 Bracco (2004) Bracco A., F. Kucharski, R. Kallumal and F. Molteni (2004). Internal variability, external forcing and climate trends in multidecadal AGCM ensembles. Clim. Dyn., 23: 659678
 Buehner et al. (2010) Buehner M., PL. Houtekamer, C. Charette and HB. Mitchell (2010). Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and singleobservation experiments. Mon. Wea. Rev., 138: 15501566.
 Burnham and Anderson (2002) Burnham KP. and DR. Anderson (2002). Model selection and inference: A practical informationtheoretic approach. Springer: New York, 2nd edition.
 Carrassi and Vannitsem (2016) Carrassi A. and S. Vannitsem (2016). Deterministic treatment of model error in geophysical data assimilation. In Mathematical Paradigms of Climate Science, Springer: 175213.
 Carrassi et al. (2017) Carrassi A., M. Bocquet, A. Hannart and M. Ghil (2017). Estimating model evidence using data assimilation. Q. J. R. Meteorol. Soc., 143: 866–880.
 Carson et al. (2017) Carson J, M. Crucifix, S. Preston and RD. Wilkinson (2017). Bayesian model selection for the glacialinterglacial cycle. Journ. of the Roy. Stat. Soc.: Series C (Applied Statistics).
 Cressman (1959) Cressman, G.P. (1959) An operational objective analysis system. Mon. Wea. Rev., 87: 367374.
 Daley (1991) Daley R. (1991). Atmospheric data analysis. Cambridge University Press, Cambridge, United Kingdom.
 Dee (1995) Dee DP. (1995). Online estimation of error covariance parameters for atmospheric data assimilation. Mon. Wea. Rev., 123: 11281145.
 Donner et al. (2001) Donner, L. J., Seman, C. J., Hemler, R. S. and Fan, S. (2001). A cumulus parameterization including mass fluxes, convective vertical velocities, and mesoscale effects: Thermodynamic and hydrological aspects in a general circulation model. Journ. of clim., 14: 34443463.

Elsheikh et al. (2014a)
Elsheikh A, I. Hoteit and M. Wheeler (2014a).
Efficient bayesian inference of subsurface flow models using nested sampling and sparse polynomial chaos surrogates.
Comput. Methods Appl. Mech. Engrg., 269: 515537.  Elsheikh et al. (2014b) Elsheikh A, M. Wheeler and I. Hoteit (2014b). Hybrid nested sampling algorithm for bayesian model selection applied to inverse subsurface flow problems. J. Comput. Phys., 258: 319337.
 Evensen (2009) Evensen G. (2009). Data Assimilation: The Ensemble Kalman Filter. Springer Verlag/Berlin/Heildelberg, second edn.
 Gaspari and Cohn (1999) Gaspari G. and SE. Cohn (1999). Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125: 723757.
 Ghil et al. (1979) Ghil, M., M. Halem and R. Atlas (1979) Timecontinuous assimilation of remotesounding data and its effect on weather forecasting. Mon. Wea. Rev., 107: 140–171.
 Ghil (1997) Ghil M. (1997). Advances in sequential estimation for atmospheric and oceanic flows. J. Meteor. Soc. Japan, 75: 289
 Ghil and MalanotteRizzoli (1991) Ghil M. and P. MalanotteRizzoli (1991). Data assimilation in meteorology and oceanography. Adv. Geophys., 33: 141266.
 Gini (1921) Gini C. (1921). Measurement of inequality of incomes. Econ. J., 31: 124126.
 Greybush et al. (2011) Greybush SJ., E. Kalnay, T. Miyoshi, K. Ide, and BR. Hunt (2011). Balance and ensemble Kalman filter localization techniques. Mon. Wea. Rev., 139: 511522.
 Haarsma (2003) Haarsma RJ., EJD. Campos and F. Molteni (2003). Atmospheric response to South Atlantic SST dipole. Geophys. Res. Lett., 30
 Hamill et al. (2001) Hamill, TM., JS. Whitaker and C. Snyder (2001). Distancedependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129: 27762790.
 Harlim (2017) Harlim J. (2017). Model error in data assimilation. In Nonlinear and Stochastic Climate Dynamics, Franzke CLE. and TJ. O’Kane (eds.) Cambridge University Press: Cambridge, UK.
 Hannart et al. (2016a) Hannart A., J. Pearl, F.E.L. Otto, P. Naveau and M. Ghil (2016a). Causal counterfactual theory for the attribution of weather and climaterelated events. Bull. Am. Meteorol. Soc., 97: 99110.
 Hannart et al. (2016b) Hannart A., A. Carrassi, M. Bocquet, M. Ghil, P. Naveau, M. Pulido, J. Ruiz and P. Tandeo (2016b). DADA: data assimilation for the detection and attribution of weather and climaterelated events. Climatic Change, 136: 155174.
 Houtekamer and Mitchell (2001) Houtekamer PL. and HL. Mitchell (2001). A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Wea. Rev., 129: 123137.
 Hunt et al. (2007) Hunt B., EJ. Kostelicj and I. Szunyogh (2007). Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230: 112126.
 Ide et al. (1997) Ide, K., P. Courtier, M. Ghil, and A. Lorenc (1997). Unified notation for data assimilation: Operational, sequential and variational. J. Meteor. Soc. Japan, 75: 181189.
 Kalnay (2003) Kalnay, E. (2003). Atmospheric modeling, data assimilation and predictability. Cambridge university press.
 Kucharski (2003) Kucharski F. and F. Molteni (2003). On nonlinearities in a forced North Atlantic Oscillation. Clim. Dyn., 21: 677687
 Kucharski (2006) Kucharski F., F. Molteni and A. Bracco (2006). Decadal interactions between the western tropical Pacific and the North Atlantic Oscillation. Clim. Dyn., 26: 7991
 Lermusiaux (2006) Lermusiaux PFJ. (2006). Uncertainty estimation and prediction for interdisciplinary ocean dynamics. J. Comp. Phys., 217: 176199.
 Lorenz (1996) Lorenz, E. N. (1996). Predictability: a problem partly solved. Seminar on Predictability, 4–8 September 1995, European Centre for Mediumrange Weather Forecasting, Shinfield Park, UK, pp. 1–18.
 Lorenz and Emanuel (1998) Lorenz, EN. and KA. Emanuel (1998). Optimal sites for supplementary weather observations: simulation with a small model. J. Atmos. Sci., 55: 399414.
 Metz (1978) Metz CE. (1978). Basic principles of ROC analysis. Semin. Nucl. Med., 8: 283298.
 Molteni (2003) Molteni F. (2003). Atmospheric simulations using a GCM with simplified physical parameterizations. I: model climatology and variability in multidecadal experiments. Clim. Dyn., 20: 175191.
 Miyoshi (2005) Miyoshi T. (2005). Ensemble Kalman filter experiments with a primitiveequation global model. Ph.D. dissertation, University of Maryland, 197 pp.
 Miyoshi and Yamane (2007) Miyoshi T. and S. Yamane (2007). Local ensemble transform Kalman filtering with an AGCM at a T159/L48 resolution. Mon. Wea. Rev., 135: 38413861.
 Miyoshi et al. (2007) Miyoshi T., S. Yamane and T. Enomoto (2007). Localizing the error covariance by physical distances within a Local Ensemble Transform Kalman Filter (LETKF). SOLA, 3: 8992
 Neelin et al. (2010) Neelin, J. David and Bracco, Annalisa and Luo, Hao and McWilliams, James C. and Meyerson, Joyce E. Considerations for parameter optimization and sensitivity in climate models. Proc. Natl. Acad. Sci. USA, 107:2134921354.
 Otsuka and Miyoshi (2015) Otsuka S. and T. Miyoshi (2015). A Bayesian optimization approach to multimodel ensemble Kalman filter with a loworder model. Mon. Wea. Rev., 143: 20012012.
 Ott et al. (2002) Ott, E., BR. Hunt, I. Szunyogh, M. Corazza, E. Kalnay, DJ. Patil, A. Yorke, AV. Zimin and EJ. Kostelich (2002). Exploiting local low dimensionality of the atmospheric dynamics for efficient ensemble Kalman filtering.
 Ott et al. (2004) Ott, E., BR. Hunt, I. Szunyogh, AV. Zimin, EJ. Kostelich, M. Corazza, E. Kalnay, DJ. Patil and A. Yorke (2004). A local ensemble Kalman filter for atmospheric data assimilation. Tellus A, 56: 415428.
 Raanes et al. (2015) Raanes PN, A. Carrassi and L. Bertino (2015). Extending the square root method to account for additive forecast noise in ensemble methods. Mon. Wea. Rev., 143: 38573873.
 Reich and Cotter (2015) Reich S and C. Cotter (2015). Probabilistic Forecasting and Bayesian Data Assimilation. Cambridge University Press, Cambridge, United Kingdom.