The ensemble Kalman filter Evensen (1994); Burgers et al. (1998); Evensen (2009) is one of the most widely used data assimilation algorithms Asch et al. (2016); Law et al. (2015); Reich and Cotter (2015) that uses a Monte Carlo approach to provide a non-linear approximation to the Kalman filter Kalman (1960). In the typical case of an undersampled ensemble the algorithm requires correction procedures such as inflation Anderson (2001), localization Hunt et al. (2007); Petrie (2008); Anderson (2012); Nino-Ruiz et al. (2015); Nino-Ruiz and Sandu (2017); Zhang et al. (2010), and ensemble subspace enrichment Nino-Ruiz and Sandu (2015, 2018); Ruiz et al. (2014).
proposed augmenting the ensemble with synthetic members drawn from a normal distribution defined by a full rank shrinkage covariance matrix approximation. In this work we consider augmenting the ensemble with members drawn from a normal distribution with a possibly low rank covariance matrix derived froma priori
information such a climatological information or method of snapshots. We show that this is equivalent to a stochastic implementation of the shrinkage covariance matrix estimate proposed inNino-Ruiz and Sandu (2015, 2018), and therefore augmenting the physical ensemble with synthetic members enriches the rank of the covariance matrix, and nudges the resulting covariance estimate to the true covariance.
Our aim is to understand the behavior of an evolving natural phenomenon. The dynamics of the natural phenomenon is approximated with an imperfect model, described by the dynamical system
is a random variable (RV) whose distribution represents our uncertainty in the state of the system at time, is the (imperfect) dynamical model, is a RV whose distribution represents our uncertainty in the additive modeling error, and is the RV whose distribution represents our uncertainty in the (forecasted) state at time .
One collects noisy observations of the truth:
where represents the true state of nature represented in model space, is the (potentially non-linear) observation operator, is a RV whose distribution represents our uncertainty in the observations, and are the observation values, assumed to be realizations of an observation RV .
The goal of data assimilation is to find the a posteriori
estimate of the state given the observations, which is typically achieved through Bayes’ theorem. At timewe have:
In typical Kalman filtering the assumption of Gaussianity is made, whereby the states at all times, as well as the additive model and observation errors, are assumed to be Gaussian and independently distributed. Specifically one assumes and .
In what follows we use the following notation. The a priori estimates at all times are represented with the superscript , for forecast (as from the imperfect model), and the a posteriori estimates are represented with the superscript , for analysis (through a DA algorithm).
2.1 Ensemble Transform Kalman Filter
Forecasting with an ensemble of coarse models has proven to be a more robust methodology than forecasting with a single fine model Kalnay (2003)
. Ensemble Kalman filtering aims to utilize the ensemble of forecasted states to construct empirical moments and use them to implement the Kalman filter formula. The Ensemble Transform Kalman Filter (ETKF) computes an optimal transform of the prior ensemble member states to the posterior member states; for Gaussian distributions the optimal transform is described by a symmetric transform matrix.
We now describe the standard ETKF. Let represent the –members analysis ensemble at time . The forecast step is:
where is a random draw from .
The ETKF analysis step reads:
Here the unique symmetric square root of the matrix is used, as there is evidence of that option being the most numerically stable Sakov and Oke (2008). Also, it is common practice to approximate by
although in reality is implicitly defined from the analysis ensemble as follows:
Note that equation (2.8) is automatically satisfied by (2.7) when is a linear operator. Additionally, this implicitly defines a fixed point iteration which should converge when the linear operator is sufficiently smooth. In this paper we make the assumption that equation (2.7) is exact.
The empirical forecast covariance estimate
is not perfect. In order to improve the performance of EnKF, inflation is applied to the ensemble anomalies,
before any other computation is performed (meaning that it is also applied to the observation anomalies, as well). The inflation parameter is typically chosen in a heuristic manner.
2.2 Covariance Shrinkage
Assume that there exists some target covariance matrix that represents a priori knowledge about the error spatial correlations. We then assume that there is some scalar, , such that represents the scaling factor that represents the confidence of our knowledge of the state. A covariance shrinkage approximation is a linear combination of the empirical covariance (2.9) and the target covariance ,
which is optimal for some value of . Note that
meaning that it is possible to apply common choices for
from the literature, computed for target matrices equal to multiples of identity, as long as one can perform inexpensively matrix-vector products with. One popular choice for is the Rao-Blackwell Ledoit-Wolf (RBLW) estimator Chen et al. (2009) (equation (9) in Nino-Ruiz and Sandu (2017)),
If is a climatological covariance matrix of the state errors of a dynamical system, e.g., is the static covariance matrix of an ergodic dynamical system, and is the empirical covariance matrix from equation (2.9), then, in the same spirit as the RBLW estimator, an approximation of is,
Which scales the static covariance to be more representative of the scale of our current uncertainty.
If ’s eigendecomposition is known, , in any inner product space, then . Notice also that if represents the scaled anomalies of , then its trace can be computed as,
with being the
th singular value of, and similarly for the trace of the covariance squared,
3 ETKF implementation with stochastic shrinkage covariance estimates
Assume that our dynamical system is locally (in time) stationary. This means that, roughly climatologies about the local time roughly describe a measure of averaged-in-space uncertainty. Ensemble methods propagate our uncertainty about the dynamics of the system. Any valid application of Bayes’ rule requires that all available information is used Jaynes (2003). Therefore, it is our prerogative to attempt and use the locally known climatological information in conjunction with our ensemble information. If we assume our dynamical system is ergodic, then this means that the mean covariance in space is equal to the mean covariance in time, meaning that climatological information becomes global in time. The method of snapshots Sirovich (1987c) requires ergodicity, thus we make this assumption for the rest of this paper, however, this assumption need not be made, and the methods remain just as valid for local-in-time climatological information.
Nino-Ruiz and Sandu Nino-Ruiz and Sandu (2015, 2018) proposed to replace the empirical covariance in EnKF with a shrinkage covariance estimator (2.11). They showed that this considerably improves the analysis at a modest additional computational cost. Additional, synthetic ensemble members drawn from a normal distribution with covariance are used to decrease the sampling errors.
In this work we develop an implementation of ETKF with a stochastic shrinkage covariance estimator (2.11). Rather than computing the covariance estimate (2.11), we build a synthetic ensemble by sampling directly from the normal distribution with covariance a multiple of . The anomalies of this synthetic ensemble are independent of the anomalies of the forecast EnKF ensemble.
To be specific, let be the synthetic ensemble of members drawn from the normal distribution , with being the dimension of the state space of our dynamical system and climatological information. We denote the variables related to the synthetic ensemble by calligraphic letters. The corresponding anomalies in the state and observation spaces:
where is the dimension of our observation space.
The shrinkage estimator (2.11) of the forecast error covariance for is represented in terms of synthetic and forercast anomalies as:
The Kalman filter formulation to the optimal analysis covariance Kalman (1960), yields the following analysis covariance matrix:
where can be defined in two different ways, as discussed later.
which we refer to as the “target” analysis covariance formula.
We discuss below two ways of finding the analysis ensembles. In the first method, we enrich our ensemble in such a way as to closely approximate the shrinkage covariance (2.11). In the second approach we evolve the physical ensemble and the surrogate ensemble as separate entities that are only related by the common information in their transformations.
3.1 Type I approach
In the first approach we enrich the ensembles of forecast anomalies with synthetic anomalies (3.1):
Next, we define a transform matrix (2.5a) that now applies to the enriched ensemble (3.4) to form an analysis ensemble that represents the target analysis covariance (goal). Specifically, we search for a transform matrix such that:
where, from (2.6b),
In the Type I method we compute the analysis mean using the shrinkage covariance estimate. From (2.5b):
in which, it is important to point out, that the full analysis covariance estimate (equation (3.6)) is used. In addition, we achieve the (goal) by keeping the first members of the transformed extended ensemble, or equivalently, the first columns of the symmetric square root (3.8). From (2.5a), we have:
The calculation of the symmetric square root (3.8) requires an SVD of the right hand side matrix. With the same computational effort one can compute the low rank transformation:
The mean calculation (3.9) is the same. The ensemble transform produces transformed ensemble members that contain “mixed” information from both the physical and the synthetic ensembles:
3.2 Type II approach
Ideally there should be no correlation between the forecast ensemble that we propagate with the physics-based model and the synthetic forecast ensemble that we generate. If we impose a block-diagonal structure of the transformation matrix
then, instead of equation (3.5), we search for separate transforms and for each ensemble such that:
Note that there is no unique solution to this equation.
Under the assumption that the synthetic ensemble is independent of the physical ensemble, but is transformed through a standard ETKF formulation, it is the case that,
where the formulation (2.6b) of uses all available information:
This (almost uniquely) defines the transformation for the physical ensemble as,
with representing a pseudo-inverse (in this case we use the Moore-Penrose pseudo-inverse, keeping exactly singular values of ). In this case (3.17) comes from directly looking at all the terms of (target) expanded with (3.16), and by removing the terms found in (3.15).
This completes the Type II approach:
with the physical ensemble used to continue the computation defined as,
4 Numerical experiments
4.1 The Lorenz’96 model (L96)
We first look at the 40-Variable Lorenz ’96 problem Lorenz (1996),
Assuming (4.1) is ergodic (thus having a constant spatio-temporal measure of uncertainty on the manifold of the attractor), we compute the target covariance matrix as the empirical covariance from independent ensemble members run over days in the system (where time units corresponds to 6 hours), with an interval of 6 hours between snapshots, This system is known to have 13 positive Lyapunov exponents, with a Kaplan-Yorke dimension of about 27.1 Popov and Sandu (2019).
The time between consecutive assimilation steps is units, corresponding to six hours in the system. All variables are observed directly with an observation error covariance matrix of . The time integration of the model is performed with RK4 the fourth order Runge-Kutta scheme RK4 Hairer et al. (1991) with a step size . The problem is run over 1100 assimilation steps. The first 100 are discarded to account for model spinup. The synthetic ensemble size is fixed to .
4.2 L96 assimilation results
First we assess the quality of the analysis ensembles using a rank histogram Hamill (2001). For a quantitative metric we compute the KL divergence from to ,
is the uniform distribution andis our ensemble rank histogram.
Additionally, for testing the accuracy of all our methods we compute the spatio-temporal RMSE,
with representing the amount of snapshots at which the analysis is computed, and is the th component of the state variable at time .
Figure 1 shows the KL divergence results for variable . As can be seen, the addition of the synthetic ensemble introduces error into the calculations, which seems to converge to the the EnKF error in the limit of . However, it dramatically increases the uniformity of the ensemble, as for even small values of the KL divergence of the ensemble rank histogram to the uniform distribution is virtually zero.
We compare our two methods against the canonical ETKF for various values of inflation factors and physical ensemble sizes. Ten separate experiments are performed for each run, and the one with the lowest RMSE is plotted. As can be seen from Figure 2, ETKF is only stable when we fully cover the exponentially growing error action described by the Lyapunov modes with positive (and zero) exponents, meaning that ensemble memebers is enough to cover the 14 non-decreasing modes. On the contrary, in our stochastic surrogate ETKF formulation, both Type I and Type II perform similarly well, especially in the low undersampled () ensemble size cases. In the high ensemble size cases, the stochasticity of the methods prevents them from being as good as pure ETKF (as shown in figure 1). The Type I method shows slightly higher stability, with the same accuracy, as it is likely that the ensemble anomally inversions introduce numerical error, though the difference is verging on negligible.
Our results show that our methodology heavily reduces the need for localization, as as little as 5 ensemble members are needed. The Type I scheme, when we have an Ergodic system, and an extremely well-computed estimate for the ergodic covariance matrix, is clearly better than the Type II method.
Comparing Type I and Type II with each other, we can see that—at least in this case—the Type I method performs slightly better, though the difference is negligible. This might be due to the case that the observations are sufficiently close to each other, and related in a non-linear manner.
4.3 The Quasi-Geostrophic model (QG)
where stands for the vorticity, stands for the stream function, is the Reynolds number, is the Rossby number, is the Jacobian term, and is a constant (in time) forcing term.
The relationship between stream and vorticity, is explicitly enforced in the evaluation of the ODE. The forcing term is a symmetric double gyre,
Homogeneous Dirichlet and Neumann boundary conditions is enforced on the spatial domain . The spatial discretization is a second order central finite difference for the first derivatives, and the Laplacian, with the Arakawa approximation Arakawa (1966) (a pseudo finite element scheme Jespersen (1974)) used for computing the Jacobian term. All spatial discretizations exclude the trivial boundary points from explicit computation.
The matrix is approximated through the method of snapshots Sirovich (1987a, b, c) in space over (meaning that the inner products are calculated with spatial information). We utilize 700 snapshots about 283 hours apart each, and a rank 100 approximation to the full matrix is built. The true model is run outside of time of the snapshots so as to not pollute the results. Nature utilizes a spatial discretization, and the model a spatial discretization. Observations are first relaxed into the model space via multigridding Zubair (2009), then 150 distinct points from the non-linear observation operator,
representing zonal wind magnitude, are taken. The observation error is unbiased, with covariance . The number of synthetic ensemble members is fixed at a constant in order to be less than the order of the ergodic covariance matrix approximation. Observations are taken time units (representing one day in model space) apart. We run a total of 350 assimilation steps, taking the first 50 as spinup. Results are averaged over 3 model runs, with diverging runs treated as de-facto infinite error.
4.4 QG assimilation results
For the inflation factor values and the ensemble sizes chosen, ETKF only converged for an ensemble size of 25. Figure 3
shows clearly that, the type I scheme wins out in terms of RMSE compared to the type II method, which is probably due to the fact that the approximate anomaly inverse is introducing significant numerical error. Another source of error is due to both the nonlinear observations and the coarse approximation to the covariance estimate.
In terms of stability, again, the type I method has a slight edge over the Type II method. As can be seen, for a severely undersampled ensemble, low inflation causes the Type II method to diverge from the truth. All this is likely again due to the approximate anomaly inversion.
In terms of cost, it is of course true that the matrix operations in the Type II method are simpler and require significantly less computation, though at the cost of accuracy and stability.
As the Lorenz ’96 model is a tiny synthetic model that only propagates discretization error, the quasi-geostrophic equations show that our methodoligy can be readily translated to real-world PDE-based models, especially those for which the observations are non-linear transformations of the state representation.
In Nino-Ruiz and Sandu (2015)
it was shown that shrinkage covariance matrix estimators can greatly improve the performance of the EnKF. This work develops ensemble transform filters that utilize a stochastic implementation of shrinkage covariance matrix estimates. We derive two types of ensemble transforms. Both filters perform similarly well in the under-sampled regime, where the ETKF does not converge, though perform worse that ETKF in the over-sampled regime, and perform asymptotically as well in the limit of synthetic ensemble size, as can be seen from the Lorenz ’96 results. We hypothesize that this is due to the fact that the synthetic ensemble introduces additional variance into the analysis, similarly to a perturbed observations scheme.
Numerical experiments show that the Type I scheme performs better than the Type I scheme in terms of accuracy and stability, though the Type II method performs better in terms of operations required, though relies on inexact calculations of a pseudo-inverse, therefore is not as attractive.
Both our methods allow our ensemble to have near perfect rank histograms, from the fact of the added Gaussian noise, while sacrificing little accuracy. For large-scale problems this would increase the ensemble interpretability, as the ensemble members would be more representative around the truth.
Acknowledgements.The first two authors would like to thank Traian Iliescu and Changhong Mou for their in-depth help with understanding of the Quasi-Geostrophic model, Steven Roberts, the primary maintainer of the ODE Test Problems package, and the rest of the members of the Computational Science Laboratory at Virginia Tech for their continuous support. The first two authors were supported, in part, by the National Science Foundation through awards NSF ACI–1709727 and NSF CCF–1613905, AFOSR through the award AFOSR DDDAS 15RT1037.
- An ensemble adjustment kalman filter for data assimilation. Monthly weather review 129 (12), pp. 2884–2903. Cited by: §1.
- Localization and sampling error correction in ensemble kalman filter data assimilation. Monthly Weather Review 140 (7), pp. 2359–2371. Cited by: §1.
- Computational design for long-term numerical integration of the equations of fluid motion: Two-dimensional incompressible flow. Part I. Journal of Computational Physics 1 (1), pp. 119–143. Cited by: §4.3.
- Data assimilation: methods, algorithms, and applications. SIAM. Cited by: §1.
- Analysis scheme in the ensemble kalman filter. Monthly weather review 126 (6), pp. 1719–1724. Cited by: §1.
- Shrinkage estimation of high dimensional covariance matrices. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2937–2940. Cited by: §2.2.
- ODE test problems. External Links: Cited by: §4.
- Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans 99 (C5), pp. 10143–10162. Cited by: §1.
- Data assimilation: the ensemble Kalman filter. Springer Science & Business Media. Cited by: §1.
Solving ordinary differential equations. 1, nonstiff problems. Springer-Vlg. Cited by: §4.1.
- Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review 129 (3), pp. 550–560. Cited by: §4.2.
- Efficient data assimilation for spatiotemporal chaos: a local ensemble transform kalman filter. Physica D: Nonlinear Phenomena 230 (1-2), pp. 112–126. Cited by: §1.
- Probability theory: the logic of science. Cambridge university press. Cited by: §3.
- Arakawa’s method is a finite-element method. Journal of computational physics 16 (4), pp. 383–390. Cited by: §4.3.
- A new approach to linear filtering and prediction problems. Journal of basic Engineering 82 (1), pp. 35–45. Cited by: §1, §3.
- Atmospheric modeling, data assimilation and predictability. Cambridge university press. Cited by: §2.1.
- Data assimilation: a mathematical introduction. Vol. 62, Springer. Cited by: §1.
- Predictability: a problem partly solved. In Proc. Seminar on predictability, Vol. 1. Cited by: §4.1.
- Data-driven correction reduced order models for the quasi-geostrophic equations: a numerical investigation. arXiv preprint, http://arxiv.org/abs/1908.05297. Cited by: §4.3.
- Ensemble Kalman filter implementations based on shrinkage covariance matrix estimation. Ocean Dynamics 65 (11), pp. 1423–1439. External Links: Cited by: §1, §1, §3, §5.
- An ensemble Kalman filter implementation based on modified Cholesky decomposition for inverse covariance matrix estimation. SIAM Journal on Scientific Computing submitted. External Links: Cited by: §1, §2.2.
- Efficient parallel implementation of DDDAS inference using an ensemble Kalman filter with shrinkage covariance matrix estimation. Cluster Computing, pp. 1–11. External Links: Cited by: §1, §1, §3.
- A parallel ensemble Kalman filter implementation based on modified Cholesky decomposition. In Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA ’15, Vol. Supercomputing 2015, Austin, Texas. External Links: Cited by: §1.
- Localization in the ensemble kalman filter. MSc Atmosphere, Ocean and Climate University of Reading. Cited by: §1.
- A bayesian approach to multivariate adaptive localization in ensemble-based data assimilation with time-dependent extensions. Nonlinear Processes in Geophysics 26 (2), pp. 109–122. Cited by: §4.1.
- Probabilistic forecasting and Bayesian data assimilation. Cambridge University Press. Cited by: §1.
- ODE test problems: a MATLAB suite of initial value problems. CoRR abs/1901.04098. External Links: Cited by: §4.
- An efficient implementation of the ensemble Kalman filter based on an iterative Sherman–Morrison formula. Statistics and Computing, pp. 1–17 (English). External Links: Cited by: §1.
- Implications of the form of the ensemble transformation in the ensemble square root filters. Monthly Weather Review 136 (3), pp. 1042–1053. Cited by: §2.1.
- A stabilized proper orthogonal decomposition reduced-order model for large scale quasigeostrophic ocean circulation. Advances in Computational Mathematics 41 (5), pp. 1289–1319. Cited by: §4.3.
- Turbulence and the dynamics of coherent structures. i. coherent structures. Quarterly of applied mathematics 45 (3), pp. 561–571. Cited by: §4.3.
- Turbulence and the dynamics of coherent structures. ii. symmetries and transformations. Quarterly of Applied mathematics 45 (3), pp. 573–582. Cited by: §4.3.
- Turbulence and the dynamics of coherent structures. iii. dynamics and scaling. Quarterly of Applied mathematics 45 (3), pp. 583–590. Cited by: §3, §4.3.
- Ensemble filter methods with perturbed observations applied to nonlinear problems. Computational Geosciences 14 (2), pp. 249–261. Cited by: §1.
- Efficient multigrid methods based on improved coarse grid correction techniques.. Ph.D. Thesis, Delft University of Technology, Netherlands. Cited by: §4.3.