# Projected Data Assimilation

We introduce a framework for Data Assimilation (DA) in which the data is split into multiple sets corresponding to low-rank projections of the state space. Algorithms are developed that assimilate some or all of the projected data, including an algorithm compatible with any generic DA method. The major application explored here is PF-AUS, a new implementation of Assimilation in the Unstable Subspace (AUS) for Particle Filters. The PF-AUS implementation assimilates highly informative but low-dimensional observations. In the context of particle filtering, the projected approach mitigates the collapse of particle ensembles in high dimensional DA problems while preserving as much relevant information as possible, as the unstable and neutral modes correspond to the most uncertain model predictions. In particular we formulate and numerically implement PF-AUS with the optimal proposal and compare to the standard optimal proposal and to the Local Ensemble Transform Kalman Filter.

## Authors

• 5 publications
• 2 publications
11/30/2017

### Parameter Estimation for Subsurface flow using Ensemble Data Assimilation

Over the years, different data assimilation methods have been implemente...
01/22/2018

### Improving the particle filter for high-dimensional problems using artificial process noise

The particle filter is one of the most successful methods for state infe...
01/18/2019

### Algorithms for high-dimensional non-linear filtering and smoothing problems

Several numerical tools designed to overcome the challenges of smoothing...
07/27/2018

### Particle filters for applications in geosciences

Particle filters contain the promise of fully nonlinear data assimilatio...
01/10/2019

### Accounting for model error in Tempered Ensemble Transform Particle Filter and its application to non-additive model error

In this paper, we trivially extend Tempered (Localized) Ensemble Transfo...
07/07/2020

### Innovative And Additive Outlier Robust Kalman Filtering With A Robust Particle Filter

In this paper, we propose CE-BASS, a particle mixture Kalman filter whic...
07/16/2021

### Auto-differentiable Ensemble Kalman Filters

Data assimilation is concerned with sequentially estimating a temporally...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Many Bayesian data assimilation techniques were developed based on extending assumptions of linearity in the phase space and data models and under the assumption of Gaussian errors. Several techniques have proven to be successful in weakening these assumptions, while other techniques have been developed to explicitly overcome these obstacles. Important among these are particle filters, a key subject of this paper. Particle filters have proven to be successful for low dimensional problems but tend to have difficulty with higher dimensional problems. Different variants of particle filters have been develop to combat these difficulties, including implicit particle filters, the optimal proposal, proposal density methods, etc..

Our contribution in this paper is to develop a framework for data assimilation schemes in which the data are constrained by a projection to lie in some subspace of observation or model space. This is motivated in large part by assimilation in the unstable subspace (AUS) techniques. These techniques have largely focused on projecting the phase space model using Lyapunov vectors while employing the original data or observational model. The techniques and framework developed in this paper allow for combinations of projected and unprojected physical and data models and their formulation is independent of the source of the time dependent projections. Using the framework developed here, we focus on techniques for projecting the observations in particle filters to lessen the impact of the curse of dimensionality. The framework and techniques lead to several natural applications. In particular we develop, implement, and compare two new particle filter algorithms based upon a dimension reduction technique into the unstable subspace and to show the application of these projected particle filter techniques.

The approach is related to three past approaches that use projections to identify a low-dimensional subspace that captures the most uncertain, or most nonlinear, directions in state space. The AUS techniques [CaGhTrUb08, TrDiTa10, ToHu13, PaCaTr13, LaSaShSt14, SaSt15] to improve speed and reliability of data assimilation specifically address the partitioning of the tangent space into stable, neutral and unstable subspaces corresponding to Lyapunov vectors associated with negative, zero and positive Lyapunov exponents. In particular, Trevisan, d’Isidoro & Talagrand propose a modification of 4DVar, so-called 4DVar-AUS, in which corrections are applied only in the unstable and neutral subspaces [TrDiTa10, PaCaTr13]. These techniques are based on updating in the unstable portion of the tangent space and may be interpreted in terms of projecting covariance matrices during the assimilation step.

Motivated by these techniques for assimilation in the unstable subspace, in [ProjShadDA] a new method is developed for data assimilation that utilizes distinct treatments of the dynamics in the stable and non-stable directions. In particular, the first phase of this development has involved employing time dependent Lyapunov vectors to form a subsystem with tangent space dynamics similar to the unstable subspace of the original state space model. This is motivated by the AUS techniques (e.g. [CaGhTrUb08, TrDiTa10]). In [ProjShadDA] the following Newton like iteration was considered, alternatively correcting in the unstable space and the stable subspace. For a smooth discrete time model and projection , alternate solving the two projected subsystems, using shadowing refinement in the first subsystem and forward in time evolution in the second. Here is the Newton iterate. With known, solve for , , and then update:

 u(k)n+1+dn+1=Pn+1Fn(u(k)n+dn),u(k+12)n=u(k)n+dn,n=0,1,...,N−1. (1)

For the known, solve for , , and update:

 u(k+12)n+1+en+1=(I−Pn+1)Fn(u(k+12)n+en),u(k+1)n=u(k+12)n+en,n=0,1,...,N−1. (2)

This manuscript develops a complementary approach to project the data model as well.

The third branch of projected DA schemes use the ‘Dynamically Orthogonal’ (DO) formulation [SL09, Sapsis]

, in which the forecast model is broken into a partial differential equation governing the mean field and a number of stochastic differential equations describing the evolution of components in a time-dependent stochastic subspace of the original differential equation. The DO approach was used to assimilate with different DA schemes in the subspace and mean field space in

[SL13, MajdaQiSapsis14, QiMajda15].

We develop a projected DA framework and algorithms for arbitrary time dependent orthogonal projections, but are mainly interested in the AUS approach where the projections identify the unstable/neutral subspace. To determine the projections we will employ standard techniques for approximation of Lyapunov exponents, e.g., the so-called discrete QR algorithm (see [DVV07, DiVV15]). Unlike past work on the topic we will focus on confining the data, not the model, to the unstable subspace. If the non-stable subspace is relatively low dimensional this makes applications of techniques such as particle filters appealing.

Particle Filters [doucet]

are particularly effective for nonlinear problems and for the tracking of non-Gaussian, multi-modal probability distributions; but they suffer from the so-called curse of dimensionality (see, e.g., Snyder

[SnyderEtAl08, Snyder2011] Morzfeld et al. [MTAC12] and Van Leeuwen [van2012particle]). There is a known formulation that minimises degeneracy; however, even with a linear model, it is known that the computational cost of this "Optimal Proposal" Particle Filter scales like the exponential of the observation dimension [Snyder2011]. Our aim in this work is to avoid the established limits of particle filter performance by reducing the observation dimension in a sensible way. Other efforts to bypass this limitation include, e.g., the Equivalent Weights Particle Filter [Leeuwen10]. One attractive feature of our approach is that it is a reformulation of the standard DA problem rather than a specific algorithm, and so it is compatible with these advanced particle filters.

The availability of the projection into the unstable subspace will also allow us to develop a novel approach to resampling. It is necessary to periodically refresh any particle ensemble, and some noise is usually added at this step. To avoid forcing the ensemble off the attractor with this added noise we confine most of the noise to the unstable subspace, which improves the filter accuracy and reduces the incidence of resampling in later steps.

This paper is organized as follows. Data assimilation is reviewed in section 2 and projected DA is formulated in section 3. Algorithms for using the new projected data are introduced (section 4) and applied to AUS with several numerical experiments (section 5). A discussion (section 6) and bibliography conclude the paper.

## 2 Data Assimilation

Data assimilation methods combine orbits from a dynamical system model with measurement data to obtain an improved estimate for the state of a physical system. In this paper we develop a data assimilation method in the context of the discrete time stochastic model

 un+1=Fn(un)+ωn,n=0,1,... (3)

where are the state variables at time and

, i.e., drawn from a normal distribution with mean zero and model error covariance

. Let the sequence , be a distinguished orbit of this system, referred to as the true solution of the model, and presumed to be unknown. As each time is reached we collect an observation related to via

 yn=Hutn+ηn,yn∈RM (4)

where , , is the observation operator, and the noise variables are drawn from a normal distribution with zero mean and known observational error covariance matrix . In general the observation operator can be nonlinear.
We formulate DA under the ubiquitous Bayesian approach. Consider the assimilation of a single observation, , at time step . Given a prior estimate of the state, Bayes’ Law gives

 p(un|yn) ∝p(yn|un)p(un),

Using (4) the likelihood function is, up to a normalization constant,

 (5)

This procedure, which we have written for the assimilation of data at a single observation time, readily extends to the sequential assimilation of observations at multiple times under the assumptions that the state is Markovian and the observations at different times are conditionally independent (see for example [HandbookDA]).

In the following we introduce some key DA schemes. Not much detail is given here, but the interested reader is referred in particular to three recent books on DA, [ReichCotter15, Law2015, Asch16].

### 2.1 Kalman Filtering

The Kalman Filter and later extensions are ubiquitous in DA, and are now briefly described. For a linear model, i.e. where (3) is

 un+1=Anun+ωn, (6)

and for the linear observation operator , the Kalman Filter calculates the exact posterior , where the analysis variables are

 uan= ufn+Kn(yn−Hufn), (7) Pan= (I−KnH)Pfn. (8)

The weight matrix is the Kalman gain matrix

 Kn=PfnHT(HPfnHT+R)−1. (9)

The superscript is reserved for forecast variables, obtained at time by using (6) to update ,

 ufn=An−1uan−1+ωn−1,
 Pfn=An−1Pan−1ATn−1+Σ.

The covariance matrices and are not bolded in order to distinguish them from the projections that are the focus of the remainder of the paper. Two extensions of the Kalman Filter are prevalent in nonlinear DA, the Extended Kalman Filter (EKF) and Ensemble Kalman Filter (EnKF). Neither give the exact posterior for a nonlinear model.

#### 2.1.1 Extended Kalman Filter

The nonlinear model (3) is used to make the forecast , and then the Kalman Filter update is applied using the linearisation

 An=∂Fn∂u∣∣∣uan−1.

If the observation operator is a nonlinear function , the linearization

 Hn=dhdu∣∣∣ufn

is used everywhere except to compute the innovation in the calculation of .

The EKF is suitable for low dimensional nonlinear filtering, but the required linearizations are nontrivial for high-dimensional filtering. The EnKF by contrast is well suited to high dimensions.

#### 2.1.2 Ensemble Kalman Filter

The Ensemble Kalman Filter is a Monte Carlo approximation of the Kalman Filter that is well suited to high dimensional filtering problems, introduced in [Evensen94, Burgers98]. An ensemble of forecasts are made at time , from 1 to . Then the forecast covariance is approximated by the sample covariance of the ensemble, and the analysis ensemble is obtained in such a way that its mean satisfies (7) and its sample covariance satisfies (8). In this paper we will use analysis updates corresponding to the Ensemble Transform Kalman Filter (ETKF) [Bishop2001] or Local Ensemble Transform Filter (LETKF) [Hunt2007]. For more details and a modern introduction to the Ensemble Kalman Filter, see e.g. [evensen2009data].

### 2.2 The Particle Filter

Particle Filters (PF) are a collection of particle based data assimilation schemes do not rely on linearization of the dynamics or Gaussian representations of the posterior; see [doucet] for a comprehensive review. The basic idea is to represent the prior distribution , previously the forecast, and the posterior distribution , previously the analysis, by discrete probability measures. Suppose that at time we have the posterior distribution , supported on points and with weights . Each and . Here L is the number of particles that are used to approximate the distribution . The two key steps in the Particle Filter are as follows:
Prediction step. Propagate each of the particles . One simple choice, the bootstrap PF, is to use the state dynamics (3) to forecast each particle.
This gives the forecast probability distribution as a discrete probability measure concentrated on points with weights .

Filtering step. Update the weights using the observation by setting

 win=cwin−1p(yn|uin),

where is chosen so that .

This scheme is easy to implement but suffers from severe degeneracy, especially in high dimensions. That is, after a few time steps all the weight tends to concentrate on a few particles. A common remedy is to monitor the Effective Sample Size (ESS) and resample when the ESS drops below some threshold in order to refresh the particle cloud; see e.g. [doucet, HandbookDA].

#### 2.2.1 The Optimal Proposal

The optimal proposal particle filter (OP-PF) [SnyderEtAl08, DoucetEtAl2000, Snyder2011, van2012particle] attempts to address the degeneracy issue in particle filters with the aim of ensuring that all posterior particles have similar weights. The ‘proposal’ is the distribution used to update the particles from one time step to the next. In the prediction step in the basic particle filter above, the particles are updated using the model, so the proposal density in that approach is (compare (3)) .

The optimal proposal density is . Given the additive noise of the model (3), the optimal proposal update in each particle is Gaussian with , where

 Σ−1p= Σ−1+HTR−1H, (10) mn= Fn−1(un−1)+ΣpHTR−1(yn−HFn−1(un−1)). (11)

The prefactor can be written as the Kalman gain (9) (albeit with ) by an application of the Sherman-Morrison-Woodbury formula (see e.g. [Kalnay03], p. 171).

Two applications of Bayes’ law (e.g. in [Snyder2011]) show that the weight update for the -th particle drawn from this proposal satisfies and is also Gaussian,

 p(yn|un−1)∝ exp[−12(yn−HFn−1(un−1))T(Σ+HTRH)−1(yn−HFn−1(un−1))]. (12)

As mentioned in the previous section, degeneracy - characterised by a single particle with weight of approximately 1 - is a common problem in the PF. In [Snyder15] it is shown that, of all PF schemes that obtain using and

, the ‘optimal proposal’ above has the minimum variance in the weights. That is, it suffers the least from weight degeneracy. In

[VanLeeuwen18] this result is extended to any PF scheme that obtains using , and .
The distributions required to apply the Optimal Proposal are not always available (the additive model error of (3) and linear observation operator of (4) are required above to ensure the Gaussian character of the individual particle updates and weight updates), but when OP-PF can be formulated it is the least degenerate of a large class of filters. However, in [Snyder2011] it is shown that the optimal proposal requires an ensemble size L satisfying for a linear model, or will suffer from filter degeneracy.
That is, filter degeneracy is intimately connected to model and observation dimension, and is a fundamental component of most Particle Filters.

## 3 Formulating projected DA

We are developing techniques that decompose the state space model and observational model based upon the time dependent stability properties of the state space model. Suppose that at time a dynamically significant rank orthogonal projection is available, along with a splitting , where has orthonormal columns. In the application motivated above the projection identifies unstable modes, but the derivation below does not require this interpretation of .

The main result will be to derive a new data model

 yqn=~QTnPHutn+γn (13)

that is a linear transformation of (

4), where is derived from , from , and has known distribution. This new data model assimilates only the components of observations that can be written as a linear combination of the columns of .

We will derive (13) in three steps. The first obstacle to be overcome is that the projection is in state space with , while typically has different dimension.

### 3.1 Observations as a subspace of model space

Assume that has full row rank. Then defining where , and defining , the data model for is

 ~yn= H†yn = PHutn+H†ηn = PHutn+ψn

where .

The covariance matrix is positive semi-definite and has a singular normal distribution [TK15]. Using that one readily confirms that , that is the observation operator collapses onto the standard data model. The transformation through has not affected the output of a DA scheme, as .

### 3.2 Using the projections

We now make use of the orthogonal projection . The idea is to formulate a new data model, along the lines of , that contains only the components of the observation that align with the projection.

Depending on the dimension of the observations and on the rank of the projection , there are two ways this might be achieved. One way, as above, is to directly apply the projection to . The product of projections that is applied to the state vector in this data model is not itself a projection; in some circumstances it might be desired to instead identify the projection that is the intersection of and . This projection may be approximated accurately by e.g. Von Neumann’s algorithm or Dykstra’s projection algorithm; see Appendix A for a review. The construction of is possible in only some assimilation problems, in particular those in which the transversality condition is satisfied; otherwise there is no guarantee of any intersection between and .

For convenience we denote by either if , or if , and denote by the matrix with orthonormal columns satisfying

. This matrix is already known from the QR decomposition (

20) if

, or can be calculated via the singular value or Schur decomposition if

. For the case we abuse notation by redefining as the rank of .

Define , the projected observation. The data model is

 ypn= ~PnH†yn = ~PnPHutn+~PnH†ηn = ~Pnutn+ξn (14)

where . The data model has a singular normal distribution with support in the -dimensional subspace of model space spanned by the columns of , and the likelihood of this distribution can be written using the pseudo-inverse (see e.g. [TK15]) as

 p(ypn|u)∝ exp(−12(ypn−~PnPHu)T(~PnH†R(H†)T~Pn)†(ypn−~PnPHu)). (15)
###### Remark

If is invertible then the distribution of is non-singular, , and the above construction does not affect the output of a DA scheme.

To make explicit the reduction in the data dimension that has been obtained by we introduce a low dimensional data model . The following result establishes that is equivalent to (14).

###### Theorem

Define , with the associated data model

 yqn=Hqnutn+γn, (16)

where , , and . Then .

###### Proof

The matrix has orthonormal columns so and for any matrix

 (~QnB)†= B†~Q†n=B†~QTn, (B~QTn)†= (~QTn)†B†=~QnB†.

Applying these results to (15), and using that , , ,

 p(ypn|u)∝ exp(−12(ypn−~PnPHu)T(~PnH†R(H†)T~Pn)†(ypn−~PnPHu)) = = exp(−12(yqn−Hqnu)T~QTn~Qn(Rqn)†~QTn~Qn(yqn−Hqnu)) = exp(−12(yqn−Hqnu)T(Rqn)†(yqn−Hqnu)) = p(yqn|u).

If in addition for or for , and is full rank, then the covariance matrix of is invertible and has a standard normal distribution. More generally for , the Cholesky factorization, consider the SVD of , then the rank of the covariance matrix is equal to the number of non-zero singular values of .

Theorem 3.2 provides a blueprint for any DA scheme to be efficiently implemented with projected observations, involving the following changes: the observation is replaced with , the observation operator is replaced with , and the assumed measurement covariance is replaced with . This is only the most basic approach to using the projected data, and we will develop more complicated algorithms that use the orthogonal data, or even the original data, in addition to .

A data model for the complementary orthogonal projection is easy to write down with 2. Define

 yq⊥n=(~Q⊥n)T~yn∈RN−p, (17)

where

. The two projected data models are not independent in general and have joint distribution

 [yqnyq⊥n]∼N([HqnxtnHq⊥nxtn],[RqnRq12,nRq21,nRq⊥n]), (18)

where , , and the off-diagonal covariances are and .

## 4 Algorithms for Projected DA

In this section we discuss how some combination of the standard/projected forecast models (3), (1), (2) and data models (4), (16), (17)–(18) may be used to form a ‘projected DA scheme’.
A projected data model changes the innovation, the observation operator, and the observation error covariance. A projected physical model changes the prior and model error covariances. We want combinations of physical models, data models, and DA techniques that optimize the assimilation, particularly of the Particle Filtering schemes discussed in Section 2.2.

For comparison, past work on projected DA algorithms is now reviewed in the above framework. The standard AUS approach ([PaCaTr13], e.g.) can be derived by calculating the assimilation step using the projected model (1) and the original, unprojected data (4). This derivation will be performed in the next section. The projected shadowing approach of [ProjShadDA] iterates between assimilating unprojected data (4) into the projected model (1) and evolving the complementary orthogonal forecast model (2) without assimilation.

Neither AUS nor the shadowing approach have hitherto used projected data, though both may benefit from it. The use of projected data in AUS Particle Filters is a major topic for the remainder of the paper. For projected shadowing techniques, projected observations - that lie in the state space of the physical model - provide another means for generating an initial guess.

The DO-DA schemes in [SL13, MajdaQiSapsis14, QiMajda15] use both a projected and mean field model to make a forecast, similar to using (1)–(2). The data is naturally split into the ‘projected’ and remaining components without using (16) explicitly, as the data may be confined to the DO-subspace by simply subtracting the forecast mean field. This attractive feature of the DO methodology bypasses the need to derive the projected data model (16), as the covariance structure of the data does not change.

We identify the following approaches to assimilating with projected data using the results of this paper:

###### Algorithm (Project data only, and discard the orthogonal component)

Apply a standard DA scheme using the unprojected forecast model (3), but replace the standard data (4) with the projected data of (16). The observation operator is replaced by , and the covariance matrix of the observations is replaced by .

A Particle Filter employing Algorithm 4 will be tested on a stiff dissipative linear system in Section 5.1.

###### Algorithm (Project model and data, discarding the orthogonal component of the data)

Apply a standard DA scheme, replacing the model with the projected component (1), and replacing the data with the projected component (16). Update the orthogonal component of the forecast using (2), without assimilation.

The focus of this work is on projected data, and this algorithm will not be implemented here. However, it is mentioned as one approach to achieve a major reduction in the dimension of a DA problem, as both the model and data are projected. One problem-specific issue is that the orthogonal component of the model is not corrected by any assimilation scheme and may introduce error; in [ProjShadDA] these errors were handled by iterating the update of the projected and unprojected model repeatedly.

###### Algorithm (Project data and models, and assimilate both)

Project the forecast into the two subspaces, either by applying the projections and to a forecast or by using the projected models (1)–(2). Apply a hybrid DA scheme that uses a method suitable for low-dimensional, nonlinear filtering to assimilate in the -subspace, and a separate method suitable for high-dimensional filtering in the -subspace. When assimilating in the -subspace use the corresponding data (16), and when assimilating in the orthogonal subspace use (17).

The design of hybrid schemes such as the above is the subject of much interest lately; one such scheme is given by the DO-DA Blended PF [MajdaQiSapsis14]. Unlike the preceding or following schemes, the above description does not completely describe how to design such a hybrid DA scheme. The two DA schemes that assimilate in each subspace of Algorithm 4 must communicate information between each other, and should ideally respect the correlation of the two projected data models given in (18). (One can partially translate the DA algorithm between that paper and this: the state variables and there correspond to and here, so the splitting in ([MajdaQiSapsis14], eqn. 1) of the prior density plays the same role as the projected models (1)–(2). The vectors in ([MajdaQiSapsis14], eqn. 8) are the columns of here. As mentioned above the DO formulation makes the two projected data models trivial to obtain, and their covariance structure in ([MajdaQiSapsis14], eqn. 15) is equivalent to the covariance described in (18).)

The last algorithm to be described is a novel, efficient PF scheme taking advantage of the Optimal Proposal PF described in section 2.2.1. The scheme differs from an Algorithm 4 implementation of OP-PF by using the full observations to form the proposal, and differs from an Algorithm 4 implementation by not using the orthogonal data in the assimilation step.

###### Algorithm (Blend projected and unprojected data in the assimilation step)

This algorithm describes a Particle Filter. For the particle update or proposal step, use the typical optimal proposal equations (10)–(11). Compute the weight update for each particle using the projected data model only, i.e. using an Algorithm 4 version of (12).

Algorithm 4 uses all available data to update the particles, but only updates the weights based on how well the particles represent the projected data. This strategy will be tested on the chaotic Lorenz-96 system in Section 5.2. One major advantage of this approach is that it is requires no modification to the numerical simulation used to obtain the forecast. A second advantage is its efficiency; the full data are used for the particle update step, over which the update is straightforward and the dimension of the data does not lead to filter degeneracy; and only the projected data are used to avoid filter degeneracy in the weight update step. The scheme will prove to be more accurate than either, OP-PF or an Algorithm 4 implementation of OP-PF, in numerical tests.

### 4.1 Convergence results for projected algorithms

A normal line of inquiry for a new DA algorithm is to quantify the conditions under which it will well represent the posterior distribution, which neglecting time subscripts we write as . The projected algorithms above do not generally converge to , and so there are two questions: ‘Does the algorithm converge to a known distribution?’, and ’How different is that distribution to the usual posterior?’.

Algorithm 4 clearly implements an approximation of the distribution , and similarly Algorithm 4 approximates . That is, a Particle Filter implementation of either approach would converge to the above distributions in the limit as the number of particles approaches infinity. The distribution approximated by Algorithm 4 is problem specific and may not be easy to write down, and similarly the distribution approximated by Algorithm 4 is a blending of and that is non-trivial to obtain in closed form.

We now quantify how the above distributions relate to the standard posterior . For this we will employ the Hellinger distance: given two probability distributions and , with associated probability measures and , the Hellinger distance between the two is

 dH(μ,μ′)=[12∫(√ρ(u)−√ρ′(u))2du]1/2. (19)

The Hellinger distance is a useful metric as it constrains the difference between functions in the two probability spaces, , true for any that is square integrable over and [Law2015].

To bound this distance for Algorithm 4 we write and . The second distribution is written as

 p(u|yq)=p(u|y)p(y|yq)p(y|u,yq),

obtained via Bayes’ law in the form , conditioning on , and using . Substituting into (19) we obtain a bound for the consistency of Algorithm 4 with the original posterior ,

 dH(μ,μ′) =⎡⎣12∫(1−√p(y|yq)p(y|u,yq))2ρ(u)du⎤⎦1/2 =⎡⎢⎣12Eμ⎛⎝1−√p(y|yq)p(y|u,yq)⎞⎠2⎤⎥⎦1/2.

Intuition on the projected algorithms suggests that if the projection somehow represents ‘important’ quantities in the model, e.g. directions associated with positive Lyapunov exponents, or coherent structures, etc, then the projected data will retain the same key information from the original data, and the posterior approximated by the projected DA algorithm will be similar to the original posterior. The above result quantifies that intuition. The posterior distribution associated with the projected algorithm will be close to provided that the ratio at the most likely values of , that is, provided that the projected data contains information sufficient to constrain the values of the unprojected data in a neighbourhood of the truth. The ratio can be written as to further clarify that knowledge of the projected data should constrain the values of the orthogonal projected data.

A similar bound may easily be established for Algorithm 4. Let now , and the associated probability measure; then

 dH(μ,μ′) =⎡⎢⎣12Eμ⎛⎜⎝1− ⎷p(~Pnu|y)p(u|y)p(y|yq)p(y|~Pnu,yq)⎞⎟⎠2⎤⎥⎦1/2.

The modification to this expression from the previous one is a second ratio that measures the information gap between the distribution of and of . As with the previous expression, the ratio is far from 1 if knowledge of does not constrain the possible values of at all, and close to 1 if the projected data does have some relation to the full state.

An intuitive example of the above bounds in practice is a slow-fast system with a center manifold onto which the fast variables are attracted. Choosing to identify the slow variables will lead to a small value of for either Algorithm 4 or 4, since knowledge of the slow variables is sufficient to constrain the fast variables. In the case where there are few slow variables and many fast variables, then, an Algorithm 4 Particle Filter will be a much less degenerate implementation of the Particle Filter that converges close to the desired posterior . A linear system of this type will be the first numerical example, in Section 5.1.

## 5 Application: Assimilation in the Unstable Subspace

For the remainder of the paper we will study the case where the projection identifies the most unstable modes in the forecast model. To determine these modes we employ the discrete QR algorithm [DVV07, DiVV15]. For the discrete time model with , let

denote a random matrix such that

,

 Qn+1Rn= F′n(un)Qn≈1ϵ[Fn(un+ϵQn)−Fn(un)],n=0,1,... (20)

where and is upper triangular with positive diagonal elements. With a finite difference approximation the cost is that of an ensemble of size plus a reduced via modified Gram-Schmidt to re-orthogonalize. Time dependent orthogonal projections to decompose state space are and , where as described in Section 3.2 refers either to or to the intersection of and .

In this section we establish the relationship between existing AUS algorithms and the projected data approach. For that reason, in this section AUS is used to refer only to traditional EKF-AUS ([Trevisan11, PaCaTr13], e.g.). This approach implements a modified EKF in which the forecast covariance matrix is replaced by the projected matrix , leading to the Kalman gain

 Kn= PnPfnPnHT[HPnPfnPnHT+R]−1, (21)

where the EKF forecast covariance matrix and observation operator are described in Section 2.1.1. It is clear that the EKF-AUS Kalman gain can be written as a combination of the columns of .

The Algorithm 4 implementation of the EKF, with projection to compare with AUS, would have Kalman gain given by

 Kn=PfnPHPn[PnH†(HPfnHT+R)(H†)TPn]†. (22)

The difference between the two Kalman gains is essentially that (22) interchanges the position of and , requiring the use of in order to do so, but manages to project all terms in the covariance-weighting inverse instead of only the forecast covariance matrix. Unlike the classical AUS gain (21), (22) does not restrict the analysis increment to the unstable subspace. The innovation is in classical AUS, but with (22) would be .
That is, classical AUS uses the full data but restricts the assimilation update to the unstable subspace via (21); Algorithm 4 restricts the innovation to the unstable subspace but the assimilation update can distribute this innovation across the whole of model space. The comparison between these algorithms here is pedagogical, not competitive; the advantages of the EKF-AUS algorithm are well established, while Algorithm 4 effects a reduction in data dimension that we will explore for Particle Filters, not the EKF.

Finally we obtain a form of EKF associated with the projected models (1)–(2) and unprojected data. This is essentially a re-derivation of EKF-AUS from the projected framework employed in this paper, confirming that the two are compatible. Consider the linearized physical model of Section 2.1.1. Then the projected physical model has the form or

 vn+1= ~Pn+1un+1=[~Qn+1Rn~QTn]vn+~Pn+1ωn ≡ Bnvn+~Pn+1ωn,

where and using The forecast covariance matrix is

 ~Pfn+1= ~Pn+1An~PanATn~Pn+1+~Pn+1Σ~Pn+1 = An(~Pn~Pan~Pn)ATn+~Pn+1Σ~Pn+1

since . The first row of the forecast covariance matrix is the EKF-AUS forecast covariance if and , and so the assimilation step produces EKF-AUS with Kalman gain (21).

We will now explore the benefits of the projected data algorithms in an AUS framework. For the remainder of the paper we refer to the projected DA schemes as AUS schemes, e.g. PF-AUS for an Algorithm 4 Particle Filter and OP-PF-AUS for an Algorithm 4 Optimal Proposal Particle Filter.
The first test case is a simple linear model that demonstrates the situation in which PF-AUS will significantly improve on the standard Particle Filter.

### 5.1 Case study: linear model with Gaussian noise

Suppose that forecasts are made with a physical model of the form

 dudt=Au+σ˙W,

where is a Wiener process. The one-step model (3) is obtained through a simulation of this physical model with a standard numerical integrator, initialised at one observation time and terminating at the next. We construct

so that it has two eigenvalues with small real part

, and so that the remaining 98 eigenvalues have real part . This produces a well known multiscale dynamic, which we describe for the deterministic case . There exists a transformation of this system into a system consisting of 2 ‘slow’ and 98 ‘fast’ variables. The fast variables are rapidly attracted onto a slow invariant manifold that depends only on the slow variables. Using centre manifold theory [Carr]

one can obtain an expression for this slow manifold, and furthermore calculate an ordinary differential equation that governs the evolution of the 2 slow variables.

We run DA experiments assuming that the slow manifold and reduced system are unknown, instead using forecasts and observations from the full, 100-dimensional system. The key to a successful assimilation step will be to get the correct distribution for the (unknown) 2-dimensional subspace of model space corresponding to the slow variables.

We present results for the PF compared to PF-AUS using Algorithm 4. Four scenarios are considered: where every variable is observed, every second variable, every fourth variable, and finally a scenario in which only two variables are observed. We collect these observations every time units until 100 observation times have passed, with small measurement error covariance and model noise

. Both PF algorithms resample if the ESS drops below half the number of particles, which is 1000. On resampling, noise is added to every variable with a standard deviation of

. We use PF-AUS with .

We report the RMSE between the filter mean at each time step and the true system state. The standard Particle Filter performs very poorly with high dimensional data, while PF-AUS is reasonably indifferent to the dimension of the data and in all cases has mean RMSE below the RMSE of the observations. Results are displayed in Figure

1.
Two extremes of the data dimension serve to highlight its role in Particle Filter divergence. In Figure (a)a every variable is accurately observed, and consequently one could obtain a reasonable estimate of the system at every observation time by directly using the data. Despite this, and despite the low-dimensional attractor in the state dynamics, the Particle Filter estimate diverges frequently and far from the true state. The other extreme in data availability is Figure (d)d, in which only two variables are observed and the Particle Filter has an accurate mean RMSE of 0.03. By comparison PF-AUS is more accurate than the observations in each scenario, and in particular does not diverge at large data dimension.

We now present examples from the Lorenz 96 system.

### 5.2 Case Study: Chaotic Lorenz 96 system

Consider the system of ordinary differential equations introduced in [Lo96],

 ˙xi= (xi+1−xi−2)xi−1−x+F, (23)

where runs from 1 to J, and . If , then this system is chaotic with 14 positive and 1 neutral Lyapunov exponents.
We present experiments in which the deterministic part of the model (3) is given by an integration of (23) for a fixed time.

In this section we will implement Algorithm 4, OP-PF-AUS, and compare to the OP-PF and EnKF. We make the following modification to resampling in OP-PF-AUS:

###### Algorithm (Resampling in the Unstable Subspace)

When adding noise after resampling, generate (the usual) noise sampled from and then multiply this random vector by , for some

When this algorithm is no different to the normal resampling approach, but for some proportion of the uncertainty in resampling is constrained to lie in the space spanned by the columns of . For AUS the resampling scheme should add more noise in the directions of greatest uncertainty in the forecast model, which provides one advantage; a second advantage is that the algorithm does not shift particles as far off the attractor.

We now tune in Algorithm 5.2 to demonstrate the benefit of constraining resampling noise to the most unstable directions. For this experiment the mean RMSE and percentage of resampling steps were recorded for 20 runs of OP-PF-AUS at 20 different values of and 10 different values of , choosing for OP-PF-AUS. Figure 2 clearly displays that the RMSE and filter degeneracy both strictly decrease as increases, at all considered values of . For this experiment the model noise covariance is , and every second variable is observed with error covariance .

The remaining experiments use Algorithm 5.2 for resampling in the AUS algorithms with . The resampling noise is optimised separately in the standard PF and the PF-AUS algorithms by choosing the minimum RMSE result over 20 values in .

We now investigate the optimal choice of dimension for the projected data in OP-PF-AUS. One might expect that would be optimal, as the system has 15 unstable and neutral modes. However, the blending of projected and unprojected data in Algorithm 4 will sufficiently constrain the weakly unstable modes in the system, and at the same time the PF algorithm will avoid degeneracy at low values of . The RMSE and number of resampling steps taken by OP-PF-AUS compared to OP-PF are displayed in Figure 3. The RMSE has a clear minimum at , about less than the OP-PF RMSE, and the frequency of resampling in OP-PF-AUS decreases sharply with . The optimal choice of noise to add on resampling was for OP-PF, and