# Robust data assimilation using L_1 and Huber norms

Data assimilation is the process to fuse information from priors, observations of nature, and numerical models, in order to obtain best estimates of the parameters or state of a physical system of interest. Presence of large errors in some observational data, e.g., data collected from a faulty instrument, negatively affect the quality of the overall assimilation results. This work develops a systematic framework for robust data assimilation. The new algorithms continue to produce good analyses in the presence of observation outliers. The approach is based on replacing the traditional Ł_2 norm formulation of data assimilation problems with formulations based on Ł_1 and Huber norms. Numerical experiments using the Lorenz-96 and the shallow water on the sphere models illustrate how the new algorithms outperform traditional data assimilation approaches in the presence of data outliers.

There are no comments yet.

## Authors

• 5 publications
• 34 publications
• 5 publications
• 1 publication
• ### Robust data-driven discovery of governing physical laws using a new subsampling-based sparse Bayesian method to tackle four challenges (large noise, outliers, data integration,

The derivation of physical laws is a dominant topic in scientific resear...
07/17/2019 ∙ by Sheng Zhang, et al. ∙ 6

• ### Robust Estimation for Linear Panel Data Models

In different fields of applications including, but not limited to, behav...
05/13/2020 ∙ by Beste Hamiye Beyaztas, et al. ∙ 0

• ### Robust subsampling-based sparse Bayesian inference to tackle four challenges (large noise, outliers, data integration, and extrapolation) in the discovery of physical laws from

The derivation of physical laws is a dominant topic in scientific resear...
07/17/2019 ∙ by Sheng Zhang, et al. ∙ 23

• ### Preconditioning mixed finite elements for tide models

We describe a fully discrete mixed finite element method for the lineari...
03/03/2020 ∙ by Tate Kernell, et al. ∙ 0

• ### Basis Pursuit Denoise with Nonsmooth Constraints

Level-set optimization formulations with data-driven constraints minimiz...
11/28/2018 ∙ by Robert Baraldi, et al. ∙ 4

• ### ROBIN: a Graph-Theoretic Approach to Reject Outliers in Robust Estimation using Invariants

Many estimation problems in robotics, computer vision, and learning requ...
11/07/2020 ∙ by Jingnan Shi, et al. ∙ 0

• ### A robust method based on LOVO functions for solving least squares problems

The robust adjustment of nonlinear models to data is considered in this ...
11/29/2019 ∙ by E. V. Castelani, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Dynamic data-driven application systems (DDDAS [3]

) integrate computational simulations and physical measurements in symbiotic and dynamic feedback control systems. Within the DDDAS paradigm, data assimilation (DA) defines a class of inverse problems that fuses information from an imperfect computational model based on differential equations (which encapsulates our knowledge of the physical laws that govern the evolution of the real system), from noisy observations (sparse snapshots of reality), and from an uncertain prior (which encapsulates our current knowledge of reality). Data assimilation integrates these three sources of information and the associated uncertainties in a Bayesian framework to provide the posterior, i.e., the probability distribution conditioned on the uncertainties in the model and observations.

Two approaches to data assimilation have gained widespread popularity: ensemble-based estimation and variational methods. The ensemble-based methods are rooted in statistical theory, whereas the variational approach is derived from optimal control theory. The variational approach formulates data assimilation as a nonlinear optimization problem constrained by a numerical model. The initial conditions (as well as boundary conditions, forcing, or model parameters) are adjusted to minimize the discrepancy between the model trajectory and a set of time-distributed observations. In real-time operational settings the data assimilation process is performed in cycles: observations within an assimilation window are used to obtain an optimal trajectory, which provides the initial condition for the next time window, and the process is repeated in the subsequent cycles.

Large errors in some observations can adversely impact the overall solution to the data assimilation system, e.g., can lead to spurious features in the analysis [15]. Various factors contribute to uncertainties in observations. Faulty and malfunctioning sensors are a major source of large uncertainties in observations, and data quality control (QC) is an important component of any weather forecasting DA system [17]. The goal of QC is to ensure that only correct observations are used in the DA system. Erroneous observations can lead to spurious features in the resulting solution [15]. Departure of the observation from the background forecast is usually used as a measure for accepting or rejecting observations: observations with large deviations from the background forecast are rejected in the quality control step [12]. However, this process has important drawbacks. Tavolato and Isakksen present several case studies [30] that demonstrate that rejecting observations on the basis of background departure statistics leads to the inability to capture small scales in the analysis.

As an alternative to rejecting the observations which fail the QC tests, Tavolato and Isakksen [30] propose the use of Huber norm in the analysis. Their Huber norm approach adjusts error covariances of observations based on their background departure statistics. No observations are discarded; but smaller weights are given to low quality observations (that show large departures from the background). Huber norm has been used in the context of robust methods for seismic inversion [11]

and ensemble Kalman filtering

[24]. The issue of outliers has also been tackled by regularization techniques using norms [9]. To the best of our knowledge, no prior work has fully addressed the construction of robust data assimilation methods using these norms. For example [30] adjusts the observation data covariances but then applies traditional assimilation algorithms.

This paper develops a systematic mathematical framework for robust data assimilation. Different data assimilation algorithms are reformulated as optimization problems using and Huber norms, and are solved via the alternating direction method of multipliers (ADMM) developed by Boyd [4], and via the half-quadratic minimization method presented in [21].

The remainder of the paper is organized as follows: Section 2 reviews the data assimilation problem and the traditional solution algorithms. Section 3 presents the new robust algorithms for 3D-Var, Section 4 for 4D-Var, and Section 5 develops new robust algorithms for ensemble based approaches. Numerical results are presented in Section 6. Concluding remarks and future directions are discussed in Section 7.

## 2 Data assimilation

Data assimilation (DA) is the process of fusing information from priors, imperfect model predictions, and noisy data, to obtain a consistent description of the true state of a physical system [8, 14, 26, 27]. The resulting best estimate is called the analysis .

The prior information encapsulates our current knowledge of the system. Prior information typically consists of a background estimate of the state , and the corresponding background error covariance matrix .

The model captures our knowledge about the physical laws that govern the evolution of the system. The model evolves an initial state at the initial time to states at future times . A general model equation is:

 (1) xi=Mt0→ti(x0),i=1,⋯,N.

Observations are noisy snapshots of reality available at discrete time instances. Specifically, measurements of the physical state are taken at times ,

 (2) yi=H(xi)+εi,εi∼N(0,Ri),i=1,⋯,N,

where the observation operator maps the model state space onto the observation space. The random observation errors

are assumed to have normal distributions.

Variational methods solve the DA problem in an optimal control framework. Specifically, one adjusts a control variable (e.g., model parameters) in order to minimize the discrepancy between model forecasts and observations. Ensemble based methods are sequential in nature. Generally, the error statistics of the model estimate is represented by an ensemble of model states and ensembles are propagated to predict the error statistics forward in time. When the observations are assimilated, the analysis is obtained by operating directly on the model states. It is important to note that all the approaches discussed in the following subsections aim to minimize the norm of the discrepancy between observations and model predictions.

Ensemble-based methods employ multiple model realizations in order to approximate the background error distribution of any model state. The empirical moments of the ensemble are used in order to estimate the background state and the background error covariance matrix. Since the model space is several times larger than the ensemble size, localization methods are commonly used in order to mitigate the impact of sampling errors (spurious correlations). For example, the local ensemble transform Kalman filter

[13] performs the assimilation of observations locally in order to avoid the impact of analysis innovations coming from distant components.

We next review these traditional approaches.

### 2.1 Three-dimensional variational data assimilation

The three dimensional variational (3D-Var) DA approach processes the observations successively at times . The background state (i.e., the prior state estimate at time ) is given by the model forecast started from the previous analysis (i.e., the best state estimate at ):

 (3) xbi=Mti−1→ti(xai−1).

The discrepancy between the model state and observations at time , together with the departure of the state from the model forecast , are measured by the 3D-Var cost function

 (4a) J(xi)=12∥xi−xbi∥2B−1i+12∥H(xi)−yi∥2R−1i. When both the background and observation errors are Gaussian the function (4a) equals the negative logarithm of the posterior probability density function given by Bayes’ theorem. While in principle a different background covariance matrix should be used at each assimilation time, in practice the same matrix is re-used throughout the assimilation window, Bi=B,i=1,…,N. The 3D-Var analysis is the maximum a-posteriori (MAP) estimator, and is computed as the state that minimizes (4a) (4b) xai=argminxiJ(xi).

### 2.2 Four-dimensional variational data assimilation

Strong-constraint four-dimensional variational (4D-Var) DA processes simultaneously all observations at all times within the assimilation window. The control parameters are typically the initial conditions , which uniquely determine the state of the system at all future times under the assumption that the model (1) perfectly represents reality. The background state is the prior best estimate of the initial conditions , and has an associated initial background error covariance matrix . The 4D-Var problem provides the estimate of the true initial conditions as the solution of the following optimization problem

 (5a) xa0= arg\, % minx0  J(x0)subject to (???), (5b) J(x0)=12∥x0−xb0∥2B−10+12N∑i=1∥H(xi)−yi∥2R−1i.

The first term of the sum (5b) quantifies the departure of the solution from the background state at the initial time . The second term measures the mismatch between the forecast trajectory (model solutions ) and observations at all times in the assimilation window. The covariance matrices and need to be predefined, and their quality influences the accuracy of the resulting analysis. Weak-constraint 4D-Var [26] removes the perfect model assumption by allowing a model error .

In this paper we focus on the strong-constraint 4D-Var formulation (5a). The minimizer of (5a) is computed iteratively using gradient-based numerical optimization methods. First-order adjoint models provide the gradient of the cost function [5]

, while second-order adjoint models provide the Hessian-vector product (e.g., for Newton-type methods)

[28]. The methodology for building and using various adjoint models for optimization, sensitivity analysis, and uncertainty quantification is discussed in [27, 6]. Various strategies to improve the the 4D-Var data assimilation system are described in [7]. The procedure to estimate the impact of observation and model errors is developed in [23, 22]. A framework to perform derivative free variational data assimilation using the trust-region framework is given in [25].

### 2.3 The ensemble Kalman filter

The uncertain background state at time is assumed to be normally distributed . Given the new observations at time , the Kalman filter computes the Kalman gain matrix, the analysis state (i.e., the posterior mean), and the analysis covariance matrix (i.e., the posterior covariance matrix) using the following formulas, respectively:

 (6a) Ki = PbiHTi(HiPbiHTi+Ri)−1, (6b) ¯¯¯xai = ¯¯¯xbi+Ki(yi−H(¯¯¯xbi)), (6c) Pai = (I−KiHi)Pbi.

The ensemble Kalman filter describes the background probability density at time

by an ensemble of forecast states The background mean and covariance are estimated from the ensemble:

 (7) ¯¯¯xbi≈1\textscnens\textscnens∑ℓ=1xb⟨ℓ⟩i,Pbi≈1\textscnens−1Xbi(Xbi)T,

where the matrix of state deviations from the mean is

 Xbi=[xb{1}i−¯¯¯xbi,…,xb{\textscnens}i−¯¯¯xbi]≈1√\textscnens−1(Pbi)1/2.

The background observations and their deviations from the mean are defined as:

 ¯¯¯¯hbi=1\textscnens\textscnens∑ℓ=1H(xb⟨ℓ⟩i),Ybi=[H(xb{1}i)−¯¯¯¯hbi,…,H(xb{\textscnens}i)−¯¯¯¯hbi].

Any vector in the subspace spanned by the ensemble members can be represented as

 x=¯¯¯xbi+Xbiw.

The ensemble Kalman filter makes the approximations (7) and computes the Kalman gain matrix (6a) as follows:

 (8a) Ki = XbiSi(Ybi)TR−1i, (8b) Si := ((\textscnens−1)I+(Ybi)TR−1iYbi)−1.

The mean analysis (6b) is:

 (9) ¯¯¯xai = ¯¯¯xbi+XbiSi(Ybi)TR−1i(yi−H(¯¯¯xbi))

Using we have the mean analysis in ensemble space:

 (10) ¯¯¯¯wai = Si(Ybi)TR−1i(yi−H(¯¯¯xbi)).

The analysis error covariance (6c) becomes:

 (11) Pai = XbiSi(Xbi)T.
##### Ensemble square root filter (EnSRF)

The 3D-Var cost function (4a) is formulated with the background error covariance replaced by the forecast ensemble covariance matrix (7). It can be shown that the analysis mean weights (10) are the minimizer of [13]:

 (12)

The local transform ensemble Kalman filter [13] computes the symmetric square root of matrix (8b)

 (13) Si=(\textscnens−1)−1WiWTi,

and obtains the analysis ensemble weights by adding columns of the factors to the mean analysis weight:

 (14) wa⟨ℓ⟩i=¯¯¯¯wai+Wi(:,ℓ),xa⟨ℓ⟩i=¯¯¯xbi+Xbiwa⟨ℓ⟩i.
##### Traditional ensemble Kalman filter (EnKF)

In the traditional version of EnKF the filter (9) is applied to each ensemble member:

 xa⟨ℓ⟩i = xb⟨ℓ⟩i+XbiSi(Ybi)TR−1i(y⟨ℓ⟩i−H(xb⟨ℓ⟩i)),ℓ=1,…,\textscnens,

where are perturbed observations. This leads to the analysis weights

 (15) wa⟨ℓ⟩i= wb⟨ℓ⟩i+Si(Ybi)TR−1i(y⟨ℓ⟩i−H(xb⟨ℓ⟩i))≈ Si((\textscnens−1)wb⟨ℓ⟩i+(Ybi)TR−1i(y⟨ℓ⟩i−H(¯¯¯xbi))),

where the second relation comes from linearizing the observation operator about the background mean state. A comparison of (15) with (10) reveals that the classical EnKF solution (15) obtains the weights of the -th ensemble member by minimizing the cost function:

 (16)

The paper develops next a systematic framework to perform robust DA using Huber and norms. This will ensure that important information coming from outliers is not rejected but used during the quality control.

## 3 Robust 3D-Var data assimilation

### 3.1 L1-norm 3D-Var

Consider the 3D-Var cost function in equation (4a) that penalizes the discrepancy between the model state and observations at time , together with the departure of the state from the model forecast . Function (4a

) represents the negative log-likelihood posterior probability density under the assumption that both the background and the observation errors are Gaussian. In particular, the scaled innovation vector

 (17) zi:=R−1/2i[H(xi)−yi],

which represents scaled observation errors, is assumed to have a normal distribution (each observation error is independent and has a standard normal distribution). We are interested in the case where some observation errors have large deviations. To model this we assume that the scaled observation errors (17) are independent, and each of them has a univariate Laplace distribution:

The univariate Laplace distribution models an exponential decay on both sides away from the mean . Assuming that observation errors are unbiased () and independent the probability distribution of the scaled innovation vector is

 (18)

Under these assumptions the negative log-likelihood posterior function (4a) measures the discrepancy of the model state with observations in the norm. This leads to the following revised cost function:

 (19) J(xi)=12∥xi−xbi∥2B−1i+1λ∥∥R−1/2i[H(xi)−yi]∥∥1.

In this paper we choose , which leads to a cost function (19) that is similar to (4a

). This translates in an assumed variance of each observation error component equal to eight. This is no restriction of generality: the solution process discussed here can be applied to any value of

. For example, one can choose for an error component variance of one.

Following the Alternating Direction Method of Multipliers (ADMM) [4] we use the scaled innovation vector (17) to obtain the -3D-Var problem:

 (20) minJ(xi,zi)=12∥xi−xbi∥2B−1i+12∥zi∥1subject tozi=R−1/2i[H(xi)−yi].

The augmented Lagrangian equation for (20) is given by

 L = 12∥xi−xbi∥2B−1i+12∥zi∥1−λTi[R−1/2i[H(xi)−yi]−zi] +μ2∥R−1/2i[H(xi)−yi]−zi∥22,

where is the Lagrange multiplier and is the penalty parameter (a positive number). Using the identity

 μ2∥∥∥R−1/2i[H(xi)−yi]−zi−λiμ∥∥∥22=μ2∥∥R−1/2i[H(xi)−yi]−zi∥∥22 (22) +12μ∥λi∥22−λTi(R−1/2i[H(xi)−yi]−zi),

the augmented Lagrangian (3.1) can be written as

 (23) L=12∥xi−xbi∥2B−1i+12∥zi∥1−12μ∥λ∥22+μ2∥∥∥R−1/2i[H(xi)−yi]−zi−λiμ∥∥∥22.

The cost function (23) can be iteratively minimized as follows:

1. Initialize , , ,

2. Start with , , , and .

3. Fix , , and , and solve

 x{k+1}i := argminx 12∥∥x−xbi∥∥2B−1i +μ{k}2∥∥ ∥∥R−1/2i[H(x)−yi]−z{k}i−λ{k}iμ{k}∥∥ ∥∥22. To solve (24) we carry out a regular L2-3D-Var minimization of the form (4) (24b) x{k+1}i := argminx 12∥∥x−xbi∥∥2B−1i+12∥∥H(x)−y{k}i∥∥2R{k}−1i, but with modified observations and a scaled covariance matrix: (24c) y{k}i:=yi+R1/2i⎛⎝z{k}i+λ{k}iμ{k}⎞⎠,R{k}i:=Ri/μ{k},
4. Fix , , and , and solve

 (25a) z{k+1}i:=argminz∥z∥1+μ{k}2∥∥ ∥∥d{k+1}i−z−λ{k}iμ{k}∥∥ ∥∥22, where (25b) d{k+1}i:=R−1/2i[H(x{k+1}i)−yi]. The above minimization subproblem can be solved by using the shrinkage procedure defined in Algorithm 1: (25c) z{k+1}i:=\textscLONEShrinkage(μ{k};d{k+1}i;λ{k}i).
5. Update :

 λ{k+1}i:=λ{k}i−d{k+1}i+z{k+1}i.
6. Update :

 μ{k+1}:=ρμ{k},ρ>1.

### 3.2 Huber-norm 3D-Var

Using the norm throughout spoils the smoothness properties near the mean. The smoothness property of norm near the mean is highly desirable and can be retained by using Huber norm. The Huber norm treats the errors using norm in the vicinity of the mean, and using norm far from the mean [16]. From a statistical perspective, small scaled observation errors (17) are assumed to have independent standard normal distributions, while large scaled observation errors are assumed to have independent Laplace distributions (18). The good properties of the traditional 3D-Var are preserved when the data is reasonable, but outliers are treated with the more robust norm.

The 3D-Var problem in the Huber norm reads:

 (26) minJ(xi,zi)=12∥xi−xbi∥2B−1i+12∥zi∥\textschub subject tozi=R−1/2i[H(xi)−yi],

where

 (27) ∥zi∥\textschub=m∑ℓ=1g\textschub(zi,ℓ),g\textschub(a)={a2/2,|a|≤τ|a|−1/2,|a|>τ.

Here is the -th entry of the vector . The threshold

represents the number of standard deviations after which we switch from

to norm.

#### 3.2.1 ADMM solution for Huber-3D-Var

The Huber norm minimization problem (26) can be solved using the ADMM framework in a manner similar to the one described in Section 3.1. The only difference is in Step 3: after fixing , , and we do not solve the -norm problem (25a). Rather, we solve the Huber-norm problem

 (28a) z{k+1}i:=argminz∥z∥\textschub+μ{k}2∥∥ ∥∥d{k+1}i−z−λ{k}iμ{k}∥∥ ∥∥22, where d{k+1}i is defined in (25b). The closed-form solution of (28a) is given by the Huber shrinkage procedure defined in Algorithm 2:

#### 3.2.2 Half-quadratic optimization solution for Huber-3D-Var

An alternative approach to carry out the minimization (26) is to reformulate it as a multiplicative half-quadratic optimization problem [21]. We construct an augmented cost function which involves the auxiliary variable

 (29a) where ψ is a dual potential function determined by using the theory of convex conjugacy such that (29b) J(xi)=minuiA(xi,ui).

The minimizer of is calculated using alternate minimization. Let the solution at the end of iteration read . At iteration we proceed as follows.

1. First calculate where is fixed such, that

 A(x{k}i,u{k+1}i)≤A(x{k}i,u),∀u.

This amounts to finding according to

 (30a) u{k+1}i=σ(R−1/2i[H(x{k}i)−yi]) (where the function σ is applied to each element of the argument vector). For Huber regularization (30b) σ(a)={1,|a|≤τ;τ/|a|,|a|>τ.
2. Next, calculate where is fixed, such that

 A(x{k+1}i,u{k+1}i)≤A(x,u{k+1}i),∀x.

This is achieved by solving the following minimization problem:

 x{k+1}i = argminxi  12∥xi−xbi∥2B−1i+12m∑ℓ=1(u{k+1}i,ℓz2i,ℓ/2) = argminxi  12∥xi−xbi∥2B−1i +12(R−1/2i[H(xi)−yi])Tdiag(u{k+1}i/2)(R−1/2i[H(xi)−yi]).

This minimization problem is equivalent to solving a regular -3D-Var problem

 (31)

with a modified observation error covariance

 (32) R{k+1}i:=R1/2i⋅diag(2/u{k+1}i)⋅R1/2i.

For the case where is diagonal equation (32) divides each variance by . The smaller the entry is the lower the weight given to observation in the 3D-Var data assimilation procedure becomes.

## 4 Robust 4D-Var

### 4.1 L1-4D-Var

Given the background value of the initial state , the covariance of the initial background errors , the observations at , and the corresponding observation error covariances (), the -4D-Var problem looks for the MAP estimate of the true initial conditions by solving the following constrained optimization problem:

 (33) minx0 J(x0) :=12∥x0−xb0∥2B−10+12N∑i=1∥zi∥1subject to:  zi=R−1/2i[H(xi)−yi],i=1,2,⋯,N,  xi=Mi−1,i(xi−1),i=1,2,⋯,N.

The augmented Lagrangian associated with (33) reads:

 L = 12∥∥x0−xb0∥∥2B−10+12N∑i=1∥zi∥1 −N∑i=1θTi(xi−Mi−1,i(xi−1))+ν2N∑i=1∥xi−Mi−1,i(xi−1)∥2P−1i −N∑i=1λTi(R−1/2i[H(xi)−yi]−zi)+μ2N∑i=1∥∥R−1/2i[H(xi)−yi]−zi∥∥22,

where ’s are model error scaling matrices.

Using the identity (22) the Lagrangian (4.1) becomes

 L = 12∥∥x0−xb0∥∥2B−10+12N∑i=1∥zi∥1 −N∑i=1θTi(xi−Mi−1,i(xi−1))+ν2N∑i=1∥xi−Mi−1,i(xi−1)∥2P−1i +μ2N∑i=1∥∥R−1/2i[H(xi)−yi]−zi−λi/μ∥∥22−12μN∑i=1∥λi∥22

The problem (33) can be solved by alternating minimization of the Lagrangian (4.1), as follows:

1. Initialize .

2. Run the forward model starting from to obtain , .

3. Set , , for , .

4. Fix and , and minimize the Lagrangian to solve for :

 x{k+1}0 := argmin L, L = +ν2N∑i=1∥xi−Mi−1,i(xi−1)∥2P−1i +μ{k}2N∑i=1∥∥R−1/2i[H(xi)−yi]−z{k}i−λ{k}i/μ{k}∥∥22

This is equivalent to solving the traditional -4D-Var problem

 x{k+1}0:=argmin J(x0) = 12∥x0−xb0∥2B−10+12N∑i=1∥H(xi)−˜yi∥2˜R−1i subject toxi+1 = Mi,i+1(xi),i=0,1,2,⋯,N−1,

with the modified observation vectors and observation error covariances (24c).

5. Run the forward model starting from to obtain , .

6. Fix and , and find , as

 z{k+1}