Spectral estimates for saddle point matrices arising in weak constraint four-dimensional variational data assimilation

21 August 2019 ∙ Ieva Daužickaitė et al. ∙ University of Reading

We consider the large sparse symmetric linear systems of equations that arise in the solution of weak constraint four-dimensional variational data assimilation. These systems can be written as saddle point systems with a 3x3 block structure, but block eliminations can be performed to reduce them to saddle point systems with a 2x2 block structure, or further to symmetric positive definite systems. In this paper, we analyse how sensitive the spectra of these matrices are to the number of observations of the underlying dynamical system. We also obtain bounds on the eigenvalues of the matrices. Numerical experiments are used to confirm the theoretical analysis and bounds.


1 Introduction

Data assimilation estimates the state of a dynamical system by combining observations of the system with a prior estimate. The latter is called the background state and is usually an output of a numerical model that simulates the dynamics of the system. The impact that the observations and the background state have on the state estimate depends on their errors, whose statistical properties we assume are known. Data assimilation is used to produce initial conditions in numerical weather prediction (NWP) [22, 39], as well as in other areas, e.g. flood forecasting [7], research into atmospheric composition [11], and neuroscience [27]. In operational applications, the process is made more challenging by the size of the system: the numerical model may operate on a very large number of state variables and a very large number of observations may be incorporated [28, 23]. Moreover, there is usually a constraint on the time that can be spent on the calculations.

The solution, called the analysis, is obtained by combining the observations and the background state in an optimal way. One approach is to solve a weighted least-squares problem, which requires minimising a cost function. An active research topic in this area is the weak constraint four-dimensional variational (4D-Var) data assimilation method [42, 43, 10, 5, 13, 16, 14]. It is employed in the search for the states of the system over a time period, called the assimilation window. This method uses a cost function that is formulated under the assumption that the numerical model is not perfect; it penalises the weighted discrepancies between the analysis and the observations, between the analysis and the background state, and between the analysis and the trajectory given by integrating the dynamical model.

Effective minimisation techniques need evaluations of the cost function and its gradient, which involve expensive operations with the dynamical model and its linearised variant. This makes a direct minimisation impractical in operational applications. One way to approximate the minimum of the weak constraint 4D-Var cost function is to use a Gauss-Newton method [17], which requires minimising a sequence of linearised quadratic cost functions and using their minima to update the state estimate. Each state estimate update is found by solving a sparse symmetric linear system with an iterative method [33].

To increase the potential of using parallel computations when computing the state update with weak constraint 4D-Var, Fisher and Gürol [13] introduced a symmetric 3x3 block saddle point formulation. The resulting large linear systems are solved using Krylov subspace solvers [14, 33, 3], whose convergence is affected by the spectra of the coefficient matrices. We derive bounds for the eigenvalues of the 3x3 block matrix using the work of Rusten and Winther [32]. Also, inspired by the practice in solving saddle point systems that arise from interior point methods [18, 26], we reduce the 3x3 block system to a 2x2 block saddle point formulation and derive eigenvalue bounds for this system. We also consider a 1x1 block formulation with a positive definite coefficient matrix, which corresponds to the standard method [42, 43]. Some of the blocks in the 3x3 and 2x2 block saddle point coefficient matrices, and in the 1x1 block positive definite coefficient matrix, depend on the available observations of the dynamical system. We present a novel examination of how adding new observations influences the spectra of these coefficient matrices.

In Section 2, we formulate the data assimilation problem and introduce weak constraint 4D-Var with the 3x3 and 2x2 block saddle point formulations and the 1x1 block symmetric positive definite formulation. Eigenvalue bounds for the saddle point and positive definite matrices, and results on how the extreme eigenvalues and the bounds depend on the number of observations, are presented in Section 3. Section 4 illustrates the theoretical considerations using numerical examples, and concluding remarks and future directions are presented in Section 5.

2 Variational Data Assimilation

The state of the dynamical system of interest at times t_0 < t_1 < ... < t_N is represented by the state vectors x_0, x_1, ..., x_N, with x_i ∈ R^n. A nonlinear model ℳ_i that is assumed to have errors describes the transition from the state at time t_{i-1} to the state at time t_i, i.e.

(1)   x_i = ℳ_i(x_{i-1}) + η_i,    i = 1, ..., N,

where η_i represents the model error at time t_i. It is further assumed that the model errors are Gaussian with zero mean and covariance matrix Q_i, and that they are uncorrelated in time, i.e. there is no relationship between the model errors at different times. In NWP, the model comes from the discretization of the partial differential equations that describe the flow and thermodynamics of a stratified multiphase fluid in interaction with radiation [22]. It also involves parameters that characterize processes arising at spatial scales that are smaller than the distance between the grid points [31]. Errors due to the discretization of the equations, errors in the boundary conditions, inaccurate parameters etc. are components of the model error [19].

The background information about the state at time t_0 is denoted by x_0^b. It usually comes from a previous short range forecast and is chosen to be the first guess of the state. It is assumed that the background term has errors that are Gaussian with zero mean and covariance matrix B.

Observations of the dynamical system at time t_i are given by y_i. In NWP, there are considerably fewer observations than state variables. Also, there may be indirect observations of the variables in the state vector and a comparison is obtained by mapping the state variables to the observation space using a nonlinear operator ℋ_i. For example, satellite observations used in NWP provide top of the atmosphere radiance data, whereas the model operates on different meteorological variables, e.g. temperature, pressure, wind etc. [1]. Hence, values of meteorological variables are transformed into top of the atmosphere radiances in order to compare the model output with the observations. In this case, the operator ℋ_i is nonlinear and complicated. Approximations made when mapping the state variables to the observation space, different spatial and temporal scales between the model and some observations (observations may give information at a finer resolution than the model), and pre-processing, or quality control, of the observations (see, e.g. Section 5.8 of Kalnay [22]) comprise the representativity error [21]. The observation error is made up of the representativity error combined with the error arising due to the limited precision of the measurements. It is assumed to be Gaussian with zero mean and covariance matrix R_i. The observation errors are assumed to be uncorrelated in time [23].

2.1 Weak constraint 4D-Var

In weak constraint 4D-Var, the analysis is obtained by minimising the following nonlinear cost function

(2)   J(x_0, x_1, ..., x_N) = 1/2 (x_0 - x_0^b)^T B^{-1} (x_0 - x_0^b)
                            + 1/2 Σ_{i=0}^{N} (y_i - ℋ_i(x_i))^T R_i^{-1} (y_i - ℋ_i(x_i))
                            + 1/2 Σ_{i=1}^{N} (x_i - ℳ_i(x_{i-1}))^T Q_i^{-1} (x_i - ℳ_i(x_{i-1})).

This cost function is referred to as the state control variable formulation. Here the control variables are defined as the variables with respect to which the cost function is minimised, i.e. x_0, x_1, ..., x_N in (2). Choosing different control variables and obtaining different formulations of the cost function is possible [42]. If the model is assumed to have no errors (i.e. η_i = 0 for every i), the cost function simplifies as the last term in (2) is removed; this is called strong constraint 4D-Var. Rejecting this perfect model assumption and using weak constraint 4D-Var may lead to a better analysis [43].

Iterative gradient-based optimisation methods are used in practical data assimilation [40, 23]. These require the cost function and its gradient to be evaluated at every iteration. In operational applications, integrating the model over the assimilation window to evaluate the cost function is computationally expensive. The gradient is obtained by the adjoint method (see, e.g., Section 2 of Lawless [23] and Section 3.2 of Talagrand [40] for an introduction), which is a few times more computationally expensive than evaluating the cost function. This makes the minimisation of the nonlinear weak-constraint 4D-Var cost function impractical. Hence, approximations have to be made. We introduce such an approach in the next section.

2.2 Incremental formulation

Minimisation of the 4D-Var cost function in an operational setting is made feasible by employing an iterative Gauss-Newton method, as first proposed by Courtier et al. [8] for the strong constraint 4D-Var. In this approach, the solution of the weak constraint 4D-Var problem is approximated by solving a sequence of linearised problems, i.e. the (l+1)-st approximation of the state is

(3)   x_i^{(l+1)} = x_i^{(l)} + δx_i^{(l)},    i = 0, 1, ..., N,

where the increment δx_i^{(l)} is obtained as the minimiser of the linearised cost function

(4)   J^δ(δx_0^{(l)}, ..., δx_N^{(l)}) = 1/2 (δx_0^{(l)} - c_0^{(l)})^T B^{-1} (δx_0^{(l)} - c_0^{(l)})
                                       + 1/2 Σ_{i=0}^{N} (H_i δx_i^{(l)} - d_i^{(l)})^T R_i^{-1} (H_i δx_i^{(l)} - d_i^{(l)})
                                       + 1/2 Σ_{i=1}^{N} (δx_i^{(l)} - M_i δx_{i-1}^{(l)} - c_i^{(l)})^T Q_i^{-1} (δx_i^{(l)} - M_i δx_{i-1}^{(l)} - c_i^{(l)}),

where c_0^{(l)} = x_0^b - x_0^{(l)}, d_i^{(l)} = y_i - ℋ_i(x_i^{(l)}), c_i^{(l)} = ℳ_i(x_{i-1}^{(l)}) - x_i^{(l)} (as in (1)), and M_i and H_i are the model ℳ_i and the observation operator ℋ_i, respectively, linearised at the current approximation of the state. Minimisation of (4) is called the inner loop. The l-th outer loop consists of updating the approximation of the state (3), linearising the model ℳ_i and the observation operator ℋ_i, and computing the values c_0^{(l)}, c_i^{(l)} and d_i^{(l)}. This cost function is quadratic, which allows the use of effective minimisation techniques, such as conjugate gradients (see Chapter 5 of Nocedal and Wright [29]). In NWP, the computational cost of minimising the 4D-Var cost function is reduced by using a version of the inner loop cost function that is defined for a model with lower spatial resolution, i.e. on a coarser grid [12]. We do not consider such an approach in the subsequent work, because our results on the change of the spectra of the coefficient matrices and the bounds (that are introduced in the following section) hold for models with any spatial resolution.
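To make the structure of this outer/inner iteration concrete, the following minimal sketch applies the same Gauss-Newton idea to a toy nonlinear least-squares problem. The residual f, the Jacobian J and all numerical values are illustrative assumptions, not the 4D-Var operators, and the inner solve is done directly here, whereas in data assimilation it is performed iteratively.

```python
# Minimal Gauss-Newton sketch: each outer loop relinearises the problem and
# the inner step minimises the resulting quadratic cost.
import numpy as np

def f(x):
    # toy nonlinear residual whose root we seek
    return np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**3 - 5.0])

def J(x):
    # its Jacobian, playing the role of the linearised operators
    return np.array([[2.0 * x[0], 1.0], [1.0, 3.0 * x[1]**2]])

x = np.array([1.0, 1.0])                     # first guess ("background")
for outer in range(10):                      # outer loop: relinearise
    fx, Jx = f(x), J(x)
    if np.linalg.norm(fx) < 1e-12:
        break
    # inner loop: minimise 0.5*||Jx dx + fx||^2, i.e. solve the Gauss-Newton
    # normal equations for the increment dx
    dx = np.linalg.solve(Jx.T @ Jx, -Jx.T @ fx)
    x = x + dx                               # state update, cf. (3)

print(x, f(x))                               # f(x) should be close to zero
```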

For ease of notation, we introduce the following four-dimensional (in the sense that they contain information in space and time) vectors and matrices. These vectors and matrices are indicated in bold:

x = (x_0^T, x_1^T, ..., x_N^T)^T,   δx = (δx_0^T, δx_1^T, ..., δx_N^T)^T,
b = (c_0^T, c_1^T, ..., c_N^T)^T,   d = (d_0^T, d_1^T, ..., d_N^T)^T,

where c_i and d_i are as defined in (4). We also define the matrices

L = [  I                   ]
    [ -M_1   I             ]
    [       ...    ...     ]
    [             -M_N   I ],        H = diag(H_0, H_1, ..., H_N),

where I is the n x n identity matrix, and M_i and H_i are the linearised model and observation operators. We define the block diagonal covariance matrices

D = diag(B, Q_1, ..., Q_N)   and   R = diag(R_0, R_1, ..., R_N).

The state update (3) may then be written as

x^{(l+1)} = x^{(l)} + δx^{(l)},

and the quadratic cost function (4) becomes

(5)   J^δ(δx) = 1/2 (L δx - b)^T D^{-1} (L δx - b) + 1/2 (H δx - d)^T R^{-1} (H δx - d),

where δx = δx^{(l)}. We omit the superscript for the outer iteration in the subsequent discussions. The minimum of (5) can be found by solving linear systems. We discuss different formulations of these in the next three subsections.
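As an illustration of the block quantities just defined, the following sketch assembles toy versions of L, H, D, R, b and d and evaluates the quadratic cost (5). All dimensions and matrices are small random placeholders chosen for readability, not realistic data assimilation operators.

```python
# Assemble toy four-dimensional block quantities and evaluate the cost (5).
import numpy as np

rng = np.random.default_rng(0)
n, N, q = 4, 3, 2                    # state size, time steps, obs per time

M = [rng.standard_normal((n, n)) * 0.5 for _ in range(N)]       # M_1, ..., M_N
H_blocks = [rng.standard_normal((q, n)) for _ in range(N + 1)]  # H_0, ..., H_N

# L is block lower bidiagonal with identities on the diagonal and -M_i below
L = np.eye(n * (N + 1))
for i in range(1, N + 1):
    L[i * n:(i + 1) * n, (i - 1) * n:i * n] = -M[i - 1]

H = np.zeros((q * (N + 1), n * (N + 1)))
for i, Hi in enumerate(H_blocks):
    H[i * q:(i + 1) * q, i * n:(i + 1) * n] = Hi

def spd(k):
    X = rng.standard_normal((k, k))
    return X @ X.T + k * np.eye(k)

# Block diagonal covariances D = diag(B, Q_1, ..., Q_N), R = diag(R_0, ..., R_N)
D = np.zeros((n * (N + 1),) * 2)
for i in range(N + 1):
    D[i * n:(i + 1) * n, i * n:(i + 1) * n] = spd(n)   # B for i = 0, Q_i otherwise
R = np.zeros((q * (N + 1),) * 2)
for i in range(N + 1):
    R[i * q:(i + 1) * q, i * q:(i + 1) * q] = spd(q)

b = rng.standard_normal(n * (N + 1))   # residuals (c_0, c_1, ..., c_N)
d = rng.standard_normal(q * (N + 1))   # innovations (d_0, ..., d_N)

def cost(dx):
    """Quadratic cost (5): 0.5*||L dx - b||^2_{D^-1} + 0.5*||H dx - d||^2_{R^-1}."""
    r1, r2 = L @ dx - b, H @ dx - d
    return 0.5 * r1 @ np.linalg.solve(D, r1) + 0.5 * r2 @ np.linalg.solve(R, r2)

print(cost(np.zeros(n * (N + 1))))
```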

2.2.1 3x3 block saddle point formulation

In pursuance of exploiting parallel computations in data assimilation, Fisher and Gürol [13] proposed obtaining the state increment δx by solving a saddle point system (see also Freitag and Green [14]). New variables are introduced:

(6)   λ = D^{-1} (b - L δx),
(7)   μ = R^{-1} (d - H δx).

The gradient of the cost function (5) with respect to δx provides the optimality constraint

(8)   L^T λ + H^T μ = 0.

Multiplying (6) by D and (7) by R, together with (8), yields a coupled linear system of equations:

(9)   A (λ^T, μ^T, δx^T)^T = (b^T, d^T, 0^T)^T,

where the coefficient matrix is given by

(10)  A = [ D    0    L ]
          [ 0    R    H ]
          [ L^T  H^T  0 ].

A is a sparse symmetric indefinite saddle point matrix that has a 3x3 block form. Such systems are explored in the optimization literature [18, 25, 26]. When iteratively solving these systems, it is usually assumed that calculations involving the blocks on the diagonal are computationally expensive, while the off-diagonal blocks are cheap to apply and easily approximated. However, in our application, operations with the diagonal blocks are relatively cheap and the off-diagonal blocks are computationally expensive, particularly because of the integrations of the model and its adjoint in L and L^T.

Recall that the sizes of the blocks R, H and H^T depend on the number of observations. Thus, the size of A and possibly some of its characteristics are also affected by the number of observations. The saddle point systems that arise in different outer loops vary in the right-hand sides and the linearisation states of L and H.

Because of the memory requirements of sparse direct solvers, they cannot be used to solve the 3x3 block saddle point systems that arise in an operational setting. Iterative solvers (such as MINRES [30], GMRES [34]) need to be used. These Krylov subspace methods require matrix-vector products with A at each iteration. Note that the matrix-vector product A v, v = (v_1^T, v_2^T, v_3^T)^T, involves multiplying D and L^T by v_1, R and H^T by v_2, and L and H by v_3. These may be performed in parallel. Furthermore, multiplication of each component of each block matrix with the respective part of the vector can be performed in parallel. The possibility of multiplying a vector with the blocks in L and L^T in parallel is particularly attractive, because the expensive operations involving the linearised model M_i and its adjoint M_i^T can be done at the same time for every i.
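The block structure of (10) and the independence of the six block products in its matrix-vector product can be illustrated with dense toy blocks, as in the following sketch; the stand-in matrices are assumptions, and in practice the blocks would only be available as operators.

```python
# Sketch of the 3x3 saddle point matrix (10) and its blockwise matvec.
import numpy as np

rng = np.random.default_rng(1)
nx, nobs = 8, 5                                 # toy state and observation sizes
D = np.diag(rng.uniform(1.0, 2.0, nx))          # stand-in for D
R = np.diag(rng.uniform(0.1, 0.5, nobs))        # stand-in for R
L = np.eye(nx) - np.tril(rng.standard_normal((nx, nx)) * 0.1, -1)  # stand-in for L
H = rng.standard_normal((nobs, nx))             # stand-in for H

A = np.block([[D, np.zeros((nx, nobs)), L],
              [np.zeros((nobs, nx)), R, H],
              [L.T, H.T, np.zeros((nx, nx))]])

def A_matvec(v):
    """Apply (10) blockwise: D@v1, L.T@v1, R@v2, H.T@v2, L@v3, H@v3 are
    independent products, so they could be computed in parallel."""
    v1, v2, v3 = v[:nx], v[nx:nx + nobs], v[nx + nobs:]
    return np.concatenate([D @ v1 + L @ v3,
                           R @ v2 + H @ v3,
                           L.T @ v1 + H.T @ v2])

v = rng.standard_normal(2 * nx + nobs)
print(np.allclose(A @ v, A_matvec(v)))   # True: block form matches the assembled matrix
```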

2.2.2 2x2 block saddle point formulation

The saddle point systems with 3x3 block coefficient matrices that arise from interior point methods are often reduced to 2x2 block systems [18, 26]. We now explore using this approach in the weak constraint 4D-Var setting.

Multiplying equation (6) by D and eliminating μ in (8) using (7) gives the following equivalent system of equations

(11)  Â [ λ  ]   [ b              ]
        [ δx ] = [ -H^T R^{-1} d  ],

where

(12)  Â = [ D    L             ]
          [ L^T  -H^T R^{-1} H ].

The reduced matrix Â is a sparse symmetric indefinite saddle point matrix with a 2x2 block form. Unlike the 3x3 block matrix A, its size is independent of the number of observations. Â involves the matrix R^{-1}, which is usually available in data assimilation applications. The computationally most expensive blocks L and L^T are again the off-diagonal blocks.

Solving (11) in parallel might be less appealing than (9): for a Krylov subspace method, the block H^T R^{-1} H need not be formed separately, i.e. only operators to perform the matrix-vector products with H, R^{-1} and H^T need to be stored. Hence, a matrix-vector product Â v, v = (v_1^T, v_2^T)^T, requires multiplying D and L^T by v_1, and L and H by v_2 (which may be done in parallel), and subsequently multiplying H v_2 by R^{-1}, followed by a multiplication with H^T. Hence, the cost of matrix-vector products for the 3x3 and 2x2 block formulations differs in the cost of matrix-vector products with R and R^{-1}, and the 2x2 block formulation requires some sequential calculations. However, notice that the expensive calculations that involve applying the operators M_i and M_i^T (the linearised model and its adjoint) can still be performed in parallel.
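A sketch of this operator form of the 2x2 matrix-vector product, in which H^T R^{-1} H is never formed, might look as follows; the small random blocks are again toy stand-ins.

```python
# Operator-form matvec with the 2x2 matrix (12), without forming H^T R^{-1} H.
import numpy as np

rng = np.random.default_rng(2)
nx, nobs = 8, 5
D = np.diag(rng.uniform(1.0, 2.0, nx))
Rdiag = rng.uniform(0.1, 0.5, nobs)              # diagonal R stored as a vector
L = np.eye(nx) - np.tril(rng.standard_normal((nx, nx)) * 0.1, -1)
H = rng.standard_normal((nobs, nx))

def Ahat_matvec(v):
    v1, v2 = v[:nx], v[nx:]
    # D@v1, L.T@v1, L@v2 and H@v2 are independent (parallelisable) ...
    top = D @ v1 + L @ v2
    w = H @ v2
    # ... whereas R^{-1}(H v2) and the final H^T(...) must follow sequentially.
    bottom = L.T @ v1 - H.T @ (w / Rdiag)
    return np.concatenate([top, bottom])

Ahat = np.block([[D, L], [L.T, -H.T @ np.diag(1.0 / Rdiag) @ H]])
v = rng.standard_normal(2 * nx)
print(np.allclose(Ahat @ v, Ahat_matvec(v)))     # True
```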

2.2.3 1x1 block formulation

The 2x2 block system can be further reduced to a 1x1 block system, that is, to the standard formulation (see, e.g., Trémolet [42] and El-Said [10] for a more detailed consideration):

(L^T D^{-1} L + H^T R^{-1} H) δx = L^T D^{-1} b + H^T R^{-1} d.

Observe that the coefficient matrix

(13)  S = L^T D^{-1} L + H^T R^{-1} H

is the negative Schur complement of D in Â. The matrix S is block tridiagonal and symmetric positive definite, hence the conjugate gradient method by Hestenes and Stiefel [20] can be used. The computations with the linearised model M_i in L at every time step can again be performed in parallel. However, the adjoint of the linearised model in L^T can only be applied after the computations with the model are finished, thus limiting the potential for parallelism.
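The 1x1 formulation can be solved matrix-free with the conjugate gradient method, as in the following sketch; the blocks are toy stand-ins and S is applied only through matrix-vector products.

```python
# Solve the 1x1 formulation (13) with CG, applying S only through matvecs.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(3)
nx, nobs = 8, 5
Ddiag = rng.uniform(1.0, 2.0, nx)            # stand-in for D (diagonal here)
Rdiag = rng.uniform(0.1, 0.5, nobs)          # stand-in for a diagonal R
L = np.eye(nx) - np.tril(rng.standard_normal((nx, nx)) * 0.1, -1)
H = rng.standard_normal((nobs, nx))
b = rng.standard_normal(nx)
d = rng.standard_normal(nobs)

def S_matvec(dx):
    # L @ dx (model integrations) parallelises over time steps; the adjoint
    # L.T can only be applied once D^{-1}(L @ dx) is available.
    return L.T @ ((L @ dx) / Ddiag) + H.T @ ((H @ dx) / Rdiag)

S = LinearOperator((nx, nx), matvec=S_matvec)
rhs = L.T @ (b / Ddiag) + H.T @ (d / Rdiag)  # right-hand side of the 1x1 system
dx, info = cg(S, rhs)
print(info, np.linalg.norm(S_matvec(dx) - rhs))  # info == 0 and a small residual
```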

3 Eigenvalues of the saddle point formulations

The rate of convergence of Krylov subspace iterative solvers for symmetric systems depends on the spectrum of the coefficient matrix (see, for example, Section 10 in the survey paper [3] and Lectures 35 and 38 in the textbook [41] for a discussion). Simoncini and Szyld [37] have shown that any eigenvalues of the saddle point systems that lie close to the origin can cause the iterative solver MINRES to stagnate for a number of iterations while the rate of convergence can improve if the eigenvalues are clustered.

In the following, we examine how the eigenvalues of the 3x3, 2x2 and 1x1 block matrices A, Â and S change when new observations are added. This is done by considering the shift in the extreme eigenvalues of these matrices, that is, the smallest and largest positive and negative eigenvalues. We then present bounds for the eigenvalues of these matrices. The bounds for the spectrum of A are obtained by exploiting the earlier work of Rusten and Winther [32]. We derive bounds for the intervals that contain the spectra of Â and S.

3.1 Preliminaries

In order to determine how changing the number of observations influences the spectra of A, Â and S, we explore the extreme singular values and eigenvalues of some blocks in A, Â and S. We state two theorems that we will require. Here we employ the notation λ_k(·) to denote the k-th largest eigenvalue of a matrix and the subscripts min and max are used to denote the smallest and largest eigenvalues, respectively.

Theorem 1 (See Section 8.1.2 of Golub and Van Loan [15]).

If C and C + E are m x m Hermitian matrices, then

λ_k(C) + λ_min(E) ≤ λ_k(C + E) ≤ λ_k(C) + λ_max(E),   k = 1, ..., m.

Theorem 2 (Cauchy’s Interlace Theorem, see Theorem 4.2 in Chapter 4 of Stewart and Sun [38]).

If C is an m x m Hermitian matrix and C_{m-1} is an (m-1) x (m-1) principal submatrix of C (a matrix obtained by eliminating a row and the corresponding column of C), then

λ_{k+1}(C) ≤ λ_k(C_{m-1}) ≤ λ_k(C),   k = 1, ..., m-1.

In the following lemmas we describe how the smallest and largest singular values of (L^T H^T)^T (here L and H are as defined in Section 2.2) and the extreme eigenvalues of the observation error covariance matrix R change when new observations are introduced. The same is done for the largest eigenvalue of H^T R^{-1} H assuming that R is diagonal. In these lemmas the subscript p denotes the number of available observations and the subscript p+1 indicates that a new observation is added to the system with p observations, i.e. the matrices H_p and R_p correspond to a system with p observations and H_{p+1} and R_{p+1} correspond to the system with an additional observation. We write

H_{p+1} = [ H_p ]        R_{p+1} = [ R_p  r ]
          [ h   ],                 [ r^T  ρ ],

where the row vector h, the vector r and the scalar ρ > 0 correspond to the new observation.

Lemma 1.

Let σ_min(Ψ_p) and σ_max(Ψ_p) be the smallest and largest singular values of Ψ_p = (L^T  H_p^T)^T, and σ_min(Ψ_{p+1}) and σ_max(Ψ_{p+1}) be the smallest and largest singular values of Ψ_{p+1} = (L^T  H_{p+1}^T)^T. Then

σ_min(Ψ_p) ≤ σ_min(Ψ_{p+1})   and   σ_max(Ψ_p) ≤ σ_max(Ψ_{p+1}),

i.e. the smallest and largest singular values of (L^T H^T)^T increase or are unchanged when new observations are added.

Proof.

We consider the eigenvalues of Ψ_p^T Ψ_p and Ψ_{p+1}^T Ψ_{p+1}, which are the squares of the singular values of Ψ_p and Ψ_{p+1}, respectively (see Section 2.4.2 of Golub and Van Loan [15]). We can write

Ψ_{p+1}^T Ψ_{p+1} = L^T L + H_{p+1}^T H_{p+1} = L^T L + H_p^T H_p + h^T h = Ψ_p^T Ψ_p + h^T h.

Then by Theorem 1,

λ_min(Ψ_{p+1}^T Ψ_{p+1}) ≥ λ_min(Ψ_p^T Ψ_p) + λ_min(h^T h),

and since h^T h is a rank 1 symmetric positive semidefinite matrix, λ_min(h^T h) = 0, so that λ_min(Ψ_{p+1}^T Ψ_{p+1}) ≥ λ_min(Ψ_p^T Ψ_p).

The proof for the largest singular values is analogous. ∎
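Lemma 1 is easy to check numerically: appending a row to a matrix cannot decrease its extreme singular values. The following sketch uses random toy matrices.

```python
# Appending a row h to Psi = [L; H] cannot decrease its extreme singular
# values, since Psi_{p+1}^T Psi_{p+1} = Psi_p^T Psi_p + h^T h.
import numpy as np

rng = np.random.default_rng(4)
nx, p = 6, 4
L = np.eye(nx) - np.tril(rng.standard_normal((nx, nx)) * 0.2, -1)
H_p = rng.standard_normal((p, nx))
h = rng.standard_normal((1, nx))             # the new observation row

Psi_p = np.vstack([L, H_p])
Psi_p1 = np.vstack([L, H_p, h])

s_p = np.linalg.svd(Psi_p, compute_uv=False)
s_p1 = np.linalg.svd(Psi_p1, compute_uv=False)
print(s_p.min() <= s_p1.min(), s_p.max() <= s_p1.max())   # True True
```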

Lemma 2.

Consider the observation error covariance matrices R_p and R_{p+1}. Then

λ_max(R_{p+1}) ≥ λ_max(R_p)   and   λ_min(R_{p+1}) ≤ λ_min(R_p),

i.e. the largest (respectively, smallest) eigenvalue of R increases (respectively, decreases), or is unchanged, when new observations are introduced.

Proof.

When adding an observation, a row and a corresponding column are appended to R_p while the other entries of R_p are unchanged, i.e. R_p is a principal submatrix of R_{p+1}. The result is immediate by applying Theorem 2. ∎

Lemma 3.

If the observation errors are uncorrelated, i.e. R is diagonal, then

λ_max(H_{p+1}^T R_{p+1}^{-1} H_{p+1}) ≥ λ_max(H_p^T R_p^{-1} H_p),

i.e. for diagonal R, the largest eigenvalue of H^T R^{-1} H increases or is unchanged when new observations are introduced.

Proof.

The proof is similar to that of Lemma 1. For diagonal R_{p+1} = diag(R_p, ρ):

H_{p+1}^T R_{p+1}^{-1} H_{p+1} = H_p^T R_p^{-1} H_p + ρ^{-1} h^T h.

Then ρ^{-1} h^T h is a rank 1 symmetric positive semidefinite matrix. Hence, by Theorem 1,

λ_max(H_{p+1}^T R_{p+1}^{-1} H_{p+1}) ≥ λ_max(H_p^T R_p^{-1} H_p) + λ_min(ρ^{-1} h^T h),

and since λ_min(ρ^{-1} h^T h) = 0 the result is proved. ∎
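A corresponding numerical check of Lemma 3 with a diagonal R (toy matrices again):

```python
# For diagonal R, adding an observation adds the positive semidefinite rank-1
# term h^T h / rho to H^T R^{-1} H, so its largest eigenvalue cannot decrease.
import numpy as np

rng = np.random.default_rng(5)
nx, p = 6, 4
H_p = rng.standard_normal((p, nx))
r_p = rng.uniform(0.1, 1.0, p)               # diagonal of R_p
h = rng.standard_normal((1, nx))             # new observation row
rho = 0.3                                    # its error variance

A_p = H_p.T @ np.diag(1.0 / r_p) @ H_p
A_p1 = A_p + (h.T @ h) / rho                 # = H_{p+1}^T R_{p+1}^{-1} H_{p+1}

print(np.linalg.eigvalsh(A_p)[-1] <= np.linalg.eigvalsh(A_p1)[-1] + 1e-12)  # True
```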

Notation

In the following, we use the notation given in Table 1 for the eigenvalues of A, Â and S, and the eigenvalues and singular values of the blocks within them. We use the subscripts min and max to denote the smallest and largest eigenvalues or singular values, respectively; the smallest nonzero singular value of H is also denoted in Table 1. In addition, ||·|| denotes the 2-norm.

Table 1: Notation for the eigenvalues and singular values.

We also use

(14)  λ_min = min{ λ_min(D), λ_min(R) },
(15)  λ_max = max{ λ_max(D), λ_max(R) }.

3.2 Bounds for the 3x3 block formulation

To determine the numbers of positive and negative eigenvalues of A given in (10), we write A as a congruence transformation

A = [ I            0 ] [ Φ   0 ] [ I  Φ^{-1} Ψ ]
    [ Ψ^T Φ^{-1}   I ] [ 0  -S ] [ 0      I    ],

where Φ = diag(D, R), Ψ = (L^T  H^T)^T, S = Ψ^T Φ^{-1} Ψ = L^T D^{-1} L + H^T R^{-1} H as in (13), and I is the identity matrix of the appropriate size. Thus, by Sylvester's law of inertia (see Section 8.1.5 of Golub and Van Loan [15]), A and diag(Φ, -S) have the same inertia, i.e. the same number of positive, negative, and zero eigenvalues. Since the blocks D, R and S are symmetric positive definite matrices, A has n(N+1) + p positive and n(N+1) negative eigenvalues, where p is the number of observations. In the following theorem, we explore how the extreme eigenvalues of A change when new observations are introduced.
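The inertia statement can be verified numerically on a small example; the blocks below are random toy stand-ins, not data assimilation operators.

```python
# With symmetric positive definite D, R (sizes nx, nobs) and nonsingular L,
# the 3x3 matrix A has nx + nobs positive and nx negative eigenvalues.
import numpy as np

rng = np.random.default_rng(6)
nx, nobs = 7, 4

def spd(k):
    X = rng.standard_normal((k, k))
    return X @ X.T + k * np.eye(k)

D, R = spd(nx), spd(nobs)
L = np.eye(nx) - np.tril(rng.standard_normal((nx, nx)) * 0.2, -1)
H = rng.standard_normal((nobs, nx))
A = np.block([[D, np.zeros((nx, nobs)), L],
              [np.zeros((nobs, nx)), R, H],
              [L.T, H.T, np.zeros((nx, nx))]])

eig = np.linalg.eigvalsh(A)
print((eig > 0).sum() == nx + nobs, (eig < 0).sum() == nx)   # True True
```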

Theorem 3.

The smallest and largest negative eigenvalues of A either move away from the origin or are unchanged when new observations are introduced. The same holds for the largest positive eigenvalue, while the smallest positive eigenvalue approaches the origin or is unchanged.

Proof.

Let A_p denote A where R = R_p and H = H_p. To account for an additional observation, a row and a corresponding column are added to A_p, hence A_p is a principal submatrix of A_{p+1}. Let

λ_1(A_{p+1}) ≥ λ_2(A_{p+1}) ≥ ... ≥ λ_{2n(N+1)+p+1}(A_{p+1})

be the eigenvalues of A_{p+1}, and

λ_1(A_p) ≥ λ_2(A_p) ≥ ... ≥ λ_{2n(N+1)+p}(A_p)

be the eigenvalues of A_p. Then by Theorem 2:

λ_{k+1}(A_{p+1}) ≤ λ_k(A_p) ≤ λ_k(A_{p+1}),   k = 1, ..., 2n(N+1)+p.

Recall that A_p has n(N+1)+p positive and n(N+1) negative eigenvalues, while A_{p+1} has n(N+1)+p+1 positive and n(N+1) negative eigenvalues. The interlacing inequalities with k = 1 and k = n(N+1)+p show that the largest positive eigenvalue increases or is unchanged and that the smallest positive eigenvalue decreases or is unchanged, while taking k = n(N+1)+p+1 and k = 2n(N+1)+p shows that the largest and smallest negative eigenvalues decrease or are unchanged, i.e. move away from the origin. ∎

To obtain information on not only how the eigenvalues of A change because of new observations, but also on where the eigenvalues lie when the number of observations is fixed, we formulate intervals for the negative and positive eigenvalues of A.

Theorem 4.

The negative eigenvalues of A lie in the interval

(16)  [ 1/2 (λ_min - sqrt(λ_min^2 + 4 σ_max^2)),   1/2 (λ_max - sqrt(λ_max^2 + 4 σ_min^2)) ],

and the positive eigenvalues lie in the interval

(17)  [ λ_min,   1/2 (λ_max + sqrt(λ_max^2 + 4 σ_max^2)) ],

where λ_min and λ_max are defined in (14) and (15), and σ_min and σ_max are the smallest and largest singular values of Ψ = (L^T  H^T)^T as in Table 1.

Proof.

Lemma 2.1 of Rusten and Winther [32] gives eigenvalue intervals for matrices of the form [ Φ  Ψ; Ψ^T  0 ] with Φ symmetric positive definite. Applying these intervals in the case Φ = diag(D, R) and Ψ = (L^T  H^T)^T yields the required results. ∎
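The intervals (16) and (17) can be checked on a small example as follows; the toy blocks are illustrative and both printed statements should evaluate to True.

```python
# Check the intervals (16)-(17): lambda_min/lambda_max are the extreme
# eigenvalues of diag(D, R) and sigma_min/sigma_max the extreme singular
# values of Psi = [L; H].
import numpy as np

rng = np.random.default_rng(7)
nx, nobs = 7, 4

def spd(k):
    X = rng.standard_normal((k, k))
    return X @ X.T + k * np.eye(k)

D, R = spd(nx), spd(nobs)
L = np.eye(nx) - np.tril(rng.standard_normal((nx, nx)) * 0.2, -1)
H = rng.standard_normal((nobs, nx))
A = np.block([[D, np.zeros((nx, nobs)), L],
              [np.zeros((nobs, nx)), R, H],
              [L.T, H.T, np.zeros((nx, nx))]])

lam = np.concatenate([np.linalg.eigvalsh(D), np.linalg.eigvalsh(R)])
lmin, lmax = lam.min(), lam.max()
sig = np.linalg.svd(np.vstack([L, H]), compute_uv=False)
smin, smax = sig.min(), sig.max()

neg_lo = 0.5 * (lmin - np.sqrt(lmin**2 + 4 * smax**2))
neg_hi = 0.5 * (lmax - np.sqrt(lmax**2 + 4 * smin**2))
pos_lo, pos_hi = lmin, 0.5 * (lmax + np.sqrt(lmax**2 + 4 * smax**2))

eig = np.linalg.eigvalsh(A)
neg, pos = eig[eig < 0], eig[eig > 0]
print(neg_lo <= neg.min() and neg.max() <= neg_hi)   # True
print(pos_lo <= pos.min() and pos.max() <= pos_hi)   # True
```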

We further present two corollaries that describe how the bounds in Theorem 4 change if additional observations are introduced and conclude that the change of the bounds is consistent with the results in Theorem 3.

Corollary 1.

The interval for the positive eigenvalues of A in (17) either expands or is unchanged when new observations are added.

Proof.

First, consider the positive upper bound 1/2 (λ_max + sqrt(λ_max^2 + 4 σ_max^2)). By Lemma 1, σ_max increases or is unchanged when additional observations are included. If λ_max = λ_max(R), the same holds for λ_max (by Lemma 2). If λ_max = λ_max(D), changing the number of observations does not affect λ_max. Hence, the positive upper bound increases or is unchanged.

The positive lower bound λ_min is unaltered if λ_min = λ_min(D). If λ_min = λ_min(R), the bound decreases or is unchanged by Lemma 2. ∎

Corollary 2.

If λ_max = λ_max(D), the upper bound for the negative eigenvalues of A in (16) is either unchanged or moves away from the origin when new observations are added. If λ_min = λ_min(D), the same holds for the lower bound for the negative eigenvalues in (16).

Proof.

The results follow from the facts that λ_max(D) and λ_min(D) do not change if observations are added, whereas σ_min and σ_max increase or are unchanged by Lemma 1. ∎

If λ_max = λ_max(R) or λ_min = λ_min(R), it is unclear how the interval for the negative eigenvalues in (16) changes, because λ_max and λ_min can increase, decrease or be unchanged, and both σ_min and σ_max can increase or be unchanged, so that the changes may move the bounds in opposite directions.

3.3 Bounds for the 2x2 block formulation

Â given in (12) is equal to the following congruence transformation

Â = [ I           0 ] [ D   0 ] [ I  D^{-1} L ]
    [ L^T D^{-1}  I ] [ 0  -S ] [ 0     I     ],

where S = L^T D^{-1} L + H^T R^{-1} H as in (13) and I is the n(N+1) x n(N+1) identity matrix. Then by Sylvester's law, Â has n(N+1) positive and n(N+1) negative eigenvalues. The change of the extreme negative and positive eigenvalues of Â due to the additional observations is analysed in the subsequent theorem. However, the result holds only in the case of uncorrelated observation errors, unlike the general analysis for A in Theorem 3.

Theorem 5.

If the observation errors are uncorrelated, i.e. R is diagonal, then the smallest and largest negative eigenvalues of Â either move away from the origin or are unchanged when new observations are added. Conversely, the smallest and largest positive eigenvalues of Â approach the origin or are unchanged.

Proof.

Matrices D and L do not depend on the number of observations. In the proof of Lemma 3, we have shown that for diagonal R,

H_{p+1}^T R_{p+1}^{-1} H_{p+1} = H_p^T R_p^{-1} H_p + ρ^{-1} h^T h.

Hence, when Â_p denotes Â with R = R_p and H = H_p, we can write

Â_{p+1} = Â_p + E,   E = [ 0   0              ]
                         [ 0  -ρ^{-1} h^T h   ],

where E has one negative and 2n(N+1) - 1 zero eigenvalues. Let

λ_1(Â_{p+1}) ≥ λ_2(Â_{p+1}) ≥ ... ≥ λ_{2n(N+1)}(Â_{p+1})

be the eigenvalues of Â_{p+1}, and

λ_1(Â_p) ≥ λ_2(Â_p) ≥ ... ≥ λ_{2n(N+1)}(Â_p)

be the eigenvalues of Â_p. By Theorem 1,