Efficient computation of matrix-vector products with full observation weighting matrices in data assimilation

09/05/2021 · Guannan Hu, et al. · University of Reading

Recent studies have demonstrated improved skill in numerical weather prediction via the use of spatially correlated observation error covariance information in data assimilation systems. In this case, the observation weighting matrices (inverse error covariance matrices) used in the assimilation may be full matrices rather than diagonal. Thus, the computation of matrix-vector products in the variational minimization problem may be very time-consuming, particularly if the parallel computation of the matrix-vector product requires a high degree of communication between processing elements. Hence, we introduce a well-known numerical approximation method, called the fast multipole method (FMM), to speed up the matrix-vector multiplications in data assimilation. We explore a particular type of FMM that uses a singular value decomposition (SVD-FMM) and adjust it to suit our new application in data assimilation. By approximating a large part of the computation of the matrix-vector product, the SVD-FMM technique greatly reduces the computational complexity compared with the standard approach. We develop a possible new parallelization scheme of the SVD-FMM for our application, which can reduce communication costs. We investigate the accuracy of the SVD-FMM technique in several numerical experiments: we first assess the accuracy using covariance matrices that are created using different correlation functions and lengthscales; then investigate the impact of reconditioning the covariance matrices on the accuracy; and finally examine the feasibility of the technique in the presence of missing observations. We also provide theoretical explanations for some numerical results. Our results show that the SVD-FMM technique has potential as an efficient technique for assimilation of a large volume of observational data within a short time interval.

1 Introduction

In variational data assimilation (e.g., Lorenc et al., 2000; Rawlins et al., 2007), a nonlinear least squares problem is solved, where observations and model forecasts are blended, taking account of their uncertainties. The minimization procedure involves time-consuming computations of large matrix-vector products. In this paper we focus on the matrix-vector products involving the observations. These take the form $\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}$, where $\mathbf{R}^{-1}$ is the inverse of the observation error covariance matrix and $\mathbf{d}$ is the observation-minus-model departure vector (see section 2).

In some practical numerical weather prediction applications, observation errors are assumed to be uncorrelated, resulting in the matrix $\mathbf{R}$ being diagonal. This reduces the number of operations required to compute matrix-vector products in the minimization, and is a pragmatic strategy when the characteristics of the observation uncertainty are not well understood (e.g., Liu and Rabier, 2003). However, a number of idealized and operational studies have shown that there are significant benefits to treating full observation error covariance matrices in terms of analysis information content, analysis accuracy and forecast skill, even when knowledge of the observation error correlations is only approximate (e.g., Healy and White, 2005; Stewart et al., 2008, 2013; Weston et al., 2014; Simonin et al., 2019; Bédard and Buehner, 2020). In particular, implementation of spatial observation error correlations modifies the length-scales of the observation increments computed by the assimilation (e.g., Fowler et al., 2018; Rainwater et al., 2015; Simonin et al., 2019), which may be especially important for multiscale systems such as convection-permitting numerical weather prediction and reanalyses (e.g., Dance et al., 2019; Hu and Franzke, 2020) or coupled ocean-atmosphere systems (e.g., Hu and Franzke, 2017). Furthermore, practical methods have been developed to estimate the observation uncertainty characteristics for a range of observation types (see the reviews by Janjić et al. (2018) and Tandeo et al. (2020)). For example, Doppler radar radial winds, geostationary satellite data and atmospheric motion vectors (AMVs) have been shown to exhibit strong spatial error correlations (Waller et al., 2016a, b; Cordoba et al., 2017; Waller et al., 2019; Honda et al., 2018). We note that in practical applications, the matrix $\mathbf{R}$ is typically treated as block diagonal, with one block per observation type (for a given time). The observation errors between different types of observations are assumed to be uncorrelated.

To give an idea of the expected size of the spatially correlated observation error covariance matrices, we consider geostationary satellite observations from the SEVIRI (Spinning Enhanced Visible and Infrared Imager) instrument (Waller et al., 2016b; Michel, 2018; Schmetz et al., 2002). SEVIRI measures top of the atmosphere radiances in 12 channels, with a 15-minute repeat cycle and at approximately 3 km spatial resolution (excluding the high resolution visible (HRV) channel; Waller et al., 2016b; Schmetz et al., 2002). Each image consists of around 3712 x 3712 pixels for the thermal infra-red and solar channels (Schmetz et al., 2002). Hence, there are a vast number of SEVIRI observations, even for a regional domain. For instance, the number of SEVIRI observations within the domain covered by the operational AROME model from Météo-France is very large (Seity et al., 2011). In operational assimilation applications, to avoid treating spatially correlated observation errors, spatial thinning of SEVIRI observations is typically applied; this reduces the number of observations by several orders of magnitude, but also reduces their benefits in improving forecast skill (Waller et al., 2016b; Michel, 2018). The next generation of meteorological satellites will produce observations with even higher spatial resolutions (WMO, 2017, Table 1). For example, the Flexible Combined Imager (FCI) on the Meteosat Third Generation (MTG) satellite and the Advanced Baseline Imager (ABI) on the Geostationary Operational Environmental Satellite-R (GOES-R; Schmit et al., 2017) both take measurements at approximately 0.5-2 km spatial resolution, and the Advanced Geosynchronous Radiation Imager (AGRI) on the Fengyun-4 geostationary meteorological satellite produces observations at a resolution of 1-4 km (Yang et al., 2017).

Accounting for spatially correlated error statistics in the data assimilation algorithm has the potential to increase the computational cost for several reasons: (1) the computation of large matrix-vector products using parallel computing techniques, (2) the need to compute products using the inverse covariance matrix, which may be different in each assimilation cycle, and (3) changes to the convergence behaviour of the minimization procedure. We now review these issues in more detail.

The first issue is that the parallel computation of a large matrix-vector product with observation data distributed across multiple processing elements (PEs) may become expensive due to excessive communication between PEs and load-imbalance overheads (Deng, 2012). Different partitions of the matrix lead to different parallelization schemes, which will be described in section 2. In general, in order to reduce the time spent on communication operations, it is necessary to reduce the number and/or size of the messages transferred. In the context of data assimilation, much of this communication can be avoided if all the observations with mutually correlated errors are assigned to one PE, as in Simonin et al. (2019). However, this relies on the data volume of observations with mutually correlated errors being small enough for the product to be calculated on one node. It is an open question how to implement computations for larger data volumes with distributed data. While our paper is focused on parallel implementations for variational assimilation, studies such as Anderson (2003) and Nino-Ruiz et al. (2019) address parallel implementations for ensemble methods.

The second issue is that there is a need to use the inverse observation error covariance matrix ($\mathbf{R}^{-1}$) in the computations. The most commonly used observation uncertainty diagnosis techniques provide an estimate of the observation error covariance matrix itself rather than its inverse (Desroziers et al., 2005). Since the observation distribution changes each assimilation cycle (due to quality control etc.), there is a different inverse matrix each cycle. Simonin et al. (2019) deal with this by using a Cholesky decomposition method (Golub and Van Loan, 1996), which avoids the need to compute the inverse covariance matrix directly and is applicable to any form of error covariance matrix. Guillet et al. (2019) model the inverse of a spatially correlated observation error covariance matrix directly using a diffusion operator and fast unstructured meshing techniques. However, this method only deals with spatial error correlations, and it is unclear if this approach is also suitable when spatial error correlations and inter-channel error correlations are combined.
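To make the Cholesky approach concrete, the following sketch (in Python with SciPy, purely illustrative; the toy Markov-type matrix and the sizes are our own assumptions, not those of Simonin et al., 2019) applies $\mathbf{R}^{-1}$ to a vector by factorizing $\mathbf{R}$ once and reusing the factor:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Toy symmetric positive definite covariance: a Markov-type correlation matrix.
m = 200
dist = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
R = np.exp(-dist / 10.0)

rng = np.random.default_rng(1)
d = rng.standard_normal(m)

c_and_low = cho_factor(R)     # Cholesky factorization: done once per assimilation cycle
z = cho_solve(c_and_low, d)   # forward/backward substitution gives z = R^{-1} d

# No explicit inverse of R is ever formed.
assert np.allclose(R @ z, d)
```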

The third issue is that the use of correlated observation error statistics changes the convergence behaviour of the minimization procedure. Indeed, the pre-operational experiments of Weston et al. (2014) showed problems with the minimization that were improved by reconditioning the observation error covariance matrix. The convergence of the minimization procedure has been further studied by Tabeart et al. (2018, 2020a, 2021), who found that the minimum eigenvalue of the observation error covariance matrix is a key parameter governing the speed of convergence. Hence, the use of reconditioning methods is common in operational applications with correlated observation error statistics (e.g., Bormann et al., 2016; Campbell et al., 2017; Tabeart et al., 2020a, b).

The aim of this paper is to carry out an investigation of a method to accelerate parallel matrix-vector products with distributed observational data, using a novel application of a well-known numerical approximation algorithm, the fast multipole method (FMM; Rokhlin, 1985; Greengard and Rokhlin, 1997). We explore a particular type of FMM that uses a singular value decomposition (SVD) (Gimbutas and Rokhlin, 2003). We call this method the SVD-FMM. The key idea of the SVD-FMM is to split up the computation of the matrix-vector product into separate near-field and far-field calculations. The near-field calculations are done by a standard matrix-vector multiplication of a local sub-matrix and sub-vector. The far-field calculations are carried out using the singular values and singular vectors of sub-matrices of $\mathbf{R}^{-1}$. The sub-matrices and sub-vectors are determined by selecting specific rows and columns of $\mathbf{R}^{-1}$ corresponding to a partition of observations in the domain. The number of singular vectors employed in the calculation is chosen by the user. We will show that usually only a small number of singular vectors is needed to obtain good accuracy (section 5). The SVD-FMM technique reduces the number of floating point operations required compared with the standard approach to matrix-vector multiplication. Additionally, we develop a possible new parallel algorithm for the SVD-FMM for our application in data assimilation, which can greatly reduce the costs of communication between PEs. The SVD-FMM allows us to assimilate a large volume of observational data in a short time interval, and has the potential to be used in practical applications.

In this initial investigation, our numerical results focus on evaluating the accuracy of the SVD-FMM. Thus our experiments are carried out for an idealized problem, using serial, rather than parallel, computing. Nevertheless, we do compare the efficiency of the proposed parallelization scheme with different parallel formulations of standard matrix-vector multiplication in terms of communication costs. In our current implementation, the method needs to be applied after we obtain a representation of the matrix $\mathbf{R}^{-1}$. In order to have a clear focus solely on computing matrix-vector products we do not address the computation of the inverse observation error covariance matrix. Instead, we assume that the inverse observation error covariance matrix is already known.

In our experiments we show that the SVD-FMM can work well with the inverses of a variety of covariance matrices. In particular, we apply the SVD-FMM to the inverses of covariance matrices created using different correlation functions and lengthscales. We also use the reconditioning methods of Tabeart et al. (2020b) together with the SVD-FMM to gain insight into how the accuracy of the SVD-FMM should change with different levels of reconditioning. In practice, the observation distribution varies each assimilation cycle due to factors such as quality control and the removal of cloudy satellite radiance observations. Therefore, we further carry out some experiments to demonstrate that the SVD-FMM is feasible even if there are some missing observations.

The rest of this paper is organised as follows: We provide the parallel formulations of standard matrix-vector multiplication in section 2. In section 3 we present our novel algorithm for the SVD-FMM and compare the complexity and the communication costs between the SVD-FMM and standard, parallel matrix-vector multiplication. In section 4 we explain our experimental design for our idealized experiments. In particular, we describe the generation of observations and observation error covariance matrices as well as the reconditioning methods. In section 5 we show the results of our numerical experiments with varying correlation functions, length scales, reconditioning methods and condition numbers. We also show how the results change with missing observations. Finally we give a summary in section 6 and conclude that our proposed algorithm has potential for use in operational data assimilation for fast computation of large matrix-vector products.

2 Parallelization of standard matrix-vector multiplication

In this section we describe three distinct standard parallel formulations for computing large matrix-vector products (see Grama et al., 2003, Section 8.1; Deng, 2012, Section 6.2.1). We will also discuss how to exploit the symmetric structure of $\mathbf{R}^{-1}$ for parallelization (Geus and Röllin, 2001). These matrix-vector products arise in the solution of the variational minimization problem in data assimilation in the form

$\mathbf{z} = \mathbf{R}^{-1}\,\mathbf{d},$    (1)

where $\mathbf{R}^{-1}$ denotes the inverse of the observation error covariance matrix, $\mathbf{d}$ denotes the observation-minus-model departure vector, and $\mathbf{z}$ denotes the result of the multiplication. For the purposes of this description, we assume that the matrix $\mathbf{R}^{-1}$ and vector $\mathbf{d}$ are already known.

To see how matrix-vector products in the form of Eq. (1) arise in data assimilation, we consider the observation penalty term of the variational data assimilation cost function. This is given by multiplying the inverse observation error covariance matrix by the observation-minus-model differences,

$J_o(\mathbf{x}) = \frac{1}{2}\left(\mathbf{y} - H(\mathbf{x})\right)^{\mathrm{T}}\,\mathbf{R}^{-1}\left(\mathbf{y} - H(\mathbf{x})\right),$    (2)

where $\mathbf{y}$ denotes the observation vector, $\mathbf{x}$ denotes the model state and $H$ denotes the observation operator that maps the model state to observation space (e.g., Lorenc et al., 2000; Rawlins et al., 2007; Simonin et al., 2019; Nichols, 2010). To solve the variational minimization problem, the gradient of Eq. (2) is needed (e.g., Simonin et al., 2019),

$\nabla J_o(\mathbf{x}) = -\mathbf{H}^{\mathrm{T}}\,\mathbf{R}^{-1}\left(\mathbf{y} - H(\mathbf{x})\right),$    (3)

where $\mathbf{H}$ denotes the Jacobian of the observation operator. Thus matrix-vector products of the form of Eq. (1) arise in both the computation of the cost function and its gradient, with $\mathbf{d} = \mathbf{y} - H(\mathbf{x})$ and $\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}$.
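As a minimal illustration of Eqs. (1)-(3), the following Python sketch (with an assumed linear observation operator and toy dimensions of our own choosing) evaluates the observation penalty term and its gradient via a single product $\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 30                          # toy sizes: n state variables, m observations
H = rng.standard_normal((m, n))        # linear(ized) observation operator
dist = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
R = np.exp(-dist / 5.0)                # toy correlated observation error covariance
x = rng.standard_normal(n)             # model state
y = rng.standard_normal(m)             # observations

d = y - H @ x                          # observation-minus-model departure
z = np.linalg.solve(R, d)              # z = R^{-1} d, the product of Eq. (1)
Jo = 0.5 * d @ z                       # observation penalty term, Eq. (2)
grad_Jo = -H.T @ z                     # gradient of the penalty term, Eq. (3)
```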

The parallelization schemes for computing Eq. (1) start with the distribution of observations over PEs. We consider a simple domain decomposition, in which the observations are distributed over a number of PEs according to their geophysical locations. This results in a split of the components of the matrix $\mathbf{R}^{-1}$ and vector $\mathbf{d}$ across PEs. We will introduce four different partitions of the matrix $\mathbf{R}^{-1}$ (see Fig. 1). Each of them leads to a unique parallelization scheme. The communication costs of each scheme depend on various parameters, including i) the time to prepare a message for transmission, ii) the time it takes for a message to travel (latency), iii) how many words can traverse the network per second (bandwidth), iv) how many PEs need to communicate with each other and v) the message size (Grama et al., 2003). Since the first three parameters are determined by the configuration of the parallel machine, we discuss the communication costs of the different parallelization schemes using the last two parameters.

Figure 1: Different ways of partitioning matrices (represented by squares) and vectors (represented by bars) for parallel computation of the matrix-vector products. Different colours indicate the portions of matrices and vectors that are distributed over four PEs ($P_1$, $P_2$, $P_3$ and $P_4$).

2.1 Row-wise partitioning

We first consider a row-wise partitioning, in which case each PE stores one or several rows of $\mathbf{R}^{-1}$ and one element or a portion of $\mathbf{d}$. Fig. 1(a) illustrates the partitioning using four PEs. In order for each PE to perform its computation, we need to distribute the full vector $\mathbf{d}$ among all the PEs. This requires an all-to-all broadcast. Based on Table 4.1 in Grama et al. (2003), the communication cost of this operation is $t_s\log q + t_w\frac{N_{\mathrm{obs}}}{q}(q-1)$, where $q$ is the number of PEs, $N_{\mathrm{obs}}$ is the number of observations, $t_s$ is the startup time and $t_w$ is the per-word transfer time (Grama et al., 2003). The two time parameters depend on computer architecture and performance. After the communication, each PE multiplies its rows with the vector, which requires on the order of $N_{\mathrm{obs}}^2/q$ floating point operations.
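A bare-bones sketch of the row-wise scheme using mpi4py is given below (hypothetical sizes; `N_obs` is assumed divisible by the number of PEs, and each PE fabricates its own rows of $\mathbf{R}^{-1}$ rather than reading a real matrix). Run with, e.g., `mpirun -n 4 python rowwise.py`:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
q, rank = comm.Get_size(), comm.Get_rank()
N_obs = 64
rows = N_obs // q                                # rows of R^{-1} held by this PE

rng = np.random.default_rng(rank)
A_local = rng.standard_normal((rows, N_obs))     # this PE's rows of R^{-1} (toy data)
d_local = rng.standard_normal(rows)              # this PE's portion of d

d_full = np.empty(N_obs)
comm.Allgather(d_local, d_full)                  # all-to-all broadcast of d

z_local = A_local @ d_full                       # this PE's portion of z = R^{-1} d
```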

2.2 Column-wise partitioning

Instead of storing row(s), each PE can store column(s) of $\mathbf{R}^{-1}$. Fig. 1(b) shows an example using four PEs. With a column-wise partitioning, each PE can calculate a partial result locally. Each PE multiplies its columns of $\mathbf{R}^{-1}$ with its elements of $\mathbf{d}$, which requires on the order of $N_{\mathrm{obs}}^2/q$ floating point operations. After the local computation, we need to perform an all-to-one reduction to sum the partial results given by each PE, which takes time $(t_s + t_w N_{\mathrm{obs}})\log q$ (Grama et al., 2003, Table 4.1). After the all-to-one reduction only one PE contains the full vector $\mathbf{z}$ and thus we may need to redistribute $\mathbf{z}$. This requires a scatter operation, which takes time $t_s\log q + t_w\frac{N_{\mathrm{obs}}}{q}(q-1)$ (Grama et al., 2003, Table 4.1). Although the column-wise partitioning circumvents the use of an expensive all-to-all broadcast operation, it increases the message size for data-transfer operations. The message size for the all-to-one reduction is $N_{\mathrm{obs}}$, whereas the message size for the all-to-all broadcast used in the row-wise partitioning is $N_{\mathrm{obs}}/q$. It should be noted that in data assimilation applications the matrix $\mathbf{R}^{-1}$ is symmetric, which means that the parallel algorithm using column-wise partitioning can also be used with row-wise partitioning.

2.3 Block 2-D partitioning

The row-wise and column-wise partitionings are 1-D partitionings. We now consider a 2-D partitioning, which distributes the blocks of $\mathbf{R}^{-1}$ among PEs. An example of the block 2-D partitioning is shown in Fig. 1(c). Note that the matrix is separated equally among all PEs, while the vector is distributed only among the PEs that own the diagonal blocks of $\mathbf{R}^{-1}$, i.e., $P_1$ and $P_4$ in the example. The first step is for each PE that stores a portion of $\mathbf{d}$ to broadcast that portion to the other PEs in the same column, which requires a column-wise one-to-all broadcast. The communication time of this operation is $(t_s + t_w N_{\mathrm{obs}}/\sqrt{q})\log\sqrt{q}$. The next step is to perform a row-wise all-to-one reduction, which requires another $(t_s + t_w N_{\mathrm{obs}}/\sqrt{q})\log\sqrt{q}$. Grama et al. (2003) showed that computing the matrix-vector product using the block 2-D partitioning of the matrix can be faster than using a 1-D partitioning for the same number of PEs. However, a potential problem is that with the block 2-D partitioning the vector elements are not distributed among all the PEs, and this can result in load imbalance.

2.4 Partitioning for symmetric matrices

A possible partitioning of $\mathbf{R}^{-1}$ and $\mathbf{d}$ taking into account the symmetric structure of $\mathbf{R}^{-1}$ is shown in Fig. 1(d). The first step is to exchange the portions of $\mathbf{d}$ among PEs. This requires an all-to-all broadcast with an average message size of about $N_{\mathrm{obs}}/(2q)$, because not all PEs need to send their portion of $\mathbf{d}$ to all other PEs. In the case of four PEs, $P_1$ sends data to $P_2$, $P_3$ and $P_4$; $P_2$ to $P_3$ and $P_4$; and $P_3$ to $P_4$. The second step is to multiply the vector elements with the local part of the matrix. The last step is to exchange the results of the local computations using an all-to-one reduction. The average message size for this operation is also about $N_{\mathrm{obs}}/(2q)$. In the case of four PEs, $P_1$ collects the results from all other PEs, $P_2$ collects the results from $P_3$ and $P_4$, and $P_3$ collects the result from $P_4$. The last PE, $P_4$, does not need to collect results from other PEs. Slightly different parallelization schemes may be used for this kind of partitioning, depending on the computer architecture (Geus and Röllin, 2001). This partitioning saves storage on each PE (only half of the matrix is stored) and keeps the load balanced. Furthermore, it has a very large advantage for sparse matrices, in which case each PE only needs to communicate with neighbouring PEs (Geus and Röllin, 2001).

In the next section, we describe a different approach for calculating large matrix-vector products, that is applicable to any matrix (most useful for full or dense matrices). This approach reduces the computational cost for serial calculations of Eq. (1) and the message size for communication operations.

Figure 2: The number of floating point operations for the direct computation of the matrix-vector product and for the SVD-FMM with our configuration of boxes and a fixed number of singular vectors $p$.

3 Application of the fast multipole method (FMM) to data assimilation

The fast multipole method (FMM) was originally presented for the rapid evaluation of all pairwise interactions between a large number of charged particles (Rokhlin, 1985; Greengard and Rokhlin, 1997). The mathematical form of this fast summation is equivalent to Eq. (1). While the direct calculation of Eq. (1) requires $\mathcal{O}(N_{\mathrm{obs}}^2)$ work, the FMM needs only $\mathcal{O}(N_{\mathrm{obs}})$ operations. In addition, the FMM relies on a hierarchical division of the computational domain, which is well suited to parallel computing. In the classical FMM, the matrix entries are given by functions describing pairwise interactions between particles, such as those describing gravitational and electrostatic potentials. In our application, however, the matrix is obtained by inverting the matrix $\mathbf{R}$. Therefore, we need a generalized FMM that does not require an underlying analytic function (Gimbutas and Rokhlin, 2003).

3.1 Separation of the matrix-vector product into near- and far-field terms

Since the SVD-FMM is not commonly used in meteorology, we first explain the key idea behind the technique and give a simple example to aid the reader’s understanding, before going into the mathematical details in later subsections. The basic idea of the SVD-FMM is to calculate Eq. (1) in two parts:

$z_i = \sum_{j\in\mathcal{N}} \left(\mathbf{R}^{-1}\right)_{ij} d_j \;+\; \sum_{j\in\mathcal{F}} \left(\mathbf{R}^{-1}\right)_{ij} d_j,$    (4)

where the column indices of $\mathbf{R}^{-1}$ and the indices of $\mathbf{d}$ are divided into two mutually independent sets, denoted by $\mathcal{N}$ and $\mathcal{F}$. The indices are selected according to a partition of observations (described in more detail in section 3.2). The first part of Eq. (4), referred to as the near-field calculation, is computed using a standard matrix-vector multiplication, and the second part, referred to as the far-field calculation, is estimated using an approximation. In the SVD-FMM, the far-field calculation is computed using the singular value decomposition (SVD) of the far-field sub-matrix of $\mathbf{R}^{-1}$. We provide a simple example to aid the reader's understanding.

Example 3.1

Suppose we have a matrix $\mathbf{A} \in \mathbb{R}^{4\times4}$ and a vector $\mathbf{d} \in \mathbb{R}^{4}$. A standard matrix-vector multiplication gives $\mathbf{z} = \mathbf{A}\mathbf{d}$, with $z_i = \sum_{j=1}^{4} A_{ij}\,d_j$. We can also compute $\mathbf{z}$ by partitioning the problem as in Eq. (4), for instance by letting $\mathcal{N} = \{1, 2\}$ (the near field) and $\mathcal{F} = \{3, 4\}$ (the far field), such that

$\mathbf{z} = \mathbf{A}_{:,\mathcal{N}}\,\mathbf{d}_{\mathcal{N}} + \mathbf{A}_{:,\mathcal{F}}\,\mathbf{d}_{\mathcal{F}},$    (5)

where $\mathbf{A}_{:,\mathcal{N}}$ and $\mathbf{A}_{:,\mathcal{F}}$ contain the columns of $\mathbf{A}$ with indices in $\mathcal{N}$ and $\mathcal{F}$, respectively. The singular value decomposition (SVD) of the sub-matrix in the second term is given by

$\mathbf{A}_{:,\mathcal{F}} = \sum_{k} \mathbf{u}_k\,\sigma_k\,\mathbf{v}_k^{\mathrm{T}},$

where $\mathbf{u}_k$ are the orthonormal left singular vectors, $\sigma_k$ are the scalar singular values, and $\mathbf{v}_k$ are the orthonormal right singular vectors. If $\sigma_2 \ll \sigma_1$ then we can truncate the SVD to give an approximation for the far-field term of Eq. (5) as

$\mathbf{A}_{:,\mathcal{F}}\,\mathbf{d}_{\mathcal{F}} \approx \mathbf{u}_1\,\sigma_1\,\mathbf{v}_1^{\mathrm{T}}\,\mathbf{d}_{\mathcal{F}}.$

More generally, for a larger problem, we can use only the first few singular vectors and singular values to estimate the far-field calculation. If the matrix remains fixed, we can use the same singular vectors and singular values to compute $\mathbf{A}\mathbf{d}$ for any vector $\mathbf{d}$. This will reduce the cost of calculating Eq. (1) (unless the matrix is sparse). Furthermore, storing the singular vectors and singular values requires less memory than storing the full far-field sub-matrix. Note that if the matrix changes too often, then we would need to perform the SVD repeatedly, which would be computationally expensive.
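A numerical analogue of Example 3.1 is sketched below (the matrix is a random symmetric positive definite stand-in for $\mathbf{R}^{-1}$; all numbers are our own toy choices):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
A = A @ A.T + 4 * np.eye(4)                # symmetric positive definite, like R^{-1}
d = rng.standard_normal(4)

near, far = [0, 1], [2, 3]                 # index sets N and F
z_near = A[:, near] @ d[near]              # near-field term: computed exactly

U, s, Vt = np.linalg.svd(A[:, far])        # SVD of the far-field sub-matrix
k = 1                                      # number of retained singular triplets
z_far = U[:, :k] @ (s[:k] * (Vt[:k] @ d[far]))   # truncated far-field term

print(np.linalg.norm((z_near + z_far) - A @ d))  # approximation error
```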

The example we have discussed in this section only shows the basic idea of the SVD-FMM. The actual SVD-FMM algorithm is multi-level and more complicated. Before we can address this, we explain the partitioning of the observations that is needed for the multi-level algorithm.

3.2 Partition of observations into a hierarchical structure of nested boxes

In this section, we describe the partition of observations and explain the notation that we use. Note that we use the nomenclature found in the FMM literature (e.g., Gimbutas and Rokhlin, 2003). Suppose that we have $N_{\mathrm{obs}}$ observations. These could be located on a latitude-longitude grid, or irregularly distributed. We first find a minimal square (or rectangle) that covers the locations of all of the observations, and then hierarchically subdivide the observation domain into smaller and smaller boxes to generate a box-tree - a hierarchical structure of nested boxes. An example is given in Fig. 3. In the tree structure, level 0 refers to the biggest box, which covers the entire observation domain, and level $l+1$ refers to the boxes that are obtained from level $l$ by subdividing each box into four smaller boxes of equal size. In our particular example, we choose level 3 as the highest level, which is the minimal number of levels required for the present SVD-FMM approach. The boxes at the highest level are called leaf boxes.

Figure 3: Illustration of the hierarchy of boxes at levels 0-3 in our example. Level 0 refers to the box that covers the entire observation domain, while level $l+1$ refers to the boxes that are obtained by equally subdividing each box in level $l$ into four. The dots represent the observation locations and the numbers are the box indices. The shaded boxes in level 2 are the boxes that are in the near-field of box 16, while the shaded boxes in level 3 are the boxes that are in the interaction list of box 68. The definitions of near-field and interaction list can be found in the main text.

In practical applications, the number of levels is determined such that the averaged number of observations in the leaf boxes is smaller than a prescribed value. For unevenly distributed observations, an adaptive tree structure could be created (Gimbutas and Rokhlin, 2003), in which smaller boxes are generated only where data is dense.

We number the boxes in all levels except level 0 by a Z-order curve as shown in Fig. 3 (Gargantini, 1982). This facilitates easy formulae for box indexing, as will be described later in this subsection. We call boxes neighbours if they are on the same level and touch each other. In each level, the near-field of a box is made up of the box itself and all its neighbours. The far-field of the box is made up of everything else. We use $N(i)$ and $F(i)$ to denote the lists of boxes that are in the near-field and far-field of box $i$, respectively. For example, the near-field of box 16 in Fig. 3 consists of the 9 shaded boxes in level 2, and the far-field of box 16 consists of the other 7 boxes in that level. The smallest number of boxes in a box's near-field is 4, which occurs when the box is in a corner of the observation domain.

In a box-tree the children of box $i$ in level $l$, denoted by $c(i)$, refer to the four boxes in level $l+1$ that are subdivided from box $i$ itself. In the Z-curve ordering, the indices of the children of box $i$ are given by $4i+1$, $4i+2$, $4i+3$ and $4i+4$. The parent of box $i$, denoted by $P(i)$, refers to the box in the coarser level that contains box $i$; its index is given by $P(i) = \lfloor (i-1)/4 \rfloor$. For instance, $P(68) = 16$.

The interaction list of box $i$, denoted by $\mathrm{IL}(i)$, is the set of boxes which are children of the neighbours of box $i$'s parent and which are in the far-field of box $i$; see, for instance, the shaded level-3 boxes in Fig. 3, which form the interaction list of box 68. Note that the interaction list of a box in level 2 is exactly the same as its far-field.
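Under the Z-order numbering convention above (level-1 boxes 1-4, level-2 boxes 5-20, leaf boxes 21-84), the parent and child indices follow simple closed formulae, sketched here in Python:

```python
def children(i: int) -> list:
    """Indices of the four children of box i (Z-order convention)."""
    return [4 * i + 1, 4 * i + 2, 4 * i + 3, 4 * i + 4]

def parent(i: int) -> int:
    """Index of the parent of box i (valid for i >= 5)."""
    return (i - 1) // 4

# Consistent with Fig. 3: box 68 is a child of box 16.
assert parent(68) == 16
assert 68 in children(16)
```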

3.3 The SVD-FMM algorithm

In this section we describe the multi-level SVD-FMM algorithm (Gimbutas and Rokhlin, 2003). For concreteness we choose a particular configuration of boxes (as shown in Fig. 3) to describe the algorithm, but it can be generalized to other configurations. For each leaf box $i$ we can write Eq. (4) as

$\mathbf{z}_{\tau_i} = \mathbf{R}^{-1}_{\tau_i\nu_i}\,\mathbf{d}_{\nu_i} \;+\; \mathbf{R}^{-1}_{\tau_i\phi_i}\,\mathbf{d}_{\phi_i},$    (6)

where $\tau_i$, $\nu_i$ and $\phi_i$ denote the sets of observation indices in box $i$, in its near-field $N(i)$ and in its far-field $F(i)$, respectively, and $\mathbf{R}^{-1}_{\tau_i\nu_i}$ and $\mathbf{R}^{-1}_{\tau_i\phi_i}$ denote sub-matrices of $\mathbf{R}^{-1}$ that are comprised of the corresponding rows and columns of $\mathbf{R}^{-1}$. We call $\tau_i$ the target indices and $\nu_i$, $\phi_i$ the source indices.

The first matrix-vector product in Eq. (6) is calculated from the near-field components. It is computed directly without using any approximation. The second matrix-vector product in Eq. (6), containing the far-field components, is estimated by matrix decomposition, using a multi-level approach exploiting the hierarchical box-tree structure established in section 3.2. Before giving the mathematical details, we first give an overview of the algorithm.

The key idea of the multi-level approach is to perform the SVD of the sub-matrices of $\mathbf{R}^{-1}$ associated with each box and use the singular vectors and singular values just obtained to create a multipole expansion, a local expansion and three translation operators, and then use them to estimate the result. The multipole expansion $\Phi_i$ can be interpreted as a short representation of the sub-vector of $\mathbf{d}$ given by $\mathbf{d}_{\tau_i}$: it is obtained by projecting the components of $\mathbf{d}_{\tau_i}$ onto the basis given by the singular vectors and hence is a short vector with $p$ components. The local expansion $\Psi_i$ can be considered as the short representation of the sub-vector of $\mathbf{d}$ given by $\mathbf{d}_{\phi_i}$, which is also a vector with $p$ components. It is computed from the multipole expansions of the boxes that are in $i$'s interaction list using the translation operators. The translation operators allow exploitation of the hierarchical structure established in the box-tree, by transforming the projection from one basis of singular vectors into another. There are three kinds of translation operators: the multipole-to-multipole (M2M) translation operator transforms the multipole expansion of a box to the multipole expansion of its parent (see Fig. 4); the multipole-to-local (M2L) translation operator translates the multipole expansion of a box to the local expansion of another box in the same level (see Fig. 5); and the local-to-local (L2L) translation operator converts the local expansion of a non-leaf box to the local expansion of its children (see Fig. 6).

Figure 4: Illustration of the multipole-to-multipole translation operator: $T^{i\to j}_{\mathrm{M2M}}$ transforms the multipole expansion ($\Phi_i$) of box $i$ to the multipole expansion ($\Phi_j$) of box $j$, where box $j$ is the parent of box $i$.
Figure 5: Illustration of the multipole-to-local translation operator: $T^{i\to j}_{\mathrm{M2L}}$ transforms the multipole expansion ($\Phi_i$) of box $i$ to the local expansion ($\Psi_j$) of box $j$, where box $i$ is in the interaction list of box $j$. The arrow illustrates the transformation from one particular $\Phi_i$ to $\Psi_j$. For each pair of boxes we have a unique $T^{i\to j}_{\mathrm{M2L}}$.
Figure 6: Illustration of the local-to-local translation operator: $T^{j\to i}_{\mathrm{L2L}}$ transforms the local expansion ($\Psi_j$) of box $j$ to the local expansion ($\Psi_i$) of box $i$, where box $i$ is a child of box $j$.

Due to the use of these expansions and translation operators, the SVD-FMM computes matrix-vector products more efficiently than the standard approach, which is reflected in both serial efficiency (algorithmic complexity) and parallel efficiency (see sections 3.5 and 3.6 for more details).

Figure 7: Illustration of the SVD-FMM algorithm.

We now describe the mathematical steps of the SVD-FMM algorithm in more detail. A schematic illustrating the algorithm is given in Fig. 7.

Initialization: Compute the SVD of sub-matrices of $\mathbf{R}^{-1}$ and the translation operators. We compute truncated SVDs of sub-matrices of $\mathbf{R}^{-1}$ for each box $i$ in levels 2 and 3,

$\mathbf{R}^{-1}_{\phi_i\tau_i} \approx \sum_{k=1}^{p} \mathbf{u}^{(i)}_k\,\sigma^{(i)}_k\,\big(\mathbf{v}^{(i)}_k\big)^{\mathrm{T}}, \qquad \mathbf{R}^{-1}_{\tau_i\phi_i} \approx \sum_{k=1}^{p} \tilde{\mathbf{u}}^{(i)}_k\,\tilde{\sigma}^{(i)}_k\,\big(\tilde{\mathbf{v}}^{(i)}_k\big)^{\mathrm{T}},$    (7)

where $\mathbf{u}^{(i)}_k$ and $\tilde{\mathbf{u}}^{(i)}_k$ are the $k$-th left singular vectors, $\sigma^{(i)}_k$ and $\tilde{\sigma}^{(i)}_k$ are the $k$-th singular values, and $\mathbf{v}^{(i)}_k$ and $\tilde{\mathbf{v}}^{(i)}_k$ are the $k$-th right singular vectors. The sets $\tau_i$ and $\phi_i$ determine the dimensions of the two sub-matrices. The symbol $p$ denotes the number of singular vectors and the number of singular values retained.

(Note that the matrix $\mathbf{R}^{-1}$ is symmetric for the data assimilation problem discussed here. Therefore, in practice we actually only need to compute one SVD per box $i$, and for each box we have $\tilde{\mathbf{u}}^{(i)}_k = \mathbf{v}^{(i)}_k$, $\tilde{\mathbf{v}}^{(i)}_k = \mathbf{u}^{(i)}_k$ and $\tilde{\sigma}^{(i)}_k = \sigma^{(i)}_k$. However, we retain the different notations for the purpose of clarifying the algorithm.)

The singular vectors and singular values are then used to generate multipole expansions, local expansions and a set of translation operators. The M2M translation operator (see Fig. 4) converts the multipole expansion of a child box $i$ into the multipole expansion of its parent $j = P(i)$ and is defined as

$\big(T^{i\to j}_{\mathrm{M2M}}\big)_{kk'} = \sum_{m\in\tau_i} v^{(j)}_k(m)\,v^{(i)}_{k'}(m),$    (8)

for each $k, k' = 1,\dots,p$, where $v^{(j)}_k(m)$ and $v^{(i)}_{k'}(m)$ denote the elements of $\mathbf{v}^{(j)}_k$ and $\mathbf{v}^{(i)}_{k'}$ that correspond to the $m$-th observation in box $i$. The M2L translation operator (see Fig. 5) converts the multipole expansion of box $i$ into the local expansion of box $j$, for each $i$ in the interaction list of $j$. It is defined as

$\big(T^{i\to j}_{\mathrm{M2L}}\big)_{kk'} = \sum_{m\in\tau_i} \tilde{v}^{(j)}_k(m)\,v^{(i)}_{k'}(m),$    (9)

for each $k, k' = 1,\dots,p$, where $\tilde{v}^{(j)}_k(m)$ and $v^{(i)}_{k'}(m)$ denote the elements of $\tilde{\mathbf{v}}^{(j)}_k$ and $\mathbf{v}^{(i)}_{k'}$ that correspond to the $m$-th observation in box $i$. The L2L translation operator (see Fig. 6) transfers the local expansion of a parent box $j$ to its children $i \in c(j)$ and is defined as

$\big(T^{j\to i}_{\mathrm{L2L}}\big)_{kk'} = \sum_{m\in\phi_j} \tilde{v}^{(i)}_k(m)\,\tilde{v}^{(j)}_{k'}(m),$    (10)

for each $k, k' = 1,\dots,p$, where $\tilde{v}^{(i)}_k(m)$ and $\tilde{v}^{(j)}_{k'}(m)$ denote the elements of $\tilde{\mathbf{v}}^{(i)}_k$ and $\tilde{\mathbf{v}}^{(j)}_{k'}$ that correspond to the $m$-th observation in the far-field of box $j$.

Step 1: Compute the multipole expansion for the leaf boxes. The multipole expansion for each leaf box is computed by

$\Phi_i(k) = \sum_{m\in\tau_i} v^{(i)}_k(m)\,d_m,$    (11)

where $k = 1,\dots,p$ and $i = 21,\dots,84$ (the leaf boxes).

Step 2: Compute the multipole expansion for the non-leaf boxes. The multipole expansion for each box $j$ in level 2 is computed by translating the multipole expansions from its children in level 3 using the M2M translation operator,

$\Phi_j = \sum_{i\in c(j)} T^{i\to j}_{\mathrm{M2M}}\,\Phi_i,$    (12)

where $j = 5,\dots,20$, and $i$ denotes a child of $j$.

Step 3: Compute the first part of the local expansion. For each box $j$ in levels 2 and 3, we translate the multipole expansions from boxes in its interaction list into local expansions by

$\Psi^{(1)}_j = \sum_{i\in\mathrm{IL}(j)} T^{i\to j}_{\mathrm{M2L}}\,\Phi_i,$    (13)

where $j = 5,\dots,84$, and $i$ is in the interaction list of $j$.

Step 4: Compute the second part of the local expansion. We transfer the local expansions for each box in level 2 to their children in level 3 using the L2L translation operator:

$\Psi^{(2)}_i = T^{j\to i}_{\mathrm{L2L}}\,\Psi^{(1)}_j,$    (14)

where $i = 21,\dots,84$, and $j = P(i)$ is $i$'s parent.

Step 5: Complete the local expansion for the leaf boxes. For each leaf box, the final local expansion is given by adding the two parts together:

$\Psi_i = \Psi^{(1)}_i + \Psi^{(2)}_i,$    (15)

where $i = 21,\dots,84$.

Final Step: Add the far-field calculation to the near-field calculation. The final result for each leaf box ($i = 21,\dots,84$) is obtained by adding the far-field calculation to the near-field calculation:

$\mathbf{z}_{\tau_i} = \mathbf{z}^{\mathrm{near}}_{\tau_i} + \mathbf{z}^{\mathrm{far}}_{\tau_i}.$    (16)

The near-field calculation is given by

$\mathbf{z}^{\mathrm{near}}_{\tau_i} = \mathbf{R}^{-1}_{\tau_i\nu_i}\,\mathbf{d}_{\nu_i}.$    (17)

The far-field calculation is given by

$z^{\mathrm{far}}_m = \sum_{k=1}^{p} \tilde{u}^{(i)}_k(m)\,\tilde{\sigma}^{(i)}_k\,\Psi_i(k), \qquad m\in\tau_i.$    (18)

We note that the stepwise algorithm presented does not give a necessary order for the calculations. For example, the computation of local expansions for leaf boxes in Step 3 can be carried out once Step 1 is done.
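To make the near-/far-field machinery concrete, the sketch below implements a simplified, single-level variant in Python: each leaf box computes its near field exactly and approximates its entire far field with one rank-$p$ truncated SVD, omitting the multipole/local expansions and the M2M, M2L and L2L translation operators of the multi-level algorithm (so it scales worse, but produces the same kind of approximation). All sizes, lengthscales and the grid layout are our own toy assumptions, not the configuration used in the experiments.

```python
import numpy as np

def svd_fmm_single_level(A, d, boxes, neighbours, p):
    """Near field exact, far field via a rank-p truncated SVD per box.
    boxes: box index -> array of observation indices in the box.
    neighbours: box index -> set of near-field boxes (including the box itself)."""
    z = np.zeros(len(d))
    all_idx = np.arange(len(d))
    for b, tau in boxes.items():
        near = np.concatenate([boxes[c] for c in sorted(neighbours[b])])
        far = np.setdiff1d(all_idx, near)
        z[tau] += A[np.ix_(tau, near)] @ d[near]               # exact near field
        U, s, Vt = np.linalg.svd(A[np.ix_(tau, far)], full_matrices=False)
        z[tau] += U[:, :p] @ (s[:p] * (Vt[:p] @ d[far]))       # approximate far field
    return z

# Toy setup: 32 x 32 gridded observations partitioned into 4 x 4 leaf boxes.
m_side, b_side = 32, 4
coords = np.array([(i, j) for i in range(m_side) for j in range(m_side)], float)
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
R = (1.0 + dist / 5.0) * np.exp(-dist / 5.0)       # SOAR correlation, toy lengthscale
A = np.linalg.inv(R + 1e-8 * np.eye(len(R)))       # R^{-1} (small jitter for stability)

w = m_side // b_side
box_of = (coords[:, 0] // w).astype(int) * b_side + (coords[:, 1] // w).astype(int)
boxes, neighbours = {}, {}
for b in range(b_side * b_side):
    boxes[b] = np.where(box_of == b)[0]
    bi, bj = divmod(b, b_side)
    neighbours[b] = {ci * b_side + cj
                     for ci in range(max(bi - 1, 0), min(bi + 2, b_side))
                     for cj in range(max(bj - 1, 0), min(bj + 2, b_side))}

d = np.random.default_rng(3).standard_normal(len(R))
z_fmm = svd_fmm_single_level(A, d, boxes, neighbours, p=10)
print(np.linalg.norm(z_fmm - A @ d) / np.linalg.norm(A @ d))   # relative error
```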

3.4 Accuracy

In this section we discuss which factors affect the accuracy of the SVD-FMM. The first aspect to consider is the relative magnitude of the near-field and far-field terms in Eq. (6). The near-field term is computed directly, while the far-field term is approximated. Numerical roundoff error for the direct calculation of the near field term is expected to be small. Therefore, the main source of error for the SVD-FMM is from the far-field calculation. Thus, if the magnitude of the near-field term is much larger than the far-field term, the error in the final result due to the SVD-FMM approximation will tend to be small.

The accuracy of the far-field calculation is determined by the accuracy of the truncated SVDs of the sub-matrices $\mathbf{R}^{-1}_{\phi_i\tau_i}$ or $\mathbf{R}^{-1}_{\tau_i\phi_i}$ in Eq. (7) and the accuracy of the translation operators given by Eqs. (8)-(10). The truncation error in Eq. (7) depends on the number of singular vectors used ($p$) and the $(p+1)$th singular value of the sub-matrices (e.g., Bernstein, 2009, Fact 9.14.28). Additionally, Gimbutas and Rokhlin (2003) show that the error bounds on the translation operators rely on the $(p+1)$th singular value of the sub-matrices. Thus, the optimal number of singular vectors used in the SVD-FMM may depend on the application. It should be determined by considering the trade-off between accuracy and efficiency: the more singular vectors are used, the more accurate the results but the slower the computation.

The SVD-FMM algorithm uses truncated SVDs (Eq. (7)) and thus may cause rank deficiency of the approximated matrix $\mathbf{R}^{-1}$. Nevertheless, the Hessian of the full variational problem will still be full rank (e.g., Tabeart et al., 2021). In addition, the matrix used in the solution of the variational problem is then not exactly the inverse of the matrix $\mathbf{R}$. However, only the matrix $\mathbf{R}^{-1}$ (and not $\mathbf{R}$) is used in the solution of the variational problem. Furthermore, studies have shown that even approximate forms of spatial observation error correlations provide significant benefits to analysis accuracy compared with diagonal approximations (e.g., Stewart et al., 2008, 2013; Healy and White, 2005).

3.5 Algorithmic Complexity

In this section we compare the number of floating point operations (flops) required for computing matrix-vector products using the standard approach and the SVD-FMM. Let $s$ be the average number of observations in a leaf box (a box in the highest level), and $B = N_{\mathrm{obs}}/s$ be the number of leaf boxes, where $N_{\mathrm{obs}}$ is the total number of observations. Computing Eq. (11) for a fixed $k$ and a fixed $i$ requires $2s$ operations ($s$ multiplication and $s$ add operations). Hence, calculating Eq. (11) for each value of $k$ requires $2ps$ operations. Finally, summing over each value of $i$, Eq. (11) requires $2psB = 2pN_{\mathrm{obs}}$ operations in total. Similarly, Eq. (12) requires $\mathcal{O}(p^2)$ operations per box. Eq. (13) requires less than $54p^2 B$ operations for the leaf boxes and $24p^2 \times 16$ operations for the boxes in level 2, because there are at most 27 entries in the interaction list of each leaf box and 12 in that of each box at level 2. Eq. (14) requires $2p^2 B$ operations. Eq. (17) requires at most $18sN_{\mathrm{obs}}$ operations, since the maximum number of boxes in the near-field of each box is 9. Finally, Eq. (18) requires $2pN_{\mathrm{obs}}$ operations. Thus, the operation count of the entire algorithm (excluding the initialization step) approximately sums to $4pN_{\mathrm{obs}} + 18sN_{\mathrm{obs}} + \mathcal{O}(p^2 B)$, or $\mathcal{O}\big((p+s)\,N_{\mathrm{obs}}\big)$.

The SVDs and translation operators only need to be computed once if the distribution of observations does not change. The singular vectors, singular values and translation operators can be used for any vector $\mathbf{d}$. However, if the distribution of the observations varies too often, then the computational costs of the initialization step and of inverting the observation error covariance matrix could make the algorithm very expensive.

For comparison, the direct matrix-vector multiplication of Eq. (1) requires $2N_{\mathrm{obs}}^2$ operations. In data assimilation, $\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}$ is often computed using a forward and backward substitution (Golub and Van Loan, 1996; Weston et al., 2014; Simonin et al., 2019), which also requires $\mathcal{O}(N_{\mathrm{obs}}^2)$ operations. Fig. 2 compares the operation counts for direct computation and the SVD-FMM with our configuration of boxes and a fixed $p$. We observe that once the number of observations exceeds about 500, the SVD-FMM requires fewer floating point operations than direct matrix-vector multiplication, and the difference increases with the number of observations.

3.6 Parallelization

In this section we describe a novel parallel algorithm for the SVD-FMM. The FMM has several opportunities for parallelism (Greengard and Gropp, 1990) and our approach is not the only possibility. We have presented this section in order to provide a preliminary exploration of the potential of the SVD-FMM as a practical technique for operational data assimilation. However, we note that for our experimental results (section 5) we have not used this parallel algorithm. The number of observations considered in our idealized experiments is much smaller than in operational applications, so that the serial calculations can be done within an acceptable time.

We first distribute the matrix and vector elements across PEs according to the partitioning of observations in the box-tree. For each leaf box $i$, we assign the sub-vector $\mathbf{d}_{\tau_i}$ of $\mathbf{d}$ and the sub-matrix $\mathbf{R}^{-1}_{\tau_i\nu_i}$ of $\mathbf{R}^{-1}$ to one PE. For each box $i$ in level 2, we allocate the sub-matrix $\mathbf{R}^{-1}_{\tau_i\phi_i}$ of $\mathbf{R}^{-1}$ to one PE. For our particular configuration of boxes, we have 64 leaf boxes and 16 non-leaf boxes in level 2; hence we could use 80 PEs. However, to make our parallelization scheme easily comparable with the parallel formulations of matrix-vector multiplication given in sections 2.1-2.4, we actually discuss the case where we choose 16 of the 64 PEs and let them also store the data for the level-2 boxes. We describe a possible parallelization of each mathematical step of the SVD-FMM as follows:

  • Parallelization of the Initialization Step.

    • Each PE can calculate Eq. (7) independently. Once the singular vectors and singular values are obtained, the far-field sub-matrices used in Eq. (7) can be discarded (the near-field blocks needed for Eq. (17) are retained).

    • To compute Eq. (8), each PE that is assigned to a parent box should send the relevant portions of its right singular vectors to the three PEs that are assigned to its other children. The calculation of the M2M translation operators would require 16 one-to-all broadcast operations to be performed simultaneously. The message size for each operation should be of order $ps$ (the $p$ singular vectors restricted to one child box) and four PEs should be involved in each.

    • To compute Eq. (9), each PE should send a portion of its right singular vectors to at most 27 PEs, because the maximum number of boxes in an interaction list is 27. The calculation of the M2L translation operators should require an all-to-all broadcast with a message size of order $ps$.

    • To compute Eq. (10), each PE should send a portion of its singular vectors $\tilde{\mathbf{v}}^{(i)}_k$ to the PE that is assigned to its parent. The computation of the L2L translation operators would require 16 all-to-one reduction operations to be carried out at the same time. The message size for each operation depends on the size of the far-field of the level-2 parent box. The communications required for calculating the three translation operators can be done together using an all-to-all broadcast. After the initialization step, each PE that is assigned only to a leaf box will store the near-field sub-matrix of $\mathbf{R}^{-1}$, its singular vectors and singular values, and its M2M, M2L and L2L translation operators; those PEs assigned to both a leaf box and a level-2 box will additionally store the corresponding level-2 quantities.

  • Parallelization of Step 1. This step is perfectly parallel.

  • Parallelization of Step 2. Each PE should compute $T^{i\to j}_{\mathrm{M2M}}\Phi_i$ and then send the result to the PE that is assigned to its parent. Computing the multipole expansions for the level-2 boxes would require an all-to-one reduction. The message size for this communication operation should be $p$.

  • Parallelization of Step 3. Each PE should compute $T^{i\to j}_{\mathrm{M2L}}\Phi_i$ and then send the result to another PE. Each PE should collect the partial results of Eq. (13) from at most 27 PEs. This step would need an all-to-all broadcast with a message size of $p$ per partial local expansion.

  • Parallelization of Step 4. Each PE that is assigned to a box in level 2 should compute $T^{j\to i}_{\mathrm{L2L}}\Psi^{(1)}_j$ and then send the result to the PEs that are assigned to its children. This step would use a one-to-all broadcast with a message size of $p$.

  • Parallelization of Step 5. This step is perfectly parallel.

  • Parallelization of the Final Step. The far-field calculation for each leaf box is perfectly parallel. For the computation of the near-field calculation, each PE should obtain the elements of $\mathbf{d}$ from at most 8 PEs, which is the maximum number of one box's neighbours. This would require an all-to-all broadcast. The average message size for this operation is $s = N_{\mathrm{obs}}/B$. We can use one all-to-all broadcast to complete the communication tasks for this step and Step 3.

Table 1 summarises the communication costs for the SVD-FMM and the four parallel formulations of matrix-vector multiplication described in section 2. The major advantage of using the SVD-FMM is that the message size for each communication operation is dramatically reduced.

In the proposed parallelization scheme, we suggested using 64 PEs and letting one of every four PEs also conduct the computations for boxes in level 2. This could be particularly useful if the supercomputer configuration has non-uniform memory access (NUMA) nodes with locally shared memory (Grama et al., 2003). In this case, we could avoid the remote communications required in Step 2, Step 4 and the similar steps in the Initialization. Alternatively, we could assign the tasks for level 2 and level 3 to different PEs, i.e., using 80 PEs for our configuration of boxes. This could allow each PE to carry out a similar amount of work. More generally, we note that the practical implementation of the given parallelization scheme should be adjusted to suit the configuration of the supercomputer.

Table 1: The communication time for four distinct parallel formulations of matrix-vector multiplication with full matrices, and for the parallelization scheme of the SVD-FMM. The number of observations is $N_{\mathrm{obs}}$, the number of PEs is $q$, the number of leaf boxes is $B$ and the number of singular vectors is $p$. The column "# PEs" gives the number of PEs that are involved in a communication operation. The symbols $t_s$ and $t_w$ are the startup time and the per-word transfer time respectively, which are determined by the configuration of the parallel machine.

Parallelization scheme | Communication operation | # PEs | Message size | Communication time
Row-wise    | all-to-all broadcast | $q$ | $N_{\mathrm{obs}}/q$ | $t_s\log q + t_w\frac{N_{\mathrm{obs}}}{q}(q-1)$
Column-wise | all-to-one reduction | $q$ | $N_{\mathrm{obs}}$ | $(t_s + t_w N_{\mathrm{obs}})\log q$
            | scatter | $q$ | $N_{\mathrm{obs}}/q$ | $t_s\log q + t_w\frac{N_{\mathrm{obs}}}{q}(q-1)$
Block 2-D   | one-to-all broadcast | $\sqrt{q}$ | $N_{\mathrm{obs}}/\sqrt{q}$ | $(t_s + t_w N_{\mathrm{obs}}/\sqrt{q})\log\sqrt{q}$
            | all-to-one reduction | $\sqrt{q}$ | $N_{\mathrm{obs}}/\sqrt{q}$ | $(t_s + t_w N_{\mathrm{obs}}/\sqrt{q})\log\sqrt{q}$
Symmetric   | all-to-all broadcast | $q$ | $\approx N_{\mathrm{obs}}/(2q)$ (average) | $t_s\log q + t_w\frac{N_{\mathrm{obs}}}{2q}(q-1)$
            | all-to-one reduction | $q$ | $\approx N_{\mathrm{obs}}/(2q)$ (average) | $(t_s + t_w\frac{N_{\mathrm{obs}}}{2q})\log q$
SVD-FMM     | all-to-one reduction | 4 | $p$ | $(t_s + t_w p)\log 4$
            | all-to-all broadcast | $\le 27$ | $p$ or $N_{\mathrm{obs}}/B$ | $\lesssim t_s\log q + 27\,t_w\max(p, N_{\mathrm{obs}}/B)$
            | one-to-all broadcast | 4 | $p$ | $(t_s + t_w p)\log 4$

4 Experimental Design

Our numerical experiments are designed to demonstrate the potential of applying the SVD-FMM to compute the matrix-vector products involved in operational data assimilation. We investigate the accuracy of using the SVD-FMM under different scenarios that may occur in practical applications. We conduct serial calculations rather than using parallel computing because in this initial study we have chosen to study idealized problems of moderate size as this is sufficient for our goals.

For this initial study, we have not included matrix inversion as part of the SVD-FMM algorithm. However, to obtain our experimental results we require an inverse observation error covariance matrix. We use the INV function in MATLAB (MATLAB R2020b, a) to compute the inverses of $\mathbf{R}$ and of the reconditioned $\mathbf{R}$. The generation of the matrix $\mathbf{R}$ is described in section 4.2 and the reconditioning methods are given in section 4.3. The INV function uses an LDL decomposition (Golub and Van Loan, 1996). The required SVDs of the sub-matrices of $\mathbf{R}^{-1}$ (section 3.3) are computed using the SVDS function in MATLAB (MATLAB R2020b, b), which uses a Lanczos method and is efficient for finding a few singular values and vectors of a large matrix (Larsen, 1998; Baglama and Reichel, 2005).
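For readers without MATLAB, an equivalent Python workflow is sketched below (our own construction; `scipy.sparse.linalg.svds` is likewise a Lanczos-type routine for a few leading singular triplets, and the SOAR matrix here is only a stand-in for the experimental $\mathbf{R}$):

```python
import numpy as np
from scipy.sparse.linalg import svds

# Stand-in observation error covariance: SOAR correlation, unit standard deviation.
m = 500
dist = np.abs(np.subtract.outer(np.arange(m), np.arange(m))) * 12.0   # separations in km
R = (1.0 + dist / 80.0) * np.exp(-dist / 80.0)

A = np.linalg.inv(R)          # analogue of MATLAB's INV

# Truncated SVD of one sub-matrix of R^{-1}, analogue of MATLAB's SVDS.
sub = A[:100, 100:]
U, s, Vt = svds(sub, k=10)    # 10 leading singular triplets
s = s[::-1]                   # svds returns singular values in ascending order
```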

In the results section (section 5) we use the logarithm of the root-mean-squared error (RMSE) to assess the accuracy of the SVD-FMM. The RMSE is defined as

$\mathrm{RMSE} = \sqrt{\frac{1}{N_{\mathrm{obs}}}\sum_{i=1}^{N_{\mathrm{obs}}}\big(z^{\mathrm{FMM}}_i - z^{\mathrm{std}}_i\big)^2},$    (19)

where the superscript FMM denotes the matrix-vector product computed using the SVD-FMM and the superscript std denotes the matrix-vector product obtained by the standard approach. Note that the $\log_{10}(\mathrm{RMSE})$ shown in section 5 is averaged over 100 realizations, where each one uses a different $\mathbf{d}$.
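In code, Eq. (19) is one line (sketch):

```python
import numpy as np

def log10_rmse(z_fmm, z_std):
    """Log base-10 of the RMSE between SVD-FMM and standard products, Eq. (19)."""
    return np.log10(np.sqrt(np.mean((z_fmm - z_std) ** 2)))
```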

4.1 Observation distribution and observation-minus-model departures

To simulate observation data, we assume our observations are regularly distributed over a limited-area latitude-longitude domain with a grid-length of 12 km. This is similar to moderately thinned geostationary satellite data used in operational forecasting over the UK (Waller et al., 2016b). This results in 3456 observations and an error covariance matrix of size $3456 \times 3456$. For convenience, we directly sample the observation-minus-model departure vector $\mathbf{d}$ from a Gaussian distribution with mean zero and (innovation) covariance given by $\mathbf{R}$ plus the background error covariance. The background error covariance is modelled using the SOAR correlation function with a lengthscale of 20 km and a standard deviation of 0.6 (section 4.2). The correlation lengthscale and standard deviation are selected according to Ballard et al. (2016) to be appropriate for km-scale numerical weather prediction. The results presented in section 5 have been averaged over 100 realizations of $\mathbf{d}$.
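A sketch of this sampling step is given below (in Python; the 1-D separations, the mapping of the background term into observation space, and all sizes are simplifying assumptions of ours):

```python
import numpy as np

def soar(r, L):
    return (1.0 + r / L) * np.exp(-r / L)

m = 400
dist = np.abs(np.subtract.outer(np.arange(m), np.arange(m))) * 12.0   # km
R = soar(dist, 80.0)                  # observation error covariance (unit std dev)
B_obs = 0.6**2 * soar(dist, 20.0)     # background error term in observation space

# d ~ N(0, R + B_obs): sample via a Cholesky factor of the innovation covariance.
rng = np.random.default_rng(5)
L_chol = np.linalg.cholesky(R + B_obs)
d = L_chol @ rng.standard_normal(m)   # one realization of the departure vector
```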

In practice, the observation distribution may be different at each assimilation cycle because of factors such as quality control. Therefore, in some of our experiments we have also applied the SVD-FMM to compute the matrix-vector product with randomly chosen missing observations.

4.2 Modelling of the observation error covariance matrix

Modelling observation error covariance matrices using correlation functions is a useful approach to deal with observation error correlations (e.g., Stewart et al., 2013; Tabeart et al., 2018). Here we use several correlation functions to model the observation error covariance matrices. The first is the Gaussian correlation function (e.g., Haben, 2011),

$\rho(r) = \exp\left(-\frac{r^2}{2L^2}\right),$    (20)

where $L$ is the correlation lengthscale and $r$ denotes the great-circle distance between two observations. The second is the first-order auto-regressive (FOAR) correlation function, also called the Markov correlation function (e.g., Stewart et al., 2013),

$\rho(r) = \exp\left(-\frac{r}{L}\right).$    (21)

We also use the SOAR correlation function (e.g., Daley, 1994; Tabeart et al., 2018),

$\rho(r) = \left(1 + \frac{r}{L}\right)\exp\left(-\frac{r}{L}\right),$    (22)

and the Matérn 5/2 correlation function (e.g., Rasmussen and Williams, 2006),

$\rho(r) = \left(1 + \frac{\sqrt{5}\,r}{L} + \frac{5r^2}{3L^2}\right)\exp\left(-\frac{\sqrt{5}\,r}{L}\right).$    (23)

The observation error covariance matrix can be generated from the correlation functions by

$\mathbf{R} = \mathbf{D}\,\mathbf{C}\,\mathbf{D},$    (24)

where $\mathbf{C}$ is the correlation matrix given by one of Eqs. (20)-(23) and $\mathbf{D}$ is a diagonal matrix whose diagonal elements are the prescribed standard deviations. For our experiments, we choose the standard deviation equal to one and use three correlation lengthscales. These correlation lengthscales are selected based on the horizontal error correlations estimated for geostationary satellite data used in operational km-scale data assimilation (Waller et al., 2016b) and for AMVs (Cordoba et al., 2017). It should be noted that the Markov and Matérn 5/2 correlation functions may lead to a sparse $\mathbf{R}^{-1}$, in which case the SVD-FMM may not be the optimal choice. We use these correlation functions in our experiments because we wish to show that the SVD-FMM can be applied to a variety of matrices.
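The construction of Eqs. (20)-(24) is straightforward to code; the sketch below builds $\mathbf{R} = \mathbf{D}\mathbf{C}\mathbf{D}$ from any of the four correlation functions (the distance matrix is a plain argument here; in our experiments it would hold great-circle distances):

```python
import numpy as np

def gaussian(r, L):                            # Eq. (20)
    return np.exp(-r**2 / (2.0 * L**2))

def foar(r, L):                                # Eq. (21), Markov
    return np.exp(-r / L)

def soar(r, L):                                # Eq. (22)
    return (1.0 + r / L) * np.exp(-r / L)

def matern52(r, L):                            # Eq. (23)
    a = np.sqrt(5.0) * r / L
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def build_R(dist, corr, L, std=1.0):
    """R = D C D, Eq. (24): C from a correlation function, D diagonal of std devs."""
    C = corr(dist, L)
    D = np.diag(np.full(dist.shape[0], std))
    return D @ C @ D
```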

4.3 Reconditioning of the observation error covariance matrix

In practical applications, observation error covariance matrices are often ill-conditioned (Tabeart et al., 2020b; Haben et al., 2011a). Moreover, if the matrix has a large condition number, this can lead to poor convergence of the minimisation problem in variational data assimilation (Weston et al., 2014; Tabeart et al., 2018, 2020a). In our applications, the condition number of the matrices that are created using correlation functions is dependent on the chosen function and correlation lengthscale (Haben, 2011; Haben et al., 2011b).

We use two common matrix reconditioning techniques to reduce the condition number in our experiments: the ridge regression method and the minimum eigenvalue method (Tabeart et al., 2020b; Weston et al., 2014; Campbell et al., 2017; Bormann et al., 2016). In the ridge regression (RR) method the reconditioned matrix is given by

$\mathbf{R}_{\mathrm{RR}} = \mathbf{R} + \delta\mathbf{I},$    (25)

where $\mathbf{I}$ is the identity matrix and

$\delta = \frac{\lambda_{\max}(\mathbf{R}) - \kappa\,\lambda_{\min}(\mathbf{R})}{\kappa - 1},$    (26)

where $\lambda_{\max}(\mathbf{R})$ is the maximum eigenvalue of $\mathbf{R}$, $\lambda_{\min}(\mathbf{R})$ is the minimum eigenvalue of $\mathbf{R}$ and $\kappa$ is the required new condition number.

The minimum eigenvalue (ME) method changes the eigenvalue spectrum of the matrix by setting all eigenvalues smaller than a threshold value to the threshold value. The threshold value ($T$) is given in terms of the required condition number as

$T = \frac{\lambda_{\max}(\mathbf{R})}{\kappa}.$    (27)

The reconditioned matrix is then constructed via

$\mathbf{R}_{\mathrm{ME}} = \mathbf{V}\,\boldsymbol{\Lambda}_{\mathrm{new}}\,\mathbf{V}^{\mathrm{T}},$    (28)

where $\mathbf{V}$ is a square matrix whose columns are the eigenvectors of $\mathbf{R}$ and $\boldsymbol{\Lambda}_{\mathrm{new}}$ is a diagonal matrix with the diagonal elements being the new eigenvalues, $\lambda^{\mathrm{new}}_k = \max(\lambda_k, T)$.
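Both reconditioning methods are a few lines in Python; this sketch assumes a symmetric $\mathbf{R}$ and a required condition number `kappa`:

```python
import numpy as np

def ridge_regression(R, kappa):
    """RR method, Eqs. (25)-(26): add a scalar to the diagonal."""
    lam = np.linalg.eigvalsh(R)                        # ascending eigenvalues
    delta = (lam[-1] - kappa * lam[0]) / (kappa - 1.0)
    return R + delta * np.eye(R.shape[0])

def minimum_eigenvalue(R, kappa):
    """ME method, Eqs. (27)-(28): raise small eigenvalues to a threshold."""
    lam, V = np.linalg.eigh(R)
    T = lam[-1] / kappa                                # threshold value
    return (V * np.maximum(lam, T)) @ V.T              # V diag(new eigenvalues) V^T

# Both reconditioned matrices have condition number approximately kappa, e.g.,
# np.linalg.cond(ridge_regression(R, 1000.0)) is ~1000.
```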

5 Results

5.1 The number of singular vectors and the size of singular values

Figure 8: The $\log_{10}(\mathrm{RMSE})$ for the SVD-FMM against the number of singular vectors ($p$) used in the approximation. The $\mathbf{R}^{-1}$ matrices are given by inverting the FOAR and SOAR covariance matrices with a fixed correlation lengthscale. The $\log_{10}(\mathrm{RMSE})$ is averaged over 100 realizations of $\mathbf{d}$.
Figure 9: The $\log_{10}(\mathrm{RMSE})$ for the SVD-FMM using $p$ singular vectors against the log of the mean $(p+1)$th singular value of the sub-matrices of the inverse FOAR and SOAR covariance matrices used in Fig. 8.

We carried out an experiment to assess the accuracy of the SVD-FMM as the number of singular vectors ($p$) changes. As discussed in section 3.4, we expect the accuracy to depend on both $p$ and the $(p+1)$th singular values of the sub-matrices. Fig. 8 provides the $\log_{10}(\mathrm{RMSE})$ for the SVD-FMM with the FOAR and SOAR covariance matrices as a function of $p$. As anticipated, the $\log_{10}(\mathrm{RMSE})$ decreases, and hence the accuracy increases, as more singular vectors are used. Moreover, Fig. 9 demonstrates that the $\log_{10}(\mathrm{RMSE})$ for the SVD-FMM using $p$ singular vectors has an approximately linear relationship with the log of the mean $(p+1)$th singular value of the sub-matrices, which is given by

$\overline{\sigma}_{p+1} = \frac{1}{N_b}\sum_{i} \sigma^{(i)}_{p+1},$    (29)

where the sum is over the $N_b$ boxes in levels 2 and 3 and $\sigma^{(i)}_{p+1}$ is the $(p+1)$th singular value of the sub-matrix for box $i$. Additionally, we note that the SVD-FMM with the FOAR covariance matrix is more accurate than with the SOAR covariance matrix for all the values of $p$ considered. This is because the mean $(p+1)$th singular value of the sub-matrices of the inverse FOAR matrix is consistently smaller than that of the inverse SOAR matrix (Fig. 9). We have also carried out several other experiments using different correlation lengthscales, different correlation functions and reconditioned covariance matrices and found similar qualitative results: the accuracy of the SVD-FMM using $p$ singular vectors relies on the $(p+1)$th singular values of the sub-matrices.

These results using the mean singular values of the sub-matrices provide some guidance as to how to set the value of $p$ in a given application. However, as they require the computation of the SVDs of all of the relevant sub-matrices of $\mathbf{R}^{-1}$, they are time-consuming to obtain.

5.2 Varying correlation lengthscale

Figure 10: As Fig. 8, but the $\mathbf{R}^{-1}$ matrices are given by inverting SOAR covariance matrices with three different correlation lengthscales.

Different observations can exhibit correlated errors over different lengthscales. To examine how the correlation lengthscale affects the accuracy of the SVD-FMM, we use three different correlation lengthscales for the SOAR correlation function. Fig. 10 reveals an increase of the $\log_{10}(\mathrm{RMSE})$ of the SVD-FMM with correlation lengthscale. Nevertheless, we still only need a few singular vectors to obtain a small $\log_{10}(\mathrm{RMSE})$. However, if we wish to obtain a given accuracy, we may need to use a larger value of $p$ for longer lengthscales. In addition, we note that the magnitude of the elements of $\mathbf{R}^{-1}$ increases as the correlation lengthscale increases. For a smaller lengthscale, the far-field elements of $\mathbf{R}^{-1}$ are close to zero and could be neglected to give a sparse matrix. Moreover, compared to the SOAR correlation function, the Matérn 5/2 and Markov functions are more likely to give a sparse $\mathbf{R}^{-1}$ if the correlation lengthscale is small.

The variation of the accuracy of the SVD-FMM with correlation lengthscale can also be explained by the singular values of the sub-matrices. We find in our numerical experiments (not shown) that the singular values of the sub-matrices are smaller when the correlation lengthscale is smaller. It is known that the maximum singular value of the full matrix (the inverse SOAR covariance matrix) also decreases as the correlation lengthscale reduces (Haben, 2011, section 5.3.2). Therefore, we investigated whether it would be possible to use the singular values of the full matrix to estimate the accuracy of the SVD-FMM. Thompson (1972) showed that the singular values of the sub-matrices are bounded by the singular values of the full matrix, i.e.,

$\sigma_k(\mathbf{A}') \le \sigma_k(\mathbf{A}),$    (30)

where $\mathbf{A}'$ denotes a sub-matrix of $\mathbf{A} = \mathbf{R}^{-1}$, $\sigma_k(\mathbf{A}')$ denotes the $k$-th largest singular value of $\mathbf{A}'$ and $\sigma_k(\mathbf{A})$ denotes the $k$-th largest singular value of $\mathbf{A}$. Equality can be obtained for some choices of $\mathbf{A}'$. However, since the dimensions of the sub-matrices used in the SVD-FMM are much smaller than the dimensions of the full matrix, our numerical experiments showed that $\sigma_k(\mathbf{A}')$ is typically much smaller than $\sigma_k(\mathbf{A})$ in practice. We compared the maximum singular values of different full matrices numerically and found that they could provide a rough guide to relative accuracies: for a given value of $p$, the best accuracy was obtained for the full matrix with the smallest maximum singular value.
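The bound of Eq. (30) is easy to check numerically; the snippet below (an arbitrary random symmetric matrix and sub-matrix of our own choosing) verifies it:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 50))
A = A + A.T                                        # symmetric, like R^{-1}
sub = A[np.ix_(np.arange(10), np.arange(20, 50))]  # an off-diagonal sub-matrix

s_full = np.linalg.svd(A, compute_uv=False)        # descending singular values
s_sub = np.linalg.svd(sub, compute_uv=False)
assert np.all(s_sub <= s_full[:s_sub.size] + 1e-12)   # Eq. (30)
```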

5.3 Reconditioned covariance matrices

Figure 11: As Fig. 8, but the weighting matrices are given by inverting reconditioned SOAR covariance matrices. The ridge regression (RR) and minimum eigenvalue (ME) methods are used to reduce the condition number of the SOAR covariance matrix to three target values.

The observation error covariance matrices used in data assimilation procedures are often ill-conditioned, and thus, in order to invert them, we need to reduce their condition numbers using appropriate techniques (Tabeart et al., 2020b). Reconditioning the covariance matrices may also improve the convergence behaviour of the minimisation procedure (Weston et al., 2014; Tabeart et al., 2020a, 2021). We evaluate how reconditioning affects the accuracy of the SVD-FMM: we reduce the condition number of the SOAR covariance matrix to three target condition numbers using the RR and ME methods. Fig. 11 shows that reconditioning the SOAR covariance matrix can improve the accuracy of the SVD-FMM; the smaller the condition number of the SOAR covariance matrix after reconditioning, the better the accuracy.

We also find that the RR method gives a smaller log(RMSE) than the ME method for the same target condition number. This difference is caused by the different ways the two reconditioning methods reduce the condition number, which leads to the maximum singular values of the full inverse covariance matrices satisfying

\[
\sigma_{\max}\!\left(\mathbf{R}^{-1}_{\mathrm{RR}}\right) \le \sigma_{\max}\!\left(\mathbf{R}^{-1}_{\mathrm{ME}}\right), \qquad (31)
\]

where $\sigma_{\max}(\mathbf{R}^{-1}_{\mathrm{RR}})$ and $\sigma_{\max}(\mathbf{R}^{-1}_{\mathrm{ME}})$ denote the maximum singular values of the inverses of the reconditioned matrices obtained using the RR and ME methods, respectively. Combining Eq. (30) and Eq. (31), the leading singular values of the sub-matrices of $\mathbf{R}^{-1}_{\mathrm{RR}}$ are expected to be typically smaller than the leading singular values of the sub-matrices of $\mathbf{R}^{-1}_{\mathrm{ME}}$. Hence, the SVD-FMM with matrices reconditioned using the RR method will usually have a smaller error. A derivation of Eq. (31) is presented in the Appendix.

5.4 Varying correlation functions

Figure 12: As Fig. 8, but the weighting matrices are given by inverting reconditioned Gaussian, FOAR, SOAR and Matérn covariance matrices with a correlation lengthscale of 80 km. All covariance matrices are reconditioned to have a new condition number of 1000 using the RR method prior to inversion.

To provide more numerical evidence for the wide applicability of the SVD-FMM, we create covariance matrices using correlation functions other than SOAR and FOAR. To allow comparison with Fig. 11, we use the RR method to reduce the condition number of all the matrices to 1000 before inverting them. Fig. 12 shows that the SVD-FMM works well with the inverses of the matrices given by the different correlation functions. We note that although the condition number of each matrix is identical after reconditioning, the accuracy of the SVD-FMM still relates to the original condition number: the larger the condition number prior to reconditioning, the greater the log(RMSE). In our experiments the FOAR correlation matrix has a condition number on the order of thousands, while the Gaussian correlation matrix is extremely badly conditioned prior to reconditioning, with a condition number many orders of magnitude larger.

By comparing Fig. 12 with Fig. 11, we also observe that reducing the condition number of the Gaussian correlation matrix to 1000 using the RR method gives a larger log(RMSE) than reducing the condition number of the SOAR correlation matrix to 3000 using the same method. Therefore, to acquire comparable accuracy from the SVD-FMM with the same value of $p$, we may need to reduce the condition numbers of two matrices to different values: the matrix with the larger condition number prior to reconditioning may need the smaller condition number after reconditioning.

5.5 Removing a portion of observations

In operational data assimilation, the number of observations varies from one assimilation cycle to the next, due to quality control among other reasons. This reduces the dimension of the covariance matrix and disrupts its structure: the resulting covariance matrix lacks some of the rows and columns of the original. The missing observations may cause a problem if a leaf box becomes empty or contains fewer observations than the number of singular vectors, $p$; this could be solved by using a different partition of the observations. In our experiment, we randomly select the locations of the missing observations, and with up to 25% missing observations, every leaf box always contains enough observations. Fig. 13 shows that the SVD-FMM still works well with missing observations, without losing accuracy. In our experiments using an inverse SOAR covariance matrix, we find that the missing observations actually lead to a slight decrease in the log(RMSE) compared with the full set of observations. The difference in the log(RMSE) between removing 10% of the observations and removing 25% of the observations is not statistically significant.

Figure 13: As Fig. 8, but the weighting matrices are given by inverse SOAR covariance matrices with randomly chosen missing observations.

6 Conclusion and discussion

Some observations have been shown to exhibit strong spatial error correlations, e.g., Doppler radar radial winds, geostationary satellite data, and AMVs. Accounting for this information in data assimilation systems can improve analysis accuracy and forecast skill. However, it can make the computation of the products of observation weighting matrices and observation-minus-model departure vectors very expensive, in terms of not only the computational complexity but also the communication costs in parallel computing. We have therefore investigated the use of the SVD-FMM for the rapid computation of these matrix-vector products. This numerical approximation method is best suited to full or dense weighting matrices; if the observation weighting matrix is sparse, it might be appropriate to compute only the near-field contribution. Moreover, the SVD-FMM is suitable for large problems, where the overhead of computing the SVD-FMM expansions is outweighed by the reductions in floating-point operations and communication costs compared with alternative approaches, and where each PE has enough memory to store its part of the observation error covariance matrix. The exact range of observation counts for which the SVD-FMM is useful depends on the computer architecture and the constraints of the operational schedule.

We proposed a novel possible parallelization scheme for the SVD-FMM in our application. In comparison with parallel formulations of direct matrix-vector multiplication (section 2), the parallelization scheme for the SVD-FMM greatly reduces the size of the messages transferred. This reduces communication costs, making the use of full observation error covariance matrices associated with large spatial extents potentially feasible in operational data assimilation.

In our idealised experiments we have examined the accuracy of the SVD-FMM applied to the inverses of various observation error covariance matrices. These matrices are created using commonly used correlation functions, such as the Gaussian and SOAR correlation functions, and different correlation lengthscales. In some of our experiments, we applied reconditioning methods to the covariance matrices before inverting them. We also investigated the feasibility of the SVD-FMM with randomly chosen missing observations. We find results consistent with the discussion in section 3.4: i) the accuracy of the SVD-FMM increases as more singular vectors are used; and ii) the log(RMSE) varies approximately linearly with the log of the mean $(p+1)$th singular value of the sub-matrices used in the approximation. Furthermore, our experiments indicate that a comparison of the maximum singular values of the full inverse matrices themselves can be used as a rough guide to determine which matrices will give more accurate results with the SVD-FMM.

The most computationally expensive parts of our current implementation of the SVD-FMM are the inversion of the covariance matrix and the SVDs of the sub-matrices of the inverse matrix. These need to be re-computed every time the observation error covariance matrix changes, which happens when the number or the distribution of the observations changes. Nevertheless, our work has shown that the SVD-FMM has potential for use in operational data assimilation for the fast computation of the products of inverse observation error covariance matrices and observation-minus-model departure vectors. This would allow a large volume of observational data to be assimilated within a short time interval, which is particularly important for applications such as convection-permitting hazardous weather forecasting.

Acknowledgements

The authors would like to express their thanks to Amos Lawless, Nancy K. Nichols, David Simonin and Joanne A. Waller for their useful discussions. G. Hu and S. L. Dance were funded in part by the UK EPSRC DARE project (EP/P002331/1). The MATLAB code for SVD-FMM developed by the authors is available on request from the corresponding author.

Conflict of Interest

We declare we have no competing interests.

References

  • Anderson (2003) J. L. Anderson. A Local Least Squares Framework for Ensemble Filtering. Monthly Weather Review, 131(4):634 – 642, 2003. doi: 10.1175/1520-0493(2003)131<0634:ALLSFF>2.0.CO;2. URL https://journals.ametsoc.org/view/journals/mwre/131/4/1520-0493_2003_131_0634_allsff_2.0.co_2.xml.
  • Baglama and Reichel (2005) J. Baglama and L. Reichel. Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM Journal on Scientific Computing, 27(1):19–42, 2005.
  • Ballard et al. (2016) S. P. Ballard, Z. Li, D. Simonin, and J.-F. Caron. Performance of 4D-Var NWP-based nowcasting of precipitation at the Met Office for summer 2012. Quarterly Journal of the Royal Meteorological Society, 142(694):472–487, 2016. doi: https://doi.org/10.1002/qj.2665. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.2665.
  • Bédard and Buehner (2020) J. Bédard and M. Buehner. A practical assimilation approach to extract smaller-scale information from observations with spatially correlated errors: An idealized study. Quarterly Journal of the Royal Meteorological Society, 146(726):468–482, 2020. doi: https://doi.org/10.1002/qj.3687. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.3687.
  • Bernstein (2009) D. S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas - Second Edition. Princeton University Press, Princeton, 2009. ISBN 978-1-4008-3334-4. doi: https://doi.org/10.1515/9781400833344.
  • Bormann et al. (2016) N. Bormann, M. Bonavita, R. Dragani, R. Eresmaa, M. Matricardi, and A. McNally. Enhancing the impact of IASI observations through an updated observation-error covariance matrix. Quarterly Journal of the Royal Meteorological Society, 142(697):1767–1780, 2016.
  • Campbell et al. (2017) W. F. Campbell, E. A. Satterfield, B. Ruston, and N. L. Baker. Accounting for correlated observation error in a dual-formulation 4D variational data assimilation system. Monthly Weather Review, 145(3):1019–1032, 2017.
  • Cordoba et al. (2017) M. Cordoba, S. L. Dance, G. A. Kelly, N. K. Nichols, and J. A. Waller. Diagnosing atmospheric motion vector observation errors for an operational high-resolution data assimilation system. Quarterly Journal of the Royal Meteorological Society, 143(702):333–341, 2017. doi: 10.1002/qj.2925. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.2925.
  • Daley (1994) R. Daley. Atmospheric data analysis. Cambridge University Press, 1994.
  • Dance et al. (2019) S. L. Dance, S. P. Ballard, R. N. Bannister, P. Clark, H. L. Cloke, T. Darlington, D. L. A. Flack, S. L. Gray, L. Hawkness-Smith, N. Husnoo, A. J. Illingworth, G. A. Kelly, H. W. Lean, D. Li, N. K. Nichols, J. C. Nicol, A. Oxley, R. S. Plant, N. M. Roberts, I. Roulstone, D. Simonin, R. J. Thompson, and J. A. Waller. Improvements in Forecasting Intense Rainfall: Results from the FRANC (Forecasting Rainfall Exploiting New Data Assimilation Techniques and Novel Observations of Convection) Project. Atmosphere, 10(3), 2019. ISSN 2073-4433. doi: 10.3390/atmos10030125. URL https://www.mdpi.com/2073-4433/10/3/125.
  • Deng (2012) Y. Deng. Applied Parallel Computing. World Scientific Publishing Co., Inc., USA, 2012. ISBN 9789814307604.
  • Desroziers et al. (2005) G. Desroziers, L. Berre, B. Chapnik, and P. Poli. Diagnosis of observation, background and analysis-error statistics in observation space. Quarterly Journal of the Royal Meteorological Society, 131(613):3385–3396, 2005. doi: https://doi.org/10.1256/qj.05.108. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1256/qj.05.108.
  • Fowler et al. (2018) A. M. Fowler, S. L. Dance, and J. A. Waller. On the interaction of observation and prior error correlations in data assimilation. Quarterly Journal of the Royal Meteorological Society, 144(710):48–62, 2018. doi: https://doi.org/10.1002/qj.3183. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.3183.
  • Gargantini (1982) I. Gargantini. An effective way to represent quadtrees. Communications of the ACM, 25(12):905–910, 1982.
  • Geus and Röllin (2001) R. Geus and S. Röllin. Towards a fast parallel sparse symmetric matrix–vector multiplication. Parallel Computing, 27(7):883–896, 2001. ISSN 0167-8191. doi: https://doi.org/10.1016/S0167-8191(01)00073-4. URL https://www.sciencedirect.com/science/article/pii/S0167819101000734. Linear systems and associated problems.
  • Gimbutas and Rokhlin (2003) Z. Gimbutas and V. Rokhlin. A generalized fast multipole method for nonoscillatory kernels. SIAM Journal on Scientific Computing, 24(3):796–817, 2003.
  • Golub and Van Loan (1996) G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
  • Grama et al. (2003) A. Grama, G. Karypis, V. Kumar, and A. Gupta. Introduction to parallel computing. Addison-Wesley, 2003.
  • Greengard and Gropp (1990) L. Greengard and W. D. Gropp. A parallel version of the fast multipole method. Computers & Mathematics with Applications, 20(7):63–71, 1990.
  • Greengard and Rokhlin (1997) L. Greengard and V. Rokhlin. A Fast Algorithm for Particle Simulations. Journal of Computational Physics, 135(2):280 – 292, 1997. ISSN 0021-9991. doi: https://doi.org/10.1006/jcph.1997.5706. URL http://www.sciencedirect.com/science/article/pii/S0021999197957065.
  • Guillet et al. (2019) O. Guillet, A. T. Weaver, X. Vasseur, Y. Michel, S. Gratton, and S. Gürol. Modelling spatially correlated observation errors in variational data assimilation using a diffusion operator on an unstructured mesh. Quarterly Journal of the Royal Meteorological Society, 145(722):1947–1967, 2019. doi: 10.1002/qj.3537. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.3537.
  • Haben et al. (2011a) S. Haben, A. Lawless, and N. Nichols. Conditioning and preconditioning of the variational data assimilation problem. Computers & Fluids, 46(1):252 – 256, 2011a. ISSN 0045-7930. doi: https://doi.org/10.1016/j.compfluid.2010.11.025. URL http://www.sciencedirect.com/science/article/pii/S0045793010003373. 10th ICFD Conference Series on Numerical Methods for Fluid Dynamics (ICFD 2010).
  • Haben (2011) S. A. Haben. Conditioning and preconditioning of the minimisation problem in variational data assimilation. PhD thesis, University of Reading, 2011.
  • Haben et al. (2011b) S. A. Haben, A. Lawless, and N. Nichols. Conditioning of incremental variational data assimilation, with application to the Met Office system. Tellus A: Dynamic Meteorology and Oceanography, 63(4):782–792, 2011b. doi: 10.1111/j.1600-0870.2011.00527.x. URL https://doi.org/10.1111/j.1600-0870.2011.00527.x.
  • Healy and White (2005) S. Healy and A. White. Use of discrete Fourier transforms in the 1D-Var retrieval problem. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 131(605):63–72, 2005.
  • Honda et al. (2018) T. Honda, T. Miyoshi, G.-Y. Lien, S. Nishizawa, R. Yoshida, S. A. Adachi, K. Terasaki, K. Okamoto, H. Tomita, and K. Bessho. Assimilating All-Sky Himawari-8 Satellite Infrared Radiances: A Case of Typhoon Soudelor (2015). Monthly Weather Review, 146(1):213 – 229, 2018. doi: 10.1175/MWR-D-16-0357.1. URL https://journals.ametsoc.org/view/journals/mwre/146/1/mwr-d-16-0357.1.xml.
  • Hu and Franzke (2017) G. Hu and C. L. E. Franzke. Data Assimilation in a Multi-Scale Model. Mathematics of Climate and Weather Forecasting, 3(1):118–139, 2017. doi: 10.1515/mcwf-2017-0006. URL https://doi.org/10.1515/mcwf-2017-0006.
  • Hu and Franzke (2020) G. Hu and C. L. E. Franzke. Evaluation of Daily Precipitation Extremes in Reanalysis and Gridded Observation-Based Data Sets Over Germany. Geophysical Research Letters, 47(18):e2020GL089624, 2020. doi: https://doi.org/10.1029/2020GL089624. URL https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2020GL089624.
  • Janjić et al. (2018) T. Janjić, N. Bormann, M. Bocquet, J. Carton, S. Cohn, S. Dance, S. Losa, N. Nichols, R. Potthast, J. Waller, et al. On the representation error in data assimilation. Quarterly Journal of the Royal Meteorological Society, 144(713):1257–1278, 2018.
  • Larsen (1998) R. Larsen. Lanczos Bidiagonalization With Partial Reorthogonalization. DAIMI Report Series, 27(537), Dec. 1998. doi: 10.7146/dpb.v27i537.7070. URL https://tidsskrift.dk/daimipb/article/view/7070.
  • Liu and Rabier (2003) Z.-Q. Liu and F. Rabier. The potential of high-density observations for numerical weather prediction: A study with simulated observations. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 129(594):3013–3035, 2003.
  • Lorenc et al. (2000) A. Lorenc, S. Ballard, R. Bell, N. Ingleby, P. Andrews, D. Barker, J. Bray, A. Clayton, T. Dalby, D. Li, et al. The Met. Office global three-dimensional variational data assimilation scheme. Quarterly Journal of the Royal Meteorological Society, 126(570):2991–3012, 2000.
  • MATLAB R2020b (a) MATLAB R2020b. INV function in MATLAB R2020b. https://uk.mathworks.com/help/matlab/ref/inv.html, a. Last visited: 2020-11-23.
  • MATLAB R2020b (b) MATLAB R2020b. SVDS function in MATLAB R2020b. https://uk.mathworks.com/help/matlab/ref/svds.html, b. Last visited: 2020-11-23.
  • Michel (2018) Y. Michel. Revisiting Fisher’s approach to the handling of horizontal spatial correlations of observation errors in a variational framework. Quarterly Journal of the Royal Meteorological Society, 144(716):2011–2025, 2018. doi: https://doi.org/10.1002/qj.3249. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.3249.
  • Nichols (2010) N. K. Nichols. Mathematical Concepts of Data Assimilation. In W. Lahoz, B. Khattatov, and R. Menard, editors, Data Assimilation: Making Sense of Observations, pages 13–39. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010. ISBN 978-3-540-74703-1. doi: 10.1007/978-3-540-74703-1_2. URL https://doi.org/10.1007/978-3-540-74703-1_2.
  • Nino-Ruiz et al. (2019) E. D. Nino-Ruiz, A. Sandu, and X. Deng. A parallel implementation of the ensemble Kalman filter based on modified Cholesky decomposition. Journal of Computational Science, 36:100654, 2019. ISSN 1877-7503. doi: https://doi.org/10.1016/j.jocs.2017.04.005. URL https://www.sciencedirect.com/science/article/pii/S187775031730399X.
  • Rainwater et al. (2015) S. Rainwater, C. H. Bishop, and W. F. Campbell. The benefits of correlated observation errors for small scales. Quarterly Journal of the Royal Meteorological Society, 141(693):3439–3445, 2015. doi: https://doi.org/10.1002/qj.2582. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.2582.
  • Rasmussen and Williams (2006) C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, Massachusetts, 2006.
  • Rawlins et al. (2007) F. Rawlins, S. Ballard, K. Bovis, A. Clayton, D. Li, G. Inverarity, A. Lorenc, and T. Payne. The Met Office global four-dimensional variational data assimilation scheme. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 133(623):347–362, 2007.
  • Rokhlin (1985) V. Rokhlin. Rapid solution of integral equations of classical potential theory. Journal of computational physics, 60(2):187–207, 1985.
  • Schmetz et al. (2002) J. Schmetz, P. Pili, S. Tjemkes, D. Just, J. Kerkmann, S. Rota, and A. Ratier. An introduction to Meteosat Second Generation (MSG). Bulletin of the American Meteorological Society, 83(7):977 – 992, 2002. doi: 10.1175/1520-0477(2002)083<0977:AITMSG>2.3.CO;2.
  • Schmit et al. (2017) T. J. Schmit, P. Griffith, M. M. Gunshor, J. M. Daniels, S. J. Goodman, and W. J. Lebair. A Closer Look at the ABI on the GOES-R Series. Bulletin of the American Meteorological Society, 98(4):681 – 698, 2017. doi: 10.1175/BAMS-D-15-00230.1.
  • Seity et al. (2011) Y. Seity, P. Brousseau, S. Malardel, G. Hello, P. Bénard, F. Bouttier, C. Lac, and V. Masson. The AROME-France Convective-Scale Operational Model. Monthly Weather Review, 139(3):976 – 991, 2011. doi: 10.1175/2010MWR3425.1.
  • Simonin et al. (2019) D. Simonin, J. A. Waller, S. P. Ballard, S. L. Dance, and N. K. Nichols. A pragmatic strategy for implementing spatially correlated observation errors in an operational system: An application to Doppler radial winds. Quarterly Journal of the Royal Meteorological Society, 145(723):2772–2790, 2019.
  • Stewart et al. (2008) L. M. Stewart, S. Dance, and N. Nichols. Correlated observation errors in data assimilation. International journal for numerical methods in fluids, 56(8):1521–1527, 2008.
  • Stewart et al. (2013) L. M. Stewart, S. L. Dance, and N. K. Nichols. Data assimilation with correlated observation errors: experiments with a 1-D shallow water model. Tellus A: Dynamic Meteorology and Oceanography, 65(1):19546, 2013.
  • Tabeart et al. (2018) J. M. Tabeart, S. L. Dance, S. A. Haben, A. S. Lawless, N. K. Nichols, and J. A. Waller. The conditioning of least-squares problems in variational data assimilation. Numerical Linear Algebra with Applications, 25(5):e2165, 2018.
  • Tabeart et al. (2020a) J. M. Tabeart, S. L. Dance, A. S. Lawless, S. Migliorini, N. K. Nichols, F. Smith, and J. A. Waller. The impact of using reconditioned correlated observation-error covariance matrices in the Met Office 1D-Var system. Quarterly Journal of the Royal Meteorological Society, 146(728):1372–1390, 2020a.
  • Tabeart et al. (2020b) J. M. Tabeart, S. L. Dance, A. S. Lawless, N. K. Nichols, and J. A. Waller. Improving the condition number of estimated covariance matrices. Tellus A: Dynamic Meteorology and Oceanography, 72(1):1–19, 2020b.
  • Tabeart et al. (2021) J. M. Tabeart, S. L. Dance, A. S. Lawless, N. K. Nichols, and J. A. Waller. New bounds on the condition number of the Hessian of the preconditioned variational data assimilation problem. Numerical Linear Algebra with Applications, page e2405, 2021. doi: https://doi.org/10.1002/nla.2405. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.2405.
  • Tandeo et al. (2020) P. Tandeo, P. Ailliot, M. Bocquet, A. Carrassi, T. Miyoshi, M. Pulido, and Y. Zhen. A Review of Innovation-Based Methods to Jointly Estimate Model and Observation Error Covariance Matrices in Ensemble Data Assimilation. Monthly Weather Review, 148(10):3973 – 3994, 2020. doi: 10.1175/MWR-D-19-0240.1. URL https://journals.ametsoc.org/view/journals/mwre/148/10/mwrD190240.xml.
  • Thompson (1972) R. Thompson. Principal submatrices IX: Interlacing inequalities for singular values of submatrices. Linear Algebra and its Applications, 5(1):1 – 12, 1972. ISSN 0024-3795. doi: https://doi.org/10.1016/0024-3795(72)90013-4. URL http://www.sciencedirect.com/science/article/pii/0024379572900134.
  • Waller et al. (2016a) J. Waller, D. Simonin, S. Dance, N. Nichols, and S. Ballard. Diagnosing observation error correlations for Doppler radar radial winds in the Met Office UKV model using observation-minus-background and observation-minus-analysis statistics. Monthly Weather Review, 144(10):3533–3551, 2016a.
  • Waller et al. (2016b) J. A. Waller, S. P. Ballard, S. L. Dance, G. Kelly, N. K. Nichols, and D. Simonin. Diagnosing horizontal and inter-channel observation error correlations for SEVIRI observations using observation-minus-background and observation-minus-analysis statistics. Remote sensing, 8(7):581, 2016b.
  • Waller et al. (2019) J. A. Waller, E. Bauernschubert, S. L. Dance, N. K. Nichols, R. Potthast, and D. Simonin. Observation Error Statistics for Doppler Radar Radial Wind Superobservations Assimilated into the DWD COSMO-KENDA System. Monthly Weather Review, 147(9):3351 – 3364, 2019. doi: 10.1175/MWR-D-19-0104.1. URL https://journals.ametsoc.org/view/journals/mwre/147/9/mwr-d-19-0104.1.xml.
  • Weston et al. (2014) P. Weston, W. Bell, and J. Eyre. Accounting for correlated error in the assimilation of high-resolution sounder data. Quarterly Journal of the Royal Meteorological Society, 140(685):2420–2429, 2014.
  • WMO (2017) WMO. Guidelines on Best Practices for Achieving User Readiness for New Meteorological Satellites, 2017. WMO-No. 1187.
  • Yang et al. (2017) J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo. Introducing the New Generation of Chinese Geostationary Weather Satellites, Fengyun-4. Bulletin of the American Meteorological Society, 98(8):1637 – 1658, 2017. doi: 10.1175/BAMS-D-16-0065.1.

Appendix: Maximum Singular Values of Reconditioned Matrices Using the RR and ME Methods

In this appendix we derive the result given in Eq. (31). The RR method increases each of the eigenvalues of the original matrix by the same amount, by adding the increment given by Eq. (26) to the diagonal. In contrast, the ME method changes only the smallest eigenvalues, setting them to the value given by Eq. (27). Let $\lambda_{\min}(\mathbf{R}_{\mathrm{RR}})$ denote the minimum eigenvalue of the matrix reconditioned using the RR method and $\lambda_{\min}(\mathbf{R}_{\mathrm{ME}})$ denote the minimum eigenvalue of the matrix reconditioned using the ME method; then we have

\[
\lambda_{\min}\!\left(\mathbf{R}_{\mathrm{RR}}\right) \ge \lambda_{\min}\!\left(\mathbf{R}_{\mathrm{ME}}\right). \qquad (32)
\]

The reciprocal of the minimum eigenvalue of a matrix is the maximum eigenvalue of its inverse, and hence we have

\[
\lambda_{\max}\!\left(\mathbf{R}^{-1}_{\mathrm{RR}}\right) = \frac{1}{\lambda_{\min}\!\left(\mathbf{R}_{\mathrm{RR}}\right)} \le \frac{1}{\lambda_{\min}\!\left(\mathbf{R}_{\mathrm{ME}}\right)} = \lambda_{\max}\!\left(\mathbf{R}^{-1}_{\mathrm{ME}}\right). \qquad (33)
\]

Since the eigenvalues and singular values of a symmetric positive semidefinite matrix coincide (Bernstein, 2009, Definition 5.6.1), we obtain Eq. (31).