1 Introduction
In variational data assimilation (e.g., Lorenc et al., 2000; Rawlins et al., 2007), a nonlinear least-squares problem is solved, in which observations and model forecasts are blended, taking account of their uncertainties. The minimization procedure involves time-consuming computations of large matrix-vector products. In this paper we focus on the matrix-vector products involving the observations. These take the form $\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}$, where $\mathbf{R}^{-1}$ is the inverse of the observation error covariance matrix and $\mathbf{d}$ is the observation-minus-model departure vector (see section 2).
In some practical numerical weather prediction applications, observation errors are assumed to be uncorrelated, resulting in the matrix $\mathbf{R}$ being diagonal. This reduces the number of operations required to compute matrix-vector products in the minimization, and is a pragmatic strategy when the characteristics of the observation uncertainty are not well understood (e.g., Liu and Rabier, 2003). However, a number of idealized and operational studies have shown that there are significant benefits to treating full observation error covariance matrices in terms of analysis information content, analysis accuracy and forecast skill, even when knowledge of the observation error correlations is only approximate (e.g., Healy and White, 2005; Stewart et al., 2008, 2013; Weston et al., 2014; Simonin et al., 2019; Bédard and Buehner, 2020). In particular, implementation of spatial observation error correlations modifies the length-scales of the observation increments computed by the assimilation (e.g., Fowler et al., 2018; Rainwater et al., 2015; Simonin et al., 2019), which may be especially important for multiscale systems such as convection-permitting numerical weather prediction and reanalyses (e.g., Dance et al., 2019; Hu and Franzke, 2020) or coupled ocean-atmosphere systems (e.g., Hu and Franzke, 2017). Furthermore, practical methods have been developed to estimate the observation uncertainty characteristics for a range of observation types (see the reviews by
Janjić et al. (2018) and Tandeo et al. (2020)). For example, Doppler radar radial winds, geostationary satellite data and atmospheric motion vectors (AMVs) have been shown to exhibit strong spatial error correlations (Waller et al., 2016a, b; Cordoba et al., 2017; Waller et al., 2019; Honda et al., 2018). We note that in practical applications, the matrix $\mathbf{R}$ is typically treated as block-diagonal, with one block per observation type (for a given time). The observation errors between different types of observations are assumed to be uncorrelated. To give an idea of the expected size of the spatially correlated observation error covariance matrices, we consider geostationary satellite observations from the SEVIRI (Spinning Enhanced Visible and Infrared Imager) instrument (Waller et al., 2016b; Michel, 2018; Schmetz et al., 2002). SEVIRI measures top-of-the-atmosphere radiances in 12 channels, with a 15-minute repeat cycle and at approximately 3 km spatial resolution (excluding the high-resolution visible (HRV) channel; Waller et al., 2016b; Schmetz et al., 2002). Each image consists of around $3712 \times 3712$ pixels for the thermal infrared and solar channels (Schmetz et al., 2002). Hence, there are a vast number of SEVIRI observations, even for a regional domain such as that covered by the operational AROME model from Météo-France (Seity et al., 2011). In operational assimilation applications, to avoid treating spatially correlated observation errors, spatial thinning of SEVIRI observations is typically applied; this reduces the number of observations by several orders of magnitude, but also reduces their benefits in improving forecast skill (Waller et al., 2016b; Michel, 2018). The next generation of meteorological satellites will produce observations with even higher spatial resolutions (WMO, 2017, Table 1).
For example, the Flexible Combined Imager (FCI) on the Meteosat Third Generation (MTG) satellite and the Advanced Baseline Imager (ABI) on the Geostationary Operational Environmental Satellite-R (Schmit et al., 2017) both take measurements at approximately 0.5-2 km spatial resolution, and the Advanced Geosynchronous Radiation Imager (AGRI) on the FengYun-4 geostationary meteorological satellite produces observations at resolutions from 1 to 4 km (Yang et al., 2017).
Accounting for spatially correlated error statistics in the data assimilation algorithm has the potential to increase the computational cost for several reasons: (1) the computation of large matrix-vector products using parallel computing techniques, (2) the need to compute products using the inverse covariance matrix, which may be different in each assimilation cycle, (3) changes to the convergence behaviour of the minimization procedure. We now review these issues in more detail.
The first issue is that the parallel computation of a large matrix-vector product with observation data distributed across multiple processing elements (PEs) may become expensive due to excessive communications between PEs and load imbalance overheads (Deng, 2012). Different partitions of the matrix $\mathbf{R}^{-1}$ will lead to different parallelization schemes, which will be described in section 2. In general, in order to reduce the time spent on communication operations, it is necessary to reduce the number and/or size of messages transferred. In the context of data assimilation, much of this communication can be avoided if all the observations with mutually correlated errors are assigned to one PE, as in Simonin et al. (2019). However, this relies on the data volume of observations with mutually correlated errors being small enough for the product to be calculated on one node. It is an open question how to implement computations for larger data volumes with distributed data. While our paper is focused on parallel implementations for variational assimilation, studies such as Anderson (2003) and Nino-Ruiz et al. (2019) address parallel implementations for ensemble methods.
The second issue is that there is a need to use the inverse observation error covariance matrix ($\mathbf{R}^{-1}$) in the computations. The most commonly used observation uncertainty diagnosis techniques provide an estimate of the observation error covariance matrix itself rather than its inverse (Desroziers et al., 2005). Since the observation distribution changes each assimilation cycle (due to quality control etc.), there is a different inverse matrix each cycle. Simonin et al. (2019) deal with this by using a Cholesky decomposition method (Golub and Van Loan, 1996), which avoids the need to compute the inverse covariance matrix directly and is applicable to any form of error covariance matrix. Guillet et al. (2019) model the inverse of a spatially correlated observation error covariance matrix directly using a diffusion operator and fast unstructured meshing techniques. However, this method only deals with spatial error correlations and it is unclear if this approach is also suitable when spatial error correlations and inter-channel error correlations are combined.
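The Cholesky idea can be illustrated in a few lines: factorize $\mathbf{R}$ once per cycle and obtain $\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}$ by triangular solves, never forming the inverse explicitly. A minimal sketch in Python (the matrix size, SOAR correlation function and length-scale are illustrative assumptions, not details of the cited operational implementation):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Build a small SOAR-type covariance matrix R as a stand-in (illustrative
# sizes and length-scale), then compute z = R^{-1} d without inverting R.
n = 100
x = np.linspace(0.0, 1.0, n)
L = 0.1                                  # assumed correlation length-scale
r = np.abs(x[:, None] - x[None, :])
R = (1.0 + r / L) * np.exp(-r / L)       # SOAR correlation function

d = np.random.default_rng(0).standard_normal(n)   # departure vector

c, low = cho_factor(R)        # Cholesky factorization, done once per cycle
z = cho_solve((c, low), d)    # two triangular solves per product

assert np.allclose(R @ z, d)  # verifies z = R^{-1} d
```

Once the factorization is available, each additional product with a new departure vector costs only the two triangular solves.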
The third issue is that the use of correlated observation error statistics changes the convergence behaviour of the minimization procedure. Indeed, the preoperational experiments of Weston et al. (2014) showed problems with the minimization that were improved by reconditioning the observation error covariance matrix. The convergence of the minimization procedure has been further studied by Tabeart et al. (2018, 2020a, 2021)
who found that the minimum eigenvalue of the observation error covariance matrix is a key parameter governing the speed of convergence. Hence, the use of reconditioning methods is common in operational applications with correlated observation error statistics
(e.g., Bormann et al., 2016; Campbell et al., 2017; Tabeart et al., 2020a, b). The aim of this paper is to carry out an investigation of a method to accelerate parallel matrix-vector products with distributed observational data, using a novel application of a well-known numerical approximation algorithm, the fast multipole method (FMM; Rokhlin, 1985; Greengard and Rokhlin, 1997). We explore a particular type of FMM that uses a singular value decomposition (SVD) (Gimbutas and Rokhlin, 2003). We call this method the SVD-FMM. The key idea of the SVD-FMM is to split up the computation of the matrix-vector product into separate near-field and far-field calculations. The near-field calculations are done by a standard matrix-vector multiplication of a local sub-matrix and sub-vector. The far-field calculations are carried out using the singular values and singular vectors of sub-matrices of $\mathbf{R}^{-1}$. The sub-matrices and sub-vectors are determined by selecting specific rows and columns of $\mathbf{R}^{-1}$ corresponding to a partition of observations in the domain. The number of singular vectors employed in the calculation is chosen by the user. We will show that usually only a small number of singular vectors is needed to obtain good accuracy (section 5). The SVD-FMM technique reduces the number of floating-point operations required compared with the standard approach for matrix-vector multiplication. Additionally, we develop a possible novel parallel algorithm for the SVD-FMM for our application in data assimilation, which can greatly reduce the costs of communication between PEs. The SVD-FMM allows us to assimilate a large volume of observational data in a short time interval, and has the potential to be used in practical applications.
In this initial investigation, our numerical results focus on evaluating the accuracy of the SVD-FMM. Thus our experiments are carried out for an idealized problem, using serial, rather than parallel, computing. Nevertheless, we do compare the efficiency of the proposed parallelization scheme with different parallel formulations of standard matrix-vector multiplication in terms of communication costs. In our current implementation, the method needs to be applied after we obtain a representation of the matrix $\mathbf{R}^{-1}$. In order to have a clear focus solely on computing matrix-vector products, we do not address the computation of the inverse observation error covariance matrix. Instead, we assume that the inverse observation error covariance is already known.
In our experiments we show that the SVD-FMM can work well with the inverses of a variety of covariance matrices. In particular, we apply the SVD-FMM to the inverses of covariance matrices created using different correlation functions and length-scales. We also use the reconditioning methods of Tabeart et al. (2020b) together with the SVD-FMM to gain insight into how the accuracy of the SVD-FMM should change with different levels of reconditioning. In practice, the observation distribution varies each assimilation cycle due to factors such as quality control and the removal of cloudy satellite radiance observations. Therefore, we further carry out some experiments to demonstrate that the SVD-FMM is feasible even if there are some missing observations.
The rest of this paper is organised as follows: We provide the parallel formulations of standard matrix-vector multiplication in section 2. In section 3 we present our novel algorithm for the SVD-FMM and compare the complexity and the communication costs of the SVD-FMM and standard parallel matrix-vector multiplication. In section 4 we explain the experimental design for our idealized experiments. In particular, we describe the generation of observations and observation error covariance matrices as well as the reconditioning methods. In section 5 we show the results of our numerical experiments with varying correlation functions, length-scales, reconditioning methods and condition numbers. We also show how the results change with missing observations. Finally, we give a summary in section 6 and conclude that our proposed algorithm has potential for use in operational data assimilation for fast computation of large matrix-vector products.
2 Parallelization of standard matrix-vector multiplication
In this section we describe three distinct standard parallel formulations for computing large matrix-vector products (see Grama et al., 2003, Section 8.1; Deng, 2012, Section 6.2.1). We will also discuss how to exploit the symmetric structure of $\mathbf{R}^{-1}$ for parallelization (Geus and Röllin, 2001). These matrix-vector products arise in the solution of the variational minimization problem in data assimilation in the form
$$\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}, \qquad (1)$$
where $\mathbf{R}^{-1}$ denotes the inverse of the observation error covariance matrix, $\mathbf{d}$ denotes the observation-minus-model departure vector, and $\mathbf{z}$ denotes the result of the multiplication. For the purposes of this description, we assume that the matrix $\mathbf{R}^{-1}$ and vector $\mathbf{d}$ are already known.
To see how matrix-vector products in the form of Eq. (1) arise in data assimilation, we consider the observation penalty term of the variational data assimilation cost function. This is given by multiplying the inverse observation error covariance matrix by the observation-minus-model differences:
$$J_o(\mathbf{x}) = \left(\mathbf{y} - \mathcal{H}(\mathbf{x})\right)^{\mathrm{T}} \mathbf{R}^{-1} \left(\mathbf{y} - \mathcal{H}(\mathbf{x})\right), \qquad (2)$$
where $\mathbf{y}$ denotes the observation vector, $\mathbf{x}$ denotes the model state and $\mathcal{H}$ denotes the observation operator that maps the model state to the observation space (e.g., Lorenc et al., 2000; Rawlins et al., 2007; Simonin et al., 2019; Nichols, 2010). To solve the variational minimization problem, the gradient of Eq. (2) is needed (e.g., Simonin et al., 2019):
$$\nabla J_o(\mathbf{x}) = -2\,\mathbf{H}^{\mathrm{T}} \mathbf{R}^{-1} \left(\mathbf{y} - \mathcal{H}(\mathbf{x})\right), \qquad (3)$$

where $\mathbf{H}$ denotes the linearization (Jacobian) of the observation operator $\mathcal{H}$.
Thus matrix-vector products of the form of Eq. (1) arise in both the computation of the cost function and its gradient, with $\mathbf{d} = \mathbf{y} - \mathcal{H}(\mathbf{x})$ and $\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}$.
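As a concrete toy illustration of these quantities, the following Python sketch evaluates the penalty term and its gradient for a randomly chosen linear observation operator; all sizes and matrices are arbitrary stand-ins, not those of any operational system.

```python
import numpy as np

# Toy illustration of Eqs. (1)-(3) with a linear observation operator H.
rng = np.random.default_rng(1)
n_state, n_obs = 8, 5
H = rng.standard_normal((n_obs, n_state))     # observation operator (linear)
x = rng.standard_normal(n_state)              # model state
y = H @ x + 0.1 * rng.standard_normal(n_obs)  # observations
A = rng.standard_normal((n_obs, n_obs))
R = A @ A.T + n_obs * np.eye(n_obs)           # SPD observation error covariance
R_inv = np.linalg.inv(R)

d = y - H @ x                  # observation-minus-model departures
z = R_inv @ d                  # Eq. (1): the product we focus on
Jo = d @ z                     # Eq. (2): observation penalty term
grad = -2.0 * H.T @ z          # Eq. (3): gradient with respect to x

assert Jo >= 0.0               # R^{-1} is SPD, so the penalty is non-negative
```

Note that the same product $\mathbf{z} = \mathbf{R}^{-1}\mathbf{d}$ appears in both the penalty and the gradient, so speeding it up benefits every iteration of the minimization.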
The parallelization schemes for computing Eq. (1) start with the distribution of observations over PEs. We consider a simple domain decomposition, in which the observations are distributed over a number of PEs according to their geophysical locations. This results in a split of the components of the matrix $\mathbf{R}^{-1}$ and vector $\mathbf{d}$ across PEs. We will introduce four different partitions of the matrix (see Fig. 1). Each of them leads to a unique parallelization scheme. The communication costs of each scheme depend on various parameters, including i) the time to prepare a message for transmission, ii) the time it takes for a message to travel (latency), iii) the number of words that can traverse the network per second (bandwidth), iv) the number of PEs to communicate with and v) the message size (Grama et al., 2003). Since the first three parameters are determined by the configuration of the parallel machine, we discuss the communication costs for different parallelization schemes using the last two parameters.
2.1 Row-wise partitioning
We first consider a row-wise partitioning, in which case each PE stores one or several rows of $\mathbf{R}^{-1}$ and one element or a portion of $\mathbf{d}$. Fig. 1(a) illustrates the partitioning using four PEs. In order for each PE to perform its computation, we need to distribute the full vector $\mathbf{d}$ among all the PEs. This requires an all-to-all broadcast. Based on Table 4.1 in Grama et al. (2003), the cost of this communication operation is $t_s \log P + t_w (n/P)(P-1)$, where $P$ is the number of PEs, $n$ is the number of observations, $t_s$ is the startup time and $t_w$ is the per-word transfer time (Grama et al., 2003). The two time parameters depend on computer architecture and performance. After the communication, each PE multiplies its rows with the vector, which requires on the order of $n^2/P$ floating-point operations.
2.2 Column-wise partitioning
Instead of storing row(s), each PE can store the column(s) of $\mathbf{R}^{-1}$. Fig. 1(b) shows an example using four PEs. With a column-wise partitioning, each PE can calculate a partial result locally. Each PE multiplies its $n/P$ columns of $\mathbf{R}^{-1}$ with the corresponding $n/P$ elements of $\mathbf{d}$, which requires on the order of $n^2/P$ floating-point operations. After the local computation, we need to perform an all-to-one reduction to sum up the partial results given by each PE, which takes $(t_s + t_w n)\log P$ time (Grama et al., 2003, Table 4.1). After the all-to-one reduction, only one PE contains the full vector $\mathbf{z}$ and thus we may need to redistribute $\mathbf{z}$. This requires a scatter operation, which takes $t_s \log P + t_w (n/P)(P-1)$ time (Grama et al., 2003, Table 4.1). Although the column-wise partitioning circumvents the use of an expensive all-to-all broadcast operation, it increases the message size for data-transfer operations. The message size for the all-to-one reduction is $n$, whereas the message size for the all-to-all broadcast used for the row-wise partitioning is $n/P$. It should be noted that in data assimilation applications, the matrix $\mathbf{R}^{-1}$ is known to be symmetric, which means that the parallel algorithm using column-wise partitioning can also be used with row-wise partitioning.
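The row-wise and column-wise schemes can be emulated in serial code to check that both recover the full product. The sketch below (plain NumPy, four hypothetical PEs, no actual message passing) mimics the data each PE would hold in the two partitionings:

```python
import numpy as np

# Serial emulation of row-wise and column-wise partitionings of z = R_inv @ d
# over four hypothetical PEs (illustrative sizes; no MPI involved).
rng = np.random.default_rng(2)
n, n_pes = 12, 4
R_inv = rng.standard_normal((n, n))
R_inv = 0.5 * (R_inv + R_inv.T)      # symmetric, as in data assimilation
d = rng.standard_normal(n)

parts = np.array_split(np.arange(n), n_pes)

# Row-wise: each PE needs the FULL vector d (all-to-all broadcast),
# then computes a contiguous slice of z.
z_row = np.concatenate([R_inv[idx, :] @ d for idx in parts])

# Column-wise: each PE holds columns of R_inv and its own slice of d,
# and produces a full-length partial result; the partial results are
# summed (all-to-one reduction).
z_col = np.sum([R_inv[:, idx] @ d[idx] for idx in parts], axis=0)

assert np.allclose(z_row, R_inv @ d)
assert np.allclose(z_col, z_row)
```

The row-wise loop communicates short messages ($n/P$ elements each) before computing, while the column-wise loop communicates full-length ($n$-element) partial results afterwards, mirroring the message-size trade-off described above.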
2.3 Block 2D partitioning
The row-wise partitioning and column-wise partitioning are 1D partitionings. We now consider a 2D partitioning, which distributes the blocks of $\mathbf{R}^{-1}$ among PEs. An example of the block 2D partitioning is shown in Fig. 1(c). Note that the matrix is separated equally among the PEs, while the vector $\mathbf{d}$ is only distributed among the PEs that own the diagonal blocks of $\mathbf{R}^{-1}$. The first step is for each PE that stores a portion of $\mathbf{d}$ to broadcast that portion to the other PEs in the same column of the PE grid, which requires a column-wise one-to-all broadcast. The communication time of this operation is $(t_s + t_w n/\sqrt{P})\log\sqrt{P}$. The next step is to perform a row-wise all-to-one reduction, which requires another $(t_s + t_w n/\sqrt{P})\log\sqrt{P}$ time. Grama et al. (2003) showed that computing the matrix-vector product using the block 2D partitioning of the matrix can be faster than using 1D partitioning for the same number of PEs. However, a potential problem is that when using the block 2D partitioning, the vector elements are not distributed among all the PEs and this can result in load imbalance.
2.4 Partitioning for symmetric matrices
A possible partitioning of $\mathbf{R}^{-1}$ and $\mathbf{d}$ taking into account the symmetric structure of $\mathbf{R}^{-1}$ is shown in Fig. 1(d). The first step is to exchange the portions of $\mathbf{d}$ among PEs. This requires an all-to-all broadcast, although not all PEs need to send their portion of $\mathbf{d}$ to all other PEs: in the case of four PEs, the first PE sends data to the other three, the second to the third and fourth, and the third to the fourth only. The second step is to multiply the vector elements with the local part of the matrix. The last step is to exchange the results of the local computations using an all-to-one reduction. Again, in the case of four PEs, the first PE collects the results from all other PEs, the second collects from the third and fourth, and the third collects from the fourth; the last PE does not need to collect results from other PEs. Slightly different parallelization schemes may be used for this kind of partitioning, depending on the computer architecture (Geus and Röllin, 2001). This partitioning saves storage space on each PE (only half of the matrix is stored) and keeps the load balanced. Furthermore, this partitioning has a very large advantage for sparse matrices, in which case each PE only needs to communicate with neighbouring PEs (Geus and Röllin, 2001).
In the next section, we describe a different approach for calculating large matrix-vector products that is applicable to any matrix (and most useful for full or dense matrices). This approach reduces the computational cost of serial calculations of Eq. (1) and the message size for communication operations.
3 Application of the fast multipole method (FMM) to data assimilation
The fast multipole method (FMM) was originally presented for the rapid evaluation of all pairwise interactions between a large number of charged particles (Rokhlin, 1985; Greengard and Rokhlin, 1997). The mathematical form of this fast summation is equivalent to Eq. (1). While the direct calculation of Eq. (1) requires $O(N^2)$ work for $N$ observations, the FMM needs only $O(N)$ operations. In addition, the FMM relies on a hierarchical division of the computational domain, which is well suited for parallel computing. In the classical FMM, the matrix is given by functions describing pairwise interactions between particles, such as those describing gravitational and electrostatic potentials. In our application, however, the matrix is obtained by inverting the matrix $\mathbf{R}$. Therefore, we need a generalized FMM that does not require an underlying analytic function (Gimbutas and Rokhlin, 2003).
3.1 Separation of the matrix-vector product into near- and far-field terms
Since the SVD-FMM is not commonly used in meteorology, we first explain the key idea behind the technique and give a simple example to aid the reader's understanding, before going into the mathematical details in later subsections. The basic idea of the SVD-FMM is to calculate Eq. (1) in two parts:
$$z_i = \sum_{j \in I_N} \left(\mathbf{R}^{-1}\right)_{ij}\, d_j + \sum_{j \in I_F} \left(\mathbf{R}^{-1}\right)_{ij}\, d_j, \qquad (4)$$
where the column indices of $\mathbf{R}^{-1}$ and the indices of $\mathbf{d}$ are divided into two mutually independent sets denoted by $I_N$ and $I_F$. The indices are selected according to a partition of observations (described in more detail in section 3.2). The first part of Eq. (4), referred to as the near-field calculation, is computed using a standard matrix-vector multiplication, and the second part, referred to as the far-field calculation, is estimated using an approximation. In the SVD-FMM, the far-field calculation is computed using the singular value decomposition (SVD) of the far-field sub-matrix of $\mathbf{R}^{-1}$. We provide a simple example to aid the reader's understanding.
Example 3.1
Suppose we wish to compute $\mathbf{z} = \mathbf{A}\mathbf{d}$ for a matrix $\mathbf{A} = \mathbf{R}^{-1}$ and a vector $\mathbf{d}$. A standard matrix-vector multiplication computes each component $z_i$ as a sum over all columns $j$. We can also compute $\mathbf{z}$ by partitioning the problem as in Eq. (4), letting $I_N$ denote the near field and $I_F$ the far field, such that

$$\mathbf{z}_{I_N} = \mathbf{A}_{I_N, I_N}\, \mathbf{d}_{I_N} + \mathbf{A}_{I_N, I_F}\, \mathbf{d}_{I_F}. \qquad (5)$$

The singular value decomposition (SVD) of the sub-matrix in the second term is given by

$$\mathbf{A}_{I_N, I_F} = \sum_k \mathbf{u}_k\, \sigma_k\, \mathbf{v}_k^{\mathrm{T}},$$

where $\mathbf{u}_k$ are the orthonormal left singular vectors, $\sigma_k$ are the scalar singular values, and $\mathbf{v}_k$ are the orthonormal right singular vectors. If $\sigma_1 \gg \sigma_2$ then we can truncate the SVD to give an approximation for the far-field term of Eq. (5) as

$$\mathbf{A}_{I_N, I_F}\, \mathbf{d}_{I_F} \approx \mathbf{u}_1\, \sigma_1\, \mathbf{v}_1^{\mathrm{T}}\, \mathbf{d}_{I_F}.$$
More generally, for a larger problem, we can use only the first few singular vectors and singular values to estimate the far-field calculation. If the matrix $\mathbf{R}^{-1}$ remains fixed, we can use the same singular vectors and singular values to compute $\mathbf{z}$ for any vector $\mathbf{d}$. This will reduce the cost of calculating Eq. (1) (unless the matrix is sparse). Furthermore, storing the singular vectors and singular values requires less memory than storing the full far-field sub-matrix. Note that if the matrix changes too often, then we would need to perform the SVD again and again, which would be computationally expensive.
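A small numerical sketch of this near-field/far-field splitting, using a smooth Gaussian-shaped matrix as a stand-in for $\mathbf{R}^{-1}$ (the kernel, sizes, index sets and retained rank are all illustrative assumptions):

```python
import numpy as np

# Sketch of Eq. (4): a direct near-field product plus a far-field product
# approximated by a truncated SVD.
rng = np.random.default_rng(3)
n = 200
idx = np.arange(n)
R_inv = np.exp(-np.subtract.outer(idx, idx) ** 2 / (2.0 * 10.0 ** 2))
d = rng.standard_normal(n)

target = np.arange(0, 50)       # indices i for which we compute z_i
near = np.arange(0, 100)        # near-field source indices I_N
far = np.arange(100, n)         # far-field source indices I_F

z_near = R_inv[np.ix_(target, near)] @ d[near]           # direct

U, s, Vt = np.linalg.svd(R_inv[np.ix_(target, far)], full_matrices=False)
p = 5                                                    # retained rank
z_far = U[:, :p] @ (s[:p] * (Vt[:p] @ d[far]))           # truncated SVD

z_exact = R_inv[target] @ d
err = np.linalg.norm(z_near + z_far - z_exact) / np.linalg.norm(z_exact)
assert err < 1e-2   # rapid singular-value decay keeps the error small
```

Because the far-field block couples well-separated indices through a smooth kernel, its singular values decay very quickly, and a handful of singular vectors reproduces the exact product to high accuracy.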
The example we have discussed in this section only shows the basic idea of the SVD-FMM. The actual SVD-FMM algorithm is multilevel and more complicated. Before we can address this, we explain the partitioning of the observations that is needed for the multilevel algorithm.
3.2 Partition of observations into a hierarchical structure of nested boxes
In this section, we describe the partition of observations and explain the notation that we use. Note that we use the nomenclature found in the FMM literature (e.g., Gimbutas and Rokhlin, 2003). Suppose that we have $N$ observations. These could be located on a latitude-longitude grid, or irregularly distributed. We first find a minimal square (or rectangle) that covers the locations of all of the observations, and then hierarchically subdivide the observation domain into smaller and smaller boxes to generate a box-tree: a hierarchical structure of nested boxes. An example is given in Fig. 3. In the tree structure, level 0 refers to the biggest box, which covers the entire observation domain, and level $l$ refers to the boxes that are obtained from level $l-1$ by subdividing each box into four smaller boxes of equal size. In our particular example, we choose level 3 as the highest level, which is the minimal number of levels required for the present SVD-FMM approach. The boxes at the highest level are called leaf boxes.
In practical applications, the number of levels is determined such that the averaged number of observations in the leaf boxes is smaller than a prescribed value. For unevenly distributed observations, an adaptive tree structure could be created (Gimbutas and Rokhlin, 2003), in which smaller boxes are generated only where data is dense.
We number the boxes in all levels except level 0 by a Z-order curve, as shown in Fig. 3 (Gargantini, 1982). This facilitates easy formulae for box indexing, as will be described later in this subsection. We call boxes neighbours if they are on the same level and connect to each other. In each level the near-field of a box is made up of the box itself and all its neighbours. The far-field of the box is made up of everything else. We use $N(i)$ and $F(i)$ to denote the lists of boxes that are in the near-field and far-field of box $i$, respectively. For example, the near-field $N(16)$ of box 16 in Fig. 3 consists of the 9 shaded boxes, and the far-field $F(16)$ of box 16 is the other 7 boxes. The smallest number of boxes in a box's near-field is 4, which occurs when the box is on a corner of the observation domain.
In a box-tree the children of box $i$ in level $l$, denoted by $C(i)$, refer to the four boxes in level $l+1$ that are subdivided from box $i$ itself. With the boxes numbered consecutively across levels along the Z-order curve, the indices of the children of box $i$ are given by $4i+1$, $4i+2$, $4i+3$, and $4i+4$. The parent of box $i$, denoted by $P(i)$, refers to the box in the next coarser level that contains box $i$.
The interaction list of box $i$, denoted by $I(i)$, is the set of boxes which are children of the neighbours of box $i$'s parent and which are in the far-field of box $i$. Note that the interaction list of a box in level 2 is exactly the same as its far-field.
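The box indexing can be sketched with a few helper functions. The Python helpers below assume a consecutive cross-level Z-order numbering (level 1: boxes 1-4, level 2: boxes 5-20, level 3: boxes 21-84); this numbering convention and the function names are our assumptions for illustration:

```python
import math

# Hypothetical box-tree indexing helpers, assuming boxes are numbered
# consecutively across levels along the Z-order curve, so the children of
# box i are 4i+1, ..., 4i+4 and the parent of box i is ceil(i/4) - 1.
def children(i):
    """Indices of the four children of box i (next finer level)."""
    return [4 * i + 1, 4 * i + 2, 4 * i + 3, 4 * i + 4]

def parent(i):
    """Index of the parent of box i (next coarser level)."""
    return math.ceil(i / 4) - 1

def level_of(i):
    """Level of box i: level l holds the 4**l boxes that follow level l-1."""
    level, first = 1, 1
    while not (first <= i < first + 4 ** level):
        first += 4 ** level
        level += 1
    return level

assert children(4) == [17, 18, 19, 20]   # last level-1 box -> level-2 boxes
assert parent(21) == 5                   # first level-3 box -> first level-2 box
assert level_of(20) == 2 and level_of(21) == 3
```

With constant-time `children` and `parent` lookups, the upward and downward passes of the multilevel algorithm never need to search the tree.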
3.3 The SVD-FMM algorithm
In this section we describe the multilevel SVD-FMM algorithm (Gimbutas and Rokhlin, 2003). For concreteness we choose a particular configuration of boxes (as shown in Fig. 3) to describe the algorithm, but it can be generalized to other configurations. For each box $B$ we can write Eq. (4) as
$$\mathbf{z}_{J_B} = \mathbf{R}^{-1}_{J_B,\, J_{N(B)}}\, \mathbf{d}_{J_{N(B)}} + \mathbf{R}^{-1}_{J_B,\, J_{F(B)}}\, \mathbf{d}_{J_{F(B)}}, \qquad (6)$$
where $J_B$, $J_{N(B)}$ and $J_{F(B)}$ denote the sets of observation indices in box $B$, $N(B)$ and $F(B)$, respectively, and $\mathbf{R}^{-1}_{J_B,\, J_{N(B)}}$ and $\mathbf{R}^{-1}_{J_B,\, J_{F(B)}}$ denote sub-matrices of $\mathbf{R}^{-1}$ that are comprised of the corresponding rows and columns of $\mathbf{R}^{-1}$. We call the observations indexed by $J_B$ the targets and those indexed by $J_{F(B)}$ the sources.
The first matrix-vector product in Eq. (6) is calculated from the near-field components. It is computed directly, without using any approximation. The second matrix-vector product in Eq. (6), containing the far-field components, is estimated by matrix decomposition, using a multilevel approach exploiting the hierarchical box-tree structure established in section 3.2. Before giving the mathematical details, we first give an overview of the algorithm.
The key idea of the multilevel approach is to perform the SVD of the sub-matrices $\mathbf{R}^{-1}_{J_{F(B)},\, J_B}$ and $\mathbf{R}^{-1}_{J_B,\, J_{F(B)}}$ of $\mathbf{R}^{-1}$ and use the singular vectors and singular values just obtained to create a multipole expansion, a local expansion and three translation operators, and then use them to estimate the result. The multipole expansion can be interpreted as a short representation of the sub-vector $\mathbf{d}_{J_B}$ of $\mathbf{d}$, which is obtained by projecting the components of $\mathbf{d}_{J_B}$ onto the basis given by singular vectors and hence is a short vector with $p$ components. The local expansion can be considered as a short representation of the far-field part of the sub-vector $\mathbf{z}_{J_B}$ of $\mathbf{z}$, which is also a vector with $p$ components. It is computed from the multipole expansions of a group of boxes that are in $B$'s interaction list, using the translation operators. The translation operators allow exploitation of the hierarchical structure established in the box-tree, by transforming the projection from one basis of singular vectors into another basis of singular vectors. There are three kinds of translation operators: the multipole-to-multipole (M2M) translation operator transforms the multipole expansion of a box to the multipole expansion of its parent (see Fig. 4); the multipole-to-local (M2L) translation operator translates the multipole expansion of a box to the local expansion of another box in the same level (see Fig. 5); and the local-to-local (L2L) translation operator converts the local expansion of a non-leaf box to the local expansion of its children (see Fig. 6).
Due to the use of these expansions and translation operators, the SVD-FMM computes matrix-vector products more efficiently than the standard approach, which is reflected in both serial efficiency (algorithmic complexity) and parallel efficiency (see sections 3.5 and 3.6 for more details).
We now describe the mathematical steps of the SVD-FMM algorithm in more detail. A schematic illustrating the algorithm is given in Fig. 7.
Initialization: Compute the SVD of sub-matrices of $\mathbf{R}^{-1}$ and the translation operators. We compute truncated SVDs of sub-matrices of $\mathbf{R}^{-1}$ for each box $B$ in levels 2 and 3:

$$\mathbf{R}^{-1}_{J_{F(B)},\, J_B} \approx \sum_{k=1}^{p} \hat{\mathbf{u}}_k^B\, \hat{\sigma}_k^B \left(\hat{\mathbf{v}}_k^B\right)^{\mathrm{T}}, \qquad \mathbf{R}^{-1}_{J_B,\, J_{F(B)}} \approx \sum_{k=1}^{p} \mathbf{u}_k^B\, \sigma_k^B \left(\mathbf{v}_k^B\right)^{\mathrm{T}}, \qquad (7)$$

where $\hat{\mathbf{u}}_k^B$ and $\mathbf{u}_k^B$ are the $k$th left singular vectors, $\hat{\sigma}_k^B$ and $\sigma_k^B$ are the $k$th singular values, and $\hat{\mathbf{v}}_k^B$ and $\mathbf{v}_k^B$ are the $k$th right singular vectors of the two sub-matrices, respectively. The vectors $\hat{\mathbf{u}}_k^B$ and $\mathbf{v}_k^B$ have length equal to the number of observations in the far-field of box $B$, while $\hat{\mathbf{v}}_k^B$ and $\mathbf{u}_k^B$ have length equal to the number of observations in box $B$. The symbol $p$ denotes the number of singular vectors and the number of singular values retained.
(Note that the matrix $\mathbf{R}^{-1}$ is symmetric for the data assimilation problem discussed here. Therefore, in practice we actually only need to compute one SVD per box $B$, and for each box we have $\hat{\mathbf{u}}_k^B = \mathbf{v}_k^B$, $\hat{\sigma}_k^B = \sigma_k^B$ and $\hat{\mathbf{v}}_k^B = \mathbf{u}_k^B$. However, we retain the different notations for the purpose of clarifying the algorithm.)
The singular vectors and singular values are then used to generate multipole expansions, local expansions and a set of translation operators. The M2M translation operator (see Fig. 4) converts the multipole expansion of a child box $C$ into the multipole expansion of its parent $B$ and is defined as

$$\left(T^{B,C}_{\mathrm{MM}}\right)_{k\ell} = \frac{\hat{\sigma}_k^B}{\hat{\sigma}_\ell^C} \sum_{j \in J_C} \hat{v}_k^B(j)\, \hat{v}_\ell^C(j), \qquad (8)$$

for each $k, \ell = 1, \ldots, p$, where $\hat{v}_k^B(j)$ and $\hat{v}_\ell^C(j)$ denote the elements of $\hat{\mathbf{v}}_k^B$ and $\hat{\mathbf{v}}_\ell^C$ that correspond to the $j$th observation in box $C$. The M2L translation operator (see Fig. 5) converts the multipole expansion of box $C$ into the local expansion of box $B$, for each $C$ in the interaction list of $B$. It is defined as

$$\left(T^{B,C}_{\mathrm{ML}}\right)_{k\ell} = \frac{\sigma_k^B}{\hat{\sigma}_\ell^C} \sum_{j \in J_C} v_k^B(j)\, \hat{v}_\ell^C(j), \qquad (9)$$

for each $k, \ell = 1, \ldots, p$, where $v_k^B(j)$ and $\hat{v}_\ell^C(j)$ denote the elements of $\mathbf{v}_k^B$ and $\hat{\mathbf{v}}_\ell^C$ that correspond to the $j$th observation in box $C$. The L2L translation operator (see Fig. 6) transfers the local expansion of a parent box $P(B)$ to its children and is defined as

$$\left(T^{B,P(B)}_{\mathrm{LL}}\right)_{k\ell} = \frac{\sigma_k^B}{\sigma_\ell^{P(B)}} \sum_{j \in J_{F(P(B))}} v_k^B(j)\, v_\ell^{P(B)}(j), \qquad (10)$$

for each $k, \ell = 1, \ldots, p$, where $v_k^B(j)$ and $v_\ell^{P(B)}(j)$ denote the elements of $\mathbf{v}_k^B$ and $\mathbf{v}_\ell^{P(B)}$ that correspond to the $j$th observation in the far-field of box $P(B)$.
Step 1: Compute the multipole expansions for the leaf boxes. The multipole expansion for each leaf box $B$ is computed by

$$\Phi_k^B = \hat{\sigma}_k^B \sum_{j \in J_B} \hat{v}_k^B(j)\, d_j, \qquad (11)$$

where $k = 1, \ldots, p$ and $d_j$ denotes the element of $\mathbf{d}$ corresponding to the $j$th observation in box $B$.
Step 2: Compute the multipole expansions for the non-leaf boxes. The multipole expansion for each box $B$ in level 2 is computed by translating the multipole expansions from its children in level 3 using the M2M translation operator:

$$\Phi_k^B = \sum_{C \in C(B)} \sum_{\ell=1}^{p} \left(T^{B,C}_{\mathrm{MM}}\right)_{k\ell}\, \Phi_\ell^C, \qquad (12)$$

where $k = 1, \ldots, p$ and $C$ denotes a child of $B$.
Step 3: Compute the first part of the local expansion. For each box $B$ in levels 2 and 3, we translate the multipole expansions from the boxes in its interaction list into local expansions by

$$\Psi_k^{B,1} = \sum_{C \in I(B)} \sum_{\ell=1}^{p} \left(T^{B,C}_{\mathrm{ML}}\right)_{k\ell}\, \Phi_\ell^C, \qquad (13)$$

where $k = 1, \ldots, p$ and $C$ is in the interaction list of $B$.
Step 4: Compute the second part of the local expansion. We transfer the local expansions for each box in level 2 to its children in level 3 using the L2L translation operator:

$$\Psi_k^{B,2} = \sum_{\ell=1}^{p} \left(T^{B,P(B)}_{\mathrm{LL}}\right)_{k\ell}\, \Psi_\ell^{P(B),1}, \qquad (14)$$

where $k = 1, \ldots, p$, $B$ is a leaf box and $P(B)$ is $B$'s parent.
Step 5: Complete the local expansion for the leaf boxes. For each leaf box, the final local expansion is given by adding the two parts together:

$$\Psi_k^B = \Psi_k^{B,1} + \Psi_k^{B,2}, \qquad (15)$$

where $k = 1, \ldots, p$ and $B$ is a leaf box.
Final Step: Add the far-field calculation to the near-field calculation. The final result for each leaf box $B$ is obtained by adding the far-field calculation to the near-field calculation:

$$\mathbf{z}_{J_B} = \mathbf{z}_{J_B}^{\mathrm{near}} + \mathbf{z}_{J_B}^{\mathrm{far}}, \qquad (16)$$

where $\mathbf{z}_{J_B}$ denotes the components of $\mathbf{z}$ corresponding to the observations in box $B$. The near-field calculation is given by

$$\mathbf{z}_{J_B}^{\mathrm{near}} = \mathbf{R}^{-1}_{J_B,\, J_{N(B)}}\, \mathbf{d}_{J_{N(B)}}. \qquad (17)$$

The far-field calculation is given by

$$z_i^{\mathrm{far}} = \sum_{k=1}^{p} u_k^B(i)\, \Psi_k^B, \quad i \in J_B, \qquad (18)$$

where $u_k^B(i)$ denotes the element of $\mathbf{u}_k^B$ corresponding to the $i$th observation in box $B$.
We note that the step-wise algorithm presented here does not impose a strict order on the calculations. For example, the computation of local expansions for leaf boxes in Step 3 can be carried out once Step 1 is done.
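To make the ingredients above concrete, the following simplified SINGLE-LEVEL Python sketch applies the same machinery (one truncated SVD per box, multipole expansions, an M2L-style translation, local expansions) to a 1-D problem. It omits the box-tree hierarchy and hence the M2M and L2L operators, and the sizes, the SOAR stand-in for $\mathbf{R}$ and the rank $p$ are illustrative assumptions, not the full multilevel algorithm.

```python
import numpy as np

# Single-level SVD-FMM-style sketch on a 1-D observation grid.
# Near field = a box and its immediate neighbours; the far field is handled
# through rank-p expansions (cf. Eqs. 11, 13, 16 and 18).
n_boxes, m, p = 8, 32, 6                       # boxes, obs per box, rank
n = n_boxes * m
pos = np.linspace(0.0, 1.0, n)
r = np.abs(pos[:, None] - pos[None, :])
Rmat = (1.0 + r / 0.02) * np.exp(-r / 0.02)    # SOAR covariance (stand-in)
R_inv = np.linalg.inv(Rmat)
R_inv = 0.5 * (R_inv + R_inv.T)                # enforce exact symmetry
d = np.random.default_rng(4).standard_normal(n)

boxes = [np.arange(b * m, (b + 1) * m) for b in range(n_boxes)]
far = [[c for c in range(n_boxes) if abs(c - b) > 1] for b in range(n_boxes)]

# One truncated SVD per box (R_inv symmetric): R_inv[B, F(B)] ~ U S V^T;
# by symmetry the outgoing right singular vectors of box B are U[B].
U, S, V, F = [], [], [], []
for b in range(n_boxes):
    fidx = np.concatenate([boxes[c] for c in far[b]])
    u, s, vt = np.linalg.svd(R_inv[np.ix_(boxes[b], fidx)],
                             full_matrices=False)
    U.append(u[:, :p]); S.append(s[:p]); V.append(vt[:p].T); F.append(fidx)

# Multipole expansions: p numbers summarise each box's sources.
Phi = [S[b] * (U[b].T @ d[boxes[b]]) for b in range(n_boxes)]

z = np.empty(n)
for b in range(n_boxes):
    near_idx = np.concatenate(
        [boxes[c] for c in range(n_boxes) if abs(c - b) <= 1])
    z_near = R_inv[np.ix_(boxes[b], near_idx)] @ d[near_idx]  # direct part
    Psi = np.zeros(p)                       # local expansion
    for c in far[b]:
        rows = np.searchsorted(F[b], boxes[c])  # rows of V[b] for box c
        T = (S[b][:, None] / S[c][None, :]) * (V[b][rows].T @ U[c])  # M2L
        Psi += T @ Phi[c]
    z[boxes[b]] = z_near + U[b] @ Psi       # near + far contributions

err = np.linalg.norm(z - R_inv @ d) / np.linalg.norm(R_inv @ d)
assert err < 0.05
```

In the multilevel algorithm, the inner loop over all far-field boxes is replaced by translations through the box-tree (Steps 2 and 4), which is what reduces the operation count to near-linear in the number of observations.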
3.4 Accuracy
In this section we discuss which factors affect the accuracy of the SVD-FMM. The first aspect to consider is the relative magnitude of the near-field and far-field terms in Eq. (6). The near-field term is computed directly, while the far-field term is approximated. Numerical round-off error for the direct calculation of the near-field term is expected to be small. Therefore, the main source of error for the SVD-FMM is the far-field calculation. Thus, if the magnitude of the near-field term is much larger than that of the far-field term, the error in the final result due to the SVD-FMM approximation will tend to be small.
The accuracy of the far-field calculation is determined by the accuracy of the truncated SVDs of the sub-matrices $\mathbf{R}^{-1}_{J_{F(B)},\, J_B}$ and $\mathbf{R}^{-1}_{J_B,\, J_{F(B)}}$ in Eq. (7) and the accuracy of the translation operators given by Eqs. (8)-(10). The truncation error in Eq. (7) is dependent on the number of singular vectors used ($p$) and the $(p+1)$th singular value of the sub-matrices (e.g., Bernstein, 2009, Fact 9.14.28). Additionally, Gimbutas and Rokhlin (2003) show that the error bounds on the translation operators also rely on the $(p+1)$th singular value of the sub-matrices. Thus, the optimal number of singular vectors used in the SVD-FMM may depend on the application. It should be determined by considering the trade-off between accuracy and efficiency, as the more singular vectors are used, the more accurate the results but the slower the computation.
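The dependence of the truncation error on the $(p+1)$th singular value can be checked numerically: by the Eckart-Young theorem, the spectral-norm error of a rank-$p$ truncated SVD equals the $(p+1)$th singular value exactly. A quick check with an arbitrary test matrix (the sizes and the imposed decay are illustrative assumptions):

```python
import numpy as np

# Verify that the rank-p truncated SVD has spectral-norm error equal to
# the (p+1)-th singular value (Eckart-Young), for an arbitrary matrix
# with imposed singular-value decay.
rng = np.random.default_rng(5)
A = rng.standard_normal((40, 60)) @ np.diag(0.5 ** np.arange(60)) \
    @ rng.standard_normal((60, 60))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
for p in (2, 5, 10):
    Ap = U[:, :p] @ np.diag(s[:p]) @ Vt[:p]   # rank-p truncation
    err = np.linalg.norm(A - Ap, 2)           # spectral norm
    assert np.isclose(err, s[p])              # error = sigma_{p+1}
```

Hence, inspecting the singular-value spectra of the far-field sub-matrices gives a direct a priori indication of how large $p$ must be for a given accuracy target.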
The SVD-FMM algorithm uses truncated SVDs (Eq. (7)) and may thus cause rank deficiency of the matrix . Nevertheless, the Hessian of the full variational problem will still be full rank (e.g., Tabeart et al., 2021). In addition, the matrix used in the solution of the variational problem is not exactly the inverse of the matrix . However, only the matrix (and not ) is used in the solution of the variational problem. Furthermore, studies have shown that even approximate forms of spatial observation error correlations provide significant benefits to analysis accuracy compared with diagonal approximations (e.g., Healy and White, 2005; Stewart et al., 2008, 2013).
3.5 Algorithmic Complexity
In this section we compare the number of floating point operations (flops) required to compute matrix-vector products using the standard approach and the SVD-FMM. Let be the average number of observations in a leaf box (for in the highest level), and be the number of leaf boxes, where is the total number of observations. Computing Eq. (11) for a fixed and a fixed requires operations ( multiplication and addition operations). Hence, calculating Eq. (11) for each value of requires operations. Finally, for each value of , Eq. (11) requires operations. Similarly, Eq. (12) requires operations. Eq. (13) requires fewer than operations for leaf boxes and operations for boxes in level 2, because there are at most 27 entries in the interaction list of each leaf box and 12 in that of each box at level 2. Eq. (14) requires operations. Eq. (17) requires at most operations, since the maximum number of boxes in the near field of each box is 9. Finally, Eq. (18) requires operations. Thus, the operation count of the entire algorithm (excluding the initialization step) sums to approximately or .
The SVDs and translation operators need to be computed only once if the distribution of observations does not change; the singular vectors, singular values and translation operators can then be reused for any vector . However, if the distribution of observations varies too often, the computational cost of the initialization step and of inverting the observation error covariance matrix could make the algorithm very expensive.
For comparison, the direct matrix-vector multiplication of Eq. (1) requires operations. In data assimilation, is often computed using forward and backward substitution (Golub and Van Loan, 1996; Weston et al., 2014; Simonin et al., 2019), which also requires operations. Fig. 2 compares the operation counts for direct computation and for the SVD-FMM with our configuration of boxes and . We observe that once the number of observations exceeds 500, the SVD-FMM requires fewer floating point operations than direct matrix-vector multiplication, and the difference increases with the number of observations.
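The shape of this comparison can be reproduced with a back-of-the-envelope flop model. The constants below are our own rough illustration (64 leaf boxes, a direct near field over at most 9 neighbouring boxes, and a far field linear in the number of observations), not the paper's exact operation count:

```python
def direct_flops(n):
    """Flops for a dense n x n matrix-vector product
    (n multiplications and n - 1 additions per row)."""
    return n * (2 * n - 1)

def svdfmm_flops(n, k, n_boxes=64):
    """Simplified flop model for the SVD-FMM (illustrative constants,
    not the paper's exact count): the near field is computed directly
    over at most 9 neighbouring boxes per leaf box, and the far field
    costs a few rank-k passes over the data plus translations among
    the boxes."""
    s = n / n_boxes                          # average observations per box
    near = 18 * s * n                        # <= 9 near boxes, mult + add
    far = 6 * k * n + 30 * k * k * n_boxes   # expansions + translations
    return near + far
```

With any such model the qualitative conclusion is the same as in Fig. 2: a fixed overhead from the expansions and translations dominates for small problems, while for large problems the SVD-FMM cost grows much more slowly than the dense product.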
3.6 Parallelization
In this section we describe a novel parallel algorithm for the SVD-FMM. The FMM offers several opportunities for parallelism (Greengard and Gropp, 1990), and our approach is not the only possibility. We present this section as a preliminary exploration of the potential of the SVD-FMM as a practical technique for operational data assimilation. We note, however, that we have not used this parallel algorithm for our experimental results (section 5): the number of observations in our idealized experiments is much smaller than in operational applications, so the serial calculations can be completed within an acceptable time.
We first distribute the matrix and vector elements across PEs according to the partitioning of observations in the box-tree. For each leaf box , we assign the subvector of given by and the submatrix of given by to one PE. For each box in level 2, we allocate the submatrix of given by to one PE. For our particular configuration of boxes, we have 64 leaf boxes and 16 non-leaf boxes in level 2, and hence could use 80 PEs. However, to make our parallelization scheme easily comparable with the parallel formulations of matrix-vector multiplication given in sections 2.1-2.4, we discuss the case where we choose 16 of the 64 PEs and let them also store the data for the level 2 boxes. We describe a possible parallelization of each mathematical step of the SVD-FMM as follows:

Parallelization of the Initialization Step.

Each PE can calculate Eq. (7) independently. Once the singular vectors and singular values are obtained, the submatrices stored on each PE can be discarded.

To compute Eq. (8), each PE that is assigned to a parent box should send to three PEs that are assigned to its children. The calculation of the M2M translation operators would require 16 one-to-all broadcast operations to be performed simultaneously. The message size for each operation should be and PEs should be involved.

To compute Eq. (9), each PE should send a portion of to at most 27 PEs, because the maximum number of boxes in an interaction list is 27. The calculation of the M2L translation operators should require an all-to-all broadcast with a message size of .

To compute Eq. (10), each PE should send a portion of to the PE that is assigned to its parent. The computation of the L2L translation operators would require 16 all-to-one reduction operations to be carried out at the same time. The message size for each operation should be for a level 2 box . The communications required to calculate the three translation operators can be combined into a single all-to-all broadcast. After the initialization step, each PE that is assigned only to a leaf box will store , , and , and those assigned to both a leaf box and a level 2 box will store and .


Parallelization of Step 1. This step is perfectly parallel.

Parallelization of Step 2. Each PE should compute and then send the result to the PE that is assigned to its parent. Computing the multipole expansions for level 2 boxes would require an all-to-one reduction. The message size for this communication operation should be .

Parallelization of Step 3. Each PE should compute and then send the result to another PE. Each PE should collect the partial results of Eq. (13) from at most 27 PEs. This step would need an all-to-all broadcast with a message size of or .

Parallelization of Step 4. Each PE that is assigned to a box in level 2 should compute and then send the result to the PEs that are assigned to its children. This step would use a one-to-all broadcast with a message size of .

Parallelization of Step 5. This step is perfectly parallel.

Parallelization of the Final Step. The far-field calculation for each leaf box is perfectly parallel. For the near-field calculation, each PE should obtain the elements of from at most 8 PEs, the maximum number of neighbours of one box. This would require an all-to-all broadcast with an average message size of . We can use one all-to-all broadcast to complete the communication tasks for this step and Step 3.
Table 3.6 summarises the communication costs for the SVD-FMM and the four parallel formulations of matrix-vector multiplication described in section 2. The major advantage of the SVD-FMM is that the message size for each communication operation is dramatically reduced.
In the proposed parallelization scheme, we suggested using PEs and letting one of every four PEs carry out the computations for boxes in level 2. This could be particularly useful if the supercomputer has non-uniform memory access (NUMA) nodes with locally shared memory (Grama et al., 2003): in this case, we could avoid the remote communications required in Step 2, Step 4 and the similar steps in the initialization. Alternatively, we could assign the tasks for level 2 and level 3 to different PEs, i.e., use 80 PEs for our configuration of boxes. This could allow each PE to carry out a similar amount of work. More generally, we note that the practical implementation of the given parallelization scheme should be adjusted to suit the configuration of the supercomputer.
Parallelization Scheme | Communication Operation | # PEs | Message Size | Communication Time
Row-wise | all-to-all broadcast | | |
Column-wise | all-to-one reduction | | |
 | scatter | | |
Block | one-to-all broadcast | | |
 | all-to-one reduction | | |
Symmetric | all-to-all broadcast | | |
 | all-to-all reduction | | |
SVD-FMM | all-to-one reduction | 4 | |
 | all-to-all broadcast | | , or |
 | one-to-all broadcast | 4 | |
4 Experimental Design
Our numerical experiments are designed to demonstrate the potential of the SVD-FMM for computing the matrix-vector products involved in operational data assimilation. We investigate its accuracy under different scenarios that may occur in practical applications. We perform serial rather than parallel calculations because, in this initial study, we have chosen to examine idealized problems of moderate size, which is sufficient for our goals.
For this initial study, we have not included matrix inversion as part of the SVD-FMM algorithm. However, to obtain our experimental results we require an inverse observation error covariance matrix. We use the INV function in MATLAB (MATLAB R2020b, a) to compute the inverses of and of the reconditioned . The generation of the matrix is described in section 4.2 and the reconditioning methods are given in section 4.3. The INV function uses an LDL decomposition (Golub and Van Loan, 1996). The required SVDs of submatrices of (denoted ) are computed using the SVDS function in MATLAB (MATLAB R2020b, b), which uses a Lanczos method and is efficient for finding a few singular values and vectors of a large matrix (Larsen, 1998; Baglama and Reichel, 2005).
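The same two building blocks are available outside MATLAB. The sketch below is our own NumPy translation, not the authors' code: it inverts a symmetric positive definite covariance matrix via its Cholesky factor, and truncates a full SVD to its leading triplets (for large sparse problems, a Lanczos-type routine such as scipy.sparse.linalg.svds plays the role of MATLAB's SVDS):

```python
import numpy as np

def spd_inverse(R):
    """Invert a symmetric positive definite covariance matrix via its
    Cholesky factorization (playing the role MATLAB's INV plays here)."""
    L = np.linalg.cholesky(R)                 # R = L @ L.T
    L_inv = np.linalg.solve(L, np.eye(R.shape[0]))
    return L_inv.T @ L_inv                    # R^{-1} = L^{-T} @ L^{-1}

def leading_svd(A, k):
    """Leading k singular triplets of a submatrix. Here we simply
    truncate a full SVD; a Lanczos-type partial SVD would avoid
    the full decomposition for large matrices."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]
```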
In the results section (section 5) we use the log of the root-mean-squared error (RMSE) to assess the accuracy of the SVD-FMM, which is defined as
(19) 
where superscript denotes the matrix-vector product computed using the SVD-FMM and denotes the matrix-vector product obtained by the standard approach. Note that the log(RMSE) shown in section 5 is averaged over 100 realizations of , where each one uses a different .
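Eq. (19) amounts to a one-line metric; a transcription follows (variable names are ours, and we take the logarithm to base 10, which the text does not specify):

```python
import numpy as np

def log_rmse(y_approx, y_exact):
    """log10 of the root-mean-squared difference between the SVD-FMM
    product and the directly computed product (cf. Eq. (19))."""
    return np.log10(np.sqrt(np.mean((y_approx - y_exact) ** 2)))
```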
4.1 Observation distribution and observation-minus-model departures
To simulate observation data, we assume our observations are regularly distributed over a region from to N and from W to E with a grid-length of 12 km. This is similar to moderately thinned geostationary satellite data used in operational forecasting over the UK (Waller et al., 2016b). This results in 3456 observations and an error covariance matrix of size . For convenience, we directly sample the observation-minus-model departure vector from a Gaussian distribution with mean zero and (innovation) covariance given by plus the background error covariance. The background error covariance is modelled using the SOAR correlation function with a lengthscale of 20 km and a standard deviation of 0.6 (section 4). The correlation lengthscale and standard deviation are selected according to Ballard et al. (2016) to be appropriate for km-scale numerical weather prediction. The results presented in section 5 have been averaged over 100 realizations of . In practice, the observation distribution may differ at each assimilation cycle because of factors such as quality control. Therefore, in some of our experiments we have also applied the SVD-FMM to compute the matrix-vector product with randomly chosen missing observations.
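Sampling the departures can be sketched as follows. This is our own illustration: a generic SPD innovation covariance (observation plus background terms) is factorized once by Cholesky, and each departure vector is the factor applied to a standard normal draw; the SOAR parameters follow the text (lengthscale 20 km, standard deviation 0.6):

```python
import numpy as np

def soar_covariance(dist, length_scale, std):
    """SOAR covariance: std^2 * (1 + r/L) * exp(-r/L)."""
    return std**2 * (1.0 + dist / length_scale) * np.exp(-dist / length_scale)

def sample_departures(R, B, n_samples, seed=0):
    """Draw observation-minus-model departure vectors with mean zero and
    innovation covariance R + B, via a Cholesky factor of R + B."""
    L = np.linalg.cholesky(R + B)
    rng = np.random.default_rng(seed)
    return (L @ rng.standard_normal((R.shape[0], n_samples))).T
```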
4.2 Modelling of the observation error covariance matrix
Modelling observation error covariance matrices using correlation functions is a useful way to deal with observation error correlations (e.g., Stewart et al., 2013; Tabeart et al., 2018). Here we use several correlation functions to model the observation error covariance matrices. The first is the Gaussian correlation function (e.g., Haben, 2011),
(20) 
where is the correlation length scale and denotes the greatcircle distance between two observations. The second is the firstorder autoregressive (FOAR) correlation function, also called the Markov correlation function (e.g., Stewart et al., 2013),
(21) 
We also use the SOAR correlation function (e.g., Daley, 1994; Tabeart et al., 2018),
(22) 
and the Matérn 5/2 correlation function, (e.g., Rasmussen and Williams, 2006),
(23) 
The observation error covariance matrix can be generated using the correlation functions by
(24) 
where is a diagonal matrix whose diagonal elements are a prescribed standard deviation. For our experiments, we choose a standard deviation of one and correlation lengthscales of , and km. These correlation lengthscales are selected based on the horizontal error correlations estimated for geostationary satellite data used in operational km-scale data assimilation (Waller et al., 2016b) and for AMVs (Cordoba et al., 2017). It should be noted that the Markov and Matérn 5/2 correlation functions may lead to a sparse , in which case the SVD-FMM may not be the optimal choice. We use these correlation functions in our experiments because we wish to show that the SVD-FMM can be applied to a variety of matrices.
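The four correlation models and the construction of Eq. (24) can be written compactly. The formulas below are the standard ones (up to the length-scale convention in the Gaussian case); the discretization and names are our own illustration:

```python
import numpy as np

def gaussian_corr(r, L):
    """Gaussian correlation, exp(-r^2 / (2 L^2))."""
    return np.exp(-r**2 / (2 * L**2))

def foar_corr(r, L):
    """First-order autoregressive (Markov) correlation, exp(-r/L)."""
    return np.exp(-r / L)

def soar_corr(r, L):
    """Second-order autoregressive correlation, (1 + r/L) exp(-r/L)."""
    return (1 + r / L) * np.exp(-r / L)

def matern52_corr(r, L):
    """Matern correlation with smoothness 5/2."""
    a = np.sqrt(5) * r / L
    return (1 + a + a**2 / 3) * np.exp(-a)

def build_R(dist, corr_fn, L, std):
    """Observation error covariance R = D C D (cf. Eq. (24)): D is a
    diagonal matrix of standard deviations, C a correlation matrix."""
    D = np.diag(std)
    return D @ corr_fn(dist, L) @ D
```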
4.3 Reconditioning of the observation error covariance matrix
In practical applications, observation error covariance matrices are often ill-conditioned (Tabeart et al., 2020b; Haben et al., 2011a). Moreover, if the matrix has a large condition number, this can lead to poor convergence of the minimisation problem in variational data assimilation (Weston et al., 2014; Tabeart et al., 2018, 2020a). In our applications, the condition number of the matrices created using correlation functions depends on the chosen function and correlation lengthscale (Haben, 2011; Haben et al., 2011b).
We use two common matrix reconditioning techniques to reduce the matrix condition number in our experiments: the ridge regression method and the minimum eigenvalue method (Tabeart et al., 2020b; Weston et al., 2014; Campbell et al., 2017; Bormann et al., 2016). In the ridge regression (RR) method the reconditioned matrix is given by
(25) 
where is the identity matrix and
(26) 
where is the maximum eigenvalue of , is the minimum eigenvalue of R and is the required new condition number.
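Both reconditioning techniques are short to state in code. The sketch below uses the standard formulations, with our own function names: ridge regression as just defined, and the minimum eigenvalue method as described in the paragraphs that follow:

```python
import numpy as np

def ridge_regression_recondition(R, kappa):
    """Add delta * I so that the condition number of R becomes kappa,
    with delta = (lambda_max - kappa * lambda_min) / (kappa - 1)
    (cf. Eq. (26))."""
    lam = np.linalg.eigvalsh(R)               # ascending eigenvalues
    delta = (lam[-1] - kappa * lam[0]) / (kappa - 1)
    return R + delta * np.eye(R.shape[0])

def minimum_eigenvalue_recondition(R, kappa):
    """Raise all eigenvalues below the threshold lambda_max / kappa to
    the threshold, then reconstruct R (cf. Eqs. (27)-(28))."""
    lam, V = np.linalg.eigh(R)
    lam_new = np.maximum(lam, lam[-1] / kappa)
    return V @ np.diag(lam_new) @ V.T
```

Both functions return a matrix whose condition number equals the requested value whenever the original condition number exceeds it.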
The minimum eigenvalue (ME) method changes the eigenvalue spectrum of the matrix by setting all eigenvalues smaller than a threshold value to the threshold value. The threshold value () is given in terms of the required condition number as
(27) 
The reconditioned matrix is then constructed via
(28) 
where is a square matrix whose columns are the eigenvectors of and is a diagonal matrix with the diagonal elements being the new eigenvalues.
5 Results
5.1 The number of singular vectors and the size of singular values
We carried out an experiment to assess the accuracy of the SVD-FMM as the number of singular vectors () changes. As discussed in section 3.4, we expect the accuracy to depend on both and the th singular values of each submatrix. Fig. 8 shows the log(RMSE) for the SVD-FMM with the FOAR and SOAR covariance matrices as a function of . As anticipated, the log(RMSE) decreases, and hence the accuracy increases, as more singular vectors are used. Moreover, Fig. 9 demonstrates that the log(RMSE) for the SVD-FMM using singular vectors has an approximately linear relationship with the log of the mean ()th singular value of the submatrices, which is given by
(29) 
for and . Additionally, we note that the SVD-FMM with the FOAR covariance matrix is more accurate than with the SOAR covariance matrix for all the values of considered. This is because the mean th singular value of the submatrices of the FOAR matrix is consistently smaller than that of the SOAR matrix (Fig. 9). We have also carried out several other experiments using different correlation lengthscales, different correlation functions and reconditioned covariance matrices, and found qualitatively similar results: the accuracy of the SVD-FMM using singular vectors depends on the th singular values of the submatrices.
These results using the mean singular values of the submatrices provide some guidance on setting the value of in a given application. However, as they require the SVDs of all the relevant submatrices of , they are time-consuming to compute.
5.2 Varying correlation lengthscale
Different observations can exhibit correlated errors over different lengthscales. To examine how the correlation lengthscale affects the accuracy of the SVD-FMM, we use three correlation lengthscales for the SOAR correlation function: , and km. Fig. 10 reveals an increase in the log(RMSE) of the SVD-FMM with correlation lengthscale. Nevertheless, only a few singular vectors are still needed to obtain a small log(RMSE). However, to achieve a given accuracy, we may need a larger value of for longer lengthscales. In addition, we note that the magnitude of the elements of increases as the correlation lengthscale increases. For a smaller lengthscale, the far-field elements of are close to zero and could be neglected to give a sparse matrix. Moreover, compared to the SOAR correlation function, the Matérn and Markov functions are more likely to give a sparse if the correlation lengthscale is small.
The variation of the accuracy of the SVD-FMM with correlation lengthscale can also be explained by the singular values of the submatrices. We find in our numerical experiments (not shown) that the singular values of the submatrices are smaller when the correlation lengthscale is smaller. It is known that the maximum singular value of the full matrix (the inverse SOAR covariance matrix) also decreases as the correlation lengthscale is reduced (Haben, 2011, section 5.3.2). We therefore investigated whether the singular values of the full matrix could be used to estimate the accuracy of the SVD-FMM. Thompson (1972) showed that the singular values of the submatrices are bounded by the singular values of the full matrix, i.e.,
(30) 
where denotes a submatrix of , denotes the th singular value of and denotes the th singular value of . Equality can be attained for some choices of . However, since the dimensions of the submatrices used in the SVD-FMM are much smaller than those of the full matrix, our numerical experiments showed that is typically much smaller than in practice. We compared the maximum singular values of different full matrices numerically and found that they provide a rough guide to relative accuracies: for a given value of , the best accuracy was obtained for the full matrix with the smallest maximum singular value.
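Thompson's interlacing bound (Eq. (30)) is easy to verify numerically for an arbitrary submatrix (an illustrative check, our names):

```python
import numpy as np

def submatrix_singvals_bounded(A, rows, cols):
    """Check Thompson's bound: each singular value of a submatrix of A
    is at most the corresponding singular value of the full matrix."""
    s_sub = np.linalg.svd(A[np.ix_(rows, cols)], compute_uv=False)
    s_full = np.linalg.svd(A, compute_uv=False)
    return bool(np.all(s_sub <= s_full[: len(s_sub)] + 1e-12))
```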
5.3 Reconditioned covariance matrices
The observation error covariance matrices used in data assimilation are often ill-conditioned, and thus, in order to invert them, we need to reduce their condition numbers using appropriate techniques (Tabeart et al., 2020b). Reconditioning the covariance matrices may also improve the convergence behaviour of the minimisation procedure (Weston et al., 2014; Tabeart et al., 2020a, 2021). To evaluate how reconditioning affects the accuracy of the SVD-FMM, we reduce the condition number of the SOAR covariance matrix with km to three required new condition numbers (, and ) using the RR and ME methods. Fig. 11 shows that reconditioning the SOAR correlation matrix can improve the accuracy of the SVD-FMM: the smaller the condition number of the SOAR covariance matrix after reconditioning, the better the accuracy.
We also find that the RR method gives a smaller log(RMSE) than the ME method for the same required condition number. This difference is caused by the different ways the two reconditioning methods reduce the condition number, which leads to different sizes of the singular values of the full inverse covariance matrices, satisfying:
(31) 
where and denote the maximum singular values of the reconditioned matrices obtained using the RR and ME methods, respectively. Combining Eq. (30) and Eq. (31), the leading singular values of the submatrices of are expected to be typically smaller than those of the submatrices of . Hence, the SVD-FMM with matrices reconditioned using the RR method will usually have a smaller error. A derivation of Eq. (31) is presented in the Appendix.
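The inequality can also be checked directly. The RR method leaves a smallest eigenvalue of (lambda_max - lambda_min)/(kappa - 1), while the ME method leaves lambda_max/kappa; whenever the original condition number exceeds kappa, the former is larger, so the inverse of the RR-reconditioned matrix has the smaller maximum singular value. A self-contained numerical check (standard formulations of both methods, our names):

```python
import numpy as np

def compare_rr_me(R, kappa):
    """Recondition SPD R with the RR and ME methods and return the
    maximum singular values of the two inverses, i.e. one over the
    smallest eigenvalue of each reconditioned matrix; Eq. (31) says
    the RR value is at most the ME value."""
    lam, V = np.linalg.eigh(R)                # ascending eigenvalues
    delta = (lam[-1] - kappa * lam[0]) / (kappa - 1)
    R_rr = R + delta * np.eye(R.shape[0])     # ridge regression
    R_me = V @ np.diag(np.maximum(lam, lam[-1] / kappa)) @ V.T
    s_rr = 1.0 / np.linalg.eigvalsh(R_rr)[0]
    s_me = 1.0 / np.linalg.eigvalsh(R_me)[0]
    return s_rr, s_me
```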
5.4 Varying correlation functions
To provide further numerical evidence of the wide applicability of the SVD-FMM, we use correlation functions other than SOAR and FOAR to create covariance matrices. To allow comparison with Fig. 11, we use the RR method to reduce the condition number of all the matrices to 1000 before inverting them. Fig. 12 shows that the SVD-FMM works well with the inverses of the matrices given by the different correlation functions. We note that although the condition number of each matrix is identical after reconditioning, the accuracy of the SVD-FMM still depends on the original condition numbers: the larger the condition number prior to reconditioning, the greater the log(RMSE). In our experiments the FOAR correlation matrix has a condition number on the order of thousands, while the Gaussian correlation matrix is extremely badly conditioned, with a condition number on the order of prior to reconditioning.
By comparing Fig. 12 with Fig. 11, we also observe that reducing the condition number of the Gaussian correlation matrix to 1000 using the RR method gives a larger log(RMSE) than reducing the condition number of the SOAR correlation matrix to 3000 using the same method. Therefore, to achieve comparable accuracy with the SVD-FMM for the same value of , we may need to reduce the condition numbers of two matrices to different values: the matrix with the larger condition number prior to reconditioning may need a smaller condition number after reconditioning.
5.5 Removing a portion of observations
In operational data assimilation, the number of observations varies from one assimilation cycle to the next due to quality control, among other reasons. This decreases the dimension of the covariance matrix and disrupts its structure: the resulting covariance matrix lacks some of the rows and columns of the original. The missing observations may cause a problem if a leaf box becomes empty or contains fewer observations than . This could be solved by using a different partition of the observations. In our experiment, we randomly select the locations of the missing observations, and with up to 25% missing observations every leaf box always contains enough observations. Fig. 13 shows that the SVD-FMM still works well with missing observations, without losing accuracy. In our experiments using an inverse SOAR covariance matrix, we find that the missing observations actually lead to a slight decrease in the log(RMSE) compared with the full set of observations. The difference in log(RMSE) between removing 10% and removing 25% of the observations is not statistically significant.
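The occupancy condition behind this experiment can be sketched as a simple check (our own illustration, using the text's 3456 observations and 64 leaf boxes):

```python
def min_box_occupancy(boxes, removed):
    """Smallest number of observations left in any leaf box after a set
    of observation indices is removed. The SVD-FMM partition remains
    usable as long as this stays at or above the number of singular
    vectors retained; otherwise a different partition is needed."""
    removed = set(removed)
    return min(sum(i not in removed for i in box) for box in boxes)
```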
6 Conclusion and discussion
Some observations have been shown to exhibit strong spatial error correlations, e.g., Doppler radar radial winds, geostationary satellite data, and AMVs. Accounting for this information in data assimilation systems can improve analysis accuracy and forecast skill. However, it can make the computation of the products of observation weighting matrices and observation-minus-model departure vectors very expensive, in terms not only of computational complexity, but also of the communication costs in parallel computing. We have therefore investigated the use of the SVD-FMM for the rapid computation of these matrix-vector products. This numerical approximation method is best suited to full or dense precision matrices; if the observation weighting matrix is sparse, it might be appropriate to compute only the near-field calculation. Moreover, the SVD-FMM is suitable for large problems, where the overhead of computing the SVD-FMM expansions is outweighed by the reductions in floating-point operations and communication costs compared with alternative approaches, and where each PE has enough memory to store its part of the observation error covariance matrix. The exact range of observation numbers for which the SVD-FMM is useful depends on the computer architecture and the constraints of the operational schedule.
We proposed a novel possible parallelization scheme for the SVD-FMM in our application. In comparison with the parallel formulations of direct matrix-vector multiplication (section 2), the parallelization scheme for the SVD-FMM greatly reduces the size of the messages transferred. This reduces communication costs, making the use of full observation error covariance matrices associated with large spatial extents potentially feasible in operational data assimilation.
In our idealised experiments we have examined the accuracy of the SVD-FMM when applied to the inverses of various observation error covariance matrices. These matrices are created using commonly used correlation functions, such as the Gaussian and SOAR correlation functions, and different correlation lengthscales. In some of our experiments, we applied reconditioning methods to the covariance matrices before inverting them. We also investigated the feasibility of the SVD-FMM with randomly chosen missing observations. We find results consistent with the discussion in section 3.4: i) the accuracy of the SVD-FMM increases as more singular vectors are used, and ii) the variation of the accuracy is approximately linear in the mean spectrum of singular values of the submatrices used in the approximation. Furthermore, our experiments indicate that a comparison of the maximum singular values of the full inverse matrices themselves can be used as a rough guide to determine which matrices will give more accurate results with the SVD-FMM.
The most computationally expensive parts of our current implementation of the SVD-FMM are the inversion of the covariance matrices and the SVDs of the submatrices of the inverse matrices. These need to be recomputed every time the observation error covariance matrix changes, which happens when the number or the distribution of the observations changes. Nevertheless, our work has shown that the SVD-FMM has potential for use in operational data assimilation for the fast computation of the products of inverse observation error covariance matrices and observation-minus-model departure vectors. This would allow a large volume of observational data to be assimilated within a short time interval, which is particularly important for applications such as convection-permitting hazardous weather forecasting.
Acknowledgements
The authors would like to express their thanks to Amos Lawless, Nancy K. Nichols, David Simonin and Joanne A. Waller for their useful discussions. G. Hu and S. L. Dance were funded in part by the UK EPSRC DARE project (EP/P002331/1). The MATLAB code for the SVD-FMM developed by the authors is available on request from the corresponding author.
Conflict of Interest
We declare we have no competing interests.
References
 Anderson (2003) J. L. Anderson. A Local Least Squares Framework for Ensemble Filtering. Monthly Weather Review, 131(4):634 – 642, 2003. doi: 10.1175/15200493(2003)131<0634:ALLSFF>2.0.CO;2. URL https://journals.ametsoc.org/view/journals/mwre/131/4/15200493_2003_131_0634_allsff_2.0.co_2.xml.
 Baglama and Reichel (2005) J. Baglama and L. Reichel. Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM Journal on Scientific Computing, 27(1):19–42, 2005.
 Ballard et al. (2016) S. P. Ballard, Z. Li, D. Simonin, and J.F. Caron. Performance of 4DVar NWPbased nowcasting of precipitation at the Met Office for summer 2012. Quarterly Journal of the Royal Meteorological Society, 142(694):472–487, 2016. doi: https://doi.org/10.1002/qj.2665. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.2665.
 Bédard and Buehner (2020) J. Bédard and M. Buehner. A practical assimilation approach to extract smallerscale information from observations with spatially correlated errors: An idealized study. Quarterly Journal of the Royal Meteorological Society, 146(726):468–482, 2020. doi: https://doi.org/10.1002/qj.3687. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.3687.
 Bernstein (2009) D. S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas  Second Edition. Princeton University Press, Princeton, 2009. ISBN 9781400833344. doi: https://doi.org/10.1515/9781400833344.
 Bormann et al. (2016) N. Bormann, M. Bonavita, R. Dragani, R. Eresmaa, M. Matricardi, and A. McNally. Enhancing the impact of IASI observations through an updated observationerror covariance matrix. Quarterly Journal of the Royal Meteorological Society, 142(697):1767–1780, 2016.
 Campbell et al. (2017) W. F. Campbell, E. A. Satterfield, B. Ruston, and N. L. Baker. Accounting for correlated observation error in a dualformulation 4D variational data assimilation system. Monthly Weather Review, 145(3):1019–1032, 2017.
 Cordoba et al. (2017) M. Cordoba, S. L. Dance, G. A. Kelly, N. K. Nichols, and J. A. Waller. Diagnosing atmospheric motion vector observation errors for an operational highresolution data assimilation system. Quarterly Journal of the Royal Meteorological Society, 143(702):333–341, 2017. doi: 10.1002/qj.2925. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.2925.
 Daley (1994) R. Daley. Atmospheric data analysis. Cambridge university press, 1994.
 Dance et al. (2019) S. L. Dance, S. P. Ballard, R. N. Bannister, P. Clark, H. L. Cloke, T. Darlington, D. L. A. Flack, S. L. Gray, L. HawknessSmith, N. Husnoo, A. J. Illingworth, G. A. Kelly, H. W. Lean, D. Li, N. K. Nichols, J. C. Nicol, A. Oxley, R. S. Plant, N. M. Roberts, I. Roulstone, D. Simonin, R. J. Thompson, and J. A. Waller. Improvements in Forecasting Intense Rainfall: Results from the FRANC (Forecasting Rainfall Exploiting New Data Assimilation Techniques and Novel Observations of Convection) Project. Atmosphere, 10(3), 2019. ISSN 20734433. doi: 10.3390/atmos10030125. URL https://www.mdpi.com/20734433/10/3/125.
 Deng (2012) Y. Deng. Applied Parallel Computing. World Scientific Publishing Co., Inc., USA, 2012. ISBN 9789814307604.
 Desroziers et al. (2005) G. Desroziers, L. Berre, B. Chapnik, and P. Poli. Diagnosis of observation, background and analysiserror statistics in observation space. Quarterly Journal of the Royal Meteorological Society, 131(613):3385–3396, 2005. doi: https://doi.org/10.1256/qj.05.108. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1256/qj.05.108.
 Fowler et al. (2018) A. M. Fowler, S. L. Dance, and J. A. Waller. On the interaction of observation and prior error correlations in data assimilation. Quarterly Journal of the Royal Meteorological Society, 144(710):48–62, 2018. doi: https://doi.org/10.1002/qj.3183. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.3183.
 Gargantini (1982) I. Gargantini. An effective way to represent quadtrees. Communications of the ACM, 25(12):905–910, 1982.
 Geus and Röllin (2001) R. Geus and S. Röllin. Towards a fast parallel sparse symmetric matrix–vector multiplication. Parallel Computing, 27(7):883–896, 2001. ISSN 01678191. doi: https://doi.org/10.1016/S01678191(01)000734. URL https://www.sciencedirect.com/science/article/pii/S0167819101000734. Linear systems and associated problems.
 Gimbutas and Rokhlin (2003) Z. Gimbutas and V. Rokhlin. A generalized fast multipole method for nonoscillatory kernels. SIAM Journal on Scientific Computing, 24(3):796–817, 2003.
 Golub and Van Loan (1996) G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
 Grama et al. (2003) A. Grama, G. Karypis, V. Kumar, and A. Gupta. Introduction to parallel computing. AddisonWesley, 2003.
 Greengard and Gropp (1990) L. Greengard and W. D. Gropp. A parallel version of the fast multipole method. Computers & Mathematics with Applications, 20(7):63–71, 1990.
 Greengard and Rokhlin (1997) L. Greengard and V. Rokhlin. A Fast Algorithm for Particle Simulations. Journal of Computational Physics, 135(2):280 – 292, 1997. ISSN 00219991. doi: https://doi.org/10.1006/jcph.1997.5706. URL http://www.sciencedirect.com/science/article/pii/S0021999197957065.
 Guillet et al. (2019) O. Guillet, A. T. Weaver, X. Vasseur, Y. Michel, S. Gratton, and S. Gürol. Modelling spatially correlated observation errors in variational data assimilation using a diffusion operator on an unstructured mesh. Quarterly Journal of the Royal Meteorological Society, 145(722):1947–1967, 2019. doi: 10.1002/qj.3537. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.3537.
 Haben et al. (2011a) S. Haben, A. Lawless, and N. Nichols. Conditioning and preconditioning of the variational data assimilation problem. Computers & Fluids, 46(1):252–256, 2011a. ISSN 0045-7930. doi: https://doi.org/10.1016/j.compfluid.2010.11.025. URL http://www.sciencedirect.com/science/article/pii/S0045793010003373. 10th ICFD Conference Series on Numerical Methods for Fluid Dynamics (ICFD 2010).
 Haben (2011) S. A. Haben. Conditioning and preconditioning of the minimisation problem in variational data assimilation. PhD thesis, University of Reading, 2011.
 Haben et al. (2011b) S. A. Haben, A. Lawless, and N. Nichols. Conditioning of incremental variational data assimilation, with application to the Met Office system. Tellus A: Dynamic Meteorology and Oceanography, 63(4):782–792, 2011b. doi: 10.1111/j.1600-0870.2011.00527.x. URL https://doi.org/10.1111/j.1600-0870.2011.00527.x.
 Healy and White (2005) S. Healy and A. White. Use of discrete Fourier transforms in the 1D-Var retrieval problem. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 131(605):63–72, 2005.
 Honda et al. (2018) T. Honda, T. Miyoshi, G.-Y. Lien, S. Nishizawa, R. Yoshida, S. A. Adachi, K. Terasaki, K. Okamoto, H. Tomita, and K. Bessho. Assimilating All-Sky Himawari-8 Satellite Infrared Radiances: A Case of Typhoon Soudelor (2015). Monthly Weather Review, 146(1):213–229, 2018. doi: 10.1175/MWR-D-16-0357.1. URL https://journals.ametsoc.org/view/journals/mwre/146/1/mwrd160357.1.xml.
 Hu and Franzke (2017) G. Hu and C. L. E. Franzke. Data Assimilation in a Multi-Scale Model. Mathematics of Climate and Weather Forecasting, 3(1):118–139, 2017. doi: 10.1515/mcwf-2017-0006. URL https://doi.org/10.1515/mcwf-2017-0006.
 Hu and Franzke (2020) G. Hu and C. L. E. Franzke. Evaluation of Daily Precipitation Extremes in Reanalysis and Gridded Observation-Based Data Sets Over Germany. Geophysical Research Letters, 47(18):e2020GL089624, 2020. doi: https://doi.org/10.1029/2020GL089624. URL https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2020GL089624.
 Janjić et al. (2018) T. Janjić, N. Bormann, M. Bocquet, J. Carton, S. Cohn, S. Dance, S. Losa, N. Nichols, R. Potthast, J. Waller, et al. On the representation error in data assimilation. Quarterly Journal of the Royal Meteorological Society, 144(713):1257–1278, 2018.
 Larsen (1998) R. Larsen. Lanczos Bidiagonalization With Partial Reorthogonalization. DAIMI Report Series, 27(537), Dec. 1998. doi: 10.7146/dpb.v27i537.7070. URL https://tidsskrift.dk/daimipb/article/view/7070.
 Liu and Rabier (2003) Z.-Q. Liu and F. Rabier. The potential of high-density observations for numerical weather prediction: A study with simulated observations. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 129(594):3013–3035, 2003.
 Lorenc et al. (2000) A. Lorenc, S. Ballard, R. Bell, N. Ingleby, P. Andrews, D. Barker, J. Bray, A. Clayton, T. Dalby, D. Li, et al. The Met. Office global three-dimensional variational data assimilation scheme. Quarterly Journal of the Royal Meteorological Society, 126(570):2991–3012, 2000.
 MATLAB R2020b (a) MATLAB R2020b. INV function in MATLAB R2020b. https://uk.mathworks.com/help/matlab/ref/inv.html, a. Last visited: 2020-11-23.
 MATLAB R2020b (b) MATLAB R2020b. SVDS function in MATLAB R2020b. https://uk.mathworks.com/help/matlab/ref/svds.html, b. Last visited: 2020-11-23.
 Michel (2018) Y. Michel. Revisiting Fisher’s approach to the handling of horizontal spatial correlations of observation errors in a variational framework. Quarterly Journal of the Royal Meteorological Society, 144(716):2011–2025, 2018. doi: https://doi.org/10.1002/qj.3249. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.3249.
 Nichols (2010) N. K. Nichols. Mathematical Concepts of Data Assimilation. In W. Lahoz, B. Khattatov, and R. Menard, editors, Data Assimilation: Making Sense of Observations, pages 13–39. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010. ISBN 978-3-540-74703-1. doi: 10.1007/978-3-540-74703-1_2. URL https://doi.org/10.1007/978-3-540-74703-1_2.
 Nino-Ruiz et al. (2019) E. D. Nino-Ruiz, A. Sandu, and X. Deng. A parallel implementation of the ensemble Kalman filter based on modified Cholesky decomposition. Journal of Computational Science, 36:100654, 2019. ISSN 1877-7503. doi: https://doi.org/10.1016/j.jocs.2017.04.005. URL https://www.sciencedirect.com/science/article/pii/S187775031730399X.
 Rainwater et al. (2015) S. Rainwater, C. H. Bishop, and W. F. Campbell. The benefits of correlated observation errors for small scales. Quarterly Journal of the Royal Meteorological Society, 141(693):3439–3445, 2015. doi: https://doi.org/10.1002/qj.2582. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.2582.
 Rasmussen and Williams (2006) C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, Massachusetts, 2006.
 Rawlins et al. (2007) F. Rawlins, S. Ballard, K. Bovis, A. Clayton, D. Li, G. Inverarity, A. Lorenc, and T. Payne. The Met Office global four-dimensional variational data assimilation scheme. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 133(623):347–362, 2007.
 Rokhlin (1985) V. Rokhlin. Rapid solution of integral equations of classical potential theory. Journal of Computational Physics, 60(2):187–207, 1985.
 Schmetz et al. (2002) J. Schmetz, P. Pili, S. Tjemkes, D. Just, J. Kerkmann, S. Rota, and A. Ratier. An Introduction to Meteosat Second Generation (MSG). Bulletin of the American Meteorological Society, 83(7):977–992, 2002. doi: 10.1175/1520-0477(2002)083<0977:AITMSG>2.3.CO;2.
 Schmit et al. (2017) T. J. Schmit, P. Griffith, M. M. Gunshor, J. M. Daniels, S. J. Goodman, and W. J. Lebair. A Closer Look at the ABI on the GOES-R Series. Bulletin of the American Meteorological Society, 98(4):681–698, 2017. doi: 10.1175/BAMS-D-15-00230.1.
 Seity et al. (2011) Y. Seity, P. Brousseau, S. Malardel, G. Hello, P. Bénard, F. Bouttier, C. Lac, and V. Masson. The AROME-France Convective-Scale Operational Model. Monthly Weather Review, 139(3):976–991, 2011. doi: 10.1175/2010MWR3425.1.
 Simonin et al. (2019) D. Simonin, J. A. Waller, S. P. Ballard, S. L. Dance, and N. K. Nichols. A pragmatic strategy for implementing spatially correlated observation errors in an operational system: An application to Doppler radial winds. Quarterly Journal of the Royal Meteorological Society, 145(723):2772–2790, 2019.
 Stewart et al. (2008) L. M. Stewart, S. Dance, and N. Nichols. Correlated observation errors in data assimilation. International Journal for Numerical Methods in Fluids, 56(8):1521–1527, 2008.
 Stewart et al. (2013) L. M. Stewart, S. L. Dance, and N. K. Nichols. Data assimilation with correlated observation errors: experiments with a 1D shallow water model. Tellus A: Dynamic Meteorology and Oceanography, 65(1):19546, 2013.
 Tabeart et al. (2018) J. M. Tabeart, S. L. Dance, S. A. Haben, A. S. Lawless, N. K. Nichols, and J. A. Waller. The conditioning of least-squares problems in variational data assimilation. Numerical Linear Algebra with Applications, 25(5):e2165, 2018.
 Tabeart et al. (2020a) J. M. Tabeart, S. L. Dance, A. S. Lawless, S. Migliorini, N. K. Nichols, F. Smith, and J. A. Waller. The impact of using reconditioned correlated observation-error covariance matrices in the Met Office 1D-Var system. Quarterly Journal of the Royal Meteorological Society, 146(728):1372–1390, 2020a.
 Tabeart et al. (2020b) J. M. Tabeart, S. L. Dance, A. S. Lawless, N. K. Nichols, and J. A. Waller. Improving the condition number of estimated covariance matrices. Tellus A: Dynamic Meteorology and Oceanography, 72(1):1–19, 2020b.
 Tabeart et al. (2021) J. M. Tabeart, S. L. Dance, A. S. Lawless, N. K. Nichols, and J. A. Waller. New bounds on the condition number of the Hessian of the preconditioned variational data assimilation problem. Numerical Linear Algebra with Applications, page e2405, 2021. doi: https://doi.org/10.1002/nla.2405. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nla.2405.
 Tandeo et al. (2020) P. Tandeo, P. Ailliot, M. Bocquet, A. Carrassi, T. Miyoshi, M. Pulido, and Y. Zhen. A Review of Innovation-Based Methods to Jointly Estimate Model and Observation Error Covariance Matrices in Ensemble Data Assimilation. Monthly Weather Review, 148(10):3973–3994, 2020. doi: 10.1175/MWR-D-19-0240.1. URL https://journals.ametsoc.org/view/journals/mwre/148/10/mwrD190240.xml.
 Thompson (1972) R. Thompson. Principal submatrices IX: Interlacing inequalities for singular values of submatrices. Linear Algebra and its Applications, 5(1):1–12, 1972. ISSN 0024-3795. doi: https://doi.org/10.1016/0024-3795(72)90013-4. URL http://www.sciencedirect.com/science/article/pii/0024379572900134.
 Waller et al. (2016a) J. Waller, D. Simonin, S. Dance, N. Nichols, and S. Ballard. Diagnosing observation error correlations for Doppler radar radial winds in the Met Office UKV model using observation-minus-background and observation-minus-analysis statistics. Monthly Weather Review, 144(10):3533–3551, 2016a.
 Waller et al. (2016b) J. A. Waller, S. P. Ballard, S. L. Dance, G. Kelly, N. K. Nichols, and D. Simonin. Diagnosing horizontal and interchannel observation error correlations for SEVIRI observations using observation-minus-background and observation-minus-analysis statistics. Remote Sensing, 8(7):581, 2016b.
 Waller et al. (2019) J. A. Waller, E. Bauernschubert, S. L. Dance, N. K. Nichols, R. Potthast, and D. Simonin. Observation Error Statistics for Doppler Radar Radial Wind Superobservations Assimilated into the DWD COSMO-KENDA System. Monthly Weather Review, 147(9):3351–3364, 2019. doi: 10.1175/MWR-D-19-0104.1. URL https://journals.ametsoc.org/view/journals/mwre/147/9/mwrd190104.1.xml.
 Weston et al. (2014) P. Weston, W. Bell, and J. Eyre. Accounting for correlated error in the assimilation of high-resolution sounder data. Quarterly Journal of the Royal Meteorological Society, 140(685):2420–2429, 2014.
 WMO (2017) WMO. Guidelines on Best Practices for Achieving User Readiness for New Meteorological Satellites, 2017. WMO-No. 1187.
 Yang et al. (2017) J. Yang, Z. Zhang, C. Wei, F. Lu, and Q. Guo. Introducing the New Generation of Chinese Geostationary Weather Satellites, Fengyun-4. Bulletin of the American Meteorological Society, 98(8):1637–1658, 2017. doi: 10.1175/BAMS-D-16-0065.1.
Appendix: Maximum Singular Values of Reconditioned Matrices Using RR and ME Methods
In this appendix we derive the result given in Eq. (31). The RR method increases each eigenvalue of the original matrix R by the same amount, by adding the increment given in Eq. (26) to the diagonal. In contrast, the ME method raises only the smallest eigenvalues to the value given in Eq. (27). Let λ_min(R_RR) denote the minimum eigenvalue of R_RR and λ_min(R_ME) denote the minimum eigenvalue of R_ME; then we have
λ_min(R_RR) ≥ λ_min(R_ME).  (32)
The reciprocal of the minimum eigenvalue of a matrix is the maximum eigenvalue of its inverse and hence we have
λ_max(R_RR^{-1}) = 1/λ_min(R_RR) ≤ 1/λ_min(R_ME) = λ_max(R_ME^{-1}).  (33)
Since the eigenvalues and singular values of a symmetric positive semidefinite matrix coincide (Bernstein, 2009, Definition 5.6.1), we obtain the result stated in Eq. (31).
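The inequality between the two reconditioning methods can be checked numerically. Since Eqs. (26) and (27) are not reproduced in this excerpt, the sketch below assumes the standard forms used in the reconditioning literature: the RR increment δ = (λ_max − κλ_min)/(κ − 1) added to the diagonal, and the ME threshold T = λ_max/κ below which eigenvalues are raised. All variable names (R, R_rr, R_me, kappa) are illustrative.

```python
import numpy as np

# Build a symmetric positive definite "observation error" correlation
# matrix with a very large condition number (Gaussian correlation
# function on a 1-D grid of n points).
n = 50
x = np.linspace(0.0, 1.0, n)
R = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.1**2))
R += 1e-10 * np.eye(n)  # small jitter to guard against semi-definiteness

lam, V = np.linalg.eigh(R)  # eigenvalues in ascending order
kappa = 100.0               # target condition number

# RR method: shift every eigenvalue by the same increment delta,
# chosen so that the new condition number equals kappa.
delta = (lam[-1] - kappa * lam[0]) / (kappa - 1.0)
R_rr = R + delta * np.eye(n)

# ME method: raise only the eigenvalues below T = lam_max / kappa
# up to T, leaving the rest unchanged.
T = lam[-1] / kappa
lam_me = np.maximum(lam, T)
R_me = (V * lam_me) @ V.T   # V diag(lam_me) V^T

lam_rr_min = np.linalg.eigvalsh(R_rr)[0]
lam_me_min = np.linalg.eigvalsh(R_me)[0]

# Eq. (32): the RR minimum eigenvalue is at least the ME one,
# so sigma_max(R_rr^{-1}) <= sigma_max(R_me^{-1}), cf. Eq. (31).
print(lam_rr_min >= lam_me_min)          # True
print(1.0 / lam_rr_min <= 1.0 / lam_me_min)  # True
```

Both reconditioned matrices attain the same target condition number κ, but the RR method leaves a larger minimum eigenvalue, so the maximum singular value of its inverse is the smaller of the two, consistent with the derivation above.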