## 1 Introduction

Data assimilation (DA) combines noisy data, usually from instrumental uncertainty or issues with scale, with models that are imperfect, due to simplifying assumptions or other inaccuracies, to improve predictions about the past, current, or future state of a system. Typical DA techniques require a prior uncertainty and use the likelihood of observations to calculate the posterior distribution. This Bayesian context provides not only predictions but also quantification of the uncertainty in these predictions. DA originated with numerical weather prediction and is now employed in many scientific and engineering disciplines. With infinite dimensional models such as partial differential equations (PDEs) there are often features that require fine spatial resolution to obtain accurate approximations of model solutions. An alternative to uniform fine meshes is to employ meshes that are coarse in parts of the spatial domain but fine in other parts of the domain. When the regions requiring finer resolution change with time, time-dependent adaptive meshes are advantageous. In the context of data assimilation, employing time-dependent adaptive meshes also has the potential to reduce the dimension needed for good approximation of model solutions.

Combining DA with adaptive moving mesh techniques presents both opportunities and challenges. Many DA techniques employ an ensemble of solutions to make predictions and estimate the uncertainty in the predictions. This introduces a key choice for implementation: either each ensemble solution may evolve on its own independent adaptive mesh, or the ensemble time evolution can happen on a mesh that is common to all ensemble members. In the first case, the mesh for each ensemble member may be tailored to each ensemble solution. However, when incorporating a DA scheme, this presents the challenge of combining ensemble solutions that are supported on different spatial meshes. In the second case, one needs to combine the features of the different ensemble solutions to determine a mesh that provides, on average, good approximation for all ensemble members. In addition, the relative position of the ensemble mesh(es) with respect to potentially time-dependent observation locations may also motivate the choice of ensemble mesh(es).

Our contribution in this paper is to develop a framework and techniques to utilize adaptive meshing techniques for data assimilation with application to PDE models in one and higher space dimensions. This framework allows each ensemble member to evolve on its own independent mesh and presents an adaptive common mesh for the DA update. An adaptive mesh, while not usually uniform in the standard Euclidean metric, can be viewed as uniform in some other metric. This metric is defined by a positive-definite matrix valued monitor function, also called a metric tensor, which controls mesh movement for the ensemble members and is also used to define the new adaptive common mesh. The computation of the metric tensor easily generalizes to higher spatial dimensions. Using an adaptive common mesh based on metric tensors of the ensemble meshes provides a means for determining a mesh that is common to all ensemble members while providing good approximation properties for the ensemble solutions.

The adaptive common mesh is formed through the metric tensor intersection of the metric tensors that monitor the movement of the ensemble meshes. Geometrically, the metric tensor intersection of two meshes corresponds to circumscribing an ellipsoid on the corresponding element of each mesh and then finding an ellipsoid that resides in the geometric intersection of the two circumscribed ellipsoids; the new element is given by that intersecting ellipsoid. For more than two meshes this procedure must be done pairwise, so the resulting ellipsoid is not necessarily maximal. However, a greedy algorithm can be used to find an ordering that seeks to maximize the resulting ellipsoid.

The metric tensor intersection of the ensemble member meshes forms a common mesh that supports all ensemble members with high accuracy. However, the mesh points of this common mesh may not align with the observation locations. If the observation locations do not coincide with nodes in the common mesh, then the location mismatch results in errors from interpolating the observations to the common mesh. This is especially relevant in the case of time-dependent observation locations. With field alignment (Ravela et al. (2007)), a variational DA scheme is developed that assimilates both the alignment and amplitude of observations simultaneously. In addition, a two-step approach is developed, where first the locations of the state variables are adjusted to better match the observations. Rather than assume that the observations occur at the same spatial discretization used for the numerical solution, a displacement vector is employed so that the (interpolated) numerical solution at the adjusted nodes maximizes the posterior distribution of the numerical solution with displacement, given the observations. The DA then proceeds with a traditional correction based upon the amplitude of the errors of the numerical solution at the adjusted discretization.

The mismatch of observation locations and nodal positions of the common mesh presents another potential opportunity for DA on adaptive moving meshes: the meshes can adapt to concentrate near or align with the observation locations. An observation mesh can be formed by associating a metric tensor with the location of the (potentially time-dependent) observations. Intersecting the common mesh from the ensemble members with this observation mesh provides a new common mesh that is concentrated near observation locations. Interpolation error can have a significant effect on the accuracy of DA schemes, and concentrating the mesh near observation locations reduces the interpolation required, thereby improving the performance of the DA algorithms. Of course, a fixed common mesh can also be used to concentrate the mesh near fixed observation locations, but this approach has the benefit of adapting easily to time-dependent observation locations.

In addition, we develop spatially and temporally dynamic localization schemes, based upon the metric tensor(s) corresponding to the adaptive common mesh. Localization improves DA procedures by ensuring that observations only affect nearby points. Broadly speaking, localization schemes fall into two categories: domain localization and covariance (or R) localization. Domain localization schemes define a spatial radius and use that to define which mesh points are affected by a given observation. Covariance localization schemes use a correlation function to modify the covariance matrix that is used in the DA update, so that the covariance between an observation and the solution values decays to zero as the distance between the observation and the solution values increases.

We develop a domain localization scheme that employs the metric tensor. A drawback of employing a fixed, uniform radius of influence for the observations is that the localization scheme may not be effective if there is a steep gradient in the solution. One could predetermine the location of the gradient and adjust the localization scheme accordingly, but if the regions of large gradient are time-dependent, this will usually result in the tuned localization parameter being quite small. However, since the metric tensor provides information about the dynamics of the ensemble solution, it can be used to define an adaptive localization scheme in which the localization radius varies in time and space.

One benefit of using an adaptive moving mesh is that fewer mesh points can be used while still maintaining the same accuracy. Having an adaptive time-dependent common mesh allows for fewer nodes in the common mesh as compared to a fixed, fine common mesh, increasing the efficiency of the linear algebra, e.g., when updating the mean and covariance with an ensemble Kalman filter. An efficient implementation of the Ensemble Kalman Filter (EnKF) requires on the order of $\mathcal{O}(nN^2 + mN^2 + N^3)$ flops (see, e.g., Mandel (2006)), where $m$ is the dimension of the observation space, $n$ is the dimension of the discretized dynamical system, and $N$ is the number of ensemble members. In large scale geophysical applications we typically desire $N \ll n$ (in general $N$ should be roughly the number of positive and neutral Lyapunov exponents). A reduction in $n$ based upon using fewer mesh points while maintaining or enhancing accuracy results in improved efficiency.

There are several recent works on integrating adaptive spatial meshing techniques with DA, although most of the focus has been on PDE models in one space dimension. This includes methods that evolve meshes based on the solution of a differential equation, methods in which meshes are updated statically based upon interpolation, and remeshing techniques that add or subtract mesh points as the solution structure changes. In Bonan et al. (2017), the movement of the mesh nodes was determined by the solution of moving mesh differential equations that are coupled to the discretized PDE. The state variables of the PDE were then augmented with the positions of the nodes and incorporated into a DA scheme. The test problem consisted of a two-dimensional ice sheet assumed to be radially symmetric; therefore, it reduced to a problem with one spatial dimension. In Aydoğdu et al. (2019) and Du et al. (2016), common meshes were developed by combining the ensemble meshes through interpolation. This allowed for the update of the mean and covariance in Kalman filter based DA techniques while allowing each ensemble member to evolve on its own independent mesh. That is, at each observational timestep, the ensemble members were interpolated to the common mesh, updated with the DA analysis, and then interpolated back to their respective meshes. Specifically, a uniform, non-conservative mesh was used in Aydoğdu et al. (2019), with Lagrangian observations in one spatial dimension.
Higher spatial dimensions were used in Du et al. (2016), with a fixed common mesh refined near observation locations. Sampson et al. (2021) uses the same 1D non-conservative adaptive meshing scheme as in Aydoğdu et al. (2019) and extends this approach through the use of an adaptive common mesh, where, like in Bonan et al. (2017), the state vector is augmented with the node locations.

The outline of this paper is as follows. Background of data assimilation and adaptive moving mesh techniques is given in Section 2. This includes the framework we develop to include equations describing mesh movement within a DA framework. The development of adaptive meshing techniques for DA is detailed in Section 3. Metric tensors are introduced and their connection to non-uniform meshes is discussed. Techniques for combining meshes based on metric tensor intersection and for concentrating ensemble mesh(es) near observation locations are developed. The details of our implementation are in Section 4. This includes the discontinuous Galerkin discretization we employ and the specific metric tensor formulation we use to adaptively evolve the ensemble mesh(es). The details of our experimental setup and numerical results for both 1D and 2D inviscid Burgers equations are presented in Section 5.

## 2 Background on Data Assimilation and Adaptive Moving Meshes

### 2.1 Data Assimilation

Data assimilation techniques seek to combine models and data to improve predictions and quantify uncertainty, typically in a Bayesian context (see, e.g., Carrassi et al. (2018), Law et al. (2015), Reich and Cotter (2015), van Leeuwen et al. (2019)). Consider a finite dimensional discrete time system for a state vector $x_k \in \mathbb{R}^n$ that evolves based upon

$$x_k = \mathcal{M}_k(x_{k-1}) + \eta_k, \tag{1}$$

where $\mathcal{M}_k$ propagates the state forward in time, $k$ stands for the $k$th time step, and $\eta_k$ is assumed to be a normally distributed model error with covariance matrix $Q_k$ and mean $0$. Equation (1) can be used to forecast the state of the dynamical system, but this prediction can often be improved by including the observation $y_k \in \mathbb{R}^m$ given by the data model

$$y_k = \mathcal{H}_k\big(x_k^t\big) + \epsilon_k, \tag{2}$$

where $x_k^t$ is the unknown “truth” at time $t_k$, $\mathcal{H}_k$ is the observation operator, and $\epsilon_k$ is assumed to be a normally distributed observation error with covariance matrix $R_k$ and mean $0$. In many applications, the number of observations is much smaller than the dimension of the state, $m \ll n$. We wish to determine an estimate of $x_k$ that satisfies, in some sense, both the physical and data models. Note that implicit in the formulation of the data model (2) is that the state variables $x_k$ and data variables $y_k$ are supported on common spatial locations.

Many data assimilation techniques are based upon a Bayesian approach that determines the posterior distribution from the prior distribution and the likelihood. Given an observation $y_k$ at time $t_k$ and a prior estimate $p(x_k)$ of the state, Bayes’ Theorem states that

$$p(x_k \mid y_k) = \frac{p(y_k \mid x_k)\, p(x_k)}{p(y_k)} \propto p(y_k \mid x_k)\, p(x_k). \tag{3}$$

This procedure extends to the sequential assimilation of observations at multiple times under the assumption that the state is Markovian. Note that since the model noise is independent and identically distributed (i.i.d.), the prior can be written as $p(x_k) = \int p(x_k \mid x_{k-1})\, p(x_{k-1})\, \mathrm{d}x_{k-1}$, where, e.g., $p(x_k \mid x_{k-1}) = \mathcal{N}\big(\mathcal{M}_k(x_{k-1}),\, Q_k\big)$.

For nonlinear models (1) available data assimilation techniques include the ensemble Kalman filter (EnKF), particle filters (PF), variational methods such as 4DVar, and hybrid techniques that seek to combine the best features of different types of techniques. Several of these classes of methods are naturally ensemble based (EnKF, PF and their variants, as well as several hybrid methods) while variational methods may employ ensembles of solutions to approximate derivatives or as part of an iterative linear system solver.

Ensemble DA procedures use an ensemble of solutions to make predictions for the physical state via a two-step process. First, the prediction step uses the physical model (1) to integrate the ensemble members $x_{k-1}^{a,i}$, $i = 1, \dots, N$, where $N$ is the number of ensemble members, to make the ensemble forecasts $x_k^{f,i}$. Second, the analysis step incorporates the observations at time $t_k$ to adjust the prediction and produce the analysis ensemble $x_k^{a,i}$. EnKF does this through the following sequential process:

$$\bar{x}_k^f = \frac{1}{N}\sum_{i=1}^{N} x_k^{f,i}, \qquad P_k^f = \frac{1}{N-1}\sum_{i=1}^{N}\big(x_k^{f,i} - \bar{x}_k^f\big)\big(x_k^{f,i} - \bar{x}_k^f\big)^{\mathsf{T}}, \tag{4}$$

$$x_k^{a,i} = \big(I - K_k H_k\big)\, x_k^{f,i} + K_k\, y_k^i, \qquad i = 1, \dots, N, \tag{5}$$

where $I$ is the identity matrix, $y_k^i = y_k + \epsilon_k^i$ are perturbed observations with $\epsilon_k^i \sim \mathcal{N}(0, R_k)$, $H_k$ is the linearization of $\mathcal{H}_k$, and $K_k = P_k^f H_k^{\mathsf{T}}\big(H_k P_k^f H_k^{\mathsf{T}} + R_k\big)^{-1}$ is the Kalman gain matrix. The ensemble mean $\bar{x}_k^f$ together with the forecast covariance $P_k^f$ in (4) is employed to make predictions with a measure of uncertainty. Again, the implicit assumption here is that all ensemble members reside on the same mesh.

Variational methods such as 4DVar and 3DVar are direct formulations of Bayes’ Theorem. The analysis update of a variational DA algorithm can be viewed as the minimizer of a corresponding cost function. For example, when the prior distribution and the observational error model are Gaussian, the cost function is

$$J(x) = \frac{1}{2}\big(x - x^b\big)^{\mathsf{T}} B^{-1}\big(x - x^b\big) + \frac{1}{2}\sum_{k}\big(y_k - \mathcal{H}_k(x_k)\big)^{\mathsf{T}} R_k^{-1}\big(y_k - \mathcal{H}_k(x_k)\big), \tag{6}$$

where the trajectory $x_k$ is obtained from $x$ by the evolution of the model dynamics (1) and $B$ is the background error covariance. Note that this cost function is quadratic when both the physical model and the observation operator are linear, so in this case there is a unique optimizer. In the case that either the physical model or the observation operator is nonlinear, or if the error distributions are not Gaussian, the posterior distribution may not have a unique optimum. The use of non-Gaussian error distributions leads to corresponding terms in the cost function (6), which may be approximated, at least locally, with a quadratic cost function, e.g., with variants such as incremental 4DVar.

Both EnKF and 4DVar are used in large-scale applications. 4DVar provides great flexibility in including terms in the cost function, does not require linear physical model or observation operators, and can make use of sophisticated optimization techniques. EnKF is based upon Gaussian assumptions and parameterizes the prediction and uncertainty in terms of the ensemble mean and sample covariance. Together with local linear approximation of the observation operator, EnKF is a linear solver corresponding to a quadratic cost function. For more on the advantages and disadvantages of the EnKF and 4DVar techniques, see Kalnay et al. (2007); Lorenc (2003).
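As a concrete sketch of the EnKF analysis step (4)–(5), the following minimal stochastic (perturbed-observation) update shows the linear algebra involved. The function name and the dense-matrix formulation are illustrative choices, not the implementation used in this paper.

```python
import numpy as np

def enkf_analysis(X_f, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis, cf. (4)-(5).
    X_f: n x N forecast ensemble, y: m-vector of observations,
    H: m x n linear observation operator, R: m x m obs. error covariance."""
    n, N = X_f.shape
    x_mean = X_f.mean(axis=1, keepdims=True)
    A = (X_f - x_mean) / np.sqrt(N - 1)               # scaled anomalies
    P_f = A @ A.T                                     # sample forecast covariance (4)
    K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)  # Kalman gain
    # Each member is updated against a perturbed copy of the observation
    # so that the analysis spread remains statistically consistent with R.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X_f + K @ (Y - H @ X_f)                    # analysis ensemble (5)
```

For large $n$ one would avoid forming the covariance explicitly; efficient implementations work with the anomaly matrix directly (cf. Mandel (2006)).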

Hybrid methods seek to combine the advantages of ensemble Kalman filter techniques and variational techniques. An important motivation behind hybrid methods is to incorporate flow-dependent background error covariance matrices into a variational setting. A representative hybrid method is the ETKF-4DVar technique, in which $B$ in (6) is replaced by

$$B_{\mathrm{hybrid}} = (1 - \alpha)\, B + \alpha\, P^{\mathrm{ETKF}},$$

where $\alpha \in [0, 1]$ is a weighting factor, $B$ is the background error covariance used in 4DVar, and $P^{\mathrm{ETKF}}$ is the error covariance found using the ensemble transform Kalman filter (ETKF), a square root filter similar to the EnKF (4), (5). For $\alpha = 0$ this reduces to the standard 4DVar, while for $\alpha = 1$ it is known as 4DVAR-BEN, and for $0 < \alpha < 1$ the method is known as the ETKF-4DVAR. For further discussion of hybrid methods, see, e.g., Asch et al. (2016); Bannister (2017); Lorenc (2013).

While the previous classes of techniques all rely to some extent on Gaussian assumptions by parameterizing the predicted state and its uncertainty in terms of a mean and covariance, particle filters approximate the posterior distribution in an unstructured way in terms of particles (analogous to ensemble members) and particle weights. For example, the standard or bootstrap particle filter uses the model dynamics (1) to make predictions using each particle $x_k^i$. The weights $w_k^i$ are updated using the observation $y_k$. In the bootstrap PF the update is

$$w_k^i = \frac{1}{c}\, w_{k-1}^i\; p\big(y_k \mid x_k^i\big), \tag{7}$$

where the likelihood, given a Gaussian observational error model, is

$$p\big(y_k \mid x_k^i\big) \propto \exp\Big(-\tfrac{1}{2}\big(y_k - \mathcal{H}_k(x_k^i)\big)^{\mathsf{T}} R_k^{-1}\big(y_k - \mathcal{H}_k(x_k^i)\big)\Big), \tag{8}$$

and $c$ is chosen so that $\sum_{i} w_k^i = 1$. The standard particle filter and many variants suffer from weight degeneracy, in which most weights are nearly zero, decreasing the effective number of particles. This leads to the need for a large number of particles that increases exponentially with the dimensions of the state space and the observation space. Several variants, including the implicit PF, the equivalent weights PF, and the optimal proposal PF, have been developed (see, e.g., van Leeuwen et al. (2019)) to overcome weight degeneracy and the curse of dimensionality in PFs.
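The weight update (7)–(8) can be sketched in a few lines; working in log space guards against underflow when innovations are large. Names and signatures below are illustrative, not from the paper.

```python
import numpy as np

def bootstrap_pf_weights(w_prev, particles, y, obs_op, R_inv):
    """Bootstrap PF weight update, cf. (7)-(8): multiply each previous
    weight by the Gaussian likelihood of the observation given the
    particle, then normalize so the weights sum to one."""
    log_w = np.log(w_prev)
    for i, x in enumerate(particles):
        d = y - obs_op(x)                    # innovation for particle i
        log_w[i] += -0.5 * d @ R_inv @ d     # Gaussian log-likelihood (8)
    log_w -= log_w.max()                     # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()                       # normalization constant c in (7)
```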

Modern DA procedures employ spatial localization and covariance inflation techniques. Localization can be done through a Schur product (also known as a Hadamard or element-wise product) of a correlation function with the covariance matrix, resulting in observations having little or no effect on distant nodes. It can also be done through domain localization, where the domain is broken into subdomains and the observations within each subdomain only affect the variables supported within that subdomain. Inflation (either additive or multiplicative) increases the entries of the covariance matrix, preventing degeneracy of the procedure.

The focus of this work is employing adaptive moving meshes with ensemble DA methods. In the following we outline a framework for reducing to the discrete time, finite dimensional DA problem (1), (2). We start from the simplest cases of discretized ODE and PDE on fixed spatial meshes, and then formulate the DA problem using moving mesh methods in which a differential equation determines the movement of mesh points. We will emphasize the errors that are introduced in these different scenarios.

### 2.2 ODEs and Time-Dependent PDEs Discretized on Fixed Mesh

Given an ODE

$$\dot{x} = f(x), \tag{9}$$

as the physical model, the application of a time stepping technique and the subsequent addition of model error yields (1). Similarly, given a time-dependent PDE, after spatial discretization we obtain a system of ODEs that can then be discretized in time to obtain a discrete time, finite dimensional physical model (1) with the addition of model noise.

### 2.3 The Physical PDE Model and Moving Mesh Equations

Consider now the physical model as a time-dependent PDE, written abstractly as $u_t = \mathcal{F}(u)$ and posed on an appropriate function space. PDEs employed in a DA context may be coupled to an equation that evolves the spatial mesh, enabling DA on an adaptive moving mesh. There are two basic approaches to adaptive moving meshes. The first is a rezoning approach, used in Aydoğdu et al. (2019); Sampson et al. (2021), which updates the mesh at each time using a given mesh generator and interpolates the PDE solution from the old mesh to the new mesh. The second approach is a quasi-Lagrange approach, in which the mesh is considered to move continuously with time. In this case, the discretized PDE is supplemented with an advective term to reflect mesh movement. If $u$ satisfies the PDE $u_t = \mathcal{F}(u)$ and the mesh moves with velocity $\dot{x}$, then the total derivative of $u$ along the mesh trajectories is given by

$$\frac{\mathrm{d}u}{\mathrm{d}t} = \frac{\partial u}{\partial t} + \nabla u \cdot \dot{x} = \mathcal{F}(u) + \nabla u \cdot \dot{x}. \tag{10}$$

The equation for the mesh movement comes from a variational approach in which a cost function $I_h$ (the meshing functional) is minimized through a gradient flow differential equation:

$$\tau\, \frac{\mathrm{d}x}{\mathrm{d}t} = -\nabla_x I_h(x; u), \tag{11}$$

where $\tau > 0$ is a user-specified parameter controlling the speed of mesh movement. We note that when $\tau = 0$, (11) reduces to an algebraic equation; when this equation is satisfied, the mesh instantaneously attains a local minimizer of the meshing functional. In practice, one first fixes the mesh and updates the solution of the PDE by solving (10). The solution to (10), together with the meshing functional, gives the mesh velocity defined by (11). A new mesh is obtained by integrating (11) in time.

In an ensemble-based method, an adaptive mesh PDE discretization can be used for each of the ensemble members. However, the computations of the ensemble mean and covariance only make sense if the values of the ensemble members are taken from the same spatial locations. Therefore, if the ensemble members’ meshes can evolve independently via an adaptive moving mesh scheme, special care must be taken to calculate the mean and covariance at each DA step. Previous works have explored two general approaches to this problem. One approach is to interpolate the ensemble solutions to a common mesh at each observational time step and assimilate the PDE variables only. The common mesh approach is used here, as well as in Aydoğdu et al. (2019); Du et al. (2016). Another method is to assimilate both the PDE variables and the common mesh locations, as done in Bonan et al. (2017).

In Bonan et al. (2017), the state vector $u$ is augmented with the mesh points $x$. That is, the new state vector is given by

$$z = \begin{pmatrix} u \\ x \end{pmatrix}. \tag{12}$$

The DA update is then performed on this augmented state vector. Note that this approach doubles the state dimension in the DA scheme, affecting the computational cost of the DA update. In Aydoğdu et al. (2019); Sampson et al. (2021) the mesh movement is based on a Lagrangian flow of the form $\dot{x} = v(x, t)$, together with upper and lower bounds on the mesh spacing that are enforced by remeshing and subtracting or adding mesh points. For example, on a spatial domain $[a, b]$ this corresponds to mesh movement of the form

$$\frac{\mathrm{d}x_i}{\mathrm{d}t} = v(x_i, t), \qquad i = 1, \dots, N_v. \tag{13}$$

###### Example 1.

As a concrete example of a PDE and mesh movement equation, consider a 1D reaction diffusion equation:

$$u_t = \epsilon\, u_{xx} + g(u), \qquad x \in (a, b), \quad t > 0, \tag{14}$$

with, for example, homogeneous Dirichlet or homogeneous Neumann boundary conditions and smooth initial data.

A non-uniform finite difference discretization in space on mesh points $a = x_0 < x_1 < \cdots < x_{n+1} = b$ yields, for $i = 1, \dots, n$:

$$\dot{u}_i - \frac{u_{i+1} - u_{i-1}}{x_{i+1} - x_{i-1}}\,\dot{x}_i = \frac{2\epsilon}{x_{i+1} - x_{i-1}}\left(\frac{u_{i+1} - u_i}{x_{i+1} - x_i} - \frac{u_i - u_{i-1}}{x_i - x_{i-1}}\right) + g(u_i). \tag{15}$$

Now consider a moving mesh scheme determined by the equidistribution of arc length of the solution:

$$\int_{x_{i-1}}^{x_i}\sqrt{1 + u_x^2}\;\mathrm{d}x = \frac{1}{n+1}\int_a^b \sqrt{1 + u_x^2}\;\mathrm{d}x, \qquad i = 1, \dots, n+1. \tag{16}$$

After discretizing both (15) and (16) in time we obtain a discrete time, finite dimensional model (1), with the state vector given by the discretized PDE solution $u = (u_1, \dots, u_n)^{\mathsf{T}}$ alone or by the augmented vector $(u^{\mathsf{T}}, x^{\mathsf{T}})^{\mathsf{T}}$, depending on whether only the PDE solution is assimilated or the mesh locations are as well.
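The arc-length equidistribution condition (16) can be realized by a simple static remeshing pass: accumulate the discrete arc length of the current solution and invert it at equal quotas. A minimal sketch, with an illustrative tanh layer standing in for the solution and function names of our choosing:

```python
import numpy as np

def equidistribute_arclength(x, u):
    """One static remeshing pass for (16): relocate the nodes so that each
    cell carries an equal share of the discrete arc length of u."""
    seg = np.sqrt(np.diff(x) ** 2 + np.diff(u) ** 2)   # per-segment arc length
    s = np.concatenate(([0.0], np.cumsum(seg)))        # cumulative arc length
    quotas = np.linspace(0.0, s[-1], len(x))           # equal arc-length targets
    return np.interp(quotas, s, x)                     # invert s(x) by interpolation

x = np.linspace(0.0, 1.0, 41)
u = np.tanh(20.0 * (x - 0.5))             # solution with a sharp interior layer
x_new = equidistribute_arclength(x, u)    # nodes cluster near x = 0.5
```

After the pass, the cells near the layer at $x = 0.5$ are much smaller than those in the flat regions.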

### 2.4 Observations and Their Locations

Consider observations $y \in \mathbb{R}^m$ at a fixed time $t_k$, supported on a potentially time-dependent set of observation locations $\{s_1, \dots, s_m\}$. In practice, these observations are related to the unknown truth through the observation operator; suppressing the time dependence,

$$y = \mathcal{H}\big(u^t(s_1), \dots, u^t(s_m)\big) + \epsilon. \tag{17}$$

At a fixed time $t_k$, if the $s_j$ are a subset of the nodes of the common mesh, then (17) is exactly (2), and the DA update proceeds as detailed above. However, it is often the case that the $s_j$ are not a subset of the nodes of the common mesh. In this case, it is natural to either interpolate the ensemble members onto a common mesh that contains the observation locations or interpolate the observations to the common mesh, obtaining innovations of the form

$$y - \mathcal{H}\big(\Pi_c\, u\big) \qquad \text{or} \qquad \Pi_o\, y - \mathcal{H}(u), \tag{18}$$

where $\Pi_c$ interpolates from the common mesh to the locations $s_j$ in the domain of the observation operator and $\Pi_o$ maps from the observation locations to (part of) the common mesh.
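In one space dimension, the second option in (18) can be sketched with piecewise linear interpolation. The function and variable names below are ours (including calling the interpolation operator Pi_o), assuming a scalar solution u stored at the common mesh nodes:

```python
import numpy as np

def innovations_on_common_mesh(mesh_x, u, obs_x, obs_y):
    """Map observations from their own locations onto (part of) the common
    mesh by linear interpolation and form the innovations there, cf. the
    second option in (18). Only mesh nodes inside the observed region are
    used, since interpolation cannot extend past the outermost observations."""
    mask = (mesh_x >= obs_x.min()) & (mesh_x <= obs_x.max())
    y_on_mesh = np.interp(mesh_x[mask], obs_x, obs_y)   # Pi_o applied to y
    return y_on_mesh - u[mask]                          # innovations on the mesh
```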

## 3 Development of Adaptive Mesh Methods for DA

In this section we develop techniques for DA with adaptive moving meshes. These techniques are based upon the use of a metric tensor that describes the mesh. In particular, a non-uniform mesh is uniform with respect to the metric tensor being employed. The use of metric tensors is applicable not only in one spatial dimension, but also in higher spatial dimensions. We first describe the development of metric tensors using the so-called equidistribution and alignment conditions. Metric tensor intersections are introduced next and are used to combine meshes (e.g., the meshes of ensemble members) and potentially the locations of observations into what is in some sense an “averaged” mesh. We develop techniques for defining metric tensors that will concentrate mesh points near observation locations or other locations of interest and develop adaptive localization techniques based upon the use of metric tensors. A rough outline of the overall algorithm is given in Figure 1.

### 3.1 Metric Tensors and Adaptive Meshes

In one dimension, a mesh can be defined by determining the size of each of the elements of a mesh. In higher spatial dimensions, however, it is necessary to have conditions to also determine the shape and orientation of each element in addition to its size. For this and following discussions of adaptive mesh techniques, we follow the approach of Huang and Russell (2010).

Consider a polyhedral domain $\Omega \subset \mathbb{R}^d$ ($d \ge 1$), and let $\hat{K}$ be a reference element that is an equilateral $d$-simplex with unit volume. Then, given a simplicial mesh $\mathcal{T}_h$ on $\Omega$, for any element $K \in \mathcal{T}_h$ there is a unique invertible affine mapping $F_K : \hat{K} \to K$ such that $K = F_K(\hat{K})$. The size, shape, and orientation of $K$ can be obtained from the Jacobian matrix $F_K'$.

One way to create adaptive meshes is through the use of a given symmetric positive-definite matrix-valued monitor function, $M = M(x)$, which is also sometimes called a metric tensor. This monitor function defines a metric, and a mesh that is uniform in this metric is said to be an $M$-uniform mesh. That is, a mesh is $M$-uniform if and only if all elements have a constant volume and are equilateral in the metric $M$. These conditions are called the equidistribution and alignment criteria (Huang and Russell (2010), Section 4.1.1). To be specific, let $M_K$ be the average of $M$ over the element $K$, that is,

$$M_K = \frac{1}{|K|}\int_K M(x)\,\mathrm{d}x. \tag{19}$$

Then the equidistribution condition is given by

$$|K|\,\sqrt{\det(M_K)} = \frac{\sigma_h}{N}, \qquad \forall K \in \mathcal{T}_h, \tag{20}$$

where $\sigma_h = \sum_{K \in \mathcal{T}_h} |K|\sqrt{\det(M_K)}$ and $N$ is the number of simplexes/elements in $\mathcal{T}_h$. The alignment condition is equivalent to

$$\frac{1}{d}\,\operatorname{tr}\Big((F_K')^{-1} M_K^{-1} (F_K')^{-\mathsf{T}}\Big) = \det\Big((F_K')^{-1} M_K^{-1} (F_K')^{-\mathsf{T}}\Big)^{1/d}, \qquad \forall K \in \mathcal{T}_h, \tag{21}$$

where $\det(\cdot)$ and $\operatorname{tr}(\cdot)$ denote the determinant and trace of a matrix, respectively. Given a user-prescribed monitor function $M$, equations (20) and (21) can be used to define an energy functional, which in turn generates mesh movement; see details in Section 4.1.

### 3.2 Forming the Common Mesh

At each data assimilation step, the common mesh is calculated by taking the metric tensor intersection of the monitor functions for each ensemble member; see, e.g., Zhang et al. (2020a). The metric tensor intersection is best understood geometrically as the intersection of the unit balls in the different norms that are associated with symmetric positive definite (SPD) matrices. Mathematically, if $M_1$ and $M_2$ are SPD matrices of the same dimension, then their intersection is denoted by “$M_1 \cap M_2$” and defined as

$$M_1 \cap M_2 = P^{-\mathsf{T}}\operatorname{diag}\big(\max(1, \lambda_1), \dots, \max(1, \lambda_d)\big)\, P^{-1}, \tag{22}$$

where $P$ is a nonsingular matrix such that

$$P^{\mathsf{T}} M_1 P = I, \qquad P^{\mathsf{T}} M_2 P = \operatorname{diag}(\lambda_1, \dots, \lambda_d). \tag{23}$$

It is not difficult to show that the unit ball in the norm associated with $M_1 \cap M_2$ is contained in the unit balls in the norms associated with $M_1$ and $M_2$. As a consequence, when $M_1$ and $M_2$ are two metric tensors (meaning they are SPD-matrix-valued functions), the mesh associated with $M_1 \cap M_2$ will combine the concentration patterns of the meshes associated with $M_1$ and $M_2$.

Consider a DA scheme with ensemble members $u^{(i)}$, $i = 1, \dots, N_e$, which are defined on different, independent meshes $\mathcal{T}_h^{(i)}$. Assume that the ensemble meshes and the common mesh have the same number of mesh elements and vertices (denoted $N$ and $N_v$, respectively), as well as the same connectivity. That is, they differ only in the location of the vertices. At each observational time step, first interpolate the ensemble members from the ensemble meshes to the current common mesh. Then compute the metric tensor for each of the ensemble members, denoted $M^{(i)}$, and obtain a single metric tensor by matrix intersection, $M^{(1)} \cap M^{(2)} \cap \cdots \cap M^{(N_e)}$, which will be used to generate the common mesh at the next time step. In practice, this computation can be done sequentially. In 1D, the order does not matter, since the metric tensor intersection corresponds to finding the maximum value. However, the order does matter in multi-dimensions, and different orderings lead to different final metric tensors. While the examples we present in Section 5 are robust with respect to the different orderings, we have used the Greedy Algorithm based on minimizing the determinant (which is equivalent to maximizing the area of the ellipsoid from the intersection) to determine an optimal ordering.
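The pairwise intersection (22)–(23) can be computed by simultaneous diagonalization: a Cholesky factor of the first metric reduces the pair to a single symmetric eigenproblem. A minimal numpy sketch (function name ours):

```python
import numpy as np

def metric_intersection(m1, m2):
    """Intersection of two SPD metric tensors, cf. (22)-(23).
    Simultaneously diagonalize: find P with P^T m1 P = I and
    P^T m2 P = diag(lam), then clip each generalized eigenvalue below by 1."""
    L = np.linalg.cholesky(m1)
    Linv = np.linalg.inv(L)
    S = Linv @ m2 @ Linv.T          # symmetric, shares eigenvalues with m1^{-1} m2
    lam, Q = np.linalg.eigh(S)
    P = Linv.T @ Q                  # P^T m1 P = I, P^T m2 P = diag(lam)
    Pinv = np.linalg.inv(P)
    # Clipping the eigenvalues below by 1 makes the unit ball of the
    # result lie inside both original unit balls.
    return Pinv.T @ np.diag(np.maximum(1.0, lam)) @ Pinv
```

Intersecting a metric with the identity, for example, clips all of its eigenvalues below by one, so the resulting mesh is never coarser than the uniform mesh in any direction.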

### 3.3 Concentration of Mesh Points near Observation Locations

Using metric tensors to define a common mesh gives some amount of control over the location of the mesh points. Specifically, the common mesh can be concentrated near specific points of interest, such as observation locations. In the context of data assimilation, it may be beneficial to have more mesh points near the observation locations $s_1, \dots, s_m$, where $m$ is the number of observations. The locations of the observations do not need to be fixed, as long as they are known at the observational time $t_k$.

One strategy for ensuring that there are common mesh points near the observation locations is to simply fix the observation locations as points in the common mesh. However, this can lead to meshes that are ill-conditioned when the mesh points are not allowed to move freely with the dynamics of the solution. Instead of fixing the observation locations as points in the common mesh, a monitor function $M_{\mathrm{obs}}$ can concentrate the mesh near the observation locations. Ideally, $M_{\mathrm{obs}}$ is comparable to the ensemble metric tensor in some measure at the locations where the mesh requires more points, and quickly decays away from these locations. One such choice is $M_{\mathrm{obs}}(x) = \theta(x)\, I$, where

$$\theta(x) = 1 + \alpha \sum_{j=1}^{m} \exp\left(-\frac{|x - s_j|^2}{2\sigma^2}\right), \tag{24}$$

with $\alpha \ge 0$ and $\sigma > 0$ controlling the strength and width of the concentration near the observation locations $s_j$.

If the ensemble meshes alone determine the common mesh, the observation monitor function is simply omitted from the intersection. Conversely, if the locations of the observations alone determine the common mesh, the observation monitor function is used by itself. Note that this last scenario is similar to Du et al. (2016), where a fixed, non-uniform common mesh is employed based on the observation locations.
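A sketch of one possible observation monitor function of the form $M_{\mathrm{obs}} = \theta(x)\, I$ in 1D, using Gaussian bumps centered at the observation locations; the bump form, function name, and parameter values are illustrative assumptions rather than values prescribed by the paper:

```python
import numpy as np

def observation_monitor(x, obs_locs, alpha=50.0, sigma=0.05):
    """Scalar factor theta(x) of a monitor function M_obs = theta * I that
    is large near the observation locations and decays quickly away from
    them. alpha sets the peak height, sigma the width of concentration."""
    theta = np.ones_like(x)
    for s in obs_locs:
        theta += alpha * np.exp(-((x - s) ** 2) / (2.0 * sigma ** 2))
    return theta
```

Because $\theta \ge 1$ everywhere, intersecting this monitor with the ensemble metric tensors only adds concentration near the observations and never coarsens the mesh elsewhere.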

### 3.4 Adaptive Localization

A localization scheme can ensure that observations only affect nearby mesh points. There are broadly two categories of localization schemes: domain localization and covariance localization. In a covariance localization scheme, the covariance matrix is adjusted so that the analysis update is less affected by observations that are farther away. Several works explore this type of R-localization scheme, including in an adaptive sense (Wang et al. (2018); Moosavi et al. (2018); Popov and Sandu (2019)). In domain localization, the domain is decomposed into several subdomains, and an observation only affects the solution at mesh points within the same subdomain. One common domain localization scheme is that used in Hunt et al. (2007).

We define a metric tensor localization scheme (MT localization) as the domain localization scheme of Hunt et al. (2007) (cf. Section 4.5), but with the localization radii calculated for each mesh node as a function of the determinant of the monitor function, as follows. Instead of having one predetermined radius of localization, at each timestep we compute the localization radius for each node $j$:

$$r_j = \frac{c}{\min\big(\sqrt{\det(M_j)},\, \delta\big)}, \tag{25}$$

where , , and and are the parameters offering some control over the localization regime. It is not difficult to see that

(26)

This shows that the larger the value of , the larger the localization radius can be. A smaller cutoff value will increase the lower bound of the localization radius, ensuring that localization can still happen, even given a sharp front.

When combining the concentration of mesh points with the MT localization scheme, the localization radius should be calculated solely from the meshes at the previous timestep and the ensemble solutions, before the common mesh is concentrated near the observations. Otherwise, the localization radius would reflect the concentration scheme rather than the ensemble solutions to the PDE.
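A minimal sketch of the idea behind (25)-(26), with hypothetical parameter names `c` and `cutoff` standing in for the elided symbols: the radius shrinks where the monitor determinant is large (e.g., at a sharp front), and capping the determinant at `cutoff` keeps the radius from collapsing there, so a smaller cutoff raises the lower bound `c / cutoff` on the radius, consistent with the discussion above:

```python
import numpy as np

def mt_localization_radii(monitor_det, c=1.0, cutoff=10.0):
    # Hypothetical realization of the per-node MT localization radius:
    # large monitor determinants (sharp solution features) give small
    # radii, while capping the determinant at `cutoff` guarantees a
    # minimum radius of c / cutoff, so some localization still occurs
    # even at a sharp front.
    det = np.asarray(monitor_det, dtype=float)
    return c / np.minimum(det, cutoff)
```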

### 3.5 Remeshing ensemble members

At the beginning of each observational timestep, the ensemble members reside on their corresponding meshes . The analysis is computed on the common mesh, and the updated ensemble members are interpolated back to their individual meshes. Sometimes, however, the analysis update perturbs the forecast enough that the meshes that were suitable for the forecast are no longer suitable for the analysis.

When this happens, one option is to add extra smoothing cycles before integrating. If necessary, we can remesh to find meshes suitable for the analysis ensemble. The new meshes for the ensemble members must satisfy the equidistribution and alignment criteria for the updated ensemble, and the result is the analysis ensemble residing on the updated meshes .

### 3.6 Algorithm for DA on Adaptive Meshes

The technique developed here for DA with an adaptive moving mesh is summarized in Algorithm 1. Here we assume that all meshes have the same number of elements and nodes and the same connectivity. A main advantage of this assumption is that the meshes can be viewed as deformations of one another, and conservative interpolation schemes can be developed relatively easily between them; see, e.g., Zhang et al. (2020b) or Section 4.3. It is important in the DA computation to conserve the ensemble members at interpolation steps 7 and 8 of Algorithm 1 since, otherwise, the mean-zero assumption on the model error in (1) would be violated.

We note that the metric tensor approach to ensemble DA with adaptive meshing does not require the meshes to have the same number of elements and nodes or the same connectivity. The approach can also be used with a meshing scheme in which mesh points are added or eliminated based on some pre-defined criteria, and/or the ensemble meshes and the common mesh have different numbers of elements and nodes. However, care must then be taken with the interpolation between the ensemble meshes and the common mesh to ensure conservation of the ensemble members and therefore the mean-zero property of the model error.

## 4 Implementation Details

### 4.1 Defining Mesh Movement

The following is a summary of the moving mesh PDE (MMPDE) approach developed in Huang (2006); Huang and Kamenski (2015); Huang et al. (1994a, b); Huang and Russell (2010). The central idea of the MMPDE moving mesh method is to view any nonuniform mesh as a uniform one in some metric ; that is, the elements have a constant volume and are equilateral in the metric . These conditions are called the equidistribution and alignment criteria; see (20) and (21). It has been shown that if a mesh begins as nonsingular (that is, the elements have positive volume), it will remain nonsingular for all time under the MMPDE method. Furthermore, the height of each of the elements will be bounded above and below by some positive constants that depend on and the initial mesh (Huang and Kamenski, 2018, Theorem 4.1).

The metric tensor is used to control the size, shape, and orientation of the elements of the mesh to be generated. Various metric tensors have been developed in Huang and Sun (2003). For this paper, we consider a Hessian-based metric tensor defined for each element as

(27)

where is the Hessian or a recovered Hessian of the state vector on the element ; is the eigen-decomposition of ; and is a regularization parameter defined through an associated algebraic equation.

The metric tensor given by equation (27) is known to be optimal for the -norm of linear interpolation error (Huang and Sun, 2003).

To generate the -uniform mesh , we use here an approach different from (11), in which the coordinates of the mesh nodes are evolved directly. Instead, we evolve the coordinates of the nodes of an intermediate mesh and obtain the new mesh from this intermediate mesh by interpolation. An advantage of this approach is that its formulation is simpler than that of the direct approach (cf. Huang and Kamenski (2015)). To start, we introduce the reference computational mesh , which is uniform in the Euclidean metric, and the computational mesh . The reference computational mesh has the same connectivity and the same number of vertices and elements as and stays fixed during the computation. The computational mesh serves as an intermediate variable. In this setting, for any element in , there is a unique corresponding element in . The affine map between and and its Jacobian matrix are denoted by and , respectively. With this setting, the equidistribution and alignment criteria for -uniform meshes take a form similar to those in equations (20) and (21). To generate a mesh satisfying these conditions as closely as possible, we define an energy functional as

(28)

which is a Riemann sum of a continuous functional developed in Huang (2001) based on mesh equidistribution and alignment. Note that is a function of the coordinates of the nodes of and .

Taking as the current mesh , the MMPDE approach defines the mesh equation as a gradient system of ,

(29)

where is considered as a row vector and is a parameter used to adjust the response time of the mesh movement to changes in .

Define the function associated with the energy (28) as

(30)

where and the edge matrices of and are and , respectively. Using the notion of scalar-by-matrix differentiation (Huang and Kamenski, 2015), it is not difficult to find the derivatives of with respect to and as

(31)

(32)

Substituting (30)-(32) into (29) yields

(33)

where is the element patch associated with the vertex , is the local index of on , and is the local velocity contributed by the element to the vertex . The local velocities are given by

(34)

Integrating the mesh equation (33) over a physical time step, with the proper modifications for boundary vertices and with the initial mesh , yields the new computational mesh which forms a correspondence with the current mesh , i.e., . The new common mesh is defined as , which can be computed using linear interpolation.
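The full MMPDE gradient flow (29)-(34) involves the energy (28) and the element edge matrices. As a much simpler illustration of the underlying principle of moving nodes toward equidistribution of a monitor function, the following 1D sketch (a de Boor-style relaxation, not the paper's method) may help:

```python
import numpy as np

def equidistribute(x, rho, iters=50, tau=0.1):
    # Toy 1D relaxation toward equidistribution of a monitor function
    # rho(x) (a de Boor-style iteration, NOT the MMPDE gradient flow of
    # (29)-(34)): each step computes the cumulative monitor "mass" and
    # relaxes the nodes toward positions splitting it into equal parts.
    x = np.asarray(x, dtype=float).copy()
    for _ in range(iters):
        mid = 0.5 * (rho(x[:-1]) + rho(x[1:]))       # monitor on cells
        cum = np.concatenate([[0.0], np.cumsum(mid * np.diff(x))])
        target = np.linspace(0.0, cum[-1], len(x))   # equal-mass levels
        xnew = np.interp(target, cum, x)             # invert cumulative mass
        x = (1.0 - tau) * x + tau * xnew             # relaxed update
    return x
```

The relaxation parameter `tau` plays a role loosely analogous to the response-time parameter in (29): smaller values make the mesh respond more slowly to changes in the monitor function.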

It is common practice in moving mesh computation to smooth the metric tensor to obtain smoother meshes. To this end, we apply several sweeps of a low-pass filter (Huang and Russell, 2010) to the metric tensor each time it is computed.
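A sketch of what one smoothing sweep can look like in 1D, assuming a simple neighbor-averaging filter rather than the exact patch-averaging filter of Huang and Russell (2010):

```python
import numpy as np

def smooth_metric(M, sweeps=2, weight=0.5):
    # Simple 1D low-pass filter sketch: each sweep replaces the nodal
    # metric value by a weighted average with its two neighbors
    # (edge values are replicated at the boundary). This only
    # illustrates the idea of metric smoothing; the actual filter
    # averages over element patches.
    M = np.asarray(M, dtype=float).copy()
    for _ in range(sweeps):
        padded = np.pad(M, 1, mode="edge")
        M = (1 - weight) * M + weight * 0.5 * (padded[:-2] + padded[2:])
    return M
```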

### 4.2 DG Discretization

Numerical results will be presented in Section 5 to illustrate the DA procedure using the 1D and 2D inviscid Burgers equations. Since the Burgers equation is hyperbolic and can have discontinuous solutions such as shocks, we use the discontinuous Galerkin method (DG) for its spatial discretization. DG is a type of finite element method with the trial and test function spaces consisting of discontinuous, piecewise polynomials. It is known to be a particularly powerful numerical tool for the simulation of hyperbolic problems and has the advantages of high-order accuracy, local conservation, geometric flexibility, suitability for handling mesh-adaptivity, extremely local data structure, high parallel efficiency, and a good theoretical foundation for stability and error estimates. Over the last few decades, the DG method has been used widely in scientific and engineering computation.

Specifically, we use a quasi-Lagrangian moving mesh DG method (MMDG) to solve the Burgers equation on moving meshes Zhang et al. (2020b). The method treats the mesh movement as continuous in time, which leads to an extra convective term (cf. equation (10)) in the resulting discrete equations to reflect the mesh movement. Importantly, the method is (mass) conservative, so the model error has mean zero. In our computation, we use piecewise linear polynomials (P-DG) for spatial discretization and a third-order Strong Stability Preserving (SSP) Runge-Kutta scheme for temporal discretization. The reader is referred to Zhang et al. (2020b) for details of the MMDG method.

### 4.3 DG Interpolation

From Algorithm 1, we see that interpolation is needed between the ensemble meshes and the common mesh. Since these meshes are assumed to be deformations of one another, we can perform the interpolation by solving a differential equation.

To be specific, we consider the deforming meshes and and the state variable that needs to be interpolated. We define a deforming mesh (with ) from to as a mesh with the nodal positions and velocities as

(35)

(36)

Then the interpolation can be viewed as solving the following linear convective PDE on the moving mesh over :

(37)

with the initial values as those of on .

A DG-interpolation scheme has been studied in Zhang et al. (2020b), where DG and an SSP Runge-Kutta scheme are used to discretize (37) in space and time, respectively. The scheme is conservative and positivity-preserving and can be high-order. As mentioned before, mass conservation is important for the interpolation between the ensemble meshes and the common mesh to maintain the mean-zero property of the model error. As with the PDE solver, we use this scheme for interpolation with piecewise linear polynomials (P-DG) for spatial discretization and a third-order SSP Runge-Kutta scheme for temporal discretization; see Zhang et al. (2020b) for details.

### 4.4 DA Implementation

For our numerical experiments we employ a Local Ensemble Transform Kalman Filter (LETKF) based on Hunt et al. (2007). In general, an ETKF code uses a linear transform to control the resulting sample covariance so that the covariance of the analysis update exactly satisfies the desired identity. Let be the ensemble mean at time , and let be the ensemble forecast. Then define the perturbation matrix

(38)

and the sample covariance is given by . Consider the following transformation:

(39)

and set . Then

(40)

as desired.
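The update described in (38)-(40) can be sketched with the standard ETKF formulas of Hunt et al. (2007); the symmetric square root is computed via an eigendecomposition, the variable names are ours, and covariance inflation is omitted:

```python
import numpy as np

def etkf_analysis(Xf, y, H, R):
    # Sketch of the ETKF analysis step following Hunt et al. (2007).
    # Xf: n x Ne forecast ensemble, y: observation vector, H: linear
    # observation operator (m x n), R: observation error covariance.
    n, Ne = Xf.shape
    xbar = Xf.mean(axis=1, keepdims=True)
    Xp = Xf - xbar                                  # forecast perturbations
    Yp = H @ Xp                                     # obs-space perturbations
    Rinv = np.linalg.inv(R)
    A = (Ne - 1) * np.eye(Ne) + Yp.T @ Rinv @ Yp
    evals, evecs = np.linalg.eigh(A)
    Pa = evecs @ np.diag(1.0 / evals) @ evecs.T     # ensemble-space covariance
    W = evecs @ np.diag(np.sqrt((Ne - 1) / evals)) @ evecs.T  # symmetric sqrt
    innov = np.asarray(y, dtype=float).ravel() - (H @ xbar).ravel()
    wbar = Pa @ Yp.T @ Rinv @ innov                 # mean-update weights
    return xbar + Xp @ (wbar[:, None] + W)          # analysis ensemble
```

With uninformative observations (very large R), the transform reduces to the identity and the analysis equals the forecast, which is a useful sanity check.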

### 4.5 Localization Techniques

We will compare the metric tensor based localization scheme developed in Section 3.4 with some commonly used localization schemes. Recall from (5) the Kalman gain . R-localization modifies the Kalman gain through the Schur product of a localization matrix with the covariance matrix. One of the most common ways to define the localization matrix is through the Gaspari-Cohn (GC) correlation function, a compactly supported fifth-order piecewise polynomial that decays to zero.

(41)

GC localization can be applied to either the model space or the observation space. In the model space, the localization matrix is given by

(42)

where and are the positions of the th and th nodes and is a pre-determined localization radius. This matrix is Schur-multiplied with the ensemble covariance matrix, resulting in the Kalman update

(43)
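The GC function in (41) and the model-space localization matrix in (42) can be sketched as follows; the polynomial is the standard Gaspari-Cohn fifth-order correlation function, and `c` plays the role of the localization radius:

```python
import numpy as np

def gaspari_cohn(r):
    # Standard Gaspari-Cohn fifth-order piecewise polynomial; r is the
    # distance normalized by the localization radius. Compactly
    # supported: identically zero for r > 2.
    r = np.abs(np.asarray(r, dtype=float))
    out = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r <= 2.0)
    z = r[m1]
    out[m1] = -0.25 * z**5 + 0.5 * z**4 + 0.625 * z**3 - (5.0 / 3.0) * z**2 + 1.0
    z = r[m2]
    out[m2] = (1.0 / 12.0) * z**5 - 0.5 * z**4 + 0.625 * z**3 \
        + (5.0 / 3.0) * z**2 - 5.0 * z + 4.0 - 2.0 / (3.0 * z)
    return out

def gc_localization_matrix(positions, c):
    # Model-space localization matrix rho_ij = GC(|x_i - x_j| / c),
    # to be Schur-multiplied with the ensemble covariance as in (43).
    p = np.asarray(positions, dtype=float).reshape(len(positions), -1)
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=2)
    return gaspari_cohn(d / c)
```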

For localization in the observation space, the localization matrix must be Schur-multiplied with both and . Therefore, two localization matrices are needed,

(44)

(45)

Then the Kalman gain is given by

(46)

We next outline the domain localization scheme developed in Hunt et al. (2007). Define to be the radius of localization. Then for every mesh point , if there is an observation located at the point such that , the analysis is given by the EnKF update in equation (5). If not, then the analysis update is equal to the forecast predicted by the model.
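A sketch of this selection step, assuming Euclidean distance and a fixed radius: in the scheme of Hunt et al. (2007), the analysis at a node uses only the observations returned for that node, and a node with an empty set keeps the forecast:

```python
import numpy as np

def local_obs_indices(node_pos, obs_pos, radius):
    # Domain localization selection: for each mesh node, collect the
    # indices of observations within the localization radius. A node
    # with no nearby observations keeps its forecast value unchanged.
    nodes = np.asarray(node_pos, dtype=float).reshape(len(node_pos), -1)
    obs = np.asarray(obs_pos, dtype=float).reshape(len(obs_pos), -1)
    d = np.linalg.norm(nodes[:, None, :] - obs[None, :, :], axis=2)
    return [list(np.where(d[i] <= radius)[0]) for i in range(len(nodes))]
```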

In many implementations, this radius of influence is predetermined and constant over time and space. However, there may be instances where this should be dynamic in time and space. For example, if the solution has a traveling front or shock that travels across the domain, there should be a smaller radius of localization near the region where the gradient is large and a larger radius where the solution is relatively constant, and this localization scheme should move across the domain with the traveling front or shock.

To see this, consider a solution with a single observation at and a large gradient beginning just past this observation location; that is, . For outside of the localization radius, the observation does not affect the analysis, so . For close to the observation, consider the EnKF update for a single ensemble member, omitting the ensemble superscripts and time subscripts:

(47)

where . For the sake of simplicity, assume and , where denotes the unit vector. Then, at a point close to the observation, the larger the value of is, the greater the difference between the analysis and the forecast will be. This can be especially problematic if is large for , that is, after the shock. In that case, the analysis will be pulled much closer to the observations, and the steep front will be smoothed out.

This problem is avoided by using a smaller radius of localization near the shock and a larger radius of localization farther away from it. Fortunately, the monitor function obtained in the adaptive moving mesh algorithm determines where mesh points will cluster and where they will spread apart; by proxy, this indicates where the shock or other feature of interest is located. In this way, the monitor function can inform what the localization parameter should be over time and space, enabling a dynamic update of the localization variable.

LETKF employs localization within the ensemble transform Kalman filter so that only model variables located at mesh points within a predetermined radius of an observation will assimilate that observation. This not only localizes the influence of observations but also provides a dimension reduction by creating reduced dimensional subproblems on which assimilation is performed independently.

A covariance-based localization scheme, which applies a Schur product to the covariance matrix, is problematic when working in this reduced dimension. For example, the reduced-dimension implementation uses a factor from the Cholesky decomposition of the covariance matrix rather than the covariance matrix itself. Covariance localization would use a Schur product to adjust this covariance matrix, but doing so could produce a matrix with negative eigenvalues. Therefore, a covariance localization scheme is not easily implemented in the LETKF code. For the experiments comparing the MT localization scheme to covariance localization schemes such as Gaspari-Cohn, we use a traditional ETKF code.

## 5 Numerical Results

The following presents the application of these methods to the one- and two-dimensional inviscid Burgers equations. We generate synthetic observations by sampling from a truth run, obtained by solving the equation on an adaptive moving mesh. The ensemble members are initialized as perturbations of the initial conditions. The efficacy of the DA scheme is measured by the root mean squared error (RMSE), calculated as

(48)

where is the analysis mean. A DA procedure is generally considered stable if its asymptotic behavior is on the order of the square root of the norm of the observation error. The RMSE in the experiments that follow is averaged over 10 runs.
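The RMSE computation in (48) amounts to the following, assuming the analysis mean and the truth have been interpolated to a common set of points:

```python
import numpy as np

def rmse(analysis_mean, truth):
    # Root mean squared error between the analysis mean and the truth,
    # both evaluated on (or interpolated to) the same mesh.
    diff = np.asarray(analysis_mean, dtype=float) - np.asarray(truth, dtype=float)
    return np.sqrt(np.mean(diff ** 2))
```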

### 5.1 Common Experimental Set-Up

The next two sections explore the use of the LETKF on the one- and two-dimensional inviscid Burgers equation. We perform identical twin experiments where the truth is generated using no model noise and observations are formed from the truth by applying the observation operator and adding noise using the observation error model with covariance matrix . Among the parameters to be tuned are the number of mesh points and the number of ensemble members. Generally speaking, more mesh points correspond to more accurate numerical solutions, lowering the RMSE. However, moving mesh methods generally require fewer mesh points than a fixed, uniform mesh. For the 1D Burgers experiments, we use mesh points, and for the 2D Burgers experiments, we use a mesh. Increasing the number of mesh points beyond the values chosen had little impact on the RMSE. The rule of thumb for ensemble-based DA schemes is that the number of ensemble members should be large enough to span the unstable subspace. For both the 1D and 2D experiments, we found ensemble members to be sufficient. That is, for larger , we found that there was no substantial improvement in RMSE.

In the 1D Burgers experiments we observe the truth at locations with an observational timestep of , and then add artificial observation error. To avoid the observations all occurring in one region of the spatial domain, we space them linearly throughout the domain and then perturb the locations by a small amount. These observation locations are randomly chosen, but once chosen at the beginning of each trial, they remain fixed. The truth (and the observations) are taken from a fine mesh (100 mesh points) to ensure that the truth is fully resolved. For the 2D Burgers experiments, we use an observational timestep of with observation locations that are also uniformly spaced in a mesh and then perturbed. Unless otherwise stated, the observational error covariance , model error covariance , and prior distribution are all set to .

For both the 1D and the 2D cases, the localization radius and inflation factors are tuned simultaneously. For MT localization, this involves determining the parameter shown in equation (25), which directly controls the maximum radius of localization. In the 1D case, we choose not to artificially limit the minimum value of the MT localization radius by choosing larger than the maximum of in (25); numerical results suggest that is sufficient. In the 2D case, we keep and note that while this does affect the minimum localization radius, it still results in a stable DA scheme. For the GC localization schemes, the tuning involves the parameter given in equations (42), (44), and (45). After each of the localization schemes has been tuned, the time-series RMSE of the different localization schemes is compared directly. Finally, we compare results when choosing , , or for the common mesh. When considering long-run RMSE results, we consider the RMSE only after the DA scheme has stabilized, so as to evaluate the asymptotic behavior of the DA procedure. Based on numerical results, we present the RMSE for 1D Burgers on the time interval and for 2D Burgers on the time interval . These experiments and the parameters used are summarized in Table 1.

| Experiment | Description | Model | Inflation Factor | Loc. Scheme | Mesh Choice |
| --- | --- | --- | --- | --- | --- |
| 1 | Localization and Inflation Tuning | (49) | varies | 50 | varies |
