Low-rank multi-parametric covariance identification

04/25/2020 ∙ by Antoni Musolas, et al. ∙ MIT ∙ University of Oxford ∙ Université catholique de Louvain

We propose a differential geometric construction for families of low-rank covariance matrices, via interpolation on low-rank matrix manifolds. In contrast with standard parametric covariance classes, these families offer significant flexibility for problem-specific tailoring via the choice of "anchor" matrices for the interpolation. Moreover, their low rank facilitates computational tractability in high dimensions and with limited data. We employ these covariance families for both interpolation and identification, where the latter problem comprises selecting the most representative member of the covariance family given a data set. In this setting, standard procedures such as maximum likelihood estimation are nontrivial because the covariance family is rank-deficient; we resolve this issue by casting the identification problem as distance minimization. We demonstrate the power of these differential geometric families for interpolation and identification in a practical application: wind field covariance approximation for unmanned aerial vehicle navigation.

1 Introduction

One of the most fundamental problems in multivariate analysis and uncertainty quantification is the construction of covariance matrices. Covariance matrices are an essential tool in climatology, econometrics, model order reduction, biostatistics, signal processing, and geostatistics, among other applications. As a specific example (which we shall revisit in this paper), covariance matrices of wind velocity fields [33, 34, 40, 8] capture the relationships among wind velocity components at different points in space. These relationships enable recursive approximation or updating of the wind field as new pointwise measurements become available. Similarly, in oil reservoirs [47, 48], covariance matrices allow information from borehole measurements to be propagated into more accurate global estimates of the permeability field. In telecommunications [39], covariance matrices and their eigenvectors are paramount for discerning between signal and noise.

Widely used methods in spatial statistics [14, 52, 57] include variogram estimation [13, 11] (often a first step in kriging [56, 17, 24]) or tapering of sample covariance matrices [23, 19]. Other regularized covariance estimation methodologies include sparse covariance and precision matrix estimation [6, 18, 7] and many varieties of shrinkage [35, 36, 55, 37]. Many of these methods make rather generic assumptions on the covariance matrix (e.g., sparsity of the precision, or some structure in the spectrum); others (e.g., Matérn covariance kernels or particular variogram models) assume that the covariance is described within some parametric family. The latter can make the estimation problem quite tractable, even with relatively limited data, since the number of degrees of freedom in the family may be small.

Unfortunately, standard parametric covariance families (e.g., Matérn) [12, 51, 53] can be insufficiently flexible for practical applications. For instance, in wind field modeling, these covariance families are not expressive enough to capture the non-stationary and multiscale features of the velocity or vorticity [30, 31, 32]. Nonparametric approaches such as sparse precision estimation may be less restrictive, but neither approach easily allows prior knowledge—such as known wind covariances at other conditions—to be incorporated. Moreover, most of these methods yield full-rank covariance matrices, which are impractical for high-dimensional problems. For example, direct manipulation of full-rank covariances in high-dimensional settings might preclude recursive estimation (i.e., conditioning) from being performed online.

In this paper, we propose to build parametric covariance functions from piecewise-Bézier interpolation techniques on manifolds, using representative covariance matrices as anchors. (See [46] for a related approach using full-rank geodesics.) These covariance functions offer significant flexibility for problem-specific tailoring and can be reduced to any given rank; both of these features enable broad applicability. We use our proposed covariance functions for identification—finding the most representative member of a family given a data set—and interpolation—given anchors at known or “labeled” conditions, predicting covariance matrices at new conditions between those of the anchors. Observe that these objectives do not require an appeal to asymptotic statistical properties, which are not investigated here.

Interpolation on matrix manifolds has been an active research topic in the last few years; see, e.g., [5, 54, 25, 28, 9]. Here we propose to rely on piecewise-Bézier curves and surfaces on manifolds, which have been investigated, e.g., in [2, 3, 21, 45]. As for the space on which to interpolate, an immediate choice would have been the set of all covariance matrices, namely the set of symmetric positive-semidefinite (PSD) matrices; however, this set is not a manifold. (Specifically, it is not a submanifold of $\mathbb{R}^{n\times n}$.) Instead, we will consider the set of covariance matrices of fixed rank $r$, which is known to be a manifold; see, e.g., [59]. Among others, [50] and [44] investigated the full-rank case ($r = n$) and pioneered building matrix functions using geodesics on the manifold of symmetric positive-definite (SPD) matrices. For the low-rank case ($r < n$), several geometries are available [4, 59, 60]. We will resort to the geometry proposed in [26] and further developed in [41], as it has the crucial advantage of providing a closed-form expression for the endpoint geodesic problem, which appears as a pervasive subproblem in Bézier-type interpolation techniques on manifolds.

We point out that univariate interpolation on the manifold of fixed-rank PSD matrices has already been applied, e.g., to the wind field approximation problem in [22], to protein conformation in [38], to computer vision and facial recognition in [27, 58], and to parametric model order reduction in [42]. The present work is, to our knowledge, a first step towards the multivariate case.

There are three original contributions in this paper. First, we devise new low-rank parameterized covariance families, based on given problem-specific anchor covariance matrices. The rank of the anchor matrices is assumed to be equal to some value $r$, usually much smaller than the size $n$ of the matrices. The resulting covariance families are shown to contain only matrices of rank less than or equal to $r$. In high-dimensional applications, this allows reducing the computational cost of matrix manipulations, while still explaining the majority of the variance of the data. Working with low-rank covariance families also results in robustness to small data. Second, we propose minimization of an appropriate loss function as an alternative to maximum likelihood estimation for selecting the most representative member of this covariance family given some data. When the covariance family is low-rank, maximum likelihood estimation is not trivial, as the probability density of the data (assumed Gaussian) is degenerate. Third, we demonstrate the previous two points in an application: wind field velocity characterization for UAV navigation. We observe that when the anchors are labeled with different values of the prevailing wind conditions, the members of the family lying between them correspond to intermediate wind conditions.

The rest of this paper is organized as follows. We summarize the tools needed to work on the manifold of fixed-rank PSD matrices in Section 2. We introduce our new covariance functions in Section 3, for both the one- and the multi-parameter cases. In Section 4, we present methods to solve the covariance identification problem via distance minimization. In Section 5, we illustrate the behaviors of the different covariance functions on a case study: wind field approximation.

2 The geometry of the set of positive-semidefinite matrices

In this section, we define useful tools to work on the manifold of positive-semidefinite (PSD) matrices of rank $r$ and size $n$, with $r < n$, which we denote $\mathcal{S}_+(r,n)$.

Several metrics have been proposed for this manifold but, to our knowledge, none of them manages to turn it into a complete metric space with a closed-form expression for endpoint geodesics. We use the quotient geometry proposed in [26] and further developed in [41], with the space of full-column-rank factors endowed with the Euclidean metric. This geometry relies on the fact that a matrix $\Sigma \in \mathcal{S}_+(r,n)$ can be factorized as $\Sigma = YY^\top$, where the factor $Y \in \mathbb{R}^{n\times r}$ has full column rank. The decomposition is not unique, as each factor $YQ$, with $Q$ an $r \times r$ orthogonal matrix, leads to the same product. As a consequence, any PSD matrix $\Sigma = YY^\top$ is represented by an equivalence class:

$[Y] = \{\, YQ \;:\; Q \in \mathcal{O}(r) \,\}.$

In our computations, we work with representatives of the equivalence classes. For example, the geodesic between two PSD matrices $\Sigma_A$ and $\Sigma_B$ will be computed based on two arbitrary representatives $Y_A$, $Y_B$ of the corresponding equivalence classes. The geodesic will of course be invariant to the choice of the representatives. Moreover, this approach saves computational cost, as the representatives are of size $n \times r$ instead of $n \times n$.

In [41], the authors propose expressions for the main tools needed to perform computations on $\mathcal{S}_+(r,n)$ endowed with this geometry. The geodesic between two PSD matrices $\Sigma_A$, $\Sigma_B$, with representatives $Y_A$ and $Y_B$, is given by:

$\gamma_{A\to B}(t) = \big(Y_A + t\,(Y_B Q - Y_A)\big)\big(Y_A + t\,(Y_B Q - Y_A)\big)^\top.$  (1)

In this expression, the orthogonal matrix $Q$ is defined by $Y_B^\top Y_A = QH$, with $QH$ a polar decomposition ($Q$ orthogonal and $H$ symmetric positive-semidefinite). In the generic case where $Y_B^\top Y_A$ is nonsingular, the polar decomposition is unique and the resulting curve is the unique minimizing geodesic between $\Sigma_A$ and $\Sigma_B$. This curve has the following properties:

  1. $\gamma_{A\to B}(0) = \Sigma_A$, and $\gamma_{A\to B}(1) = \Sigma_B$.

  2. For each $t \in [0,1]$, $\gamma_{A\to B}(t) \in \mathcal{S}_+(\le r, n)$,

where the notation $\mathcal{S}_+(\le r, n)$ stands for the set of positive-semidefinite matrices of rank upper-bounded by $r$.

Notice that the last property suggests that this manifold is not a complete metric space for this metric, because the points on the geodesics are not necessarily of rank $r$. However, completeness is not central for statistical modeling, and, as we will see in Section 5, the fact that not all covariance matrices have the same rank will not have any consequence in practice.
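As a concrete illustration, the following minimal sketch (our own notation, not code from the paper or from [41]) evaluates the geodesic (1) with NumPy, assuming $Y_A$ and $Y_B$ are $n \times r$ representatives with full column rank:

```python
import numpy as np

def polar_factor(M):
    """Orthogonal factor Q of the polar decomposition M = Q H (H symmetric PSD)."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def geodesic(Y_A, Y_B, t):
    """gamma_{A->B}(t): align the representative Y_B with Y_A, then interpolate."""
    Q = polar_factor(Y_B.T @ Y_A)       # alignment from the polar decomposition
    Y_t = Y_A + t * (Y_B @ Q - Y_A)     # straight line between aligned factors
    return Y_t @ Y_t.T                  # PSD matrix of rank at most r
```

For intermediate values of t the returned matrix can drop below rank r, which is exactly the incompleteness phenomenon noted above.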

We finally mention that [41] also contains expressions for the exponential and logarithm maps, on which the patchwise Bézier surfaces introduced in Section 3.3.2 below rely. Referring to the PSD matrices discussed above (e.g., $\Sigma_A$, $\Sigma_B$) as "data matrices," we make the following assumption:

Hypothesis 1

The data matrices are such that all the logarithm and exponential maps to which we refer in the sequel are well-defined.

Hypothesis 1 will typically be satisfied when the data matrices are sufficiently close to each other; see [41] for more information.

Instead of working directly on the quotient manifold (which involves the computation of geodesics, exponential and logarithm maps), a simpler approach consists in working on an affine section of the quotient. Consider an equivalence class $[Y_0]$, with $Y_0$ a representative of the class. We define the section of the quotient at $Y_0$ as the set of points:

$\mathcal{S}_{Y_0} = \{\, Y_0 A + Y_{0,\perp} B \;:\; A \in \mathbb{R}^{r\times r} \text{ symmetric positive-semidefinite},\; B \in \mathbb{R}^{(n-r)\times r} \,\},$  (2)

where the matrix $Y_{0,\perp} \in \mathbb{R}^{n\times(n-r)}$ is any orthonormal basis for the orthogonal complement of $\operatorname{span}(Y_0)$, i.e., $Y_0^\top Y_{0,\perp} = 0$ and $Y_{0,\perp}^\top Y_{0,\perp} = I_{n-r}$. The constraint that $A$ be symmetric positive-semidefinite guarantees that there is at most one representative of each equivalence class in the section, and exactly one under the generic condition that $Y_0^\top Y$ is nonsingular.

Consider the section of the quotient at $Y_0$. The representative in the section of any equivalence class $[Y]$ (with $Y_0^\top Y$ nonsingular) is then $YQ$, where $Q$ is the orthogonal factor of the polar decomposition of $Y^\top Y_0$. Once all the points are projected onto the section, we can simply perform Euclidean operations there.
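A sketch of this projection, reusing polar_factor from the previous snippet (the function name and the orientation of the polar factor are our choices):

```python
def project_to_section(Y, Y_0):
    """Representative of [Y] in the section at Y_0; assumes Y_0^T Y nonsingular."""
    Q = polar_factor(Y.T @ Y_0)   # polar decomposition Y^T Y_0 = Q H
    return Y @ Q                  # then (Y Q)^T Y_0 = H, symmetric PSD
```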

3 Construction of low-rank covariance functions

A low-rank covariance function is a mapping from a set of parameters to a low-rank PSD matrix.

Definition 1 (Low-rank covariance function and family)

A $d$-parameter low-rank covariance function is a map $\varphi : \Omega \to \mathcal{S}_+(\le r, n)$, for $\Omega \subseteq \mathbb{R}^d$; its corresponding covariance family is the image of $\varphi$.

3.1 First-order covariance functions

In this section, we consider two possible generalizations of multilinear interpolation to manifolds. The simplest maps all the points to a linear approximation of the manifold (here, a section of the quotient) and applies multilinear interpolation on the section. A second approach resorts to geodesics (the generalization of straight lines) on the manifold. It is interesting to notice that, in the one-parameter case, both reduce to the geodesic in eq. (1) if the reference point of the section belongs to the geodesic (cf. Remark 1).

3.1.1 First-order sectional covariance function

The main idea is to consider a section (eq. (2)), project the data matrices onto that section, and perform multilinear interpolation on it. The definition below presents the sectional covariance function in the case of bilinear interpolation, explaining the steps used to obtain it. A schematic representation can be seen in Figure 1.

Definition 2 (The sectional $d$-parameter covariance function)

The sectional $d$-parameter covariance function is obtained as follows (a code sketch is given after Figure 1):

  1. Select a member $Y_0$ of the reference equivalence class $[Y_0]$.

  2. Intersect the equivalence classes of the data matrices with the section defined by $Y_0$ to obtain the projected representatives $\tilde Y$.

  3. Perform any type of Euclidean multilinear interpolation of the projected data matrices. For instance, in the bilinear case (two parameters and four data matrices): $Y(s,t) = (1-s)(1-t)\,\tilde Y_{11} + (1-s)\,t\,\tilde Y_{12} + s\,(1-t)\,\tilde Y_{21} + s\,t\,\tilde Y_{22}$.

  4. Multiply the factor by its transpose to obtain the full matrix: $\varphi(s,t) = Y(s,t)\,Y(s,t)^\top$.

Figure 1: Schematic representation of the sectional covariance function. The vertical lines correspond to equivalence classes and the horizontal line is the section into which the data matrices are projected.
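The following sketch implements the four steps of Definition 2 in the bilinear case, building on the snippets from Section 2 (names and argument order are our choices):

```python
def sectional_bilinear(Y_0, Y_11, Y_12, Y_21, Y_22, s, t):
    """Sectional two-parameter covariance function at (s, t) in [0, 1]^2."""
    # Step 2: intersect each anchor's equivalence class with the section at Y_0.
    P_11, P_12, P_21, P_22 = (project_to_section(Y, Y_0)
                              for Y in (Y_11, Y_12, Y_21, Y_22))
    # Step 3: Euclidean bilinear interpolation of the projected factors.
    Y_st = ((1 - s) * (1 - t) * P_11 + (1 - s) * t * P_12
            + s * (1 - t) * P_21 + s * t * P_22)
    # Step 4: multiply the factor by its transpose.
    return Y_st @ Y_st.T
```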

3.1.2 First-order geodesic covariance function

By connecting geodesics of the form in eq. (1), we can obtain the geodesic $d$-parameter covariance function, which is described below and represented in Figure 2.

Definition 3 (The geodesic $d$-parameter covariance function)

By connecting two one-parameter geodesics, we obtain the geodesic two-parameter covariance function:

$\varphi(s,t) = \gamma_{\,\gamma_{11\to 12}(t)\;\to\;\gamma_{21\to 22}(t)}(s),$  (3)

where $\gamma_{A\to B}$ denotes the geodesic (1) and the subscripts index the four data matrices. Recursively, we can construct the geodesic $d$-parameter covariance function.

Figure 2: Configuration of the points in a Euclidean setting. Example for two parameters.

In order to build a geodesic $d$-parameter covariance function according to the definition, we need $2^d$ data matrices.
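A sketch of the two-parameter construction, reusing polar_factor from the Section 2 snippet; the nesting order (first along t on each edge, then along s) is our reading of Definition 3:

```python
def geodesic_factor(Y_A, Y_B, t):
    """Aligned factor of the geodesic (1): Y_A + t (Y_B Q - Y_A)."""
    return Y_A + t * (Y_B @ polar_factor(Y_B.T @ Y_A) - Y_A)

def geodesic_bilinear(Y_11, Y_12, Y_21, Y_22, s, t):
    """Geodesic two-parameter covariance function at (s, t) in [0, 1]^2."""
    Y_1 = geodesic_factor(Y_11, Y_12, t)   # geodesic along the first edge
    Y_2 = geodesic_factor(Y_21, Y_22, t)   # geodesic along the second edge
    Y_st = geodesic_factor(Y_1, Y_2, s)    # geodesic connecting the two points
    return Y_st @ Y_st.T
```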

Remark 1

For the one-parameter case, if the reference point of the section is on the geodesic, the sectional and geodesic families coincide. Indeed, using the relationships from Section 2, the geodesic (1) (the geodesic one-parameter function) can be converted into a sectional one-parameter covariance function:

$\gamma_{A\to B}(t) = \big((1-t)\,Y_A + t\,\tilde Y_B\big)\big((1-t)\,Y_A + t\,\tilde Y_B\big)^\top,$

where $\tilde Y_B = Y_B Q$ is the projection of $Y_B$ onto the section defined by $Y_A$.

3.2 Piecewise first-order covariance functions

In practical applications, it is desirable to build covariance functions by patches. Each patch only depends on a reduced number of data matrices, usually those that are "closest" in some sense. Let us illustrate using patches of two-parameter covariance functions. We assume that the data matrices used to define the family (equivalently called the "anchor" covariance matrices) are described by a grid of indices: the collection consists of matrices $\Sigma_{ij}$, with $1 \le i \le q_1$ and $1 \le j \le q_2$. The two covariance families proposed in the previous section can be computed on each patch of the grid, to build patchwise multilinear surfaces. The resulting surfaces will be denoted $\varphi_{LS}$ and $\varphi_{LG}$, where the L stands for Linear (those surfaces are obtained as generalizations of linear interpolation on manifolds), the S for Section, and the G for Geodesic.

Definition 4

Given a grid of points $\Sigma_{ij}$, with $1 \le i \le q_1$ and $1 \le j \le q_2$, the first-order patchwise surfaces $\varphi_{LS}$ and $\varphi_{LG}$ are respectively the unions of the patch surfaces $\varphi_{LS}^{ij}$ and $\varphi_{LG}^{ij}$, each defined as follows. Let $i$ and $j$ be the indices of a patch. The function $\varphi_{LS}^{ij}$, delimited by the data matrices $\Sigma_{i,j}$, $\Sigma_{i,j+1}$, $\Sigma_{i+1,j}$, $\Sigma_{i+1,j+1}$, is the sectional two-parameter covariance function defined in Definition 2, with bilinear interpolation of the representatives of the points $\Sigma_{i,j}$, $\Sigma_{i,j+1}$, $\Sigma_{i+1,j}$, $\Sigma_{i+1,j+1}$ in the section chosen for that patch. Examples of choices of sections are given in Section 5.

The function $\varphi_{LG}^{ij}$ is the geodesic two-parameter covariance function, defined in Definition 3, that interpolates the points $\Sigma_{i,j}$, $\Sigma_{i,j+1}$, $\Sigma_{i+1,j}$, $\Sigma_{i+1,j+1}$. (A sketch of patchwise evaluation is given below.)
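A sketch of patchwise evaluation on a uniform grid of anchors (uniform spacing and the per-patch callback signature are our simplifying assumptions):

```python
def eval_patchwise(anchors, s, t, patch_fn):
    """Evaluate a patchwise surface at global coordinates (s, t) in [0, 1]^2.

    anchors:  q1 x q2 nested list of n x r anchor factors
    patch_fn: per-patch function taking four corner factors and local (s, t),
              e.g. geodesic_bilinear, or sectional_bilinear with Y_0 bound.
    """
    q1, q2 = len(anchors), len(anchors[0])
    i = min(int(s * (q1 - 1)), q1 - 2)            # index of the patch along s
    j = min(int(t * (q2 - 1)), q2 - 2)            # index of the patch along t
    ls, lt = s * (q1 - 1) - i, t * (q2 - 1) - j   # local patch coordinates
    return patch_fn(anchors[i][j], anchors[i][j + 1],
                    anchors[i + 1][j], anchors[i + 1][j + 1], ls, lt)
```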

Proposition 1

If, for each patch, the data matrices located at the corners of the patch satisfy Hypothesis 1, it holds, except possibly for a zero measure set, that $\operatorname{rank}\varphi_{LS}(s,t) = r$ and $\operatorname{rank}\varphi_{LG}(s,t) = r$.

Proof:

For each patch, let $Y_{i'j'}$, with $i' \in \{i, i+1\}$ and $j' \in \{j, j+1\}$, be a representative of the equivalence class associated with the corner $\Sigma_{i'j'}$. Consider the function $(s,t) \mapsto \det\big(Y(s,t)^\top Y(s,t)\big)$, where $Y(s,t)$ is the interpolated factor. This function is zero if and only if the surface has rank smaller than $r$. Moreover, under Hypothesis 1, this function is real analytic. Indeed, in the case of $\varphi_{LS}$, $Y(s,t)$ is a polynomial in $s$ and $t$, which is real analytic. In the case of $\varphi_{LG}$, it can be readily checked that:

$Y(s,t) = (1-s)\,Y_1(t) + s\,Y_2(t)\,Q(t),$ with $Y_1(t) = (1-t)\,Y_{i,j} + t\,Y_{i,j+1}Q_1$ and $Y_2(t) = (1-t)\,Y_{i+1,j} + t\,Y_{i+1,j+1}Q_2,$

where $Q_1$ (resp. $Q_2$) is the orthogonal factor of the polar decomposition of $Y_{i,j+1}^\top Y_{i,j}$ (resp. $Y_{i+1,j+1}^\top Y_{i+1,j}$), and $Q(t)$ is the orthogonal factor of the polar decomposition of the matrix $Y_2(t)^\top Y_1(t)$; see [41] for more information. If Hypothesis 1 holds, this matrix is nonsingular for all $t$ [41]. The polar decomposition of a one-variable matrix function $t \mapsto M(t)$, with $M(t)$ nonsingular, is real analytic [43, Theorem 3]. We then conclude the proof using the fact that the set of zeros of a nontrivial real analytic function has measure zero; see, e.g., [29, Remark 5.23].

3.3 Higher-order covariance functions using Bézier curves

The previous section presented two families defined as generalizations of multilinear interpolation on the manifold. In this section, we consider higher-order interpolation on the manifold. We focus on patchwise Bézier surfaces on the grid. Again, we will distinguish two cases: methods resorting to Euclidean algorithms in a section of the manifold and methods based on successive evaluations of geodesics.

3.3.1 Higher-order sectional covariance function

This method consists in first projecting all the data matrices on a section of the manifold (cf. Equation 2), and then using a classical Euclidean Bézier interpolation algorithm in the section.

We focus on patchwise cubic Bézier surfaces. Such surfaces are defined by a set of control points $b_{ij}$, with $i, j \in \{0,\dots,3\}$. The cubic Bézier surface is defined as:

$\beta(s,t) = \sum_{i=0}^{3} \sum_{j=0}^{3} b_{ij}\, B_i(s)\, B_j(t),$

where $B_i(t) = \binom{3}{i}\, t^i (1-t)^{3-i}$, with $i \in \{0,\dots,3\}$, is the Bernstein polynomial of order 3.
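For reference, a minimal Euclidean implementation of this surface (names are ours; the control factors b[i][j] are assumed to be n × r NumPy arrays):

```python
from math import comb

def bernstein(i, n, t):
    """Bernstein polynomial B_i^n(t) = C(n, i) t^i (1 - t)^(n - i)."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def bezier_surface_factor(b, s, t):
    """Cubic Bezier surface from a 4 x 4 grid of control factors b[i][j]."""
    return sum(bernstein(i, 3, s) * bernstein(j, 3, t) * b[i][j]
               for i in range(4) for j in range(4))
```

The sectional surface of Definition 5 below then amounts to evaluating such a factor from control points built in the section and multiplying it by its own transpose.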

We define the patchwise cubic Bézier surface on the section, denoted $\varphi_{BS}$, as follows.

Definition 5

Assume that we have a set of data matrices $\Sigma_{ij}$, with $1 \le i \le q_1$ and $1 \le j \le q_2$. Choose a section of the quotient manifold and map all the data matrices to this section, obtaining factors $\tilde Y_{ij}$, with $1 \le i \le q_1$ and $1 \le j \le q_2$. Let $\tilde\beta(s,t)$ be the patchwise Bézier surface interpolating these factors, computed as in [2]. The surface $\varphi_{BS}$ is obtained as follows:

$\varphi_{BS}(s,t) = \tilde\beta(s,t)\,\tilde\beta(s,t)^\top.$  (4)
Proposition 2

Apart from possibly a zero measure set, it holds that $\operatorname{rank}\varphi_{BS}(s,t) = r$.

Proof:

For all $i$ and $j$, let $\tilde Y_{ij}$ be a representative in the section of $[Y_{ij}]$. The proof is an immediate extension of Proposition 1, using the fact that $\tilde\beta(s,t)$ is a polynomial function of $s$ and $t$.

3.3.2 Higher-order covariance function based on the exp and log

In this case, we resort to the Bézier surface interpolation algorithm on manifolds proposed in [2]. This algorithm relies on the expressions for the logarithm and exponential maps provided in [41] for the manifold $\mathcal{S}_+(r,n)$. We define the surface $\varphi_{BM}$ (patchwise Bézier-like on the manifold) as follows.

Definition 6

Let $\Sigma_{ij}$, with $1 \le i \le q_1$ and $1 \le j \le q_2$, be a data set. The surface $\varphi_{BM}$ is the surface obtained by applying the Bézier surface interpolation algorithm on manifolds proposed in [2], with type-II reconstruction (cf. [2, Definition 4]), and with the control points chosen as suggested in [1].

Proposition 3

Under Hypothesis 1, it holds that $\operatorname{rank}\varphi_{BM}(s,t) = r$, apart from possibly a zero measure set.

Proof:

For all $i$ and $j$, let $Y_{ij}$ be a representative in $\mathbb{R}^{n\times r}$ of $\Sigma_{ij}$. The proof is an immediate extension of Proposition 1, using the fact that the factor of $\varphi_{BM}(s,t)$ can be written only in terms of polynomial expressions in the variables $s$ and $t$ and orthogonal factors of some real analytic matrix functions. (This last fact is a direct consequence of the reconstruction method used: interpolation is first performed along one variable, and then along the other.)

3.4 Interpolation of labelled matrices using covariance functions

If the data matrices are labelled by certain known parameters $\theta_i$, i.e., $\Sigma_i = \Sigma(\theta_i)$, it is also possible to use the covariance functions of Section 3 to perform interpolation. (In Section 5, for example, $\theta$ will correspond to the magnitude and heading of the prevailing wind.)

Problem 1 (Interpolation with low-rank covariance functions)

Given matrices $\Sigma(\theta_i)$ for $i = 1,\dots,N$ and an associated low-rank covariance function, evaluate this function for any $\theta$.

Of course, performing interpolation requires mapping from the label $\theta$ to an element of the input domain of the chosen low-rank covariance function. For example, when using the two-parameter covariance functions detailed above, constructed from $q_1 \times q_2$ data matrices, we need to map from a subset of $\mathbb{R}^2$ to the parameter domain of the surface. We will usually use affine mappings for this purpose; more details are given in Section 5, where we evaluate the interpolation capabilities of one-parameter (Section 5.2) and multi-parameter (Section 5.4.1) covariance functions.
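For instance, a minimal affine map from a wind label (heading theta, magnitude m) to the unit square; the bounds are hypothetical placeholders:

```python
def label_to_params(theta, m, theta_min, theta_max, m_min, m_max):
    """Affine map from the label (theta, m) to coordinates (s, t) in [0, 1]^2."""
    s = (theta - theta_min) / (theta_max - theta_min)
    t = (m - m_min) / (m_max - m_min)
    return s, t
```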

4 Covariance identification using distance minimization

Having proposed multi-parameter low-rank covariance families in the previous section, we can now describe identification procedures within such families. That is, given a centered data set from which we construct a sample covariance matrix $S$, we would like to find the most representative member of a family.

A widely used methodology for selecting a representative member of any parametric covariance family (like those proposed in Section 3) is maximum likelihood estimation. That is, the data are assumed to have a certain probability distribution, e.g., a Gaussian whose covariance is a member of the family, and we choose the member that maximizes the resulting likelihood function. Since the members of our families are low-rank, however, the Gaussian distribution of the data is degenerate. Thus the associated probability density function is not well-defined for generic data, and maximizing the likelihood becomes non-trivial. (If the matrices were instead SPD, then in the limit of infinitely many samples this problem is equivalent to minimizing the Kullback–Leibler divergence of the family member from the sample distribution, known as reverse information projection [20, 10, 15].)

Rather than maximizing the likelihood, we propose to minimize a particular distance from the covariance function to the sample covariance matrix $S$.

Problem 2 (Minimum Frobenius distance covariance identification)

$\min_{(s,t)}\; d_F\big(\varphi(s,t),\, S\big),$

where $d_F(A,B) = \|A - B\|_F$ is the Frobenius distance.

This is a particular instance of “minimum distance estimation”; such estimators, in general, have a long history in statistics [61].

We now discuss solutions to Problem 2 given particular constructions of the covariance function $\varphi$. Since the geodesic and sectional approaches to defining covariance functions coincide in the one-parameter case under some conditions (cf. Remark 1), we divide this section into two parts: the one-parameter case ($d = 1$) and the multi-parameter case. For the latter, we focus on the case of two parameters ($d = 2$).

4.1 One-parameter first-order covariance function

For $d = 1$, the covariance function is simply the geodesic between the two data matrices, and the optimization problem has a closed-form solution, presented below.

Proposition 4 (Solution of the low-rank covariance identification problem)

The solutions of Problem 2 for $d = 1$ are among the real roots of the third-order polynomial

$2\operatorname{tr}(A_2^2)\,t^3 + 3\operatorname{tr}(A_1A_2)\,t^2 + \big(\operatorname{tr}(A_1^2) + 2\operatorname{tr}(A_0A_2) - 2\operatorname{tr}(A_2S)\big)\,t + \operatorname{tr}(A_0A_1) - \operatorname{tr}(A_1S),$

with $A_0 = Y_AY_A^\top$, $A_1 = Y_A\Delta^\top + \Delta Y_A^\top$, $A_2 = \Delta\Delta^\top$, and $\Delta = Y_BQ - Y_A$ the geodesic direction.

Proof:

The cost function is:

$f(t) = \big\|\gamma_{A\to B}(t) - S\big\|_F^2 = \operatorname{tr}\big((A_0 + tA_1 + t^2A_2 - S)^2\big),$

a fourth-order polynomial in $t$. The third-order polynomial is obtained after setting the derivative to zero and noting that the optimization problem is unconstrained.

Proposition 4 provides at least one solution. If there are three real roots, the minimizer is of course the one with the smallest objective value. As with any third-order polynomial, the solution is unique when the discriminant of the cubic is negative.

The computational cost of finding the solution of the low-rank covariance identification problem is small. Indeed, the roots of the cubic equation have a closed-form expression whose evaluation requires negligible cost. The only meaningful computational cost is that of computing the traces that yield the polynomial coefficients; by virtue of the cyclic property of the trace, these can be computed with elementary operations on the low-rank factors.
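A sketch of the resulting procedure, with the cubic coefficients written out as in the cost expansion above; for clarity the snippet forms n × n products, although, as just noted, the traces can be evaluated on the factors:

```python
import numpy as np

def identify_one_parameter(Y_A, Y_B, S):
    """Minimize t -> ||gamma_{A->B}(t) - S||_F^2 along the geodesic."""
    D = Y_B @ polar_factor(Y_B.T @ Y_A) - Y_A      # geodesic direction
    A0, A2 = Y_A @ Y_A.T, D @ D.T
    A1 = Y_A @ D.T + D @ Y_A.T                     # gamma(t) = A0 + t A1 + t^2 A2
    tr = np.trace
    c3 = 2 * tr(A2 @ A2)                           # coefficients of the cubic
    c2 = 3 * tr(A1 @ A2)
    c1 = tr(A1 @ A1) + 2 * tr(A0 @ A2) - 2 * tr(A2 @ S)
    c0 = tr(A0 @ A1) - tr(A1 @ S)
    roots = np.roots([c3, c2, c1, c0])
    real = roots[np.abs(roots.imag) < 1e-10].real  # a cubic has >= 1 real root
    cost = lambda t: np.linalg.norm(A0 + t * A1 + t**2 * A2 - S, "fro")
    return min(real, key=cost)                     # root with smallest objective
```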

4.2 Two-parameter first-order covariance functions

Here we focus on Problem 2 in the two-parameter case ($d = 2$) for first-order covariance functions, i.e., $\varphi_{LS}$ or $\varphi_{LG}$ (cf. Definition 4). Similarly to the previous sections, we assume that the data matrices are defined on a grid of points $\Sigma_{ij}$, with $1 \le i \le q_1$ and $1 \le j \le q_2$.

4.2.1 First-order sectional covariance function

In the case of $\varphi_{LS}$ (see Definition 4), we propose to use gradient descent on each patch of the surface. Observe indeed that the surface is generally nondifferentiable (and in fact even discontinuous) on the borders of the patches. The global optimum is then computed as the minimum of the optima obtained on the patches. Let

$g(s,t) = \big\|\varphi_{LS}(s,t) - S\big\|_F^2$  (5)

be the cost function to minimize.

Consider an arbitrary patch $(i,j)$, with $1 \le i \le q_1 - 1$ and $1 \le j \le q_2 - 1$. Let $g^{ij}$ be the restriction of $g$ to the patch $(i,j)$, let $\Sigma_{i,j}$, $\Sigma_{i,j+1}$, $\Sigma_{i+1,j}$, $\Sigma_{i+1,j+1}$ denote the four corners of the patch, and let $\tilde Y_{i,j}$, $\tilde Y_{i,j+1}$, $\tilde Y_{i+1,j}$, $\tilde Y_{i+1,j+1}$ be their projections on the section. We omit the superscript $ij$ in the remainder of this section. The gradient of the restriction of the cost function (5) to that patch can be computed explicitly:

$\partial g/\partial s = 4\operatorname{tr}\big(Y_s(s,t)^\top\,(Y(s,t)Y(s,t)^\top - S)\,Y(s,t)\big),$  (6)
$\partial g/\partial t = 4\operatorname{tr}\big(Y_t(s,t)^\top\,(Y(s,t)Y(s,t)^\top - S)\,Y(s,t)\big),$  (7)

with $Y_s$ and $Y_t$ defined as the partial derivatives of the interpolated factor $Y(s,t)$, which was defined in Definition 2:

$Y(s,t) = (1-s)(1-t)\,\tilde Y_{i,j} + (1-s)\,t\,\tilde Y_{i,j+1} + s\,(1-t)\,\tilde Y_{i+1,j} + s\,t\,\tilde Y_{i+1,j+1}.$

So $Y_s$ and $Y_t$ can be easily obtained as:

$Y_s(s,t) = (1-t)\,(\tilde Y_{i+1,j} - \tilde Y_{i,j}) + t\,(\tilde Y_{i+1,j+1} - \tilde Y_{i,j+1}),$
$Y_t(s,t) = (1-s)\,(\tilde Y_{i,j+1} - \tilde Y_{i,j}) + s\,(\tilde Y_{i+1,j+1} - \tilde Y_{i+1,j}),$

with the only difference being differentiation with respect to the parameter $s$ vs. $t$.
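A sketch of the patch-by-patch strategy; we use SciPy's bounded minimizer with a few restarts as a stand-in for the per-patch gradient descent, and patch_cost is a hypothetical evaluator of the restricted cost:

```python
import numpy as np
from scipy.optimize import minimize

def identify_patchwise(patch_cost, n_patches_i, n_patches_j,
                       starts=(0.2, 0.5, 0.8)):
    """Minimize the cost on each patch, then keep the best patch optimum."""
    best_val, best_arg = np.inf, None
    for i in range(n_patches_i):
        for j in range(n_patches_j):
            for x0 in starts:                       # a few restarts per patch
                res = minimize(lambda x: patch_cost(i, j, x[0], x[1]),
                               x0=[x0, x0], bounds=[(0.0, 1.0), (0.0, 1.0)])
                if res.fun < best_val:
                    best_val, best_arg = res.fun, (i, j, res.x[0], res.x[1])
    return best_val, best_arg
```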

4.2.2 First-order geodesic covariance function

We focus now on Problem 2 for $d = 2$, when the covariance function is the surface $\varphi_{LG}$ defined in Definition 4. Let

$h(s,t) = \big\|\varphi_{LG}(s,t) - S\big\|_F^2$  (8)

be the cost function. The surface is generally not differentiable on the borders of the patches. As a result, similarly to the previous section, we propose to run an optimization algorithm to find the optimum on each patch, and then to compare the optimal values across patches to obtain the global optimum.

Let $h^{ij}$ be the restriction of $h$ to the patch $(i,j)$. We propose to minimize $h^{ij}$ by expressing it as a one-variable function, replacing $s$ by its optimal value:

$\hat s(t) = \operatorname{arg\,min}_{s}\; h^{ij}(s,t),$  (9)

and then to apply gradient descent to the problem:

$\min_{t}\; h^{ij}\big(\hat s(t),\, t\big).$  (10)

The computation of the partial derivatives required by both steps is deferred to Appendix A.

4.3 Higher-order covariance functions using Bézier curves

We now solve Problem 2 for $d = 2$ when the surface is defined from Bézier interpolating surfaces.

4.3.1 Higher-order sectional covariance function

To solve Problem 2 for $d = 2$ when the surface is a Euclidean Bézier surface built in a given section of the manifold, we propose again to use steepest descent. The cost function

$k(s,t) = \big\|\varphi_{BS}(s,t) - S\big\|_F^2,$  (11)

with $\varphi_{BS}$ defined in Definition 5, is smooth. Moreover, since Bézier curves in Euclidean space are weighted sums of Bernstein polynomials, the gradient can be computed explicitly. The computation of the gradient is similar to Section 4.2.1, except that the factor is now obtained as a linear combination of cubic Bernstein polynomials:

$\tilde\beta(s,t) = \sum_{i=0}^{3}\sum_{j=0}^{3} b_{ij}\, B_i(s)\, B_j(t).$

The derivatives $\partial\tilde\beta/\partial s$ and $\partial\tilde\beta/\partial t$ become:

$\frac{\partial\tilde\beta}{\partial s}(s,t) = \sum_{i=0}^{3}\sum_{j=0}^{3} b_{ij}\, B_i'(s)\, B_j(t), \qquad \frac{\partial\tilde\beta}{\partial t}(s,t) = \sum_{i=0}^{3}\sum_{j=0}^{3} b_{ij}\, B_i(s)\, B_j'(t).$
4.3.2 Higher-order covariance function based on the exponential and logarithm maps

For the surface $\varphi_{BM}$ in Definition 6, it remains unclear whether the gradient of the cost function:

$\ell(s,t) = \big\|\varphi_{BM}(s,t) - S\big\|_F^2$  (12)

has an analytical expression. Variable projection methods also do not seem applicable in this case. Thus we have to estimate the gradient numerically, resorting to finite differences.
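A minimal central finite-difference gradient of the kind used here (the step size h is a tuning choice):

```python
import numpy as np

def fd_gradient(cost, s, t, h=1e-6):
    """Central finite-difference estimate of the gradient of cost at (s, t)."""
    ds = (cost(s + h, t) - cost(s - h, t)) / (2 * h)
    dt = (cost(s, t + h) - cost(s, t - h)) / (2 * h)
    return np.array([ds, dt])
```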

5 Case study: wind field approximation

Given the increasing popularity of unmanned aerial vehicles (UAVs) in transportation, surveillance, agriculture, and beyond, accurate and safe aerial navigation is essential. Achieving these requirements demands expressive models of the UAV’s environment—in particular, the wind field—and the ability to update these models given new observations, e.g., via Kalman filtering [49, 16]. To this end, we wish to construct and estimate the covariance of spatially distributed wind velocity components.

5.1 Model problem and data set

Gaussian random field (GRF) models have previously been used to describe wind velocities (e.g., [62, 33]). A common practice in this setting is to define the covariance matrix of the velocities using the (smooth) squared-exponential kernel, perhaps with some modifications to allow for non-stationarity [34]. We instead assume access to instances of the covariance matrix for different values of the prevailing wind heading $\theta$ and magnitude $m$; from these instances, we will build a covariance family for continuous $(\theta, m)$. The wind field can change dramatically as a function of the prevailing wind, and thus it is useful to consider a covariance family built from a variety of representative prevailing wind settings.

In general, these instances could be estimated from observational data, or they could be constructed using offline (and potentially expensive) computational fluid dynamics simulations. Here we use the latter: we solve the unsteady incompressible Navier–Stokes equations on the two-dimensional domain shown in Figure 4, using direct numerical simulation with a spectral element method. The Reynolds number in our simulations, defined according to the side length of the central obstacle, is around 500 for the prevailing wind magnitudes considered. For each chosen value of $(\theta, m)$, we run the simulation until any transients due to the initial condition have dissipated and then collect instantaneous velocity fields as “snapshots,” shown in Figure 4. The sample covariance of these snapshots provides the data covariance matrix at that $(\theta, m)$.

The right plot of Figure 3 represents a notional idea of our example domain: flow around a rectangular cuboid in three dimensions. We consider only a horizontal “slice” of this domain, e.g., the wind in a plane at a height sufficiently far from the ground and from the top of the obstacle that a two-dimensional approximation is reasonable. The left plot of Figure 3 shows the mean value of the velocity on this plane, at an example value of $(\theta, m)$. The grid size is 39 × 39, and hence the discretized wind field has $n = 2\,(39^2 - 9) = 3024$ degrees of freedom: two velocity components at each grid point, subtracting the 9 points occupied by the obstacle. The grey contours represent the pointwise variance of the $x$-velocity plus that of the $y$-velocity (i.e., the sum of two diagonal entries of the covariance matrix, at each point in space). Naturally, the variance is larger downstream of the obstacle, where vortices are shed.

Our data set for the examples below comprises a set of covariance matrices , with