The classical (finite-dimensional) Mahalanobis distance and its applications
Let $X$ be a random variable taking values in $\mathbb{R}^d$ with non-singular covariance matrix $\Sigma$. In many practical situations it is required to measure the distance between two points $x, y \in \mathbb{R}^d$ when considered as two possible observations drawn from $X$. Clearly, the usual (squared) Euclidean distance
$$\|x-y\|^2 = (x-y)'(x-y)$$
is not a suitable choice since it disregards the standard deviations and the covariances of the components of $X$ (given a column vector $x \in \mathbb{R}^d$ we denote by $x'$ the transpose of $x$). Instead, the most popular alternative is perhaps the classical Mahalanobis distance, $M(x,y)$, defined as
$$M(x,y) = \left((x-y)'\Sigma^{-1}(x-y)\right)^{1/2}. \qquad (1)$$
Very often the interest is focused on studying “how extreme” a point $x$ is within the distribution of $X$; this is typically evaluated in terms of $M(x,m)$, where $m$ stands for the vector of means of $X$.
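As a small numerical illustration of the contrast between the Euclidean and the Mahalanobis distances, the following sketch uses a hypothetical diagonal covariance matrix in which the second component has standard deviation 10. The function name `mahalanobis` and the numbers are assumptions chosen only for the example.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Classical Mahalanobis distance between x and y for covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

# Hypothetical covariance: the second component has standard deviation 10.
cov = np.array([[1.0, 0.0],
                [0.0, 100.0]])
m = np.zeros(2)
a = np.array([3.0, 0.0])   # 3 standard deviations away along the first axis
b = np.array([0.0, 3.0])   # 0.3 standard deviations away along the second axis

euclid_a = float(np.linalg.norm(a - m))
euclid_b = float(np.linalg.norm(b - m))
maha_a = mahalanobis(a, m, cov)
maha_b = mahalanobis(b, m, cov)
```

Both points are at the same Euclidean distance from the mean, while the Mahalanobis distance ranks the first one as far more extreme.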
This distance is named after the Indian statistician P. C. Mahalanobis (1893-1972) who first proposed and analyzed this concept (Mahalanobis, 1936) in the setting of Gaussian distributions. Nowadays, some popular applications of the Mahalanobis distance are: supervised classification, outlier detection (Rousseeuw and van Zomeren (1990) and Penny (1996)), multivariate depth measures (Zuo and Serfling (2000)), hypothesis testing (through Hotelling’s $T^2$ statistic, Rencher (2012, Ch. 5)) or goodness of fit (Mardia (1975)). This list of references is far from exhaustive.
On the difficulties of defining a Mahalanobis-type distance for functional data
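Our framework here is Functional Data Analysis (FDA); see, e.g., Cuevas (2014) for an overview. In other words, we deal with statistical problems involving functional data. Thus our sample is made of trajectories $X_1(t),\dots,X_n(t)$ in $L^2[0,1]$ drawn from a second order stochastic process $X(t)$, $t\in[0,1]$, with $m(t)=E(X(t))$. The inner product and the norm in $L^2[0,1]$ will be denoted by $\langle\cdot,\cdot\rangle_2$ and $\|\cdot\|_2$, respectively (or simply $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ when there is no risk of confusion). We will henceforth assume that the covariance function $K(s,t)=\mathrm{Cov}(X(s),X(t))$ is continuous and positive definite. The function $K$ defines a linear operator $\mathcal{K}: L^2[0,1]\to L^2[0,1]$, called covariance operator, given by
$$\mathcal{K}f(t)=\int_0^1 K(t,s)f(s)\,ds. \qquad (2)$$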
The aim of this paper is to extend the notion of the multivariate (finite-dimensional) Mahalanobis distance (1) to the functional case when $X\in L^2[0,1]$. Clearly, in view of (1), the inverse $\mathcal{K}^{-1}$ of the functional operator $\mathcal{K}$ should play some role in this extension if we want to keep a close analogy with the multivariate case. Unfortunately, such a direct approach fails since, typically, $\mathcal{K}$ is not invertible as an operator, in the sense that there is no continuous linear operator $\mathcal{K}^{-1}$ such that $\mathcal{K}^{-1}\mathcal{K}=\mathcal{K}\mathcal{K}^{-1}=I$, the identity operator.
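To see the reason for this crucial difference between the finite and the infinite-dimensional cases, let us recall that some elementary linear algebra yields the following representations for $\Sigma x$ and $\Sigma^{-1}x$,
$$\Sigma x=\sum_{i=1}^d\lambda_i(e_i'x)e_i \quad\text{and}\quad \Sigma^{-1}x=\sum_{i=1}^d\frac{1}{\lambda_i}(e_i'x)e_i, \qquad (3)$$
where $\lambda_1,\dots,\lambda_d$ are the strictly positive eigenvalues of $\Sigma$ and $\{e_1,\dots,e_d\}$ is the corresponding orthonormal basis of eigenvectors.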
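In the functional case, the classical Karhunen-Loève Theorem (see, e.g., Ash and Gardner (2014)) provides $X(t)=m(t)+\sum_{j=1}^\infty Z_je_j(t)$ (in $L^2$ uniformly on $t$) where $\{e_j\}$ is the basis of orthonormal eigenfunctions of $\mathcal{K}$ and the $Z_j=\langle X-m,e_j\rangle_2$ are uncorrelated random variables with $\mathrm{Var}(Z_j)=\lambda_j$, the eigenvalue of $\mathcal{K}$ corresponding to $e_j$. Then, we have
$$\mathcal{K}x=\sum_{j=1}^\infty\lambda_j\langle x,e_j\rangle_2\,e_j.$$
Note that the continuity of $K$ implies $\int_0^1\int_0^1K(s,t)^2\,ds\,dt<\infty$, thus $\mathcal{K}$ is in fact a compact, Hilbert-Schmidt operator. In addition, it is easy to check $\sum_{j}\lambda_j=\int_0^1K(t,t)\,dt<\infty$ so that, in particular, the sequence $\{\lambda_j\}$ is summable and hence converges to zero quickly. As a consequence, there is no hope of keeping a direct analogy with (3) since
$$\mathcal{K}^{-1}x=\sum_{j=1}^\infty\frac{1}{\lambda_j}\langle x,e_j\rangle_2\,e_j \qquad (4)$$
will not define in general a continuous operator with a finite norm. Still, for some particular functions $x$ the series in (4) might be convergent. Hence we could use it formally to define the following template which, suitably modified, could lead to a general, valid definition for a Mahalanobis-type distance between two functions $x$ and $y$,
$$M(x,y)=\left(\sum_{j=1}^\infty\frac{\langle x-y,e_j\rangle_2^2}{\lambda_j}\right)^{1/2}, \qquad (5)$$
for all $x,y\in L^2[0,1]$ such that the series in (5) is finite. We are especially concerned with the case where $x=X(t)$ is a trajectory from a stochastic process and $y=m(t)$ is the corresponding mean function. As we will see below, this entails some special difficulties.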
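The difficulty with trajectories can be seen numerically. The sketch below is only an illustration under stated assumptions: it takes the Brownian covariance $K(s,t)=\min(s,t)$ as an example (not a choice made in the text), discretizes the covariance operator on a midpoint grid, and computes partial sums of the series $\sum_j\langle X,e_j\rangle_2^2/\lambda_j$ for simulated centred Gaussian trajectories. The grid size, seed and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                  # grid size (assumed)
h = 1.0 / n
t = (np.arange(n) + 0.5) * h             # midpoints of [0, 1]
K = np.minimum.outer(t, t)               # Brownian covariance K(s, t) = min(s, t)

# Eigen-decomposition of the discretized covariance operator (matrix h * K).
lam, V = np.linalg.eigh(h * K)
order = np.argsort(lam)[::-1]            # sort eigenvalues in decreasing order
lam, V = lam[order], V[:, order]

# Simulate centred Gaussian trajectories with covariance K on the grid.
n_traj = 2000
chol = np.linalg.cholesky(K)
X = chol @ rng.standard_normal((n, n_traj))     # each column is a trajectory

# Scores <X, e_j>_2 with e_j = V[:, j] / sqrt(h), approximated by a Riemann sum.
scores = np.sqrt(h) * (V.T @ X)                 # shape (n, n_traj)
terms = scores**2 / lam[:, None]                # terms of the formal series
partial_sums = np.cumsum(terms, axis=0)         # partial sums over j
mean_partial = partial_sums.mean(axis=1)        # average over trajectories
```

In this discretized model each term has expectation one, so the average partial sum up to index $k$ grows roughly like $k$ rather than stabilizing, in line with the divergence discussed above.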
The organization of this work
In the next section some theory of RKHS and its connection with the Mahalanobis distance is introduced, together with the proposed definition. In Section 3 some properties of the proposed distance are presented and compared with those of the original multivariate definition. Then, a consistent estimator is analyzed in Section 4. Finally, some numerical outputs corresponding to different statistical applications can be found in Section 5.
2 A new definition of Mahalanobis distance for functional data
Motivated by the previous considerations, Galeano et al. (2015) and Ghiglietti et al. (2017) have suggested two functional Mahalanobis-type distances, which we will comment on at the end of this section. These proposals are natural extensions to the functional case of the multivariate notion (1). Moreover, as suggested by the practical examples considered in both works, these options performed quite well in many cases. However, we believe that there is still some room to further explore the subject for the reasons we will explain below.
In this section we will propose a further definition of a Mahalanobis-type distance, denoted $M_\alpha$. Its most relevant features can be summarized as follows:
$M_\alpha$ depends on a single, real, easy to interpret smoothing parameter $\alpha$ whose choice is not critical, in the sense that the distance has some stability with respect to $\alpha$. Hence, it is possible to think of a cross-validation or bootstrap-based choice of $\alpha$. In particular, no auxiliary weight function is involved in the definition.
$M_\alpha$ is a true metric which is defined for any given pair $x, y$ of functions in $L^2[0,1]$. It shares some invariance properties with the finite-dimensional counterpart (1).
If $m(t)=E(X(t))$, the distribution of $M_\alpha(X,m)$ is explicitly known for Gaussian processes. In particular, $E(M_\alpha^2(X,m))$ and $\mathrm{Var}(M_\alpha^2(X,m))$ have explicit, relatively simple expressions.
The main contribution of this paper is to show that the theory of Reproducing Kernel Hilbert Spaces (RKHS) provides a natural and useful framework in order to propose an extension of the Mahalanobis distance to the functional setting, satisfying the above mentioned properties. So we next give, for the sake of completeness, a very short overview of the RKHS theory, just focused on the features we will use here. We refer to Berlinet and Thomas-Agnan (2004), Appendix F in Janson (1997) and Schölkopf and Smola (2002), for a more detailed treatment of the subject.
2.1 RKHS’s and the Mahalanobis distance
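The starting element in the construction of an RKHS space of real functions in $[0,1]$ is a positive semidefinite function $K(s,t)$, $s,t\in[0,1]$. For our purposes, $K$ will be the continuous positive definite covariance function of the process that generates our functional data.
Let us first consider the following auxiliary space $\mathcal{H}_0(K)$ of functions generated by $K$,
$$\mathcal{H}_0(K)=\Big\{f:\ f(s)=\sum_{i=1}^na_iK(s,t_i),\ a_i\in\mathbb{R},\ t_i\in[0,1],\ n\in\mathbb{N}\Big\}. \qquad (6)$$
This is a pre-Hilbert space when endowed with the inner product
$$\langle f,g\rangle_K=\sum_{i,j}a_ib_jK(t_i,s_j), \qquad (7)$$
where $f(s)=\sum_ia_iK(s,t_i)$ and $g(s)=\sum_jb_jK(s,s_j)$. Note that, as $K$ is assumed to be strictly positive definite, the elements of $\mathcal{H}_0(K)$ have a unique representation in terms of $K$.
Now, the RKHS $\mathcal{H}(K)$ associated with $K$ is just defined as the completion of $\mathcal{H}_0(K)$. More precisely, $\mathcal{H}(K)$ is the set of functions $f:[0,1]\to\mathbb{R}$ that are the $t$-pointwise limit of some Cauchy sequence in $\mathcal{H}_0(K)$ (see Berlinet and Thomas-Agnan (2004), p. 18). The corresponding inner product in $\mathcal{H}(K)$ is also denoted $\langle\cdot,\cdot\rangle_K$.
The term “reproducing” in the name of these spaces comes from the following “reproducing property”,
$$f(t)=\langle f,K(\cdot,t)\rangle_K, \quad\text{for all } f\in\mathcal{H}(K),\ t\in[0,1].$$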
To see the connection with the Mahalanobis distance, let us consider a random vector $X=(X_1,\dots,X_d)$, instead of the whole stochastic process $X(t)$, $t\in[0,1]$. The covariance function would be then replaced with the covariance matrix $\Sigma$ whose $(i,j)$-entry is $\mathrm{Cov}(X_i,X_j)$. From the Moore-Aronszajn Theorem we know that there exists a unique RKHS, $\mathcal{H}(\Sigma)$, in $\mathbb{R}^d$ whose reproducing kernel is $\Sigma$; see Hsing and Eubank (2015a), pp. 47–49, or Berlinet and Thomas-Agnan (2004), p. 19.
From the definition (6) of $\mathcal{H}_0(\Sigma)$ it is clear that, in this case, this space is just the image of the linear application defined by $\Sigma$, that is, it consists of the vectors $x$ that can be written as $x=\Sigma a$ for some $a\in\mathbb{R}^d$. Moreover, according to (7), the inner product between two elements $x=\Sigma a$ and $y=\Sigma b$ of this space is given by $\langle x,y\rangle_\Sigma=a'\Sigma b$. On the other hand, since $\mathcal{H}_0(\Sigma)$ is here a finite-dimensional space, it agrees with its completion $\mathcal{H}(\Sigma)$.
If we assume that $\Sigma$ has full rank (if not, the generalized inverse should be used), this product can be rewritten as
$$\langle x,y\rangle_\Sigma=a'\Sigma b=a'\Sigma\Sigma^{-1}\Sigma b=x'\Sigma^{-1}y.$$
Then, the squared distance between two vectors associated with this inner product can be expressed as
$$\|x-y\|_\Sigma^2=(x-y)'\Sigma^{-1}(x-y)=\sum_{i=1}^d\frac{\big((x-y)'e_i\big)^2}{\lambda_i}, \qquad (8)$$
where in the last equality we have used the second equation in (3).
We might summarize the above elementary discussion in the following statements:
(a) The RKHS distance in the RKHS associated with a finite-dimensional covariance operator, given by a positive definite matrix $\Sigma$, can be expressed as a simple sum involving the inverse eigenvalues of $\Sigma$, as shown in (8).
(b) Such RKHS distance coincides with the corresponding Mahalanobis distance between $x$ and $y$.
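Statements (a) and (b) can be checked numerically. The sketch below uses a randomly generated positive definite matrix as a hypothetical covariance (all names and values are assumptions for illustration) and compares the RKHS inner product and squared distance computed from the coefficients, from the inverse matrix, and from the eigen-decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)          # hypothetical positive definite covariance

a = rng.standard_normal(d)
b = rng.standard_normal(d)
x, y = Sigma @ a, Sigma @ b              # elements of the RKHS: images of Sigma

# Inner product a' Sigma b, rewritten as x' Sigma^{-1} y.
inner_coeff = a @ Sigma @ b
inner_inverse = x @ np.linalg.solve(Sigma, y)

# Squared RKHS distance, three equivalent ways.
diff = x - y
dist2_coeff = (a - b) @ Sigma @ (a - b)
dist2_maha = diff @ np.linalg.solve(Sigma, diff)
lam, E = np.linalg.eigh(Sigma)           # columns of E: orthonormal eigenvectors
dist2_spectral = np.sum((E.T @ diff) ** 2 / lam)
```

The three values of the squared distance agree up to rounding error, which is the finite-dimensional identity between the RKHS distance and the Mahalanobis distance.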
At this point it is interesting to note that the above statement (a) can be extended to the infinite-dimensional case, as pointed out in the following lemma.
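Lemma 1. Let $\lambda_1\geq\lambda_2\geq\cdots$ be the positive eigenvalues of the integral operator $\mathcal{K}$ associated with the kernel $K$. Let us denote by $e_j$ the corresponding unit eigenfunctions. For $x\in L^2[0,1]$, define
$$\|x\|_K^2=\sum_{j=1}^\infty\frac{\langle x,e_j\rangle_2^2}{\lambda_j}, \qquad (9)$$
and then the RKHS can be also rewritten as
$$\mathcal{H}(K)=\{x\in L^2[0,1]:\ \|x\|_K^2<\infty\}. \qquad (10)$$
In particular, the functions $\{\sqrt{\lambda_j}\,e_j\}$ are an orthonormal basis for $\mathcal{H}(K)$.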
This result is just a rewording of the following theorem, whose proof can be found in Amini and Wainwright (2012):
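Theorem.- Under the indicated conditions, the RKHS associated with $K$ can be written
$$\mathcal{H}(K)=\Big\{f\in L^2[0,1]:\ f=\sum_{i=1}^\infty a_ie_i,\ \sum_{i=1}^\infty\frac{a_i^2}{\lambda_i}<\infty\Big\},$$
where the convergence of the series is in $L^2[0,1]$. This space is endowed with the inner product $\langle f,g\rangle_K=\sum_ia_ib_i/\lambda_i$, where $f=\sum_ia_ie_i$ and $g=\sum_ib_ie_i$.
The result follows by noting that for any $x\in L^2[0,1]$ we can write
$$x=\sum_{j=1}^\infty\langle x,e_j\rangle_2\,e_j.$$
Then, if the coefficients $\langle x,e_j\rangle_2$ tend to zero fast enough so that $\sum_j\langle x,e_j\rangle_2^2/\lambda_j<\infty$, we have $x\in\mathcal{H}(K)$ and we get the expression (9) for $\|x\|_K^2$. ∎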
This result sheds some light on the following crucial question: to what extent can the formal expression (5) be used to give a general definition of the functional Mahalanobis distance? In other words, for which functions $x,y\in L^2[0,1]$ does the series in (5) converge? The answer is clear in view of Lemma 1: expression (5) is well defined if and only if $x-y\in\mathcal{H}(K)$. This amounts to asking for a strong, very specific regularity condition on $x-y$.
The bad news is that, as a consequence of a well-known result (see, e.g., Lukić and Beder (2001), Cor. 7.1), if $X(t)$, $t\in[0,1]$, is a Gaussian process with mean and covariance functions $m$ and $K$, respectively, such that $m\in\mathcal{H}(K)$ and $\mathcal{H}(K)$ is infinite-dimensional, then
$$P(X\in\mathcal{H}(K))=0,$$
whenever the probability measure $P$ is assumed to be complete.
Hence, with probability one, expression (5) is not convergent for the trajectories drawn from the stochastic process $X$.
2.2 The proposed definition
In view of the discussion above (see statement (b) before Lemma 1), it might seem natural to define the (squared) Mahalanobis functional distance between a trajectory $x$ of the process $X$ and a function $m$ by $\|x-m\|_K^2$. However, this idea does not work since, as indicated above, with probability one the trajectories of $X$ do not belong to $\mathcal{H}(K)$.
This observation suggests the simple strategy we will follow here: given two functions $x,y\in L^2[0,1]$, just approximate them by two other functions $x_\alpha,y_\alpha\in\mathcal{H}(K)$ and calculate the distance $\|x_\alpha-y_\alpha\|_K$. It only remains to decide how to obtain the RKHS approximations $x_\alpha$ and $y_\alpha$. One could think of taking $x_\alpha$ as the “closest” function to $x$ in $\mathcal{H}(K)$ but this approach also fails since $\mathcal{H}(K)$ is dense in $L^2[0,1]$ whenever all $\lambda_j$ are strictly greater than zero (Remark 4.9 of Cucker and Zhou (2007)). Thus, every function $x\in L^2[0,1]$ can be arbitrarily well approximated by functions in $\mathcal{H}(K)$, and no closest element exists unless $x$ already belongs to $\mathcal{H}(K)$.
This leads us in a natural way to the following penalization approach. Let us fix a penalization parameter $\alpha>0$. Given any $x\in L^2[0,1]$, define
$$x_\alpha=\arg\min_{f\in\mathcal{H}(K)}\ \|x-f\|_2^2+\alpha\|f\|_K^2. \qquad (11)$$
As we will see below, the “penalized projection” $x_\alpha$ is well-defined. In fact it admits a relatively simple closed form. Finally, the definition we propose for the functional $\alpha$-Mahalanobis distance is
$$M_\alpha(x,y)=\|x_\alpha-y_\alpha\|_K. \qquad (12)$$
As mentioned, given a realization $x$ of the stochastic process $X$ we have relatively simple expressions for both the smoothed trajectory $x_\alpha$ and the proposed distance. In the next result we summarize these expressions.
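Proposition 1. Given a second order process $X$ with covariance $K$, we denote as $\mathcal{K}$ the integral covariance operator of Equation (2) associated with $K$. Then the smoothed trajectories $x_\alpha$ defined in (11) satisfy the following basic properties:
(a) Let $I$ be the identity operator on $L^2[0,1]$. Then, $\mathcal{K}+\alpha I$ is invertible and
$$x_\alpha=(\mathcal{K}+\alpha I)^{-1}\mathcal{K}x=\sum_{j=1}^\infty\frac{\lambda_j}{\lambda_j+\alpha}\langle x,e_j\rangle_2\,e_j, \qquad (13)$$
where $\lambda_j$, $j=1,2,\dots$, are the eigenvalues of $\mathcal{K}$ (which are strictly positive under our assumptions) and $e_j$ stands for the unit eigenfunction of $\mathcal{K}$ corresponding to $\lambda_j$.
(b) Denoting as $\mathcal{K}^{1/2}$ the square root operator defined by $(\mathcal{K}^{1/2})^2=\mathcal{K}$, the norm of $x_\alpha$ in $\mathcal{H}(K)$ satisfies
$$\|x_\alpha\|_K^2=\|\mathcal{K}^{1/2}(\mathcal{K}+\alpha I)^{-1}x\|_2^2=\sum_{j=1}^\infty\frac{\lambda_j}{(\lambda_j+\alpha)^2}\langle x,e_j\rangle_2^2. \qquad (14)$$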
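(a) The fact that $\mathcal{K}+\alpha I$ is invertible is a consequence of Theorem 8.1 in (Gohberg and Goldberg, 2013, p. 183). The expression $x_\alpha=(\mathcal{K}+\alpha I)^{-1}\mathcal{K}x$ follows straightforwardly from Proposition 8.6 of (Cucker and Zhou, 2007, p. 139). Moreover, expression (8.4) in (Gohberg and Goldberg, 2013, p. 184) yields, for $y\in L^2[0,1]$,
$$(\mathcal{K}+\alpha I)^{-1}y=\sum_{j=1}^\infty\frac{1}{\lambda_j+\alpha}\langle y,e_j\rangle_2\,e_j.$$
Then, using the Spectral theorem for compact and self-adjoint operators (for instance Theorem 2 of Chapter 2 of Cucker and Smale (2001)) we get:
$$x_\alpha=(\mathcal{K}+\alpha I)^{-1}\mathcal{K}x=\sum_{j=1}^\infty\frac{\lambda_j}{\lambda_j+\alpha}\langle x,e_j\rangle_2\,e_j.$$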
The expression $M_\alpha$ given in (12) defines a metric in $L^2[0,1]$.
The expression obtained in the first part of Proposition 1 has an interesting intuitive meaning: the transformation $x\mapsto x_\alpha$ takes first the function $x$ to the space $\mathcal{H}(K)$, made of much nicer functions, with Fourier coefficients converging quickly to zero, since we must have $\sum_j\langle\mathcal{K}x,e_j\rangle_2^2/\lambda_j<\infty$; see (10). Then, after this “smoothing step”, we perform an “approximation step” by applying the inverse operator $(\mathcal{K}+\alpha I)^{-1}$, in order to get, as a final output, a function $x_\alpha$ that is both close to $x$ and smoother than $x$. Note also that the operator $(\mathcal{K}+\alpha I)^{-1}\mathcal{K}$ is compact. Thus, if we assume that the original trajectories are uniformly bounded in $L^2[0,1]$, the final result of applying the transformation $x\mapsto x_\alpha$ to these trajectories would be to take them to a pre-compact set of $L^2[0,1]$. This is very convenient from different points of view (beyond our specific needs here), in particular when one needs to find a convergent subsequence inside a given bounded sequence of $x_\alpha$’s.
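A minimal discretized sketch of these two steps may help fix ideas. It assumes curves observed on an equispaced midpoint grid of $[0,1]$ with spacing $h$, so that the matrix $hK$ stands in for the covariance operator, and it uses the closed forms $x_\alpha=(\mathcal{K}+\alpha I)^{-1}\mathcal{K}x$ and $\|x_\alpha\|_K^2=\sum_j\lambda_j(\lambda_j+\alpha)^{-2}\langle x,e_j\rangle_2^2$. The function names, the Brownian covariance and the test curves are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def smooth(x, K, alpha, h):
    """Penalized projection x_alpha = (K_op + alpha I)^{-1} K_op x on the grid."""
    K_op = h * K                                   # discretized covariance operator
    return np.linalg.solve(K_op + alpha * np.eye(len(x)), K_op @ x)

def alpha_mahalanobis(x, y, K, alpha, h):
    """alpha-Mahalanobis distance via the spectral expression of the RKHS norm."""
    lam, V = np.linalg.eigh(h * K)                 # eigenvalues, Euclidean-unit eigenvectors
    coef = np.sqrt(h) * (V.T @ (x - y))            # approximates <x - y, e_j>_2
    return float(np.sqrt(np.sum(lam / (lam + alpha) ** 2 * coef**2)))

# Illustration on an assumed example: Brownian covariance on a midpoint grid.
n, alpha = 100, 0.1
h = 1.0 / n
t = (np.arange(n) + 0.5) * h
K = np.minimum.outer(t, t)
x = np.sin(2 * np.pi * t)
y = t.copy()

# Cross-check: RKHS norm of x_alpha - y_alpha computed from the smoothed curves.
lam, V = np.linalg.eigh(h * K)
d_coef = np.sqrt(h) * (V.T @ (smooth(x, K, alpha, h) - smooth(y, K, alpha, h)))
dist_from_smoothed = float(np.sqrt(np.sum(d_coef**2 / lam)))
dist_closed_form = alpha_mahalanobis(x, y, K, alpha, h)
```

The two routes (smoothing first and then taking the RKHS norm, or using the closed-form weights directly) give the same value up to rounding, which is a useful sanity check for any implementation.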
2.3 Some previous proposals
Motivated by the heuristic spectral version (5) of the Mahalanobis distance, Galeano et al. (2015) have proposed the following definition, which avoids the convergence problems of the series in (5) (provided that $\lambda_k>0$) at the expense of introducing a sort of smoothing parameter $k\in\mathbb{N}$,
$$d_{FM}^k(x,y)=\left(\sum_{j=1}^k\frac{\langle x-y,e_j\rangle_2^2}{\lambda_j}\right)^{1/2}. \qquad (17)$$
We keep the notation used in Galeano et al. (2015). Let us note that $d_{FM}^k$ is a semi-distance, since it lacks the identifiability condition $d_{FM}^k(x,y)=0\Rightarrow x=y$. The applications of $d_{FM}^k$ considered by these authors focus mainly on supervised classification. While this proposal is quite simple and natural, it has some shortcomings from the theoretical point of view. The most important one is the fact that the series whose partial sums appear in (17) is divergent, with probability one, whenever $x=X(t)$ is a trajectory of a Gaussian process with mean function $y=m(t)$ and covariance function $K$ (as we have just seen). So, $d_{FM}^k$ is defined in terms of the $k$-th partial sum of a divergent series. As a consequence, one may expect that the definition might be strongly influenced by the choice of $k$. As we will discuss below, in practice this effect is not noticed if $x$ is replaced with a smoothed trajectory but, in that case, the smoothing procedure should be incorporated into the definition.
In the definition (18) proposed by Ghiglietti et al. (2017), $p>0$ and $g(c;p)$ is a weight function such that $g(0;p)=1$, $g$ is non-increasing and non-negative and $\int_0^\infty g(c;p)\,dc=p$. Moreover, for any $c>0$, $g(c;p)$ is assumed to be non-decreasing in $p$ with $\lim_{p\to\infty}g(c;p)=1$. This definition does not suffer from any problem derived from degeneracy but, still, it depends on two smoothing functions: the exponential in the denominator of (18) and the weighting function $g$. As pointed out also in Ghiglietti et al. (2017), a more convenient expression for (18) is given by the following weighted version of the template, formal definition (5),
3 Some properties of the functional Mahalanobis distance
In this section we analyze in detail and prove some of the features of $M_\alpha$ we have anticipated above. In what follows $X=X(t)$, with $t\in[0,1]$, will stand for a second-order stochastic process with continuous trajectories and continuous mean and covariance functions, denoted by $m$ and $K$, respectively.
In the finite dimensional case, one appealing property of the Mahalanobis distance is the fact that it does not change if we apply a non-singular linear transformation to the data. Then, the invariance for a large class of linear operators appears also as a desirable property for any extension of the Mahalanobis distance to the functional case. Here, we will prove invariance with respect to operators preserving the norm. We recall that an operator $L$ is an isometry if it maps $L^2[0,1]$ to $L^2[0,1]$ and $\|Lx\|_2=\|x\|_2$. In this case, it holds $L^*L=I$, where $L^*$ stands for the adjoint of $L$.
Let $L$ be an isometry on $L^2[0,1]$. Then, $M_\alpha(Lx,Ly)=M_\alpha(x,y)$ for all $x,y\in L^2[0,1]$, where $M_\alpha$ was defined in (12) and the distance on the left-hand side is computed with respect to the covariance operator of the transformed process $LX$.
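Let $\mathcal{K}_L$ be the covariance operator of the process $LX$. The first step of the proof is to show that $\mathcal{K}_L=L\mathcal{K}L^*$. It is enough to prove that for all $f,g\in L^2[0,1]$, it holds $\langle\mathcal{K}_Lf,g\rangle_2=\langle L\mathcal{K}L^*f,g\rangle_2$. Observe that
$$\langle\mathcal{K}_Lf,g\rangle_2=\int_0^1\int_0^1E\big[(LX(s)-Lm(s))(LX(t)-Lm(t))\big]f(s)g(t)\,ds\,dt.$$
Then, using Fubini’s theorem and the definition of the adjoint operator:
$$\langle\mathcal{K}_Lf,g\rangle_2=E\big[\langle L(X-m),f\rangle_2\langle L(X-m),g\rangle_2\big]=E\big[\langle X-m,L^*f\rangle_2\langle X-m,L^*g\rangle_2\big].$$
Analogously, we also have
$$\langle L\mathcal{K}L^*f,g\rangle_2=\langle\mathcal{K}L^*f,L^*g\rangle_2=E\big[\langle X-m,L^*f\rangle_2\langle X-m,L^*g\rangle_2\big].$$
From the last two equations we conclude $\mathcal{K}_L=L\mathcal{K}L^*$.
The second step of the proof is to observe that the eigenvalues $\lambda_j$ of $\mathcal{K}_L$ are the same as those of $\mathcal{K}$, and the unit eigenfunction of $\mathcal{K}_L$ for the eigenvalue $\lambda_j$ is given by $Le_j$, where $e_j$ is the unit eigenfunction of $\mathcal{K}$ corresponding to $\lambda_j$. Indeed, using $L^*L=I$ we have
$$\mathcal{K}_LLe_j=L\mathcal{K}L^*Le_j=L\mathcal{K}e_j=\lambda_jLe_j.$$
Then, by (14) and using that $L$ is an isometry,
$$M_\alpha^2(Lx,Ly)=\sum_{j=1}^\infty\frac{\lambda_j}{(\lambda_j+\alpha)^2}\langle Lx-Ly,Le_j\rangle_2^2=\sum_{j=1}^\infty\frac{\lambda_j}{(\lambda_j+\alpha)^2}\langle x-y,e_j\rangle_2^2=M_\alpha^2(x,y).$$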
The family of isometries on $L^2[0,1]$ contains some interesting examples. For instance, all the symmetries and translations are isometries, as well as the changes between orthonormal bases. Thus, this distance does not depend on the basis on which the data are represented.
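The invariance can be illustrated on a grid, where a random orthogonal matrix plays the role of the isometry. In this sketch (an illustration under assumptions, not the authors' code) the squared distance is evaluated through the operator form $\langle u,\mathcal{K}u\rangle_2$ with $u=(\mathcal{K}+\alpha I)^{-1}(x-y)$, which is equivalent to the spectral expression of the squared norm; the covariance, curves, grid size and seed are hypothetical.

```python
import numpy as np

def alpha_mahalanobis(x, y, K, alpha, h):
    """Discretized alpha-Mahalanobis distance, using M^2 = <u, K_op u>_2 with
    u = (K_op + alpha I)^{-1} (x - y), which avoids an eigen-decomposition."""
    K_op = h * K
    u = np.linalg.solve(K_op + alpha * np.eye(len(x)), x - y)
    return float(np.sqrt(h * (u @ K_op @ u)))

rng = np.random.default_rng(2)
n, alpha = 80, 0.05
h = 1.0 / n
t = (np.arange(n) + 0.5) * h
K = np.minimum.outer(t, t)               # assumed example: Brownian covariance
x, y = np.cos(3 * t), t**2

# A random orthogonal matrix plays the role of the isometry on the grid.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
K_L = Q @ K @ Q.T                        # covariance of the transformed process

d_original = alpha_mahalanobis(x, y, K, alpha, h)
d_transformed = alpha_mahalanobis(Q @ x, Q @ y, K_L, alpha, h)
```

Note that the covariance must be transformed together with the curves; keeping the original covariance while rotating only the data would not, in general, preserve the distance.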
3.2 Distribution for Gaussian processes
We have mentioned in the introduction that the squared Mahalanobis distance to the mean for Gaussian data has a $\chi^2$ distribution with $d$ degrees of freedom, where $d$ is the dimension of the data. In the functional case, the distribution of $M_\alpha^2(X,m)$ for a Gaussian process equals that of an infinite linear combination of independent $\chi^2_1$ random variables. We prove this fact in the following result and its corollary, and also give explicit expressions for the expectation and the variance of $M_\alpha^2(X,m)$.
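Proposition 2. Let $\{X(t):\ t\in[0,1]\}$ be an $L^2[0,1]$-valued Gaussian process with mean $m$ and continuous positive definite covariance function $K$. Let $\lambda_1\geq\lambda_2\geq\cdots$ be the eigenvalues of $\mathcal{K}$ and let $e_1,e_2,\dots$ be the corresponding unit eigenfunctions.
(a) The squared Mahalanobis distance to the origin satisfies
$$M_\alpha^2(X,0)=\sum_{j=1}^\infty\beta_jY_j, \qquad (20)$$
where $\beta_j=\lambda_j^2(\lambda_j+\alpha)^{-2}$ and $Y_j$, $j=1,2,\dots$, are independent non-central $\chi^2_1$ random variables with non-centrality parameter $\gamma_j=\mu_j^2/\lambda_j$, with $\mu_j=\langle m,e_j\rangle_2$.
(b) We have $E(M_\alpha^2(X,0))=\sum_{j}\beta_j(1+\gamma_j)$ and $\mathrm{Var}(M_\alpha^2(X,0))=2\sum_{j}\beta_j^2(1+2\gamma_j)$.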
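(a) Using (14), $M_\alpha^2(X,0)=\sum_{j}\beta_jY_j$, where $\beta_j=\lambda_j^2(\lambda_j+\alpha)^{-2}$ and $Y_j=\lambda_j^{-1}\langle X,e_j\rangle_2^2$. Since the process is Gaussian the variables $\lambda_j^{-1/2}\langle X,e_j\rangle_2$, $j=1,2,\dots$, are independent with normal distribution, mean $\mu_j\lambda_j^{-1/2}$ and variance 1 (see Ash and Gardner (2014), p. 40). The result follows.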
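(b) It is easy to see that the partial sums $S_N=\sum_{j=1}^N\beta_jY_j$ in (20) form a sub-martingale with respect to the natural filtration $\mathcal{F}_N=\sigma(Y_1,\dots,Y_N)$,
$$E(S_{N+1}\mid\mathcal{F}_N)=S_N+\beta_{N+1}E(Y_{N+1})\geq S_N.$$
Moreover, if $\lambda_1=\sup_j\lambda_j$, which is always finite,
$$\sup_NE(S_N)=\sum_{j=1}^\infty\beta_j(1+\gamma_j)\leq\frac{1}{\alpha^2}\Big(\sum_{j=1}^\infty\lambda_j^2+\lambda_1\sum_{j=1}^\infty\mu_j^2\Big)<\infty,$$
because $\sum_j\mu_j^2\leq\|m\|_2^2<\infty$ and $\sum_j\lambda_j<\infty$ (see e.g. Cucker and Smale (2001), Corollary 3, p. 34). Now, Doob’s convergence theorem implies $S_N\to M_\alpha^2(X,0)$ a.s. as $N\to\infty$, and the Monotone Convergence theorem yields the expression for the expectation of $M_\alpha^2(X,0)$.
The proof for the variance is fairly similar. Using Jensen’s inequality, we deduce
$$E\big((S_{N+1}-E(S_{N+1}))^2\mid\mathcal{F}_N\big)\geq\big(E(S_{N+1}-E(S_{N+1})\mid\mathcal{F}_N)\big)^2=(S_N-E(S_N))^2.$$
Moreover, since the variables $Y_j$ are independent:
$$\sup_NE\big((S_N-E(S_N))^2\big)=\sup_N\sum_{j=1}^N\beta_j^2\,\mathrm{Var}(Y_j)=2\sum_{j=1}^\infty\beta_j^2(1+2\gamma_j)<\infty.$$
Then, $(S_N-E(S_N))^2\to\big(M_\alpha^2(X,0)-E(M_\alpha^2(X,0))\big)^2$ a.s., as $N\to\infty$, and using the Monotone Convergence theorem,
$$\mathrm{Var}(M_\alpha^2(X,0))=\lim_{N\to\infty}\mathrm{Var}(S_N)=2\sum_{j=1}^\infty\beta_j^2(1+2\gamma_j).$$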
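When we compute the squared Mahalanobis distance to the mean the expressions above simplify because $\mu_j=0$ for each $j$, and then we have the following corollary.
Under the same assumptions of Proposition 2, $M_\alpha^2(X,m)=\sum_{j=1}^\infty\beta_jY_j$, where $\beta_j=\lambda_j^2(\lambda_j+\alpha)^{-2}$ and $Y_1,Y_2,\dots$ are independent $\chi^2_1$ random variables. Moreover, $E(M_\alpha^2(X,m))=\sum_{j=1}^\infty\beta_j$ and $\mathrm{Var}(M_\alpha^2(X,m))=2\sum_{j=1}^\infty\beta_j^2$.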
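A quick Monte Carlo sketch can be used to check these moment formulas in a discretized setting. It assumes a centred Brownian motion (covariance $\min(s,t)$) as the Gaussian process, weights $\beta_j=\lambda_j^2/(\lambda_j+\alpha)^2$ computed from the eigenvalues of the discretized operator, and compares $\sum_j\beta_j$ and $2\sum_j\beta_j^2$ with the empirical mean and variance of the simulated squared distances. The grid size, $\alpha$, number of simulations and seed are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, n_sim = 100, 0.1, 20000
h = 1.0 / n
t = (np.arange(n) + 0.5) * h
K = np.minimum.outer(t, t)                       # assumed example: Brownian covariance

lam, V = np.linalg.eigh(h * K)
beta = lam**2 / (lam + alpha) ** 2               # weights beta_j
mean_theory = beta.sum()                         # sum_j beta_j
var_theory = 2.0 * np.sum(beta**2)               # 2 * sum_j beta_j^2

# Simulate centred Gaussian trajectories (so m = 0) and evaluate M_alpha^2(X, m).
X = np.linalg.cholesky(K) @ rng.standard_normal((n, n_sim))
scores = np.sqrt(h) * (V.T @ X)                  # approximates <X - m, e_j>_2
m2 = np.sum((lam / (lam + alpha) ** 2)[:, None] * scores**2, axis=0)

mean_mc, var_mc = m2.mean(), m2.var(ddof=1)
```

With enough simulated trajectories, `mean_mc` and `var_mc` should be close to `mean_theory` and `var_theory`; the agreement is only up to Monte Carlo error.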
3.3 Stability with respect to $\alpha$
Our definition of distance depends on a regularization parameter $\alpha>0$. In this subsection we prove the continuity of $M_\alpha$ with respect to the tuning parameter $\alpha$. The proof of the main result requires the following auxiliary lemma, which has been adapted from Corollary 8.3 in Gohberg and Goldberg (2013), p. 71. Recall that given a bounded operator $A$ on a Hilbert space $H$ we can define the norm
$$\|A\|=\sup_{\|x\|=1}\|Ax\|.$$
Let $A_n$, $n\geq1$, be a sequence of bounded invertible operators on a Hilbert space $H$ which converges in norm to another operator $A$, and such that $\sup_n\|A_n^{-1}\|<\infty$. Then $A$ is also invertible, and $\|A_n^{-1}-A^{-1}\|\to0$, as $n\to\infty$.
We will apply the preceding lemma in the proof of the following result.
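Proposition 3. Let $\alpha_n$ be a sequence of positive real numbers such that $\alpha_n\to\alpha>0$, as $n\to\infty$. Then, $M_{\alpha_n}(X,m)\to M_\alpha(X,m)$ a.s. as $n\to\infty$.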
Observe that Proposition 3 implies the point convergence of the sequence of distribution functions of