# On Mahalanobis distance in functional settings

Mahalanobis distance is a classical tool in multivariate analysis. We suggest here an extension of this concept to the case of functional data. More precisely, the proposed definition concerns those statistical problems where the sample data are real functions defined on a compact interval of the real line. The obvious difficulty for such a functional extension is the non-invertibility of the covariance operator in infinite-dimensional cases. Unlike other recent proposals, our definition is suggested and motivated in terms of the Reproducing Kernel Hilbert Space (RKHS) associated with the stochastic process that generates the data. The proposed distance is a true metric; it depends on a unique real smoothing parameter which is fully motivated in RKHS terms. Moreover, it shares some properties of its finite dimensional counterpart: it is invariant under isometries, it can be consistently estimated from the data and its sampling distribution is known under Gaussian models. An empirical study for two statistical applications, outliers detection and binary classification, is included. The obtained results are quite competitive when compared to other recent proposals of the literature.


## 1 Introduction

The classical (finite-dimensional) Mahalanobis distance and its applications

Let $X$ be a random variable taking values in $\mathbb{R}^d$ with non-singular covariance matrix $\Sigma$. In many practical situations it is required to measure the distance between two points $x_1, x_2 \in \mathbb{R}^d$ when considered as two possible observations drawn from $X$. Clearly, the usual (squared) Euclidean distance $\|x_1 - x_2\|^2$ is not a suitable choice since it disregards the standard deviations and the covariances of the components of $X$ (given a column vector $v$ we denote by $v'$ the transpose of $v$). Instead, the most popular alternative is perhaps the classical Mahalanobis distance, $M(x_1, x_2)$, defined as

$$M(x_1,x_2) = \big((x_1-x_2)'\Sigma^{-1}(x_1-x_2)\big)^{1/2}. \tag{1}$$

Very often the interest is focused on studying "how extreme" a point $x$ is within the distribution of $X$; this is typically evaluated in terms of $M(x,m)$, where $m$ stands for the vector of means of $X$.

This distance is named after the Indian statistician P. C. Mahalanobis (1893-1972), who first proposed and analyzed this concept (Mahalanobis, 1936) in the setting of Gaussian distributions. Nowadays, some popular applications of the Mahalanobis distance are: supervised classification, outlier detection (Rousseeuw and van Zomeren (1990) and Penny (1996)), multivariate depth measures (Zuo and Serfling (2000)), hypothesis testing (through Hotelling's $T^2$ statistic, Rencher (2012, Ch. 5)) or goodness of fit (Mardia (1975)). This list of references is far from exhaustive.
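As a quick illustration (not part of the original paper), the distance (1) can be computed directly with NumPy; the covariance matrix and points below are arbitrary examples.

```python
import numpy as np

# Sketch of the classical Mahalanobis distance (1).
# Sigma, x1, x2 are arbitrary illustrative values.
def mahalanobis(x1, x2, Sigma):
    """M(x1, x2) = ((x1 - x2)' Sigma^{-1} (x1 - x2))^{1/2}."""
    d = x1 - x2
    # Solve Sigma z = d instead of forming Sigma^{-1} explicitly.
    return float(np.sqrt(d @ np.linalg.solve(Sigma, d)))

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
print(mahalanobis(x1, x2, Sigma))  # differs from the Euclidean distance
```

With $\Sigma$ equal to the identity, $M$ reduces to the Euclidean distance; for any non-singular matrix $A$, the distance computed from $A\Sigma A'$ at $Ax_1$, $Ax_2$ is unchanged.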

On the difficulties of defining a Mahalanobis-type distance for functional data

Our framework here is Functional Data Analysis (FDA); see, e.g., Cuevas (2014) for an overview. In other words, we deal with statistical problems involving functional data. Thus our sample is made of trajectories in $L^2[0,1]$ drawn from a second-order stochastic process $X = X(t)$, $t \in [0,1]$, with $\mathbb{E}(X(t)^2) < \infty$. The inner product and the norm in $L^2[0,1]$ will be denoted by $\langle \cdot,\cdot \rangle_2$ and $\|\cdot\|_2$, respectively (or simply $\langle \cdot,\cdot \rangle$ and $\|\cdot\|$ when there is no risk of confusion). We will henceforth assume that the covariance function $K(s,t)$ is continuous and positive definite. The function $K$ defines a linear operator $\mathcal{K}: L^2[0,1] \to L^2[0,1]$, called covariance operator, given by

$$\mathcal{K}f(t) = \int_0^1 K(t,s) f(s)\,ds. \tag{2}$$

The aim of this paper is to extend the notion of the multivariate (finite-dimensional) Mahalanobis distance (1) to the functional case, when $x_1, x_2 \in L^2[0,1]$. Clearly, in view of (1), the inverse of the functional operator $\mathcal{K}$ should play some role in this extension if we want to keep a close analogy with the multivariate case. Unfortunately, such a direct approach utterly fails since, typically, $\mathcal{K}$ is not invertible as an operator, in the sense that there is no continuous linear operator $\mathcal{K}^{-1}$ such that $\mathcal{K}^{-1}\mathcal{K} = I$, the identity operator.

To see the reason for this crucial difference between the finite- and the infinite-dimensional cases, let us recall that some elementary linear algebra yields the following representations for $\Sigma$ and $\Sigma^{-1}$,

$$\Sigma x = \sum_{i=1}^d \lambda_i (e_i' x)\, e_i, \qquad \Sigma^{-1} x = \sum_{i=1}^d \frac{1}{\lambda_i} (e_i' x)\, e_i, \tag{3}$$

where $\lambda_1, \ldots, \lambda_d$ are the, strictly positive, eigenvalues of $\Sigma$ and $e_1, \ldots, e_d$ the corresponding orthonormal basis of eigenvectors.

In the functional case, the classical Karhunen-Loève Theorem (see, e.g., Ash and Gardner (2014)) provides the expansion $X(t) = m(t) + \sum_{i=1}^\infty \xi_i e_i(t)$ (in $L^2$, uniformly on $t$), where the $e_i$ form the basis of orthonormal eigenfunctions of $\mathcal{K}$ and the $\xi_i$ are uncorrelated random variables with $\mathrm{Var}(\xi_i) = \lambda_i$, the eigenvalue of $\mathcal{K}$ corresponding to $e_i$. Then, we have

$$\mathcal{K}x = \mathcal{K}\Big(\sum_{i=1}^\infty \langle x, e_i\rangle e_i\Big) = \sum_{i=1}^\infty \lambda_i \langle x, e_i\rangle e_i.$$

Note that the continuity of $K$ implies that $\mathcal{K}$ is in fact a compact, Hilbert-Schmidt operator. In addition, it is easy to check that $\sum_i \lambda_i = \int_0^1 K(t,t)\,dt < \infty$ so that, in particular, the sequence $(\lambda_i)$ converges to zero very quickly. As a consequence, there is no hope of keeping a direct analogy with (3) since

$$\mathcal{K}^{-1}x = \sum_{i=1}^\infty \frac{1}{\lambda_i} \langle x, e_i\rangle e_i \tag{4}$$

will not define in general a continuous operator with a finite norm. Still, for some particular functions $x$ the series in (4) might be convergent. Hence we could use it formally to define the following template which, suitably modified, could lead to a general, valid definition for a Mahalanobis-type distance between two functions $x$ and $m$,

$$\widetilde{M}(x,m) = \Big(\sum_{i=1}^\infty \frac{\langle x-m, e_i\rangle^2}{\lambda_i}\Big)^{1/2}, \tag{5}$$

for all $x$, $m$ such that the series in (5) is finite. We are especially concerned with the case where $x$ is a trajectory from a stochastic process and $m$ is the corresponding mean function. As we will see below, this entails some special difficulties.

The organization of this work

In the next section some theory of RKHS and its connection with the Mahalanobis distance is introduced, together with the proposed definition. In Section 3 some properties of the proposed distance are presented and compared with those of the original multivariate definition. Then, a consistent estimator is analyzed in Section 4. Finally, some numerical outputs corresponding to different statistical applications can be found in Section 5.

## 2 A new definition of Mahalanobis distance for functional data

Motivated by the previous considerations, Galeano et al. (2015) and Ghiglietti et al. (2017) have suggested two functional Mahalanobis-type distances, on which we will comment at the end of this section. These proposals are natural extensions to the functional case of the multivariate notion (1). Moreover, as suggested by the practical examples considered in both works, these options perform quite well in many cases. However, we believe that there is still some room to further explore the subject, for the reasons we will explain below.

In this section we will propose a further definition of a Mahalanobis-type distance, denoted $M_\alpha$. Its most relevant features can be summarized as follows:

• $M_\alpha$ is also inspired by the natural template (5). The serious convergence issues appearing in (5) are solved by smoothing.

• $M_\alpha$ depends on a single, real, easy-to-interpret smoothing parameter $\alpha > 0$ whose choice is not critical, in the sense that the distance has some stability with respect to $\alpha$. Hence, it is possible to think of a cross-validation or bootstrap-based choice of $\alpha$. In particular, no auxiliary weight function is involved in the definition.

• $M_\alpha$ is a true metric which is defined for any given pair of functions in $L^2[0,1]$. It shares some invariance properties with the finite-dimensional counterpart (1).

• If $X$ is a Gaussian process, the distribution of $M_\alpha(X,m)$ is explicitly known. In particular, $\mathbb{E}\big(M_\alpha(X,m)^2\big)$ and $\mathrm{Var}\big(M_\alpha(X,m)^2\big)$ have explicit, relatively simple expressions.

The main contribution of this paper is to show that the theory of Reproducing Kernel Hilbert Spaces (RKHS) provides a natural and useful framework in order to propose an extension of the Mahalanobis distance to the functional setting, satisfying the above mentioned properties. So we next give, for the sake of completeness, a very short overview of the RKHS theory, just focused on the features we will use here. We refer to Berlinet and Thomas-Agnan (2004), Appendix F in Janson (1997) and Schölkopf and Smola (2002), for a more detailed treatment of the subject.

### 2.1 RKHS’s and the Mahalanobis distance

The starting element in the construction of an RKHS of real functions on $[0,1]$ is a positive semidefinite function $K(s,t)$, $s,t \in [0,1]$. For our purposes, $K$ will be the continuous, positive definite covariance function of the process that generates our functional data.

Let us first consider the following auxiliary space of functions generated by $K$,

$$H_0(K) := \Big\{ f:[0,1]\to\mathbb{R} \,:\, f(\cdot) = \sum_{i=1}^n a_i K(t_i, \cdot),\ a_i \in \mathbb{R},\ t_i \in [0,1],\ n \in \mathbb{N} \Big\}. \tag{6}$$

This is a pre-Hilbert space when endowed with the inner product

$$\langle f, g\rangle_K = \sum_{i,j} \alpha_i \beta_j K(t_i, s_j), \tag{7}$$

where $f(\cdot) = \sum_i \alpha_i K(t_i, \cdot)$ and $g(\cdot) = \sum_j \beta_j K(s_j, \cdot)$. Note that, as $K$ is assumed to be strictly positive definite, the elements of $H_0(K)$ have a unique representation in terms of $K$.

Now, the RKHS associated with $K$ is just defined as the completion of $H_0(K)$. More precisely, the RKHS $H(K)$ is the set of functions that are the pointwise limit of some Cauchy sequence in $H_0(K)$ (see Berlinet and Thomas-Agnan (2004), p. 18). The corresponding inner product in $H(K)$ is also denoted $\langle \cdot,\cdot\rangle_K$.

The term "reproducing" in the name of these spaces comes from the following "reproducing property",

$$f(t) = \langle f, K(t,\cdot)\rangle_K, \quad \text{for all } f \in H(K),\ t \in [0,1].$$

To see the connection with the Mahalanobis distance, let us consider a random vector $X = (X_1, \ldots, X_d)$ instead of the whole stochastic process $X(t)$, $t \in [0,1]$. The covariance function would then be replaced with the covariance matrix $\Sigma$ whose $(i,j)$-entry is $\mathrm{Cov}(X_i, X_j)$. From the Moore-Aronszajn Theorem we know that there exists a unique RKHS, $H(\Sigma)$, in $\mathbb{R}^d$ whose reproducing kernel is $\Sigma$; see Hsing and Eubank (2015a), pp. 47-49, or Berlinet and Thomas-Agnan (2004), p. 19.

From the definition (6) of $H_0$ it is clear that, in this case, this space is just the image of the linear map defined by $\Sigma$, that is, it consists of the vectors $x$ that can be written as $x = \Sigma a$ for some $a \in \mathbb{R}^d$. Moreover, according to (7), the inner product between two elements $x = \Sigma a$ and $y = \Sigma b$ of this space is given by $\langle x, y\rangle_\Sigma = a'\Sigma b$. On the other hand, since $H_0$ is here a finite-dimensional space, it agrees with its completion $H(\Sigma)$.

If we assume that $\Sigma$ has full rank (if not, the generalized inverse should be used), this product can be rewritten as

$$\langle x, y\rangle_\Sigma = a'\Sigma b = a'\Sigma\Sigma^{-1}\Sigma b = x'\Sigma^{-1}y.$$

Then, the squared distance between two vectors associated with this inner product can be expressed as

$$\|x-y\|_\Sigma^2 = \langle x-y, x-y\rangle_\Sigma = (x-y)'\Sigma^{-1}(x-y) = \sum_{i=1}^d \frac{\big((x-y)'e_i\big)^2}{\lambda_i}, \tag{8}$$

where in the last equality we have used the second equation in (3).
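The identity behind (8) is easy to verify numerically in finite dimensions. The following sketch uses a randomly generated positive definite matrix as an illustrative example.

```python
import numpy as np

# Numerical check of the identity in (8):
#   x' Sigma^{-1} x  ==  sum_i <x, e_i>^2 / lambda_i,
# where (lambda_i, e_i) is the eigendecomposition of Sigma.
# Sigma is a random positive definite example, not from the paper.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4 * np.eye(4)   # positive definite by construction
x = rng.normal(size=4)

lam, E = np.linalg.eigh(Sigma)    # columns of E are the eigenvectors e_i
lhs = x @ np.linalg.solve(Sigma, x)
rhs = np.sum((E.T @ x) ** 2 / lam)
print(abs(lhs - rhs))             # agreement up to rounding error
```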

We might summarize the above elementary discussion in the following statements:

(a) The distance in the RKHS associated with a finite-dimensional covariance operator, given by a positive definite matrix $\Sigma$, can be expressed as a simple sum involving the inverse eigenvalues of $\Sigma$, as shown in (8).

(b) Such an RKHS distance coincides with the corresponding Mahalanobis distance between $x$ and $y$.

At this point it is interesting to note that the above statement (a) can be extended to the infinite-dimensional case, as pointed out in the following lemma.

###### Lemma 1.

Let $\lambda_1 \geq \lambda_2 \geq \cdots > 0$ be the positive eigenvalues of the integral operator $\mathcal{K}$ associated with the kernel $K$. Let us denote by $e_i$ the corresponding unit eigenfunctions. For $x \in H(K)$,

$$\|x\|_K^2 = \sum_{i=1}^\infty \frac{\langle x, e_i\rangle_2^2}{\lambda_i}, \tag{9}$$

and then the RKHS can also be rewritten as

$$H(K) = \Big\{ x \in L^2[0,1] \,:\, \sum_{i=1}^\infty \frac{\langle x, e_i\rangle_2^2}{\lambda_i} < \infty \Big\}.$$

In particular, the functions $\sqrt{\lambda_i}\, e_i$ are an orthonormal basis for $H(K)$.

###### Proof.

This result is just a rewording of the following theorem, whose proof can be found in Amini and Wainwright (2012):

Theorem.- Under the indicated conditions, the RKHS associated with $K$ can be written as

$$H(K) = \Big\{ x \in L^2[0,1] \,:\, x = \sum_{i=1}^\infty a_i \sqrt{\lambda_i}\, e_i, \ \text{for some } (a_i) \text{ with } \sum_{i=1}^\infty a_i^2 < \infty \Big\}, \tag{10}$$

where the convergence of the series is in $L^2[0,1]$. This space is endowed with the inner product $\langle f, g\rangle_K = \sum_i a_i b_i$, where $f = \sum_i a_i\sqrt{\lambda_i}\, e_i$ and $g = \sum_i b_i\sqrt{\lambda_i}\, e_i$.

The result follows by noting that for any $x \in L^2[0,1]$ we can write

$$x = \sum_{i=1}^\infty \langle x, e_i\rangle e_i = \sum_{i=1}^\infty \frac{\langle x, e_i\rangle}{\sqrt{\lambda_i}}\, \sqrt{\lambda_i}\, e_i.$$

Then, if the coefficients $\langle x, e_i\rangle$ tend to zero fast enough so that $\sum_i \langle x, e_i\rangle^2/\lambda_i < \infty$, we have $x \in H(K)$ and we get the expression (9) for $\|x\|_K^2$. ∎

This result sheds some light on the following crucial question: to what extent can the formal expression (5) be used to give a general definition of the functional Mahalanobis distance? In other words, for which functions $x$ does the series in (5) converge? The answer is clear in view of Lemma 1: expression (5) is well defined if and only if $x - m \in H(K)$. This amounts to asking for a strong, very specific, regularity condition on $x - m$.

The bad news is that, as a consequence of a well-known result (see, e.g., Lukić and Beder (2001), Cor. 7.1), if $X$ is a Gaussian process with mean and covariance functions $m$ and $K$, respectively, such that $m \in H(K)$ and $H(K)$ is infinite-dimensional, then $\mathbb{P}(X \in H(K)) = 0$, whenever the probability space is assumed to be complete.

Hence, with probability one, expression (5) is not convergent for the trajectories drawn from the stochastic process $X$.
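The divergence can be observed numerically. As a hedged illustration, take standard Brownian motion, whose Karhunen-Loève eigenvalues are $\lambda_i = 1/((i-1/2)^2\pi^2)$ and whose scores $\langle X, e_i\rangle$ are independent $N(0, \lambda_i)$; each term of (5) is then a $\chi^2(1)$ variable, so the partial sums grow without bound, roughly like the number of terms.

```python
import numpy as np

# Partial sums of the template (5), squared, for one simulated
# Brownian-motion trajectory, represented through its K-L scores.
rng = np.random.default_rng(1)
N = 10_000
i = np.arange(1, N + 1)
lam = 1.0 / ((i - 0.5) ** 2 * np.pi ** 2)   # Brownian motion eigenvalues
scores = rng.normal(scale=np.sqrt(lam))     # <X, e_i> for one trajectory
partial = np.cumsum(scores ** 2 / lam)      # each term is chi-squared(1)

for n in (10, 100, 1000, 10_000):
    print(n, partial[n - 1])                # keeps growing: no convergence
```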

### 2.2 The proposed definition

In view of the discussion above (see statement (b) before Lemma 1), it might seem natural to define the (squared) Mahalanobis functional distance between a trajectory $x$ of the process and a function $m$ by $\|x - m\|_K^2$. However, this idea does not work since, as indicated above, the trajectories of $X$ do not belong to $H(K)$ with probability one.

This observation suggests the simple strategy we will follow here: given two functions $x, m \in L^2[0,1]$, just approximate them by two other functions $x_\alpha, m_\alpha \in H(K)$ and calculate the distance $\|x_\alpha - m_\alpha\|_K$. It only remains to decide how to obtain the RKHS approximations $x_\alpha$ and $m_\alpha$. One could think of taking as $x_\alpha$ the "closest" function to $x$ in $H(K)$, but this approach also fails since $H(K)$ is dense in $L^2[0,1]$ whenever all the eigenvalues $\lambda_i$ are strictly greater than zero (Remark 4.9 of Cucker and Zhou (2007)). Thus, every function in $L^2[0,1]$ can be arbitrarily well approximated by functions in $H(K)$.

This leads us in a natural way to the following penalization approach. Let us fix a penalization parameter $\alpha > 0$. Given any $x \in L^2[0,1]$, define

$$x_\alpha = \operatorname*{argmin}_{f \in H(K)} \|x - f\|_2^2 + \alpha \|f\|_K^2. \tag{11}$$

As we will see below, the "penalized projection" $x_\alpha$ is well-defined. In fact, it admits a relatively simple closed form. Finally, the definition we propose for the functional $\alpha$-Mahalanobis distance is

$$M_\alpha(x, m) = \|x_\alpha - m_\alpha\|_K. \tag{12}$$

As mentioned, given a realization $x$ of the stochastic process, we have relatively simple expressions for both the smoothed trajectory $x_\alpha$ and the proposed distance. In the next result we summarize these expressions.

###### Proposition 1.

Given a second-order process $X$ with covariance function $K$, we denote by $\mathcal{K}$ the integral covariance operator of Equation (2) associated with $K$. Then the smoothed trajectories $x_\alpha$ defined in (11) satisfy the following basic properties:

1. Let $I$ be the identity operator on $L^2[0,1]$. Then, $\mathcal{K} + \alpha I$ is invertible and

$$x_\alpha = (\mathcal{K} + \alpha I)^{-1}\mathcal{K}x = \sum_{j=1}^\infty \frac{\lambda_j}{\lambda_j + \alpha}\, \langle x, e_j\rangle_2\, e_j, \tag{13}$$

where $\lambda_j$, $j = 1, 2, \ldots$, are the eigenvalues of $\mathcal{K}$ (which are strictly positive under our assumptions) and $e_j$ stands for the unit eigenfunction of $\mathcal{K}$ corresponding to $\lambda_j$.

2. Denoting by $\mathcal{K}^{1/2}$ the square root operator defined by $\mathcal{K}^{1/2}e_j = \sqrt{\lambda_j}\, e_j$, the norm of $x_\alpha$ in $H(K)$ satisfies

$$\|x_\alpha\|_K^2 = \sum_{j=1}^\infty \frac{\lambda_j}{(\lambda_j + \alpha)^2}\, \langle x, e_j\rangle_2^2 = \|\mathcal{K}^{1/2}(\mathcal{K} + \alpha I)^{-1}x\|_2^2, \tag{14}$$

and therefore,

$$M_\alpha(x, m)^2 = \sum_{j=1}^\infty \frac{\lambda_j}{(\lambda_j + \alpha)^2}\, \langle x - m, e_j\rangle_2^2.$$
###### Proof.

(a) The fact that $\mathcal{K} + \alpha I$ is invertible is a consequence of Theorem 8.1 in (Gohberg and Goldberg, 2013, p. 183). The expression for $x_\alpha$ follows straightforwardly from Proposition 8.6 of (Cucker and Zhou, 2007, p. 139). Moreover, expression (8.4) in (Gohberg and Goldberg, 2013, p. 184) yields

$$(\mathcal{K} + \alpha I)^{-1}y = \frac{1}{\alpha}(I - K_1)y, \tag{15}$$

where

$$K_1 y = \sum_{j=1}^\infty \frac{\lambda_j}{\alpha + \lambda_j}\, \langle y, e_j\rangle_2\, e_j. \tag{16}$$

Then, using the Spectral Theorem for compact and self-adjoint operators (for instance, Theorem 2 of Chapter 2 of Cucker and Smale (2001)) we get:

$$x_\alpha = (\mathcal{K} + \alpha I)^{-1}\mathcal{K}x = \frac{1}{\alpha}\sum_{j=1}^\infty \Big(1 - \frac{\lambda_j}{\alpha + \lambda_j}\Big)\lambda_j\, \langle x, e_j\rangle_2\, e_j = \sum_{j=1}^\infty \frac{\lambda_j}{\alpha + \lambda_j}\, \langle x, e_j\rangle_2\, e_j.$$

(b) In Lemma 1 we have seen that $\{\sqrt{\lambda_j}\, e_j\}$ is an orthonormal basis of $H(K)$. Then (13) together with Parseval's identity (in $H(K)$) imply

$$\|x_\alpha\|_K^2 = \sum_{j=1}^\infty \frac{\lambda_j}{(\lambda_j + \alpha)^2}\, \langle x, e_j\rangle_2^2.$$

Moreover, from the Spectral Theorem, $\mathcal{K}^{1/2}y = \sum_j \sqrt{\lambda_j}\, \langle y, e_j\rangle_2\, e_j$; then, using (15) and (16), $(\mathcal{K} + \alpha I)^{-1}x = \sum_j \frac{1}{\lambda_j + \alpha}\langle x, e_j\rangle_2\, e_j$, and also

$$\mathcal{K}^{1/2}(\mathcal{K} + \alpha I)^{-1}x = \sum_{j=1}^\infty \frac{\sqrt{\lambda_j}}{\lambda_j + \alpha}\, \langle x, e_j\rangle_2\, e_j.$$

Then, using again Parseval's identity (but now in $L^2[0,1]$) we get

$$\|\mathcal{K}^{1/2}(\mathcal{K} + \alpha I)^{-1}x\|_2^2 = \sum_{j=1}^\infty \frac{\lambda_j}{(\lambda_j + \alpha)^2}\, \langle x, e_j\rangle_2^2 = \|x_\alpha\|_K^2. \qquad ∎$$
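In practice, once the eigenvalues $\lambda_j$ and the Fourier coefficients $\langle x - m, e_j\rangle_2$ are available (for instance from a discretized covariance kernel), the spectral expression of Proposition 1 gives $M_\alpha$ directly. The following sketch uses the Brownian-motion eigenvalues and some square-summable coefficients purely as illustrative inputs.

```python
import numpy as np

# M_alpha via the spectral formula of Proposition 1:
#   M_alpha(x, m)^2 = sum_j lambda_j / (lambda_j + alpha)^2 * <x - m, e_j>^2
def mahalanobis_alpha(coef_diff, lam, alpha):
    """coef_diff[j] = <x - m, e_j>_2; lam[j] = eigenvalues of K."""
    return float(np.sqrt(np.sum(lam / (lam + alpha) ** 2 * coef_diff ** 2)))

# Illustrative inputs: truncated Brownian-motion eigenvalues and
# coefficients decaying like sqrt(lambda_j).
lam = 1.0 / ((np.arange(1, 200) - 0.5) ** 2 * np.pi ** 2)
coef_diff = np.sqrt(lam)
for alpha in (0.1, 0.01, 0.001):
    print(alpha, mahalanobis_alpha(coef_diff, lam, alpha))
```

Note that the distance grows as $\alpha$ decreases: each weight $\lambda_j/(\lambda_j+\alpha)^2$ is decreasing in $\alpha$, which reflects the role of $\alpha$ as a smoothing (shrinkage) parameter.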

###### Corollary 1.

The expression given in (12) defines a metric in $L^2[0,1]$.

###### Proof.

This result is a direct consequence of Proposition 1. Indeed, from expression (13), the transformation from $x$ to $x_\alpha$ is injective (since the coefficients $\langle x, e_j\rangle_2$ completely determine $x$). This, together with the fact that $\|\cdot\|_K$ is a norm, yields the result. ∎

###### Remark 1.

The expression obtained in the first part of Proposition 1 has an interesting intuitive meaning: the transformation $x \mapsto (\mathcal{K} + \alpha I)^{-1}\mathcal{K}x$ first takes the function $x$ to the space $H(K)$, made of much nicer functions, with Fourier coefficients converging quickly to zero, since we must have $\sum_i a_i^2 < \infty$; see (10). Then, after this "smoothing step", we perform an "approximation step" by applying the inverse operator $(\mathcal{K} + \alpha I)^{-1}$, in order to get, as a final output, a function $x_\alpha$ that is both close to $x$ and smoother than $x$. Note also that the operator $(\mathcal{K} + \alpha I)^{-1}\mathcal{K}$ is compact. Thus, if we assume that the original trajectories are uniformly bounded in $L^2[0,1]$, the final result of applying this transformation to the trajectories would be to take them to a pre-compact set of $L^2[0,1]$. This is very convenient from different points of view (beyond our specific needs here), in particular when one needs to find a convergent subsequence inside a given bounded sequence of $x_\alpha$'s.
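The two-step interpretation of Remark 1 can be visualized with a crude grid discretization; the Brownian kernel $K(s,t) = \min(s,t)$ and the discontinuous trajectory below are illustrative choices, not prescribed by the paper.

```python
import numpy as np

# Discretized sketch of the smoothing x_alpha = (K + alpha I)^{-1} K x
# on an equispaced grid of [0,1], using quadrature weights 1/n.
n = 200
t = np.linspace(0, 1, n, endpoint=False) + 0.5 / n
K = np.minimum.outer(t, t) / n        # quadrature-weighted kernel matrix
x = np.sign(np.sin(6 * np.pi * t))    # a rough (discontinuous) trajectory

alpha = 1e-3
x_alpha = np.linalg.solve(K + alpha * np.eye(n), K @ x)

# The smoothed version stays close to x but is strictly shrunk:
# each spectral component is multiplied by lambda_j / (lambda_j + alpha) < 1.
print(np.linalg.norm(x), np.linalg.norm(x_alpha))
```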

### 2.3 Some previous proposals

Motivated by the heuristic spectral version (5) of the Mahalanobis distance, Galeano et al. (2015) have proposed the following definition, which avoids the convergence problems of the series in (5) at the expense of introducing a sort of smoothing parameter $k$,

$$d_{FM}^k(x, m) = \Big(\sum_{i=1}^k \frac{\langle x - m, e_i\rangle^2}{\lambda_i}\Big)^{1/2}. \tag{17}$$

We keep the notation $d_{FM}^k$ used in Galeano et al. (2015). Let us note that $d_{FM}^k$ is a semi-distance, since it lacks the identifiability condition ($d_{FM}^k(x, y) = 0$ does not imply $x = y$). The applications of $d_{FM}^k$ considered by these authors focus mainly on supervised classification. While this proposal is quite simple and natural, it suffers from some insufficiencies when considered from the theoretical point of view. The most important one is the fact that the series in (17) is divergent, with probability one, whenever $x$ is a trajectory of a Gaussian process with mean function $m$ and covariance function $K$ (as we have just seen). So, $d_{FM}^k$ is defined in terms of the $k$-th partial sum of a divergent series. As a consequence, one may expect that the definition might be strongly influenced by the choice of $k$. As we will discuss below, in practice this effect is not noticed if $x$ is replaced with a smoothed trajectory but, in that case, the smoothing procedure should be incorporated into the definition.

Another recent proposal is due to Ghiglietti et al. (2017). The idea is also to modify the template (5) to deal with the convergence issues. In this case, the suggested definition is

$$d_p(x, m) = \bigg(\int_0^\infty \sum_{i=1}^\infty \frac{\langle x - m, e_i\rangle^2}{e^{\lambda_i c}}\, g(c; p)\, dc\bigg)^{1/2}, \tag{18}$$

where $g(\cdot\,; p)$ is a non-increasing, non-negative, integrable weight function which, for any fixed $c$, is assumed to be non-decreasing in $p$. This definition does not suffer from any problem derived from degeneracy but, still, it depends on two smoothing functions: the exponential in the denominator of (18) and the weighting function $g$. As pointed out also in Ghiglietti et al. (2017), a more convenient expression for (18) is given by the following weighted version of the formal template (5),

$$d_p(x, m) = \Big(\sum_{i=1}^\infty \frac{\langle x - m, e_i\rangle^2}{\lambda_i}\, h_i(p)\Big)^{1/2}, \tag{19}$$

where $h_i(p) = \lambda_i \int_0^\infty e^{-\lambda_i c}\, g(c; p)\, dc$.

The applications of (18) offered in Ghiglietti et al. (2017) and Ghiglietti and Paganoni (2017) deal with hypothesis testing for two-sample problems on the equality of the mean functions.

## 3 Some properties of the functional Mahalanobis distance

In this section we analyze in detail and prove some of the features of $M_\alpha$ that we have anticipated above. In what follows, $X = X(t)$, with $t \in [0,1]$, will stand for a second-order stochastic process with continuous trajectories and continuous mean and covariance functions, denoted by $m$ and $K$, respectively.

### 3.1 Invariance

In the finite-dimensional case, one appealing property of the Mahalanobis distance is the fact that it does not change if we apply a non-singular linear transformation to the data. Then, invariance for a large class of linear operators appears also as a desirable property for any extension of the Mahalanobis distance to the functional case. Here, we will prove invariance with respect to operators preserving the norm. We recall that an operator $L: L^2[0,1] \to L^2[0,1]$ is an isometry if $\|Lf\|_2 = \|f\|_2$ for all $f \in L^2[0,1]$. In this case, it holds $L^*L = I$, where $L^*$ stands for the adjoint of $L$.

###### Theorem 1.

Let $L$ be an isometry on $L^2[0,1]$. Then, $M_\alpha(Lx, Lm) = M_\alpha(x, m)$ for all $x, m \in L^2[0,1]$, where $M_\alpha$ was defined in (12) and the left-hand side is computed from the covariance operator of the transformed process $LX$.

###### Proof.

Let $K_L$ be the covariance operator of the process $LX$. The first step of the proof is to show that $K_L = L\mathcal{K}L^*$. It is enough to prove that for all $f, g \in L^2[0,1]$ it holds $\langle K_L f, g\rangle_2 = \langle L\mathcal{K}L^* f, g\rangle_2$. Observe that

$$\langle K_L f, g\rangle_2 = \int_0^1 K_L f(t)\, g(t)\, dt = \int_0^1\!\!\int_0^1 \mathbb{E}\big[(LX(s) - Lm(s))(LX(t) - Lm(t))\big]\, f(s)\, g(t)\, ds\, dt.$$

Then, using Fubini's theorem and the definition of the adjoint operator:

$$\langle K_L f, g\rangle_2 = \mathbb{E}\big[\langle L(X - m), f\rangle_2 \cdot \langle L(X - m), g\rangle_2\big] = \mathbb{E}\big[\langle X - m, L^* f\rangle_2 \cdot \langle X - m, L^* g\rangle_2\big].$$

Analogously, we also have

$$\langle L\mathcal{K}L^* f, g\rangle_2 = \langle \mathcal{K}L^* f, L^* g\rangle_2 = \mathbb{E}\big[\langle X - m, L^* f\rangle_2 \cdot \langle X - m, L^* g\rangle_2\big].$$

From the last two equations we conclude $K_L = L\mathcal{K}L^*$.

The second step of the proof is to observe that the eigenvalues of $K_L$ are the same as those of $\mathcal{K}$, and the unit eigenfunction of $K_L$ for the eigenvalue $\lambda_j$ is given by $v_j = Le_j$, where $e_j$ is the unit eigenfunction of $\mathcal{K}$ corresponding to $\lambda_j$. Indeed, using $L^*L = I$ we have

$$K_L v_j = L\mathcal{K}L^* v_j = L\mathcal{K}L^*Le_j = L\mathcal{K}e_j = \lambda_j Le_j = \lambda_j v_j, \qquad j = 1, 2, \ldots$$

Then, by (14) and using that $L$ is an isometry,

$$M_\alpha(Lx, Lm)^2 = \|(Lx - Lm)_\alpha\|_{K_L}^2 = \sum_{j=1}^\infty \frac{\lambda_j}{(\lambda_j + \alpha)^2}\langle Lx - Lm, Le_j\rangle_2^2 = \sum_{j=1}^\infty \frac{\lambda_j}{(\lambda_j + \alpha)^2}\langle x - m, e_j\rangle_2^2 = M_\alpha(x, m)^2. \qquad ∎$$

The family of isometries on contains some interesting examples. For instance, all the symmetries and translations are isometries, as well as the changes between orthonormal bases. Thus, this distance does not depend on the basis on which the data are represented.
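A finite-dimensional analogue of Theorem 1 is easy to check numerically: for an orthogonal matrix $Q$ (the discrete counterpart of an isometry), the $\alpha$-distance computed from the transformed covariance $Q\Sigma Q'$ at $Qx$, $Qm$ coincides with the one computed from $\Sigma$ at $x$, $m$. All inputs below are toy examples.

```python
import numpy as np

# Finite-dimensional alpha-distance via the spectral formula (14).
def m_alpha_sq(x, m, Sigma, alpha):
    lam, E = np.linalg.eigh(Sigma)
    c = E.T @ (x - m)                 # coordinates in the eigenbasis
    return float(np.sum(lam / (lam + alpha) ** 2 * c ** 2))

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + np.eye(5)           # toy positive definite covariance
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # orthogonal matrix
x, m = rng.normal(size=5), rng.normal(size=5)

alpha = 0.05
d1 = m_alpha_sq(x, m, Sigma, alpha)
d2 = m_alpha_sq(Q @ x, Q @ m, Q @ Sigma @ Q.T, alpha)
print(d1, d2)   # equal up to rounding error
```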

### 3.2 Distribution for Gaussian processes

We have mentioned in the introduction that the squared Mahalanobis distance to the mean for Gaussian data has a $\chi^2$ distribution with $d$ degrees of freedom, where $d$ is the dimension of the data. In the functional case, the distribution of $M_\alpha(X, m)^2$ for a Gaussian process equals that of an infinite linear combination of independent $\chi^2$ random variables. We prove this fact in the following result and its corollary, and also give explicit expressions for the expectation and the variance of $M_\alpha(X, m)^2$.

###### Proposition 2.

Let $X$ be a Gaussian process with mean function $m$ and continuous positive definite covariance function $K$. Let $\lambda_j$ be the eigenvalues of $\mathcal{K}$ and let $e_j$ be the corresponding unit eigenfunctions.

1. The squared Mahalanobis distance to the origin satisfies

$$M_\alpha(X, 0)^2 = \|X_\alpha\|_K^2 = \sum_{j=1}^\infty \beta_j Y_j, \tag{20}$$

where $\beta_j = \lambda_j^2/(\lambda_j + \alpha)^2$ and the $Y_j$, $j = 1, 2, \ldots$, are independent non-central $\chi^2(1)$ random variables with non-centrality parameter $\mu_j^2/\lambda_j$, with $\mu_j = \langle m, e_j\rangle_2$.

2. We have

$$\mathbb{E}\big(M_\alpha(X, 0)^2\big) = \sum_{j=1}^\infty \frac{\lambda_j^2}{(\lambda_j + \alpha)^2}\Big(1 + \frac{\mu_j^2}{\lambda_j}\Big),$$

and

$$\mathrm{Var}\big(M_\alpha(X, 0)^2\big) = 2\sum_{j=1}^\infty \frac{\lambda_j^4}{(\lambda_j + \alpha)^4}\Big(1 + \frac{2\mu_j^2}{\lambda_j}\Big).$$
###### Proof.

(a) Using (14), $M_\alpha(X, 0)^2 = \sum_j \beta_j Y_j$, where $\beta_j = \lambda_j^2/(\lambda_j + \alpha)^2$ and $Y_j = \langle X, e_j\rangle_2^2/\lambda_j$. Since the process is Gaussian, the variables $\langle X, e_j\rangle_2/\sqrt{\lambda_j}$ are independent with normal distribution, mean $\mu_j/\sqrt{\lambda_j}$ and variance 1 (see Ash and Gardner (2014), p. 40). The result follows.

(b) It is easy to see that the partial sums in (20) form a sub-martingale with respect to the natural filtration,

$$\mathbb{E}\Big(\sum_{j=1}^{N+1}\beta_j Y_j \,\Big|\, Y_1, \ldots, Y_N\Big) = \beta_{N+1}\mathbb{E}(Y_{N+1}) + \sum_{j=1}^N \beta_j Y_j \geq \sum_{j=1}^N \beta_j Y_j.$$

Moreover, denoting $\bar\lambda = \sup_j \lambda_j$, which is always finite,

$$\sup_N \mathbb{E}\Big(\sum_{j=1}^{N}\beta_j Y_j\Big) = \sum_{j=1}^\infty \frac{\lambda_j(\lambda_j + \mu_j^2)}{(\lambda_j + \alpha)^2} \leq \frac{\bar\lambda}{\alpha^2}\Big(\sum_{j=1}^\infty \lambda_j + \sum_{j=1}^\infty \mu_j^2\Big) < \infty,$$

because $\sum_j \lambda_j < \infty$ and $\sum_j \mu_j^2 = \|m\|_2^2 < \infty$ (see e.g. Cucker and Smale (2001), Corollary 3, p. 34). Now, Doob's convergence theorem implies $\sum_{j=1}^N \beta_j Y_j \to \sum_{j=1}^\infty \beta_j Y_j$ a.s. as $N \to \infty$, and the Monotone Convergence Theorem yields the expression for the expectation of $M_\alpha(X, 0)^2$.

The proof for the variance is fairly similar. Using Jensen's inequality, we deduce

$$\mathbb{E}\bigg[\Big(\sum_{j=1}^{N+1}\beta_j(Y_j - \mathbb{E}Y_j)\Big)^2 \,\Big|\, Y_1, \ldots, Y_N\bigg] \geq \Big(\sum_{j=1}^{N}\beta_j(Y_j - \mathbb{E}Y_j)\Big)^2.$$

Moreover, since the variables $Y_j$ are independent:

$$\sup_N \mathbb{E}\Big(\sum_{j=1}^{N}\beta_j(Y_j - \mathbb{E}Y_j)\Big)^2 = \sum_{j=1}^\infty \beta_j^2\,\mathrm{Var}(Y_j) = 2\sum_{j=1}^\infty \frac{\lambda_j^3(\lambda_j + 2\mu_j^2)}{(\lambda_j + \alpha)^4} \leq \frac{2\bar\lambda^3}{\alpha^4}\Big(\sum_{j=1}^\infty \lambda_j + 2\sum_{j=1}^\infty \mu_j^2\Big) < \infty.$$

Then, $\sum_{j=1}^N \beta_j(Y_j - \mathbb{E}Y_j) \to \sum_{j=1}^\infty \beta_j(Y_j - \mathbb{E}Y_j)$ a.s., as $N \to \infty$, and using the Monotone Convergence Theorem,

$$\mathrm{Var}\big(M_\alpha(X, 0)^2\big) = \lim_{N\to\infty}\mathrm{Var}\Big(\sum_{j=1}^N \beta_j Y_j\Big) = 2\sum_{j=1}^\infty \frac{\lambda_j^4}{(\lambda_j + \alpha)^4}\Big(1 + \frac{2\mu_j^2}{\lambda_j}\Big). \qquad ∎$$

When we compute the squared Mahalanobis distance to the mean, the expressions above simplify because the means $\mu_j$ are replaced by zero, and then we have the following corollary.

###### Corollary 2.

Under the same assumptions of Proposition 2, $M_\alpha(X, m)^2 = \sum_{j=1}^\infty \beta_j Y_j$, where $\beta_j = \lambda_j^2/(\lambda_j + \alpha)^2$ and the $Y_j$ are independent $\chi^2(1)$ random variables. Moreover, $\mathbb{E}\big(M_\alpha(X, m)^2\big) = \sum_{j=1}^\infty \lambda_j^2/(\lambda_j + \alpha)^2$ and $\mathrm{Var}\big(M_\alpha(X, m)^2\big) = 2\sum_{j=1}^\infty \lambda_j^4/(\lambda_j + \alpha)^4$.
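Corollary 2 can be checked by Monte Carlo on a truncated model; the truncation level, eigenvalue sequence and simulation size below are illustrative choices.

```python
import numpy as np

# Monte Carlo sketch of Corollary 2 for a truncated centered Gaussian model:
#   M_alpha(X, m)^2 = sum_j beta_j Y_j,  beta_j = lambda_j^2/(lambda_j+alpha)^2,
# with Y_j iid chi-squared(1).
rng = np.random.default_rng(3)
J, alpha, n_sim = 50, 0.01, 200_000
lam = 1.0 / ((np.arange(1, J + 1) - 0.5) ** 2 * np.pi ** 2)  # Brownian eigenvalues
beta = lam ** 2 / (lam + alpha) ** 2

Y = rng.chisquare(df=1, size=(n_sim, J))
M2 = Y @ beta                          # n_sim draws of the squared distance

print(M2.mean(), beta.sum())           # E(M^2)   = sum_j beta_j
print(M2.var(), 2 * np.sum(beta**2))   # Var(M^2) = 2 sum_j beta_j^2
```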

### 3.3 Stability with respect to α

Our definition of distance depends on a regularization parameter $\alpha$. In this subsection we prove the continuity of $M_\alpha$ with respect to this tuning parameter. The proof of the main result requires the following auxiliary lemma, which has been adapted from Corollary 8.3 in Gohberg and Goldberg (2013), p. 71. Recall that given a bounded operator $A$ on a Hilbert space $H$ we can define the norm

$$\|A\|_{\mathcal{L}} := \sup\{\|Ax\|_H : \|x\|_H \leq 1\}.$$
###### Lemma 2.

Let $A_j$, $j = 1, 2, \ldots$, be a sequence of bounded invertible operators on a Hilbert space which converges in norm to another operator $A$, and such that $\sup_j \|A_j^{-1}\|_{\mathcal{L}} < \infty$. Then $A$ is also invertible, and $\|A_j^{-1} - A^{-1}\|_{\mathcal{L}} \to 0$, as $j \to \infty$.

We will apply the preceding lemma in the proof of the following result.

###### Proposition 3.

Let be a sequence of positive real numbers such that , as . Then, a.s. as .

###### Proof.

Note that by Proposition 1(b), Eq. (14), we have

$$\big|\,\|X_{\alpha_j}\|_K - \|X_\alpha\|_K\big| \leq \|\mathcal{K}^{1/2}(\mathcal{K} + \alpha_j I)^{-1}X - \mathcal{K}^{1/2}(\mathcal{K} + \alpha I)^{-1}X\|_2 \leq \|\mathcal{K}^{1/2}\|_{\mathcal{L}}\, \|(\mathcal{K} + \alpha_j I)^{-1} - (\mathcal{K} + \alpha I)^{-1}\|_{\mathcal{L}}\, \|X\|_2.$$

But it holds

$$\|(\mathcal{K} + \alpha_j I) - (\mathcal{K} + \alpha I)\|_{\mathcal{L}} = |\alpha_j - \alpha| \to 0, \quad \text{as } j \to \infty,$$

and $\|(\mathcal{K} + \alpha_j I)^{-1}\|_{\mathcal{L}} \leq 1/\alpha_j$ (see Gohberg and Goldberg (2013), (1.14), p. 228). Therefore, $\|(\mathcal{K} + \alpha_j I)^{-1} - (\mathcal{K} + \alpha I)^{-1}\|_{\mathcal{L}} \to 0$, as $j \to \infty$, by Lemma 2. ∎

Observe that Proposition 3 implies the point convergence of the sequence of distribution functions of