Principal points and elliptical distributions from the multivariate setting to the functional case

The k principal points of a random vector 𝐗 are defined as a set of k points that minimize the expected squared distance between 𝐗 and the nearest point of the set. They are thoroughly studied in Flury (1990, 1993), Tarpey (1995) and Tarpey, Li and Flury (1995), where the treatment is usually restricted to the family of elliptical distributions. In this paper, we present an extension of these results to the functional elliptical distribution case, i.e., to random elements over a separable Hilbert space H. Principal points for Gaussian processes were defined in Tarpey and Kinateder (2003). We generalize the concepts of principal points, self-consistent points and elliptical distributions so as to fit them into this functional framework. Results linking self-consistency and the eigenfunctions of the covariance operator are re-obtained in this new setting, as well as an explicit formula for the k=2 case, so as to include elliptically distributed random elements in H.


1 Introduction

1.1 Motivation

In statistics there are many situations where the collected data cannot be adequately represented by classical schemes such as numbers or numeric vectors, and a functional representation is more appropriate. For example, the results of an electrocardiogram (ECG) or the temperature records of a weather station lend themselves to this framework (see, for instance, Ramsay and Silverman [10] for more examples). A classical discretization of the data as a sequence of numbers may lose functional characteristics such as smoothness and continuity. For this reason, in the last decades different methods have appeared to handle this kind of data. Informally, we may say that a functional datum is a random variable (element would be a better word) that takes its values in a functional space instead of a finite-dimensional one. In this paper, we study some fundamental concepts of this development, which results in a mixture of statistics and functional analysis over Hilbert spaces. The main idea is to bring together, in a very general family of distributions, the notions of principal components and principal points. In the multivariate setting, these developments were mainly carried out by Flury and Tarpey ([2] to [6], [12] to [17], [18] and [20]) at the beginning of the 90's. The idea here is to adapt the results obtained therein to the functional elliptical distribution case.

Perhaps the final conclusion of this work is not only the theoretical results obtained. Rather, our results show that it is possible, with some technical but not critical difficulties, to carry out an interesting generalization of the classical results from multivariate analysis to a more general framework, gaining a better comprehension of the phenomenon, as often happens when a mathematical concept is abstracted or generalized.

In section 2, we define the notion of elliptical families. We first recall their definition in the multivariate case and then extend it to the functional case. The definitions of self–consistent points and principal points, as well as some of their properties, are stated in section 3, where we extend the results given in Flury ([2], [3]), Tarpey [13] and Tarpey and Flury [18] to the case of random elements lying in a separable Hilbert space. We also provide a characterization that, under a hypothesis of ellipticity, allows us to establish an important link between principal components and self–consistent points. We conclude with some results that allow us to compute principal points in a specific case.

2 Elliptical families

2.1 Review on some finite–dimensional results

For the sake of completeness and to fix our notation, we recall some results regarding elliptical families before extending them to the functional setting. They can be found in Muirhead [9], Seber [11] and also in Frahm [7].

Let X be a random vector in R^d. We will say that X has an elliptical distribution, and we will denote it as X ∼ ε_d(μ, Σ, ψ), if there exist a vector μ ∈ R^d, a positive semidefinite matrix Σ and a function ψ such that the characteristic function of X − μ is given by φ_{X−μ}(t) = ψ(tᵗΣt), for all t ∈ R^d. In some situations, for the sake of simplicity, we will omit the symbol ψ and denote X ∼ ε_d(μ, Σ).

As is well known, if X ∼ ε_d(μ, Σ, ψ) and E(X) exists, then E(X) = μ. Moreover, if the second order moments exist, Σ is, up to a constant, the covariance matrix of X, i.e., Cov(X) = αΣ. Even more, it is easy to see that the constant equals α = −2ψ′(0), where ψ′ stands for the derivative of ψ.
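The stochastic representation behind these facts can be checked numerically. The following sketch is not part of the paper; the dimension, parameters and radial law are illustrative choices. It samples X = μ + RΛU with U uniform on the unit sphere and an independent radius R, then verifies that E(X) = μ and that Cov(X) is proportional to Σ:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_elliptical(mu, Sigma, radial_sampler, n):
    """Draw n samples via the stochastic representation X = mu + R * Lambda U,
    with U uniform on the unit sphere, R >= 0 an independent radial variable,
    and Lambda a square root of Sigma."""
    d = len(mu)
    Lam = np.linalg.cholesky(Sigma)
    G = rng.standard_normal((n, d))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)   # uniform on the sphere
    R = radial_sampler(n)
    return mu + (R[:, None] * U) @ Lam.T

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

# The radial law R = sqrt(chi^2_d) recovers the Gaussian member of the family,
# for which the proportionality constant alpha equals 1.
X = sample_elliptical(mu, Sigma,
                      lambda n: np.sqrt(rng.chisquare(df=2, size=n)), 200000)

print(np.allclose(X.mean(axis=0), mu, atol=0.05),
      np.allclose(np.cov(X.T), Sigma, atol=0.05))
```

Swapping in a different radial law (for instance, one with heavier tails) yields other members of the same elliptical family with the same μ and, up to the constant α, the same Σ.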

The following theorem is a well-known result and will also be extended in the sequel to functional random elements.

Theorem 2.1.

Let X ∼ ε_d(μ, Σ, ψ) with μ ∈ R^d and Σ a positive semidefinite matrix with rank(Σ) = d. Let 1 ≤ k < d and write X = (X₁ᵗ, X₂ᵗ)ᵗ, with X₁ the vector of the first k coordinates of X, such that Σ₁₁ is not singular. Denote by

$$\boldsymbol{\Sigma}=\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\mathsf t}=\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix}\in\mathbb{R}^{d\times d}$$

with submatrices Σ₁₁ ∈ R^{k×k}, Σ₁₂, Σ₂₁ = Σ₁₂ᵗ and Σ₂₂. Then,

1. X ≂ μ + RΛU^(d), where ≂ means that the two random vectors have the same distribution, U^(d) is uniformly distributed over the unit sphere of R^d, and R ≥ 0 and U^(d) are independent.

2. Assume that the conditional random vector X₂ | X₁ = x₁ exists; then it has an elliptical distribution ε_{d−k}(μ*, Σ*, ψ*), where

$$\boldsymbol{\mu}^{*}=\boldsymbol{\mu}_2+\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}(\mathbf{x}_1-\boldsymbol{\mu}_1)\qquad\quad \boldsymbol{\Sigma}^{*}=\boldsymbol{\Sigma}_{22}-\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12},$$

and ψ* corresponds to the characteristic generator of R* with

$$R^{*}\;\overset{D}{\sim}\;R\sqrt{1-\beta}\;\Big|\;\Big(R\sqrt{\beta}\,U^{(k)}=C_{11}^{-1}(\mathbf{x}_1-\boldsymbol{\mu}_1)\Big).$$

Here C₁₁ stands for the Cholesky square root of Σ₁₁, U^(k) is uniformly distributed on the unit sphere of R^k, β has a Beta(k/2, (d−k)/2) distribution, and R, β, U^(k) and U^(d−k) are mutually independent.

2.2 Functional case

In this section, we will extend the definition of elliptical distributions to the case of random elements on a separable Hilbert space. The definition will be based on the one given for the multivariate case.

Definition 2.1.

Let V be a random element in a separable Hilbert space H. We will say that V has an elliptical distribution of parameters μ ∈ H and Γ, with Γ: H → H a self–adjoint, positive semidefinite and compact operator, and we will denote V ∼ ε(μ, Γ, ψ), if for any linear and bounded operator A: H → R^d (that is, such that sup_{‖v‖=1} ‖Av‖ < ∞) we have that AV has a multivariate elliptical distribution of parameters Aμ and AΓA*, i.e., AV ∼ ε_d(Aμ, AΓA*, ψ), where A* stands for the adjoint operator of A.

The following result shows that elliptical families in Hilbert spaces are closed through linear and bounded transformations.

Lemma 2.1.

Let V ∼ ε(μ, Γ, ψ) be an elliptical random element in H and let A: H → H′ be linear and bounded, with H′ a separable Hilbert space. Then AV is an elliptical random element in H′ of parameters Aμ and AΓA*.

Lemma 2.2 shows that both parameters, μ and Γ, that characterize the element V are, respectively, the expectation and (up to a constant) the covariance operator, provided they exist. Its proof can be found in the Appendix.

Lemma 2.2.

Let V ∼ ε(μ, Γ, ψ) be a random element in a separable Hilbert space H.

1. If E(V) exists, then E(V) = μ.

2. If the covariance operator, Γ_V, exists, then Γ_V = αΓ for some constant α.

Based on the finite dimensional results, one way of obtaining elliptical random elements is through the following transformation. Let G be a Gaussian element in H with zero mean and covariance operator Γ, and let S be a real random variable independent of G. Given μ ∈ H, define V = μ + SG. Then, V has an elliptical distribution and, if E(S²) exists, Γ_V = E(S²)Γ.
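This construction is easy to simulate on a grid. The sketch below is an illustration under assumed choices (a Brownian-motion covariance for Γ, a uniform law for the scalar S, and μ = 0); it checks that the covariance operator of V = SG is E(S²)Γ:

```python
import numpy as np

rng = np.random.default_rng(1)

# Discretize H = L^2[0,1] on a grid and simulate V = S * G with G a
# zero-mean Gaussian element and S an independent scalar variable.
t = np.linspace(0.0, 1.0, 50)
Gamma = np.minimum.outer(t, t)                   # Cov(G(s), G(t)) = min(s, t)
L = np.linalg.cholesky(Gamma + 1e-10 * np.eye(len(t)))  # jitter for stability

n = 100000
G = rng.standard_normal((n, len(t))) @ L.T       # Gaussian sample paths
S = rng.uniform(0.5, 1.5, size=n)                # any square-integrable scalar law
V = S[:, None] * G                               # elliptical element, mu = 0

# The covariance operator of V is E(S^2) * Gamma, as stated in the text.
alpha = np.mean(S ** 2)
print(np.allclose(np.cov(V.T, bias=True), alpha * Gamma, atol=0.05))
```

Replacing the uniform S by other laws changes the elliptical family member but not the proportionality to Γ.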

We are interested in obtaining some properties concerning the conditional distribution of elliptical families, similar to those existing in the multivariate setting. Let V be a random element belonging to an elliptical family of parameters μ and Γ, and let us consider in H the orthonormal basis {φ_i} (countable or finite) constructed using the eigenfunctions of the operator Γ related to the eigenvalues λ₁ ≥ λ₂ ≥ …. Given d fixed, define the closed subspaces (and so Hilbert spaces)

$$H_1=\langle\phi_1,\dots,\phi_d\rangle\qquad\quad H_2=\langle\phi_1,\dots,\phi_d\rangle^{\perp}$$

Define over these spaces the truncating projections, that is, P_{H₁}: H → H₁ and P_{H₂}: H → H₂ such that

$$P_{H_1}(\phi_i)=\begin{cases}\phi_i&i=1,2,\dots,d\\0&i>d\end{cases}\qquad P_{H_2}(\phi_i)=\begin{cases}\phi_i&i>d\\0&i=1,2,\dots,d.\end{cases}\qquad(1)$$

We compose P_{H₁} with the natural operator that identifies H₁ with R^d. That is, we consider the operator T_d: H → R^d defined as

$$T_d(\phi_i)=\begin{cases}e_i&i=1,2,\dots,d\\0&i>d,\end{cases}\qquad(2)$$

with e₁, …, e_d the vectors of the canonical basis of R^d. Then, for any v ∈ H we have that T_d v = (⟨v, φ₁⟩, …, ⟨v, φ_d⟩)ᵗ. We will often use T_d instead of P_{H₁} as a projector, because its image is R^d, and we will call each of them truncating projectors.

Based on these projections we can construct the random elements T_d V and P_{H₂} V, both of them elliptical by Lemma 2.1.

We have essentially split the random element in two parts, one of them being finite dimensional which will allow us to define a conditional distribution following the guidelines previously established.
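On a discretized space, this split is just a coordinate truncation in the eigenbasis. A small sketch (not from the paper; a random orthonormal basis stands in for the eigenfunctions φ_i, and the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# A random orthonormal basis of an m-dimensional stand-in for H; keep the
# first d coordinates (T_d), drop the rest (P_{H2}).
m, d = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))   # columns play the role of phi_i
v = rng.standard_normal(m)

coef = Q.T @ v                         # coordinates <v, phi_i>
P_H1_v = Q[:, :d] @ coef[:d]           # projection onto H1 = <phi_1,...,phi_d>
P_H2_v = Q[:, d:] @ coef[d:]           # projection onto H2 = H1-perp
T_d_v = coef[:d]                       # T_d v in R^d (identify H1 with R^d)

# v splits into two orthogonal truncated parts.
print(np.allclose(P_H1_v + P_H2_v, v), abs(P_H1_v @ P_H2_v) < 1e-10)
```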

Theorem 2.2.

Let H be a separable Hilbert space and let V be a random element in H with distribution ε(μ, Γ, ψ) and finite second moments. Without loss of generality, we can assume that Γ is the covariance operator. Assume that Γ is Hilbert–Schmidt, so that Σ_i λ_i² < ∞. Let d be fixed and consider W₁ = T_d V and V₂ = P_{H₂} V, with P_{H₂} defined in (1) and T_d defined in (2). Let λ₁ ≥ λ₂ ≥ … be the eigenvalues of Γ and assume that λ_d > 0. Then,

• the covariance matrix of W₁, given by Σ_{W₁} = T_d Γ T_d* = diag(λ₁, …, λ_d), is non–singular;

• E(V₂ | W₁) = P_{H₂} μ + Γ_{V₂,W₁} Σ_{W₁}⁻¹ (W₁ − T_d μ),

where Γ_{V₂,W₁} is the covariance operator between V₂ and W₁.

3 Self–consistent points and principal points

As mentioned in the Introduction, self–consistent and principal points were studied by Flury ([2], [3]), Tarpey [13] and Tarpey and Flury [18] in the multivariate setting. Later on, Tarpey and Kinateder [19] extended their definition and properties to Gaussian processes, while Tarpey et al. [21] applied principal points to estimate a set of representative longitudinal response curves from a clinical trial. The aim of this section is to extend some of the properties previously obtained to include elliptical families.

For the sake of completeness, we recall the definitions of self–consistency and principal points.

Definition 3.1.

Let A = {y₁, …, y_k} ⊂ H with k ∈ N; we define the minimum distance of v ∈ H to the set A as d(v, A) = min_{1≤j≤k} ‖v − y_j‖.

The set A induces a partition of the space determined by the domains of attraction.

Definition 3.2.

Given A = {y₁, …, y_k}, the domain of attraction D_j of y_j consists of all the elements of H that have y_j as the closest point of A, that is, D_j = {v ∈ H : ‖v − y_j‖ < ‖v − y_i‖ for all i ≠ j}.
Points at equal distance from two or more of the y_j are assigned arbitrarily to the domain with the lowest index j.

Definition 3.3.

Let V be a random element in H with expectation E(V). A set {ξ₁, …, ξ_k} ⊂ H is said to be self-consistent for V if E(V | V ∈ D_j) = ξ_j for j = 1, …, k, with D_j the domain of attraction of ξ_j.

A random element W is called self-consistent for V if E(V | W) = W.

Definition 3.4.

Let V be a random element in H with finite second moment. The elements ξ₁, …, ξ_k ∈ H are called principal points of V if

$$D_V(k)=E\big(d^2(V,\{\xi_1,\dots,\xi_k\})\big)=\min_{y_1,\dots,y_k\in H}E\big(d^2(V,\{y_1,\dots,y_k\})\big)$$
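For a concrete finite-sample check of these definitions, consider a standard normal real variable (a choice of ours, not from the paper): the set {−√(2/π), √(2/π)} is self-consistent, and it is in fact the pair of k = 2 principal points of N(0, 1). A quick numerical sketch, with an illustrative sample size:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500000)

# Candidate set {-c, c}; the domains of attraction are (-inf, 0] and (0, inf).
c = np.sqrt(2.0 / np.pi)                 # E(X | X > 0) for X ~ N(0, 1)
left = x[x <= 0].mean()
right = x[x > 0].mean()

# Self-consistency: the conditional mean over each domain of attraction
# returns the corresponding point of the set.
print(abs(left + c) < 0.01, abs(right - c) < 0.01)
```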

Lemma 3 in Tarpey and Kinateder [19] establishes, for functions in L²(T) with T a real bounded interval, the well–known result in multivariate analysis that the mean of a distribution lies in the convex hull of any set of self–consistent points. Moreover, Flury [3] established that principal points of a random vector are self–consistent points. This result was generalized to random functions in L²(T) by Tarpey and Kinateder [19]. The same arguments allow us to establish these results for any separable Hilbert space H; we state them without proof.

Lemma 3.1.

Let V be a random element of a separable Hilbert space H such that E(V) exists. Then,

1. if {ξ₁, …, ξ_k} is a self-consistent set, then E(V) is a convex combination of ξ₁, …, ξ_k;

2. moreover, if V has finite second moments and the set {ξ₁, …, ξ_k} is a set of principal points for V, then it is self-consistent.

As a consequence of Lemma 3.1, if k = 1 and V is a random element with self-consistent set {ξ₁}, then ξ₁ = E(V). Moreover, we will have self-consistent points whenever we have principal points.

The following result allows us to assume, in the sequel, that the random element has expectation 0. It also generalizes Lemma 2.2 in Tarpey et al. [20] to the infinite–dimensional setting. Its proof is given in the Appendix.

Lemma 3.2.

Let V be a random element of a separable Hilbert space H and define V₂ = ν + ρUV, with ν ∈ H, ρ ≠ 0 a real scalar and U: H → H a unitary operator, i.e., surjective and isometric. Then, we have that

• if {y₁, …, y_k} is a set of self-consistent points of V, then {ν + ρUy₁, …, ν + ρUy_k} is a set of self-consistent points of V₂;

• if {y₁, …, y_k} is a set of principal points of V, then {ν + ρUy₁, …, ν + ρUy_k} is a set of principal points of V₂ and D_{V₂}(k) = ρ²D_V(k).

Lemma 3.3 is analogous to Lemma 2.3 in Tarpey et al. [20].

Lemma 3.3.

Let V be a random element with expectation 0. Let {ξ₁, …, ξ_k} be a set of self–consistent points of V spanning a subspace of dimension q, with orthonormal basis {ψ₁, …, ψ_q}. Then, the random vector W of R^q defined by W = (⟨V, ψ₁⟩, …, ⟨V, ψ_q⟩)ᵗ will have {c₁, …, c_k}, with c_j = (⟨ξ_j, ψ₁⟩, …, ⟨ξ_j, ψ_q⟩)ᵗ, as a self–consistent set.

The notion of best point approximation has been considered by Tarpey et al. [20] for finite–dimensional random elements. It extends immediately to elements on a Hilbert space.

Definition 3.5.

Let W be a discrete random element, jointly distributed with the random element V, and denote by S(W) the support of W. The random element W is a best k-point approximation to V if S(W) contains exactly k different elements and E‖V − W‖² ≤ E‖V − Y‖² for any random element Y whose support has at most k points.

The following result is the infinite–dimensional counterpart of Lemma 2.4 in Tarpey et al. [20].

Lemma 3.4.

Let W be a best k-point approximation to V, denote by {ξ₁, …, ξ_k} the set of different elements in S(W) and by D_j the domain of attraction of ξ_j. Then,

1. if V ∈ D_j, then W equals ξ_j with probability 1, that is, P(W = ξ_j | V ∈ D_j) = 1;

2. E(V | W = ξ_j) = ξ_j a.s. for all j = 1, …, k;

3. E(V | W) = W a.s., i.e., W is self-consistent for V.

It is worth noticing that, given a self-consistent set {ξ₁, …, ξ_k} of V, as in the finite–dimensional case we can define in a natural way a random element W with support {ξ₁, …, ξ_k} by setting W = ξ_j when V ∈ D_j. Since {ξ₁, …, ξ_k} is a self-consistent set, we will have that E(V | W) = W with probability 1. As in the finite–dimensional setting, W will not necessarily be a best approximation, unless the set is a set of principal points.

As mentioned above, if V is a random element with a self-consistent set of k elements, then E(V) lies in the convex hull of that set, so that if we assume E(V) = 0, the origin belongs to that convex hull. The forthcoming results try to characterize the subspace spanned by the self–consistent points when E(V) = 0. They generalize the results obtained in the finite–dimensional case by Tarpey et al. [20] and extended to Gaussian processes by Tarpey and Kinateder [19]. They also justify the use of the k-means algorithm not only for Gaussian processes but also for elliptical processes with finite second moments.
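The k-means connection can be illustrated numerically. The sketch below is ours, not the paper's: Brownian sample paths on a grid stand in for an elliptical process, and a plain Lloyd iteration on the discretized curves is checked to have a non-increasing empirical objective D_V(k):

```python
import numpy as np

rng = np.random.default_rng(3)

def lloyd(X, k, iters=50):
    """Plain k-means (Lloyd's algorithm) on discretized curves: each row of X
    is a function sampled on a grid, so the squared L2 distance between curves
    is approximated by the squared Euclidean distance between rows."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    obj = []
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        obj.append(d2.min(axis=1).mean())          # empirical D_V(k)
        centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, obj

# Brownian-motion sample paths (a Gaussian, hence elliptical, element).
t = np.linspace(0.0, 1.0, 40)
L = np.linalg.cholesky(np.minimum.outer(t, t) + 1e-10 * np.eye(len(t)))
X = rng.standard_normal((2000, len(t))) @ L.T

centers, obj = lloyd(X, k=2)
# Each Lloyd pass can only improve the k centers as an approximation of the
# k principal points: the objective never increases.
print(all(a >= b - 1e-9 for a, b in zip(obj, obj[1:])))
```

The returned centers approximate the k principal points of the discretized process; by the results above, for elliptical processes they concentrate on the span of leading eigenfunctions.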

Theorem 3.1.

Let V be a random element in a separable Hilbert space H, with finite second moment, and assume that E(V) = 0. Let {ξ₁, …, ξ_k} be a set of self–consistent points for V. Then, Γ_V(ξ_j) ∈ ⟨ξ₁, …, ξ_k⟩ for all j = 1, …, k, where Γ_V denotes the covariance operator of V.

In particular, if L denotes the linear space spanned by the self–consistent points, we easily get that Γ_V(L) ⊆ L. Moreover, it will also hold that Γ_V(L^⊥) ⊆ L^⊥. This last fact follows from the properties of semidefinite and diagonalizable operators.

Corollary 3.1.

Let V be a random element in a separable Hilbert space H, with finite second moment and compact covariance operator Γ_V, such that E(V) = 0. Let {ξ₁, …, ξ_k} be a set of self–consistent points for V and denote by L the subspace spanned by them. Then,

1. Γ_V(L) ⊆ L.

2. Γ_V(L^⊥) ⊆ L^⊥.

The following theorems provide the desired result relating, for elliptical elements, self–consistency and principal components.

Theorem 3.2.

Let V be a random elliptical element with E(V) = 0 and compact covariance operator Γ_V. Let L be the subspace spanned by a set of self-consistent points. Then, L is spanned by a set of eigenfunctions of Γ_V.

Theorem 3.3.

Let V be a random elliptical element with E(V) = 0 and compact covariance operator Γ_V. If k principal points of V generate a subspace of dimension q, then this subspace will also be spanned by the q eigenfunctions of Γ_V related to the q largest eigenvalues.

3.1 Properties of principal points and resolution for the case k=2

As mentioned above, when k = 1 the principal point equals the mean of the distribution. The goal of this section is to obtain, as in the finite–dimensional setting, an explicit expression for the principal points when k = 2. As is well known, even when dealing with finite–dimensional data, no general result is available for arbitrary values of k. The following theorem will be very useful in the sequel; it generalizes a result given, for the finite–dimensional case, by Flury [2]. It is worth noticing that Theorem 3.4 does not require the random element to have an elliptical distribution.

Theorem 3.4.

Let H be a separable Hilbert space and V a random element with mean 0 and with k principal points ξ₁, …, ξ_k. Then, the dimension of the linear space spanned by ξ₁, …, ξ_k is strictly lower than k.

In particular, when k = 1 we get that the mean is the principal point. We will now focus our attention on the case k = 2; the results will be derived for elliptical distributions.

Theorem 3.5 generalizes Theorem 2 in Flury [2], which states an analogous property for finite dimensional vectors. As in Euclidean spaces, the result assumes the existence of principal points for real random variables; conditions under which this holds are given in Theorem 1 of Flury [2].

Theorem 3.5.

Let V be an elliptical random element of a separable Hilbert space H with mean μ and covariance operator Γ_V with finite trace. Denote by φ₁ an eigenfunction of Γ_V with norm 1, related to its largest eigenvalue λ₁. Assume that the real random variable ⟨V − μ, φ⟩ has two principal points for any φ ∈ H with ‖φ‖ = 1, and let a₁ and a₂ be the two principal points of the real random variable ⟨V − μ, φ₁⟩. Then, V has two principal points ξ₁ = μ + a₁φ₁ and ξ₂ = μ + a₂φ₁.
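A finite-dimensional sanity check of this statement in the Gaussian member of the elliptical family (our choice, with an assumed diagonal covariance): the two principal points lie on the first eigenvector at ±√λ₁·√(2/π), and one self-consistency step over the two half-space domains of attraction confirms it:

```python
import numpy as np

rng = np.random.default_rng(4)

# Gaussian (hence elliptical) case of the theorem in R^2, with an assumed
# diagonal covariance: lambda_1 = 4 and first eigenfunction phi_1 = e_1.
Sigma = np.diag([4.0, 1.0])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=400000)

# For a centered normal with variance lambda_1, the two principal points of
# the projection <V, phi_1> are +/- sqrt(lambda_1) * sqrt(2/pi).
a = np.sqrt(4.0) * np.sqrt(2.0 / np.pi)

# Conditional means over the two half-space domains of attraction recover
# mu +/- a * phi_1, i.e., the set {mu + a1 phi_1, mu + a2 phi_1}.
pos = X[X[:, 0] > 0].mean(axis=0)
neg = X[X[:, 0] <= 0].mean(axis=0)
print(np.allclose(pos, [a, 0.0], atol=0.02),
      np.allclose(neg, [-a, 0.0], atol=0.02))
```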

A Appendix

Proof of Lemma 2.1. Let B: H′ → R^d be linear and bounded; let us show that B(AV) is an elliptical multivariate random vector of mean BAμ and covariance matrix BAΓ(BA)*.

Let C = BA be the composition. Then, C is linear and bounded, therefore CV is elliptical with parameters Cμ and CΓC*, finishing the proof.

Proof of Lemma 2.2. a) Denote by H* the dual space of H, i.e., the set of all linear and continuous functions f: H → R. Let f ∈ H*; since f is linear and continuous, it is linear and bounded. Then, f(V) has an elliptical distribution with parameters f(μ) and fΓf*. The existence of E(V) entails that E(f(V)) exists and that E(f(V)) = f(E(V)). Since E(f(V)) = f(μ), by uniqueness we get that E(V) = μ.

The proof of b) will follow from the properties of the covariance operator and Lemma 2.1 using the uniqueness of the covariance.

For that purpose, it will be convenient to define a series of special operators. Since H is separable, it admits a countable orthonormal basis, that is, there exists an orthonormal family {φ_i} (finite if the space is of finite dimension) generating H. We will choose as basis of H the basis of eigenfunctions of Γ related to the eigenvalues λ₁ ≥ λ₂ ≥ …. Without loss of generality we can assume that λ₁ > 0; otherwise Γ = 0 and the conclusion would be trivial.

Define P_{H₁}, the orthogonal projection onto the subspace spanned by φ₁, …, φ_d, and T_d as in (2).

We want to show that Γ_V = αΓ, i.e., that

$$\langle\alpha\boldsymbol{\Gamma}u,v\rangle_H=a_V(u,v)=\operatorname{Cov}\big(\langle u,V\rangle_H,\langle v,V\rangle_H\big)$$

for any u, v ∈ H, where we have explicitly written the space where the inner product is taken for clarity.

Let d be fixed. Using that V has an elliptical distribution, we get that T_d V ∼ ε_d(T_d μ, T_d Γ T_d*, ψ). On the other hand, since V has finite second moment, the same holds for T_d V, which implies that the covariance matrix of T_d V, denoted Σ_d, is proportional to T_d Γ T_d*. Therefore, there exists α_d such that Σ_d = α_d T_d Γ T_d*.

We begin by showing that α_d does not depend on d. It is easy to see that

$$T_d\boldsymbol{\Gamma}u=\sum_{i=1}^{d}\lambda_i\langle\phi_i,u\rangle_H\,e_i.$$

Therefore, using that T_d* e_i = φ_i for all i = 1, …, d, we obtain that

$$T_d\boldsymbol{\Gamma}T_d^{*}=\operatorname{diag}(\lambda_1,\dots,\lambda_d).\qquad\text{(A.1)}$$

Let d < d′ and let π: R^{d′} → R^d be the usual projection, π(x) = (x₁, …, x_d)ᵗ. The fact that T_d = π T_{d′} implies that the covariance matrix of T_d V is given by πΣ_{d′}πᵗ and so, Σ_d = α_{d′} diag(λ₁, …, λ_d), which together with (A.1), implies that α_d = α_{d′}.

Hence, there exists α such that for all d the covariance matrix of T_d V is equal to αT_dΓT_d*, implying that

$$\langle\alpha T_d\boldsymbol{\Gamma}T_d^{*}x,y\rangle_{\mathbb{R}^d}=\operatorname{Cov}\big(\langle x,T_dV\rangle_{\mathbb{R}^d},\langle y,T_dV\rangle_{\mathbb{R}^d}\big).$$

Using the definition of the adjoint of T_d, we have that ⟨αT_dΓT_d*x, y⟩ = ⟨αΓT_d*x, T_d*y⟩_H, while the right member of the equality can be written as

$$\operatorname{Cov}\big(\langle x,T_dV\rangle_{\mathbb{R}^d},\langle y,T_dV\rangle_{\mathbb{R}^d}\big)=\operatorname{Cov}\big(\langle T_d^{*}x,V\rangle_H,\langle T_d^{*}y,V\rangle_H\big).$$

Then, we have that for all d and all x, y ∈ R^d,

$$\langle\alpha\boldsymbol{\Gamma}T_d^{*}x,T_d^{*}y\rangle_H=\operatorname{Cov}\big(\langle T_d^{*}x,V\rangle_H,\langle T_d^{*}y,V\rangle_H\big)=a_V(T_d^{*}x,T_d^{*}y).$$

Given u, v ∈ H, define x = T_d u, y = T_d v, u_d = T_d*x and v_d = T_d*y. We have that u_d → u and v_d → v as d → ∞. Then, using that u_d = T_d*x and v_d = T_d*y, we get

$$\langle\alpha\boldsymbol{\Gamma}u_d,v_d\rangle_H=\langle\alpha\boldsymbol{\Gamma}T_d^{*}x,T_d^{*}y\rangle_H=a_V(T_d^{*}x,T_d^{*}y)=a_V(u_d,v_d).$$

The continuity of a_V entails that a_V(u_d, v_d) → a_V(u, v). On the other hand, using that Γ is a self–adjoint, compact operator, we obtain that ⟨αΓu_d, v_d⟩_H → ⟨αΓu, v⟩_H, which concludes the proof.

Proof of Theorem 2.2. The proof of a) follows immediately since Σ_{W₁} = T_dΓT_d* = diag(λ₁, …, λ_d) and λ_d > 0.

b) It is enough to show that E(f(V₂) | W₁) = f(E(V₂ | W₁)) for any linear and bounded f: H₂ → R. Let f be such an operator. Using that T = (T_d, f ∘ P_{H₂}): H → R^{d+1} is a bounded and linear operator, we get that TV is elliptical of parameters Tμ and

$$T\boldsymbol{\Gamma}T^{*}=\begin{pmatrix}\boldsymbol{\Sigma}_{W_1}&\operatorname{Cov}(W_1,f(V_2))\\ \operatorname{Cov}(W_1,f(V_2))^{\mathsf t}&f\boldsymbol{\Gamma}f^{*}\end{pmatrix}.$$

Using Theorem 2.1, we get that f(V₂) | W₁ also has an elliptical distribution with expectation given by the conditional mean formula therein. On the other hand, Cov(f(V₂), W₁) = fΓ_{V₂,W₁}, which implies that

$$E\big(f(V_2)\mid W_1\big)=f(\mu_2)+\operatorname{Cov}(f(V_2),W_1)\,\boldsymbol{\Sigma}_{W_1}^{-1}(W_1-\boldsymbol{\mu}_1)=f(\mu_2)+f\,\boldsymbol{\Gamma}_{V_2,W_1}\boldsymbol{\Sigma}_{W_1}^{-1}(W_1-\boldsymbol{\mu}_1)=f\big(\mu_2+\boldsymbol{\Gamma}_{V_2,W_1}\boldsymbol{\Sigma}_{W_1}^{-1}(W_1-\boldsymbol{\mu}_1)\big)$$

and so, we conclude the proof.

Proof of Lemma 3.2. a) Using that {y₁, …, y_k} is self-consistent for V, we get that E(V | V ∈ D_j) = y_j. Let us notice that, since U is a unitary operator, ‖V − y_j‖ ≤ ‖V − y_i‖ if and only if ‖V₂ − (ν + ρUy_j)‖ ≤ ‖V₂ − (ν + ρUy_i)‖. Therefore, D̃_j = ν + ρUD_j is the domain of attraction of ν + ρUy_j, which implies that V₂ ∈ D̃_j if and only if V ∈ D_j. Hence,

$$E(V_2\mid V_2\in\widetilde{D}_j)=E(\nu+\rho UV\mid V_2\in\widetilde{D}_j)=\nu+\rho U\,E(V\mid \nu+\rho UV\in\nu+\rho UD_j)=\nu+\rho U\,E(V\mid V\in D_j)=\nu+\rho U y_j.$$

b) Let {ξ₁, …, ξ_k} be any set of k points in H and denote by D̃₁, …, D̃_k their respective domains of attraction. We have to prove that E(d²(V₂, {ξ₁, …, ξ_k})) ≥ E(d²(V₂, {ν + ρUy₁, …, ν + ρUy_k})).

Let z_j ∈ H be such that ξ_j = ν + ρUz_j; then

$$E\big(d^2(V_2,\{\xi_1,\dots,\xi_k\})\big)=E\big(\min_{1\le j\le k}\|V_2-\xi_j\|^2\big)=E\big(\min_{1\le j\le k}\|\nu+\rho UV-\nu-\rho Uz_j\|^2\big)=\rho^2E\big(\min_{1\le j\le k}\|UV-Uz_j\|^2\big)=\rho^2E\big(\min_{1\le j\le k}\|V-z_j\|^2\big)$$

where the last equality follows from the fact that U is an isometry. On the other hand, using that {y₁, …, y_k} is a set of principal points of V, we get that E(min_{1≤j≤k} ‖V − z_j‖²) ≥ E(min_{1≤j≤k} ‖V − y_j‖²). Therefore, we have that

$$E\big(d^2(V_2,\{\xi_1,\dots,\xi_k\})\big)=\rho^2E\big(\min_{1\le j\le k}\|V-z_j\|^2\big)\ge\rho^2E\big(\min_{1\le j\le k}\|V-y_j\|^2\big)=E\big(\min_{1\le j\le k}\|V_2-\xi_{0,j}\|^2\big)$$

where ξ_{0,j} = ν + ρUy_j, which means that ξ_{0,1}, …, ξ_{0,k} are principal points of V₂. Besides, we also obtain that