Principal components analysis of regularly varying functions

12/07/2018
by   Piotr Kokoszka, et al.

The paper is concerned with asymptotic properties of the principal components analysis of functional data. The currently available results assume the existence of the fourth moment. We develop analogous results in a setting which does not require this assumption. Instead, we assume that the observed functions are regularly varying. We derive the asymptotic distribution of the sample covariance operator and of the sample functional principal components. We obtain a number of results on the convergence of moments and almost sure convergence. We apply the new theory to establish the consistency of the regression operator in a functional linear model.

1 Introduction

A fundamental technique of functional data analysis is to replace infinite dimensional curves by coefficients of their projections onto suitable, fixed or data-driven, systems, e.g. Bosq (2000), Ramsay and Silverman (2005), Horváth and Kokoszka (2012), Hsing and Eubank (2015). A finite number of these coefficients encode the shape of the curves and are amenable to various statistical procedures. The best systems are those that lead to low dimensional representations, and so provide the most efficient dimension reduction. Of these, the functional principal components (FPCs) have been most extensively used, with hundreds of papers dedicated to the various aspects of their theory and applications.

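If $X_1, X_2, \ldots, X_N$ are mean zero iid functions in $L^2$ with $E\|X\|^2 < \infty$, then

$$X_n(t) = \sum_{j=1}^{\infty} \xi_{nj} v_j(t), \qquad E\xi_{nj}^2 = \lambda_j. \tag{1}$$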

The FPCs $v_j$ and the eigenvalues $\lambda_j$ are, respectively, the eigenfunctions and the eigenvalues of the covariance operator $C$ defined by $C(x)(t) = \int c(t,s)\, x(s)\, ds$, where $c(t,s) = E[X(t)X(s)]$. As such, the $v_j$ are orthogonal. We assume they are normalized to unit norm. The $v_j$ form an optimal orthonormal basis for dimension reduction measured by the $L^2$ norm, see e.g. Theorem 11.4.1 in Kokoszka and Reimherr (2017).

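The $v_j$ and the $\lambda_j$ are estimated by $\hat v_j$ and $\hat\lambda_j$ defined by

$$\int \hat c(t,s)\, \hat v_j(s)\, ds = \hat\lambda_j \hat v_j(t), \tag{2}$$

where

$$\hat c(t,s) = \frac{1}{N} \sum_{n=1}^{N} X_n(t) X_n(s). \tag{3}$$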

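Like the $v_j$, the $\hat v_j$ are defined only up to a sign. Thus, strictly speaking, in the formulas that follow, the $\hat v_j$ would need to be replaced with $\hat c_j \hat v_j$, where $\hat c_j = \mathrm{sign}\langle \hat v_j, v_j \rangle$. As is customary, to lighten the notation, we assume that the orientations of $\hat v_j$ and $v_j$ match, i.e. $\hat c_j = 1$.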
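The estimation step can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes curves observed on a common equispaced grid, approximates integrals by Riemann sums, and uses plain power iteration to obtain only the leading sample eigenpair. The function names, the toy data, and the grid size are hypothetical choices made for illustration.

```python
import math
import random

def sample_cov_kernel(curves):
    """Discretized kernel c_hat(t_i, t_k) = N^{-1} sum_n X_n(t_i) X_n(t_k); curves assumed mean zero."""
    n, m = len(curves), len(curves[0])
    return [[sum(x[i] * x[k] for x in curves) / n for k in range(m)] for i in range(m)]

def apply_kernel(kernel, v, dt):
    """Riemann-sum approximation of the integral of c_hat(t, s) v(s) ds."""
    return [sum(c * vk for c, vk in zip(row, v)) * dt for row in kernel]

def leading_fpc(kernel, dt, n_iter=200):
    """Power iteration for the largest eigenvalue and its unit-norm eigenfunction."""
    m = len(kernel)
    v = [1.0 / math.sqrt(m * dt)] * m                  # constant start with unit L2 norm
    lam = 0.0
    for _ in range(n_iter):
        w = apply_kernel(kernel, v, dt)
        lam = sum(a * b for a, b in zip(w, v)) * dt    # Rayleigh quotient <C v, v>
        norm = math.sqrt(sum(a * a for a in w) * dt)   # L2 norm of C v
        v = [a / norm for a in w]
    return lam, v

# Toy data (hypothetical): two-component curves on a midpoint grid of [0, 1].
random.seed(0)
m, n = 20, 100
dt = 1.0 / m
grid = [(i + 0.5) * dt for i in range(m)]
curves = []
for _ in range(n):
    a, b = random.gauss(0.0, 2.0), random.gauss(0.0, 1.0)
    curves.append([a * math.sin(math.pi * t) + b * math.sin(2 * math.pi * t) for t in grid])

c_hat = sample_cov_kernel(curves)
lam1, v1 = leading_fpc(c_hat, dt)
```

Power iteration is used here only to keep the sketch dependency-free; in practice a symmetric eigensolver from a numerical library would return all eigenpairs at once. The sign of `v1` is arbitrary, in line with the remark that the estimated FPCs are defined only up to a sign.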

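Under the existence of the fourth moment,

$$E\|X\|^4 < \infty, \tag{4}$$

and assuming $\lambda_1 > \lambda_2 > \cdots$, it has been shown that for each $j \ge 1$,

$$\limsup_{N\to\infty} N E\|\hat v_j - v_j\|^2 < \infty, \qquad \limsup_{N\to\infty} N E(\hat\lambda_j - \lambda_j)^2 < \infty, \tag{5}$$

$$N^{1/2}(\hat\lambda_j - \lambda_j) \stackrel{d}{\to} N(0, \sigma_j^2), \tag{6}$$

$$N^{1/2}(\hat v_j - v_j) \stackrel{d}{\to} N(0, C_j), \tag{7}$$

for a suitably defined variance $\sigma_j^2$ and a covariance operator $C_j$. The above relations, especially (5), have been used to derive large sample justifications of inferential procedures based on the estimated FPCs $\hat v_j$. In most scenarios, one can show that replacing the $v_j$ by the $\hat v_j$ and the $\lambda_j$ by the $\hat\lambda_j$ is asymptotically negligible. Relations (5) were established by Dauxois et al. (1982) and extended to weakly dependent functional time series by Hörmann and Kokoszka (2010). Relations (6) and (7) follow from the results of Kokoszka and Reimherr (2013). In case of continuous functions satisfying regularity conditions, they follow from the results of Hall and Hosseini-Nasab (2006).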

A crucial assumption for the relations (5)-(7) to hold is the existence of the fourth moment, i.e. (4); the iid assumption can be relaxed in many ways. Nothing is at present known about the asymptotic properties of the FPCs and their eigenvalues if (4) does not hold. Our objective is to explore what can be said about the asymptotic behavior of $\hat C$, $\hat v_j$ and $\hat\lambda_j$ if (4) fails. We would thus like to consider the case of $E\|X\|^2 < \infty$ and $E\|X\|^4 = \infty$. Such an assumption is however too general. From the mid 1980s to the mid 1990s, similar questions were posed for scalar time series for which the fourth or even second moment does not exist. A number of results pertaining to the convergence of sample covariances and the periodogram have been derived under the assumption of regularly varying tails, e.g. Davis and Resnick (1985, 1986), among others; many further results are summarized in the monograph literature. The assumption of regular variation is natural because non-normal stable limits can be derived by establishing a connection to random variables in a stable domain of attraction, which is characterized by regular variation. This is the approach we take. We assume that the functions $X_n$ are regularly varying in the space $L^2$ with the index $\alpha \in (2,4)$, which implies $E\|X\|^2 < \infty$ and $E\|X\|^4 = \infty$. Suitable definitions and assumptions are presented in Section 2.

The paper is organized as follows. The remainder of the introduction provides a practical motivation for the theory we develop. It is not necessary for understanding the contribution of the paper, but, we think, it gives a good feel for what is being studied. The formal exposition begins in Section 2, in which notation and assumptions are specified. Section 3 is dedicated to the convergence of the sample covariance operator (the integral operator with kernel (3)). These results are then used in Section 4 to derive various convergence results for the sample FPCs and their eigenvalues. Section 5 shows how the results derived in previous sections can be used in the context of a functional regression model. Its objective is to illustrate the applicability of our theory in a well-known and extensively studied setting. It is hoped that it will motivate and guide applications to other problems of functional data analysis. All proofs which go beyond simple arguments are presented in the online material.

Figure 1: Five consecutive intraday return curves, Walmart stock. The raw returns are noisy grey lines. The smoother black lines are approximations $\sum_{j=1}^{3} \hat\xi_{nj} \hat v_j(t)$.

We conclude this introduction by presenting a specific data context. Denote by $P_n(t)$ the price of an asset at time $t$ of trading day $n$. For the assets we consider in our illustration, $t$ is time in minutes between 9:30 and 16:00 EST (NYSE opening times) rescaled to the unit interval $(0,1)$. The intraday return curve on day $n$ is defined by $X_n(t) = \log P_n(t) - \log P_n(0)$. In practice, $P_n(0)$ is the price after the first minute of trading. The curves $X_n$ show how the return accumulates over the trading day; examples of the $X_n$ are shown in Figure 1.
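As a small illustration of how such curves are built from price records, the sketch below computes one intraday return curve. It assumes the log-price definition $X_n(t) = \log P_n(t) - \log P_n(0)$; the function name and the price values are made up for the example.

```python
import math

def intraday_return_curve(prices):
    """Return X_n(t_i) = log P_n(t_i) - log P_n(t_0) for one trading day.

    prices: positive prices ordered in time; prices[0] plays the role of P_n(0).
    """
    log_p0 = math.log(prices[0])
    return [math.log(p) - log_p0 for p in prices]

# Hypothetical minute-by-minute prices for a single day.
prices = [50.00, 50.10, 49.95, 50.20, 50.05]
curve = intraday_return_curve(prices)
```

By construction the curve starts at zero, and its value at any time is the log return accumulated since the first recorded price of the day.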

Figure 2: The first three sample FPCs of intraday returns on Walmart stock.

The first three sample FPCs, $\hat v_1, \hat v_2, \hat v_3$, are shown in Figure 2. They are computed, using (2), from minute-by-minute Walmart returns from July 05, 2006 to Dec 30, 2011, a total of $N$ trading days. (This time interval is used for the other assets we consider.) The curves $\sum_{j=1}^{3} \hat\xi_{nj} \hat v_j$, with the scores $\hat\xi_{nj} = \int X_n(t) \hat v_j(t)\, dt$, visually approximate the curves $X_n$ well. One can thus expect that the $\hat v_j$ (with properly adjusted sign) are good estimators of the population FPCs $v_j$ in (1). Relations (5) and (7) show that this is indeed the case, if $E\|X\|^4 < \infty$. (The curves $X_n$ can be assumed to form a stationary time series in $L^2$.)

We will now argue that the assumption of the finite fourth moment is not realistic, so, with the currently available theory, it is not clear if the $\hat v_j$ are good estimators of the $v_j$. If $E\|X\|^4 < \infty$, then $E\xi_{nj}^4 < \infty$ for every $j$. Figure 3 shows the Hill plots of the sample scores $\hat\xi_{nj}$ for two stocks and for $j = 1, 2, 3$. Hill plots for other blue chip stocks look similar. These plots illustrate several properties. 1) It is reasonable to assume that the scores have Pareto tails. 2) The tail index $\alpha$ is smaller than 4, implying that the fourth moment does not exist. 3) It is reasonable to assume that the tail index does not depend on $j$ and is between 2 and 4. With such a motivation, we are now able to formalize in the next section the setting of this paper.

Figure 3: Hill plots (an estimate of $\alpha$ as a function of upper order statistics) for sample FPC scores for Walmart (left) and IBM (right). From top to bottom: levels $j = 1, 2, 3$.
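For readers who want to reproduce a Hill plot, here is a minimal sketch of the standard Hill estimator. It is not the code used for Figure 3: the simulated Pareto sample stands in for the absolute values of the FPC scores, and the function name and the choice of 500 upper order statistics are assumptions made for illustration.

```python
import math
import random

def hill_estimates(sample, k_max):
    """Hill estimates of the tail index for k = 1, ..., k_max upper order statistics.

    Uses absolute values, so the sample must not contain zeros among its
    k_max + 1 largest absolute values.
    """
    x = sorted((abs(v) for v in sample), reverse=True)
    logs = [math.log(v) for v in x[:k_max + 1]]
    estimates = []
    running = 0.0
    for k in range(1, k_max + 1):
        running += logs[k - 1]
        hill = running / k - logs[k]   # mean log-excess over the (k+1)-th largest value
        estimates.append(1.0 / hill)   # reciprocal estimates the tail index alpha
    return estimates

# Simulated stand-in for score data: exact Pareto tail with index 3.
random.seed(1)
alpha = 3.0
sample = [random.paretovariate(alpha) for _ in range(5000)]
alpha_hat = hill_estimates(sample, k_max=500)
```

Plotting `alpha_hat[k - 1]` against `k` gives the Hill plot; a roughly flat stretch of the curve suggests a value for the tail index.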

2 Preliminaries

The functions $X_n$ are assumed to be independent and identically distributed in $L^2$, with the same distribution as $X$, which is regularly varying with index $\alpha \in (2,4)$. By $L^2 = L^2(\mathcal{T})$, we denote the usual separable Hilbert space of square integrable functions on some compact subset $\mathcal{T}$ of a Euclidean space. In a typical FDA framework, $\mathcal{T} = [0,1]$. Regular variation in finite-dimensional spaces has been a topic of extensive research for decades, see e.g. Resnick (1987, 2006). We shall need the concept of regular variation of measures on infinite-dimensional function spaces. To this end, we start by recalling some terminology and fundamental facts about regularly varying functions.

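A measurable function $L$ is said to be slowly varying (at infinity) if, for all $\lambda > 0$,

$$\frac{L(\lambda u)}{L(u)} \to 1, \quad \text{as } u \to \infty.$$

Functions of the form $R(u) = u^{\rho} L(u)$ are said to be regularly varying with exponent $\rho \in \mathbb{R}$.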
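A standard example may help fix ideas. Taking $L(u) = \log u$ (our choice for illustration, not an example from the paper), the defining ratio can be computed explicitly:

```latex
\frac{L(\lambda u)}{L(u)}
  = \frac{\log \lambda + \log u}{\log u}
  = 1 + \frac{\log \lambda}{\log u}
  \longrightarrow 1, \qquad u \to \infty, \quad \lambda > 0 .
```

Hence $\log u$ is slowly varying, and $u^{-\alpha} \log u$ is regularly varying with exponent $-\alpha$. By contrast, a pure power $u^{\varepsilon}$ with $\varepsilon \neq 0$ is not slowly varying, since the ratio then equals $\lambda^{\varepsilon}$ for every $u$.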

The notion of regular variation extends to measures and provides an elegant and powerful framework for establishing limit theorems. It has since been extended to Banach and even metric spaces using the notion of $M_0$ convergence. Even though we will work only with Hilbert spaces, we review the theory in a more general context.

Consider a separable Banach space $\mathbb{B}$ and let $B_\epsilon := \{ z \in \mathbb{B} : \|z\| < \epsilon \}$ be the open ball of radius $\epsilon > 0$, centered at the origin. A Borel measure $\mu$ defined on $\mathbb{B}_0 := \mathbb{B} \setminus \{0\}$ is said to be boundedly finite if $\mu(A) < \infty$, for all Borel sets $A$ that are bounded away from $0$, that is, such that $A \cap B_\epsilon = \emptyset$, for some $\epsilon > 0$. Let $M_0(\mathbb{B})$ be the collection of all such measures. For $\mu_n, \mu \in M_0(\mathbb{B})$, we say that the $\mu_n$ converge to $\mu$ in the $M_0$ topology, if $\mu_n(A) \to \mu(A)$, for all bounded away from $0$, $\mu$-continuity Borel sets $A$, i.e., such that $\mu(\partial A) = 0$, where $\partial A$ denotes the boundary of $A$. The $M_0$ convergence can be metrized such that $M_0(\mathbb{B})$ becomes a complete separable metric space. The following result is known.

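Let $X$ be a random element in a separable Banach space $\mathbb{B}$ and $\alpha > 0$. The following three statements are equivalent:

  • (i) For some slowly varying function $L$,

    $$P(\|X\| > u) = u^{-\alpha} L(u) \tag{8}$$

    and

    $$\frac{P(u^{-1} X \in \cdot)}{P(\|X\| > u)} \stackrel{M_0}{\longrightarrow} \mu(\cdot), \quad u \to \infty, \tag{9}$$

    where $\mu$ is a non-null measure on the Borel $\sigma$-field of $\mathbb{B}_0 = \mathbb{B} \setminus \{0\}$.

  • (ii) There exists a probability measure $\Gamma$ on the unit sphere $\mathbb{S}$ in $\mathbb{B}$ such that, for every $t > 0$,

    $$\frac{P(\|X\| > tu,\ X/\|X\| \in \cdot)}{P(\|X\| > u)} \stackrel{w}{\longrightarrow} t^{-\alpha}\, \Gamma(\cdot), \quad u \to \infty.$$

  • (iii) Relation (8) holds, and for the same spectral measure $\Gamma$ in (ii),

    $$P\big( X/\|X\| \in \cdot \,\big|\, \|X\| > u \big) \stackrel{w}{\longrightarrow} \Gamma(\cdot), \quad u \to \infty.$$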

If any one of the equivalent conditions in Proposition 2 holds, we shall say that $X$ is regularly varying with index $\alpha$. The measures $\mu$ and $\Gamma$ will be referred to as exponent and angular measures of $X$, respectively.

The measure $\Gamma$ is sometimes called the spectral measure, but we will use the adjective “spectral” in the context of stable measures which appear in Section 3. It is important to distinguish the angular measure of a regularly varying random function and a spectral measure of a stable distribution, although they are related. We also note that we call $\alpha$ the tail index, and $-\alpha$ the tail exponent.

We will work under the following assumption. The random element $X$ in the separable Hilbert space $L^2$ has mean zero and is regularly varying with index $\alpha \in (2,4)$. The observations $X_1, X_2, \ldots, X_N$ are independent copies of $X$.

Assumption 2 is a coordinate free condition not related in any way to functional principal components. The next assumption relates the asymptotic behavior of the FPC scores to the assumed regular variation. It implies, in particular, that the expansion contains infinitely many terms, so that we study infinite dimensional objects. We will see in the proofs of Proposition 3 and Theorem 3 that under Assumption 2 the limit

exists and is finite. We impose the following assumption related to condition (9).

For every , . Assumption 2 postulates, intuitively, that the tail sums must have extreme probability tails comparable to that of .

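We now collect several useful facts that will be used in the following. The exponent measure $\mu$ satisfies

$$\mu(tA) = t^{-\alpha} \mu(A), \quad t > 0. \tag{10}$$

It admits the polar coordinate representation via the angular measure $\Gamma$. That is, if $x = r\theta$, where $r := \|x\|$ and $\theta := x/\|x\|$ for $x \neq 0$, we have

$$\mu(dx) = \alpha r^{-\alpha-1}\, dr\, \Gamma(d\theta). \tag{11}$$

This means that for every bounded measurable function $f$ that vanishes on a neighborhood of $0$, we have

$$\int f(x)\, \mu(dx) = \int_{\mathbb{S}} \int_0^\infty f(r\theta)\, \alpha r^{-\alpha-1}\, dr\, \Gamma(d\theta).$$

There exists a sequence $a_N \to \infty$ such that

$$N P(X \in a_N A) \to \mu(A), \tag{12}$$

for any Borel set $A$ in $\mathbb{B}_0$, bounded away from $0$, with $\mu(\partial A) = 0$. One can take, for example,

$$a_N = N^{1/\alpha} L_0(N), \tag{13}$$

with a slowly varying function $L_0$ satisfying $L_0^{-\alpha}(N)\, L(N^{1/\alpha} L_0(N)) \to 1$.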

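We will work with Hilbert-Schmidt operators. A linear operator $\Psi: L^2 \to L^2$ is Hilbert-Schmidt if $\sum_{j \ge 1} \|\Psi(e_j)\|^2 < \infty$, where $\{e_j\}$ is any orthonormal basis of $L^2$. Every Hilbert-Schmidt operator is bounded. The space of Hilbert-Schmidt operators will be denoted by $\mathcal{S}$. It is itself a separable Hilbert space with the inner product

$$\langle \Psi_1, \Psi_2 \rangle_{\mathcal{S}} = \sum_{j \ge 1} \langle \Psi_1(e_j), \Psi_2(e_j) \rangle.$$

If $\Psi$ is an integral operator defined by $\Psi(x)(t) = \int \psi(t,s)\, x(s)\, ds$, then $\|\Psi\|_{\mathcal{S}}^2 = \iint \psi^2(t,s)\, dt\, ds$.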

Relations (5) essentially follow from the bound

$$E\|\hat C - C\|_{\mathcal{S}}^2 \le N^{-1} E\|X\|^4,$$

where the subscript $\mathcal{S}$ indicates the Hilbert-Schmidt norm. Under Assumption 2 such a bound is useless because, by (8), $E\|X\|^4 = \infty$. In fact, one can show that under Assumption 2, $E\|\hat C - C\|_{\mathcal{S}}^2 = \infty$, so no other bound on $E\|\hat C - C\|_{\mathcal{S}}^2$ can be expected. The following Proposition 2 implies however that under Assumption 2 the population covariance operator $C$ is a Hilbert-Schmidt operator, and $\hat C \in \mathcal{S}$ with probability 1. This means that the space $\mathcal{S}$ does provide a convenient framework.

Suppose $X$ is a random element of $L^2$ with $E\|X\|^2 < \infty$ and $\hat C$ is the sample covariance operator based on $N$ iid copies of $X$. Then $C \in \mathcal{S}$ and $\hat C \in \mathcal{S}$ with probability 1.

Like all proofs, the proof of Proposition 2 is presented in the on-line material.
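The finiteness of the Hilbert-Schmidt norm of the sample covariance operator can be checked numerically on discretized curves. The sketch below is an illustration under our own assumptions, not part of the paper's proof: it uses the elementary bound $\|\hat C\|_{\mathcal{S}} \le N^{-1} \sum_n \|X_n\|^2$ (triangle inequality for a sum of rank-one operators), Riemann sums on an equispaced grid, and made-up heavy-tailed toy curves whose coefficients have tail index 3.

```python
import math
import random

def hs_norm(kernel, dt):
    """Hilbert-Schmidt norm of an integral operator: L2 norm of its kernel (Riemann sum)."""
    return math.sqrt(sum(v * v for row in kernel for v in row) * dt * dt)

def heavy():
    """Symmetric Pareto-type variable with tail index 3: finite variance, infinite fourth moment."""
    return random.choice((-1.0, 1.0)) * random.paretovariate(3.0)

random.seed(2)
m, n = 20, 50
dt = 1.0 / m
grid = [(i + 0.5) * dt for i in range(m)]

# Hypothetical curves: random heavy-tailed combinations of two fixed shapes.
curves = []
for _ in range(n):
    a, b = heavy(), heavy()
    curves.append([a * math.sin(math.pi * t) + b * math.cos(math.pi * t) for t in grid])

# Discretized sample covariance kernel and its Hilbert-Schmidt norm.
c_hat = [[sum(x[i] * x[k] for x in curves) / n for k in range(m)] for i in range(m)]
hs = hs_norm(c_hat, dt)

# Average squared L2 norm of the curves, which bounds hs from above.
bound = sum(sum(v * v for v in x) * dt for x in curves) / n
```

Since every curve has a finite norm, `bound` is finite for each sample, which is the elementary reason why the sample covariance operator lies in the space of Hilbert-Schmidt operators with probability 1.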

3 Limit distribution of $\hat C$

We will show that $\hat C - C$, suitably normalized by a regularly varying sequence, converges to an $\alpha/2$-stable Hilbert-Schmidt operator. Unless stated otherwise, all limits in the following are taken as $N \to \infty$.

Observe that for any ,

(14)

where . Since the are Hilbert–Schmidt operators, the last expression shows a connection between the asymptotic distribution of and convergence to a stable limit in the Hilbert space of Hilbert–Schmidt operators. We therefore restate below, as Theorem 3, Theorem 4.11 of which provides conditions for the stable domain of attraction in a separable Hilbert space. The Hilbert space we will consider in the following will be and the stability index will be . However, when stating the result of Kuelbs and Mandrekar, we will use a generic Hilbert space and the generic stability index . Recall that for a stable random element with index , there exists a spectral measure defined on the unit sphere

, such that the characteristic functional of

is given by

(15)

where

We denote the above representation by . The -stable random element is necessarily regularly varying with index . In fact, its angular measure is precisely the normalized spectral measure appearing in (15), i.e.,

Kuelbs and Mandrekar derived sufficient and necessary conditions on the underlying distribution under which

(16)

where the are iid copies of . They assume that the support of the distribution of , equivalently of the distribution of , spans the whole Hilbert space . In our context, we will need to work with whose distribution is not supported on the whole space. Denote by the smallest closed subspace which contains the support of the distribution of . Then is a Hilbert space itself with the inner product inherited from . Denote by an orthonormal basis of . We assume that this is an infinite basis because we consider infinite dimensional data. (The finite dimensional case has already been dealt with by .) Introduce the projections

Let , , be iid random elements in a separable Hilbert space with the same distribution as . Let be an orthonormal basis of . There exist normalizing constants and such that KM-stable holds if and only if

(17)

where for each , , and , and where

(18)

for all continuity sets , with .

If (16) holds, the sequence must satisfy

(19)

where

(20)

and is the Euler gamma function. Furthermore, the may be chosen as

(21)

The origin of the constant appearing in (19) can be understood as follows. Consider the simple scalar case Let be symmetric -stable with , where in this case, . Consider iid copies of and observe that by the -stability property

and hence (16) holds trivially with and .

On the other hand, by Proposition 1.2.15 on page 16 in , we have

This along with an integration by parts and an application of Karamata’s theorem yield , giving the constant in (19).
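The stability property used in the scalar case can be verified directly with characteristic functions. In the sketch below the symbols $p$ (stability index), $\sigma$ (scale) and $Z_i$ (iid copies) are our own notation, assumed only for this illustration. For a symmetric $p$-stable variable $Z$ with $E e^{i\theta Z} = \exp(-\sigma^p |\theta|^p)$,

```latex
E \exp\Big( i\theta\, N^{-1/p} \sum_{i=1}^{N} Z_i \Big)
  = \Big[ \exp\big( -\sigma^p\, |\theta N^{-1/p}|^p \big) \Big]^N
  = \Big[ \exp\big( -\sigma^p |\theta|^p / N \big) \Big]^N
  = \exp\big( -\sigma^p |\theta|^p \big) .
```

The normalized sum therefore has exactly the same distribution as a single copy, so convergence to the stable law holds trivially with norming sequence $N^{1/p}$ and no centering.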

Conditions (17) and (18) in Theorem 3 hold if and only if is regularly varying in with index and for each , , where

(22)

Our next objective is to show that if is a regularly varying element of a separable Hilbert space whose index is , then the operator is regularly varying with index , in the space of Hilbert–Schmidt operators. If , then is an element of defined by . It is easy to check that . If , we denote by the subset of defined as the set of operators of the form , with . Denote by the unit sphere in centered at the origin, and by such a sphere in .
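The norm identity for rank-one operators is a one-line computation. Assuming the usual definition $(x \otimes x)(y) = \langle x, y \rangle x$ and an orthonormal basis $\{e_j\}$ of the Hilbert space (notation chosen here for illustration), Parseval's identity gives

```latex
\| x \otimes x \|_{\mathcal{S}}^2
  = \sum_{j \ge 1} \| (x \otimes x)(e_j) \|^2
  = \sum_{j \ge 1} \langle x, e_j \rangle^2 \, \| x \|^2
  = \| x \|^4 ,
\qquad \text{so} \qquad
\| x \otimes x \|_{\mathcal{S}} = \| x \|^2 .
```

This explains heuristically why the index is halved: a tail of order $u^{-\alpha}$ for $\|X\|$ becomes a tail of order $u^{-\alpha/2}$ for $\|X \otimes X\|_{\mathcal{S}} = \|X\|^2$.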

The next result is valid for all $\alpha > 0$.

Suppose $X$ is a regularly varying element with index $\alpha > 0$ of a separable Hilbert space $H$. Then the operator $X \otimes X$ is a regularly varying element with index $\alpha/2$ of the space $\mathcal{S}$ of Hilbert-Schmidt operators.

The proof of Proposition 3 shows that the angular measure of is supported on the diagonal and that

The next result specifies the limit distribution of the sums of the based on the results derived so far.

Suppose Assumptions 2 and 2 hold. Then, there exist normalizing constants and operators such that

(23)

where is a stable random operator, , where the spectral measure is defined on the unit sphere . The normalizing constants may be chosen as follows

(24)

where $a_N$ is defined by (12).

The final result of this section specifies the asymptotic distribution of .

Suppose Assumptions 2 and 2 hold. Then,

(25)

where and are as in Theorem 3. ( for a slowly varying .)

If the are scalars, then the angular measure is concentrated on , with , in the notation of . Thus , and we recover the centering in Theorem 2.2 of . Relation limC explains the structure of this centering in a much more general context.

Theorem 3 readily leads to a strong law of large numbers which can be derived by an application of the following result, a consequence of Theorem 3.1 of .

Suppose are iid mean zero elements of a separable Hilbert space with , for some . Then,

Set . Then the are iid mean zero elements of which, by Proposition 3, satisfy , for any . Theorem 3 implies that for any , . Thus Theorem 3 leads to the following corollary.

Suppose Assumptions 2 and 2 hold. Then, for any , with probability 1.

4 Convergence of eigenfunctions and eigenvalues

We first formulate and prove a general result which allows us to derive the asymptotic distributions of the eigenfunctions and eigenvalues of an estimator of the covariance operator from the asymptotic distribution of the operator itself. The proof of this result is implicit in the proofs of the results of Section 2 of , which pertain to the asymptotic normality of the sample covariance operator if . The result and the technique of proof are however more general, and can be used in different contexts, so we state and prove it in detail.

Suppose is the covariance operator of a random function taking values in such that . Suppose is an estimator of which is a.s. symmetric, nonnegative–definite and Hilbert–Schmidt. Assume that for some random operator , and for some ,

In our setting, is specified in limC, and for some . More precisely,

We will work with the eigenfunctions and eigenvalues defined by

Assumption 4 implies that $\hat\lambda_j \ge 0$ and the $\hat v_j$ are orthogonal with probability 1. We assume that, like the $v_j$, the $\hat v_j$ have unit norms. To lighten the notation, we assume that $\mathrm{sign}\langle \hat v_j, v_j \rangle = 1$. This sign does not appear in any of our final results; it cancels in the proofs. We assume that both sets of eigenvalues are ordered in decreasing order. The next assumption is standard; it ensures that the population eigenspaces are one dimensional.

Set

Lemma 6.2 in online material shows that the series defining converges a.s. in .

Suppose Assumptions 4 and 4 hold. Then,

and

If is an –stable random operator in , then the are jointly –stable random functions in , and are jointly –stable random variables. This follows directly from the definition of a stable distribution, e.g. Section 6.2 of . Under Assumption 2, . Theorem 4 thus leads to the following corollary.

Suppose Assumptions 2, 2 and 4 hold. Then,

where the are jointly –stable in , and

where the are jointly –stable in .

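Corollary 4 implies the rates in probability $\|\hat v_j - v_j\| = O_P(r_N^{-1})$ and $|\hat\lambda_j - \lambda_j| = O_P(r_N^{-1})$, with $r_N = N^{1 - 2/\alpha} L(N)$ for a slowly varying $L$. This means that the distances between $\hat v_j$ and $\hat\lambda_j$ and the corresponding population parameters are approximately of the order $N^{2/\alpha - 1}$, i.e. are asymptotically larger than these distances in the case of $E\|X\|^4 < \infty$, which are of the order $N^{-1/2}$. Note that $2/\alpha - 1 \to -1/2$, as $\alpha \to 4$.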
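To get a feel for how much slower these rates are than the classical $N^{-1/2}$, one can tabulate the exponent $2/\alpha - 1$ for a few tail indexes. The sketch below ignores slowly varying factors and rests on the assumption that the approximate order is $N^{2/\alpha - 1}$; the function name and the chosen values of $\alpha$ are illustrative.

```python
def rate_exponent(alpha):
    """Exponent 2/alpha - 1 of the approximate order N**(2/alpha - 1), for 2 < alpha < 4."""
    if not 2.0 < alpha < 4.0:
        raise ValueError("alpha must lie in (2, 4)")
    return 2.0 / alpha - 1.0

# Exponents for a few tail indexes; the classical finite fourth moment case has exponent -0.5.
exponents = {alpha: rate_exponent(alpha) for alpha in (2.5, 3.0, 3.5, 3.9)}

# Approximate size of the estimation error for N = 1000 curves, ignoring slowly varying factors.
N = 1000
errors = {alpha: N ** e for alpha, e in exponents.items()}
```

For a tail index of 2.5 the exponent is only $-0.2$, so the error decays very slowly, while for an index of 3.9 the exponent is close to the classical value $-0.5$.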

It is often useful to have some bounds on moments, analogous to relations rate. Since the tails of and behave like , e.g. Section 6.7 of , , with an analogous relation for . We can thus expect convergence of moments of order . The following theorem specifies the corresponding results.

If Assumptions 2 and 2 hold, then for each , there is a slowly varying function such that

and for ,

If, in addition, Assumption 4 holds, then for ,

Several cruder bounds can be derived from Theorem 4. In applications, it is often convenient to take . Then . By Potter bounds, e.g. Proposition 2.6 (ii) in , for any there is a constant such that for . For each , we can choose so small that . This leads to the following corollary.

If Assumptions 2 and 2 hold, then for each , there are constant and such that

If, in addition, Assumption 4 holds, then for , . Corollary 4 implies that , and tend to zero, for any .

5 An application: functional linear regression

One of the most widely used tools of functional data analysis is the functional regression model, e.g. , , . Suppose are explanatory functions, are response functions, and assume that

(26)

where is the kernel of . The are mean zero iid functions in , and so are the error functions . Consequently, the are iid in . A question that has been investigated from many angles is how to consistently estimate the regression kernel . An estimator that has become popular following the work of can be constructed as follows.

The population version of flr is . Denote by the FPCs of and by those of , so that

If is independent of , then, with ,

with the series converging in