Analytic Basis Expansions for Functional Snippets

05/16/2019 ∙ by Zhenhua Lin, et al.

Estimation of mean and covariance functions is fundamental for functional data analysis. While this topic has been studied extensively in the literature, a key assumption is that there are enough data in the domain of interest to estimate both the mean and covariance functions. In this paper, we investigate estimation of the mean and covariance functions for functional snippets, in which observations from a subject are available only in an interval of length strictly (and often much) shorter than the length of the whole interval of interest. For such a sampling plan, no data are available for directly estimating the off-diagonal region of the covariance function. We tackle this challenge via an analytic basis representation of the covariance function. The proposed approach allows one to consistently estimate an infinite-rank covariance function from functional snippets. Moreover, the convergence rate is shown to be nearly parametric. This unusually fast convergence rate is attributed to the analytic assumption on the covariance function, which is imposed to identify the covariance function. We also illustrate the finite-sample performance via simulation studies and an application to spinal bone mineral density.


1 Introduction

Nowadays functional data are commonly encountered in practice, due to advances in modern science and technology that enhance the capability of data collection, storage and processing. Both unsupervised learning approaches, such as dimension reduction via functional principal component analysis (Rao, 1958; Dauxois et al., 1982; Hall and Hosseini-Nasab, 2009; Mas and Ruymgaart, 2015), and supervised learning approaches, such as functional regression (Cardot et al., 1999; Müller and Stadtmüller, 2005; Ferraty and Vieu, 2006; Hall and Horowitz, 2007; Müller and Yao, 2008; Kong et al., 2016), are well studied in the literature. For a comprehensive treatment of these subjects, we recommend the monographs Ramsay and Silverman (2005), Ferraty and Vieu (2006), Horváth and Kokoszka (2012), Hsing and Eubank (2015) and Kokoszka and Reimherr (2017), and the review papers Wang et al. (2016) and Aneiros et al. (2019).

Critical to the statistical analysis of such data is the estimation of the mean and covariance functions, since they are the foundation of the aforementioned unsupervised and supervised learning tasks. In reality, functions can only be recorded on a discrete set of points in their domain. Estimation of the mean and covariance functions in this context has been extensively studied by Rice and Silverman (1991), Cardot (2000), James et al. (2000), Cai and Yuan (2010), Cai and Yuan (2011), Yao et al. (2005), Li and Hsing (2010) and Zhang and Wang (2016), among many others. In addition to the discrete nature of observed functional data, subjects often stay in the study only for a subject-specific period that is much shorter than the span of the whole study. This does not cause serious problems for mean estimation but brings challenges to covariance estimation.

For illustration, without loss of generality, we assume that the domain of the functional data is the unit interval [0, 1] and each subject only stays in the study for a period of length δ < 1. Data with these characteristics are termed “functional snippets”, analogous to the longitudinal snippets analyzed in Dawson and Müller (2018). For such data, there is no information in the off-diagonal region {(s, t) : |s − t| > δ} of the covariance function C(s, t) = Cov{X(s), X(t)}, and therefore there is no local information available for estimating the covariance function in this region. Figure 2(a) illustrates such a situation for the bone mineral density data in Section 5.

The literature on statistical analysis of functional snippets is in its infancy. Delaigle and Hall (2016) proposed to approximate snippets by segments of Markov chains. Such a method is only valid at the discrete level, as shown by Descary and Panaretos (2019). More recently, to analyze functional snippets, Zhang and Chen (2018) and Descary and Panaretos (2019) used matrix completion techniques that innately work on a common grid of the domain. While their methods might be strained to deal with snippets recorded at random and irregular design points, which is often the actual data design, their theoretical analysis only covers the case of a regular design. In addition, Descary and Panaretos (2019) assumed a finite-rank condition on the covariance function. Such a condition essentially entails that functional data are of fixed and finite dimension, which does not conform with the common paradigm that regards functional data as genuinely infinite-dimensional processes. Other related works, such as Liebl (2013), Gellar et al. (2014), Goldberg et al. (2014), Kraus (2015), Gromenko et al. (2017), Kraus and Stefanucci (2019), Stefanucci et al. (2018), Mojirsheibani and Shaw (2018), Kneip and Liebl (2019) and Liebl and Rameseder (2019), although considering partially observed functional data, do not specifically address functional snippets as characterized above. For example, in Kraus (2015) and other works, information and design points for the off-diagonal region are still observable. In contrast, information for that region is completely missing in the context of functional snippets, which significantly elevates the difficulty of statistical analysis. Estimating the off-diagonal region is thus an extrapolation problem, which requires strong assumptions to overcome. Lin and Wang (2019) adopted a parametric assumption on the correlation function, while allowing the variance function to be estimated nonparametrically. However, the parametric assumption on the correlation can only be validated on the region where data are available.

In this paper, we propose to approach functional snippets from the perspective of basis expansion. The main idea is to represent the covariance function by basis functions composed from tensor products of analytic orthonormal functions defined on [0, 1]. Unlike local bases such as the B-splines studied in Rice and Wu (2001) and Ramsay and Silverman (2005), analytic bases are global, in the sense that they are independent of local information such as knots or design points and are completely determined by their values on a countably infinite subset of the interval. This feature of analytic bases allows information to pass from the diagonal region to the off-diagonal region along the basis functions. With this analytic assumption on the covariance function, the missing pieces of the covariance function can then be inferred from the data available in the diagonal region. In contrast, this is generally impossible for B-splines or other local bases. Moreover, the analytic assumption facilitates a much faster convergence rate of the proposed covariance estimation for sparsely observed functional snippets, as compared to conventional covariance estimation under less stringent smoothness assumptions. While this is not totally surprising, the result, to the best of our knowledge, is the first of its kind for sparsely observed functional data.

A distinct feature of the proposed method is its capacity to analyze genuine functional data of infinite dimension in the context of snippets. This is demonstrated under the umbrella of a certain family of analytic functions, the bounded sequential compact family (BSC family), defined in Section 3. We show that, when the covariance function comes from an analytic BSC family, the proposed method is able to estimate it at a nearly parametric rate based on snippets that are irregularly and/or sparsely observed. As an analytic BSC family can contain covariance functions of infinite rank, the proposed approach can process functional data of infinite dimensions. This separates our work from existing ones that only deal with functional snippets with a finite-rank covariance function or a parametric correlation function. As another contribution, we propose a penalized basis approach to estimate the mean function and thoroughly analyze its asymptotic properties over a class of bases. To the best of our knowledge, this has not been explored in the literature yet.

The rest of the paper is organized as follows. Section 2 details the proposed estimators for the mean function and covariance function. Their theoretical properties are investigated in Section 3, while their numerical performance is assessed in Section 4. The application to a real dataset is discussed in Section 5.

2 Methodology

Let X be a second-order stochastic process on a compact interval, which without loss of generality is taken to be [0, 1]. The mean and covariance functions of X are defined as μ(t) = E{X(t)} and C(s, t) = Cov{X(s), X(t)}, respectively. The observed functions X_1, …, X_n are statistically modeled as independent and identically distributed realizations of X. In practice, each realization X_i is only recorded at m_i subject-specific time points, with measurement errors. More precisely, for functional snippets, the observed data are the pairs (T_ij, Y_ij) for j = 1, …, m_i and i = 1, …, n, where

Y_ij = X_i(T_ij) + ε_ij, (1)

ε_ij is random noise with mean zero and unknown variance σ² > 0, and there is a δ ∈ (0, 1) for which |T_ij − T_il| ≤ δ for all 1 ≤ j, l ≤ m_i and i = 1, …, n. The focus of this paper is to estimate the mean and covariance functions of X using these data pairs.

2.1 Mean Function

Although functional snippets pose a challenge for covariance estimation, they usually do not obstruct mean estimation, since data for mean estimation are likely available across the whole domain of interest. In light of this observation, traditional methods such as local linear smoothing (Yao et al., 2005; Li and Hsing, 2010) can be employed. Below we adopt an analytic basis expansion approach, whose advantages are computational efficiency and adaptivity to the regularity of the underlying mean function μ; see also Section 3.2.

Let {φ_k : k ≥ 1} be a complete orthonormal basis of L²([0, 1]), the space of squared integrable functions defined on the interval [0, 1]. When μ ∈ L²([0, 1]), it can be represented by the following series in terms of the basis,

μ(t) = Σ_{k=1}^{∞} a_k φ_k(t),

where a_k = ∫₀¹ μ(t) φ_k(t) dt. In practice, one often approximates such a series by its first K leading terms, where K is a tuning parameter controlling the approximation quality. The coefficients a_1, …, a_K are then estimated from data by penalized least squares. Specifically, with the notation a = (a_1, …, a_K)ᵀ and Φ_K(t) = (φ_1(t), …, φ_K(t))ᵀ, the estimator of a is given by

â = argmin_{a} { Σ_{i=1}^{n} w_i Σ_{j=1}^{m_i} (Y_ij − aᵀΦ_K(T_ij))² + ρ Pen(aᵀΦ_K) }, (2)

and μ is estimated by μ̂(t) = âᵀΦ_K(t), where the weights w_i satisfy Σ_{i=1}^{n} m_i w_i = 1, Pen(·) represents the roughness penalty, and ρ ≥ 0 is a tuning parameter that provides a trade-off between fidelity to the data and smoothness of the estimate. There are two commonly used schemes for the weights, equal weight per observation (OBS) and equal weight per subject (SUBJ), for which the weights are w_i = 1/Σ_{l=1}^{n} m_l and w_i = 1/(n m_i), respectively. Alternative weight schemes can be found in Zhang and Wang (2016, 2018).

The penalty term in (2) is introduced to prevent excessive variability of the estimator when a large number of basis functions are required to adequately approximate μ and the sample size is not sufficiently large. In the asymptotic analysis of μ̂ in Section 3, we will see that this penalty term contributes little to the convergence rate of μ̂ when the tuning parameter ρ is not too large. In our study, the roughness penalty is given by

Pen(f) = ∫₀¹ {f″(t)}² dt,

where f″ denotes the second derivative of f. In matrix form, for f = aᵀΦ_K, it equals aᵀWa, where W is a K × K matrix with elements W_jk = ∫₀¹ φ_j″(t) φ_k″(t) dt. The tuning parameters K and ρ can be selected by a cross-validation procedure.
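As a concrete illustration of the penalized least-squares scheme above, the following sketch fits Fourier coefficients to pooled snippet observations under the OBS weighting. The function names (`fourier_basis`, `estimate_mean`) and all parameter values are our own illustrative choices, not code from the paper; for the Fourier basis the second-derivative penalty matrix is diagonal, which the sketch exploits.

```python
import numpy as np

def fourier_basis(t, K):
    """Evaluate the first K orthonormal Fourier basis functions on [0, 1]."""
    t = np.asarray(t, dtype=float)
    B = np.empty((t.size, K))
    B[:, 0] = 1.0
    for j in range(1, K):
        k = (j + 1) // 2
        trig = np.cos if j % 2 == 1 else np.sin
        B[:, j] = np.sqrt(2.0) * trig(2.0 * np.pi * k * t)
    return B

def penalty_matrix(K):
    """Second-derivative roughness penalty for the Fourier basis; by
    orthogonality it is diagonal, with entry (2*pi*k)^4 for frequency k."""
    W = np.zeros((K, K))
    for j in range(1, K):
        k = (j + 1) // 2
        W[j, j] = (2.0 * np.pi * k) ** 4
    return W

def estimate_mean(times, values, K=5, rho=1e-4):
    """Penalized least-squares estimate of the mean coefficients, pooling
    all observations with equal weight (the OBS scheme)."""
    t = np.concatenate(times)
    y = np.concatenate(values)
    B = fourier_basis(t, K)
    W = penalty_matrix(K)
    return np.linalg.solve(B.T @ B / t.size + rho * W, B.T @ y / t.size)
```

With noiseless data whose mean lies in the span of the first few basis functions, the estimate recovers the mean almost exactly once ρ is small.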

2.2 Covariance Function

Since functional snippets do not provide any direct information on the off-diagonal region, the only way to recover the covariance there is to infer it from the diagonal region. Mathematically, this requires the covariance function to be identifiable from its values in the diagonal region. It is well known that a function analytic in a connected and open domain is completely determined by its values at a suitably located, countably infinite set of points. A function is analytic if it can be locally represented by a convergent power series. In fact, according to Corollary 1.2.7 of Krantz and Parks (2002), one immediately has the following assertion.

Proposition 1.

If f and g are two analytic functions on a connected open domain D, and there is a countably infinite subset A ⊂ D with an accumulation point in D such that f = g on A, then f ≡ g on D.

Given the above observation, one is able to identify the whole covariance function based on its values in the diagonal region if it is analytic on its domain. For the purpose of identifiability, we shall assume analyticity of the covariance function in the sequel. Note that this assumption has also been adopted by Descary and Panaretos (2019), who provided a nice example illustrating that the space of infinitely differentiable functions is too large to ensure identifiability of the covariance function based on data available in the diagonal region.

With the assumed analyticity, it is now possible to infer the off-diagonal region from the information contained in the raw covariances available in the diagonal region. To leverage such information, both Zhang and Chen (2018) and Descary and Panaretos (2019) used matrix completion techniques. However, the method of Descary and Panaretos (2019) requires the rank of the covariance to be finite to ensure identifiability based on discretized observations. Zhang and Chen (2018) allow the rank to increase slowly with the sample size, but their theoretical derivation requires a less transparent condition that involves the rank; see their Assumption 2.

We take a different approach to utilize the available data in the diagonal region, without any assumption on the rank. Specifically, we propose to transport information from the diagonal region to the off-diagonal region through the basis functions φ_1, φ_2, …, by approximating C(s, t) with

C(s, t) ≈ Φ_K(s)ᵀ B Φ_K(t) = Σ_{j=1}^{K} Σ_{k=1}^{K} b_jk φ_j(s) φ_k(t), (3)

where Φ_K(t) = (φ_1(t), …, φ_K(t))ᵀ, B is the K × K matrix of coefficients b_jk, and K is an integer. If we choose an analytic basis, then the values of the basis functions in the diagonal region completely determine their values in the off-diagonal region. When such a representation of the covariance function is adopted and the unknown coefficients b_jk are estimated from data, the information contained in the estimated coefficients extends from the diagonal region to the off-diagonal region through analyticity.

For the purpose of generality, we consider a family 𝒞 of analytic covariance functions defined on [0, 1]². The covariance function C is then assumed to reside in 𝒞. Examples of 𝒞 include the collection of analytic covariance functions of rank no more than a fixed r, and the collection of all analytic covariance functions. The former, considered by Descary and Panaretos (2019), is too restrictive to model functional data of infinite dimension, while the latter is too large to provide a theoretical guarantee for the estimated covariance function. In Section 3 we will explore suitable families for 𝒞.

To estimate the coefficients b_jk from data, we again adopt the idea of penalized least squares, where the squared loss of a given candidate function Γ is measured by the sum of weighted squared errors

ℒ(Γ) = Σ_{i=1}^{n} v_i Σ_{1 ≤ j ≠ l ≤ m_i} { (Y_ij − μ(T_ij))(Y_il − μ(T_il)) − Γ(T_ij, T_il) }²,

where the v_i are weights satisfying Σ_{i=1}^{n} m_i(m_i − 1) v_i = 1, while the roughness penalty is given by

Pen₂(Γ) = ∫₀¹ ∫₀¹ { (∂²Γ/∂s²)² + 2 (∂²Γ/∂s∂t)² + (∂²Γ/∂t²)² } ds dt.

Then, one has

Pen₂(Γ_B) = tr(Bᵀ W B) + tr(B W Bᵀ) + 2 tr(Bᵀ M B M) (4)

for Γ_B as in (3), i.e., Γ_B(s, t) = Φ_K(s)ᵀ B Φ_K(t), where tr denotes the matrix trace, and W and M are K × K matrices with elements W_jk = ∫₀¹ φ_j″(t) φ_k″(t) dt and M_jk = ∫₀¹ φ_j′(t) φ_k′(t) dt, respectively. The estimator of C is then taken as Ĉ(s, t) = Φ_K(s)ᵀ B̂ Φ_K(t) with

B̂ = argmin_B { ℒ(Γ_B) + λ Pen₂(Γ_B) }, (5)

where λ ≥ 0 is a tuning parameter that provides a trade-off between fidelity to the data and smoothness of the estimate.

Similar to (2), the penalty term in (5) does not contribute to the convergence rate of Ĉ when the tuning parameter λ is not too large. In practice, the value of λ can be chosen via cross-validation. For the choice of the weights v_i, Zhang and Wang (2016) discussed several weight schemes, including the OBS scheme v_i = 1/Σ_{l=1}^{n} m_l(m_l − 1) and the SUBJ scheme v_i = 1/{n m_i(m_i − 1)}. An optimal weighing scheme was proposed in Zhang and Wang (2018); we refer to that paper for further details.
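The covariance step can be sketched in the same spirit: raw covariances from within-subject pairs are regressed onto tensor products of the basis functions, and the fitted coefficient matrix extends the fit into the unobserved off-diagonal region through the basis. A plain ridge penalty is used here in place of the paper's derivative-based roughness penalty, and all names and parameter values are illustrative.

```python
import numpy as np

def _fourier_basis(t, K):
    """First K orthonormal Fourier basis functions on [0, 1]."""
    t = np.asarray(t, dtype=float)
    B = np.empty((t.size, K))
    B[:, 0] = 1.0
    for j in range(1, K):
        k = (j + 1) // 2
        trig = np.cos if j % 2 == 1 else np.sin
        B[:, j] = np.sqrt(2.0) * trig(2.0 * np.pi * k * t)
    return B

def estimate_cov(times, values, mean_fn, K=3, lam=1e-4):
    """Least-squares fit of the coefficient matrix in
    C(s,t) ~ sum_{j,k} b_jk phi_j(s) phi_k(t) to the raw covariances from
    within-subject pairs (j != l); a simple ridge penalty stands in for
    the paper's roughness penalty."""
    rows, targets = [], []
    for t, y in zip(times, values):
        r = np.asarray(y) - mean_fn(t)            # centred observations
        Phi = _fourier_basis(t, K)
        m = len(t)
        for j in range(m):
            for l in range(m):
                if j != l:                        # off-diagonal pairs avoid the noise variance
                    rows.append(np.outer(Phi[j], Phi[l]).ravel())
                    targets.append(r[j] * r[l])
    X = np.array(rows)
    z = np.array(targets)
    vecB = np.linalg.solve(X.T @ X / len(z) + lam * np.eye(K * K),
                           X.T @ z / len(z))
    B = vecB.reshape(K, K)
    return (B + B.T) / 2.0                        # symmetrise
```

Because the fitted surface is a finite analytic expansion, evaluating Φ_K(s)ᵀ B Φ_K(t) at |s − t| > δ extrapolates into the region where no pairs were observed.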

3 Theory

As functional snippets are often sparsely recorded, in the sense that m_i ≤ m₀ for all i and some fixed m₀, in this section we focus on theoretical analysis tailored to this scenario. For simplicity, we assume that the number of observations per trajectory is equal, i.e., m_i = m for all i. Note that under this assumption, the SUBJ and OBS schemes coincide. The results for a general number of observations and general weight schemes can be derived in a similar fashion. We start with a discussion on the choice of basis functions and then proceed to study the convergence rates of the estimated mean and covariance functions.

3.1 Analytic Basis

While any complete orthonormal basis can be used for the mean estimator in (2), an analytic basis is required for the covariance estimator in (5). For a clean presentation, we exclusively consider analytic bases that work for both (2) and (5). In this paper, a basis {φ_k : k ≥ 1} is called an analytic basis if its basis functions are all analytic and their derivatives grow at most polynomially: there exist constants c > 0 and β ≥ 1 such that ‖φ_k^{(ν)}‖_∞ ≤ c k^{β(ν+1)} for ν = 0, 1, 2 and all k ≥ 1. Here, ‖f‖_∞ denotes the supremum norm of f, defined by ‖f‖_∞ = sup_{t∈[0,1]} |f(t)|, and f^{(ν)} represents the νth derivative of f. Throughout this paper, we assume that the basis {φ_k} is analytic in this sense.

Different bases lead to different convergence rates of the approximations to μ and C. For the mean function μ, when using the first K basis functions, the approximation error is quantified by

‖ μ − Σ_{k=1}^{K} a_k φ_k ‖₂,

where we recall that a_k = ∫₀¹ μ(t) φ_k(t) dt. The convergence rate of this error as K → ∞ signifies the approximation power of the basis for μ. Similarly, the approximation error for C is measured by

‖ C − Σ_{j=1}^{K} Σ_{k=1}^{K} b_jk φ_j ⊗ φ_k ‖₂,

where (φ_j ⊗ φ_k)(s, t) = φ_j(s) φ_k(t) and the L² norm of a bivariate function Γ is defined by ‖Γ‖₂ = { ∫₀¹ ∫₀¹ Γ²(s, t) ds dt }^{1/2}. Below we discuss two examples of bases.

Example 1 (Fourier basis).

The Fourier basis functions, defined by φ₁(t) = 1, φ_{2k}(t) = √2 cos(2πkt) and φ_{2k+1}(t) = √2 sin(2πkt) for k ≥ 1, constitute a complete orthonormal basis of L²([0, 1]). It is also an analytic basis in the sense of Section 3.1. When μ is periodic on [0, 1], if its νth derivative exists and is square integrable, then, according to Theorem 2.4.1 of Olson (2017), the approximation error for μ is of order K^{−ν}. If μ is analytic, then according to Proposition 5.4.1 of Krantz and Parks (2002), the error decays geometrically in K. Similarly, if C is analytic and periodic, its approximation error also decays geometrically in K. When μ or C is not a periodic function, a technique called Fourier extension, briefly described in Appendix A, can be adopted to yield the same rates (Adcock et al., 2014). This technique is well studied in computational physics (Boyd, 2002) and numerical analysis (Huybrechs, 2010) as a tool to overcome the so-called Gibbs phenomenon (Zygmund, 2003), but seems not yet well explored in statistics. In Section 4, we numerically illustrate the application of this technique.
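The geometric decay of the Fourier approximation error for an analytic periodic function can be checked numerically. The helper below is a hypothetical illustration of ours: it computes the K-term L² error via Parseval's identity, with inner products evaluated by the rectangle rule on a periodic grid (spectrally accurate for periodic integrands).

```python
import numpy as np

def fourier_l2_error(f, K, n_grid=4096):
    """L2([0,1]) error of the K-term Fourier approximation of f.
    Coefficients are computed by the rectangle rule on a periodic grid,
    and the error follows from Parseval's identity."""
    t = np.arange(n_grid) / n_grid
    ft = f(t)
    err2 = np.mean(ft ** 2)                 # total energy ||f||^2
    for j in range(K):
        if j == 0:
            phi = np.ones(n_grid)
        else:
            k = (j + 1) // 2
            trig = np.cos if j % 2 == 1 else np.sin
            phi = np.sqrt(2.0) * trig(2.0 * np.pi * k * t)
        a = np.mean(ft * phi)               # <f, phi_j>
        err2 -= a ** 2                      # subtract captured energy
    return np.sqrt(max(err2, 0.0))
```

For the analytic periodic function exp(sin(2πt)), the error drops by well over a factor of five every few added terms, consistent with geometric decay.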

Example 2 (Legendre polynomials).

The canonical Legendre polynomial of degree k is defined on [−1, 1] by Rodrigues' formula

P_k(x) = (2^k k!)^{−1} dᵏ/dxᵏ (x² − 1)ᵏ.

These polynomials are orthogonal in L²([−1, 1]). By a change of variable and normalization, namely φ_k(t) = √(2k + 1) P_k(2t − 1) for k = 0, 1, …, they can be turned into an orthonormal basis of L²([0, 1]). The Legendre polynomials have numerous applications in approximation theory and numerical integration; see Wang and Xiang (2011) and references therein. One can show that the Legendre basis is an analytic basis in the sense of Section 3.1. According to Theorem 2.3 of Canuto and Quarteroni (1982), the approximation error for μ is of order K^{−ν} when the νth derivative of μ exists and is square integrable. When μ is indeed analytic, then based on Proposition 2.5 of Wang and Xiang (2011), the error decays geometrically in K. Similarly, if C is analytic, its approximation error also decays geometrically in K.
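A minimal construction of the orthonormal shifted Legendre basis, assuming NumPy's `numpy.polynomial.legendre` module is available; the normalization √(2k + 1) makes the functions orthonormal on [0, 1], which the accompanying check verifies numerically with trapezoidal quadrature.

```python
import numpy as np
from numpy.polynomial import legendre

def shifted_legendre_basis(t, K):
    """First K orthonormal shifted Legendre functions on [0, 1]:
    phi_k(t) = sqrt(2k + 1) * P_k(2t - 1), k = 0, ..., K-1."""
    t = np.asarray(t, dtype=float)
    x = 2.0 * t - 1.0                  # map [0, 1] -> [-1, 1]
    B = np.empty((t.size, K))
    for k in range(K):
        coef = np.zeros(k + 1)
        coef[k] = 1.0                  # select the degree-k polynomial P_k
        B[:, k] = np.sqrt(2 * k + 1) * legendre.legval(x, coef)
    return B
```

The Gram matrix of these functions on a fine grid is numerically the identity, confirming orthonormality.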

3.2 Mean function

For functional snippets, we shall assume that the observations from a subject scatter randomly in a subject-specific time interval whose length is δ and whose middle point is called the reference time in this paper. We further assume that the reference times O_i of the subjects are independently and identically distributed in the interval [δ/2, 1 − δ/2], and the observed time points T_ij, conditional on O_i, are independently and identically distributed in the interval [O_i − δ/2, O_i + δ/2].

To study the properties of the estimator μ̂, we make the following assumptions.

A.1

There exist constants 0 < c₁ ≤ c₂ < ∞ such that the density f_O of the reference time satisfies c₁ ≤ f_O(x) ≤ c₂ for all x ∈ [δ/2, 1 − δ/2], and there exist constants 0 < c₃ ≤ c₄ < ∞ such that the conditional density f_{T|O}(t | x) of the observed time points satisfies c₃ ≤ f_{T|O}(t | x) ≤ c₄ for any given reference time x and all t ∈ [x − δ/2, x + δ/2].

A.2

E{ sup_{t∈[0,1]} X²(t) } ≤ c₀ for some constant c₀ > 0.

A.3

K → ∞ and ρ → 0 as n → ∞, with ρ small enough that the penalty term in (2) is asymptotically negligible.

Assumption A.1 requires the density of the reference time and the conditional densities of the time points to be bounded away from zero and infinity. This also guarantees that the marginal probability density of the time points is bounded away from zero and infinity. In the sequel, we use f_T to denote this marginal density.

Theorem 1.

If {φ_k} is an analytic basis as defined in Section 3.1, then under conditions (A.1)–(A.3) we have

(6)

We first note that, under condition A.3 on the tuning parameter ρ, the penalty does not impact the asymptotic rate of μ̂. We also observe that in (6) the first term specifies the estimation error incurred by a finite sample, while the second is the deterministic approximation error incurred by using only the first K basis functions. The latter term depends on the smoothness of μ: intuitively, it is easier to approximate smooth functions with basis functions, and for a fixed number of basis functions, the smoother the function, the smaller the approximation error. When μ is ν times differentiable, the approximation error is of order K^{−ν}, and this leads to the following convergence rate, using either the Fourier basis or the Legendre basis.

Corollary 1.

Suppose exists and satisfies for some . Assume conditions (A.1)–(A.3) hold. If is the Fourier basis, then with the choice . If is the Legendre basis, then with the choice .

As discussed in Examples 1 and 2, for an analytic function the approximation error can decay at an exponential rate in K. This leads to the following remarkable, nearly parametric convergence rate.

Corollary 2.

Suppose that is analytic, conditions (A.1)–(A.3) hold, and . If is the Fourier basis, then . If is the Legendre basis, then .

The above corollaries show that the proposed analytic basis expansion approach automatically adapts to the smoothness of μ. This contrasts with the local polynomial smoothing method and the B-spline basis approach, for which the convergence rate is limited by the order of the polynomials or B-spline basis functions used in the estimation, even when μ has a higher order of smoothness. In practice, it is not easy to determine the right order for these methods, since both the mean function and its smoothness are unknown. In particular, when μ is analytic, neither of these traditional methods can fully leverage its smoothness. The feature of automatic adaptivity therefore makes the analytic basis expansion approach an attractive alternative to classical smoothing methods.

3.3 Covariance function

In Section 2 we assumed that C resides in a family 𝒞. We also mentioned that the family of analytic covariance functions of fixed finite rank is too small to model genuine functional data, which are conceptually infinite-dimensional. Below we show that the proposed approach based on analytic basis expansion can recover infinite-rank covariance functions from functional snippets. To this end, we consider larger families obtained via sequential compactness, as follows.

We first recall that a topological space is sequentially compact if every sequence has a subsequence converging to a point in the space. Consider the space of all real-valued functions defined on [0, 1]², endowed with the product topology. In this topology, a sequence of functions converges to a limit if and only if it converges pointwise. In this paper, a subset 𝒢 of this space is called a bounded sequentially compact family (BSC family) if every sequence in 𝒢 that is uniformly bounded in norm, i.e., sup_n ‖Γ_n‖ ≤ c for some c > 0, has a subsequence converging to a limit in 𝒢 in the product topology. Note that a sequentially compact subset is a BSC family.

In our development, we have assumed that C is analytic, i.e., C ∈ 𝒜, where 𝒜 denotes the collection of analytic functions defined on [0, 1]². To incorporate the analyticity, we consider analytic BSC families, which are BSC families that are also subsets of 𝒜. We first observe that the finite-rank family considered in Descary and Panaretos (2019) is a BSC family. The following example, exhibiting a BSC family that contains covariance functions of infinite rank, demonstrates that our approach covers infinite-rank covariance functions.

Example 3 (Functions with a commonly bounded Lipschitz constant).

Let c₀, L₀ > 0 be fixed but potentially arbitrarily large constants. Let 𝒢 be the collection of analytic functions Γ defined on [0, 1]² such that if Γ ∈ 𝒢 then ‖Γ‖_∞ ≤ c₀ and the Lipschitz constant of Γ is no larger than L₀, where the Lipschitz constant of Γ is defined as sup_{x ≠ y} |Γ(x) − Γ(y)| / ‖x − y‖. We claim that 𝒢 is a BSC family. To see this, we note that 𝒢 is locally equicontinuous, and the set {Γ(x) : Γ ∈ 𝒢} is bounded for every x. The claim then follows from the Arzelà–Ascoli theorem in Chapter 7 of Remmert (1997). This family clearly contains countless covariance functions of infinite rank.

To derive the asymptotic properties of the estimated covariance function, we shall assume the following.

B.1

The covariance function C belongs to an analytic BSC family 𝒢 of covariance functions.

B.2

The random function X satisfies E{ sup_{t∈[0,1]} X⁴(t) } < ∞.

B.3

K → ∞ and λ → 0 as n → ∞.

Since an analytic BSC family, such as the one in Example 3, may contain covariance functions of infinite rank, the theory developed below applies to functional snippets of infinite dimension. To avoid entanglement with the error from mean function estimation, we shall assume that μ is known in the following discussion; the case of unknown μ can also be covered but requires a much more involved presentation and tedious technical details, and thus is not pursued here.

Theorem 2.

If {φ_k} is an analytic basis as defined in Section 3.1, then under assumptions A.1 and B.1–B.3 we have

(7)

Note that, under condition B.3, the tuning parameter λ does not impact the asymptotic rate of Ĉ. As in the case of the mean function, the rate in (7) contains two components: the estimation error stemming from the finiteness of the sample, and the approximation bias attributed to using only finitely many basis functions in the estimation. We note that the approximation bias decays exponentially in K for both the Fourier basis and the Legendre basis. In particular, using analytic bases results in the following nearly parametric convergence rate.

Corollary 3.

Suppose assumptions A.1 and B.1–B.3 hold and . If is the Fourier basis and is periodic, then . If is the Legendre basis, then .

4 Simulation Studies

We now illustrate the numerical performance of the proposed approach using the Fourier basis. As mentioned in Example 1, with the Fourier extension technique to deal with nonperiodic functions, this basis can approximate a ν-times-differentiable function at the rate K^{−ν}, and an analytic function at a geometric rate, when the first K basis functions are used. Moreover, it is an analytic basis in the sense of Section 3.1 and thus enjoys the favorable convergence rates established in Corollaries 1, 2 and 3. As the Fourier extension technique is not well studied in statistics, below we also demonstrate it numerically.

For the mean function, we consider two scenarios, one periodic and one nonperiodic. For the covariance function, we take two functions from the class exhibited in Example 3. Specifically, we consider a periodic covariance function, and a nonperiodic covariance function determined by a Matérn correlation function and a variance function σ²(·), i.e.,

C(s, t) = σ(s) σ(t) · {2^{1−ν} / Γ(ν)} { √(2ν) |s − t| / ℓ }^{ν} K_ν{ √(2ν) |s − t| / ℓ },

where K_ν is the modified Bessel function of the second kind of order ν. In the evaluation of the performance for the mean function, the covariance function is held fixed, while the mean function is held fixed in the evaluation of the covariance function. This strategy prevents bias from the covariance estimation from influencing the estimation of the mean function, and vice versa.

The estimation quality is measured by the empirical mean integrated squared error (MISE) based on Q independent simulation replicates. For the mean estimator μ̂, the MISE is defined by

Q^{−1} Σ_{q=1}^{Q} ∫₀¹ {μ̂_q(t) − μ(t)}² dt,

and for the covariance estimator Ĉ, it is defined by

Q^{−1} Σ_{q=1}^{Q} ∫₀¹ ∫₀¹ {Ĉ_q(s, t) − C(s, t)}² ds dt,

where μ̂_q and Ĉ_q are the estimators in the qth simulation replicate. The tuning parameters K, ρ, λ, and the extension margin (see Appendix A) are selected by a five-fold cross-validation procedure.

In all replicates, the reference times are sampled from a uniform distribution on [δ/2, 1 − δ/2]. The numbers of observations m_i are independent and identically distributed. The measurement noises ε_ij are i.i.d. sampled from a normal distribution with mean zero and variance σ², where the noise level is set to fix the signal-to-noise ratio. We consider three sample sizes and two values of δ, representing short snippets and long snippets, respectively.
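The sampling design described above can be mimicked as follows. Since the exact Matérn parameters used in the paper are not recoverable here, this sketch uses the Matérn correlation with ν = 1/2 (the exponential kernel, a special case that avoids the Bessel function), and all parameter values and names are illustrative.

```python
import numpy as np

def matern_half(s, t, scale=0.3):
    """Matern correlation with nu = 1/2, i.e. exp(-|s - t| / scale);
    a stand-in for the general Matern family, whose closed form
    involves the modified Bessel function of the second kind."""
    return np.exp(-np.abs(s - t) / scale)

def simulate_snippets(n=100, m=4, delta=0.25, sigma_noise=0.1, seed=0):
    """Generate functional snippets: each subject is observed at m points
    inside a window of length delta centred at a uniform reference time,
    with additive Gaussian measurement error."""
    rng = np.random.default_rng(seed)
    times, values = [], []
    for _ in range(n):
        ref = rng.uniform(delta / 2, 1 - delta / 2)              # reference time
        t = np.sort(rng.uniform(ref - delta / 2, ref + delta / 2, m))
        K = matern_half(t[:, None], t[None, :])                  # within-snippet covariance
        x = rng.multivariate_normal(np.zeros(m), K)              # latent trajectory values
        y = x + sigma_noise * rng.normal(size=m)                 # add measurement noise
        times.append(t)
        values.append(y)
    return times, values
```

By construction, every snippet spans at most δ, so no within-subject pair of time points falls in the off-diagonal region.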

The results are summarized in Tables 1 and 2 for the mean and covariance functions, respectively. As expected, in all settings the performance of the estimators improves as n or δ becomes larger. We also observe that if the function to be estimated is periodic, the performance seems marginally better without Fourier extension. However, if the function is nonperiodic, the estimators with Fourier extension considerably outperform those without the extension, especially for the mean function or when the sample size is large. This demonstrates that Fourier extension is a rather effective technique that complements the Fourier basis for nonparametric smoothing, and it might deserve further investigation within statistical methodology.

FE 7.28 (5.29) 3.26 (1.79) 1.69 (0.88)
4.94 (3.23) 2.30 (1.31) 1.13 (0.83)
6.75 (5.25) 3.39 (2.35) 1.85 (1.23)
4.63 (3.43) 2.48 (1.69) 1.19 (0.94)
NFE 6.86 (4.94) 2.93 (1.69) 1.49 (0.81)
4.63 (3.16) 2.12 (1.19) 1.07 (0.75)
12.43 (4.77) 8.15 (2.43) 6.02 (1.31)
9.81 (3.28) 7.10 (1.91) 5.51 (1.25)
Table 1:

MISE of the proposed estimator for the mean function. The MISE values and their Monte Carlo standard errors (in parentheses) are scaled by 100 for a clean presentation. FE refers to the Fourier basis with Fourier extension, while NFE refers to the basis without the extension.

FE 6.01 (1.93) 5.28 (1.33) 4.75 (0.89)
4.07 (3.24) 2.62 (1.36) 2.03 (0.74)
2.48 (1.80) 1.39 (0.97) 1.11 (0.72)
1.74 (1.56) 0.88 (0.50) 0.54 (0.34)
NFE 6.00 (2.02) 5.24 (1.44) 4.68 (0.97)
3.90 (3.43) 2.47 (1.44) 1.86 (0.68)
2.97 (2.12) 1.90 (0.94) 1.62 (0.77)
2.22 (1.79) 1.37 (0.68) 1.03 (0.37)
Table 2: MISE of the proposed estimator for the covariance function. The MISE values and their Monte Carlo standard errors (in parentheses) are scaled by 10 for a clean presentation. FE refers to the Fourier basis with Fourier extension, while NFE refers to the basis without the extension.

5 Application: Spinal Bone Mineral Density

In the study of Bachrach et al. (1999), 423 individuals, age 8 to 27, were examined for their longitudinal spinal bone mineral density. The bone density of each individual was irregularly recorded in four consecutive years, at most once for each year. The data for each individual then lead to a functional snippet spanning at most 4 years. In our study, individuals who have only one measurement were excluded, since they do not carry information for the covariance structure. This results in a total of 280 individuals who have at least two measurements and whose ages range from 8.8 to 26.2.

We are interested in the mean and covariance structure of the mineral density. Figure 2(a) depicts the empirical design of the covariance function, underscoring the nature of these data as a collection of snippets: there are no data available to directly infer the off-diagonal region of the covariance structure. We also note that the design time points are irregular. This feature renders techniques based on matrix completion inapplicable, since they require a regular design for the measurement time points. In contrast, our method seamlessly accommodates this irregularity.

The mineral density data and the estimated mean are displayed in Figures 1(a) and 1(b), respectively. We observe that the mean density starts at a low level, rises rapidly before the age of 16, and then increases relatively slowly to a peak at around age 20. This indicates that spinal bone mineral accumulates fast during adolescence, when rapid physical growth and psychological changes occur, and then the density remains at a stable high level in the early 20s. From Figure 1(a), we see that observations are relatively sparse from age 23 to age 26, especially near the boundary at age 26. We therefore suspected that the upward trend around age 26 might be due to a boundary effect, in particular the marked rightmost point in Figure 1(a). To check this, we refitted the data with this point excluded. The refitted mean curve is shown in Figure 1(c), and indeed the upward trend disappears. The estimated covariance surface, after removing the rightmost point, is shown in Figure 2(b); it suggests larger variability of the data around the age of 17, and indicates that the correlation of the longitudinal mineral density at different ages decays drastically as the ages become more distant.

Figure 1: (a) the spinal bone mineral density data; (b) the estimated mean function; (c) the estimated mean function when the rightmost point in panel (a) is removed from the data.
Figure 2: (a) the empirical design of the covariance structure of the snippets of the spinal bone mineral density; (b) the estimated covariance function of the spinal bone mineral density.

Appendix A: Fourier Extension

For a nonperiodic function $f$ defined on $[-1,1]$, its truncated Fourier series expansion is observed to suffer from the so-called Gibbs phenomenon (Zygmund, 2003), which refers to the drastic oscillatory overshoot in the regions close to the two endpoints of the domain. A remedy to this issue is to employ the technique of Fourier extension mentioned in Example 1.

The idea of Fourier extension is to approximate a nonperiodic function $f$ defined on $[-1,1]$ by Fourier basis functions defined on an extended domain $[-T,T]$ for some $T>1$, called the extension margin in this paper. The basis functions $\phi_0,\phi_1,\phi_2,\ldots$ are then defined by $\phi_0(t)=\frac{1}{\sqrt{2T}}$, and $\phi_{2k-1}(t)=\frac{1}{\sqrt{T}}\cos\frac{k\pi t}{T}$, $\phi_{2k}(t)=\frac{1}{\sqrt{T}}\sin\frac{k\pi t}{T}$ for $k=1,2,\ldots$. To elaborate, let $\mathcal{G}_K=\operatorname{span}\{\phi_0,\phi_1,\ldots,\phi_{2K}\}$. The Fourier extension of $f$ within $\mathcal{G}_K$, denoted by $F_K f$, is defined by
$$F_K f=\operatorname*{arg\,min}_{g\in\mathcal{G}_K}\|f-g\|_{L^2([-1,1])},$$

where we note that the norm is taken over the domain of interest $[-1,1]$, not the extended one. One can easily see that the Fourier extension of $f$ is not unique; however, all versions have the same approximation quality for $f$ over the domain of interest. Intuitively, even if a Fourier extension of $f$ exhibits the Gibbs phenomenon, for a suitable $T$ the phenomenon is expected to be confined to the extended parts of the domain, i.e., $[-T,-1)$ and $(1,T]$. In this way, the nonperiodic function $f$ can be well approximated on $[-1,1]$. Indeed, the speed of convergence of a Fourier extension of $f$ on $[-1,1]$ adapts to the smoothness of $f$ (Adcock et al., 2014). For example, $F_K f$ converges to $f$ at the rate of $O(K^{-p})$ ($O(\rho^{-K})$ for some $\rho>1$, respectively) when $f$ is $p$-times differentiable (analytic, respectively).
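The least-squares characterization above can be sketched numerically. The following minimal Python illustration (function names and the choice $T=2$ are ours, not the paper's) fits a Fourier basis that is periodic on an extended domain $[-T, T]$ by least squares over $[-1, 1]$ only:

```python
import numpy as np

def fourier_extension(f, K, T=2.0, n_grid=500):
    """Least-squares Fourier extension of f from [-1, 1] to [-T, T].

    Returns a callable evaluating the extension F_K f at given points.
    """
    t = np.linspace(-1.0, 1.0, n_grid)  # fit only on the domain of interest

    def basis(x):
        # Constant term plus cos/sin pairs, periodic on [-T, T].
        cols = [np.ones_like(x)]
        for k in range(1, K + 1):
            cols.append(np.cos(k * np.pi * x / T))
            cols.append(np.sin(k * np.pi * x / T))
        return np.column_stack(cols)

    # The extension frame is redundant, so the least-squares system is
    # ill-conditioned; SVD-based lstsq keeps the fit stable on [-1, 1].
    coef, *_ = np.linalg.lstsq(basis(t), f(t), rcond=None)
    return lambda x: basis(np.asarray(x, dtype=float)) @ coef

f = lambda t: t                       # nonperiodic on [-1, 1]
FKf = fourier_extension(f, K=20)
x = np.linspace(-1.0, 1.0, 101)
err = np.max(np.abs(FKf(x) - f(x)))
print(err)                            # small uniform error on [-1, 1]
```

Because $f(t)=t$ is analytic, the error on $[-1,1]$ decays geometrically in $K$, in line with the $O(\rho^{-K})$ rate quoted above; nothing forces periodicity on $[-1,1]$ itself, so no Gibbs oscillations appear there.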

The above discussion extends straightforwardly to the two-dimensional case. Let $\mathcal{G}_K^{\otimes 2}=\operatorname{span}\{\phi_j\otimes\phi_k: 0\le j,k\le 2K\}$, where $\phi_j\otimes\phi_k$ represents the function $(\phi_j\otimes\phi_k)(s,t)=\phi_j(s)\phi_k(t)$ defined on the two-dimensional square $[-T,T]^2$. For a function $\gamma$ defined on $[-1,1]^2$, its Fourier extension within $\mathcal{G}_K^{\otimes 2}$ is given by
$$F_K\gamma=\operatorname*{arg\,min}_{g\in\mathcal{G}_K^{\otimes 2}}\|\gamma-g\|_{L^2([-1,1]^2)},$$

where $L^2([-1,1]^2)$ denotes the space of square integrable functions defined on $[-1,1]^2$, with the norm $\|g\|_{L^2([-1,1]^2)}=\{\int_{-1}^{1}\int_{-1}^{1}g^2(s,t)\,\mathrm{d}s\,\mathrm{d}t\}^{1/2}$.
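A minimal tensor-product sketch of the two-dimensional extension, under the same assumed basis convention ($[-1,1]$ extended to $[-T,T]$ with $T=2$) as in the one-dimensional case; all names are illustrative:

```python
import numpy as np

def fourier_basis(x, K, T=2.0):
    """1D Fourier basis on the extended domain [-T, T], evaluated at x."""
    cols = [np.ones_like(x)]
    for k in range(1, K + 1):
        cols.append(np.cos(k * np.pi * x / T))
        cols.append(np.sin(k * np.pi * x / T))
    return np.column_stack(cols)

def fourier_extension_2d(gamma, K, T=2.0, n_grid=80):
    """Tensor-product Fourier extension of a surface gamma on [-1, 1]^2."""
    t = np.linspace(-1.0, 1.0, n_grid)
    B = fourier_basis(t, K, T)                    # n_grid x (2K + 1)
    G = gamma(t[:, None], t[None, :])             # surface values on the grid
    # Solve gamma(s, u) ~ B(s) C B(u)^T by two 1D least-squares sweeps.
    C1, *_ = np.linalg.lstsq(B, G, rcond=None)    # C1 ~ C B^T
    C, *_ = np.linalg.lstsq(B, C1.T, rcond=None)  # gives C^T, as C1^T ~ B C^T
    C = C.T

    def approx(s, u):
        Bs = fourier_basis(np.atleast_1d(s), K, T)
        Bu = fourier_basis(np.atleast_1d(u), K, T)
        return Bs @ C @ Bu.T

    return approx

gamma = lambda s, u: np.exp(-(s - u) ** 2)        # a smooth covariance-like surface
approx = fourier_extension_2d(gamma, K=12)
s = np.linspace(-1.0, 1.0, 21)
err = np.max(np.abs(approx(s, s) - gamma(s[:, None], s[None, :])))
print(err)
```

The separable structure of $\mathcal{G}_K^{\otimes 2}$ is what allows the two-dimensional least-squares problem to be reduced to two one-dimensional sweeps, one per coordinate of the coefficient matrix.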

References

  • Adcock et al. (2014) Adcock, B., Huybrechs, D. and Martín-Vaquero, J. (2014). On the numerical stability of Fourier extensions. Foundations of Computational Mathematics 14 635–687.
  • Aneiros et al. (2019) Aneiros, G., Cao, R., Fraiman, R., Genest, C. and Vieu, P. (2019). Recent advances in functional data analysis and high-dimensional statistics. Journal of Multivariate Analysis 170 3–9.
  • Bachrach et al. (1999) Bachrach, L. K., Hastie, T., Wang, M.-C., Narasimhan, B. and Marcus, R. (1999). Bone mineral acquisition in healthy Asian, Hispanic, black, and Caucasian youth: A longitudinal study. The Journal of Clinical Endocrinology & Metabolism 84 4702–4712.
  • Boyd (2002) Boyd, J. P. (2002). A comparison of numerical algorithms for Fourier extension of the first, second, and third kinds. Journal of Computational Physics 178 118–160.
  • Cai and Yuan (2010) Cai, T. and Yuan, M. (2010). Nonparametric covariance function estimation for functional and longitudinal data. Technical report, University of Pennsylvania.
  • Cai and Yuan (2011) Cai, T. and Yuan, M. (2011). Optimal estimation of the mean function based on discretely sampled functional data: Phase transition. The Annals of Statistics 39 2330–2355.
  • Canuto and Quarteroni (1982) Canuto, C. and Quarteroni, A. (1982). Approximation results for orthogonal polynomials in Sobolev spaces. Mathematics of Computation 38 67–86.
  • Cardot (2000) Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics 12 503–538.
  • Cardot et al. (1999) Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model. Statistics & Probability Letters 45 11–22.
  • Dauxois et al. (1982) Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. Journal of Multivariate Analysis 12 136–154.
  • Dawson and Müller (2018) Dawson, M. and Müller, H.-G. (2018). Dynamic modeling of conditional quantile trajectories, with application to longitudinal snippet data. Journal of the American Statistical Association 113 1612–1624.
  • Delaigle and Hall (2016) Delaigle, A. and Hall, P. (2016). Approximating fragmented functional data by segments of Markov chains. Biometrika 103 779–799.
  • Descary and Panaretos (2019) Descary, M.-H. and Panaretos, V. M. (2019). Recovering covariance from functional fragments. Biometrika 106 145–160.
  • Ferraty and Vieu (2006) Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer-Verlag, New York.
  • Gellar et al. (2014) Gellar, J. E., Colantuoni, E., Needham, D. M. and Crainiceanu, C. M. (2014). Variable-domain functional regression for modeling ICU data. Journal of the American Statistical Association 109 1425–1439.
  • Goldberg et al. (2014) Goldberg, Y., Ritov, Y. and Mandelbaum, A. (2014). Predicting the continuation of a function with applications to call center data. Journal of Statistical Planning and Inference 147 53–65.
  • Gromenko et al. (2017) Gromenko, O., Kokoszka, P. and Sojka, J. (2017). Evaluation of the cooling trend in the ionosphere using functional regression with incomplete curves. The Annals of Applied Statistics 11 898–918.
  • Hall and Horowitz (2007) Hall, P. and Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression. The Annals of Statistics 35 70–91.
  • Hall and Hosseini-Nasab (2009) Hall, P. and Hosseini-Nasab, M. (2009). Theory for high-order bounds in functional principal components analysis. Mathematical Proceedings of the Cambridge Philosophical Society 146 225–256.
  • Horváth and Kokoszka (2012) Horváth, L. and Kokoszka, P. (2012). Inference for functional data with applications. Springer Series in Statistics. Springer.
  • Hsing and Eubank (2015) Hsing, T. and Eubank, R. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley.
  • Huybrechs (2010) Huybrechs, D. (2010). On the Fourier extension of non-periodic functions. SIAM Journal on Numerical Analysis 47 4326–4355.
  • James et al. (2000) James, G. M., Hastie, T. J. and Sugar, C. A. (2000). Principal component models for sparse functional data. Biometrika 87 587–602.
  • Kneip and Liebl (2019) Kneip, A. and Liebl, D. (2019+). On the optimal reconstruction of partially observed functional data. The Annals of Statistics to appear.
  • Kokoszka and Reimherr (2017) Kokoszka, P. and Reimherr, M. (2017). Introduction to Functional Data Analysis. Chapman and Hall/CRC.
  • Kong et al. (2016) Kong, D., Xue, K., Yao, F. and Zhang, H. H. (2016). Partially functional linear regression in high dimensions. Biometrika 103 147–159.
  • Krantz and Parks (2002) Krantz, S. G. and Parks, H. R. (2002). A primer of real analytic functions. Springer.
  • Kraus (2015) Kraus, D. (2015). Components and completion of partially observed functional data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77 777–801.
  • Kraus and Stefanucci (2019) Kraus, D. and Stefanucci, M. (2019). Classification of functional fragments by regularized linear classifiers with domain selection. Biometrika 106 161–180.
  • Li and Hsing (2010) Li, Y. and Hsing, T. (2010). Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. The Annals of Statistics 38 3321–3351.
  • Liebl (2013) Liebl, D. (2013). Modeling and forecasting electricity spot prices: A functional data perspective. The Annals of Applied Statistics 7 1562–1592.
  • Liebl and Rameseder (2019) Liebl, D. and Rameseder, S. (2019). Partially observed functional data: The case of systematically missing parts. Computational Statistics & Data Analysis 131 104–115.
  • Lin and Wang (2019) Lin, Z. and Wang, J.-L. (2019). Mean and covariance estimation for functional snippets. Technical report, University of California, Davis.
  • Mas and Ruymgaart (2015) Mas, A. and Ruymgaart, F. (2015). High-dimensional principal projections. Complex Analysis and Operator Theory 9 35–63.
  • Mojirsheibani and Shaw (2018) Mojirsheibani, M. and Shaw, C. (2018). Classification with incomplete functional covariates. Statistics & Probability Letters 139 40–46.
  • Müller and Stadtmüller (2005) Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. The Annals of Statistics 33 774–805.
  • Müller and Yao (2008) Müller, H.-G. and Yao, F. (2008). Functional additive models. Journal of the American Statistical Association 103 1534–1544.
  • Olson (2017) Olson, T. (2017). Applied Fourier Analysis, volume 2017. Springer.
  • Ramsay and Silverman (2005) Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer Series in Statistics, 2nd edition. Springer, New York.
  • Rao (1958) Rao, C. R. (1958). Some statistical methods for comparison of growth curves. Biometrics 14 1–17.
  • Remmert (1997) Remmert, R. (1997). Classical Topics in Complex Function Theory. Springer.
  • Rice and Silverman (1991) Rice, J. A. and Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B 53 233–243.
  • Rice and Wu (2001) Rice, J. A. and Wu, C. O. (2001). Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57 253–259.
  • Stefanucci et al. (2018) Stefanucci, M., Sangalli, L. M. and Brutti, P. (2018). PCA-based discrimination of partially observed functional data, with an application to aneurisk65 data set. Statistica Neerlandica 72 246–264.
  • Wang and Xiang (2011) Wang, H. and Xiang, S. (2011). On the convergence rates of Legendre approximation. Mathematics of Computation 81 861–877.
  • Wang et al. (2016) Wang, J.-L., Chiou, J.-M. and Müller, H.-G. (2016). Review of functional data analysis. Annual Review of Statistics and Its Application 3 257–295.
  • Yao et al. (2005) Yao, F., Müller, H.-G. and Wang, J.-L. (2005). Functional linear regression analysis for longitudinal data. The Annals of Statistics 33 2873–2903.
  • Zhang and Chen (2018) Zhang, A. and Chen, K. (2018). Nonparametric covariance estimation for mixed longitudinal studies, with applications in midlife women’s health. arXiv preprint.
  • Zhang and Wang (2016) Zhang, X. and Wang, J.-L. (2016). From sparse to dense functional data and beyond. The Annals of Statistics 44 2281–2321.
  • Zhang and Wang (2018) Zhang, X. and Wang, J.-L. (2018). Optimal weighting schemes for longitudinal and functional data. Statistics & Probability Letters 138 165–170.
  • Zygmund (2003) Zygmund, A. (2003). Trigonometric Series. Cambridge University Press.