Basis Expansions for Functional Snippets

05/16/2019 ∙ by Zhenhua Lin, et al. ∙ University of California-Davis Tsinghua University 0

Estimation of mean and covariance functions is fundamental for functional data analysis. While this topic has been studied extensively in the literature, a key assumption is that there are enough data in the domain of interest to estimate both the mean and covariance functions. In this paper, we investigate mean and covariance estimations for functional snippets in which observations from a subject are available only in an interval of length strictly (and often much) shorter than the length of the whole interval of interest. For such a sampling plan, no data is available for direct estimation of the off-diagonal region of the covariance function. We tackle this challenge via a basis representation of the covariance function. The proposed approach allows one to consistently estimate an infinite-rank covariance function from functional snippets. We establish the convergence rates for the proposed estimators and illustrate their finite-sample performance via simulation studies and an application to spinal bone mineral density.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 17

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Nowadays functional data are commonly encountered in practice, due to the advances in modern science and technology that enhance capability of data collection, storage and processing. Both unsupervised learning approaches, such as dimension reduction via functional principal component analysis

(Rao, 1958; Dauxois et al., 1982; Hall and Hosseini-Nasab, 2009; Mas and Ruymgaart, 2015)

, and supervised learning, such as functional regression

(Cardot et al., 1999; Müller and Stadtmüller, 2005; Ferraty and Vieu, 2006; Hall and Horowitz, 2007; Müller and Yao, 2008; Kong et al., 2016) are well studied in the literature. For a comprehensive treatment of these subjects, we recommend the monographs Ramsay and Silverman (2005), Ferraty and Vieu (2006), Horváth and Kokoszka (2012), Hsing and Eubank (2015) and Kokoszka and Reimherr (2017), and the review papers Wang et al. (2016) and Aneiros et al. (2019).

Critical to the statistical analysis of such data is the estimation of the mean and covariance functions, since they are the foundation of the aforementioned unsupervised and supervised learning tasks. In reality, functions can only be recorded on a set of discrete points of the domain of the functions. Estimation of the mean and covariance functions in this context has been extensively studied by Rice and Silverman (1991), Cardot (2000), James et al. (2000), Cai and Yuan (2010), Cai and Yuan (2011), Yao et al. (2005), Li and Hsing (2010) and Zhang and Wang (2016), among many others. In addition to the discrete nature of observed functional data, subjects often stay in the study only for a subject specific period that is much shorter than the span of the whole study. This does not cause much problems for mean estimation but brings challenges to covariance estimation.

For illustration, without loss of generality, we assume that the domain of the functional data is the unit interval and each subject only stays in the study for a period of length . Data of this characteristics are termed “functional snippets”, which are analogous to the longitudinal snippets analyzed in Dawson and Müller (2018). For such data, there is no information in the off-diagonal region of the covariance function Cov, and therefore there is no local information available for estimating the covariance function in this region. Figure 2(a) illustrates such a situation for the bone mineral density data to be studied in Section 5.

The literature of statistical analysis for functional snippets is in its infancy. Delaigle and Hall (2016)

proposed to approximate snippets by segments of Markov chains. Such a method is only valid at the discrete level, as shown by

Descary and Panaretos (2019). More recently, to analyze functional snippets, Zhang and Chen (2018) and Descary and Panaretos (2019) used matrix completion techniques that innately work on a common grid of the domain . These approaches might be strained to deal with snippets recorded on random and irregular design points, which is often the actual data design, and their theoretical analysis only covers the case of the regular design. In addition, Descary and Panaretos (2019) assumed a finite-rank condition on the covariance function. Such condition essentially entails that functional data are of fixed and finite dimension, which does not conform with the common paradigm for functional data, which regards functional data as genuinely infinite-dimensional processes. Other related works such as Liebl (2013), Gellar et al. (2014), Goldberg et al. (2014), Kraus (2015), Gromenko et al. (2017), Kraus and Stefanucci (2019), Stefanucci et al. (2018), Mojirsheibani and Shaw (2018), Kneip and Liebl (2019) and Liebl and Rameseder (2019), although considering partially observed functional data, do not specifically address functional snippets as characterized above. For example, in Kraus (2015) and other works, information and design points for the off-diagonal region are still observable. In contrast, information for that region is completely missing in the context of functional snippets, which significantly elevates the difficulty of statistical analysis.

The above discussion suggests that estimation of the covariance function based on snippets is an extrapolation problem, which requires additional assumptions to overcome. Lin and Wang (2019)

adopted a parametric assumption on the correlation function but allowed the variance function to be estimated nonparametrically, while

Descary and Panaretos (2019) assumed analyticity of the covariance function. We shall demonstrate that, the classes considered in Descary and Panaretos (2019) and Lin and Wang (2019) are -identifiable families to be introduced in Section 2.2. More importantly, we show that -identifiable families go much beyond parametric and analytic classes. For instance, there are -identifiable families that have a finite order of differentiability; see Example 2. This substantially extends the scope of functional snippets that can be analyzed.

Under the umbrella of

-identifiability, we propose to approach functional snippets from the perspective of basis expansion. The main idea is to represent the covariance function by basis functions composed from tensor product of analytic orthonormal functions defined on

. Basis functions, in particular spline basis functions, have been extensively explored in both nonparametric smoothing and functional data analysis, such as Wahba (1990), Wood (2003), Rice and Wu (2001), Ramsay and Silverman (2005) and Crambes et al. (2009), among many others. Unlike spline bases that are controlled by knots, analytic bases are global, in the sense that they are independent of local information such as knots or design points and completely determined by their values on a countably infinite subset of the interval . This feature of analytic bases allows information to pass from the diagonal region to the off-diagonal region along the basis functions. With this analyticity of basis functions, the missing pieces of the covariance function can then be inferred from the data available in the diagonal region when the covariance function is from a -identifiable class. In contrast, this is generally impossible for B-spline or other local bases.

A distinct feature of the proposed method is its capacity to analyze genuine functional data of infinite dimension in the context of snippets. This is demonstrated under the umbrella of a certain family of functions, the bounded sequential compact family (BSC family), defined in Section 3. We show that, when the covariance function comes from a BSC family, the proposed method is able to estimate it consistently based on snippets that are irregularly and/or sparsely observed. As a BSC family can contain covariance functions of infinite rank, the proposed approach can process functional data of infinite dimensions. This separates our work from existing ones that only deal with functional snippets with a finite-rank covariance function. As another contribution, we propose a penalized basis approach to estimate the mean function and thoroughly analyze its asymptotic properties over a class of bases. To the best of our knowledge, this has not been explored in the literature yet.

The rest of the paper is organized as follows. Section 2 details the proposed estimators for the mean function and covariance function. Their theoretical properties are investigated in Section 3, while their numerical performance is assessed in Section 4. The application to a real dataset is discussed in Section 5.

2 Methodology

Let be a second-order stochastic process on a compact interval , which without loss of generality is taken to be . The mean and covariance functions of are defined as and respectively. The observed functions are statistically modeled as independent and identically distributed realizations of . In practice, each realization is only recorded at subject-specific time points with measurement errors. More precisely, for functional snippets, the observed data are pairs , where

(1)

is the random noise with mean zero and unknown variance , and there is a for which for all , and . The focus of this paper is to estimate the mean and covariance functions of using these data pairs .

2.1 Mean Function

Although functional snippets pose a challenge for covariance estimation, they usually do not obstruct mean estimation, since data for the estimation are available likely across the whole domain of interest. In light of this observation, traditional methods such as local linear smoothing (Yao et al., 2005; Li and Hsing, 2010) can be employed. Below we adopt an analytic basis expansion approach. The advantage of this approach is its computational efficiency and adaptivity to the regularity of the underlying mean function ; see also Section 3.2.

Let be a complete orthonormal basis of that consists of squared integrable functions defined on the interval . When , it can be represented by the following series in terms of the basis ,

where . In practice, one often approximates such series by its first leading terms, where is a tuning parameter controlling the approximation quality. The coefficients are then estimated from data by penalized least squares. Specifically, with the notation and , the estimator of is given by

(2)

and is estimated by , where the weights satisfy , represents the roughness penalty, and is a tuning parameter that provides trade-off between the fidelity to the data and the smoothness of the estimate. There are two commonly used schemes for the weights, equal weight per observation (OBS) and equal weight per subject (SUBJ), of which the weights are and , respectively. Alternative weight schemes can be also found in Zhang and Wang (2016, 2018).

The penalty term in (2) is introduced to prevent excessive variability of the estimator when a large number of basis functions are required to adequately approximate

and when the sample size is not sufficiently large. In the asymptotic analysis of

in Section 3, we will see that this penalty term contributes little to the convergence rate of when the tuning parameter is not too large. In our study, the roughness penalty is given by

where denotes the second derivative of . In matrix form, for , it equals , where is a matrix with elements . The tuning parameters and can be selected by a cross-validation procedure.

2.2 Covariance Function

Since functional snippets do not provide any direct information at the off-diagonal region, the only way to recover the covariance in the off-diagonal region is to infer it from the diagonal region. Thus, as a basic requirement for identifiability of the covariance function, the covariance function under consideration is assumed to come from a family of covariance functions whose values on the diagonal region uniquely identify them within the family . Such a family is called a -identifiable family in this paper. The following examples suggest that -identifiable families are fairly ubiquitous.

Example 1 (Analytic functions).

A function is analytic if it can be locally represented by a convergent power series. According to Corollary 1.2.7 of Krantz and Parks (2002), if two analytic functions agree on , then they are identical on . Thus, the family of analytic functions is a -identifiable family. Indeed, the family of analytic covariance functions of fixed and finite rank, as a subfamily of the analytic family, was considered by Descary and Panaretos (2019). Also note that the space of the infinitely differentiable function space is not -identifiable; see the counterexample provided by Descary and Panaretos (2019).

Example 2 (Sobolev sandwich families).

For any , consider the family of continuous functions that belong to a two-dimensional Sobolev space on and are analytic elsewhere. Such functions have an -times differentiable diagonal component sandwiched between two analytic off-diagonal pieces. The family is -identifiable, because the values of such functions on the off-diagonal region are fully determined by the values on the uncountable set according to Corollary 1.2.7 of Krantz and Parks (2002). Note that this family contains functions with derivatives only up to a finite order.

Example 3 (Semiparametric families).

Consider the family of functions of the form , where is a function from a nonparametric class and is from a parametric class of correlation functions. Such family, considered in Lin and Wang (2019), is generally -identifiable as long as both and are identifiable. For instance, when is a one-dimensional Sobolev space and is the class of Matérn correlation functions, the family is a -identifiable family. Note that no analyticity is assumed in this family.

With the assumed -identifiability of the family , it is now possible to infer the off-diagonal region by the information contained in the raw covariance available only in the diagonal region. To this end, we propose to transport information from the diagonal region to the off-diagonal region through the basis functions , by approximating with

(3)

where , is the matrix of coefficients , and is an integer. There are countless bases that can serve in (3); however, if we choose an analytic basis , then their values in the diagonal region completely determine their values in the off-diagonal region. When such a representation of the covariance function is adopted and the unknown coefficients are estimated from data, the information contained in the estimated coefficients extends from the diagonal region to the off-diagonal region through the analyticity of the basis functions.

To estimate the coefficients from data, we adopt the idea of penalized least squares, where the squared loss of a given function is measured by the sum of weighted squared errors

where are the weights satisfying , while the roughness penalty is given by

Then, one has

(4)

for as of (3), where denotes the matrix trace, and are matrices with elements and , respectively. The estimator of is then taken as with

(5)

where are weights assigned to the th subject, is the tuning parameter that provides trade-off between the fidelity to the data and the smoothness of the estimate.

Similar to (2), the penalty term in (5) does not contribute to the convergence rate of when the tuning parameter is not too large. In practice, the value of can be chosen via cross-validation. For the choice of the weights , Zhang and Wang (2016) discussed several weighing schemes, including the OBS scheme and the SUBJ scheme . An optimal weighing scheme was proposed in Zhang and Wang (2018); we refer to this paper for further details.

3 Theory

As functional snippets are often sparsely recorded, in the sense that for all and some , in this section we focus on theoretical analysis tailored to this scenario. For simplicity, we assume that the number of observations for each trajectory is equal, i.e., for all . Note that under this assumption, the SUBJ and OBJ schemes coincide. The results for general number of observations and weight schemes can be derived in a similar fashion. We start with a discussion on the choice of basis functions and then proceed to study the convergence rates of the estimated mean and covariance functions.

3.1 Analytic Basis

While all complete orthonormal bases can be used for the proposed estimator in (2), an analytic basis is preferred for the estimator in (5). For a clean presentation, we exclusively consider analytic bases that work for both (2) and (5). In this paper, a basis is called an analytic -basis if its basis functions are all analytic and satisfy the following property: for some constants , there exists a constant such that and for and all . Here, denotes the supremum norm of , defined by , and represents the th derivative of . Throughout this paper, we assume that the basis is an analytic -basis.

Different bases lead to different convergence rates of the approximation to and . For the mean function , when using the first basis functions , the approximation error is quantified by

where we recall that . The convergence rate of the error , denoted by , signifies the approximation power of the basis for . Similarly, the approximation error for is measured by

where the norm of a function is defined by . The convergence rate of is denoted by . Below we discuss two examples of bases.

Example 4 (Fourier basis).

The Fourier basis functions, defined by , and for , constitute a complete orthonormal basis of for . It is also an analytic -basis. When is periodic on and belongs to the Sobolev space (see Appendix A.11.a and A.11.d of Canuto et al. (2006) for the definition), then, according to Eq. (5.8.4) of Canuto et al. (2006) one has . Similarly, if is periodic and belongs to , then . When or is not a periodic function, a technique called Fourier extension, briefly described in Appendix A, can be adopted to yield the same rate (Adcock et al., 2014). This technique is well studied in the field of computational physics (Boyd, 2002) and numerical analysis (Huybrechs, 2010) as a tool to overcome the so-called Gibbs phenomenon (Zygmund, 2003), but seems not well explored in statistics yet. In Section 4, we numerically illustrate the application of this technique.

Example 5 (Legendre polynomials).

The canonical Legendre polynomial of degree is defined on by

These polynomials are orthogonal in . By a change of variable and normalization, they can be turned into an orthonormal basis of . The Legendre polynomials have numerous applications in approximation theory and numerical integration; see Wang and Xiang (2011) and references therein. One can show that the Legendre basis is an analytic -basis. According to Eq (5.8.11) of Canuto et al. (2006), one has and when belongs to and belongs to , respectively.

3.2 Mean function

For functional snippets, we shall assume that the observations from a subject scatter randomly in a subject specific time interval, whose length is and whose middle point is called the reference time in this paper. We further assume that the reference time of the th subject are independently and identically distributed in the interval , and the observed time points , conditional on , are independently and identically distributed in the interval .

To study the property of the estimator , we make the following assumptions.

[labelwidth=1.2cm,leftmargin=1.4cm,align=left]

A.1

There exist such that the density of the reference time satisfies for any . There exist , such that the conditional density of the observed time satisfies , for any given reference time and .

[labelwidth=1.2cm,leftmargin=1.4cm,align=left]

A.2

for some constant .

[labelwidth=1.2cm,leftmargin=1.4cm,align=left]

A.3

and .

Assumption A.1 requires the density of the reference time and conditional densities of the time points to be bounded away from zero and infinity. This also guarantees that the marginal probability density of the time points

is bounded away from zero and infinity. In the sequel, we use to denote .

Theorem 1.

If is a -basis, conditions (A.1)–(A.3) imply that

(6)

We first note that, with condition A.3 on the tuning parameter , it does not impact the asymptotic rate of . We also observe that in (6), the term specifies the estimation error using a finite sample, while is the deterministic approximation error for using only the first basis functions. The latter term depends on the smoothness of . Intuitively, it is easier to approximate smooth functions with basis functions. For a fixed number of basis functions, smoother functions generally yield smaller approximation errors. As discussed in Example 4 and 5, when belongs to the Sobolev space , i.e., is times differentiable, we have , and this leads to the following convergence rate, using either the Fourier basis or the Legendre basis in this scenario.

Corollary 1.

Suppose exists and satisfies for some . Assume conditions (A.1)–(A.3) hold. If is the Fourier basis, then with the choice . If is the Legendre basis, then with the choice .

The above corollary shows that the proposed analytic basis expansion approach automatically adapts to the smoothness of . This contrasts with the local polynomial smoothing method and the B-spline basis approach, for which the convergence rate is limited by the order of the polynomials or B-spline basis functions used in the estimation, even when might have a higher order of smoothness. In practice, it is not easy to determine the right order for these methods, since both the mean function and its smoothness are unknown.

3.3 Covariance function

In Section 2 we assumed to reside in a -identifiable family in order to meet a basic criterion of identifiability. To study the asymptotic properties of the covariance estimator, we require the family to satisfy an additional condition as described below. Recall that a topological space is sequentially compact if very sequence has a convergent subsequence converging to a point in the space. Let be the space of all real-valued functions defined on endowed with the product topology. In this topology, a sequence of functions converges to a limit if and only if

In this paper, a subset of is called a bounded sequentially compact family (BSC family) if every sequence in that is uniformly bounded in the norm, i.e., for some , has a subsequence converging to a limit in in the product topology. Note that a sequentially compact subset is a BSC family. We shall assume that the family is also a BSC family. The following example, exhibiting a BSC family that contains covariance functions of infinite rank, indicates the abundance of BSC families and demonstrates that our approach covers infinite-rank covariance functions.

Example 6 (Bounded Sobolev sandwich families).

Let be fixed but potentially arbitrarily large constants. Let be the subfamily of the -identifiable family introduced in Example 2 such that, if then and the Lipschitz constant of is no larger than , where the Lipschitz constant of is defined as . We claim that is a -identifiable BSC family. To see this, we note that is locally equicontinuous, and also the set is bounded for all . Then the claim follows from Arzelà-Ascoli Theorem in Chapter 7 of Remmert (1997). We also note that similar BSC families can be derived as subfamilies of the families introduced in Example 1 and 3. All of these BSC subfamilies clearly contain countless covariance functions of infinite rank.

Formally, we shall assume the following conditions.

[labelwidth=1.2cm,leftmargin=1.4cm,align=left]

B.1

The covariance function belongs to a -identifiable BSC family .

[labelwidth=1.2cm,leftmargin=1.4cm,align=left]

B.2

The random function satisfies that .

[labelwidth=1.2cm,leftmargin=1.4cm,align=left]

B.3

and as .

Since a -identifiable BSC family, such as in Example 6, may contain covariance functions of infinite rank, the theory developed below applies to functional snippets of infinite dimensions. To avoid the entanglement with the error from mean function estimation, we shall assume that is known in the following discussion, noting that the case that is unknown can also be covered but requires a much more involved presentation and tedious technical details and thus is not pursued here.

Theorem 2.

If is an analytic -basis, under assumptions A.1 and B.1–B.3, we have

(7)

Note that, with the condition B.3 on , the tuning parameter does not impact the asymptotic rate of . As in the case of mean function, the rate in (7) contains two components, the estimation error stemming from the finiteness of the sample, and the approximation bias attributed to the finiteness of the number of basis functions being used in the estimation. When the Fourier basis or Legendre basis is used, we have the following convergence rate for times differentiable covariance functions.

Corollary 2.

Suppose assumptions A.1 and B.1–B.3 hold, and belongs to the Sobolev space for some . If is the Fourier basis and is periodic, then with , one has . If is the Legendre basis, then with , one has .

4 Simulation Studies

We now illustrate the numerical performance of the proposed approach using the Fourier basis. In addition to excellent numerical stability, as mentioned in Example 4, with the Fourier extension technique to deal with nonperiodic functions, this basis can approximate an th-differentiable function at the rate of when the first basis functions are used. Moreover, it is a -basis and thus enjoys the favorable convergence rates established in Corollaries 1 and 2. As the Fourier extension technique is not well studied in statistics, below we also demonstrate its numerical performance.

For the mean function, we consider two scenarios, and . The former is a periodic function while the latter is nonperiodic. For the covariance function, we consider the periodic covariance function with and if and if , and the nonperiodic and nonsmooth covariance function that is determined by a Matérn correlation function and the variance function , i.e.,

where is the modified Bessel function of the second kind of order . In the evaluation of the performance for the mean function, the covariance function is fixed to be , while the mean function is fixed to be in the evaluation of the covariance function. This strategy avoids the bias from covariance to influence the estimation of the mean function, and vice versa.

The estimation quality is measured by the empirical mean integrated squared error (MISE) based on independent simulation replicates. For the mean estimator , the MISE is defined by

and for the covariance estimator , it is defined by

where and are estimators in the th simulation replicate. The tuning parameters , , , and the extension margin (see Appendix A) are selected by a five-fold cross-validation procedure.

In all replicates, the reference time

are sampled from a uniform distribution on

. The number of observations are independent and follow the distribution . The measurement noise are i.i.d. sampled from a Guassian distribution with mean zero and variance , where the noise level is set to make the signal-to-noise ratio . We consider three sample sizes, , and two different values, , representing short snippets and long snippets, respectively.

The results are summarized in Table 1 and 2 for mean and covariance functions, respectively. As expected, in all settings, the performance of the estimators improves as or becomes larger. We also observe that if the function to be estimated is periodic, like and , the performance seems marginally better without Fourier extension. However, if the function is nonperiodic, like and , then the estimators with Fourier extension considerably outperform those without the extension, especially for the mean function or when the sample size is large. This demonstrates that Fourier extension is a rather effective technique that complements the Fourier basis for nonparametric smoothing, and might deserve further investigation in the framework of statistical methodology.

FE 7.28 (5.29) 3.26 (1.79) 1.69 (0.88)
4.94 (3.23) 2.30 (1.31) 1.13 (0.83)
6.75 (5.25) 3.39 (2.35) 1.85 (1.23)
4.63 (3.43) 2.48 (1.69) 1.19 (0.94)
NFE 6.86 (4.94) 2.93 (1.69) 1.49 (0.81)
4.63 (3.16) 2.12 (1.19) 1.07 (0.75)
12.43 (4.77) 8.15 (2.43) 6.02 (1.31)
9.81 (3.28) 7.10 (1.91) 5.51 (1.25)
Table 1:

MISE of the proposed estimator for the mean function. The MISE and their Monte Carlo standard errors in this table are scaled by 100 for a clean presentation. FE refers to Fourier basis with Fourier extension, while NFE refers the basis without the extension.

FE 6.01 (1.93) 5.28 (1.33) 4.75 (0.89)
4.07 (3.24) 2.62 (1.36) 2.03 (0.74)
2.61 (1.92) 1.40 (0.86) 1.01 (0.57)
1.97 (1.66) 0.96 (0.67) 0.60 (0.43)
NFE 6.00 (2.02) 5.24 (1.44) 4.68 (0.97)
3.90 (3.43) 2.47 (1.44) 1.86 (0.68)
3.15 (2.30) 2.01 (1.09) 1.54 (0.67)
2.42 (1.78) 1.43 (0.74) 1.10 (0.43)
Table 2: MISE of the proposed estimator for the covariance function. The MISE and their Monte Carlo standard errors in this table are scaled by 10 for a clean presentation. FE refers to Fourier basis with Fourier extension, while NFE refers the basis without the extension.

5 Application: Spinal Bone Mineral Density

In the study of Bachrach et al. (1999), 423 individuals, age 8 to 27, were examined for their longitudinal spinal bone mineral density. The bone density of each individual was irregularly recorded in four consecutive years, at most once for each year. The data for each individual then lead to a functional snippet spanning at most 4 years. In our study, individuals who have only one measurement were excluded, since they do not carry information for the covariance structure. This results in a total of 280 individuals who have at least two measurements and whose ages range from 8.8 to 26.2.

We are interested in the mean and covariance structure of the mineral density. Figure 2(a) depicts the empirical design of the covariance function, underscoring the nature of these data as a collection of snippets: there is no data available to directly infer the off-diagonal region of the covariance structure. We also note that the design time points are irregular. This feature renders techniques based on matrix completion inapplicable since they require a regular design for the measurement time points. In contrast, our method is able to seamlessly accommodate this irregularity.

The mineral density data and the estimated mean are displayed in Figure 1(a) and 1(b), respectively. We observe that the mean density starts with a low level, rises rapidly before the age of 16 and then goes up relatively slowly to a peak at the age of around 20. This indicates that the spinal bone mineral accumulates fast during adolescence during which rapid physical growth and psychological changes occur, and then the density remains in a stable high level in the early 20s. From Figure 1(a), we see that observations are relatively sparse from age 23 to age 26, especially near the boundary at age 26. Therefore, we suspected that the upward trend around age 26 might be due to a boundary effect, in particular the marked rightmost point in Figure 1(a). To check this, we refitted the data with this point excluded. The refitted mean curve is in Figure 1(c), and indeed the upward trend disappears. The estimated covariance surface, after removing the rightmost point, is shown in Figure 2(b), which suggests larger variability of the data around the age of 17. It also indicates that the correlation of the longitudinal mineral density at different ages decays drastically as ages become more distant.

(a)
(b)
(c)
Figure 1: (a) the spinal bone mineral density data. (b) the estimated mean function. (c) the estimate mean function when the rightmost point in the left panel is removed from the data.
(a)
(b)
Figure 2: (a) the empirical design of the covariance structure of the snippets of the spinal bone mineral density. (b) the estimated covariance function of the spinal bone mineral density.

Appendix A: Fourier Extension

For a nonperiodic function , its finite Fourier series expansion is observed to suffer from the so-called Gibbs phenomenon (Zygmund, 2003) which refers to the drastic oscillatory overshoot in the region close to two endpoints of the domain. A remedy to this issue is to employ the technique of Fourier extension mentioned in Example 4.

The idea of Fourier extension is to approximate a nonperiodic function by Fourier basis functions defined in an extended domain for some called extension margin in this paper, where the basis functions, are then defined by , and for . To elaborate, let . The Fourier extension of within , denoted by , is defined by

where we note that the norm is for the domain , not the extended one. One can easily see that the Fourier extension of is not unique. However, they all have the same approximation quality for over the domain of interest. Intuitively, for a Fourier extension of , even if it has the Gibbs phenomenon, for a suitable , the phenomenon is expected to be restricted within the extended part of the domain, i.e., and . This way, the nonperiodic function can then be well approximated in the domain . Indeed, the speed of the convergence of a Fourier extension of in the domain adapts to the smoothness of (Adcock et al., 2014). For example, converges to at the rate of ( for some , respectively) when is th-differentiable (analytic, respectively).

The above discussion can be straightforwardly extended to the two-dimensional case. Let , where represents the function defined on the two-dimensional square . For a function defined on , its Fourier extension within is given by

where denotes the space of squared integrable functions defined on with the norm .

Appendix B: Technical Proofs

Notation.

We collect the notation that has been used in the above or to be used below. Without loss of generality, we assume . We use

to denote the column vector

, and to denote its value evaluated at . The norm of a vector is denoted by . When is viewed as a linear functional, its operator norm is denoted by . Note that . For a matrix , denotes its induced operator norm, while denotes its Frobenius norm. For a function defined on , its norm, denoted by , is defined by . For a covariance function , we use to denote its norm that is defined by .

Proof of Theorem 1.

In the sequel, we use to denote the vector of the coefficients of with respect to the basis functions .

First, we observe that, for all . Let , and . Define

Then we observe that

(8)

It is easy to check that . Thus, and . According to Claim 1 and 2, we have that, for any , there exists , , and , such that for all ,

  • , and

  • for all , , and

  • for all , ,

where denotes the sample space. With the choice of and , we can see that on for all with for a sufficiently large and fixed constant . Therefore, with probability tending to one, the minimizer of falls into the ball . Now,

and the proof of the theorem is completed. ∎

Claim 1.

For any , there exists , , and , such that for all ,

  • , and

  • for all , , where is given in (8).

Proof.

We first observe that

Now we consider the first term