In the past three decades functional data analysis has received increasing interest due to the possibility of recording and storing data collected with high frequency and/or high resolution in time and space. Many different methods have been developed to study these particular data objects; for overviews of some recent developments in this fast growing field we refer the reader to the review articles Cuevas (2014) and Wang et al. (2016).
Currently, the literature on construction of simultaneous confidence bands (SCBs) for functional parameters derived from repeated observations of functional processes is sparse. The existing methods basically split into two seperate groups. The first group is based on functional central limit theorems (fCLTs) in the Banach space of continuous functions endowed with the maximum metric and evaluation of the maximum of the limiting Gaussian process often using Monte-Carlo simulations with an estimated covariance structure of the limit process, cf. Bunea et al. (2011); Degras (2011, 2017); Cao et al. (2012, 2014). The second group is based on the bootstrap, among others Cuevas et al. (2006); Chernozhukov et al. (2013); Chang et al. (2017), Belloni et al. (2018).
Although these methods are asymptotically achieving the correct covering probabilities, their small sample performance is often less impressive as for example discovered inCuevas et al. (2006). Figure 1 shows the typical behaviour observed in our simulations of the covering rates of SCBs for the (smoothed) population mean using replications of a (smoothed) Gaussian process with an unknown non-isotropic covariance function. The general pattern is that SCBs using a non-parametric bootstrap- yield too wide confidence bands and thus have overcoverage, whereas a functional Gaussian multiplier bootstrap inspired by Chernozhukov et al. (2013) and Degras (2011) asymptotic SCBs lead to undercoverage. We used here SCBs for the smoothed population mean, since the R-package SCBmeanfd by Degras does require smoothing.
Contrary to most of the current methods, which assume that each functional process is observed only at discrete points of their domain, we start from the viewpoint that the whole functional processes are observed and sufficiently smooth, since often the first step in data analysis is anyhow smoothing the raw data. This implies that we construct SCBs only for the smoothed population parameter and thereby circumventing the bias problem altogether to isolate the covering aspect. In principle, however, it is possible to apply our proposed method also to discretely observed data and inference on the true (non-smoothed) mean as for example in settings described in Zhang et al. (2007); Degras (2011); Zhang et al. (2016). But for the sake of simplicity we restrict ourselves to only robustify against the choice of a smoothing parameter by introducing SCBs for scale space surfaces, which were proposed in Chaudhuri and Marron (1999).
The main contribution of this paper is proposing the construction of SCBs for general functional parameters based on the Gaussian kinematic formula for -processes (tGKF) and studying its theoretical covering properties. Moreover, as an important case study we explain and prove how all the required assumptions can be satisfied for construction of SCBs for the population mean in functional signal-plus-noise models and scale spaces. A first glance on the improvement, especially for small samples, using the tGKF is shown in Figure 1.
A second contribution is that we compare to a studentized version of the multiplier bootstrap based on residuals, which we call multiplier- bootstrap (Mult-
), which also improves the small sample properties of the bootstrap SCBs considerably, if the multiplier is chosen to be Rademacher random variables. Since the scope of the paper is to analyse the theoretical properties of the use of the tGKF in construction of SCBs, we do not provide any rigorous mathematical theory about the Mult-bootstrap. Theoretical analysis of this bootstrap method, in particular the influence of the choice of multiplier, is an interesting opportunity for future research. For recent work on the performance of different multipliers in similar scenarios, see e.g., Deng and Zhang (2017) and Davidson and Flachaire (2008).
SCBs usually require estimation of the quantiles of the maximum of a process. Often this process is the maximum of a process derived from a functional central limit theorem (fCLT); thus the maximum of a Gaussian process. Our proposed method tries to resolve the poor covering rates for small sample sizes by imposing two small but significant changes. Firstly, we use a pointwise
-statistic, since for small sample sizes and unknown variance this would be the best choice to construct confidence bands for Gaussian random variables. Secondly, we employ a formula which is known to approximate the tail distribution of the maximum of a pointwise-distributed process very well. This formula is known as the Gaussian kinematic Formula for -processes (tGKF).
In a nutshell the tGKF as proven in even more generality in Taylor (2006) expresses the expected Euler characteristic (EEC) of the excursion set of a random process , , derived from unit variance Gaussian processes indexed by a nice compact subset of in terms of a finite sum of known functions with corresponding coefficients. These coefficients are called Lipschitz-Killing curvatures (LKCs) and depend solely on the random process . Earlier versions of this formula derived in Adler (1981) for Gaussian processes were used by Worsley et al. (1996),Worsley et al. (2004) in neuroscience for multiple testing corrections. Takemura and Kuriki (2002) have shown that the GKF (for Gaussian processes) is closely related to the Volume of Tubes formula dating all the way back to Working and Hotelling (1929) and which has been applied for SCBs in nonlinear regression analysis, e.g., Johansen and Johnstone (1990); Krivobokova et al. (2010); Lu and Kuriki (2017).
In this sense the version of the tGKF of Taylor (2006) can be interpreted as a generalization of the volume of tube formula for repeated observations of functional data. The most important and valuable aspect of the tGKF is that it is a non-asymptotic formula in the sample size . Hence it is suitable for applications with small sample sizes. Moreover, it has been proven in Taylor et al. (2005) that the EEC is a good approximation of the quantile of the maximum value of Gaussian- and -processs over their domain. Although it is not yet proven, simulations suggest that this property seems to be also true for more complicated Gaussian related processes like -processes. Therefore, it can be used to control the family wise error rate (FWER) as in Taylor and Worsley (2007); Taylor et al. (2009). To the best of our knowledge, the full potential of the tGKF for functional data seems to not have been explored yet in the statistical literature and we will try to fill this gap and tie together some loose strings for the challenge of constructing SCBs.
Our main theoretical contributions are the following. In Theorem 2 we show based on the main result in Taylor et al. (2005) that asymptotically the error in the covering rate of SCBs for a function-valued population parameter based on the GKF for -processes can be bounded and is small, if the targeted covering probability of the SCB is sufficiently high. This requires no Gaussianity of the observed processes. It only requires that the estimator of the targeted function-valued parameter fulfills a fCLT in the Banach space of continuous functions with a sufficiently regular Gaussian limit process. Moreover, it requires consistent estimators for the LKCs. Using this general result we derive SCBs for the population mean and the difference of population means for functional signal-plus-noise models, where we allow the error processes to be non-Gaussian. Especially we derive for such models defined over sufficiently regular domains , , consistent estimators for the LKCs. These estimators are closely related to the discrete estimators in Taylor and Worsley (2007), but we can even prove CLTs for our estimators. In order to deal with observation noise we give in Theorem 8 sufficient conditions to have weak convergence of a scale space process to a Gaussian limit extending the results from Chaudhuri and Marron (2000) from regression analysis to repeated observations of functional data. Additionally, we prove that also the LKCs of this limit process can be consistently estimated and therefore Theorem 2 can be used to bound the error in the covering rate for SCBs of the population mean of a scale space process.
These theoretical findings are accompanied by a simulation study using our Rpackage SCBfda, which can be found on https://github.com/ftelschow/SCBfda and a data application to diffusion tensor imaging (DTI) fibers using a scale space approach for the difference of population means. In the simulation study we compare the performance of the tGKF approach to SCBs for different error processes mainly with bootstrap approaches and conclude that the tGKF approach does not only often give better coverings for small sample sizes, but also outperforms bootstrap approaches computationally. Moreover, the average width of the tGKF confidence bands is lower for large sample sizes.
Organisation of the Article
Our article is organized as follows: In Section 2.1 we describe the general idea of construction of SCBs for functional parameters. Section 2.2 defines the functional signal-plus-noise model and general properties on the error processes, which are necessary to prove most of our theorems. The next section explains how SCBs can be constructed in praxis and especially explains how the tGKF can be used for this purpose. Section 4 finally proves asymptotic properties of the SCBs constructed using the tGKF for general functional parameters. The latter require consistent estimation of the LKCs, which are discussed for the functional signal-plus-noise model in Section 5 together with the new CLTs for the LKCs. Robustification using SCBs for Scale Space models can be found in 6. In Section 7 we compare our proposed method in different simulations to competing methods to construct SCBs in the case of the functional signal-plus-noise model for various settings, which is followed in Section 8 by a data example.
2 Simultaneous Confidence Bands
2.1 SCBs for Functional Parameters
We describe now a general scheme for construction of simultaneous confidence bands (SCBs) for a functional parameter , , where is a compact metric space. Note that all functions of hereafter will be assumed to belong to the space of continuous functions from to .
Assume we have estimators of and estimating a different functional parameter fulfilling
Here and whenever convergence of random continuous functions is considered, “” denotes weak convergence in endowed with the maximums norm . In (E1), is a mean zero Gaussian process with covariance function satisfying for all , is a scaling sequence of positive numbers. Assumptions (E1) and (E2) together with the functional version of Slutzky’s Lemma imply
Thus, it is easy to check that the collection of intervals
form -simultaneous confidence bands of , i.e.
Unfortunately, the quantiles are in general unknown and need to be estimated. Therefore, the main contribution of this article is the study and comparison of different estimators for .
In principle, there are two general approaches. Limit approximations try to estimate by approximations of the quantiles of the asymptotic process, i.e.
as done in Degras (2011, 2017) for the special case of a signal-plus-noise model and the local linear estimator of the signal. Here usually the covariance function is estimated and then many samples of the Gaussian process are simulated in order to approximate .
Better performance for finite sample sizes can be achieved by approximating directly by bootstrap approaches such as a fully non-parametric bootstrap proposed in Degras (2011), which is inspired by the bootstrap-confidence intervals (e.g., DiCiccio and Efron (1996)),.
2.2 Functional Signal-Plus-Noise Model
Our leading example will be SCBs for the population mean curve in a functional signal-plus-noise model which we will introduce now. Let us assume that , , is a compact set with piecewise -boundary . The functional signal-plus-noise model is given by
Here we assume that are continously differentiable functions on and is a stochastic process with zero mean and covariance function for satisfying if and only if . Moreover, we introduce the following useful properties of stochastic processes.
We say a stochastic process with domain is Lipschitz, if there is a (semi)-metric on continuous w.r.t. the standard metric on and a random variable satisfying such that
and , where denotes the metric entropy function of the (semi)-metric space , e.g., Adler and Taylor (2007, Def. 1.3.1.).
Any Lipschitz process has necessarily almost surely continuous sample paths. Moreover, this property is the main ingredient in the version of a CLT in proven in Jain and Marcus (1975) or Ledoux and Talagrand (2013, Section 10.1). However, there are different results on CLTs in with different assumptions on the process, which in principle could replace this condition. For example for we could also use the condition
where is non-decreasing near 0 and satisfies . This is due to Hahn (1977). However, -Lipschitz seems to be the most tractable assumption for our purposes.
We say a process has finite -th -moment, if
Any Lipschitz process over a compact set has finite -th -moment.
Since any continuous Gaussian process satisfies the finite -th -moment condition, cf. Landau and Shepp (1970), it is possible to proof a reverse of Proposition 1 for continuously differentiable Gaussian processes. Moreover, finite -th -moment conditions are often assumed, when uniform consistency of estimates of the covariance function are required, e.g., Hall et al. (2006) and Li et al. (2010).
3 Estimation of the Quantile
This section describes different estimators for the quantiles defined by equation (3). Especially, we propose using the Gaussian kinematic formula for -processes (tGKF) as proven in Taylor (2006). Moreover, we describe different bootstrap estimators.
3.1 Estimation of the Quantile Using the tGKF
3.1.1 The Gaussian Kinematic Formula for t-processes
A -process over an index set is a stochastic process such that is -distributed for all . In order to match the setting in Taylor (2006) we assume that the process is given by
where are mean zero, variance one Gaussian processes. Let us define , where denotes the number of elements in the multi-index . Then we require the following assumptions:
has almost surely sample paths.
The joint distribution ofis nondegenerate for all and .
There is an such that
for all and for all . Here and are finite constants.
Assumption (G3) is satisfied for any process having almost surely sample paths and all third derivatives have finite second -moment, see Definiton 2. In particular, this holds true for any Gaussian process with sample paths. For completeness the argument is carried out in more detail in the appendix.
Under these assumptions the tGKF is an exact, analytical formula of the expectation of the Euler characteristic of the excursion sets of . This formula as proven in Taylor (2006) or Adler and Taylor (2007) is
where , , is the -th Euler characteristic density of a -process, which can be found for example in Taylor and Worsley (2007). Moreover, denotes the -th Lipschitz killing curvature, which only depends on and the parameter space . Note that .
) is useful, since by the expected Euler characteristic heuristic (EECH) (seeTaylor et al. (2005) ), we expect
to be a good approximation for large thresholds . In the case that this approximation actually is always from above. This is due to the fact that the Euler characteristic of one-dimensional sets is always non-negative and hence using the Markov-inequality we obtain
The same argument is heuristically valid for high enough thresholds in any dimension, since the excursion set will with high probabilty consist mostly of simply-connected sets. Thus, the EC of the excursion set will be positive with high probability. Therefore, once again we expect that the EEC is an approximation from above for the excursion probability.
Notably, the tGKF Equation (8) had a predecessor for the EEC of a Gaussian processes . Remarkably, the only difference between these formulas is that the Euler characteristic densities are different, see Adler and Taylor (2007, p.315, (12.4.2)). Moreover, the approximation error when using the EEC instead of the excursion probabilities of the Gaussian process has been analytically quantified in Taylor et al. (2005).
3.1.2 The tGKF-estimator of
Assume that we have consistent estimators for the LKCs for all (we provide such estimators for in Section 5). Then a natural estimator of the quantile is the largest solution of
The following result will be used in the next section for the proof of the accuracy of the SCBs derived using the tGKF estimator .
3.2 Estimation of the Quantile Using the Bootstrap
As alternatives to the approach using the tGKF we discuss a non-parametric bootstrap- and a multiplier bootstrap estimator of the quantile defined in Equation (3).
3.2.1 Non-parametric bootstrap-
Based on Degras (2011) we review a bootstrap estimator of the quantile . Assume that the estimators and are obtained from a sample of random functions, then the non-parametric bootstrap- estimator of is obtained as follows
Resample from with replacement to produce a bootstrap sample .
Compute and using the sample .
Repeat steps 1 to 3 many times to approximate the conditional law and take the quantile of to estimate .
Note that the variance in the denominator is also bootstrapped, which corresponds to the standard bootstrap- approach for confidence intervals, cf. DiCiccio and Efron (1996). This is done in order to mimic the l.h.s. in (1), and improves the small sample coverage. Results of this bootstrap variant will be discussed in our simulations.
We expect that this estimator works well for large enough sample sizes. Although Degras (2011) introduced it especially for small sample sizes, there is not much hope that it will perform well in this case, since it is well known that confidence bands for a finite dimensional parameter based on the bootstrap- have highly variable end points, cf., Good (2005, Section 3.3.3). Evidence that this remains the case in the functional world will be given in our simulations in Section 7.
3.2.2 Multiplier- Bootstrap
The second bootstrap method, which we introduce, builds on residuals and a version of the multiplier (or Wild) bootstrap (e.g., Mammen (1993)) designed for the maximum of sums of independent random variables in high dimensions as discussed in detail by Chernozhukov et al. (2013). Briefly, the approach in Chernozhukov et al. (2013) is as follows. Let
be independent random vectors in, with and finite covariance for all . Define and assume there are such that for all and all . Under these assumptions it is shown in Chernozhukov et al. (2013, Theorem 3.1) that the quantiles of the distribution of
can be asymptotically consistently estimated by the quantiles of the multiplier bootstrap i.e., by the distribution of
with multipliers given the data , even if .
We adapt this approach to the estimation of for functional parameters. Therefore we again assume that the estimators and are obtained from an i.i.d. sample of random functions. However, here we also need to assume that we have residuals for satisfying and mutual independence for tending to infinity. For example for the signal-plus-noise model, where and
are the pointwise sample mean and the pointwise sample standard deviation, the residualsdo satisfy these conditions, if the error process is -Lipschitz and has finite second -moment. Thus, using these assumptions we obtain that
have approximately the same distribution. Algorithmitically, our proposed multiplier bootstrap is
Compute residuals and multipliers with and
Estimate from .
Repeat steps 1 to 3 many times to approximate the conditional law and take the quantile of to estimate .
In our simulations we use Gaussian multipliers as proposed in Chernozhukov et al. (2013), but find that Rademacher multipliers as used in regression models with heteroskedastic noise, e.g. Davidson and Flachaire (2008), perform much better for small sample sizes.
Again the choice to compute bootstrapped versions of mimics the true distribution of the maximum of the random process on the l.h.s of (1) for finite better than just applying the multipliers to .
4 Asymptotic Covering Rates
4.1 Asymptotic SCBs for Functional Parameters
This section discusses the accuracy of the SCBs derived using the tGKF. Since the expected Euler characteristic of the excursion sets are only approximations of the excursion probabilities, there is no hope to prove that the covering of these confidence bands is exact. Especially, if is large the approximation fails badly and will usually lead to confidence bands that are too wide. However, for values of typically used for confidence bands, the EEC approximation works astonishingly well. Theoretically, this has been made precise for Gaussian processes in Theorem 4.3 of Taylor et al. (2005), which is the main ingredient in the proof of the next result. Additionally, it relies on the fCLT (1) and the consistency of for proved in Theorem 1.
Assume (E1-2) and assume that the limiting Gaussian process satisfies (G1-3). Moreover, let be a sequence of consistent estimators of for . Then there exists an such that for all we have that the SCBs defined in Equation (2) fullfill
Typically, in our simulations we have that, for , the quantile is about leading to an upper bound of , if we use the weaker bound without the critical variance.
4.2 Asymptotic SCBs for the Signal-Plus-Noise Model
As an immediate application of Theorem 2 we derive now SCBs for the population mean and the difference of population means in one and two sample scenarios of the signal-plus-noise model introduced in Section 2.2. The derivation of consistent estimators for the LKCs will be postponed to the next section.
Theorem 3 (Asymptotic SCBs for Signal-Plus-Noise Model).
Let be a sample of model (5) and assume is an Lipschitz process. Define .
Then the estimators
fullfill the conditions (E1-2) with , , and .
A simple condition on to ensure that fulfills the non-degeneracy condition (G2) is that for all we have that has full rank for all . A proof is provided in Lemma 5 in the appendix.
Theorem 4 (Asymptotic SCBs for Difference of Means of Two Signal-Plus-Noise Models).
Let and be independent samples, where
with both Lipschitz processes and assume that . Then
Condition (E1) is satisfied, i.e.
where are Gaussian processes with the same covariance structures as and and the denominator converges uniformly almost surely, i.e. condition (E2) is satisfied.
5 Estimation of the LKCs for Signal-Plus-Noise Models
We turn now to consistent estimators of the LKCs. However, we treat only the case where is a process defined over a domain , where is assumed to be either a compact collection of intervals in or a compact two-dimensional domain of with a piecewise -boundary . We restrict ourselves to this setting, since there are simple, explicit formulas for the LKCs.
5.1 Definition of the LKCs for and
The basic idea behind the Lipschitz-Killing curvatures is that they are the intrinsic volumes of with respect to the pseudo-metric which is induced by the pseudo Riemannian metric given in standard coordinates of , cf. Taylor and Adler (2003),
For the cases the general expressions from Adler and Taylor (2007) can be nicely simplified. For , i.e. , the only unknown is , which is given by
In the case of with piecewise -boundary parametrized by the piecewise -function the LKCs are given by
Note that by the invariance of the length of a curve the particular choice of the parametrization of does not change the value of .
5.2 Simple Estimators Based on Residuals
In order to estimate the unknown LKCs and from a sample of model (5), we assume that the estimators of and of satisfy (E1-2) with and
for all almost surely.
These conditions imply that the normed residuals
and its gradient converge uniformly almost surely to the random processes and . Let us denote the vector of residuals by . In view of equation (16) it is hence natural to estimate the LKC for by
and using equations (17) the LKCs for by
where is the empirical covariance matrix of . Thus, we need to study mainly the properties of this estimator to ensure properties of the estimated LKCs.
Assume and for are Lipschitz processes and (E2) and (L) hold true. Then, we obtain
If and for are even Lipschitz processes, we obtain
and the matrix valued covariance function is given componentwise by
for all and .
From this theorem we can draw two interesting results for . The first one is a consistency results of the LKCs. In a sense this is just reproving the result for the discrete estimator given in Taylor and Worsley (2007) without the need to refer to convergences of meshes and making underlying assumptions more visible. Especially, it becomes obvious that the Gaussianity assumption on is not required, since we only have to estimate the covariance matrix of the gradient consistently.
Theorem 6 (Consistency of LKCs).
Under the setting of Theorem 5 it follows for that
An immediate corollary can be drawn for the generic pointwise estimators in the functional signal-plus-noise Model.
Let and assume that and for are Lipschitz processes. Then for we obtain
The second conclusion we can draw from this approach is that we can derive a CLT for the estimator of the LKCs.
Theorem 7 (CLT for LKCs).
Assume and for are even Lipschitz processes and (E2) and (L) hold true. Then, if ,
and, if , we obtain
where is a Gaussian process with mean zero and covariance function as in Theorem 5 and is a parametrization of the boundary .
Assume additionally to the Assumptions of Theorem 7 that is Gaussian with covariance function and . Then, we have the simplified representation
5.3 Estimation of LKCs in the Two Sample Problem
In the previous section we dealt with the case of estimating the LKCs in the case of one sample. This idea can be extended to the two sample case leading to a consistent estimator of the LKCs of the asymptotic process given in Theorem 4. Using the same notations as before we have that the difference in mean satisfies the following fCLT
Therefore, note that independence of the samples implies that the covariance matrix defined in equation (15) splits into a sum of covariance matrices depeding on and , i.e.,
The latter summands, however, under the assumptions of Theorem 5 can be separately consistently estimated using the previously discussed method and the residuals
The sum of these estimators is a consistent estimator of and thus the estimator described in the previous section based on the estimate of using the above residuals gives a consistent estimator of the LKCs of .
6 Discrete Measurements, Observation Noise and Scale Spaces
We discuss now the realistic scenario, where the data is observed on a measurement grid with possible additive measurement noise. Again for simplicity we restrict ourselves to the signal-plus-noise model, which is