Functional data analysis constitutes a collection of statistical methods to analyse data comprised of ensembles of random functions: multiple occurrences of random processes evolving continuously in time and/or space, typically over a bounded rectangular domain (Ramsay and Silverman [ramsay2007applied], Ferraty and Vieu [ferraty2006nonparametric], Hsing and Eubank [hsing2015theoretical], Wang et al. [wang2016functional]). The challenges arising in functional data on the one hand arise from their infinite-dimensional nature: this calls upon tools and techniques from functional analysis, while standard inference problems may become ill-posed. On the other hand, the data, though continuous in nature, are seldom observed as such. Instead, finitely sampled versions are available to the statistician. If the sampling is sufficiently dense, the data can often be treated as genuinely functional data, possibly after a pre-smoothing step. The statistical estimators and procedures may be then based on the intrinsically infinite dimensional inputs and techniques. This approach was popularised by Ramsay and Silverman [ramsay2007applied].
It can very well happen, though, that the data are recorded only at some intermediate locations of their domain, possibly corrupted by measurement error. In this case it is necessary to regard the underling functional nature of the data only as a latent process, and additional effort is required to construct adequate statistical methodology. This scenario is often referred to as sparsely observed functional data, and usually occurs when the independent realisations of the latent functional process is a longitudinal trajectory. In a key paper, Yao et al. [yao2005functional] demonstrated how to estimate the covariance operator of the latent functional process using kernel regression and how to estimate the principal components of the latent process through conditional expectations. See also Yao et al. [yao2005functional_linear_regression]
for an application of the proposed methodology in functional linear regression. The rate of convergence of the kernel smoother of Yao et al.[yao2005functional] was later strengthened by Hall et al. [hall2006properties] and Li and Hsing [li2010uniform]. Other methods to deal with sparsely observed functional data make use of minimizing a specific convex criterion function and expressing the estimator within a reproducing kernel Hilbert space, see Cai and Yuan [cai2010nonparametric], and Wong and Zhang [wong2017nonparametric]).
Still, there are many applications where independence of the underlying curves cannot be assumed, for instance when the functional data are naturally ordered into a temporal sequence indexed by discrete time. We then speak of functional time-series, and these are usually analysed by assuming stationarity and weak dependence across the time index. Historically, the research has been focused mostly into generalizing linear processes into functional spaces, see Bosq [bosq2012linear] and Bosq and Blanke [blanke2007inference] for overview publications. More recently, the research has moved beyond the linear structure. Hörmann and Kokoszka [hormann2010weakly]
considered the effect of weak dependence on principal component analysis and studied the estimation of the long-run covariance operator. Horváth et al.[horvath2013estimation]
provided a central limit theorem for the mean of a stationary weak dependent sequence and considered the estimation of the long-run covariance operator.
A step further from the estimation of isolated characteristics such as the mean function and the said long-run covariance operator is to estimate the entire second-order structure of the process, without assuming linearity. To this aim, Panaretos and Tavakoli [panaretos2013cramer] introduced the notation of spectral density operators and harmonic principal components, capturing the complete second order dynamics in the frequency domain, whereas Panaretos and Tavakoli [panaretos2013fourier] showed how to estimate the said spectral density operators by smoothing the operator-valued analogue of the peridogram. They formalised weak dependence by cumulant-type mixing conditions, à la Brillinger [brillinger1981time]. In parallel work, Hörmann et al. [hormann2015dynamic] introduced the notation of dynamic principal components, closely related to the harmonic principal components of Panaretos and Tavakoli [panaretos2013cramer], and estimated the spectral density operators by the operator version of Bartlett’s estimate [bartlett1950periodogram].
Despite the long tradition of functional time series as a driving force behind theoretical and methodological progress in functional data analysis more generally, a surprising fact is that the focus has been almost exclusively “densely” observed functional time series, where it is assumed that the full functional data are available. Indeed discrete sampling appears to be a nearly absent consideration, with the exceptions (to our knowledge) being: Panaretos and Tavakoli [panaretos2013fourier], who show the stability of their asymptotics under dense discrete observation but with measurement error of decaying magnitude; and, more recently, Kowal et al. [kowal2017functional]
who studied functional autoregressive models by means of Bayesian hierarchical Gaussian models. They derived a Gibbs sampler for inference and forecasting but the paper does not examine the asymptotic behaviour of the method. In particular, in one of their considered sampling regimes, which they callsparse-fixed design, posterior Bayesian concentration would be intangible. The Bayesian modelling framework was also extended to multivariate dynamic linear models by Kowal et al. [kowal2017bayesian] and to dynamic function-on-scalar regression by Kowal [kowal2018dynamic]. A related problem was studied by Paul and Peng [paul2011principal], who considered correlated sparsely observed functional data with separable covariance structure, but the focus was not on dynamics.
In this article we address this gap (or, rather, chasm) and consider the problem of estimating the complete dynamics, and recovering the latent curves, in a stationary functional time-series that is observed sparsely, irregularly, and with measurement errors. The number of observations per curve is assumed to be random, almost surly finite, and not increasing to infinity. Therefore we speak of genuine sparsity, much in the same vein as Yao et al. [yao2005functional]. As a first step we show how to estimate the full second-order dynamics of the functional time-series based on sparse noisy data using kernel regression methods. We construct estimators of individual characteristics such as the mean function and the lag autocovariance operators, as an aside, but the main contribution is the kernel-based generalization of Bartlett’s estimate of the spectral density operators. By integrating back the spectral density into the time domain we construct a consistent estimator of the entire space-time covariance structure.
Our methodology can also be interpreted in a design context: in certain applications, it might be possible for the scientist to choose how to distribute a given fixed budget of measurements over individual curves and over time. In this case, one might ask how to better estimate the underlying dynamics: whether it is better to sample a functional time-series more densely over shorter time-span, or to record fewer observations per curve but over a longer time-space. In Section 5 we perform a simulation study to examine this tradeoff, and find that under sufficient smoothness, the sparse sampling regime over a longer period seems preferable.
The second contribution of the article is the establishment of a functional data recovery framework. We show how to predict the unobserved functional data once the space-time dynamics have been estimated. The recovery of the functional data is done by conditioning on all observed data, borrowing strength from the complete dynamics of the process (rather than just the marginal covariance). When the functional time-series is Gaussian, we furthermore show how to construct confidence bands for the latent functional data, with both pointwise and simultaneous coverage. In addition we show how the functional recovery methodology naturally leads to forecasting.
Functional time-series methodology is often useful in analysing continuously measured scalar time-series, that can subdivided into segments of an obvious periodicity, usually days. A key benefit of this technique is separation of the intra-day variability and the temporal dependence among the consecutive days. The approach is especially fruitful in the analysis of environmental or meteorological phenomena, for example particulate matter atmospheric pollution (Hörmann and Kokoszka [hormann2010weakly], Hörmann et al. [hormann2015dynamic, hormann2016detection], Aue et al. [aue2015prediction]). Nonetheless, some meteorological variables cannot be measured continuously and uninterruptedly. A practical motivation of this article comes from the data on atmospheric electricity (Tammet [tammet2009joint]). The peculiarity of this data is that the atmospheric electricity can be reliably measured only in fair-weather conditions. Otherwise, the physical-chemical processes behind the atmospheric electricity are altered and thus a different kind of process is measured. Details of this mechanism are reported in the data analysis in Section 6. Because of this censoring protocol, the considered functional time-series is genuinely sparsely observed. We analyse such a data set using our proposed methods, as a means of illustration.
The rest of the article is organised as follows. In Section 2 we define the functional time-series framework we work with and introduce the space-time covariance estimation methodology. We explain how to construct estimators of the lagged autocovariance operators and the spectral density operators. We also introduce the functional data recovery framework to estimate the unobserved functional data from the complete stretch of discrete observations and how to construct pointwise and simultaneous confidence bands. In Section 3 we formulate the asymptotic theory for the suggested estimators. Section 5 contains the results of numerical experients designed to probe the finite-sample performance of our methodology. Section 6 illustrates the proposed methodology on the fair-weather atmospheric electricity time-series. Some additional results of the numerical experiments are presented in Appendix A. The proofs of the formal statements are included in Appendix B.
2 Model and Estimation Methodology
2.1 Functional Time Series Framework
Functional time-series is a sequence of random function defined on the interval and is denoted as . We assume that and . Moreover we assume that the realisations (paths) of are smooth functions (concrete smoothness assumptions will be introduced in Section 3). This space-time process will be referred to as a functional time-series. Assuming second-order stationarity in the time variable , we may define the (common) mean function of by
and capture the second-order dynamics of the functional time-series by its lag- autocovariance kernels,
Each kernel introduces a corresponding operator defined by right integration
In addition to the stationarity, we assume weak dependence, in that the autocovariance kernels are summable in the supremum norm and the autocovariance operators summable in the nuclear norm
Under these conditions, Panaretos and Tavakoli [panaretos2013fourier] showed that for each , the following series converge in the supremum norm (denoted by ) and the nuclear norm (denoted by ), respectively
The kernel and the operator are called the spectral density kernel at frequency and the spectral density operator at frequency respectively. The lagged autocovariance kernels and and operators can be recovered by the inversion formula (Panaretos and Tavakoli [panaretos2013fourier]) that holds in the supremum and the nuclear norm, respectively:
In particular, the spectral density operator is a non-negative, self-adjoint trace-class operator for all .
2.2 Observation Scheme
We consider a sparse observation scheme with additive independent measurement errors. Let be the -th measurement on the -th curve at spatial position , where and is the number of measurements on the curve for . The additive measurement errors are denoted by and are assumed to be iid realisations of a mean
and variancerandom variable. Furthermore, the measurement errors are assumed to be independent of as well as the measurement locations . The observation model can be then written as
The spatial positions as well as their number are considered random and concrete conditions for the asymptotic results are given in Section 3.
2.3 Nonparametric Estimation of the Model Dynamics
Given the sparsely observed data generated by the observation scheme (2.4), we wish to estimate the mean function and the lag autocovariance kernels . Thanks to the formulae (2.2) and (2.3), the estimation of the lag autocovariance operators is equivalent to the estimation of the spectral density .
In a first step, we estimate the common mean function by a local linear smoother, see, for example, Fan and Gijbels [FanJianqing1996Lpma]. Let
be a one-dimensional symmetric probability density function. Throughout this paper we work with the Epanechnikov kernelfor and otherwise, but any other usual smoothing kernel would be appropriate. Let be the bandwidth parameter. We define the estimator of as by minimizing the weighted sum of squares:
Then, in a second step, we show how to estimate the second order characteristics of the functional time-series, namely the lag- covariance and the lag- autocovariance kernels. Since the measurement errors contribute only to the diagonal of the lag-0 autocovariance kernel, where if and only if the condition in the subscript is satisfied. Therefore we consider the “raw” covariances
where , , , and . We anticipate that . Hence, the diagonals of the raw lag- covariances must be removed when estimating the lag- covariance kernel.
Specifically, to estimate the lag- covariance kernel, we employ a local-linear surface-smoother at applied to the raw covariances where and . Precisely, we let where is obtained by minimizing the following weighted sum of squares:
and is the bandwidth parameter.
We estimate the measurement error variance using the approach of Yao et al. [yao2005functional]. That is, we first estimate by smoothing the variance on the diagonal. We assign where:
Instead of using as the estimator of the diagonal of the lag- covariance kernel (without the ridge contamination), Yao et al. [yao2003shrinkage, yao2005functional]
opted for a local-quadratic smoother – arguing that the covariance kernel is maximal along the diagonal, and so a local-quadratic smoother is expected to outperform a local linear smoother. This heuristic was also confirmed by our own simulations. Therefore, following Yao et al.[yao2005functional], we fit a local-quadratic smoother along the direction perpendicular to the diagonal. Concretely, the estimator is defined as where is the minimizer of the following weighted sum of squares:
where is the first coordinate (which is the same as the second one) of the projection of the point onto the diagonal of . The measurement error variance is then estimated by
Since the estimator (2.10) is based on smoothers, it is not guaranteed to be a positive number. This problem was already commented on by Yao et al. [yao2005functional]. In the theoretical part of their paper, the negative estimate is replaced by zero and, in the code, it is replaced by a small positive number. The replacement by a positive number can be seen as a form of regularization.
Next, we proceed with the estimation of the lag- autocovariance kernels for . We define the estimator for by minimizing
For we set . Observe that we did not need to remove the diagonal as in (2.7). Denote the corresponding estimated covariance operators as .
2.4 Spectral Density Kernel Estimation
To estimate the spectral density kernels one has to resort to smoothing or a different sort of regularization at some point. Panaretos and Tavakoli [panaretos2013fourier] performed kernel smoothing of the periodogram in the spectral domain whereas Hörmann et al. [hormann2015dynamic] made use of Barlett’s estimate. Bartlett’s estimate involves a weighted average of the lagged autocovariances, with a choice of weights that downeighs higher order lags. From the theoretical perspective, this approach is equivalent to kernel smoothing of the periodogram, see Priestley [PriestleyMauriceB1981Saat, §6.2.3]. In fact, the Bartlett weights correspond to the Fourier coefficients of the smoothing kernel, assumed compactly supported.
In this paper we opt for Bartlett’s perspective and generalize the estimator for the case of sparsely observe functional time-series. This we do mainly for simplicity, and it should be noted that any other choice of weights would be equally applicable. See Rice and Shang [rice2017plug] for other possible choices of weights.
Consider the Bartlett’s span parameter and define the weights for and otherwise. These weights are called Bartlett’s weights or sometimes the triangular window. If the full functional observations were available, the spectral density would be estimated by the formula (cf. Hörmann et al. [hormann2015dynamic])
where are the standard empirical autocovariance operators. We could use the formula (2.12) and plug-in the smoothed autocovariance kernels obtained in Section 2.3 but instead we opt to show how to directly construct a smoother-based estimator of the spectral density kernels. Specifically, we estimate the spectral density kernel at frequency by the local-liner surface-smoother applied to the raw covariances multiplied by complex exponentials. The weights for the smoother are based both on the spatial distance from the raw covariances as well as the time lag. Specifically, we estimate the spectral density kernel as
where is obtained by minimizing the following weighted sum of squares
It turns out that the minimizer of this complex minimization problem can be expressed explicitly. Moreover, the minimizer depends only on a few quantities that are independent of , and can be pre-calculated. The estimator can be thus constructed for a given by multiplying these quantities by complex exponentials and performing a handful of inexpensive arithmetic operations. Consequently, it is computationally feasible to evaluate the estimator (2.13) on a dense grid of frequencies. See Section B.2 for further details.
Denote the integral operator corresponding to is as . We can go back to the temporal domain by integrating the spectral density and reproduce the estimators of the autocovariance kernels and operators by the formulae (2.3)
The estimators of spectral density kernels are achieved by kernel smoothing. Therefore, especially for smaller sample sizes, the operators
might not be strictly non-negative, and may feature some tail negative eigenvalues of small modulus. To ensure numerical stability of the method in the following section, it is recommended to truncate these negative eigenvalues ofat each frequency .
If dimensionality reduction is of interest, one can truncate the spectral density operators at each frequency to an appropriate rank. Such dimensionality reduction is based on the Cramér-Karhunen-Loève expansion and was proven optimal in preserving the functional time-series dynamics by Panaretos and Tavakoli [panaretos2013cramer], and independently by Hörmann et al. [hormann2015dynamic]. Since dimension reduction is not necessary for our theory/methods in the next section, we do not pursue it further.
2.5 Periodic Behaviour Identification
As discussed at the beginning of Section 2.4, the choice of the Bartlett’s span parameter corresponds to the bandwidth in the frequency domain. To achieve consistent spectral density estimation, the parameter needs to be kept quite small (cf. condition 4 and Theorem 2). However, for the purpose of exploratory data analysis, it is useful to explore the data for periodic behaviour in a similar way as a periodogram is used in the case of scalar time-series.
When the periodicity examination is indeed of interest, we propose to evaluate the estimator (2.13) for a fairly large value of . The selection of adequate value of is a question of computational power available because the computational time to evaluate (2.13) grows linearly in . In the data analysis Section 6 we work with which is roughly half of the considered time-series length.
Once the estimator (2.13) is evaluated for a given value of we propose to calculate the trace of the spectral density operator at frequency . Peaks in this plot indicate periodic behaviour of the functional time-series. Existence of periodicity is not only a useful insight into the nature of the data but may us prompt into approaching the periodic behaviour in a different way, for example by modelling the periodicity in a deterministic way as we do it in the data analysis carried out in Section 6.
2.6 Functional Data Recovery Framework and Confidence Bands
We now consider the problem of recovering the latent functional data given the sparse noisy samples , and provide corresponding confidence bands.
Consider the random element composed of “stacked” functional data (formally, it is an element of the product Hilbert space ). Note that
Now define the stacked observables as where is the total number of observations up to time . By analogy to , stack the measurement errors
and denote this vector. Note that . Further define the evaluation operators for each and the stacked censor operator . Finally define the projection operator for .
In this notation we can rewrite the observation scheme (2.4) as
The best linear unbiased predictor of given , which we denote by , is given by the formula
where denotes the adjoint operator. The term is in fact a positive semi-definite matrix. Owing to the fact that , the matrix is always invertible.
Now fix . The best linear unbiased predictor of the functional datum , which we denote by , is given by
Hence the recovery of by the formula (2.19) uses the observed data across all , borrowing strength across all the observations.
In practice however, we need to replace the unknown parameters involved in the construction of the predictor by their estimates. Define and by substituting and for their theoretical counterparts in formulae (2.16) and (2.17) respectively. Now replace , , by , and , respectively, in formulae (2.18) and (2.19). The resulting predictors are denoted by
In order to construct confidence bands for the unobservable paths, we work under the Gaussian assumption:
The functional time-series as well as the measurement errors are Gaussian processes.
Thanks to the Gaussian assumption 1, the predictors of and given by formulae (2.18) and (2.19) are in fact given by conditional expectations and are the best predictors among all predictors. Furthermore, we can calculate the exact conditional distribution of given by the formula
From (2.22) we can access the conditional distribution of for fixed , by writing
To construct a band for
with pointwise coverge, we construct a confidence interval forat each — as we will see, the endpoints of these intervals are continuous functions of , and so automatically define a confidence band. In practice, one constructs bands for a dense collection of locations in
and interpolates. In particular, fix. Given the conditional distribution , the -confidence interval is constructed as
where is the
In practice, when we do not know the true dynamics of the functional time-series, we have to use the estimates of and . We define and by replacing and with and in the formulae (2.23), (2.24), (2.26) respectively. Therefore the asymptotic confidence interval for is obtain by rewriting (2.27) using the empirical counterparts
For the construction of the simultaneous band we use the method introduced by Degras [degras2011simultaneous]. Fix . In the previous section we derived the conditional distribution of given in formula (2.25). Define the conditional correlation kernel
Then, the collection of intervals
forms a (continuous) confidence band with simultaneous coverage probability over . Here is the quantile of the law of where is a zero mean Gaussian process with covariance kernel . The definition of a quantile specifically requires that . Degras [degras2011simultaneous] explains how to calculate this quantile numerically.
In practice, we replace the population level quantities in (2.30) by their estimated counterparts and define the asymptotic simultaneous confidence band as
where and are as above and the quantile is calculated for the correlation structure defined as the empirical counterpart to (2.29).
Note that for any correlation kernel , see Degras [degras2011simultaneous]. Therefore, as expected, the pointwise confidence bands are enveloped by the simultaneous band. Once again, in practice, one evaluates the band limits defining (2.31) on a dense grid of and interpolates.
A natural next step to consider, and indeed one of the main reasons why one may be interested in recovering the functional time-series dynamics, is that of forecasting. In this section we comment on how the forecasting problem naturally fits into the functional data recovery framework introduced in Section 2.6.
Assume that we are given sparse data and we wish to forecast the functional datum for as well as to quantify the uncertainty of the forecast. We define the random element . If the forecasts for the intermediate data are not of interest, we may delete these elements and naturally alter the explained method below. Nevertheless, we opt to explain the approach for forecasting up to the time simultaneously.
We utilize the notation introduced in Subsection 2.6. By formulae (2.16) and (2.17) we obtain the law of and can calculate the conditional distribution given the observed data . In particular, by taking in the equations (2.19), (2.27), and (2.30) we obtain the forecast, the pointwise confidence band, and the simultaneous confidence band respectively for the functional datum . In practice, we substitute the unknown population level quantities by their empirical estimators. Therefore, by taking in the equations (2.21), (2.28), and (2.31) we obtain the forecast, the (asymptotic) pointwise confidence band, and the (asymptotic) simultaneous confidence band for .
3 Asymptotic Results
3.1 Consistency and Convergence Rates for Nonparametric Estimators
The number of measurements in time is a random variable with where , and .
The measurement locations are independent random variables generated from the density and are independent of the number of measurements . The density is assumed to be twice continuously differentiable and strictly positive on .
Not that we allow the event to potentially have positive probability. This corresponds to the situation where no measurements are available at time , for example when we additionally have missing data at random. We also need to impose smoothness conditions on the unknown functional parameters
The common mean function, , is twice continuously differentiable on .
The autocovariance kernels, , are twice continuously differentiable on for each . Moreover,
is uniformly bounded in for all combinations of where .
To prove the consistency of autocovariance kernels estimators
we need to further assume some mixing conditions in the time domain. The smoothing estimators are essentially moment-based, therefore it is natural to consider cumulant-type summability conditions. For the introduction to the cumulants of real random variables see Rosenblatt[rosenblatt2012stationary] and for the definitions and properties of the cumulant kernels and cumulant operators see Panaretos and Tavakoli [panaretos2013fourier].
Denote the 4-th order cumulant kernel of as . Assume the summability in the supremum norm
We will also need to strengthen the summability assumption (2.1).
The last two conditions correspond to conditions C(1,2) and C(0,4) in Panaretos and Tavakoli [panaretos2013fourier], respectively. Finally, we impose the following assumptions on the decay rate of the bandwidth parameters
We may now state our asymptotic results on uniform consistency and convergence rates:
3.2 Functional Data Recovery and Confidence Bands
In this section we turn our attention to developing asymptotic theory for the the recovered functional data and the associated confidence bands, in particular the asymptotic behaviour of the plug-in estimator (2.21) vis-à-vis its theoretical counterpart (2.19).
First of all, we need to clarify what asymptotic result we can hope to accomplish. Before venturing into functional time-series, let us comment on the asymptotic results for independent identically distributed functional data by Yao et al. [yao2005functional]. As the number of sparsely observed functional data grows to infinity, one can consistently estimate the second-order structure of the stochastic process (which in this case consists in the zero-lag autocovariance, due to independence). This is then used in the plug-in prediction of a given functional datum, say , given the sparse measurements on this datum. In the limit, this prediction is as good as if we knew the true lag zero covariance of the stochastic process (Theorem 3, Yao et al. [yao2005functional]). Because the predictor uses the estimate of the lag zero covariance based on all the observed data, Yao et al. [yao2005functional] call this trait as borrowing strength from the entire sample.
In the time series setting of the current paper, one can expand the concept of borrowing strength from the entire sample. As the number of sparsely observed functional data (i.e. the time horizon ) expands to infinity, one can not only estimate the dynamics of the functional time-series consistently (Theorem 2 and Corollary 1), but also further exploit the fact that neighbouring data are correlated to further improve the recovery. Because of the weak dependence, the influence of the observations decreases as we part away from the time . Therefore we fix a span of times where and we will be interested in prediction of given the data in this span. To be precise, we are going to prove that the prediction of from the data in the local span and based on the estimated dynamics from complete data is, in the limit, as good as the prediction based on the true (unknown) dynamics. Therefore, in our case, we are borrowing strength across the sample in a twofold sense – firstly for the estimation of the functional time-series dynamics, and then for prediction of the functional datum .
The span can in principle be chosen to be as large as one wishes, but is held fixed with respect to . This is justified by the weak dependence assumption. In practice, one must also entertain numerical considerations and not choose to be exceedinly large, since the the evaluation of the predictors (2.19) and (2.21) based on longer spans requires the inversion of a big matrix.