1 Introduction
Functional time series have become a recent focus of statistical research in functional data analysis, since functional data are often collected sequentially over time. Typically, one considers a stationary functional sequence whose terms are random elements of a separable Hilbert space. The central issue in the analysis of functional time series is to take into account the temporal dependence between the observations, which amounts to investigating the second-order characteristics of the functional sequence. A handful of early papers studied the covariance structure of functional sequences with dependence. For example, Bosq (2002) [4] and Dehling and Sharipov (2005) [8] consider estimation of the covariance operator for functional autoregressive processes, and Horváth and Kokoszka (2010) [11] studied the covariance structure of weakly dependent functional time series under a weak dependence condition; see also Horváth and Kokoszka (2012) [12] for an overview.
Nevertheless, to obtain a complete description of the second-order structure of dependent functional sequences, one needs to consider autocovariance operators, or autocovariance kernels, relating different lags of the series, analogous to the autocovariance matrices in multivariate time series analysis. One statistic of interest associated with the autocovariance operators is the long-run covariance kernel (or long-run covariance function) defined as $C(\tau,\sigma)=\sum_{t\in\mathbb{Z}}r_t(\tau,\sigma)$, where, for $t\in\mathbb{Z}$ and $\tau,\sigma\in[0,1]$, $r_t(\tau,\sigma)=\operatorname{cov}(X_0(\tau),X_t(\sigma))$ is the so-called autocovariance kernel. The analysis of the long-run covariance kernel is applicable to general dependent functional sequences without a particular model assumption.
Horváth et al. (2013) [13] proposed a kernel lag-window estimator of the long-run covariance kernel and showed its consistency under mild conditions. The asymptotic normality of the estimator is established in Berkes et al. (2016) [2]. The estimation has applications in mean and stationarity testing of functional time series; see Horváth et al. (2015) [14] and Jirak (2013) [16]. Horváth et al. (2016) [15] and Rice and Shang (2016) [31] address bandwidth selection for the kernel of the lag-window estimator.
Rather than focusing on an isolated characteristic like the long-run covariance, Panaretos and Tavakoli (2013) [19] approach the problem of inferring the second-order structure of stationary functional time series via Fourier analysis, formulating a frequency domain framework for weakly dependent functional data. In the frequency domain of the functional setting, the entire second-order dynamical properties are encoded in the spectral density kernel, which is defined as
(1) $f_\omega(\tau,\sigma)\;=\;\dfrac{1}{2\pi}\displaystyle\sum_{t\in\mathbb{Z}}e^{-i\omega t}\,r_t(\tau,\sigma),\qquad \omega\in\mathbb{R},$
where the autocovariance kernel and the spectral density kernel comprise a Fourier pair. The notion of the spectral density kernel in the functional setting is a generalization of the finite-dimensional notion in the context of spectral density analysis of multivariate time series, which has been extensively studied by prominent statistical researchers; see, e.g., Parzen (1957, 1961) [21, 22], Brillinger and Rosenblatt (1967) [6], Hannan (1970) [10] and Priestley (1981) [30]. A consistent estimate of the spectral density kernel, in the form of a weighted average of the periodogram kernel (the functional analogue of the periodogram matrix), is also proposed in Panaretos and Tavakoli (2013) [19] under a cumulant mixing condition. This weak dependence condition is the functional analogue of the classical cumulant-type mixing condition of Brillinger (2001) [5].
In this paper, we propose a new class of spectral density kernel estimators based on the notion of flat-top kernels defined in Politis (2001) [23]; see also Politis and Romano (1995, 1996, 1999) [26, 27, 28]. The new class of estimators employs the inverse Fourier transform of a flat-top function to construct the weight function smoothing the periodogram. With the choice of a higher-order flat-top kernel, the estimator is shown to achieve bias reduction, and hence higher-order accuracy in terms of optimizing the integrated mean square error (IMSE). It is also nearly equal to a general lag-window type estimator, which is a well-known fact in the finite-dimensional case; see Brockwell and Davis (2013) [7] and Brillinger (2001) [5].
The higher-order accuracy of flat-top estimation typically comes at the sacrifice of the positive semidefinite property. To address this issue, we show how a flat-top estimator can be modified to become positive semidefinite (even strictly positive definite) while retaining its favorable asymptotic properties. The modification is similar to the one proposed in Politis (2011) [25] for the treatment of flat-top spectral density matrix estimators. In addition, we introduce a data-driven bandwidth selection procedure realized by an automatic inspection of the correlation structure.
The structure of the paper is as follows. In the next section, the flat-top estimator of the spectral density kernel is defined after the introduction of some basic definitions of the frequency domain framework, and theorems on its asymptotic accuracy are given. Section 3 shows the near equivalence of the proposed estimator, in the form of a weighted average of the periodogram, and the flat-top lag-window estimator. A modification of the flat-top spectral density estimator is introduced in Section 4, which results in an estimator that is positive semidefinite while retaining the estimator's higher-order accuracy. Section 5 addresses the issue of data-dependent bandwidth choice, where an empirical rule for picking the bandwidth is proposed. Our favorable asymptotic results are supported by a finite-sample simulation in Section 6, where the higher-order accuracy of the flat-top estimators is manifested in practice. Finally, the technical proofs are gathered in the Appendix in Section 7.
2 Spectral density kernel estimation
We consider a functional time series $\{X_t\}_{t\in\mathbb{Z}}$ where each $X_t$ belongs to the separable Hilbert space $L^2([0,1],\mathbb{R})$, possessing mean zero, i.e., $\mathbb{E}[X_t(\tau)]=0$ for all $\tau\in[0,1]$, and autocovariance kernel $r_t(\tau,\sigma)=\operatorname{cov}(X_0(\tau),X_t(\sigma))$ for $t\in\mathbb{Z}$ and $\tau,\sigma\in[0,1]$.
The space is equipped with the inner product $\langle f,g\rangle=\int_0^1 f(\tau)g(\tau)\,d\tau$ and the induced norm $\|f\|_2=\langle f,f\rangle^{1/2}$.
We assume the series is strictly stationary, in the sense that for any finite set of indices $\{t_1,\dots,t_k\}$ and any $h\in\mathbb{Z}$, the joint law of $(X_{t_1},\dots,X_{t_k})$ is identical to that of $(X_{t_1+h},\dots,X_{t_k+h})$.
In addition, the weak dependence structure among the observations is quantified by employing the notion of the cumulant kernel of a series. The pointwise definition of a $k$th order cumulant kernel is
$\operatorname{cum}\big(X_{t_1}(\tau_1),\dots,X_{t_k}(\tau_k)\big)=\sum_{\nu=(\nu_1,\dots,\nu_p)}(-1)^{p-1}(p-1)!\prod_{l=1}^{p}\mathbb{E}\big[\prod_{j\in\nu_l}X_{t_j}(\tau_j)\big],$
where the sum extends over all unordered partitions $\nu$ of $\{1,\dots,k\}$. We will make use of the following cumulant mixing condition, defined for fixed $l$ and $k$:
Condition $C(l,k)$. For each $j=1,\dots,k-1$,
$\sum_{t_1,\dots,t_{k-1}\in\mathbb{Z}}\big(1+|t_j|^{l}\big)\,\big\|\operatorname{cum}(X_{t_1},\dots,X_{t_{k-1}},X_0)\big\|_2<\infty.$
We inherit the frequency domain framework of functional time series developed in Panaretos and Tavakoli (2013) [19], in which the functional version of the discrete Fourier transform is introduced. Given a functional sequence $X_0,\dots,X_{T-1}$ of length $T$, the functional discrete Fourier transform (fDFT) is defined as
(2) $\tilde{X}^{(T)}_\omega(\tau)\;=\;\dfrac{1}{\sqrt{2\pi T}}\displaystyle\sum_{t=0}^{T-1}X_t(\tau)\,e^{-i\omega t}.$
The tensor product of the fDFT with itself leads to the notion of the periodogram kernel, the functional analogue of the periodogram matrix in the multivariate case. The periodogram kernel is defined as
(3) $p^{(T)}_\omega(\tau,\sigma)\;=\;\tilde{X}^{(T)}_\omega(\tau)\,\overline{\tilde{X}^{(T)}_\omega(\sigma)}.$
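For intuition, the fDFT (2) and the periodogram kernel (3) can be evaluated numerically once the curves are observed on a common grid. The following sketch is ours and only illustrative; the discretization, function names, and grid size are assumptions, not part of the paper:

```python
import numpy as np

def fdft(X, omega):
    """Functional discrete Fourier transform, cf. Eq. (2).

    X     : (T, m) array; row t holds the curve X_t on an m-point grid.
    omega : frequency in (-pi, pi].
    Returns the fDFT curve at omega as a complex m-vector."""
    T = X.shape[0]
    phases = np.exp(-1j * omega * np.arange(T))
    return phases @ X / np.sqrt(2 * np.pi * T)

def periodogram_kernel(X, omega):
    """Periodogram kernel p_omega(tau, sigma), cf. Eq. (3): the outer
    product of the fDFT curve with its complex conjugate."""
    Xw = fdft(X, omega)
    return np.outer(Xw, Xw.conj())
```

Note that `periodogram_kernel` returns an m-by-m Hermitian matrix, the discretized analogue of the rank-one operator in (3).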
The periodogram kernel is asymptotically unbiased under suitable cumulant mixing conditions. However, it is not a consistent estimator of the spectral density kernel, as its asymptotic covariance does not vanish. Panaretos and Tavakoli (2013) [19] proposed a consistent estimator obtained by convolving the periodogram kernel with a weight function, which has the form
(4) $\hat f^{(T)}_\omega(\tau,\sigma)\;=\;\dfrac{2\pi}{T}\displaystyle\sum_{s=1}^{T-1}W^{(T)}\!\Big(\omega-\frac{2\pi s}{T}\Big)\,p^{(T)}_{2\pi s/T}(\tau,\sigma).$
The weight function $W^{(T)}$ is constructed as
(5) $W^{(T)}(x)\;=\;\dfrac{1}{B_T}\displaystyle\sum_{j\in\mathbb{Z}}W\!\Big(\frac{x+2\pi j}{B_T}\Big),$
where $\{B_T\}$ is a sequence of scale parameters with the properties $B_T\to0$ and $B_TT\to\infty$ as $T\to\infty$, and $W$ is a fixed function satisfying that $W$ is a positive, even function with $\int_{\mathbb{R}}W(x)\,dx=1$.
The summation over $j$ in (5) makes the weight function periodic with period $2\pi$. The same is true for the estimator, by its definition in (4). With the above constraints imposed on the function $W$, it has been shown in Panaretos and Tavakoli (2013) [19] that (4) is a consistent estimator of the spectral density kernel in mean square (with respect to the Hilbert–Schmidt norm). The bias of the estimator is partly attributable to the assumption that $W$ is positive, and it can potentially be reduced significantly if an appropriate function is chosen that is not restricted to be positive. To this aim, we propose a class of higher-order accurate estimators by making use of so-called flat-top kernels in the construction of the weight function. The resulting estimator is shown to achieve bias reduction while retaining the asymptotic covariance structure of the estimator in (4).
To describe our estimator, we need the notion of a “flat-top” kernel. A general flat-top kernel $\Lambda$ is defined in terms of its Fourier transform $\lambda$, which is in turn defined as
(6) $\lambda(s)\;=\;\begin{cases}1, & |s|\le c,\\ g(|s|), & |s|>c,\end{cases}$
where $c>0$ is a parameter and $g$ is a symmetric function, continuous at all but a finite number of points, satisfying $|g(s)|\le 1$ and $\int_{\mathbb{R}}\lambda^2(s)\,ds<\infty$. The flat-top kernel $\Lambda$ is then given by the inverse Fourier transform of $\lambda$,
(7) $\Lambda(x)\;=\;\dfrac{1}{2\pi}\displaystyle\int_{\mathbb{R}}\lambda(s)\,e^{-isx}\,ds.$
Note that in the preceding definition, the function $\lambda$, and hence the kernel $\Lambda$, depend on the function $g$ and the parameter $c$, but this dependence will not be explicitly denoted.
The function $\lambda$ is ‘flat’, i.e., constant, over the region $[-c,c]$, hence the name flat-top for the kernel function $\Lambda$. If a kernel function $\Lambda$ has a finite $p$th moment and its moments up to order $p-1$ are equal to zero, i.e. $\int_{\mathbb{R}}\Lambda(x)\,dx=1$, $\int_{\mathbb{R}}x^{k}\Lambda(x)\,dx=0$ for $k=1,\dots,p-1$, and $\int_{\mathbb{R}}|x|^{p}\,|\Lambda(x)|\,dx<\infty$, then the kernel $\Lambda$ is said to be of order $p$. We have the following property concerning the order of the kernel function $\Lambda$:
Proposition 2.1.
If $\lambda$ is a $p$-times differentiable flat-top function and $\lambda^{(p)}$ is Hölder continuous of order $\alpha$, then $\Lambda$ is a kernel of order $p$.
Proof.
See Appendix. ∎
In the following, we will use $\hat f^{(T)}_{\lambda,\omega}$ to denote the flat-top estimator employing the flat-top kernel $\Lambda$, which is in turn induced by a flat-top function $\lambda$. The estimator is of the same form as (4), except for the difference in the weight function, i.e.,
(8) $\hat f^{(T)}_{\lambda,\omega}(\tau,\sigma)\;=\;\dfrac{2\pi}{T}\displaystyle\sum_{s=1}^{T-1}W^{(T)}_\lambda\!\Big(\omega-\frac{2\pi s}{T}\Big)\,p^{(T)}_{2\pi s/T}(\tau,\sigma),$
where
(9) $W^{(T)}_\lambda(x)\;=\;\dfrac{1}{B_T}\displaystyle\sum_{j\in\mathbb{Z}}\Lambda\!\Big(\frac{x+2\pi j}{B_T}\Big),$
with $\Lambda$ being the flat-top kernel induced by a flat-top function $\lambda$ as defined in (7).
The following theorems investigate the performance of the estimator employing the general flat-top kernel $\Lambda$.
Theorem 2.1.
Provided that $B_T\to0$ and $B_TT\to\infty$ as $T\to\infty$, and assuming $p$ is the maximum value that can be attained such that $C(p,2)$ holds, then by choosing an appropriate flat-top kernel of order $p$ we have
$\mathbb{E}\,\hat f^{(T)}_{\lambda,\omega}(\tau,\sigma)\;=\;f_\omega(\tau,\sigma)+O\big(B_T^{\,p}\big)+O\big((B_TT)^{-1}\big),$
where the equality holds in $L^2$, and the error terms are uniform in $\omega$.
Proof.
See Appendix. ∎
Remark 2.1.
According to Proposition 2.1, a sufficient condition for a kernel to be of order $p$ is that $\lambda$ is a $p$-times differentiable flat-top function and $\lambda^{(p)}$ is Hölder continuous. On the other hand, the decay rate of the bias crucially depends on the cumulant mixing condition satisfied by the functional sequence. A type of moment condition is provided in Panaretos and Tavakoli (2013) [19] which is sufficient for the cumulant mixing condition to hold for a general linear process; see Proposition 4.1 therein.
Remark 2.2.
It is worth mentioning that Theorems 2.1–2.3 can hold for a non-flat-top kernel of order $p$ as long as it satisfies properties (i)–(iv) in Lemma 7.2. Nevertheless, in this paper we focus on flat-top kernels for the simplicity of the proofs. In addition, an empirical data-driven bandwidth selection rule is proposed in Section 5, which applies exclusively to flat-top kernels.
The flat-top estimator achieves bias improvements while retaining the rate of decay of the covariance structure, as stated in the following theorem:
Theorem 2.2.
Under $C(1,2)$ and $C(1,4)$,
$\operatorname{Cov}\big(\hat f^{(T)}_{\lambda,\omega_1}(\tau_1,\sigma_1),\,\hat f^{(T)}_{\lambda,\omega_2}(\tau_2,\sigma_2)\big)\;=\;O\big((B_TT)^{-1}\big),$
where the equality holds in $L^2$, uniformly in the $\omega_i$'s.
Using our Lemmas 7.2 and 7.3, Theorem 2.2 can be proved along the same lines as the proof of Corollary 3.3 in Panaretos and Tavakoli (2013) [19]. For fixed frequencies $\omega_1\neq\omega_2$, the covariance can be shown to obey a sharper bound; see Proposition 3.4 in Panaretos and Tavakoli (2013) [19].
Concerning the mean square error, we need the notion of the spectral density operator of a functional time series, introduced in Panaretos and Tavakoli (2013) [19]. The spectral density operator $\mathcal{F}_\omega$ is the operator induced by the spectral density kernel through right integration,
(10) $\mathcal{F}_\omega\;=\;\dfrac{1}{2\pi}\displaystyle\sum_{t\in\mathbb{Z}}e^{-i\omega t}\,\mathcal{R}_t,$
where $\mathcal{R}_t$ is the autocovariance operator induced by the autocovariance kernel through right integration,
(11) $(\mathcal{R}_t h)(\tau)\;=\;\displaystyle\int_0^1 r_t(\tau,\sigma)\,h(\sigma)\,d\sigma,\qquad h\in L^2([0,1],\mathbb{R}).$
The spectral density operator $\mathcal{F}_\omega$ is the integral operator with kernel $f_\omega$. Analogously, we denote by $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}$ the operator induced by the kernel $\hat f^{(T)}_{\lambda,\omega}$ through right integration, which thereby serves as the estimator of $\mathcal{F}_\omega$.
Combining the results on the asymptotic bias and variance of the spectral density operator, we have the following consistency in integrated mean square of the induced estimator for the spectral density operator.
Theorem 2.3.
Provided assumptions $C(p,2)$ and $C(1,4)$ hold, and $B_T\to0$, $B_TT\to\infty$, the spectral density operator estimator employing a flat-top kernel of order $p$ is consistent in integrated mean square, that is,
$\displaystyle\int_{-\pi}^{\pi}\mathbb{E}\,\big\|\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}-\mathcal{F}_\omega\big\|_{\mathrm{HS}}^{2}\,d\omega\;\longrightarrow\;0,$
where $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert–Schmidt norm. More precisely, the integrated mean square error (IMSE) is $O\big(B_T^{2p}\big)+O\big((B_TT)^{-1}\big)$ as $T\to\infty$.
Theorem 2.3 gives the rate of convergence of $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}$ to $\mathcal{F}_\omega$. In the meantime, it also suggests the optimal value of the bandwidth parameter $B_T$ in terms of optimizing the decay rate of the integrated mean square error. Apparently, the optimal $B_T$ depends on the cumulant condition the functional sequence possesses, that is, on the value of $p$. For any finite $p$, the optimal decay rate $O\big(T^{-2p/(2p+1)}\big)$ can be achieved with $B_T\propto T^{-1/(2p+1)}$. In the case that $p=\infty$, one can choose $B_T$ decaying nearly as fast as $T^{-1}$ to obtain a rate arbitrarily close to $O(T^{-1})$.
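To make the bandwidth tradeoff explicit, here is a short derivation under the assumed rates $O(B_T^{2p})$ for the squared bias and $O((B_TT)^{-1})$ for the integrated variance; the constants $C_1,C_2$ are hypothetical placeholders:

```latex
\mathrm{IMSE}(B_T) \;\approx\; C_1 B_T^{2p} + \frac{C_2}{B_T T},
\qquad
\frac{d}{dB_T}\,\mathrm{IMSE}(B_T) \;=\; 2p\,C_1 B_T^{2p-1} - \frac{C_2}{B_T^{2}\,T} \;=\; 0
\;\;\Longrightarrow\;\;
B_T^{\ast} \;=\; \Big(\frac{C_2}{2p\,C_1\,T}\Big)^{1/(2p+1)} \;\propto\; T^{-1/(2p+1)},
\qquad
\mathrm{IMSE}(B_T^{\ast}) \;\propto\; T^{-2p/(2p+1)}.
```

As $p$ grows, the exponent $2p/(2p+1)$ approaches 1, which is the sense in which higher-order kernels yield rates arbitrarily close to the parametric rate.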
3 Alternate estimates and flattop kernel choice
3.1 Alternate estimates
The spectral density kernel estimator considered in the previous section has the form of a weighted average of periodogram ordinates. In fact, the weight function $W^{(T)}_\lambda$ in the estimate (8) has the alternate form
(12) $W^{(T)}_\lambda(x)\;=\;\dfrac{1}{2\pi}\displaystyle\sum_{s\in\mathbb{Z}}\lambda(B_Ts)\,e^{-isx}.$
The equivalence of (9) and (12) can be easily verified using the Poisson summation formula. Moreover, if the discrete average in (8) is replaced by a continuous one, the estimate becomes
(13) $\tilde f^{(T)}_{\lambda,\omega}(\tau,\sigma)\;=\;\displaystyle\int_{-\pi}^{\pi}W^{(T)}_\lambda(\omega-\alpha)\,p^{(T)}_\alpha(\tau,\sigma)\,d\alpha.$
Recall that
(14) $\hat r_t(\tau,\sigma)\;=\;\dfrac{1}{T}\displaystyle\sum_{s}X_{s+t}(\tau)\,X_s(\sigma),$
where the sum extends over all $s$ such that both $s$ and $s+t$ lie in $\{0,\dots,T-1\}$, is the sample autocovariance kernel. If this is substituted into (13), then the estimate takes the lag-window form
(15) $\tilde f^{(T)}_{\lambda,\omega}(\tau,\sigma)\;=\;\dfrac{1}{2\pi}\displaystyle\sum_{|t|<T}\lambda(B_Tt)\,\hat r_t(\tau,\sigma)\,e^{-i\omega t}.$
With a flat-top function $\lambda$ in place, the estimate (15) bears a formal resemblance to the flat-top estimator of the multivariate spectrum explored in Politis (2011) [25], with bandwidth parameter $M=1/B_T$. That estimator has been shown to achieve higher-order accuracy in estimating the spectral density matrix; see Politis (2011) [25] for details. In fact, the estimate (15) is of the general form of spectral estimates that were extensively investigated by prominent statistical researchers as early as the 1950s and 1960s. See, e.g., Grenander (1951) [9], Parzen (1957) [21] and Priestley (1962) [29].
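To illustrate the lag-window form (15) concretely, here is a small numerical sketch for curves discretized on a grid, using a trapezoidal flat-top function (with c = 0.5) as the lag window; all names and parameter values are our own illustrative choices, not the paper's:

```python
import numpy as np

def trapezoid(s, c=0.5):
    """Trapezoidal flat-top function, cf. (17): 1 on [-c, c], linear to 0 at |s| = 1."""
    return float(np.clip((1.0 - abs(s)) / (1.0 - c), 0.0, 1.0))

def sample_autocov(X, t):
    """Sample autocovariance kernel at lag t >= 0, cf. (14), as an (m, m) matrix."""
    T = X.shape[0]
    return X[t:].T @ X[:T - t] / T

def flattop_lagwindow(X, omega, B):
    """Flat-top lag-window estimate of the spectral density kernel, cf. (15)."""
    T, m = X.shape
    f = np.zeros((m, m), dtype=complex)
    for t in range(-(T - 1), T):
        w = trapezoid(B * t)
        if w == 0.0:
            continue                      # lambda(B*t) vanishes beyond |B*t| >= 1
        r = sample_autocov(X, abs(t))
        if t < 0:
            r = r.T                       # r_{-t}(tau, sigma) = r_t(sigma, tau)
        f += w * r * np.exp(-1j * omega * t)
    return f / (2 * np.pi)
```

For a white-noise functional series the estimate is approximately constant in the frequency; the Hermitian symmetry of the output follows from the transpose relation between lags t and -t.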
3.2 Flattop kernel choice
As suggested by Theorem 2.1, in order to achieve favorable asymptotic rates it is desirable to choose a flat-top kernel of high order, and hence, according to Proposition 2.1, a flat-top function as smooth as possible. McMurry and Politis (2004) [18] constructed a member of the flat-top family that is infinitely differentiable, defined as
(16) $\lambda(s)\;=\;\begin{cases}1, & |s|\le c,\\ \exp\!\big(-\,b\,\exp\!\big(-\,b/(|s|-c)^2\big)\,\big/\,(|s|-1)^2\big), & c<|s|<1,\\ 0, & |s|\ge1,\end{cases}$
where $c$ determines the region over which $\lambda$ is identically 1, and $b$ is a shape parameter, making the transition from 1 to 0 more or less abrupt.
The exponential factor connects the region where $\lambda$ is 0 and the region where it is 1 in such a manner that $\lambda$ is infinitely differentiable for all $s$, including the points $|s|=c$ and $|s|=1$. The resulting kernel $\Lambda$ is of infinite order in the sense that $\Lambda(x)$ decays faster than $|x|^{-q}$, for every finite $q>0$, as $|x|\to\infty$. Fig. 1 shows plots of the infinitely differentiable flat-top function, the resulting kernel $\Lambda$, and the corresponding weight function $W^{(T)}_\lambda$. Note that the plot of $W^{(T)}_\lambda$ can be created from either Equation (9) or (12), given their equivalence as stated in Section 3.1.
Nevertheless, while the effectiveness of flat-top kernels is reflected in Theorems 2.1 and 2.3, these results provide merely theoretical bounds on the decay rates of the bias and the IMSE. At the same time, according to Theorem 2.1, the bias reduction of the flat-top estimator could potentially be limited by the order of the cumulant condition, which indicates that an infinite-order kernel might not be necessary. That leads us to consider other choices within the flat-top family. One simple representative flat-top function has the trapezoidal shape defined as
(17) $\lambda(s)\;=\;\begin{cases}1, & |s|\le c,\\ \dfrac{1-|s|}{1-c}, & c<|s|<1,\\ 0, & |s|\ge1.\end{cases}$
The trapezoidal $\lambda$ is continuous everywhere, and it already exhibits good performance when implemented for the estimation of the spectral density matrix; see Politis (2011) [25]. The infinitely differentiable function (16) looks very much like the trapezoid with ultra-smoothed corners.
Another choice to be considered is the flat-top function created by adding a piecewise cubic tail, similar to that of Parzen's (1961) [22] kernel, to the flat-top region. It is defined as
(18) 
Plots of the flat-top functions (17) and (18) are shown in Figure 2. Concerning the choice of the parameters of the flat-top kernels, i.e., $c$ and $b$, we refer the reader to Politis (2011) [25], where a detailed discussion is given.
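For concreteness, the infinitely differentiable flat-top function (16) can be coded directly; the sketch below is ours, and the default parameter values are illustrative only:

```python
import numpy as np

def flattop_smooth(s, c=0.05, b=1.0):
    """Infinitely differentiable flat-top function of McMurry and Politis (2004),
    cf. Eq. (16): identically 1 on [-c, c], a smooth transition for c < |s| < 1,
    and 0 for |s| >= 1."""
    s = np.atleast_1d(np.abs(np.asarray(s, dtype=float)))
    out = np.zeros_like(s)
    out[s <= c] = 1.0
    mid = (s > c) & (s < 1.0)
    sm = s[mid]
    out[mid] = np.exp(-b * np.exp(-b / (sm - c) ** 2) / (sm - 1.0) ** 2)
    return out
```

The inner exponential tends to 0 as |s| decreases to c (so the transition term tends to 1), and the outer exponent blows up as |s| increases to 1 (so the function tends to 0), which is what makes both joins infinitely smooth.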
4 Positive semidefinite spectral estimation
By employing infinite-order flat-top kernels, the flat-top estimator is capable of achieving higher-order accuracy with improved estimation bias. The disadvantage of flat-top kernels, however, is that they are not positive semidefinite. As a result, the operator estimator is not almost surely positive semidefinite for finite samples, even though it converges to a positive semidefinite operator.
Positive semidefiniteness of the estimator is desirable, especially in the case when the object is estimation of a long-run covariance operator. In the context of finite-dimensional time series analysis, spectral density matrix estimators can easily be adjusted to be positive semidefinite by replacing negative eigenvalues with zeros in the diagonalization of the estimated matrices; see, e.g., Politis (2011) [25]. Analogously, we now show how the flat-top operator estimator can be modified to render a positive semidefinite estimator while preserving the asymptotic consistency.
The spectral decomposition of operators in an infinite-dimensional Hilbert space is much more intricate than in the finite-dimensional context. However, recall that both operators $\mathcal{F}_\omega$ and $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}$ are induced by kernel functions through right integration, and therefore they are symmetric Hilbert–Schmidt operators that admit the following decompositions:
(19) $\mathcal{F}_\omega\;=\;\displaystyle\sum_{n=1}^{\infty}\mu_n\,\varphi_n\otimes\varphi_n,$
(20) $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}\;=\;\displaystyle\sum_{n=1}^{\infty}\hat\mu_n\,\hat\varphi_n\otimes\hat\varphi_n,$
where $\{\mu_n\}$ and $\{\hat\mu_n\}$ are two sequences of real numbers tending to zero, and $\{\varphi_n\}$ and $\{\hat\varphi_n\}$ are two orthonormal bases of $L^2([0,1],\mathbb{R})$. We have, for $n\ge1$, $\mathcal{F}_\omega\varphi_n=\mu_n\varphi_n$ and $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}\hat\varphi_n=\hat\mu_n\hat\varphi_n$;
thus $\{(\mu_n,\varphi_n)\}_{n\ge1}$ and $\{(\hat\mu_n,\hat\varphi_n)\}_{n\ge1}$ are complete sequences of eigenelements of $\mathcal{F}_\omega$ and $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}$, respectively.
Note that the eigenvalues $\mu_n$ are all nonnegative, since the operator $\mathcal{F}_\omega$ is positive semidefinite. To fix the possible negativity of the $\hat\mu_n$, let $\hat\mu_n^{+}=\max(\hat\mu_n,0)$ for all $n$, and define the estimator
(21) $\hat{\mathcal{F}}^{+}_\omega\;=\;\displaystyle\sum_{n=1}^{\infty}\hat\mu_n^{+}\,\hat\varphi_n\otimes\hat\varphi_n.$
We keep the nonnegative eigenvalues of $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}$ and replace the negative eigenvalues by zero, which makes the resulting operator a positive semidefinite estimator. The connection between $\hat{\mathcal{F}}^{+}_\omega$ and $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}$ is shown in the following inequality:
Proposition 4.1.
Let $\hat{\mathcal{F}}^{+}_\omega$ be the positive semidefinite operator estimator of $\mathcal{F}_\omega$ defined in (21); then, for a fixed $\omega$,
(22) $\big\|\hat{\mathcal{F}}^{+}_\omega-\mathcal{F}_\omega\big\|_{\mathrm{HS}}\;\le\;2\,\big\|\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}-\mathcal{F}_\omega\big\|_{\mathrm{HS}},$
where $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert–Schmidt norm.
Proof.
See Appendix. ∎
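In the finite-dimensional (matrix) analogue of (21), the eigenvalue truncation is a one-liner; the following sketch is our illustration of the idea for a matrix discretization of a spectral estimate (the function name is ours):

```python
import numpy as np

def psd_correct(F):
    """Eigenvalue truncation as in (21), for a matrix discretization:
    symmetrize, diagonalize, and set negative eigenvalues to zero."""
    F = (F + F.conj().T) / 2.0          # enforce exact Hermitian symmetry
    vals, vecs = np.linalg.eigh(F)
    vals = np.clip(vals, 0.0, None)     # mu_n^+ = max(mu_n, 0)
    return (vecs * vals) @ vecs.conj().T
```

The result is the nearest positive semidefinite matrix in Frobenius norm, mirroring the Hilbert–Schmidt bound of Proposition 4.1.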
A direct consequence of the last result is the following theorem, which shows that, in addition to being positive semidefinite, $\hat{\mathcal{F}}^{+}_\omega$ possesses the same mean square convergence as $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}$, given in Theorem 2.3.
Theorem 4.1.
Under the conditions of Theorem 2.3, the positive semidefinite spectral density operator estimator employing a flat-top kernel of order $p$ is consistent in integrated mean square, with
$\displaystyle\int_{-\pi}^{\pi}\mathbb{E}\,\big\|\hat{\mathcal{F}}^{+}_\omega-\mathcal{F}_\omega\big\|_{\mathrm{HS}}^{2}\,d\omega\;\longrightarrow\;0,$
where $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert–Schmidt norm.
In the case that the estimand is not only positive semidefinite but strictly positive definite, it is desirable to have a strictly positive definite estimator of $\mathcal{F}_\omega$. A similar modification of $\hat{\mathcal{F}}^{(T)}_{\lambda,\omega}$ can be applied to make the estimator strictly positive definite. Let $\hat\mu_n^{\epsilon}=\max(\hat\mu_n,\epsilon_n)$ for all $n$, where $\{\epsilon_n\}$ is some chosen sequence of positive numbers, and define the estimator
(23) $\hat{\mathcal{F}}^{\epsilon}_\omega\;=\;\displaystyle\sum_{n=1}^{\infty}\hat\mu_n^{\epsilon}\,\hat\varphi_n\otimes\hat\varphi_n.$
The estimator $\hat{\mathcal{F}}^{\epsilon}_\omega$ is strictly positive definite, and it can be verified that it maintains the higher-order accuracy of the flat-top estimator provided the sequence $\{\epsilon_n\}$ decays to zero sufficiently fast. Thus, $\hat{\mathcal{F}}^{\epsilon}_\omega$ is a higher-order accurate, strictly positive definite estimator.
5 Datadependent bandwidth choice
As demonstrated in Section 3.1, the lag-window estimate (15) with bandwidth $B_T$ is nearly equal to the estimate (8) with the same bandwidth; we therefore propose here an empirical rule for choosing the bandwidth in practice, which resembles the bandwidth selection rule for the flat-top lag-window estimator introduced in Politis (2011) [25].
Recall the sample autocovariance kernel $\hat r_t(\tau,\sigma)$ defined in (14). The proposed bandwidth choice rule proceeds by a simple inspection of the functional version of the correlogram/cross-correlogram, i.e., a plot of $\hat\rho_t(\tau,\sigma)$ vs. $t$, where
$\hat\rho_t(\tau,\sigma)\;=\;\dfrac{\hat r_t(\tau,\sigma)}{\sqrt{\hat r_0(\tau,\tau)\,\hat r_0(\sigma,\sigma)}}$
for all $\tau,\sigma\in[0,1]$.
We look for a point, say $\hat q$, after which the correlogram for each pair $(\tau,\sigma)$ appears negligible, i.e., $\hat\rho_t(\tau,\sigma)\approx0$ for $|t|>\hat q$ and all $(\tau,\sigma)$. Here $\approx0$ is taken to mean that $\hat\rho_t(\tau,\sigma)$ is not significantly different from 0. In practice, we determine $\hat q$ by considering the correlogram over a finite grid of pairs $(\tau,\sigma)$. After identifying $\hat q$, the recommendation is to take
(24) $\hat B_T\;=\;c/\hat q,$
where $c$ is the parameter determining the ‘flat-top’ region of $\lambda$.
From the flat-top lag-window perspective, the intuition behind the above bandwidth choice rule is an effort to extend the ‘flat-top’ region of the lag window over the whole of the region where $\hat r_t$ is thought to be significant, so as not to downweight it and introduce bias. As scrutinized in Politis (2011) [25], the effectively flat region of $\lambda$ can be greater than $[-c,c]$, depending on the choice of the function $g$. The rate of decrease of $\lambda$ near $c$ could be slow enough that $\lambda(s)\approx1$ over an interval much greater than $[-c,c]$; see, for example, (16) and Figure 1(a) regarding the infinitely differentiable $\lambda$. Instead of the interval $[-c,c]$, we therefore consider an ‘effective’ flat-top region of $\lambda$, defined as the interval $[-c_{\mathrm{eff}},c_{\mathrm{eff}}]$, where $c_{\mathrm{eff}}$ is the largest number such that $\lambda(s)\ge1-\epsilon$ for all $s$ in $[-c_{\mathrm{eff}},c_{\mathrm{eff}}]$; here $\epsilon$ is some small chosen number.
Let $\{(\tau_i,\sigma_j)\}$ be a finite grid of points in $[0,1]^2$. Now we can formalize the empirical rule for choosing the bandwidth $\hat B_T$.
EMPIRICAL RULE FOR CHOOSING BANDWIDTH $\hat B_T$.
For each pair $(\tau_i,\sigma_j)$ on the grid, let $\hat q_{ij}$ be the smallest nonnegative integer such that $|\hat\rho_t(\tau_i,\sigma_j)|<C_0\sqrt{\log_{10}T/T}$ for $t=\hat q_{ij}+1,\dots,\hat q_{ij}+K_T$, where $C_0>0$ is a fixed constant and $K_T$ is a positive, nondecreasing, integer-valued function of $T$ such that $K_T=o(\log T)$. Then let $\hat q=\max_{i,j}\hat q_{ij}$, and $\hat B_T=c_{\mathrm{eff}}/\hat q$.
The constant $C_0$ and the form of $K_T$ are the practitioner's choice. Politis (2003) [24] makes the concrete recommendations $C_0\approx2$ and $K_T=\max(5,\sqrt{\log_{10}T})$, which have the interpretation of yielding (approximately) 95% simultaneous confidence intervals for the autocorrelations under examination, by Bonferroni's inequality.
It is also worth noting that, by considering the correlogram over the finite grid, we actually generate a matrix of thresholds $\{\hat q_{ij}\}$, and $\hat q$ is picked as the maximum among the entries of this matrix, i.e., $\hat q=\max_{i,j}\hat q_{ij}$. This rule for identifying $\hat q$ can be blemished in situations where certain $\hat q_{ij}$ are radically greater than the others. Picking $\hat q$ to be the average of the matrix entries will be a more reasonable choice when such a special case arises. Nevertheless, if the target is to estimate the spectral density kernel for a particular pair $(\tau_i,\sigma_j)$, one can always choose the bandwidth using the corresponding $\hat q_{ij}$, i.e., $\hat B_T=c_{\mathrm{eff}}/\hat q_{ij}$.
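A minimal sketch of the empirical rule follows, assuming the correlograms have already been computed on the grid; the threshold and the defaults C0 = 2 and K_T = max(5, sqrt(log10 T)) follow the Politis (2003)-style recommendations quoted above, and all function names are ours:

```python
import numpy as np

def empirical_q(rho, T, C0=2.0):
    """Smallest q such that the next K_T sample autocorrelations all fall
    below the threshold C0 * sqrt(log10(T) / T).

    rho : 1-D array of sample autocorrelations, rho[0] = 1."""
    K = int(max(5, np.sqrt(np.log10(T))))
    thresh = C0 * np.sqrt(np.log10(T) / T)
    for q in range(len(rho) - K):
        if np.all(np.abs(rho[q + 1:q + K + 1]) < thresh):
            return q
    return len(rho) - K - 1        # fallback: largest lag that can be checked

def bandwidth_from_grid(rho_grid, T, c_eff=0.5, C0=2.0):
    """Bandwidth B_T = c_eff / q_hat, cf. (24), with q_hat the maximum of
    the per-pair lags over a grid of correlograms (rows of rho_grid)."""
    q_hat = max(empirical_q(r, T, C0) for r in rho_grid)
    return c_eff / max(q_hat, 1)   # guard against q_hat = 0
```

Replacing the `max` over the grid by an average implements the alternative rule suggested above for the case of a few radically larger per-pair lags.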
6 Simulations
We now present some numerical simulations to complement our asymptotic results. The main goal of the simulations is to compare the performance of the estimators employing flat-top kernels with that of a non-flat-top estimator, as well as to illustrate the main issues discussed in the paper. The simulations are performed on a simple functional moving average model:
(25) 
The simulations we carry out are analogous to those conducted in Panaretos and Tavakoli (2013) [19]. The innovation functions $\varepsilon_t$ are independent Wiener processes on $[0,1]$, represented using a truncated Karhunen–Loève expansion, in which the coefficients are independent standard Gaussian random variables and the basis functions form an orthonormal system in $L^2([0,1],\mathbb{R})$; see Adler (1990) [1]. The moving average operators are constructed so that their images are contained within a 50-dimensional subspace of $L^2([0,1],\mathbb{R})$, spanned by an orthonormal basis. Representing the innovations and the operators in this basis, we obtain a matrix representation of the process, in which the innovations become random coefficient vectors and the operators become matrices. A stretch of the series is generated for each sample size $T$ considered. The operator matrices are constructed as random Gaussian matrices with independent entries whose distribution depends on the row index.
For the simulation study, a large number of replications is generated for each sample size $T$; these are used to compute the IMSE by approximating the integral by a weighted sum over a finite grid. We consider the estimators with the proposed flat-top kernels and compare them with the Epanechnikov kernel, $W(x)=\tfrac34(1-x^2)\,\mathbf{1}\{|x|\le1\}$, the non-flat-top kernel implemented in the simulations of Panaretos and Tavakoli (2013) [19]. We apply the same deterministic bandwidth to the estimator for each kernel. In addition, the bandwidths of the estimators employing flat-top kernels are also selected using the empirical rule proposed in Section 5.
The simulation results are presented in Table 1, the entries of which are logarithms of the IMSE to base 2. As expected, the estimators employing flat-top kernels show a faster decay of the IMSE compared to the estimator with the non-flat-top Epanechnikov kernel. The performances of the flat-top Parzen-type kernel and the flat-top infinitely differentiable kernel are close, while each slightly outperforms the trapezoid as the sample size grows. This might be due to the fact that the smoothness of the flat-top function is indeed a factor in the decay of the IMSE, but over-smoothing might not be necessary, as the performance could potentially be limited by the order of the cumulant conditions, as Theorem 2.3 suggests. Also note that implementing the empirical rule of bandwidth choice yields a slight improvement as the sample size grows.



7 Appendix: Proofs
7.1 Proof of Proposition 2.1
To prove Proposition 2.1, we need the following lemma, which measures the magnitude of Fourier coefficients.
Lemma 7.1.
Let $f$ be an integrable function on the interval $[-\pi,\pi]$, and let $\{c_k\}$ be its Fourier coefficients, defined by
(26) $c_k\;=\;\dfrac{1}{2\pi}\displaystyle\int_{-\pi}^{\pi}f(x)\,e^{-ikx}\,dx.$
If $f$ is $p$ times differentiable and $f^{(p)}$ is Hölder continuous of order $\alpha$, then $|c_k|\le C\,|k|^{-(p+\alpha)}$ for some constant $C$ not depending on $k$.
Proof.
Proof of Proposition 2.1. For a flat-top function $\lambda$ and its inverse Fourier transform $\Lambda$, we have
(30) $\displaystyle\int_{\mathbb{R}}x^{k}\,\Lambda(x)\,dx\;=\;i^{-k}\,\lambda^{(k)}(0),$
(31) $\lambda^{(k)}(0)\;=\;0,\qquad k\ge1,$
since $\lambda$ is identically 1 in a neighborhood of the origin. Since, by assumption, $\lambda$ is $p$ times differentiable and $\lambda^{(p)}$ is Hölder continuous of order $\alpha$, Lemma 7.1 yields $|\Lambda(x)|\le C\,|x|^{-(p+\alpha)}$ for some constant $C$, which implies that $\Lambda$ has finite moments up to order $p$.
7.2 Proof of Theorem 2.1
To prove Theorem 2.1, the following lemmas are necessary.
Lemma 7.2.
We have the following properties for the flat-top kernel $\Lambda$ and the associated weight function:
(i)
(ii)
Let $V(f)$ denote the total variation of a function $f$.
(iii) If , ;
(iv) If ,