With this paper, we aim to contribute to the growing literature on general white noise tests for functional time series. The scope of methodology for analyzing time series taking values in a function space has grown substantially over the last two decades; see Bosq (2000) for seminal work on linear processes in function space, and Hörmann and Kokoszka (2012), as well as Chapters 13–16 of Horváth and Kokoszka (2012), for more recent reviews. Most methods are still founded on non-parametric and non-likelihood based approaches that rely on the estimation of autocovariance operators to quantify serial dependence in the sequence; see, e.g., Kargin and Onatski (2008); Klepsch and Klüppelberg (2017); Panaretos and Tavakoli (2013); Zhang and Shao (2015). It is then often of interest to test whether an observed sequence of curves, or a sequence of model residuals, exhibits significant autocorrelation.
A large number of methods have been introduced to test for serial correlation in the setting of functional time series. These methods build upon a well-developed literature on portmanteau tests for scalar and multivariate time series, see e.g. Li (2004) and Chapter 5 of Francq and Zakoïan (2010) for a summary, and can be grouped along the classical time series dichotomy of time domain and spectral domain approaches. In the time domain, Gabrys and Kokoszka (2007)
proposed a method based on applying a multivariate portmanteau test to vectors of scores obtained from projecting the functional time series into a few principal directions. Horváth et al. (2013) and Kokoszka et al. (2017) develop portmanteau tests based directly on the norms of empirical autocovariance operators. In the spectral domain, Zhang (2016) and Bagchi et al. (2018) develop test statistics based on measuring the difference between periodogram operators and the spectral density operator under the assumption that the sequence is a weak white noise. Among these, the tests of Horváth et al. (2013), Zhang (2016) and Bagchi et al. (2018) are general white noise tests in the sense that they are asymptotically consistent against serial correlation at any lag, whereas the other tests cited utilize only the autocovariance information up to a user-selected maximum lag, and do not have more than trivial power for serial correlation occurring beyond that lag.
All of these statistics are intimately connected in that they reduce to weighted sums of semi-norms of empirical autocovariance operators, and hence it is natural to consider alternate ways of choosing the weights in order to increase power; see Escanciano and Lobato (2009) for a nice discussion of these connections for scalar portmanteau tests. The statistics developed in Zhang (2016) and Bagchi et al. (2018), which are based on the periodogram operator, effectively give equal weights to the autocovariance operators at all lags. One might expect then that reweighting the periodogram so that it is a more efficient estimate of the spectral density operator might lead to more powerful tests. A sensible and well studied method of choosing these weights is to employ a kernel and lag-window or bandwidth, as is common in nonparametric spectral density operator estimation. Panaretos and Tavakoli (2013) were the first to put forward similar estimators of the spectral density operator based on smoothing the periodogram operator for function space valued time series. Hong (1996) developed general white noise tests for scalar sequences based on this principle by comparing kernel lag-window estimates of the spectral density to the constant spectral density of a weak white noise, which were further studied and extended in Shao (2011b) and Shao (2011a).
In this paper, we develop a general white noise test based on kernel lag-window estimators of the spectral density operator of a time series with values in a separable Hilbert space $H$. Under standard conditions on the kernel and bandwidth used, we show that the estimated distance between the spectral density operator and the constant spectral density operator based on such estimators can be normalized to have an asymptotic standard normal distribution when the observed series is a strong white noise. We further show that this standardized distance diverges in probability to infinity at a quantifiable rate under general departures from a weak white noise.
These results compare to and generalize the main results of Horváth et al. (2013). Letting $T$ denote the length of the functional time series and $p$ denote the bandwidth parameter, the main result of Horváth et al. (2013) establishes the asymptotic normality of the unweighted sum of the norms of empirical autocovariance operators up to lag $p$, under comparatively restrictive rate conditions on $p$ as $T \to \infty$. Our main result establishes this under the less restrictive conditions $p \to \infty$ and $p/T \to 0$ on the bandwidth, while also allowing for a general class of weights in the sum. This condition on the bandwidth is optimal in the sense that without it the corresponding kernel lag-window estimator of the spectral density cannot generally be consistent in the mean squared norm sense. These improvements owe primarily to a new, more general method of proof relying on a martingale central limit theorem.
Implementing the test requires the choice of a kernel and bandwidth, and we suggest several methods for this, including a new data driven and kernel adaptive bandwidth. We also investigate power transformations of our test statistics that improve their size properties in finite samples. We evaluated these results and various choices of the kernel and bandwidth by means of a Monte Carlo simulation study, which confirms that the proposed tests have good size as well as power exceeding currently available tests, although with the drawback that they are not well sized for general weak white noise (e.g. functional conditionally heteroscedastic) series.
The rest of the paper is organized as follows. We present in Section 2 our main methodological contributions and theory, including all asymptotic results for the proposed test statistic and its power transformation. In Section 3, we discuss some details of implementing the proposed test, and present the results of a Monte Carlo simulation study. The practical utility of the proposed tests is illustrated in Section 4 with an application to Eurodollar futures curves. Some concluding remarks are given in Section 5, and all proofs as well as the definition of the data adaptive bandwidth are given in the Appendix following the references.
2 Statement of method and main results
We first define some notation that is used throughout the paper. Suppose that $H$ is a real separable Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\| = \langle \cdot, \cdot \rangle^{1/2}$. For $x, y \in H$, the tensor $x \otimes y$ of $x$ and $y$ is the rank one operator defined by $(x \otimes y)(z) = \langle z, y \rangle x$ for each $z \in H$. Let $\{X_t\}_{t \in \mathbb{Z}}$ be a second-order stationary sequence of $H$-valued random elements. Throughout we make use of the following moment conditions: $E\|X_0\|^2 < \infty$ and $E X_t = \mu$ for each $t \in \mathbb{Z}$.
The autocovariance operators of $\{X_t\}_{t \in \mathbb{Z}}$ are defined by $C_h = E[(X_0 - \mu) \otimes (X_h - \mu)]$ for $h \in \mathbb{Z}$.
The spectral density operator $\mathcal{F}_\omega$ with $\omega \in [-\pi, \pi]$ is the discrete-time Fourier transform of the sequence of autocovariance operators $\{C_h\}_{h \in \mathbb{Z}}$, defined by
$\mathcal{F}_\omega = \frac{1}{2\pi} \sum_{h \in \mathbb{Z}} C_h e^{-\mathrm{i} h \omega},$
where $\mathrm{i} = \sqrt{-1}$. $\mathcal{F}_\omega$ is well-defined for each $\omega \in [-\pi, \pi]$ provided that $\sum_{h \in \mathbb{Z}} \|C_h\|_{\mathrm{HS}} < \infty$, where $\|\cdot\|_{\mathrm{HS}}$ is the Hilbert–Schmidt norm. We say that $\{X_t\}_{t \in \mathbb{Z}}$ is a weak white noise if $C_h = 0$ for $h \neq 0$, and evidently in this case $\mathcal{F}_\omega = (2\pi)^{-1} C_0$ is constant as a function of $\omega$.
2.1 Definition of test statistic and null asymptotics
In order to measure the proximity of a given functional time series to a white noise, it is natural then to consider the distance $Q$, in terms of integrated normed error, between the spectral density operator $\mathcal{F}_\omega$, $\omega \in [-\pi, \pi]$, and $(2\pi)^{-1} C_0$:
$Q^2 = \int_{-\pi}^{\pi} \| \mathcal{F}_\omega - (2\pi)^{-1} C_0 \|_{\mathrm{HS}}^2 \, d\omega = \frac{1}{\pi} \sum_{h=1}^{\infty} \|C_h\|_{\mathrm{HS}}^2.$
$Q^2$ specifically measures how far the second-order structure of $\{X_t\}_{t \in \mathbb{Z}}$ deviates from that of a weak white noise. Given a sample $X_1, \ldots, X_T$ from $\{X_t\}_{t \in \mathbb{Z}}$, we are then interested in testing the hypothesis $H_0$ that the series is a strong white noise against the alternative $H_A$ that $Q^2 > 0$.
We discuss in Section 5 how one might adapt the proposed test statistic to test the weaker hypothesis that the series is a weak white noise, but we do not pursue this in detail here. One can estimate $Q^2$ via estimates of the autocovariance operators $C_h$. The sample autocovariance operators are defined by
$\hat{C}_h = \frac{1}{T} \sum_{t=1}^{T-h} (X_t - \bar{X}_T) \otimes (X_{t+h} - \bar{X}_T), \quad 0 \le h \le T - 1,$
where $\bar{X}_T = T^{-1} \sum_{t=1}^{T} X_t$, and $\hat{C}_h = \hat{C}_{-h}^{*}$ for $1 - T \le h < 0$, where $A^{*}$ denotes the adjoint of an operator $A$. The spectral density operator may be estimated using a kernel lag-window estimator defined by
$\hat{\mathcal{F}}_\omega = \frac{1}{2\pi} \sum_{|h| \le T-1} k(h/p)\, \hat{C}_h\, e^{-\mathrm{i} h \omega},$
where $k$ is a kernel and $p = p(T)$ is the lag-window or bandwidth parameter. We make the following assumptions on the kernel $k$ and the bandwidth $p$.
$k : \mathbb{R} \to [-1, 1]$ is a symmetric function that is continuous at zero and at all but a finite number of points, with $k(0) = 1$, $\int_{-\infty}^{\infty} k^2(x)\, dx < \infty$, and $k(x) = O(|x|^{-\varrho})$ for some $\varrho > 1/2$ as $|x| \to \infty$.
$p = p(T)$ satisfies $p \to \infty$ and $p/T \to 0$ as $T \to \infty$.
Assumption 2 covers all kernels typically used in the literature on spectral density estimation, and guarantees that integrals of the form $\int_{-\infty}^{\infty} k^{2j}(x)\, dx$ are finite for $j \ge 1$. The above estimates yield an estimate of $Q^2$ defined by
$\hat{Q}^2 = \int_{-\pi}^{\pi} \| \hat{\mathcal{F}}_\omega - (2\pi)^{-1} \hat{C}_0 \|_{\mathrm{HS}}^2 \, d\omega.$
Using the fact that $\|A\|_{\mathrm{HS}}^2 = \langle A, A \rangle_{\mathrm{HS}}$, where $\langle \cdot, \cdot \rangle_{\mathrm{HS}}$ is the Hilbert–Schmidt inner product, and the fact that the functions $\varphi_h$ defined by $\varphi_h(\omega) = (2\pi)^{-1/2} e^{-\mathrm{i} h \omega}$ for $h \in \mathbb{Z}$ and $\omega \in [-\pi, \pi]$ are orthonormal in $L^2([-\pi, \pi])$, we obtain
$\hat{Q}^2 = \frac{1}{2\pi} \sum_{0 < |h| \le T-1} k^2(h/p)\, \|\hat{C}_h\|_{\mathrm{HS}}^2.$
Further, since the kernel $k$ is symmetric and $\|\hat{C}_{-h}\|_{\mathrm{HS}} = \|\hat{C}_h\|_{\mathrm{HS}}$, we also have that
$\hat{Q}^2 = \frac{1}{\pi} \sum_{h=1}^{T-1} k^2(h/p)\, \|\hat{C}_h\|_{\mathrm{HS}}^2.$
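To make the reduction above concrete, the following sketch computes $\hat{Q}^2$ for curves discretized on a grid, taking $H = \mathbb{R}^d$ for simplicity so that the Hilbert–Schmidt norm becomes the Frobenius norm; the function names and the centering by the sample mean are our own choices.

```python
import numpy as np

def autocov(X, h):
    """Empirical lag-h autocovariance operator (a d x d matrix) of the
    discretized functional observations X, stored as a T x d array."""
    T = X.shape[0]
    Xc = X - X.mean(axis=0)
    return Xc[:T - h].T @ Xc[h:] / T

def q2_hat(X, kernel, p):
    """Weighted sum (1/pi) * sum_{h>=1} k^2(h/p) * ||C_hat_h||^2, i.e. the
    reduction of the integrated normed error to lag-domain quantities."""
    T = X.shape[0]
    total = 0.0
    for h in range(1, T):
        w = kernel(h / p) ** 2
        if w == 0.0:
            continue  # compactly supported kernels truncate the sum
        total += w * np.sum(autocov(X, h) ** 2)
    return total / np.pi
```

With a compactly supported kernel such as the Bartlett kernel, only lags $h \le p$ contribute to the sum.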
Remarkably, $\hat{Q}^2$ can be normalized under these general conditions so as to satisfy the central limit theorem under $H_0$, and we now proceed by defining the normalizing sequences and constants needed to do so. Let
$C_T(k) = \sum_{h=1}^{T-1} (1 - h/T)\, k^2(h/p) \quad \text{and} \quad D_T(k) = \sum_{h=1}^{T-2} (1 - h/T)(1 - (h+1)/T)\, k^4(h/p).$
Also, let us denote
We propose to use the test statistic defined by
We now state our main result, which establishes the asymptotic normality of the proposed test statistic under $H_0$.
Let us observe that
If $H = \mathbb{R}$, then it follows that, as $T \to \infty$, we recover precisely the test statistic proposed by Hong (1996) (see the statistic on page 840 of Hong (1996)). In a general real separable Hilbert space $H$, the corresponding normalizing quantity converges in probability to a limit which need not equal one. Intuitively, the covariance structure “inside” $H$ determines the value of this limit.
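For readers wanting to connect this to the scalar case, the following sketch computes a Hong (1996)-type standardized statistic for a univariate series; the centering sequence $C_T(k)$ and scaling sequence $D_T(k)$ are written here in their commonly cited truncated-sum form, which should be treated as an assumption of this sketch rather than a transcription from the paper.

```python
import numpy as np

def hong_statistic(x, kernel, p):
    """Standardized weighted sum of squared sample autocorrelations of a
    scalar series x, in the spirit of Hong (1996).  The constants C and D
    below are the usual centering and scaling sequences (an assumption of
    this sketch)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / T
    # sample autocorrelations rho(1), ..., rho(T - 1)
    rho = np.array([np.dot(xc[:T - j], xc[j:]) / T for j in range(1, T)]) / gamma0
    k2 = np.array([kernel(j / p) ** 2 for j in range(1, T)])
    j = np.arange(1, T)
    C = np.sum((1.0 - j / T) * k2)
    i = np.arange(1, T - 1)
    k4 = np.array([kernel(m / p) ** 4 for m in i])
    D = np.sum((1.0 - i / T) * (1.0 - (i + 1) / T) * k4)
    return (T * np.sum(k2 * rho ** 2) - C) / np.sqrt(2.0 * D)
```

Under the null of independent and identically distributed data, the statistic is approximately standard normal for large $T$.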
Taking the kernel to be the truncated kernel, $k(x) = \mathbf{1}\{|x| \le 1\}$, one has that
Additionally, if $p/T \to 0$, then the centering and scaling sequences are both asymptotically proportional to $p$ as $T \to \infty$. It follows then, under these and the conditions of Theorem 1, that the statistic is asymptotically equivalent to
which is identical to the statistic considered in Horváth et al. (2013). It was shown there that this statistic has a normal limit under $H_0$ under more restrictive rate assumptions on the bandwidth as $T \to \infty$. Therefore Theorem 1 can be viewed as a generalisation of their result.
2.2 Transformation of test statistic
As noted by Chen and Deo (2004), such statistics tend to be right skewed in finite samples, causing the test to be oversized. We confirm their findings in our simulation study in Section 3, as the test statistic suffers from the same problem in the general Hilbert space setting as well. In order to alleviate this, Chen and Deo (2004) suggest a power transformation of the test statistic of the form
for some exponent $\beta \neq 0$. It follows from Theorem 1 and an application of the delta method that the transformed statistic is also asymptotically standard normal under $H_0$.
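The effect of such a power transformation is easy to illustrate numerically: positive statistics that are approximately scaled averages of $\chi^2$ variables are right skewed, and a suitable power shrinks the skewness. The exponent $\beta = 1/3$ used below is the classical Wilson–Hilferty choice for chi-squared variables, adopted purely for illustration; the paper's $\beta$ is instead chosen to zero out the approximate skewness.

```python
import numpy as np

def skewness(v):
    """Sample skewness of a one-dimensional array."""
    v = np.asarray(v, dtype=float)
    c = v - v.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5

def power_transform(stat, beta):
    """Apply the power transformation s -> s^beta to a positive statistic."""
    return np.asarray(stat, dtype=float) ** beta
```

Applying `power_transform` with $\beta = 1/3$ to averages of $\chi^2_1$ draws markedly reduces their sample skewness.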
Chen and Deo (2004) then recommend choosing the value of $\beta$ that makes the approximate skewness of the transformed statistic equal to zero. In order to describe how to achieve this in our setting, let us suppose that $Z$ is a Gaussian random element with values in the space $\mathcal{S}$ of Hilbert–Schmidt operators from $H$ to $H$ such that $EZ = 0$ and the covariance operator of $Z$ is given by $C_0 \otimes C_0$, where $\otimes$ here denotes the tensor of two elements of $\mathcal{S}$. As a result of the central limit theorem applied to the estimated autocovariance operators $\hat{C}_h$,
$\hat{Q}^2$ is approximately a weighted sum of independent and identically distributed random variables with the distribution of $\|Z\|_{\mathrm{HS}}^2$. From this we can obtain the approximate skewness of the statistic using Taylor's theorem (see Chen and Deo (2004) for more details). The value of $\beta$ which makes the approximate skewness equal to zero is given by
where the constants in this expression are determined by the first three cumulants of $\|Z\|_{\mathrm{HS}}^2$.
If $H = \mathbb{R}$, then $\|Z\|^2$ is distributed as a scaled $\chi^2$ random variable, and $\beta$ is available in closed form. Consequently, in the univariate case $\beta$ only depends on the sample size $T$, the kernel $k$ and the bandwidth $p$. The choice of $\beta$ even in the general setting can be motivated by using a Welch–Satterthwaite style approximation of the norm of a Gaussian process with a scaled chi-squared distribution; see for instance Zhang (2013) and Krishnamoorthy (2016). However, in the general Hilbert space setting the value of $\beta$ depends on the data and should be estimated. Let us suppose that
$\{e_j\}_{j \ge 1}$ are the eigenvectors of $C_0$, with $\{\lambda_j\}_{j \ge 1}$ the corresponding eigenvalues. We have that
where the $\xi_j$'s are independent and identically distributed standard normal random variables. It follows by calculating the cumulant generating function of $\|Z\|_{\mathrm{HS}}^2$ that
and we can estimate the required cumulants by estimating $C_0$ using the estimator given by (3). The plug-in estimators of these cumulants are then obtained by replacing the eigenvalues $\lambda_j$ with the eigenvalues of $\hat{C}_0$.
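The Welch–Satterthwaite style approximation mentioned above can be sketched as follows: a weighted sum $\sum_i \lambda_i \chi^2_1$ is matched in mean and variance by a scaled chi-squared variable $g\,\chi^2_\nu$; in practice the weights would be the estimated eigenvalues of the covariance operator.

```python
import numpy as np

def ws_approx(eigvals):
    """Match the mean and variance of sum_i lam_i * chi2_1 with g * chi2_nu:
    mean = g * nu = sum(lam), variance = 2 * g^2 * nu = 2 * sum(lam^2)."""
    lam = np.asarray(eigvals, dtype=float)
    s1 = lam.sum()
    s2 = np.sum(lam ** 2)
    g = s2 / s1          # scale factor
    nu = s1 ** 2 / s2    # effective degrees of freedom
    return g, nu
```

When all weights are equal the approximation is exact, recovering an unscaled chi-squared distribution with the correct degrees of freedom.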
These results motivate two tests of $H_0$, each with asymptotic size $\alpha$: reject if the standardized statistic, or its power-transformed version, exceeds $z_{1-\alpha}$, the $1-\alpha$ quantile of the standard normal distribution. The finite sample properties of each of these tests for various choices of the bandwidth and kernel, as well as a comparison of these tests to existing methods, are presented in Section 3.
We now establish the consistency of our test under $H_A$. In order to describe time series that are generally weakly dependent, we follow Tavakoli (2014) and introduce cumulant summability conditions. Suppose that $\{X_t\}_{t \in \mathbb{Z}}$ is a sequence of $H$-valued random elements. We say that $\{X_t\}_{t \in \mathbb{Z}}$ is $k$-th order stationary if $E\|X_0\|^k < \infty$ and its joint moments up to order $k$ are invariant under shifts in time.
The joint cumulant of real or complex-valued random variables $Y_1, \ldots, Y_k$ with $E|Y_j|^k < \infty$ for each $1 \le j \le k$ is given by
$\mathrm{cum}(Y_1, \ldots, Y_k) = \sum_{\nu} (-1)^{p(\nu)-1} (p(\nu)-1)! \prod_{b \in \nu} E\Big[\prod_{j \in b} Y_j\Big],$
where $\nu$ runs through the list of all partitions of $\{1, \ldots, k\}$, $b$ runs through the list of all blocks of the partition $\nu$, and $p(\nu)$ is the number of parts in the partition; see Brillinger (2001) for more details. Suppose that $Y_1, \ldots, Y_k$ are $H$-valued random variables such that $E\|Y_j\|^k < \infty$ for each $1 \le j \le k$. The $k$-th order cumulant $\mathrm{cum}(Y_1, \ldots, Y_k)$ is the unique element of the $k$-fold tensor product space $H \otimes \cdots \otimes H$ that satisfies
$\langle \mathrm{cum}(Y_1, \ldots, Y_k),\, x_1 \otimes \cdots \otimes x_k \rangle = \mathrm{cum}(\langle Y_1, x_1 \rangle, \ldots, \langle Y_k, x_k \rangle)$
for all $x_1, \ldots, x_k \in H$. Suppose that $\{X_t\}_{t \in \mathbb{Z}}$ is $k$-th order stationary, and let us denote
for integer lags $h_1, h_2, h_3$. The fourth order cumulant may then be identified with an operator between tensor product spaces of $H$. The following assumption describes the allowable strength of the serial dependence of the $X_t$'s.
$\{X_t\}_{t \in \mathbb{Z}}$ is a fourth order stationary sequence of zero mean random elements with values in a real separable Hilbert space $H$ such that $E\|X_0\|^4 < \infty$ and $\sum_{h_1, h_2, h_3 \in \mathbb{Z}} \|\mathrm{cum}(X_0, X_{h_1}, X_{h_2}, X_{h_3})\|_{\mathcal{N}} < \infty$, where $\|\cdot\|_{\mathcal{N}}$ is the nuclear norm.
Assumption 4 is essentially a generalisation of the assumption used by Hong (1996, Assumption A.4) to a real separable Hilbert space $H$. A similar assumption is used by Zhang (2016, Assumption 3.1). The next theorem establishes the consistency of our test, showing that, under the alternative hypothesis, the statistic diverges to infinity at a quantifiable rate as $T \to \infty$.
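The partition formula for joint cumulants used above is purely combinatorial, and can be sketched directly from a mixed-moment oracle; `moment(b)` below stands in for $E[\prod_{j \in b} Y_j]$ and is supplied by the caller.

```python
import math

def set_partitions(items):
    """Generate all partitions of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):           # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part               # or into a block of its own

def joint_cumulant(moment, k):
    """cum(Y_1, ..., Y_k) = sum over partitions nu of {1, ..., k} of
    (-1)^(p-1) * (p-1)! * prod over blocks b in nu of moment(b)."""
    total = 0.0
    for part in set_partitions(list(range(1, k + 1))):
        p = len(part)
        prod = 1.0
        for b in part:
            prod *= moment(tuple(sorted(b)))
        total += (-1) ** (p - 1) * math.factorial(p - 1) * prod
    return total
```

For $k = 2$ this reduces to the covariance $E[Y_1 Y_2] - E[Y_1] E[Y_2]$.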
3 Simulation study
We now present the results of a simulation study which aimed to evaluate the performance of the proposed tests in finite samples, and to compare the proposed method to some other approaches available in the literature. For this we take the Hilbert space $H$ to be $L^2([0,1])$, i.e. the space of equivalence classes of almost everywhere equal square-integrable real-valued functions defined on $[0,1]$. In particular, we applied the proposed method to simulated samples from the following data generating processes (DGP's):
IID-BM: $X_t = W_t$, where $\{W_t\}_{t \in \mathbb{Z}}$ is a sequence of independent and identically distributed standard Brownian motions.
fGARCH(1,1): $X_t = \sigma_t \varepsilon_t$, where the conditional variance process $\sigma_t^2$ follows a functional GARCH(1,1) recursion in which $\alpha$ and $\beta$ are integral operators, the intercept $\delta$ is a constant function, and $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ are independent and identically distributed Brownian bridges.
FAR(1, $\kappa$)-BM: $X_t = \Psi(X_{t-1}) + \varepsilon_t$, where $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ follows IID-BM and $\Psi$ is an integral operator. The scaling constant of $\Psi$ is then chosen so that $\|\Psi\|_{\mathrm{HS}} = \kappa$.
Evidently the data generated from IID-BM satisfies both the strong and weak white noise hypotheses. The functional GARCH process that we study here was introduced in Aue et al. (2017), and the particular settings of the operators and error process are meant to imitate high-frequency intraday returns. This process is a weak white noise but not a strong white noise. The FAR(1, $\kappa$)-BM process satisfies neither, and we study in particular the case $\kappa = 0.3$ in order to compare to the results in Zhang (2016). Each random function was generated on 100 equally spaced points on the unit interval, and for the DGP's fGARCH(1,1) and FAR(1, $\kappa$)-BM a burn-in sample of length 30 was generated prior to generating a sample of length $T$.
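The simpler DGPs above can be simulated on a grid along the following lines. The kernel of the autoregressive operator $\Psi$ below is a hypothetical Gaussian-type choice rescaled to a prescribed Hilbert–Schmidt norm; it illustrates the construction and is not necessarily the kernel used in the paper's study.

```python
import numpy as np

def iid_bm(T, m, rng):
    """T iid standard Brownian motions observed at m equally spaced points of (0, 1]."""
    dt = 1.0 / m
    return np.cumsum(rng.normal(scale=np.sqrt(dt), size=(T, m)), axis=1)

def far1_bm(T, m, kappa, rng, burn_in=30):
    """FAR(1) sample X_t = Psi(X_{t-1}) + eps_t with IID-BM innovations and
    Psi a (hypothetical) Gaussian-kernel integral operator rescaled so that
    its discretized Hilbert-Schmidt norm equals kappa."""
    u = (np.arange(m) + 0.5) / m
    psi = np.exp(-(u[:, None] ** 2 + u[None, :] ** 2) / 2.0)
    hs = np.sqrt(np.sum(psi ** 2) / m ** 2)   # discretized HS norm of psi
    psi *= kappa / hs
    eps = iid_bm(T + burn_in, m, rng)
    X = np.zeros((T + burn_in, m))
    for t in range(1, T + burn_in):
        X[t] = psi @ X[t - 1] / m + eps[t]    # integral operator applied on the grid
    return X[burn_in:]
```

Averaging each simulated curve over the grid and inspecting the lag-one correlation of the resulting scalar series gives a quick check that the autoregressive dependence is present.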
The tests based on and require the choice of a kernel function and bandwidth, which we now discuss. The problem of kernel and bandwidth selection for nonparametric spectral density estimation enjoys an enormous literature, going back to the seminal work of Bartlett (1950) and Parzen (1957). This problem has recently received attention for general function space valued time series in, for example, Panaretos and Tavakoli (2013) and Rice and Shang (2017). Their theoretical and empirical findings each support using standard kernel functions with corresponding bandwidths tuned to the order or “flatness” of the kernel near the origin. Rice and Shang (2017) showed that some further gains can be achieved in terms of estimation error of the spectral density operator at frequency zero by utilizing data driven bandwidths. With this in mind, we considered the following kernel functions
The respective orders of these kernels are $q = 1$ for the Bartlett kernel and $q = 2$ for the Parzen and Daniell kernels. For each of these kernels, we considered deterministic bandwidths of the form $p = T^{1/(2q+1)}$, where $q$ is the order of the respective kernel, as well as data driven bandwidths of the form $p = \hat{c}\, T^{1/(2q+1)}$, where $\hat{c}$ is a constant estimated from the data aiming to adapt the size of the bandwidth to the serial dependence of the observed time series. In particular, it aims to minimize the integrated normed error of the spectral density operator estimate based on pilot estimates of the spectral density operator and its generalized derivative. The details of this data driven bandwidth selection are given in Appendix D below.
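For reference, the standard closed forms of these kernels can be sketched as follows; the Daniell kernel is written in the $\sin(\pi x)/(\pi x)$ normalization used by Hong (1996), and the bandwidth rule is the rate form described above.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel, order q = 1."""
    return max(0.0, 1.0 - abs(x))

def parzen(x):
    """Parzen kernel, order q = 2."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def daniell(x):
    """Daniell kernel, order q = 2: sin(pi x) / (pi x), equal to 1 at x = 0."""
    if x == 0.0:
        return 1.0
    return float(np.sin(np.pi * x) / (np.pi * x))

def rate_bandwidth(T, q, c=1.0):
    """Bandwidth of the form c * T^(1 / (2q + 1)) tuned to a kernel of order q."""
    return c * T ** (1.0 / (2 * q + 1))
```

Unlike the Bartlett and Parzen kernels, the Daniell kernel is not compactly supported, so all lags receive some weight.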
Our simulations show that the values of the normalizing constants depend heavily on the choice of the kernel, while the sample size and the choice of the bandwidth affect them to a much lesser extent. We obtain similar values for the different DGP's that we consider, with the Bartlett and Parzen kernels yielding one set of values and the Daniell kernel another.
In addition to the proposed statistics, we also compared to the statistic/test of Zhang (2016) and to the statistic/test of Bagchi et al. (2018). Zhang (2016) utilizes the choice of a block size for a block bootstrap. We present the results for a single representative block size given the relative similarity in performance for different block sizes.
The number of rejections from 1000 independent simulations with nominal levels of 5% and 1% is reported in Table 1. From this we can draw the following conclusions about the tests:
The tests based on the untransformed statistic tend to be oversized. Histograms and summaries of the distribution of the test statistic for data generated according to IID-BM indicate that the distribution is right skewed relative to the standard normal distribution, but has approximately mean zero and variance one. These findings are consistent with the findings of Chen and Deo (2004) and Horváth et al. (2013). Applying the power transformation to the statistic corrects for this fairly well in finite samples. Our simulation results show that the power-transformed statistic is well sized overall, although in some settings it seems to be slightly undersized.
Regarding kernel and bandwidth selection, no one kernel or bandwidth setting displayed substantially superior performance. There were only negligible differences in the results when comparing the data driven bandwidth with standard bandwidths for the DGP’s considered. In terms of size all kernels exhibited similar performance, although the Parzen kernel exhibited slightly higher power relative to other kernels.
The tests based on the proposed statistics are not appropriately sized for the fGARCH(1,1) DGP, which might be expected since these statistics are not adjusted in any way to handle general weak white noise sequences. We provide some further discussion on this in Section 5. By contrast, the tests of Zhang (2016) and Bagchi et al. (2018) are built for such sequences, although the test of Bagchi et al. (2018) seems to be undersized.
The proposed tests exhibited higher power than the tests of Zhang (2016) and Bagchi et al. (2018) for FAR(1,0.3)-BM data, especially at the level of 1%. Another advantage of the proposed approach over the test of Zhang (2016) is its reduced computational burden. Using an implementation of R running on Windows 10 with an Intel i3-380M (2.53 GHz) processor, a single calculation of the proposed test statistic and corresponding $p$-value takes less than a second, whereas calculating the statistic of Zhang (2016) with the same data takes several minutes.
In practice, we recommend the use of the power-transformed statistic with the data driven bandwidth for general white noise testing as long as conditional heteroscedasticity is not thought to be an issue. In the case when conditional heteroscedasticity is of concern, the statistic of Zhang (2016) or Bagchi et al. (2018) is expected to give more reliable results.
4 Empirical example
In order to illustrate the utility of the proposed tests, we present here the results of an application to daily Eurodollar futures curves. A Eurodollar futures contract represents an obligation to deliver 1,000,000 USD to a bank outside of the United States at a specified time, and their prices are given as values between zero and 100 defining the interest rate on the transaction. The specific data that we consider are daily settlement prices available at monthly delivery dates for the first six months, and quarterly delivery dates for up to 10 years into the future. Following Kargin and Onatski (2008), we transformed this raw data into smooth curves using cubic $B$-splines, and these curves were reevaluated at 30 day “monthly” intervals to produce the discretely observed curves that we used in subsequent analyses. The corresponding daily Eurodollar futures curves from the year 1994 are illustrated in the left hand panel of Figure 1.
We considered data spanning 10 years from 1994 to 2003, consisting of approximately 2,500 curves. We treated these data as 10 yearly samples of functional time series, each of length approximately 250. The basic question we wish to address in each sample is whether or not the curves in that year seem to exhibit significant serial dependence as measured by their autocovariance operators. We applied the proposed test based on the power-transformed statistic, using the Bartlett kernel and the corresponding data driven bandwidth, to each sample. The approximate $p$-values of these tests are displayed in the right hand panel of Figure 2, and are essentially equal to zero in all cases. This suggests that the Eurodollar futures curves exhibit substantial serial dependence. This observation is consistent with the suggested FAR(1) model for these curves proposed by Kargin and Onatski (2008).
Although these results are consistent with the data following an FAR(1) model, they may also be explained by the fact that the raw Eurodollar futures time series are apparently not mean stationary; over periods as long as a year they typically exhibit strong trends and seasonality. We evaluated the stationarity of each of these samples using the test proposed in Horváth et al. (2014), which suggests that in general the raw Eurodollar futures curves are non-stationary. Letting $X_t$ denote the futures curve on day $t$, we then studied instead the first order differenced curves $Y_t = X_t - X_{t-1}$. The first order differenced Eurodollar futures curves from 1994 are shown in the right hand panel of Figure 1, and the stationarity test of Horváth et al. (2014) applied to these curves suggests that they are reasonably stationary. The results of these stationarity tests are illustrated in the left hand panel of Figure 2.
We applied the proposed test using the power-transformed statistic with the same settings as above to each sample of first differenced curves. In six of the ten years considered, the hypothesis that the first differenced futures curves evolve as a functional white noise cannot be rejected at the levels considered. Interestingly however, in the consecutive years from 1998 to 2001 the first differenced Eurodollar futures curves exhibit statistically significant autocovariance operators as measured by our tests.
5 Concluding remarks
We have introduced a new test statistic for white noise testing with functional time series based on kernel lag-window estimates of the spectral density operator. The asymptotic properties of the proposed test have been established assuming the observed time series is a strong white noise, and it was also shown to be consistent for general time series exhibiting serial correlation. The test seems to improve upon existing tests in terms of power against functional autoregressive alternatives, although it has the drawback that it is not well sized for general weak white noise sequences in function space, such as for functional GARCH processes.
Based on the work of Shao (2011b) and Shao (2011a), we conjecture that Theorem 1 can be established for general weakly dependent white noise sequences in , despite the fact the proposed tests have markedly inflated size for sequences exhibiting conditional heteroscedasticity in our simulation. Roughly speaking, Shao (2011b) shows that in the scalar case the limit distribution of is determined asymptotically by the normalized autocovariances of the process at long lags, which behave essentially the same with strong white noise sequences as with weakly dependent white noises. With this in mind and in light of our simulation results, one expects to need very long time series and a large bandwidth in order for this asymptotic result to be predictive of the behavior of the test statistic for general weak white noise sequence exhibiting serial dependence, which encourages the use of a block bootstrap in finite samples. We leave these issues as potential directions for future research.
The first author would like to acknowledge the support of the Communauté Française de Belgique, Actions de Recherche Concertées, Projects Consolidation 2016–2021. The second author is partially supported by the Natural Science and Engineering Research Council of Canada’s Discovery and Accelerator grants. We would like to thank Professor Alessandra Luati for directing us to the work of Chen and Deo (2004) and we would also like to thank the anonymous reviewers as well as the associate editor for their comments and suggestions that helped us improve this work substantially.
- Aue et al. (2017) Aue, A., Horváth, L., and Pellatt, D. F. (2017). Functional generalized autoregressive conditional heteroskedasticity. Journal of Time Series Analysis, 38:3–21.
- Bagchi et al. (2018) Bagchi, P., Characiejus, V., and Dette, H. (2018). A simple test for white noise in functional time series. Journal of Time Series Analysis, 39(1):54–74.
- Bartlett (1950) Bartlett, M. (1950). Periodogram analysis and continuous spectra. Biometrika, 37:1–16.
- Berkes et al. (2016) Berkes, I., Horváth, L., and Rice, G. (2016). On the asymptotic normality of kernel estimators of the long run covariance of functional time series. Journal of Multivariate Analysis, 144:150–175.
- Bosq (2000) Bosq, D. (2000). Linear Processes in Function Spaces. Springer, New York.
- Brillinger (2001) Brillinger, D. R. (2001). Time Series: Data Analysis and Theory. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics.
- Brown (1971) Brown, R. M. (1971). Martingale central limit theorems. The Annals of Mathematical Statistics, 42:59–66.
- Bühlmann (1996) Bühlmann, P. (1996). Locally adaptive lag–window spectral estimation. Journal of Time Series Analysis, 17:247–270.
- Chen and Deo (2004) Chen, W. W. and Deo, R. S. (2004). Power transformations to induce normality and their applications. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1):117–130.
- Escanciano and Lobato (2009) Escanciano, J. C. and Lobato, I. (2009). An automatic portmanteau test for serial correlation. Journal of Econometrics, 151(2):140 – 149.
- Francq and Zakoïan (2010) Francq, C. and Zakoïan, J.-M. (2010). GARCH models. Wiley.
- Gabrys and Kokoszka (2007) Gabrys, R. and Kokoszka, P. (2007). Portmanteau test of independence for functional observations. Journal of the American Statistical Association, 102:1338–1348.
- Hong (1996) Hong, Y. (1996). Consistent testing for serial correlation of unknown form. Econometrica, 64:837–864.
- Hörmann and Kokoszka (2012) Hörmann, S. and Kokoszka, P. (2012). Functional time series. In Rao, C. R. and Rao, T. S., editors, Time Series, volume 30 of Handbook of Statistics. Elsevier.
- Horváth et al. (2013) Horváth, L., Hušková, M., and Rice, G. (2013). Testing independence for functional data. Journal of Multivariate Analysis, 117:100–119.
- Horváth and Kokoszka (2012) Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications. Springer.
- Horváth et al. (2014) Horváth, L., Kokoszka, P., and Rice, G. (2014). Testing stationarity of functional time series. Journal of Econometrics, 179:66–82.
- Kargin and Onatski (2008) Kargin, V. and Onatski, A. (2008). Curve forecasting by functional autoregression. Journal of Multivariate Analysis, 99:2508–2526.
- Klepsch and Klüppelberg (2017) Klepsch, J. and Klüppelberg, C. (2017). An innovations algorithm for the prediction of functional linear processes. Journal of Multivariate Analysis, 155:252 – 271.
- Kokoszka et al. (2017) Kokoszka, P., Rice, G., and Shang, H. L. (2017). Inference for the autocovariance of a functional time series under conditional heteroscedasticity. Journal of Multivariate Analysis, 162:32 – 50.
- Krishnamoorthy (2016) Krishnamoorthy, K. (2016). Modified normal-based approximation to the percentiles of linear combination of independent random variables with applications. Communications in Statistics - Simulation and Computation, 45:2428–2444.
- Li (2004) Li, W. K. (2004). Diagnostic Checks in Time Series. Chapman and Hall.
- Newey and West (1994) Newey, W. K. and West, K. D. (1994). Automatic lag selection in covariance matrix estimation. The Review of Economic Studies, 61(4):631–653.
- Panaretos and Tavakoli (2013) Panaretos, V. M. and Tavakoli, S. (2013). Fourier analysis of stationary time series in function space. The Annals of Statistics, 41(2):568–603.
- Parzen (1957) Parzen, E. (1957). On consistent estimates of the spectrum of stationary time series. The Annals of Mathematical Statistics, 28:329–348.
- Rice and Shang (2017) Rice, G. and Shang, H. (2017). A plug‐in bandwidth selection procedure for long‐run covariance estimation with stationary functional time series. Journal of Time Series Analysis, 38:591–609.
- Rosenblatt (1985) Rosenblatt, M. (1985). Stationary Sequences and Random Fields. Birkhäuser Boston.
- Senatov (1998) Senatov, V. V. (1998). Normal Approximation: New Results, Methods and Problems. VSP.
- Shao (2011a) Shao, X. (2011a). A bootstrap-assisted spectral test of white noise under unknown dependence. Journal of Econometrics, 162(2):213 – 224.
- Shao (2011b) Shao, X. (2011b). Testing for white noise under unknown dependence and its applications to diagnostic checking for time series models. Econometric Theory, 27:312–343.
- Tavakoli (2014) Tavakoli, S. (2014). Fourier Analysis of Functional Time Series with Applications to DNA Dynamics. PhD thesis, École Polytechnique Fédérale de Lausanne.
- Zhang (2013) Zhang, J. T. (2013). Analysis of Variance for Functional Data. Chapman and Hall.
- Zhang (2016) Zhang, X. (2016). White noise testing and model diagnostic checking for functional time series. Journal of Econometrics, 194(1):76–95.
- Zhang and Shao (2015) Zhang, X. and Shao, X. (2015). Two sample inference for the second-order property of temporally dependent functional data. Bernoulli, 21:909–929.
Appendix A Proof of Theorem 1
We begin with a note comparing the basic approach taken here to that of Horváth et al. (2013). Horváth et al. (2013) uses at its core a central limit theorem for vectors of increasing dimension adapted from Senatov (1998), and as a result a fairly restrictive condition on the bandwidth is needed. Our proof is somewhat more straightforward in the sense that we show that the suitably normalized statistic is a martingale with uniformly asymptotically negligible increments. This requires a number of intermediary approximations. Before we prove Theorem 1, we state two elementary lemmas that we use in the proof.
Suppose that $X$ and $Y$ are independent random elements with values in a separable Hilbert space such that $E\|X\| < \infty$ and $E\|Y\| < \infty$, and $EX = 0$ or $EY = 0$. Then $E\langle X, Y \rangle = 0$.
Suppose that and are independent and identically distributed random elements with values in a separable Hilbert space with zero means and finite second moments. Then
Now we are ready to prove Theorem 1. The basic format of the proof follows Hong (1996), but in some instances deviates significantly due to the assumption that the underlying variables take values in an arbitrary separable Hilbert space.
Proof of Theorem 1.
By (7) and Slutsky’s theorem, it suffices to show that as and
Now we show (13) holds. By Chebyshev’s inequality, as . Since
Markov’s inequality implies that as . We obtain
because as . Hence,