A residual-based bootstrap for functional autoregressions

We consider the residual-based or naive bootstrap for functional autoregressions of order 1 and prove that it is asymptotically valid for, e.g., the sample mean and for empirical covariance operator estimates. As a crucial auxiliary result, we also show that the empirical distribution of the centered sample innovations converges to the distribution of the innovations with respect to the Mallows metric.




1 Introduction

The seminal work of (Bosq, 2000) has initiated a lot of research on the theory, computational aspects and applications of functional data analysis. The recent monograph of (Horváth and Kokoszka, 2010) and, with a focus on functional time series, the review article of (Kokoszka, 2012) give an overview of the field. In this paper, we consider a time series with values in a Hilbert space $H$, e.g. curves in a function space like $L^2$. In particular, we are interested in functional autoregressions, also known as autoregressive Hilbertian models (ARH). As is well known, a functional autoregressive process of order $p$, or FAR($p$)-process, can easily be written as a FAR(1)-process by an appropriate change of state vector and Hilbert space. Therefore, it essentially suffices to consider the case of order 1, where

$$X_{t+1} = \Psi(X_t) + \varepsilon_{t+1}. \qquad (1)$$

Here, $\Psi$ is a linear operator, and the $\varepsilon_t$ are independent, identically distributed (i.i.d.) innovations. Recently, several new statistical methods for data generated by (1) have been proposed, in particular regarding tests and forecasts. (Kokoszka et al., 2008) have investigated a test of the hypothesis $\Psi = 0$, i.e. of independence of the data. (Gabrys and Kokoszka, 2007) consider a related problem, a test of independence for general functional time series. (Horváth et al., 2010) propose a CUSUM test for a sudden change in the dependence structure of the data, i.e. for the presence of a point in time where the value of $\Psi$ changes, which has been applied to neurophysiological data by (Franke et al., 2018). Other papers concentrate on the task of forecasting the data. (Didericksen et al., 2012) present an empirical study of forecasting $X_{n+1}$ by $\hat\Psi(X_n)$, where $\hat\Psi$ denotes some estimate of $\Psi$. (Kargin and Onatski, 2008) develop an appropriate theory for a particular kind of estimate $\hat\Psi$. Also, forecasting on the basis of FAR(1) models has been used in many applications, some of which are discussed below in the context of the bootstrap.
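The reduction of a higher-order functional autoregression to order 1 can be written out as follows. This is a sketch of the standard state-space argument; the symbols $\Psi_j$, $Y_t$, $\tilde\Psi$ and $\eta_t$ are notation assumed here for illustration and are not taken from the paper.

```latex
\begin{align*}
% FAR(p) on H with linear operators \Psi_1, \dots, \Psi_p:
X_{t+1} &= \sum_{j=1}^{p} \Psi_j(X_{t+1-j}) + \varepsilon_{t+1}, \\
% state vector in the product space H^p, with scalar product
% \langle x, y \rangle_{H^p} = \sum_{j=1}^{p} \langle x_j, y_j \rangle:
Y_t &= (X_t, X_{t-1}, \ldots, X_{t-p+1})^{\top} \in H^p, \\
% FAR(1) on H^p:
Y_{t+1} &= \tilde\Psi(Y_t) + \eta_{t+1}, \qquad
\tilde\Psi =
\begin{pmatrix}
\Psi_1 & \Psi_2 & \cdots & \Psi_{p-1} & \Psi_p \\
I      & 0      & \cdots & 0          & 0      \\
0      & I      & \cdots & 0          & 0      \\
\vdots &        & \ddots &            & \vdots \\
0      & 0      & \cdots & I          & 0
\end{pmatrix},
\qquad
\eta_{t+1} = (\varepsilon_{t+1}, 0, \ldots, 0)^{\top}.
\end{align*}
```

The first component of the last line reproduces the FAR($p$) recursion, and the remaining components only shift the past observations by one position.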

As the asymptotics for the distribution of estimates of the autoregressive operator are involved, as pointed out by (Mas, 2007), and as, additionally, they frequently provide decent approximations only for large sample sizes, many applied papers use resampling techniques to derive critical values for tests or prediction intervals for forecasts (compare, e.g., (Shang, 2015) for an overview). The theory for bootstrapping functional data, which provides guidelines on the circumstances under which bootstrap approximations are valid, is, however, still rather incomplete. For example, only recently (Paparoditis and Sapatinas, 2015) showed that bootstrap methods work for testing the equality of means and covariance operators in samples of independent functional data.

We are, in particular, interested in the residual-based bootstrap where resampling is done on the basis of the centered sample residuals. This kind of bootstrap is quite common in the context of scalar autoregressive and ARMA models (compare (Kreiss and Paparoditis, 2011)) and forms the starting point for the widely applicable autoregressive sieve bootstrap (compare (Kreiss et al., 2011)).

This kind of bootstrap has been investigated in the analogous, but, from the viewpoint of theory, considerably simpler regression situation. (González-Manteiga and Martínez-Calvo, 2011) discuss the linear functional regression model $Y_i = \Psi(X_i) + \varepsilon_i$, where $Y_i$ is scalar and $\Psi$ is a linear functional. Treating $X_1, \ldots, X_n$ as fixed, which is common in the regression context, they prove that the residual-based bootstrap and, for heteroscedastic residuals $\varepsilon_i$, the wild bootstrap work. In the same model, (González-Manteiga et al., 2014) apply the pairwise bootstrap and the wild bootstrap to a test of the hypothesis $\Psi = 0$. (Ferraty et al., 2010) consider the functional regression model with general, not necessarily linear operators $\Psi$ and prove that the residual-based and the wild bootstrap work for nonparametric kernel estimates of $\Psi$. (Ferraty et al., 2012) extend those results to the case where the response variable is also of functional nature. (Zhu and Politis, 2017) and (Raña et al., 2016) discuss the analogous situation for nonparametric functional autoregressions, considering the regression bootstrap and the wild bootstrap respectively (compare (Franke et al., 2002) for these concepts, their advantages and drawbacks in the scalar case), but not the residual-based bootstrap.

Bootstrap techniques are also quite popular in approximating the distribution of statistics from functional time series data. (Horváth and Kokoszka, 2010) use in their section 14.1 the residual-based bootstrap for evaluating the performance of a test for a change in the autoregressive operator of a FAR(1)-process. (Aneiros-Pérez et al., 2011) consider a nonparametric functional autoregressive model, estimate the autoregression operator nonparametrically by kernel and local linear estimates and apply the residual-based bootstrap to get prediction intervals. (Mingotti et al., 2015) discuss the residual-based bootstrap for the integrated FAR(1)-model, i.e. for the special case where the autoregressive operator is the Hilbert space identity. They derive bootstrap approximations of critical bounds for unit root tests where, under the hypothesis, the autoregressive operator is known. (Fernández de Castro et al., 2005) investigate among other bootstrap techniques a variant of the residual-based bootstrap in forecasting applications. They start from the centered sample residuals, but do not resample directly from their empirical distribution. They first consider a finite principal component decomposition of the sample residuals and then resample the coefficients of this decomposition separately. In a similar spirit, (Hyndman and Shang, 2009) assume from the start that the time series has a finite Karhunen-Loève expansion, which allows one to reduce the functional time series to the finite-dimensional time series of the coefficients. They derive bootstrap prediction intervals based on bootstrap confidence intervals for the scalar coefficient time series. All these papers focus on simulations and applications and do not consider the accompanying theory. This gap is filled for the stationary bootstrap, which is a variant of the well-known block bootstrap with random block lengths, in an early paper of (Politis and Romano, 1994). They consider general Hilbert space-valued time series and prove, based on a central limit theorem for triangular arrays of such data, that this bootstrap provides valid approximations for the asymptotic distribution of certain statistics.

Based on the thesis (Nyarige, 2016), we show in this paper that the residual-based bootstrap is applicable to FAR(1)-processes. The theory has direct practical implications as, e.g., the necessary centering of the lag-1 autocovariance operator in the bootstrap world is different from what one would naively expect, due to the particular nature of the estimate of the autoregressive operator. For the proof, we cannot use the approach of (González-Manteiga and Martínez-Calvo, 2011) for the residual-based bootstrap in regression or that of (Politis and Romano, 1994) for the stationary bootstrap, who, for the bootstrap data, both mimic the proof of asymptotic normality of the corresponding functions of the real data. We have to use different methods which are similar to the scalar situation presented by (Kreiss and Franke, 1992); more details will be given in section 4.

In section 2 we describe the details of our model including the relevant assumptions, and we introduce some estimates from the literature which we need later on.

In section 3 we present the crucial result that the empirical distribution of the centered sample innovations converges to the distribution of the innovations.

In section 4 we give the details for the residual-based bootstrap and, as an illustration, state that it works for estimates of the mean and of the first two covariance operators of the data.

Finally, technical results and proofs are given in the appendix.

2 The Model and the Estimates

In this section, we mainly collect some properties of our model and some estimates which are standard in the literature on functional autoregressions and which we need later on. This also serves to introduce notation.

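Let $H$ be a separable Hilbert space with scalar product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$. As a norm for bounded linear operators from $H$ to $H$ like $\Psi$ we use

$$\|\Psi\|_{\mathcal L} = \sup_{\|x\| \le 1} \|\Psi(x)\|.$$

A sufficient condition for the existence of a stationary solution of (1) is $\|\Psi\|_{\mathcal L} < 1$ (compare (Bosq, 2000), section 3.2). We call a linear operator $\Psi$ compact if for two orthonormal bases $\{v_j\}$ and $\{f_j\}$ of $H$ and a sequence $\{\lambda_j\}$ of real numbers converging to 0,

$$\Psi(x) = \sum_{j=1}^{\infty} \lambda_j \langle x, v_j \rangle f_j, \qquad x \in H.$$

$\Psi$ is, in particular, a Hilbert-Schmidt operator if $\sum_{j=1}^{\infty} \lambda_j^2 < \infty$. The Hilbert-Schmidt norm $\|\Psi\|_{\mathcal S} = \big(\sum_{j=1}^{\infty} \lambda_j^2\big)^{1/2}$ is an upper bound for $\|\Psi\|_{\mathcal L}$. The Hilbert-Schmidt operators form a Hilbert space themselves with a scalar product given by

$$\langle \Psi_1, \Psi_2 \rangle_{\mathcal S} = \sum_{j=1}^{\infty} \langle \Psi_1(e_j), \Psi_2(e_j) \rangle$$

for an arbitrary orthonormal basis $\{e_j\}$ of $H$ (compare (Horváth and Kokoszka, 2010), section 2.1).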
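The relation between the two operator norms can be checked numerically in a finite-dimensional stand-in for the Hilbert space. The following is a minimal sketch, assuming the space is $\mathbb{R}^5$ with the Euclidean scalar product, so that operators are matrices; the matrix `A` is an arbitrary illustrative example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))  # a linear operator on R^5 (illustrative stand-in)

# Operator norm: supremum of ||A x|| over the unit ball,
# which for matrices is the largest singular value.
op_norm = np.linalg.norm(A, 2)

# Hilbert-Schmidt norm: square root of the sum of ||A e_j||^2 over an
# orthonormal basis; for matrices this is the Frobenius norm.
basis = np.eye(5)
hs_norm = np.sqrt(sum(np.linalg.norm(A @ e) ** 2 for e in basis))

# The value does not depend on the orthonormal basis: repeat with a rotated one.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
hs_norm_rotated = np.sqrt(sum(np.linalg.norm(A @ q) ** 2 for q in Q.T))
```

In this setting `hs_norm` agrees with `np.linalg.norm(A, "fro")` and is never smaller than `op_norm`.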

For the definition of covariance operators, it is convenient to introduce the Kronecker product of which is a linear operator defined by

For later reference, we state two rules of calculation which we use repeatedly and which follow immediately from the definition


where are two linear operators on and here and the following denotes the adjoint of the linear operator which is characterized by for all .

We assume throughout the paper that the data are part of a stationary functional autoregression (1) with mean . Correspondingly, the covariance operator and the lag 1-autocovariance operator are given by and

. Furthermore, we always assume that 0 is not an eigenvalue of

. Then, all eigenvalues of are positive. Let

denote the corresponding orthonormal eigenvectors in


are related to the autoregressive operator

by the analogue to the scalar Yule-Walker equation


The mean is estimated as usual by the sample mean

As estimates of we follow (Horváth and Kokoszka, 2010) and use the simplified sample versions

We use the last observation only in estimating the lag-1 autocovariance operator to streamline notation later on. For the same reason, we do not center the data around the sample mean in the definitions of the covariance estimates. Under our assumption of mean zero, this has an asymptotically negligible effect. All results remain true in the general case, but then we of course have to center the data in calculating the covariance estimates.

denote the eigenvalues and eigenvectors of . Solving the Yule-Walker equation (3) is an ill-conditioned problem as is not a bounded linear operator defined on the whole space . Therefore, has to be regularized. We use the popular approach via a finite principal component expansion, compare (Bosq, 2000), (Horváth and Kokoszka, 2010), and consider

where slowly for to get a consistent estimate of . Note that is an eigenvector of , and is the orthogonal projection of onto the span of the eigenvector . Then, we get as an estimate of
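The regularized Yule-Walker estimate described above can be sketched for curves observed on a finite grid. This is an illustration under simplifying assumptions, not the authors' implementation: the space is taken to be $\mathbb{R}^d$ with the Euclidean scalar product, the data are assumed to have mean zero, and the function name `estimate_far1_operator` and the argument `k` (the number of principal components kept) are hypothetical.

```python
import numpy as np

def estimate_far1_operator(X, k):
    """Truncated principal component estimate of the FAR(1) operator.

    X : array of shape (n + 1, d); row t is the observation X_t (mean zero).
    k : number of leading eigenvectors of the sample covariance to keep.
    """
    past, future = X[:-1], X[1:]
    n = past.shape[0]
    gamma_hat = past.T @ past / n   # sample covariance operator
    c_hat = future.T @ past / n     # sample lag-1 autocovariance, z -> mean of <X_t, z> X_{t+1}
    eigval, eigvec = np.linalg.eigh(gamma_hat)   # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:k]         # indices of the k largest
    lam, v = eigval[order], eigvec[:, order]
    # Regularized inverse of gamma_hat on the span of the first k eigenvectors.
    gamma_inv_k = (v / lam) @ v.T
    return c_hat @ gamma_inv_k

# Demo on simulated data in R^3 with Gaussian innovations (assumed for illustration).
rng = np.random.default_rng(1)
psi_true = np.array([[0.5, 0.1, 0.0],
                     [0.0, 0.3, 0.1],
                     [0.1, 0.0, 0.2]])
n = 20000
X = np.zeros((n + 1, 3))
for t in range(n):
    X[t + 1] = psi_true @ X[t] + rng.normal(size=3)

psi_hat = estimate_far1_operator(X, k=3)
```

With `k` equal to the full dimension the estimate reduces to the unregularized Yule-Walker solution; a smaller `k` restricts the inverse to the leading eigenvectors, which is what keeps the problem stable in the infinite-dimensional case.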

3 Approximation of the innovation distribution by the empirical measure of sample residuals

The basis for residual-based bootstrapping in scalar regression and autoregression models is the approximability of the innovations by the bootstrap innovations where the latter are drawn from the centered sample residuals. This is stated in the following theorem in terms of the Mallows metric $d_p$, $p \ge 1$, which is discussed in detail by (Bickel and Freedman, 1981). For two distributions $F, G$ on $H$, it is defined by

$$d_p(F, G) = \inf \left\{ E\|U - V\|^p \right\}^{1/p},$$

where the infimum is taken over all $H$-valued random variables $U$ and $V$ with marginal distributions $F$ resp. $G$. By Lemma 8.1 of (Bickel and Freedman, 1981), the infimum is attained.
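For intuition, the Mallows distance between two empirical distributions can be computed explicitly in the simplest case. The sketch below assumes real-valued samples of equal size, a deliberate simplification of the Hilbert space setting: on the real line the optimal coupling pairs the order statistics. The function name `mallows_1d` is hypothetical.

```python
import numpy as np

def mallows_1d(x, y, p=2):
    """Mallows distance of order p between the empirical distributions of two
    real-valued samples of equal size, using the coupling of order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

# The order of the observations does not matter, only the empirical distribution.
d_same = mallows_1d([1.0, 2.0, 3.0], [3.0, 1.0, 2.0])
# Shifting a sample by a constant c gives distance |c|.
sample = np.array([0.5, -1.0, 2.0])
d_shift = mallows_1d(sample, sample + 2.0)
```

For samples in a general Hilbert space the optimal coupling is no longer given by sorting and has to be found by solving an assignment problem.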

By , we denote the distribution of respectively the empirical distribution of the centered sample residuals with

Theorem 3.1.

Let be a sample from a stationary FAR(1) process satisfying
i) i.i.d., ,
ii) is a Hilbert-Schmidt operator with ,
iii) the eigenvalues of are all positive and have multiplicity 1.

if and, with ,


A fourth moment condition like i) is not unexpected, as the estimate of the autoregressive operator depends on the covariance operator estimates, which are quadratic in the data and which we want to be $\sqrt{n}$-consistent estimates. Condition ii) may be relaxed to the requirement that the operator norm of $\Psi^{j_0}$ is smaller than 1 for some $j_0 \ge 1$, as in the work of (Bosq, 2000); we prefer the somewhat stronger assumption to simplify the proofs. The positivity of the eigenvalues in iii) is necessary to exclude singular cases. Assuming dimension 1 of all eigenspaces is standard in the literature on functional autoregressions to circumvent the notational problems with the nonuniqueness of eigenvectors generating a particular eigenspace, but it is not essential for the validity of the results.

The following lemma illustrates the meaning of the rate condition (5) for two particular examples where we impose lower bounds on which is related to the rate of decrease of the eigenvalues. If is allowed to decrease exponentially fast, then may increase at most logarithmically in . If may converge to 0 only with a polynomial rate in then may increase faster like for appropriate . This kind of relationship between and the rate of decrease of the eigenvalues is quite plausible regarding the character of as a regularization parameter. In similar situations, (Guillas, 2001) found the same kind of rate conditions in his study of the convergence rate of .

Lemma 3.1.

a) Let for some . Then, (5) is satisfied for if, for all large enough ,

b) Let for some . Then, (5) is satisfied for if


Proof. a) From the condition of the lemma, we immediately have . Using the formula for geometric sums,

as . Moreover we have

for large enough , as, for some and all , again for large enough ,

b) The proof proceeds in a similar manner as for part a), using and

4 The residual-based bootstrap

We start with a sample from a stationary functional autoregression (1). The basic idea of the bootstrap is to replace the data by pseudodata, calculated from the given sample, with two features:
i) The distribution of certain functions of the data can be approximated by the conditional distribution of the corresponding functions of the pseudodata given the sample.
ii) The conditional distribution of the pseudodata given the sample is known, such that distributional characteristics like moments or quantiles can be numerically calculated by Monte Carlo simulation.

In this section, we generalize the well-known residual-based bootstrap for scalar ARMA-processes, compare, e.g. (Kreiss and Paparoditis, 2011), to the functional setting. Let $\tilde\varepsilon_t$ be the centered sample residuals given by (4), and let $\hat F_n$ be their empirical distribution function. The procedure for generating the pseudodata is the following:

1) Draw bootstrap innovations $\varepsilon_t^*$ purely randomly from the centered sample residuals, such that the $\varepsilon_t^*$ are i.i.d. with distribution $\hat F_n$ conditional on the original data. Here and in the following, we write $P^*$, $E^*$ for conditional probabilities and expectations given the original data.

2) We generate the bootstrap data $X_t^*$ recursively by

$$X_{t+1}^* = \hat\Psi_n(X_t^*) + \varepsilon_{t+1}^*$$

for some suitable initial value of the recursion, where $\hat\Psi_n$ denotes the estimate of the autoregressive operator.

If is large, the choice of is of minor importance due to the exponentially decreasing memory of our stationary FAR(1)-process. This follows from its representation as an infinite moving average process (e.g. Theorem 13.1 of (Horváth and Kokoszka, 2010)) together with and . Popular choices are , which are used in the simulations of (Nyarige, 2016), or .
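The two steps of the procedure can be sketched for discretized curves as follows. This is an illustration under stated assumptions rather than the authors' code: observations are vectors in $\mathbb{R}^d$, the operator estimate is passed in as a matrix `psi_hat`, the recursion is started at the first observed value (one possible choice of initial value), and the function name `residual_bootstrap_far1` is hypothetical. In the demo, an ordinary least squares fit stands in for the regularized estimate.

```python
import numpy as np

def residual_bootstrap_far1(X, psi_hat, rng, n_boot=200):
    """Generate bootstrap series from a fitted FAR(1) model.

    X       : array (n + 1, d) of observations (mean zero assumed).
    psi_hat : array (d, d), estimate of the autoregressive operator.
    Returns an array of shape (n_boot, n + 1, d).
    """
    resid = X[1:] - X[:-1] @ psi_hat.T      # sample residuals
    resid_c = resid - resid.mean(axis=0)    # centered sample residuals
    n = resid_c.shape[0]
    out = np.empty((n_boot,) + X.shape)
    for b in range(n_boot):
        # Step 1: i.i.d. draws with replacement from the centered residuals.
        eps_star = resid_c[rng.integers(0, n, size=n)]
        # Step 2: recursion with the estimated operator.
        x = X[0]                            # assumed initial value
        out[b, 0] = x
        for t in range(n):
            x = psi_hat @ x + eps_star[t]
            out[b, t + 1] = x
    return out

# Demo: simulate a FAR(1) series in R^2, fit it, and bootstrap the sample mean.
rng = np.random.default_rng(2)
psi_true = np.array([[0.5, 0.2], [0.0, 0.3]])
n = 300
X = np.zeros((n + 1, 2))
for t in range(n):
    X[t + 1] = psi_true @ X[t] + rng.normal(size=2)

# Least squares stand-in for the operator estimate: X[1:] ~ X[:-1] @ B, psi_hat = B^T.
B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
psi_hat = B.T

boot = residual_bootstrap_far1(X, psi_hat, rng, n_boot=50)
boot_means = np.sqrt(n) * boot.mean(axis=1)   # Monte Carlo draws of the scaled bootstrap mean
```

Quantiles or moments of `boot_means` then serve as Monte Carlo approximations of the corresponding characteristics of the scaled sample mean.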

Let us remark that the theory of the residual bootstrap has already been studied for the quite similar functional linear regression model with real-valued responses and functional regressors by (González-Manteiga and Martínez-Calvo, 2011). Note that the situation there is much simpler, not only due to the lack of dependence, but equally due to the fact that, by construction, the bootstrap regressors coincide with the original ones. Therefore, the regressors in the bootstrap world trivially satisfy exactly the same assumptions as the real regressors, which is quite useful in showing that the same kind of asymptotics holds for functions of the real resp. the bootstrap data. In particular, the critical covariance operator estimate, for which we need a regularized inverse, and its eigenvalues and eigenfunctions are the same for the real and the bootstrap data, i.e. Theorem 4.2 below is trivially satisfied in the regression context. Obviously, for functional autoregressions, those assertions do not hold, and we cannot use the proof of validity of the bootstrap for the regression case at all, but have to use quite different arguments.

The regression and wild bootstrap, considered by (Zhu and Politis, 2017) respectively (Raña et al., 2016) for nonparametric functional autoregressions, also keep the original regressors in the bootstrap world, i.e. they do not mimic the whole time series in the bootstrap world but only the local predictor relationship. So, for proofs, they can rely on the same kind of simpler methods as in the case of regression with independent data.

4.1 Bootstrapping the sample mean

In this subsection we investigate the sample mean and its analogue in the bootstrap world

Note that . In the proof, we show that for the bootstrap analogue also holds. Therefore, we have to compare the distributions of and without additional centering. In the next theorem and in the following, we use a common convention and write for the Mallows distance between the marginal distributions of the random variables resp. .

Theorem 4.1.

Under the assumptions of Theorem 3.1 and if satisfies additionally


we have for

The following lemma provides two examples of a sufficient rate condition for depending on the rate of decrease of . It is proven in the same manner as Lemma 3.1.

Lemma 4.1.

a) Let for some . Then, (5) and (6) are satisfied for if, for all large enough ,

b) Let for some . Then, (5) and (6) are satisfied for if

4.2 Bootstrapping the covariance operators

In this section, we show that the bootstrap works for the covariance operator estimates , too. We compare them with their bootstrap analogues

We again consider the Mallows metric, which, for bounded linear operators , we define with respect to the operator norm :

where the infimum is taken over all random operators and with the same marginal distribution as resp. .

Note that

is an unbiased estimate of

as . In the bootstrap world, we have an analogous property asymptotically. More precisely, we show in Lemma 5.4 that . Therefore, we have to compare the estimation error with .

Theorem 4.2.

Under the assumptions of Theorem 4.1, we have for

The theorem, in particular, implies that and, conditional on , have the same asymptotic distribution by Lemma 8.3 of (Bickel and Freedman, 1981).

For the lag-1 autocovariance operator, we have, again from Lemma 5.4, that where denotes the projection onto the span of the first eigenvectors of . So, this provides the appropriate reference point in the bootstrap world if we want to approximate the distribution of the estimation error . More precisely,

Theorem 4.3.

Under the assumptions of Theorem 4.1, we have for

5 Appendix - Technical Lemmas and Proofs

Throughout this section,

denote the projections onto the span of the first orthonormal eigenfunctions resp. empirical eigenfunctions . As the eigenfunctions are only uniquely determined up to their sign, we have to compare later on with where

The first two auxiliary results have been essentially used already by (Mas, 2007). We defer their proofs to the supplement 6.

Lemma 5.1.


Lemma 5.2.

with .

Next we state that the well-known strong consistency of as an estimate of in particular holds under our set of assumptions, and we collect some immediate consequences for reference.

Lemma 5.3.

Let . Under the conditions of Theorem 4.1, we have

a) for .

b) for all large enough ,

c) .


Proof. a) The result is a slight modification of Theorem 8.7 of (Bosq, 2000), and the proof is deferred to the supplement 6.

b) From a) we immediately have for large enough .

c) First, we note that

The assertion follows from, using b) and ,

for all large enough . ∎


Proof (Theorem 3.1).
Let denote the empirical distribution of . Then, from Lemma 8.4 of (Bickel and Freedman, 1981), we have a.s. Hence it suffices to show that . Let

where is Laplace distributed on , i.e. . The random variables have marginal distributions respectively . As in the proof of Theorem 3.1 of (Kreiss and Franke, 1992), we have from the definition of the Mallows metric

From the law of large numbers for i.i.d. random variables we have

such that the second term on the right-hand side vanishes for . For the first term, we show in the following parts a)-c) of the proof

where does not depend on , and . Hence, for ,

as, by Corollary 6.2 of (Bosq, 2000), , and, by stationarity of

for , using a monotone convergence argument and .

a) By definition of , we have

using . We now show that the first and the second terms are bounded in the required manner.

b) We split into two terms

As are orthonormal, we have for the second term

where the right hand side converges to 0 in probability, as, from the remarks after Theorem 16.1 of (Horváth and Kokoszka, 2010) and (5)

For the first term, we have, as ,

where again the right hand side converges to 0 in probability as, from above,

c) Using Lemma 5.2, we have

using the Cauchy-Schwarz inequality. Moreover, as and ,

as, from the remarks after Theorem 16.1 of (Horváth and Kokoszka, 2010), we have , and from Theorem 3 of (Mas and Pumo, 2009), analogously