Functional Lagged Regression with Sparse Noisy Observations

A (lagged) time series regression model involves the regression of a scalar response time series on a time series of regressors that consists of a sequence of random functions (curves), also known as a functional time series. In practice, the underlying regressor curve time series are not always directly accessible, and often need to be treated as latent processes that are observed (sampled) only at discrete measurement locations. In this paper, we consider the so-called sparse observation scenario where only a relatively small number of measurement locations have been observed, locations that may indeed be different for each curve. The measurements can be further contaminated by additive measurement error. A spectral approach to the estimation of the model dynamics is considered. The spectral density of the regressor time series and the cross-spectral density between the regressor and response time series are estimated by kernel smoothing methods from the sparse observations. The estimation of the impulse response regression coefficients of the lagged regression model is regularised by means of the ridge regression approach (Tikhonov regularisation) or PCA regression (truncation regularisation). The latent functional time series are then recovered by means of prediction, conditioning on all the observed data. A detailed simulation study investigates the performance of the methodology in terms of estimation and prediction for a wide range of scenarios.


1 Introduction

A (lagged) time series regression model is perhaps the most basic, and certainly one of the longest studied (kolmogoroff1941interpolation; wiener1950extrapolation), forms of coupled analysis of two time series. In a general context, given two discrete-time stationary time series X = {X_t} and Z = {Z_t}, the input (or regressor) and the output (or response), valued in some vector spaces H_1 and H_2, such a model postulates that

 Z_t = a + ∑_{k∈Z} B_k X_{t−k} + e_t,  t ∈ Z,

for some constant a ∈ H_2, a sequence {e_t} of random disturbances valued in H_2, and a sequence of linear mappings B_k : H_1 → H_2. This linear coupling would be the typical dependence model, for instance, if (X, Z) were a jointly Gaussian stationary process, and is also known as a time-invariant linearly filtered time series model. The estimation problem is, then, to estimate the unknown transformations {B_k} given the realisation of a finite stretch of the joint series (a problem also known as system identification, particularly in signal processing).

This problem is very well understood and has been extensively studied in the classical context where the spaces H_1 and H_2 coincide with Euclidean spaces of potentially different dimensions (brillinger1981time; priestley1981spectral; shumway2000time). Nevertheless, generalising these results to the case where either space, and particularly the regressor space H_1, may be an infinite dimensional vector space is far from straightforward, and has been comparatively much less studied. The difficulty in this case is that one needs to manipulate operations involving the inverses of compact or even trace-class operators, which fail to exist boundedly on the entire codomain. Consequently, analysis of such models requires drawing on tools from functional analysis, and developing novel methodology that incorporates suitable regularising schemes. Such a setting has only recently been considered for functional time series regression models, for example by hormann2015estimation, who treat the problem of estimation of the filter coefficients by means of spectral truncation regularisation (PCA regression), and pham2018methodology, who deduce convergence rates for the estimated coefficients when using Tikhonov regularisation (ridge regression).

While this may seem to be an artificial abstraction at first sight, such infinite-dimensional time series are becoming increasingly prominent in applications, since the abstraction of an infinite dimensional (Hilbert) space captures the scenario where the value of the series at each time is a square-integrable function, e.g. a curve. Examples include time series of DNA minicircles evolving in solution (seen as a time series of closed curves in 3D indexed by discrete time, see e.g. tavakoli2016detecting) or the data constructed by dividing a continuously observed scalar time series into segments of an obvious periodicity, usually days. Examples of the latter form are particularly prominent in environmental applications, for example the analysis of particulate matter atmospheric pollution (hormann2010weakly; hormann2015dynamic; hormann2016detection; aue2015prediction), traffic data modelling (klepsch2017prediction), or financial applications of intra-day trading (muller2011functional; kokoszka2017dynamic). Another promising financial application of functional time series emerges in yield curve modelling (hays2012functional; kowal2017functional; sen2019timeseries).

We focus here on the case H_2 = R, i.e. when the response process is scalar. This is indeed the case that is most often studied in the literature, due in large part to the many examples it covers, but also because it is the simplest version of the problem that captures the essence of the higher generality: the difficulty in estimating the filter coefficients lies in the ill-posedness of the spectral density operator inversion, which requires regularisation. For a scenario where the response, too, is infinite dimensional, i.e. with a functional response, see hormann2015estimation, confirming this difficulty.

In practice, one can never observe the regressor time series in its “Platonic” continuum form. For example, if H_1 is a space of functions on [0,1] and each X_t is a curve, then one might be able to observe evaluations of X_t at various locations of its domain. In some cases, the sampled locations are sufficiently dense, and the measurement instrumentation sufficiently precise to be free of any additional noise contamination, so that one can disregard the effects of sampling. This is essentially the approach taken in hormann2015estimation and pham2018methodology, where the regressors are treated as being fully observed as elements of H_1. However, it may well happen that X_t is only measured at few and randomly positioned locations in its domain, indeed varying with time, and that the measurements are themselves contaminated by noise. That is, instead of observing X_t(x) for all x in [0,1], we instead observe

 Y_tj = X_t(x_tj) + ε_tj,  j = 1,…,N_t,  t = 1,…,T,

for a sequence of point processes {x_tj : j = 1,…,N_t}, independent in time, and a white noise measurement error sequence {ε_tj}. This observation scheme is illustrated in Figure 1 and is exhibited, for example, when dealing with fair weather atmospheric electricity data (rubin2018sparsely; tammet2009joint).
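A minimal sketch of this observation scheme, with a hypothetical two-term Fourier curve standing in for the latent X_t and arbitrary choices for T, N_t and the noise level:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50           # number of latent curves (hypothetical)
sigma = 0.1      # measurement-error standard deviation (hypothetical)
a = rng.normal(size=(T, 2))   # random Fourier amplitudes defining the latent curves

def latent_curve(t, x):
    # hypothetical smooth latent curve X_t evaluated at locations x
    return a[t, 0] * np.sin(2 * np.pi * x) + a[t, 1] * np.cos(2 * np.pi * x)

obs = []  # per-curve (locations x_tj, noisy measurements Y_tj) pairs
for t in range(T):
    N_t = rng.integers(2, 8)               # random number of measurements per curve
    x = rng.uniform(0.0, 1.0, size=N_t)    # random measurement locations in [0, 1]
    Y = latent_curve(t, x) + sigma * rng.normal(size=N_t)   # Y_tj = X_t(x_tj) + eps_tj
    obs.append((x, Y))
```

Each curve is thus represented only by a handful of noisy point evaluations at locations that differ from curve to curve, which is exactly the sparse regime the methodology is designed for.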

Such a setting escapes both the methods and the theory developed at the level of continuum, and instead requires methodology that accounts for the observation scheme, as well as theory that incorporates the latent/emission processes explicitly. The purpose of this paper is precisely to construct such a methodology and develop the associated asymptotic theory.

Our approach consists in estimating the complete space-time covariance structure of the data by estimating the spectral density operator of the regressor time series, and the cross-spectral density operator between the regressor and response time series. Our estimators are based on kernel smoothing methods (yao2005functional; yao2005functional_linear_regression; hall2006properties; li2010uniform; rubin2018sparsely). Once these are estimated, one obtains estimating equations whose solution yields the estimators of the filter coefficients. The solution of the estimating equations, however, comprises an ill-posed inverse problem, hence regularisation is required. We offer two regularisation techniques, spectral truncation regularisation (hormann2015estimation) and Tikhonov regularisation (pham2018methodology). The forecasting of the response process is then implemented by first predicting the latent functional regressor data (using their estimated spectral characteristics) and then plugging these predictions into the estimated lagged regression model.

Sparsely observed functional time series have only recently received attention (kowal2017functional; kowal2017bayesian; rubin2018sparsely; sen2019timeseries), and our results appear to be the first in the context of the lagged regression model where the regressor process is functional. A related problem of dynamic function-on-scalar regression was studied by kowal2018dynamic by means of Bayesian factor models.

Our presentation is organised as follows. Section 2 defines the framework of sparsely observed functional time series as well as the functional lagged regression model and its analysis in the spectral domain. The section explains the estimation methodology for the model components, the spectral and the cross-spectral densities, and the regularised estimator of the filter coefficients of the lagged regression. Furthermore, the forecasting algorithm is introduced. Section 3 presents the asymptotic results of the proposed method: consistency and convergence rates are provided. Section 4 verifies the finite sample properties of the proposed methodology in a simulation study. The proofs of the formal statements are presented in Section 5.

2 Model and Estimation Methodology

2.1 Functional Time Series Regression Model

The regressor functional time series {X_t} is a sequence of smooth random functions defined on the interval [0,1]. The random functions are treated as random elements in the Hilbert space H = L²([0,1]). Additionally, the regressor time series is assumed to be second-order stationary in the variable t. We define the mean function, the lag-h autocovariance kernels R_h^X(x,y) and the corresponding lag-h autocovariance operators R_h^X, defined by right integration, for h ∈ Z. The autocovariance kernels and operators are assumed to be summable in the supremum norm (denoted ∥·∥_∞) and the nuclear norm (denoted ∥·∥_1) respectively,

 ∑_{h∈Z} ∥R_h^X∥_∞ = ∑_{h∈Z} sup_{x,y∈[0,1]} |R_h^X(x,y)| < ∞,  ∑_{h∈Z} ∥R_h^X∥_1 = ∑_{h∈Z} trace{((R_h^X)* R_h^X)^{1/2}} < ∞. (A1)

The response time series {Z_t} is considered to be scalar. The sequence of filter coefficients consists of unknown fixed (deterministic) continuous linear functionals B_k : H → R. By the Riesz representation theorem, there exist functions b_k ∈ H such that B_k v = ⟨b_k, v⟩ for all v ∈ H.

Throughout the paper it is assumed that the filter coefficients are summable in both the supremum norm and the vector norm,

 ∑_{k∈Z} ∥b_k∥_∞ = ∑_{k∈Z} sup_{x∈[0,1]} |b_k(x)| < ∞,  ∑_{k∈Z} ∥B_k∥_H < ∞. (A2)

The response time series is then modelled by the lagged regression equation

 Zt=a+∑k∈ZBkXt−k+et (2.1)

where a is a constant, called the intercept, and {e_t} is a sequence of independent identically distributed zero-mean real random variables with finite variance. Our assumptions imply that the response time series is also stationary.

We define the cross-covariance kernel and the cross-covariance operator between the time series {Z_t} and {X_t} as

 R_h^{ZX}(x) = E[Z_h X_0(x)],  x ∈ [0,1],  R_h^{ZX} = E[Z_h X_0] ∈ H.

We make here a few simplifying assumptions concerning the first order structure of the data, to keep the focus on the more challenging second order structure. Firstly, we assume that the regressor time series is centred, i.e. E[X_t] = 0. If that were not the case, the mean function could be estimated by the kernel smoother method introduced by rubin2018sparsely. Secondly, we assume that the intercept is null, i.e. a = 0. Otherwise it could be estimated from the relation between the means of the two series, where E[Z_t] could be estimated by the sample mean and the coefficients by the methods of this paper. As a consequence of these assumptions we note that the response time series is also centred, i.e. E[Z_t] = 0.

2.2 Observation Scheme

We now describe the sparse observation scheme for the regressor functional time series {X_t}. Let Y_tj be the j-th measurement on the t-th curve at spatial position x_tj ∈ [0,1], where j = 1,…,N_t and N_t is the number of measurements on the curve X_t, for t = 1,…,T. The additive measurement errors are denoted by ε_tj and are assumed to be independent identically distributed realisations of a mean-zero random variable with variance σ². Furthermore, the measurement errors are assumed to be independent of {X_t} as well as of the measurement locations {x_tj}. The observation model can then be written as

 Y_tj = X_t(x_tj) + ε_tj,  j = 1,…,N_t,  t = 1,…,T. (2.2)

The spatial positions as well as their number are considered random, and concrete conditions on their distributions are given in Section 3, in the context of our asymptotic theory. The response time series is considered to be observed over the same time horizon, hence the observations Z_1,…,Z_T are available to us.

2.3 Spectral Analysis of Functional Lagged Regression

Under condition (A1), panaretos2013fourier defined the spectral density kernels and spectral density operators

 f_ω^X(·,·) = (1/2π) ∑_{h∈Z} R_h^X(·,·) e^{−iωh},  F_ω^X = (1/2π) ∑_{h∈Z} R_h^X e^{−iωh}, (2.3)

for ω ∈ [−π,π]. The sums in (2.3) converge in the supremum norm and the nuclear norm respectively. Furthermore, the spectral density operator F_ω^X is a non-negative, self-adjoint trace-class operator for all ω. Hence it admits the spectral representation

 F_ω^X = ∑_{m=1}^∞ λ_m^ω φ_m^ω ⊗ φ_m^ω (2.4)

where ⊗ denotes the tensor product in L²([0,1]). The elements of the sequence {λ_m^ω} are called harmonic eigenvalues and the corresponding functions {φ_m^ω} are called harmonic eigenfunctions.

Furthermore, the lagged autocovariance kernels and operators can be recovered by the inversion formula (panaretos2013fourier) that holds in the supremum and the nuclear norm, respectively:

 R_h^X(·,·) = ∫_{−π}^{π} f_ω^X(·,·) e^{iωh} dω,  R_h^X = ∫_{−π}^{π} F_ω^X e^{iωh} dω. (2.5)
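The Fourier pair (2.3)/(2.5) can be checked numerically in the simplest scalar case; the AR(1)-type autocovariance sequence below is a hypothetical stand-in for R_h^X:

```python
import numpy as np

phi = 0.5
R = lambda h: phi ** abs(h)       # hypothetical (absolutely summable) autocovariances R_h

H = 60                            # truncation level for the lag sum
n = 1024
omegas = -np.pi + 2 * np.pi * np.arange(n) / n   # uniform grid over [-pi, pi)

# spectral density (2.3): f_omega = (1/2pi) sum_h R_h e^{-i omega h}
f = sum(R(h) * np.exp(-1j * omegas * h) for h in range(-H, H + 1)) / (2 * np.pi)

# inversion (2.5): R_h = int_{-pi}^{pi} f_omega e^{i omega h} d omega,
# discretised as a Riemann sum over the periodic grid
h = 3
R3 = (np.sum(f * np.exp(1j * omegas * h)) * (2 * np.pi / n)).real
```

Because the integrand is a trigonometric polynomial of degree far below n, the Riemann sum recovers R_3 = 0.5³ essentially exactly.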

Under the assumptions (A1) and (A2), hormann2015estimation defined the cross-spectral density operator between {Z_t} and {X_t} by the formula

 FZXω=12π∑h∈ZRZXhe−iωh,ω∈[−π,π], (2.6)

and showed that this sum converges in the Hilbert-Schmidt norm, which coincides with the vector norm in the scalar response case considered in this paper. Using similar ideas, one can define the cross-spectral density kernel between {Z_t} and {X_t} by

 f_ω^{ZX}(·) = (1/2π) ∑_{h∈Z} R_h^{ZX}(·) e^{−iωh},  ω ∈ [−π,π]. (2.7)

The sum on the right-hand side of (2.7) converges in the supremum norm.

Further, hormann2015estimation introduced the frequency response operator

 B_ω = ∑_{k∈Z} B_k e^{−ikω},  ω ∈ [−π,π],

and obtained the relation between the spectral density operators, cross-spectral density operators, and the frequency response operators,

 F_ω^{ZX} = B_ω F_ω^X,  ω ∈ [−π,π], (2.8)

which provides the basis for the estimation of the filter coefficients introduced in the next section. In our case of scalar response, the B_ω are in fact functionals, i.e. B_ω : H → C. The filter coefficients can be recovered by the formula

 B_k = (1/2π) ∫_{−π}^{π} B_ω e^{ikω} dω,  k ∈ Z. (2.9)
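Formula (2.9) can be verified numerically for a hypothetical scalar transfer function with a few nonzero coefficients:

```python
import numpy as np

b_true = {0: 1.0, 1: 0.5, 2: 0.25}   # hypothetical filter coefficients b_k

n = 1024
omegas = -np.pi + 2 * np.pi * np.arange(n) / n        # uniform grid over [-pi, pi)
B_om = sum(b * np.exp(-1j * k * omegas) for k, b in b_true.items())   # B_omega

# (2.9): B_k = (1/2pi) int_{-pi}^{pi} B_omega e^{ik omega} d omega, as a Riemann sum
def coeff(k):
    return (np.sum(B_om * np.exp(1j * k * omegas)) * (2 * np.pi / n)).real / (2 * np.pi)
```

On the uniform periodic grid the discretised integral is exact for these low-degree trigonometric polynomials, so `coeff` recovers each b_k and returns zero for lags not in the filter.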

2.4 Nonparametric Estimation of the Model Dynamics

Given the sparsely observed measurements of the underlying regressor functional time series, we are able to estimate its second order dynamics using the methods derived by rubin2018sparsely.

We work with the “raw” covariances defined as G_{h,t}^X(x_{t+h,j}, x_tk) = Y_{t+h,j} Y_tk for t, t+h ∈ {1,…,T}. Because of the independence of the additive measurement errors, the error “contaminates” only the diagonal of the lag-0 covariance kernel, hence

 E[G_{h,t}^X(x_{t+h,j}, x_tk) | x_{t+h,j}, x_tk] = R_h^X(x_{t+h,j}, x_tk) + σ² 1[h=0, j=k]

where 1[·] equals 1 if the condition in the brackets is satisfied, and 0 otherwise.

The lag-0 covariance kernel is estimated by a local linear surface smoother over the pooled lag-0 raw covariances with the diagonal removed. Specifically, we set R̂_0(x,y) = b̂_0 where

 (b̂_0, b̂_1, b̂_2) = argmin_{b_0,b_1,b_2} ∑_{t=1}^{T} ∑_{j≠k} K((x_tj − x)/B_R) K((x_tk − y)/B_R) {G_{0,t}^X(x_tj, x_tk) − b_0 − b_1(x − x_tj) − b_2(y − x_tk)}²

and where K is a symmetric smoothing kernel and B_R > 0 a bandwidth parameter. Throughout this paper we work with the Epanechnikov kernel K(v) = (3/4)(1 − v²) for |v| ≤ 1, and K(v) = 0 otherwise; however, any other usual choice of smoothing kernel would be appropriate.
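A sketch of such a local linear surface smoother, solved in closed form as a weighted least squares problem with the Epanechnikov kernel; the synthetic surface and bandwidth below are hypothetical:

```python
import numpy as np

def epan(u):
    # Epanechnikov kernel K(v) = 0.75 (1 - v^2) on [-1, 1]
    return 0.75 * np.maximum(1.0 - u ** 2, 0.0)

def local_linear_surface(x0, y0, xs, ys, g, B):
    """Local linear fit of raw covariances g at (x0, y0) with bandwidth B;
    returns the estimated intercept b0, i.e. the smoothed value at (x0, y0)."""
    w = epan((xs - x0) / B) * epan((ys - y0) / B)
    D = np.column_stack([np.ones_like(xs), x0 - xs, y0 - ys])
    A = D.T @ (w[:, None] * D)            # normal equations D' W D beta = D' W g
    rhs = D.T @ (w * g)
    return np.linalg.solve(A, rhs)[0]

# sanity check on a noiseless linear surface, which a local linear fit reproduces exactly
rng = np.random.default_rng(0)
xs, ys = rng.uniform(0, 1, 500), rng.uniform(0, 1, 500)
est = local_linear_surface(0.5, 0.5, xs, ys, 2 * xs + 3 * ys, B=0.5)
```

In the actual estimator, `(xs, ys, g)` would be the pooled off-diagonal pairs of measurement locations and their raw covariance products.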

The diagonal of the lag-0 covariance kernel with the ridge contamination, V(x) = R_0^X(x,x) + σ², is estimated by the local linear smoother over the lag-0 “raw” covariances on the diagonal. Hence we set V̂(x) = ĉ_0 where

 (ĉ_0, ĉ_1) = argmin_{c_0,c_1} ∑_{t=1}^{T} ∑_{j=1}^{N_t} K((x_tj − x)/B_V) {G_{0,t}^X(x_tj, x_tj) − c_0 − c_1(x − x_tj)}². (2.10)

The lag-h autocovariance kernel for h ≠ 0 can be estimated similarly to (2.10) if the lag-h “raw” covariances are smoothed and the diagonal is no longer required to be removed.

The estimation of the diagonal of the lag-0 covariance kernel without the ridge contamination is performed by a local quadratic smoother in the direction perpendicular to the diagonal (yao2003shrinkage; yao2005functional), setting R̄_0(x) = c̄_0 where

 (c̄_0, c̄_1, c̄_2) = argmin_{c_0,c_1,c_2} ∑_{t=1}^{T} ∑_{j≠k} K((x_tj − x)/B_R) K((x_tk − x)/B_R) {G_{0,t}^X(x_tj, x_tk) − c_0 − c_1(x − P(x_tj, x_tk)) − c_2(x − P(x_tj, x_tk))²}²

and P(x_tj, x_tk) is the first coordinate (which is the same as the second one) of the projection of the point (x_tj, x_tk) onto the diagonal of [0,1]². Having the estimates V̂ and R̄_0, the measurement error variance is estimated by integrating the difference,

 σ̂² = ∫_0^1 (V̂(x) − R̄_0(x)) dx. (2.11)

The spectral density kernels are estimated by Bartlett’s approach (hormann2015dynamic), weighting down higher lags by Bartlett’s (triangular) weights W_h = 1 − |h|/L for |h| < L and W_h = 0 otherwise. For a fixed frequency ω ∈ [−π,π], the spectral density kernel at the point (x,y) is estimated as

 f̂_ω(x,y) = (L/2π) d̂_0

where d̂_0 is obtained by minimising the following weighted sum of squares:

 (d̂_0, d̂_1, d̂_2) = argmin_{(d_0,d_1,d_2)∈C³} (1/T) ∑_{h=−L}^{L} ∑_{t=max(1,1−h)}^{min(T,T−h)} ∑_{j=1}^{N_{t+h}} ∑_{k=1}^{N_t} [j≠k if h=0] |G_{h,t}^X(x_{t+h,j}, x_tk) e^{−ihω} − d_0 − d_1(x_{t+h,j} − x) − d_2(x_tk − y)|² (W_h / B_R²) K((x_{t+h,j} − x)/B_R) K((x_tk − y)/B_R). (2.12)

Having estimated the spectral density f̂_ω, whose operator counterpart is denoted F̂_ω, we are able to estimate all autocovariance kernels and operators by the inversion formula

 R̃_h(·,·) = ∫_{−π}^{π} f̂_ω(·,·) e^{ihω} dω,  R̃_h = ∫_{−π}^{π} F̂_ω e^{ihω} dω. (2.13)
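The role of the Bartlett weights in (2.12) is easiest to see for a densely observed scalar series, where the estimator reduces to a weighted sum of empirical lag covariances; the AR(1) toy series and span L below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
T, L = 400, 15                     # series length and Bartlett span (hypothetical)
X = rng.normal(size=T)
for t in range(1, T):
    X[t] += 0.5 * X[t - 1]         # AR(1) toy series, densely observed

def emp_cov(h):
    # biased empirical lag-h autocovariance (divisor T), as needed for non-negativity
    h = abs(h)
    return np.dot(X[h:], X[:T - h]) / T

def f_hat(omega):
    # Bartlett-weighted spectral density estimate:
    # (1/2pi) sum_{|h| < L} (1 - |h|/L) R_hat_h e^{-i omega h}
    return (1 / (2 * np.pi)) * sum(
        (1 - abs(h) / L) * emp_cov(h) * np.exp(-1j * omega * h)
        for h in range(-L + 1, L)
    )
```

With the triangular weights and biased covariances this estimate is a Fejér-type average of the periodogram, and hence non-negative; the full estimator (2.12) replaces the empirical covariances by kernel-smoothed raw covariances of the sparse data.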

In the following paragraphs we extend these kernel smoothing techniques to estimate the cross spectral density between the response time series and the regressor time series .

Define the raw lagged cross-covariances G_{h,t}^{ZX}(x_tj) = Z_{t+h} Y_tj, where j = 1,…,N_t and t, t+h ∈ {1,…,T}. Since E[G_{h,t}^{ZX}(x_tj) | x_tj] = R_h^{ZX}(x_tj), we may use the raw lagged cross-covariances as a basis for estimation of the cross-spectral kernels.

Consider Bartlett’s weight function W_h = 1 − |h|/L for |h| < L and W_h = 0 otherwise, where L is Bartlett’s span parameter controlling the amount of regularisation involved in the estimation of the spectral density. Again making use of local linear kernel smoothing techniques, we estimate the cross-spectral density at frequency ω as

 f̂_ω^{ZX}(x) = (L/2π) d̄_0 ∈ C (2.14)

where d̄_0 is realised as the minimiser of the following weighted sum of squares:

 (d̄_0, d̄_1) = argmin_{(d_0,d_1)∈C²} (1/T) ∑_{h=−L}^{L} ∑_{t=max(1,1−h)}^{min(T,T−h)} ∑_{j=1}^{N_t} |G_{h,t}^{ZX}(x_tj) e^{−ihω} − d_0 − d_1(x_tj − x)|² (W_h / B_C) K((x_tj − x)/B_C). (2.15)

The solution to the complex minimisation problem (2.15) can be found explicitly. The formula, which is given in Section 5.3, consists of a handful of frequency-independent terms that can be pre-calculated, and thus enables the computationally feasible evaluation of the estimator (2.14) even on a dense grid of frequencies.

Once the estimates of the spectral density and the cross-spectral density have been constructed, we turn our attention to the estimation of the frequency response operators B_ω. Heuristically, from relation (2.8), we would like to write B̂_ω = F̂_ω^{ZX}(F̂_ω^X)^{−1}. This formula is indeed only heuristic, because the operator F_ω^X, being trace-class, is not boundedly invertible. The same issue is present for its empirical counterpart F̂_ω^X, which admits the eigendecomposition

 F̂_ω^X = ∑_{j=1}^∞ λ̂_j^ω φ̂_j^ω ⊗ φ̂_j^ω,

which can be viewed as the empirical version of (2.4). The difficulty in inverting F̂_ω^X can be seen from the fact that ∑_j λ̂_j^ω < ∞, implying that λ̂_j^ω decays at least as fast as o(1/j) as j → ∞. It is the small values of λ̂_j^ω that cause problems, and there are two classical strategies to overcome the issue:

1. Spectral truncation. The inverse is formally replaced by

 ∑_{j=1}^{K_T^ω} (1/λ̂_j^ω) φ̂_j^ω ⊗ φ̂_j^ω

where K_T^ω is the spectral truncation parameter that needs to grow to infinity sufficiently slowly to allow for consistency. It may or may not depend on the frequency ω.

The estimator of the spectral transfer function becomes

 ^Btruncω=KωT∑j=11λωj⟨^φωj,⋅⟩^FZXω^φωj,ω∈[−π,π]. (2.16)

This approach was adopted by hormann2015estimation.

2. Tikhonov regularisation. Here, the inverse is formally replaced by

 (F̂_ω^X + ρ_T I)^{−1} = ∑_{j=1}^∞ (1/(λ̂_j^ω + ρ_T)) φ̂_j^ω ⊗ φ̂_j^ω

where I is the identity operator on H and the Tikhonov regularisation parameter ρ_T > 0 tends to zero slowly enough to allow for consistency. Even though the parameter may depend on ω, Tikhonov regularisation usually penalises all small eigenvalues equally.

The estimator of the spectral transfer function becomes

 B̂_ω^{Tikh} = F̂_ω^{ZX}(F̂_ω^X + ρ_T I)^{−1} = ∑_{j=1}^∞ (1/(λ̂_j^ω + ρ_T)) ⟨φ̂_j^ω, ·⟩ F̂_ω^{ZX} φ̂_j^ω,  ω ∈ [−π,π]. (2.17)

This form of regularisation is discussed in detail by pham2018methodology.
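In a discretised (matrix) setting the two regularised inverses can be compared directly; the eigenvalue decay, dimension, truncation level K and ridge ρ below are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 20
lam = 1.0 / np.arange(1, d + 1) ** 2     # fast-decaying "harmonic eigenvalues"
U = np.linalg.qr(rng.normal(size=(d, d)))[0]
F = U @ np.diag(lam) @ U.T               # stands in for F_omega^X (symmetric PSD)

b = rng.normal(size=d)                   # hypothetical transfer functional B_omega
g = F @ b                                # stands in for F_omega^ZX = B_omega F_omega^X

# spectral truncation (as in 2.16): invert only the K leading eigenpairs
K = 5
w, V = np.linalg.eigh(F)
order = np.argsort(w)[::-1]
B_trunc = sum((g @ V[:, j]) / w[j] * V[:, j] for j in order[:K])

# Tikhonov (as in 2.17): apply (F + rho I)^{-1} to the cross term
rho = 1e-4
B_tikh = np.linalg.solve(F + rho * np.eye(d), g)
```

Truncation discards the directions with small eigenvalues entirely, while Tikhonov shrinks every direction by the factor λ_j/(λ_j + ρ); in this well-posed toy example the Tikhonov solution is close to the true b.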

Once the estimate of the spectral transfer function has been established by either of the above regularisation techniques, the filter coefficients are estimated by

 B̂_k^{trunc} = (1/2π) ∫_{−π}^{π} B̂_ω^{trunc} e^{ikω} dω,  k ∈ Z, (2.18)
 B̂_k^{Tikh} = (1/2π) ∫_{−π}^{π} B̂_ω^{Tikh} e^{ikω} dω,  k ∈ Z. (2.19)

2.5 Forecasting the Response Process

In Section 2.2 we assumed the data to be available up to time T for both the regressor time series and the response time series. It may very well happen, though, that the measurement of the response variable is terminated at an earlier time T_Z, where T_Z < T, while the measurements of the regressor time series continue until time T. In this case it is of interest to forecast the unobserved values Z_{T_Z+1},…,Z_T.

It turns out that an essential building block of the response process forecasting algorithm is the prediction of the latent regressor time series (rubin2018sparsely), which we outline in the following text. In fact, our presentation also covers out-of-sample forecasts both before time t = 1 and beyond the horizon T. Specifically, we assume that we want to predict the functional data X_{1−M},…,X_{T+M}, i.e. the sample padded by M extra functional observations at each side of the considered time span.

Denote by X the random element representing the “stacked” curves of the latent regressor series. Its second order structure satisfies

 Var(X) ≡ S =
 ⎡ R_0          R_1^⊤        R_2^⊤        …  R_{T+2M−1}^⊤ ⎤
 ⎢ R_1          R_0          R_1^⊤        …  R_{T+2M−2}^⊤ ⎥
 ⎢ ⋮            ⋮            ⋮            ⋱  ⋮            ⎥
 ⎣ R_{T+2M−1}   R_{T+2M−2}   R_{T+2M−3}   …  R_0          ⎦ ∈ L(H^{T+2M}). (2.20)

The vector Y consists of all observed data of the regressor time series, where N = ∑_{t=1}^T N_t is the total number of observations up to time T. The measurement errors are stacked into a vector denoted ε. Due to independence, Var(ε) = σ² I_N, where I_N is the identity matrix of size N. Further define the evaluation operators H_t : f ↦ (f(x_t1),…,f(x_tN_t))^⊤ for each t, and the stacked censor operator H. Hence the observation scheme (2.2) becomes Y = HX + ε.

The best linear unbiased predictor of X given the observed data Y, denoted Π(X|Y), is

 Π(X|Y) = S H^*(H S H^* + σ² I_N)^{−1} Y ∈ H^{T+2M} (2.21)

where H^* denotes the adjoint operator. The best linear predictor of each functional datum X_t given the observed data Y, denoted Π(X_t|Y), is then given by the projection Π(X_t|Y) = P_t Π(X|Y), where P_t is the coordinate projection operator for t = 1−M,…,T+M.

The predictor (2.21) requires knowledge of the unknown dynamics of the regressor time series through the autocovariance operators in (2.20), as well as the measurement error variance σ². In place of these parameters we plug in the estimates (2.13) and (2.11), and denote the resulting predictors Π̂(X_t|Y).
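A finite-dimensional sketch of the predictor (2.21), with the stacked latent vector discretised on a grid; the covariance kernel, grid size, sampling pattern and noise variance are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30                       # discretisation of the stacked latent vector X
grid = np.linspace(0, 1, n)
# toy covariance Var(X) = S: exponential (Ornstein-Uhlenbeck-type) kernel
S = np.exp(-np.abs(grid[:, None] - grid[None, :]) / 0.2)

obs_idx = np.sort(rng.choice(n, size=8, replace=False))   # sparse sampling pattern
H = np.eye(n)[obs_idx]                                    # censor/evaluation operator H
sigma2 = 0.05                                             # measurement-error variance

X = np.linalg.cholesky(S + 1e-10 * np.eye(n)) @ rng.normal(size=n)   # latent draw
Y = H @ X + np.sqrt(sigma2) * rng.normal(size=len(obs_idx))          # Y = HX + eps

# BLUP (2.21): Pi(X|Y) = S H^T (H S H^T + sigma^2 I)^{-1} Y
X_hat = S @ H.T @ np.linalg.solve(H @ S @ H.T + sigma2 * np.eye(len(obs_idx)), Y)
```

This is the familiar kriging/Gaussian conditional-mean formula; the functional version simply replaces the matrices by the operators S and H acting on H^{T+2M}.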

The predictor (2.21) is extremely useful because it serves as a building block for the response time series forecasting algorithm as the next proposition motivates.

Proposition 1.

Constructing the best linear unbiased predictor of Z_s given the data Y, denoted Π(Z_s|Y), is equivalent to constructing the best linear unbiased predictors Π(X_{s−k}|Y) of X_{s−k} given Y for all k, and then applying the filter coefficients to these predictions. Formally:

 Π(Z_s|Y) = ∑_{k∈Z} B_k Π(X_{s−k}|Y),  s ∈ Z.

Proposition 1 justifies the following algorithm for predicting the values of Z_s:

1. From the measurements realised on the regressor time series, estimate the spectral density kernels f̂_ω and the measurement error variance σ̂². Using formula (2.13), integrate the estimated spectral density to obtain the complete space-time covariance of the regressor time series.

2. From the measurements and the observed response time series, estimate the cross-spectral density f̂_ω^{ZX}. Using either the truncation regularisation (2.16) or the Tikhonov regularisation (2.17), estimate the spectral transfer function. By formula (2.18) or (2.19), integrate the spectral transfer function to obtain the filter coefficients B̂_k^{trunc} or B̂_k^{Tikh}.

3. Using the methodology explained at the beginning of this section, predict the latent functional data from the observables, denoted Π̂(X_t|Y). This prediction includes a forecast of a few curves, say M, before time t = 1 and beyond the time horizon T. The number M is chosen so that the estimated filter coefficients B̂_k^{trunc} (or B̂_k^{Tikh}) for |k| > M are negligible.

4. For each s, construct the forecast Ẑ_s = ∑_{|k|≤M} B̂_k^{trunc} Π̂(X_{s−k}|Y) (or alternatively with B̂_k^{Tikh}), depending on the chosen regularisation technique.

3 Asymptotic Results

The following conditions ensure the consistent estimation of the spectral density and the cross-spectral density. Besides conditions on the sampling regime and the asymptotic behaviour of the tuning parameters, these conditions require smoothness of the spectral density and a weak dependence of the time series, formalised by a cumulant-type condition.

(B1) The numbers of measurements N_t in time are independent identically distributed random variables.

(B2) The measurement locations x_tj are independent random variables generated from a density g, and are independent of the numbers of measurements N_t. The density g is assumed to be twice continuously differentiable and strictly positive on [0,1].

(B3) The autocovariance kernels R_h^X are twice continuously differentiable on [0,1]² for each h ∈ Z. Moreover,

 sup_{x,y∈[0,1]} |∂² R_h^X(y,x) / ∂y^{α₁} ∂x^{α₂}|

is uniformly bounded in h for all combinations of α₁, α₂ ≥ 0 with α₁ + α₂ = 2.

Recall that the 4-th order cumulant kernel of a zero-mean functional time series (panaretos2013fourier) is defined as

 cum(X_{t₁}, X_{t₂}, X_{t₃}, X_{t₄})(x₁, x₂, x₃, x₄) = E[X_{t₁}(x₁) X_{t₂}(x₂) X_{t₃}(x₃) X_{t₄}(x₄)] − E[X_{t₁}(x₁) X_{t₂}(x₂)] E[X_{t₃}(x₃) X_{t₄}(x₄)] − E[X_{t₁}(x₁) X_{t₃}(x₃)] E[X_{t₂}(x₂) X_{t₄}(x₄)] − E[X_{t₁}(x₁) X_{t₄}(x₄)] E[X_{t₂}(x₂) X_{t₃}(x₃)]

for t₁,…,t₄ ∈ Z and x₁,…,x₄ ∈ [0,1]. Compare with the 4-th order cumulant of real random variables (rosenblatt2012stationary, p. 36).

(B4) Assume that the 4-th order cumulant kernel of X is summable in the supremum norm:

 ∑_{h₁,h₂,h₃=−∞}^{∞} sup_{x₁,x₂,x₃,x₄∈[0,1]} |cum(X_{h₁}, X_{h₂}, X_{h₃}, X_0)(x₁, x₂, x₃, x₄)| < ∞.

(B5) Assume