Functional time series prediction under partial observation of the future curve

06/01/2019 · by Shuhao Jiao, et al. · University of California-Davis

Providing reliable predictions is one of the fundamental topics in functional time series analysis. Existing functional time series methodology seeks to predict a complete future functional observation based on a set of observed functions. The problem of interest discussed here is how to advance prediction methodology to cases where partial information on the next trajectory is available, with the aim of improving prediction accuracy. The proposed method combines "next-interval" prediction and fully functional regression prediction, so that the partially observed part can aid in producing a better guess for the unobserved part of the future curve. An automatic selection criterion based on minimizing the prediction error helps select unknown tuning parameters. Simulations indicate that the proposed method can outperform existing methods with respect to mean-square prediction error of the unobserved part, and its practical usefulness is illustrated in an analysis of environmental and traffic flow data.


1 Introduction

Functional data are often collected in natural consecutive time intervals, such as days, weeks and years, where similar behavior is expected. The collected functions may be described by a time series $(X_k \colon k \in \mathbb{Z})$, with $\mathbb{Z}$ denoting the integers, the observations in the sequence being random functions $X_k(t)$ for $t$ taking values in some domain $\mathcal{T}$, here taken to be the unit interval $[0,1]$. The object $(X_k \colon k \in \mathbb{Z})$ will be referred to as a functional time series. Interest in this new method arises from the consideration of the dynamic features of functional time series data. It is natural to have partial observations of the future curve available. With this partial observation, improving prediction accuracy of the unobserved part becomes possible.

Complete curve prediction has been discussed in recent decades. Existing methods often focus on the Functional AutoRegressive model of order $p$, the FAR($p$) model. Bosq (2000) derived one-step ahead predictors based on a functional form of the Yule–Walker equations for FAR(1) processes. Besse, Cardot, and Stephenson (2000) proposed non-parametric kernel predictors. Antoniadis and Sapatinas (2003) studied FAR(1) curve prediction based on linear wavelet methods. Kargin and Onatski (2008) introduced the predictive factor method, which seeks to replace functional principal components with directions most relevant for prediction. Didericksen, Kokoszka, and Zhang (2012) evaluated several competing prediction models in a comparative simulation study, finding Bosq's (2000) method to have the best overall performance. Aue, Dubart Norinho, and Hörmann (2015) proposed a method that treats functional time series prediction in a multivariate way, together with a final prediction error criterion to select the order of the FAR process and the dimension of the auxiliary VAR model. As for fully functional regression models, Wang, Chiou and Müller (2016) discussed functional regression models in which both the response and the predictor variables are functions. Fully functional regression methodology has been used to provide updated time series predictions in Chiou (2012), who proposed a functional mixture method for predicting traffic flow that combines fully functional regression with functional clustering and discrimination. Shang (2017) also considered the fully functional regression method, together with a moving block approach, to update functional time series predictions.

Figure 1 shows a specific case where the proposed method applies. The figure presents one week's PM10 data: the trajectories from Monday to Saturday and part of Sunday's trajectory are observed, and interest is in predicting the unobserved part of Sunday's trajectory, highlighted by the dotted grey curve. The details of this data example are given in Section 5 below.

Figure 1: One week’s PM10 concentration. The dotted grey line represents the unobserved part of Sunday’s trajectory.

In contrast to complete curve prediction, our method aims to give partial predictions. Compared with complete curve prediction, this adds flexibility, since the prediction can be updated according to the time of day, and the prediction error over the forecasting time interval should then be smaller. The predicted curve can be smoothly connected to the partially observed curve if the eigenvalues of the covariance function decay at a suitable rate. This is important in practice: if the partial observation ends at a certain value, the beginning of the predicted curve should not deviate far from that value, and our method produces predictions that respect this constraint.

The prediction algorithm is a stepwise procedure and can be summarized as follows. For smoothed trajectories, we decompose the observations into two parts,

$$X_k(t) = S_k(t) + \varepsilon_k(t),$$

where $S_k$ is the signal function and $\varepsilon_k$ is the independent innovation function. For a current time $\tau \in (0,1)$, assume that the trajectory of $X_{n+1}$ over $[0,\tau]$ has already been observed and that the goal is to predict $X_{n+1}$ over $(\tau,1]$. To do this, we first use functional time series methodology to calculate the fitted functions $\hat{X}_k$ for all $k$, and obtain the residual functions $\hat{\varepsilon}_k = X_k - \hat{X}_k$. We then separate the residual functions into two segments at the current time $\tau$, and fully functionally regress the unobserved segment on the observed segment. The fitted regression is then used to update the prediction of the unobserved part of the innovation function of the current curve. The final prediction is proposed to be the summation of the predictions from each step.

In the noisy case, we further decompose the observations into three parts,

$$X_k(t) = S_k(t) + \varepsilon_k(t) + e_k(t).$$

Besides the two aforementioned stages, we propose one more step to extract the time series information in the random error $e_k$, which represents short-term dynamics. In this article, we will discuss the following: (1) How well does the proposed method perform compared with the "next-interval" prediction method and the fully functional regression method? (2) How should the tuning parameters be selected? (3) How should the method be adjusted so that it still produces decent and reasonable predictions for noisy data?

The paper is organized as follows. In Section 2, we describe the functional time series prediction methodology proposed by Aue et al. (2015), discuss the fully functional linear model and its application to intra-day prediction, and propose a data-driven criterion for selecting the parameters of the fully functional regression prediction. Section 3 gives the prediction algorithm for both smooth and noisy functional time series. Section 4 shows simulation results, including the prediction MSEs of various methods, the results of order and dimension selection, and nonparametric bootstrap prediction intervals. Real data analyses on PM10 concentration curves and traffic flow trajectories are shown in Section 5.

2 Functional Autoregressive Model and Fully Functional Regression Model

2.1 Preliminaries

Let $(X_k \colon k \in \mathbb{Z})$ be an arbitrary stationary functional time series satisfying the following assumptions:

  • All random functions are defined on some common probability space $(\Omega, \mathcal{A}, P)$. The notation $X \in L^p_H$ is used to indicate that, for some $p > 0$, $E\|X\|^p < \infty$. We assume the observations are elements of the Hilbert space $H = L^2([0,1])$ equipped with the inner product $\langle x, y\rangle = \int_0^1 x(t)\,y(t)\,dt$. Each $X_k$ is a square integrable function satisfying $\|X_k\|^2 = \int_0^1 X_k^2(t)\,dt < \infty$.

  • Any $X \in L^1_H$ possesses a mean curve $\mu = (E[X(t)] \colon t \in [0,1])$, and any $X \in L^2_H$ possesses a covariance operator $C$, defined by $C(x) = E[\langle X - \mu, x\rangle(X - \mu)]$, or equivalently by the covariance kernel $c(t,s) = \mathrm{Cov}(X(t), X(s))$. By spectral decomposition, we have the following expression of $C$,

    $$C(x) = \sum_{j=1}^{\infty} \lambda_j \langle v_j, x\rangle\, v_j,$$

    where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues (in strictly descending order) and $v_1, v_2, \ldots$ the corresponding normalized eigenfunctions, so that $C(v_j) = \lambda_j v_j$ and $\|v_j\| = 1$. To satisfy the conditions of Mercer's theorem, we usually assume the covariance kernel to be continuous.

  • The $(v_j \colon j \in \mathbb{N})$ form an orthonormal basis of $L^2([0,1])$. Then, by the Karhunen–Loève theorem, $X$ allows for the representation

    $$X = \mu + \sum_{j=1}^{\infty} \langle X - \mu, v_j\rangle\, v_j.$$

    The coefficients $\langle X - \mu, v_j\rangle$ in this expansion are called the fPC scores of $X$. Suppose now that we have observed $X_1, \ldots, X_n$. In practice, $\mu$ as well as $C$ and its spectral decomposition are unknown and need to be estimated from the sample. We estimate $\mu$ pointwise by

    $$\hat{\mu}(t) = \frac{1}{n}\sum_{k=1}^{n} X_k(t),$$

    and the covariance operator by

    $$\hat{C}(x) = \frac{1}{n}\sum_{k=1}^{n} \langle X_k - \hat{\mu}, x\rangle\,(X_k - \hat{\mu}).$$

2.2 Multivariate technique for predicting FAR($p$) processes

There are many works on the prediction of functional time series. Among them, Aue et al. (2015) proposed a dimension-reducing method for the prediction of stationary functional time series, which can be easily implemented and provides competitive predictions. They also proposed a model selection criterion, the fFPE criterion, to determine both the order of the FAR model and the dimension of the auxiliary projected eigenspace. The FAR($p$) process is defined by the stochastic recursion

$$X_k = \sum_{j=1}^{p} \Phi_j(X_{k-j}) + \varepsilon_k, \qquad k \in \mathbb{Z},$$

where the $\varepsilon_k$ are centered, independent and identically distributed innovations in $L^2([0,1])$ and $\Phi_1, \ldots, \Phi_p$ are bounded linear operators. We can represent an FAR($p$) process in the state space form (Bosq (2000)),

$$\begin{pmatrix} X_k \\ X_{k-1} \\ \vdots \\ X_{k-p+1} \end{pmatrix}
= \begin{pmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\ I & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{pmatrix}
\begin{pmatrix} X_{k-1} \\ X_{k-2} \\ \vdots \\ X_{k-p} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_k \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{2-1}$$

The operator matrix in (2-1) is denoted by $\Phi^\ast$, and the elements $I$ and $0$ denote the identity and zero operators on $L^2([0,1])$, respectively. Then $\Phi^\ast$ should satisfy $\|(\Phi^\ast)^{j_0}\| < 1$ for some $j_0 \ge 1$. This condition ensures that the time series process has a strictly stationary and causal solution in $L^2([0,1])$.
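For concreteness, for $p = 2$ the state space form (2-1) specializes to

$$\begin{pmatrix} X_k \\ X_{k-1} \end{pmatrix} = \begin{pmatrix} \Phi_1 & \Phi_2 \\ I & 0 \end{pmatrix}\begin{pmatrix} X_{k-1} \\ X_{k-2} \end{pmatrix} + \begin{pmatrix} \varepsilon_k \\ 0 \end{pmatrix},$$

so that predicting an FAR(2) process reduces to one-step prediction of a vector-valued autoregression of order one in the curves.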

The prediction algorithm of Aue et al. (2015) proceeds in three steps.

Step 1

Select the number $d$ of principal components for the observed curves. With the sample eigenfunctions, empirical fPC scores $\hat{x}_{k,j} = \langle X_k - \hat{\mu}, \hat{v}_j\rangle$, $j = 1, \ldots, d$, can now be computed for each observation $X_k$. Then we have the fPC score vector for the $k$th observation,

$$\hat{\boldsymbol{x}}_k = (\hat{x}_{k,1}, \ldots, \hat{x}_{k,d})^\top.$$

By the nature of fPCA, the vector $\hat{\boldsymbol{x}}_k$ contains most of the information on the curve $X_k$.

Step 2

Fix the prediction lag $h$. Then find a multi-dimensional time series model for the eigenscore vectors $\hat{\boldsymbol{x}}_1, \ldots, \hat{\boldsymbol{x}}_n$ to produce the $h$-step ahead prediction

$$\hat{\boldsymbol{x}}_{n+h} = (\hat{x}_{n+h,1}, \ldots, \hat{x}_{n+h,d})^\top.$$

The Durbin–Levinson and innovations algorithms can be readily applied here, given the vectors $\hat{\boldsymbol{x}}_1, \ldots, \hat{\boldsymbol{x}}_n$.

Step 3

The multivariate predictions are retransformed into functional trajectories. This retransformation is achieved by defining the truncated Karhunen–Loève representation

$$\hat{X}_{n+h} = \hat{\mu} + \hat{x}_{n+h,1}\hat{v}_1 + \cdots + \hat{x}_{n+h,d}\hat{v}_d.$$

Based on the predicted fPC scores $\hat{x}_{n+h,1}, \ldots, \hat{x}_{n+h,d}$ and the estimated eigenfunctions $\hat{v}_1, \ldots, \hat{v}_d$, the resulting $\hat{X}_{n+h}$ is then used as the $h$-step ahead functional prediction of $X_{n+h}$.
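To make the three steps concrete, the following Python sketch computes the prediction for curves observed on a common grid. It is a minimal reconstruction under the assumption that the score dynamics are modeled with a VAR; the function name `predict_curve` and the defaults for `d`, `p` and `h` are illustrative choices, not part of the original method description.

```python
# Sketch of the three-step prediction of Aue et al. (2015), assuming the curves
# are stored as an (n x m) array of values on a common grid.
import numpy as np
from statsmodels.tsa.api import VAR

def predict_curve(curves, d=3, p=1, h=1):
    n, m = curves.shape
    mu = curves.mean(axis=0)                   # estimated mean curve
    centered = curves - mu
    cov = centered.T @ centered / n            # discretized covariance kernel
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, ::-1][:, :d]                # first d (discretized) eigenfunctions
    scores = centered @ v                      # Step 1: empirical fPC score vectors
    var_fit = VAR(scores).fit(p)               # Step 2: VAR(p) model for the scores
    score_hat = var_fit.forecast(scores[-var_fit.k_ar:], steps=h)[-1]
    return mu + v @ score_hat                  # Step 3: truncated Karhunen-Loeve
```

Here the inner products are taken in the discretized (vector) sense, so the retransformation in Step 3 is simply the linear combination of the estimated eigenfunctions with the forecast scores.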

2.3 Fully Functional Regression Model

In a fully functional regression model, both predictors and responses are functions. Here we use a multivariate technique after projection to estimate the regression model. Suppose we have random predictor functions $X_1, \ldots, X_n$ and independent response functions $Y_1, \ldots, Y_n$. Denote their mean functions by $\mu_X$ and $\mu_Y$, and their covariance operators by $C_X$ and $C_Y$. The Karhunen–Loève expansions for the trajectories $X_i$ and $Y_i$ are

$$X_i = \mu_X + \sum_{j=1}^{\infty} \xi_{ij}\,\phi_j, \qquad Y_i = \mu_Y + \sum_{l=1}^{\infty} \eta_{il}\,\psi_l,$$

where the $\xi_{ij}$'s and $\phi_j$'s (the $\eta_{il}$'s and $\psi_l$'s) are the fPC scores and eigenfunctions of $X_i$ ($Y_i$). The fully functional linear regression model with response function $Y$ and predictor function $X$ can be written as

$$Y(t) = \mu_Y(t) + \int_0^1 \beta(s,t)\,\{X(s) - \mu_X(s)\}\,ds + \epsilon(t),$$

where the $\epsilon$'s are independent error functions, and the bivariate regression kernel $\beta(s,t)$ is assumed to be continuous and square integrable; as a consequence, $\iint \beta^2(s,t)\,ds\,dt < \infty$.

The kernel function $\beta(s,t)$ indicates which parts of the predictor trajectory $X$ make a positive or a negative contribution to the response function $Y$. Under the given assumptions, $\beta$ has the basis representation

$$\beta(s,t) = \sum_{j=1}^{\infty}\sum_{l=1}^{\infty} b_{jl}\,\phi_j(s)\,\psi_l(t).$$

For simplicity, in the following we assume that the mean functions of the $X_i$'s and $Y_i$'s are both zero. Replacing $X$ and $Y$ with their Karhunen–Loève representations, we have

$$\sum_{l=1}^{\infty}\eta_l\,\psi_l(t) = \sum_{l=1}^{\infty}\Big(\sum_{j=1}^{\infty} b_{jl}\,\xi_j\Big)\psi_l(t) + \epsilon(t).$$

For arbitrary $l$, taking the inner product with $\psi_l$ on both sides, we have

$$\eta_l = \sum_{j=1}^{\infty} b_{jl}\,\xi_j + u_l,$$

where $u_l = \langle \epsilon, \psi_l\rangle$. In practice, we only adopt the first $d_1$ fPCs of the predictors, so we consider the following equation:

$$\eta_l = \sum_{j=1}^{d_1} b_{jl}\,\xi_j + u_l^\ast, \qquad l = 1, \ldots, d_2, \tag{2-2}$$

where $u_l^\ast = u_l + \sum_{j > d_1} b_{jl}\,\xi_j$. Equation (2-2) resembles a multivariate regression model. Therefore, the estimation of the $b_{jl}$'s can be obtained by fitting a regression model of the $d_2$-dimensional eigenscore vectors of the responses against the $d_1$-dimensional eigenscore vectors of the predictors, as presented in (2-2).

So we can first estimate the eigenscores $\xi_{ij}$ and $\eta_{il}$, and then estimate the $b_{jl}$'s by fitting a multivariate linear regression model. From a prediction perspective, we can first predict the eigenscores of a new response $Y$, and obtain the predicted curve by the truncated Karhunen–Loève expansion $\hat{Y} = \sum_{l=1}^{d_2}\hat{\eta}_l\,\hat{\psi}_l$.
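The following Python sketch illustrates this projection-based estimation and prediction, assuming predictors and responses are stored as arrays of values on common grids; the helper names `fpca` and `fit_ffr` and the default dimensions are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of fully functional regression via projection onto fPCs.
import numpy as np

def fpca(curves, d):
    """Mean curve, first d discretized eigenfunctions (columns), and fPC scores."""
    mu = curves.mean(axis=0)
    centered = curves - mu
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:d].T                       # eigenfunctions of the sample covariance
    return mu, basis, centered @ basis     # scores = inner products (vector sense)

def fit_ffr(X_curves, Y_curves, d1=3, d2=3):
    """Fit an eigenscore regression as in (2-2) and return a prediction function."""
    mu_x, phi, xi = fpca(X_curves, d1)     # predictor scores xi: (n x d1)
    mu_y, psi, eta = fpca(Y_curves, d2)    # response scores eta: (n x d2)
    B, *_ = np.linalg.lstsq(xi, eta, rcond=None)   # least squares estimate of b_jl
    def predict(x_new):
        scores = (x_new - mu_x) @ phi      # project a new predictor curve
        return mu_y + (scores @ B) @ psi.T # truncated Karhunen-Loeve reconstruction
    return predict
```

A new response curve is then predicted by projecting the new predictor curve onto the estimated predictor eigenfunctions and mapping the resulting scores through the fitted coefficient matrix.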

2.3.1 Intra-day prediction with functional regression

Without loss of generality, let $Z$ denote a random function in $L^2([0,1])$ with mean zero. In a regression setting for intra-day prediction, the sub-curve $Z^{(1)} = (Z(t) \colon t \in [0,\tau])$ serves as the predictor function, and the sub-curve $Z^{(2)} = (Z(t) \colon t \in (\tau,1])$ serves as the response function. The Karhunen–Loève expansions of the two functional variables are

$$Z^{(1)} = \sum_{j=1}^{\infty} \xi_j^{(1)}\phi_j^{(1)}, \qquad Z^{(2)} = \sum_{l=1}^{\infty} \xi_l^{(2)}\phi_l^{(2)},$$

where the scores $\xi_j^{(1)}, \xi_l^{(2)}$ and eigenfunctions $\phi_j^{(1)}, \phi_l^{(2)}$ are defined analogously to those on the entire domain $[0,1]$, but correspond to the sub-domains $[0,\tau]$ and $(\tau,1]$.

We consider a fully functional linear regression model

$$Z^{(2)}(t) = \int_0^{\tau} \beta_\tau(s,t)\,Z^{(1)}(s)\,ds + \epsilon(t), \qquad t \in (\tau,1].$$

Here, given a fixed value of $\tau$, the bivariate regression function $\beta_\tau(s,t)$ is assumed to be continuous and square integrable; consequently, $\int_\tau^1\!\int_0^\tau \beta_\tau^2(s,t)\,ds\,dt < \infty$. By the discussion in Section 2.3, the functional regression model can be expressed as

$$\xi_l^{(2)} = \sum_{j=1}^{\infty} b_{jl}(\tau)\,\xi_j^{(1)} + u_l, \qquad l = 1, 2, \ldots,$$

where the $b_{jl}(\tau)$'s are the regression parameters to be estimated. Under the continuity assumptions, it follows that $b_{jl}(\tau)$ is also continuous in $\tau$ for all $j$ and $l$. In the following section, we introduce a novel criterion that allows us to jointly select the numbers of principal components for predictors and responses.

2.3.2 Dimension selection for fully functional regression model

Typically, we project the functional objects into a finite dimensional space spanned by the first few principal components. The number of principal components is usually selected such that the proportion of variance explained exceeds a certain threshold, such as 90%. However, our purpose is prediction, so it is not always appropriate to select the principal components that explain a large portion of the variance. We therefore consider a new criterion for selecting the dimensions of the eigenfunction spaces of predictors and responses. Motivated by Aue et al. (2015), we propose to choose the dimensions by minimizing the mean square prediction error.

Without loss of generality, assume that the predictors $X_i$ and responses $Y_i$ are elements of $L^2([0,1])$ with mean function zero and covariance operators $C_X$ and $C_Y$, respectively. Suppose the dimension of the eigenfunction space of the predictors is $d_1$, that of the responses is $d_2$, and $\hat{Y}$ is the prediction of $Y$ by the regression model. Then, by the orthonormality of the eigenfunctions, the mean square prediction error can be decomposed as

$$E\|Y - \hat{Y}\|^2 = E\|\boldsymbol{\eta} - \hat{\boldsymbol{\eta}}\|_E^2 + \sum_{l > d_2}\lambda_l^Y,$$

where $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_{d_2})^\top$ is the truncated eigenscore vector of the curve to be predicted, $\hat{\boldsymbol{\eta}}$ is the prediction of $\boldsymbol{\eta}$, $\lambda_l^Y$ is the $l$th eigenvalue of $C_Y$, and $\|\cdot\|_E$ denotes the Euclidean norm.

Let $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_{d_1})^\top$ be the truncated eigenscore vector of the predictor. Then, by the discussion above, there exists a $d_1 \times d_2$ matrix $B = (b_{jl})$ such that $\boldsymbol{\eta} = B^\top\boldsymbol{\xi} + \boldsymbol{u}$, where $\boldsymbol{u} = (u_1^\ast, \ldots, u_{d_2}^\ast)^\top$ with $u_l^\ast = \langle\epsilon, \psi_l\rangle + \sum_{j > d_1} b_{jl}\xi_j$, and $\psi_l$ is the $l$th eigenfunction of $C_Y$. We assume the covariance matrix of $\boldsymbol{u}$ to be $\Sigma_u$.

Therefore,

$$E\|\boldsymbol{\eta} - \hat{\boldsymbol{\eta}}\|_E^2 = E\|(B - \hat{B})^\top\boldsymbol{\xi} + \boldsymbol{u}\|_E^2.$$

Let $H = (\boldsymbol{\eta}_1, \ldots, \boldsymbol{\eta}_n)^\top$, $\Xi = (\boldsymbol{\xi}_1, \ldots, \boldsymbol{\xi}_n)^\top$, $U = (\boldsymbol{u}_1, \ldots, \boldsymbol{u}_n)^\top$, and let $\hat{B}$ be the least squares estimator of $B$. Then we have $H = \Xi B + U$, or equivalently,

$$\operatorname{vec}(H) = (I_{d_2} \otimes \Xi)\operatorname{vec}(B) + \operatorname{vec}(U),$$

where $\operatorname{vec}(\cdot)$ denotes column-wise vectorization and $\otimes$ the Kronecker product. Thus, the multivariate GLS estimator of $\operatorname{vec}(B)$ is given by minimizing $\{\operatorname{vec}(H) - (I_{d_2}\otimes\Xi)\operatorname{vec}(B)\}^\top\Omega^{-1}\{\operatorname{vec}(H) - (I_{d_2}\otimes\Xi)\operatorname{vec}(B)\}$, where $\Omega = \Sigma_u \otimes I_n$ is the covariance matrix of $\operatorname{vec}(U)$. Therefore, the GLS estimator of $B$ is

$$\hat{B} = (\Xi^\top\Xi)^{-1}\Xi^\top H,$$

which coincides with the equation-by-equation least squares estimator, and we also have

$$\hat{B} - B = (\Xi^\top\Xi)^{-1}\Xi^\top U.$$

Now we want to study the asymptotic properties of $\hat{B}$. Following the above equation, we have

$$\sqrt{n}\,(\hat{B} - B) = \Big(\tfrac{1}{n}\Xi^\top\Xi\Big)^{-1}\tfrac{1}{\sqrt{n}}\,\Xi^\top U.$$

By the weak law of large numbers, we have $n^{-1}\Xi^\top\Xi \xrightarrow{P} \Sigma_\xi$, where $\Sigma_\xi$ is the covariance matrix of the $\boldsymbol{\xi}_i$'s. By the central limit theorem,

$$\tfrac{1}{\sqrt{n}}\operatorname{vec}(\Xi^\top U) \xrightarrow{d} N(0,\ \Sigma_u \otimes \Sigma_\xi).$$

Then by Slutsky's theorem, $\sqrt{n}\operatorname{vec}(\hat{B} - B) \xrightarrow{d} N(0,\ \Sigma_u \otimes \Sigma_\xi^{-1})$, and consequently

$$E\|\boldsymbol{\eta} - \hat{\boldsymbol{\eta}}\|_E^2 = E\|(\hat{B} - B)^\top\boldsymbol{\xi}\|_E^2 + \operatorname{tr}(\Sigma_u) + o(n^{-1}). \tag{2-3}$$

As for the first term in equation (2-3), it can be assumed that $\hat{B}$ and $\boldsymbol{\xi}$ are independent, since $\hat{B}$ is based on the whole sample, whose size tends to infinity, so the dependence between $\hat{B}$ and the single score vector $\boldsymbol{\xi}$ is negligible. Then, by this independence and the asymptotic normality of $\hat{B}$,

$$E\|(\hat{B} - B)^\top\boldsymbol{\xi}\|_E^2 = E\big[\operatorname{tr}\{(\hat{B} - B)^\top\Sigma_\xi(\hat{B} - B)\}\big] \approx \frac{1}{n}\operatorname{tr}\big\{(I_{d_2}\otimes\Sigma_\xi)(\Sigma_u\otimes\Sigma_\xi^{-1})\big\} = \frac{d_1}{n}\operatorname{tr}(\Sigma_u),$$

where $\approx$ denotes equality up to terms of smaller order. Therefore, we have

$$E\|Y - \hat{Y}\|^2 \approx \frac{n + d_1}{n}\operatorname{tr}(\Sigma_u) + \sum_{l > d_2}\lambda_l^Y.$$

Replacing $\operatorname{tr}(\Sigma_u)$ with its estimator $\operatorname{tr}(\hat{\Sigma}_u)$, and $\lambda_l^Y$ with $\hat{\lambda}_l^Y$, we have the fFPE criterion for the fully functional regression model shown as follows:

$$\mathrm{fFPE}(d_1, d_2) = \frac{n + d_1}{n}\operatorname{tr}(\hat{\Sigma}_u) + \sum_{l > d_2}\hat{\lambda}_l^Y.$$

It is then natural to choose $d_1$ and $d_2$ by minimizing the above objective function. The following theorem shows the consistency of the criterion.

Theorem 1. Suppose $(X_i)$ and $(Y_i)$ are two sequences of $L^p$-$m$-approximable (Hörmann and Kokoszka (2010)) random functions, satisfying suitable moment conditions for some $p$, that serve as predictors and responses in the fully functional regression model

$$Y_i(t) = \int_0^1 \beta(s,t)\,X_i(s)\,ds + \epsilon_i(t).$$

Let $\hat{Y}$ denote the prediction of $Y$ based on the dimensions $d_1$ and $d_2$, and let $\hat{Y}'$ denote the prediction of $Y$ based on alternative dimensions $d_1'$ and $d_2'$. Then, if $E\|Y - \hat{Y}\|^2 < E\|Y - \hat{Y}'\|^2$, we have, for arbitrary such $(d_1', d_2')$, that $\mathrm{fFPE}(d_1, d_2) < \mathrm{fFPE}(d_1', d_2')$ with probability tending to one as $n \to \infty$.

3 Methodology

Recall the following decomposition framework for smooth trajectories,

$$X_k(t) = S_k(t) + \varepsilon_k(t),$$

where $S_k$ is the signal correlated with the previous curves, and $\varepsilon_k$ is the innovation function independent of the previous curves. Further, if the observed curves are contaminated by random noise, we can decompose the functional time series into three parts,

$$X_k(t) = S_k(t) + \varepsilon_k(t) + e_k(t),$$

where $e_k$ represents the random error. In practice, the observations are available only at pre-specified discrete grids, so we work with discretized versions of the curves rather than with $X_k(t)$ over the whole continuum. We propose a stage-wise procedure, where each stage corresponds to predicting one component, and combine the stages to obtain the final prediction.

3.1 Smooth case

For any function $X \in L^2([0,1])$, the trajectory over $[0,\tau]$ is denoted by $X^{(1)}$, and the trajectory over $(\tau,1]$ is denoted by $X^{(2)}$. Suppose we have observed $X_1, \ldots, X_n$ and $X_{n+1}^{(1)}$. The updated prediction of the curve $X_{n+1}$ over $(\tau,1]$ is given by

$$\tilde{X}_{n+1}^{(2)} = \hat{X}_{n+1}^{(2)} + \hat{\varepsilon}_{n+1}^{(2)},$$

where $\hat{X}_{n+1}^{(2)}$ is the "next-interval" prediction of $X_{n+1}^{(2)}$ and $\hat{\varepsilon}_{n+1}^{(2)}$ is the intra-day prediction of the $(n+1)$th innovation function over the sub-domain $(\tau,1]$.

To predict $\varepsilon_{n+1}^{(2)}$, we consider a fully functional regression model, where the $\varepsilon_k^{(1)}$'s serve as the predictors, and the $\varepsilon_k^{(2)}$'s serve as the responses,

$$\varepsilon_k^{(2)}(t) = \int_0^{\tau}\beta_\tau(s,t)\,\varepsilon_k^{(1)}(s)\,ds + \epsilon_k(t), \qquad t \in (\tau,1].$$

By the Karhunen–Loève expansion,

$$\varepsilon_k^{(1)} = \sum_{j=1}^{\infty}\xi_{kj}^{(1)}\phi_j^{(1)}, \qquad \varepsilon_k^{(2)} = \sum_{l=1}^{\infty}\xi_{kl}^{(2)}\phi_l^{(2)}.$$

The innovation functions are unknown, so we apply the functional regression model to the first-step residuals $\hat{\varepsilon}_k = X_k - \hat{X}_k$, where $\hat{X}_k$ is the full-curve prediction. Replacing the unknown terms with estimated values, and adopting the first $d_1$ and $d_2$ PCs for predictors and responses respectively, we have

$$\hat{\varepsilon}_{n+1}^{(2)} = \sum_{l=1}^{d_2}\Big(\sum_{j=1}^{d_1}\hat{b}_{jl}(\tau)\,\hat{\xi}_{n+1,j}^{(1)}\Big)\hat{\phi}_l^{(2)},$$

where $\hat{\xi}_{n+1,j}^{(1)} = \langle\hat{\varepsilon}_{n+1}^{(1)}, \hat{\phi}_j^{(1)}\rangle$. Here $\hat{\varepsilon}_{n+1}^{(2)}$ is the prediction of $\varepsilon_{n+1}^{(2)}$ given $\hat{\varepsilon}_{n+1}^{(1)}$. Then the final prediction of $X_{n+1}^{(2)}$ is

$$\tilde{X}_{n+1}^{(2)} = \hat{X}_{n+1}^{(2)} + \hat{\varepsilon}_{n+1}^{(2)}.$$

The updated prediction can be regarded as the original prediction $\hat{X}_{n+1}^{(2)}$ adjusted by the intra-day prediction of the second block of the residual function $\hat{\varepsilon}_{n+1}$. The prediction steps can be summarized by the following algorithm.

Step 1

Fix the FAR order $p$, the dimension $d$ and the prediction lag $h$, and apply functional time series prediction (such as Aue et al. (2015)) to obtain the $h$-step ahead prediction $\hat{X}_{n+h}$ for $X_{n+h}$.

Step 2

Obtain the prediction residual functions $\hat{\varepsilon}_k = X_k - \hat{X}_k$ for the curves in a training group, where a sliding window of fixed size is used to produce the prediction of each curve in the training group.

Step 3

Separate the prediction residual functions in Step 2 at the "current time" $\tau$. Treat the first parts $\hat{\varepsilon}_k^{(1)}$ as the predictors, and the second parts $\hat{\varepsilon}_k^{(2)}$ as the responses. Fix $d_1$ and $d_2$, apply the intra-day functional regression model of the second segments on the first segments, and use the fitted model to obtain the prediction $\hat{\varepsilon}_{n+1}^{(2)}$ of the second block of the $(n+1)$th residual function.

Step 4

Add the $(\tau,1]$ segment of the complete predicted curve, $\hat{X}_{n+1}^{(2)}$, and the predicted block of the residual function, $\hat{\varepsilon}_{n+1}^{(2)}$, to get the final prediction $\tilde{X}_{n+1}^{(2)}$.
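Putting Steps 1–4 together, a minimal sketch of the smooth-case algorithm could look as follows, reusing the illustrative helpers `predict_curve` (Section 2.2 sketch) and `fit_ffr` (Section 2.3 sketch); the argument names, the grid index `cut` marking the current time, and the window size are assumptions, not the paper's implementation.

```python
# Sketch of the four-step partial-observation prediction (smooth case).
import numpy as np

def partial_prediction(curves, new_first_part, cut, window=200, d=3, d1=3, d2=3):
    n, m = curves.shape
    # Steps 1 and 2: one-step-ahead predictions over a sliding window, residual curves
    residuals = []
    for k in range(window, n):
        pred_k = predict_curve(curves[k - window:k], d=d, p=1, h=1)
        residuals.append(curves[k] - pred_k)
    residuals = np.asarray(residuals)
    # Step 3: intra-day regression of second segments on first segments
    regress = fit_ffr(residuals[:, :cut], residuals[:, cut:], d1=d1, d2=d2)
    pred_full = predict_curve(curves[n - window:], d=d, p=1, h=1)  # next-interval prediction
    resid_first = new_first_part - pred_full[:cut]                 # residual of the observed part
    # Step 4: combine the two pieces on the unobserved sub-domain
    return pred_full[cut:] + regress(resid_first)
```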

3.2 Noisy case

In this section, we consider functional data as noisy sampled points from a collection of consecutive trajectories. In practice, the functional time series is observed at a discrete time grid, so the observed curves can be rough. The reasons may be measurement error or sparsely-spaced observation time grids. As discussed by Yao et al. (2005), the rough error term leads to biased fPC scores, so this problem needs to be addressed. In practice, we can use smooth basis functions to smooth the raw trajectories. However, the random error, which is not smooth, may still contain short-term time series correlation, so one more step is needed to extract the information left in the pre-smoothing residuals. Because the time dependence in the random error usually decays very fast as the lag increases, we can only expect reasonable predictions for the near future.

As has been discussed, we decompose any functional time series into three parts,

$$X_k(t) = S_k(t) + \varepsilon_k(t) + e_k(t),$$

where $S_k$ is the smooth signal obtained from the smooth part of the past observations, $\varepsilon_k$ is the independent smooth innovation function, and $e_k$ is the random error of the functional time series.

Let $S_k + \varepsilon_k$ represent the smooth part of the functional time series, which can be predicted by the functional methodology, while $e_k$ is the rough part. If there is time series correlation in this rough process, it can be predicted by an ARMA model. Here we apply the ARMA model to the pre-smoothing residuals, since the random error itself is unknown. We then add two more steps to the previous algorithm for noisy trajectories:

Step 5

Apply an ARMA model to the pre-smoothing residuals to predict the future residuals on the unobserved sub-domain.

Step 6

Combine the predictions of the smooth part and of the pre-smoothing residuals to obtain the final prediction.

The final prediction for $X_{n+1}^{(2)}$ is $\tilde{X}_{n+1}^{(2)} + \hat{e}_{n+1}^{(2)}$, where $\tilde{X}_{n+1}^{(2)} = \hat{X}_{n+1}^{(2)} + \hat{\varepsilon}_{n+1}^{(2)}$ is the prediction of the smooth part and $\hat{e}_{n+1}^{(2)}$ is the ARMA prediction of the random error on $(\tau,1]$. This adjustment is necessary if the raw functional time series curves are significantly rough and the time series structure in the $e_k$'s is strong. The prediction of the smooth part can also be viewed as a de-trending step.
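Step 5 can be sketched as follows, under the assumption that the pre-smoothing residuals of the past curves are concatenated into a single scalar series and that a low-order ARMA model (here an ARIMA model with no differencing) is adequate; the function and argument names are illustrative.

```python
# Sketch of Step 5: ARMA prediction of the rough (pre-smoothing residual) part.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def predict_rough_part(resid_series, n_ahead, order=(1, 0, 1)):
    fit = ARIMA(resid_series, order=order).fit()
    return fit.forecast(steps=n_ahead)   # forecasts for the unobserved grid points
```

The forecast values are then added pointwise to the smooth-part prediction on the unobserved grid, as described above.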

3.3 Selection of $(p, d, d_1, d_2)$

The selection of the unknown parameters can be based on the fFPE criterion. The order $p$ and the dimension $d$ of the projected eigenspace at the first stage influence the covariance function of the residual functions, which in turn influences the prediction at the second stage. Therefore, $d_1$ and $d_2$ can be regarded as functions of $p$ and $d$, so we can jointly select $(p, d, d_1, d_2)$ by minimizing the resulting fFPE objective function.

With the use of this functional FPE criterion, the proposed methodology is fully data-driven and no additional tuning parameter adjustment is needed.
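In practice the joint selection amounts to a small grid search. The sketch below is generic: it accepts any criterion function, for instance the fFPE objective described above or, as an illustrative surrogate, a held-out prediction MSE; the function name, signature and the parameter grid shown are arbitrary assumptions.

```python
# Sketch of the joint tuning-parameter search over (p, d, d1, d2).
import itertools
import numpy as np

def select_parameters(curves, cut, criterion, grid=None):
    """Return the (p, d, d1, d2) combination minimizing `criterion`."""
    if grid is None:
        grid = itertools.product([1, 2], [2, 3, 4], [1, 2, 3], [1, 2, 3])
    best, best_val = None, np.inf
    for p, d, d1, d2 in grid:
        val = criterion(curves, cut, p, d, d1, d2)   # e.g. fFPE value or held-out MSE
        if val < best_val:
            best, best_val = (p, d, d1, d2), val
    return best
```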

4 Simulation

4.1 General setting

To analyze the finite sample properties of the new prediction method, a comparative simulation study is conducted. The proposed method is tested on simulated FAR models. In each simulated test, 400 curves were generated. Beginning from the first curve, 200 consecutive trajectories were used as the training group to obtain the residual function of the one-step ahead prediction. The training group was then shifted in a sliding-window fashion, keeping the same number of functions, to obtain the prediction residual function for the next curve. This yielded 200 estimated prediction residual functions, of which the first 180 were used to fit an intra-day functional regression model, which was then used to predict the unobserved blocks of the remaining 20 curves. The corresponding mean square prediction error was computed, as well as the fFPE value for comparison. This procedure was repeated 100 times for each simulation setting.

In the simulation, we worked in a $D$-dimensional functional space spanned by $D$ Fourier basis functions $\nu_1, \ldots, \nu_D$ on the unit interval $[0,1]$. An arbitrary element $x$ of this space has the representation $x = \sum_{j=1}^{D} c_j\nu_j$ with coefficients $c_1, \ldots, c_D$. Then, for any linear operator $\Phi$ on this space, we have $\Phi(x) = \sum_{j=1}^{D}\sum_{j'=1}^{D} c_j\,\phi_{jj'}\,\nu_{j'}$, where $(\phi_{jj'})$ is a $D \times D$ matrix. The linear operators used to generate the FAR model can therefore be represented in matrix form. The innovation function is generated by $\varepsilon_k = \sum_{j=1}^{D} z_{kj}\nu_j$, where the $z_{kj}$'s are independent normal random variables with mean zero and standard deviations $\sigma_1, \ldots, \sigma_D$. Two different sets of standard deviations were used here.
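For reference, data of this type can be generated as in the following Python sketch, which simulates an FAR(2) process in a finite Fourier basis; the dimension, operator matrices and innovation scales below are illustrative choices rather than the paper's exact simulation design.

```python
# Sketch of generating an FAR(2) functional time series in a finite Fourier basis.
import numpy as np

def simulate_far2(n=400, D=5, burn_in=50, seed=0):
    rng = np.random.default_rng(seed)
    # Operator matrices chosen small enough for the process to be stationary
    Phi1 = 0.4 * np.eye(D)
    Phi2 = 0.2 * np.eye(D)
    sigma = 1.0 / np.arange(1, D + 1)           # decaying innovation scales
    coefs = np.zeros((n + burn_in, D))
    for k in range(2, n + burn_in):
        eps = rng.normal(0.0, sigma)            # innovation coefficients
        coefs[k] = Phi1 @ coefs[k - 1] + Phi2 @ coefs[k - 2] + eps
    grid = np.linspace(0.0, 1.0, 101)
    # Fourier basis: constant, then sine/cosine pairs
    basis = [np.ones_like(grid)]
    for j in range(1, (D + 1) // 2 + 1):
        basis.append(np.sqrt(2) * np.sin(2 * np.pi * j * grid))
        basis.append(np.sqrt(2) * np.cos(2 * np.pi * j * grid))
    basis = np.array(basis[:D])                 # (D x grid) basis evaluations
    return coefs[burn_in:] @ basis              # (n x grid) curve values
```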

4.2 Prediction comparison for smooth curves

In this section, we show the comparison of our new method with Aue et al. (2015)’s method and intra-day functional regression method on FAR(2) processes