Generalised learning of time-series: Ornstein-Uhlenbeck processes

by Mehmet Süzen, et al.

In machine learning, statistics, econometrics, and statistical physics, k-fold cross-validation (CV) is a standard approach for quantifying the generalization performance of a statistical model. Practitioners avoid applying this approach directly to time-series models because of the serial correlations intrinsic to ordered data, which raise issues such as the absurdity of using future data to predict the past, as well as non-stationarity. In this work, we propose a technique called reconstructive cross-validation (rCV), a meta-algorithm that avoids these issues and enables generalized learning on time-series. In rCV, the data points in the test fold, randomly selected from the time series, are first removed. A secondary time-series model or technique is then used to reconstruct the removed points, i.e., by imputation or smoothing. Thereafter, the primary model is built on the new dataset produced by the secondary model or technique. The performance of the primary model is evaluated simultaneously by computing the deviations from the originally removed points and from out-of-sample (OOS) data. This yields reconstruction and prediction errors, respectively. Under this procedure, serial correlations and the order of the data are retained, and k-fold cross-validation is obtained generically. If the reconstruction uses a technique that retains the existing data points exactly, such as Gaussian process regression, the reconstruction itself introduces no information loss in the non-reconstructed portion of the original data points. We have applied rCV to estimate the generalization performance of a model built on a simulated Ornstein-Uhlenbeck process, and we have shown an approach to build time-series learning curves utilizing rCV.








1 Introduction

Temporal data is generated in natural and technological systems, and its analysis is very common (Hamilton (1994); Richards et al. (2011); Roberts et al. (2013)). Many of these analyses manifest as building time-series models via learning, and evaluating the performance of such models is not a trivial task, especially in sparse settings (Süzen and Ajraou (2016)).

Learning curves are utilised to evaluate the relative performance of machine learning algorithms (Perlich et al. (2003)), and cross-validation (Stone (1974); Efron and Gong (1983)) is used to assess the generalization ability of a model and to select among models (Kohavi (1995)). However, cross-validation is not directly practiced for temporal data: it has only been available for uncorrelated errors and stationary time-series, either by shuffling chunks of the series (Politis and Romano (1994)) or via naive k-fold CV for uncorrelated time-series (Bergmeir et al. (2018)). On the other hand, learning curves for time-series models are not common, and no specialized technique has been addressed for them earlier. Learning curves are usually built by reducing the dataset sample size through random removal of points; this approach cannot be used for time-series learning curves.

We propose a technique called reconstructive cross-validation (rCV), which combines standard cross-validation (CV) and out-of-sample (OOS) performance evaluation by introducing a reconstruction of each CV fold, i.e., imputation or smoothing, allowing k-fold OOS evaluation. rCV requires no assumption on the error structure and does not use future data to predict the past in the main cross-validation procedure.

1.1 Single model versus model selection

Earlier literature (Stone (1974)) signifies building a single model in cross-validation. It implies that the learning algorithm builds a single model, with a single parametrisation such as the weights of a neural network, by rotating the data splits, i.e., the so-called k-folds, through the core optimisation. Building a different model, or obtaining a different parametrisation of the same model, in each of the k folds (Kohavi (1995)) was introduced later to exercise model selection. The latter practice is now used in mainstream machine learning libraries, where a k-fold cross-validation produces k different parametrisations of the same model. Similarly, in rCV we build k parametrisations of the model in a supervised learning setting. Extending this to the earlier single-model cross-validation should be straightforward, although we may need to change how the underlying solvers interact with data during the optimisation phases and modify rCV for a single model build.

2 Proposed techniques

Our contribution has two main implications for generalised learning of time-series: on the one hand, a generic procedure for cross-validating time-series models, and on the other hand, a technique for building learning curves for time-series without the need to reduce the sample size of the time-series data.

We concentrate on one-dimensional time-series for this basic investigation; extensions to higher dimensions should be self-evident. Consider a series of numbers $y$, a vector of length $N$, $y = (y_1, y_2, \ldots, y_N)$ with $y_i \in \mathbb{R}$, together with ordered time points $t = (t_1, t_2, \ldots, t_N)$, $t_1 < t_2 < \cdots < t_N$, so that the ordered dataset reads $D = \{(t_i, y_i)\}_{i=1}^{N}$. This tuple of ordered numbers is considered a time-series, is often interpreted as the time evolution of $y$, and is usually expressed as $y(t)$.

Out-of-sample (OOS) data usually appears as a continuation of the past time-series. Hence, a continuation of $y$ is defined as the out-of-sample set $y'$, a vector of length $M$, $y' = (y'_1, y'_2, \ldots, y'_M)$, with time points $t' = (t'_1, \ldots, t'_M)$ satisfying $t_N < t'_1 < \cdots < t'_M$, so that the ordered OOS dataset reads $D' = \{(t'_j, y'_j)\}_{j=1}^{M}$.
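This setup can be sketched in a few lines of code; the array names, toy signal, and sizes below are illustrative assumptions, not values from our experiments:

```python
import numpy as np

# Hypothetical sizes: N observed points, M out-of-sample points.
N, M, dt = 100, 20, 0.1

t = dt * np.arange(1, N + 1)               # ordered time points t_1 < ... < t_N
y = np.sin(t)                              # stand-in series y(t)

# OOS continuation of the past series: t_N < t'_1 < ... < t'_M
t_oos = t[-1] + dt * np.arange(1, M + 1)
y_oos = np.sin(t_oos)

# Ordered dataset D = {(t_i, y_i)}
D = list(zip(t, y))
```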

Construction of cross-validated performance measures and learning curves for time-series will appear as a meta-algorithm processing the time-series $y$ and $y'$.

2.1 Reconstructive cross-validation for time-series

The first step in rCV is to identify partitions of the time-series, as in conventional cross-validation (Efron and Gong (1983)). Consider $k$ sets partitioning $D$, denoted $F_1, F_2, \ldots, F_k$, each set consisting of randomly assigned pairs $(t_i, y_i)$, with partitions approximately equal in size, $|F_m| \approx N/k$.

A training-fold is defined as the union of all partitions except the removed partition $F_m$,

$$T_m = \bigcup_{l \neq m} F_l.$$

The missing data lie on the corresponding removed partition $F_m$.
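A minimal sketch of this missing-at-random partitioning (the function name and fold layout are our own, not from the notebook):

```python
import numpy as np

def make_folds(n, k, seed=0):
    """Assign each of n time indices at random to one of k partitions
    F_1, ..., F_k of approximately equal size; indices within each fold
    are kept sorted so the order of the series is retained."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n)
    return [np.sort(shuffled[m::k]) for m in range(k)]

folds = make_folds(100, 10)

# Training-fold T_0: the union of all partitions except F_0.
T_0 = np.sort(np.concatenate(folds[1:]))
```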

Due to the ordered nature of the series, the standard CV approach cannot be used across the folds, as it yields the absurd situation of predicting the past using future values. To overcome this, a reconstruction of the full training series, denoted $\hat{y}^{(m)}$, can be introduced. This can be thought of as imputation of data missing at random, or as smoothing in the Bayesian sense, using each training-fold via a secondary model $g$. The technique could be interpolation or a more advanced filtering approach such as Kalman filtering, resulting in

$$\hat{y}^{(m)} = g(T_m).$$

The secondary model could retain the given points on the training-fold exactly in this approach; $\hat{F}_m$ denotes the reconstructed portion.

The total error due to the reconstruction model is expressed as $e_r$; here, for example, we write it down as a mean absolute percent error (MAPE), although obviously different metrics can be used:

$$e_r = \frac{1}{k} \sum_{m=1}^{k} \mathrm{MAPE}(F_m, \hat{F}_m).$$

The primary model, $f$, is built on each $\hat{y}^{(m)}$ and applied to predict the out-of-sample set. This results in a set of predictions $\hat{y}'^{(m)}$, with error

$$e_p = \frac{1}{k} \sum_{m=1}^{k} \mathrm{MAPE}(y', \hat{y}'^{(m)}).$$

The total error in rCV is computed as follows:

$$e_{rCV} = e_r \cdot e_p.$$

The lower this number, the better the generalisation; however, both $e_r$ and $e_p$ should be judged separately to detect any anomalies. We have chosen the multiplicative combination of the reconstruction and prediction errors, so that $e_{rCV}$ represents a weighted error. More complex schemes to estimate the total error can be devised. Note that both $e_r$ and $e_p$ are test errors in the conventional sense, as both the reconstruction and prediction computations are performed with a Gaussian process whose parameters are fixed, i.e., corresponding to an Ornstein-Uhlenbeck process.
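The error bookkeeping can be sketched as follows; the helper names are ours, and we assume MAPE expressed as a fraction rather than a percentage:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percent error, expressed as a fraction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def rcv_error(e_reconstruction, e_prediction):
    """Multiplicative combination of the per-fold-averaged
    reconstruction and prediction errors: a weighted error."""
    return e_reconstruction * e_prediction

# Errors of the magnitude reported in Section 3.2 combine multiplicatively.
e_total = rcv_error(0.029, 0.468)
```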

2.2 Learning Curves for time-series

Time-series learning curves are not common, due to the fact that data sample sizes are limited. However, with the reconstruction approach provided above, one can build a learning curve based on the number of folds $k$,

$$L(k) = e(k),$$

where $e$ is the error measure, evaluated over a range of different $k$ values. The error term can be any of the errors defined above: $e_r$, $e_p$, or $e_{rCV}$. Note that, unlike other learning curves built by reducing the sample size (Perlich et al. (2003)), $L(k)$ is constructed while retaining the sample size of the original time-series at each point on the learning curve. The reason is that the number of points missing at random on the reconstructed folds, explained above, decreases with an increasing number of folds. The combined reconstruction-prediction error, as a performance measure, is affected by the changing number of folds; hence the learning curve, as in the basic definition of supervised learning (Mitchell (1997)).
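The fold sweep behind such a curve can be written generically; the callback protocol below is our assumption, with a toy stand-in for a real rCV run:

```python
def learning_curve(k_values, run_rcv):
    """Build a time-series learning curve L(k): run the full rCV
    procedure for each fold count k and record the chosen error
    measure (e_r, e_p, or e_rCV). The sample size of the series is
    unchanged; only the number of reconstructed points shrinks as
    k grows."""
    return [(k, run_rcv(k)) for k in k_values]

# Toy stand-in for a real rCV run: an error that decays with k.
curve = learning_curve([2, 5, 10, 20], lambda k: 1.0 / k)
```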

Figure 1: Simulated Ornstein-Uhlenbeck data corresponding to the $y$ and $t$ series in our formulation (left). 10-fold reconstruction absolute errors, i.e., the differences between the reference and imputed time-series (right).

3 Experimental Setup

We have demonstrated the utility of our technique using a specific kernel in a Gaussian process setting (Williams and Rasmussen (2006)). This corresponds to the Ornstein-Uhlenbeck process, which is used to describe Brownian motion in statistical physics (Gardiner (2009)). The learning task aims at predicting the OOS data $y'$ using the past series $y$.

Figure 2: Different learning curves in our meta-algorithm: based on the reconstruction error (left) and the prediction error (right). An increasing number of folds implies fewer points to reconstruct, i.e., a larger sample size in the traditional sense.

3.1 Ornstein-Uhlenbeck process

One can generate an Ornstein-Uhlenbeck process by drawing numbers from a multivariate Gaussian with a specific covariance structure,

$$y \sim \mathcal{N}(\mu, K).$$

Taking the mean $\mu$ as constant over all time points, the covariance $K$ is built via the kernel $K_{ij} = \exp(-d_{ij}/\ell)$, where $d_{ij} = |t_i - t_j|$ is the distance matrix constructed over the time points, which is a symmetric matrix.

We generated Ornstein-Uhlenbeck (OU) time-series on regularly spaced time points with different length scales and mean values, plus additional time points for the prediction task; see Figure 1.
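A minimal simulation sketch of this draw; the length scale, spacing, and seed below are illustrative placeholders, not the values used in our experiments:

```python
import numpy as np

def simulate_ou(t, length_scale=1.0, mean=0.0, seed=42):
    """Draw one OU path as a multivariate Gaussian with
    covariance K_ij = exp(-|t_i - t_j| / length_scale)."""
    rng = np.random.default_rng(seed)
    d = np.abs(t[:, None] - t[None, :])    # symmetric distance matrix
    K = np.exp(-d / length_scale)
    return rng.multivariate_normal(mean * np.ones(len(t)), K)

t = np.arange(0.0, 10.0, 0.1)              # regularly spaced time points
y = simulate_ou(t)
```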

3.2 Reconstructive cross-validation

We apply our meta-algorithm to construct both the primary and secondary models using Gaussian process predictions with unit regularisation. It is formulated as follows: given ordered pairs $(t_i, y_i)$ as the time-series, we aim at inferring, i.e., reconstructing, the missing values at time-points $t^{*}$. The missing values can be identified via the Bayesian interpretation of kernel regularisation,

$$\hat{y}^{*} = K_{*} (K + \lambda I)^{-1} y,$$

where $\lambda = 1$. The kernel matrices are built via the kernel $K_{ij} = \exp(-d_{ij}/\ell)$, where $d_{ij}$ is the distance matrix over the time-points of the missing-at-random folds and the remaining folds. The secondary model $g$ is used to reconstruct $\hat{y}^{(m)}$. The absolute errors for 10 folds are shown graphically in Figure 1.
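This posterior-mean reconstruction can be sketched as follows; the kernel length scale and the example data points are assumed placeholders:

```python
import numpy as np

def gp_reconstruct(t_obs, y_obs, t_missing, length_scale=1.0, reg=1.0):
    """Reconstruct missing values with the GP posterior mean under
    unit regularisation: y* = K(t*, t) (K(t, t) + reg * I)^{-1} y,
    using the exponential (OU) kernel."""
    kern = lambda a, b: np.exp(-np.abs(a[:, None] - b[None, :]) / length_scale)
    K = kern(t_obs, t_obs) + reg * np.eye(len(t_obs))
    K_star = kern(t_missing, t_obs)
    return K_star @ np.linalg.solve(K, y_obs)

# Impute two missing-at-random points from the surrounding observations.
t_obs = np.array([0.0, 1.0, 3.0, 4.0])
y_obs = np.array([0.0, 0.8, 0.1, -0.6])
y_hat = gp_reconstruct(t_obs, y_obs, np.array([1.5, 2.0]))
```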

A similar procedure is followed in predicting the OOS vector $y'$. The reconstruction error, prediction error, and rCV error were computed as 0.029, 0.468, and 0.013, respectively. Note that the rCV error is not a MAPE but a measure of generalisation. The high prediction error is attributed to the long time horizon we chose; in practice, a much shorter time horizon should be used for practical utility.

3.3 Learning curves

We produced time-series learning curves for the generated Ornstein-Uhlenbeck process via rCV with varying fold values, i.e., different partitions of the original time-series $y$; see Figure 2. For learning curves constructed with rCV, the sample size increases, in the conventional sense, with an increasing number of folds. This is attributed to the fact that a larger number of folds implies fewer time-points missing at random to reconstruct, corresponding to a larger sample, i.e., more experience. The reported learning curves correspond to test learning curves, as we use fixed kernel parameters to generate and predict the Ornstein-Uhlenbeck process.

4 Conclusion

We have presented a framework, demonstrated with a canonical process from physics, the Ornstein-Uhlenbeck process, that enables generalised learning on time-series without any restriction on stationarity and while retaining the serial correlations and the order of the original dataset. The approach entails applying cross-validation directly, in combination with an OOS estimate of performance, by reconstructing the missing-at-random fold instances via a secondary model. This approach, rCV, also allows one to generate a learning curve for time-series, as we have demonstrated.

The meta-algorithm we developed in this work can be used with any other learning algorithm. We chose Gaussian processes for both the reconstruction and prediction tasks in demonstrating the framework due to their minimal requirements for a naive implementation. Further implementation of the meta-algorithm in a generic setting is possible without embedding the learning algorithm into the rCV procedure.

Code Supplement

We have provided a Python notebook with a prototype implementation of rCV for reproducing the results presented (rCV_prototype.ipynb).


  • C. Bergmeir, R. J. Hyndman, and B. Koo (2018) A note on the validity of cross-validation for evaluating autoregressive time series prediction. Comput. Stat. Data Anal. 120 (C), pp. 70–83. External Links: ISSN 0167-9473 Cited by: §1.
  • B. Efron and G. Gong (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 37 (1), pp. 36–48. Cited by: §1, §2.1.
  • C. Gardiner (2009) Stochastic methods: a handbook for the natural and social sciences. 4th edition, Springer. Cited by: §3.
  • J. D. Hamilton (1994) Time series analysis. Vol. 2, Princeton university press Princeton. Cited by: §1.
  • R. Kohavi (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'95, pp. 1137–1143. Cited by: §1.1, §1.
  • T. M. Mitchell (1997) Machine learning. 1st edition, McGraw-Hill, Inc., New York, NY, USA. External Links: ISBN 0070428077, 9780070428072 Cited by: §2.2.
  • C. Perlich, F. Provost, and J. S. Simonoff (2003) Tree induction vs. logistic regression: a learning-curve analysis. J. Mach. Learn. Res. 4, pp. 211–255. External Links: ISSN 1532-4435 Cited by: §1, §2.2.
  • D. N. Politis and J. P. Romano (1994) The stationary bootstrap. Journal of the American Statistical Association 89 (428), pp. 1303–1313. Cited by: §1.
  • J. W. Richards, D. L. Starr, N. R. Butler, J. S. Bloom, J. M. Brewer, A. Crellin-Quick, J. Higgins, R. Kennedy, and M. Rischard (2011) On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal 733 (1), pp. 10. Cited by: §1.
  • S. Roberts, M. Osborne, M. Ebden, S. Reece, N. Gibson, and S. Aigrain (2013) Gaussian processes for time-series modelling. Phil. Trans. R. Soc. A 371 (1984), pp. 20110550. Cited by: §1.
  • M. Stone (1974) Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B (Methodological) 36 (2), pp. 111–133. Cited by: §1.1, §1.
  • M. Süzen and A. Ajraou (2016) Evaluating gaussian processes for sparse irregular spatio-temporal data. arXiv preprint arXiv:1611.02978. Cited by: §1.
  • C. K. Williams and C. E. Rasmussen (2006) Gaussian processes for machine learning. the MIT Press 2 (3), pp. 4. Cited by: §3.