1 Introduction
Temporal data are generated in both natural and technological systems, and their analysis is very common (Hamilton (1994), Richards et al. (2011), Roberts et al. (2013)). Much of this analysis manifests as building time-series models via learning, and evaluating their performance is not a trivial task, especially in sparse settings (Süzen and Ajraou (2016)).
Learning curves are utilised in evaluating the relative performance of machine learning algorithms (Perlich et al. (2003)), while cross-validation (Stone (1974), Efron and Gong (1983)) is used in assessing the generalisation ability of a model and in model selection (Kohavi (1995)). However, cross-validation is not directly practised for time series; techniques have been available only for uncorrelated errors and stationary series, either by shuffling chunks of the series (Politis and Romano (1994)) or via naive k-fold CV for series with uncorrelated errors (Bergmeir et al. (2018)). On the other hand, learning curves for time-series models are not common, and no specialised technique has been addressed for them earlier. Learning curves are usually built by reducing the dataset sample size, removing points at random; this approach cannot be used for time-series learning curves.
We propose a technique called reconstructive cross-validation (rCV), combining standard cross-validation (CV) and out-of-sample (OOS) evaluation of performance by introducing a reconstruction of the fold removed in CV, i.e. imputation or smoothing, allowing k-times OOS evaluation. rCV does not require any assumption on the error structure and does not use future data to predict the past in the main cross-validation procedure.
1.1 Single model versus model selection
Earlier literature (Stone (1974)) signifies building a single model in cross-validation: the learning algorithm builds a single model, a single parametrisation such as the weights of a neural network, with rotations of the data splits, i.e. the so-called k folds, inside the core optimisation. Building a different model, or obtaining a different parametrisation of the same model, in each of the k folds (Kohavi (1995)) was introduced later to exercise model selection. The latter practice is now standard in mainstream machine learning libraries: a k-fold cross-validation produces k different parametrisations of the same model. Similarly, in rCV we build k parametrisations of the model in a supervised learning setting. Extending this to the earlier single-model cross-validation should be straightforward; however, one may need to change how the underlying solvers interact with the data in the optimisation phase and modify rCV for a single model build.
2 Proposed techniques
Our contribution has two main implications for generalised learning of time series: on the one hand, a generic procedure for cross-validating time-series models, and on the other hand a technique for building learning curves for time series without the need to reduce the sample size of the data.
We concentrate on one-dimensional time series for this basic investigation; extensions to higher dimensions should be self-evident. Consider a series of numbers $x_1, x_2, \dots, x_n$, a vector of length $n$, observed at ordered time points $t_1 < t_2 < \dots < t_n$, where the ordered dataset reads $D = \{(t_i, x_i)\}_{i=1}^{n}$. This tuple of ordered numbers is considered a time series, as it is often interpreted as the time evolution of $x$ and is usually expressed as $x(t)$. Out-of-sample (OOS) data usually appears as a continuation of the past time series; hence the continuation of $D$ is defined as the out-of-sample set: values $x'_1, x'_2, \dots, x'_m$, a vector of length $m$, at time points $t'_1 < t'_2 < \dots < t'_m$ with $t'_1 > t_n$, where the ordered OOS dataset reads $D' = \{(t'_j, x'_j)\}_{j=1}^{m}$.
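As a concrete illustration, the split between the past series and its OOS continuation can be sketched in NumPy; the variable names and the sinusoidal stand-in signal are ours, not part of the formal definition:

```python
import numpy as np

# past series D = {(t_i, x_i)} and its out-of-sample continuation
# D' = {(t'_j, x'_j)}, with t'_1 strictly after t_n
t = np.linspace(0.0, 8.0, 80)        # ordered time points t_1 < ... < t_n
x = np.sin(t)                        # observed values x(t)
t_oos = np.linspace(8.1, 10.0, 20)   # OOS time points, all after t_n
x_oos = np.sin(t_oos)
```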
Construction of crossvalidated performance measures and learning curves for timeseries will appear as a metaalgorithm processing timeseries and .
2.1 Reconstructive cross-validation for time series
The first step in rCV is to identify partitions of the time series, here as in conventional cross-validation (Efron and Gong (1983)). Consider $k$ partitions $F_1, \dots, F_k$ of $D$, each having randomly assigned points, with the partitions approximately equal in size,

$D = \bigcup_{i=1}^{k} F_i, \qquad F_i \cap F_j = \emptyset \ \ (i \neq j).$  (1)
A training fold is defined as the union of all partitions except the removed partition $F_i$,

$T_i = D \setminus F_i = \bigcup_{j \neq i} F_j.$  (2)

The missing data then lie on the corresponding removed partition $F_i$.
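The fold assignment above can be sketched in a few lines; the function name `random_folds` and the use of NumPy are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def random_folds(n, k, seed=0):
    """Assign n time indices to k approximately equal, disjoint folds
    at random, as in Eq. (1)."""
    rng = np.random.default_rng(seed)
    return [np.sort(f) for f in np.array_split(rng.permutation(n), k)]

folds = random_folds(20, 4)
# a training fold: every partition except the first, removed one (Eq. (2))
train_fold_0 = np.sort(np.concatenate(folds[1:]))
```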
Due to the ordered nature of the series, the standard CV approach cannot be used across folds, as it yields the absurd situation of predicting the past using future values. To overcome this, a reconstruction of the full training series, denoted $\tilde{D}_i$, is introduced. This can be thought of as imputation of data missing at random, or as smoothing in the Bayesian sense, using each training fold via a secondary model $g_s$. The technique could be interpolation, or a more advanced filtering approach such as Kalman filtering, resulting in

$\tilde{D}_i = g_s(T_i).$  (3)

The secondary model may retain the given points on the training fold in this approach; $\tilde{F}_i$ is the reconstructed portion.
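A minimal sketch of such a secondary model, using plain linear interpolation as the reconstruction technique (the simplest option the text mentions; the function name is hypothetical):

```python
import numpy as np

def reconstruct_fold(t, x, missing_idx):
    """Impute the removed-fold values by linear interpolation over the
    remaining points; given points on the training fold are retained."""
    keep = np.setdiff1d(np.arange(len(t)), missing_idx)
    x_rec = x.astype(float).copy()
    x_rec[missing_idx] = np.interp(t[missing_idx], t[keep], x[keep])
    return x_rec

t = np.arange(6, dtype=float)
x = t ** 2
x_rec = reconstruct_fold(t, x, np.array([2, 3]))  # reconstruct x(2), x(3)
```

A Kalman smoother or Gaussian process, as used later in Section 3, would replace `np.interp` here without changing the surrounding meta-algorithm.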
The total error due to the reconstruction model is expressed as $e_{r}$; here, for example, we write it down as a mean absolute percent error (MAPE), though obviously different metrics can be used,

$e_{r} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{MAPE}(F_i, \tilde{F}_i), \qquad \mathrm{MAPE}(F, \tilde{F}) = \frac{100}{|F|} \sum_{t \in F} \left| \frac{x_t - \tilde{x}_t}{x_t} \right|.$  (4)
The primary model, $g_p$, is built on each $\tilde{D}_i$, and its predictions are applied on the out-of-sample set $D'$. This results in a set of predictions $\hat{D}'_i$, and the error is expressed as

$e_{p} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{MAPE}(D', \hat{D}'_i).$  (5)
The total error in rCV is computed as follows:

$e_{rCV} = e_{r} \cdot e_{p}.$  (6)

The lower the number, the better the generalisation; however, both $e_{r}$ and $e_{p}$ should be judged separately to detect any anomalies. We have chosen $e_{rCV}$ as the multiplicative combination of the reconstruction and prediction errors, so that it represents a weighted error; more complex schemes to estimate $e_{rCV}$ can be devised. Note that both $e_{r}$ and $e_{p}$ are test errors in the conventional sense when both reconstruction and prediction computations are performed with a Gaussian process whose parameters are fixed, i.e., corresponding to Ornstein-Uhlenbeck processes.
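The error bookkeeping is a one-liner each; as a sanity check, the sketch below plugs in the reconstruction and prediction errors reported later in Section 3.2 (0.029 and 0.468), whose product indeed gives the reported rCV error of about 0.013:

```python
import numpy as np

def mape(true, pred):
    """Mean absolute percent error, the Eq. (4)-style metric."""
    return 100.0 * np.mean(np.abs((true - pred) / true))

# multiplicative combination of the two error terms
e_r = 0.029          # reconstruction error over the k folds
e_p = 0.468          # out-of-sample prediction error
e_rcv = e_r * e_p    # combined rCV error
```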
2.2 Learning curves for time series
Time-series learning curves are not common, due to the fact that data sample sizes are limited. However, with the reconstruction approach provided above, one can build a learning curve $\mathcal{L}(k)$ based on the number of folds $k$,

$\mathcal{L}(k) = e(k),$  (7)

where $e(k)$ is the error measure evaluated over a range of different $k$ values; the error term can be any of the errors defined above, $e_{r}$, $e_{p}$ or $e_{rCV}$. Note that, unlike other learning curves built upon reducing the sample size (Perlich et al. (2003)), $\mathcal{L}(k)$ is constructed while retaining the sample size of the original time series at each point on the curve. The reason is that the number of points missing at random on the reconstructed folds, as explained above, decreases with an increasing number of folds. The combined reconstruction-prediction error, as a performance measure, is affected by the changing number of folds, hence yielding a learning curve as in the basic definition of supervised learning (Mitchell (1997)).
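A self-contained sketch of such a curve, using $e_r$ as the error measure and linear interpolation as the secondary model on a toy signal (both simplifying assumptions; the paper uses a Gaussian process and an OU series):

```python
import numpy as np

def mape(true, pred):
    """Mean absolute percent error."""
    return 100.0 * np.mean(np.abs((true - pred) / true))

def reconstruction_error(t, x, k, seed=0):
    """Average reconstruction MAPE over k random folds; the sample size
    is retained -- only the fraction of missing points changes with k."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(t)), k)
    errs = []
    for fold in folds:
        keep = np.setdiff1d(np.arange(len(t)), fold)
        x_hat = np.interp(t[fold], t[keep], x[keep])  # secondary model
        errs.append(mape(x[fold], x_hat))
    return float(np.mean(errs))

# one learning-curve point per fold count k
t = np.linspace(0.0, 10.0, 200)
x = np.sin(t) + 2.0   # toy series bounded away from zero (safe for MAPE)
curve = {k: reconstruction_error(t, x, k) for k in (2, 5, 10, 20)}
```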
3 Experimental Setup
We demonstrate the utility of our technique using a specific kernel in a Gaussian process setting (Williams and Rasmussen (2006)). This corresponds to the Ornstein-Uhlenbeck process, used in the description of Brownian motion in statistical physics (Gardiner (2009)). The learning task aims at predicting the OOS data $D'$ using the past series $D$.
3.1 Ornstein-Uhlenbeck process
One can generate an Ornstein-Uhlenbeck process by drawing numbers from a multivariate Gaussian with a specific covariance structure,

$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, K).$  (8)

Taking $\boldsymbol{\mu}$ as a constant vector, we build $K$ via the kernel $K_{ij} = \exp(-|t_i - t_j|/l)$, where the entries $|t_i - t_j|$ form the distance matrix constructed over the time points, a symmetric matrix.
We generated Ornstein-Uhlenbeck (OU) time series on regularly spaced time points, for different length scales $l$ and mean values, with additional time points for the prediction task, see Figure 1.
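The generation step can be sketched directly from Eq. (8); function and parameter names here are illustrative:

```python
import numpy as np

def ou_series(t, mu=0.0, length_scale=1.0, seed=0):
    """Draw one Ornstein-Uhlenbeck path: a multivariate Gaussian with
    exponential covariance K_ij = exp(-|t_i - t_j| / length_scale)."""
    rng = np.random.default_rng(seed)
    D = np.abs(t[:, None] - t[None, :])  # symmetric distance matrix
    K = np.exp(-D / length_scale)        # OU (exponential) kernel
    return rng.multivariate_normal(mu * np.ones(len(t)), K)

t = np.linspace(0.0, 10.0, 100)  # regular time points
x = ou_series(t, mu=1.0, length_scale=2.0)
```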
3.2 Reconstructive cross-validation
We apply our meta-algorithm to construct both primary and secondary models using Gaussian process predictions with unit regularisation, formulated as follows: given ordered pairs $(t_i, x_i)$ as the time series, we aim at inferring, i.e., reconstructing, the missing values at time points $t_*$. The missing values can be identified via the Bayesian interpretation of kernel regularisation,

$\hat{x}_* = K_{*f} \, (K_{ff} + \lambda I)^{-1} \, x_f,$  (9)

where $\lambda = 1$. The kernel matrices $K_{ff}$ and $K_{*f}$ are built via the kernel $\exp(-|t - t'|/l)$ over the time points of the missing-at-random folds and the remaining folds. A secondary model of this form is used to reconstruct $\tilde{D}_i$. The absolute errors in 10-fold are shown graphically in Figure 1.
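A minimal sketch of this reconstruction step, assuming the exponential kernel and unit regularisation described in the text (function names are ours):

```python
import numpy as np

def ou_kernel(a, b, length_scale=1.0):
    """Exponential (Ornstein-Uhlenbeck) kernel matrix between time points."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / length_scale)

def gp_reconstruct(t_train, x_train, t_missing,
                   mu=0.0, length_scale=1.0, lam=1.0):
    """Posterior-mean reconstruction of the missing points in the style
    of Eq. (9), with unit regularisation lam = 1."""
    K_ff = ou_kernel(t_train, t_train, length_scale)
    K_sf = ou_kernel(t_missing, t_train, length_scale)
    alpha = np.linalg.solve(K_ff + lam * np.eye(len(t_train)), x_train - mu)
    return mu + K_sf @ alpha

t = np.linspace(0.0, 10.0, 50)
x = np.sin(t)
# reconstruct the odd-indexed points from the even-indexed training fold
x_hat = gp_reconstruct(t[::2], x[::2], t[1::2], length_scale=2.0)
```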
A similar procedure is followed in predicting the OOS vector $D'$. The reconstruction, prediction and rCV errors were computed as 0.029, 0.468 and 0.013 respectively. Note that the rCV error is not a MAPE but a measure of generalisation. The high prediction error is attributed to the long time horizon we chose; in practice, a much shorter time horizon should be used for practical utility.
3.3 Learning curves
We produced time-series learning curves for the generated Ornstein-Uhlenbeck process via rCV with varying fold numbers, i.e., different partitions of the original time series $D$, see Figure 2. In the learning curves constructed with rCV, the effective sample size increases with the number of folds, in the conventional sense: a larger number of folds implies fewer time points missing at random to reconstruct, corresponding to a larger sample, i.e., more experience. The reported learning curves correspond to test learning curves, as we use fixed kernel parameters to generate and predict the Ornstein-Uhlenbeck process.
4 Conclusion
We have presented a framework, demonstrated with a canonical process from physics, the Ornstein-Uhlenbeck process, that enables generalised learning on time series without any restriction of stationarity, while retaining the serial-correlation order of the original dataset. The approach entails applying cross-validation directly, in combination with an OOS estimate of performance, by reconstructing missing-at-random fold instances via a secondary model. This approach, rCV, also allows one to generate a learning curve for time series, as we have demonstrated.
The meta-algorithm we developed in this work can be used with any other learning algorithm. We chose Gaussian processes for both the reconstruction and prediction tasks in demonstrating the framework only because of their minimal requirements for a naive implementation. Further implementation of the meta-algorithm in a generic setting is possible without embedding the learning algorithm into the rCV procedure.
Code Supplement
We have provided a Python notebook with a prototype implementation of rCV, reproducing the results presented (rCV_prototype.ipynb).
References
Bergmeir et al. (2018). A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics & Data Analysis 120(C), pp. 70–83.
Efron and Gong (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 37(1), pp. 36–48.
Gardiner (2009). Stochastic Methods: A Handbook for the Natural and Social Sciences. 4th edition, Springer.
Hamilton (1994). Time Series Analysis. Princeton University Press, Princeton.
Kohavi (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI'95), Vol. 2, pp. 1137–1143.
Mitchell (1997). Machine Learning. 1st edition, McGraw-Hill, New York, NY, USA.
Perlich et al. (2003). Tree induction vs. logistic regression: a learning-curve analysis. Journal of Machine Learning Research 4, pp. 211–255.
Politis and Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), pp. 1303–1313.
Richards et al. (2011). On machine-learned classification of variable stars with sparse and noisy time-series data. The Astrophysical Journal 733(1), p. 10.
Roberts et al. (2013). Gaussian processes for time-series modelling. Philosophical Transactions of the Royal Society A 371(1984), 20110550.
Stone (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B (Methodological) 36(2), pp. 111–133.
Süzen and Ajraou (2016). Evaluating Gaussian processes for sparse irregular spatio-temporal data. arXiv preprint arXiv:1611.02978.
Williams and Rasmussen (2006). Gaussian Processes for Machine Learning. MIT Press.