Forecast Evaluation for Data Scientists: Common Pitfalls and Best Practices

by Hansika Hewamalage, et al.

Machine Learning (ML) and Deep Learning (DL) methods are increasingly replacing traditional methods in many domains involved in important decision-making activities. DL techniques tailor-made for specific tasks such as image recognition, signal processing, or speech analysis are being introduced at a fast pace with many improvements. However, for the domain of forecasting, the current state in the ML community is perhaps where other domains such as Natural Language Processing and Computer Vision were several years ago. The field of forecasting has mainly been fostered by statisticians and econometricians; consequently, the related concepts are not mainstream knowledge among general ML practitioners. The different non-stationarities associated with time series challenge data-driven ML models. Nevertheless, recent trends in the domain have shown that, with the availability of massive amounts of time series, ML techniques are quite competent in forecasting when the related pitfalls are properly handled. Therefore, in this work we provide a tutorial-like compilation of the details of one of the most important steps in the overall forecasting process, namely evaluation. In this way, we intend to present forecast evaluation in a form that fits the context of ML, as a means of bridging the knowledge gap between traditional methods of forecasting and state-of-the-art ML techniques. We elaborate on the different problematic characteristics of time series, such as non-normalities and non-stationarities, and how they are associated with common pitfalls in forecast evaluation. Best practices in forecast evaluation are outlined with respect to the different steps such as data partitioning, error calculation, statistical testing, and others. Further guidelines are also provided on selecting valid and suitable error measures depending on the specific characteristics of the dataset at hand.




