A Look at the Evaluation Setup of the M5 Forecasting Competition

08/08/2021
by Hansika Hewamalage, et al.

Forecast evaluation plays a key role in how empirical evidence shapes the development of the discipline. Domain experts are interested in error measures relevant to their decision-making needs, but such measures may produce unreliable results. Although the reliability properties of several metrics have already been discussed, reliability has hardly been quantified in an objective way. We propose a measure named Rank Stability, which evaluates how much the rankings of an experiment differ between similar datasets when the models and error measures are held constant. We use it to study the evaluation setup of the M5 and find that this setup is less reliable than other measures. The main drivers of instability are hierarchical aggregation and scaling. Price-weighting reduces the stability of all tested error measures. Scale normalization of the M5 error measure results in less stability than other scale-free errors. Hierarchical levels taken separately become less stable with more aggregation, and their combination is even less stable than the individual levels. We also show positive tradeoffs of retaining aggregation importance without affecting stability. Aggregation and stability can be linked to the influence of much-debated magic numbers. Many of our findings can be applied to general hierarchical forecast benchmarking.
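The abstract does not give the formula for Rank Stability, so the following is only a rough sketch of the underlying idea, not the authors' definition. It assumes a hypothetical proxy: split the series into two disjoint random halves, rank the same models by mean error on each half, and average the Spearman correlation of the two rankings over many splits. The function names (`rank_models`, `split_half_rank_agreement`) and the choice of Spearman correlation are assumptions made for illustration.

```python
import numpy as np


def rank_models(errors):
    """Rank models by mean error over series (0 = best).

    errors: array of shape (n_models, n_series), one error value
    per model and series, all computed with the same error measure.
    """
    mean_err = errors.mean(axis=1)
    order = np.argsort(mean_err, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks


def spearman(r1, r2):
    """Spearman correlation of two tie-free rankings (Pearson on ranks)."""
    return float(np.corrcoef(r1, r2)[0, 1])


def split_half_rank_agreement(errors, n_splits=200, seed=0):
    """Assumed proxy for ranking stability, NOT the paper's formula.

    Repeatedly splits the series into two disjoint halves, ranks the
    models on each half, and returns the mean Spearman correlation.
    Values near 1 mean the ranking barely changes between the halves.
    """
    rng = np.random.default_rng(seed)
    n_series = errors.shape[1]
    half = n_series // 2
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n_series)
        a, b = perm[:half], perm[half:2 * half]
        scores.append(spearman(rank_models(errors[:, a]),
                               rank_models(errors[:, b])))
    return float(np.mean(scores))


# Synthetic illustration only: 5 models, 100 series.
_rng = np.random.default_rng(42)
separated = np.arange(1.0, 6.0)[:, None] + 0.01 * _rng.standard_normal((5, 100))
noisy = _rng.standard_normal((5, 100))

stable_score = split_half_rank_agreement(separated)
unstable_score = split_half_rank_agreement(noisy)
```

Under these assumptions, models whose errors are clearly separated keep the same ranking on every split, while models that differ only by noise produce rankings that change from split to split. Comparing such a score across error measures, with the models held fixed, mirrors the kind of comparison the abstract describes.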


