Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting

05/25/2023
by Hilaf Hasson, et al.

Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.
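
The selection procedure described in the abstract, choosing one stacker from a finite family by cross-validated performance, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the paper's method: the function names (`fit_stacker`, `cv_loss`, `select_stacker`), the synthetic data, the use of squared error rather than a probabilistic forecasting loss, and the particular family, in which each member is a linear stacker whose weights are shrunk toward the uniform average by a different penalty `lam`.

```python
import numpy as np

def fit_stacker(P, y, lam):
    """Linear stacker with weights shrunk toward the uniform average.

    Solves min_w ||P w - y||^2 + lam * ||w - u||^2 with u = 1/m,
    where P holds base-learner predictions (one column per learner).
    """
    m = P.shape[1]
    u = np.full(m, 1.0 / m)
    A = P.T @ P + lam * np.eye(m)
    b = P.T @ y + lam * u
    return np.linalg.solve(A, b)

def cv_loss(P, y, lam, n_folds=5):
    """Mean squared error of the stacker, averaged over K validation folds."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    losses = []
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        w = fit_stacker(P[train_mask], y[train_mask], lam)
        losses.append(np.mean((P[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(losses))

def select_stacker(P, y, lams, n_folds=5):
    """Pick the family member with the lowest cross-validated loss,
    then refit it on all of the data."""
    scores = {lam: cv_loss(P, y, lam, n_folds) for lam in lams}
    best = min(scores, key=scores.get)
    return best, fit_stacker(P, y, best), scores

# Synthetic example: three hypothetical base learners with different noise levels.
rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)
P = np.column_stack([y + rng.normal(scale=s, size=n) for s in (0.3, 0.6, 1.0)])

best_lam, weights, scores = select_stacker(P, y, lams=[0.0, 1.0, 10.0, 1e6])
print("selected lam:", best_lam)
print("ensemble weights:", weights)
```

In this sketch a very large `lam` effectively recovers the plain average of the base learners, while `lam = 0.0` fits unconstrained least-squares weights, so the family spans a range of flexibility from which cross-validation chooses.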


research
02/17/2022

Combining Varied Learners for Binary Classification using Stacked Generalization

The Machine Learning has various learning algorithms that are better in ...
research
05/27/2011

Issues in Stacked Generalization

Stacked generalization is a general method of using a high-level model t...
research
05/04/2020

StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics

In machine learning (ML), ensemble methods such as bagging, boosting, an...
research
10/16/2016

Dynamic Stacked Generalization for Node Classification on Networks

We propose a novel stacked generalization (stacking) method as a dynamic...
research
04/03/2020

Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods

This study uses stacked generalization, which is a two-step process of c...
research
06/16/2021

Comparison of Automated Machine Learning Tools for SMS Spam Message Filtering

Short Message Service (SMS) is a very popular service used for communica...
