Shrub Ensembles for Online Classification

by Sebastian Buschjäger, et al.

Online learning algorithms have become a ubiquitous tool in the machine learning toolbox and are frequently used in small, resource-constrained environments. Among the most successful online learning methods are Decision Tree (DT) ensembles. DT ensembles provide excellent performance while adapting to changes in the data, but they are not resource efficient. Incremental tree learners keep adding new nodes to the tree but never remove old ones, so memory consumption increases over time. Gradient-based tree learning, on the other hand, requires the computation of gradients over the entire tree, which is costly for even moderately sized trees. In this paper, we propose a novel memory-efficient online classification ensemble called shrub ensembles for resource-constrained systems. Our algorithm trains small to medium-sized decision trees on small windows and uses stochastic proximal gradient descent to learn the ensemble weights of these 'shrubs'. We provide a theoretical analysis of our algorithm and include an extensive discussion on the behavior of our approach in the online setting. In a series of 2,959 experiments on 12 different datasets, we compare our method against 8 state-of-the-art methods. Shrub Ensembles (SE) retain excellent performance even when little memory is available. We show that SE offers a better accuracy-memory trade-off in 7 of 12 cases, while performing statistically significantly better than most other methods. Our implementation is available under .
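To make the weight-learning step more concrete, the following is a minimal sketch of one stochastic proximal gradient update on the ensemble weights. It is an illustration under stated assumptions, not the authors' implementation: the squared loss, the binary scalar predictions, the sparsity constraint "at most k non-zero weights" (whose proximal operator is hard thresholding), and the names `hard_threshold` and `prox_sgd_step` are all hypothetical choices made here. The abstract does not specify the loss or the constraint.

```python
def hard_threshold(weights, k):
    """Proximal operator of the constraint 'at most k non-zero weights':
    keep the k largest-magnitude entries and set the rest to zero."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def prox_sgd_step(weights, member_preds, target, lr, k):
    """One stochastic proximal gradient step for a single example.

    weights      -- current ensemble weights, one per shrub
    member_preds -- each shrub's predicted score for the example (assumed scalar)
    target       -- true label encoded as 0.0 or 1.0 (assumed binary task)
    lr           -- step size
    k            -- maximum number of shrubs kept in the ensemble
    """
    # Weighted ensemble prediction for this example.
    ensemble = sum(w * p for w, p in zip(weights, member_preds))
    # Gradient of the squared loss (ensemble - target)^2 w.r.t. each weight.
    residual = ensemble - target
    stepped = [w - lr * 2.0 * residual * p for w, p in zip(weights, member_preds)]
    # Proximal step: shrubs outside the top k get weight zero and can be discarded.
    return hard_threshold(stepped, k)
```

In such a scheme, a shrub whose weight is set to zero can be dropped from memory, which is one way the ensemble size could stay bounded while new shrubs are trained on incoming windows.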




Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement

Random Forests (RF) are among the state-of-the-art in many machine learn...

Green Accelerated Hoeffding Tree

State-of-the-art machine learning solutions mainly focus on creating hig...

An Eager Splitting Strategy for Online Decision Trees

We study the effectiveness of replacing the split strategy for the state...

Stochastic Gradient Trees

We present an online algorithm that induces decision trees using gradien...

Efficient implementation of incremental proximal-point methods

Model training algorithms which observe a small portion of the training ...

PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree Ensemble Deployment

We present methods to serialize and deserialize tree ensembles that opti...

LiteMORT: A memory efficient gradient boosting tree system on adaptive compact distributions

Gradient boosted decision trees (GBDT) is the leading algorithm for many...