Streaming Decision Trees and Forests

by Haoyin Xu, et al.
Johns Hopkins University

Machine learning has successfully leveraged modern data and provided computational solutions to innumerable real-world problems, including physical and biomedical discoveries. Current estimators can handle both batch settings, where all samples are available at once, and streaming settings, which require continuous updates. However, streaming algorithms built on batch decision trees and random forests, the leading methods for batch data tasks, still leave room for improvement. In this paper, we explore the simplest partial fitting algorithm for extending batch trees and test our models, the stream decision tree (SDT) and the stream decision forest (SDF), on three classification tasks of varying complexity. For reference, existing streaming trees (Hoeffding trees and Mondrian forests) and batch estimators are also included in the experiments. In all three tasks, SDF consistently achieves high accuracy, whereas the existing estimators encounter space constraints and accuracy fluctuations. Thus, our streaming trees and forests show great potential for further improvement and are good candidates for addressing problems such as distribution drift and transfer learning.
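
The abstract does not spell out the partial fitting algorithm. As a rough illustration of what "partially fitting" a batch tree can mean, the minimal pure-Python sketch below keeps existing splits fixed, routes each new batch of samples to the current leaves, and then re-grows only those leaves using Gini-impurity splits. The class `StreamTreeSketch`, its `partial_fit`/`predict` methods, and the choice to store raw samples at leaves are hypothetical assumptions made for this example; it is not the authors' SDT or their scikit-learn fork.

```python
from collections import Counter


def gini(counts):
    """Gini impurity of a Counter of class labels."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


class Node:
    def __init__(self):
        self.feature = None    # split feature index (None means leaf)
        self.threshold = None  # go left if x[feature] <= threshold
        self.left = None
        self.right = None
        self.X = []            # raw samples kept at a leaf (sketch assumption)
        self.y = []

    def is_leaf(self):
        return self.feature is None


class StreamTreeSketch:
    """Hypothetical streaming tree: old splits are never undone; leaves keep growing."""

    def __init__(self, min_samples_split=2, max_depth=10):
        self.root = Node()
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth

    def _leaf(self, x):
        # Route a sample down the existing splits; return its leaf and depth.
        node, depth = self.root, 0
        while not node.is_leaf():
            node = node.left if x[node.feature] <= node.threshold else node.right
            depth += 1
        return node, depth

    def _best_split(self, X, y):
        # Exhaustive search over midpoints between distinct feature values.
        parent, n, best = gini(Counter(y)), len(y), None
        for f in range(len(X[0])):
            values = sorted(set(row[f] for row in X))
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                left = Counter(lab for row, lab in zip(X, y) if row[f] <= t)
                right = Counter(lab for row, lab in zip(X, y) if row[f] > t)
                score = (sum(left.values()) * gini(left)
                         + sum(right.values()) * gini(right)) / n
                if score < parent and (best is None or score < best[0]):
                    best = (score, f, t)
        return best  # None if no split reduces impurity

    def _grow(self, node, depth):
        if depth >= self.max_depth or len(node.y) < self.min_samples_split:
            return
        split = self._best_split(node.X, node.y)
        if split is None:
            return
        _, f, t = split
        node.feature, node.threshold = f, t
        node.left, node.right = Node(), Node()
        for x, lab in zip(node.X, node.y):
            child = node.left if x[f] <= t else node.right
            child.X.append(x)
            child.y.append(lab)
        node.X, node.y = [], []  # internal nodes no longer hold samples
        self._grow(node.left, depth + 1)
        self._grow(node.right, depth + 1)

    def partial_fit(self, X, y):
        # 1) drop new samples into current leaves, 2) try to split touched leaves.
        touched = {}
        for x, lab in zip(X, y):
            leaf, depth = self._leaf(x)
            leaf.X.append(x)
            leaf.y.append(lab)
            touched[id(leaf)] = (leaf, depth)
        for leaf, depth in touched.values():
            self._grow(leaf, depth)
        return self

    def predict(self, X):
        # Majority label of the leaf each sample falls into (None before any fit).
        out = []
        for x in X:
            leaf, _ = self._leaf(x)
            out.append(Counter(leaf.y).most_common(1)[0][0] if leaf.y else None)
        return out
```

For example, fitting on `[[0.0], [1.0], [2.0], [3.0]]` with labels `[0, 0, 1, 1]` splits the root at 1.5; a later batch `[[4.0], [5.0]]` labeled `[0, 0]` lands in the right leaf, which is then split at 3.5 without touching the root. Storing raw samples at leaves keeps the sketch short, but memory grows with the stream; a practical streaming tree would more likely keep per-leaf sufficient statistics instead.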

Code Repositories
Exploring streaming options for decision trees and random forests. Based on a scikit-learn fork.