Streaming Decision Trees and Forests

10/16/2021
by Haoyin Xu, et al.

Machine learning has successfully leveraged modern data to provide computational solutions to innumerable real-world problems, including physical and biomedical discoveries. Current estimators can handle both the batch setting, in which all samples are available at once, and the streaming setting, which requires continuous updates. However, there is still room for improvement in streaming algorithms based on batch decision trees and random forests, which are the leading methods in batch data tasks. In this paper, we explore the simplest partial fitting algorithm to extend batch trees and test our models, the stream decision tree (SDT) and the stream decision forest (SDF), on three classification tasks of varying complexity. For reference, the experiments also include existing streaming trees (Hoeffding trees and Mondrian forests) and batch estimators. In all three tasks, SDF consistently produces high accuracy, whereas existing estimators encounter space constraints and accuracy fluctuations. Thus, our streaming trees and forests show great potential for further improvement and are good candidates for solving problems such as distribution drift and transfer learning.
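To make the idea of partial fitting concrete, the following is a minimal sketch of one way a batch-style tree could be updated from a stream. It is not the authors' SDT implementation: the class `ToyStreamTree`, its `min_samples_split` parameter, and the rule "buffer samples in a leaf and split it by Gini impurity once enough have arrived" are all assumptions chosen for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


class Node:
    """A tree node; it is a leaf for as long as `feature` is None."""

    def __init__(self):
        self.X, self.y = [], []  # samples buffered at this leaf
        self.feature = self.threshold = None
        self.left = self.right = None


class ToyStreamTree:
    """Hypothetical incremental tree (illustrative, not the paper's SDT)."""

    def __init__(self, min_samples_split=4):
        self.min_samples_split = min_samples_split
        self.root = Node()

    def _leaf(self, x):
        """Route one sample down to the leaf it falls into."""
        node = self.root
        while node.feature is not None:
            node = node.left if x[node.feature] <= node.threshold else node.right
        return node

    def partial_fit(self, X, y):
        """Update the tree with one batch; earlier batches are not replayed."""
        for x, label in zip(X, y):
            leaf = self._leaf(x)
            leaf.X.append(x)
            leaf.y.append(label)
            self._maybe_split(leaf)
        return self

    def _maybe_split(self, leaf):
        """Split a leaf once it is large enough and still impure."""
        if len(leaf.y) < self.min_samples_split or gini(leaf.y) == 0.0:
            return
        best = None
        for f in range(len(leaf.X[0])):
            # Skip the largest value so both children are non-empty.
            for t in sorted({x[f] for x in leaf.X})[:-1]:
                left = [lab for x, lab in zip(leaf.X, leaf.y) if x[f] <= t]
                right = [lab for x, lab in zip(leaf.X, leaf.y) if x[f] > t]
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(leaf.y)
                if best is None or score < best[0]:
                    best = (score, f, t)
        if best is None:
            return  # all buffered samples are identical; nothing to split on
        _, leaf.feature, leaf.threshold = best
        leaf.left, leaf.right = Node(), Node()
        for x, label in zip(leaf.X, leaf.y):
            child = leaf.left if x[leaf.feature] <= leaf.threshold else leaf.right
            child.X.append(x)
            child.y.append(label)
        leaf.X, leaf.y = [], []  # the node is now internal

    def predict(self, X):
        """Majority class of the leaf each sample reaches (call after fitting)."""
        return [Counter(self._leaf(x).y).most_common(1)[0][0] for x in X]


if __name__ == "__main__":
    tree = ToyStreamTree(min_samples_split=4)
    tree.partial_fit([[0.0], [1.0]], [0, 0])  # first batch: too few samples to split
    tree.partial_fit([[2.0], [3.0]], [1, 1])  # second batch triggers a split
    print(tree.predict([[0.5], [2.5]]))
```

A forest in the same spirit could hold several such trees and combine their predictions by majority vote. This sketch keeps raw samples in the leaves, so its memory grows with the stream; a practical streaming estimator would need to bound that.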


research
07/28/2016

VHT: Vertical Hoeffding Tree

IoT Big Data requires new machine learning methods able to scale to larg...
research
11/22/2020

Fairness-guided SMT-based Rectification of Decision Trees and Random Forests

Data-driven decision making is gaining prominence with the popularity of...
research
10/16/2020

Emergent and Unspecified Behaviors in Streaming Decision Trees

Hoeffding trees are the state-of-the-art methods in decision tree learni...
research
03/15/2018

Minimax optimal rates for Mondrian trees and forests

Introduced by Breiman (2001), Random Forests are widely used as classifi...
research
10/07/2021

Coresets for Decision Trees of Signals

A k-decision tree t (or k-tree) is a recursive partition of a matrix (2D...
research
10/27/2020

Decision Tree and Random Forest Implementations for Fast Filtering of Sensor Data

With increasing capabilities of energy efficient systems, computational ...
research
06/12/2017

Random Forests, Decision Trees, and Categorical Predictors: The "Absent Levels" Problem

One of the advantages that decision trees have over many other models is...
