Simple is better: Making Decision Trees faster using random sampling

08/19/2021
by Vignesh Nanda Kumar, et al.

In recent years, gradient boosted decision trees have become popular for building robust machine learning models on big data. The key technique behind these algorithms' success is distributing the computation used to build the decision trees. Distributed decision tree building, in turn, relies on computing quantile summaries of the dataset and choosing candidate split points from these quantile sets. XGBoost, for instance, employs a sophisticated quantile-building algorithm to identify candidate split points, and this method is often claimed to yield better results when the computation is distributed. In this paper, we dispel the notion that such methods are necessary for accurate and scalable distributed decision tree building. As our main contribution, we show theoretically and empirically that choosing split points uniformly at random provides the same or even better performance in terms of accuracy and computational efficiency. Hence, simple random selection of split points suffices for decision tree building compared to more sophisticated methods.
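The contrast between the two candidate-selection strategies can be sketched in a few lines. The following is an illustrative sketch, not the paper's implementation: `quantile_candidate_splits` mimics the quantile-summary approach (as in XGBoost's approximate split finding) by placing thresholds at evenly spaced quantiles, while `random_candidate_splits` implements the simple alternative studied here, drawing thresholds uniformly at random from the observed feature values. The function names and the choice to sample observed values (rather than the feature's range) are our own assumptions for illustration.

```python
import numpy as np

def quantile_candidate_splits(values, k):
    """Quantile-summary style: k candidate thresholds at evenly
    spaced quantiles of the feature values (0th/100th excluded)."""
    qs = np.linspace(0.0, 1.0, k + 2)[1:-1]
    return np.quantile(values, qs)

def random_candidate_splits(values, k, seed=None):
    """Random alternative: draw k candidate thresholds uniformly
    at random from the observed feature values, without replacement."""
    rng = np.random.default_rng(seed)
    return rng.choice(values, size=k, replace=False)
```

In a distributed setting the random variant is attractive because it needs no merging of per-worker quantile sketches; each worker can sample candidates locally from its shard.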


