Training Big Random Forests with Little Resources

02/18/2018
by Fabian Gieseke, et al.

Without access to large compute clusters, building random forests on large datasets is still a challenging problem, particularly if fully grown trees are desired. We propose a simple yet effective framework that allows one to efficiently construct ensembles of huge trees for hundreds of millions or even billions of training instances using a cheap desktop computer with commodity hardware. The basic idea is to use a multi-level construction scheme that builds top trees on small random subsets of the available data and then distributes all training instances to the top trees' leaves for further processing. While the scheme is conceptually simple, its overall efficiency depends crucially on how the individual phases are implemented. The practical merits of our approach are demonstrated on dense datasets with hundreds of millions of training instances.
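The multi-level scheme summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`build_tree`, `build_two_level_tree`, `fit_forest`, `predict`) and parameters (`n_top`, `top_depth`, `mtry`) are hypothetical. The sketch also assumes Gini-impurity splits on numeric features, and it assumes that each tree draws its own random subset for the top level while each bottom tree is grown on all training instances routed to its leaf. It makes no attempt at the phase-level efficiency that the abstract identifies as crucial.

```python
import random
from collections import Counter, defaultdict


def gini(counts, n):
    """Gini impurity of a node with class counts `counts` and size n."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y, idx, features):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best, n = None, len(idx)
    for f in features:
        order = sorted(idx, key=lambda i: X[i][f])
        left, right = Counter(), Counter(y[i] for i in order)
        for k in range(n - 1):
            i = order[k]
            left[y[i]] += 1
            right[y[i]] -= 1
            v, v_next = X[i][f], X[order[k + 1]][f]
            if v == v_next:
                continue  # no threshold separates equal values
            nl, nr = k + 1, n - k - 1
            score = nl * gini(left, nl) + nr * gini(right, nr)
            if best is None or score < best[0]:
                best = (score, f, v)  # split rule: x[f] <= v goes left
    return best


def build_tree(X, y, idx, rng, mtry, max_depth=None, depth=0):
    """Grow a tree on the instances `idx`; max_depth=None means fully grown."""
    labels = Counter(y[i] for i in idx)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or (max_depth is not None and depth >= max_depth):
        return {"leaf": majority}
    feats = rng.sample(range(len(X[0])), len(X[0]))  # random feature order
    # Try mtry random features first; fall back to the rest if none can split.
    split = best_split(X, y, idx, feats[:mtry]) or best_split(X, y, idx, feats[mtry:])
    if split is None:  # all instances identical: cannot split further
        return {"leaf": majority}
    _, f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree(X, y, left, rng, mtry, max_depth, depth + 1),
            "right": build_tree(X, y, right, rng, mtry, max_depth, depth + 1)}


def find_leaf(node, x):
    """Route instance x down the tree and return the leaf node it reaches."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


def build_two_level_tree(X, y, rng, n_top, top_depth, mtry):
    # Phase 1: a shallow top tree on a small random subset of the data.
    subset = rng.sample(range(len(X)), min(n_top, len(X)))
    top = build_tree(X, y, subset, rng, mtry, max_depth=top_depth)
    # Phase 2: distribute *all* training instances to the top tree's leaves.
    leaves, buckets = {}, defaultdict(list)
    for i, x in enumerate(X):
        leaf = find_leaf(top, x)
        leaves[id(leaf)] = leaf
        buckets[id(leaf)].append(i)
    # Phase 3: grow a fully grown bottom tree per leaf and graft it in place.
    for key, idx in buckets.items():
        bottom = build_tree(X, y, idx, rng, mtry)
        leaves[key].clear()
        leaves[key].update(bottom)
    return top


def fit_forest(X, y, n_trees=10, n_top=1000, top_depth=4, mtry=1, seed=0):
    rng = random.Random(seed)
    return [build_two_level_tree(X, y, rng, n_top, top_depth, mtry) for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote over the trees of the forest."""
    votes = Counter(find_leaf(tree, x)["leaf"] for tree in forest)
    return votes.most_common(1)[0][0]
```

In this sketch the buckets produced in phase 2 are independent of one another. A variant could therefore process them one at a time or in parallel, which is the property that makes the top-tree split attractive when the full dataset does not fit comfortably in memory.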

