Two-stage Best-scored Random Forest for Large-scale Regression

05/09/2019
by   Hanyuan Hang, et al.
0

We propose a novel method designed for large-scale regression problems, namely the two-stage best-scored random forest (TBRF). "Best-scored" means to select one regression tree with the best empirical performance out of a certain number of purely random regression tree candidates, and "two-stage" means to divide the original random tree splitting procedure into two: In stage one, the feature space is partitioned into non-overlapping cells; in stage two, child trees grow separately on these cells. The strengths of this algorithm can be summarized as follows: First of all, the pure randomness in TBRF leads to the almost optimal learning rates, and also makes ensemble learning possible, which resolves the boundary discontinuities long plaguing the existing algorithms. Secondly, the two-stage procedure paves the way for parallel computing, leading to computational efficiency. Last but not least, TBRF can serve as an inclusive framework where different mainstream regression strategies such as linear predictor and least squares support vector machines (LS-SVMs) can also be incorporated as value assignment approaches on leaves of the child trees, depending on the characteristics of the underlying data sets. Numerical assessments on comparisons with other state-of-the-art methods on several large-scale real data sets validate the promising prediction accuracy and high computational efficiency of our algorithm.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/27/2019

Best-scored Random Forest Classification

We propose an algorithm named best-scored random forest for binary class...
research
06/24/2019

Density-based Clustering with Best-scored Random Forest

Single-level density-based approach has long been widely acknowledged to...
research
05/09/2019

Best-scored Random Forest Density Estimation

This paper presents a brand new nonparametric density estimation strateg...
research
09/13/2020

Random boosting and random^2 forests – A random tree depth injection approach

The induction of additional randomness in parallel and sequential ensemb...
research
06/17/2020

Regularized ERM on random subspaces

We study a natural extension of classical empirical risk minimization, w...
research
12/29/2020

Random Planted Forest: a directly interpretable tree ensemble

We introduce a novel interpretable and tree-based algorithm for predicti...
research
01/20/2021

Autocart – spatially-aware regression trees for ecological and spatial modeling

Many ecological and spatial processes are complex in nature and are not ...

Please sign up or login with your details

Forgot password? Click here to reset