Generalising Random Forest Parameter Optimisation to Include Stability and Cost

06/29/2017
by C. H. Bryan Liu, et al.

Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, their parameters must be carefully tuned, which is usually done by choosing the values that minimise prediction error on a held-out dataset. We argue that error reduction is only one of several metrics that must be considered when optimising random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forest predictions, which we argue is key in scenarios that require successive predictions. We motivate the need for multi-criteria optimisation by showing that, in practical applications, simply choosing the parameters that yield the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimise this trade-off, we present a new framework that uses Bayesian optimisation to efficiently find a principled balance between three considerations: error, stability, and cost. The pitfalls of optimising forest parameters purely for error reduction are demonstrated on two publicly available real-world datasets. We show that our framework leads to parameter settings that are markedly different from those found by optimising for error reduction alone.
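The abstract does not give the exact form of the stability metric or of the multi-criteria trade-off, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes (1) instability is measured as the variance of each test point's prediction across forests retrained with different random seeds, averaged over test points, and (2) error, instability, and cost are combined by a simple weighted sum. The helper names `prediction_instability`, `scalarised_objective`, and `pick_configuration` are hypothetical, and the exhaustive comparison over a fixed set of already-evaluated candidates stands in for the Bayesian optimisation the authors use.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence, Tuple

Criteria = Tuple[float, float, float]  # (error, instability, cost)


def prediction_instability(runs: Sequence[Sequence[float]]) -> float:
    """Hypothetical stability proxy (lower is more stable).

    `runs[k][i]` is the prediction for test point i from the k-th forest,
    each forest trained with a different random seed. Returns the population
    variance across runs, averaged over test points; 0.0 means all runs agree.
    """
    if len(runs) < 2:
        raise ValueError("need predictions from at least two independent runs")
    n_points = len(runs[0])
    if any(len(run) != n_points for run in runs):
        raise ValueError("every run must predict the same test points")
    # zip(*runs) yields, for each test point, its predictions from every run.
    return mean(pvariance(point_preds) for point_preds in zip(*runs))


def scalarised_objective(error: float, instability: float, cost: float,
                         weights: Criteria = (1.0, 1.0, 1.0)) -> float:
    """Assumed weighted-sum scalarisation of the three criteria."""
    w_error, w_instability, w_cost = weights
    return w_error * error + w_instability * instability + w_cost * cost


def pick_configuration(candidates: Dict[str, Criteria],
                       weights: Criteria = (1.0, 1.0, 1.0)) -> str:
    """Return the candidate name with the lowest scalarised objective.

    Exhaustive comparison over a fixed set of evaluated settings; this
    stands in for the Bayesian optimisation search the abstract describes.
    """
    return min(candidates,
               key=lambda name: scalarised_objective(*candidates[name], weights=weights))


if __name__ == "__main__":
    # Made-up numbers for illustration only: (error, instability, cost).
    toy = {"many_trees": (0.10, 0.00, 10.0), "few_trees": (0.12, 0.01, 1.0)}
    print(pick_configuration(toy, weights=(1.0, 1.0, 0.0)))  # prints many_trees
    print(pick_configuration(toy, weights=(1.0, 1.0, 0.1)))  # prints few_trees
```

With a zero weight on cost the selection reduces to plain error minimisation (plus instability), while even a small cost weight can switch the choice to a cheaper setting. In practice the three criteria live on different scales (for example, cost measured in seconds or in number of trees), so the weights would need to be chosen carefully or the criteria normalised first.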
