Geometry- and Accuracy-Preserving Random Forest Proximities

01/29/2022
by Jake S. Rhodes, et al.

Random forests are considered one of the best out-of-the-box classification and regression algorithms because they achieve high predictive performance with relatively little tuning. From a trained random forest, one can compute pairwise proximities that measure the similarity between data points relative to the supervised task. Random forest proximities have been used in many applications, including the identification of variable importance, data imputation, outlier detection, and data visualization. However, existing definitions of random forest proximities do not accurately reflect the data geometry learned by the random forest. In this paper, we introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP). We prove that the proximity-weighted sum (regression) or majority vote (classification) using RF-GAP exactly matches the out-of-bag random forest prediction, thus capturing the data geometry learned by the random forest. We empirically show that this improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation, and that it yields outlier detection and visualization results consistent with the learned data geometry.
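
The abstract's central claim, that the RF-GAP proximity-weighted prediction reproduces the out-of-bag (OOB) prediction, can be checked numerically. Below is a minimal NumPy sketch of the regression case under stated assumptions. The toy tree grower (`build_tree`), the helper names (`apply_tree`, `rf_gap_regression`), and the proximity formula are hypothetical illustrations based on our reading of the RF-GAP idea, which should be checked against the full paper. For an out-of-bag point i, each in-bag point j in i's leaf gets a weight proportional to its bootstrap multiplicity, and these weights are averaged over the trees in which i is out-of-bag. This is not the authors' implementation or any library's API.

```python
import numpy as np


def build_tree(X, y, rng, depth=0, max_depth=4, min_leaf=2):
    """Grow a small regression tree on a bootstrap sample (rows may repeat).

    Simplification (assumption, not CART as implemented in any library): each
    node tries every threshold on ONE randomly chosen feature and keeps the
    split with the lowest within-child sum of squared errors. Leaf values are
    plain means of the rows that reached them, so repeated rows count with
    their bootstrap multiplicity.
    """
    leaf = {"leaf": True, "value": float(y.mean())}
    if depth == max_depth or len(y) < 2 * min_leaf or np.all(y == y[0]):
        return leaf
    f = int(rng.integers(X.shape[1]))
    best = None
    for thr in np.unique(X[:, f])[:-1]:
        left = X[:, f] <= thr
        n_left = int(left.sum())
        if n_left < min_leaf or len(y) - n_left < min_leaf:
            continue
        sse = y[left].var() * n_left + y[~left].var() * (len(y) - n_left)
        if best is None or sse < best[0]:
            best = (sse, thr, left)
    if best is None:
        return leaf
    _, thr, left = best
    return {"leaf": False, "f": f, "thr": thr,
            "l": build_tree(X[left], y[left], rng, depth + 1, max_depth, min_leaf),
            "r": build_tree(X[~left], y[~left], rng, depth + 1, max_depth, min_leaf)}


def apply_tree(node, x):
    """Return (leaf path such as 'LRL', leaf value) for a single point x."""
    path = ""
    while not node["leaf"]:
        go_left = x[node["f"]] <= node["thr"]
        node, path = (node["l"], path + "L") if go_left else (node["r"], path + "R")
    return path, node["value"]


def rf_gap_regression(X, y, n_trees=50, seed=0):
    """Hypothetical helper: bagged toy forest plus an RF-GAP-style proximity matrix.

    P[i, j] averages, over the trees where i is out-of-bag, the share of j's
    bootstrap copies among all in-bag copies in i's leaf.
    Returns P and the OOB forest prediction (NaN if a point is never OOB).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    P = np.zeros((n, n))
    oob_sum, oob_cnt = np.zeros(n), np.zeros(n)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)          # bootstrap sample
        c = np.bincount(idx, minlength=n)         # in-bag multiplicity of each point
        tree = build_tree(X[idx], y[idx], rng)
        out = [apply_tree(tree, x) for x in X]
        leaves = np.array([path for path, _ in out])
        values = np.array([value for _, value in out])
        for i in np.flatnonzero(c == 0):          # i is out-of-bag for this tree
            w = c * (leaves == leaves[i])         # in-bag copies sharing i's leaf
            P[i] += w / w.sum()
            oob_sum[i] += values[i]
            oob_cnt[i] += 1
    has_oob = oob_cnt > 0
    P[has_oob] /= oob_cnt[has_oob][:, None]
    oob_pred = np.full(n, np.nan)
    oob_pred[has_oob] = oob_sum[has_oob] / oob_cnt[has_oob]
    return P, oob_pred


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=40)
P, oob = rf_gap_regression(X, y)
mask = ~np.isnan(oob)
print(np.allclose(P[mask] @ y, oob[mask]))  # True
```

Because each leaf value is the multiplicity-weighted mean of its in-bag responses, `P @ y` and the OOB average agree up to floating-point error. The diagonal of `P` is zero because point i has multiplicity zero in every tree where it is out-of-bag. The classification case (proximity-weighted vote) is not shown here.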

research
10/02/2019

A note on the consistency of the random forest algorithm

Examples are given of data-generating models for which Breiman's random ...
research
01/28/2020

A random forest based approach for predicting spreads in the primary catastrophe bond market

We introduce a random forest approach to enable spreads' prediction in t...
research
03/14/2023

RODD: Robust Outlier Detection in Data Cubes

Data cubes are multidimensional databases, often built from several sepa...
research
11/27/2019

Single Sample Feature Importance: An Interpretable Algorithm for Low-Level Feature Analysis

Have you ever wondered how your feature space is impacting the predictio...
research
03/02/2023

A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression

In many studies, we want to determine the influence of certain features ...
research
02/10/2021

Feature Analyses and Modelling of Lithium-ion Batteries Manufacturing based on Random Forest Classification

Lithium-ion battery manufacturing is a highly complicated process with s...
research
06/18/2018

Comparison-Based Random Forests

Assume we are given a set of items from a general metric space, but we n...