On the consistency of a random forest algorithm in the presence of missing entries

by Irving Gómez Méndez, et al.

This paper tackles the problem of constructing a non-parametric predictor when the latent variables are given with incomplete information. The convenient predictor for this task is the random forest algorithm in conjunction with the so-called CART criterion. The proposed technique enables a partial imputation of the missing values in the data set in a way that suits both a consistent estimation of the regression function and a probabilistic recovery of the missing values. A proof of the consistency of the random forest estimator, which also simplifies previous proofs of the classical consistency, is given for the case where each latent variable is missing completely at random (MCAR).
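To make the MCAR assumption concrete, here is a minimal sketch in plain Python of what "missing completely at random" means for a data set: each entry is hidden with a fixed probability, independently of every observed or unobserved value. This is not the paper's imputation procedure or its random forest estimator. The helper names `mcar_mask` and `missing_rate`, the missingness probability 0.2, and the synthetic Gaussian data are all illustrative assumptions.

```python
import random


def mcar_mask(rows, p_missing, seed=0):
    """Return a copy of `rows` with entries replaced by None under MCAR.

    Each entry is dropped with probability `p_missing`; the decision
    never looks at the entry's value, which is what MCAR requires.
    (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)
    return [
        [None if rng.random() < p_missing else x for x in row]
        for row in rows
    ]


def missing_rate(rows):
    """Fraction of entries that are None."""
    total = sum(len(row) for row in rows)
    missing = sum(x is None for row in rows for x in row)
    return missing / total


# Synthetic data: 2000 observations of 3 variables (assumed for the demo).
data_rng = random.Random(1)
data = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2000)]

masked = mcar_mask(data, p_missing=0.2, seed=42)
rate = missing_rate(masked)
```

Because the mask is drawn without reference to the data, the empirical missing rate should sit close to the chosen probability, and the entries that remain observed are unchanged.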




