A Nonparametric Test of Dependence Based on Ensemble of Decision Trees
In this paper, a robust non-parametric measure of statistical dependence, or correlation, between two random variables is presented. The proposed coefficient is a permutation-like statistic that quantifies how well the observed sample S_n = {(X_i, Y_i) : i = 1, ..., n} can be discriminated from the permuted sample Ŝ_nn = {(X_i, Y_j) : i, j = 1, ..., n}, in which the two variables are independent. The extent of discriminability is determined from the predictions for the interchangeable left-out sample, obtained by training an ensemble of decision trees to discriminate between the two samples without materializing the permuted sample. The proposed coefficient is computationally efficient, interpretable, invariant to monotonic transformations, and has a well-approximated distribution under independence. Empirical results show that the proposed method has high power for detecting complex relationships in noisy data.
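The idea of comparing S_n with Ŝ_nn without materializing the n^2 permuted pairs can be illustrated with a small sketch. It relies on one fact: for any axis-aligned cell, the number of permuted pairs falling in the cell is the count of X values in the cell's x-range times the count of Y values in its y-range. The code below is not the paper's coefficient. As a labeled simplification, it replaces the decision-tree ensemble and its left-out predictions with random two-by-two splits on ranks, and scores each split by the total-variation distance between the two samples. The names `ranks`, `split_score`, and `dependence_score` are hypothetical.

```python
import random


def ranks(values):
    """0-based ranks (ties broken by position). Working on ranks makes the
    score unchanged under strictly increasing transformations."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def split_score(rx, ry, tx, ty):
    """Total-variation distance between the observed sample and the full
    permuted sample over the four cells of one axis-aligned split."""
    n = len(rx)
    # Observed counts per cell: one pass over the n paired points.
    obs = [[0, 0], [0, 0]]
    for a, b in zip(rx, ry):
        obs[int(a >= tx)][int(b >= ty)] += 1
    # Marginal counts; their products give the permuted-sample cell counts,
    # so the n^2 pairs (X_i, Y_j) are never built.
    nx = [sum(a < tx for a in rx), sum(a >= tx for a in rx)]
    ny = [sum(b < ty for b in ry), sum(b >= ty for b in ry)]
    tv = 0.0
    for i in (0, 1):
        for j in (0, 1):
            p_obs = obs[i][j] / n
            p_perm = nx[i] * ny[j] / (n * n)
            tv += abs(p_obs - p_perm)
    return 0.5 * tv


def dependence_score(x, y, n_splits=200, seed=0):
    """Average split score over random thresholds (a simplified stand-in
    for an ensemble of trees; assumption, not the paper's statistic)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have equal length of at least 2")
    rng = random.Random(seed)
    rx, ry = ranks(x), ranks(y)
    total = 0.0
    for _ in range(n_splits):
        # Thresholds in 1..n-1 keep both sides of each split non-empty.
        tx = rng.randint(1, n - 1)
        ty = rng.randint(1, n - 1)
        total += split_score(rx, ry, tx, ty)
    return total / n_splits
```

For example, with `x` drawn uniformly from [0, 1), a non-monotonic relationship such as `y = (x - 0.5) ** 2` should score noticeably higher than an independently drawn `y`, and applying a strictly increasing map to `x` leaves the score unchanged because the ranks are identical.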