Locally Optimized Random Forests

08/27/2019
by Tim Coleman, et al.

Standard supervised learning procedures are validated against a test set that is assumed to come from the same distribution as the training data. In many problems, however, the test data may come from a different distribution. We consider the case of having many labeled observations from one distribution, P_1, and making predictions at unlabeled points that come from another distribution, P_2. We combine the high predictive accuracy of random forests (Breiman, 2001) with an importance sampling scheme in which the splits and predictions of the base trees are computed in a weighted manner; we call the resulting method Locally Optimized Random Forests. The weights correspond to a non-parametric estimate of the likelihood ratio between the training and test distributions. To estimate these ratios with an unlabeled test set, we make the covariate shift assumption, under which the training and test distributions differ only in the distribution of the covariates (Shimodaira, 2000). This methodology is motivated by the problem of forecasting power outages during hurricanes. Because the most devastating hurricanes are extreme, typical validation setups will overly favor less extreme storms. Our method provides a data-driven means of adapting a machine learning method to deal with extreme events.
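
The weighting idea in the abstract can be illustrated with a small sketch. Under covariate shift, an average over P_2 can be rewritten as an average over P_1 in which each training point x is weighted by w(x) = p_2(x) / p_1(x). The code below is not the authors' implementation: it assumes a single covariate, swaps in a simple smoothed histogram as the non-parametric ratio estimate, and fits one weighted regression split rather than a full forest. All function names and parameters (density_ratio_histogram, weighted_mean, fit_weighted_stump, alpha, n_bins) are hypothetical and chosen for illustration only.

```python
def density_ratio_histogram(train_x, test_x, lo, hi, n_bins=10, alpha=1.0):
    """Return w(x), a histogram estimate of p_test(x) / p_train(x).

    Assumption: one covariate on [lo, hi], equal-width bins shared by both
    samples, and add-alpha smoothing so that no bin has zero mass.
    """
    width = (hi - lo) / n_bins

    def bin_index(x):
        # Clamp so points on or beyond the edges fall in the end bins.
        return min(max(int((x - lo) / width), 0), n_bins - 1)

    def smoothed_fractions(xs):
        counts = [0] * n_bins
        for x in xs:
            counts[bin_index(x)] += 1
        total = len(xs) + alpha * n_bins
        return [(c + alpha) / total for c in counts]

    p_train = smoothed_fractions(train_x)
    p_test = smoothed_fractions(test_x)
    return lambda x: p_test[bin_index(x)] / p_train[bin_index(x)]


def weighted_mean(ys, ws):
    """Weighted leaf prediction: responses averaged with importance weights."""
    return sum(w * y for y, w in zip(ys, ws)) / sum(ws)


def fit_weighted_stump(xs, ys, ws):
    """Fit one split by minimizing the weighted squared error.

    Returns (threshold, left_prediction, right_prediction).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        if xs[left[-1]] == xs[right[0]]:
            continue  # no threshold separates equal covariate values
        loss, means = 0.0, []
        for side in (left, right):
            m = weighted_mean([ys[i] for i in side], [ws[i] for i in side])
            means.append(m)
            loss += sum(ws[i] * (ys[i] - m) ** 2 for i in side)
        threshold = (xs[left[-1]] + xs[right[0]]) / 2
        if best is None or loss < best[0]:
            best = (loss, threshold, means[0], means[1])
    if best is None:
        raise ValueError("all covariate values are identical; cannot split")
    return best[1:]


# Usage: labeled training points, unlabeled test points (made-up numbers).
train_x = [0.1, 0.2, 0.3, 0.9]
train_y = [1.0, 1.5, 2.0, 6.0]
test_x = [0.1, 0.8, 0.9, 0.95]

w = density_ratio_histogram(train_x, test_x, lo=0.0, hi=1.0, n_bins=2)
weights = [w(x) for x in train_x]
stump = fit_weighted_stump(train_x, train_y, weights)
```

In this sketch, training points that fall where the test sample is dense receive weights above 1, so both the choice of split and the leaf predictions are pulled toward the region where predictions are actually needed.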

research
04/12/2019

Conformal Prediction Under Covariate Shift

We extend conformal prediction methodology beyond the case of exchangeab...
research
10/09/2022

Test-time Recalibration of Conformal Predictors Under Distribution Shift Based on Unlabeled Examples

Modern image classifiers achieve high predictive accuracy, but the predi...
research
05/04/2022

Estimation of prediction error with known covariate shift

In supervised learning, the estimation of prediction error on unlabeled ...
research
02/06/2023

Adapting to Continuous Covariate Shift via Online Density Ratio Estimation

Dealing with distribution shifts is one of the central challenges for mo...
research
09/18/2020

Sequential changepoint detection for label shift in classification

Classifier predictions often rely on the assumption that new observation...
research
09/21/2021

PKLM: A flexible MCAR test using Classification

We develop a fully non-parametric, fast, easy-to-use, and powerful test ...
research
10/08/2022

Accurate Small Models using Adaptive Sampling

We highlight the utility of a certain property of model training: instea...
