
Locally Optimized Random Forests

by Tim Coleman, et al.

Standard supervised learning procedures are validated against a test set that is assumed to come from the same distribution as the training data. In many problems, however, the test data come from a different distribution. We consider the case of having many labeled observations from one distribution, P_1, and making predictions at unlabeled points drawn from a second distribution, P_2. We combine the high predictive accuracy of random forests (Breiman, 2001) with an importance sampling scheme in which both the splits and the predictions of the base trees are computed in a weighted manner; we call the resulting method Locally Optimized Random Forests. The weights correspond to a non-parametric estimate of the likelihood ratio between the training and test distributions. To estimate these ratios from an unlabeled test set, we make the covariate shift assumption, under which the two distributions differ only in the marginal distribution of the covariates, while the conditional distribution of the response given the covariates is unchanged (Shimodaira, 2000). This methodology is motivated by the problem of forecasting power outages during hurricanes. Because the most devastating hurricanes are extreme events, typical validation setups will overly favor less extreme storms. Our method provides a data-driven means of adapting a machine learning method to handle extreme events.
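The abstract's two ingredients — a likelihood-ratio estimate between training and test covariate distributions, and a forest whose fitting is reweighted by those ratios — can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes synthetic Gaussian data as a stand-in, estimates the density ratio via a probabilistic classifier that discriminates test from training covariates (a standard trick, since p_2(x)/p_1(x) is recoverable from the class probabilities), and approximates the weighted-split idea with scikit-learn's `sample_weight`, which reweights both the split criteria and the leaf averages.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in: labeled training covariates from P_1,
# unlabeled test covariates from a mean-shifted P_2.
X_train = rng.normal(0.0, 1.0, size=(500, 2))
y_train = X_train[:, 0] ** 2 + rng.normal(0.0, 0.1, size=500)
X_test = rng.normal(1.0, 1.0, size=(200, 2))

# Density-ratio estimate p_2(x) / p_1(x): fit a classifier that
# discriminates test (label 1) from training (label 0) covariates,
# then convert its probabilities into a likelihood-ratio weight.
Z = np.vstack([X_train, X_test])
d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
clf = LogisticRegression().fit(Z, d)
p = clf.predict_proba(X_train)[:, 1]
weights = (p / (1.0 - p)) * (len(X_train) / len(X_test))

# Importance-weighted forest: sample_weight tilts splitting and
# leaf prediction toward regions favored by the test distribution.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train, sample_weight=weights)
preds = forest.predict(X_test)
```

Training points that look more like the test covariates receive weights above 1 and thus more influence on the fitted trees, which is the same effect the paper's weighted splitting scheme aims for.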



Code Repositories


Some (very rough) code implementing locally optimized random forests is available. Note that the code will not run in place.