Classification Tree Pruning Under Covariate Shift

05/07/2023
by Nicholas Galbraith, et al.

We consider the problem of pruning a classification tree, that is, selecting a suitable subtree that balances bias and variance, in common situations with inhomogeneous training data. Namely, assuming access mostly to data from a distribution P_{X,Y}, but little data from a desired distribution Q_{X,Y} with different X-marginals, we present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate. Optimality is derived with respect to a notion of average discrepancy P_X → Q_X (averaged over X space) which significantly relaxes a recent notion, termed transfer-exponent, shown to tightly capture the limits of classification under such a distribution shift. Our relaxed notion can be viewed as a measure of relative dimension between distributions, as it relates to existing notions of information such as the Minkowski and Rényi dimensions.
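
To make the setting concrete, the data model described above can be written roughly as follows. This is only a sketch: it assumes the standard covariate-shift condition (a shared conditional distribution of Y given X), which the abstract does not spell out, and the sample sizes n_P and n_Q, the initial tree T_0, the subtree relation ⪯, and the classifier h_T are illustrative symbols not taken from the original.

```latex
\[
\begin{aligned}
&\text{Source sample: } (X_i, Y_i)_{i=1}^{n_P} \sim P_{X,Y},
 \qquad
 \text{target sample: } (X_j, Y_j)_{j=1}^{n_Q} \sim Q_{X,Y},
 \qquad n_P \gg n_Q, \\
&P_X \neq Q_X,
 \qquad
 P_{Y \mid X} = Q_{Y \mid X}
 \quad \text{(assumed: standard covariate shift)}, \\
&\text{Target risk of a subtree } T \preceq T_0:\quad
 R_Q(h_T) = Q_{X,Y}\bigl(h_T(X) \neq Y\bigr), \\
&\text{Pruning goal: choose } \hat{T} \preceq T_0 \text{ so that }
 R_Q(h_{\hat{T}}) \text{ is close to } \min_{T \preceq T_0} R_Q(h_T).
\end{aligned}
\]
```

Under this reading, risk is measured under Q while most of the data comes from P, which is the mismatch the abstract refers to when it calls cross-validation inadequate.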

