Classification Tree Pruning Under Covariate Shift

05/07/2023
by Nicholas Galbraith, et al.

We consider the problem of pruning a classification tree, that is, selecting a suitable subtree that balances bias and variance, in common situations with inhomogeneous training data. Namely, assuming access to mostly data from a distribution P_{X,Y}, but little data from a desired distribution Q_{X,Y} with different X-marginals, we present the first efficient procedure for optimal pruning in such situations, where cross-validation and other penalized variants are grossly inadequate. Optimality is derived with respect to a notion of average discrepancy P_X → Q_X (averaged over X space) which significantly relaxes a recent notion, termed transfer-exponent, shown to tightly capture the limits of classification under such a distribution shift. Our relaxed notion can be viewed as a measure of relative dimension between distributions, as it relates to existing notions of information such as the Minkowski and Rényi dimensions.
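To make the setting concrete, the following is a minimal, hedged sketch of the pruning-under-covariate-shift setup described above: a tree is grown on abundant source (P) data, and a cost-complexity pruning level is then chosen using a small target (Q) sample whose X-marginal differs. This is a generic illustration of why pruning should adapt to the target distribution, not the paper's optimal procedure; the data-generating choices (Gaussian marginals, a shared linear label rule) and use of scikit-learn's cost-complexity pruning are illustrative assumptions.

```python
# Sketch only: generic target-aware pruning, NOT the paper's procedure.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def label(X):
    # Shared conditional Y|X: identical under P and Q (covariate shift only).
    return (X[:, 0] + X[:, 1] > 0).astype(int)

# Abundant source data from P_X; scarce target data from Q_X (shifted mean).
Xp = rng.normal(0.0, 1.0, size=(2000, 2)); yp = label(Xp)
Xq = rng.normal(1.5, 1.0, size=(60, 2));   yq = label(Xq)

# Grow a deep tree on P, then sweep cost-complexity pruning levels.
full = DecisionTreeClassifier(random_state=0).fit(Xp, yp)
alphas = full.cost_complexity_pruning_path(Xp, yp).ccp_alphas

# Select the pruning level by accuracy on the small Q sample, rather than
# by cross-validating on P, since the X-marginals of P and Q differ.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(Xp, yp)
     for a in alphas),
    key=lambda t: t.score(Xq, yq),
)
print(best.get_n_leaves(), round(best.score(Xq, yq), 2))
```

Note that validating on the tiny Q sample is itself high-variance; the paper's contribution is precisely an efficient procedure with guarantees in this regime, where naive cross-validation is inadequate.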
