A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

by Yan Shuo Tan, et al.

Decision trees are important both as interpretable models amenable to high-stakes decision-making, and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well understood. The most cited prior works have focused on deriving pointwise consistency guarantees for CART in a classical nonparametric regression setting. We take a different approach, and advocate studying the generalization performance of decision trees with respect to different generative regression models. This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data, thereby guiding practitioners on when and how to apply these methods. In this paper, we focus on sparse additive generative models, which have both low statistical complexity and some nonparametric flexibility. We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models with C^1 component functions. This bound is, surprisingly, much worse than the minimax rate for estimating such sparse additive models. The inefficiency is due not to greediness, but to the loss in power for detecting global structure when we average responses solely over each leaf, an observation that suggests opportunities to improve tree-based algorithms, for example, by hierarchical shrinkage. To prove these bounds, we develop new technical machinery, establishing a novel connection between decision tree estimation and rate-distortion theory, a sub-field of information theory.
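The claim that averaging responses solely within each leaf loses power relative to pooling across leaves can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the paper's method or its lower-bound construction: it assumes a hypothetical sparse additive target y = x1 + x2 + noise with an irrelevant third feature x3, replaces a fitted tree with a fixed (non-greedy) k × k grid of leaves on (x1, x2), and compares the per-leaf average with an estimator that pools each coordinate's strip of leaves. The names `true_f` and `compare_estimators` and all parameter values are illustrative choices.

```python
import random


def true_f(x1: float, x2: float) -> float:
    # Hypothetical sparse additive target: y depends on x1 and x2 only,
    # through a sum of one-dimensional (here linear, hence C^1) components.
    return x1 + x2


def compare_estimators(n: int = 2000, k: int = 8, noise_sd: float = 1.0,
                       seed: int = 0) -> tuple[float, float]:
    """Return (leaf_mse, additive_mse) on a fixed k x k partition of (x1, x2)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x1, x2, _x3 = rng.random(), rng.random(), rng.random()  # x3 is irrelevant
        samples.append((x1, x2, true_f(x1, x2) + rng.gauss(0.0, noise_sd)))

    def cell(v: float) -> int:
        # Index j of the interval [j/k, (j+1)/k) containing v in [0, 1).
        return min(int(v * k), k - 1)

    # Sums and counts per leaf, per x1-strip (row) and per x2-strip (column).
    leaf_sum = [[0.0] * k for _ in range(k)]
    leaf_cnt = [[0] * k for _ in range(k)]
    row_sum, row_cnt = [0.0] * k, [0] * k
    col_sum, col_cnt = [0.0] * k, [0] * k
    for x1, x2, y in samples:
        i, j = cell(x1), cell(x2)
        leaf_sum[i][j] += y
        leaf_cnt[i][j] += 1
        row_sum[i] += y
        row_cnt[i] += 1
        col_sum[j] += y
        col_cnt[j] += 1
    grand = sum(y for _, _, y in samples) / n

    def mean(s: float, c: int) -> float:
        # Fall back to the grand mean if a cell or strip happens to be empty.
        return s / c if c else grand

    # Approximate the squared L2 error on a uniform evaluation grid.
    m = 40
    sq_leaf = sq_add = 0.0
    for a in range(m):
        for b in range(m):
            x1, x2 = (a + 0.5) / m, (b + 0.5) / m
            i, j = cell(x1), cell(x2)
            target = true_f(x1, x2)
            # Tree-style estimate: average of the responses in this leaf only.
            leaf_pred = mean(leaf_sum[i][j], leaf_cnt[i][j])
            # Additive estimate: pool each coordinate's strip across all leaves.
            add_pred = (mean(row_sum[i], row_cnt[i])
                        + mean(col_sum[j], col_cnt[j]) - grand)
            sq_leaf += (leaf_pred - target) ** 2
            sq_add += (add_pred - target) ** 2
    return sq_leaf / m**2, sq_add / m**2


if __name__ == "__main__":
    leaf_mse, additive_mse = compare_estimators()
    print(f"leaf-only MSE: {leaf_mse:.4f}, pooled additive MSE: {additive_mse:.4f}")
```

Because the partition is fixed rather than learned, any gap between the two errors in this toy cannot come from greedy split selection; it reflects that each leaf average uses roughly n/k² samples, while each pooled strip uses roughly n/k.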


Superpolynomial Lower Bounds for Decision Tree Learning and Testing

We establish new hardness results for decision tree optimization problem...

Properly Learning Decision Trees with Queries Is NP-Hard

We prove that it is NP-hard to properly PAC learn decision trees with qu...

Adaptively Pruning Features for Boosted Decision Trees

Boosted decision trees enjoy popularity in a variety of applications; ho...

Learning stochastic decision trees

We give a quasipolynomial-time algorithm for learning stochastic decisio...

Tree Boosted Varying Coefficient Models

This paper investigates the integration of gradient boosted decision tre...

How Interpretable and Trustworthy are GAMs?

Generalized additive models (GAMs) have become a leading model class for...

Random Forests, Decision Trees, and Categorical Predictors: The "Absent Levels" Problem

One of the advantages that decision trees have over many other models is...