
Sparse learning with CART

by Jason M. Klusowski, et al.

Decision trees with binary splits are commonly constructed using the Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in sum-of-squares error (the impurity) along a particular variable. This paper studies the statistical properties of regression trees constructed with CART. We find that the training error is governed by the Pearson correlation between the optimal decision stump and the response data in each node, which we bound by constructing a prior distribution on the split points and solving a quadratic program. Leveraging this connection between training error and Pearson correlation, we show that CART with cost-complexity pruning achieves an optimal complexity/goodness-of-fit tradeoff when the depth scales with the logarithm of the sample size. Data-dependent quantities, which adapt to the dimensionality and latent structure of the regression model, are shown to govern the rates of convergence of the prediction error.
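To make the splitting criterion concrete, here is a minimal sketch (not the paper's code) of the CART step the abstract describes: along one variable, scan every candidate split point and pick the one that maximizes the reduction in sum-of-squares impurity between the parent node and its two daughter nodes. Prefix sums let each candidate be scored in constant time.

```python
import numpy as np

def best_split(x, y):
    """Return (split_point, impurity_reduction) for one variable,
    maximizing the decrease in sum-of-squares error, as in CART.
    A didactic sketch; production trees (e.g. scikit-learn's
    DecisionTreeRegressor) repeat this over all variables and nodes."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    total_sse = np.sum((ys - ys.mean()) ** 2)  # parent-node impurity
    # Prefix sums of y and y^2 give each child's SSE in O(1):
    # SSE = sum(y^2) - (sum(y))^2 / count
    csum = np.cumsum(ys)
    csq = np.cumsum(ys ** 2)
    best_point, best_gain = None, 0.0
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical feature values
        left_sse = csq[i - 1] - csum[i - 1] ** 2 / i
        right_sum = csum[-1] - csum[i - 1]
        right_sse = (csq[-1] - csq[i - 1]) - right_sum ** 2 / (n - i)
        gain = total_sse - left_sse - right_sse  # impurity reduction
        if gain > best_gain:
            best_point, best_gain = (xs[i - 1] + xs[i]) / 2, gain
    return best_point, best_gain
```

On a toy sample where the response jumps at x = 1.5, the scan recovers that split and the gain equals the full parent impurity, since both daughter nodes become perfectly homogeneous.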

