Decision tree heuristics can fail, even in the smoothed setting

07/02/2021
by Guy Blanc, et al.

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^Ω(k) before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^Ω(k) before achieving high accuracy.
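To make the class of heuristics under discussion concrete, here is a minimal sketch of a CART-style top-down learner that greedily splits on the coordinate with maximum Gini impurity reduction. The target in the demo is a 2-junta (the AND of two out of five boolean variables), the easy case where the Brutzkus et al. guarantee says such heuristics succeed with a depth-k tree. All function names here are illustrative, not taken from the paper.

```python
import itertools

def gini(labels):
    """Gini impurity of a list of +/-1 labels."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == 1) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y, used):
    """Greedy criterion: pick the unused coordinate whose split
    most reduces weighted Gini impurity."""
    base, n = gini(y), len(y)
    best, best_gain = None, 0.0
    for i in range(len(X[0])):
        if i in used:
            continue
        left = [y[j] for j in range(n) if X[j][i] == 0]
        right = [y[j] for j in range(n) if X[j][i] == 1]
        gain = base - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
        if gain > best_gain:
            best, best_gain = i, gain
    return best

def build(X, y, depth, used=frozenset()):
    """Top-down greedy tree of bounded depth; leaves predict the majority label."""
    majority = 1 if sum(y) >= 0 else -1
    if depth == 0 or gini(y) == 0.0:
        return majority
    i = best_split(X, y, used)
    if i is None:  # no split reduces impurity
        return majority
    lo = [(x, v) for x, v in zip(X, y) if x[i] == 0]
    hi = [(x, v) for x, v in zip(X, y) if x[i] == 1]
    return (i,
            build([x for x, _ in lo], [v for _, v in lo], depth - 1, used | {i}),
            build([x for x, _ in hi], [v for _, v in hi], depth - 1, used | {i}))

def predict(tree, x):
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = right if x[i] == 1 else left
    return tree

# Target: a 2-junta f(x) = 1 iff x0 AND x1, over 5 boolean variables.
X = [list(bits) for bits in itertools.product([0, 1], repeat=5)]
y = [1 if x[0] and x[1] else -1 for x in X]
tree = build(X, y, depth=2)  # depth k = 2 suffices here, matching the junta guarantee
```

On this junta target the greedy criterion locks onto the two relevant coordinates (splitting on an irrelevant variable yields zero impurity reduction), so a depth-2 tree fits the target exactly. The paper's counterexamples are targets where this same greedy criterion is misled and the tree must grow to depth 2^Ω(k) instead.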


