Universal guarantees for decision tree induction via a higher-order splitting criterion

10/16/2020
by Guy Blanc, et al.

We propose a simple extension of top-down decision tree learning heuristics such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions f : {-1,1}^n → {-1,1} with respect to the uniform distribution, circumventing impossibility results showing that existing heuristics fare poorly even for simple target functions. The crux of our extension is a new splitting criterion that takes into account the correlations between f and small subsets of its attributes. The splitting criteria of existing heuristics (e.g. Gini impurity and information gain), in contrast, are based solely on the correlations between f and its individual attributes. Our algorithm satisfies the following guarantee: for all target functions f : {-1,1}^n → {-1,1}, sizes s ∈ ℕ, and error parameters ϵ, it constructs a decision tree of size s^Õ((log s)^2/ϵ^2) that achieves error ≤ O(𝗈𝗉𝗍_s) + ϵ, where 𝗈𝗉𝗍_s denotes the error of the optimal size-s decision tree. A key technical notion that drives our analysis is the noise stability of f, a well-studied smoothness measure.
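To make the contrast concrete, here is a minimal sketch of the two kinds of splitting scores for Boolean data in {-1,1}. The first-order score mirrors what Gini impurity and information gain measure: the correlation between f and each individual attribute. The higher-order score additionally credits each attribute with its (empirical) degree-2 Fourier correlations, i.e. correlations between f and pairs of attributes. The function names and the restriction to pairs are illustrative assumptions for this sketch; the paper's exact criterion differs.

```python
import numpy as np

def single_attr_scores(X, y):
    # First-order criterion (Gini/information-gain style):
    # |E[f(x) * x_i]| estimated per attribute i.
    # X has shape (m, n) with entries in {-1, 1}; y has shape (m,) in {-1, 1}.
    return np.abs((X * y[:, None]).mean(axis=0))

def pairwise_scores(X, y):
    # Hedged sketch of a "higher-order" criterion: credit attribute i
    # with its correlations to f via small subsets of attributes,
    # here just singletons {i} and pairs {i, j}.
    # (Illustrative only -- not the paper's exact criterion.)
    n = X.shape[1]
    first = (X * y[:, None]).mean(axis=0)       # degree-1 Fourier estimates
    score = first ** 2
    for i in range(n):
        for j in range(i + 1, n):
            c = (X[:, i] * X[:, j] * y).mean()  # degree-2 estimate for {i, j}
            score[i] += c ** 2
            score[j] += c ** 2
    return score

# Why this matters: for f(x) = x_1 * x_2 (XOR in the {-1,1} encoding),
# every single-attribute correlation is 0, so first-order criteria see
# nothing to split on, while the pairwise score singles out x_1 and x_2.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])  # full truth table on 2 bits
y = X[:, 0] * X[:, 1]
print(single_attr_scores(X, y))  # both scores are 0
print(pairwise_scores(X, y))     # both scores are 1
```

This is exactly the kind of target on which the impossibility results mentioned above bite: a first-order heuristic has no signal until a lucky split is made, whereas a criterion that inspects small subsets of attributes detects the dependence immediately.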

Related research

- 06/01/2020 · Provable guarantees for decision tree induction: the agnostic setting — We give strengthened provable guarantees on the performance of widely em...
- 11/18/2019 · Top-down induction of decision trees: rigorous guarantees and inherent limitations — Consider the following heuristic for building a decision tree for a func...
- 11/03/2020 · Estimating decision tree learnability with polylogarithmic sample complexity — We show that top-down decision tree learning heuristics are amenable to ...
- 06/17/2022 · Popular decision tree algorithms are provably noise tolerant — Using the framework of boosting, we prove that all impurity-based decisi...
- 04/12/2016 · Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data — Decision tree classifiers are a widely used tool in data stream mining. ...
- 07/03/2023 · Systematic Bias in Sample Inference and its Effect on Machine Learning — A commonly observed pattern in machine learning models is an underpredic...
- 08/23/2022 · Regularized impurity reduction: Accurate decision trees with complexity guarantees — Decision trees are popular classification models, providing high accurac...
