Decision trees as partitioning machines to characterize their generalization properties

10/14/2020
by Jean-Samuel Leboeuf, et al.

Decision trees are popular machine learning models that are simple to build and easy to interpret. Even though algorithms to learn decision trees date back almost 50 years, key properties affecting their generalization error are still weakly bounded. Hence, we revisit binary decision trees on real-valued features from the perspective of partitions of the data. We introduce the notion of partitioning function, and we relate it to the growth function and to the VC dimension. Using this new concept, we are able to find the exact VC dimension of decision stumps, which is given by the largest integer d such that 2ℓ ≥ C(d, ⌊d/2⌋), where C(d, ⌊d/2⌋) is the binomial coefficient "d choose ⌊d/2⌋" and ℓ is the number of real-valued features. We provide a recursive expression to bound the partitioning functions, resulting in an upper bound on the growth function of any decision tree structure. This allows us to show that the VC dimension of a binary tree structure with N internal nodes is of order N log(Nℓ). Finally, we develop a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets, with the advantage that no cross-validation is required.
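
To make the decision-stump result concrete, the following is a minimal sketch that evaluates the stated condition numerically. It assumes the right-hand side of the inequality is the binomial coefficient "d choose ⌊d/2⌋" (our reading of the formula); the function name `stump_vc_dimension` is hypothetical and does not come from the paper.

```python
from math import comb


def stump_vc_dimension(num_features: int) -> int:
    """Largest integer d with 2 * num_features >= comb(d, d // 2).

    Assumption: the bound in the abstract is read as the binomial
    coefficient "d choose floor(d/2)". The name is illustrative only.
    """
    if num_features < 1:
        raise ValueError("num_features must be a positive integer")
    d = 1  # comb(1, 0) == 1 <= 2 * num_features always holds
    # comb(d, d // 2) is non-decreasing in d, so a linear scan finds the maximum.
    while comb(d + 1, (d + 1) // 2) <= 2 * num_features:
        d += 1
    return d


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 10, 100):
        print(n, stump_vc_dimension(n))
```

Under this reading, a stump on a single feature has VC dimension 2, and the value grows only logarithmically with the number of features, since the binomial coefficient grows exponentially in d.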


research
10/18/2022

Generalization Properties of Decision Trees on Real-valued and Categorical Features

We revisit binary decision trees from the perspective of partitions of t...
research
02/03/2020

Evolutionary algorithms for constructing an ensemble of decision trees

Most decision tree induction algorithms are based on a greedy top-down r...
research
07/28/2016

VHT: Vertical Hoeffding Tree

IoT Big Data requires new machine learning methods able to scale to larg...
research
01/04/2022

Time and space complexity of deterministic and nondeterministic decision trees

In this paper, we study arbitrary infinite binary information systems ea...
research
06/19/2020

When Is Amplification Necessary for Composition in Randomized Query Complexity?

Suppose we have randomized decision trees for an outer function f and an...
research
02/16/2023

On marginal feature attributions of tree-based models

Due to their power and ease of use, tree-based machine learning models h...
research
06/14/2021

Discovering Interpretable Machine Learning Models in Parallel Coordinates

This paper contributes to interpretable machine learning via visual know...
