Regularized impurity reduction: Accurate decision trees with complexity guarantees

08/23/2022
by Guangyi Zhang, et al.

Decision trees are popular classification models, providing high accuracy and intuitive explanations. However, as the tree size grows the model interpretability deteriorates. Traditional tree-induction algorithms, such as C4.5 and CART, rely on impurity-reduction functions that promote the discriminative power of each split. Thus, although these traditional methods are accurate in practice, there has been no theoretical guarantee that they will produce small trees. In this paper, we justify the use of a general family of impurity functions, including the popular functions of entropy and Gini-index, in scenarios where small trees are desirable, by showing that a simple enhancement can equip them with complexity guarantees. We consider a general setting, where objects to be classified are drawn from an arbitrary probability distribution, classification can be binary or multi-class, and splitting tests are associated with non-uniform costs. As a measure of tree complexity, we adopt the expected cost to classify an object drawn from the input distribution, which, in the uniform-cost case, is the expected number of tests. We propose a tree-induction algorithm that gives a logarithmic approximation guarantee on the tree complexity. This approximation factor is tight up to a constant factor under mild assumptions. The algorithm recursively selects a test that maximizes a greedy criterion defined as a weighted sum of three components. The first two components encourage the selection of tests that improve the balance and the cost-efficiency of the tree, respectively, while the third impurity-reduction component encourages the selection of more discriminative tests. As shown in our empirical evaluation, compared to the original heuristics, the enhanced algorithms strike an excellent balance between predictive accuracy and tree complexity.
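To make the greedy criterion concrete, below is a minimal Python sketch of selecting a split test by maximizing a weighted sum of a balance term, a cost-efficiency term, and an impurity-reduction term. The specific component formulas, the weights alpha/beta/gamma, and the helper names (gini, greedy_score, best_test) are illustrative assumptions for exposition, not the exact definitions from the paper.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array (one of the impurity functions the paper covers)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def greedy_score(y, left_mask, cost, alpha, beta, gamma):
    """Weighted sum of three components for one candidate binary test.

    - balance: favors tests that split the objects evenly
    - cost-efficiency: favors cheap tests (here simply the inverse test cost)
    - impurity reduction: favors discriminative tests (here Gini reduction)
    These concrete definitions are assumptions made for this sketch.
    """
    n = len(y)
    n_left = int(left_mask.sum())
    n_right = n - n_left
    balance = min(n_left, n_right) / n                 # in [0, 0.5]
    cost_efficiency = 1.0 / cost                       # cheaper tests score higher
    parent_imp = gini(y)
    child_imp = (n_left * gini(y[left_mask]) + n_right * gini(y[~left_mask])) / n
    impurity_reduction = parent_imp - child_imp
    return alpha * balance + beta * cost_efficiency + gamma * impurity_reduction

def best_test(X, y, costs, alpha=1.0, beta=1.0, gamma=1.0):
    """Return the (feature, threshold, score) test maximizing the greedy criterion."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # skip degenerate splits
            s = greedy_score(y, left, costs[j], alpha, beta, gamma)
            if s > best[2]:
                best = (j, t, s)
    return best
```

Recursing on the two subsets induced by the selected test, until a leaf is pure or no useful test remains, yields a standard top-down tree-induction loop in which this criterion replaces plain impurity reduction.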
