Unifying Decision Trees Split Criteria Using Tsallis Entropy

11/25/2015
by Yisen Wang, et al.

The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. ID3, C4.5 and CART are classical decision tree algorithms, and their split criteria are Shannon entropy, Gain Ratio and the Gini index, respectively. Although these split criteria appear independent, they can in fact be unified in a Tsallis entropy framework. Tsallis entropy is a generalization of Shannon entropy and provides a new way to enhance decision trees' performance through an adjustable parameter q. In this paper, a Tsallis Entropy Criterion (TEC) algorithm is proposed to unify Shannon entropy, Gain Ratio and the Gini index, generalizing the split criteria of decision trees. More importantly, we reveal the relations between Tsallis entropy with different values of q and the other split criteria. Experimental results on UCI data sets indicate that the TEC algorithm achieves statistically significant improvements over the classical algorithms.
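The abstract's central claim, that Shannon entropy and the Gini index are special cases of Tsallis entropy, can be checked numerically. The sketch below is illustrative only (it is not the paper's TEC implementation): Tsallis entropy is S_q(p) = (1 - Σ p_i^q) / (q - 1), which tends to Shannon entropy (natural log) as q → 1 and equals the Gini index at q = 2.

```python
import math

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    As q -> 1 this reduces to Shannon entropy (natural log);
    at q = 2 it equals the Gini index 1 - sum_i p_i^2.
    """
    if q == 1.0:
        # Limiting case: Shannon entropy
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

# Example class distribution at a candidate split node
probs = [0.5, 0.3, 0.2]

shannon = -sum(p * math.log(p) for p in probs)
gini = 1.0 - sum(p * p for p in probs)

# q near 1 approaches Shannon entropy
assert abs(tsallis_entropy(probs, 1.000001) - shannon) < 1e-4
# q = 2 recovers the Gini index exactly
assert abs(tsallis_entropy(probs, 2.0) - gini) < 1e-12
```

In a TEC-style tree, q would be treated as a tunable hyperparameter selected (e.g., by cross-validation) rather than fixed at one of these special values.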


