Cost-sensitive C4.5 with post-pruning and competition

11/17/2012
by Zilong Xu, et al.

Decision trees are an effective classification approach in data mining and machine learning. In applications, both test costs and misclassification costs should be considered when inducing decision trees. Recently, cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, and λ-ID3, have been proposed to address this issue. These algorithms handle only symbolic data. In this paper, we develop a decision tree algorithm inspired by C4.5 for numeric data. Our algorithm addresses two major issues. First, we develop the test-cost-weighted information gain ratio as the heuristic information. Guided by this heuristic, our algorithm picks, at each selection, the attribute that provides a higher gain ratio at a lower test cost. Second, we design a post-pruning strategy that considers the tradeoff between the test costs and misclassification costs of the generated decision tree, thereby reducing the total cost. Experimental results indicate that (1) our algorithm is stable and effective; (2) the post-pruning technique reduces the total cost significantly; and (3) the competition strategy is effective in obtaining a cost-sensitive decision tree with low cost.
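
The two ideas in the abstract can be illustrated with a small sketch. The abstract does not state the weighting formula or the pruning criterion, so both are assumptions here: the heuristic is taken to be the C4.5 gain ratio multiplied by test_cost ** lam with lam <= 0, and a split is pruned when a single leaf has an average total cost no higher than that of the split. All function names, the parameter lam, and the uniform misclassification cost are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """C4.5-style gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return gain / split_info


def weighted_gain_ratio(values, labels, threshold, test_cost, lam=-1.0):
    """ASSUMED weighting (not given in the abstract): gain ratio times
    test_cost ** lam, with lam <= 0 so that costlier tests score lower.
    test_cost must be positive."""
    return gain_ratio(values, labels, threshold) * (test_cost ** lam)


def leaf_cost(labels, mis_cost):
    """Average misclassification cost of a leaf that predicts the majority
    class, assuming one uniform cost per error."""
    majority = Counter(labels).most_common(1)[0][1]
    return (len(labels) - majority) / len(labels) * mis_cost


def should_prune(left, right, test_cost, mis_cost):
    """ASSUMED criterion: prune when replacing the split by a single leaf
    does not increase the average total cost (test + misclassification)."""
    n = len(left) + len(right)
    children = (len(left) * leaf_cost(left, mis_cost)
                + len(right) * leaf_cost(right, mis_cost)) / n
    return leaf_cost(left + right, mis_cost) <= test_cost + children


# Hypothetical numeric attribute with a perfectly separating threshold.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
cheap = weighted_gain_ratio(values, labels, threshold=2.0, test_cost=1.0)
costly = weighted_gain_ratio(values, labels, threshold=2.0, test_cost=4.0)
```

Under these assumptions, the same split scores lower when its test is more expensive, and a test is pruned away when its cost exceeds the misclassification cost it saves.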

research
09/24/2015

CRDT: Correlation Ratio Based Decision Tree Model for Healthcare Data Mining

The phenomenal growth in the healthcare data has inspired us in investig...
research
11/12/2012

Minimal cost feature selection of data with normal distribution measurement errors

Minimal cost feature selection is devoted to obtain a trade-off between ...
research
01/25/2018

Information gain ratio correction: Improving prediction with more balanced decision tree splits

Decision trees algorithms use a gain function to select the best split d...
research
04/12/2016

Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data

Decision tree classifiers are a widely used tool in data stream mining. ...
research
06/03/2011

An Analysis of Reduced Error Pruning

Top-down induction of decision trees has been observed to suffer from th...
research
06/18/2017

Data set operations to hide decision tree rules

This paper focuses on preserving the privacy of sensitive patterns when ...
research
01/24/2023

A Robust Hypothesis Test for Tree Ensemble Pruning

Gradient boosted decision trees are some of the most popular algorithms ...
