Cost-sensitive C4.5 with post-pruning and competition

by Zilong Xu, et al.
NetEase, Inc

Decision trees are an effective classification approach in data mining and machine learning. In applications, test costs and misclassification costs should be considered while inducing decision trees. Recently, several cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, and λ-ID3, have been proposed to deal with this issue. These algorithms handle only symbolic data. In this paper, we develop a decision tree algorithm inspired by C4.5 for numeric data. Our algorithm addresses two major issues. First, we develop the test cost weighted information gain ratio as the heuristic information. Guided by this heuristic, the algorithm picks, at each selection, the attribute that provides a higher gain ratio at a lower test cost. Second, we design a post-pruning strategy that considers the tradeoff between the test costs and the misclassification costs of the generated decision tree. In this way, the total cost is reduced. Experimental results indicate that (1) our algorithm is stable and effective; (2) the post-pruning technique reduces the total cost significantly; (3) the competition strategy is effective in obtaining a cost-sensitive decision tree with low cost.
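The abstract does not give the exact form of the test cost weighted information gain ratio, so the following is only a minimal sketch of the idea under stated assumptions. It computes the standard C4.5 gain ratio for a binary split on a numeric attribute and then discounts it by the attribute's test cost. The weighting `gain_ratio / (1 + test_cost) ** lam`, the parameter `lam`, and all function names are hypothetical choices made for illustration, not the paper's definition.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Standard C4.5 gain ratio for the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return gain / split_info


def weighted_gain_ratio(values, labels, threshold, test_cost, lam=1.0):
    """ASSUMED weighting: discount the gain ratio by (1 + test_cost) ** lam.

    With lam = 0 this reduces to the plain gain ratio.
    """
    return gain_ratio(values, labels, threshold) / (1.0 + test_cost) ** lam


def pick_attribute(columns, labels, test_costs, lam=1.0):
    """Return (attribute, threshold, score) with the best weighted score.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    best = (None, None, -1.0)
    for name, values in columns.items():
        distinct = sorted(set(values))
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            score = weighted_gain_ratio(
                values, labels, threshold, test_costs[name], lam
            )
            if score > best[2]:
                best = (name, threshold, score)
    return best


# Two attributes that separate the classes equally well but differ in cost.
columns = {"expensive": [1, 2, 3, 4], "cheap": [10, 20, 30, 40]}
labels = ["a", "a", "b", "b"]
test_costs = {"expensive": 9.0, "cheap": 1.0}
print(pick_attribute(columns, labels, test_costs))
```

In this toy data both attributes achieve the same gain ratio, so the cost term alone decides the choice and the cheaper attribute is selected. Raising `lam` makes the selection more cost-averse, which is one plausible way to trade accuracy against test cost.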

CRDT: Correlation Ratio Based Decision Tree Model for Healthcare Data Mining

The phenomenal growth in the healthcare data has inspired us in investig...

Minimal cost feature selection of data with normal distribution measurement errors

Minimal cost feature selection is devoted to obtain a trade-off between ...

Information gain ratio correction: Improving prediction with more balanced decision tree splits

Decision trees algorithms use a gain function to select the best split d...

Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data

Decision tree classifiers are a widely used tool in data stream mining. ...

An Analysis of Reduced Error Pruning

Top-down induction of decision trees has been observed to suffer from th...

Data set operations to hide decision tree rules

This paper focuses on preserving the privacy of sensitive patterns when ...

On Using Linear Diophantine Equations to Tune the extent of Look Ahead while Hiding Decision Tree Rules

This paper focuses on preserving the privacy of sensitive patterns when...