WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning
Machine learning classifiers often stumble over imbalanced datasets, where the classes are not equally represented. This inherent bias toward the majority class can result in low accuracy when labeling the minority class. Imbalanced learning is prevalent in many real-world applications, such as medical research, network intrusion detection, and fraud detection in credit card transactions. A substantial body of research has tackled this challenging problem. For example, SMOTE (Synthetic Minority Over-sampling TEchnique) and ADASYN (ADAptive SYNthetic sampling approach) use oversampling to balance skewed datasets. In this paper, we propose a novel method, WOTBoost, which combines a Weighted Oversampling Technique with ensemble boosting to improve classification accuracy on the minority class without sacrificing accuracy on the majority class. WOTBoost adjusts its oversampling strategy at each round of boosting to synthesize more targeted minority samples; the adjustment is enforced using a weighted distribution. We compared WOTBoost extensively with 4 other classification models (decision tree, SMOTE + decision tree, ADASYN + decision tree, and SMOTEBoost) on 18 publicly accessible imbalanced datasets. WOTBoost achieved the best G-mean on 6 datasets and the highest AUC score on 7 datasets.
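To make the core idea concrete, the sketch below illustrates a boosting loop that oversamples the minority class at each round in proportion to the current boosting weights. This is a minimal illustration under stated assumptions, not the authors' implementation: the names (wotboost_fit, smote_like) are hypothetical, the minority class is assumed to be labeled 1, and a standard SMOTE-style interpolation plus AdaBoost-style weight update stand in for the paper's exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, weights, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples; minority points carrying
    larger boosting weights are used as interpolation seeds more often."""
    rng = rng if rng is not None else np.random.default_rng(0)
    if n_new <= 0 or len(X_min) < 2:
        return np.empty((0, X_min.shape[1]))
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    probs = weights / weights.sum()
    idx = rng.choice(len(X_min), size=n_new, p=probs)
    _, nbrs = nn.kneighbors(X_min[idx])
    synth = []
    for i, row in zip(idx, nbrs):
        j = rng.choice(row[1:])  # random minority neighbor (skip self)
        synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(synth)

def wotboost_fit(X, y, T=10):
    """AdaBoost-style loop; each round oversamples the minority class
    according to the current boosting distribution before fitting."""
    n = len(X)
    D = np.full(n, 1.0 / n)  # boosting distribution over training samples
    learners, alphas = [], []
    rng = np.random.default_rng(0)
    for _ in range(T):
        min_mask = y == 1
        n_new = int((~min_mask).sum() - min_mask.sum())  # rebalance classes
        X_syn = smote_like(X[min_mask], D[min_mask], n_new, rng=rng)
        X_t = np.vstack([X, X_syn])
        y_t = np.concatenate([y, np.ones(len(X_syn), dtype=y.dtype)])
        clf = DecisionTreeClassifier(max_depth=3).fit(X_t, y_t)
        pred = clf.predict(X)
        err = max(D[pred != y].sum(), 1e-10)
        if err >= 0.5:  # weak learner no better than chance; stop
            break
        alpha = 0.5 * np.log((1 - err) / err)
        D *= np.exp(alpha * np.where(pred != y, 1.0, -1.0))
        D /= D.sum()
        learners.append(clf)
        alphas.append(alpha)
    return learners, np.array(alphas)

def wotboost_predict(X, learners, alphas):
    votes = sum(a * np.where(c.predict(X) == 1, 1.0, -1.0)
                for c, a in zip(learners, alphas))
    return (votes > 0).astype(int)
```

The "weighted" part lives in smote_like: rather than picking minority seeds uniformly as plain SMOTE does, seeds are drawn in proportion to their current boosting weights, so hard-to-classify minority points receive more synthetic neighbors in each round.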