WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning

10/17/2019
by Wenhao Zhang, et al.

Machine learning classifiers often stumble over imbalanced datasets, where classes are not equally represented. This inherent bias towards the majority class may result in low accuracy when labeling the minority class. Imbalanced learning is prevalent in many real-world applications, such as medical research, network intrusion detection, and credit card fraud detection. Numerous research works have been reported to tackle this challenging problem. For example, SMOTE (Synthetic Minority Over-sampling TEchnique) and ADASYN (ADAptive SYNthetic sampling approach) use oversampling techniques to balance skewed datasets. In this paper, we propose a novel method that combines a Weighted Oversampling Technique with the ensemble Boosting method to improve the classification accuracy on minority data without sacrificing the accuracy on the majority class. WOTBoost adjusts its oversampling strategy at each round of boosting to synthesize more targeted minority data samples. The adjustment is enforced using a weighted distribution. We extensively compared WOTBoost with four other classification models (decision tree, SMOTE + decision tree, ADASYN + decision tree, and SMOTEBoost) on 18 publicly accessible imbalanced datasets. WOTBoost achieved the best G-mean on 6 datasets and the highest AUC score on 7 datasets.
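The abstract describes the weighted oversampling only at a high level, so the following is a hypothetical sketch, not the authors' algorithm. It assumes the "weighted distribution" is the per-sample boosting weight of each minority point, and uses those weights to choose SMOTE seeds so that minority points the ensemble currently handles poorly receive more synthetic neighbours. The function names `weighted_smote` and `g_mean` are illustrative. The sketch also shows the commonly used definition of G-mean (the geometric mean of sensitivity and specificity), one of the two metrics reported above.

```python
import math
import random
from typing import List, Optional, Sequence

Point = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_smote(minority: List[Point], weights: List[float],
                   n_synthetic: int, k: int = 5,
                   rng: Optional[random.Random] = None) -> List[Point]:
    """Generate SMOTE-style synthetic minority samples (hypothetical sketch).

    Seeds are drawn with probability proportional to `weights`, assumed here
    to be the current boosting weights of the minority samples.
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a seed index in proportion to its (boosting) weight.
        i = rng.choices(range(len(minority)), weights=weights, k=1)[0]
        seed = minority[i]
        # k nearest minority neighbours of the seed, excluding the seed itself.
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: _sq_dist(seed, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        # Classic SMOTE step: a random point on the segment seed -> neighbour.
        gap = rng.random()
        synthetic.append([s + gap * (n - s) for s, n in zip(seed, nb)])
    return synthetic


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Geometric mean of sensitivity (minority recall) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

For example, with `minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` and `weights = [0.1, 0.1, 0.1, 0.7]`, most synthetic points are seeded from `[1.0, 1.0]`. In a full boosting loop, one would presumably recompute the weights after each round and call the oversampler again before fitting the next weak learner; how WOTBoost actually couples the two is described in the full paper.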
