In supervised learning, machine learning is the process of classifying new or unknown samples using algorithms trained on a set of labeled instances [1, 2, 3, 4]. Real-world datasets are often high-dimensional, multi-class, and imbalanced, and typical machine learning algorithms often fail to achieve good classification accuracy on them. There are two types of methods for dealing with imbalanced datasets: (a) internal and (b) external. Internal methods modify an existing algorithm to reduce its sensitivity to the imbalance ratio of the dataset. External methods apply data balancing techniques to reduce the imbalance ratio itself.
There are primarily two types of sampling methods for modifying the original distribution of a dataset: over-sampling and under-sampling. Under-sampling removes instances from the majority class, either based on some heuristic or at random. Mainstream under-sampling methods include the neighborhood cleaning rule, NearMiss, clustered under-sampling [7, 8], and one-sided selection. Over-sampling works in the opposite manner: instead of removing instances from the majority class, it generates new minority class samples from the minority class itself, either randomly or using techniques such as ADASYN or SMOTE. Both approaches have drawbacks. Under-sampling risks losing informative data because it removes samples from the majority class. Over-sampling, on the other hand, generates samples from the minority class, which creates a risk of over-fitting.
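As a minimal illustration of random under-sampling (an illustrative sketch, not the implementation used in any of the cited works; the function name `random_under_sample` is our own), majority-class rows can be dropped at random until the classes are balanced:

```python
import numpy as np

def random_under_sample(X, y, rng=None):
    """Randomly drop majority-class rows until all classes are balanced."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_min = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if c != minority:
            # keep only a random subset of the larger class
            idx = rng.choice(idx, size=n_min, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# toy imbalanced data: 9 majority (class 0) vs 3 minority (class 1)
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 9 + [1] * 3)
Xb, yb = random_under_sample(X, y, rng=0)
print(np.bincount(yb))  # balanced: [3 3]
```

The information loss mentioned above is visible here: six of the nine majority instances are discarded regardless of how informative they are.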
In imbalanced datasets, the minority class instances are heavily outnumbered, even though the concept they represent is usually more important than that of the majority class. Traditional data mining algorithms such as k-nearest neighbors [3, 13, 14] usually try to maximize the overall classification accuracy while ignoring the misclassification cost of the minority class. Various cost-sensitive methods have been proposed to deal with the class imbalance problem. Cost-sensitive learning assigns a different misclassification cost to each class; the goal is to set the costs so that misclassifying the minority class is expensive while misclassifying the majority class is cheap. However, stable classification is hard to achieve with cost-sensitive methods because it is very difficult to set the correct misclassification cost for each class. Ensemble models such as bagging and boosting are typically used for imbalanced classification [10, 15]. An ensemble classifier combines the hypotheses of multiple learners to improve on the performance of any individual classifier.
In this paper, we present a new boosting algorithm called MEBoost, which mixes two weak estimators alternately on the training set. As weak estimators, we use the decision tree and extra tree classifiers. In this way we take advantage of both learners while avoiding the limitations of using a single base classifier in a boosting model. We tested the performance of MEBoost against state-of-the-art boosting classifiers such as AdaBoost, RUSBoost, SMOTEBoost, DataBoost, EUSBoost, and Easy Ensemble on 12 standard benchmark imbalanced datasets. The experimental results validate that using two learners, a decision tree and an extra tree classifier, alternately within AdaBoost significantly improves over the performance of the other algorithms and is a promising technique for handling the class imbalance problem.
II Related Work
Throughout the last decade, various sampling methods and ensemble methods based on bagging and boosting have been the prime focus for dealing with classification on class-imbalanced datasets. Fig. 1 depicts a general sketch of applying a boosting algorithm to a classification problem.
Several ensemble methods have been proposed in the literature to handle imbalanced datasets [10, 15]. Sun et al. proposed an ensemble method that converts an imbalanced binary-class problem into multiple learning processes. The method divides the majority class instances into several subsets, each holding roughly the same number of instances as the minority class. Several balanced datasets are thereby created, and a binary classifier is trained on each. A combination of those classifiers is then used to form an ensemble classifier.
Boosting is a meta-classifier that combines the predictions of multiple base estimators using a weighted voting technique. It assigns weights to instances based on how hard they are to classify, setting high weights on hard instances. The weight distribution produced by each estimator is passed on to the next estimator, and each base learner is itself assigned a weight based on its predictive accuracy; these learner weights are taken into consideration when predicting new instances. Though boosting was not designed for the class imbalance problem, this characteristic makes it quite well suited to it.
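The instance re-weighting step can be sketched as follows (a minimal AdaBoost-style update in NumPy for binary labels in {-1, +1}; the function name `adaboost_reweight` is ours, not from the paper):

```python
import numpy as np

def adaboost_reweight(w, y_true, y_pred):
    """One boosting round: up-weight misclassified instances.

    w: current instance weights (sums to 1). Returns (new_w, alpha),
    where alpha is the learner's vote weight in the final ensemble.
    """
    miss = (y_true != y_pred)
    err = np.sum(w[miss])                   # weighted error of this learner
    alpha = 0.5 * np.log((1 - err) / err)   # better learner -> larger vote
    # multiply weights up for mistakes, down for correct predictions
    w = w * np.exp(alpha * np.where(miss, 1.0, -1.0))
    return w / w.sum(), alpha

w = np.full(4, 0.25)
y_true = np.array([1, 1, -1, -1])
y_pred = np.array([1, -1, -1, -1])          # one mistake (index 1)
w2, alpha = adaboost_reweight(w, y_true, y_pred)
print(w2)  # the misclassified instance now carries weight 0.5
```

After this update the next estimator concentrates on the instances the previous one got wrong, which is exactly why hard (often minority-class) examples accumulate weight.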
RUSBoost is a hybrid boosting algorithm that uses AdaBoost with random under-sampling as the sampling method: in each iteration, random under-sampling randomly removes instances from the majority class of the imbalanced data. Similarly, SMOTEBoost combines AdaBoost with an over-sampling technique called SMOTE. It over-samples the minority class by generating synthetic instances in feature space from the nearest neighbors of minority class samples. These methods, which apply sampling inside AdaBoost, showed impressive performance in terms of area under the Receiver Operating Characteristic (ROC) curve. Blagus and Lusa investigated the behavior of SMOTEBoost on high-dimensional imbalanced datasets, i.e., datasets with more features than instances. They concluded that, as SMOTE biases the classifier towards the minority class, feature selection is necessary.
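The SMOTE idea of "operating in feature space" can be illustrated with a minimal SMOTE-style sketch (not the reference implementation; the function name `smote_like` and its parameters are our own): a synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, rng=0):
    """Generate n_new synthetic minority samples by interpolating
    between minority points and their k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))         # pick a minority sample
        j = idx[i, rng.integers(1, k + 1)]   # one of its true neighbors
        gap = rng.random()                   # random position on the segment
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

X_min = np.random.default_rng(1).normal(size=(10, 2))  # toy minority class
synth = smote_like(X_min, n_new=5)
```

Because every synthetic point lies between existing minority points, the method densifies the minority region, which is the source of both its benefit and the over-fitting risk noted earlier.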
De Souza et al. proposed a dynamic AdaBoost algorithm in which 10 different estimators are used alternately across iterations. Since AdaBoost keeps the better estimators and discards those with high error, offering 10 different estimators alternately relieves the user of the burden of choosing a learner. However, this algorithm does not follow the weak learner concept, as it uses estimators such as random forest, SVM, and neural networks, which also makes it computationally expensive. Galar et al. proposed an evolutionary ensemble boosting algorithm called EUSBoost, which uses an evolutionary under-sampling method. It generates several sub-datasets by random under-sampling in order to find the best under-sampled version of the original dataset. EUSBoost is also built on the AdaBoost algorithm.
The DataBoost (DataBoost-IM) method was presented by Hongyu Guo, who proposed an ensemble model that uses data generation. In this algorithm, hard majority and minority class instances are identified during the execution of boosting; those hard examples are then selected separately, per class, and used to create synthetic instances, which are added back to the main dataset. Easy Ensemble is an ensemble method proposed by Xu-Ying Liu et al. It creates several subsets of majority class instances using random under-sampling and trains a learner on each. By creating several sub-datasets, it overcomes the main limitation of random under-sampling, namely that it discards majority class instances randomly regardless of their importance.
III MEBoost Algorithm
Most of the boosting algorithms discussed in Section II use a single weak estimator to create the ensemble model. In our proposed method, MEBoost, instead of using a single estimator we use two different estimators alternately: in each iteration, either a decision tree or an extra tree classifier is used as the learner. By doing this the algorithm takes the benefits of both classifiers, while learners with poor performance are discarded by the design of the boosting procedure. MEBoost does not perform any sampling on the training set. Decision trees are built from the training set using information entropy. The training data consists of classified samples, each a vector (x1, x2, ..., xn) whose components are the values of the sample's features/attributes. For each node, an attribute is chosen such that it splits the training dataset into subsets of each class most effectively. Extra tree is a randomized tree classification algorithm: while looking for the best split to separate the instances of a node into groups, it draws a random split for each of a number of randomly selected features and chooses the best among them. An extra tree classifier acts like a decision tree if the number of randomly selected features is 1.
The pseudo-code of our proposed method MEBoost is given in Algorithm 1. It is a modification of the basic AdaBoost classifier. The number of base classifiers is not restricted. At each iteration, MEBoost tests the weak estimator just learned and discards it if its error rate is greater than or equal to 0.5, i.e., if it fails to be a weak classifier. The meta-classifier is evaluated on held-out test data, and the best combination according to the auROC score is stored. Weak learners are added to the model until there is no significant change in the auROC on the test data.
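The loop described above can be sketched as follows. This is an illustrative reimplementation under our own assumptions (scikit-learn stumps, binary 0/1 labels, a `patience` parameter standing in for the stagnation window), not the authors' code:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.metrics import roc_auc_score

def meboost_fit(X_tr, y_tr, X_te, y_te, max_iter=50, patience=10):
    """AdaBoost-style loop that alternates decision-tree and extra-tree
    learners, keeps the ensemble length with the best test-set auROC,
    and stops after `patience` rounds without improvement."""
    bases = [DecisionTreeClassifier(max_depth=1),
             ExtraTreeClassifier(max_depth=1, random_state=0)]
    w = np.full(len(y_tr), 1.0 / len(y_tr))     # instance weights
    y_pm = np.where(y_tr == 1, 1.0, -1.0)       # labels in {-1, +1}
    learners, alphas = [], []
    best_auc, best_len, stagnant = 0.0, 0, 0
    for t in range(max_iter):
        clf = clone(bases[t % 2])               # alternate the two learners
        clf.fit(X_tr, y_tr, sample_weight=w)
        pred_pm = np.where(clf.predict(X_tr) == 1, 1.0, -1.0)
        err = w[pred_pm != y_pm].sum()
        if err >= 0.5:                          # discard non-weak learners
            continue
        err = max(err, 1e-10)                   # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # learner's vote weight
        w *= np.exp(-alpha * y_pm * pred_pm)    # re-weight instances
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
        # weighted-vote score of the current ensemble on held-out data
        score = sum(a * np.where(c.predict(X_te) == 1, 1.0, -1.0)
                    for a, c in zip(alphas, learners))
        auc = roc_auc_score(y_te, score)
        if auc > best_auc:
            best_auc, best_len, stagnant = auc, len(learners), 0
        else:
            stagnant += 1
            if stagnant >= patience:            # stagnation window reached
                break
    return learners[:best_len], alphas[:best_len], best_auc

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
learners, alphas, auc = meboost_fit(X_tr, y_tr, X_te, y_te)
```

Note that the returned ensemble is truncated to `best_len`, so estimators added after the best auROC was observed are discarded, matching the "store the best combination" step.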
The intuition behind this idea is that tree algorithms like the extra tree and decision tree are usually well suited to the boosting scheme because of their instability. On a particular dataset, any number of SVMs are likely to create similar decision boundaries, but there is a good chance that the tree algorithms will generate different trees in different ways and cover different sub-spaces of the dataset. As they cover different sub-spaces, combining them under the boosting scheme is an excellent recipe for a good classification algorithm. To further maximize this diversity, we used two different tree algorithms, i.e., the extra tree and the decision tree, under the boosting scheme. In Fig. 2, the rectangle represents the whole dataset and each black dot represents an instance. The dots inside the red boundary represent the part of the dataset that has been explored by a decision tree (DT); similarly, the green boundary represents the part explored by an extra tree (ET). By using them inside a boosting mechanism, we turn the instability of tree algorithms into an advantage for better classification results.
Convergence of the algorithm depends on a stagnation window parameter, which we kept fixed in this paper. After the best combination is found, the algorithm continues adding estimators to the model for up to the stagnation window; if there is no significant improvement in the test score within that window, the ensemble returns the best meta-classifier learned. This early stopping criterion was first used by Bühlmann in 2003 and was later studied by Jiang. The MEBoost algorithm combines De Souza's alternating-estimator usage with Bühlmann's early stopping criterion in an AdaBoost algorithm.
IV Experimental Results
This section presents the experimental results and analyzes the performance of the MEBoost algorithm.
IV-A Benchmark Datasets
Datasets with different imbalance ratios were chosen from the KEEL dataset repository. TABLE I presents a summary of the datasets. Each dataset is imbalanced, with an imbalance ratio ranging from 1.87 up to 41.03.
IV-B Evaluation Metrics
Several evaluation metrics are used in the literature to measure the performance of classification algorithms. In this paper we use the area under the Receiver Operating Characteristic curve (auROC) as the comparison metric, which is widely used as the standard for comparing performance in the imbalanced-dataset literature. The ROC curve represents the trade-off between the true positive rate (TPR) and the false positive rate (FPR), plotting TPR against FPR. TPR and FPR are defined as follows: TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
Here, TP denotes the number of positive samples correctly classified, TN the number of negative samples correctly classified, FP the number of negative samples incorrectly classified, and FN the number of positive samples incorrectly classified by the estimator. A point on the ROC curve lies between (0, 0) and (1, 1), where (1, 0) means all instances are misclassified and (0, 1) means all positive instances are classified correctly. The diagonal line TPR = FPR is the minimum threshold, as it represents the scenario where classes are guessed at random. The area under the ROC curve is a very useful performance metric for class imbalance problems because it does not depend on the decision criterion selected or on the prior probabilities, and a dominance relationship between classifiers can be established by comparing AUC values.
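These quantities can be checked numerically; the toy labels and scores below are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = np.array([1, 1, 1, 0, 0, 0, 0, 0])              # 3 positives, 5 negatives
y_score = np.array([0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05])
y_pred  = (y_score >= 0.5).astype(int)                    # one fixed threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)                  # TPR = TP / (TP + FN)
fpr = fp / (fp + tn)                  # FPR = FP / (FP + TN)
auc = roc_auc_score(y_true, y_score)  # threshold-independent summary
print(tpr, fpr, auc)                  # 0.667, 0.2, 0.867 (approximately)
```

Note that TPR and FPR depend on the chosen threshold of 0.5, whereas the auROC summarizes performance over all thresholds, which is why it is preferred here.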
IV-C1 MEBoost vs Other Boosting Algorithms
We compared the performance of MEBoost with that of RUSBoost, AdaBoost, Easy Ensemble, DataBoost, SMOTEBoost, and EUSBoost. Classification performance on the datasets was measured in terms of auROC. For the other methods, the C4.5 decision tree was used as the base learner; for MEBoost, our proposed method, we used the C4.5 and extra tree classifiers. The KEEL-dataset repository's implementations were used for DataBoost, AdaBoost, RUSBoost, SMOTEBoost, EUSBoost, and Easy Ensemble. Each dataset was split into three sections: a train set, a test set, and a held-out validation set. The classifier was trained and tested on the train and test sets using 5-fold cross validation. Mean auROC scores on the validation set over 10 experiments are shown in TABLE II.
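The evaluation protocol of this kind can be sketched as follows. The data here is a synthetic stand-in (the paper uses KEEL datasets), and plain AdaBoost is used as a placeholder classifier; only the hold-out-then-cross-validate structure is the point:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import roc_auc_score

# hypothetical stand-in for one imbalanced benchmark dataset
X, y = make_classification(n_samples=300, weights=[0.85, 0.15], random_state=0)

# hold out a validation set, then run 5-fold CV on the remainder
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

scores = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for tr, te in skf.split(X_dev, y_dev):
    clf = AdaBoostClassifier(n_estimators=50, random_state=0)
    clf.fit(X_dev[tr], y_dev[tr])
    # score each fold's model on the untouched validation set
    scores.append(roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))

print(f"mean validation auROC over 5 folds: {np.mean(scores):.3f}")
```

Stratified folds keep the class ratio of each fold close to that of the full dataset, which matters when the minority class is small.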
TABLE II shows the performance of the MEBoost classifier against other state-of-the-art boosting algorithms, namely AdaBoost, EUSBoost, Easy Ensemble, SMOTEBoost, RUSBoost, and DataBoost. MEBoost achieved the highest auROC score on all datasets except pima, where RUSBoost achieved the highest value. Note that the imbalance ratio of the pima dataset is the lowest among all the datasets.
IV-C2 Multiple Estimators vs a Single Estimator
We also compared the performance of MEBoost with boosting using a single base estimator: decision tree, extra tree, random forest, or support vector machine (SVM). Table III presents the results of this experiment. Here we used the AdaBoost algorithm with each of these single learners and compared them with our proposed method, which uses two estimators, decision tree and extra tree, alternately. The results in Table III show that the benefit of using multiple estimators is promising. Even in the cases where the proposed method was unable to obtain the best result, training a learner like a decision tree or extra tree takes much less computation than training an SVM or a random forest. On the glass-0-1-2-3 and yeast6 datasets, MEBoost and the random forest estimator tied for the highest auROC score, as they did on newthyroid2 and yeast5; on newthyroid1 and segment0, MEBoost, the random forest estimator, and the SVM estimator achieved the highest auROC score together. Computationally, an SVM or a random forest is far more complex than an extra tree and a decision tree together, so even where they provided similar scores, MEBoost is preferable considering computational cost.
From the results in TABLES II and III, we find that the multiple-estimator technique performs better than other state-of-the-art boosting algorithms as well as boosting with other weak or strong learners. We also present the ROC analysis for the different datasets in Fig. 3.
V Conclusion
Most classification algorithms focus primarily on the majority class instances rather than the minority class instances, which are more important. It is therefore quite challenging to construct a classifier that can classify minority class instances correctly. To alleviate this class imbalance problem, this paper presented a new algorithm called MEBoost, or boosting with multiple learners. MEBoost was compared with effective boosting techniques such as SMOTEBoost, RUSBoost, AdaBoost, DataBoost, EUSBoost, and Easy Ensemble. From the experimental results, we conclude that MEBoost performs favorably compared with similar techniques.
MEBoost differs from the other boosting methods in that it uses the C4.5 and extra tree classifiers alternately instead of using only one of them. This allows it to take advantage of both learners' characteristics while discarding their individual weaknesses; both C4.5 and the extra tree classifier have pros and cons relative to one another. The results show that using two different estimators instead of one has a substantial impact on the auROC score. In future work, we intend to perform extensive experiments investigating the performance of MEBoost with other learners.
-  D. M. Farid, M. A. Al-Mamun, B. Manderick, and A. Nowe, “An adaptive rule-based classifier for mining big biological data,” Expert Systems with Applications, vol. 64, pp. 305–316, December 2016.
-  D. M. Farid, L. Zhang, C. M. Rahman, M. Hossain, and R. Strachan, “Hybrid decision tree and naïve bayes classifiers for multi-class classification tasks,” Expert Systems with Applications, vol. 41, no. 4, pp. 1937–1946, March 2014.
-  D. M. Farid, L. Zhang, A. Hossain, C. M. Rahman, R. Strachan, G. Sexton, and K. Dahal, “An adaptive ensemble classifier for mining concept drifting data streams,” Expert Systems with Applications, vol. 40, no. 15, pp. 5895–5906, November 2013.
-  D. M. Farid, A. Nowé, and B. Manderick, “A new data balancing method for classifying multi-class imbalanced genomic data,” 25th Belgian-Dutch Conference on Machine Learning (Benelearn), pp. 1–2, 12-13 September 2016.
-  J. Laurikkala, “Improving identification of difficult small classes by balancing class distribution,” Artificial Intelligence in Medicine, pp. 63–66, 2001.
-  I. Mani and I. Zhang, “kNN approach to unbalanced data distributions: a case study involving information extraction,” in Proceedings of the Workshop on Learning from Imbalanced Datasets, vol. 126, 2003.
-  S.-J. Yen and Y.-S. Lee, “Cluster-based under-sampling approaches for imbalanced data distributions,” Expert Systems with Applications, vol. 36, no. 3, pp. 5718–5727, 2009.
-  F. Rayhan, S. Ahmed, S. Shatabda, D. M. Farid, Z. Mousavian, A. Dehzangi, and M. S. Rahman, “idti-esboost: Identification of drug target interaction using evolutionary and structural features with boosting,” arXiv preprint arXiv:1707.00994, 2017.
-  M. Kubat, S. Matwin et al., “Addressing the curse of imbalanced training sets: one-sided selection,” in ICML, vol. 97. Nashville, USA, 1997, pp. 179–186.
-  H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, June 2009.
-  N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, June 2002.
-  Z. Sun, Q. Song, X. Zhu, H. Sun, B. Xu, and Y. Zhou, “A novel ensemble method for classifying imbalanced data,” Pattern Recognition, vol. 48, no. 5, pp. 1623–1637, May 2015.
-  C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
-  A. Liaw, M. Wiener et al., “Classification and regression by randomforest,” R news, vol. 2, no. 3, pp. 18–22, 2002.
-  Y. Sun, A. K. C. Wong, and M. S. Kamel, “Classification of imbalance data: A review,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 4, pp. 687–719, June 2009.
-  C. Seiffert, T. M. Khoshgoftaar, J. V. Hulse, and A. Napolitano, “Rusboost: A hybrid approach to alleviating class imbalance,” IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 40, no. 1, pp. 185–197, January 2010.
-  Y. Freund, R. E. Schapire et al., “Experiments with a new boosting algorithm,” in ICML, vol. 96, 1996, pp. 148–156.
-  N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer, “Smoteboost: Improving prediction of the minority class in boosting,” 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 107–109, 22-26 September 2003.
-  R. Blagus and L. Lusa, “SMOTE for high-dimensional class-imbalanced data,” BMC Bioinformatics, vol. 14, no. 106, pp. 1–16, March 2013.
-  É. de Souza and S. Matwin, “Extending adaboost to iteratively vary its base classifiers,” Advances in Artificial Intelligence, pp. 384–389, 2011.
-  M. Galar, A. Fernández, E. Barrenechea, and F. Herrera, “Eusboost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling,” Pattern Recognition, vol. 46, no. 12, pp. 3460–3471, December 2013.
-  H. Guo and H. L. Viktor, “Learning from imbalanced data sets with boosting and data generation: the databoost-im approach,” ACM Sigkdd Explorations Newsletter, vol. 6, no. 1, pp. 30–39, 2004.
-  X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539–550, 2009.
-  J. R. Quinlan, “Improved use of continuous attributes in C4.5,” Journal of Artificial Intelligence Research, vol. 4, pp. 77–90, 1996.
-  P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees,” Machine learning, vol. 63, no. 1, pp. 3–42, 2006.
-  Y. Yao, L. Rosasco, and A. Caponnetto, “On early stopping in gradient descent learning,” Constructive Approximation, vol. 26, no. 2, pp. 289–315, 2007.
-  P. Bühlmann and B. Yu, “Boosting with the L2 loss: regression and classification,” Journal of the American Statistical Association, vol. 98, no. 462, pp. 324–339, 2003.
-  W. Jiang, “Process consistency for adaboost,” Annals of Statistics, pp. 13–29, 2004.
-  J. Alcalá-Fdez, A. Fernandez, J. Luengo, J. Derrac, S. García, L. Sánchez, and F. Herrera, “Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework,” Journal of Multiple-Valued Logic and Soft Computing, vol. 17, no. 2-3, pp. 255–287, 2011.