Neural Network Based Undersampling Techniques

08/18/2019
by   Md. Adnan Arefeen, et al.

The class imbalance problem is commonly faced when developing machine learning models for real-life problems. Because of this imbalance, the fitted model tends to be biased towards the majority class, which leads to lower precision, recall, AUC, F1 and G-mean scores. Considerable research has been done to tackle this problem, most of which employs resampling, i.e. oversampling and undersampling techniques, to bring the required balance to the data. In this paper, we propose neural network based algorithms for undersampling. We resample several class-imbalanced datasets using our algorithms and some other popular resampling techniques, and then classify the undersampled data using some common classifiers. We find that our resampling approaches outperform most other resampling techniques in terms of AUC, F1 and G-mean scores.


I Introduction

Many real-life problems, such as diagnosis of diseases [1], weather prediction [2] and fraud detection [3], can be modelled as classification problems and tackled by developing machine learning models. However, in most cases the data obtained are not balanced: it is not possible to collect the same number of samples for all the classes, which makes the resulting dataset class imbalanced. This imbalance poses serious challenges for developing machine learning models. The models become biased towards the majority class and hence mostly fail to detect the minority classes. Therefore, in such a scenario, despite obtaining good accuracy, we do not obtain good scores on other performance metrics such as F1 score [4], AUC score [5] and G-mean score [6]. Since class imbalance is challenging and damaging, significant attention has been given to solving it in the literature [7]. The most common methods use resampling techniques to balance the dataset. Resampling can be done by reducing the number of majority class samples, which is popularly known as undersampling. Some common undersampling techniques include cluster centroids [8], Tomek links [9] and the neighbourhood cleaning rule [10]. Resampling can also be done by increasing the number of minority class samples, either by duplicating some data or by generating new data. This technique is called oversampling. SMOTE [11], several variants of SMOTE [12] [13] and ADASYN [14] are some frequently used oversampling techniques.

In spite of the significant work in this area in the literature, there is still scope for improvement. With this backdrop, we revisit neural network based approaches for undersampling. Neural networks have been used successfully for tasks such as image recognition and natural language processing in recent years. We explore the possibility of using the ability of neural networks to capture intricate patterns within data to address the issue of class imbalance.

II Related Work

Class imbalance is a challenging problem that has attracted the attention of many researchers over the years. To balance imbalanced data, strategies such as oversampling and undersampling have been employed, with work on the topic dating back as early as 1972. A popular undersampling algorithm, the Edited Nearest Neighbor (ENN) rule, was proposed in [15]. ENN removes the data points whose class label does not match that of the majority of their k nearest neighbors. Another popular undersampling algorithm, Tomek link removal (TLL), was introduced in [9]. This algorithm detects pairs of data points, called Tomek links, that are each other's nearest neighbors but have different class labels. Undersampling can be done either by removing both points of every Tomek link or by removing only the majority class point of each link. The NearMiss (NM) methods perform undersampling by removing majority class data points based on their distances to minority class points [16]. In NearMiss-1, the majority class points whose mean distance to their k nearest minority class points is lowest are retained, where k is a tunable hyperparameter. NearMiss-2 instead retains the majority class points whose mean distance to their k farthest minority class points is lowest. In the final version, NearMiss-3, the k nearest majority class points are retained for every minority class point. In addition to these techniques, there is another undersampler called cluster centroids [8], which uses k-means clustering to balance an imbalanced dataset by reducing the number of majority samples.
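To make the Tomek link definition concrete, the following is a minimal sketch in plain Python. It is only an illustration of the definition given above, not the implementation used in the experiments; the function name `tomek_links` and the toy points are hypothetical, and label 0 is assumed to be the majority class.

```python
import math

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different class labels."""
    def nearest(i):
        # Index of the closest other point (ties go to the lowest index).
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

# Hypothetical toy data: label 0 is assumed to be the majority class.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
labels = [0, 0, 0, 1, 1]

links = tomek_links(points, labels)
# Undersampling variant: drop only the majority member of each link.
to_drop = [i if labels[i] == 0 else j for i, j in links]
```

Here points 2 and 3 are each other's nearest neighbours and have different labels, so they form the only Tomek link, and the majority member (index 2) is the one marked for removal.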

III Methods

We use an autoencoder and a simple artificial neural network for training on the minority class. Figure 1 and Figure 2 depict two such models. We fit the minority data using one of the two models. A threshold value determines which kind of neural network is used to train on the minority samples. We set the threshold value to 30. If the number of input attributes is more than 30, we fit the minority samples using an autoencoder; otherwise, we fit them with a simple neural network with two or three hidden layers. Notably, over-fitting was not a major concern for our task, because the model has to reproduce the minority samples with approximately 100% accuracy. If we cannot fit the minority samples well, we may lose information when predicting the majority samples. If the model is not strong enough, it may propagate error when predicting the majority samples.

Figure 1: Simple neural network to generate input. The diagram shows an input layer of four nodes (green), two hidden layers of five nodes each (blue) and an output layer of four nodes (red). The lines between nodes represent the connections between layers. The network is fully connected.

Figure 2: An autoencoder neural network model with an input layer of eight nodes, hidden layers of six, four and six nodes, and an output layer of eight nodes. The inputs are regenerated by first encoding them and then decoding the encoded representation. The output of the middle layer is the encoded representation of the inputs. The model is trained on the inputs in an unsupervised manner.

III-A Undersampling Algorithm

III-A1 Algorithm 1: Hard Neural Network Based Undersampling

Suppose we have a set of minority samples and a set of majority samples in the dataset under consideration. In this algorithm, we train a neural network (autoencoder or feed-forward, decided by the predefined threshold discussed in the previous section) to learn the feature values of the minority samples, and then use the same neural network to predict the features of the majority samples. We then calculate the Euclidean distance between the predicted and the real feature values. We store these distances in a list, mapped to the indices of the corresponding majority class samples, and sort the list in descending order of distance. From this sorted list we choose the first samples, as many as there are minority samples. The final dataset is the combination of the minority class data from the original dataset and the majority class data chosen by our approach. In effect, we choose those majority class samples that are far, in terms of Euclidean distance, from their predicted values. In other words, our undersampling approach removes the majority class samples that lie in the vicinity of the minority class samples and retains those that lie further away. Hence the decision boundary becomes better defined and the resulting balanced dataset becomes more separable. As a consequence, this algorithm outperforms most other undersampling algorithms on most datasets. However, we note that this algorithm performs best when there is no overlap between data points, as will be evident in Section V, where we generate some artificial data points and observe the performance of the algorithm on those data. For overlapping data, we propose another algorithm in the next subsection.

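Inputs: number of samples of the minority class; number of samples of the majority class; number of attributes; samples of the majority class; samples of the minority class
if number of attributes > threshold then
     model ← autoencoder
else
     model ← feed-forward neural network
end if
Train the model on the samples of the minority class
for each sample in the majority class do
     Predict the sample's attributes using the trained model
     Calculate the Euclidean distance between the real and predicted attributes of the sample
     Map this distance to the index of the sample
end for
Sort the indices of samples according to descending order of distance
Select the first indices from the sorted list, as many as there are minority samples
for each selected index do
     Append the corresponding majority sample to the final dataset
end for
Return the final dataset combined with the samples of the minority class
Algorithm 1 Hard Under-sampling Using Neural Network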
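The selection step of the hard undersampling procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `hard_undersample` and `centroid_model` are hypothetical, and a trivial stand-in that always predicts the minority centroid takes the place of the trained autoencoder or feed-forward network so that the example stays self-contained.

```python
import math

def centroid_model(minority):
    """Stand-in for the trained network (assumption): it 'predicts'
    the minority centroid for every input sample."""
    dim = len(minority[0])
    centroid = tuple(sum(p[d] for p in minority) / len(minority)
                     for d in range(dim))
    return lambda x: centroid

def hard_undersample(majority, minority, reconstruct):
    """Keep as many majority samples as there are minority samples,
    choosing those with the largest distance to their prediction."""
    errors = [(math.dist(x, reconstruct(x)), i)
              for i, x in enumerate(majority)]
    errors.sort(key=lambda e: e[0], reverse=True)  # descending distance
    keep = sorted(i for _, i in errors[:len(minority)])
    return [majority[i] for i in keep]

# Hypothetical toy data.
minority = [(2.0, 2.0), (2.2, 1.9)]
majority = [(0.0, 0.0), (1.9, 2.0), (-1.0, 0.5), (2.1, 2.1), (0.5, -0.5)]

selected = hard_undersample(majority, minority, centroid_model(minority))
balanced = minority + selected
```

In this toy example the two majority points closest to the minority cluster are discarded first, which mirrors the behaviour described in the text: samples near the minority class are removed and distant ones are retained.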

III-A2 Algorithm 2: Soft Neural Network Based Undersampling

As discussed at the end of the previous subsection, our proposed Hard Neural Network Based Undersampling algorithm (NUS-1) does not perform well when there is overlap between data points in the dataset. To resolve this issue, we propose a second algorithm in this subsection, called soft neural network based undersampling (NUS-2). NUS-2 differs from NUS-1 in how the majority samples are selected. In NUS-1, we choose exactly as many majority samples as there are minority samples, taking those that are farthest from their predicted values. That is why we call that algorithm hard. In NUS-2, we first predict the minority samples with the model that was fitted on the minority class and calculate the Euclidean distance between each minority sample and its prediction. We record the maximum of these distances. We also calculate the average distance over the half of the minority samples with the larger distances, which we call the half-average distance. After that, we predict the samples of the majority class with the same model. We feed each majority sample to the model, generate its clone, and calculate the Euclidean distance between the two. If the distance is higher than the maximum distance or the half-average distance of the minority samples, we include the sample in the final dataset as a majority class sample. The soft neural network based undersampling algorithm performs better than all other undersampling algorithms on overlapping data, as will be observed in Section V, where we examine the effect of different undersampling algorithms on artificially generated overlapping data.

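Inputs: number of samples of the minority class; number of samples of the majority class; number of attributes; samples of the majority class; samples of the minority class
if number of attributes > threshold then
     model ← autoencoder
else
     model ← feed-forward neural network
end if
Train the model on the samples of the minority class
for each sample in the minority class do
     Predict the minority sample's attributes using the trained model
     Calculate the Euclidean distance between the real and predicted attributes of the sample
     Map this distance to the index of the sample
end for
Sort the indices of samples according to descending order of distance
Half-average distance ← average distance of the half of the minority samples whose indices are in the first half of the sorted list
Maximum distance ← maximum distance calculated among the minority samples and their predictions
for each sample in the majority class do
     Predict the majority sample's attributes using the trained model
     Calculate the Euclidean distance between the real and predicted attributes of the sample
     Map this distance to the index of the sample
end for
Sort the indices of samples according to descending order of distance
selectedIndices ← empty list
for each index in the sorted list do
     distance ← distance mapped to the index
     if distance > maximum distance or distance > half-average distance then
         Add the index to selectedIndices
     end if
end for
for each index in selectedIndices do
     Append the corresponding majority sample to the final dataset
end for
Return the final dataset combined with the samples of the minority class
Algorithm 2 Soft Under-sampling Using Neural Network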
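The thresholding step of the soft undersampling procedure can be sketched along the same lines. Again this is only a sketch under assumptions and not the authors' code: `soft_undersample`, `centroid_model` and the `use_max` switch are hypothetical names, the switch reflects our reading that either the maximum or the half-average distance serves as the threshold, and a stand-in that predicts the minority centroid replaces the trained network.

```python
import math

def centroid_model(minority):
    """Stand-in for the trained network (assumption): it 'predicts'
    the minority centroid for every input sample."""
    dim = len(minority[0])
    centroid = tuple(sum(p[d] for p in minority) / len(minority)
                     for d in range(dim))
    return lambda x: centroid

def soft_undersample(majority, minority, reconstruct, use_max=False):
    """Keep every majority sample whose distance to its prediction
    exceeds a threshold derived from the minority samples."""
    # Distances between minority samples and their predictions, largest first.
    min_err = sorted((math.dist(x, reconstruct(x)) for x in minority),
                     reverse=True)
    top_half = min_err[:max(1, len(min_err) // 2)]
    half_avg = sum(top_half) / len(top_half)
    max_dist = min_err[0]
    threshold = max_dist if use_max else half_avg
    return [x for x in majority if math.dist(x, reconstruct(x)) > threshold]

# Hypothetical toy data.
minority = [(2.0, 2.0), (2.0, 3.0), (3.0, 2.0), (3.0, 3.0)]
majority = [(0.0, 0.0), (2.5, 2.6), (2.4, 2.5), (5.0, 5.0), (2.5, 3.5)]

selected = soft_undersample(majority, minority, centroid_model(minority))
```

Unlike the hard variant, the number of retained majority samples is not fixed in advance: here the two majority points that fall inside the region occupied by the minority samples are dropped and the remaining three are kept.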

IV Results Analysis

IV-A Overview of the experiments

We have designed our experiments as follows. We under-sample the dataset under consideration using different undersampling algorithms. Subsequently, the under-sampled dataset is fed to a number of classifiers and we evaluate the classification results thereof. In Table I, we list the classifiers and the undersampling algorithms we used.

Undersampling Algorithms | Classifier Algorithms
Edited Nearest Neighbour (ENN) [15] | Random forest (RF) [17]
All KNN (AKNN) [9] | Gradient-boosting (GradBoost) [18]
Near Miss (NM-1, NM-2, NM-3) [16] | K-nearest neighbour (KNN) [19]
Neighbourhood Cleaning Rule (NCR) [10] | Stochastic gradient descent (SGD) [20]
Random Undersampling (RUS) | Logistic Regression (LR) [21]
Tomek Link (TLL) [9] |
Table I: Under-sampling Algorithms

The result analysis section is organised as follows. First, we briefly describe the datasets used in this paper. Then, we present the metrics used in the experiments for comparison. After that, we show the results generated by various classifiers, namely the Gradient Boosting Classifier (GradBoost), Stochastic Gradient Descent Classifier (SGD), K-nearest neighbour classifier (KNN), Random Forest (RF) and Logistic Regression (LR). We used the scikit-learn, scipy, numpy and pandas packages to implement all these algorithms and for data conversion [22, 23, 24, 8]. We used the keras package to implement the neural network and the autoencoder [25]. For graphical representation, we used the matplotlib package [24]. We undersampled the datasets with different undersampling algorithms, namely Edited Nearest Neighbour (ENN) [15], All KNN, the Near Miss algorithm (versions 1, 2 and 3) [16], the Tomek link undersampler (TLL) [9], the Random Undersampler (RUS) and the two proposed algorithms, Neural Network Based Undersampling 1 and 2 (NUS-1 and NUS-2), also called the hard and soft undersampling algorithms using neural network. Later, these undersampled binary-class data were classified by the classifiers stated above. We report the metric values produced by each classifier for comparison.

IV-B Evaluation Criteria

For evaluating the performance of our proposed algorithms, we use some performance metrics based on the ROC (Receiver Operating Characteristic) curve [26]. Let + and - represent the positive and negative class labels. Table II, called the confusion matrix, represents the performance of a classification algorithm. Based on the confusion matrix in Table II, the performance metrics defined in this section are used to evaluate learning from imbalanced datasets by our proposed algorithms.

 | Predicted + | Predicted -
Actual + | True Positive (TP) | False Negative (FN)
Actual - | False Positive (FP) | True Negative (TN)
Table II: Confusion Matrix

For comparing the performance of different undersampling algorithms on classification, we use the area under the Receiver Operating Characteristic (ROC) curve [26], popularly known as AUC. The AUC value measures the degree of separability between classes. A higher AUC value indicates that the model is more capable of distinguishing the classes than a model with a lower AUC value. The problem with an imbalanced dataset is that any machine learning algorithm trained on it becomes biased towards the majority class. In addition, overlap between samples from different classes degrades the performance of the model because it cannot distinguish between the classes. This phenomenon is reflected in a lower AUC value during evaluation. Undersampling can potentially solve the problem of imbalance by removing some samples from the majority class and thus making the dataset more balanced. The AUC value becomes higher when the model is trained on these balanced data. In Tables V, VIII, XI and XIV, we show the AUC values of different machine learning models on some originally imbalanced datasets [27] resampled by several undersampling techniques. The G-mean is defined as the square root of the product of the true positive rate and the true negative rate. The equation is as follows.

$$\text{G-mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} \qquad (1)$$

The F1 measure is another popular performance metric for evaluating classification algorithms, defined as follows.

$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \qquad (2)$$

The terms precision and recall in this formula refer to the ratio of true positives (TP) to all samples predicted as positive and the ratio of true positives to all actual positive samples respectively, defined as follows:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$
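As a quick numerical illustration of these metrics, the sketch below computes G-mean and F1 directly from the four confusion matrix entries of Table II. The function names and the example counts are hypothetical and chosen only to show how accuracy can look good while the other scores are noticeably lower.

```python
import math

def g_mean(tp, fn, fp, tn):
    """Geometric mean of the true positive rate and true negative rate."""
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return math.sqrt(true_positive_rate * true_negative_rate)

def f1_measure(tp, fn, fp):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion matrix: 50 positive and 150 negative samples.
tp, fn, fp, tn = 40, 10, 20, 130

accuracy = (tp + tn) / (tp + fn + fp + tn)
gm = g_mean(tp, fn, fp, tn)
f1 = f1_measure(tp, fn, fp)
```

With these counts the accuracy is 0.85, whereas G-mean is about 0.83 and F1 about 0.73, because the 20 false positives are large relative to the small positive class.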

IV-C Description of Dataset

We used four real-world datasets to experiment with the proposed algorithms. All of them are from the UCI machine learning repository [28]. The imbalance ratio is defined as the number of majority samples divided by the number of minority samples. The description of the datasets is available in Table III.

Dataset #attribute #min #maj Ratio
Ionosphere 34 126 225 1.78
Balance 4 49 576 11.8
Pima 8 268 500 1.9
Satimage 36 626 5809 9.27
Table III: Description of Dataset
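The Ratio column of Table III can be checked with a short calculation, assuming it is the number of majority samples divided by the number of minority samples. The dictionary below simply restates the counts from the table.

```python
# (minority count, majority count) for each dataset, copied from Table III.
datasets = {
    "Ionosphere": (126, 225),
    "Balance": (49, 576),
    "Pima": (268, 500),
    "Satimage": (626, 5809),
}

# Imbalance ratio = majority samples / minority samples.
ratios = {name: n_maj / n_min for name, (n_min, n_maj) in datasets.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

The computed values agree with the table up to rounding in the last reported digit.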

The number of majority samples selected by each undersampling algorithm is given in Table IV.

Dataset ENN AKNN NM1 NM2 NM3 NUS1 NUS2 CC NCR TLL RUS
Ionosphere 216 215 126 126 99 126 105 126 146 225 126
Balance 452 427 49 49 49 49 161 49 544 571 49
Pima 279 249 268 268 268 268 204 268 261 450 268
Satimage 5319 5213 626 626 626 626 3045 626 5449 5770 626
Table IV: Number of Majority Samples Chosen by each undersampler

In almost all cases, we found that our proposed undersamplers, NUS1 and NUS2, outperform the other undersamplers for almost all classifiers. NUS1 and NUS2 resample the data in such a way that the classes become more separable, as can be seen in Figures 3 and 4. This leads to higher AUC, G-mean and F1 values and hence better performance. Note that we used a number of classifiers to verify that the proposed undersampling algorithms are not classifier dependent.

Table V: AUC values of Balance dataset using various classifiers (rows: ENN, AKNN, NM1, NM2, NM3, NUS1, NUS2, CC, NCR, TLL, RUS; columns: GradBoost, SGD, KNN, RF, LR)
Table VI: G-Mean values of Balance dataset using various classifiers (same rows and columns)
Table VII: F1 values of Balance dataset using various classifiers (same rows and columns)
Table VIII: AUC values of Pima dataset using various classifiers (same rows and columns)
Table IX: G-Mean values of Pima dataset using various classifiers (same rows and columns)
Table X: F1 values of Pima dataset using various classifiers (same rows and columns)
Table XI: AUC values of Satimage dataset using various classifiers (same rows and columns)
Table XII: G-Mean values of Satimage dataset using various classifiers (same rows and columns)
Table XIII: F1 values of Satimage dataset using various classifiers (same rows and columns)
Table XIV: AUC values of Ionosphere dataset using various classifiers (same rows and columns)
Table XV: G-mean values of Ionosphere dataset using various classifiers (same rows and columns)
Table XVI: F1 scores of Ionosphere dataset using various classifiers (same rows and columns)

V Undersampling on Artificial Dataset

The datasets on which we have experimented so far have many features, which makes it difficult to visualize how the undersamplers actually undersample them. Hence we used two artificial datasets, generated with the scikit-learn package [22], to visualise the effect of the different undersamplers. The first dataset consists of two features, which makes it easy to plot and visualize the data. There are 1000 majority samples and 100 minority samples in the dataset, so the ratio of majority samples to minority samples is 10:1. The centers of the two clusters are [0.0 0.0] and [2.0 2.0] respectively. The standard deviations of the cluster samples from their centers are 1.5 and 0.5 respectively. The effect of each undersampler is shown in Figure 3.
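A dataset with the shape of the first artificial dataset can be approximated with the standard library alone. The sketch below is not the scikit-learn code used for the experiments: the helper `gaussian_cluster` is hypothetical, the seed is arbitrary, and we assume the first center and standard deviation belong to the majority class.

```python
import random

def gaussian_cluster(n, center, std, rng):
    """Draw n points from an isotropic Gaussian around the given center."""
    return [tuple(rng.gauss(c, std) for c in center) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumption: center [0.0 0.0] with std 1.5 is the majority cluster,
# center [2.0 2.0] with std 0.5 is the minority cluster.
majority = gaussian_cluster(1000, (0.0, 0.0), 1.5, rng)
minority = gaussian_cluster(100, (2.0, 2.0), 0.5, rng)

ratio = len(majority) / len(minority)
```

The resulting lists can be passed to any undersampler that accepts two-dimensional points, and plotting them before and after resampling gives the kind of picture the figure shows.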

Next, we generated the second dataset, in which the majority and minority samples overlap. For the second dataset, the ratio of majority to minority is . In this case, the number of majority samples was the same as before, but the number of minority samples was . We chose the centers of the two classes to be [0.0 0.0] and [0.02 0.05] respectively to introduce overlap. The standard deviations of the two clusters' samples from their centers were respectively. The results of each sampler are shown in Figure 3 and Figure 4.

Figure 3: Artificial dataset resampling with various undersamplers
Figure 4: Artificial overlapped dataset resampled by various undersamplers

VI Comparison between the proposed algorithms

The classification results and the figures showing the effect of each undersampler on the data indicate that NUS1 performs well when there is little overlap in the data. NUS1 retains those majority data points that are most distant from most of the minority data points. In overlapped data, however, some minority samples may overlap with the retained majority data. In non-overlapped data, this problem is minimal. Hence, NUS1 tends to make the balanced dataset linearly separable. On the other hand, NUS2 finds the perimeter of the minority samples by calculating the average distance between the minority samples and the samples regenerated for them by the model. It then retains those majority samples that lie outside this perimeter. In this way, overlap is removed, and hence NUS2 performs better in classifying overlapping data. The choice of distance, whether maximum or average, is a tunable parameter. We can indirectly verify the nature of the data with these two proposed methods.

VI-A Case study

We now consider a particular case that may arise from a certain distribution of the data. It may happen that the majority class contains outliers, i.e. data points that lie far from the minority data points, and that the ratio of majority to minority is also very high. In this case, the outliers should be removed from the majority data before applying the proposed hard and soft undersampling algorithms. For Figure 5, we generated an artificial dataset using the scikit-learn [22] package. The ratio of majority to minority is . The two proposed algorithms NUS-1 and NUS-2 always select the 50 points that are located far from the minority class at the time of undersampling. When outliers are present, the algorithms may therefore always choose the outlier data points. The data and the effect of the undersampling algorithms on the data points are shown in Figure 5.

Figure 5: NUS-1 and NUS-2 both try to choose the samples that are far from the minority class. In the figure, the coordinates of the centers are chosen as , respectively. The standard deviations from the centers are for the first and for the latter one. We used the make_blobs function from scikit-learn [22] to generate the data points.

VII Concluding remarks

In this paper, we proposed two algorithms to address the class imbalance problem. The main target of this paper is to balance the data, i.e. to bring the number of majority samples down to the number of minority samples. This approach has some drawbacks. If the majority to minority ratio is very high, there is a high probability of losing information from the majority class. In this scenario, we can use the accuracy of predicting majority samples as a parameter to choose which batch of majority samples should be considered, in order to mitigate the loss. Future work may address this issue.

References

  • [1] Bartosz Krawczyk, Mikel Galar, Łukasz Jeleń, and Francisco Herrera. Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Applied Soft Computing, 38:714–726, 2016.
  • [2] Sun Choi, Young Jin Kim, Simon Briceno, and Dimitri Mavris. Prediction of weather-induced airline delays based on machine learning algorithms. In 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), pages 1–6. IEEE, 2016.
  • [3] Wei Wei, Jinjiu Li, Longbing Cao, Yuming Ou, and Jiahang Chen. Effective detection of sophisticated online banking fraud on extremely imbalanced data. World Wide Web, 16(4):449–475, 2013.
  • [4] CJ Van Rijsbergen. Information Retrieval, 2nd edition. Butterworths, London, 1979.
  • [5] Jin Huang and Charles X Ling. Using auc and accuracy in evaluating learning algorithms. IEEE Transactions on knowledge and Data Engineering, 17(3):299–310, 2005.
  • [6] Miroslav Kubat, Stan Matwin, et al. Addressing the curse of imbalanced training sets: one-sided selection. In Icml, volume 97, pages 179–186. Nashville, USA, 1997.
  • [7] Yanmin Sun, Andrew KC Wong, and Mohamed S Kamel. Classification of imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence, 23(04):687–719, 2009.
  • [8] Guillaume Lemaître, Fernando Nogueira, and Christos K. Aridas. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research, 18(17):1–5, 2017.
  • [9] Ivan Tomek. A generalization of the k-nn rule. IEEE Transactions on Systems, Man, and Cybernetics, (2):121–126, 1976.
  • [10] Jorma Laurikkala. Improving identification of difficult small classes by balancing class distribution. In Conference on Artificial Intelligence in Medicine in Europe, pages 63–66. Springer, 2001.
  • [11] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
  • [12] Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. Borderline-smote: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing, pages 878–887. Springer, 2005.
  • [13] Gustavo EAPA Batista, Ronaldo C Prati, and Maria Carolina Monard. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD explorations newsletter, 6(1):20–29, 2004.
  • [14] Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pages 1322–1328. IEEE, 2008.
  • [15] Dennis L Wilson. Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, (3):408–421, 1972.
  • [16] Inderjeet Mani and I Zhang. knn approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets, volume 126, 2003.
  • [17] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
  • [18] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
  • [19] RO Duda and PE Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, NY, 1973.
  • [20] Tong Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the twenty-first international conference on Machine learning, page 116. ACM, 2004.
  • [21] Raymond E Wright. Logistic regression. 1995.
  • [22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • [23] Stefan Van Der Walt, S Chris Colbert, and Gael Varoquaux. The numpy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22, 2011.
  • [24] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.
  • [25] François Chollet et al. Keras. https://keras.io, 2015.
  • [26] Tom Fawcett. An introduction to roc analysis. Pattern recognition letters, 27(8):861–874, 2006.
  • [27] Zejin Ding. Diversified ensemble classifiers for highly imbalanced data learning and their application in bioinformatics. 2011.
  • [28] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.