Dynamic Ensemble Selection (DES) has become an important research topic in the last few years cruz2018dynamic . Given a test sample and a pool of classifiers, DES techniques select one or more competent classifiers for the classification of that test sample. The most important aspect of DES techniques is how to evaluate the competence level of each base classifier for the classification of a given test sample cruz2016prototype . In general, DES techniques evaluate the competence level of a base classifier for the classification of a test sample based on its performance in a local region surrounding the test sample, called the region of competence. Most DES techniques define the region of competence of a test sample as its K-Nearest Neighbors in a validation set, which we refer to as the dynamic selection dataset (DSEL) dcs:2014 .
Despite being very effective in several classification tasks, DES techniques can select classifiers that assign all samples in the region of competence of a test sample to the same class, even when the test sample is located close to a decision border and has neighbors belonging to different classes (an indecision region) dfp:2017 .
Figure 1 represents a query sample located in an indecision region. In this example, the decision boundary of the first classifier crosses the region of competence of the query, so it predicts different class labels for the samples in this region and correctly classifies at least one sample from each class. On the other hand, the decision boundary of the second classifier does not cross the region of competence. However, since the second classifier correctly classifies the same number of samples as the first, a DES algorithm could select it as a locally competent classifier instead of the first one, misclassifying the query.
To deal with this issue, Oliveira et al. dfp:2017 proposed the Frienemy Indecision Region Dynamic Ensemble Selection (FIRE-DES), a DES framework that pre-selects classifiers whose decision boundaries cross the region of competence when the test sample is located in an indecision region. Given a test sample, FIRE-DES first decides whether it is located in an indecision region. If so, it applies the Dynamic Frienemy Pruning (DFP) to pre-select the classifiers whose decision boundaries cross the region of competence of the test sample. Then, only the pre-selected pool is passed down to a DES technique, which selects the final ensemble of classifiers.
However, FIRE-DES does not consider whether the region of competence is a good representation of the type of region in which the test sample is located. For instance, FIRE-DES can mistake a safe region for an indecision region due to the presence of noise in DSEL. In this case, the DFP can remove locally competent classifiers from the pool because they do not correctly classify the noisy instance, leaving only the base classifiers that modeled the noise in the local region for the DES step.
In addition, when dealing with small datasets, some regions of the feature space may not be well populated. In such cases, the region of competence of a test sample can contain samples of a single class (a safe region) even though the sample is located close to the class borders (a true indecision region). FIRE-DES will then wrongly conclude that the sample is located in a safe region, and the DFP will not be employed to remove incompetent classifiers, even though the query lies in a true indecision region because it is close to the decision border between classes, regardless of the classes represented in its region of competence.
In this paper, we propose FIRE-DES++, an enhanced FIRE-DES framework that tackles the noise sensitivity and indecision region restriction drawbacks of the previous framework. The main differences between FIRE-DES++ and the original version are: (1) FIRE-DES++ applies a prototype selection (PS) technique to remove noise from the validation set (DSEL). Hence, the framework does not mistake a noisy region for an indecision region when estimating the regions of competence. (2) During the test phase, FIRE-DES++ employs the K-Nearest Neighbors Equality (KNNE) knne:2011 to define the region of competence. KNNE is a variation of KNN that selects the same number of samples from each class. By using KNNE, test instances located close to the decision borders (in a true indecision region) are never mistaken as belonging to a safe region, since their region of competence is always composed of samples from different classes. This solves the indecision region restriction drawback of FIRE-DES. Like FIRE-DES, FIRE-DES++ can be used with any dynamic selection technique that relies on the nearest neighbors to estimate the competence level of base classifiers.
The experiments were conducted over 64 datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository keel . We evaluated FIRE-DES++ with 8 dynamic selection techniques: Overall Local Accuracy (OLA) dcs_la:1996 , Local Class Accuracy (LCA) dcs_la:1996 , A Priori selection dcs_la:1999 , A Posteriori selection dcs_la:1999 , Multiple Classifier Behavior (MCB) mcb:2001 , Dynamic Selection KNN dsknn:2006 , and the K-Nearest Oracles Union (KNU) and Eliminate (KNE) des:2008 . We also compared FIRE-DES++ with the best-performing dynamic selection techniques according to a recent survey cruz2018dynamic , namely the Randomized Reference Classifier (RRC) rrc:2011 , META-DES metades:2015 , and META-DES.Oracle metadesoracle:2017 , as well as with several static ensemble approaches.
This paper is organized as follows: Section 2 presents the problem statement, Section 3 presents the proposed framework, Section 4 presents the experimental study, and Section 5 concludes the paper.
2 Problem Statement
The Frienemy Indecision Region Dynamic Ensemble Selection (FIRE-DES) framework works as an online pruning mechanism that pre-selects base classifiers before a dynamic ensemble selection technique is applied. Given a new query, the FIRE-DES framework analyzes its region of competence to decide whether it is located in an indecision region (a region of competence with samples from different classes). If the sample is located in a safe region, i.e., the whole region of competence is composed of samples belonging to the same class, all base classifiers are passed down to the dynamic selection technique. However, when the query is located in an indecision region, the framework applies the Dynamic Frienemy Pruning (DFP) technique to pre-select base classifiers that correctly classify at least one pair of samples belonging to different classes in the region of competence. Such a pair of samples is called a frienemy pair: two instances are considered frienemies if both are located in the region of competence of the query and they have different class labels.
Ideally, a locally competent classifier would distinguish all frienemy pairs in the region of competence, thus separating the two classes locally. The DFP pre-selects only the base classifiers that correctly classify at least one pair of frienemies. Then, only the pre-selected base classifiers are passed down to the DES algorithm for competence estimation and classification. In the example presented in Figure 1, the DFP would remove the second classifier since it does not correctly classify a single pair of frienemies. Thus, although both classifiers may have the same local competence level, the second one would not be taken into consideration by the DS algorithm, and the first one would be selected, predicting the correct label of the query. If no base classifier correctly classifies a single pair of frienemies, all base classifiers are considered for competence estimation.
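The two checks that FIRE-DES performs on a region of competence, deciding whether it is an indecision region and enumerating its frienemy pairs, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper names `is_indecision_region` and `frienemy_pairs` are hypothetical, and the region of competence is assumed to be given as parallel lists of DSEL indices and class labels.

```python
from itertools import combinations

import numpy as np


def is_indecision_region(roc_labels):
    """True when the region of competence holds samples of more than one class."""
    return len(np.unique(roc_labels)) > 1


def frienemy_pairs(roc_indices, roc_labels):
    """All pairs of region-of-competence samples that have different class labels."""
    pairs = []
    for (i, y_i), (j, y_j) in combinations(zip(roc_indices, roc_labels), 2):
        if y_i != y_j:
            pairs.append((i, j))
    return pairs


# Hypothetical region of competence: DSEL indices 10-13, two samples per class.
print(is_indecision_region([0, 0, 1, 1]))               # True
print(frienemy_pairs([10, 11, 12, 13], [0, 0, 1, 1]))   # [(10, 12), (10, 13), (11, 12), (11, 13)]
print(is_indecision_region([1, 1, 1, 1]))               # False -> safe region, no DFP
```

When `is_indecision_region` returns False, FIRE-DES passes the whole pool to the DES technique; otherwise the frienemy pairs are what the DFP checks each classifier against.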
Although the FIRE-DES framework can significantly improve the performance of several DES techniques dfp:2017 , it suffers from two main drawbacks: noise sensitivity and indecision region restriction.
2.2 Drawback 1: Noise Sensitivity
The noise sensitivity drawback is important because DES techniques are highly sensitive to noise, outliers, and a high level of overlap between classes in DSEL cruz2016prototype ; dsfa:2011 . Figure 2(a) shows a test sample located in a noisy region and three classifiers. In this figure, the region of competence of the test sample is composed of the samples A, B, C, and N (sample N is noise). The first classifier correctly classifies 4 samples in the region of competence (A, B, C, and the noisy instance N), the second classifier correctly classifies 2 samples (B and C), and the third classifier correctly classifies 3 samples (A, B, and C).
The Overall Local Accuracy (OLA) technique dcs_la:1996 estimates the competence of classifiers using their accuracy in the region of competence, that is, the more samples a classifier correctly classifies, the more competent it is. OLA selects only the most competent classifier for the classification of the test sample.
In Figure 2(a), OLA selects the first classifier, which correctly classifies the most samples in the region of competence, even though it is ranked best only because of the noisy sample N. This selection leads to the misclassification of the test sample. Also in this example, FIRE-DES mistakes the noisy region (a region with noisy samples) for an indecision region (a region composed of samples from different classes) and pre-selects the classifiers that correctly classify at least one pair of samples from different classes (frienemies), in this case only the first classifier, again misclassifying the test sample.
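As a sketch of OLA's competence estimate, assuming the predictions of every base classifier on the K samples of the region of competence are already available as a matrix, the competence is simply the fraction of correct predictions. The helper names are hypothetical, and the usage example encodes Figure 2(a) with made-up labels (A, B, and C as class 0, the noisy sample N as class 1) chosen so that the three classifiers reproduce the 4, 2, and 3 correct predictions described above.

```python
import numpy as np


def ola_competence(predictions, roc_labels):
    """Local accuracy of each classifier on the K samples of the region of competence.

    predictions: array of shape (n_classifiers, K) with the labels each base
    classifier assigns to the region-of-competence samples.
    """
    predictions = np.asarray(predictions)
    return (predictions == np.asarray(roc_labels)).mean(axis=1)


def ola_select(predictions, roc_labels):
    """Index of the single most competent classifier (the first one wins ties)."""
    return int(np.argmax(ola_competence(predictions, roc_labels)))


# Figure 2(a) with made-up labels: A, B, C are class 0; the noisy sample N is class 1.
roc_labels = [0, 0, 0, 1]            # order: A, B, C, N
predictions = [
    [0, 0, 0, 1],                    # first classifier: A, B, C and N correct
    [1, 0, 0, 0],                    # second classifier: only B and C correct
    [0, 0, 0, 0],                    # third classifier: A, B and C correct
]
print(ola_competence(predictions, roc_labels))   # competences 4/4, 2/4 and 3/4
print(ola_select(predictions, roc_labels))       # 0 -> the first classifier
```

The noisy sample N is what lifts the first classifier above the third, which is exactly the failure mode the filtering phase of FIRE-DES++ is meant to remove.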
2.3 Drawback 2: Indecision Region Restriction
Figure 2(b) shows the scenario from Figure 2(a) without the noisy sample N: a test sample located in a true indecision region (close to the class borders) and the same three classifiers. In this figure, the region of competence of the test sample is composed of the samples A, B, C, and D, all from the same class. The first classifier correctly classifies 3 samples in the region of competence (A, B, and C), the second classifier correctly classifies 2 samples (B and C), and the third classifier correctly classifies 4 samples (A, B, C, and D).
In Figure 2(b), OLA selects the classifier that correctly classifies the most samples in the region of competence, that is, the third classifier, even though it assigns all samples in the region of competence to the same class, misclassifying the test sample.
In the example from Figure 2(b), FIRE-DES does not apply the DFP because it considers the test sample to be located in a safe region, even though it is located in a true indecision region. Therefore, FIRE-DES with OLA also misclassifies the test sample. This scenario is very likely to happen with small as well as imbalanced datasets, in which one of the classes may not have enough examples in the local region.
3 The proposed framework
In this section, we propose an enhanced Frienemy Indecision Region Dynamic Ensemble Selection (FIRE-DES++). FIRE-DES++ is divided into four phases (Figure 3): overproduction, filtering, region of competence definition and selection. The main differences between the original FIRE-DES framework and the proposed FIRE-DES++ are the addition of the filtering phase to deal with the noise sensitivity drawback, and the region of competence definition phase, in which the KNN-Equality is applied to guarantee that all classes are represented in the region of competence. Algorithms 1 and 2 present the training and test stages of the FIRE-DES++ framework, respectively.
Overproduction phase, where the pool of classifiers is generated using the training set. The overproduction phase is performed only once, in the training stage.
Filtering phase, where a Prototype Selection (PS) ps-taxonomy:2012 technique is applied to the validation set DSEL, removing noise and outliers and reducing the level of overlap between classes. The result is an improved validation set. The filtering phase is performed only once, in the training stage.
Region of competence definition (RoCD) phase, where the framework defines the region of competence of the test sample using the K-Nearest Neighbors Equality (KNNE) knne:2011 to select samples from the improved validation set. KNNE is a nearest neighbor approach that selects an equal number of samples from each class, avoiding a region of competence with samples of a single class. The RoCD phase is performed in the testing stage for each new test sample.
Selection phase, where the ensemble of classifiers for the classification of each new test sample is selected. Given a test sample, this phase uses the Dynamic Frienemy Pruning (DFP) dfp:2017 to pre-select the base classifiers whose decision boundaries cross the region of competence of the test sample, if any such classifiers exist. The DFP pre-selects classifiers that correctly classify at least one pair of samples from different classes ("frienemies") in the region of competence, which avoids selecting classifiers that assign all samples in the region of competence to the same class. After the pre-selection, any DES technique can be applied to select the final ensemble of classifiers. Finally, the framework uses a combination rule to combine the predictions of the selected classifiers into a single prediction.
In Figure 3, Generation is an ensemble generation process (e.g., Bagging bagging:1996 ) applied to the training set to produce the pool of classifiers; Filtering is the process of filtering the validation set with a prototype selection algorithm, which results in the improved validation set; Region of Competence Definition is the process of selecting the region of competence of the test sample (drawn from the test set) using the filtered validation set; Dynamic Frienemy Pruning is the dynamic pruning step, which outputs the pre-selected ensemble of classifiers; Dynamic Selection is the dynamic selection step, which outputs the ensemble of selected classifiers; and Combination is the process of combining the predictions of the selected classifiers into the final prediction for the test sample.
The phases of FIRE-DES++ complement each other. The filtering phase tackles the noise sensitivity drawback by removing noise and reducing the level of overlap between classes. The region of competence definition phase tackles the indecision region restriction drawback by ensuring that all classes are represented in the region of competence of the test sample. Finally, the selection phase pre-selects classifiers whose decision boundaries cross the region of competence without having to consider the effect of noise (since noise is removed in the filtering phase) or to decide whether a test sample is located in an indecision region (since the region of competence definition phase always yields regions of competence composed of samples of different classes). The phases of FIRE-DES++ are detailed in the following subsections.
The overproduction phase can use any ensemble generation technique to generate the pool of classifiers from the training set. Since the focus of this work is on dynamic selection, the Bagging technique bagging:1996 bagging:1998 is used to generate the pool of classifiers, following the approach used in dfp:2017 .
3.2 Filtering phase
The filtering phase tackles the noise sensitivity drawback (Section 2.2) by removing noise from DSEL, which prevents FIRE-DES from estimating the competence level of base classifiers using noisy data. This step is conducted by applying a PS technique to the validation set (DSEL), resulting in an improved validation set with less noise and less overlap between classes.
In ps-taxonomy:2012 , the authors presented a taxonomy of prototype selection that divides prototype selection techniques into three categories: (1) condensation techniques, which remove samples in the center of the classes and keep the borderline samples; (2) edition techniques, which remove samples at the borders of the classes and keep the safe samples (located in the center of the classes); and (3) hybrid techniques, which combine condensation and edition approaches.
We expect the filtering phase to bring a considerable performance gain to the FIRE-DES++ framework since, in cruz2016prototype , the authors showed that state-of-the-art DES techniques fail to obtain a good approximation of the class decision boundaries when noise is added to DSEL, and also demonstrated that using PS increases the classification performance of DES techniques.
Two PS techniques are considered: the Relative Neighborhood Graph (RNG) rng:1997 and the Edited Nearest Neighbors (ENN) enn:1972 . According to dsfa2:2017 , these two PS techniques are the best approaches for dynamic selection purposes. Furthermore, since our experimental study focuses on small datasets with different levels of class imbalance, only samples of the majority class are removed from the validation set. Therefore, these techniques also help to alleviate class imbalance problems when performing dynamic selection roy2018study .
3.2.1 Relative Neighborhood Graph (RNG)
The RNG technique uses the concept of a Proximity Graph (PG) to select prototypes. RNG builds a PG, G = (V, E), in which the vertices are the samples (V = DSEL) and the set of edges E contains an edge connecting two samples $\mathbf{x}_i$ and $\mathbf{x}_j$ if and only if they satisfy the neighborhood criterion in Equation 1:

$$(\mathbf{x}_i, \mathbf{x}_j) \in E \iff d(\mathbf{x}_i, \mathbf{x}_j) \leq \max\left(d(\mathbf{x}_i, \mathbf{x}_k),\, d(\mathbf{x}_j, \mathbf{x}_k)\right), \quad \forall\, \mathbf{x}_k \in DSEL,\ k \neq i, j \qquad (1)$$

where $d(\cdot, \cdot)$ is the Euclidean distance between two samples and DSEL is the validation set. The corresponding geometric locus is the intersection of two hyperspheres centered at $\mathbf{x}_i$ and $\mathbf{x}_j$, each with radius equal to $d(\mathbf{x}_i, \mathbf{x}_j)$. Two samples are relative neighbors if and only if this intersection does not contain any other sample from DSEL. The relative neighborhood of a sample is the set of all its relative neighbors. After the PG is built and all graph neighbors are defined, every sample whose class label differs from the majority label of its relative neighbors is removed from DSEL.
Algorithm 3 presents the pseudo-code of the RNG technique used in this work. Given the validation set DSEL, all of its samples are first added to the filtered validation set (Line 1), and the proximity graph of the samples in DSEL is computed and stored (Line 2). Then, for each sample, its relative neighbors are retrieved; if the most common class label among them differs from the class label of the sample, and the sample does not belong to the minority class, the sample is removed from the filtered validation set (Lines 3-10). Finally, the filtered validation set is returned (Line 11).
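A direct, cubic-time Python sketch of Equation 1 and the minority-preserving editing rule of Algorithm 3 follows. It is a simplified illustration under stated assumptions: distances are Euclidean, the graph is built once on the full validation set, the minority class is taken to be the least frequent label, and ties in the neighbors' vote are broken by `Counter.most_common` (an arbitrary choice). The names `relative_neighbors` and `rng_filter` are hypothetical.

```python
from collections import Counter

import numpy as np


def relative_neighbors(X):
    """Adjacency lists of the Relative Neighborhood Graph of X (Equation 1)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # x_i and x_j are relative neighbors if no third sample x_k lies
            # strictly inside their lune: d(x_i, x_j) <= max(d(x_i, x_k), d(x_j, x_k)).
            if all(D[i, j] <= max(D[i, k], D[j, k])
                   for k in range(n) if k != i and k != j):
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def rng_filter(X, y):
    """Boolean mask of the samples kept by the minority-preserving RNG editing."""
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]      # least frequent class is never edited
    keep = np.ones(len(y), dtype=bool)
    for i, nbrs in enumerate(relative_neighbors(X)):
        if y[i] == minority or not nbrs:
            continue
        majority_vote = Counter(y[nbrs]).most_common(1)[0][0]
        if majority_vote != y[i]:
            keep[i] = False                   # disagrees with its relative neighbors
    return keep


# Seven collinear points: the RNG is the chain 0-1-2-3-4-5-6.
X = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0], [6, 0]]
y = [0, 0, 0, 1, 0, 1, 1]                     # class 1 is the minority (3 vs 4)
print(rng_filter(X, y))                       # sample 4 (class 0 between two class-1 points) is removed
```

The quadratic loop over pairs with an inner scan over all other samples is fine for the small validation sets considered here; larger sets would call for a Delaunay-based construction.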
3.2.2 Edited Nearest Neighbors (ENN)
ENN is an edition prototype selection technique well known for its efficiency in removing noise and producing smoother class boundaries. ENN is used here with the changes proposed in ncr:2001 (implemented in imblearn:2017 ), where only majority class samples are removed in order to reduce class imbalance.
Algorithm 4 presents the pseudo-code of the ENN technique used in this work. Given the validation set DSEL, all of its samples are first added to the filtered validation set (Line 1). Then, each sample that is misclassified by its nearest neighbors in DSEL and does not belong to the minority class is removed from the filtered validation set (Lines 2-8). Finally, the filtered validation set is returned (Line 9).
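Similarly, the minority-preserving ENN rule can be sketched as below. The neighborhood size is not stated in this section, so k = 3 here is an assumption (it is the usual ENN default); the neighbors vote by plain majority, and `enn_filter` is a hypothetical name. The function returns a boolean mask of the samples kept in the filtered validation set.

```python
from collections import Counter

import numpy as np


def enn_filter(X, y, k=3):
    """Boolean mask of the samples kept by a minority-preserving ENN (k is an assumption)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # a sample is not its own neighbor
    keep = np.ones(len(y), dtype=bool)
    for i in range(len(y)):
        if y[i] == minority:
            continue                             # minority samples are never removed
        nearest = np.argsort(D[i], kind="stable")[:k]
        predicted = Counter(y[nearest]).most_common(1)[0][0]
        if predicted != y[i]:
            keep[i] = False                      # misclassified by its k nearest neighbors
    return keep


X = [[0], [1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 0, 1, 1, 0]                        # the class-0 sample at 12 sits among class 1
print(enn_filter(X, y))                          # only the last sample is removed
```

Unlike RNG, ENN only looks at a fixed number of nearest neighbors, which makes it cheaper but dependent on the choice of k.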
3.3 Region of competence definition phase
In order to solve the indecision region restriction drawback (Section 2.3), FIRE-DES++ employs the K-Nearest Neighbors Equality (KNNE) instead of the traditional KNN algorithm to define the region of competence of each new query. KNNE is a variation of KNN that selects the same number of samples from each class knne:2011 .
The advantage of using KNNE instead of the original KNN method employed by FIRE-DES is that it ensures that all classes are represented in the region of competence. Thus, test instances located close to the decision borders (i.e., in a true indecision region) are never mistaken as belonging to a safe region. Moreover, the use of KNNE complements the filtering stage of FIRE-DES++: by reducing the overlap between classes, the filtering phase may remove important samples close to the class borders ps-taxonomy:2012 ; cruz2016prototype , which could cause indecision regions to be mistaken for safe regions. By using KNNE, FIRE-DES++ guarantees that the DFP mechanism is employed in such scenarios.
The region of competence is then passed down to the selection phase.
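A minimal KNNE sketch follows, assuming Euclidean distance and a hypothetical `k_per_class` argument (how the overall neighborhood size is split among the classes is an assumption here, so the sketch takes the per-class count directly). It retrieves the nearest samples of each class separately and merges them. The usage example shows a query whose plain K nearest neighbors would all belong to one class, while KNNE still returns a sample from the other class.

```python
import numpy as np


def knne(x, X, y, k_per_class):
    """Indices of the k_per_class nearest samples of each class to the query x."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dist = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
    selected = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        closest = members[np.argsort(dist[members], kind="stable")][:k_per_class]
        selected.extend(closest.tolist())
    # Return the merged region of competence ordered from nearest to farthest.
    return sorted(selected, key=lambda i: dist[i])


X = [[0], [1], [2], [3], [10]]
y = [0, 0, 0, 0, 1]
# A plain 3-NN of x = 0.5 would be samples 0, 1 and 2 (all class 0);
# KNNE with two neighbors per class also brings in sample 4 from class 1.
print(knne([0.5], X, y, k_per_class=2))          # [0, 1, 4]
```

Because every class contributes samples, the resulting region always contains frienemy pairs, so the DFP is always applied in FIRE-DES++.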
3.4 Selection phase
In the selection phase, the framework first pre-selects classifiers using the DFP. Next, a dynamic selection technique is applied over the pre-selected pool to select the final ensemble, which is used to classify the test sample.
3.4.1 Dynamic frienemy pruning
The Dynamic Frienemy Pruning (DFP) dfp:2017 aims to pre-select competent classifiers (classifiers whose decision boundaries cross the region of competence) for the classification of each new test sample, before the final selection of classifiers. The DFP algorithm uses the concept of frienemy samples: given a test sample and its region of competence, two samples are frienemies with respect to the test sample if both belong to its region of competence and they are from different classes. Figure 4 shows a test sample and its region of competence; in this example, the frienemies are the pairs of samples of opposite classes within the region of competence.
For each new test sample located in an indecision region, the DFP algorithm pre-selects classifiers whose decision boundaries cross the region of competence. That is, if the region of competence of the test sample contains samples of different classes, DFP pre-selects the classifiers that correctly classify at least one pair of frienemy samples (if any such classifier exists).
Algorithm 5 presents the DFP pseudo-code. Given the region of competence of the test sample and the pool of classifiers, DFP creates an empty list to store the pre-selected classifiers (Line 1) and finds the pairs of frienemy samples in the region of competence (Line 2). Then, each classifier in the pool is added to the list if it correctly classifies at least one pair of frienemies (Lines 3-8). If no classifier is pre-selected, DFP adds all classifiers in the pool to the list (Lines 9-11). Finally, the list of pre-selected classifiers is returned (Line 12).
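Algorithm 5 can be sketched as follows, assuming each classifier's predictions on the region-of-competence samples are available as rows of a matrix; `dfp` is a hypothetical helper, not the DESlib API. The example encodes a situation like Figure 5 with made-up labels (A and B as class 0, E and F as class 1): the first classifier assigns everything to class 0, the second misses only F, and the third assigns everything to class 1, so only the second separates a frienemy pair.

```python
from itertools import combinations

import numpy as np


def dfp(predictions, roc_labels):
    """Indices of the classifiers pre-selected by Dynamic Frienemy Pruning.

    predictions: (n_classifiers, K) labels predicted for the K samples of the
    region of competence; roc_labels: their true labels.
    """
    predictions = np.asarray(predictions)
    roc_labels = np.asarray(roc_labels)
    correct = predictions == roc_labels           # hit matrix, one row per classifier
    # Frienemy pairs: positions of two region-of-competence samples of different classes.
    pairs = [(i, j) for i, j in combinations(range(len(roc_labels)), 2)
             if roc_labels[i] != roc_labels[j]]
    selected = [c for c in range(len(predictions))
                if any(correct[c, i] and correct[c, j] for i, j in pairs)]
    if not selected:                              # no classifier separates any pair:
        selected = list(range(len(predictions)))  # fall back to the whole pool
    return selected


roc_labels = [0, 0, 1, 1]                         # order: A, B, E, F
predictions = [
    [0, 0, 0, 0],                                 # first classifier: everything is class 0
    [0, 0, 1, 0],                                 # second classifier: misses only F
    [1, 1, 1, 1],                                 # third classifier: everything is class 1
]
print(dfp(predictions, roc_labels))               # [1] -> only the second classifier
```

The fallback mirrors Lines 9-11 of Algorithm 5: when pruning would leave nothing, the DES technique still receives the full pool.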
3.5 Dynamic Selection
In this step, the pruned pool and the region of competence are passed down to a DES technique, which selects from the pruned pool an ensemble containing the most competent base classifiers for the classification of the test sample.
Figure 5 shows the same scenario from Figure 2, but without the noisy sample N and using KNNE to define the region of competence of the test sample. First, FIRE-DES++ removes noise from the validation set (the example from Figure 2(a) is turned into the example from Figure 2(b)), tackling the noise sensitivity drawback of FIRE-DES. Then, the framework uses KNNE to define the region of competence, selecting an equal number of samples from each class (the example from Figure 2(b) is turned into the example from Figure 5), tackling the indecision region restriction drawback of FIRE-DES. Due to the use of KNNE, the region of competence is now composed of the samples A, B, E, and F (instead of A, B, C, and D).
In this example, the first classifier now correctly classifies 2 samples in the region of competence, the second classifier correctly classifies 3 samples, and the third classifier correctly classifies 2 samples. OLA now selects the second classifier, correctly classifying the test sample.
By applying the DFP in this example (after the PS technique and KNNE), FIRE-DES++ pre-selects only the classifier that correctly classifies at least one pair of frienemies, and this classifier correctly classifies the test sample. In this example, FIRE-DES++ leads OLA to the correct decision, and the same reasoning can be extended to other DES techniques.
In this section, we evaluate FIRE-DES++ with different dynamic selection techniques. We evaluate the impact of the filtering phase using the PS techniques, of the region of competence definition phase using the K-Nearest Neighbors Equality (KNNE), and of the selection phase using the Dynamic Frienemy Pruning (DFP). We also compare ENN and RNG in the filtering phase.
4.1 Dynamic Selection Techniques
We used eight dynamic selection techniques from the literature (Table 1): Overall Local Accuracy (OLA), Local Class Accuracy (LCA), A Priori (APRI), A Posteriori (APOS), Multiple Classifier Behavior (MCB), Dynamic Selection KNN (DSKNN), K-Nearest Oracles Union (KNU), and K-Nearest Oracles Eliminate (KNE). These eight techniques were selected because they are the most well-known dynamic selection techniques, having the highest number of citations according to Google Scholar. Moreover, they all use the KNN to estimate the region of competence, so they are suitable for use in the FIRE-DES++ framework. A step-by-step explanation of these techniques can be found in the surveys dcs:2014 ; cruz2018dynamic .
In addition, we compare the proposed FIRE-DES++ with the three dynamic ensemble selection frameworks that achieved the best classification performance in cruz2018dynamic : Randomized Reference Classifier (RRC) rrc:2011 , META-DES metades:2015 , and META-DES.Oracle metadesoracle:2017 . They are briefly described below:
RRC: Instead of estimating the competence of the base classifiers in the neighborhood of the query, this method uses all samples in DSEL and weights the influence of each sample using a Gaussian potential function, so that samples closer to the query have a higher influence on the competence estimation than more distant ones. The source of competence is estimated based on the concept of the randomized reference classifier (RRC) proposed in rrc:2011 . The base classifiers with a competence level higher than that of the random classifier are selected to compose the ensemble for an input sample.
META-DES: META-DES is a dynamic ensemble selection framework that models competence estimation as a meta-problem. Each measure used to estimate the local competence of a base classifier is encoded as a meta-feature, and five sets of meta-features are considered. Then, a meta-classifier is trained on the training data to predict whether a base classifier is competent enough to classify a new input sample.
META-DES.Oracle: META-DES.Oracle is an extension of the META-DES framework based on the concept of the Oracle, an ideal dynamic selection scheme that always selects the classifiers that predict the correct label for the current sample, if any such classifier exists kuncheva:2004 . Here, the Oracle definition is used in an optimization scheme so that the meta-classifier achieves results closer to the Oracle, improving the dynamic selection of base classifiers.
These state-of-the-art frameworks are not based exclusively on the KNN for the competence level estimation. Hence, neither the KNNE nor the DFP can be applied to these techniques.
|Technique|Category|Reference|
|---|---|---|
|Overall Local Accuracy (OLA)|Accuracy|Woods et al. dcs_la:1996|
|Local Class Accuracy (LCA)|Accuracy|Woods et al. dcs_la:1996|
|A Priori (APri)|Probabilistic|Giacinto et al. dcs_la:1999|
|A Posteriori (APos)|Probabilistic|Giacinto et al. dcs_la:1999|
|Multiple Classifier Behavior (MCB)|Behavior|Giacinto et al. mcb:2001|
|Dynamic Selection KNN (DSKNN)|Diversity|Santana et al. dsknn:2006|
|K-Nearest Oracles Union (KNU)|Oracle|Ko et al. des:2008|
|K-Nearest Oracles Eliminate (KNE)|Oracle|Ko et al. des:2008|
|Randomized Reference Classifier (RRC)|Probabilistic|Woloszynski et al. rrc:2011|
|META-DES|Meta-learning|Cruz et al. metades:2015|
|META-DES.Oracle|Meta-learning|Cruz et al. metadesoracle:2017|
The experiments were conducted in Python 3.5, with the scikit-learn library pedregosa2011scikit used for training the base classifiers. The dynamic ensemble selection techniques were evaluated using the DESlib library cruz2018deslib , which contains fast implementations of all the dynamic selection techniques evaluated in this work. The library is publicly available on GitHub: https://github.com/Menelau/DESlib.
The size of the region of competence (neighborhood size) was set to 7 for all dynamic selection techniques (as suggested in cruz2018dynamic ). This is the only hyper-parameter required by the majority of dynamic selection methods. The only exception is the DS-KNN technique, which requires predefining the number of selected base classifiers. In this case, the number of base classifiers selected using accuracy () and diversity () was set to of the whole pool as suggested in dsknn:2006 .
For the state-of-the-art techniques, RRC has no hyper-parameter to set. The META-DES framework has two additional hyper-parameters: the number of samples selected using output profiles and the sample selection threshold. Following the results presented in metades:2015 ; metadesoracle:2017 , these hyper-parameters were set to 5 and 80%, respectively.
We conducted the experiments on 64 datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository keel . This experimental study focuses on small datasets with different levels of class imbalance, so the framework is evaluated on a diverse set of classification problems. Table 2 shows the characteristics of the datasets used in this experiment: label, name, number of features, number of samples, and Imbalance Ratio (IR). The IR is a common metric used by several authors branco2016survey ; diez2015diversity to characterize the level of imbalance of a distribution. It is computed as the number of majority class instances divided by the number of minority class instances.
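The IR defined above is a one-line computation; a sketch with a hypothetical helper name:

```python
from collections import Counter


def imbalance_ratio(y):
    """Number of majority-class samples per minority-class sample."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


print(imbalance_ratio([0] * 90 + [1] * 10))      # 9.0
```

A perfectly balanced dataset has IR = 1, and the value grows as the minority class becomes rarer.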
For each dataset, the experiments were carried out using stratified 5-fold cross-validation (1 fold for testing and 4 folds for training). For the sake of simplicity, we use the 5-fold partitions provided on the KEEL website, which makes the results of this paper easier to replicate. The process of creating the dynamic selection dataset (DSEL) was guided by the experiments conducted in roy2018study . Due to the low sample size, the whole training set is used to build DSEL, so there is an overlap between the training bootstraps and DSEL. However, due to the randomized nature of the Bagging technique and the application of the PS techniques, their distributions are not exactly the same. Moreover, as reported by dietrich2003decision , a small overlap between both datasets can be suitable for dealing with small datasets.
As in our previous work dfp:2017 , the pool of classifiers was composed of 100 Perceptrons generated using the Bagging technique bagging:1996 . The training process was conducted using the scikit-learn library pedregosa2011scikit . The learning rate and number of iterations used for the training were set to and . The activation function is the Heaviside function, which predicts 0 if the sample is on one side of the hyperplane and 1 otherwise. Moreover, each Perceptron was calibrated to estimate posterior probabilities using Platt’s sigmoid model platt1999probabilistic , provided in the scikit-learn library through the CalibratedClassifierCV class.
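A possible scikit-learn sketch of such a pool is shown below. It is an approximation under stated assumptions: the learning rate and number of iterations are not reproduced in this section, so scikit-learn's `Perceptron` defaults are used; wrapping each Perceptron in `CalibratedClassifierCV` with `method="sigmoid"` (Platt scaling) and `cv=3` before bagging is one way of obtaining calibrated members, not necessarily the authors' exact setup; and `make_pool` is a hypothetical name.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import Perceptron


def make_pool(X_train, y_train, n_estimators=100, random_state=0):
    """Bagging pool of Perceptrons, each calibrated with Platt's sigmoid.

    Perceptron hyper-parameters are left at scikit-learn defaults and cv=3
    for the calibration is chosen arbitrarily (both are assumptions).
    """
    member = CalibratedClassifierCV(Perceptron(), method="sigmoid", cv=3)
    # The estimator is passed positionally because its keyword name changed
    # across scikit-learn versions (base_estimator -> estimator).
    pool = BaggingClassifier(member, n_estimators=n_estimators,
                             random_state=random_state)
    return pool.fit(X_train, y_train)


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    pool = make_pool(X, y)
    print(len(pool.estimators_))                 # 100
```

The fitted members in `pool.estimators_` each expose `predict_proba`, which is what probability-based DES techniques consume.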
As the evaluation metric, we used the Area Under the ROC Curve (AUC) auc:1997 , since this metric has been widely used to evaluate the performance of classifiers on imbalanced data cip:2013 .
Furthermore, we used the Wilcoxon Signed Rank Test wilcoxon:1945 and the Sign Test sheskin2003handbook to conduct pairwise comparisons between techniques over all datasets, as suggested by signedtest:2006 ; benavoli2016should . The Wilcoxon Signed Rank Test is a non-parametric alternative to the paired t-test. The Sign Test works on the number of wins, ties, and losses obtained by an algorithm over the baseline: the algorithm is deemed statistically better if its number of wins plus half of its number of ties is higher than a critical value.
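Under the normal approximation described by Demšar signedtest:2006 , the critical value for n datasets at a significance level of 0.05 is n/2 + 1.96·√n/2. The stdlib-only sketch below (hypothetical helper names) implements the sign-test decision rule stated above; the Wilcoxon test itself is available in SciPy as `scipy.stats.wilcoxon`.

```python
import math


def sign_test_critical_value(n_datasets, z=1.96):
    """Wins needed for significance under the normal approximation (z = 1.96 for alpha = 0.05)."""
    return n_datasets / 2 + z * math.sqrt(n_datasets) / 2


def sign_test_significant(wins, ties, losses, z=1.96):
    """Ties are split evenly between the two methods: each counts as half a win."""
    n = wins + ties + losses
    return wins + ties / 2 > sign_test_critical_value(n, z)


print(sign_test_critical_value(64))              # about 39.84
print(sign_test_significant(40, 0, 24))          # True
print(sign_test_significant(38, 2, 24))          # False (39 after splitting ties)
```

With the 64 datasets used here, the critical value is 32 + 1.96 × 8 / 2 = 39.84, so at least 40 wins (counting each tie as half a win) are required.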
Comparisons between multiple techniques over all datasets are conducted using the Friedman test with the Bonferroni-Dunn post-hoc test, as suggested by Demšar signedtest:2006 . The Friedman test is a non-parametric equivalent of the repeated-measures ANOVA. It ranks the algorithms for each dataset separately, with the best one getting rank 1, the second best rank 2, and so on. In case of a tie, i.e., when two methods obtain the same AUC on a dataset, each receives the average of the ranks they would otherwise occupy (e.g., two methods tied for second place both receive rank 2.5). However, the Friedman test only indicates that there is a difference between the classifiers, not which methods differ. For this reason, the Bonferroni-Dunn post-hoc test is employed to find out which techniques actually differ.
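The ranking procedure and the Bonferroni-Dunn critical difference can be sketched as follows, following Demšar's formulation $CD = q_\alpha\sqrt{k(k+1)/(6N)}$ for $k$ methods over $N$ datasets, where $q_\alpha$ is the normal quantile Bonferroni-corrected for the $k-1$ comparisons against a control. The helper names and the synthetic score matrix are hypothetical; this is not the paper's evaluation script.

```python
import numpy as np
from scipy.stats import friedmanchisquare, norm, rankdata


def average_ranks(scores):
    """scores: (n_datasets, k_methods), higher is better.

    Rank 1 goes to the best method on each dataset; tied methods share the
    average of the ranks they would otherwise occupy.
    """
    ranks = rankdata(-scores, axis=1)
    return ranks.mean(axis=0)


def bonferroni_dunn_cd(k, n, alpha=0.05):
    """Critical difference for comparing k-1 methods against a control."""
    q_alpha = norm.ppf(1 - alpha / (2 * (k - 1)))
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))


rng = np.random.default_rng(0)
scores = rng.uniform(0.6, 0.9, size=(64, 4))  # hypothetical AUCs, 4 methods
stat, p_value = friedmanchisquare(*scores.T)  # one sample per method
avg = average_ranks(scores)
cd = bonferroni_dunn_cd(k=scores.shape[1], n=scores.shape[0])
# Two methods differ significantly if their average ranks differ by more than cd.
```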
4.4 Filtering Phase: RNG vs. ENN
In this section, we evaluate FIRE-DES++ using RNG and ENN in the filtering phase. Both techniques follow the same approach of keeping all samples of the minority class; in other words, a sample is only considered noise and removed if it belongs to the majority class. This comparison is important for verifying whether FIRE-DES++ is sensitive to the choice of PS technique in the filtering phase, and for finding the PS technique that yields the highest classification performance gain in FIRE-DES++.
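The majority-only editing rule can be sketched for ENN as follows. This is an illustrative reimplementation of Wilson's rule (a majority-class sample is removed when the majority vote of its k nearest neighbours disagrees with its label), not the code used in the paper; k = 3, the restriction to binary labels and the function name are assumptions. imbalanced-learn's `EditedNearestNeighbours` under-sampler offers a library implementation with comparable options.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def enn_keep_minority(X, y, k=3):
    """Wilson's ENN applied only to the majority class (binary labels, odd k).

    Minority-class samples are always kept.
    """
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    # Called without a query, kneighbors excludes each point from its own neighbourhood.
    idx = nn.kneighbors(return_distance=False)
    agree = (y[idx] == y[:, None]).sum(axis=1)
    # With binary labels and odd k, the neighbour vote disagrees when fewer than half agree.
    noisy = (2 * agree < k) & (y != minority)
    return X[~noisy], y[~noisy]
```

For example, a majority-class point placed inside a cluster of minority samples is removed, while every minority sample survives even if its neighbours disagree with it.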
Figure 6 shows the scatter plot of average AUC of FIRE-DES++ using the ENN (vertical axis) and the RNG (horizontal axis). In this figure, all markers are above the diagonal line, meaning that using the ENN was, on average, better than using the RNG for all DES techniques in the proposed framework.
Using the Wilcoxon Signed Rank Test (), we confirm that the proposed framework with ENN is statistically better than with RNG for the majority of DES techniques: OLA (p-value = ), LCA (p-value = ), APRI (p-value = ), MCB (p-value = ), DSKNN (p-value = ), KNU (p-value = ), and KNE (p-value = ). The only exception is the APOS technique (p-value = ). Thus, we only consider FIRE-DES++ with ENN for the rest of this paper.
4.5 Comparison among different scenarios
In this section, we analyze eight different scenarios for the dynamic selection techniques (Table 3). Each scenario corresponds to a different combination of the three modules of the FIRE-DES++ framework: DFP, ENN, and KNNE. Scenario I corresponds to the original dynamic selection techniques (i.e., no additional step is performed). Scenario IV corresponds to the FIRE-DES framework, in which only the DFP method is applied, without the modifications proposed in this paper (ENN and KNNE). Scenario VIII corresponds to FIRE-DES++, in which DFP, ENN and KNNE are all employed.
For each scenario, we evaluated the classification performance of each DES technique over the 64 datasets, a total of 512 experiments (64 datasets × 8 DES techniques) per scenario. We performed the Friedman test to compare the eight scenarios over all experiments. For each dataset and dynamic selection technique, we ranked the scenarios from 1 to 8 (rank 1 being the best) and computed their average ranks (Table 4). The result of the Friedman test was , indicating that there is a statistical difference between the scenarios. To find out where the difference lies, the Bonferroni-Dunn post-hoc test is conducted. The result of the post-hoc analysis is presented using a critical difference diagram (Figure 7). Two scenarios are significantly different if their average ranks differ by more than the critical difference ().
| Scenario | Avg. Rank | Scenario | Mean AUC |
|---|---|---|---|
| Scenario VIII | 3.75 | Scenario VIII | 82.95 |
| Scenario III | 3.84 | Scenario VII | 82.70 |
| Scenario VII | 3.95 | Scenario III | 82.13 |
| Scenario V | 4.23 | Scenario V | 82.11 |
| Scenario I | 4.93 | Scenario VI | 81.57 |
| Scenario VI | 4.97 | Scenario II | 81.37 |
| Scenario II | 4.99 | Scenario IV | 81.18 |
| Scenario IV | 5.30 | Scenario I | 80.61 |
Figure 7 shows that FIRE-DES++ (Scenario VIII) achieved the lowest average rank (), statistically outperforming Scenarios I, II, IV, V, and VI. Scenarios VI (DFP+KNNE) and VII (DFP+ENN) obtained lower average ranks than Scenario IV (DFP alone). Scenario IV obtained the highest average rank in this analysis because it never obtained the best result (lowest rank) for any of the 64 datasets × 8 DES techniques combinations. There is always a better alternative: using DFP+ENN to solve the noise sensitivity drawback (Section 2.2), using DFP+KNNE to solve the indecision region definition drawback (Section 2.3), or using them all together. Thus, we can conclude that the addition of ENN and KNNE helps improve the performance of the FIRE-DES framework.
Figure 8 shows the performance gain (AUC) obtained by adding each step of the proposed FIRE-DES++ framework relative to the regular DES techniques. The regular DES techniques correspond to Scenario I (Table 3), while DFP, DFP+KNNE, DFP+ENN, and DFP+KNNE+ENN correspond to Scenarios IV, VI, VII, and VIII, respectively. This figure shows that the three phases combined (DFP, KNNE, and ENN) yield the highest performance gain (2.34), followed by DFP and ENN combined (2.09), DFP and KNNE combined (0.96), and finally DFP alone (0.57). These results indicate that both the filtering and the region of competence definition phases of FIRE-DES++ yield performance gains over FIRE-DES, with the best performance obtained when ENN and KNNE are combined.
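The gains reported for Figure 8 can be cross-checked against the Mean AUC column of the scenario table: each gain is the difference between a scenario's mean AUC and that of Scenario I (the regular DES techniques). The values below are copied from that table.

```python
# Mean AUC per scenario, copied from the scenario table.
mean_auc = {"I": 80.61, "IV": 81.18, "VI": 81.57, "VII": 82.70, "VIII": 82.95}

# Gain of each FIRE-DES/FIRE-DES++ variant over the regular DES techniques.
gains = {s: round(mean_auc[s] - mean_auc["I"], 2) for s in ("IV", "VI", "VII", "VIII")}
print(gains)
```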
Thus, we can conclude that all steps of FIRE-DES++ are important. Each step helps improve the performance of the DES techniques, and combining all three leads to the highest overall improvement in classification performance.
4.6 Comparison with FIRE-DES
In this section, we compare FIRE-DES++ and FIRE-DES for each DES technique considered in this work. The goal of this analysis is to investigate whether FIRE-DES++ significantly improves the performance of FIRE-DES, as well as to identify which DES techniques benefit most from the proposed framework.
The average rank and AUC of each DES technique are shown in Table 5. Figure 9 presents the CD diagram comparing FIRE-DES++ (FOLA++, FLCA++, FAPRI++, FAPOS++, FMCB++, FDSKNN++, FKNU++, and FKNE++) with FIRE-DES (FOLA, FLCA, FAPRI, FAPOS, FMCB, FDSKNN, FKNU, and FKNE) using the Bonferroni-Dunn post-hoc test. FIRE-DES++ outperformed FIRE-DES for 7 out of 8 DES techniques. The only exception was the LCA method, for which FLCA and FLCA++ obtained statistically equivalent results.
In addition, Figure 10 presents a pairwise comparison of FIRE-DES++ and FIRE-DES for each DES technique, using the Sign test computed on the wins, ties and losses of FIRE-DES++. The null hypothesis $H_0$ was that FIRE-DES++ makes no difference compared to FIRE-DES, and a rejection of $H_0$ means that FIRE-DES++ significantly outperformed FIRE-DES. In this evaluation, we considered three levels of significance $\alpha$. To reject $H_0$, the number of wins plus half of the number of ties needs to be greater than or equal to the critical value $n_c$ (Equation 2):

$$n_c = \frac{n_{exp}}{2} + z_{\alpha}\,\frac{\sqrt{n_{exp}}}{2} \qquad (2)$$

where $n_{exp} = 64$ is the number of experiments and $z_{\alpha}$ is the critical value of the standard normal distribution for each significance level $\alpha$.
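Equation 2 and the resulting decision rule can be sketched as below. Taking $z_\alpha$ as the two-sided standard normal quantile (1.96 for $\alpha = 0.05$, as in Demšar's description of the sign test) is an assumption, since the exact values used for Figure 10 are not reproduced here; the function names are hypothetical.

```python
import math

from scipy.stats import norm


def sign_test_critical_value(n_exp, alpha):
    """Equation 2: n_c = n_exp / 2 + z_alpha * sqrt(n_exp) / 2."""
    z_alpha = norm.ppf(1 - alpha / 2)  # assumption: two-sided normal quantile
    return n_exp / 2 + z_alpha * math.sqrt(n_exp) / 2


def sign_test_rejects(wins, ties, losses, alpha):
    """Reject H0 when wins + ties / 2 reaches the critical value."""
    n_exp = wins + ties + losses
    return wins + ties / 2 >= sign_test_critical_value(n_exp, alpha)


# LCA: 35 wins and 29 losses over the 64 datasets (hence no ties).
print(sign_test_critical_value(64, 0.05), sign_test_rejects(35, 0, 29, 0.05))
```

With $n_{exp} = 64$ and $\alpha = 0.05$, $n_c \approx 32 + 1.96 \cdot 4 \approx 39.84$, so the 35 wins of FLCA++ fall short of the threshold, consistent with LCA being the only technique without a significant improvement.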
Figure 10 shows that FIRE-DES++ yielded a significant performance gain over FIRE-DES according to the Sign test. For a confidence level (first 2 lines left to right), FIRE-DES++ significantly improved the performance of 7 out of 8 techniques. With a more restrictive confidence level , FIRE-DES++ still presented statistically better results for A Priori, A Posteriori, MCB, OLA, DSKNN and KNE. FIRE-DES++ did not significantly improve over FIRE-DES only for the LCA technique; even so, FLCA++ obtained more wins (35) than losses (29). Thus, we can conclude that, by adding the ENN filter and KNNE, FIRE-DES++ can significantly improve the performance of a diverse set of dynamic selection techniques.
In addition, we measured the processing time of the original FIRE-DES framework and of the proposed FIRE-DES++ framework, averaged over the 64 datasets. The average running time of FIRE-DES++ was about 10% higher than that of FIRE-DES. Therefore, FIRE-DES++ significantly improves the performance of DES techniques with only a small increase in computational time.
4.7 Comparison with state-of-the-art
In this section, we compare the results of FIRE-DES++ with the state-of-the-art dynamic ensemble selection frameworks (Table 1) as well as with static ensemble methods. The following static ensemble methods were considered: Bagging bagging:1996 , AdaBoost freund1995desicion , Random Forest breiman2001random , Extremely Randomized Forest geurts2006extremely , Gradient Boosted Trees friedman2002stochastic and Random Balance ensembles diez2015random . Each technique was evaluated with a total of 100 base classifiers. The hyper-parameters of these techniques were set to the values suggested in delgado14a .
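For readers reproducing the static baselines, scikit-learn provides counterparts for most of the ensembles cited above. The sketch below uses library defaults as placeholders rather than the hyper-parameters from delgado14a, sizes every ensemble at 100 base classifiers as in the text, and omits Random Balance, which is not part of scikit-learn. The synthetic data is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)

N_ESTIMATORS = 100  # every ensemble uses 100 base classifiers

# Default hyper-parameters are placeholders, not the values from delgado14a.
static_ensembles = {
    "Bagging": BaggingClassifier(n_estimators=N_ESTIMATORS, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=N_ESTIMATORS, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=N_ESTIMATORS, random_state=0),
    "Extremely Randomized Forest": ExtraTreesClassifier(n_estimators=N_ESTIMATORS, random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(n_estimators=N_ESTIMATORS, random_state=0),
}
# Random Balance has no scikit-learn implementation and is omitted here.

X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)
fitted = {name: model.fit(X, y) for name, model in static_ensembles.items()}
```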
For the sake of simplicity, only FKNE++ was considered in this analysis, since it performed best in the previous experiments. Table 6 presents the average AUC and rank of FKNE++, the state-of-the-art DES frameworks and the static ensemble methods. FKNE++ obtained the lowest average rank (2.84) and the second-best average AUC (85.17, vs. 85.37 for the Random Balance ensemble).
| Algorithm | Avg. Rank | Algorithm | Mean AUC |
|---|---|---|---|
| Extreme Forest | 6.68 | Extreme Forest | 78.00 |
Moreover, Figure 11 presents the results of the rank analysis using a critical difference diagram. The critical difference was computed using the Bonferroni-Dunn test with a confidence level (). FKNE++ statistically outperformed all state-of-the-art DES frameworks based on the rank analysis. Using the Wilcoxon Signed Rank Test () for a more robust pairwise analysis, we also observed that FKNE++ statistically outperformed all three state-of-the-art DES frameworks: META-DES (p-value ), META-DES.Oracle (p-value ) and RRC (p-value ). Thus, we can conclude that FIRE-DES++ presents a significant performance gain over the state-of-the-art DES frameworks on these datasets.
FKNE++ also statistically outperformed the majority of static ensemble methods, the only exception being the Random Balance technique. This could be explained by the fact that Random Balance was proposed specifically to deal with small-sized and imbalanced data diez2015random , which describes the 64 datasets in this study. Moreover, this technique achieved state-of-the-art performance on such datasets in several comparative studies diez2015diversity ; roy2018study . Hence, FKNE++ is competitive with the state-of-the-art methods for dealing with small-sized and imbalanced datasets.
In this paper, we presented two drawbacks of the Frienemy Indecision Region Dynamic Ensemble Selection (FIRE-DES) framework: (1) the noise sensitivity drawback: the classification performance of FIRE-DES is strongly affected by noise, as it mistakes noisy regions for indecision regions and applies the pre-selection of classifiers there; (2) the indecision region restriction drawback: FIRE-DES uses the region of competence to decide whether a test sample is located in an indecision region, and only pre-selects classifiers when the region of competence is composed of samples from different classes, which restricts the number of test samples for which the pre-selection is applied.
To tackle these drawbacks of FIRE-DES, we use the Edited Nearest Neighbors (ENN) enn:1972 to remove noise from the validation set (tackling the noise sensitivity drawback), and the K-Nearest Neighbors Equality (KNNE) knne:2011 to define the region of competence by selecting the nearest neighbors from each class (tackling the indecision region restriction drawback). We named this new framework FIRE-DES++.
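The two mechanisms summarized above, a class-balanced region of competence (KNNE) and frienemy-based pre-selection (DFP), can be sketched for a binary problem as follows. This is a simplified illustration, not the FIRE-DES++ implementation: it assumes Euclidean distance and k neighbours per class, and, following the discussion of Figure 1, treats a classifier as crossing the region when it correctly classifies at least one sample of each class (for two classes, any such pair is a frienemy pair). Falling back to the whole pool when no classifier qualifies is also an assumption, and all function names are hypothetical.

```python
import numpy as np


def knne_region(X_dsel, y_dsel, x_query, k):
    """Indices of the k nearest DSEL samples of *each* class (KNNE-style)."""
    dists = np.linalg.norm(X_dsel - x_query, axis=1)
    region = []
    for c in np.unique(y_dsel):
        class_idx = np.flatnonzero(y_dsel == c)
        # A stable sort keeps the lower index first on distance ties.
        nearest = class_idx[np.argsort(dists[class_idx], kind="stable")[:k]]
        region.extend(nearest.tolist())
    return np.array(region)


def dfp_preselect(correct, y_region):
    """DFP-style pruning (binary case).

    correct: boolean array (n_classifiers, n_region), True where a classifier
    labels a region sample correctly. Returns indices of the kept classifiers.
    """
    all_idx = np.arange(correct.shape[0])
    classes = np.unique(y_region)
    if len(classes) < 2:
        return all_idx  # not an indecision region: no pre-selection
    crosses = np.ones(correct.shape[0], dtype=bool)
    for c in classes:
        crosses &= correct[:, y_region == c].any(axis=1)
    # Assumption: keep the whole pool if no classifier crosses the region.
    return np.flatnonzero(crosses) if crosses.any() else all_idx
```

Because `knne_region` always returns samples from every class, the region is never single-class, which is how KNNE lifts the indecision region restriction and lets DFP act on every test sample.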
We compared FIRE-DES++ with the original DES techniques and with FIRE-DES, using 8 dynamic selection techniques over 64 datasets. The experimental results show that FIRE-DES++ significantly outperforms FIRE-DES for 7 out of 8 DES techniques. Moreover, the results also show that each individual phase of the new framework, filtering and region of competence definition, helps significantly improve the generalization performance of DES techniques.
We also compared the performance of FIRE-DES++ with the state-of-the-art DES frameworks and ensemble methods. The results showed that the proposed framework significantly outperformed all three state-of-the-art DES frameworks as well as the majority of the state-of-the-art ensemble methods. Furthermore, FIRE-DES++ is statistically equivalent to the Random Balance method, which is considered the state-of-the-art ensemble algorithm for dealing with the KEEL imbalanced datasets according to diez2015diversity .
Future work on this topic will involve extending the FIRE-DES++ framework to multi-class classification problems, evaluating different types of base classifiers as well as other ensemble generation methods in the framework, and performing a complete study of FIRE-DES++ combined with data preprocessing techniques for imbalanced classification problems.
The authors would like to thank CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, in Portuguese), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, in Portuguese) and FACEPE (Fundação de Amparo à Ciência e Tecnologia do Estado de Pernambuco, in Portuguese).
- (1) R. M. Cruz, R. Sabourin, G. D. Cavalcanti, Dynamic classifier selection: Recent advances and perspectives, Information Fusion 41 (2018) 195–216.
- (2) R. M. Cruz, R. Sabourin, G. D. Cavalcanti, Prototype selection for dynamic classifier and ensemble selection, Neural Computing and Applications (2016) 1–11.
- (3) A. S. Britto, R. Sabourin, L. E. Oliveira, Dynamic selection of classifiers—a comprehensive review, Pattern Recognition 47 (11) (2014) 3665–3680.
- (4) D. V. Oliveira, G. D. Cavalcanti, R. Sabourin, Online pruning of base classifiers for dynamic ensemble selection, Pattern Recognition 72 (2017) 44–58.
- (5) B. Sierra, E. Lazkano, I. Irigoien, E. Jauregi, I. Mendialdua, K-nearest neighbor equality: Giving equal chance to all existing classes, Information Sciences 181 (23) (2011) 5158–5168.
- (6) J. Alcalá, A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, F. Herrera, KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework, Journal of Multiple-Valued Logic and Soft Computing 17 (2-3) (2010) 255–287.
- (7) K. Woods, W. P. Kegelmeyer, K. W. Bowyer, Combination of multiple classifiers using local accuracy estimates, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (4) (1997) 405–410.
- (8) G. Giacinto, F. Roli, Methods for dynamic classifier selection, in: International Conference on Image Analysis and Processing, IEEE, 1999, pp. 659–664.
- (9) G. Giacinto, F. Roli, Dynamic classifier selection based on multiple classifier behaviour, Pattern Recognition 34 (9) (2001) 1879–1881.
- (10) A. Santana, R. G. Soares, A. M. Canuto, M. C. de Souto, A dynamic classifier selection method to build ensembles using accuracy and diversity, in: Brazilian Symposium on Neural Networks, IEEE, 2006, pp. 36–41.
- (11) A. H. Ko, R. Sabourin, A. S. Britto Jr, From dynamic classifier selection to dynamic ensemble selection, Pattern Recognition 41 (5) (2008) 1718–1731.
- (12) T. Woloszynski, M. Kurzynski, A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognition 44 (10) (2011) 2656–2668.
- (13) R. M. Cruz, R. Sabourin, G. D. Cavalcanti, T. I. Ren, META-DES: A dynamic ensemble selection framework using meta-learning, Pattern Recognition 48 (5) (2015) 1925–1935.
- (14) R. M. Cruz, R. Sabourin, G. D. Cavalcanti, META-DES.Oracle: Meta-learning and feature selection for dynamic ensemble selection, Information Fusion 38 (2017) 84–103.
- (15) R. M. Cruz, G. D. Cavalcanti, T. I. Ren, A method for dynamic ensemble selection based on a filter and an adaptive distance to improve the quality of the regions of competence, in: International Joint Conference on Neural Networks, IEEE, 2011, pp. 1126–1133.
- (16) S. Garcia, J. Derrac, J. Cano, F. Herrera, Prototype selection for nearest neighbor classification: Taxonomy and empirical study, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (3) (2012) 417–435.
- (17) L. Breiman, Bagging predictors, Machine learning 24 (2) (1996) 123–140.
- (18) M. Skurichina, R. P. Duin, Bagging for linear classifiers, Pattern Recognition 31 (7) (1998) 909–930.
- (19) J. S. Sánchez, F. Pla, F. J. Ferri, Prototype selection for the nearest neighbour rule through proximity graphs, Pattern Recognition Letters 18 (6) (1997) 507–513.
- (20) D. L. Wilson, Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man and Cybernetics (3) (1972) 408–421.
- (21) R. M. Cruz, R. Sabourin, G. D. Cavalcanti, Analyzing different prototype selection techniques for dynamic classifier and ensemble selection, in: International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 3959–3966.
- (22) A. Roy, R. M. Cruz, R. Sabourin, G. D. Cavalcanti, A study on combining dynamic selection and data preprocessing for imbalance learning, Neurocomputing 286 (2018) 179–192.
- (23) J. Laurikkala, Improving identification of difficult small classes by balancing class distribution, Artificial Intelligence in Medicine (2001) 63–66.
- (24) G. Lemaître, F. Nogueira, C. K. Aridas, Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning, Journal of Machine Learning Research 18 (17) (2017) 1–5.
- (25) L. I. Kuncheva, Combining pattern classifiers: methods and algorithms, John Wiley & Sons, 2004.
- (26) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in python, Journal of machine learning research 12 (Oct) (2011) 2825–2830.
- (27) R. M. Cruz, L. G. Hafemann, R. Sabourin, G. D. Cavalcanti, Deslib: A dynamic ensemble selection library in python, arXiv:1802.04967.
- (28) P. Branco, L. Torgo, R. P. Ribeiro, A survey of predictive modeling on imbalanced domains, ACM Computing Surveys (CSUR) 49 (2) (2016) 31.
- (29) J. F. Díez-Pastor, J. J. Rodríguez, C. I. García-Osorio, L. I. Kuncheva, Diversity techniques improve the performance of the best imbalance learning ensembles, Information Sciences 325 (2015) 98–117.
- (30) C. Dietrich, G. Palm, F. Schwenker, Decision templates for the classification of bioacoustic time series, Information Fusion 4 (2) (2003) 101–109.
- (31) J. Platt, et al., Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, Advances in large margin classifiers 10 (3) (1999) 61–74.
- (32) A. P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition 30 (7) (1997) 1145–1159.
- (33) V. López, A. Fernández, S. García, V. Palade, F. Herrera, An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics, Information Sciences 250 (2013) 113–141.
- (34) F. Wilcoxon, Individual comparisons by ranking methods, Biometrics bulletin 1 (6) (1945) 80–83.
- (35) D. J. Sheskin, Handbook of parametric and nonparametric statistical procedures, crc Press, 2003.
- (36) J. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (1) (2006) 1–30.
- (37) A. Benavoli, G. Corani, F. Mangili, Should we really use post-hoc tests based on mean-ranks, Journal of Machine Learning Research 17 (5) (2016) 1–10.
- (38) Y. Freund, R. E. Schapire, A desicion-theoretic generalization of on-line learning and an application to boosting, in: European conference on computational learning theory, Springer, 1995, pp. 23–37.
- (39) L. Breiman, Random forests, Machine learning 45 (1) (2001) 5–32.
- (40) P. Geurts, D. Ernst, L. Wehenkel, Extremely randomized trees, Machine learning 63 (1) (2006) 3–42.
- (41) J. H. Friedman, Stochastic gradient boosting, Computational Statistics & Data Analysis 38 (4) (2002) 367–378.
- (42) J. F. Díez-Pastor, J. J. Rodríguez, C. García-Osorio, L. I. Kuncheva, Random balance: ensembles of variable priors classifiers for imbalanced data, Knowledge-Based Systems 85 (2015) 96–111.
- (43) M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems?, Journal of Machine Learning Research 15 (2014) 3133–3181.