1 Introduction
Classification is a fundamental data analysis procedure that is ubiquitously used across different fields. Thousands of classification algorithms (classifiers) have been developed during the past decades [1]. These classifiers range from simple models such as k-nearest neighbors (kNN) [2] to more sophisticated models such as the support vector machine (SVM) [3] and random forests (RF) [4]. Despite the advances in the development of new classifiers, no single classification algorithm can always achieve the best performance on all data sets [1]. This indicates that different classifiers are complementary to each other in different contexts. Therefore, it is still necessary to develop new and alternative classifiers based on principles that remain unexplored.
The motivation behind this research is based on the following observations. First, existing non-lazy classifiers typically formulate the classification problem as an optimization problem. Such optimization-based learning strategies can always generate the target classifiers, regardless of the statistical significance of the learned models. Second, classifiers such as logistic regression are able to provide probability values for categorizing an unknown test instance. However, it is not an easy task to determine a universal probability threshold that ensures the classification of the test instance into the corresponding class is statistically significant. Last but not least, existing classifiers cannot control the number of misclassified test instances in terms of metrics such as the false discovery rate (FDR). Such a capability is quite important in biological data analysis, where the prediction results will be further validated by wet-lab experiments that can be costly and time-consuming
[5]. Thus, we need to add some notion of statistical significance to classifiers. In fact, the classification problem has already been formulated as a hypothesis testing issue in [6]. More recently, several research efforts [7], [8] have further extended the initial formulation in [6] from different aspects. However, the following observations motivate this research. First of all, existing testing-based classification methods suffer from certain theoretical drawbacks, as discussed and summarized in Section 2. Second, only simulated data sets and several small real data sets have been empirically tested, making it difficult to establish the practical usefulness of the testing-based formulation. Third, the connection between this new formulation and existing classification methods has never been discussed. Finally, the potential benefit of the testing-based classification model remains unexplored.
Based on the above observations, we present a new testing-based classification formulation, in which the null hypothesis is that, informally, the test instance does not belong to any class. To precisely define the null hypothesis, we focus on the classification problem in a two-class setting. First, we calculate the distance between the test instance and each training instance in the training data set. In this way, we generate two sets of distances for each test instance that needs to be classified. Then, the hypothesis testing issue can be cast as a two-sample testing problem [9], in which each sample corresponds to a set of distances. In this formulation, the null hypothesis is that the two sets of distances are drawn from the same cumulative distribution.
Two-sample testing is a fundamental problem in statistics. We employ the classical Wilcoxon-Mann-Whitney (WMW) test for quantifying the statistical significance in terms of p-values. To alleviate the effect of outlying and irrelevant training instances, we further apply the WMW test to two distance sets that are generated from the kNNs of the test instance.
The testing-based classification formulation has several salient features. First of all, it provides p-values for each test instance to quantify the statistical significance of classifying this instance into certain classes. Accordingly, we can detect outlying test instances that do not belong to any class when the p-values with respect to all classes are larger than the significance level threshold. Second, we can control the FDR of test instances that are assigned to each class based on their p-values.
We evaluate our method on forty data sets from the UCI repository [10] and the KEEL-dataset repository [11] with respect to the standard classification task. The experimental results show that our method is able to achieve the same level of performance as state-of-the-art classifiers. Meanwhile, it can handle outlying test instances and control the FDR of test instances assigned to each class in a natural manner.
The main contributions of this paper can be summarized as follows.
(1) The binary classification issue is formulated as a two-sample testing problem. Since two-sample testing is a fundamental problem in statistics and many well-known tests are available in the literature, it can be expected that many effective testing-based classifiers will be introduced in the near future.
(2) A classification model that integrates hypothesis testing and the kNN method is presented. This formulation alleviates the effect of outlying and irrelevant training instances and improves the classification accuracy significantly.
(3) A comprehensive performance comparison over 40 real data sets is conducted. The experimental results demonstrate that the testing-based classifier is able to achieve the same level of performance as standard classifiers such as SVM and decision tree.
(4) Some interesting connections between our testing-based classifiers and existing classification methods are presented.
(5) The advantage of the testing-based classification model in handling outliers and controlling the Type I error rate in terms of FDR is empirically investigated.
The rest of this paper is organized as follows. Section 2 discusses previous work related to our method. Section 3 presents the details of our method. Section 4 reports experimental results on the 40 real data sets. Section 5 discusses the relationship between our method and other approaches. Finally, Section 6 concludes this paper.
2 Related Work
2.1 Instance-based learning
Instance-based learning is a lazy learning scheme in which the training instances are simply stored. When a new instance is encountered, a set of similar training instances is retrieved and used to classify the unknown test instance. The most basic instance-based method is the k-nearest neighbor (kNN) algorithm [2], [12], which assigns a new instance to the most common class among its kNNs in the training instances.
Essentially, our method can be considered an instance-based learning approach since the two-sample test is conducted on the distance sets generated from all training instances or from kNNs. This indicates that it is feasible to apply techniques developed for instance-based learning during the past decades (e.g., [13], [14], [15]) to further improve our method.
2.2 Classification based on hypothesis testing
Liao & Akritas [6] introduce a classification method based on hypothesis testing, which is abbreviated as TBC. Suppose there are two classes (positive vs. negative) in the training set, i.e., a binary classification problem; the issue is to allocate a new instance $x$ to one of the two classes. The basic idea of TBC is that, if $x$ is placed into the wrong class, then the difference between the two samples will be blurred. To implement this idea, two tests of the equality of the means of the two samples are conducted, in which $x$ is placed into the set of positive instances and the set of negative instances, respectively. Accordingly, we obtain two p-values $p^+$ and $p^-$, where $p^+$ ($p^-$) is generated from the test in which $x$ is assumed to belong to the positive (negative) class. If $p^+ < p^-$, then $x$ is classified as a positive instance; otherwise, $x$ is classified as a negative instance. This method works well when the theoretical p-values can be computed and compared. However, TBC has two problems. First, when the number of features of the data set is larger than the sample size of one class, the p-values cannot be computed at all because of the singularity of the sample covariance matrix. Second, when the instances from the two classes are well separated, the p-values will equal zero.
Ghimire & Wang [7] improve the TBC method by introducing a minimum distance into the method, yielding a new classifier for image pixels. Their method works well in the context of image pixel classification.
Modarres [16], [17], [18] studies the properties of squared Euclidean interpoint distances (IPDs) between samples taken from multivariate Bernoulli, multivariate Poisson and multinomial distributions. He also discusses some applications based on IPDs within one sample and across two samples under different distributions.
Afterwards, Guo & Modarres [8] develop a classification method based on hypothesis testing, which is abbreviated as IDC. It is capable of classifying high-dimensional instances by employing testing methods based on the IPDs between instances. Several different test statistics based on IPDs are discussed in [8]; here we take the Baringhaus and Franz (BF) statistic as an example. Given two sets of training instances, i.e., one positive set $T^+$ and one negative set $T^-$, IDC first computes the average IPDs within $T^+$, within $T^-$ and between $T^+$ and $T^-$, which are denoted by $d^{++}$, $d^{--}$ and $d^{+-}$, respectively. Then, it calculates the BF statistic from these average IPDs. Similarly, $BF^+$ and $BF^-$ can be obtained by placing the test instance $x$ into $T^+$ and $T^-$, respectively. Note that $\Delta^+ = BF^+ - BF$ ($\Delta^- = BF^- - BF$) can be used to measure the change in the value of BF when $x$ is assigned to $T^+$ ($T^-$). Therefore, if $\Delta^+ > \Delta^-$, i.e., placing $x$ into $T^+$ blurs the between-sample difference less, $x$ is classified as a positive instance; otherwise, $x$ will be labelled as a negative instance.

2.3 Asymmetric classification error control
In binary classification, most classifiers are constructed to minimize the overall classification error, which is a weighted sum of the type I error (misclassifying a negative instance as a positive one) and the type II error (misclassifying a positive instance as a negative one). However, in many realistic applications, the two types of errors are asymmetric: they have different costs and need to be treated with different weights.
The cost-sensitive classification (CSC) method [19], [20] can solve this problem to some extent. It takes the misclassification costs into consideration and aims to minimize the total cost of both errors. Another method is Neyman-Pearson (NP) classification [21], which is inspired by classical NP hypothesis testing. It is a statistical framework for handling asymmetric type I/II error priorities that seeks a classifier minimizing the type II error while keeping the type I error below a user-specified level [22], [23]. CSC and NP classification are fundamentally different approaches with their own pros and cons [21]. A main advantage of NP classification is that it is a general framework that allows users to control the type I classification error under a user-specified level with high probability.
In our formulation, it is very easy to control the type I error in terms of FDR since the p-values of each test instance with respect to different classes are generated in the classification phase. In other words, the testing-based classification formulation provides a unified framework for controlling asymmetric classification errors in a natural way.
3 Methods
3.1 Two-sample testing
Given two independent random samples $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$, where $X$ is drawn from the population with distribution $F$ and $Y$ is drawn from the population with distribution $G$, the general two-sample testing problem is concerned with the null hypothesis that the two samples are drawn from identical populations [9]:

$$H_0: F(t) = G(t) \quad \text{for all } t,$$

where $F$ and $G$ are the cumulative distribution functions of the $X$ population and the $Y$ population, respectively.

3.2 Problem formulation
We consider the binary classification problem, in which the training set is composed of two disjoint sets $T^+$ and $T^-$, called the positive training set and the negative training set, respectively. Given a test instance $x$, the classification task is to decide its class label (positive vs. negative).
We formulate the binary classification problem as a two-sample testing problem. In this formulation, the first sample $S^+ = \{d(x, t) \mid t \in T^+\}$ is a set of $n^+$ observations, where each observation is the distance between the test instance $x$ and one training instance in $T^+$. Similarly, the second sample $S^- = \{d(x, t) \mid t \in T^-\}$ collects the $n^-$ distances between $x$ and the training instances in $T^-$.
To conduct the standard classification task, we test the null hypothesis $H_0: F^+ = F^-$ against the two alternative hypotheses $H_1^+: F^+ > F^-$ (the distances to $T^+$ tend to be smaller) and $H_1^-: F^+ < F^-$ to obtain two one-sided p-values $p^+$ and $p^-$. If $p^+ < p^-$, we label $x$ as a positive instance; otherwise, we classify $x$ as a negative instance.
To handle the multi-class classification problem with $Q$ classes ($Q > 2$), we can adopt the one-vs-rest strategy by regarding the set of instances from one class as the positive training set and the set of instances from the remaining classes as the negative training set. For each of the $Q$ binary classification problems, we first conduct the two-sample test to generate a one-sided p-value for the corresponding class. Then, we assign the test instance to the class that has the smallest p-value.
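To make the formulation concrete, the following minimal Python sketch classifies one test instance in the binary setting, assuming Euclidean distances and SciPy's implementation of the WMW test introduced in Section 3.4; the function name and array layout are illustrative, not prescribed by the paper.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def ibt_u_predict(x, T_pos, T_neg):
    """Classify a test instance via a WMW test on two distance sets.

    T_pos and T_neg are 2-D arrays of training instances (one row per
    instance); Euclidean distance is assumed for illustration.
    """
    S_pos = np.linalg.norm(T_pos - x, axis=1)  # distances to positive instances
    S_neg = np.linalg.norm(T_neg - x, axis=1)  # distances to negative instances
    # One-sided p-values: H1+ says distances to T+ are stochastically smaller.
    p_pos = mannwhitneyu(S_pos, S_neg, alternative='less').pvalue
    p_neg = mannwhitneyu(S_pos, S_neg, alternative='greater').pvalue
    return ('positive', p_pos) if p_pos < p_neg else ('negative', p_neg)
```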
3.3 kNN variants
In the above problem formulation, the distances to all training instances are utilized in the hypothesis test. However, the existence of outlying and irrelevant training instances may decrease the classification accuracy. To alleviate this issue, we can conduct the hypothesis test on two samples that are derived from the kNNs of the test instance.
Under this scheme, two natural kNN variants can be formulated. Similar to the kNN classifier, the first variant directly takes the kNNs of the test instance to generate the two samples: the distances from the test instance to these $k$ nearest training instances are divided into two groups according to the class label, where each group corresponds to one sample in our scenario. The second variant takes the $k$ nearest instances from $T^+$ and the $k$ nearest instances from $T^-$ to generate two distance sets of equal size, where $k \le \min(n^+, n^-)$. The rationale behind the second variant is that, if the null hypothesis is true, then the number of kNNs from each class is proportional to the number of training instances in that class. Since the expected numbers of kNNs from the two classes are equal when $n^+ = n^-$, we can take the same number of kNNs from each class in this case.
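The two variants differ only in how the distance samples are formed, as the following sketch illustrates (names are illustrative; note that with variant 'D' one of the two groups may be empty when all kNNs share a class).

```python
import numpy as np

def knn_distance_sets(x, T_pos, T_neg, k, variant='D'):
    """Build the two distance samples from kNNs of x (illustrative sketch).

    variant 'D': take the k nearest training instances overall, then split
                 the distances by class label.
    variant 'S': take the k nearest instances from each class separately.
    """
    d_pos = np.sort(np.linalg.norm(T_pos - x, axis=1))
    d_neg = np.sort(np.linalg.norm(T_neg - x, axis=1))
    if variant == 'S':
        return d_pos[:k], d_neg[:k]
    merged = np.concatenate([d_pos, d_neg])
    is_pos = np.concatenate([np.ones(len(d_pos), bool), np.zeros(len(d_neg), bool)])
    nearest = np.argsort(merged)[:k]           # indices of the k smallest distances
    return merged[nearest][is_pos[nearest]], merged[nearest][~is_pos[nearest]]
```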
3.4 The choice of testing methods
The testing method for two-sample differences has been extensively investigated in the literature. One widely used test for this issue is the WMW test, which is also called the Mann-Whitney U test or the Wilcoxon rank-sum test [24]. To obtain the test statistic of the WMW test, $S^+$ and $S^-$ are merged to form a combined sample $S$ of size $N = n^+ + n^-$. Then, the observations in $S$ are ordered from smallest to largest. According to the ordered list, $R_i$ is defined as the rank of the $i$th observation of $S^+$ in $S$, and the rank-sum statistic is $W = \sum_{i=1}^{n^+} R_i$. If the null hypothesis is true, then $W$ is approximately normally distributed,

$$W \sim N(\mu_W, \sigma_W^2),$$

where

$$\mu_W = \frac{n^+(N+1)}{2}, \qquad \sigma_W^2 = \frac{n^+ n^- (N+1)}{12}.$$

Based on the above normal approximation, we can calculate the one-sided p-value to test $H_0$ against $H_1^+$ ($H_1^-$).
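For reference, the normal approximation above can be computed directly, as in the following sketch (S_pos and S_neg are the distance samples defined in Section 3.2; ties are not corrected for here).

```python
import numpy as np
from scipy.stats import norm, rankdata

def wmw_one_sided_pvalue(S_pos, S_neg):
    """One-sided WMW p-value via the normal approximation above,
    testing H1+: the S_pos values are stochastically smaller."""
    n_pos, n_neg = len(S_pos), len(S_neg)
    N = n_pos + n_neg
    ranks = rankdata(np.concatenate([S_pos, S_neg]))  # ranks 1..N in the combined sample
    W = ranks[:n_pos].sum()                           # rank sum of the first sample
    mu = n_pos * (N + 1) / 2.0
    sigma = np.sqrt(n_pos * n_neg * (N + 1) / 12.0)
    return norm.cdf((W - mu) / sigma)                 # small W -> small p-value for H1+
```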
In our classification model, the choice of testing method is very flexible since the samples to be tested are univariate. That is, we can use any univariate two-sample test in our classifier. Therefore, we can also employ testing methods such as the pooled t-test, the two-sample Kolmogorov-Smirnov test [25] and the precedence test instead of the WMW test. In Section 5, we will further show that the use of different testing methods establishes connections between our formulation and existing classification models.
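For instance, with SciPy the tests can be swapped behind a common interface, as in this illustrative snippet; each entry returns the one-sided p-value for "the first sample is stochastically smaller" (note that the KS test's `alternative` argument refers to the CDF, hence 'greater').

```python
from scipy import stats

# Interchangeable univariate two-sample tests on the distance samples.
tests = {
    'WMW': lambda a, b: stats.mannwhitneyu(a, b, alternative='less').pvalue,
    't':   lambda a, b: stats.ttest_ind(a, b, alternative='less').pvalue,
    'KS':  lambda a, b: stats.ks_2samp(a, b, alternative='greater').pvalue,
}
```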
3.5 Handling outliers and FDR control
As we have argued, the testing-based classification model has the advantage of controlling the FDR of classified test instances and handling outlying instances within the same framework. In general, we assign the test instance to the class that has the smallest p-value among the $Q$ p-values, where $Q$ is the number of classes. However, it is inappropriate to do so when none of the $Q$ p-values is significant. Fortunately, we can use FDR control [26] to tackle this problem. We obtain $Q$ sets of p-values from all test instances because our method returns $Q$ p-values for every test instance. Each p-value set is first sorted in non-descending order: $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$, where $m$ is the number of test instances. Given a significance level $\alpha$, let $j$ be the largest index for which

$$p_{(j)} \le \frac{j}{m}\alpha.$$

If $p_{(i)} \le p_{(j)}$ (i.e., $i \le j$), then the corresponding test instance is assigned to the current class. After conducting FDR control on all $Q$ p-value sets, we label the test instances that are not classified into any class as outliers.
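A sketch of this selection step for one class, assuming a NumPy array of p-values (the function name bh_select is ours):

```python
import numpy as np

def bh_select(pvals, alpha=0.05):
    """Benjamini-Hochberg selection: boolean mask of p-values accepted at
    FDR level alpha (a sketch of the control step described above)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        j = np.nonzero(passed)[0].max()   # largest j with p_(j) <= j*alpha/m
        keep[order[:j + 1]] = True        # accept all p-values up to p_(j)
    return keep
```

Applying this selection to each of the $Q$ p-value sets assigns test instances to classes; instances not accepted in any of the $Q$ sets are labelled as outliers.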
4 Experiments
4.1 Data sets and experimental settings
We have conducted experiments on 40 data sets from the UCI repository [10] and the KEEL-dataset repository [11]. Among these data sets, the number of instances ranges from 80 to 10092 and the number of features varies from 2 to 90. Most data sets have fewer than 10 classes and only six of them have more than 10 classes. The detailed characteristics of these data sets are given in Appendix A. Moreover, instances with missing values are discarded and the numeric feature values are normalized into the interval [0, 1] during preprocessing.
In each experiment, we perform 10-fold cross-validation (CV) and count the number of correctly classified instances to compute a classification accuracy value. For every data set, we repeat the 10-fold CV experiment 10 times and report the average and standard deviation of the 10 accuracy values as the final results.
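This protocol can be sketched as follows (a minimal Python sketch; stratified folds and the generic classify(X_tr, y_tr, X_te) function standing in for any of the compared classifiers are our assumptions, not stated in the paper).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def repeated_cv_accuracy(X, y, classify, reps=10, folds=10, seed=0):
    """10 repetitions of 10-fold CV; returns mean and std of the accuracies."""
    accs = []
    for r in range(reps):
        skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed + r)
        correct = 0
        for tr, te in skf.split(X, y):
            correct += int(np.sum(classify(X[tr], y[tr], X[te]) == y[te]))
        accs.append(correct / len(y))
    return float(np.mean(accs)), float(np.std(accs))
```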
Table 1: Average classification accuracies of IBT-U and its two kNN variants (k = 3 for the kNN variants).

Methods     Avg accuracy
IBT-U       0.6795
IBT-U-KD    0.8027
IBT-U-KS    0.7906
Table 2: Average classification accuracies of the two kNN variants for different values of k.

Methods     k=3     k=5     k=7     k=9
IBT-U-KD    0.8027  0.7835  0.7677  0.7547
IBT-U-KS    0.7906  0.7829  0.7742  0.7703
4.2 All instances vs. kNNs
In the first experiment, we compare several variants of our formulation to check which one is better in practice. Since our method is a classifier that combines instance-based learning and hypothesis testing, we use the abbreviation IBT to denote this classification model. To distinguish different variants, IBT-U denotes the classification model in which the Mann-Whitney U test is applied to the distance sets derived from all training instances. Similarly, IBT-U-K denotes the classification model in which the distance sets are generated from the kNNs of the test instance. Furthermore, the two kNN variants are denoted by IBT-U-KD (kNNs are obtained Directly without considering the class label) and IBT-U-KS (kNNs are obtained Separately from different classes), respectively.
Additionally, the parameter k for the two kNN variants is set to 3, 5, 7 and 9, respectively. The detailed experimental results for these three variants are given in Appendices B, C and D, and their average accuracies are summarized in Table 1 and Table 2.
As shown in Table 1, the performance of IBT-U is much worse than that of the two kNN variants. This indicates that it is beneficial to exploit the kNN strategy in the testing-based classification model. As shown in Table 2, the average classification accuracies of the two kNN variants are quite similar when k is varied from 3 to 9. In the following sections, we use IBT-U-KD (k = 3) as the representative of our classifiers in the performance comparisons.
4.3 Our method vs. Other testing-based classifiers
In the second experiment, we compare our method with two previous methods, TBC [6] and IDC [8], which also use hypothesis testing to solve the classification problem. The detailed experimental results are given in Appendix E and the average accuracies are presented in Table 3.
In the implementation of TBC, we employ Hotelling's $T^2$ test as the testing method, as was done in [6], and we use the Hotelling's $T^2$ statistics instead of p-values in the classification since the generated p-values are often zero. In the implementation of IDC, we use the Baringhaus and Franz (BF) statistic as the test statistic and assume equal prior probabilities in spite of unequal sample sizes.
For TBC, the classification accuracies on five data sets (Cleveland, Dermatology, Hepatitis, Movement_libras and Winequalityred) are 0 because the number of features of these data sets is larger than the sample size of one class, so we only use the remaining 35 data sets to compute its average classification accuracy. IDC can be applied to all data sets, so we simply compute the average of its 40 accuracy values. According to the comparison results, our method performs significantly better than both TBC and IDC.
Among the three methods, our method achieves the best performance for the following reasons. First, our method only considers the kNNs of the test instance while TBC and IDC utilize all training instances without considering the existence of outlying and irrelevant ones. Second, our method employs a hypothesis testing strategy that is totally different from those used in TBC and IDC.
Table 3: Average classification accuracies of TBC, IDC and our method.

Methods     Avg accuracy
TBC         0.5901
IDC         0.6859
Our method  0.8027
Table 4: Average classification accuracies of kNN, SVM, DT and our method.

Methods     Avg accuracy
kNN         0.8058
SVM         0.7928
DT          0.8003
Our method  0.8027
4.4 Our method vs. Classic classifiers
In the third experiment, we compare our method with three classic classifiers: kNN, support vector machine (SVM) and decision tree (DT). The detailed experimental results are given in Appendices F and G and the average accuracies are presented in Table 4.
For SVM, kNN and DT, we use the functions fitcecoc, fitcknn and fitctree with their default parameter settings in Matlab 2018b, respectively. The fitcecoc function is used because it can generate a multi-class model for SVM.
As shown in Table 4, our method achieves the same level of performance as these classic classifiers. Concretely, among the 40 data sets, there are 13, 19 and 18 data sets on which our method produces higher classification accuracies than kNN, SVM and DT, respectively. In a word, our method is competitive with these classic classifiers with respect to overall performance.
4.5 Handling outliers through FDR control
In the last experiment, we investigate the potential of our method for outlier detection and FDR control. The Balance data set from UCI is used as an example; it has 625 instances and three classes (L, B and R), with 288, 49 and 288 instances respectively, as shown in Table 5. If we take a subset of the 576 (288 + 288) instances from classes L and R as training instances and use the 49 instances from class B as test instances, then all test instances should obviously be considered outliers.

We randomly take 80 percent of the instances from classes L and R to compose the training set. In order to obtain the average performance, 10 different random training sets are generated. We use IBT-U as the classifier and the significance level for FDR control is set to 0.05. The experimental results show that, on average, 48 of the 49 test instances are labelled as outliers. Specifically, there are at most 2 test instances that cannot be labelled as outliers, and they usually differ as the training set changes. Therefore, our method is able to recognize outliers and control the FDR of the classification results at the same time.
5 Relationship to Other Approaches
Our classification method is a two-phase approach: two distance sets are first generated and then a two-sample test is conducted. As we have discussed, different significance tests may be used in the second phase. In this section, we show that the use of different testing methods leads to different classifiers that have close relationships with existing classification models.
5.1 Connection to Nearest Centroid Classifier
The nearest centroid (mean) classifier is one of the most widely used instance-based classification models [27]. In the training phase, only the centroid of each class is calculated and stored. In the classification phase, the distance between an unknown instance and each centroid is calculated to find the nearest centroid. Then, the test instance is assigned to the class of its nearest centroid.
If the pooled t-test is employed as the significance testing procedure in our model, then we can reveal some interesting connections between our method and the nearest centroid classifier. To simplify the analysis, we first consider the scenario of a univariate data set and then discuss the case of a multivariate data set.
Given two one-dimensional sets $T^+ = \{t_1^+, \ldots, t_{n^+}^+\}$ and $T^- = \{t_1^-, \ldots, t_{n^-}^-\}$, their centroids (means) can be easily computed as $\mu^+ = \frac{1}{n^+}\sum_{i=1}^{n^+} t_i^+$ and $\mu^- = \frac{1}{n^-}\sum_{i=1}^{n^-} t_i^-$. Given an unknown instance $x$, the distances between $x$ and these two centroids are $|x - \mu^+|$ and $|x - \mu^-|$. The nearest centroid classification method assigns $x$ to the positive or the negative class according to whether $|x - \mu^+| < |x - \mu^-|$.
In our method, two samples $S^+ = \{|x - t_i^+|\}_{i=1}^{n^+}$ and $S^- = \{|x - t_i^-|\}_{i=1}^{n^-}$ are obtained and their means are denoted by $\bar{d}^+$ and $\bar{d}^-$. Then, we test the null hypothesis $H_0$ against the two alternative hypotheses $H_1^+$ and $H_1^-$ on the two samples to obtain two one-sided p-values $p^+$ and $p^-$. Finally, our method assigns $x$ to the positive (negative) class if $p^+ < p^-$ ($p^- < p^+$).
Note that when the pooled t-test is employed in our method, the two one-sided p-values are computed from t statistics of the form

$$t = \frac{\bar{d}^+ - \bar{d}^-}{s_p\sqrt{1/n^+ + 1/n^-}},$$

where $s_p$ is the pooled standard deviation. Since both one-sided tests share the same statistic and differ only in the tail used, we get $p^+ < p^-$ if and only if $\bar{d}^+ < \bar{d}^-$. Therefore, our method assigns $x$ to the positive class if $\bar{d}^+ < \bar{d}^-$; otherwise, we label $x$ as a negative instance.
According to the triangle inequality, we have

$$\bar{d}^+ = \frac{1}{n^+}\sum_{i=1}^{n^+} |x - t_i^+| \ge \left|x - \frac{1}{n^+}\sum_{i=1}^{n^+} t_i^+\right| = |x - \mu^+|,$$

in which the equality holds if and only if $x \le \min_i t_i^+$ or $x \ge \max_i t_i^+$. Similarly, we have $\bar{d}^- \ge |x - \mu^-|$, in which the equality holds if and only if $x \le \min_i t_i^-$ or $x \ge \max_i t_i^-$.
When $\bar{d}^+ = |x - \mu^+|$ and $\bar{d}^- = |x - \mu^-|$, our method assigns the test instance to the same class as the nearest centroid classification method. Obviously, the above analysis establishes the equivalence between our method and the nearest centroid classifier only under very strict constraints: (1) a one-dimensional data set, and (2) the test instance is no less (no more) than all training instances in each class.
For the multivariate case, it is very difficult to analyze the relationship in a quantitative manner. One naive connection is that if each class contains only a single training instance, then the mean distance coincides with the distance to the centroid, so our method and the nearest centroid classification method will produce the same classification result.
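As a quick numerical illustration of the univariate equivalence under the strict conditions above (synthetic data; not taken from the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T_pos = rng.uniform(0.0, 1.0, size=20)   # one-dimensional positive class
T_neg = rng.uniform(2.0, 3.0, size=25)   # one-dimensional negative class
x = 4.0                                   # lies beyond all training instances

# Nearest centroid decision
nc_pos = abs(x - T_pos.mean()) < abs(x - T_neg.mean())

# Testing-based decision with the pooled t-test on the distance sets
S_pos, S_neg = np.abs(x - T_pos), np.abs(x - T_neg)
p_pos = stats.ttest_ind(S_pos, S_neg, alternative='less').pvalue
p_neg = stats.ttest_ind(S_pos, S_neg, alternative='greater').pvalue
tb_pos = p_pos < p_neg

print(nc_pos, tb_pos)   # both False: x is assigned to the negative class
```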
5.2 Connection to kNN Classifier
The kNN classifier is one of the most popular classification methods in the literature [28]. In our formulation, if the precedence test [9] is employed as the significance testing method, then we can uncover some interesting connections between our method and the kNN classifier.
We still consider the binary classification problem in which the training data is composed of $n^+$ positive instances from $T^+$ and $n^-$ negative instances from $T^-$. Given an unknown instance $x$, the kNN classification method finds its k nearest neighbors (kNNs) to conduct the classification. These kNNs can be divided into two groups: $k^+$ positive instances from $T^+$ and $k^-$ instances from $T^-$, where $k^+ + k^- = k$. If $k^+ > k^-$, then $x$ will be classified as a positive instance. Otherwise, $x$ is assigned to the negative class.
The precedence test is a two-sample test based on the order of early failures [29]. Given two independent samples $X$ and $Y$, let $x_{(1)} \le x_{(2)} \le \cdots$ and $y_{(1)} \le y_{(2)} \le \cdots$ denote their order statistics. The precedence test is based on the number of observations from one sample that exceed (precede) some threshold specified by the other sample. More precisely, the test statistic $W_1$ is the number of observations in $Y$ that precede the $r$th order statistic $x_{(r)}$ of $X$. Alternatively, one can use the number of observations in $X$ that precede the $s$th order statistic $y_{(s)}$ of $Y$ as the test statistic $W_2$. Large values of these test statistics lead to the rejection of the null hypothesis that the two distributions are equal.
In our problem formulation, $S^+$ ($S^-$) is the set of distances between $x$ and the instances in $T^+$ ($T^-$), and the $k$ smallest values in $S^+ \cup S^-$ are the distances between $x$ and its kNNs. If we use the precedence test as the significance testing method and suppose that the kNNs contribute $k^+$ values to $S^+$ and $k^-$ values to $S^-$, we can set $r = k^+$ to obtain the corresponding test statistic for testing the null hypothesis $H_0$ against the alternative hypothesis $H_1^+$. Alternatively, if we let $s = k^-$, we can obtain another test statistic for testing $H_0$ against the alternative hypothesis $H_1^-$. In this way, we also get two p-values, $p^+$ and $p^-$. Finally, $x$ is assigned to the positive (negative) class if the former (latter) is smaller.
If we further assume that the positive training set and the negative training set have the same size, i.e., $n^+ = n^-$, then the two p-values are totally determined by the two test statistics: $p^+ < p^-$ if and only if $k^+ > k^-$. Therefore, our method and the kNN classifier will generate the same classification result under the above assumptions. From this perspective, we may regard our method equipped with the precedence test as a generalized "statistical" kNN classifier.
6 Conclusion
Due to the importance of the classification problem, many effective classification algorithms have been proposed by different communities. However, most work on classification does not address the issue of statistical significance. In this direction, several initial research efforts have investigated the feasibility of constructing a classifier through significance testing. Unfortunately, this interesting idea has not received much attention during the past 10 years, mainly for the following reasons: (1) no existing testing-based classifier can achieve the same level of performance as state-of-the-art methods on real data sets; (2) the potential benefit of deploying such testing-based classifiers is still unclear.
Based on the above observations, this paper takes one step further in this direction by formulating the classification problem as a two-sample testing problem. This new formulation enables us to generate several testing-based classifiers whose performance is comparable to that of standard classifiers such as SVM. In addition, we show that it is quite easy to handle outlying test instances and control the FDR of the classification results based on the p-values associated with each test instance.
We believe this paper will contribute significantly to the development of testing-based classification models, which may become a promising new classifier family. As the study of testing-based classification models is still in its infancy, many research issues remain unexplored and should be investigated in future work. For example, since all existing testing-based classifiers are based on the idea of instance-based learning, how to build a non-lazy testing-based classifier will be an interesting and challenging issue.
Appendix A
The detailed characteristics of the forty data sets are given in Table 5.
ID  Names  Instances  Features  Classes  Class Distribution  Download Links 

1  Appendicitis  106  7  2  85/21  KEEL 
2  Balance  625  4  3  288/49/288  UCI, KEEL 
3  Banana  5300  2  2  2924/2376  KEEL 
4  Bands  365(539)  19  2  230/135  UCI, KEEL 
5  Bupa  345  6  2  145/200  UCI, KEEL 
6  Cleveland  297(303)  13  5  160/54/35/35/13  UCI, KEEL 
7  Dermatology  358(366)  34  6  111/60/71/48/48/20  UCI, KEEL 
8  Haberman  306  3  2  225/81  UCI, KEEL 
9  Hayesroth  160  4  3  65/64/31  UCI, KEEL 
10  Heart  270  13  2  150/120  UCI, KEEL 
11  Hepatitis  80(155)  19  2  13/67  UCI, KEEL 
12  Ionosphere  351  34  2  225/126  UCI, KEEL 
13  Iris  150  4  3  50/50/50  UCI, KEEL 
14  Led7digit  500  7  10  45/37/51/57/52/52/47/57/53/49  UCI, KEEL 
15  Mammographic  830(961)  5  2  427/403  UCI, KEEL 
16  Marketing  6876(8993)  13  9  1255/529/505/618/527/846/784/1069/743  KEEL 
17  Monks2  432  7  2  290/142  UCI, KEEL 
18  Movement_libras  360  90  15  24/24/24/24/24/24/24/24/24/24/24/24/24/24/24  UCI, KEEL 
19  Newthyroid  215  5  3  150/35/30  UCI, KEEL 
20  Pageblocks  5473  10  5  4913/329/28/88/115  UCI, KEEL 
21  Penbased  10092  16  10  1143/1143/1144/1055/1144/1055/1056/1142/1055/1055  UCI, KEEL 
22  Phoneme  5404  5  2  3818/1586  UCL, KEEL 
23  Pima  768  8  2  500/268  UCI, KEEL 
24  Ring  7400  20  2  3664/3736  TORONTO, KEEL 
25  Satimage  6435  36  7  1533/703/1358/626/707/0/1508  UCI, KEEL 
26  Segment  2310  19  7  330/330/330/330/330/330/330  UCI, KEEL 
27  Sonar  208  60  2  97/111  UCI, KEEL 
28  Spambase  4597(4601)  57  2  2788/1813  UCI, KEEL 
29  Spectfheart  267  44  2  55/212  UCI, KEEL 
30  Tae  151  5  3  49/50/52  UCI, KEEL 
31  Texture  5500  40  11  500/500/500/500/500/500/500/500/500/500/500  UCL, KEEL 
32  Thyroid  7200  21  3  166/368/6666  UCI, KEEL 
33  Titanic  2201  3  2  1490/711  TORONTO, KEEL 
34  Twonorm  7400  20  2  3703/3697  TORONTO, KEEL 
35  Vehicle  846  18  4  212/218/199/217  UCI, KEEL 
36  Vowel  990  13  11  90/90/90/90/90/90/90/90/90/90/90  UCI, KEEL 
37  Wdbc  569  30  2  357/212  UCI, KEEL 
38  Wine  178  13  3  59/71/48  UCI, KEEL 
39  Winequalityred  1599  11  6  10/53/681/638/199/18  UCI, KEEL 
40  Wisconsin  683(699)  9  2  444/239  UCI, KEEL 
Appendix B
The detailed experimental results of IBT-U are given in Table 6.
ID  Names  Avg  Std 

1  Appendicitis  0.8557  0.0046 
2  Balance  0.8800  0.0039 
3  Banana  0.5998  0.0017 
4  Bands  0.6405  0.0128 
5  Bupa  0.5574  0.0170 
6  Cleveland  0.5505  0.0048 
7  Dermatology  0.8944  0.0041 
8  Haberman  0.7144  0.0166 
9  Hayesroth  0.5581  0.0221 
10  Heart  0.8241  0.0047 
11  Hepatitis  0.8088  0.0084 
12  Ionosphere  0.6638  0.0033 
13  Iris  0.9567  0.0047 
14  Led7digit  0.7206  0.0076 
15  Mammographic  0.7952  0.0000 
16  Marketing  0.2995  0.0015 
17  Monks2  0.5185  0.0149 
18  Movement_libras  0.3883  0.0146 
19  Newthyroid  0.8581  0.0025 
20  Pageblocks  0.9043  0.0005 
21  Penbased  0.5566  0.0005 
22  Phoneme  0.7172  0.0008 
23  Pima  0.7233  0.0032 
24  Ring  0.5049  0.0000 
25  Satimage  0.7262  0.0005 
26  Segment  0.7923  0.0013 
27  Sonar  0.6861  0.0204 
28  Spambase  0.8241  0.0008 
29  Spectfheart  0.4097  0.0054 
30  Tae  0.3861  0.0125 
31  Texture  0.7414  0.0009 
32  Thyroid  0.3158  0.0015 
33  Titanic  0.7760  0.0000 
34  Twonorm  0.9770  0.0003 
35  Vehicle  0.4375  0.0086 
36  Vowel  0.2748  0.0060 
37  Wdbc  0.9404  0.0010 
38  Wine  0.9416  0.0039 
39  Winequalityred  0.5131  0.0035 
40  Wisconsin  0.9458  0.0000 
Avg  0.6795  0.0055 
Appendix C
The detailed experimental results of IBT-U-KD are given in Table 7.
ID  Names  Avg (k=3)  Std (k=3)  Avg (k=5)  Std (k=5)  Avg (k=7)  Std (k=7)  Avg (k=9)  Std (k=9)
1  Appendicitis  0.8283  0.0116  0.7764  0.0141  0.7642  0.0252  0.7170  0.0209 
2  Balance  0.7782  0.0030  0.7528  0.0039  0.7184  0.0065  0.6834  0.0078 
3  Banana  0.8642  0.0016  0.8500  0.0020  0.8338  0.0024  0.8238  0.0013 
4  Bands  0.6978  0.0132  0.6726  0.0147  0.6564  0.0121  0.6452  0.0226 
5  Bupa  0.5986  0.0087  0.5948  0.0116  0.5797  0.0196  0.5713  0.0119 
6  Cleveland  0.5380  0.0074  0.5091  0.0191  0.4707  0.0122  0.4609  0.0150 
7  Dermatology  0.9402  0.0046  0.9349  0.0076  0.9179  0.0106  0.9101  0.0072 
8  Haberman  0.6585  0.0118  0.6585  0.0180  0.6261  0.0135  0.5971  0.0096 
9  Hayesroth  0.7500  0.0189  0.7256  0.0163  0.7038  0.0232  0.6969  0.0221 
10  Heart  0.7552  0.0088  0.6856  0.0126  0.6722  0.0142  0.6652  0.0160 
11  Hepatitis  0.8150  0.0115  0.7850  0.0287  0.7425  0.0251  0.7363  0.0161 
12  Ionosphere  0.8556  0.0052  0.8575  0.0059  0.8558  0.0043  0.8541  0.0078 
13  Iris  0.9600  0.0054  0.9420  0.0077  0.9053  0.0129  0.9127  0.0097 
14  Led7digit  0.5770  0.0091  0.5230  0.0162  0.4604  0.0089  0.4286  0.0072 
15  Mammographic  0.7171  0.0061  0.7045  0.0084  0.6745  0.0069  0.6508  0.0055 
16  Marketing  0.2573  0.0016  0.2567  0.0025  0.2553  0.0024  0.2480  0.0027 
17  Monks2  0.7704  0.0124  0.7683  0.0174  0.7745  0.0182  0.7745  0.0170 
18  Movement_libras  0.8181  0.0086  0.8036  0.0113  0.7978  0.0084  0.7875  0.0155 
19  Newthyroid  0.9614  0.0058  0.9581  0.0062  0.9470  0.0070  0.9474  0.0054 
20  Pageblocks  0.9534  0.0013  0.9466  0.0015  0.9405  0.0013  0.9361  0.0016 
21  Penbased  0.9931  0.0002  0.9915  0.0005  0.9896  0.0004  0.9876  0.0005 
22  Phoneme  0.8900  0.0014  0.8675  0.0022  0.8516  0.0020  0.8415  0.0033 
23  Pima  0.6915  0.0089  0.6634  0.0096  0.6406  0.0135  0.6319  0.0134 
24  Ring  0.7894  0.0013  0.7948  0.0016  0.8003  0.0020  0.8041  0.0018 
25  Satimage  0.8949  0.0012  0.8827  0.0027  0.8706  0.0022  0.8634  0.0022 
26  Segment  0.9640  0.0017  0.9572  0.0017  0.9513  0.0017  0.9396  0.0027 
27  Sonar  0.8630  0.0089  0.8452  0.0115  0.8260  0.0084  0.7957  0.0109 
28  Spambase  0.8978  0.0017  0.8704  0.0021  0.8458  0.0026  0.8306  0.0017 
29  Spectfheart  0.6835  0.0149  0.6408  0.0129  0.6431  0.0188  0.6015  0.0131 
30  Tae  0.5874  0.0125  0.5139  0.0285  0.5192  0.0319  0.5099  0.0207 
31  Texture  0.9889  0.0005  0.9845  0.0010  0.9814  0.0009  0.9766  0.0008 
32  Thyroid  0.9038  0.0012  0.8834  0.0020  0.8663  0.0016  0.8457  0.0019 
33  Titanic  0.7897  0.0009  0.7899  0.0013  0.7717  0.0049  0.7564  0.0010 
34  Twonorm  0.9381  0.0014  0.9194  0.0019  0.9006  0.0018  0.8880  0.0018 
35  Vehicle  0.6833  0.0060  0.6619  0.0102  0.6426  0.0078  0.6344  0.0093 
36  Vowel  0.9862  0.0027  0.9767  0.0024  0.9743  0.0023  0.9618  0.0028 
37  Wdbc  0.9499  0.0037  0.9387  0.0062  0.9250  0.0060  0.9178  0.0057 
38  Wine  0.9506  0.0069  0.9365  0.0046  0.9298  0.0130  0.9022  0.0113 
39  Winequalityred  0.6196  0.0052  0.5790  0.0080  0.5444  0.0032  0.5225  0.0061 
40  Wisconsin  0.9492  0.0022  0.9384  0.0031  0.9388  0.0041  0.9290  0.0053 
Avg  0.8027  0.0060  0.7835  0.0085  0.7677  0.0091  0.7547  0.0085 
Appendix D
The detailed experimental results of IBT-U-KS are given in Table 8.
ID  Names  Avg (k=3)  Std (k=3)  Avg (k=5)  Std (k=5)  Avg (k=7)  Std (k=7)  Avg (k=9)  Std (k=9)
1  Appendicitis  0.7585  0.0119  0.7594  0.0156  0.7896  0.0214  0.8047  0.0100 
2  Balance  0.7282  0.0047  0.7490  0.0070  0.7494  0.0069  0.7878  0.0083 
3  Banana  0.8826  0.0011  0.8885  0.0014  0.8941  0.0015  0.8965  0.0010 
4  Bands  0.6915  0.0097  0.6734  0.0129  0.6770  0.0100  0.6575  0.0129 
5  Bupa  0.6232  0.0189  0.6188  0.0140  0.6101  0.0101  0.6168  0.0118 
6  Cleveland  0.4879  0.0131  0.4845  0.0092  0.4916  0.0091  0.4889  0.0081 
7  Dermatology  0.9567  0.0020  0.9536  0.0024  0.9489  0.0042  0.9464  0.0018 
8  Haberman  0.6010  0.0104  0.6173  0.0117  0.6281  0.0111  0.6212  0.0147 
9  Hayesroth  0.7325  0.0218  0.6038  0.0341  0.4988  0.0206  0.4850  0.0236 
10  Heart  0.7833  0.0066  0.7985  0.0063  0.8037  0.0086  0.8026  0.0065 
11  Hepatitis  0.7950  0.0087  0.8150  0.0053  0.8000  0.0118  0.8063  0.0106 
12  Ionosphere  0.8698  0.0030  0.8695  0.0042  0.8678  0.0047  0.8667  0.0032 
13  Iris  0.9587  0.0042  0.9600  0.0054  0.9593  0.0073  0.9587  0.0061 
14  Led7digit  0.7088  0.0049  0.7242  0.0075  0.7336  0.0075  0.7324  0.0065 
15  Mammographic  0.7760  0.0028  0.8037  0.0024  0.8060  0.0045  0.8083  0.0036 
16  Marketing  0.2922  0.0027  0.2996  0.0028  0.3052  0.0018  0.3084  0.0027 
17  Monks2  0.7752  0.0084  0.7426  0.0109  0.7384  0.0096  0.7153  0.0095 
18  Movement_libras  0.7839  0.0083  0.7106  0.0095  0.6264  0.0079  0.5964  0.0113 
19  Newthyroid  0.9577  0.0056  0.9507  0.0050  0.9577  0.0056  0.9535  0.0066 
20  Pageblocks  0.8574  0.0012  0.8441  0.0013  0.8377  0.0016  0.8427  0.0010 
21  Penbased  0.9934  0.0002  0.9919  0.0003  0.9902  0.0002  0.9889  0.0003 
22  Phoneme  0.8736  0.0018  0.8656  0.0010  0.8568  0.0016  0.8506  0.0017 
23  Pima  0.7250  0.0045  0.7354  0.0069  0.7316  0.0052  0.7311  0.0067 
24  Ring  0.7155  0.0021  0.6885  0.0012  0.6687  0.0013  0.6539  0.0016 
25  Satimage  0.9024  0.0016  0.9021  0.0013  0.8988  0.0013  0.8959  0.0009 
26  Segment  0.9601  0.0014  0.9515  0.0015  0.9506  0.0018  0.9487  0.0016 
27  Sonar  0.8375  0.0101  0.8341  0.0129  0.7947  0.0072  0.7683  0.0136 
28  Spambase  0.9047  0.0018  0.9031  0.0011  0.9048  0.0013  0.9026  0.0017 
29  Spectfheart  0.6296  0.0101  0.5906  0.0092  0.5918  0.0092  0.5809  0.0076 
30  Tae  0.5318  0.0179  0.5252  0.0269  0.5152  0.0112  0.5099  0.0259 
31  Texture  0.9868  0.0004  0.9835  0.0007  0.9811  0.0005  0.9785  0.0007 
32  Thyroid  0.7707  0.0016  0.7826  0.0024  0.7568  0.0016  0.7504  0.0021 
33  Titanic  0.7601  0.0000  0.7607  0.0013  0.7883  0.0008  0.7892  0.0001 
34  Twonorm  0.9667  0.0008  0.9710  0.0005  0.9726  0.0005  0.9732  0.0007 
35  Vehicle  0.7116  0.0067  0.7047  0.0119  0.6974  0.0067  0.6918  0.0070 
36  Vowel  0.9606  0.0048  0.8629  0.0095  0.7551  0.0113  0.6969  0.0093 
37  Wdbc  0.9645  0.0026  0.9664  0.0021  0.9680  0.0031  0.9685  0.0029 
38  Wine  0.9534  0.0075  0.9528  0.0054  0.9517  0.0060  0.9573  0.0039 
39  Winequalityred  0.4826  0.0070  0.4994  0.0057  0.4946  0.0066  0.5063  0.0078 
40  Wisconsin  0.9750  0.0034  0.9755  0.0029  0.9739  0.0013  0.9735  0.0016 
Avg  0.7906  0.0059  0.7829  0.0068  0.7742  0.0061  0.7703  0.0064 
Appendix E
The detailed experimental results of TBC and IDC are given in Table 9.
ID  Names  Avg (TBC)  Std (TBC)  Avg (IDC)  Std (IDC)
1  Appendicitis  0.8613  0.0064  0.8075  0.0101 
2  Balance  0.8654  0.0050  0.7618  0.0065 
3  Banana  0.5568  0.0013  0.7313  0.0019 
4  Bands  0.6088  0.0115  0.5841  0.0141 
5  Bupa  0.6275  0.0088  0.5803  0.0086 
6  Cleveland  0  0  0.4892  0.0126 
7  Dermatology  0  0  0.8746  0.0066 
8  Haberman  0.7310  0.0064  0.6876  0.0222 
9  Hayesroth  0.5288  0.0053  0.4744  0.0238 
10  Heart  0.8396  0.0072  0.8170  0.0040 
11  Hepatitis  0  0  0.8475  0.0211 
12  Ionosphere  0.8695  0.0057  0.7513  0.0043 
13  Iris  0.6667  0.0000  0.9060  0.0021 
14  Led7digit  0.2622  0.0109  0.4736  0.0080 
15  Mammographic  0.8088  0.0016  0.7982  0.0017 
16  Marketing  0.2652  0.0029  0.1284  0.0018 
17  Monks2  0.5294  0.0206  0.6391  0.0140 
18  Movement_libras  0  0  0.2642  0.0169 
19  Newthyroid  0.3023  0.0000  0.8377  0.0056 
20  Pageblocks  0.0750  0.0006  0.8892  0.0007 
21  Penbased  0.1998  0.0000  0.6636  0.0004 
22  Phoneme  0.7595  0.0005  0.7684  0.0010 
23  Pima  0.7615  0.0041  0.7177  0.0028 
24  Ring  0.7621  0.0008  0.9603  0.0004 
25  Satimage  0.3448  0.0002  0.6317  0.0010 
26  Segment  0.2857  0.0000  0.6624  0.0022 
27  Sonar  0.7447  0.0159  0.7226  0.0116 
28  Spambase  0.9064  0.0011  0.8376  0.0006 
29  Spectfheart  0.6105  0.0122  0.7528  0.0138 
30  Tae  0.4974  0.0096  0.4020  0.0153 
31  Texture  0.3163  0.0013  0.5477  0.0022 
32  Thyroid  0.0713  0.0001  0.9239  0.0005 
33  Titanic  0.7807  0.0008  0.6900  0.0015 
34  Twonorm  0.9781  0.0003  0.9771  0.0001 
35  Vehicle  0.5324  0.0196  0.3018  0.0041 
36  Vowel  0.1818  0.0000  0.2899  0.0060 
37  Wdbc  0.9617  0.0023  0.9374  0.0017 
38  Wine  0.6011  0.0000  0.9472  0.0060 
39  Winequalityred  0  0  0.4021  0.0026 
40  Wisconsin  0.9581  0.0012  0.9555  0.0016 
Avg  0.5901  0.0047  0.6859  0.0066 
Appendix F
The detailed experimental results of kNN are given in Table 10.
ID  Names  Avg (k=3)  Std (k=3)  Avg (k=5)  Std (k=5)  Avg (k=7)  Std (k=7)  Avg (k=9)  Std (k=9)
1  Appendicitis  0.8406  0.0094  0.8642  0.0119  0.8764  0.0030  0.8708  0.0100 
2  Balance  0.8485  0.0065  0.8661  0.0059  0.8813  0.0048  0.8928  0.0048 
3  Banana  0.8841  0.0014  0.8896  0.0012  0.8942  0.0021  0.8978  0.0012 
4  Bands  0.7093  0.0122  0.6942  0.0122  0.6797  0.0083  0.6712  0.0098 
5  Bupa  0.6371  0.0113  0.6078  0.0130  0.6238  0.0121  0.6293  0.0134 
6  Cleveland  0.5545  0.0152  0.5545  0.0057  0.5663  0.0115  0.5626  0.0117 
7  Dermatology  0.9623  0.0033  0.9592  0.0027  0.9575  0.0039  0.9517  0.0040 
8  Haberman  0.6954  0.0109  0.6944  0.0082  0.7111  0.0054  0.7186  0.0070 
9  Hayesroth  0.6350  0.0187  0.5575  0.0255  0.4344  0.0215  0.3581  0.0228 
10  Heart  0.7778  0.0089  0.8033  0.0066  0.8126  0.0068  0.8115  0.0069 
11  Hepatitis  0.8288  0.0145  0.8525  0.0255  0.8800  0.0134  0.8563  0.0169 
12  Ionosphere  0.8570  0.0044  0.8501  0.0054  0.8393  0.0041  0.8425  0.0043 
13  Iris  0.9507  0.0034  0.9560  0.0034  0.9673  0.0066  0.9527  0.0049 
14  Led7digit  0.6598  0.0077  0.7116  0.0047  0.7090  0.0058  0.7234  0.0041 
15  Mammographic  0.7678  0.0055  0.7981  0.0067  0.7999  0.0051  0.8027  0.0050 
16  Marketing  0.2872  0.0030  0.2942  0.0015  0.2990  0.0025  0.3050  0.0020 
17  Monks2  0.7972  0.0072  0.8000  0.0054  0.7914  0.0127  0.7644  0.0074 
18  Movement_libras  0.8075  0.0049  0.7417  0.0103  0.7181  0.0090  0.6739  0.0218 
19  Newthyroid  0.9409  0.0044  0.9381  0.0058  0.9316  0.0054  0.9237  0.0050 
20  Pageblocks  0.9596  0.0012  0.9583  0.0009  0.9545  0.0009  0.9536  0.0006 
21  Penbased  0.9935  0.0004  0.9926  0.0004  0.9919  0.0003  0.9905  0.0003 
22  Phoneme  0.8878  0.0021  0.8808  0.0028  0.8752  0.0017  0.8701  0.0023 
23  Pima  0.7396  0.0055  0.7367  0.0072  0.7449  0.0055  0.7357  0.0046 
24  Ring  0.7186  0.0014  0.6922  0.0010  0.6747  0.0012  0.6608  0.0017 
25  Satimage  0.9096  0.0012  0.9078  0.0011  0.9065  0.0015  0.9049  0.0019 
26  Segment  0.9613  0.0020  0.9532  0.0014  0.9502  0.0015  0.9481  0.0015 
27  Sonar  0.8303  0.0072  0.8135  0.0115  0.7880  0.0135  0.7457  0.0175 
28  Spambase  0.9019  0.0021  0.9030  0.0015  0.8995  0.0013  0.8959  0.0023 
29  Spectfheart  0.7150  0.0134  0.7390  0.0149  0.7629  0.0142  0.7547  0.0124 
30  Tae  0.5119  0.0153  0.5219  0.0184  0.5086  0.0253  0.4927  0.0263 
31  Texture  0.9878  0.0005  0.9853  0.0005  0.9828  0.0007  0.9809  0.0007 
32  Thyroid  0.9391  0.0008  0.9407  0.0005  0.9401  0.0005  0.9400  0.0002 
33  Titanic  0.6109  0.0107  0.7796  0.0118  0.7819  0.0013  0.7816  0.0034 
34  Twonorm  0.9650  0.0010  0.9697  0.0007  0.9705  0.0008  0.9714  0.0006 
35  Vehicle  0.7033  0.0051  0.7025  0.0054  0.7039  0.0055  0.6941  0.0096 
36  Vowel  0.9706  0.0025  0.9387  0.0057  0.8871  0.0071  0.7972  0.0108 
37  Wdbc  0.9692  0.0017  0.9678  0.0024  0.9705  0.0027  0.9692  0.0028 
38  Wine  0.9640  0.0039  0.9573  0.0089  0.9596  0.0052  0.9567  0.0088 
39  Winequalityred  0.5839  0.0062  0.5902  0.0069  0.5797  0.0040  0.5803  0.0042 
40  Wisconsin  0.9691  0.0022  0.9742  0.0024  0.9728  0.0019  0.9706  0.0021 
Avg  0.8058  0.0060  0.8085  0.0067  0.8045  0.0060  0.7951  0.0069 
Appendix G
The detailed experimental results of SVM and DT are given in Table 11.
ID  Names  Avg (SVM)  Std (SVM)  Avg (DT)  Std (DT)
1  Appendicitis  0.8736  0.0049  0.8358  0.0135 
2  Balance  0.8698  0.0060  0.7894  0.0080 
3  Banana  0.5517  0.0000  0.8799  0.0027 
4  Bands  0.6877  0.0107  0.6285  0.0272 
5  Bupa  0.5791  0.0018  0.6571  0.0183 
6  Cleveland  0.5859  0.0104  0.5091  0.0079 
7  Dermatology  0.9673  0.0019  0.9374  0.0058 
8  Haberman  0.7340  0.0017  0.6935  0.0139 
9  Hayesroth  0.5144  0.0198  0.8181  0.0192 
10  Heart  0.8374  0.0041  0.7581  0.0196 
11  Hepatitis  0.8575  0.0278  0.8350  0.0269 
12  Ionosphere  0.8821  0.0054  0.8806  0.0101 
13  Iris  0.9613  0.0061  0.9487  0.0045 
14  Led7digit  0.7392  0.0075  0.7114  0.0075 
15  Mammographic  0.7959  0.0026  0.7988  0.0065 
16  Marketing  0.3210  0.0014  0.2970  0.0032 
17  Monks2  0.6713  0.0000  0.9067  0.0130 
18  Movement_libras  0.7197  0.0117  0.6572  0.0265 
19  Newthyroid  0.8944  0.0062  0.9298  0.0060 
20  Pageblocks  0.9342  0.0005  0.9649  0.0010 
21  Penbased  0.9784  0.0004  0.9582  0.0010 
22  Phoneme  0.7731  0.0008  0.8650  0.0032 
23  Pima  0.7699  0.0032  0.7078  0.0105 
24  Ring  0.7651  0.0008  0.8858  0.0028 
25  Satimage  0.8646  0.0008  0.8608  0.0039 
26  Segment  0.9303  0.0012  0.9568  0.0039 
27  Sonar  0.7736  0.0169  0.7221  0.0185 
28  Spambase  0.9031  0.0009  0.9190  0.0028 
29  Spectfheart  0.7951  0.0018  0.7401  0.0155 
30  Tae  0.5364  0.0219  0.5444  0.0168 
31  Texture  0.9873  0.0003  0.9220  0.0030 
32  Thyroid  0.9371  0.0001  0.9960  0.0004 
33  Titanic  0.7760  0.0000  0.7898  0.0013 
34  Twonorm  0.9783  0.0003  0.8431  0.0048 
35  Vehicle  0.7356  0.0039  0.7139  0.0115 
36  Vowel  0.7129  0.0062  0.7666  0.0111 
37  Wdbc  0.9773  0.0027  0.9185  0.0040 
38  Wine  0.9860  0.0040  0.9096  0.0107 
39  Winequalityred  0.5841  0.0027  0.6077  0.0102 
40  Wisconsin  0.9687  0.0021  0.9492  0.0042 
Avg  0.7928  0.0050  0.8003  0.0095 
Acknowledgments
This work was partially supported by the Natural Science Foundation of China (Nos. 61572094, 61771331) and the Fundamental Research Funds for the Central Universities (No. DUT2017TB02).
References
 [1] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?” Journal of Machine Learning Research, vol. 15, no. 1, pp. 3133–3181, 2014.
 [2] T. M. Cover and P. E. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.
 [3] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
 [4] L. Breiman, “Random forests,” Machine Learning, vol. 45, pp. 5–32, 2001.
 [5] O. Wagih, J. Reimand, and G. D. Bader, “MIMP: Predicting the impact of mutations on kinase-substrate phosphorylation,” Nature Methods, vol. 12, no. 6, pp. 531–533, 2015.
 [6] S.-M. Liao and M. Akritas, “Test-based classification: A linkage between classification and statistical testing,” Statistics & Probability Letters, vol. 77, no. 12, pp. 1269–1281, 2007.
 [7] S. Ghimire and H. Wang, “Classification of image pixels based on minimum distance and hypothesis testing,” Computational Statistics & Data Analysis, vol. 56, no. 7, pp. 2273–2287, 2012.
 [8] L. Guo and R. Modarres, “Interpoint distance classification of high dimensional discrete observations,” International Statistical Review, 2018.
 [9] J. D. Gibbons and S. Chakraborti, Nonparametric statistical inference, 5th ed. CRC Press, 2011.
 [10] D. Dheeru and E. Karra Taniskidou, “UCI machine learning repository.” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
 [11] J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, and S. García, “KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework,” Journal of Multiple-Valued Logic & Soft Computing, vol. 17, pp. 255–287, 2011.
 [12] T. M. Mitchell, Machine Learning, 1st ed. New York, NY, USA: McGrawHill, Inc., 1997.
 [13] D. R. Wilson and T. R. Martinez, “Reduction techniques for instance-based learning algorithms,” Machine Learning, vol. 38, no. 3, pp. 257–286, 2000.
 [14] S. Garcia, J. Derrac, J. Cano, and F. Herrera, “Prototype selection for nearest neighbor classification: Taxonomy and empirical study,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 417–435, 2012.
 [15] J. Derrac, S. García, and F. Herrera, “Fuzzy nearest neighbor algorithms: Taxonomy, experimental analysis and prospects,” Information Sciences, vol. 260, pp. 98–119, 2014.
 [16] R. Modarres, “On the interpoint distances of Bernoulli vectors,” Statistics & Probability Letters, vol. 84, pp. 215–222, 2014.
 [17] ——, “Multivariate Poisson interpoint distances,” Statistics & Probability Letters, vol. 112, pp. 113–123, 2016.
 [18] ——, “Multinomial interpoint distances,” Statistical Papers, vol. 59, no. 1, pp. 341–360, 2018.
 [19] C. Elkan, “The foundations of cost-sensitive learning,” in International Joint Conference on Artificial Intelligence, vol. 17, no. 1. Lawrence Erlbaum Associates Ltd, 2001, pp. 973–978.
 [20] B. Zadrozny, J. Langford, and N. Abe, “Cost-sensitive learning by cost-proportionate example weighting,” in Proceedings of the Third IEEE International Conference on Data Mining (ICDM 2003). IEEE, 2003, pp. 435–442.
 [21] C. Scott and R. Nowak, “A Neyman-Pearson approach to statistical learning,” IEEE Transactions on Information Theory, vol. 51, no. 11, pp. 3806–3819, 2005.
 [22] X. Tong, Y. Feng, and A. Zhao, “A survey on Neyman-Pearson classification and suggestions for future research,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 8, no. 2, pp. 64–81, 2016.
 [23] X. Tong, Y. Feng, and J. J. Li, “Neyman-Pearson classification algorithms and NP receiver operating characteristics,” Science Advances, vol. 4, no. 2, 2018. [Online]. Available: http://advances.sciencemag.org/content/4/2/eaao1659
 [24] H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” Annals of Mathematical Statistics, vol. 18, no. 1, pp. 50–60, 1947.
 [25] J. Wang, W. W. Tsang, and G. Marsaglia, “Evaluating Kolmogorov’s distribution,” Journal of Statistical Software, vol. 8, no. 18, 2003.
 [26] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: A practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society, vol. 57, no. 1, pp. 289–300, 1995.
 [27] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning. New York, NY, USA: Springer, 2001.
 [28] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008.
 [29] N. Balakrishnan and H. T. Ng, Precedence-Type Tests and Applications. Hoboken, NJ: John Wiley & Sons, 2006.