1 Introduction
Active learning (AL) [1] is a well-studied subject in many machine learning and data mining scenarios, such as text AL [17], image AL [18][47][48], transfer AL [19][20], online learning [2][3], and so on, where unannotated resources are abundant and cheap but collecting massive annotated data is expensive, time-consuming, and impractical. In this learning process, the goal of an active learner is to reduce the prediction error rate over the version space (data set) while achieving this with fewer queries and little training. To improve the performance of the classifier, the learner is allowed to sample a subset from an unlabeled data pool, selecting those instances that provide the main support for constructing the classification model. Usually, training the optimal classification hypothesis requires accessing the unlabeled data pool and querying the true labels of a certain number of instances, but this may encounter a selection difficulty because the pool contains a large amount of unlabeled data. To tackle this issue, uncertainty sampling [4] was proposed to guide AL by selecting the most important instances under a given sampling scheme or distribution assumption, such as margin [5], uncertainty probability [6], maximum entropy [7], confused votes by committee [8], maximum model diameter [9], maximum unreliability [44], and so on. Therefore, the main issue for AL is to find a way to reduce the number of queries, or to make the classifier converge quickly, so as to reduce the total cost of the learning process. Accompanied by multiple iterations, querying stops when the defined sampling number is met or a satisfactory model is found. However, although this framework performs well, it still needs to traverse the huge version space repeatedly. Querying the labels of sampled data is a reasonable approach to improving the AL prediction model when the training set is insufficient, but devising such positive evaluation rules is awkward because neither the learners nor the human annotators know which instances in the pool are the most important. In general, we seek methods with the advantages of (1) high efficiency, querying the most effective or important instances; and (2) low redundancy, reducing queries on redundant or useless instances. Intuitively, training a robust prediction model that performs well on unannotated data is the common goal of the different AL approaches, and many uncertainty evaluation strategies have been proposed to achieve it. However, they always suffer from one main limitation: heuristically searching the complete version space to obtain the optimal sampling subset is impossible because of the unpredictable scale of the candidate set.
In practice, it might be more efficient if the optimal classification model could still be trained from a subspace without any prior experience, which would solve the previously mentioned limitation in a different way [14][45][46]. For reliable space scaling, hierarchical sampling utilizes unsupervised learning to obtain the cluster bone points and improve the sampling (see Figure 1(b)). Although this provides positive support with more informative instances, the data points within clusters always have weak or no influence on the current model because of their clear class labels. We call these data points core points. On the other hand, hierarchical sampling does sample some redundant points, annotating each subtree with its root node's label. Interestingly, after removing the core points, a similar trained model can still be obtained even though only the cluster boundary points are retained (see Figure 1(c)). In this paper, the cluster boundary point detection problem is considered equivalent to a geometric description problem, where boundary points are located on the geometric surface of a high-dimensional enclosing space. Utilizing the geometric features of manifold space, [23] reconstructed the geometric space by local representative sampling for AL, and [24] mapped the underlying geometry of the data by its manifold adaptive kernel space for AL. We therefore treat cluster boundary point detection as enclosing ball boundary fitting, which is popular in hard-margin support vector data description (SVDD) [16]. In this one-class classification problem, fitting the hyperplane of the high-dimensional ball is used to improve the generalization of the training model when the training data labels are imbalanced. To reduce the time consumed by multiple quadratic programming (QP) runs on large-scale data, [51][52] transformed the SVM into a minimum enclosing ball (MEB) problem and then iteratively calculated the ball center and radius to a (1+ε)-approximation. Trained on the detected core sets, the proposed Core Vector Machine (CVM) performed faster than the SVM and needed fewer support vectors. In the Gaussian kernel especially, a fixed radius was used to simplify the MEB problem to the EB (Enclosing Ball) problem, accelerating the calculation process of the Ball Vector Machine (BVM) [54]. Without sophisticated heuristic searches in the kernel space, the model trained on points of the high-dimensional ball surface can still approximate the optimal solution.
However, the MEB alone could not calculate the fitting hyperplane of the ball, nor could it obtain the real boundary points of the ball. This is because the kernel data space might not be a complete ball space, or the ball surface might not be tight to the classes within the ball (Figure 2(a)).
To obtain a tighter [55] enclosing ball boundary, we split the MEB, which is a global optimization problem, into two types of local minimum enclosing ball (LMEB) issues, where one type is the B-ball (boundary ball) and the other is the C-ball (core ball), and the centers of the B-balls are the enclosing ball boundary points. This approach tries to optimize the goodness of fit to obtain the whole set of geometric boundary points for each cluster. Figure 2(b) shows the motivation for this approach. The above observations and investigations motivated us to propose a new AL strategy, Local Enclosing Ball (LEB), which utilizes the MEB approach to obtain the enclosing ball boundary points for AL sampling. Our contributions in this paper are:

We propose the idea of reducing the uncertainty sampling space to an enclosing ball boundary hyperplane and validate it in various classification settings.

We develop an AL approach termed LEB that samples independently, without iteration or help from labeled data.

We break the theoretical curse of uncertainty sampling by the enclosing ball boundary in AL, since LEB is neither a model-based nor a label-based strategy and has fixed time and space complexities.

We conduct experiments to verify that LEB can be applied in multi-class settings to overcome the binary classification limitation of many existing AL approaches.
The remainder of this paper is structured as follows. The preliminaries are described in Section 2 and the performance of the cluster boundary is defined in Section 3.1 (Theorem 1). To prove it, we discuss the model distance (Lemma 1 of Section 3.2) and the inclusion relation of classifiers (Lemma 2 of Section 3.3) between cluster boundary and core points, in binary classification and in multi-class settings of low and high dimensional space, respectively. The background for the MEB problem is presented in Section 4.1. Then, we optimize the geometric updates of the radius (Section 4.2) and center (Section 4.3) when extending the local ball space, and the established update optimization equation is analyzed in Lemmas 3-5 of Section 4.4. Based on the above findings, we design the LEB algorithm in Section 4.5, analyze its time and space complexities in Section 4.6, and discuss its advantages in Section 4.7. The experiments and results, including eight geometric clustering data sets and one unstructured letter recognition data set, are reported in Sections 5.1-5.3. Section 5.4 then further discusses the time performance of different AL approaches. Finally, we conclude this paper in Section 6.
2 Preliminaries
In this section, we first describe the general AL problem and then classify the unlabeled data into two kinds of objects, based on evaluating whether a sampled data point will benefit the classifier training. As we define the AL sampling issue as a geometric cluster boundary detection problem, we introduce some related geometric structures for geometric AL, where the related definitions, main notations, and variables are briefly summarized in Table 1.
Given the data space X = {x_1, x_2, …, x_N}, where x_i ∈ R^m, and the label space Y, consider the classifier:

h(x) = w^T x + b,   (1)

where w is the parameter vector and b is the constant vector. This gives:
Definition 1. Active Learning. Optimize h to obtain the minimum RSS (residual sum of squares) [22][23]:

min_h Σ_{(x_i, y_i) ∈ L'} (h(x_i) − y_i)²,   (2)

i.e.,

L' = L ∪ Q,   (3)

where L is the labeled data, Q is the queried data, and L' is the updated training set.
Given hypothesis , the error rate change in predicting after adding the queried data is
(4) 
where represents the prediction error rate of the classification model trained on the input data.
Definition 2. Effective point: If , is an effective point that provides positive help for the next training after being added to . Here , and it is an impact factor that decides whether the data point will affect .
Definition 3. Redundant point: If , is a redundant point that has a weak or negative influence on the current and future model .
In an enclosing geometric space, we cast the AL sampling issue as a cluster boundary point detection problem. Here we introduce some related geometric structures in AL.
Definition 4. Cluster boundary point [11]:
A boundary point is an object that satisfies the following conditions:
1. It is within a dense region .
2. region near , or .
Definition 5. Core point:
A core point is an object that satisfies the following conditions:
1. It is within a dense region .
2. an expanded region based on , .
Definition 6. Enclosing ball boundary:
An enclosed high-dimensional hyperplane connecting all the boundary points.
Let define the boundary points of one class, and , where is the number of boundary points, then the closed hyperplane satisfies the following conditions:
1. Most of the boundary points are distributed on the hyperplane.
2.
where , and is a dimension constant vector.
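Definitions 4 and 5 can be illustrated with a small numerical sketch. The mean-offset score below is a hypothetical stand-in for a boundary detector, not the detector used in this paper: a core point has its k nearest neighbours spread evenly around it, so the offsets cancel, while a boundary point has neighbours mostly on one side.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force)."""
    order = sorted(range(len(points)),
                   key=lambda j: math.dist(points[i], points[j]))
    return order[1:k + 1]  # skip the point itself

def boundary_scores(points, k=5):
    """Score each 2-D point by the norm of the mean offset to its kNN:
    near zero inside a dense region (Definition 5, core point), large at
    the edge of the region (Definition 4, boundary point)."""
    scores = []
    for i, p in enumerate(points):
        nbrs = knn(points, i, k)
        mx = sum(points[j][0] - p[0] for j in nbrs) / k
        my = sum(points[j][1] - p[1] for j in nbrs) / k
        scores.append(math.hypot(mx, my))
    return scores
```

On a uniform 5×5 grid, for example, the centre point scores exactly zero while a corner scores well above it.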
Notation  Definition 

classifiers  
hyperplane of the enclosing ball  
prediction error rate of when training  
data set  
data number of  
number of labeled, unlabeled, queried data  
label set  
a data point in  
labeled data points in  
queried data points in  
training set after querying  
distance function  
core points  
cluster boundary points  
noises  
training set of []  
core points located inside the positive class  
core points located inside the negative class  
cluster boundary points located near  
noises  
core points  
boundary points  
noises  
distribution function  
variables  
constant  
ball  
ball center  
radius  
relaxation variable  
C  user-defined parameter 
coefficient vector  
Lagrange parameter  
kernel matrix  
K(, )  kernel change between and 
a point of ’s KNN 
3 Motivation
In clustering-based AL work, core points are redundant points because of their clear class labels and provide little help for the parameter training of classifiers. Considering that cluster boundary points may provide decisive factors for the support vectors, CVM and BVM use the points distributed on the hyperplane of an enclosing ball to quickly train core support vectors on large-scale data sets. Their significant success motivates the work of this paper.
To further show the importance of cluster boundary points, we (1) clarify the performance of training on cluster boundary points in Section 3.1, (2) discuss the model distances of boundary and core points to the classification line or hyperplane in Section 3.2, and (3) analyze the inclusion relation of classifiers when training on boundary and core points in Section 3.3, where the discussed cases of (2) and (3) are binary and multi-class classification in low and high dimensional space, respectively.
3.1 Performance of cluster boundary
In this paper, we consider that the performance of the classification model is determined by the cluster boundary points. Therefore, we have
Theorem 1. The performance of the classification model trained on the cluster boundary points is similar to that of the model trained on the whole data set, that is to say,
(5) 
where represents the core points, represents the cluster boundary points, and =[].
Theorem 1 aims to show that the core points are redundant and have little influence on training h. The objective function is supported by Lemmas 1 and 2 in the next subsections: one states that cluster boundary points are closer to the classification model than other data, and the other states that the models trained on core points are a subset of those trained on boundary points. The detailed proofs of the two lemmas are then discussed in binary classification and in multi-class settings of low and high dimensional space, respectively.
3.2 Model distance
The model distance function is defined as the distance of one data point to the classification line or hyperplane. The model distance relations of boundary points and core points are described in the following Lemma 1.
Lemma 1. The model distances of core points are bigger than those of boundary points, that is to say,
(6) 
Lemma 1 is divided into three different cases:

Corollary 1: binary classification in low dimensional space, where Corollaries 1.1 and 1.2 prove Lemma 1 in the adjacent classes and separation classes cases, respectively.

Corollary 2: multiclass classification problem in low dimensional space.

Corollary 3: high dimensional space.
Corollary 1: Binary classification in low dimensional space
Two facts hold in classification: (1) data points far from h usually have clearly assigned labels with a high predicted class probability; (2) h is always surrounded by noises and a part of the boundary points. Based on these facts, the proof is as follows.
Corollary 1.1: Adjacent classes
Proof.
For the binary classification of the adjacent classes problem (see Figure 3(a)) with labels {−1, +1}, we get the result:
(7) 
where represents the core points located inside the positive class, represents the core points located inside the negative class, represents the cluster boundary points near h, and represents the noises near h. Here , , , and represent the numbers of the four types of points.
Because noises always give wrong guidance to model training, we focus only on the differences between the core and boundary points, that is to say,
(8) 
The distance function between and in space is:
(9) 
Because the classifier definition is , , then Lemma 1 is established when (see Figure 3(b)). ∎
Corollary 1.2: Separation classes
Proof.
In the separation classes problem (see Figure 3(c)), the trained model based on any data points will lead to a strong classification result, that is to say, all AL approaches will perform well in this setting since:
(10) 
where represents the boundary points near h in the positive class, represents the boundary points near h in the negative class, , and . Letting , , we can still obtain the results of Eqs. (8) and (9). ∎
Corollary 2: Multiclass classification in low dimensional space
Proof.
In this setting, , the classifier set , and the cluster boundary points are segmented into parts , where represents the data points close to , (see Figure 3(d)). Based on the result of Corollary 1, dividing the multi-class classification problem into binary classification problems, we obtain:
(11) 
and
(12) 
where represents the core points near . Then, the following holds:
(13) 
∎
Corollary 3: High dimensional space
Proof.
In high dimensional space, the distance function between x and the hyperplane h is

d(x, h) = |w^T x + b| / ‖w‖,   (14)

where w ∈ R^m is an m-dimensional vector. Because the above equation is the m-dimensional extension of Eq. (9), the proof relating to low dimensional space is still valid in high dimensional space. ∎
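This point-to-hyperplane distance, d(x, h) = |w·x + b| / ‖w‖, can be checked numerically with a minimal sketch (the names below are illustrative only):

```python
import math

def hyperplane_distance(w, b, x):
    """Distance from point x to the hyperplane w.x + b = 0 in m dimensions,
    the m-dimensional extension of the 2-D point-to-line distance."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))
```

For example, the distance from (3, 5) to the vertical line x = 1 (that is, w = (1, 0), b = −1) is 2.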
3.3 Inclusion relation of classifiers
The inclusion relation of classifiers is the set relation between models trained on different data subsets. Lemma 2 states this relation for training on boundary and core points, respectively.
Lemma 2. Training models based on are the subset of models based on ,
that is to say,
(15) 
It shows that models trained on can predict well, but a model trained on may sometimes not predict well. To prove this relation, we discuss it in three different cases:

Corollary 4: binary classification in low dimensional space, where Corollaries 4.1 and 4.2 prove Lemma 2 in one-dimensional space and two-dimensional space, respectively.

Corollary 5: binary classification in high dimensional space.

Corollary 6: multiclass classification.
Corollary 4: Binary classification in low dimensional space
Corollary 4.1: Linear one-dimensional space
Proof.
Given point classifier in the linear onedimension space as described in Figure 4(a),
(16) 
where are core points. In comparison, the boundary points of have smaller distances to the optimal classification model , i.e., . Therefore, it is easy to conclude the following: classifying and by is successful, but we cannot classify and by , or , respectively. ∎
Corollary 4.2: Two dimensional space
Proof.
Given two core points in the two dimensional space, the line segment between them is described as follows:
(17) 
Training and can get the following classifier:
(18) 
where is the angle between (see Figure 4(b)).
Similarly, the classifier trained by is subject to:
(19) 
where is the line segment between and . Intuitively, the difference between and lies in their constraint equations. Because , we can conclude:
(20) 
This shows that cannot classify and when or in the constraint equation, but for any , it can classify correctly. ∎
Corollary 5: High dimensional space
Proof.
Given two core points , the Bounded Hyperplane between them is:
(21) 
Training the two data points can get the following classifier:
(22) 
where is the angle between and , and is the normal vector of . Given a point located on , if , is in the positive class or in the negative class, then cannot predict and correctly. It can also be described as follows: if segments the bounded hyperplane between and , or and , the trained cannot classify and . Then Lemma 2 is established. ∎
Corollary 6: Multiclass classification
Proof.
As in the multi-class classification proof of Lemma 1, the multi-class problem can be segmented into several binary classification problems. ∎
4 Enclosing ball boundary
The hard-margin support vector data description in one-class classification is equivalent to the MEB (Minimum Enclosing Ball) problem, which attempts to find the radius component of the radius-margin bound and its center; it is described in Section 4.1. To improve the fitting of the ball boundary, we split the ball of each cluster into two kinds of smaller balls: C-balls (core balls) and B-balls (boundary balls), where core balls are located within the clusters and boundary balls are located at the edges of clusters.
Our task in this section is to detect the B-balls of each cluster by calculating the increments of the ball radius (Section 4.2) and center (Section 4.3) when extending the local space, where both types of increments are bigger for B-balls than for C-balls. To enhance the difference between the two types of local features, we consider both radius and center updates to propose an optimization scheme in Section 4.4 and develop the LEB algorithm in Section 4.5. The time and space complexities are then analyzed in Section 4.6. Finally, the advantages of our approach are further discussed in Section 4.7.
4.1 MEB in SVDD
The MEB problem is to optimize [51][52]:

min_{R, c, ξ} R² + C Σ_i ξ_i,  s.t. ‖φ(x_i) − c‖² ≤ R² + ξ_i, ξ_i ≥ 0,   (23)

where R is the ball radius and c is the ball center. The corresponding dual is to optimize:

max_α Σ_i α_i K(x_i, x_i) − Σ_{i,j} α_i α_j K(x_i, x_j),  s.t. Σ_i α_i = 1, 0 ≤ α_i ≤ C,   (24)

where ξ_i is the relaxation variable and C is a user-defined parameter describing the penalty on ξ. The optimization result is:

c = Σ_i α_i φ(x_i),  R² = Σ_i α_i K(x_i, x_i) − Σ_{i,j} α_i α_j K(x_i, x_j),   (25)

where α is the coefficient vector. According to the conclusion in [51][52], K(x, x) is close to a constant, and then the optimization task changes to:

min_α Σ_{i,j} α_i α_j K(x_i, x_j),  s.t. Σ_i α_i = 1, 0 ≤ α_i ≤ C.   (26)
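The (1+ε)-approximation cited from [51][52] can be sketched with a Badoiu–Clarkson-style iteration. The version below works in input space rather than kernel space and is an illustrative assumption of this sketch, not the CVM/BVM solver itself:

```python
import math

def meb_approx(points, eps=0.05):
    """(1+eps)-approximate minimum enclosing ball: repeatedly step the
    centre toward the current farthest point with step size 1/(t+1);
    O(1/eps^2) iterations suffice for the radius guarantee."""
    c = list(points[0])
    for t in range(1, int(1.0 / eps ** 2) + 1):
        far = max(points, key=lambda p: math.dist(c, p))
        step = 1.0 / (t + 1)
        c = [ci + step * (fi - ci) for ci, fi in zip(c, far)]
    radius = max(math.dist(c, p) for p in points)
    return c, radius
```

For the four corners of a square of side 2, the exact ball has centre (1, 1) and radius √2; the iteration lands close to both.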
4.2 Update radius
Intuitively, the geometric volume of a B-ball is larger than that of a C-ball in terms of its global characteristics. Therefore, the local characteristic of the radius update when adding more data to the current enclosing ball benefits the enhancement of the characteristic scale.
When the data is added to at time , the new radius is:
(27) 
where , and is the updated kernel matrix after adding. Then, the square increment of the radius is:
(28) 
where , , is the th row of matrix and is the th column of matrix . Therefore, after  times of adding, the kernel matrix changes to :
(29) 
Let , and . The square increment of adding features to is close to:
(30) 
In the kernel matrix, , therefore the optimization task changes to:
(31) 
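The increment in Eq. (30) depends on the kernel matrix; the sketch below only illustrates the idea under the RBF kernel, where K(x, x) = 1, by taking the mean of the mapped points as the ball centre. This mean-centre proxy for R² is an assumption of the sketch, not the paper's exact update:

```python
import math

def rbf(p, q, gamma=1.0):
    """RBF kernel K(p, q) = exp(-gamma * ||p - q||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))

def radius_sq_proxy(points, gamma=1.0):
    """Squared-radius proxy in RBF feature space: with the centre at the
    mean of the mapped points, R^2 ~ 1 - (1/n^2) * sum_ij K(x_i, x_j)."""
    n = len(points)
    total = sum(rbf(p, q, gamma) for p in points for q in points)
    return 1.0 - total / n ** 2

def radius_increment(ball, x, gamma=1.0):
    """Change of the squared-radius proxy when x joins the local ball;
    it is large for a B-ball (x extends the cluster edge) and small or
    negative for a C-ball (x sits among its neighbours)."""
    return radius_sq_proxy(ball + [x], gamma) - radius_sq_proxy(ball, gamma)
```

Adding a far-away point to a tight ball grows the proxy sharply, while adding an interior point barely moves it.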
4.3 Update center
The path change of the ball center when adding more data to the current enclosing ball is another important local characteristic for distinguishing between B-balls and C-balls, where the path update length of a B-ball is bigger than that of a C-ball.
Given as the ball center at time , the optimization objective function is [54]:
(32) 
and the Lagrange equation is:
(33) 
On setting its derivative to zero, we obtain:
(34) 
where . As such,
(35) 
where is a constant. Therefore, the increment of adding features to can be written as:
(36) 
The matrix form is:
(37) 
where .
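The centre path length of Eq. (37) can likewise be evaluated purely through kernel values. The sketch below again assumes the centre is the mean of the mapped points (uniform coefficients), a simplification of the α-weighted centre of Eq. (34):

```python
import math

def rbf(p, q, gamma=1.0):
    """RBF kernel K(p, q) = exp(-gamma * ||p - q||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))

def center_shift_sq(ball, x, gamma=1.0):
    """Squared kernel-space shift ||c_new - c_old||^2 of the ball centre
    when x joins the ball, expanded via inner products of the two means."""
    n = len(ball)
    new = ball + [x]
    old_old = sum(rbf(p, q, gamma) for p in ball for q in ball) / n ** 2
    new_new = sum(rbf(p, q, gamma) for p in new for q in new) / (n + 1) ** 2
    old_new = sum(rbf(p, q, gamma) for p in ball for q in new) / (n * (n + 1))
    return new_new - 2.0 * old_new + old_old
```

A boundary-extending point drags the centre much farther than an interior point, which is exactly the B-ball versus C-ball contrast.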
4.4 Geometric update optimization
To enhance the difference between the local geometric features of B-balls and C-balls, we consider both the radius and center updates and discuss the properties of the optimized objective function.
The kernel update of and is:
(38) 
Let to calculate the update:
(39) 
where , .
Next, let us establish some properties of this objective function. The detailed proofs of the following lemmas are presented in the Appendix.
Lemma 3. Suppose that , where , . Otherwise, when .
Lemma 4. is a monotonically increasing function on .
Lemma 5.
4.5 LEB algorithm
Based on the conclusions of Lemmas 3-5, we find that and increase with the extension of the local ball volume. Therefore, we propose an AL sampling algorithm called LEB (see Algorithm 1).
To calculate the updates of the radius and center, we need to capture the neighbors of each data point. After initialization in Lines 1-3, Line 6 calculates the kNN matrix of using the Kd-tree with a time consumption of . Then, Lines 7-8 iteratively calculate by Eq. (39) and store it in , where the kernel function used is the RBF kernel.
However, the radius and center updates of noises may sometimes be larger than those of ball boundary points. To smooth noises, Line 10 sorts times in descending order, where is the querying number. After sorting, the sorted values of are stored in matrix and their corresponding positions are stored in matrix .
Intuitively, the data with an update value located in the interval of and is a noise or a ball boundary point, respectively, where is the round-down operation. In other words, the input parameter is an effective linear segmentation of noises and queried data by their update values of Eq. (39).
After capturing the update range of the ball boundary points, Line 11 finds the positions of the queried data in matrix , and Lines 12-14 then return the queried data accordingly. Finally, the expert provides label annotations for the queried data in Line 15.
Algorithm 1. LEB 
Input: data set with samples, 
number of queries , 
nearest neighbor number k, 
noise ratio . 
Output: Queried data 
1: Initialize: 
2: 
3: 
4: Begin: 
5: for i=1 to do 
6: Calculate the kNN of by Kd-tree and store them in 
7: Let , then calculate using Eq. (39) 
8: and store in 
9: endfor 
10: [ sort(, descending, ) 
11: 
12: for i=1 to do 
13: add to matrix 
14: endfor 
15: Query the labels of all data of 
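Algorithm 1 can be sketched end to end as follows. Brute-force kNN stands in for the Kd-tree of Line 6, and the kernel-space radius growth of each point's local ball stands in for the update score of Eq. (39); both substitutions are simplifying assumptions of this sketch:

```python
import math

def rbf(p, q, gamma=1.0):
    """RBF kernel K(p, q) = exp(-gamma * ||p - q||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))

def leb_query(points, m, k=5, beta=0.1, gamma=1.0):
    """Score each point by how much it grows the squared-radius proxy of
    its k-NN ball in RBF feature space (Lines 5-9), sort the scores in
    descending order (Line 10), skip the top beta*n entries as noise
    (Line 11), and return the next m points to query (Lines 12-15)."""
    n = len(points)

    def r2(pts):
        # mean-centre squared-radius proxy: 1 - (1/|pts|^2) * sum_ij K
        total = sum(rbf(a, b, gamma) for a in pts for b in pts)
        return 1.0 - total / len(pts) ** 2

    scores = []
    for i, p in enumerate(points):
        nbrs = sorted(range(n),
                      key=lambda j: math.dist(p, points[j]))[1:k + 1]
        ball = [points[j] for j in nbrs]
        scores.append(r2(ball + [p]) - r2(ball))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    start = int(beta * n)  # noise smoothing
    return [points[i] for i in order[start:start + m]]
```

On a uniform 5×5 grid with beta = 0, for instance, the four corners receive the largest scores and are queried first; interior points score lowest because their neighbourhoods are symmetric.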
4.6 Time and space complexities
In model-based approaches, the time complexity of training classifiers determines the time consumption of the sampling process. Since the time complexity of training an SVM is to , we predict that Margin's time cost will rise to to with a given query number of , where is the number of labeled data. For the Hierarchical [10] AL approach, hierarchical clustering is its main time-consuming process, costing . Similarly, calculating the kernel matrix also costs in TED [22]. Although Reactive [21] is a novel idea, it still needs to visit the whole version space to select a data point with approximately runs of SVM training. This means that the time complexity of one selection will be to , and the time consumption of sampling data points will be to . (Detailed descriptions of Hierarchical, TED, and Reactive are presented in Section 5.1.) In our LEB approach, Line 6 uses the Kd-tree to calculate the kNN matrix of the data set with time complexity , Lines 7-8 cost to calculate the radius and center updates of each data point, Line 10 costs for sorting, and Lines 11-14 return the boundary points of . After that, we can train on the boundary points within a short time . Therefore, the total time complexity is
(40) 
Standard SVM training has space complexity when the training set size is . It is thus computationally expensive on data sets with a large number of samples. By observing the iterative sampling process of model-based AL approaches, we conclude that these approaches incur space complexities. However, our LEB approach uses a tree structure to calculate the kNN matrix, which is cheap, with a space consumption of . Therefore, the space complexity of LEB is lower than that of other model-based AL approaches.
4.7 Advantages of LEB
Our investigation finds that many existing AL algorithms that need labeled data for training are model-based and suffer from the model curse. To describe this problem, we have summarized the iterative sampling model in Algorithm 2. In its description, Lines 6-10 calculate the uncertainty function, Line 11 finds the position of the data with the maximum uncertainty, where represents this operation, and Lines 12-13 update the labeled and unlabeled sets. After iterations, Lines 15-16 train the classifier and return the error rate of predicting .
Interestingly, different labeled data will lead to different iterative sampling sets because is always retrained after updating and . The matrix must then be recalculated in each iteration. In addition, some AL algorithms work only in special scenarios, for example: (1) margin-based AL approaches only work with SVM classification; (2) entropy-based AL only works with a probabilistic classifier or probability return values. Table II lists a summary of the properties of different AL approaches.
From the analysis results, we find that the reported approaches all need iterative sampling and the support of labeled data, and have high time consumption. We observe that many AL algorithms pay too much attention to the uncertainty of the classification model, since unfamiliar data are their main sampling objects. In contrast, our proposed LEB algorithm needs no iteration and no labeled data to sample, and can be paired with any available classifier in both binary and multi-class settings.
Approach  Model  Iteration  Label support  Classifier  Multiclass  Time consumption 
Margin  SVM  Y  Y  SVM  Y  Uncertain 
Entropy  Uncertain probability  Y  Y  Probability classifier  Y  Uncertain 
Hierarchical  Clustering  Y  Y  Any  Y  
TED  Experimental optimization  Y  Y  Any  Y  
Reactive  Maximize the model difference  Y  Y  Any  N  Uncertain, but high 
LEB  Enclosing ball boundary  N  N  Any  Y 
Algorithm 2. Iterative sampling 
Input: , 
number of queries , 
labeled data 
Output: prediction error rate 
1: Initialize: uncertainty function , 
2: , 
3: 
4: unlabeled data and it has data 
5: while 
6: for i=1:1: 
7: =train() 
8: calculate the based on 
9: store it in matrix 
10: endfor 
11: ) 
12: ] 
13: update 
14: endwhile 
15: =train() 
16: return = err() 
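For contrast with LEB, the iterative model of Algorithm 2 can be sketched as below. The 1-D midpoint "classifier" and margin-style uncertainty are hypothetical stand-ins used only to make the loop runnable:

```python
def iterative_sampling(labeled, unlabeled, m, train, uncertainty):
    """Sketch of Algorithm 2: retrain after every query (Line 7), score
    every unlabeled point (Lines 8-9), then move the most uncertain one
    into the labeled set (Lines 11-13); repeat m times."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(m):
        h = train(labeled)
        scores = [uncertainty(h, x) for x, _ in unlabeled]
        best = max(range(len(unlabeled)), key=scores.__getitem__)
        labeled.append(unlabeled.pop(best))
    return train(labeled), labeled

def midpoint_train(data):
    """Toy 1-D 'classifier': threshold at the midpoint of the class means."""
    m0 = [x for x, y in data if y == 0]
    m1 = [x for x, y in data if y == 1]
    return (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2

def margin_uncertainty(h, x):
    """Toy uncertainty: points closest to the threshold score highest."""
    return -abs(x - h)
```

Starting from one labeled point per class, the loop keeps querying the points nearest the current decision threshold, i.e., the class boundary region, while retraining at every step.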
5 Experiments
To demonstrate the effectiveness of our proposed LEB algorithm, we evaluate and compare its classification performance with existing algorithms on eight clustering data sets (structured data sets), since they have clear geometric boundaries, and on one letter recognition data set (unstructured data set) to observe its performance. The structure of this section is as follows: Sections 5.1 and 5.2 describe the related baselines and tested data sets, respectively, Section 5.3 describes the experimental settings and analyzes the results, and Section 5.4 discusses the time and space performance of different AL approaches.
5.1 Baselines
Several algorithms proposed in the literature [5][10][22][21] will be compared with LEB, where Random samples without any guidance, Margin is based on SVM, Hierarchical is a clustering-based AL approach, TED is a statistical experimental optimization approach, and Reactive is an idea of maximizing model differences:

Random, which uses a random sampling strategy to query unlabeled data, and can be applied to any AL task but with an uncertain result.

Hierarchical [10] sampling is a very different idea compared to many existing AL approaches. It labels a subtree with the root node's label when the subtree meets the objective probability function, but incorrect labeling always leads to a bad classification result.

TED [22] favors data points that are on the one hand hard to predict and on the other hand representative of the rest of the unlabeled data.

Reactive [21] learning finds the data point that has the maximum influence on the future prediction result after annotating the selected data with positive and negative labels. This novel idea does not need to query the label information of unlabeled data when relabeling, but needs a well-trained classification model at the beginning. Furthermore, the reported approach cannot be applied to multi-class classification problems without extension.
5.2 Data sets
We compare the different algorithms' best classification results on several structured data sets [30] and one unstructured letter recognition data set, letter.

g2-2-30 [31]: 2048×2. There are 2 adjacent clusters in the data set.

Flame [36]: 240×2. It has 2 adjacent clusters with similar densities.

Jain [35]: 373×2. It has two adjacent clusters with different densities.

Pathbased [33]: 300×2. Two clusters are close together and surrounded by an arc cluster.

Spiral [33]: 312×2. There are three spiral curve clusters, which are linearly inseparable.

Aggregation [32]: 788×2. There are 7 adjacent clusters in the data set, and noises connect them.

R15 [34]: 600×2. There are 7 separate clusters and 8 adjacent clusters.

D31 [34]: 3100×2. It has 31 adjacent Gaussian clusters.

letter [37][38]: 20000×16. It is a classical letter recognition data set with the 26 English letters. We select 5 pairs of letters that are difficult to distinguish from each other to test the above AL algorithms in a two-class setting: DvsP, EvsF, IvsJ, MvsN, and UvsV. For the multi-class test, we select AD, AH, AL, AP, AT, AX, and AZ, where AD is the letter set A to D, AH is the letter set A to H, …, and AZ is the letter set A to Z. The seven multi-class sets have 4, 8, 12, 16, 20, 24, and 26 classes, respectively.
In addition to the introduction of the tested data sets above, all two-dimensional data sets are shown in Figure 5.
Data sets  Num_C  Algorithms  Number of queries (percentage of the data set)  

1%  5%  10%  15%  20%  30%  40%  50%  60%  
g2-2-30  2  Random  .516±.026  .546±.012  .603±.028  .652±.029  .693±.031  .767±.026  .815±.026  .849±.021  .881±.022 
Margin  .500±.000  .509±.015  .551±.047  .590±.076  .644±.103  .709±.153  .822±.139  .882±.161  .927±.188  
Hierarchical  .504±.000  .550±.000  .585±.000  .615±.000  .668±.000  .774±.014  .847±.000  .920±.011  .974±.000  
TED  .610±.000  .619±.009  .651±.003  .759±.006  .848±.007  .875±.005  .901±.005  .964±.005  .972±.000  
Reactive  .506±.008  .531±.029  .554±.052  .593±.065  .634±.058  .744±.060  .715±.047  .811±.000  .816±.000  
LEB  .724±.163  .725±.022  .790±.021  .825±.018  .886±.012  .909±.013  .927±.011  .994±.008  1.00±.000  
Flame  2  Random  .670±.142  .794±.106  .904±.059  .944±.036  .958±.025  .976±.014  .984±.008  .987±.005  .990±.006 
Margin  .499±.137  .596±.102  .740±.162  .872±.158  .930±.159  .935±.145  .961±.120  .963±.109  .944±.165  
Hierarchical  .720±.041  .607±.042  .855±.062  .972±.010  .999±.000