
Soft Confusion Matrix Classifier for Stream Classification

09/16/2021
by Pawel Trajdos, et al.

In this paper, the issue of tailoring the soft confusion matrix (SCM) based classifier to the stream learning task is addressed. The main goal of the work is to develop a wrapping-classifier that adds incremental learning capability to classifiers that are unable to learn incrementally. The goal is achieved by making two improvements to the previously developed SCM classifier. The first is aimed at reducing the computational cost of the SCM classifier; to do so, the definition of the fuzzy neighbourhood of an object is changed. The second is aimed at dealing effectively with concept drift; this is done by employing an ADWIN-driven concept drift detector that is used not only to detect the drift but also to control the size of the neighbourhood. The obtained experimental results show that the proposed approach significantly outperforms the reference methods.


1 Introduction

Classification of streaming data is one of the most difficult problems in modern pattern recognition theory and practice. This is because a typical data stream is characterized by several features that significantly impede making the correct classification decision. These features include continuous flow, huge data volume, rapid arrival rate, and susceptibility to change [19]. If a streaming data classifier aspires to practical applications, it must face these requirements and satisfy numerous constraints (e.g. bounded memory, single-pass processing, real-time response, changing data concepts) to an acceptable extent. This is not easy, which is why the methodology of stream data recognition has been developing very intensively for over two decades, producing ever more refined classification methods [8, 24].

Incremental learning is a vital capability for classifiers used in stream data classification [27]. It allows the classifier to use new objects generated by the stream to improve the model built so far. It also allows, to some extent, dealing with concept drift. Some well-known classifiers are naturally capable of being trained incrementally; examples are neural networks, nearest-neighbour classifiers, or probabilistic methods such as the naive Bayes classifier [11]. Other classifiers were tailored to learn incrementally; an example of such a method is the well-known Hoeffding Tree classifier [26]. These types of classifiers can easily be used in stream classification systems. On the other hand, when a classifier is unable to learn incrementally, the options for using it in stream classification are very limited [27]. The only option is to keep a set of objects and rebuild the classifier from scratch whenever necessary [11].

To bridge this gap, we propose a wrapping-classifier based on the soft confusion matrix (SCM) approach. The wrapping-classifier may be used to add incremental learning functionality to any batch classifier. The classifier based on the idea of the soft confusion matrix was proposed in [30]. It proved to be an efficient tool for solving such practical problems as hand gesture recognition [22]. An additional advantage in solving the above-mentioned classification problem is the ability to use imprecise feedback information about class assignment. The SCM-based algorithm was also successfully used in multi-label learning [31].

Dealing with concept drift using incremental learning alone is insufficient, because incremental classifiers deal effectively only with incremental drift [8]. To handle sudden concept drift, additional mechanisms such as single/multiple window approaches [20], forgetting mechanisms [33], or drift detectors [2] must be used. In this study, we decided to use the ADWIN algorithm [2] to detect the drift and to manage the set of stored objects. We use the ADWIN-based detector because this approach has been shown to be effective [13, 1].

Concept drift may also be dealt with using ensemble classifiers [8]. There is a plethora of ensemble-based approaches [12, 3, 19]; however, in this work we focus on single-classifier-based systems.

The rest of the paper is organized as follows. Section 2 presents the corrected classifier and gives insight into its two-level structure and the original concepts of RRC and SCM, which are the basis of its construction. Section 3 describes the adopted model of a concept-drifting data stream and provides details of the chunk-based learning scheme of the base classifiers and of the online dynamic learning of the correcting procedure. Section 4 describes the experimental procedure. The results are presented and discussed in Section 5. Section 6 concludes the paper.

2 Classifier with Correction

2.1 Preliminaries

Let us consider the pattern recognition problem in which $x \in \mathcal{X}$ denotes a feature vector of an object and $j \in \mathcal{M} = \{1, 2, \ldots, M\}$ is its class number ($\mathcal{X}$ and $\mathcal{M}$ are the feature space and the set of class numbers, respectively). Let $\psi$ be a classifier trained on the learning set $\mathcal{S}$, which assigns a class number to the recognized object. We assume that $\psi$ is described by the canonical model [21], i.e. for a given $x$ it first produces values of normalized classification functions (supports) $d_j(x)$, $j \in \mathcal{M}$, and then classifies the object according to the maximum support rule:

$$\psi(x) = i \;\Leftrightarrow\; d_i(x) = \max_{j \in \mathcal{M}} d_j(x). \quad (1)$$

To recognize the object $x$ we will apply an original procedure which, using additional information about the local (relative to $x$) properties of $\psi$, can change its decision to increase the chance of correct classification of $x$.

The proposed correcting procedure, which has the form of a classifier built over $\psi$, will be called a wrapping-classifier. The wrapping-classifier $\psi^{\mathrm{SCM}}$ acts according to the following Bayes scheme:

$$\psi^{\mathrm{SCM}}(x) = i \;\Leftrightarrow\; p(i|x) = \max_{j \in \mathcal{M}} p(j|x), \quad (2)$$

where the a posteriori probabilities $p(j|x)$ can be expressed in a form depending on the probabilistic properties of classifier $\psi$:

$$p(j|x) = \sum_{i \in \mathcal{M}} P(j|i, x)\, P(i|x), \quad (3)$$

where $P(j|i,x)$ denotes the probability that $x$ belongs to the $j$-th class given that $\psi(x) = i$, and $P(i|x)$ is the probability of assigning $x$ to class $i$ by $\psi$. Since for a deterministic classifier both of the above probabilities are equal to 0 or 1, we will use two concepts for their approximate calculation: the randomized reference classifier (RRC) and the soft confusion matrix (SCM).
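For illustration, the correction scheme (2)–(3) can be sketched in a few lines of Python. This is only an illustrative sketch: all names are ours, and the estimates of $P(i|x)$ and $P(j|i,x)$ are assumed to be already available (they are produced by the RRC and SCM described below).

```python
import numpy as np

def scm_corrected_predict(p_outcome, p_true_given_outcome):
    """Sketch of the correction scheme (2)-(3).

    p_outcome:            array of shape (M,); estimate of P(i|x), i.e. the
                          probability that psi assigns x to class i (from the RRC).
    p_true_given_outcome: array of shape (M, M); entry [i, j] is the estimate of
                          P(j|i, x) obtained from the soft confusion matrix.
    Returns the index of the class with the largest corrected posterior p(j|x).
    """
    # p(j|x) = sum_i P(j|i, x) * P(i|x)
    posterior = p_outcome @ p_true_given_outcome
    return int(np.argmax(posterior))
```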

2.2 Randomized Reference Classifier (RRC)

The RRC is a randomized model of the classifier $\psi$, and with its help the probabilities $P(i|x)$ are calculated.

The RRC, as a probabilistic classifier, is defined by a probability distribution over the set of class labels $\mathcal{M}$. Its classifying functions $\delta_j(x)$ are observed values of random variables $\Delta_j(x)$ that meet – in addition to the normalizing conditions – the following condition:

$$\mathrm{E}\bigl[\Delta_j(x)\bigr] = d_j(x), \quad j \in \mathcal{M}, \quad (4)$$

where $\mathrm{E}$ is the expected value operator. Formula (4) denotes that the RRC acts – on average – as the modelled classifier $\psi$, hence the following approximation is fully justified:

$$P(i|x) \approx P^{\mathrm{RRC}}(i|x), \quad (5)$$

where

$$P^{\mathrm{RRC}}(i|x) = \Pr\bigl[\Delta_i(x) > \Delta_j(x),\ \forall j \neq i\bigr] \quad (6)$$

can be easily determined if we assume – as in the original work of Woloszynski and Kurzynski [32] – that $\Delta_j(x)$ follows the beta distribution.
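The closed-form beta-distribution solution of [32] is not reproduced here. The following rough Monte Carlo sketch (our own illustration, not the original implementation) only conveys the idea behind (4)–(6): random supports with the required expected values are drawn, and the fraction of draws in which class $i$ wins approximates $P(i|x)$. A Dirichlet draw is used as a stand-in for the beta-distributed supports of the original model.

```python
import numpy as np

def rrc_probabilities_mc(supports, concentration=10.0, n_samples=20000, rng=None):
    """Crude Monte Carlo illustration of the RRC idea (not the method of [32]).

    supports: normalized classification functions d_j(x) of the modelled classifier.
    Random supports are drawn from a Dirichlet distribution whose mean equals
    `supports`, so condition (4) holds on average; the returned vector estimates
    P(i|x), the probability that the RRC assigns x to class i.
    """
    rng = np.random.default_rng() if rng is None else rng
    supports = np.asarray(supports, dtype=float)
    alpha = np.clip(concentration * supports, 1e-6, None)  # avoid zero parameters
    draws = rng.dirichlet(alpha, size=n_samples)            # random supports Delta(x)
    winners = draws.argmax(axis=1)                          # class chosen by the RRC
    return np.bincount(winners, minlength=supports.size) / n_samples

# Example: rrc_probabilities_mc([0.2, 0.5, 0.3]) returns an estimate of P(i|x)
# for each of the three classes (values vary between runs).
```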

2.3 Soft Confusion Matrix (SCM)

The SCM will be used to determine the assessment of the probability $P(j|i,x)$, which denotes the class-dependent probability of correct classification (for $i = j$) and of misclassification (for $i \neq j$) of $\psi$ at the point $x$. The method defines the neighbourhood of the point $x$ containing validation objects in terms of fuzzy sets, allowing for flexible selection of membership functions and the assignment of weights to individual validation objects depending on their distance from $x$.

The SCM, which provides an image of the local (relative to $x$) probabilities $P(j|i,x)$ of the classifier $\psi$, has the form of a two-dimensional table in which the rows correspond to the true classes while the columns correspond to the outcomes of the classifier $\psi$, as shown in Table 1.

Table 1: The soft confusion matrix of classifier $\psi$ (rows: true classes; columns: classes assigned by $\psi$).

The value of each entry of the matrix is determined from the validation set $\mathcal{V}$ and is defined as the following ratio:

(7)

where the sets appearing in the ratio are fuzzy sets specified on the validation set and $|\cdot|$ denotes the cardinality of a fuzzy set [7].

The set $\mathcal{V}_j$ denotes the set of validation objects from the $j$-th class. Formulating this set in terms of fuzzy set theory, it can be assumed that the grade of membership of a validation object $x_k$ in $\mathcal{V}_j$ is the class indicator, which leads to the following definition of $\mathcal{V}_j$:

$$\mathcal{V}_j = \{(x_k, \mu_{\mathcal{V}_j}(x_k)) : x_k \in \mathcal{V}\}, \quad (8)$$
$$\mu_{\mathcal{V}_j}(x_k) = [\,j_k = j\,], \quad (9)$$

where $j_k$ is the true class of $x_k$ and $[\cdot]$ is the indicator bracket. The concept of the fuzzy decision set $\mathcal{D}_i$ is defined as follows:

$$\mathcal{D}_i = \{(x_k, \mu_{\mathcal{D}_i}(x_k)) : x_k \in \mathcal{V}\}, \quad \mu_{\mathcal{D}_i}(x_k) = P(i|x_k), \quad (10)$$

where $P(i|x_k)$ is calculated according to (5) and (6). Formula (10) demonstrates that the membership of a validation object in the set $\mathcal{D}_i$ is not determined by the crisp decision of classifier $\psi$. The grade of membership of an object $x_k$ in $\mathcal{D}_i$ depends on the potential chance of classifying $x_k$ into the $i$-th class by the classifier $\psi$. We assume that this potential chance is equal to the probability $P(i|x_k)$ calculated approximately using the randomized model RRC of the classifier $\psi$.

The set $\mathcal{N}(x)$ plays the crucial role in the proposed concept of the SCM, because it decides which validation objects, and with which weights, will be involved in determining the local properties of the classifier $\psi$ and – as a consequence – in the procedure of correcting its classification decision. Formally, $\mathcal{N}(x)$ is also a fuzzy set:

$$\mathcal{N}(x) = \{(x_k, \mu_{\mathcal{N}(x)}(x_k)) : x_k \in \mathcal{V}\}, \quad (11)$$

but its membership function is not defined univocally because it depends on many circumstances. By choosing the shape of the membership function we can freely model the adopted concept of "locality" (relative to $x$).

The membership grade $\mu_{\mathcal{N}(x)}(x_k)$ depends on the distance between the validation object $x_k$ and the test object $x$: its value is equal to 1 for $x_k = x$ and decreases with increasing distance between $x_k$ and $x$. This leads to the following form of the proposed membership function of the set $\mathcal{N}(x)$:

(12)

where $\delta$ denotes the Euclidean distance in the feature space $\mathcal{X}$, $\delta_K(x)$ is the Euclidean distance between $x$ and its $K$-th nearest neighbour in $\mathcal{V}$, and $\beta$ is a normalizing coefficient. The first factor in (12) limits the concept of "locality" (relative to $x$) to the set of $K$ nearest neighbours, with a Gaussian model of the membership grade.

Since under the stream classification framework there should be only one pass over the data [19], the parameters $\beta$ and $K$ cannot be found using the extensive grid-search approach used for the originally proposed method [30, 22]. Consequently, in this work we decided to set $\beta$ to 1. Additionally, the initial number of nearest neighbours is found using a simple rule of thumb [6]:

$$K \approx \sqrt{N}, \quad N = |\mathcal{V}|. \quad (13)$$

To avoid ties, the final number of neighbours is adjusted as follows:

(14)

Additionally, the computational cost of computing the neighbourhood may be further reduced by using the kd-tree algorithm to find the nearest neighbours [18].
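Since formula (12) is not reproduced above, the sketch below shows only one plausible instantiation of the neighbourhood computation under the assumptions stated in this section ($\beta = 1$, $K \approx \sqrt{N}$, kd-tree search). It is our illustration rather than the exact membership function of the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbourhood_memberships(validation_X, x, beta=1.0):
    """One plausible instantiation of the fuzzy neighbourhood N(x), cf. (11)-(13).

    Returns the indices of the K nearest validation objects and their membership
    grades: 1 at x itself, decaying with a Gaussian factor scaled by the distance
    to the K-th neighbour (the exact form of (12) may differ).
    """
    n = len(validation_X)
    k = max(1, int(round(np.sqrt(n))))      # rule of thumb (13): K ~ sqrt(N)
    tree = cKDTree(validation_X)            # kd-tree speeds up the neighbour search [18]
    dist, idx = tree.query(x, k=k)
    dist, idx = np.atleast_1d(dist), np.atleast_1d(idx)
    d_k = dist[-1] if dist[-1] > 0 else 1.0 # distance to the K-th neighbour
    mu = np.exp(-beta * (dist / d_k) ** 2)  # Gaussian membership grade
    return idx, mu
```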

Finally, from (8), (10) and (11) we get the following approximation:

(15)

which, together with (3), (5) and (6), gives (2), i.e. the corrected classifier $\psi^{\mathrm{SCM}}$.

2.4 Creating the validation set

In this section, the procedure of creating the validation set from the training data is described. In the original work describing the SCM [30], the set of labelled data was split into the learning set $\mathcal{S}$ and the validation set $\mathcal{V}$; the two sets were disjoint. The cardinality of the validation set was controlled by a dedicated coefficient, which was usually set to a fixed default value; however, to achieve the highest classification quality, it should be determined using a grid-search procedure. As stated above, in this work we want to avoid using the grid-search procedure. Therefore, we construct the validation set using a three-fold cross-validation procedure that allows the entire learning set to be used as a validation set. The procedure is described in Algorithm 1; a short code sketch is given after the algorithm.

Data: $\mathcal{S}$ -- initial learning set
Result: $\mathcal{V}$ -- validation set; $\mathcal{D}_i$ -- decision sets (see (10)); $\psi$ -- trained classifier
begin
      for each of the three cross-validation folds do
            Extract the fold-specific training and validation sets $\mathcal{S}^{(t)}$, $\mathcal{S}^{(v)}$;
            Learn the fold classifier using $\mathcal{S}^{(t)}$;
            $\mathcal{V} \leftarrow \mathcal{V} \cup \mathcal{S}^{(v)}$;
            Update the class-specific decision sets $\mathcal{D}_i$ using the predictions of the fold classifier for the instances from $\mathcal{S}^{(v)}$ (see (10));
      end for
      Learn the final classifier $\psi$ using $\mathcal{S}$;
end
Algorithm 1: Procedure of training the SCM classifier, including the creation of the validation set.
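A minimal Python sketch of Algorithm 1 is given below. It assumes a scikit-learn-style base classifier exposing fit and predict_proba (the actual experiments use WEKA), and it stores the per-instance class supports as a stand-in for the decision-set memberships (10).

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def build_validation_set(base_clf, X, y, n_folds=3):
    """Sketch of Algorithm 1: build the validation set with 3-fold cross-validation.

    For every fold, a copy of the base classifier is trained on the remaining
    folds and its class supports for the held-out fold are stored; these supports
    play the role of the decision-set memberships (10). The final classifier is
    trained on the whole learning set. Assumes every class occurs in every fold.
    """
    val_X, val_y, memberships = [], [], []
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, val_idx in skf.split(X, y):
        fold_clf = clone(base_clf).fit(X[train_idx], y[train_idx])
        val_X.append(X[val_idx])
        val_y.append(y[val_idx])
        memberships.append(fold_clf.predict_proba(X[val_idx]))  # stand-in for (10)
    final_clf = clone(base_clf).fit(X, y)
    return np.vstack(val_X), np.hstack(val_y), np.vstack(memberships), final_clf
```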

3 Classification of Data Stream

The main goal of the work is to develop a wrapping-classifier that adds incremental learning capability to classifiers that are unable to learn incrementally. In this section, we describe the incremental learning procedure used by the SCM-based wrapping-classifier.

3.1 Model of Data Stream

We assume that instances from a data stream appear as a sequence of labelled examples $(x_t, j_t)$, $t = 1, 2, \ldots$, where $x_t \in \mathcal{X}$ represents the $d$-dimensional feature vector of the object that arrived at time $t$ and $j_t \in \mathcal{M}$ is its class number. In this study we consider a completely supervised learning approach, which means that the true class number $j_t$ is available after the arrival of the object $x_t$ and before the arrival of the next object $x_{t+1}$, and this information may be used by the classifier for the classification of $x_{t+1}$. Such a framework is one of the most often considered in the related literature [4, 25].

In addition, we assume that the data stream can be generated by a time-varying distribution, yielding the phenomenon of concept drift [8]. We do not impose any restrictions on the concept drift: it can be real drift, referring to changes in the class distribution, or virtual drift, referring to changes in the distribution of features. We allow sudden, incremental, gradual, and recurrent changes in the distribution of instances creating the data stream. Changes in the distribution may also cause class imbalance to appear in a changing configuration.

3.2 Incremental learning for SCM classifier

We assume that the base classifier $\psi$ wrapped by the SCM classifier is unable to learn incrementally. Consequently, an initial training set has to be used to build the classifier. This initial data set is called the initial chunk, and its desired size is denoted by $B$. The initial data set is built by storing incoming examples from the data stream. Until the initial batch is collected, the corrected classifier cannot be built; during that period, the prediction is made on the basis of a priori probabilities estimated from the incomplete initial batch.

Since $\psi$ is unable to learn incrementally, incremental learning is handled by changing the validation set. Incoming instances are added to the validation set until the ADWIN-based drift detector detects that concept drift has occurred. The ADWIN-based drift detector analyses the outcomes of the corrected classifier for the instances stored in the validation set [2]. When there is a significant difference between the older and the newer part of the validation set, the detector removes the older part. The remaining part of the validation set is then used to correct the outcome of $\psi$. The ADWIN-based drift detector also controls the size of the neighbourhood: even if there is no concept drift, the detector may detect a deterioration of the classification quality when the neighbourhood becomes too large.

The detailed procedure is described in Algorithms 2 and 3; a short code sketch of the update step is given after Algorithm 3.

Data: $\mathcal{V}$ -- validation set; $(x, j)$ -- new instance to add
Result: Updated validation set $\mathcal{V}$
begin
      $i = \psi^{\mathrm{SCM}}(x)$;  // Predict the object class using the corrected classifier
      Check the prediction using the ADWIN detector;
      if the ADWIN detector detects drift then
            Ask the detector for the newer part of the validation set $\mathcal{V}'$;
            $\mathcal{V} \leftarrow \mathcal{V}'$;
      end if
      $\mathcal{V} \leftarrow \mathcal{V} \cup \{(x, j)\}$;
end
Algorithm 2: Validation set update controlled by the ADWIN detector.
Data: $(x, j)$ -- new instance
Result: Learned SCM wrapping-classifier
begin
      if the initial chunk has just been completed then
            Train the SCM classifier using the procedure described in Algorithm 1, with the initial chunk as the learning set;
            Make a copy of the validation set;
            foreach object in the copied validation set do
                  Update the validation set using this object and the procedure described in Algorithm 2;
            end foreach
      else if the SCM classifier is already trained then
            Update the validation set using $(x, j)$ and the procedure described in Algorithm 2;
      else
            Add $(x, j)$ to the initial chunk;
      end if
end
Algorithm 3: Incremental learning procedure of the SCM wrapping-classifier.
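The validation-set update of Algorithm 2 can be sketched as follows. The sketch assumes the ADWIN implementation from scikit-multiflow and feeds the detector with the 0/1 correctness of the corrected prediction; the way the detector's window is mapped onto the kept part of the validation set is our assumption, since the paper only states that the detector returns the newer part of the set.

```python
from collections import deque
from skmultiflow.drift_detection import ADWIN  # assumed detector implementation

def update_validation_set(validation, detector, scm_predict, instance):
    """Sketch of Algorithm 2: ADWIN-controlled validation-set update.

    validation : deque of (x, y) pairs, newest on the right.
    detector   : an ADWIN instance fed with the 0/1 correctness of the
                 corrected classifier's predictions.
    scm_predict: callable returning the corrected class for a feature vector.
    """
    x, y = instance
    correct = float(scm_predict(x) == y)
    detector.add_element(correct)            # check the prediction with ADWIN
    if detector.detected_change():
        # Keep only the newer part of the validation set; here we equate it
        # with ADWIN's current window length (an assumption).
        keep = int(detector.width)
        while len(validation) > keep:
            validation.popleft()
    validation.append((x, y))                # finally, add the new instance
    return validation
```

Truncating from the left of a deque keeps the most recent instances, which also shrinks the neighbourhood used by the SCM correction, in line with the behaviour described above.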

4 Experimental Setup

To validate the classification quality of the proposed approaches, an experimental evaluation, whose setup is described below, was performed.

The following base classifiers were employed:

  1. the Hoeffding Tree classifier [26];
  2. the naive Bayes classifier [16];
  3. the nearest-neighbours (KNN) classifier [14];
  4. the SVM classifier trained with SGD [28].

The classifiers implemented in the WEKA framework [15] were used. If not stated otherwise, the classifier parameters were set to their defaults. We chose classifiers that offer both batch and incremental learning procedures.

The experimental code was implemented using the WEKA framework [15]. The source code of the algorithms is available online (https://github.com/ptrajdos/rrcBasedClassifiers/tree/develop, https://github.com/ptrajdos/StreamLearningPT/tree/develop).

During the experimental evaluation, the following classifiers were compared:

  1. The ADWIN-driven classifier created using the unmodified base classifier (the base classifier is able to update incrementally) [2].

  2. The ADWIN-driven classifier created using the unmodified base classifier with incremental learning disabled. The base classifier is only retrained whenever the ADWIN-based detector detects concept drift.

  3. The ADWIN-driven approach using the SCM correction scheme with online learning, as described in Section 3.

  4. The ADWIN-driven approach created using the SCM correction scheme but with online learning disabled. The SCM-corrected classifier is only retrained whenever the ADWIN-based detector detects concept drift.
To evaluate the proposed methods, the following classification-loss criteria are used [29]: the macro-averaged false discovery rate (1 − precision; MaFDR), the macro-averaged false negative rate (1 − recall; MaFNR), and the Matthews correlation coefficient (MaMCC). The Matthews coefficient is rescaled in such a way that 0 is perfect classification and 1 is the worst one. Quality measures from the macro-averaging group are considered because this kind of measure is more sensitive to the performance on minority classes. For many real-world classification problems, the minority class is the class that attracts the most attention [23].
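A small sketch of how these criteria can be computed is given below. The mapping of the MCC onto the [0, 1] loss scale as (1 − MCC)/2 is our assumption, chosen only because it sends perfect classification to 0 and the worst case to 1.

```python
from sklearn.metrics import precision_score, recall_score, matthews_corrcoef

def stream_losses(y_true, y_pred):
    """Macro-averaged loss criteria used in the evaluation (sketch).

    MaFDR = 1 - macro-averaged precision, MaFNR = 1 - macro-averaged recall,
    MaMCC = rescaled Matthews correlation coefficient (0 = perfect, 1 = worst).
    """
    ma_fdr = 1.0 - precision_score(y_true, y_pred, average="macro", zero_division=0)
    ma_fnr = 1.0 - recall_score(y_true, y_pred, average="macro", zero_division=0)
    ma_mcc = (1.0 - matthews_corrcoef(y_true, y_pred)) / 2.0  # assumed rescaling
    return {"MaFDR": ma_fdr, "MaFNR": ma_fnr, "MaMCC": ma_mcc}
```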

Following the recommendations of [5] and [10], the statistical significance of the obtained results was assessed using a two-step procedure. The first step is to perform the Friedman test [5] for each quality criterion separately. Since multiple criteria were employed, the family-wise error rate (FWER) should be controlled [17]. To do so, the Holm procedure [17] of controlling the FWER of the conducted Friedman tests was employed. When the Friedman test shows that there is a significant difference within the group of classifiers, pairwise tests using the Wilcoxon signed-rank test [5] were employed. To control the FWER of the Wilcoxon testing procedure, the Holm approach was also employed [17]. The same significance level was used for all tests.

The experiments were conducted using 48 synthetic datasets generated with the STREAM-LEARN library (https://github.com/w4k2/stream-learn). The properties of the datasets were as follows: dataset size: 30k examples; number of attributes: 8; types of drift generated: incremental, sudden; noise levels: 0%, 10%, 20%; imbalance ratio: 0 – 4.

The datasets used in this experiment are available online (https://github.com/ptrajdos/MLResults/blob/master/data/stream_data.tar.xz?raw=true).

To examine the effectiveness of the incremental update algorithms, we applied an experimental procedure based on the methodology characteristic of data stream classification, namely the test-then-update procedure [9]. The chunk size for evaluation purposes was set to 200.
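The evaluation loop can be sketched as follows (all names are placeholders; each chunk of 200 instances is first used for testing and only afterwards for updating the model):

```python
def test_then_update(stream, model, evaluate, chunk_size=200):
    """Prequential (test-then-update) evaluation sketch.

    `stream` yields (x, y) pairs; `model` exposes predict/update; `evaluate`
    computes the chosen loss criteria for a chunk. All names are placeholders.
    """
    scores, chunk = [], []
    for x, y in stream:
        chunk.append((x, y))
        if len(chunk) == chunk_size:
            preds = [model.predict(xi) for xi, _ in chunk]      # test first ...
            scores.append(evaluate([yi for _, yi in chunk], preds))
            for xi, yi in chunk:                                 # ... then update
                model.update(xi, yi)
            chunk = []
    return scores
```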

5 Results and Discussion

To compare multiple algorithms on multiple benchmark sets, the average ranks approach is used. In this approach, the winning algorithm achieves a rank equal to 1, the second achieves a rank equal to 2, and so on. In the case of ties, the ranks of the algorithms that achieve the same results are averaged.
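A small numerical example of the average-ranks computation (with illustrative loss values) is given below:

```python
import numpy as np
from scipy.stats import rankdata

# Loss values (lower is better) of 4 algorithms on 3 benchmark sets (illustrative numbers).
losses = np.array([
    [0.10, 0.20, 0.20, 0.30],
    [0.15, 0.25, 0.10, 0.40],
    [0.20, 0.20, 0.10, 0.50],
])
per_dataset_ranks = np.array([rankdata(row) for row in losses])  # rank 1 = best; ties averaged
average_ranks = per_dataset_ranks.mean(axis=0)
print(average_ranks)  # approximately [1.83, 2.67, 1.5, 4.0]
```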

The numerical results are given in Tables 2 to 5. Each table is structured as follows. The first row contains the names of the investigated algorithms. Then, the table is divided into sections, one per evaluation criterion. The first row of each section is the name of the quality criterion investigated in that section. The second row shows the p-value of the Friedman test. The third one shows the average ranks achieved by the algorithms. The following rows show the p-values resulting from the pairwise Wilcoxon tests. A p-value reported as .000 means that the actual p-value is below the displayed precision. P-values indicating significant differences are given in bold. Due to the page limit, the raw results are published online (https://github.com/ptrajdos/MLResults/blob/master/RandomizedClassifiers/Results_cldd_2021.tar.xz?raw=true).

To provide a visualization of the average ranks and the outcome of the statistical tests, rank plots are used. The rank plots are compatible with those described in [5]. That is, each classifier is placed along a line representing the values of the achieved average ranks. Classifiers between which there are no significant differences (in terms of the pairwise Wilcoxon test) are connected with a horizontal bar placed below the axis representing the average ranks. The results are visualised in Figures 1 – 4.

Let us begin with an analysis of the correction ability of the SCM approach when incremental learning is disabled. Although this kind of analysis has already been done [30, 22], in this work it has to be done again since the definition of the neighbourhood has been significantly changed (see Section 2.3). To assess the impact of the SCM-based correction, we compare the variants without online learning, with and without the SCM correction (variants 2 and 4 from Section 4), for different base classifiers. For the base classifiers corresponding to Figures 1 and 2, the employment of SCM-based correction allows a significant improvement to be achieved in terms of all quality criteria. For the remaining base classifiers, on the other hand, there are no significant differences between the two variants. These results confirm observations previously made in [30, 22]: the correction ability of the SCM approach is more noticeable for classifiers that are considered to be weaker ones. The previously observed correction ability holds even though the extensive grid-search technique is not applied.

In this paper, the SCM-based approach is proposed to be used as a wrapping-classifier that handles incremental learning for base classifiers that cannot be updated incrementally. Consequently, we now analyse the SCM approach in that scenario. The results show that the SCM-based classifier with online learning (variant 3) significantly outperforms its counterpart without online learning (variant 4) for all base classifiers and quality criteria. This means that it works well as an incremental-learning-handling wrapping-classifier. What is more, it also outperforms the retrained-only base classifier (variant 2) for all base classifiers and criteria. This clearly shows that the source of the achieved improvement does not lie only in the batch-learning correction ability; the ability to handle incremental learning is also present. Moreover, it handles incremental learning more effectively than the base classifiers designed to do so. This observation is confirmed by the fact that the SCM-based classifier with online learning (variant 3) also outperforms the incrementally updated base classifier (variant 1) for all base classifiers and quality criteria.

Table 2: Statistical evaluation for the stream classifiers based on classifier.
Criterion:          MaFDR | MaFNR | MaMCC
Friedman p-value:   1.213e-28 | 5.963e-28 | 5.963e-28
Average ranks:      2.000 3.812 1.00 3.188 | 2.000 3.583 1.00 3.417 | 2.000 3.667 1.00 3.333
Wilcoxon p-values:  .000 .000 .000 | .000 .000 .000 | .000 .000 .000
                    .000 .000 | .000 .111 | .000 .002
                    .000 | .000 | .000

Table 3: Statistical evaluation for the stream classifiers based on classifier.
Criterion:          MaFDR | MaFNR | MaMCC
Friedman p-value:   3.329e-28 | 3.329e-28 | 1.739e-28
Average ranks:      2.021 3.771 1.00 3.208 | 2.000 3.708 1.00 3.292 | 2.000 3.792 1.00 3.208
Wilcoxon p-values:  .000 .000 .000 | .000 .000 .000 | .000 .000 .000
                    .000 .000 | .000 .001 | .000 .000
                    .000 | .000 | .000

Table 4: Statistical evaluation for the stream classifiers based on classifier.
Criterion:          MaFDR | MaFNR | MaMCC
Friedman p-value:   1.883e-27 | 1.883e-27 | 1.883e-27
Average ranks:      2.000 3.521 1.00 3.479 | 2.000 3.542 1.00 3.458 | 2.000 3.500 1.00 3.500
Wilcoxon p-values:  .000 .000 .000 | .000 .000 .000 | .000 .000 .000
                    .000 .955 | .000 .545 | .000 .757
                    .000 | .000 | .000

Table 5: Statistical evaluation for the stream classifiers based on classifier.
Criterion:          MaFDR | MaFNR | MaMCC
Friedman p-value:   3.745e-27 | 1.563e-27 | 1.563e-27
Average ranks:      2.042 3.500 1.00 3.458 | 2.021 3.292 1.00 3.688 | 2.000 3.438 1.00 3.562
Wilcoxon p-values:  .000 .000 .000 | .000 .000 .000 | .000 .000 .000
                    .000 .947 | .000 .005 | .000 .088
                    .000 | .000 | .000
Figure 1: Ranking plots for the three macro-averaged criteria for the stream classifiers based on classifier.
Figure 2: Ranking plots for the three macro-averaged criteria for the stream classifiers based on classifier.
Figure 3: Ranking plots for the three macro-averaged criteria for the stream classifiers based on classifier.
Figure 4: Ranking plots for the three macro-averaged criteria for the stream classifiers based on classifier.

6 Conclusions

In this paper, we propose a modified SCM classifier to be used as a wrapping-classifier that allows incremental learning for classifiers that are not designed to be incrementally updated. We applied two modifications to the SCM wrapping-classifier originally described in [30, 22]. The first one is a modified neighbourhood definition. The newly proposed neighbourhood does not require an extensive grid-search procedure to find the best set of parameters. Due to the modified neighbourhood definition, the computational cost of performing the SCM-based correction is significantly smaller. The second modification is to incorporate an ADWIN-based approach to create and manage the validation set used by the SCM-based algorithm. This modification not only allows the proposed method to deal effectively with concept drift but also makes it possible to shrink the neighbourhood when it becomes too wide.

The experimental results show that the proposed approach outperforms the reference methods for all investigated base classifiers in terms of all considered quality criteria.

The results obtained in this study are very promising. Consequently, we are going to continue our research on the employment of randomised classifiers in the task of stream learning. Our next step will likely be a proposal of a stream learning ensemble that uses the SCM correction method proposed in this paper.

Acknowledgments.

This work was supported by the statutory funds of the Department of Systems and Computer Networks, Wroclaw University of Science and Technology.

References

  • [1] R. S. M. d. Barros and S. G. T. d. C. Santos (2019-12) An overview and comprehensive comparison of ensembles for concept drift. Information Fusion 52, pp. 213–244. External Links: Document, ISSN 1566-2535 Cited by: §1.
  • [2] A. Bifet and R. Gavaldà (2007-04) Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM International Conference on Data Mining, External Links: Document Cited by: §1, §3.2, item 1.
  • [3] D. Brzezinski and J. Stefanowski (2014-01) Reacting to different types of concept drift: The accuracy updated ensemble algorithm. IEEE Trans. Neural Netw. Learning Syst. 25 (1), pp. 81–94. External Links: Document, ISSN 2162-237X, 2162-2388 Cited by: §1.
  • [4] D. Brzezinski and J. Stefanowski (2014-05) Combining block-based and online methods in learning ensembles from concept drifting data streams. Information Sciences 265, pp. 50–67. External Links: Document, ISSN 0020-0255 Cited by: §3.1.
  • [5] J. Demšar (2006) Statistical comparisons of classifiers over multiple data sets.

    The Journal of Machine Learning Research

    7, pp. 1–30.
    Cited by: §4, §5.
  • [6] L. Devroye, L. Györfi, and G. Lugosi (1996) A probabilistic theory of pattern recognition. Springer New York. External Links: Document, ISSN 0172-4568, 2197-439X Cited by: §2.3.
  • [7] M. Dhar (2013-05) On cardinality of fuzzy sets. IJISA 5 (6), pp. 47–52. External Links: Document, ISSN 2074-904X, 2074-9058 Cited by: §2.3.
  • [8] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia (2014-03) A survey on concept drift adaptation. CSUR 46 (4), pp. 1–37. External Links: Document, ISSN 0360-0300 Cited by: §1, §1, §1, §3.1.
  • [9] J. Gama (2010-05) Knowledge discovery from data streams. 1st edition, Chapman and Hall/CRC. External Links: ISBN 9780429103797, Document, Link Cited by: §4.
  • [10] S. Garcia and F. Herrera (2008-12) An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research 9, pp. 2677–2694. External Links: ISSN 1532-4435 Cited by: §4.
  • [11] C. Giraud-Carrier (2000) A note on the utility of incremental learning. Ai Communications 13 (4), pp. 215–223. Cited by: §1.
  • [12] H. M. Gomes, J. P. Barddal, F. Enembreck, and A. Bifet (2017-03) A survey on ensemble learning for data stream classification. CSUR 50 (2), pp. 1–36. External Links: Document, ISSN 0360-0300 Cited by: §1.
  • [13] P. M. Gonçalves, S. G.T. de Carvalho Santos, R. S.M. Barros, and D. C.L. Vieira (2014-12) A comparative study on concept drift detectors. Expert Syst. Appl. 41 (18), pp. 8144–8156. External Links: Document, ISSN 0957-4174 Cited by: §1.
  • [14] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer (2003) KNN model-based approach in classification. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, pp. 986–996. External Links: Document, ISSN 0302-9743, 1611-3349, ISBN 9783540204985, 9783540399643 Cited by: 3rd item.
  • [15] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten (2009-11) The WEKA data mining software. SIGKDD Explor. Newsl. 11 (1), pp. 10. External Links: Document, ISSN 1931-0145 Cited by: §4, §4.
  • [16] D. J. Hand and K. Yu (2001-12) Idiot’s bayes: Not so stupid after all?. International Statistical Review / Revue Internationale de Statistique 69 (3), pp. 385. External Links: Document, ISSN 0306-7734 Cited by: 2nd item.
  • [17] S. Holm (1979) A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics 6 (2), pp. 65–70. External Links: Document, ISSN 03036898 Cited by: §4.
  • [18] W. Hou, D. Li, C. Xu, H. Zhang, and T. Li (2018-12) An advanced k nearest neighbor classification algorithm based on KD-tree. In 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), External Links: Document, ISBN 9781538655146 Cited by: §2.3.
  • [19] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, and M. Woźniak (2017-09) Ensemble learning for data stream analysis: A survey. Information Fusion 37, pp. 132–156. External Links: Document Cited by: §1, §1, §2.3.
  • [20] L. I. Kuncheva and I. Žliobaitė (2009-11) On the window size for classification in changing environments. IDA 13 (6), pp. 861–872. External Links: Document, ISSN 1571-4128, 1088-467X Cited by: §1.
  • [21] L. I. Kuncheva (2014-09) Combining pattern classifiers. John Wiley & Sons, Inc.. External Links: Document, ISBN 9781118914564, 9781118315231 Cited by: §2.1.
  • [22] M. Kurzynski, M. Krysmann, P. Trajdos, and A. Wolczowski (2016-02) Multiclassifier system with hybrid learning applied to the control of bioprosthetic hand. Comput. Biol. Med. 69, pp. 286–297. External Links: Document, ISSN 0010-4825 Cited by: §1, §2.3, §5, §6.
  • [23] J. L. Leevy, T. M. Khoshgoftaar, R. A. Bauder, and N. Seliya (2018-11) A survey on addressing high-class imbalance in big data. J Big Data 5 (1). External Links: Document, ISSN 2196-1115 Cited by: §4.
  • [24] S. Mehta et al. (2017) Concept drift in streaming data classification: Algorithms, platforms and issues. Procedia Comput. Sci. 122, pp. 804–811. External Links: Document, ISSN 1877-0509 Cited by: §1.
  • [25] H. Nguyen, Y. Woon, and W. Ng (2014-12) A survey on data stream clustering and classification. Knowl Inf Syst 45 (3), pp. 535–569. External Links: Document, ISSN 0219-1377, 0219-3116 Cited by: §3.1.
  • [26] B. Pfahringer, G. Holmes, and R. Kirkby (2007) New options for hoeffding trees. In AI 2007: Advances in Artificial Intelligence, M. A. Orgun and J. Thornton (Eds.), Berlin, Heidelberg, pp. 90–99. External Links: ISBN 978-3-540-76928-6 Cited by: §1, 1st item.
  • [27] J. Read, A. Bifet, B. Pfahringer, and G. Holmes (2012) Batch-incremental versus instance-incremental learning in dynamic and evolving data. In Advances in Intelligent Data Analysis XI, pp. 313–323. External Links: Document, ISSN 0302-9743, 1611-3349, ISBN 9783642341557, 9783642341564 Cited by: §1.
  • [28] C. Sakr, A. Patil, S. Zhang, Y. Kim, and N. Shanbhag (2017-03) Minimum precision requirements for the SVM-SGD learning algorithm. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), External Links: Document, ISBN 9781509041176 Cited by: 4th item.
  • [29] M. Sokolova and G. Lapalme (2009-07) A systematic analysis of performance measures for classification tasks. Information Processing & Management 45 (4), pp. 427–437. External Links: ISSN 0306-4573, Document Cited by: §4.
  • [30] P. Trajdos and M. Kurzynski (2016-03) A dynamic model of classifier competence based on the local fuzzy confusion matrix and the random reference classifier. Int. J. Appl. Math. Comput. Sci. 26 (1), pp. 175–189. External Links: Document, ISSN 2083-8492 Cited by: §1, §2.3, §2.4, §5, §6.
  • [31] P. Trajdos and M. Kurzynski (2018-09) A correction method of a binary classifier applied to multi-label pairwise models. Int. J. Neur. Syst. 28 (09), pp. 1750062. External Links: Document, ISSN 0129-0657, 1793-6462 Cited by: §1.
  • [32] T. Woloszynski and M. Kurzynski (2011-10) A probabilistic model of classifier competence for dynamic ensemble selection. Pattern Recognit. 44 (10-11), pp. 2656–2668. External Links: Document, ISSN 0031-3203 Cited by: §2.2.
  • [33] I. Žliobaitė (2011-06) Combining similarity in time and space for training set formation under concept drift. IDA 15 (4), pp. 589–611. External Links: Document, ISSN 1571-4128, 1088-467X Cited by: §1.