
Detecting Concept Drift in the Presence of Sparsity – A Case Study of Automated Change Risk Assessment System

Missing values, widely referred to as sparsity in the literature, are a common characteristic of many real-world datasets. Many imputation methods have been proposed to address this problem of data incompleteness or sparsity. However, the accuracy of a data imputation method for a given feature or a set of features in a dataset is highly dependent on the distribution of the feature values and their correlation with other features. Another problem that plagues industry deployments of machine learning (ML) solutions is concept drift detection, which becomes more challenging in the presence of missing values. Although data imputation and concept drift detection have been studied extensively, little work has attempted a combined study of the two phenomena, i.e., concept drift detection in the presence of sparsity. In this work, we carry out a systematic study of the following: (i) different patterns of missing values, (ii) various statistical and ML based data imputation methods for different kinds of sparsity, (iii) several concept drift detection methods, (iv) practical analysis of the various drift detection metrics, and (v) selecting the best concept drift detector for a dataset with missing values based on the different metrics. We first analyze these on synthetic data and publicly available datasets, and then extend the findings to our deployed solution, an automated change risk assessment system. One of the major findings from our empirical study is that no single concept drift detection method dominates across all the relevant metrics. Therefore, we adopt a majority voting based ensemble of concept drift detectors for abrupt and gradual concept drifts. Our experiments show that this ensemble achieves optimal or near-optimal performance across all the metrics.


1 Introduction

There are many reasons why missing values occur in a dataset – an erroneous data entry process, irregular data collection, or values intentionally not supplied by users, especially when they fill forms with non-mandatory fields. Missing values can make downstream tasks, such as predicting the value of a target variable, much more challenging. Since the performance of all algorithms degrades in the presence of sparsity, and algorithms are often not even designed to handle missing data, one may either ignore rows that have missing values altogether (which may cause problems, especially when the sparsity level is high) or use some data imputation technique before applying the algorithm. The situation is further aggravated when one tries to detect concept drifts on such datasets with sparsity – we experienced this firsthand when we tried to do the same with our deployed machine learning solution for change risk assessment Gupta et al. (2021). This solution is currently operational for changes targeted across Walmart's US, UK and Mexico stores, US Sam's Clubs and e-Commerce. Post deployment, the production team has confirmed that the number of major incidents has been reduced by 33%, with net savings ranging into multi-million dollars as of Q2 2021. Note that there may be other factors, e.g., software design changes, that have contributed to the savings; however, it is acknowledged that our ML based prediction system has been the primary contributor. To improve the performance further, we studied concept drift detection in the presence of sparsity, and based on our findings, the main contributions of this work are as follows:

  • Provide an empirical guideline on applying data imputation given a data distribution and a sparsity pattern.

  • Suggest various metrics for concept drift detection – we found that some of the metrics in the literature can be misleading in practice, and propose some new ones that we found helpful.

  • Provide a majority voting based ensemble of concept drift detectors for abrupt and gradual drifts that performs well across the whole spectrum of concept drift detection metrics – a feat that individual concept drift detectors may not attain.

2 Preliminaries

2.1 Types of Missingness

The nomenclature for the types of missing values was introduced by Rubin (1976) and is considered the de facto standard in any kind of statistical analysis with incomplete data. This nomenclature distinguishes between three cases, illustrated with a small masking sketch after the list:

  • Missing Completely At Random (MCAR). In MCAR, the missingness is completely independent of the data.

  • Missing At Random (MAR).

    In MAR, the probability of missingness depends only on observed values.

  • Missing Not At Random (MNAR). In MNAR, the probability of missingness depends on the unobserved values, and therefore it leads to important biases in the data.
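The following is a minimal sketch of how each missingness pattern might be injected into a complete two-feature dataset; the feature names, probabilities and thresholds are illustrative assumptions of ours and are not part of Rubin's definitions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
age = rng.normal(40, 10, n)      # fully observed helper feature (assumed)
income = rng.normal(60, 15, n)   # feature whose values will be masked (assumed)

# MCAR: every value has the same 20% chance of being missing,
# independent of both observed and unobserved data.
mcar_mask = rng.random(n) < 0.20

# MAR: missingness of `income` depends only on the observed `age`
# (older respondents are assumed to skip the field more often).
mar_prob = np.clip(0.05 + 0.01 * (age - 30), 0, 1)
mar_mask = rng.random(n) < mar_prob

# MNAR: missingness of `income` depends on `income` itself
# (higher earners are assumed to withhold the value more often).
mnar_prob = np.clip(0.05 + 0.01 * (income - 50), 0, 1)
mnar_mask = rng.random(n) < mnar_prob

income_mcar = np.where(mcar_mask, np.nan, income)
income_mar = np.where(mar_mask, np.nan, income)
income_mnar = np.where(mnar_mask, np.nan, income)
```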

2.2 Data Imputation Techniques

These techniques can be broadly divided into two categories. First, techniques that look at a single feature at a time – mean, median, mode and zero – which replace the missing values with the mean, the median, the most frequently occurring value (mode) and the constant zero, respectively. Second, techniques that also exploit the correlation with other features – k-nearest neighbours (kNN), iterative imputer van Buuren and Groothuis-Oudshoorn (2011), soft impute Hastie et al. (2015) and optimal transport Muzellec et al. (2020).
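As a rough illustration (not the exact configuration used in this work), the single-feature and correlation-aware imputers above can be instantiated with scikit-learn as follows; soft impute and optimal transport imputation come from separate packages and are omitted here.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])

# Single-feature imputers: each column is treated independently.
single_feature = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "mode": SimpleImputer(strategy="most_frequent"),
    "zero": SimpleImputer(strategy="constant", fill_value=0),
}

# Correlation-aware imputers: other features inform the filled value.
correlation_aware = {
    "kNN": KNNImputer(n_neighbors=2),
    "iterative": IterativeImputer(max_iter=10, random_state=0),
}

for name, imputer in {**single_feature, **correlation_aware}.items():
    print(name, imputer.fit_transform(X))
```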

2.3 Concept Drift

Concept drift is the change in the joint probability distribution $P(X, y)$ for input $X$ and label $y$ between two time points $t_0$ and $t_1$ Gupta et al. (2021).

2.4 Concept Drift Detection Algorithms

The concept drift detection algorithms discussed here are as follows. Page-Hinkley (PH) Page (1954) signals concept drift when the cumulative difference of the observed values from their mean crosses a user-defined threshold. Drift Detection Method (DDM) Gama et al. (2004) detects concept drifts in streams by analyzing the error rates and their standard deviation; if the error rate increases, then DDM concludes that the current predictor is outdated. Early Drift Detection Method (EDDM) monitors the distance between two consecutive errors rather than the error rate; consequently, when the concepts remain stationary the distance grows larger, whereas a decrease in the distance signals drift. Hoeffding's inequality based Drift Detection Method (HDDM) Frías-Blanco et al. (2015) has two variants: HDDMA, which uses moving averages to detect drifts, and HDDMW, which uses exponentially weighted moving averages. ADaptive WINdowing (ADWIN) Bifet and Gavaldà (2007) maintains two sub-windows, one for historic data and the other for new data; a significant difference between the means of these sub-windows indicates a concept drift. Kolmogorov-Smirnov WINdowing (KSWIN) Raab et al. (2020) maintains a sliding window of fixed size in which the most recent samples represent the latest concept; a concept drift is detected if a Kolmogorov-Smirnov test between the distributions of these recent samples and the remaining samples in the window yields a significant difference.
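As one concrete example of the detectors above, here is a minimal, self-contained sketch of the Page-Hinkley test for an upward change in the monitored statistic; the threshold and delta values are illustrative, and library implementations (e.g., in river or scikit-multiflow) differ in details such as resetting after a detection.

```python
class PageHinkley:
    """Minimal Page-Hinkley test signalling an upward drift in a stream of values."""

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # user-defined detection threshold (lambda)
        self.min_instances = min_instances
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0                  # cumulative deviation m_t
        self.cum_min = 0.0              # running minimum M_t

    def update(self, x):
        """Feed one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n          # running mean
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n < self.min_instances:
            return False
        # Drift when the deviation from the historical minimum crosses the threshold.
        return self.cum - self.cum_min > self.threshold


# Typical usage: feed the model's 0/1 error stream and watch for a True signal.
ph = PageHinkley()
drifts = [i for i, err in enumerate([0] * 500 + [1] * 100) if ph.update(err)]
```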

2.5 Metrics for Concept Drift

It is important to note that the first two metrics apply to the model that makes the predictions, while the latter four apply to the concept drift detector (CDD). Note that the last two metrics have been proposed by us.

Prequential Error: For a given sequence of instances $x_1, \ldots, x_n$, it is the accumulated sum $S_n = \sum_{i=1}^{n} L(\hat{y}_i, y_i)$ of a loss function $L$ between the prediction $\hat{y}_i$ and the observed value $y_i$.

Accuracy: It is the fraction of correct predictions out of the total number of predictions.

Average Detection Delay (ADD): It is the average of the distances (in terms of instances) between the actual and the predicted concept drifts.

True Positive Rate (TPR): Fraction of detected drifts that were true, i.e., within the acceptable detection interval (ADI), to the total number of drifts detected.
Due to detection delays, in our experiments, we set the ADI to four times the drift width, as per existing literature Pesaranghader and Viktor (2016); Yan (2020).

True Positives per Drift (TPD): Fraction of detected drifts that were true to the total number of actual drifts.
It is possible that a CDD declares two or more drifts within the ADI, and accordingly, its TPR may be high, which can be misleading; therefore, we introduce TPD, whose optimal value is 1 (though it can be higher or lower), i.e., the CDD declares a drift exactly once for every actual drift in the dataset.

Drift Count: Total number of actual drifts detected.
Since detecting multiple drifts within the ADI for one actual drift may offset the case where no drift is detected for another actual drift, we include drift count in our set of metrics.
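A rough sketch of how the detector-side metrics (ADD, TPR, TPD and drift count) could be computed from the actual and detected drift positions follows; the matching rule (each detection credited to the first actual drift that precedes it within the ADI) is our assumption for illustration.

```python
def drift_metrics(actual_drifts, detected_drifts, adi):
    """Compute ADD, TPR, TPD and drift count for one stream.

    actual_drifts   -- sorted instance indices where drifts were injected
    detected_drifts -- sorted instance indices where the CDD signalled drift
    adi             -- acceptable detection interval (in instances)
    """
    true_detections = []          # (actual, detected) pairs within the ADI
    matched_actuals = set()
    for d in detected_drifts:
        for a in actual_drifts:
            if 0 <= d - a <= adi:
                true_detections.append((a, d))
                matched_actuals.add(a)
                break

    add = (sum(d - a for a, d in true_detections) / len(true_detections)
           if true_detections else float("nan"))
    tpr = len(true_detections) / len(detected_drifts) if detected_drifts else 0.0
    tpd = len(true_detections) / len(actual_drifts) if actual_drifts else 0.0
    drift_count = len(matched_actuals)   # actual drifts detected at least once
    return {"ADD": add, "TPR": tpr, "TPD": tpd, "drift_count": drift_count}


# Example: two injected drifts, three detections, ADI of 400 instances.
# TPD comes out as 1.5 here, illustrating how it can exceed its optimal value of 1.
print(drift_metrics([1000, 3000], [1150, 1200, 3100], adi=400))
```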

3 Background & Motivation

The concept drift problem arises in many real-world situations, such as automated change risk assessment in a technology-driven industry. It can result in poor and degrading predictive performance of models that assume a static relationship between input and output variables.

One brute-force way to mitigate this problem is to periodically retrain the static model with more recent data. However, cost factors, such as computational, labor and implementation costs, pose a significant impediment to retraining the model frequently.

A more elegant way to approach this problem is to algorithmically detect concept drift in the data and retrain the model depending on the outcome of the algorithm. There are different categories of drift detection algorithms, such as error rate-based drift detection algorithms, distribution-based drift detection algorithms, and a few others Lu et al. (2019). However, such drift detection methods suffer from severe impairment in the presence of data sparsity. At the core of this problem is the failure to account for the inaccuracies that imputation methods introduce when estimating the underlying data distribution Liu et al. (2020). A fuzzy distance estimation based method for detecting concept drifts in the presence of missing values has been addressed in Liu et al. (2020). To formally discuss the impact of data sparsity on drift-detection methods, we need to delve into the following questions:

  • Question 1. How can imputing missing values perturb the original distribution of the data?

  • Question 2. How can a drift detection method be impacted by the perturbed distribution of data?

To analyze Question 1, we first introduce a few notations. We assume that we are given a sample space $\Omega$, a finite set of data points. From this sample space, we are given a set of positive labeled points $S = \{x_1, \ldots, x_n\}$. We assume these labeled points are drawn i.i.d. from some unknown target distribution $\mathcal{D}$ over $\Omega$. We assume that $x_i$ represents a data-point which is characterized by a set of features $x_{ij}$ for $j \in \{1, \ldots, d\}$. We define the missing indicator variable $m_{ij}$ as follows:

$$m_{ij} = \begin{cases} 1 & \text{if } x_{ij} \text{ is missing,} \\ 0 & \text{otherwise.} \end{cases}$$

In any setting of missingness, such as MCAR, MAR or MNAR, the true expectation of the data can be quite deviant from the observable expectation, depending on the accuracy of the missing value imputation. Having $P(x)$ represent the prior on $x$ and $P(m \mid x)$ represent the conditional distribution of missingness, which can be MCAR, MAR or MNAR, we can write, applying Bayes rule:

$$P(x \mid m) = \frac{P(m \mid x)\, P(x)}{P(m)} \qquad (1)$$

We can estimate the empirical expected value of a single feature $x_j$ the following way:

$$\hat{\mathbb{E}}[x_j] = \mathbb{E}_{P(x_j \mid m_j = 0)}[x_j] \qquad (2)$$

In other words, the estimated expected value of a feature is with respect to the conditional probability $P(x_j \mid m_j = 0)$ instead of $P(x_j)$. We now discuss the situation when we impute the missing values. Let $g(x_j)$, which we denote as $\tilde{x}_j$ for simplicity, represent the output of any imputation method invoked when $x_j$ is missing. In other words, we use $\tilde{x}_j$ to approximate the ground truth of $x_j$ when it is missing.

$$\hat{\mathbb{E}}[x_j] = P(m_j = 0)\, \mathbb{E}_{P(x_j \mid m_j = 0)}[x_j] \;+\; P(m_j = 1)\, \mathbb{E}_{P(\tilde{x}_j \mid m_j = 1)}[\tilde{x}_j] \qquad (3)$$

where

$$\tilde{x}_j = g(x_j) \qquad (4)$$

Observe that two weighted expectations contribute to the estimated expectation of feature $x_j$. The first term in equation 3 represents the expectation of feature $x_j$ estimated on the basis of only the observed values. The second term, however, is the expectation over another conditional distribution of the imputed feature $\tilde{x}_j$. The deviation between the imputed feature value $\tilde{x}_j$ and the corresponding ground truth $x_j$ depends on the accuracy of the imputation method. Notice also that the higher the degree of sparsity, the more pronounced is the effect of $\tilde{x}_j$ on $\hat{\mathbb{E}}[x_j]$.

Equation 3 essentially means that, as an effect of missing value imputation, $P(x_j)$, the marginal distribution of every feature, $P(x)$, the overall data distribution, and $P(y \mid x)$, the conditional distribution, where $y$ represents the label of the data-point represented by $x$, all admit some degree of perturbation.
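The effect described by equation 3 is easy to see numerically. The sketch below uses purely illustrative numbers (our assumption, not an experiment from this work): MNAR masking of large values followed by mean imputation shifts the empirical mean and shrinks the variance of the feature.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=100_000)   # ground-truth feature

# MNAR: values above 12 go missing with probability 0.7.
missing = (x > 12) & (rng.random(x.size) < 0.7)
observed = x[~missing]

# Mean imputation uses only the observed values.
imputed = x.copy()
imputed[missing] = observed.mean()

print("true mean    :", x.mean())        # ~10.0
print("imputed mean :", imputed.mean())  # biased downwards
print("true std     :", x.std())         # ~3.0
print("imputed std  :", imputed.std())   # shrunk by the constant fill values
```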

For Question 2, we first take the example of an error-rate based method, such as DDM Gama et al. (2004). The essence of this method lies in comparing the error rate of a classifier between two consecutive batches of data-points and checking if the change in the error rate of the classifier between these two batches is statistically significant.

Suppose a sequence of examples arrives in the form of pairs $(x_i, y_i)$. For each example, the classification model predicts $\hat{y}_i$, which can be True or False (correct or incorrect). For a set of examples, the error is a random variable from Bernoulli trials Forbes et al. (2010). The Binomial distribution gives the general form of the probability for the random variable that represents the number of errors in a sample of $n$ examples. For each point $i$ in the sequence, the error rate is the probability of observing False, $p_i$, with standard deviation given by $s_i = \sqrt{p_i (1 - p_i) / i}$. Statistical theory Mitchell (1997) guarantees a decline in the error rate of the learning algorithm ($p_i$) as $i$ increases, on the condition of a stationary class distribution. Suppose the length of the first batch is $n_1$ and that of the second batch is $n_2$, with the proportions of errors being $p_1$ and $p_2$, respectively. Let $X_1$ and $X_2$ represent two random variables satisfying $X_1 \sim \mathrm{Binomial}(n_1, p_1)$ and $X_2 \sim \mathrm{Binomial}(n_2, p_2)$. Error-rate based concept drift detection methods, such as DDM, end up constructing a test to validate the following null hypothesis:

$$H_0 : p_1 = p_2 \qquad (5)$$
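For concreteness, the null hypothesis in equation 5 is essentially a comparison of two binomial proportions. Below is a hedged sketch of such a test, using a plain two-proportion z-test, which is a simplification of the exact decision rule DDM uses; the significance level and batch sizes are illustrative.

```python
import math

def error_rate_drift(errors_1, n1, errors_2, n2, alpha=0.01):
    """Two-proportion z-test for H0: p1 = p2 on two consecutive batches.

    errors_1, errors_2 -- number of misclassified instances in each batch
    n1, n2             -- batch lengths
    Returns True when the increase in error rate is statistically significant.
    """
    p1, p2 = errors_1 / n1, errors_2 / n2
    p_pool = (errors_1 + errors_2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return False
    z = (p2 - p1) / se
    # One-sided test: drift is signalled only if the error rate has increased.
    z_crit = 2.326 if alpha == 0.01 else 1.645   # approximate normal quantiles
    return z > z_crit

# Example: error rate rises from 5% to 9% over two batches of 1000 instances.
print(error_rate_drift(50, 1000, 90, 1000))   # True
```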

The limitation of the above approach becomes prominent in the presence of a high degree of sparsity in the two consecutive sequences of data samples. As explained in the previous section, the distributions of the data in those two batches get perturbed following missing value imputation. Therefore, the binomial distribution used to model the error rate in those two batches also ends up being perturbed, eventually resulting in a false perception of concept drift.

Data distribution-based drift detection algorithms are considered the second largest category of drift detection methods; they employ a distance function to quantify the dissimilarity between the distributions of two consecutive sequences or batches of data samples. As missing value imputation perturbs the batches of data samples, this category of algorithms suffers from a similar problem in the presence of a high degree of sparsity.

Therefore, we need a strategy that provides at least some guarantee on the performance of the drift detection method even in the presence of non-ideal data characteristics, such as a high degree of sparsity.

4 Methodology

Our methodology involves the following broad steps.

Distribution         | MCAR                      | MAR    | MNAR
Normal               | Mean                      | Mean   | Mean
Uniform              | Mean                      | Mean   | Mean
Chi-squared          | Mean                      | Mean   | Mean
Cauchy               | Median                    | Median | Median
Binomial             | Median                    | Median | Median
Multi-variate normal | kNN50 (30%), kNN100 (30%) | kNN100 | kNN50
Table 1: Distribution-wise best data imputation technique.

1. Find distribution-wise best data imputation scheme. We started by creating various distributions (one may think of these as synthetic datasets with a single feature, except for the multi-variate normal case), and applied MCAR, MAR and MNAR sparsities at different levels. The data imputation techniques applied were mean, median, mode, zero (constant), and kNN (we tried multiple values of k). Our findings are summarized in Table 1. Note that for some distributions, some of the data imputation schemes may be identical, e.g., mean, median and mode for the normal distribution; however, we mention only one of these identical schemes for a given distribution in this table. Surprisingly, other than for the multi-variate normal distribution, the best performing scheme is identical irrespective of the type of missingness and the sparsity level.
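A hedged sketch of the kind of experiment behind Table 1 – generate a synthetic distribution, mask values completely at random, impute, and score by RMSE – is shown below. The distributions, sparsity level, candidate imputers and the correlated helper column are illustrative assumptions, not the exact setup of our study.

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

rng = np.random.default_rng(7)
samplers = {
    "normal": lambda n: rng.normal(0, 1, n),
    "cauchy": lambda n: rng.standard_cauchy(n),
    "chi_squared": lambda n: rng.chisquare(3, n),
}
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "kNN5": KNNImputer(n_neighbors=5),
}

n, sparsity = 2000, 0.30                       # 30% MCAR sparsity (illustrative)
for dist_name, sample in samplers.items():
    col = sample(n)
    helper = col + rng.normal(0, 0.5, n)       # correlated, fully observed column
    X = np.column_stack([col, helper])
    mask = np.zeros_like(X, dtype=bool)
    mask[:, 0] = rng.random(n) < sparsity      # mask only the first column
    X_sparse = np.where(mask, np.nan, X)
    for imp_name, imputer in imputers.items():
        X_hat = imputer.fit_transform(X_sparse)
        rmse = np.sqrt(np.mean((X_hat[mask] - X[mask]) ** 2))
        print(f"{dist_name:12s} {imp_name:6s} RMSE={rmse:.3f}")
```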

Figure 1: Finding the best data imputation scheme for a given dataset with sparsity.

2. Find the best data imputation scheme for a given dataset. For a given dataset with sparsity, we first identify the level of sparsity for each feature. Next, we replace each missing value with 0 and each non-missing value with 1, and apply the runs test for randomness Bradley (1968); Paindaveine (2009). Given our application domain and the corresponding data, which are mostly obtained through forms filled in by human change requestors who tend to leave out non-mandatory fields (thus leading to sparsity), we ruled out the possibility of MCAR (although we do perform experiments with MCAR on synthetic datasets in the subsequent section), and strongly believe that there is a (possibly hidden) pattern to the missingness in our data. We envision that most domain experts should be able to use prior knowledge to tell whether MCAR is indeed a possibility for their use cases – in our experience, it is a rare phenomenon. Therefore, if the test declares that the data is missing at random, then we conclude that it is a case of MNAR, i.e., we are currently not collecting the variables on which the missingness depends; otherwise, we consider it a case of MAR. We also use quantile-quantile (Q-Q) plots to find the probability distribution of each feature. Subsequently, we isolate the rows that do not have any missing values. Having already identified the sparsity level and the type of missingness, we apply the same on these rows (without sparsity). Since we know the original values for these newly introduced missing values, we use them to find the best data imputation scheme among the winner identified in the previous step and the correlation-dependent imputation techniques mentioned in Section 2.2. We use root-mean-square error (RMSE) to find the best scheme. This strategy is captured in Figure 1.
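A minimal sketch of the missingness-pattern check in this step – a Wald-Wolfowitz runs test on the 0/1 missingness indicator of a feature – is given below; the normal approximation and the 5% significance level are illustrative choices, not necessarily the ones used in our deployment.

```python
import math

def runs_test(indicator, alpha=0.05):
    """Wald-Wolfowitz runs test for randomness of a binary missingness indicator.

    indicator -- sequence of 0 (missing) / 1 (observed) flags for one feature
    Returns True if the null hypothesis of randomness is NOT rejected.
    """
    n0 = indicator.count(0)
    n1 = indicator.count(1)
    if n0 == 0 or n1 == 0:
        return True                      # degenerate: nothing to test
    runs = 1 + sum(a != b for a, b in zip(indicator, indicator[1:]))
    mean_runs = 2 * n0 * n1 / (n0 + n1) + 1
    var_runs = (2 * n0 * n1 * (2 * n0 * n1 - n0 - n1)) / \
               ((n0 + n1) ** 2 * (n0 + n1 - 1))
    z = (runs - mean_runs) / math.sqrt(var_runs)
    z_crit = 1.96 if alpha == 0.05 else 2.576
    return abs(z) <= z_crit

flags = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]
looks_random = runs_test(flags)
# Following the decision logic above: random-looking missingness -> treat as MNAR,
# a detectable pattern -> treat as MAR (MCAR ruled out by domain knowledge).
print("MNAR" if looks_random else "MAR")
```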

3. Apply the majority voting based ensemble of concept drift detectors on the dataset with imputed values. The central idea behind our approach is to construct an ensemble of multiple CDDs and infer concept drift based on majority voting among the individual CDDs. The intuition is that the majority voting strategy will provide a lower bound on the resulting accuracy by taking advantage of the strengths of individual CDDs in diverse situations.

On a formal note, consider a concept drift detection problem in which an algorithm takes two consecutive sequences of data instances, $B_1$ and $B_2$, from a data stream and predicts if drift exists between $B_1$ and $B_2$. Let $h_i(B_1, B_2) \in \{-1, +1\}$ represent the prediction of the $i$-th base drift detection algorithm, where $-1$ and $+1$ represent the absence and presence of drift, respectively. The overall decision is based on the average prediction of the $k$ base CDDs. Let the average prediction of the base CDDs be the score $f$:

$$f(B_1, B_2) = \frac{1}{k} \sum_{i=1}^{k} h_i(B_1, B_2) \qquad (6)$$

In the usual case, if the score is negative then the overall decision is $-1$ (no drift), and if positive then $+1$ (drift), i.e., $H(B_1, B_2) = \mathrm{sign}\big(f(B_1, B_2)\big)$.
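A small sketch of the voting rule in equation 6 follows; it assumes each base CDD exposes a predict(B1, B2) method returning -1 or +1, an interface we adopt purely for illustration.

```python
def ensemble_score(detectors, batch_1, batch_2):
    """Average of the +/-1 votes of the base CDDs (equation 6)."""
    votes = [d.predict(batch_1, batch_2) for d in detectors]
    return sum(votes) / len(votes)

def ensemble_decision(detectors, batch_1, batch_2, reject_threshold=0.0):
    """Majority-vote decision with an optional rejection band around zero."""
    score = ensemble_score(detectors, batch_1, batch_2)
    if abs(score) <= reject_threshold:
        return 0          # indecision / reject
    return 1 if score > 0 else -1
```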

To analyze the properties of the ensemble CDD, consider a generic framework in which the ensemble CDD, $H$, takes the overall decision:

$$H(B_1, B_2) = \begin{cases} +1 & \text{if } f(B_1, B_2) > \theta, \\ -1 & \text{if } f(B_1, B_2) < -\theta, \\ \text{reject} & \text{otherwise,} \end{cases}$$

where $\theta$ is a rejection threshold and $[-\theta, \theta]$ is the most uncertain region for the ensemble drift detector $H$. Define a random variable $Z = y \cdot f(B_1, B_2)$, where $y$ represents the ground truth of the existence of drift between $B_1$ and $B_2$ and admits value either $+1$ or $-1$ depending on the existence of drift, while $f(B_1, B_2)$ is the output of the ensemble drift detector as defined above. Due to the encoding $\{-1, +1\}$, $Z$ is negative for incorrect predictions and positive for correct predictions. If $f(B_1, B_2)$ is in the range $[-\theta, \theta]$, it causes the ensemble drift detector to suffer from indecision. With $Z$ being a random variable, the probability of wrong prediction is

$$P_{\mathrm{err}} = \Pr\big(Z < -\theta\big) \qquad (7)$$

and the probability of rejection is

$$P_{\mathrm{rej}} = \Pr\big(-\theta \le Z \le \theta\big) \qquad (8)$$

We now define the risk of the ensemble drift detector as below:

$$R(H) = c_e \, P_{\mathrm{err}} + c_r \, P_{\mathrm{rej}} \qquad (9)$$

Here $c_e$ and $c_r$ represent the cost of wrong prediction and the cost of rejection or indecision, respectively. We now look to find an upper bound of $R(H)$ to analyze the worst case situation. As discussed in Breiman (2001), we have

$$\mathrm{Var}(Z) \le \bar{\rho}\,\big(1 - \mu^2\big), \quad \text{where } \mu = \mathbb{E}[Z], \qquad (10)$$

where $\bar{\rho}$ represents the average pairwise correlation between the base CDDs $h_i$ and $h_j$. Following Cantelli's inequality (Wu et al., 2021), we have, for $t > 0$,

$$\Pr\big(Z \le \mu - t\big) \le \frac{\mathrm{Var}(Z)}{\mathrm{Var}(Z) + t^2}. \qquad (11)$$

Setting $t = \mu + \theta$ in inequality 11 and applying inequality 10, we obtain

$$P_{\mathrm{err}} \le \frac{\bar{\rho}\,(1 - \mu^2)}{\bar{\rho}\,(1 - \mu^2) + (\mu + \theta)^2}. \qquad (12)$$

Similarly, using the inequality 10, we have

$$P_{\mathrm{rej}} \le \frac{\bar{\rho}\,(1 - \mu^2)}{\bar{\rho}\,(1 - \mu^2) + (\mu - \theta)^2}. \qquad (13)$$

Thus, combining 9, 12 and 13, we can find the upper bound of $R(H)$ as below:

$$R(H) \le c_e \, \frac{\bar{\rho}\,(1 - \mu^2)}{\bar{\rho}\,(1 - \mu^2) + (\mu + \theta)^2} + c_r \, \frac{\bar{\rho}\,(1 - \mu^2)}{\bar{\rho}\,(1 - \mu^2) + (\mu - \theta)^2}. \qquad (14)$$

In our model, we set $\theta = 0$, which gives

$$R(H) \le (c_e + c_r)\, \frac{\bar{\rho}\,(1 - \mu^2)}{\bar{\rho}\,(1 - \mu^2) + \mu^2}. \qquad (15)$$

Observe that the above inequality applies only when $\mu > 0$, and the upper bound of $R(H)$ comes down with the decrease in $\bar{\rho}$. In other words, with more diversity among the base CDDs, the ensemble, as defined in equation 6, becomes less prone to wrong predictions about concept drift even if the base CDDs suffer from deterioration in performance due to sparsity.

5 Experimental Setup & Results

5.1 Dataset Description

Figure 2: Performance of CDDs with and without data imputation for change risk assessment data. Panels: (a) prequential error, (b) accuracy, (c) average detection delay, (d) true positive rate, (e) true positives per drift, (f) drift count.

We use the publicly available Harvard dataverse Lobo (2020) that contains synthetic datasets for abrupt and gradual concept drift detection. For our change risk assessment project, we have 50K samples (change requests) that are labelled as either "risky" (i.e., potentially may lead to some major incident) or "not risky" – thus, our primary task is binary classification. We take the 50K samples and shuffle them to remove any pre-existing concept drifts in the data, and then create separate datasets with abrupt and gradual concept drifts by flipping the class labels; we tried various combinations of sample points where the drifts are introduced and various drift widths. Note that this process of shuffling the original dataset (which ideally has no adverse effect on the original classifier) and then introducing drifts artificially is a common practice for real-world datasets because, in reality, it is not always feasible to clearly identify the onset and width of a concept drift Souza et al. (2020); Sethi and Kantardzic (2017).
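The drift-injection procedure can be sketched as follows; this is a hedged illustration (the actual drift positions and widths used in our experiments varied): labels are flipped either instantaneously (abrupt drift) or with a probability that ramps up over a transition window (gradual drift).

```python
import numpy as np

def inject_drift(y, position, width=0, rng=None):
    """Flip binary labels after `position` to create a concept drift.

    width == 0 -> abrupt drift (all labels after `position` flip)
    width  > 0 -> gradual drift (flip probability ramps 0 -> 1 over `width`)
    """
    rng = rng or np.random.default_rng()
    y_drifted = y.copy()
    for i in range(position, len(y)):
        if width == 0:
            flip = True
        else:
            flip = rng.random() < min(1.0, (i - position) / width)
        if flip:
            y_drifted[i] = 1 - y_drifted[i]
    return y_drifted

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 50_000)          # shuffled binary labels ("risky"/"not risky")
y_abrupt = inject_drift(y, position=25_000, rng=rng)
y_gradual = inject_drift(y, position=25_000, width=2_000, rng=rng)
```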

5.2 Experimental Results

Initially, we check whether data imputation aids concept drift detection or not. Experimenting with the Harvard dataverse and our data, we find that data imputation has a positive effect on all the metrics mentioned in Section 2.5 and for all the algorithms in Section 2.4 – this is shown in Figure 2 for our data, for which kNN4 yielded the lowest RMSE.

Figure 3: Performance of CDDs (post data imputation) for gradual drifts in the Harvard dataverse with different MAR sparsities. Panels: (a) prequential error, (b) accuracy, (c) average detection delay, (d) true positive rate, (e) true positives per drift, (f) drift count.

Next, we introduce different levels of sparsity using different techniques – MCAR, MAR, MNAR – on the Harvard dataverse, and record the performance of the 7 concept drift detectors on the different metrics. We notice that although prequential error is widely used as a metric for comparing CDDs Baena-Garcia et al. (2006); Yan (2020); Lu et al. (2019), it can sometimes be misleading – consider the case of a bad CDD that detects drifts often (even when there is none); the consequent retrainings, which are typically quite costly, lead to a low prequential error value. A similar problem may arise with the accuracy metric. We have discussed the challenges with the other metrics already in Section 2.5. Nevertheless, these metrics together serve as key indicators of the performance of CDDs.

Figure 4: Performance of top-3 CDDs (post data imputation) for gradual drifts in change risk assessment data. Panels: (a) prequential error, (b) accuracy, (c) ADD, (d) TPR, (e) TPD, (f) drift count.

Empirically, we find that none of the 7 drift detectors performs well across all the metrics, as can be seen in Figure 3, for any given missingness or drift type. Hence, we decide to go with a majority voting based ensemble of detectors. Ranking the algorithms based on their performance on the different metrics, we find that for abrupt drifts, ADWIN + HDDMA + KSWIN, and for gradual drifts, HDDMA + HDDMW + Page-Hinkley, achieve the best performance across all the metrics.

A challenge in working with an ensemble of detectors is finding the optimal window size, such that, if at any instant within that window two of the detectors detect a drift, then the ensemble declares a drift at that instant. Our experiments determine the optimal window size separately for the Harvard dataverse and for our data. The ensemble detector for the Harvard dataverse is also shown in Figure 3. We omit the details of the experiment for finding the optimal window size due to space limitations.
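A hedged sketch of the windowed voting rule described above – the ensemble declares a drift at an instant if at least two member detectors have signalled within the trailing window – is shown below; the detector signal format, window size and reset behavior are assumptions for illustration.

```python
from collections import deque

def windowed_ensemble(detector_signals, window_size, min_votes=2):
    """Combine per-instant boolean drift signals of several CDDs.

    detector_signals -- list of equal-length boolean lists, one per base CDD
    Returns the instants at which the ensemble declares a drift.
    """
    n_instants = len(detector_signals[0])
    recent = [deque(maxlen=window_size) for _ in detector_signals]
    ensemble_drifts = []
    for t in range(n_instants):
        for d, signals in enumerate(detector_signals):
            recent[d].append(signals[t])
        # A detector "votes" if it signalled at any instant within the window.
        votes = sum(any(r) for r in recent)
        if votes >= min_votes:
            ensemble_drifts.append(t)
            for r in recent:
                r.clear()              # reset after declaring a drift
    return ensemble_drifts
```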

Finally, we test our ensemble detector on the change risk assessment data, as shown in Figure 4. As can be seen in this figure, if not the best, the ensemble detector always features in the top-3 for all the metrics. Therefore, we plan to deploy it in the near future: the majority voting based ensemble detector serves as a strong baseline, and it is unlikely that a concept drift will be missed by it.

6 Conclusion

Concept drift detection poses a major challenge for real-world deployments of machine learning solutions. The problem is further worsened in the presence of sparsity, especially since the ground truths for the missing values can never be obtained in reality. One of our major findings in the course of tackling this problem is that none of the popular concept drift detectors exhibits optimal performance across all metrics in all situations consistently. Therefore, we design a majority voting based ensemble of detectors for abrupt and gradual drifts, and it delivers the best or close to the best performance across the whole spectrum of metrics.

References

  • M. Baena-Garcia, J. del Campo-Avila, R. Fidalgo, A. Bifet, R. Gavalda, and R. Morales-Bueno (2006) Early drift detection method. In Proc. 4th Int. Workshop Knowledge Discovery from Data Streams, Cited by: §5.2.
  • A. Bifet and R. Gavaldà (2007) Learning from time-changing data with adaptive windowing. In SDM, pp. 443–448. Cited by: §2.4.
  • J. V. Bradley (1968) Distribution-free statistical tests. Prentice-Hall. Cited by: §4.
  • L. Breiman (2001) Random forests. Mach. Learn. 45 (1), pp. 5–32. Cited by: §4.
  • C. Forbes, M. Evans, N. Hastings, and B. Peacock (2010) Statistical distributions. 4th edition, John Wiley & Sons. Cited by: §3.
  • I. Frías-Blanco, J. d. Campo-Ávila, G. Ramos-Jiménez, R. Morales-Bueno, A. Ortiz-Díaz, and Y. Caballero-Mota (2015) Online and non-parametric drift detection methods based on hoeffding’s bounds. IEEE Transactions on Knowledge and Data Engineering 27 (3), pp. 810–823. Cited by: §2.4.
  • J. Gama, P. Medas, G. Castillo, and P. P. Rodrigues (2004) Learning with drift detection. In SBIA, LNCS, Vol. 3171, pp. 286–295. Cited by: §2.4, §3.
  • B. Gupta, A. Chatterjee, H. Matha, K. Banerjee, L. Parsai, and V. Agneeswaran (2021) Look before you leap! designing a human-centered AI system for change risk assessment. CoRR abs/2108.07951. Cited by: §1, §2.3.
  • T. Hastie, R. Mazumder, J. D. Lee, and R. Zadeh (2015) Matrix completion and low-rank SVD via fast alternating least squares. J. Mach. Learn. Res. 16, pp. 3367–3402. Cited by: §2.2.
  • A. Liu, J. Lu, and G. Zhang (2020) Concept drift detection: dealing with missing values via fuzzy distance estimations. CoRR abs/2008.03662. Cited by: §3.
  • J. L. Lobo (2020) Synthetic datasets for concept drift detection purposes. Harvard Dataverse. External Links: Link Cited by: §5.1.
  • J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, and G. Zhang (2019) Learning under concept drift: A review. IEEE Trans. Knowl. Data Eng. 31 (12), pp. 2346–2363. Cited by: §3, §5.2.
  • T. M. Mitchell (1997) Machine learning. McGraw-Hill. External Links: ISBN 978-0-07-042807-2 Cited by: §3.
  • B. Muzellec, J. Josse, C. Boyer, and M. Cuturi (2020) Missing data imputation using optimal transport. In ICML, Proceedings of Machine Learning Research, Vol. 119, pp. 7130–7140. Cited by: §2.2.
  • E. S. Page (1954) Continuous inspection schemes. Biometrika 41 (1/2), pp. 100–115. Cited by: §2.4.
  • D. Paindaveine (2009) On multivariate runs tests for randomness. Journal of the American Statistical Association 104 (488), pp. 1525–1538. Cited by: §4.
  • A. Pesaranghader and H. L. Viktor (2016) Fast hoeffding drift detection method for evolving data streams. In Machine Learning and Knowledge Discovery in Databases, pp. 96–111. Cited by: §2.5.
  • C. Raab, M. Heusinger, and F. Schleif (2020) Reactive soft prototype computing for concept drift streams. Neurocomputing 416, pp. 340–351. Cited by: §2.4.
  • D. B. Rubin (1976) Inference and missing data. Biometrika 63 (3), pp. 581–592. Cited by: §2.1.
  • T. S. Sethi and M. M. Kantardzic (2017) On the reliable detection of concept drift from streaming unlabeled data. Expert Syst. Appl. 82, pp. 77–99. Cited by: §5.1.
  • V. M. A. Souza, D. M. Reis, A. G. Maletzke, and G. E. A. P. A. Batista (2020) Challenges in benchmarking stream learning algorithms with real-world data. Data Mining and Knowledge Discovery 34, pp. 1805–1858. Cited by: §5.1.
  • S. van Buuren and K. Groothuis-Oudshoorn (2011) Mice: multivariate imputation by chained equations in r. Journal of Statistical Software, Articles 45 (3), pp. 1–67. External Links: ISSN 1548-7660 Cited by: §2.2.
  • M. M. W. Yan (2020) Accurate detecting concept drift in evolving data streams. ICT Express 6 (4), pp. 332–338. Cited by: §2.5, §5.2.