On the Reliable Detection of Concept Drift from Streaming Unlabeled Data

03/31/2017
by Tegjyot Singh Sethi, et al.

Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to degrade over time, making it obsolete. To be of any real use, these classifiers need to detect drifts and adapt to them over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used to validate the learned model. Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly, and time-consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of unsupervised techniques stems from their exclusion of the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier as a metric to detect drift. The MD3 algorithm is a distribution-independent, application-independent, model-independent, unsupervised, and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift-induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms than unsupervised feature-based drift detectors. The reduction in false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme that is credible, label efficient, and general in its applicability.
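To make the core idea concrete, below is a minimal Python sketch of margin-density tracking, assuming a linear SVM whose margin (|w·x + b| ≤ 1) serves as the uncertainty region. The window size, sensitivity threshold `theta`, reference statistics, and synthetic stream are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np
from sklearn import svm

def margin_density(clf, X):
    """Fraction of samples inside the SVM margin |w.x + b| <= 1,
    i.e., in the classifier's uncertainty region."""
    return (np.abs(clf.decision_function(X)) <= 1.0).mean()

# --- setup: train on an initial labeled batch (synthetic data) ---
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
clf = svm.SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Reference margin-density statistics, estimated from training chunks
chunks = np.array_split(X_train, 10)
mds = [margin_density(clf, c) for c in chunks]
md_ref, sigma_ref = np.mean(mds), np.std(mds)
theta = 2.0  # sensitivity: flag changes beyond theta std deviations

# --- monitoring: windows of unlabeled stream data, no labels needed ---
X_stream = rng.normal(loc=0.75, size=(2000, 2))  # shifted distribution
window = 100
for start in range(0, len(X_stream), window):
    md = margin_density(clf, X_stream[start:start + window])
    if abs(md - md_ref) > theta * sigma_ref:
        print(f"Drift suspected at sample {start}: MD={md:.2f} "
              f"(ref {md_ref:.2f} +/- {sigma_ref:.2f})")
```

In the full MD3 scheme, a suspected change like this would then be confirmed with a small labeled sample before retraining, which keeps the detector label efficient while suppressing false alarms.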


