D.MCA: Outlier Detection with Explicit Micro-Cluster Assignments

10/15/2022
by Shuli Jiang, et al.

How can we detect outliers, both scattered and clustered, and also explicitly assign them to their respective micro-clusters, without knowing a priori how many micro-clusters exist? How can we perform both tasks in-house, i.e., without any post-hoc processing, so that detection and assignment benefit from each other? Presenting outliers in separate micro-clusters is informative to analysts in many real-world applications. However, a naïve solution based on post-hoc clustering of the outliers detected by any existing method suffers from two main drawbacks: (a) appropriate hyperparameter values for clustering are commonly unknown, and most clustering algorithms struggle with clusters of varying shapes and densities; (b) detection and assignment cannot benefit from one another. In this paper, we propose D.MCA to Detect outliers with explicit Micro-Cluster Assignment. Our method performs both detection and assignment iteratively, and in-house, by using a novel strategy that prunes entire micro-clusters out of the training set to improve detection performance. It also benefits from a novel strategy that prevents clustered outliers from masking each other, which is a well-known problem in the literature. In addition, D.MCA is designed to be robust to a critical hyperparameter by employing a hyperensemble "warm up" phase. Experiments performed on 16 real-world and synthetic datasets demonstrate that D.MCA outperforms 8 state-of-the-art competitors, especially on the explicit outlier micro-cluster assignment task.
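
To make the naïve baseline criticized above concrete, here is a minimal sketch of post-hoc clustering: detect first, then group the detected outliers. It is not D.MCA. The k-nearest-neighbor distance detector, the distance-threshold (single-linkage) grouping, the toy data, and all names and parameters (`knn_scores`, `group_by_threshold`, `naive_posthoc`, `k`, `n_outliers`, `eps`) are assumptions chosen for illustration only.

```python
import math
from itertools import combinations


def knn_scores(points, k):
    """Outlier score of each point = distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def group_by_threshold(points, idx, eps):
    """Link any two selected points at distance <= eps (union-find)."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in combinations(idx, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)

    groups = {}
    for i in idx:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def naive_posthoc(points, k, n_outliers, eps):
    """Step 1: take the top-scoring points. Step 2: cluster them post hoc."""
    scores = knn_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    outliers = sorted(ranked[:n_outliers])
    return group_by_threshold(points, outliers, eps)


# Toy data (hypothetical): a 5x5 grid of inliers (indices 0..24),
# one micro-cluster of three outliers (25..27), one scattered outlier (28).
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
points += [(20.0, 20.0), (20.5, 20.0), (20.0, 20.5)]
points += [(-15.0, 10.0)]

print(naive_posthoc(points, k=3, n_outliers=4, eps=1.0))
print(naive_posthoc(points, k=3, n_outliers=4, eps=0.4))
print(naive_posthoc(points, k=2, n_outliers=4, eps=1.0))
```

On this toy data, `eps=1.0` groups the three clustered outliers together and leaves the scattered one alone, while `eps=0.4` splits every outlier into its own group, which illustrates drawback (a). With `k=2`, each clustered outlier has two close neighbors inside its own micro-cluster, so all three score lower than the inliers and are never detected: the masking problem. The grouping step also has no way to feed information back to the detector, which is drawback (b).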

