SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets

This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. Unlike well-known traditional algorithms, which assume that all the data is memory-resident, the proposed method is scalable and processes the data chunk by chunk within the confines of a limited memory buffer. First, a temporary clustering model is built; it is then incrementally updated by analyzing consecutive memory loads of points. Finally, the algorithm assigns each object an outlying score, named SDCOR (Scalable Density-based Clustering Outlierness Ratio). Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low, linear time complexity and is more effective and efficient than both the best-known conventional density-based methods, which need to load all the data into memory, and some fast distance-based methods, which can operate on disk-resident data.
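
The chunk-by-chunk workflow described in the abstract can be illustrated with a small sketch. This is not the SDCOR algorithm itself, whose details are not given here. It is a simplified stand-in that assumes a leader-style clustering rule (a point joins the nearest cluster if it lies within a hypothetical `join_radius`), keeps only per-cluster summaries in memory, and scores a point by its distance to the nearest sufficiently large cluster divided by that cluster's spread. All names and parameters (`ClusterSummary`, `read_chunks`, `build_model`, `outlier_score`, `chunk_size`, `join_radius`, `min_size`) are assumptions made for illustration.

```python
import math
import random
from typing import Iterable, Iterator, List, Sequence

Point = Sequence[float]


class ClusterSummary:
    """Running count, mean, and spread of one cluster (raw points are not kept)."""

    def __init__(self, point: Point) -> None:
        self.n = 1
        self.mean = list(point)
        self.sq_sum = 0.0  # sum of squared distances of members to the mean

    def add(self, point: Point) -> None:
        # Welford's online update, applied per dimension.
        self.n += 1
        for i, x in enumerate(point):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.sq_sum += delta * (x - self.mean[i])

    def radius(self) -> float:
        # Root-mean-square distance to the mean, floored to avoid division by zero.
        return max(math.sqrt(self.sq_sum / self.n), 1e-9)


def read_chunks(stream: Iterable[Point], chunk_size: int) -> Iterator[List[Point]]:
    """Yield successive buffers of at most chunk_size points."""
    buffer: List[Point] = []
    for point in stream:
        buffer.append(point)
        if len(buffer) == chunk_size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


def build_model(stream: Iterable[Point], chunk_size: int,
                join_radius: float) -> List[ClusterSummary]:
    """Build cluster summaries incrementally, one memory load at a time."""
    clusters: List[ClusterSummary] = []
    for chunk in read_chunks(stream, chunk_size):
        for point in chunk:
            nearest = min(clusters, key=lambda c: math.dist(point, c.mean),
                          default=None)
            if nearest is not None and math.dist(point, nearest.mean) <= join_radius:
                nearest.add(point)
            else:
                # Assumed rule: a point far from every cluster starts a new one.
                clusters.append(ClusterSummary(point))
    return clusters


def outlier_score(point: Point, clusters: List[ClusterSummary],
                  min_size: int) -> float:
    """Distance to the nearest dense cluster, in units of that cluster's radius."""
    dense = [c for c in clusters if c.n >= min_size]
    if not dense:
        return math.inf
    return min(math.dist(point, c.mean) / c.radius() for c in dense)


# Toy data: two Gaussian blobs plus one far-away point.
rng = random.Random(0)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0.0, 0.0), (10.0, 10.0)] for _ in range(200)]
data.append((50.0, 50.0))

model = build_model(data, chunk_size=64, join_radius=3.0)
inlier = outlier_score((0.0, 0.0), model, min_size=10)
outlier = outlier_score((50.0, 50.0), model, min_size=10)
```

In this toy run the far-away point receives a much larger score than a point at the center of a blob. The `min_size` filter keeps singleton clusters, such as the one created by the far-away point itself, from masking outliers. A real implementation would read each chunk from disk rather than from an in-memory list.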

