On Metric DBSCAN with Low Doubling Dimension

02/27/2020
by Hu Ding, et al.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a popular density-based clustering method for outlier recognition and has received tremendous attention in many different areas. A major issue with the original DBSCAN is that its time complexity can be as large as quadratic. Most existing DBSCAN algorithms focus on developing efficient index structures to speed up the procedure in low-dimensional Euclidean space. However, to the best of our knowledge, research on DBSCAN in high-dimensional Euclidean space or general metric space is still quite limited. In this paper, we consider the metric DBSCAN problem under the assumption that the inliers (that is, the points that are not outliers) have a low doubling dimension. We apply a novel randomized k-center clustering idea to reduce the complexity of the range query, which is the most time-consuming step in the whole DBSCAN procedure. Our proposed algorithms do not need to build any complicated data structures and are easy to implement in practice. The experimental results show that our algorithms can significantly outperform existing DBSCAN algorithms in terms of running time.
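The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of using k-center-style centers to prune range queries, not the authors' method. It assumes Euclidean points and a Gonzalez-style farthest-point traversal with a random starting point; the function names (greedy_centers, build_buckets, range_query) and the choice of covering radius are hypothetical. By the triangle inequality, any point within eps of a query lies in a bucket whose center is within eps + radius of the query, so all other buckets can be skipped.

```python
import math
import random


def dist(p, q):
    # Euclidean distance; any metric could be substituted here.
    return math.dist(p, q)


def greedy_centers(points, radius, seed=0):
    """Farthest-point traversal from a random start (an assumed stand-in
    for the paper's randomized k-center step). Stops once every point is
    within `radius` of its assigned center."""
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    centers = [first]                      # indices into `points`
    nearest = [dist(p, points[first]) for p in points]
    assign = [0] * len(points)             # position in `centers`
    while True:
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] <= radius:
            break
        centers.append(far)
        c = len(centers) - 1
        for i, p in enumerate(points):
            d = dist(p, points[far])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = c
    return centers, assign


def build_buckets(points, radius, seed=0):
    # Group point indices by their assigned center.
    centers, assign = greedy_centers(points, radius, seed)
    buckets = [[] for _ in centers]
    for i, c in enumerate(assign):
        buckets[c].append(i)
    return centers, buckets


def range_query(points, centers, buckets, q, eps, radius):
    """Return sorted indices of points within `eps` of `q`, scanning only
    buckets whose center is within eps + radius of `q`."""
    hits = []
    for c, members in zip(centers, buckets):
        if dist(q, points[c]) > eps + radius:
            continue  # no member of this bucket can be within eps of q
        hits.extend(i for i in members if dist(q, points[i]) <= eps)
    return sorted(hits)


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    eps = 0.1
    centers, buckets = build_buckets(pts, eps / 2)
    print(len(centers), "centers;", len(range_query(pts, centers, buckets, pts[0], eps, eps / 2)), "neighbors of point 0")
```

In this sketch the number of buckets depends on how the data spreads out at scale `radius`; when the inliers have low doubling dimension, that count stays small relative to the number of points, which is the intuition behind avoiding a full scan per query.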


research
01/24/2019

Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction

We study the problem of k-center clustering with outliers in arbitrary m...
research
01/24/2019

Greedy Strategy Works for Clustering with Outliers and Coresets Construction

We study the problems of clustering with outliers in high dimension. Tho...
research
02/27/2020

A Data Dependent Algorithm for Querying Earth Mover's Distance with Low Doubling Dimension

In this paper, we consider the following query problem: given two weight...
research
02/27/2020

A Data-Dependent Algorithm for Querying Earth Mover's Distance with Low Doubling Dimensions

In this paper, we consider the following query problem: given two weight...
research
06/13/2016

Modal-set estimation with an application to clustering

We present a first procedure that can estimate -- with statistical consi...
research
11/19/2018

On Geometric Alignment in Low Doubling Dimension

In real-world, many problems can be formulated as the alignment between ...
research
06/27/2023

Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures

In this work we present a non-parametric online market regime detection ...
