GriT-DBSCAN: A Spatial Clustering Algorithm for Very Large Databases

10/14/2022
by Xiaogang Huang, et al.

DBSCAN is a fundamental spatial clustering algorithm with numerous practical applications. However, a bottleneck of the algorithm is that its worst-case running time is O(n^2). To address this limitation, we propose GriT-DBSCAN, a new grid-based algorithm for exact DBSCAN in Euclidean space, which is based on the following two techniques. First, we introduce a grid tree that organizes the non-empty grids so that non-empty neighboring grids can be queried efficiently. Second, by utilising the spatial relationships among points, we propose a technique that iteratively prunes unnecessary distance calculations when determining whether the minimum distance between two sets is less than or equal to a certain threshold. We theoretically prove that the complexity of GriT-DBSCAN is linear in the data set size. In addition, we obtain two variants of GriT-DBSCAN, one by incorporating heuristics and one by combining the second technique with an existing algorithm. Experiments on both synthetic and real-world data sets evaluate the efficiency of GriT-DBSCAN and its variants. The results show that our algorithms outperform existing algorithms.
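
To make the two sub-problems named in the abstract concrete, the following is a minimal Python sketch of the general grid-based setting, not the GriT-DBSCAN method itself. It assumes the cell side length eps / sqrt(d) that is common in grid-based exact DBSCAN (so any two points in the same cell are within eps of each other). The function names `build_grid`, `nonempty_neighbors`, and `within_eps` are hypothetical. The neighbor query here enumerates cell offsets by brute force, standing in for the grid tree, and the set-distance check is the naive baseline with early exit, without the pruning technique the abstract describes.

```python
import math
from itertools import product


def build_grid(points, eps):
    """Map each point to a cell of side eps / sqrt(d); keep only non-empty cells."""
    d = len(points[0])
    side = eps / math.sqrt(d)
    grid = {}
    for p in points:
        key = tuple(math.floor(c / side) for c in p)
        grid.setdefault(key, []).append(p)
    return grid


def nonempty_neighbors(grid, key):
    """Return non-empty cells (other than `key`) that may hold a point within eps.

    Brute-force offset enumeration, exponential in d. This is an
    illustrative stand-in, not the grid tree from the paper.
    """
    d = len(key)
    reach = math.isqrt(d) + 1  # largest per-dimension offset that can matter
    found = []
    for off in product(range(-reach, reach + 1), repeat=d):
        if not any(off):
            continue  # skip the cell itself
        # Squared gap between the two cells, measured in units of the cell side.
        gap2 = sum(max(abs(j) - 1, 0) ** 2 for j in off)
        if gap2 > d:
            continue  # cells are more than eps apart
        other = tuple(k + j for k, j in zip(key, off))
        if other in grid:
            found.append(other)
    return found


def within_eps(a, b, eps):
    """Naive check: is the minimum distance between point sets a and b <= eps?"""
    eps2 = eps * eps
    for p in a:
        for q in b:
            if sum((x - y) ** 2 for x, y in zip(p, q)) <= eps2:
                return True  # early exit on the first close pair
    return False
```

In this sketch the neighbor query costs time exponential in the dimension and the distance check costs up to |a| * |b| comparisons. These are the two costs that the grid tree and the pruning technique, respectively, are described as reducing.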

research
01/22/2018

An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data

DBSCAN is a typically used clustering algorithm due to its clustering ab...
research
09/13/2017

An efficient clustering algorithm from the measure of local Gaussian distribution

In this paper, I will introduce a fast and novel clustering algorithm ba...
research
12/01/2019

HCA-DBSCAN: HyperCube Accelerated Density Based Spatial Clustering for Applications with Noise

Density-based clustering has found numerous applications across various ...
research
02/26/2018

All nearest neighbor calculation based on Delaunay graphs

When we have two data sets and want to find the nearest neighbour of eac...
research
01/26/2021

New Algorithms for Computing Field of Vision over 2D Grids

The aim of this paper is to propose new algorithms for Field of Vision (...
research
07/31/2020

MSPP: A Highly Efficient and Scalable Algorithm for Mining Similar Pairs of Points

The closest pair of points problem or closest pair problem (CPP) is an i...
research
12/16/2021

KnAC: an approach for enhancing cluster analysis with background knowledge and explanations

Pattern discovery in multidimensional data sets has been a subject of re...