
GriT-DBSCAN: A Spatial Clustering Algorithm for Very Large Databases

by Xiaogang Huang, et al.
Southwestern University of Finance and Economics
University of Canberra

DBSCAN is a fundamental spatial clustering algorithm with numerous practical applications. However, its worst-case running time is O(n^2), which is a bottleneck on large data sets. To address this limitation, we propose GriT-DBSCAN, a new grid-based algorithm for exact DBSCAN in Euclidean space, which is based on the following two techniques. First, we introduce a grid tree that organizes the non-empty grids so that the non-empty neighboring grids of a given grid can be queried efficiently. Second, by utilising the spatial relationships among points, we propose a technique that iteratively prunes unnecessary distance calculations when determining whether the minimum distance between two sets is less than or equal to a given threshold. We theoretically prove that the complexity of GriT-DBSCAN is linear in the data set size. In addition, we obtain two variants of GriT-DBSCAN, one by incorporating heuristics and one by combining the second technique with an existing algorithm. Experiments are conducted on both synthetic and real-world data sets to evaluate the efficiency of GriT-DBSCAN and its variants. The results of our analyses show that our algorithms outperform existing algorithms.
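To make the two sub-problems named in the abstract concrete, here is a minimal baseline sketch in Python. It is not the paper's grid tree or its pruning technique, neither of which is specified in the abstract. It only illustrates the tasks those techniques speed up: finding the non-empty neighboring grids of a grid, and deciding whether the minimum distance between two point sets is at most a threshold. The cell side length eps / sqrt(d) is an assumption borrowed from common grid-based DBSCAN formulations, and all function names (`build_grid`, `neighbor_cells`, `min_dist_within`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, eps):
    """Hash each point into a cell of side eps / sqrt(d) (assumed cell size).

    With this side length, any two points in the same cell are within eps.
    Only non-empty cells are stored.
    """
    d = len(points[0])
    side = eps / math.sqrt(d)
    grid = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / side) for x in p)
        grid[key].append(p)
    return grid


def neighbor_cells(grid, key):
    """Naive neighbor query: enumerate every index offset up to ceil(sqrt(d)).

    Returns the non-empty cells (other than `key`) that could hold a point
    within eps of a point in `key`. The cost grows exponentially with d,
    which is the kind of overhead a dedicated index aims to avoid.
    """
    d = len(key)
    r = math.ceil(math.sqrt(d))
    found = []
    for offset in product(range(-r, r + 1), repeat=d):
        other = tuple(k + o for k, o in zip(key, offset))
        if other != key and other in grid:
            found.append(other)
    return found


def min_dist_within(set_a, set_b, eps):
    """Brute-force test: is the minimum distance between the sets <= eps?

    Stops at the first qualifying pair, but performs no further pruning.
    """
    eps_sq = eps * eps
    for a in set_a:
        for b in set_b:
            if sum((x - y) ** 2 for x, y in zip(a, b)) <= eps_sq:
                return True
    return False
```

In this baseline, the neighbor query touches (2r + 1)^d candidate cells and the set-distance test may compare every pair of points. These are the two costs that, according to the abstract, the grid tree and the iterative pruning technique are meant to reduce.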




An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data

DBSCAN is a typically used clustering algorithm due to its clustering ab...

An efficient clustering algorithm from the measure of local Gaussian distribution

In this paper, I will introduce a fast and novel clustering algorithm ba...

HCA-DBSCAN: HyperCube Accelerated Density Based Spatial Clustering for Applications with Noise

Density-based clustering has found numerous applications across various ...

All nearest neighbor calculation based on Delaunay graphs

When we have two data sets and want to find the nearest neighbour of eac...

New Algorithms for Computing Field of Vision over 2D Grids

The aim of this paper is to propose new algorithms for Field of Vision (...

MSPP: A Highly Efficient and Scalable Algorithm for Mining Similar Pairs of Points

The closest pair of points problem or closest pair problem (CPP) is an i...

KnAC: an approach for enhancing cluster analysis with background knowledge and explanations

Pattern discovery in multidimensional data sets has been a subject of re...