FINEX: A Fast Index for Exact Flexible Density-Based Clustering (Extended Version with Proofs)

04/10/2023
by Konstantin Emil Thiel, et al.

Density-based clustering aims to find groups of similar objects (i.e., clusters) in a given dataset. Applications include process mining and anomaly detection. It comes with two user parameters (ϵ, MinPts) that determine the clustering result but are typically unknown in advance. Thus, users need to interactively test various settings until they find satisfactory clusterings. However, existing solutions suffer from the following limitations: (a) ineffective pruning of expensive neighborhood computations; (b) approximate clustering, where objects are falsely labeled as noise; (c) restricted parameter tuning that is limited to ϵ while MinPts is held constant, which reduces the explorable clusterings; and (d) inflexibility in terms of applicable data types and distance functions. We propose FINEX, a linear-space index that overcomes these limitations. Our index provides exact clusterings and can be queried with either of the two parameters. FINEX avoids neighborhood computations where possible and reduces the complexities of the remaining computations by leveraging fundamental properties of density-based clusters. Hence, our solution is efficient and flexible regarding data types and distance functions. Moreover, FINEX respects the original and straightforward notion of density-based clustering. In our experiments on 12 large real-world datasets from various domains, FINEX frequently outperforms state-of-the-art techniques for exact clustering by orders of magnitude.
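To make the role of the two parameters concrete, the following is a minimal sketch of a plain DBSCAN-style clustering. It is not FINEX and uses no index. It assumes the common convention that an object is a core object when at least MinPts objects (itself included) lie within distance ϵ. The names `dbscan`, `eps`, `min_pts`, and `dist` are hypothetical and chosen for illustration only. All neighborhoods are computed by brute force, which is the expensive step the abstract says an index should avoid where possible.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
NOISE = -1       # label for objects that belong to no cluster
UNVISITED = -2   # internal marker, never returned


def dbscan(objects: Sequence[T], eps: float, min_pts: int,
           dist: Callable[[T, T], float]) -> List[int]:
    """Brute-force DBSCAN-style clustering (illustrative sketch, not FINEX).

    Works with any data type, as long as `dist` is a distance function on it.
    Returns one label per object: a cluster id (0, 1, ...) or NOISE.
    """
    n = len(objects)
    # eps-neighborhood of every object, including the object itself.
    neighbors = [
        [j for j in range(n) if dist(objects[i], objects[j]) <= eps]
        for i in range(n)
    ]
    labels = [UNVISITED] * n
    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border object
            continue
        # i is a core object: grow a new cluster from it.
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border object reached from a core object
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is core too, keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    points = [0.0, 0.5, 1.0, 5.0, 5.4, 9.0]
    absolute_difference = lambda a, b: abs(a - b)
    # Changing either parameter changes the clustering of the same data.
    print(dbscan(points, 0.6, 2, absolute_difference))
    print(dbscan(points, 0.6, 3, absolute_difference))
```

In this sketch, every new (ϵ, MinPts) setting recomputes all neighborhoods from scratch, so interactive parameter exploration is costly. That cost is the motivation for an index that can answer queries for either parameter.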


research
07/04/2022

An Improved Probability Propagation Algorithm for Density Peak Clustering Based on Natural Nearest Neighborhood

Clustering by fast search and find of density peaks (DPC) (Science, 2014) ...
research
02/08/2020

Index-based Solutions for Efficient Density Peaks Clustering

Density Peaks Clustering (DPC), a novel density-based clustering approac...
research
05/18/2023

Faster Parallel Exact Density Peaks Clustering

Clustering multidimensional points is a fundamental data mining task, wi...
research
10/16/2019

FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance

FISHDBC is a flexible, incremental, scalable, and hierarchical density-b...
research
07/11/2022

Fast Density-Peaks Clustering: Multicore-based Parallelization Approach

Clustering multi-dimensional points is a fundamental task in many fields...
research
02/03/2022

Fast and explainable clustering based on sorting

We introduce a fast and explainable clustering method called CLASSIX. It...
research
12/21/2021

Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types

We introduce anomaly clustering, whose goal is to group data into semant...
