High Dimensional Clustering with r-nets

11/06/2018
by Georgia Avarikioti, et al.

Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called r-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating r-nets in high-dimensional spaces with ℓ_1 and ℓ_2 metrics from Õ(dn^{2-Θ(√ϵ)}) to Õ(dn + n^{2-α}), where α = Ω(ϵ^{1/3}/log(1/ϵ)). These algorithms are also used to improve a framework that provides approximate solutions to other high-dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., (1+ϵ)-approximate kth-nearest neighbor distance, (4+ϵ)-approximate Min-Max clustering, and (4+ϵ)-approximate k-center clustering. In addition, we build an algorithm that (1+ϵ)-approximates greedy permutations in time Õ((dn + n^{2-α}) · Φ), where Φ is the spread of the input. This algorithm is used to (2+ϵ)-approximate k-center with the same time complexity.
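
For readers unfamiliar with the structure named in the abstract, here is a minimal sketch of an exact r-net built by a naive quadratic scan. It assumes the usual definition (net points are pairwise at distance at least r, and every input point lies within distance less than r of some net point) and the ℓ_2 metric. The function name greedy_r_net is hypothetical, and this O(dn^2) baseline is not the paper's algorithm; it only illustrates what the faster approximate algorithms compute.

```python
import math

def greedy_r_net(points, r):
    """Naive exact r-net under the Euclidean (l_2) metric.

    Illustrative baseline only (an assumption, not the paper's method):
    scan the points once and keep a point as a net center when it is at
    distance >= r from every center kept so far. Runs in O(d * n^2) time.
    """
    net = []
    for p in points:
        # p joins the net only if no existing center is closer than r.
        if all(math.dist(p, c) >= r for c in net):
            net.append(p)
    return net

# Small usage example on 2-D points.
pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.1), (10.0, 10.0)]
centers = greedy_r_net(pts, 1.0)
```

Each skipped point was rejected because some kept center was closer than r, which gives the covering property; each kept point passed the distance check against all earlier centers, which gives the packing property.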

research
03/17/2023

High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations

Approximate K nearest neighbor (AKNN) search is a fundamental and challe...
research
04/18/2018

HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

Nearest neighbor searching of large databases in high-dimensional spaces...
research
12/20/2017

Fast kNN mode seeking clustering applied to active learning

A significantly faster algorithm is presented for the original kNN mode ...
research
09/09/2020

KNN-DBSCAN: a DBSCAN in high dimensions

Clustering is a fundamental task in machine learning. One of the most su...
research
07/16/2019

Random projections and sampling algorithms for clustering of high-dimensional polygonal curves

We study the center and median clustering problems for high-dimensional ...
research
02/13/2021

ThetA – fast and robust clustering via a distance parameter

Clustering is a fundamental problem in machine learning where distance-b...
research
01/02/2018

Sketching and Clustering Metric Measure Spaces

Two important optimization problems in the analysis of geometric data se...
