A Probabilistic ℓ_1 Method for Clustering High Dimensional Data

04/06/2015
by   Tsvetan Asamov, et al.

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty: the unreliability of distances in very high-dimensional spaces. We propose a distance-based iterative method for clustering data in very high-dimensional spaces, using the ℓ_1-metric, which is less sensitive to high dimensionality than the Euclidean distance. For K clusters in R^n, the problem decomposes into K problems coupled by probabilities, and an iteration reduces to finding Kn weighted medians of points on a line. The complexity of the algorithm is linear in the dimension of the data space, and its performance was observed to improve significantly as the dimension increases.
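
The weighted-median step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `weighted_median` and `l1_center` are hypothetical, and the weights are assumed to be supplied by the caller, since the abstract does not state how the coupling probabilities are computed. The sketch only shows that a center minimizing a weighted sum of ℓ_1 distances can be found one coordinate at a time, which gives n weighted medians per cluster and Kn per iteration.

```python
import numpy as np


def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |x - values[i]| over x.

    Assumes non-negative weights (hypothetical helper, not from the paper).
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    sorted_values = values[order]
    cumulative = np.cumsum(weights[order])
    # First sorted point at which the cumulative weight reaches half the total.
    idx = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return sorted_values[idx]


def l1_center(points, weights):
    """Coordinate-wise weighted median of N points in R^n.

    The weighted l1 objective separates across coordinates, so each of the
    n coordinates is a one-dimensional weighted-median problem.
    """
    points = np.asarray(points, dtype=float)
    return np.array(
        [weighted_median(points[:, j], weights) for j in range(points.shape[1])]
    )


if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 5.0], [10.0, 6.0]]
    print(l1_center(pts, [1.0, 1.0, 1.0]))
```

Each call sorts one coordinate, so for a fixed number of points the cost of a center update in this sketch grows linearly with the dimension n, in line with the complexity claim in the abstract.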


research
08/21/2020

Clustering small datasets in high-dimension by random projection

Datasets in high-dimension do not typically form clusters in their origi...
research
11/30/2022

High-Dimensional Wide Gap k-Means Versus Clustering Axioms

Kleinberg's axioms for distance based clustering proved to be contradict...
research
07/13/2023

Kernel t-distributed stochastic neighbor embedding

This paper presents a kernelized version of the t-SNE algorithm, capable...
research
01/13/2022

How I learned to stop worrying and love the curse of dimensionality: an appraisal of cluster validation in high-dimensional spaces

The failure of the Euclidean norm to reliably distinguish between nearby...
research
11/29/2019

Minkowski distances and standardisation for clustering and classification of high dimensional data

There are many distance-based methods for classification and clustering,...
research
04/21/2021

Skeleton Clustering: Dimension-Free Density-based Clustering

We introduce a density-based clustering method called skeleton clusterin...
research
08/13/2017

Mahalanonbis Distance Informed by Clustering

A fundamental question in data analysis, machine learning and signal pro...
