Dimensionality's Blessing: Clustering Images by Underlying Distribution

04/08/2018
by Wen-Yan Lin, et al.

Many high-dimensional vector distances tend to a constant. This is typically considered a negative "contrast-loss" phenomenon that hinders clustering and other machine learning techniques. We reinterpret "contrast-loss" as a blessing. Re-deriving "contrast-loss" using the law of large numbers, we show that it results in a distribution's instances concentrating on a thin "hyper-shell". The hollow center means that apparently chaotically overlapping distributions are actually intrinsically separable. We use this to develop distribution-clustering, an elegant algorithm for grouping data points by their (unknown) underlying distribution. Distribution-clustering creates notably clean clusters from raw unlabeled data, estimates the number of clusters for itself, and is inherently robust to "outliers", which form their own clusters. This enables trawling for patterns in unorganized data and may be the key to enabling machine intelligence.
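
The "hyper-shell" effect described in the abstract can be illustrated numerically. The sketch below is not the paper's distribution-clustering algorithm; it is a minimal demonstration under assumed conditions: two isotropic Gaussians that share the same mean but have different scales (1.0 and 2.0). The helper names `normalized_radii` and `shells_overlap`, the sample size, and the seeds are hypothetical choices made for this illustration. In low dimensions the two sets of distances from the mean overlap heavily; in high dimensions each set concentrates on its own thin shell, so the two become separable by radius alone.

```python
import numpy as np


def normalized_radii(dim, scale, n_points=500, seed=0):
    """Distance of each sample from the mean, divided by sqrt(dim).

    Samples are drawn from a zero-mean isotropic Gaussian (an assumption
    made for this illustration, not a requirement stated in the abstract).
    """
    rng = np.random.default_rng(seed)
    samples = scale * rng.standard_normal((n_points, dim))
    return np.linalg.norm(samples, axis=1) / np.sqrt(dim)


def shells_overlap(dim):
    """Return True if the radius ranges of the two distributions intersect."""
    inner = normalized_radii(dim, scale=1.0, seed=0)
    outer = normalized_radii(dim, scale=2.0, seed=1)
    return bool(inner.max() >= outer.min())


if __name__ == "__main__":
    for dim in (2, 20, 200, 2000):
        radii = normalized_radii(dim, scale=1.0)
        # The spread of the radii shrinks as dim grows: a thin shell forms.
        print(f"dim={dim:5d}  mean radius={radii.mean():.3f}  "
              f"std={radii.std():.4f}  overlap={shells_overlap(dim)}")
```

By the law of large numbers, the squared distance divided by the dimension converges to the per-coordinate variance, which is why the normalized radius settles near the scale of each distribution and its spread shrinks as the dimension grows.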


research
04/25/2020

Clustering by Constructing Hyper-Planes

As a kind of basic machine learning method, clustering algorithms group ...
research
09/10/2019

Subspace clustering without knowing the number of clusters: A parameter free approach

Subspace clustering, the task of clustering high dimensional data when t...
research
12/29/2021

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional...
research
11/24/2020

Min-Sum Clustering (with Outliers)

We give a constant factor polynomial time pseudo-approximation algorithm...
research
05/13/2022

DRBM-ClustNet: A Deep Restricted Boltzmann-Kohonen Architecture for Data Clustering

A Bayesian Deep Restricted Boltzmann-Kohonen architecture for data clust...
research
02/09/2022

Application of the Affinity Propagation Clustering Technique to obtain traffic accident clusters at macro, meso, and micro levels

Accident grouping is a crucial step in identifying accident-prone locati...
research
08/13/2018

Clustering genomic words in human DNA using peaks and trends of distributions

In this work we seek clusters of genomic words in human DNA by studying ...
