A Fast Algorithm for Clustering High Dimensional Feature Vectors

11/02/2018
by Shahina Rahman, et al.

We propose an algorithm for clustering high dimensional data. If P features for N objects are represented in an N × P matrix X, where N ≪ P, the method exploits the cluster-dependent structure of the N × N matrix XX^T. The computational burden thus depends primarily on N, the number of objects to be clustered, rather than on P, the number of features measured. This makes the method particularly useful in high dimensional settings, where it is substantially faster than a number of other popular clustering algorithms. Aside from an upper bound on the number of potential clusters, the method has no tuning parameters. When compared to 16 other clustering algorithms on 32 genomic datasets with gold standards, it provides the most accurate cluster configuration more than twice as often as its closest competitors. We illustrate the method on data taken from highly cited genomic studies.
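
To make the role of the N × N matrix XX^T concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm proposed in the paper, whose details are not given in this abstract. It only shows that, once XX^T is formed, the remaining work scales with N rather than P. The function names `gram_matrix` and `two_way_split_from_gram`, the simulated data, and the splitting rule (sign of the leading eigenvector of the centered Gram matrix) are all assumptions chosen for illustration.

```python
import numpy as np


def gram_matrix(X):
    """Return the N x N matrix X X^T for an N x P data matrix X."""
    X = np.asarray(X, dtype=float)
    return X @ X.T


def two_way_split_from_gram(G):
    """Illustrative stand-in, NOT the paper's method: split the N objects
    into two groups by the sign of the leading eigenvector of the
    doubly centered Gram matrix."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Gc = J @ G @ J                           # Gram matrix of column-centered X
    eigvals, eigvecs = np.linalg.eigh(Gc)    # eigenvalues in ascending order
    leading = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    return (leading > 0).astype(int)


# Simulated data (an assumption for this sketch): N = 20 objects, P = 5000
# features, two equal-sized groups whose feature means differ.
rng = np.random.default_rng(0)
N, P = 20, 5000
truth = np.repeat([0, 1], N // 2)
means = np.where(truth[:, None] == 0, -1.0, 1.0)
X = means + rng.normal(size=(N, P))

G = gram_matrix(X)                   # the only step whose cost involves P
labels = two_way_split_from_gram(G)  # works on a 20 x 20 matrix only
```

Forming XX^T costs on the order of N²P operations, and every later step in this sketch touches only the N × N matrix, which is why the feature count P stops mattering after the first line of work.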


research
02/16/2022

Using the left Gram matrix to cluster high dimensional data

For high dimensional data, where P features for N objects (P >> N) are r...
research
06/27/2018

Quantile-based clustering

A new cluster analysis method, K-quantiles clustering, is introduced. K-...
research
07/01/2022

Enhancing cluster analysis via topological manifold learning

We discuss topological aspects of cluster analysis and show that inferri...
research
01/30/2018

Links: A High-Dimensional Online Clustering Method

We present a novel algorithm, called Links, designed to perform online c...
research
05/08/2018

Finding Frequent Entities in Continuous Data

In many applications that involve processing high-dimensional data, it i...
research
03/08/2016

A Bayesian non-parametric method for clustering high-dimensional binary data

In many real life problems, objects are described by large number of bin...
research
03/02/2022

Efficient Dynamic Clustering: Capturing Patterns from Historical Cluster Evolution

Clustering aims to group unlabeled objects based on similarity inherent ...
