Clustering Plotted Data by Image Segmentation

10/06/2021
by Tarek Naous, et al.

Clustering algorithms are among the main analytical methods for detecting patterns in unlabeled data. Existing clustering methods typically treat the samples in a dataset as points in a metric space and compute distances to group similar points together. In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data: training neural networks to perform instance segmentation on plotted data. Our approach, Visual Clustering, has several advantages over traditional clustering algorithms: it is much faster than most existing clustering algorithms (making it suitable for very large datasets), it agrees strongly with human intuition for clusters, and it is hyperparameter-free by default (although additional steps with hyperparameters can be introduced for more control over the algorithm). We describe the method and compare it to ten other clustering methods on synthetic data to illustrate its advantages and disadvantages. We then demonstrate how our approach can be extended to higher-dimensional data and illustrate its performance on real-world data. The implementation of Visual Clustering is publicly available and can be applied to any dataset in a few lines of code.
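
To make the pipeline in the abstract concrete, the following is a minimal sketch of the plot-then-segment idea, not the authors' implementation. It assumes the points are rasterized onto a small square grid and, in place of the trained instance-segmentation network, substitutes plain 8-connected component labeling of the occupied pixels. The function names (`rasterize`, `segment`, `visual_cluster_sketch`) and the `size` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque


def rasterize(points, size=32):
    """Map 2-D points to (row, col) pixels on a size x size grid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1.0 when all points share a coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    pixels = []
    for x, y in points:
        col = min(int((x - x_min) / x_span * size), size - 1)
        row = min(int((y - y_min) / y_span * size), size - 1)
        pixels.append((row, col))
    return pixels


def segment(occupied):
    """Assumed stand-in for the segmentation network:
    label 8-connected groups of occupied pixels with a flood fill."""
    labels = {}
    next_label = 0
    for start in sorted(occupied):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in occupied and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels


def visual_cluster_sketch(points, size=32):
    """Plot the points, segment the image, and read each point's label
    back from the pixel it was drawn on."""
    pixels = rasterize(points, size)
    pixel_labels = segment(set(pixels))
    return [pixel_labels[p] for p in pixels]


# Two well-separated blobs of four points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
labels = visual_cluster_sketch(points)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Connected-component labeling only separates groups that do not touch in the image, and the grid resolution acts as a hidden parameter. The abstract's claims about speed, agreement with human intuition, and being hyperparameter-free refer to the learned segmentation model, not to this simplified substitute.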

