ThetA – fast and robust clustering via a distance parameter

02/13/2021
by   Eleftherios Garyfallidis, et al.

Clustering is a fundamental problem in machine learning, where distance-based approaches have dominated the field for many decades. This set of problems is often tackled by partitioning the data into K clusters, where the number of clusters is chosen a priori. While significant progress has been made along these lines over the years, it is well established that as the number of clusters or dimensions increases, current approaches become trapped in local minima, resulting in suboptimal solutions. In this work, we propose a new set of distance-threshold methods called Theta-based Algorithms (ThetA). Via experimental comparisons and complexity analyses, we show that our proposed approach outperforms existing approaches in: a) clustering accuracy and b) time complexity. Additionally, we show that for a large class of problems, learning the optimal threshold is straightforward in comparison to learning K. Moreover, we show how ThetA can infer the sparsity of datasets in higher dimensions.
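To make the contrast between fixing K and fixing a distance threshold concrete, here is a minimal sketch of a generic single-pass threshold clustering. It is not the ThetA algorithm from the paper, whose details are not given in the abstract. The function name `threshold_cluster`, the parameter `theta`, and the toy data are assumptions chosen for illustration only.

```python
import math

def threshold_cluster(points, theta):
    """Generic threshold clustering (illustrative assumption, not ThetA).

    Each point joins the nearest existing centroid if it lies within
    distance theta; otherwise it starts a new cluster. The number of
    clusters is therefore an output, not an input.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of points assigned to each cluster
    labels = []     # cluster index for each input point
    for p in points:
        best, best_d = -1, theta
        for i, c in enumerate(centroids):
            d = math.dist(p, c)  # Euclidean distance (Python 3.8+)
            if d <= best_d:
                best, best_d = i, d
        if best == -1:
            # No centroid within theta: open a new cluster at this point.
            centroids.append(list(p))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean of the chosen cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], p)]
            labels.append(best)
    return labels, centroids

# Hypothetical toy data: two well-separated groups in 2D.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
labels, centroids = threshold_cluster(points, theta=1.0)
print(labels, len(centroids))
```

In this sketch the threshold alone determines how many clusters emerge, which is the general idea behind threshold-driven methods. The result of such a single-pass scheme depends on the order of the points, a limitation that a full method would need to address.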

