Clustering - What Both Theoreticians and Practitioners are Doing Wrong

05/22/2018
by Shai Ben-David, et al.

Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, in spite of hundreds of papers on the topic being published every year, current theoretical understanding and practical implementations of such tasks, in particular of clustering, are very rudimentary. This note focuses on clustering. I claim that the most significant challenge for clustering is model selection. In contrast with other common computational tasks, for clustering, different algorithms often yield drastically different outcomes. Therefore, the choice of a clustering algorithm and its parameters (like the number of clusters) may play a crucial role in the usefulness of an output clustering solution. However, there currently exists no methodical guidance for clustering tool selection for a given clustering task. Practitioners pick the algorithms they use without awareness of the implications of their choices, and the vast majority of theory-of-clustering papers focus on providing savings to the resources needed to solve optimization problems that arise from picking some concrete clustering objective. These savings pale in comparison to the costs of mismatch between those objectives and the intended use of clustering results. I argue the severity of this problem and describe some recent proposals aiming to address this crucial lacuna.
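
The claim that different algorithms can yield drastically different outcomes on the same data can be illustrated with a small sketch. It is not taken from the paper: the synthetic data set (two long parallel rows of points) and the helper names `single_linkage`, `lloyd_kmeans`, and `as_partition` are assumptions made for illustration only. Under these assumptions, single-linkage clustering with two clusters recovers the two rows, while Lloyd's k-means heuristic started from the first and last point converges to a left/right split.

```python
import math
from itertools import combinations


def single_linkage(points, k):
    """Single-linkage clustering: merge the closest clusters until k remain."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Processing point pairs by increasing distance is equivalent to
    # repeatedly merging the two clusters with the smallest gap.
    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    clusters = len(points)
    for i, j in edges:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(len(points))]


def lloyd_kmeans(points, initial_centers, max_iter=100):
    """Lloyd's heuristic for k-means from the given starting centers."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def as_partition(points, labels):
    """Turn a label list into a set of clusters, ignoring label names."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, set()).add(p)
    return {frozenset(g) for g in groups.values()}


# Hypothetical data: two rows of 20 points, spacing 1 within a row, 2 between rows.
points = [(float(x), y) for y in (0.0, 2.0) for x in range(20)]

sl = as_partition(points, single_linkage(points, 2))
km = as_partition(points, lloyd_kmeans(points, [points[0], points[-1]]))

print("single linkage and k-means agree:", sl == km)
```

Neither partition is wrong in itself: each is a natural answer to a different objective, which is why the choice of algorithm has to be matched to the intended use of the clustering.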

research
11/10/2020

Higher-Order Spectral Clustering of Directed Graphs

Clustering is an important topic in algorithms, and has a number of appl...
research
09/28/2021

Clustering to the Fewest Clusters Under Intra-Cluster Dissimilarity Constraints

This paper introduces the equiwide clustering problem, where valid parti...
research
09/14/2017

Supervising Unsupervised Learning

We introduce a framework to leverage knowledge acquired from a repositor...
research
12/23/2021

Ensemble Method for Cluster Number Determination and Algorithm Selection in Unsupervised Learning

Unsupervised learning, and more specifically clustering, suffers from th...
research
07/06/2023

Optimal Bandwidth Selection for DENCLUE

In modern day industry, clustering algorithms are daily routines of algo...
research
05/19/2022

Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis

Unsupervised clustering algorithms for vectors has been widely used in t...
research
06/05/2019

Unsupervised Temporal Clustering to Monitor the Performance of Alternative Fueling Infrastructure

Zero Emission Vehicles (ZEV) play an important role in the decarbonizati...
