Hierarchical Clustering for Euclidean Data

12/27/2018
by Moses Charikar, et al.

Recent work on Hierarchical Clustering (HC), a well-studied problem in exploratory data analysis, has focused on optimizing various objective functions for this problem under arbitrary similarity measures. In this paper, we take a first step beyond arbitrary similarities and give novel scalable algorithms for HC tailored to Euclidean data in R^d under vector-based similarity measures, a prevalent model in many typical machine learning applications. We focus primarily on the popular Gaussian kernel and related measures, and present our results through the lens of the objective recently introduced by Moseley and Wang [2017]. We show that, for Euclidean data, the approximation factor obtained in Moseley and Wang [2017] can be improved. We further demonstrate, both theoretically and experimentally, that our algorithms scale to very high dimension d while outperforming average-linkage and remaining competitive with other, less scalable approaches.
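The abstract evaluates trees through the Moseley and Wang [2017] objective, which in its standard form rewards a tree T, for every pair of points i, j, with their similarity w_ij multiplied by n − |leaves(T[i ∨ j])|, where T[i ∨ j] is the subtree rooted at the lowest common ancestor of i and j; larger values are better. The minimal sketch below only evaluates that quantity for a given tree using Gaussian-kernel similarities. It is not the authors' algorithm, which this page does not describe. The helper names `gaussian_similarity` and `moseley_wang_revenue`, the nested-tuple tree encoding, and the exp(−‖x − y‖² / (2σ²)) bandwidth convention are illustrative assumptions.

```python
import math
from itertools import combinations


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * sigma^2)).

    The 2 * sigma^2 scaling is an assumed convention; other texts use a
    different bandwidth parameterization.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def moseley_wang_revenue(tree, points, sigma=1.0):
    """Sum over pairs i < j of w_ij * (n - |leaves(T[i v j])|).

    Hypothetical encoding: `tree` is a nested tuple whose leaves are
    integer indices into `points`; internal nodes may have any arity.
    """
    n = len(points)
    total = 0.0

    def leaves_of(node):
        nonlocal total
        if isinstance(node, int):
            return [node]
        child_leaves = [leaves_of(child) for child in node]
        size = sum(len(c) for c in child_leaves)
        # Pairs split across two different children have their lowest
        # common ancestor at this node, whose subtree holds `size` leaves.
        for left, right in combinations(child_leaves, 2):
            for i in left:
                for j in right:
                    w = gaussian_similarity(points[i], points[j], sigma)
                    total += w * (n - size)
        return [leaf for c in child_leaves for leaf in c]

    leaves_of(tree)
    return total
```

For the four 1-D points 0.0, 0.1, 5.0 and 5.1, the tree ((0, 1), (2, 3)) scores about 3.98, while ((0, 2), (1, 3)) scores close to zero; pairs first merged at the root contribute nothing because n − n = 0. The brute-force pair loop costs O(n²d), so this is only suitable for checking small trees, not for the high-dimensional scale the abstract targets.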

research
08/07/2018

Hierarchical Clustering better than Average-Linkage

Hierarchical Clustering (HC) is a widely studied problem in exploratory ...
research
07/28/2022

Expanding the class of global objective functions for dissimilarity-based hierarchical clustering

Recent work on dissimilarity-based hierarchical clustering has led to th...
research
05/24/2018

Hierarchical Clustering with Structural Constraints

Hierarchical clustering is a popular unsupervised data analysis method. ...
research
06/18/2020

Fair Hierarchical Clustering

As machine learning has become more prevalent, researchers have begun to...
research
12/15/2019

Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection

Hierarchical Clustering is an unsupervised data analysis method which ha...
research
04/11/2022

Improved Approximations for Euclidean k-means and k-median, via Nested Quasi-Independent Sets

Motivated by data analysis and machine learning applications, we conside...
research
02/19/2014

Analysis of Multibeam SONAR Data using Dissimilarity Representations

This paper considers the problem of low-dimensional visualisation of ver...
