Hierarchical Clustering: New Bounds and Objective

11/12/2021
by Mirmahdi Rahgoshay, et al.

Hierarchical clustering has been studied and used extensively as a method for data analysis. More recently, Dasgupta [2016] defined a precise objective function: given a set of n data points with a weight w_{i,j} for each pair of items i and j denoting their similarity/dissimilarity, the goal is to build a recursive (tree-like) partitioning of the data points (items) into successively smaller clusters. He defined the cost of a tree T to be Cost(T) = ∑_{i,j ∈ [n]} (w_{i,j} × |T_{i,j}|), where T_{i,j} is the subtree rooted at the least common ancestor of i and j and |T_{i,j}| is its number of leaves, and presented the first approximation algorithm for this objective. Moseley and Wang [2017] then considered the dual of Dasgupta's objective function for similarity-based weights and showed that both random partitioning and average linkage have approximation ratio 1/3, which has been improved in a series of works to 0.585 [Alon et al. 2020]. Later, Cohen-Addad et al. [2019] considered the same objective function as Dasgupta's but for dissimilarity-based metrics, called Rev(T). They showed that both random partitioning and average linkage have ratio 2/3, which has since been only slightly improved to 0.667078 [Charikar et al., SODA 2020]. Our first main result concerns Rev(T): we present a more delicate algorithm and a careful analysis that achieve approximation ratio 0.71604. We also introduce a new objective function for dissimilarity-based clustering. For any tree T, let H_{i,j} be the number of common ancestors of i and j. Intuitively, items that are similar are expected to remain within the same cluster as deep as possible. So, for dissimilarity-based metrics, we suggest the cost of a tree T, which we want to minimize, to be Cost_H(T) = ∑_{i,j ∈ [n]} (w_{i,j} × H_{i,j}). We present a 1.3977-approximation algorithm for this objective.
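To make the quantities above concrete, here is a minimal Python sketch that evaluates Cost(T) (which, with dissimilarity weights, is the value Rev(T) maximizes) and the new Cost_H(T) on a toy tree. The tree encoding (nested tuples of item indices), the weight dictionary, and all function names are illustrative assumptions, not notation or code from the paper, and the recursion favors clarity over efficiency.

from itertools import combinations


def leaves(tree):
    # A leaf is an int item index; an internal node is a tuple of subtrees.
    if isinstance(tree, int):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def pair_stats(tree, depth=0, stats=None):
    # For every pair {i, j}, record |T_{i,j}| (leaves under their LCA) and
    # H_{i,j} (number of common ancestors = depth of the LCA + 1, root at depth 0).
    if stats is None:
        stats = {}
    if isinstance(tree, int):
        return stats
    child_leaves = [leaves(c) for c in tree]
    n_here = sum(len(s) for s in child_leaves)
    # Pairs split between different children have this node as their LCA.
    for a in range(len(tree)):
        for b in range(a + 1, len(tree)):
            for i in child_leaves[a]:
                for j in child_leaves[b]:
                    stats[frozenset((i, j))] = (n_here, depth + 1)
    for child in tree:
        pair_stats(child, depth + 1, stats)
    return stats


def objectives(tree, w):
    # Returns (∑ w_ij * |T_ij|, ∑ w_ij * H_ij) for a weight dict keyed by frozenset pairs.
    stats = pair_stats(tree)
    cost = sum(w[p] * stats[p][0] for p in w)
    cost_h = sum(w[p] * stats[p][1] for p in w)
    return cost, cost_h


if __name__ == "__main__":
    # Toy instance: 4 items, tree ((0, 1), (2, 3)), unit weights on all pairs.
    tree = ((0, 1), (2, 3))
    w = {frozenset(p): 1.0 for p in combinations(range(4), 2)}
    print(objectives(tree, w))  # (20.0, 8.0) for this toy instance

With unit weights, pairs split at the root contribute 4 leaves each to the first sum and 1 common ancestor each to the second, while the pairs {0,1} and {2,3} contribute 2 and 2 respectively, which is where the printed values come from.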


Related research:

06/02/2020 - Hierarchical Clustering: a 0.585 Revenue Approximation
Hierarchical Clustering trees have been widely accepted as a useful form...

01/26/2021 - Hierarchical Clustering via Sketches and Hierarchical Correlation Clustering
Recently, Hierarchical Clustering (HC) has been considered through the l...

08/13/2021 - An Information-theoretic Perspective of Hierarchical Clustering
A combinatorial cost function for hierarchical clustering was introduced...

12/15/2020 - Objective-Based Hierarchical Clustering of Deep Embedding Vectors
We initiate a comprehensive experimental study of objective-based hierar...

09/09/2021 - An objective function for order preserving hierarchical clustering
We present an objective function for similarity based hierarchical clust...

06/09/2015 - Clustering by transitive propagation
We present a global optimization algorithm for clustering data given the...

02/27/2019 - Reconciliation k-median: Clustering with Non-Polarized Representatives
We propose a new variant of the k-median problem, where the objective fu...
