Model-Based Hierarchical Clustering

01/16/2013
by Shivakumar Vaithyanathan, et al.

We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. The model organizes the data into a cluster hierarchy and, as a key component, specifies a partitioning of the feature set: each feature can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which features share a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to document clustering, using a multinomial likelihood function and Dirichlet priors. Our algorithm is a two-stage process: we first perform a flat clustering, then run a modified hierarchical agglomerative merging process that also determines which features will have common distributions over the merged clusters. The regularization induced by using the marginal likelihood automatically determines the optimal model structure, including the number of clusters, the depth of the tree, and the subset of features to be modeled as having a common distribution at each node. We present experimental results on both synthetic data and a real document collection.
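To make the role of the marginal likelihood concrete, the following is a minimal sketch, not the authors' objective function. It assumes a symmetric Dirichlet prior with a hypothetical concentration parameter `alpha`, scores a cluster by the closed-form Dirichlet-multinomial marginal likelihood of its word counts, and compares modeling two clusters with one common distribution against modeling them separately. The function names `log_marginal` and `merge_gain` are illustrative and do not come from the paper.

```python
from math import lgamma


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of a token sequence with the given word counts,
    under a multinomial likelihood and a symmetric Dirichlet(alpha) prior.
    (alpha is an assumed hyperparameter, not a value from the paper.)"""
    k = len(counts)
    total = sum(counts)
    # Ratio of Dirichlet normalizers: prior over posterior.
    out = lgamma(k * alpha) - lgamma(k * alpha + total)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out


def merge_gain(counts_a, counts_b, alpha=1.0):
    """Log marginal likelihood of one common distribution for both clusters
    minus that of two separate distributions. A positive value favors sharing."""
    merged = [a + b for a, b in zip(counts_a, counts_b)]
    return (log_marginal(merged, alpha)
            - log_marginal(counts_a, alpha)
            - log_marginal(counts_b, alpha))


if __name__ == "__main__":
    # Two clusters with the same word usage versus two with opposite usage.
    print(merge_gain([10, 0], [10, 0]))
    print(merge_gain([10, 0], [0, 10]))
```

In this toy setting the gain is positive when the two count vectors look alike and negative when they differ sharply, which illustrates how integrating out the parameters penalizes unnecessary structure without a separate complexity term.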


research
08/02/2020

Dirichlet-tree multinomial mixtures for clustering microbiome compositions

A common routine in microbiome research is to identify reproducible patt...
research
11/06/2019

A Hybrid Approach To Hierarchical Density-based Cluster Selection

HDBSCAN is a density-based clustering algorithm that constructs a cluste...
research
02/08/2012

Automatic Clustering with Single Optimal Solution

Determining optimal number of clusters in a dataset is a challenging tas...
research
02/27/2023

Detecting Jumps on a Tree: a Hierarchical Pitman-Yor Model for Evolution of Phenotypic Distributions

This work focuses on clustering populations with a hierarchical dependen...
research
06/12/2015

Leading Tree in DPCLUS and Its Impact on Building Hierarchies

This paper reveals the tree structure as an intermediate result of clust...
research
04/28/2021

SMLSOM: The shrinking maximum likelihood self-organizing map

Determining the number of clusters in a dataset is a fundamental issue i...
research
01/20/2016

Hierarchical Latent Word Clustering

This paper presents a new Bayesian non-parametric model by extending the...
