Bayesian contiguity constrained clustering, spanning trees and dendrograms

02/24/2023
by Etienne Côme, et al.
Clustering is a well-known and well-studied problem. One of its variants, contiguity-constrained clustering, takes a graph as a second input. The graph encodes prior information about cluster structure through contiguity constraints, i.e. clusters must form connected subgraphs of this graph. This paper discusses the interest of such a setting and proposes a new way to formalise it in a Bayesian framework, using results on spanning trees to compute the posterior probabilities of candidate partitions exactly. An algorithmic solution is then investigated to find a maximum a posteriori (MAP) partition and to extract a Bayesian dendrogram from it. The interest of this last tool, which is reminiscent of the classical output of a simple hierarchical clustering algorithm, is analysed. Finally, the proposed approach is demonstrated on real applications. A reference implementation is available in the R package gtclust that accompanies the paper (available at http://github.com/comeetie/gtclust).
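To make the two graph notions in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm and not the gtclust API. The function names `is_contiguous` and `count_spanning_trees` are hypothetical, and the use of Kirchhoff's matrix-tree theorem is an assumption chosen for illustration: it is one standard exact result on spanning trees, but the abstract does not say which results the paper relies on. The first function checks the contiguity constraint (every cluster induces a connected subgraph); the second counts the spanning trees of a graph exactly.

```python
from collections import deque
from fractions import Fraction


def is_contiguous(partition, edges):
    """Return True if every cluster induces a connected subgraph.

    `partition` is an iterable of clusters (iterables of nodes) and
    `edges` is an iterable of undirected (u, v) pairs. Both names are
    illustrative and do not come from the paper.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    for cluster in partition:
        cluster = set(cluster)
        if not cluster:
            return False
        # Breadth-first search restricted to the nodes of this cluster.
        start = next(iter(cluster))
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency.get(node, ()):
                if neighbour in cluster and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if seen != cluster:
            return False
    return True


def count_spanning_trees(nodes, edges):
    """Count spanning trees with Kirchhoff's matrix-tree theorem.

    The count equals the determinant of the graph Laplacian with one
    row and the matching column removed. Fractions keep the Gaussian
    elimination exact.
    """
    nodes = list(nodes)
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    laplacian = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        laplacian[i][i] += 1
        laplacian[j][j] += 1
        laplacian[i][j] -= 1
        laplacian[j][i] -= 1

    # Drop the last row and column, then compute the determinant.
    minor = [row[:-1] for row in laplacian[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if minor[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            minor[col], minor[pivot] = minor[pivot], minor[col]
            det = -det
        det *= minor[col][col]
        for r in range(col + 1, size):
            factor = minor[r][col] / minor[col][col]
            for c in range(col, size):
                minor[r][c] -= factor * minor[col][c]
    return int(det)


# Toy example: a cycle on four nodes (an assumed graph, not data from the paper).
cycle_nodes = [0, 1, 2, 3]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
valid = is_contiguous([[0, 1], [2, 3]], cycle_edges)    # adjacent pairs
invalid = is_contiguous([[0, 2], [1, 3]], cycle_edges)  # opposite corners
n_trees = count_spanning_trees(cycle_nodes, cycle_edges)
```

On the four-node cycle, the partition into adjacent pairs satisfies the constraint, the partition into opposite corners does not, and the cycle has four spanning trees (one for each edge that can be removed).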
