Hierarchical clustering of bipartite data sets based on the statistical significance of coincidences

04/27/2020
by Ignacio Tamarit, et al.

When 'entities' in a set are related by the 'features' they share, they are amenable to a bipartite network representation. Plant-pollinator ecological communities, co-authorship of scientific papers, customers and purchases, and answers in a poll are but a few examples. Analysing the clustering of such entities in the network is a useful tool with applications in many fields, such as internet technology, recommender systems, and disease detection. The algorithms most widely applied to find clusters in bipartite networks are variants of modularity optimisation. Here we provide a hierarchical clustering algorithm based on a dissimilarity between entities that quantifies the probability that the features shared by two entities are due to mere chance. The algorithm's computational cost is O(n^2) when applied to a set of n entities, and its outcome is a dendrogram exhibiting the connections between those entities. Through the introduction of a 'susceptibility' measure we can provide an 'optimal' choice for the clustering as well as quantify its quality. The dendrogram also reveals further useful structural information, such as the existence of sub-clusters within clusters. We illustrate the algorithm by applying it first to a set of synthetic networks, and then to a selection of examples. We also show how to adapt our algorithm into a valid alternative for uni-modal networks, and show that it performs at least as well as the standard, modularity-based algorithms, with higher computational performance. We provide an implementation of the algorithm in Python, freely accessible from GitHub.
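The abstract does not spell out the dissimilarity or the linkage rule, so the following is only a minimal sketch under two loudly labeled assumptions. It is not the authors' GitHub code.

- **Dissimilarity (assumed null model).** The chance that two entities share their features at random is modeled with a hypergeometric tail probability over a common pool of `n_features` features.
- **Linkage (assumed rule).** Clustering uses single linkage, built from a minimum spanning tree with Prim's algorithm. This rule was chosen here only because it gives a simple O(n^2) construction on a dense dissimilarity matrix. The paper's own linkage rule may differ.

The names `shared_feature_pvalue`, `dissimilarity_matrix` and `single_linkage` are hypothetical.

```python
from itertools import combinations
from math import comb


def shared_feature_pvalue(k: int, d_i: int, d_j: int, n_features: int) -> float:
    """Hypergeometric tail P(X >= k): the chance that two entities with d_i and
    d_j features, drawn at random from n_features, share at least k of them.
    ASSUMPTION: this null model is illustrative, not taken from the paper."""
    total = comb(n_features, d_j)
    tail = sum(
        comb(d_i, x) * comb(n_features - d_i, d_j - x)
        for x in range(k, min(d_i, d_j) + 1)
    )
    return tail / total


def dissimilarity_matrix(entities: list[set[int]], n_features: int) -> list[list[float]]:
    """Pairwise dissimilarity: a small p-value means the overlap is unlikely
    to be chance, so the two entities are 'close'."""
    n = len(entities)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        k = len(entities[i] & entities[j])  # number of shared features
        p = shared_feature_pvalue(k, len(entities[i]), len(entities[j]), n_features)
        dist[i][j] = dist[j][i] = p
    return dist


def single_linkage(dist: list[list[float]]) -> list[tuple[float, list[int]]]:
    """Single-linkage dendrogram in O(n^2) (ASSUMED linkage rule): build a
    minimum spanning tree with Prim's algorithm on the dense matrix, then
    replay its edges in increasing order, merging clusters via union-find."""
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known link from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u

    root = list(range(n))
    members = {i: [i] for i in range(n)}

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    merges = []
    for height, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        root[rb] = ra
        members[ra] = sorted(members[ra] + members.pop(rb))
        merges.append((height, members[ra]))  # (merge height, merged cluster)
    return merges


if __name__ == "__main__":
    # Toy bipartite data: entities 0,1 share features {0,1,2}; 2,3 share {3,4,5}.
    entities = [{0, 1, 2}, {0, 1, 2}, {3, 4, 5}, {3, 4, 5}]
    for height, cluster in single_linkage(dissimilarity_matrix(entities, 6)):
        print(f"{height:.3f} {cluster}")
    # prints:
    # 0.050 [0, 1]
    # 0.050 [2, 3]
    # 1.000 [0, 1, 2, 3]
```

Each merge records the height at which two clusters join, which is the information a dendrogram displays. Cutting it at some height yields flat clusters. The paper chooses that cut with its 'susceptibility' measure. The abstract does not define that measure, so the sketch does not attempt it.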
