Selecting a significance level in sequential testing procedures for community detection

by Riddhi Pratim Ghosh, et al.

While numerous sequential algorithms have been developed to estimate community structure in networks, there is little guidance on, or study of, which significance level or stopping parameter to use in these sequential testing procedures. Most algorithms rely on prespecifying the number of communities or use an arbitrary stopping rule. We provide a principled approach to selecting a nominal significance level for sequential community detection procedures by controlling the tolerance ratio, defined as the ratio of the underfitting and overfitting probabilities when estimating the number of clusters in a network. We introduce an algorithm for specifying this significance level from a user-specified tolerance ratio, and demonstrate its utility with a sequential modularity maximization approach in a stochastic block model framework. We evaluate the performance of the proposed algorithm through extensive simulations, and show that it controls the tolerance ratio in two applications: clustering single-cell RNA sequencing data by cell type and clustering a congressional voting network.
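To make the tolerance ratio concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the ratio is read as P(estimated K < true K) divided by P(estimated K > true K), and that estimates of K from repeated runs are already available at a few candidate significance levels. The function names `tolerance_ratio` and `pick_alpha` and the toy numbers are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Optional


def tolerance_ratio(k_hats: List[int], k_true: int) -> Optional[float]:
    """Empirical underfit/overfit ratio; None if no overfit was observed.

    Assumption: the ratio is underfitting probability over overfitting
    probability. The source abstract does not spell out the direction.
    """
    under = sum(1 for k in k_hats if k < k_true)
    over = sum(1 for k in k_hats if k > k_true)
    if over == 0:
        return None  # ratio undefined without any overfit
    return under / over


def pick_alpha(results: Dict[float, List[int]], k_true: int, target: float) -> float:
    """Return the candidate level whose empirical ratio is closest to target."""
    best_alpha, best_gap = None, float("inf")
    for alpha, k_hats in sorted(results.items()):
        ratio = tolerance_ratio(k_hats, k_true)
        if ratio is None:
            continue  # skip levels where the ratio cannot be computed
        gap = abs(ratio - target)
        if gap < best_gap:
            best_alpha, best_gap = alpha, gap
    if best_alpha is None:
        raise ValueError("no candidate level produced any overfit")
    return best_alpha


# Made-up estimates of K from ten hypothetical runs per level (true K = 3).
toy_results = {
    0.01: [2, 2, 2, 3, 3, 3, 3, 3, 3, 4],  # 3 underfits, 1 overfit
    0.05: [2, 3, 3, 3, 3, 3, 3, 3, 3, 4],  # 1 underfit, 1 overfit
    0.10: [2, 3, 3, 3, 3, 3, 3, 4, 4, 4],  # 1 underfit, 3 overfits
}
chosen = pick_alpha(toy_results, k_true=3, target=1.0)
```

In this toy setup a smaller level stops the sequential procedure earlier and tends to underfit, while a larger level tends to overfit, so the ratio falls as the level rises. A grid search like this is only one simple way to invert that relationship; the paper's own procedure may differ.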



Hierarchical community detection by recursive bi-partitioning

The problem of community detection in networks is usually formulated as ...

Multiple Hypothesis Testing To Estimate The Number of Communities in Sparse Stochastic Block Models

Network-based clustering methods frequently require the number of commun...

Selecting a suitable Parallel Label-propagation based algorithm for Disjoint Community Detection

Community detection is an essential task in network analysis as it helps...

Enhancing Efficiency in Parallel Louvain Algorithm for Community Detection

Community detection is a key aspect of network analysis, as it allows fo...

Evaluating Overfit and Underfit in Models of Network Community Structure

A common data mining task on networks is community detection, which seek...

Provable Estimation of the Number of Blocks in Block Models

Community detection is a fundamental unsupervised learning problem for u...

A Generalized Estimating Equation Approach to Network Regression

Regression models applied to network data where node attributes are the ...