Multiple Hypothesis Testing To Estimate The Number of Communities in Sparse Stochastic Block Models

01/12/2022
by Chetkar Jha, et al.

Network-based clustering methods frequently require the number of communities to be specified a priori. Moreover, most existing methods for estimating the number of communities assume that it is fixed and does not scale with the network size $n$. The few methods that allow the number of communities to increase with $n$ are valid only when the average degree $d$ of the network grows at least as fast as $O(n)$ (i.e., the dense case) or lies within a narrow range. This presents a challenge in clustering large-scale network data, particularly when $d$ grows slower than $O(n)$ (i.e., the sparse case). To address this problem, we propose a new sequential procedure that uses multiple hypothesis tests and the spectral properties of Erdős-Rényi graphs to estimate the number of communities in sparse stochastic block models (SBMs). We prove the consistency of our method for sparse SBMs over a broad range of the sparsity parameter. As a consequence, we find that our method can estimate the number of communities $K^{(n)}_{\star}$ with $K^{(n)}_{\star}$ increasing at a rate as high as $O(n^{(1 - 3\gamma)/(4 - 3\gamma)})$, where $d = O(n^{1 - \gamma})$. Moreover, we show that our method can be adapted as a stopping rule for estimating the number of communities in binary tree stochastic block models. We benchmark the performance of our method against competing methods on six reference single-cell RNA-sequencing datasets. Finally, we demonstrate the usefulness of our method through numerical simulations and by using it to cluster real single-cell RNA-sequencing datasets.
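
To make the stated rate concrete, the short sketch below evaluates the growth exponent for a few sparsity levels. It assumes the rate in the abstract reads as n raised to the power (1 - 3γ)/(4 - 3γ), with average degree d = O(n^(1 - γ)). The function name `community_growth_exponent` is hypothetical and not taken from the paper. Constants and the admissible range of γ are not given in the abstract, so only the exponent arithmetic is shown.

```python
from fractions import Fraction


def community_growth_exponent(gamma):
    """Exponent a in K = O(n^a), assuming a = (1 - 3*gamma) / (4 - 3*gamma).

    gamma is the sparsity parameter in d = O(n^(1 - gamma)).
    Hypothetical helper for illustration; not part of the paper's code.
    """
    gamma = Fraction(gamma)
    return (1 - 3 * gamma) / (4 - 3 * gamma)


# Exact arithmetic with fractions avoids floating-point rounding.
for gamma in (Fraction(0), Fraction(1, 6), Fraction(1, 4), Fraction(1, 3)):
    degree_exponent = 1 - gamma
    print(f"gamma={gamma}  d ~ n^{degree_exponent}  K ~ n^{community_growth_exponent(gamma)}")
```

Under this reading, the dense case γ = 0 gives an exponent of 1/4, and the exponent shrinks to 0 as γ approaches 1/3, so the number of communities can grow polynomially in n only for γ below 1/3.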

