Motif Estimation via Subgraph Sampling: The Fourth Moment Phenomenon

11/05/2020
by Bhaswar B. Bhattacharya, et al.

Network sampling is an indispensable tool for understanding features of large complex networks, where it is practically impossible to search over the entire graph. In this paper, we develop a framework for statistical inference on counts of network motifs, such as edges, triangles, and wedges, in the widely used subgraph sampling model, where each vertex is sampled independently and the subgraph induced by the sampled vertices is observed. We derive necessary and sufficient conditions for the consistency and the asymptotic normality of the natural Horvitz-Thompson (HT) estimator, which can be used to construct confidence intervals and to test hypotheses about the motif counts based on the sampled graph. In particular, we show that the asymptotic normality of the HT estimator exhibits an interesting fourth-moment phenomenon: the HT estimator (appropriately centered and rescaled) converges in distribution to the standard normal whenever its fourth moment converges to 3 (the fourth moment of the standard normal distribution). As a consequence, we derive the exact thresholds for consistency and asymptotic normality of the HT estimator in various natural graph ensembles, such as sparse graphs with bounded degree, Erdős-Rényi random graphs, random regular graphs, and dense graphons.
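To make the sampling model concrete, the following is a minimal sketch of the HT estimator for the triangle count, not the authors' code. It assumes every vertex is kept independently with the same known probability p, so a triangle survives with probability p^3 and the observed count is rescaled by 1/p^3. The function names, the adjacency-set graph representation, and the complete-graph demo are illustrative assumptions.

```python
import random
from itertools import combinations


def count_triangles(adj):
    """Count triangles in an undirected graph given as {vertex: set_of_neighbors}."""
    total = 0
    for u in adj:
        # Every pair of neighbors of u that is itself adjacent closes a triangle.
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                total += 1
    return total // 3  # each triangle is found once from each of its 3 vertices


def induced_subgraph(adj, kept):
    """Restrict the graph to the vertex set `kept` (the observed subgraph)."""
    return {u: adj[u] & kept for u in kept}


def ht_triangle_estimate(adj, p, rng):
    """Horvitz-Thompson estimate of the triangle count under vertex sampling.

    Assumption: each vertex is kept independently with probability p, so a
    triangle is observed with probability p**3 and is weighted by 1 / p**3.
    """
    kept = {u for u in adj if rng.random() < p}
    observed = count_triangles(induced_subgraph(adj, kept))
    return observed / p**3


def complete_graph(n):
    """Hypothetical demo input: the complete graph on vertices 0..n-1."""
    return {u: set(range(n)) - {u} for u in range(n)}


if __name__ == "__main__":
    rng = random.Random(0)
    graph = complete_graph(6)  # has C(6, 3) = 20 triangles
    estimates = [ht_triangle_estimate(graph, 0.5, rng) for _ in range(20000)]
    print("true count:", count_triangles(graph))
    print("mean HT estimate:", sum(estimates) / len(estimates))
```

Averaged over many independent samples, the estimate should be close to the true count, since the estimator is unbiased. A single estimate can vary widely, and how much it varies is what the consistency and asymptotic normality conditions in the abstract are about.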


