Counting Motifs with Graph Sampling

02/21/2018
by Jason M. Klusowski, et al.

Applied researchers often construct a network from a random sample of nodes in order to infer properties of the parent network. Two of the most widely used sampling schemes are subgraph sampling, where each vertex is sampled independently with probability $p$ and we observe the subgraph induced by the sampled vertices, and neighborhood sampling, where we additionally observe the edges between the sampled vertices and their neighbors. In this paper, we study, from a statistical perspective, the problem of estimating the number of motifs as induced subgraphs under both models. We show that, for any connected motif $h$ on $k$ vertices, to estimate $s = s(h, G)$, the number of copies of $h$ in a parent graph $G$ of maximum degree $d$, with a multiplicative error of $\epsilon$: (a) for subgraph sampling, the optimal sampling ratio $p$ is $\Theta_k\left(\max\left\{ (s\epsilon^2)^{-1/k},\; \frac{d^{k-1}}{s\epsilon^2} \right\}\right)$, achieved by Horvitz-Thompson type estimators; (b) for neighborhood sampling, we propose a family of estimators, encompassing and outperforming the Horvitz-Thompson estimator, that achieves the sampling ratio $O_k\left(\min\left\{ \left(\frac{d}{s\epsilon^2}\right)^{1/(k-1)},\; \sqrt{\frac{d^{k-2}}{s\epsilon^2}} \right\}\right)$. This is shown to be optimal for all motifs with at most 4 vertices and for cliques of all sizes. The matching minimax lower bounds are established using certain algebraic properties of subgraph counts. These results quantify how much more informative neighborhood sampling is than subgraph sampling, as empirically verified by experiments on both synthetic and real-world data. We also address the issue of adaptation to the unknown maximum degree, and study specific problems for parent graphs with additional structure, e.g., trees or planar graphs.
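
As a rough illustration of the subgraph sampling model and a Horvitz-Thompson type estimator, the sketch below handles the simplest case where the motif is a triangle (k = 3). It is not the paper's implementation: the function names, the adjacency-set graph representation, and the choice of motif are assumptions made for this example. Each triangle of the parent graph survives in the sample exactly when all three of its vertices are kept, which happens with probability p^3, so dividing the observed triangle count by p^3 gives an unbiased estimate.

```python
import random


def count_triangles(adj):
    """Count triangles in an undirected graph given as {vertex: set(neighbors)}.

    Vertices are assumed to be mutually comparable (e.g. ints), so each
    triangle u < v < w is counted exactly once.
    """
    count = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count


def subgraph_sample(adj, p, rng):
    """Keep each vertex independently with probability p; return the induced subgraph."""
    kept = {v for v in adj if rng.random() < p}
    return {v: adj[v] & kept for v in kept}


def ht_triangle_estimate(adj, p, rng):
    """Horvitz-Thompson type estimate of the parent graph's triangle count.

    Hypothetical helper for illustration: a triangle is observed with
    probability p**3, so the observed count is reweighted by 1 / p**3.
    Requires p > 0.
    """
    sampled = subgraph_sample(adj, p, rng)
    return count_triangles(sampled) / p**3


def complete_graph(n):
    """Adjacency sets of the complete graph on vertices 0..n-1 (test fixture)."""
    return {v: set(range(n)) - {v} for v in range(n)}


if __name__ == "__main__":
    rng = random.Random(0)
    g = complete_graph(30)
    true_count = count_triangles(g)
    trials = [ht_triangle_estimate(g, 0.5, rng) for _ in range(200)]
    print("true count:", true_count)
    print("mean estimate over 200 samples:", sum(trials) / len(trials))
```

A single estimate can be far from the truth when p is small; the sampling ratios in the abstract describe how large p must be for the multiplicative error to stay within $\epsilon$.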


research
01/12/2018

Estimating the Number of Connected Components in a Graph via Subgraph Sampling

Learning properties of large graphs from samples has been an important p...
research
11/05/2020

Motif Estimation via Subgraph Sampling: The Fourth Moment Phenomenon

Network sampling is an indispensable tool for understanding features of ...
research
12/01/2018

Number of Connected Components in a Graph: Estimation via Counting Patterns

Due to the limited resources and the scale of the graphs in modern datas...
research
10/26/2016

Estimating the Size of a Large Network and its Communities from a Random Sample

Most real-world networks are too large to be measured or studied directl...
research
02/16/2021

Empirical Characterization of Graph Sampling Algorithms

Graph sampling allows mining a small representative subgraph from a big ...
research
10/18/2019

Weighted Edge Sampling for Static Graphs

Graph Sampling provides an efficient yet inexpensive solution for analyz...
research
02/23/2018

Estimating Graphlet Statistics via Lifting

Exploratory analysis over network data is often limited by our ability t...
