Estimating Shortest Path Length Distributions via Random Walk Sampling

06/05/2018
by Minhui Zheng, et al.

In a network, the shortest paths between nodes are of great importance because they allow the fastest and strongest interaction between nodes. However, measuring the shortest paths between all nodes in a large network is computationally expensive. In this paper we propose a method to estimate the shortest path length (SPL) distribution of a network by random walk sampling. To deal with the unequal inclusion probabilities of dyads (pairs of nodes) in the sample, we generalize the Hansen-Hurwitz and Horvitz-Thompson estimators (and their ratio forms) and apply them to the sampled dyads. Based on the theory of Markov chains, we prove that the selection probability of a dyad is proportional to the product of the degrees of its two nodes. To approximate the actual SPL for a dyad, we use the observed SPL in the induced subgraph for networks with large degree variability, i.e., where the standard deviation of the degrees is at least twice the mean, and we estimate the SPL using landmarks for networks with small degree variability. By simulation studies and applications to real networks, we find that 1) for large networks, high estimation accuracy can be achieved by using a single or multiple random walks with total number of steps equal to at least 20 [...]; 2) [...] as the network size increases but tends to stabilize when the network is large enough; 3) a single random walk performs as well as multiple random walks; 4) the Horvitz-Thompson ratio estimator performs best among the four estimators.
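
The reweighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_walk`, `bfs_distances`, `estimate_spl_distribution`) are hypothetical, every pair of walk draws is treated as a sampled dyad, and the SPL of each dyad is computed exactly by breadth-first search on the full graph rather than from the induced subgraph or from landmarks. Under those assumptions, weighting each sampled dyad by the inverse of the product of its two degrees gives a ratio-form, Hansen-Hurwitz-style estimate of the SPL distribution.

```python
import random
from collections import defaultdict, deque
from itertools import combinations


def random_walk(adj, start, steps, rng):
    """Simple random walk: move to a uniformly chosen neighbor at each step."""
    walk = [start]
    for _ in range(steps):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def bfs_distances(adj, source):
    """Shortest path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def estimate_spl_distribution(adj, walk):
    """Ratio-form estimate of the fraction of dyads at each SPL.

    Assumption of this sketch: a dyad (u, v) is drawn with probability
    proportional to deg(u) * deg(v), so it is weighted by the inverse.
    """
    # Simplification: exact SPLs on the full graph, one BFS per sampled node.
    dist_from = {u: bfs_distances(adj, u) for u in set(walk)}
    weighted = defaultdict(float)
    total = 0.0
    for u, v in combinations(walk, 2):  # pairs of draws, with multiplicity
        if u == v or v not in dist_from[u]:
            continue  # skip self-pairs and disconnected dyads
        w = 1.0 / (len(adj[u]) * len(adj[v]))
        weighted[dist_from[u][v]] += w
        total += w
    if total == 0.0:
        return {}
    return {length: w / total for length, w in sorted(weighted.items())}


if __name__ == "__main__":
    # Toy graph: a 12-node ring with one chord, as adjacency lists.
    n = 12
    adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    adj[0].append(6)
    adj[6].append(0)
    walk = random_walk(adj, start=0, steps=300, rng=random.Random(0))
    print(estimate_spl_distribution(adj, walk))
```

Because the weights appear in both the numerator and the denominator, the unknown normalizing constant of the selection probabilities cancels, which is why the ratio form needs only the node degrees.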


research
10/21/2021

Degree-Based Random Walk Approach for Graph Embedding

Graph embedding, representing local and global neighborhood information ...
research
05/12/2022

Sampling Online Social Networks: Metropolis Hastings Random Walk and Random Walk

As social network analysis (SNA) has drawn much attention in recent year...
research
10/06/2022

Beyond the shortest path: the path length index as a distribution

The traditional complex network approach considers only the shortest pat...
research
09/26/2022

Efficient Random Walk based Sampling with Inverse Degree

Random walk sampling methods have been widely used in graph sampling in ...
research
01/21/2021

Synwalk – Community Detection via Random Walk Modelling

Complex systems, abstractly represented as networks, are ubiquitous in e...
research
06/14/2020

Estimation of dense stochastic block models visited by random walks

We are interested in recovering information on a stochastic block model ...
research
05/09/2016

On the Emergence of Shortest Paths by Reinforced Random Walks

The co-evolution between network structure and functional performance is...
