Efficient Random Walk based Sampling with Inverse Degree

09/26/2022
by   Xiao Qi, et al.

Random walk sampling methods have been widely used in graph sampling in recent years, but they are biased toward higher-degree nodes. To overcome this deficiency, classical methods such as MHRW design weighted walks that repeat low-degree nodes and reject high-degree nodes, so that the long-term behavior of the Markov chain achieves a uniform distribution. This modification, however, may make the sampler stay at the same node several times, leading to undersampling. To address this issue, we propose a sampling framework that needs only the degrees of the current and candidate nodes to improve the performance of graph sampling methods. We also extend our original idea to a more general framework. Our extended IDRW method finds a balance between the large-deviation problem of SRW and the sample-rejection problem of MHRW. We evaluate our technique in simulation by running extensive experiments on various real-world datasets, and the results show that our method improves accuracy compared with state-of-the-art techniques. We also investigate the effect of the parameter and suggest a range for practical use.
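
To make the trade-off described in the abstract concrete, the following is a minimal sketch of the two baselines it names: a simple random walk (SRW) and the textbook Metropolis-Hastings random walk (MHRW), which accepts a move from node u to candidate v with probability min(1, deg(u)/deg(v)). It does not implement the proposed IDRW method, whose details are not given in the abstract. The adjacency-list format, the function names, and the toy star graph are assumptions chosen for illustration.

```python
import random

# Hypothetical toy graph: a star with hub 0 and four leaves (illustration only).
STAR = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}


def simple_random_walk(adj, start, steps, rng):
    """SRW: always move to a uniformly chosen neighbor.

    Visit frequencies are proportional to node degree, which is the
    bias toward high-degree nodes mentioned in the abstract.
    """
    node = start
    visited = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited


def metropolis_hastings_walk(adj, start, steps, rng):
    """Textbook MHRW: propose a uniform neighbor, accept with
    probability min(1, deg(current) / deg(candidate)).

    Only the degrees of the current and candidate nodes are needed.
    A rejected proposal re-records the current node, which is the
    repeated-sample behavior the abstract describes.
    """
    node = start
    visited = []
    for _ in range(steps):
        candidate = rng.choice(adj[node])
        if rng.random() < len(adj[node]) / len(adj[candidate]):
            node = candidate
        visited.append(node)
    return visited


def fraction_at(visited, node):
    """Share of recorded samples that equal `node`."""
    return visited.count(node) / len(visited)


if __name__ == "__main__":
    rng = random.Random(0)
    srw = simple_random_walk(STAR, 0, 20000, rng)
    mhrw = metropolis_hastings_walk(STAR, 0, 20000, rng)
    repeats = sum(a == b for a, b in zip(mhrw, mhrw[1:]))
    print("SRW  hub share:", fraction_at(srw, 0))
    print("MHRW hub share:", fraction_at(mhrw, 0))
    print("MHRW repeated samples:", repeats)
```

On this toy graph the SRW spends about half of its samples on the hub, while the MHRW moves toward the uniform share of one in five at the cost of many consecutive repeated samples at the leaves.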

research
09/26/2022

Weighted Jump in Random Walk Graph Sampling

Random walk based sampling methods have been widely used in graph sampli...
research
09/27/2022

A Review: Random Walk in Graph Sampling

Graph sampling is a technique to pick a subset of vertices and/ or edges...
research
08/28/2023

Sampling unknown large networks restricted by low sampling rates

Graph sampling plays an important role in data mining for large networks...
research
05/12/2022

Sampling Online Social Networks: Metropolis Hastings Random Walk and Random Walk

As social network analysis (SNA) has drawn much attention in recent year...
research
06/05/2018

Estimating Shortest Path Length Distributions via Random Walk Sampling

In a network, the shortest paths between nodes are of great importance a...
research
07/23/2020

Sampling connected subgraphs: nearly-optimal mixing time bounds, nearly-optimal ε-uniform sampling, and perfect uniform sampling

We study the connected subgraph sampling problem: given an integer k ≥ 3...
research
09/02/2016

SynsetRank: Degree-adjusted Random Walk for Relation Identification

In relation extraction, a key process is to obtain good detectors that f...
