Targeted sampling from massive Blockmodel graphs with personalized PageRank

10/04/2019
by Fan Chen, et al.

This paper provides statistical theory and intuition for Personalized PageRank (PPR), a popular technique for sampling a small community from a massive network. We study a setting where the entire network is expensive to obtain or maintain in full, but we can start from a seed node of interest and "crawl" the network to find other nodes through their connections. By crawling the graph in a designed way, the PPR vector can be approximated without querying the entire massive graph, making it an alternative to snowball sampling. Using the Degree-Corrected Stochastic Blockmodel, we study whether the PPR vector can select nodes that belong to the same block as the seed node. We provide a simple and interpretable form for the PPR vector, highlighting its bias towards high-degree nodes outside of the target block. We examine a simple adjustment based on node degrees and establish consistency results for PPR clustering that allow for directed graphs. We illustrate the method with the Twitter friendship graph and find that (i) the adjusted and unadjusted PPR techniques are complementary approaches, where the adjustment makes the results particularly localized around the seed node, and (ii) the bias adjustment greatly benefits from degree regularization.
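
To make the idea of a seed-based PPR vector and a degree adjustment concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it computes PPR by plain power iteration over a full adjacency matrix (so it does query the whole graph, unlike the crawl-based approximation the abstract describes), and it assumes that the adjustment takes the form of dividing each PPR entry by the node's in-degree. The function name `personalized_pagerank`, the teleportation constant `alpha`, and the toy graph of two 4-node cliques joined by a single edge are all illustrative assumptions.

```python
import numpy as np

def personalized_pagerank(A, seed, alpha=0.15, tol=1e-12, max_iter=1000):
    """PPR vector by power iteration (illustrative; not a crawl-based approximation).

    A[i, j] = 1 means an edge from node i to node j.
    Assumes every node has at least one outgoing edge.
    """
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    P = A / out_deg[:, None]          # row-stochastic random-walk transition matrix
    e = np.zeros(n)
    e[seed] = 1.0                     # teleportation always returns to the seed
    p = e.copy()
    for _ in range(max_iter):
        p_next = alpha * e + (1.0 - alpha) * (P.T @ p)
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy graph (an assumption for illustration): two 4-node cliques joined by one edge.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0               # single bridge between the two blocks

p = personalized_pagerank(A, seed=0)

# Assumed form of the degree adjustment: divide each entry by the node's in-degree.
in_deg = A.sum(axis=0)
adjusted = p / in_deg

# Select as many nodes as the seed's block contains, under each ranking.
top_unadjusted = set(np.argsort(-p)[:4].tolist())
top_adjusted = set(np.argsort(-adjusted)[:4].tolist())
print(top_unadjusted, top_adjusted)
```

On a graph this small and this well separated, both rankings should pick out the seed's clique; the bias towards high-degree nodes that the abstract discusses would only become visible on graphs with heterogeneous degrees.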

research
10/14/2021

Residual2Vec: Debiasing graph embedding with random graphs

Graph embedding maps a graph into a convenient vector-space representati...
research
06/27/2023

Network-Adjusted Covariates for Community Detection

Community detection is a crucial task in network analysis that can be si...
research
07/15/2019

Seedless Graph Matching via Tail of Degree Distribution for Correlated Erdos-Renyi Graphs

The graph matching problem refers to recovering the node-to-node corresp...
research
04/10/2012

Co-clustering for directed graphs: the Stochastic co-Blockmodel and spectral algorithm Di-Sim

Directed graphs have asymmetric connections, yet the current graph clust...
research
06/11/2019

Statistical guarantees for local graph clustering

Local graph clustering methods aim to find small clusters in very large ...
research
04/07/2021

Random graphs with node and block effects: models, goodness-of-fit tests, and applications to biological networks

Many popular models from the networks literature can be viewed through a...
research
10/17/2018

Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data

Searching for high-dimensional vector data with high accuracy is an inev...
