Distributed Algorithms for Fully Personalized PageRank on Large Graphs

03/28/2019
by Wenqing Lin, et al.

Personalized PageRank (PPR) has numerous applications, such as link prediction and recommendation systems for social networks, which often require the fully PPR to be known. Moreover, most real-life graphs are edge-weighted, e.g., by the interactions between users on the Facebook network. However, computing the fully PPR is computationally difficult, especially on large graphs, and most existing approaches do not consider edge weights. In particular, the existing approach cannot handle graphs with billions of edges on a moderate-size cluster. To address this problem, this paper presents a novel study on the computation of fully edge-weighted PPR on large graphs using a distributed computing framework. Specifically, we employ a Monte Carlo approximation that performs a large number of random walks from each node of the graph, and exploit a parallel pipeline framework to reduce the overall running time of the fully PPR computation. Based on that, we develop several optimization techniques that (i) alleviate the issue of large nodes, which could exhaust the memory space, (ii) pre-compute short walks for small nodes, which greatly speeds up the computation of random walks, and (iii) optimize the number of random walks computed in each pipeline, which significantly reduces the overhead. With extensive experiments on a variety of real-life graph datasets, we demonstrate that our solution is several orders of magnitude faster than state-of-the-art approaches while largely outperforming the baseline algorithms in terms of accuracy.
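
To make the Monte Carlo idea in the abstract concrete, the following is a minimal single-machine sketch, not the paper's distributed algorithm. It assumes the graph is a dictionary of weighted out-neighbors, that each walk stops with a hypothetical probability `alpha` at every step, and that a walk reaching a node without out-edges simply ends there. The function names `monte_carlo_ppr` and `fully_ppr` and all parameters are illustrative choices, and none of the pipelining or large-node optimizations are modeled.

```python
import random
from collections import Counter


def monte_carlo_ppr(graph, source, num_walks=10000, alpha=0.15, rng=None):
    """Estimate the PPR vector of `source` from the endpoints of random walks.

    graph: dict mapping node -> dict of {out_neighbor: positive edge weight}.
    alpha: per-step stopping probability (an assumed parameter).
    """
    rng = rng or random.Random(0)
    endpoints = Counter()
    for _ in range(num_walks):
        node = source
        while True:
            if rng.random() < alpha:
                break  # the walk terminates at the current node
            neighbors = graph.get(node)
            if not neighbors:
                break  # assumption: a walk ends at a node with no out-edges
            targets = list(neighbors)
            weights = [neighbors[t] for t in targets]
            # Edge-weighted step: pick a neighbor proportionally to edge weight.
            node = rng.choices(targets, weights=weights, k=1)[0]
        endpoints[node] += 1
    # The fraction of walks ending at v estimates the PPR of v w.r.t. source.
    return {v: count / num_walks for v, count in endpoints.items()}


def fully_ppr(graph, num_walks=10000, alpha=0.15, seed=0):
    """Estimate PPR for every source node by running walks from each node."""
    rng = random.Random(seed)
    nodes = set(graph)
    for neighbors in graph.values():
        nodes.update(neighbors)
    return {
        s: monte_carlo_ppr(graph, s, num_walks, alpha, rng)
        for s in sorted(nodes)
    }
```

Calling `fully_ppr({"a": {"b": 9.0, "c": 1.0}})` returns one estimated PPR vector per node. The cost grows with the number of nodes times the number of walks per node, which is the workload the abstract proposes to spread across a cluster.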


research
05/01/2018

Efficient Graph Computation for Node2Vec

Node2Vec is a state-of-the-art general-purpose feature learning method f...
research
05/25/2023

Efficient Approximation Algorithms for Spanning Centrality

Given a graph 𝒢, the spanning centrality (SC) of an edge e measures the ...
research
06/07/2020

Distributed-Memory Vertex-Centric Network Embedding for Large-Scale Graphs

Network embedding is an important step in many different computations ba...
research
11/17/2010

Supervised Random Walks: Predicting and Recommending Links in Social Networks

Predicting the occurrence of links is a fundamental problem in networks....
research
04/03/2019

Efficient Estimation of Heat Kernel PageRank for Local Clustering

Given an undirected graph G and a seed node s, the local clustering prob...
research
06/20/2021

Large-Scale Network Embedding in Apache Spark

Network embedding has been widely used in social recommendation and netw...
research
09/07/2018

RetGK: Graph Kernels based on Return Probabilities of Random Walks

Graph-structured data arise in wide applications, such as computer visio...
