Efficient Estimation of Heat Kernel PageRank for Local Clustering

04/03/2019
by   Renchi Yang, et al.

Given an undirected graph G and a seed node s, the local clustering problem aims to identify a high-quality cluster containing s in time roughly proportional to the size of the cluster, regardless of the size of G. This problem has numerous applications on large-scale graphs. Heat kernel PageRank (HKPR), a measure of the proximity of nodes in graphs, has recently been applied to this problem and found to be more efficient than prior methods. However, existing solutions for computing HKPR are either prohibitively expensive or provide unsatisfactory approximation error on HKPR values, which makes them impractical, especially on billion-edge graphs. In this paper, we present TEA and TEA+, two novel local graph clustering algorithms based on HKPR that address these limitations. Specifically, both algorithms provide non-trivial theoretical guarantees on the relative error of HKPR values and on time complexity. The basic idea is to use deterministic graph traversal to produce a rough estimate of the exact HKPR vector, and then to exploit Monte-Carlo random walks to refine the results in an optimized way. In particular, TEA+ achieves practical efficiency and effectiveness through further optimizations. Extensive experiments on real-world datasets demonstrate that, at the same clustering quality, TEA+ is more than four times faster than the state-of-the-art algorithm on most benchmark datasets and an order of magnitude faster on large graphs, including the widely studied Twitter and Friendster datasets.
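
To make the Monte-Carlo ingredient concrete, here is a minimal Python sketch of a plain random-walk estimator for HKPR. It is not TEA or TEA+ and omits the deterministic traversal phase and all optimizations mentioned above. It relies on the standard definition of HKPR as a Poisson-weighted sum over random-walk lengths; the heat constant t, the adjacency-list input, and the names sample_poisson and hkpr_monte_carlo are assumptions made for illustration and do not appear in the abstract.

```python
import math
import random
from collections import Counter


def sample_poisson(t, rng):
    """Draw k ~ Poisson(t) with Knuth's multiplication method (fine for small t)."""
    limit = math.exp(-t)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def hkpr_monte_carlo(adj, seed, t=5.0, num_walks=10000, rng=None):
    """Estimate the HKPR vector of `seed` by simulating random walks.

    adj: dict mapping each node to a non-empty list of neighbours
         (hypothetical input format; undirected graph, no isolated nodes).
    t:   heat constant (assumed parameter, not given in the abstract).
    Each walk takes a Poisson(t)-distributed number of uniform steps from
    `seed`; the fraction of walks ending at v estimates the HKPR of v.
    """
    rng = rng or random.Random(0)
    counts = Counter()
    for _ in range(num_walks):
        node = seed
        for _ in range(sample_poisson(t, rng)):
            node = rng.choice(adj[node])
        counts[node] += 1
    return {v: c / num_walks for v, c in counts.items()}


if __name__ == "__main__":
    # Two triangles joined by the edge (2, 3); the seed sits in the first one.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    estimate = hkpr_monte_carlo(graph, seed=0, t=3.0)
    for v in sorted(estimate):
        print(v, round(estimate[v], 3))
```

A pure estimator of this kind needs many walks to resolve small HKPR values, which is the cost that a preliminary deterministic traversal is meant to reduce.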


