PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

05/07/2019
by Zhewei Wei, et al.

SimRank is a classic measure of the similarity of nodes in a graph. Given a node u in a graph G = (V, E), a single-source SimRank query returns the SimRank similarity s(u, v) between u and each node v ∈ V. This type of query has numerous applications in web search and social network analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur a query cost at least linear in the number of nodes n, which renders them inapplicable for real-time and interactive analysis. This paper proposes PRSim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. PRSim uses an index of size O(m), where m is the number of edges in the graph, and guarantees a query time that depends on the reverse PageRank distribution of the input graph. In particular, we prove that PRSim runs in sublinear time if the degree distribution of the input graph follows a power law, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that PRSim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.
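To make the query definition concrete, the following is a minimal sketch of a single-source SimRank query computed with the standard iterative SimRank recurrence on a small graph. It is not PRSim and uses no index; it is a naive baseline whose cost grows at least quadratically with n, which is the kind of cost the abstract argues against. The function names, the decay factor c = 0.6, and the iteration count are assumptions chosen for illustration and do not come from the paper.

```python
from itertools import product


def simrank_all_pairs(in_neighbors, c=0.6, iterations=10):
    """Naive iterative SimRank over all node pairs (illustrative baseline).

    in_neighbors maps each node to the list of its in-neighbors.
    c is the decay factor (assumed value) and iterations is the number
    of fixed-point iterations (assumed value).
    """
    nodes = list(in_neighbors)
    # Start from s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by c.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def single_source_simrank(in_neighbors, u, c=0.6, iterations=10):
    """Return s(u, v) for every node v, as a dict keyed by v."""
    sim = simrank_all_pairs(in_neighbors, c, iterations)
    return {v: sim[(u, v)] for v in in_neighbors}
```

As a usage example, on a toy graph with edges a → b and a → c, calling `single_source_simrank({"a": [], "b": ["a"], "c": ["a"]}, "b")` returns one score per node: b and c share their only in-neighbor, so s(b, c) equals the decay factor, while s(b, a) is 0 because a has no in-neighbors.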


