Scalable Approximation Algorithm for Graph Summarization
Massive sizes of real-world graphs, such as social networks and web graph, impose serious challenges to process and perform analytics on them. These issues can be resolved by working on a small summary of the graph instead . A summary is a compressed version of the graph that removes several details, yet preserves it's essential structure. Generally, some predefined quality measure of the summary is optimized to bound the approximation error incurred by working on the summary instead of the whole graph. All known summarization algorithms are computationally prohibitive and do not scale to large graphs. In this paper we present an efficient randomized algorithm to compute graph summaries with the goal to minimize reconstruction error. We propose a novel weighted sampling scheme to sample vertices for merging that will result in the least reconstruction error. We provide analytical bounds on the running time of the algorithm and prove approximation guarantee for our score computation. Efficiency of our algorithm makes it scalable to very large graphs on which known algorithms cannot be applied. We test our algorithm on several real world graphs to empirically demonstrate the quality of summaries produced and compare to state of the art algorithms. We use the summaries to answer several structural queries about original graph and report their accuracies.
READ FULL TEXT