Universal Graph Compression: Stochastic Block Models

by Alankrita Bhatt, et al.

Motivated by prevalent data science applications that process and mine large-scale graph data, such as social networks, web graphs, and biological networks, and by the high I/O and communication costs of storing and transmitting such data, this paper investigates lossless compression of data in the form of a labeled graph. A universal graph compression scheme is proposed that does not depend on the underlying statistics or distribution of the graph model. For graphs generated by a stochastic block model, a widely used random graph model that captures clustering effects in social networks, the proposed scheme achieves the optimal theoretical limit of lossless compression without needing to know the edge probabilities, the community labels, or the number of communities. The key ideas in establishing universality for stochastic block models are: 1) block decomposition of the graph's adjacency matrix; and 2) a generalization of the Krichevsky-Trofimov probability assignment, which was originally designed for i.i.d. random processes. On four benchmark graph datasets (protein-to-protein interaction, LiveJournal friendship, Flickr, and YouTube), the files compressed by competing algorithms (including CSR, Ligra+, the PNG image compressor, and the Lempel-Ziv compressor for two-dimensional data) take 2.4 to 27 times the space needed by the proposed scheme.
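The two ideas named above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a fixed, hand-chosen block size k, zero-pads the adjacency matrix to a multiple of k, treats each off-diagonal k x k block of the upper triangle as one symbol over an alphabet of size 2^(k²), and codes diagonal blocks (whose strict upper triangles have k(k-1)/2 bits) with a separate KT assignment. The KT assignment over an alphabet of size A gives the next symbol s probability (count(s) + 1/2) / (n + A/2). The sketch computes the ideal code length in bits rather than emitting a bitstream; an arithmetic coder driven by the same probabilities would come within a few bits of it. All function names (`kt_code_length`, `block_symbols`, `compressed_bits`, `sample_sbm`) are hypothetical.

```python
import math
import random
from collections import Counter


def kt_code_length(symbols, alphabet_size):
    """Ideal code length (bits) of a sequence under the sequential KT assignment.

    The next symbol s gets probability (count[s] + 1/2) / (n + alphabet_size / 2),
    where n is the number of symbols seen so far.
    """
    counts = Counter()
    bits = 0.0
    for n, s in enumerate(symbols):
        p = (counts[s] + 0.5) / (n + alphabet_size / 2)
        bits -= math.log2(p)
        counts[s] += 1
    return bits


def block_symbols(adj, k):
    """Split the upper triangle of a symmetric 0/1 adjacency matrix into k x k blocks.

    Returns (off_diag, diag): off-diagonal blocks as k*k-bit tuples, and the strict
    upper triangles of diagonal blocks as k*(k-1)/2-bit tuples. The matrix is
    zero-padded so that its size is a multiple of k (an assumption of this sketch).
    """
    n = len(adj)
    m = -(-n // k)  # number of block rows: ceil(n / k)

    def entry(r, c):
        # Padding rows/columns are treated as "no edge".
        return adj[r][c] if r < n and c < n else 0

    off_diag, diag = [], []
    for bi in range(m):
        for bj in range(bi, m):
            if bi == bj:
                diag.append(tuple(entry(bi * k + r, bi * k + c)
                                  for r in range(k) for c in range(r + 1, k)))
            else:
                off_diag.append(tuple(entry(bi * k + r, bj * k + c)
                                      for r in range(k) for c in range(k)))
    return off_diag, diag


def compressed_bits(adj, k=2):
    """Total ideal code length: one KT assignment per block type."""
    off_diag, diag = block_symbols(adj, k)
    return (kt_code_length(off_diag, 2 ** (k * k))
            + kt_code_length(diag, 2 ** (k * (k - 1) // 2)))


def sample_sbm(n, p_in, p_out, seed=0):
    """Two-community SBM sampler, used only to generate example input."""
    rng = random.Random(seed)
    labels = [0 if i < n // 2 else 1 for i in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


if __name__ == "__main__":
    adj = sample_sbm(200, p_in=0.1, p_out=0.01)
    raw = 200 * 199 // 2  # one bit per vertex pair
    print(f"raw: {raw} bits, KT over 2x2 blocks: {compressed_bits(adj, 2):.0f} bits")
```

Note that the coder never sees the community labels, the edge probabilities, or the number of communities; it only counts block patterns. A larger k lets each symbol capture more local structure, but the alphabet grows as 2^(k²), and the KT overhead of roughly (A - 1)/2 · log2 N bits for an alphabet of size A over N blocks grows with it, so k cannot be made large for small graphs.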




Universal Lossless Compression of Graphical Data

Graphical data is comprised of a graph with marks on its edges and verti...

Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations

Various graphs such as web or social networks may contain up to trillion...

Minimum entropy stochastic block models neglect edge distribution heterogeneity

The statistical inference of stochastic block models has emerged as a mat...

Random graphs with node and block effects: models, goodness-of-fit tests, and applications to biological networks

Many popular models from the networks literature can be viewed through a...

A Novel Scheme to Improve Lossless Image Coders by Explicit Description of Generative Model Classes

In this study, we propose a novel scheme for systematic improvement of l...

Random Geometric Graph: Some recent developments and perspectives

The Random Geometric Graph (RGG) is a random graph model for network dat...

A Universal Low Complexity Compression Algorithm for Sparse Marked Graphs

Many modern applications involve accessing and processing graphical data...
