Universal Lossless Compression of Graphical Data
Graphical data consists of a graph with marks on its edges and vertices. Each mark indicates the value of some attribute associated with the respective edge or vertex. Examples of such data arise in social networks, molecular and systems biology, and web graphs, as well as in several other application areas. Our goal is to design schemes that can efficiently compress such graphical data without making assumptions about its stochastic properties. Namely, we wish to develop a universal compression algorithm for graphical data sources. To formalize this goal, we employ the framework of local weak convergence, also called the objective method, which provides a way to view a marked graph as a kind of stationary stochastic process, stationary with respect to movement between vertices of the graph. In recent work, we generalized a notion of entropy for unmarked graphs in this framework, due to Bordenave and Caputo, to the case of marked graphs. We use this notion to evaluate the efficiency of a compression scheme. The lossless compression scheme we propose in this paper is then proved to be universally optimal in a precise technical sense. It is also capable of performing local data queries in the compressed form.