How to Store a Random Walk
Motivated by storage applications, we study the following data structure problem: An encoder wishes to store a collection of jointly-distributed files X := (X_1, X_2, ..., X_n) ∼ μ which are correlated (H_μ(X) ≪ ∑_i H_μ(X_i)), using as little (expected) memory as possible, such that each individual file X_i can be recovered quickly with few (ideally constant) memory accesses. In the case of independent random files, a dramatic result by Pătraşcu (FOCS'08), and subsequently by Dodis, Pătraşcu, and Thorup (STOC'10), shows that it is possible to store X using just a constant number of extra bits beyond the information-theoretic minimum space, while at the same time decoding each X_i in constant time. However, in the (realistic) case where the files are correlated, much weaker results are known, requiring at least Ω(n/poly log n) extra bits for constant decoding time, even for "simple" joint distributions μ. We focus on the natural case of compressing Markov chains, i.e., storing a length-n random walk on any (possibly directed) graph G. Denoting by κ(G,n) the number of length-n walks on G, we show that there is a succinct data structure storing a random walk using log_2 κ(G,n) + O(log n) bits of space, such that any vertex along the walk can be decoded in O(1) time on a word-RAM. For the harder task of matching the point-wise optimal space of the walk, i.e., the empirical entropy ∑_{i=1}^{n-1} log(deg(v_i)), we present a data structure with O(1) extra bits at the price of O(log n) decoding time, and we show that any improvement on this would lead to an improved solution to the long-standing Dictionary problem. All of our data structures support the online version of the problem with constant update and query time.
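For concreteness, the two space benchmarks above can be computed directly for small graphs. The sketch below is illustrative only (it is not the paper's data structure): `count_walks` evaluates κ(G,n) by dynamic programming, so log_2 κ(G,n) is the information-theoretic minimum space, and `empirical_entropy` evaluates the point-wise optimum ∑_{i=1}^{n-1} log_2(deg(v_i)) for a particular walk. The example graph, the function names, and the sampled walk are all hypothetical.

```python
import math
import random

def count_walks(adj, n):
    """kappa(G, n): number of walks on n vertices (n-1 steps) in the directed graph adj."""
    ways = [1] * len(adj)                      # one length-1 walk per starting vertex
    for _ in range(n - 1):
        # ways[v] = number of walks of the current length starting at v
        ways = [sum(ways[w] for w in adj[v]) for v in range(len(adj))]
    return sum(ways)

def empirical_entropy(adj, walk):
    """Point-wise optimum: sum of log2(out-degree) over all but the last vertex."""
    return sum(math.log2(len(adj[v])) for v in walk[:-1])

# Hypothetical example: a directed 4-cycle with extra chords.
adj = [[1, 2], [2, 3], [3, 0], [0, 1]]
n = 10
print("log2 kappa(G, n) =", math.log2(count_walks(adj, n)))

walk = [0]
for _ in range(n - 1):                          # sample one random walk
    walk.append(random.choice(adj[walk[-1]]))
print("empirical entropy =", empirical_entropy(adj, walk))
```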
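The succinct space bound itself can be illustrated by lexicographic ranking: the index of a walk among all length-n walks always fits in ⌈log_2 κ(G,n)⌉ bits. The sketch below (same hypothetical graph as above; not the paper's encoder, which additionally achieves O(1)-time decoding) computes that index.

```python
import math

def walks_from(adj, n):
    """W[L][v] = number of walks with L vertices starting at vertex v, for L = 1..n."""
    W = [None, [1] * len(adj)]                 # L = 1: the vertex by itself
    for L in range(2, n + 1):
        W.append([sum(W[L - 1][w] for w in adj[v]) for v in range(len(adj))])
    return W

def rank(adj, walk, W):
    """Index of `walk` in lexicographic order over all walks with len(walk) vertices."""
    n, idx = len(walk), 0
    for v in range(walk[0]):                   # walks starting at a smaller vertex
        idx += W[n][v]
    for i in range(n - 1):                     # smaller next-vertex choices at step i
        idx += sum(W[n - 1 - i][w] for w in adj[walk[i]] if w < walk[i + 1])
    return idx

adj = [[1, 2], [2, 3], [3, 0], [0, 1]]         # same hypothetical graph as above
walk = [0, 2, 3, 1, 2, 3, 0, 1, 3, 0]          # a valid 10-vertex walk on adj
W = walks_from(adj, len(walk))
kappa = sum(W[len(walk)][v] for v in range(len(adj)))
idx = rank(adj, walk, W)
assert 0 <= idx < kappa                        # so idx fits in ceil(log2 kappa) bits
print("kappa =", kappa, "index =", idx, "bits =", math.ceil(math.log2(kappa)))
```

Note that recovering a single vertex from this naive index requires unranking the entire prefix, i.e., Ω(n) time; closing that gap to O(1) memory accesses while keeping the space at log_2 κ(G,n) + O(log n) bits is precisely what the paper's data structure achieves.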