Entropy bounds for grammar compression

04/23/2018

∙

In grammar compression we represent a string as a context free grammar. This model is popular both in theoretical and practical applications due to its simplicity, good compression rate and suitability for processing of the compressed representations. In practice, achieving compression requires encoding such grammar as a binary string, there are a few commonly used. We bound the size of such encodings for several compression methods, along with well-known algorithm. For we prove that its standard encoding, which is a combination of entropy coding and special encoding of a grammar, achieves 1.5|S|H_k(S). We also show that by stopping after some iteration we can achieve |S|H_k(S). The latter is particularly important, as it explains the phenomenon observed in practice, that introducing too many nonterminals causes the bit-size to grow. We generalize our approach to other compressions methods like or wide class of irreducible grammars, and other bit encodings (including naive, which uses fixed-length codes). Our approach not only proves the bounds but also partially explains why and are much better in practice than the other grammar based methods. At last, we show that for a wide family of dictionary compression methods (including grammar compressors) Ω(nk σ/_σ n) bits of redundancy are required. This shows a separation between context-based/BWT methods and dictionary compression algorithms, as for the former there exists methods where redundancy does not depend on n, but only on k and σ.

READ FULL TEXT

Entropy bounds for grammar compression

Sign in with Google

Consider DeepAI Pro