Two-Dimensional Block Trees

03/04/2018 ∙ by Nieves R. Brisaboa, et al. ∙ Universidad de Chile ∙ Universidade da Coruña

The Block Tree (BT) is a novel compact data structure designed to compress sequence collections. It obtains compression ratios close to those of Lempel-Ziv and supports efficient direct access to any substring. The BT divides the text recursively into fixed-size blocks, and blocks whose content also appears earlier in the text are represented with pointers to those earlier occurrences. On repetitive collections, a few blocks can represent all the others, and thus the BT reduces the size by orders of magnitude. In this paper we extend the BT to two dimensions, to exploit repetitiveness in collections of images, graphs, and maps. This two-dimensional Block Tree divides the image regularly into subimages and replaces some of them by pointers to other occurrences thereof. We develop a specific variant aimed at compressing the adjacency matrices of Web graphs, obtaining space reductions of up to 50% compared with the k^2-tree, which is the best alternative supporting direct and reverse navigation in the graph.
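To make the idea of subdividing an image and replacing repeated subimages by pointers concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the structure developed in the paper: the matrix is assumed square with a power-of-two side, every block is split into 2×2 quadrants, and only duplicates that are aligned to this grid and already visited in traversal order are detected. The names `build`, `access`, and `leaf_side` are hypothetical and chosen for this example only.

```python
def build(matrix, r=0, c=0, side=None, seen=None, leaf_side=2):
    """Build a simplified two-dimensional block tree (illustrative only).

    Assumes a square matrix whose side is a power of two and >= leaf_side.
    A block whose content equals an already stored block of the same size
    becomes a pointer; otherwise it is split into four quadrants.
    """
    if side is None:
        side = len(matrix)
    if seen is None:
        seen = {}
    content = tuple(tuple(matrix[i][c:c + side]) for i in range(r, r + side))
    if side == leaf_side:
        # Smallest blocks are stored explicitly.
        return {"kind": "leaf", "bits": content}
    if content in seen:
        # Same content was stored earlier: keep only a pointer to it.
        return {"kind": "ptr", "target": seen[content]}
    node = {"kind": "node", "children": []}
    seen[content] = node
    half = side // 2
    for dr in (0, half):          # quadrant order: top-left, top-right,
        for dc in (0, half):      # bottom-left, bottom-right
            node["children"].append(
                build(matrix, r + dr, c + dc, half, seen, leaf_side))
    return node


def access(root, side, r, c):
    """Return the cell (r, c) without decompressing the whole matrix."""
    node = root
    while True:
        if node["kind"] == "ptr":
            # The target has the same size, so offsets stay unchanged.
            node = node["target"]
        elif node["kind"] == "leaf":
            return node["bits"][r][c]
        else:
            half = side // 2
            node = node["children"][2 * (r >= half) + (c >= half)]
            r, c, side = r % half, c % half, half


# A 4x4 tile repeated four times inside an 8x8 matrix.
tile = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 0]]
matrix = [row + row for row in tile] + [row + row for row in tile]
root = build(matrix)
print([child["kind"] for child in root["children"]])
# -> ['node', 'ptr', 'ptr', 'ptr']
```

In this toy input only the first quadrant is stored explicitly and the other three are pointers to it, while `access` still answers single-cell queries by following pointers. A practical structure would use compact bit-level encodings rather than Python dictionaries, and would not be restricted to grid-aligned duplicates.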




