Split block Bloom filters

01/04/2021 ∙ by Jim Apple, et al. ∙ The Apache Software Foundation

This short note describes a Bloom filter variant that takes advantage of modern SIMD instructions to increase speed by 30 […]. The variant, the split block Bloom filter, is used by Apache Impala, Apache Kudu, Apache Parquet, and Apache Arrow.
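As a rough illustration of the kind of structure the note is about, the following is a minimal scalar sketch. It assumes the block layout given in the Apache Parquet format description of split block Bloom filters: each block is 256 bits, split into eight 32-bit words, and every key sets exactly one bit in each word. The multiplier constants are the ones listed there, as best recalled. The `hash64` helper, the class name, and the method names are hypothetical, and the BLAKE2b-based hash is only a stand-in for whatever 64-bit hash a real system uses. A production implementation would compute the eight word masks with SIMD instructions instead of a Python loop.

```python
import hashlib

# Eight odd 32-bit multipliers, one per word of a block. Assumption: these
# follow the Apache Parquet format description, not this note itself.
SALT = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
        0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)


def hash64(key: bytes) -> int:
    """Stand-in 64-bit hash (hypothetical choice for this sketch)."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "little")


class SplitBlockBloomFilter:
    """Scalar sketch: each block is eight 32-bit words (256 bits)."""

    def __init__(self, num_blocks: int):
        if num_blocks < 1:
            raise ValueError("num_blocks must be positive")
        self.num_blocks = num_blocks
        self.blocks = [[0] * 8 for _ in range(num_blocks)]

    @staticmethod
    def _mask(low32: int) -> list:
        # One bit per word: the top 5 bits of a 32-bit multiply pick the bit.
        return [1 << (((low32 * s) & 0xFFFFFFFF) >> 27) for s in SALT]

    def _block_index(self, h: int) -> int:
        # Map the upper 32 hash bits onto [0, num_blocks) without a modulo.
        return ((h >> 32) * self.num_blocks) >> 32

    def insert(self, key: bytes) -> None:
        h = hash64(key)
        block = self.blocks[self._block_index(h)]
        for i, bit in enumerate(self._mask(h & 0xFFFFFFFF)):
            block[i] |= bit

    def might_contain(self, key: bytes) -> bool:
        h = hash64(key)
        block = self.blocks[self._block_index(h)]
        return all(block[i] & bit
                   for i, bit in enumerate(self._mask(h & 0xFFFFFFFF)))
```

Because every key touches a single 256-bit block and exactly one bit in each of its eight words, the insert and lookup loops map naturally onto one 256-bit vector multiply, shift, and OR or AND-test. That is the property a SIMD version would exploit.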








Thank you to Daniel Lemire for helpful discussions and inspiring questions.


  • [BFJ+12] M. A. Bender, M. Farach-Colton, R. Johnson, R. Kraner, B. C. Kuszmaul, D. Medjedovic, P. Montes, P. Shetty, R. P. Spillane, and E. Zadok (2012) Don’t thrash: how to cache your hash on flash. Proceedings of the VLDB Endowment 5 (11).
  • [BM04] A. Broder and M. Mitzenmacher (2004) Network applications of Bloom filters: a survey. Internet Mathematics 1 (4), pp. 485–509.
  • [DHK+97] M. Dietzfelbinger, T. Hagerup, J. Katajainen, and M. Penttonen (1997) A reliable randomized algorithm for the closest-pair problem. Journal of Algorithms 25 (1), pp. 19–51.
  • [FAK+14] B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher (2014) Cuckoo filter: practically better than Bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, pp. 75–88.
  • [GL20] T. M. Graf and D. Lemire (2020) Xor filters: faster and smaller than Bloom and cuckoo filters. ACM Journal of Experimental Algorithmics 25. ISSN 1084-6654.
  • [PSS10] F. Putze, P. Sanders, and J. Singler (2010) Cache-, hash-, and space-efficient Bloom filters. Journal of Experimental Algorithmics (JEA) 14, pp. 4–4.