b-Bit Sketch Trie: Scalable Similarity Search on Integer Sketches

10/18/2019
by Shunsuke Kanda, et al.

Randomly mapping vectorial data to strings of discrete symbols (i.e., sketches) has recently become popular for fast and space-efficient similarity search. Such a random mapping is called similarity-preserving hashing, and it approximates a similarity metric by the Hamming distance between sketches. Although many efficient similarity search methods have been proposed, most of them are designed for binary sketches; similarity search on integer sketches is still in its infancy. In this paper, we present a novel space-efficient trie on integer sketches, named the b-bit sketch trie, for scalable similarity search. It leverages the idea behind succinct data structures (i.e., space-efficient data structures that support various operations directly on the compressed format) and a favorable property of integer sketches, namely that they are fixed-length strings. A trie-based index is built from the integer sketches, and a similarity search on the index prunes useless portions of the search space, which greatly improves both the search time and the space efficiency. Our experimental results on real-world datasets show that our similarity search is up to one order of magnitude faster than state-of-the-art similarity searches. Moreover, our method needs only 10 GiB of memory on a billion-scale database, whereas state-of-the-art similarity searches need 29 GiB.
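To make the pruning idea concrete, here is a minimal sketch of a Hamming-distance threshold search over a trie of fixed-length integer sketches. It is only an illustration under stated assumptions: it uses a plain pointer-based trie built from Python dictionaries, not the succinct b-bit sketch trie of the paper, and the names `SketchTrie`, `insert`, `search`, and `tau` (the Hamming distance threshold) are hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length sketches differ."""
    return sum(x != y for x, y in zip(a, b))


class SketchTrie:
    """Illustrative (non-succinct) trie over fixed-length integer sketches."""

    def __init__(self, length: int) -> None:
        if length < 1:
            raise ValueError("sketch length must be at least 1")
        self.length = length
        self.root: dict = {}

    def insert(self, sketch: Sequence[int], ident: int) -> None:
        """Store `ident` under the path spelled by `sketch`."""
        if len(sketch) != self.length:
            raise ValueError("all sketches must have the same length")
        node = self.root
        for sym in sketch[:-1]:
            node = node.setdefault(sym, {})
        # The last level maps a symbol to the list of ids sharing the sketch.
        node.setdefault(sketch[-1], []).append(ident)

    def search(self, query: Sequence[int], tau: int) -> List[int]:
        """Return sorted ids whose sketch is within Hamming distance `tau`."""
        if len(query) != self.length:
            raise ValueError("query must have the same length as the sketches")
        results: List[int] = []
        stack = [(self.root, 0, 0)]  # (node, depth, mismatches so far)
        while stack:
            node, depth, dist = stack.pop()
            for sym, child in node.items():
                d = dist + (sym != query[depth])
                if d > tau:
                    continue  # prune: no sketch below this branch can qualify
                if depth + 1 == self.length:
                    results.extend(child)
                else:
                    stack.append((child, depth + 1, d))
        return sorted(results)


# Toy database of 2-bit sketches (symbols in 0..3) of length 4.
database = [(0, 1, 2, 3), (0, 1, 2, 0), (3, 3, 3, 3), (0, 1, 0, 0)]
trie = SketchTrie(length=4)
for ident, sketch in enumerate(database):
    trie.insert(sketch, ident)

query = (0, 1, 2, 3)
print(trie.search(query, tau=1))
```

Because sketches sharing a prefix share a path, the mismatch count for that prefix is computed once, and a branch is abandoned as soon as the count exceeds `tau`. The fixed sketch length means every leaf sits at the same depth, which is the property the abstract refers to.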

research
09/24/2020

Dynamic Similarity Search on Integer Sketches

Similarity-preserving hashing is a core technique for fast similarity se...
research
05/21/2020

Succinct Trit-array Trie for Scalable Trajectory Similarity Search

Massive datasets of spatial trajectories representing the mobility of a ...
research
04/12/2018

Fast Prefix Search in Little Space, with Applications

It has been shown in the indexing literature that there is an essential ...
research
03/25/2019

Algorithms to compute the Burrows-Wheeler Similarity Distribution

The Burrows-Wheeler transform (BWT) is a well studied text transformatio...
research
10/18/2019

The Bitwise Hashing Trick for Personalized Search

Many real world problems require fast and efficient lexical comparison o...
research
11/30/2016

Fast Supervised Discrete Hashing and its Analysis

In this paper, we propose a learning-based supervised discrete hashing m...
research
12/18/2022

AutoSlicer: Scalable Automated Data Slicing for ML Model Analysis

Automated slicing aims to identify subsets of evaluation data where a tr...
