Lightweight Fingerprints for Fast Approximate Keyword Matching Using Bitwise Operations

11/22/2017
by Aleksander Cisłak, et al.

We aim to speed up approximate keyword matching by storing a lightweight, fixed-size block of data for each string, called a fingerprint. Fingerprints work in a similar way to hash values; however, they can also be used for matching with errors. They store information about symbol occurrences in individual bits, and two fingerprints can be compared with a constant number of bitwise operations. In this way, certain strings can be deduced to be more than distance k apart (under the Hamming or Levenshtein distance) and rejected without performing an explicit verification. We show experimentally that for a preprocessed collection of strings, fingerprints can provide substantial speedups for k = 1: over 2.5 times for the Hamming distance and over 10 times for the Levenshtein distance. Tests were conducted on synthetic data as well as on real-world English and URL data.
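The following Python sketch roughly illustrates the filtering idea. It builds an occurrence fingerprint, which has one bit per tracked symbol, set if that symbol occurs in the string. It then compares two fingerprints with XOR and a population count. This is not the authors' implementation. The following are hypothetical choices made only for this example:

- the tracked symbol set and the 16-bit width;
- the names `fingerprint`, `may_match`, `levenshtein`, and `search`.

The rejection rule relies on a simple observation. An insertion or deletion changes at most one occurrence bit, and a substitution changes at most two. The distance between strings A and B is therefore at least ceil(popcount(f_A XOR f_B) / 2). If that bound exceeds k, the pair can be discarded without computing the distance. The same bound holds for the Hamming distance, which allows only substitutions.

```python
# Sketch of occurrence-fingerprint filtering for approximate keyword matching.
# Assumptions (illustrative only, not taken from the paper): a 16-bit
# fingerprint over a hand-picked set of letters, and plain dynamic-programming
# Levenshtein verification for candidates that survive the filter.

TRACKED = "etaoinshrdlcumwf"  # hypothetical set of 16 common English letters
BIT_OF = {ch: i for i, ch in enumerate(TRACKED)}


def fingerprint(s: str) -> int:
    """Set bit i if the tracked symbol TRACKED[i] occurs in s at least once."""
    fp = 0
    for ch in s:
        i = BIT_OF.get(ch)
        if i is not None:  # untracked symbols are simply ignored
            fp |= 1 << i
    return fp


def may_match(fp_a: int, fp_b: int, k: int) -> bool:
    """Return False only when the strings are provably more than k edits apart.

    One edit flips at most two occurrence bits, so the distance is at least
    ceil(popcount(fp_a ^ fp_b) / 2); that exceeds k iff popcount > 2k.
    """
    return bin(fp_a ^ fp_b).count("1") <= 2 * k


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping a single row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def search(pattern, words, fps, k):
    """Return the words within Levenshtein distance k of pattern.

    fps holds the precomputed fingerprints of words; only candidates that
    pass the cheap bitwise filter are verified explicitly.
    """
    fp_p = fingerprint(pattern)
    return [w for w, fp_w in zip(words, fps)
            if may_match(fp_p, fp_w, k) and levenshtein(pattern, w) <= k]


if __name__ == "__main__":
    words = ["house", "mouse", "horse", "hose", "zebra"]
    fps = [fingerprint(w) for w in words]  # preprocessing step, done once
    print(search("house", words, fps, 1))  # ['house', 'mouse', 'horse', 'hose']
```

In this toy collection, "zebra" differs from "house" in six tracked bits. It is therefore rejected by the bitwise test alone, and its distance is never computed. Because the filter only ever underestimates the distance, it never discards a true match. It can only save verification work.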

research
07/14/2021

Levenshtein Graphs: Resolvability, Automorphisms & Determining Sets

We introduce the notion of Levenshtein graphs, an analog to Hamming grap...
research
06/30/2021

String Comparison on a Quantum Computer Using Hamming Distance

The Hamming distance is ubiquitous in computing. Its computation gets ex...
research
05/06/2020

Quantum pattern matching Oracle construction

We propose a couple of oracle construction methods for quantum pattern m...
research
02/26/2023

Large-Block Modular Addition Checksum Algorithms

Checksum algorithms are widely employed due to their use of a simple alg...
research
11/27/2020

Limitations of Mean-Based Algorithms for Trace Reconstruction at Small Distance

Trace reconstruction considers the task of recovering an unknown string ...
research
11/06/2017

FAMOUS: Fast Approximate string Matching using OptimUm search Schemes

Finding approximate occurrences of a pattern in a text using a full-text...
research
02/15/2022

Constant-weight PIR: Single-round Keyword PIR via Constant-weight Equality Operators

Equality operators are an essential building block in tasks over secure ...