Approximating Text-to-Pattern Hamming Distances

01/01/2020
by Timothy M. Chan, et al.

We revisit a fundamental problem in string matching: given a pattern of length $m$ and a text of length $n$, both over an alphabet of size $\sigma$, compute the Hamming distance between the pattern and the text at every location. Several $(1+\epsilon)$-approximation algorithms have been proposed in the literature, with running time of the form $O(\epsilon^{-O(1)} n \log n \log m)$, all using the fast Fourier transform (FFT). We describe a simple $(1+\epsilon)$-approximation algorithm that is faster and does not need FFT. Combining our approach with additional ideas leads to numerous new results:

- We obtain the first linear-time approximation algorithm; the running time is $O(\epsilon^{-2} n)$.
- We obtain a faster exact algorithm computing all Hamming distances up to a given threshold $k$; its running time improves previous results by logarithmic factors and is linear if $k < \sqrt{m}$.
- We obtain approximation algorithms with better $\epsilon$-dependence using rectangular matrix multiplication. The time bound is $\tilde{O}(n)$ when the pattern is sufficiently long, namely $m > \epsilon^{-28}$. Previous algorithms require $\tilde{O}(\epsilon^{-1} n)$ time.
- When $k$ is not too small, we obtain a truly sublinear-time algorithm to find all locations with Hamming distance approximately (up to a constant factor) less than $k$, in $O((n/k^{\Omega(1)} + \mathrm{occ})\, n^{o(1)})$ time, where $\mathrm{occ}$ is the output size. The algorithm leads to a property tester that returns true if an exact match exists and false if the Hamming distance is more than $\delta m$ at every location, running in $\tilde{O}(\delta^{-1/3} n^{2/3} + \delta^{-1} n/m)$ time.
- We obtain a streaming algorithm to report all locations with Hamming distance approximately less than $k$, using $\tilde{O}(\epsilon^{-2}\sqrt{k})$ space. Previously, streaming algorithms were known for the exact problem with $\tilde{O}(k)$ space or for the approximate problem with $\tilde{O}(\epsilon^{-O(1)}\sqrt{m})$ space.
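To make the problem statement concrete, the following is a minimal naive sketch in Python of the two exact quantities the abstract refers to: the Hamming distance at every location, and the threshold-$k$ variant that only reports distances up to $k$. It is not the algorithm from the paper, which is faster and whose details are not given here. It is only the straightforward $O(nm)$ baseline that the fast algorithms improve on. The function names `hamming_distances` and `k_mismatch` are hypothetical, and "location $i$" is assumed to mean aligning the pattern with `text[i:i+m]`.

```python
from typing import List, Optional


def hamming_distances(text: str, pattern: str) -> List[int]:
    """Naive O(n*m) baseline: the Hamming distance between `pattern` and the
    window text[i:i+m] for every alignment i = 0 .. n-m (hypothetical helper)."""
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Returns an empty list when the pattern is longer than the text.
    return [
        sum(a != b for a, b in zip(text[i:i + m], pattern))
        for i in range(n - m + 1)
    ]


def k_mismatch(text: str, pattern: str, k: int) -> List[Optional[int]]:
    """Threshold-k variant (hypothetical helper): the exact distance where it
    is at most k, and None where it exceeds k. Each alignment stops after
    k+1 mismatches, but the worst case is still O((n-m+1)*m) time."""
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    out: List[Optional[int]] = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # distance already exceeds k; skip the rest
        out.append(mismatches if mismatches <= k else None)
    return out


if __name__ == "__main__":
    print(hamming_distances("abracadabra", "abr"))  # [0, 3, 3, 2, 3, 2, 3, 0, 3]
    print(k_mismatch("abracadabra", "abr", 2))      # [0, None, None, 2, None, 2, None, 0, None]
```

A $(1+\epsilon)$-approximation algorithm of the kind described above would return, at each location, an estimate close to the corresponding entry of `hamming_distances` while avoiding the $m$ character comparisons per location that this baseline performs.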
