Faster Approximate Pattern Matching: A Unified Approach

Approximate pattern matching is a natural and well-studied problem on strings: Given a text T, a pattern P, and a threshold k, find (the starting positions of) all substrings of T that are at distance at most k from P. We consider the two most fundamental string metrics: the Hamming distance and the edit distance. Under the Hamming distance, we search for substrings of T that have at most k mismatches with P, while under the edit distance, we search for substrings of T that can be transformed to P with at most k edits. Exact occurrences of P in T have a very simple structure: If we assume for simplicity that |T| < 3|P|/2 and trim T so that P occurs both as a prefix and as a suffix of T, then both P and T are periodic with a common period. However, an analogous characterization for the structure of occurrences with up to k mismatches was proved only recently by Bringmann et al. [SODA'19]: Either there are O(k^2) k-mismatch occurrences of P in T, or both P and T are at Hamming distance O(k) from strings with a common period O(m/k), where m = |P|. We tighten this characterization by showing that there are O(k) k-mismatch occurrences in the case when the pattern is not (approximately) periodic, and we lift it to the edit distance setting, where we tightly bound the number of k-edit occurrences by O(k^2) in the non-periodic case. Our proofs are constructive and let us obtain a unified framework for approximate pattern matching for both considered distances. We showcase the generality of our framework with results for the fully-compressed setting (where T and P are given as a straight-line program) and for the dynamic setting (where we extend a data structure of Gawrychowski et al. [SODA'18]).
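To make the two problem definitions concrete, here is a minimal Python sketch that reports k-mismatch and k-edit occurrences by brute force. It is not the framework from the paper: the Hamming variant is a naive O(nm) scan, and the edit-distance variant uses the classic Sellers-style dynamic program on the reversed strings so that it can report starting positions. The function names `k_mismatch_occurrences` and `k_edit_occurrences` are hypothetical, chosen only for this illustration.

```python
def k_mismatch_occurrences(text, pattern, k):
    """Start positions i where text[i:i+m] has at most k mismatches with pattern.

    Naive O(n*m) scan; illustrative only, not the paper's algorithm.
    """
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the threshold
        if mismatches <= k:
            result.append(i)
    return result


def k_edit_occurrences(text, pattern, k):
    """Start positions i such that some substring text[i:j] is within
    edit distance k of pattern.

    Runs a Sellers-style O(n*m) dynamic program on the reversed strings:
    an end position e in the reversed text is a start position n - e
    in the original text.
    """
    t, p = text[::-1], pattern[::-1]
    n, m = len(t), len(p)
    # col[r] = min edit distance between p[:r] and any substring of t
    # that ends at the current position e (initially e = 0).
    col = list(range(m + 1))
    starts = [n] if col[m] <= k else []
    for e in range(1, n + 1):
        new = [0] * (m + 1)  # new[0] = 0: a match may begin anywhere
        for r in range(1, m + 1):
            new[r] = min(
                col[r - 1] + (t[e - 1] != p[r - 1]),  # match or substitute
                col[r] + 1,                           # extra text character
                new[r - 1] + 1,                       # extra pattern character
            )
        col = new
        if col[m] <= k:
            starts.append(n - e)
    return sorted(starts)


if __name__ == "__main__":
    print(k_mismatch_occurrences("abcabcabd", "abd", 1))
    print(k_edit_occurrences("abcabcabd", "abd", 1))
```

Note that under the edit distance an occurrence may have a length other than |P|, so a position can be a k-edit occurrence without being a k-mismatch occurrence; this is one reason the edit-distance bounds in the abstract are larger.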

research
08/18/2022

Approximate Circular Pattern Matching

We consider approximate circular pattern matching (CPM, in short) under ...
research
07/27/2018

Faster Recovery of Approximate Periods over Edit Distance

The approximate period recovery problem asks to compute all approximate ...
research
04/06/2022

Faster Pattern Matching under Edit Distance

We consider the approximate pattern matching problem under the edit dist...
research
03/05/2021

Compressed Communication Complexity of Hamming Distance

We consider the communication complexity of the Hamming distance of two ...
research
05/13/2021

The Dynamic k-Mismatch Problem

The text-to-pattern Hamming distances problem asks to compute the Hammin...
research
06/13/2019

On Longest Common Property Preserved Substring Queries

We revisit the problem of longest common property preserving substring q...
research
11/18/2021

Hamming Distance Tolerant Content-Addressable Memory (HD-CAM) for Approximate Matching Applications

We propose a novel Hamming distance tolerant content-addressable memory ...
