Faster Pattern Matching under Edit Distance

We consider the approximate pattern matching problem under the edit distance. Given a text T of length n, a pattern P of length m, and a threshold k, the task is to find the starting positions of all substrings of T that can be transformed to P with at most k edits. More than 20 years ago, Cole and Hariharan [SODA'98, J. Comput.'02] gave an 𝒪(n+k^4 · n/ m)-time algorithm for this classic problem, and this runtime has not been improved since. Here, we present an algorithm that runs in time 𝒪(n+k^3.5√(log m log k)· n/m), thus breaking through this long-standing barrier. In the case where n^1/4+ε≤ k ≤ n^2/5-ε for some arbitrarily small positive constant ε, our algorithm improves over the state-of-the-art by polynomial factors: it is polynomially faster than both the algorithm of Cole and Hariharan and the classic 𝒪(kn)-time algorithm of Landau and Vishkin [STOC'86, J. Algorithms'89]. We observe that the bottleneck case of the alternative 𝒪(n+k^4 · n/m)-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz [FOCS'20] is when the text and the pattern are (almost) periodic. Our new algorithm reduces this case to a new dynamic problem (Dynamic Puzzle Matching), which we solve by building on tools developed by Tiskin [SODA'10, Algorithmica'15] for the so-called seaweed monoid of permutation matrices. Our algorithm relies only on a small set of primitive operations on strings and thus also applies to the fully-compressed setting (where text and pattern are given as straight-line programs) and to the dynamic setting (where we maintain a collection of strings under creation, splitting, and concatenation), improving over the state of the art.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/17/2020

Faster Approximate Pattern Matching: A Unified Approach

Approximate pattern matching is a natural and well-studied problem on st...
research
10/08/2018

Approximate Online Pattern Matching in Sub-linear Time

We consider the approximate pattern matching problem under edit distance...
research
08/31/2022

Pattern matching under DTW distance

In this work, we consider the problem of pattern matching under the dyna...
research
05/22/2019

Cartesian Tree Matching and Indexing

We introduce a new metric of match, called Cartesian tree matching, whic...
research
07/27/2018

Faster Recovery of Approximate Periods over Edit Distance

The approximate period recovery problem asks to compute all approximate ...
research
06/10/2021

Small space and streaming pattern matching with k edits

In this work, we revisit the fundamental and well-studied problem of app...
research
08/06/2020

Fine-Grained Complexity of Regular Expression Pattern Matching and Membership

The currently fastest algorithm for regular expression pattern matching ...

Please sign up or login with your details

Forgot password? Click here to reset