Approximate Circular Pattern Matching

We consider approximate circular pattern matching (CPM, for short) under the Hamming and edit distances: given a length-n text T, a length-m pattern P, and a threshold k>0, we are to report all starting positions of fragments of T (called occurrences) that are at distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check whether any such occurrence exists. All previous results for approximate CPM were either average-case upper bounds or heuristics, except for the work of Charalampopoulos et al. [CKP^+, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem under the Hamming distance, we improve upon the main algorithm of [CKP^+, JCSS'21] from O(n+(n/m)· k^4) to O(n+(n/m)· k^3 log log k) time; for the edit distance, we give an O(nk^2)-time algorithm. We also consider the decision version of the approximate CPM problem. Under the Hamming distance, we obtain an O(n+(n/m)· k^2 log k/log log k)-time algorithm, which nearly matches the running time of the algorithm by Chan et al. [CGKKP, STOC'20] for the standard counterpart of the problem. Under the edit distance, the O(nk log^3 k) runtime of our algorithm nearly matches the O(nk) runtime of the Landau-Vishkin algorithm [LV, J. Algorithms'89]. As a stepping stone, we present an O(nk log^3 k)-time algorithm for the Longest Prefix k'-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all k'∈{1,…,k}. We give a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under the edit distance would refute SETH.
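
To make the problem statement concrete, the following is a minimal brute-force sketch of the reporting and decision versions under the Hamming distance. It is only a reference baseline that follows the definition directly, running in roughly O(n·m^2) time; it is not the algorithm of the paper and achieves none of the bounds stated above. The function names (`hamming`, `circular_k_mismatch_occurrences`, `has_circular_k_mismatch_occurrence`) are hypothetical and chosen for illustration, and positions are 0-based by assumption.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def circular_k_mismatch_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Report every 0-based start i such that text[i:i+m] is within
    Hamming distance k of some cyclic rotation of the pattern.

    Naive baseline: tries every rotation against every length-m window.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # All distinct cyclic rotations of the pattern.
    rotations = {pattern[r:] + pattern[:r] for r in range(m)}
    occurrences = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if any(hamming(window, rot) <= k for rot in rotations):
            occurrences.append(i)
    return occurrences


def has_circular_k_mismatch_occurrence(text: str, pattern: str, k: int) -> bool:
    """Decision version: does any occurrence exist?"""
    return bool(circular_k_mismatch_occurrences(text, pattern, k))
```

For instance, `circular_k_mismatch_occurrences("aaaa", "ab", 1)` reports every window of length 2, because each window `"aa"` differs from the rotation `"ab"` in exactly one position, whereas no window qualifies at distance 0. The edit-distance variant is not covered by this sketch, since occurrences there may have lengths other than m.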

research
09/02/2022

Elastic-Degenerate String Matching with 1 Error

An elastic-degenerate string is a sequence of n finite sets of strings o...
research
07/03/2019

Circular Pattern Matching with k Mismatches

The k-mismatch problem consists in computing the Hamming distance betwee...
research
04/17/2020

Faster Approximate Pattern Matching: A Unified Approach

Approximate pattern matching is a natural and well-studied problem on st...
research
07/27/2018

Faster Recovery of Approximate Periods over Edit Distance

The approximate period recovery problem asks to compute all approximate ...
research
06/24/2020

Improved Circular k-Mismatch Sketches

The shift distance 𝗌𝗁(S_1,S_2) between two strings S_1 and S_2 of the sa...
research
07/15/2022

Matching Patterns with Variables Under Edit Distance

A pattern α is a string of variables and terminal letters. We say that α...
research
05/13/2020

k-Approximate Quasiperiodicity under Hamming and Edit Distance

Quasiperiodicity in strings was introduced almost 30 years ago as an ext...