Generalised Pattern Matching Revisited

01/16/2020
by   Bartłomiej Dudek, et al.

In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text T of length n over an alphabet Σ_T, a pattern P of length m over an alphabet Σ_P, and a matching relationship ⊆ Σ_T × Σ_P, and must return all substrings of T that match P (reporting) or the number of mismatches between each substring of T of length m and P (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance:

* D, the maximum number of characters that match a fixed character,
* S, the number of pairs of matching characters,
* I, the total number of disjoint intervals of characters that match the m characters of the pattern P.

At the heart of our new deterministic upper bounds for D and S lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and may be of independent interest. Finally, we give the first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use Ω(S) time, and then show stronger lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
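To make the problem statement concrete, here is a minimal Python sketch of the reporting and counting variants using the naive O(nm) scan. It is only a baseline for illustration, not any of the algorithms from the paper. The names `gpm_naive` and `relation_parameters` are hypothetical, the relation is assumed to be given as a set of (text character, pattern character) pairs, and D is computed under one possible reading of its definition (the maximum taken over characters of both alphabets).

```python
from collections import defaultdict


def gpm_naive(text, pattern, relation):
    """Naive GPM baseline (hypothetical helper, not the paper's method).

    relation is a set of (text_char, pattern_char) pairs that match.
    Returns (occurrences, mismatches): mismatches[i] is the number of
    positions j where text[i + j] does not match pattern[j], and
    occurrences lists the start positions with zero mismatches.
    """
    n, m = len(text), len(pattern)
    mismatches = []
    for i in range(n - m + 1):
        # Counting variant: compare the window against the pattern.
        count = sum((text[i + j], pattern[j]) not in relation for j in range(m))
        mismatches.append(count)
    # Reporting variant: windows with no mismatches.
    occurrences = [i for i, count in enumerate(mismatches) if count == 0]
    return occurrences, mismatches


def relation_parameters(relation):
    """Return (D, S) for a relation given as a set of pairs.

    S is the number of matching pairs. D is taken here, as an assumption,
    to be the largest number of partners of any single character on
    either side of the relation.
    """
    by_text, by_pattern = defaultdict(set), defaultdict(set)
    for a, b in relation:
        by_text[a].add(b)
        by_pattern[b].add(a)
    groups = list(by_text.values()) + list(by_pattern.values())
    d = max((len(g) for g in groups), default=0)
    return d, len(relation)


if __name__ == "__main__":
    relation = {("a", "x"), ("b", "y"), ("c", "x")}
    print(gpm_naive("abcab", "xy", relation))  # ([0, 3], [0, 2, 1, 0])
    print(relation_parameters(relation))       # (2, 3)
```

In this toy instance the pattern character x matches both a and c, so D = 2 and S = 3; the windows starting at positions 0 and 3 match the pattern exactly.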

