Lower bounds for text indexing with mismatches and differences

12/21/2018
by Vincent Cohen-Addad, et al.

In this paper we study lower bounds for the fundamental problem of text indexing with mismatches and differences. In this problem we are given a long string of length n, the "text", and the task is to preprocess it into a data structure such that, given a query string Q, one can quickly identify substrings that are within Hamming or edit distance at most k from Q. This problem is at the core of various problems arising in biology and text processing. While exact text indexing allows linear-size data structures with linear query time, text indexing with k mismatches (or k differences) seems to be much harder: all known data structures have exponential dependency on k in either the space or the time bound. We provide conditional and pointer-machine lower bounds that make a step toward explaining this phenomenon. We start by demonstrating lower bounds for k = Θ(log n). We show that, assuming the Strong Exponential Time Hypothesis, any data structure for text indexing that can be constructed in polynomial time cannot have O(n^(1-δ)) query time, for any δ > 0. This bound also extends to the setting where we only ask for (1+ε)-approximate solutions for text indexing. However, in many applications the value of k is rather small, and one might hope that for small k we can develop more efficient solutions. We show that this would require a radically new approach, as with the current methods one cannot avoid exponential dependency on k in either the space or the time bound for all even k with (8/√3)·√(log n) < k = o(log n). Our lower bounds also apply to the dictionary look-up problem, where instead of a text one is given a set of strings.
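
To make the query in the problem statement concrete, here is a minimal sketch of the k-mismatch (Hamming distance) variant answered by brute force. It is not a data structure from the paper: it does no preprocessing and simply scans the text for every query, which is the baseline an index is meant to beat. The function names `hamming_within` and `k_mismatch_occurrences` are hypothetical, chosen only for illustration, and the edit-distance variant is not covered.

```python
def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                # Stop early once the mismatch budget is exceeded.
                return False
    return True


def k_mismatch_occurrences(text: str, query: str, k: int) -> list[int]:
    """Return every start position i such that text[i:i+len(query)]
    is within Hamming distance k of query (illustrative linear scan)."""
    m = len(query)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming_within(text[i:i + m], query, k)
    ]


if __name__ == "__main__":
    # Exact matching (k = 0) versus matching with one allowed mismatch.
    print(k_mismatch_occurrences("banana", "nana", 0))
    print(k_mismatch_occurrences("banana", "nana", 1))
```

Each query here costs time proportional to the text length times the query length. The lower bounds in the abstract concern how far preprocessing can improve on this kind of scan without paying exponentially in k.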

