Gapped String Indexing in Subquadratic Space and Sublinear Query Time

11/30/2022
by   Philip Bille, et al.
0

In Gapped String Indexing, the goal is to compactly represent a string S of length n such that given queries consisting of two strings P_1 and P_2, called patterns, and an integer interval [α, β], called gap range, we can quickly find occurrences of P_1 and P_2 in S with distance in [α, β]. Due to the many applications of this fundamental problem in computational biology and elsewhere, there is a great body of work for restricted or parameterised variants of the problem. However, for the general problem statement, no improvements upon the trivial 𝒪(n)-space 𝒪(n)-query time or Ω(n^2)-space Õ(|P_1| + |P_2| + occ)-query time solutions were known so far. We break this barrier obtaining interesting trade-offs with polynomially subquadratic space and polynomially sublinear query time. In particular, we show that, for every 0≤δ≤ 1, there is a data structure for Gapped String Indexing with either Õ(n^2-δ/3) or Õ(n^3-2δ) space and Õ(|P_1| + |P_2| + n^δ· (occ+1)) query time, where occ is the number of reported occurrences. As a new fundamental tool towards obtaining our main result, we introduce the Shifted Set Intersection problem: preprocess a collection of sets S_1, …, S_k of integers such that given queries consisting of three integers i,j,s, we can quickly output YES if and only if there exist a ∈ S_i and b ∈ S_j with a+s = b. We start by showing that the Shifted Set Intersection problem is equivalent to the indexing variant of 3SUM (3SUM Indexing) [Golovnev et al., STOC 2020]. Via several steps of reduction we then show that the Gapped String Indexing problem reduces to polylogarithmically many instances of the Shifted Set Intersection problem.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/08/2020

String Indexing for Top-k Close Consecutive Occurrences

The classic string indexing problem is to preprocess a string S into a c...
research
07/25/2019

The Strong 3SUM-INDEXING Conjecture is False

In the 3SUM-Indexing problem the goal is to preprocess two lists of elem...
research
02/04/2021

Gapped Indexing for Consecutive Occurrences

The classic string indexing problem is to preprocess a string S into a c...
research
04/03/2023

Compressed Indexing for Consecutive Occurrences

The fundamental question considered in algorithms on strings is that of ...
research
12/21/2018

Lower bounds for text indexing with mismatches and differences

In this paper we study lower bounds for the fundamental problem of text ...
research
06/28/2020

Random Access in Persistent Strings

We consider compact representations of collections of similar strings th...
research
04/09/2019

Suffix Trees, DAWGs and CDAWGs for Forward and Backward Tries

The suffix tree, DAWG, and CDAWG are fundamental indexing structures of ...

Please sign up or login with your details

Forgot password? Click here to reset