Time-Space Tradeoffs for Finding a Long Common Substring

03/04/2020
by Stav Ben Nun, et al.

We consider the problem of finding, given two documents of total length n, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic O(n)-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require Θ(n) space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for n^{2/3} < s < n^{1-o(1)}, the LCS problem can be solved in O(s) space and O(n^2/s) time. Kociumaka et al. (ESA 2014) generalized this tradeoff to 1 ≤ s ≤ n, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length L of the sought LCS is large. For 1 ≤ s ≤ n, we show that the LCS problem can be solved in O(s) space and Õ(n^2/(L·s) + n) time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
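
To make the problem statement concrete, here is a minimal Python sketch of a textbook dynamic-programming baseline for LCS. It is not the algorithm of this paper, nor any of the suffix-tree or tradeoff algorithms cited above: it runs in time proportional to the product of the two document lengths and keeps rows whose length is that of the shorter document. The function name `longest_common_substring` is chosen here for illustration only.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest string that occurs contiguously in both a and b.

    Naive baseline: O(len(a) * len(b)) time, O(min(len(a), len(b))) extra space.
    """
    if len(b) > len(a):
        a, b = b, a  # keep the DP rows as short as the shorter document

    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    best_len, best_end = 0, 0  # best_end is an end index (exclusive) into b

    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the common suffix ending at a[i-2], b[j-2]
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr

    return b[best_end - best_len:best_end]
```

For example, `longest_common_substring("banana", "ananas")` returns `"anana"`. When several longest common substrings exist, this sketch returns just one of them.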


