Compact Data Structures for Shortest Unique Substring Queries
Given a string T of length n, a substring u = T[i.. j] of T is called a shortest unique substring (SUS) for an interval [s, t] if (a) u occurs exactly once in T, (b) u contains the interval [s, t] (i.e. i ≤ s ≤ t ≤ j), and (c) every substring v of T with |v| < |u| containing [s, t] occurs at least twice in T. Given a query interval [s, t] ⊂ [1, n], the interval SUS problem is to output all the SUSs for the interval [s, t]. In this article, we propose a 4n + o(n) bits data structure answering an interval SUS query in output-sensitive O(occ) time, where occ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for s = t. Here, we propose a (log2 3 + 1)n + o(n) bits data structure answering a point SUS query in the same output-sensitive time.
READ FULL TEXT