Counting Distinct Patterns in Internal Dictionary Matching

05/12/2020 ∙ by Panagiotis Charalampopoulos, et al.

We consider the problem of preprocessing a text T of length n and a dictionary 𝒟 in order to be able to efficiently answer queries CountDistinct(i,j), that is, given i and j return the number of patterns from 𝒟 that occur in the fragment T[i..j]. The dictionary is internal in the sense that each pattern in 𝒟 is given as a fragment of T. This way, the dictionary takes space proportional to the number of patterns d=|𝒟| rather than their total length, which could be Θ(n·d). An 𝒪̃(n+d)-size data structure that answers CountDistinct(i,j) queries 𝒪(log n)-approximately in 𝒪̃(1) time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an 𝒪̃(n+d)-size data structure that answers CountDistinct(i,j) queries 2-approximately in 𝒪̃(1) time. Using range queries, for any m, we give an 𝒪̃(min(nd/m, n^2/m^2)+d)-size data structure that answers CountDistinct(i,j) queries exactly in 𝒪̃(m) time. We also consider the special case when the dictionary consists of all square factors of the string. We design an 𝒪(n log^2 n)-size data structure that allows us to count distinct squares in a text fragment T[i..j] in 𝒪(log n) time.
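To make the query semantics concrete, here is a minimal brute-force sketch of CountDistinct(i,j) over an internal dictionary, along with a naive way to build the dictionary of all square factors. It is not the data structure from the paper: it does no preprocessing and takes time polynomial in n per query. It is only meant as a reference for what the query returns. The function names `count_distinct` and `square_dictionary`, the 0-based inclusive indexing, and the choice to count equal fragments as a single pattern are assumptions made for this illustration.

```python
def count_distinct(text, dictionary, i, j):
    """Naive CountDistinct(i, j) on the fragment text[i..j].

    Assumptions for this sketch: indices are 0-based and inclusive, and
    `dictionary` is a list of (a, b) pairs, each standing for the fragment
    text[a..b]. Two pairs that spell the same string count as one pattern.
    """
    fragment = text[i:j + 1]
    # Materialise the patterns as strings; a set removes duplicates.
    patterns = {text[a:b + 1] for a, b in dictionary}
    return sum(1 for p in patterns if p in fragment)


def square_dictionary(text):
    """Naively list one (a, b) fragment for each distinct square in text.

    A square is a non-empty string of the form UU.
    """
    n = len(text)
    first_occurrence = {}
    for a in range(n):
        for half in range(1, (n - a) // 2 + 1):
            if text[a:a + half] == text[a + half:a + 2 * half]:
                square = text[a:a + 2 * half]
                first_occurrence.setdefault(square, (a, a + 2 * half - 1))
    return list(first_occurrence.values())


if __name__ == "__main__":
    text = "abracadabra"
    # "abra" (listed twice, as two different fragments), "c", and "br".
    dictionary = [(0, 3), (7, 10), (4, 4), (1, 2)]
    print(count_distinct(text, dictionary, 0, 10))  # 3
    print(count_distinct(text, dictionary, 1, 5))   # 2: "br" and "c" occur in "braca"

    squares = square_dictionary("aabaab")           # squares "aa" and "aabaab"
    print(count_distinct("aabaab", squares, 1, 4))  # 1: only "aa" occurs in "abaa"
```

Each pattern is stored as a pair of positions rather than as a string, which mirrors why an internal dictionary takes space proportional to d and not to the total pattern length.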
