Counting Distinct Patterns in Internal Dictionary Matching
We consider the problem of preprocessing a text T of length n and a dictionary 𝒟 in order to be able to efficiently answer queries CountDistinct(i,j), that is, given i and j, return the number of patterns from 𝒟 that occur in the fragment T[i..j]. The dictionary is internal in the sense that each pattern in 𝒟 is given as a fragment of T. This way, the dictionary takes space proportional to the number of patterns d=|𝒟| rather than their total length, which could be Θ(n·d). An Õ(n+d)-size data structure that answers CountDistinct(i,j) queries O(log n)-approximately in Õ(1) time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an Õ(n+d)-size data structure that answers CountDistinct(i,j) queries 2-approximately in Õ(1) time. Using range queries, for any m, we give an Õ(min(nd/m, n^2/m^2)+d)-size data structure that answers CountDistinct(i,j) queries exactly in Õ(m) time. We also consider the special case when the dictionary consists of all square factors of the string. We design an O(n log^2 n)-size data structure that allows us to count distinct squares in a text fragment T[i..j] in O(log n) time.
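To make the query semantics concrete, the following is a minimal brute-force sketch of CountDistinct(i,j) and of the distinct-squares special case. It is only a reference baseline for checking answers on small inputs, not any of the data structures described above. The function names, the 0-based inclusive indexing, and the representation of the dictionary as a list of (a, b) position pairs are assumptions made for illustration.

```python
def count_distinct(text, dictionary, i, j):
    """Naive CountDistinct(i, j) with 0-based, inclusive positions.

    `dictionary` is a list of (a, b) pairs; each pair denotes the
    pattern text[a..b], mirroring the "internal" representation.
    (Hypothetical interface, chosen for this sketch.)
    """
    fragment = text[i:j + 1]
    # Two different fragments may spell the same string; count it once.
    patterns = {text[a:b + 1] for a, b in dictionary}
    return sum(1 for p in patterns if p in fragment)


def count_distinct_squares(text, i, j):
    """Naive count of distinct squares (strings of the form uu) in text[i..j]."""
    fragment = text[i:j + 1]
    squares = set()
    for start in range(len(fragment)):
        # `half` is the length of u in a candidate square uu.
        for half in range(1, (len(fragment) - start) // 2 + 1):
            left = fragment[start:start + half]
            right = fragment[start + half:start + 2 * half]
            if left == right:
                squares.add(left + right)
    return len(squares)


if __name__ == "__main__":
    text = "abaabaab"
    # Patterns "ab" (listed twice, via two fragments), "ba", "aa", "aba".
    dictionary = [(0, 1), (3, 4), (1, 2), (2, 3), (0, 2)]
    print(count_distinct(text, dictionary, 0, 7))
    print(count_distinct_squares(text, 0, 7))
```

The first function spends time proportional to the total pattern length on every query, which is exactly the cost the internal representation and preprocessing are meant to avoid. It is still useful as a correctness oracle when testing a faster structure.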