Counting Distinct Patterns in Internal Dictionary Matching

05/12/2020 ∙ by Panagiotis Charalampopoulos, et al.

We consider the problem of preprocessing a text T of length n and a dictionary 𝒟 in order to be able to efficiently answer queries CountDistinct(i,j), that is, given i and j return the number of patterns from 𝒟 that occur in the fragment T[i..j]. The dictionary is internal in the sense that each pattern in 𝒟 is given as a fragment of T. This way, the dictionary takes space proportional to the number of patterns d=|𝒟| rather than their total length, which could be Θ(n·d). An 𝒪̃(n+d)-size data structure that answers CountDistinct(i,j) queries 𝒪(log n)-approximately in 𝒪̃(1) time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an 𝒪̃(n+d)-size data structure that answers CountDistinct(i,j) queries 2-approximately in 𝒪̃(1) time. Using range queries, for any m, we give an 𝒪̃(min(nd/m,n^2/m^2)+d)-size data structure that answers CountDistinct(i,j) queries exactly in 𝒪̃(m) time. We also consider the special case when the dictionary consists of all square factors of the string. We design an 𝒪(n log^2 n)-size data structure that allows us to count distinct squares in a text fragment T[i..j] in 𝒪(log n) time.
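To make the query semantics concrete, here is a minimal brute-force sketch in Python. It is a reference baseline only, not any of the data structures described above, and it rests on several assumptions not stated in the abstract: positions are 0-indexed and inclusive, the dictionary is passed as a list of `(a, b)` pairs denoting fragments `T[a..b]`, and two fragments that spell the same string count as one pattern. The function names `count_distinct` and `count_distinct_squares` are hypothetical.

```python
def count_distinct(text, dictionary, i, j):
    """Brute-force CountDistinct(i, j).

    `dictionary` is a list of (a, b) pairs, each naming the fragment
    text[a..b] (0-indexed, inclusive; an assumption of this sketch).
    Equal strings given by different fragments are counted once.
    """
    fragment = text[i:j + 1]
    # Materialise the patterns; a real internal dictionary would avoid this.
    patterns = {text[a:b + 1] for a, b in dictionary}
    return sum(1 for p in patterns if p in fragment)


def count_distinct_squares(text, i, j):
    """Brute-force count of distinct squares (strings of the form XX)
    occurring in text[i..j], 0-indexed and inclusive."""
    fragment = text[i:j + 1]
    n = len(fragment)
    squares = set()
    for start in range(n):
        # `half` is the length of X in a candidate square XX.
        for half in range(1, (n - start) // 2 + 1):
            left = fragment[start:start + half]
            right = fragment[start + half:start + 2 * half]
            if left == right:
                squares.add(left + right)
    return len(squares)


if __name__ == "__main__":
    T = "abaabaab"
    D = [(0, 1), (3, 4), (2, 3), (0, 7)]  # "ab", "ab", "aa", "abaabaab"
    print(count_distinct(T, D, 1, 4))       # fragment "baab"
    print(count_distinct_squares(T, 0, 7))  # whole text
```

Both functions take time polynomial in the fragment length per query, which is exactly the cost the preprocessing in the abstract is meant to avoid; the sketch is useful mainly as a correctness oracle when testing a faster structure.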


