
Fast entropy-bounded string dictionary lookup with mismatches
We revisit the fundamental problem of dictionary lookup with mismatches...

A Data-Structure for Approximate Longest Common Subsequence of A Set of Strings
Given a set of k strings I, their longest common subsequence (LCS) is th...

Generalized Dictionary Matching under Substring Consistent Equivalence Relations
Given a set of patterns called a dictionary and a text, the dictionary m...

Right-to-left online construction of parameterized position heaps
Two strings of equal length are said to parameterized match if there is ...

Cadences in Grammar-Compressed Strings
Cadences are structurally maximal arithmetic progressions of indices cor...

A QPTAS for Gapless MEC
We consider the problem Minimum Error Correction (MEC). A MEC instance i...

On Longest Common Property Preserved Substring Queries
We revisit the problem of longest common property preserving substring q...
Pattern Masking for Dictionary Matching
In the Pattern Masking for Dictionary Matching (PMDM) problem, we are given a dictionary $\mathcal{D}$ of $d$ strings, each of length $\ell$, a query string $q$ of length $\ell$, and a positive integer $z$, and we are asked to compute a smallest set $K \subseteq \{1,\ldots,\ell\}$, so that if $q[i]$, for all $i \in K$, is replaced by a wildcard, then $q$ matches at least $z$ strings from $\mathcal{D}$. The PMDM problem lies at the heart of two important applications featured in large-scale real-world systems: record linkage of databases that contain sensitive information, and query term dropping. In both applications, solving PMDM allows for providing data utility guarantees as opposed to existing approaches. We first show, through a reduction from the well-known $k$-Clique problem, that a decision version of the PMDM problem is NP-complete, even for strings over a binary alphabet. We present a data structure for PMDM that answers queries over $\mathcal{D}$ in time $\mathcal{O}(2^{\ell/2}(2^{\ell/2}+\tau)\ell)$ and requires space $\mathcal{O}(2^{\ell}d^2/\tau^2+2^{\ell/2}d)$, for any parameter $\tau \in [1,d]$. We also approach the problem from a more practical perspective. We show an $\mathcal{O}((d\ell)^{k/3}+d\ell)$-time and $\mathcal{O}(d\ell)$-space algorithm for PMDM if $k=|K|=\mathcal{O}(1)$. We generalize our exact algorithm to mask multiple query strings simultaneously. We complement our results by showing a two-way polynomial-time reduction between PMDM and the Minimum Union problem [Chlamtáč et al., SODA 2017]. This gives a polynomial-time $\mathcal{O}(d^{1/4+\epsilon})$-approximation algorithm for PMDM, which is tight under plausible complexity conjectures.
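
To make the problem statement concrete, the following is a minimal brute-force sketch of PMDM in Python. It is not the data structure or the exact algorithm from the paper: it simply tries every position set in order of increasing size, so it takes time exponential in the string length and is only meant to illustrate the definition. The function name `pmdm_brute_force` and the sample dictionary are hypothetical, chosen for illustration.

```python
from itertools import combinations


def pmdm_brute_force(dictionary, q, z):
    """Illustrative brute force for PMDM (hypothetical helper, not the paper's method).

    Returns a smallest set K of 1-based positions such that replacing q[i]
    by a wildcard for every i in K makes q match at least z dictionary
    strings, or None if no such set exists (that is, if z exceeds the
    dictionary size).
    """
    ell = len(q)
    if any(len(s) != ell for s in dictionary):
        raise ValueError("all dictionary strings must have the same length as q")

    # For each dictionary string, record the 0-based positions where it
    # differs from q. The masked query matches the string exactly when
    # all of these positions are masked.
    mismatches = [
        frozenset(i for i in range(ell) if s[i] != q[i]) for s in dictionary
    ]

    # Try candidate sets in order of increasing size, so the first hit is
    # a smallest one.
    for k in range(ell + 1):
        for positions in combinations(range(ell), k):
            masked = set(positions)
            matched = sum(1 for m in mismatches if m <= masked)
            if matched >= z:
                return {i + 1 for i in positions}  # report 1-based positions
    return None


if __name__ == "__main__":
    sample = ["aab", "abb", "bbb", "aaa"]  # assumed toy dictionary
    print(pmdm_brute_force(sample, "aab", 2))
```

Precomputing the mismatch positions reduces each candidate check to a subset test per dictionary string. The exhaustive enumeration over all subsets is the part that the paper's data structure and algorithms are designed to avoid.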