Hidden Words Statistics for Large Patterns

03/21/2020
by   Svante Janson, et al.

We study so-called subsequence pattern matching, also known as hidden pattern matching, in which one searches for a given pattern w of length m as a subsequence in a random text of length n. The quantity of interest is the number of occurrences of w as a subsequence (i.e., occurring in not necessarily consecutive text locations). This problem has many applications, including intrusion detection, trace reconstruction, the deletion channel, and DNA-based storage systems. In all of these applications, the pattern w is of variable length. To the best of our knowledge, this problem has been tackled only for fixed length m = O(1) [Flajolet, Szpankowski and Vallée, 2006]. In our main result we prove that for m = o(n^{1/3}) the number of subsequence occurrences is asymptotically normally distributed. In addition, we show that under some constraints on the structure of w the asymptotic normality can be extended to m = o(√n). For a special pattern w consisting of a single repeated symbol, we indicate that for m = o(n) the distribution of the number of subsequence occurrences is either asymptotically normal or asymptotically log-normal. We conjecture that this dichotomy holds for all patterns. We use Hoeffding's projection method for U-statistics to prove our findings.
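
To make the quantity of interest concrete, the following is a minimal sketch of how the number of occurrences of w as a subsequence of a text can be counted with a standard dynamic program. It is an illustration only, not the method of the paper, which is probabilistic; the function name `count_subsequence_occurrences` is hypothetical.

```python
def count_subsequence_occurrences(text: str, pattern: str) -> int:
    """Count the index tuples i_1 < ... < i_m with text[i_k] == pattern[k-1]."""
    m = len(pattern)
    # ways[j] = number of ways the first j pattern symbols have been matched
    # as a subsequence of the text prefix scanned so far.
    ways = [0] * (m + 1)
    ways[0] = 1  # the empty prefix of the pattern is matched in exactly one way
    for ch in text:
        # Scan j from high to low so that one text position is used
        # at most once within any single occurrence.
        for j in range(m, 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[m]


if __name__ == "__main__":
    # "ab" occurs in "abab" at index pairs (0,1), (0,3), (2,3).
    print(count_subsequence_occurrences("abab", "ab"))
```

The sketch runs in O(nm) time and O(m) space. For a pattern consisting of one repeated symbol, the count reduces to a binomial coefficient in the number of times that symbol appears in the text, which is the special case singled out in the abstract.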

research
10/13/2020

Pattern statistics in faro words and permutations

We study the distribution and the popularity of some patterns in words o...
research
03/18/2021

The equidistribution of some Mahonian statistics over permutations avoiding a pattern of length three

We prove the equidistribution of several multistatistics over some class...
research
02/14/2019

On long words avoiding Zimin patterns

A pattern is encountered in a word if some infix of the word is the imag...
research
02/27/2022

Parallel algorithm for pattern matching problems under substring consistent equivalence relations

Given a text and a pattern over an alphabet, the pattern matching proble...
research
07/30/2018

A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics

From the output produced by a memoryless deletion channel from a uniform...
research
11/24/2018

OCLEP+: One-class Anomaly and Intrusion Detection Using Minimal Length of Emerging Patterns

This paper presents a method called One-class Classification using Lengt...
research
06/16/2022

On the Size of Balls and Anticodes of Small Diameter under the Fixed-Length Levenshtein Metric

The rapid development of DNA storage has brought the deletion and insert...