
Substring Query Complexity of String Reconstruction
Suppose an oracle knows a string S that is unknown to us and we want to ...

Towards a Definitive Measure of Repetitiveness
Unlike in statistical compression, where Shannon's entropy is a definiti...

Smoothed Analysis of Trie Height by Starlike PFAs
Tries are general purpose data structures for information retrieval. The...

String Attractors for Automatic Sequences
We show that it is decidable, given an automatic sequence s and a consta...

Approximate Online Pattern Matching in Sublinear Time
We consider the approximate pattern matching problem under edit distance...

Dynamic Palindrome Detection
Lately, there is a growing interest in dynamic string matching problems....

Practical evaluation of Lyndon factors via alphabet reordering
We evaluate the influence of different alphabet orderings on the Lyndon ...
Sensitivity of string compressors and repetitiveness measures
The sensitivity of a string compression algorithm C asks how much the output size C(T) for an input string T can increase when a single character edit operation is performed on T. This notion enables one to measure the robustness of compression algorithms in terms of errors and/or dynamic changes occurring in the input string. In this paper, we analyze the worst-case multiplicative sensitivity of string compression algorithms, defined by max_{T ∈ Σ^n} {C(T')/C(T) : ed(T, T') = 1}, where ed(T, T') denotes the edit distance between T and T'. For the most common versions of the Lempel-Ziv 77 compressors, we prove that the worst-case multiplicative sensitivity is only a small constant (2 or 3, depending on the version of Lempel-Ziv 77 and the edit operation type). We strengthen our upper bound results by presenting matching lower bounds on the worst-case sensitivity for all these major versions of the Lempel-Ziv 77 factorizations. This contrasts with previously known related results: the size z_78 of the Lempel-Ziv 78 factorization can increase by a factor of Ω(n^{3/4}) [Lagarde and Perifel, 2018], and the number r of runs in the Burrows-Wheeler transform can increase by a factor of Ω(log n) [Giuliani et al., 2021], when a character is prepended to an input string of length n. We also study the worst-case sensitivity of several grammar compression algorithms including Bisection, AVL-grammar, GCIS, and CDAWG. Further, we extend the notion of worst-case sensitivity to string repetitiveness measures such as the smallest string attractor size γ and the substring complexity δ, and present matching upper and lower bounds on the worst-case multiplicative sensitivity for γ and δ.
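To make the definition of worst-case multiplicative sensitivity concrete, the following is a minimal brute-force sketch in Python. It is not the paper's method. It assumes one particular variant of the parse (a greedy, self-referencing Lempel-Ziv 77 factorization in which a factor is either a fresh character or the longest prefix with an earlier occurrence), and the function names `lz77_factor_count`, `single_edits`, and `worst_case_sensitivity` are hypothetical, chosen only for illustration. The search enumerates all strings of length n, so it is only feasible for very small n and small alphabets.

```python
from fractions import Fraction
from itertools import product


def lz77_factor_count(t: str) -> int:
    """Count factors in a greedy, self-referencing LZ77 parse (assumed variant)."""
    i, count, n = 0, 0, len(t)
    while i < n:
        longest = 0
        # Longest prefix of t[i:] that also occurs starting at some j < i;
        # the earlier occurrence is allowed to overlap position i.
        for j in range(i):
            k = 0
            while i + k < n and t[j + k] == t[i + k]:
                k += 1
            longest = max(longest, k)
        i += max(longest, 1)  # a fresh character forms a factor of length 1
        count += 1
    return count


def single_edits(t: str, alphabet: str) -> set:
    """All strings T' with ed(T, T') = 1 over the given alphabet."""
    out = set()
    for i in range(len(t) + 1):
        for c in alphabet:
            out.add(t[:i] + c + t[i:])  # insertion
    for i in range(len(t)):
        out.add(t[:i] + t[i + 1:])  # deletion
        for c in alphabet:
            if c != t[i]:
                out.add(t[:i] + c + t[i + 1:])  # substitution
    return out


def worst_case_sensitivity(n: int, alphabet: str) -> Fraction:
    """max over T in alphabet^n and single edits T' of C(T') / C(T), for n >= 1."""
    best = Fraction(0)
    for chars in product(alphabet, repeat=n):
        t = "".join(chars)
        base = lz77_factor_count(t)
        for t_edited in single_edits(t, alphabet):
            best = max(best, Fraction(lz77_factor_count(t_edited), base))
    return best
```

Calling `worst_case_sensitivity(n, "ab")` for small n gives the exact maximum ratio for this parse variant over a binary alphabet. Results for such small inputs only illustrate the definition and say nothing about the asymptotic bounds stated in the abstract.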