String Attractors: Verification and Optimization

by Dominik Kempa, et al.

String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set Γ ⊆ [1..n] is a k-attractor for a string S ∈ [1..σ]^n if and only if every distinct substring of S of length at most k has an occurrence straddling at least one of the positions in Γ. Finding the smallest k-attractor is NP-hard for k ≥ 3, but polylogarithmic approximations can be found using reductions from dictionary compressors. It is easy to reduce the k-attractor problem to a set-cover instance in which the string's positions are interpreted as sets of substrings. The main result of this paper is a much more powerful reduction based on the truncated suffix tree. Our new characterization of the problem leads to more efficient algorithms for string attractors: we show how to check the validity and minimality of a k-attractor in near-optimal time and how to quickly compute exact and approximate solutions. For example, we prove that a minimum 3-attractor can be found in optimal O(n) time when σ ∈ O(n^(1/(3+ϵ))) for any constant ϵ > 0, and that a 2.45-approximation can be computed in O(n) time on general alphabets. To conclude, we introduce and study the complexity of the closely related sharp-k-attractor problem: finding the smallest set of positions capturing all distinct substrings of length exactly k. We show that this problem is in P for k = 1, 2 and is NP-complete for constant k ≥ 3.
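The Python sketch below illustrates the definitions in the abstract using only the "easy" set-cover view, not the truncated-suffix-tree reduction that is the paper's main result. For each position p it enumerates the distinct substrings of length at most k (or exactly k, for the sharp variant) that have an occurrence straddling p. All function names (`is_k_attractor`, `greedy_k_attractor`, `minimum_k_attractor`, and so on) and the `exact` flag are hypothetical. The greedy routine is the textbook set-cover heuristic, which only guarantees a logarithmic approximation factor, not the 2.45 factor stated above. The exhaustive search is exponential and is meant for tiny strings only. Positions are 1-based, matching Γ ⊆ [1..n].

```python
from itertools import combinations


def _lengths(k, exact):
    # Hypothetical helper. Sharp variant: only length k; otherwise lengths 1..k.
    if k < 1:
        raise ValueError("k must be at least 1")
    return [k] if exact else range(1, k + 1)


def all_substrings(s, k, exact=False):
    """Distinct substrings of s with length <= k (== k if exact)."""
    return {s[i:i + m] for m in _lengths(k, exact) for i in range(len(s) - m + 1)}


def straddling_substrings(s, p, k, exact=False):
    """Distinct substrings with an occurrence s[i..i+m-1] (1-based) such that
    i <= p <= i+m-1, i.e. the set of substrings that position p captures."""
    n = len(s)
    captured = set()
    for m in _lengths(k, exact):
        # Valid starts: the window must contain p and fit inside s.
        for i in range(max(1, p - m + 1), min(p, n - m + 1) + 1):
            captured.add(s[i - 1:i - 1 + m])
    return captured


def is_k_attractor(s, gamma, k, exact=False):
    """Naive check: the captured sets must jointly cover every distinct substring."""
    covered = set()
    for p in gamma:
        covered |= straddling_substrings(s, p, k, exact)
    return covered == all_substrings(s, k, exact)


def greedy_k_attractor(s, k, exact=False):
    """Textbook greedy set cover over positions (log-factor approximation)."""
    universe = all_substrings(s, k, exact)
    sets = {p: straddling_substrings(s, p, k, exact) for p in range(1, len(s) + 1)}
    gamma, covered = [], set()
    while covered != universe:
        # Pick the position capturing the most still-uncovered substrings
        # (ties go to the leftmost position, since max keeps the first maximum).
        p = max(sets, key=lambda q: len(sets[q] - covered))
        gamma.append(p)
        covered |= sets[p]
    return sorted(gamma)


def minimum_k_attractor(s, k, exact=False):
    """Exhaustive search by increasing size; exponential, tiny inputs only."""
    n = len(s)
    for size in range(n + 1):
        for gamma in combinations(range(1, n + 1), size):
            if is_k_attractor(s, gamma, k, exact):
                return list(gamma)
    return None  # not reached for k >= 1: all positions [1..n] always suffice
```

For instance, on S = abab with k = 2 the distinct substrings are a, b, ab and ba. Position 2 captures b, ab and ba but not a, so the smallest 2-attractor has two positions, {1, 2}. The smallest sharp 2-attractor only needs to capture ab and ba, so {2} suffices.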


At the Roots of Dictionary Compression: String Attractors

A well-known fact in the field of lossless text compression is that high...

The 2-Attractor Problem is NP-Complete

A k-attractor is a combinatorial object unifying dictionary-based compre...

When a Dollar Makes a BWT

The Burrows-Wheeler-Transform (BWT) is a reversible string transformatio...

String Attractors and Combinatorics on Words

The notion of string attractor has recently been introduced in [Prezza, ...

A theory of incremental compression

The ability to find short representations, i.e. to compress data, is cru...

Algorithms for Anti-Powers in Strings

A string S[1,n] is a power (or tandem repeat) of order k and period n/k ...

Compressibility measures for two-dimensional data

In this paper we extend to two-dimensional data two recently introduced ...
