Computing NP-hard Repetitiveness Measures via MAX-SAT

07/06/2022
by Hideo Bannai, et al.

Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, computing some of these measures is NP-hard, and straightforward computation is infeasible even for small datasets. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exists, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and for texts of length even over a million for string attractors.
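
To make the string attractor measure concrete, here is a minimal Python sketch of the straightforward exhaustive computation that the abstract calls infeasible beyond small inputs. It uses the standard definition: a set of text positions is a string attractor if every distinct substring has at least one occurrence that covers a chosen position. Each distinct substring therefore behaves like a hard clause ("choose at least one of these positions"), and minimizing the number of chosen positions plays the role of the soft clauses in a MAX-SAT instance. This is only an illustration under those assumptions, not the formulation or implementation from the paper; the function names `substring_cover_sets`, `is_attractor`, and `smallest_attractor` are hypothetical, and positions are 0-indexed.

```python
from itertools import combinations


def substring_cover_sets(text):
    """For each distinct substring, collect the positions covered by any of its occurrences."""
    n = len(text)
    cover = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            # text[i:j] occurs at positions i..j-1; merge over all occurrences.
            cover.setdefault(text[i:j], set()).update(range(i, j))
    return list(cover.values())


def is_attractor(text, positions):
    """Check that every distinct substring has an occurrence covering a chosen position."""
    chosen = set(positions)
    return all(chosen & covered for covered in substring_cover_sets(text))


def smallest_attractor(text):
    """Exhaustive search over position sets of increasing size (exponential time)."""
    n = len(text)
    clauses = substring_cover_sets(text)
    for k in range(n + 1):
        for candidate in combinations(range(n), k):
            chosen = set(candidate)
            if all(chosen & covered for covered in clauses):
                return list(candidate)
    return list(range(n))  # not reached: the full position set is always an attractor


if __name__ == "__main__":
    print(smallest_attractor("abab"))
```

The search tries up to 2^n position sets and enumerates O(n^2) substrings, which is why it is only usable on very short texts; handing the same clause structure to a MAX-SAT solver is the kind of approach the abstract describes for scaling exact computation.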


