RePair in Compressed Space and Time

11/05/2018
by Kensuke Sakai, et al.

Given a string T of length N, the goal of grammar compression is to construct a small context-free grammar that generates only T. Among existing grammar compression methods, RePair (recursive pairing) [Larsson and Moffat, 1999] is notable for achieving good compression ratios in practice. Although the original paper already gave a time-optimal algorithm that computes the RePair grammar RePair(T) in expected O(N) time, reducing its working space remains an active research topic, since lower space is what makes RePair applicable to large-scale data. In this paper, we propose the first RePair algorithm that works in compressed space, i.e., potentially o(N) space for highly compressible texts. The key idea is a new way to restructure an arbitrary grammar S for T into RePair(T) in compressed space and time. Based on the recompression technique, we propose an algorithm that computes RePair(T) in O(min(N, nm log N)) space and either expected O(min(N, nm log N) m) time or O(min(N, nm log N) log log N) time, where n is the size of S and m is the number of variables in RePair(T). We implemented the variant that runs in O(min(N, nm log N) m) time and show that it can actually run in compressed space. We also present a new approach that combines our algorithms with existing RePair algorithms to reduce their peak memory usage, and show that it outperforms, in both computation time and space, the most space-efficient linear-time RePair implementation to date.
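To make the object RePair(T) concrete, the following is a minimal sketch of the basic RePair scheme of Larsson and Moffat: repeatedly replace the most frequent pair of adjacent symbols with a fresh variable until no pair occurs twice. It is a naive version that rescans the sequence in every round, not the expected O(N)-time algorithm and not the compressed-space algorithm proposed in the paper. The function names (`count_pairs`, `replace_pair`, `naive_repair`, `expand`), the variable naming scheme `X0, X1, ...`, and the tie-breaking rule (first pair encountered) are assumptions made for illustration.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    last_start = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # A pair of identical symbols can overlap itself (e.g. "aaa");
        # skip an occurrence that overlaps the one counted just before it.
        if pair[0] == pair[1] and last_start.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_start[pair] = i
    return counts


def replace_pair(seq, pair, new_symbol):
    """Replace every non-overlapping occurrence of `pair`, left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def naive_repair(text):
    """Return (final sequence, rules), where rules maps a variable to its pair.

    Terminals are single characters; variables are named "X0", "X1", ...
    (an illustrative convention, so they never collide with terminals).
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # Ties are broken by first occurrence; RePair permits any choice.
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        var = f"X{len(rules)}"
        rules[var] = pair
        seq = replace_pair(seq, pair, var)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the substring of the text it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    text = "abracadabra"
    seq, rules = naive_repair(text)
    print(seq, rules)
    # The grammar must generate exactly the original text.
    assert "".join(expand(s, rules) for s in seq) == text
```

In the abstract's notation, m corresponds to the number of entries in `rules`. This sketch holds the whole sequence in memory, which is exactly the Ω(N) working space that a compressed-space algorithm aims to avoid.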


