MR-RePair: Grammar Compression based on Maximal Repeats

11/12/2018
by Isamu Furuya, et al.

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces, step by step, the most frequent pairs within the corresponding most frequent maximal repeats. We then design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar it generates with that generated by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.
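To make the pair-replacement step concrete, the following is a minimal sketch of a naive RePair-style loop, not the authors' implementation. It assumes quadratic-time rescanning instead of the priority queues used by practical implementations, an arbitrary tie-breaking rule (the first most frequent pair encountered), and hypothetical names (`count_pairs`, `replace_pair`, `repair`, `expand`). It does not implement MR-RePair, which according to the abstract would substitute a whole most frequent maximal repeat at once rather than one pair per round.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair in seq."""
    counts = Counter()
    last = {}  # position of the last counted occurrence of each pair
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last.get(pair) == i - 1:
            continue  # overlaps the previously counted occurrence (e.g. "aaa")
        counts[pair] += 1
        last[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of pair with symbol, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (final sequence, rules) for a string.

    Terminals are single-character strings; nonterminals are integers
    (an assumption of this sketch, chosen so the two never collide).
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break  # no pair repeats, so no further rule is useful
        symbol = len(rules)  # fresh nonterminal
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

For example, `repair("abababab")` first replaces the pair `("a", "b")` and then the pair of the resulting nonterminal with itself, giving two rules and a final sequence of two symbols. Concatenating `expand(s, rules)` over the final sequence reproduces the input.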


