A Grammar Compression Algorithm based on Induced Suffix Sorting

We introduce GCIS, a grammar compression algorithm based on SAIS, the induced suffix sorting algorithm introduced by Nong et al. in 2009. Our solution builds on the factorization that SAIS performs during suffix sorting. We construct a context-free grammar for the input string, which can then be reduced to a shorter string by replacing each substring with its corresponding factor. The resulting grammar is encoded by exploiting redundancies, such as common prefixes between suffix rules, which are sorted according to the SAIS framework. Compared with well-known compression tools such as Re-Pair and 7-zip, our algorithm is competitive and very effective at handling repetitive strings with respect to compression ratio and to compression and decompression running time.
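
The factor-and-substitute step described above can be illustrated with a minimal sketch of a single round. It is not the authors' implementation, and it rests on several simplifying assumptions: no sentinel is appended (the last character is treated as L-type), the text is cut at LMS positions into non-overlapping factors, distinct factors are named by a plain lexicographic sort rather than by induced sorting, and the grammar is kept as a plain dictionary with no rule encoding. The function names (`s_type_flags`, `lms_positions`, `gcis_round`, `expand_round`) are hypothetical.

```python
def s_type_flags(text):
    """Mark each position True if its suffix is S-type, False if L-type.

    Assumption: no sentinel is appended, so the last position is L-type.
    """
    n = len(text)
    is_s = [False] * n
    for i in range(n - 2, -1, -1):
        if text[i] < text[i + 1]:
            is_s[i] = True
        elif text[i] == text[i + 1]:
            is_s[i] = is_s[i + 1]  # equal characters inherit the next type
    return is_s


def lms_positions(text):
    """Positions i > 0 that are S-type and preceded by an L-type position."""
    is_s = s_type_flags(text)
    return [i for i in range(1, len(text)) if is_s[i] and not is_s[i - 1]]


def gcis_round(text):
    """One simplified round: factor at LMS positions and name the factors.

    Returns (prefix, rules, reduced): the text before the first LMS
    position, a map from name to factor, and the reduced string of names.
    """
    lms = lms_positions(text)
    if not lms:
        return text, {}, []  # nothing to factor
    prefix = text[:lms[0]]
    bounds = lms + [len(text)]
    factors = [text[bounds[j]:bounds[j + 1]] for j in range(len(lms))]
    # Plain sort stands in for the order SAIS would obtain by induced sorting.
    names = {f: k for k, f in enumerate(sorted(set(factors)))}
    rules = {k: f for f, k in names.items()}
    reduced = [names[f] for f in factors]
    return prefix, rules, reduced


def expand_round(prefix, rules, reduced):
    """Invert gcis_round by expanding every name back into its factor."""
    return prefix + "".join(rules[k] for k in reduced)


prefix, rules, reduced = gcis_round("abracadabra")
print(prefix, rules, reduced)
print(expand_round(prefix, rules, reduced))
```

In a full compressor the reduced string would presumably be processed again in the same way until no further factoring is possible; this sketch stops after one round.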

research
11/25/2020

Grammar Compression By Induced Suffix Sorting

A grammar compression algorithm, called GCIS, is introduced in this work...
research
04/12/2022

Efficient Construction of the BWT for Repetitive Text Using String Compression

We present a new semi-external algorithm that builds the Burrows-Wheeler...
research
06/13/2018

O(n log n)-time text compression by LZ-style longest first substitution

Mauer et al. [A Lempel-Ziv-style Compression Method for Repetitive Texts...
research
02/22/2023

RNA secondary structures: from ab initio prediction to better compression, and back

In this paper, we use the biological domain knowledge incorporated into ...
research
11/13/2020

A grammar compressor for collections of reads with applications to the construction of the BWT

We describe a grammar for DNA sequencing reads from which we can compute...
research
06/20/2016

A Data-Driven Approach for Semantic Role Labeling from Induced Grammar Structures in Language

Semantic roles play an important role in extracting knowledge from text....
research
05/28/2021

Grammar Index By Induced Suffix Sorting

Pattern matching is the most central task for text indices. Most recent ...
