Minimal Absent Words on Run-Length Encoded Strings

02/28/2022
by   Tooru Akagi, et al.
0

A string w is called a minimal absent word (MAW) for another string T if w does not occur (as a substring) in T and any proper substring of w occurs in T. State-of-the-art data structures for reporting the set 𝖬𝖠𝖶(T) of MAWs from a given string T of length n require O(n) space, can be built in O(n) time, and can report all MAWs in O(|𝖬𝖠𝖶(T)|) time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters a by a^p where p is the length of the run. Let m be the RLE-size of string T. After categorizing the MAWs into five disjoint sets ℳ_1, ℳ_2, ℳ_3, ℳ_4, ℳ_5 using RLE, we present matching upper and lower bounds for the number of MAWs in ℳ_i for i = 1,2,4,5 in terms of RLE-size m, except for ℳ_3 whose size is unbounded by m. We then present a compact O(m)-space data structure that can report all MAWs in optimal O(|𝖬𝖠𝖶(T)|) time.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/25/2022

Substring Complexities on Run-length Compressed Strings

Let S_T(k) denote the set of distinct substrings of length k in a string...
research
05/16/2018

On Computing Average Common Substring Over Run Length Encoded Sequences

The Average Common Substring (ACS) is a popular alignment-free distance ...
research
03/14/2019

Shortest Unique Palindromic Substring Queries on Run-Length Encoded Strings

For a string S, a palindromic substring S[i..j] is said to be a shortest...
research
02/14/2019

Conversion from RLBWT to LZ77

Converting a compressed format of a string into another compressed forma...
research
01/29/2019

Simulating the DNA String Graph in Succinct Space

Converting a set of sequencing reads into a lossless compact data struct...
research
01/25/2022

The development of a portable elbow exoskeleton with a Twisted Strings Actuator to assist patients with upper limb inhabitation

Over the years, the number of exoskeleton devices utilized for upper-lim...
research
04/03/2020

Enumeration of LCP values, LCP intervals and Maximal repeats in BWT-runs Bounded Space

Lcp-values, lcp-intervals, and maximal repeats are powerful tools in var...

Please sign up or login with your details

Forgot password? Click here to reset