Constructing Antidictionaries in Output-Sensitive Space

02/13/2019
by   Lorraine A. K. Ayad, et al.
A word x that is absent from a word y is called minimal if all its proper factors occur in y. Given a collection of k words y_1, y_2, ..., y_k over an alphabet Σ, we are asked to compute the set M^ℓ_{y_1#...#y_k} of minimal absent words of length at most ℓ of the word y = y_1#y_2#...#y_k, where # ∉ Σ. In data compression, this corresponds to computing the antidictionary of k documents. In bioinformatics, it corresponds to computing words that are absent from a genome of k chromosomes. This computation generally requires Ω(n) space for n = |y| using any of the many available O(n)-time algorithms, because an Ω(n)-sized text index is constructed over y, which can be impractical for large n. We perform the same computation incrementally using output-sensitive space. This goal is reasonable when ||M^ℓ_{y_1#...#y_N}|| = o(n) for all N ∈ [1,k]. For instance, in the human genome, n ≈ 3 × 10^9 but ||M^12_{y_1#...#y_k}|| ≈ 10^6. We state our results for a constant-sized alphabet. We show that all of M^ℓ_{y_1}, ..., M^ℓ_{y_1#...#y_k} can be computed in O(kn + ∑_{N=1}^{k} ||M^ℓ_{y_1#...#y_N}||) total time using O(MaxIn + MaxOut) space, where MaxIn is the length of the longest word in {y_1, ..., y_k} and MaxOut = max{||M^ℓ_{y_1#...#y_N}|| : N ∈ [1,k]}. Proof-of-concept experimental results are also provided, confirming our theoretical findings and justifying our contribution.
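To make the definition concrete, the following is a minimal brute-force sketch in Python of what the set of minimal absent words of length at most ℓ contains. It is not the output-sensitive algorithm of the paper: it stores every factor of length at most ℓ, so it only illustrates the definition on small inputs. The function names `factors` and `minimal_absent_words` are hypothetical, and the sketch assumes that minimal absent words are taken over Σ only (the separator # is never part of a reported word), which is why the words of the collection are processed one by one instead of being concatenated.

```python
def factors(y, max_len):
    """Return all non-empty factors (substrings) of y of length <= max_len."""
    return {
        y[i:j]
        for i in range(len(y))
        for j in range(i + 1, min(len(y), i + max_len) + 1)
    }


def minimal_absent_words(words, alphabet, max_len):
    """Brute-force minimal absent words of length <= max_len for a collection.

    A word x is reported if x occurs in none of the given words while both
    its longest proper prefix and its longest proper suffix occur in some
    word of the collection (the empty word always occurs).
    """
    occurring = set()
    for w in words:
        occurring |= factors(w, max_len)

    # Length 1: letters of the alphabet that never appear.
    result = {a for a in alphabet if a not in occurring}

    # Length >= 2: extend an occurring word by one letter on the right.
    for left in occurring:
        if len(left) >= max_len:
            continue
        for b in alphabet:
            x = left + b
            if x not in occurring and x[1:] in occurring:
                result.add(x)
    return result


if __name__ == "__main__":
    # Single word, then the same word followed by a second document.
    print(sorted(minimal_absent_words(["abaab"], "ab", 4)))
    print(sorted(minimal_absent_words(["abaab", "bb"], "ab", 4)))
```

In the second call, adding the document bb removes bb from the set and introduces new minimal absent words that contain it, which is the kind of change the incremental computation over y_1, y_1#y_2, ... has to track.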

research
05/04/2023

Ranking and unranking bordered and unbordered words

A border of a word w is a word that is both a non-empty proper prefix an...
research
06/07/2018

Alignment-free sequence comparison using absent words

Sequence comparison is a prerequisite to virtually all comparative genom...
research
09/07/2020

On prefix palindromic length of automatic words

The prefix palindromic length PPL_𝐮(n) of an infinite word 𝐮 is the mini...
research
10/27/2020

Mutual Borders and Overlaps

A word is said to be bordered if it contains a non-empty proper prefix t...
research
06/22/2019

Prefix palindromic length of the Thue-Morse word

The prefix palindromic length PPL_u(n) of an infinite word u is the mini...
research
06/09/2019

Borders, Palindrome Prefixes, and Square Prefixes

We show that the number of length-n words over a k-letter alphabet havin...
