On Computing Average Common Substring Over Run Length Encoded Sequences

05/16/2018
by Sahar Hooshmand, et al.
Georgia Institute of Technology
University of Central Florida

The Average Common Substring (ACS) is a popular alignment-free distance measure for phylogeny reconstruction. The ACS can be computed in O(n) space and time, where n = x + y is the input size (x and y being the lengths of the input sequences X and Y). Compressed string matching is the study of string matching problems with the following twist: the input data is in a compressed format and the underlying task must be performed with little or no decompression. In this paper, we revisit the ACS problem under this paradigm, where the input sequences are given in their run-length encoded format. We present an algorithm to compute ACS(X,Y) in O(N log N) time using O(N) space, where N is the total length of the sequences after run-length encoding.
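
To make the two notions in the abstract concrete, here is a minimal sketch. It assumes the commonly used definition of ACS(X,Y): the average, over every start position i in X, of the length of the longest substring of X beginning at i that also occurs somewhere in Y. The function names (`run_length_encode`, `run_length_decode`, `acs_naive`) are illustrative and do not come from the paper. The brute-force routine works on the uncompressed strings and is far slower than both the O(n) method and the O(N log N) algorithm mentioned above; it is only a reference for what is being computed.

```python
from itertools import groupby


def run_length_encode(s):
    """Collapse each maximal run of equal characters into a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def run_length_decode(runs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)


def acs_naive(x, y):
    """Brute-force ACS(X, Y) under the assumed definition.

    For each start position i in x, find the longest prefix of x[i:]
    that occurs as a substring of y, then average those lengths over
    all positions of x. Intended only as a reference, not as the
    paper's algorithm.
    """
    if not x:
        raise ValueError("x must be non-empty")
    total = 0
    for i in range(len(x)):
        length = 0
        # Extend the match one character at a time while it still occurs in y.
        while i + length < len(x) and x[i:i + length + 1] in y:
            length += 1
        total += length
    return total / len(x)
```

For example, `run_length_encode("aaabccdd")` yields four runs, so a string of length 8 has an encoded size of 4 in this sketch. Reading N in the abstract as the total number of such runs over both inputs is an interpretation of "total length after run-length encoding".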

02/28/2022

Minimal Absent Words on Run-Length Encoded Strings

A string w is called a minimal absent word (MAW) for another string T if...
03/11/2020

Rényi entropy and pattern matching for run-length encoded sequences

In this note, we studied the asymptotic behaviour of the length of the l...
10/17/2019

Unshuffling fields in data formats

Data format reverse engineering commonly involves identifying conserved ...
12/03/2017

ALLSAT compressed with wildcards. Part 4: An invitation for C-programmers

The model set of a general Boolean function in CNF is calculated in a co...
03/31/2023

The Many Qualities of a New Directly Accessible Compression Scheme

We present a new variable-length computation-friendly encoding scheme, n...
08/31/2022

Computing all-vs-all MEMs in run-length encoded collections of HiFi reads

We describe an algorithm to find maximal exact matches (MEMs) among HiFi...
02/14/2019

Conversion from RLBWT to LZ77

Converting a compressed format of a string into another compressed forma...