On Computing Average Common Substring Over Run Length Encoded Sequences

by Sahar Hooshmand, et al.
Georgia Institute of Technology
University of Central Florida

The Average Common Substring (ACS) is a popular alignment-free distance measure for phylogeny reconstruction. The ACS of two sequences X and Y can be computed in O(n) space and time, where n = |X| + |Y| is the input size. Compressed string matching is the study of string matching problems with the following twist: the input data is given in a compressed format, and the underlying task must be performed with little or no decompression. In this paper, we revisit the ACS problem under this paradigm, where the input sequences are given in their run-length encoded format. We present an algorithm to compute ACS(X,Y) in O(N log N) time using O(N) space, where N is the total length of the sequences after run-length encoding.
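To make the two quantities in the abstract concrete, here is a minimal Python sketch of run-length encoding and of a naive ACS computation. It is not the algorithm of the paper: it works on the uncompressed strings and takes far more than O(n) time. The abstract does not define ACS, so the sketch assumes the definition commonly used in the literature: the average, over all positions i of X, of the length of the longest substring of X starting at i that also occurs in Y. The function names (run_length_encode, longest_match_at, acs_naive) are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the run-length encoding of s as (character, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def longest_match_at(x, i, y):
    """Length of the longest prefix of x[i:] that occurs somewhere in y."""
    length = 0
    # If a prefix of length k+1 occurs in y, so does the prefix of length k,
    # so extending one character at a time finds the maximum.
    while i + length < len(x) and x[i:i + length + 1] in y:
        length += 1
    return length


def acs_naive(x, y):
    """Naive ACS(x, y) under the assumed definition: mean longest-match length."""
    if not x:
        raise ValueError("x must be non-empty")
    return sum(longest_match_at(x, i, y) for i in range(len(x))) / len(x)
```

For example, run_length_encode("aaabccdddd") yields four runs, so that string contributes 4 to N in the abstract's notation, while its uncompressed length is 10.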



Minimal Absent Words on Run-Length Encoded Strings

A string w is called a minimal absent word (MAW) for another string T if...

Rényi entropy and pattern matching for run-length encoded sequences

In this note, we studied the asymptotic behaviour of the length of the l...

Unshuffling fields in data formats

Data format reverse engineering commonly involves identifying conserved ...

ALLSAT compressed with wildcards. Part 4: An invitation for C-programmers

The model set of a general Boolean function in CNF is calculated in a co...

The Many Qualities of a New Directly Accessible Compression Scheme

We present a new variable-length computation-friendly encoding scheme, n...

Computing all-vs-all MEMs in run-length encoded collections of HiFi reads

We describe an algorithm to find maximal exact matches (MEMs) among HiFi...

Conversion from RLBWT to LZ77

Converting a compressed format of a string into another compressed forma...
