On Computing Average Common Substring Over Run Length Encoded Sequences

05/16/2018
by   Sahar Hooshmand, et al.
The Average Common Substring (ACS) is a popular alignment-free distance measure for phylogeny reconstruction. For two sequences X and Y, the ACS can be computed in O(n) space and time, where n = |X| + |Y| is the input size. Compressed string matching is the study of string matching problems with the following twist: the input data is given in a compressed format, and the underlying task must be performed with little or no decompression. In this paper, we revisit the ACS problem under this paradigm, where the input sequences are given in their run-length encoded format. We present an algorithm to compute ACS(X,Y) in O(N log N) time using O(N) space, where N is the total length of the sequences after run-length encoding.
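To make the two ingredients of the abstract concrete, here is a minimal Python sketch of run-length encoding and of the average longest-match length that ACS-style measures are built on. It is not the O(N log N) algorithm from the paper: it works on the uncompressed strings and takes naive polynomial time. The function names are hypothetical, and the quantity averaged (for each position i of X, the length of the longest substring starting at i that also occurs in Y) is assumed from the commonly used definition of ACS, since the abstract itself does not spell it out.

```python
from itertools import groupby


def run_length_encode(s):
    """Return a list of (symbol, run_length) pairs for the string s."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def run_length_decode(runs):
    """Inverse of run_length_encode: expand (symbol, run_length) pairs."""
    return "".join(ch * count for ch, count in runs)


def longest_match_lengths(x, y):
    """For each position i of x, the length of the longest prefix of x[i:]
    that occurs somewhere in y. Naive scan, for illustration only."""
    lengths = []
    for i in range(len(x)):
        k = 0
        # Extend the match one symbol at a time while it still occurs in y.
        while i + k < len(x) and x[i:i + k + 1] in y:
            k += 1
        lengths.append(k)
    return lengths


def average_longest_match(x, y):
    """Average of the per-position longest match lengths of x against y
    (assumed definition; the full ACS distance normalises this further)."""
    if not x:
        raise ValueError("x must be non-empty")
    lengths = longest_match_lengths(x, y)
    return sum(lengths) / len(lengths)


runs = run_length_encode("aaabccdddd")
print(runs)
print(average_longest_match("abab", "bab"))
```

In this sketch N corresponds to the total number of (symbol, run_length) pairs across both encoded inputs, while n is the total number of symbols before encoding; the gap between the two is what an algorithm working directly on the runs can exploit.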

research
02/28/2022

Minimal Absent Words on Run-Length Encoded Strings

A string w is called a minimal absent word (MAW) for another string T if...
research
03/11/2020

Rényi entropy and pattern matching for run-length encoded sequences

In this note, we studied the asymptotic behaviour of the length of the l...
research
10/17/2019

Unshuffling fields in data formats

Data format reverse engineering commonly involves identifying conserved ...
research
12/03/2017

ALLSAT compressed with wildcards. Part 4: An invitation for C-programmers

The model set of a general Boolean function in CNF is calculated in a co...
research
03/31/2023

The Many Qualities of a New Directly Accessible Compression Scheme

We present a new variable-length computation-friendly encoding scheme, n...
research
08/31/2022

Computing all-vs-all MEMs in run-length encoded collections of HiFi reads

We describe an algorithm to find maximal exact matches (MEMs) among HiFi...
research
02/14/2019

Conversion from RLBWT to LZ77

Converting a compressed format of a string into another compressed forma...
