Computing Matching Statistics on Repetitive Texts

10/31/2021
by Younan Gao, et al.

Computing the matching statistics of a string P[1..m] with respect to a text T[1..n] is a fundamental problem with applications in genome sequence comparison. In this paper, we study the problem of computing matching statistics on highly repetitive texts. We design three data structures that are similar to LZ-compressed indexes. The space cost of each can be measured in terms of γ, the size of the smallest string attractor [STOC'2018], and δ, a better measure of repetitiveness [LATIN'2020].
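The abstract does not restate the definition, so the following minimal sketch assumes the standard one: for each position i of P, the matching statistic is the length of the longest prefix of P[i..m] that occurs somewhere in T. The function name `matching_statistics` and the brute-force substring test are illustrative assumptions only; this is not one of the paper's compressed data structures, which target space bounded in terms of γ and δ rather than a plain scan of T.

```python
def matching_statistics(pattern: str, text: str) -> list[int]:
    """Naive matching statistics (0-based positions).

    ms[i] is the length of the longest prefix of pattern[i:] that occurs
    as a substring of text. Illustrative baseline only, not the paper's method.
    """
    m = len(pattern)
    ms = [0] * m
    length = 0
    for i in range(m):
        # If pattern[i-1 : i-1+L] occurs in text, then pattern[i : i-1+L]
        # occurs too, so ms[i] >= ms[i-1] - 1 and we can resume from there.
        length = max(length - 1, 0)
        # Extend the match one character at a time while it still occurs.
        while i + length < m and pattern[i:i + length + 1] in text:
            length += 1
        ms[i] = length
    return ms


if __name__ == "__main__":
    print(matching_statistics("abcx", "abcab"))
```

Each substring test here scans the whole text, so the sketch is only useful for checking small cases by hand; an index over T replaces that scan in practical solutions.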

research
01/13/2023

Computing matching statistics on Wheeler DFAs

Matching statistics were introduced to solve the approximate string matc...
research
11/11/2020

PHONI: Streamed Matching Statistics with Multi-Genome References

Computing the matching statistics of patterns with respect to a text is ...
research
07/06/2022

Computing NP-hard Repetitiveness Measures via MAX-SAT

Repetitiveness measures reveal profound characteristics of datasets, and...
research
07/03/2022

Suffix sorting via matching statistics

We introduce a new algorithm for constructing the generalized suffix arr...
research
06/27/2022

Balancing Run-Length Straight-Line Programs*

It was recently proved that any SLP generating a given string w can be t...
research
08/18/2018

The Capacity of Some Pólya String Models

We study random string-duplication systems, which we call Pólya string m...
research
05/12/2023

Matching Statistics speed up BWT construction

Due to the exponential growth of genomic data, constructing dedicated da...
