Computing matching statistics on Wheeler DFAs

01/13/2023
by Alessio Conte, et al.

Matching statistics were introduced to solve the approximate string matching problem, which is a recurring subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time- and space-efficient algorithm for computing matching statistics that relies on some components of a compressed suffix tree, notably the longest common prefix (LCP) array. In this paper, we show how their algorithm can be generalized from strings to Wheeler deterministic finite automata. Most importantly, we introduce a notion of LCP array for Wheeler automata, thus establishing a first clear step towards extending (compressed) suffix tree functionalities to labeled graphs.
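For readers new to the string setting, the sketch below illustrates the two objects the abstract builds on, for plain strings only. It uses the standard definition of matching statistics, where MS[i] is the length of the longest prefix of P[i..] that occurs as a substring of the text T, and the standard LCP array, where lcp[k] is the length of the longest common prefix of the (k-1)-th and k-th lexicographically smallest suffixes. This is a naive reference implementation under those assumptions: it is neither Ohlebusch et al.'s algorithm nor the Wheeler-automaton generalization described in the paper, and the function names are hypothetical. The LCP array is computed with Kasai et al.'s linear-time method from a naively built suffix array.

```python
from __future__ import annotations


def matching_statistics(pattern: str, text: str) -> list[int]:
    """Naive matching statistics (hypothetical helper): ms[i] is the length
    of the longest prefix of pattern[i:] occurring as a substring of text."""
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match while pattern[i:i+length+1] still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def suffix_array(text: str) -> list[int]:
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """Kasai et al.: lcp[k] = LCP of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0  # current match length, decreases by at most 1 per step
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix immediately preceding suffix i in sa
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


if __name__ == "__main__":
    print(matching_statistics("anab", "banana"))  # [3, 2, 1, 1]
    sa = suffix_array("banana")
    print(sa)                        # [5, 3, 1, 0, 4, 2]
    print(lcp_array("banana", sa))   # [0, 1, 3, 0, 0, 2]
```

The naive matching-statistics loop is quadratic or worse and serves only to pin down the definition; the point of LCP-based approaches such as the one the abstract cites is to avoid re-scanning the text for every pattern position.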
