Linear Time Construction of Indexable Elastic Founder Graphs

01/17/2022 ∙ by Nicola Rizzo, et al.

Pattern matching on graphs has been widely studied lately due to its importance in genomics applications. Unfortunately, even the simplest problem of deciding whether a string appears as a subpath of a graph admits a quadratic lower bound under the Orthogonal Vectors Hypothesis (Equi et al. ICALP 2019, SOFSEM 2021). To avoid this bottleneck, research has shifted towards more specific graph classes, e.g. those induced from multiple sequence alignments (MSAs). Consider segmenting 𝖬𝖲𝖠[1..m,1..n] into b blocks 𝖬𝖲𝖠[1..m,1..j_1], 𝖬𝖲𝖠[1..m,j_1+1..j_2], …, 𝖬𝖲𝖠[1..m,j_{b-1}+1..n]. The distinct strings in the rows of the blocks, after the removal of gap symbols, form the nodes of an elastic founder graph (EFG), and the edges represent the original connections observed in the MSA. An EFG is called indexable if a node label occurs as a prefix of only those paths that start from a node of the same block. Equi et al. (ISAAC 2021) showed that such EFGs support fast pattern matching and gave an O(mn log m)-time algorithm for preprocessing the MSA. After this preprocessing, an indexable EFG maximizing the number of blocks can be constructed in O(n) time and, alternatively, one minimizing the maximum length of a block in O(n log log n) time. Using the suffix tree and solving a novel ancestor problem on trees, we improve the preprocessing to O(mn) time and the O(n log log n)-time EFG construction to O(n) time, thus showing that both types of indexable EFGs can be constructed in time linear in the input size.
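To make the definitions concrete, the following is a minimal brute-force sketch in Python. It builds the EFG induced by a given segmentation and tests the indexability condition directly from its definition. It is not the linear-time suffix-tree algorithm of the paper. The function names `build_efg`, `occurs_at` and `is_indexable`, the gap symbol `-`, the toy MSA, and the choice to reject blocks in which a row contains only gaps are all assumptions made for illustration.

```python
def build_efg(msa, boundaries, gap="-"):
    """Build the EFG induced by a segmentation of an MSA (illustrative only).

    msa        -- list of m equal-length row strings
    boundaries -- block end columns j_1 < ... < j_b = n (1-based, inclusive)
    Returns (blocks, edges): blocks[k] is the sorted list of distinct
    gap-free row strings of block k; edges connect (k-1, u) to (k, v)
    whenever some row spells u in block k-1 and v in block k.
    """
    n = len(msa[0])
    if any(len(row) != n for row in msa) or boundaries[-1] != n:
        raise ValueError("rows must have equal length and the last boundary must be n")
    blocks, edges = [], set()
    prev_labels, start = None, 0
    for k, end in enumerate(boundaries):
        labels = [row[start:end].replace(gap, "") for row in msa]
        if "" in labels:
            # Assumption of this sketch: all-gap rows in a block are not handled.
            raise ValueError(f"block {k} has a row consisting only of gaps")
        blocks.append(sorted(set(labels)))
        if prev_labels is not None:
            for u, v in zip(prev_labels, labels):
                edges.add(((k - 1, u), (k, v)))
        prev_labels, start = labels, end
    return blocks, edges


def occurs_at(node, offset, pattern, succ):
    """True if pattern can be read along some path starting at label[offset:] of node."""
    rest = node[1][offset:]
    if len(pattern) <= len(rest):
        return rest.startswith(pattern)
    if not pattern.startswith(rest):
        return False
    tail = pattern[len(rest):]
    return any(occurs_at(nxt, 0, tail, succ) for nxt in succ.get(node, ()))


def is_indexable(blocks, edges):
    """Naive check: every occurrence of a node label must start at the
    beginning of a node that lies in the same block as that label."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    nodes = [(k, label) for k, labels in enumerate(blocks) for label in labels]
    for k, pattern in nodes:
        for node in nodes:
            for offset in range(len(node[1])):
                allowed = offset == 0 and node[0] == k
                if not allowed and occurs_at(node, offset, pattern, succ):
                    return False
    return True


if __name__ == "__main__":
    msa = ["AC-GT", "ACCGT", "A-CTT"]           # toy alignment, m = 3, n = 5
    print(build_efg(msa, [3, 5])[0])             # [['AC', 'ACC'], ['GT', 'TT']]
    print(is_indexable(*build_efg(msa, [3, 5]))) # True
    print(is_indexable(*build_efg(msa, [2, 5]))) # False: "GT" also occurs inside "CGT"
```

On this toy input, cutting after column 3 gives an indexable two-block EFG, while cutting after column 2 does not, because the node label GT also appears in the middle of the node label CGT. The check enumerates all start positions and follows all paths, so it is only suitable for small examples.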


