Elastic Founder Graphs Improved and Enhanced

03/09/2023 ∙ by Nicola Rizzo, et al.

Indexing labeled graphs for pattern matching is a central challenge of pangenomics. Equi et al. (Algorithmica, 2022) developed the Elastic Founder Graph (EFG), which represents an alignment of m sequences of length n, drawn from an alphabet Σ plus a special gap character: its paths spell the original sequences or their recombinations. By enforcing the semi-repeat-free property, the EFG admits a polynomial-space index for linear-time pattern matching, breaking through the conditional lower bounds on indexing labeled graphs (Equi et al., SOFSEM 2021). In this work we improve the space of the EFG index answering pattern matching queries in linear time, from linear in the length of all strings spelled by three consecutive node labels to linear in the size of the edge labels. We then develop linear-time construction algorithms optimizing for different metrics: we improve the existing linearithmic construction algorithms to O(mn) by solving the novel exclusive ancestor set problem on trees, and we propose, for the simplified gapless setting, an O(mn)-time solution minimizing the maximum block height, which we generalize by substituting block height with prefix-aware height. Finally, to show the versatility of the framework, we develop a BWT-based EFG index and study how to encode and perform document listing queries on a set of paths of the graph, reporting which paths contain a given pattern as a substring. We propose the EFG framework as an improved and enhanced version of the framework for the gapless setting, along with construction methods that are valid in any setting concerned with the segmentation of aligned sequences.
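To make the object described in the abstract concrete, the following is a minimal sketch of how a graph of this kind could be induced from a gapped alignment once a segmentation into column blocks is given. It is not the construction algorithm of the paper: the names `build_founder_graph` and `spelled_strings`, the `-` gap symbol, and the representation of a segmentation as a list of block start columns are all assumptions made for illustration. The sketch neither checks the semi-repeat-free property nor builds an index.

```python
from typing import Dict, List, Set, Tuple

GAP = "-"  # assumed gap symbol; the abstract only says "special gap character"
Node = Tuple[int, str]  # (block index, node label)


def build_founder_graph(msa: List[str], cuts: List[int]):
    """Induce blocks of node labels and edges from an alignment.

    msa  -- m rows of equal length n over the alphabet plus GAP
    cuts -- hypothetical segmentation: sorted block start columns, cuts[0] == 0
    """
    n = len(msa[0])
    if any(len(row) != n for row in msa):
        raise ValueError("all alignment rows must have the same length")
    bounds = list(cuts) + [n]
    blocks: List[List[str]] = []
    edges: Set[Tuple[Node, Node]] = set()
    prev: List[str] = []
    for b in range(len(cuts)):
        lo, hi = bounds[b], bounds[b + 1]
        # Each row spells one label per block once its gaps are removed.
        spelled = [row[lo:hi].replace(GAP, "") for row in msa]
        if any(label == "" for label in spelled):
            raise ValueError("this sketch assumes no row is all gaps in a block")
        blocks.append(sorted(set(spelled)))
        # Connect the labels that some row spells in consecutive blocks.
        for u, v in zip(prev, spelled):
            edges.add(((b - 1, u), (b, v)))
        prev = spelled
    return blocks, edges


def spelled_strings(blocks: List[List[str]], edges: Set[Tuple[Node, Node]]) -> Set[str]:
    """Enumerate the strings spelled by paths from the first to the last block."""
    succ: Dict[Node, List[Node]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    out: Set[str] = set()

    def walk(node: Node, acc: str) -> None:
        acc += node[1]
        if node[0] == len(blocks) - 1:
            out.add(acc)
            return
        for nxt in succ.get(node, []):
            walk(nxt, acc)

    for label in blocks[0]:
        walk((0, label), "")
    return out


msa = ["AC-GTA", "ACCGTT", "A-TGAA"]
blocks, edges = build_founder_graph(msa, cuts=[0, 3, 5])
paths = spelled_strings(blocks, edges)
```

On this toy input the three rows spell ACGTA, ACCGTT and ATGAA, and the graph additionally spells ACGTT and ACCGTA: these are recombinations, obtained by switching rows at a block boundary where two rows share the node GT.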


research
∙ 05/19/2020

Linear Time Construction of Indexable Founder Block Graphs

We introduce a compact pangenome representation based on an optimal segm...
research
∙ 02/25/2021

Algorithms and Complexity on Indexing Founder Graphs

We study the problem of matching a string in a labeled graph. Previous r...
research
∙ 01/17/2022

Linear Time Construction of Indexable Elastic Founder Graphs

Pattern matching on graphs has been widely studied lately due to its imp...
research
∙ 02/03/2023

Chaining of Maximal Exact Matches in Graphs

We study the problem of finding maximal exact matches (MEMs) between a q...
research
∙ 09/26/2022

Inferring strings from position heaps in linear time

Position heaps are index structures of text strings used for the exact s...
research
∙ 02/03/2020

Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails

We consider the following string matching problem on a node-labeled grap...
research
∙ 11/19/2020

Subpath Queries on Compressed Graphs: a Survey

Text indexing is a classical algorithmic problem that has been studied f...
