Finding Maximal Exact Matches in Graphs

05/16/2023
by Nicola Rizzo, et al.

We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class of seeds, often used in practical seed-chain-extend alignment methods because of their strong connections to classical metrics. A principled way to speed up chaining is to limit the number of MEMs by considering only MEMs of length at least κ (κ-MEMs). However, on arbitrary input graphs, the problem of finding MEMs cannot be solved in truly sub-quadratic time under SETH (Equi et al., ICALP 2019), even on acyclic graphs. In this paper we show an O(n · L · d^{L-1} + m + M_{κ,L})-time algorithm that finds all κ-MEMs between Q and G spanning exactly L nodes in G, where n is the total length of node labels, d is the maximum degree of a node in G, m = |Q|, and M_{κ,L} is the number of output MEMs. We use this algorithm to develop a κ-MEM finding solution on indexable Elastic Founder Graphs (Equi et al., Algorithmica 2022) running in time O(nH^2 + m + M_κ), where H is the maximum number of nodes in a block, and M_κ is the total number of κ-MEMs. Our results generalize to the analysis of multiple query strings (MEMs between G and any of the strings). Additionally, we provide some preliminary experimental results showing that the number of graph MEMs is an order of magnitude smaller than the number of string MEMs of the corresponding concatenated collection.
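
To make the notion of a κ-MEM concrete, here is a minimal sketch for the simpler string-to-string case, assuming the standard definition: a match between Q and a text T that cannot be extended to the left or to the right, with length at least κ. It is a brute-force quadratic illustration only, not the graph algorithm described above, and the function name `kappa_mems` is hypothetical.

```python
def kappa_mems(query, text, kappa):
    """Brute-force kappa-MEMs between two plain strings (illustration only).

    Returns triples (i, j, length) such that
    query[i:i+length] == text[j:j+length], the match cannot be extended
    to the left or to the right, and length >= kappa.
    """
    mems = []
    for i in range(len(query)):
        for j in range(len(text)):
            if query[i] != text[j]:
                continue
            # Skip starts that are not left-maximal: the match could be
            # extended one character to the left in both strings.
            if i > 0 and j > 0 and query[i - 1] == text[j - 1]:
                continue
            # Extend to the right as far as the characters agree, which
            # makes the match right-maximal by construction.
            length = 0
            while (i + length < len(query)
                   and j + length < len(text)
                   and query[i + length] == text[j + length]):
                length += 1
            if length >= kappa:
                mems.append((i, j, length))
    return mems


if __name__ == "__main__":
    # Small made-up example; prints the MEMs of length at least 2.
    print(kappa_mems("abcab", "cabca", 2))
```

Raising `kappa` only filters the output, which is why restricting to κ-MEMs reduces the number of seeds passed on to chaining.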
