Assembling Omnitigs using Hidden-Order de Bruijn Graphs

05/14/2018
by Diego Díaz-Domínguez, et al.

De novo DNA assembly is a fundamental task in bioinformatics, and finding Eulerian paths on de Bruijn graphs is one of the dominant approaches to it. In most cases, no single order of the de Bruijn graph works well for assembling all of the reads. For this reason, some de Bruijn-based assemblers try assembling on several graphs of increasing order in turn. Boucher et al. (2015) went further and gave a representation that makes it possible to navigate the graph and change order on the fly, up to a maximum order K. Because it relies on an LCP array, however, their representation can use up to log K extra bits per edge. In this paper, we replace the LCP array with a succinct representation of that array's Cartesian tree, which takes only 2 extra bits per edge and still supports interesting navigation operations efficiently. These operations are not enough to let us easily extract unitigs, and only unitigs, from the graph, but they do let us extract a set of safe strings that contains all unitigs. Suppose we navigate in a variable-order de Bruijn graph representation according to the following rules: if there are no outgoing edges, we reduce the order, hoping that one appears; if there is exactly one outgoing edge, we take it (increasing the current order, up to K); if there are two or more outgoing edges, we stop. We then traverse a (variable-order) path in which we cross edges only when we have no choice or, equivalently, we generate a string by appending characters only when we have no choice. It follows that the strings we extract are safe. Our experiments show that we extract a set of strings that is more informative than the unitigs, while using a reasonable amount of memory.
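
The three traversal rules can be illustrated with a minimal Python sketch. It is not the succinct representation from the paper: it assumes a toy model in which a node is a context string of length at most K, and an outgoing edge labelled c exists whenever the context followed by c occurs in some read. The names `successors` and `extend`, the `seed` parameter, and the cycle guard are hypothetical and introduced here only for illustration.

```python
def successors(reads, context):
    """Return the set of characters c such that context + c occurs in a read.

    Toy stand-in for "outgoing edges of the current node"; a real
    implementation would query the succinct graph instead of scanning reads.
    """
    out = set()
    for read in reads:
        start = read.find(context)
        while start != -1:
            end = start + len(context)
            if end < len(read):
                out.add(read[end])
            start = read.find(context, start + 1)
    return out


def extend(reads, seed, max_order):
    """Extend seed by appending characters only when there is no choice.

    Assumes max_order >= 1. The context is the current node; its length
    plays the role of the current order and never exceeds max_order (K).
    """
    context = seed[-max_order:]
    result = seed
    seen = set()  # assumption: guard against looping forever on cycles
    while True:
        nxt = successors(reads, context)
        if len(nxt) == 0:
            if not context:
                break  # order cannot be reduced any further
            context = context[1:]  # no outgoing edge: reduce the order
        elif len(nxt) == 1:
            if context in seen:
                break  # already left this node once: we are in a cycle
            seen.add(context)
            c = next(iter(nxt))
            result += c  # exactly one outgoing edge: take it
            context = (context + c)[-max_order:]  # order grows, capped at K
        else:
            break  # two or more outgoing edges: stop
    return result


print(extend(["ACGA", "ACGT"], "A", 3))    # prints ACG (branch after ACG)
print(extend(["ACGTT", "CGTTA"], "A", 3))  # prints ACGTTA
```

In the second call the two reads overlap, and the walk crosses from one read into the other because each step has exactly one possible continuation. It stops only when reducing the order leads back to a context it has already left.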

