Assembling Omnitigs using Hidden-Order de Bruijn Graphs

by   Diego Díaz-Domínguez, et al.

De novo DNA assembly is a fundamental task in Bioinformatics, and finding Eulerian paths on de Bruijn graphs is one of the dominant approaches to it. In most of the cases, there may be no one order for the de Bruijn graph that works well for assembling all of the reads. For this reason, some de Bruijn-based assemblers try assembling on several graphs of increasing order, in turn. Boucher et al. (2015) went further and gave a representation making it possible to navigate in the graph and change order on the fly, up to a maximum K, but they can use up to K extra bits per edge because they use an LCP array. In this paper, we replace the LCP array by a succinct representation of that array's Cartesian tree, which takes only 2 extra bits per edge and still lets us support interesting navigation operations efficiently. These operations are not enough to let us easily extract unitigs and only unitigs from the graph but they do let us extract a set of safe strings that contains all unitigs. Suppose we are navigating in a variable-order de Bruijn graph representation, following these rules: if there are no outgoing edges then we reduce the order, hoping one appears; if there is exactly one outgoing edge then we take it (increasing the current order, up to K); if there are two or more outgoing edges then we stop. Then we traverse a (variable-order) path such that we cross edges only when we have no choice or, equivalently, we generate a string appending characters only when we have no choice. It follows that the strings we extract are safe. Our experiments show we extract a set of strings more informative than the unitigs, while using a reasonable amount of memory.


Space-time Trade-offs for the LCP Array of Wheeler DFAs

Recently, Conte et al. generalized the longest-common prefix (LCP) array...

On maximum k-edge-colorable subgraphs of bipartite graphs

If k≥ 0, then a k-edge-coloring of a graph G is an assignment of colors ...

Phase Transitions of Structured Codes of Graphs

We consider the symmetric difference of two graphs on the same vertex se...

Quasiplanar graphs, string graphs, and the Erdos-Gallai problem

An r-quasiplanar graph is a graph drawn in the plane with no r pairwise ...

Monitoring the edges of product networks using distances

Foucaud et al. recently introduced and initiated the study of a new grap...

Finding a Maximum Clique in a Grounded 1-Bend String Graph

A grounded 1-bend string graph is an intersection graph of a set of poly...

Safety of Flow Decompositions in DAGs

Network flows are one of the most studied combinatorial optimization pro...

Please sign up or login with your details

Forgot password? Click here to reset