Π-cyc: A Reference-free SNP Discovery Application using Parallel Graph Search

09/18/2018
by Reda Younsi, et al.

Motivation: Working with a large number of genomes simultaneously is of great interest in population genetics and comparative genomics research. Bubble discovery in a multi-genome coloured de Bruijn graph for de novo genome assembly can be translated into cycle enumeration in graph theory. Cycle enumeration algorithms are time consuming on large, complex de Bruijn graphs, so specialised fast algorithms for efficient bubble search are needed for variant calling applications based on coloured de Bruijn graphs. In coloured de Bruijn graphs, the coverages of bubble paths are used in downstream variant calling analysis.

Results: In this paper, we introduce a fast parallel graph search for different k-mer cycle sizes. Coloured path coverages are used for SNP prediction. The graph search method combines a multi-node and multi-core design to speed up cycle enumeration. The search algorithm uses an index, stored in a hash table, that is extracted from the raw assembly of a coloured de Bruijn graph. The index is distributed across the CPU cores of a shared-memory HPC compute node to build undirected subgraphs, which are then searched for specific cycle sizes independently and simultaneously. The same index can also be split between several HPC compute nodes to take advantage of as many CPU cores as are available to the user. This local-neighbourhood parallel search approach reduces the graph's complexity and facilitates cycle search in a multi-colour de Bruijn graph. The search algorithm is incorporated into the Π-cyc application and tested on a number of Schizosaccharomyces pombe genomes.

Availability: Π-cyc is open-source software available at www.github.com/2kplus2P
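The abstract does not give the search procedure in detail, so the following is only a minimal single-machine sketch under our own assumptions, not Π-cyc's implementation. Nodes are k-mers, joined by an undirected edge when they are consecutive in a read (they overlap by k-1 bases). In this simplified model an isolated SNP between two colours closes a cycle of 2k+2 nodes: k allele-specific k-mers on each branch plus one shared k-mer on either side. Seed nodes are split into chunks, one per worker, to mimic distributing the index across CPU cores; each seed runs a depth-limited DFS in its local neighbourhood and reports only cycles in which it is the smallest node, so no cycle is counted twice. The function names (`build_index`, `find_cycles`, `call_snps`) and the coverage-based SNP rule are hypothetical, and the threads merely stand in for the separate cores or compute nodes a real tool would use (CPython threads give no CPU-bound speed-up).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(reads_by_colour, k):
    """Hash-table index (k-mer -> {colour: count}) plus an undirected
    adjacency set joining consecutive k-mers of each read (k-1 overlap)."""
    coverage = defaultdict(lambda: defaultdict(int))
    adj = defaultdict(set)
    for colour, reads in reads_by_colour.items():
        for read in reads:
            kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
            for km in kmers:
                coverage[km][colour] += 1
            for a, b in zip(kmers, kmers[1:]):
                if a != b:  # ignore self-loops
                    adj[a].add(b)
                    adj[b].add(a)
    return coverage, adj


def cycles_from_seed(adj, seed, length):
    """Depth-limited DFS in the seed's local neighbourhood. Only nodes greater
    than the seed are visited, so each cycle is reported from its smallest node;
    path[1] < path[-1] keeps one of its two traversal directions."""
    found, path = [], [seed]

    def dfs(node):
        if len(path) == length:
            if seed in adj[node] and path[1] < path[-1]:
                found.append(tuple(path))
            return
        for nxt in sorted(adj[node]):
            if nxt > seed and nxt not in path:
                path.append(nxt)
                dfs(nxt)
                path.pop()

    dfs(seed)
    return found


def find_cycles(adj, length, workers=4):
    """Split the seed nodes into one chunk per worker and search each chunk
    independently; returns every simple cycle with exactly `length` nodes."""
    if length < 3:  # a simple undirected cycle needs at least 3 nodes
        return []
    seeds = sorted(adj)
    chunks = [seeds[i::workers] for i in range(workers)]

    def work(chunk):
        return [c for s in chunk for c in cycles_from_seed(adj, s, length)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, chunks))
    return sorted(c for part in parts for c in part)


def call_snps(cycles, coverage, colours, k, min_cov=1.0):
    """Hypothetical rule: call a SNP when each of the two colours owns exactly
    k private k-mers in the cycle (one branch each) with mean coverage >= min_cov."""
    calls = []
    for cycle in cycles:
        branch = {c: [] for c in colours}
        for km in cycle:
            present = [c for c, n in coverage[km].items() if n > 0]
            if len(present) == 1 and present[0] in branch:
                branch[present[0]].append(coverage[km][present[0]])
        if all(len(counts) == k for counts in branch.values()):
            means = {c: sum(counts) / k for c, counts in branch.items()}
            if min(means.values()) >= min_cov:
                calls.append((cycle, means))
    return calls


if __name__ == "__main__":
    k = 3
    reads = {"strainA": ["TTGACCAGT"] * 2, "strainB": ["TTGATCAGT"] * 3}
    coverage, adj = build_index(reads, k)
    cycles = find_cycles(adj, length=2 * k + 2, workers=2)
    for cycle, means in call_snps(cycles, coverage, ("strainA", "strainB"), k):
        print(len(cycle), means)  # 8 {'strainA': 2.0, 'strainB': 3.0}
```

Restricting each DFS to nodes larger than its seed keeps every search local and makes the seed chunks independent, which is what allows them to be handed to different workers without coordination; merging the per-worker results is then a simple concatenation.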
