Scalable De Novo Genome Assembly Using Pregel

01/13/2018
by Da Yan, et al.

De novo genome assembly is the process of stitching short DNA sequences together to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of new genomes. In this paper, we present a toolkit, called PPA-assembler, for de novo genome assembly in a distributed setting. The operations in our toolkit provide strong performance guarantees, and can be combined to implement various sequencing strategies. PPA-assembler adopts the popular de Bruijn graph based approach for sequencing, and each operation is implemented as a program in Google's Pregel framework for big graph processing. Experiments on large real and simulated datasets demonstrate that PPA-assembler is much more efficient than state-of-the-art assemblers and provides good sequencing quality.
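
To make the de Bruijn graph idea concrete, here is a minimal single-machine sketch in Python. It is not PPA-assembler's code and does not use Pregel; the function names `build_de_bruijn` and `unambiguous_contigs` are hypothetical, and the sketch assumes error-free reads and ignores reverse complements and coverage. It shows only the core step: split reads into k-mers, link overlapping (k-1)-mers, and merge each unbranched path into a contig.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer becomes an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Nodes are (k-1)-mers; duplicate k-mers collapse into one edge.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unambiguous_contigs(graph):
    """Merge every maximal unbranched path into one contig string.

    Simplification: cycles made only of unbranched nodes are skipped.
    """
    indeg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)

    def is_internal(node):
        # Exactly one way in and one way out: safe to merge through.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for u in nodes:
        if is_internal(u):
            continue  # paths start only at branching or end nodes
        for v in graph.get(u, ()):
            contig = u + v[-1]
            while is_internal(v):
                v = next(iter(graph[v]))
                contig += v[-1]
            contigs.append(contig)
    return sorted(contigs)


reads = ["ACGTT", "GTTCA"]  # two overlapping toy reads
print(unambiguous_contigs(build_de_bruijn(reads, k=4)))  # ['ACGTTCA']
```

In a vertex-centric framework such as Pregel, the same path merging would be expressed as messages exchanged between neighboring vertices over supersteps rather than as the sequential loop used here.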


