diBELLA: Distributed Long Read to Long Read Alignment

01/27/2020
by Marquita Ellis, et al.

We present a parallel algorithm and scalable implementation for genome analysis, specifically the problem of finding overlaps and alignments for data from "third generation" long read sequencers. While long sequences of DNA offer enormous advantages for biological analysis and insight, current long read sequencing instruments have high error rates and therefore require different approaches to analysis than their short read counterparts. Our work focuses on an efficient distributed-memory parallelization of an accurate single-node algorithm for overlapping and aligning long reads. We achieve scalability of this irregular algorithm by addressing the competing issues of increasing parallelism, minimizing communication, constraining the memory footprint, and ensuring good load balance. The resulting application, diBELLA, is the first distributed-memory overlapper and aligner specifically designed for long reads and parallel scalability. We describe and analyze high-level design trade-offs and conduct an extensive empirical analysis that compares performance characteristics across state-of-the-art HPC systems as well as a commercial cloud architecture, highlighting the advantages of state-of-the-art network technologies.
