GPU Scheduler for De Novo Genome Assembly with Multiple MPI Processes

09/13/2023
by Minhao Li, et al.

De novo genome assembly is one of the most important tasks in computational biology. ELBA is the state-of-the-art distributed-memory parallel algorithm for the overlap detection and layout simplification steps of de novo genome assembly, but pairwise alignment remains a performance bottleneck. In this work, we introduce three GPU schedulers for ELBA to accommodate multiple MPI processes and multiple GPUs. The GPU schedulers enable multiple MPI processes to perform computation on GPUs in a round-robin fashion. Both strong and weak scaling experiments show that all three schedulers significantly improve on the baseline's performance, although there is a trade-off between parallelism and GPU scheduler overhead. In the best-performing implementation, the one-to-one scheduler achieves a ∼7-8× speed-up using 25 MPI processes compared with the baseline vanilla ELBA GPU scheduler.
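To make the round-robin idea concrete, the following is a minimal sketch of how MPI ranks might be mapped onto a smaller pool of GPUs by cycling through the device indices. It is not the ELBA implementation or any of the three schedulers from the paper; the function names `assign_gpu` and `group_ranks_by_gpu` are hypothetical, and the GPU count of 4 in the usage note is an assumption chosen only for illustration (the 25 processes match the abstract).

```python
from collections import defaultdict


def assign_gpu(rank: int, num_gpus: int) -> int:
    """Map an MPI rank to a GPU index by cycling through the GPUs.

    Hypothetical helper: a plain modulo mapping, assumed here as the
    simplest form of round-robin assignment.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return rank % num_gpus


def group_ranks_by_gpu(num_ranks: int, num_gpus: int) -> dict:
    """Return {gpu_index: [ranks sharing that GPU]} under the modulo mapping."""
    groups = defaultdict(list)
    for rank in range(num_ranks):
        groups[assign_gpu(rank, num_gpus)].append(rank)
    return dict(groups)
```

For example, `group_ranks_by_gpu(25, 4)` places ranks 0, 4, 8, ..., 24 on GPU 0 and six ranks on each of the other three GPUs, so the ranks sharing a device would have to take turns on it. In an actual MPI program the rank would come from the MPI runtime (for instance `MPI.COMM_WORLD.Get_rank()` in mpi4py) rather than a loop.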

