GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

09/18/2022
by Haiyu Mao, et al.

Nanopore sequencing is a widely used, high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Accurate downstream genome analysis of nanopore data requires two computationally costly processing steps. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that executing these two most time-consuming steps separately inherently leads to (1) significant data movement and (2) redundant computations on the data, both of which slow down the genome analysis pipeline. This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory, fine-grained collaborative execution of the major genome analysis steps in parallel; and (2) a new early-rejection technique that stops the analysis of low-quality and unmapped reads as soon as they are identified, avoiding wasted computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6× (8.4×) speedup and 32.8× (20.8×) energy savings with negligible accuracy loss compared to state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39× speedup and 1.37× energy savings.
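
The early-rejection idea can be illustrated in software with a minimal sketch. This is not GenPIP's in-memory hardware design or its actual rejection criteria: the function names (`process_read`, `toy_seed_hits`), the quality threshold, the number of chunks inspected before a decision, and the toy reference string are all assumptions made for illustration. The sketch processes a read chunk by chunk, passes each basecalled chunk straight to a seeding step, and abandons the read once its running quality is too low or it has found no seed hits.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

# Hypothetical toy reference; a real pipeline would use an indexed genome.
REFERENCE = "ACGTACGTTTGACCA"


def toy_seed_hits(seq: str, k: int = 4) -> int:
    """Count k-mers of seq that occur anywhere in the toy reference."""
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in REFERENCE)


def process_read(
    chunks: Iterable[Any],
    basecall: Callable[[Any], Tuple[str, float]],
    seed_hits: Callable[[str], int],
    min_quality: float = 7.0,   # assumed threshold, not taken from the paper
    check_after: int = 2,       # assumed number of chunks seen before deciding
) -> Optional[str]:
    """Basecall and seed a read chunk by chunk; return None if rejected early."""
    bases = []
    quality_sum = 0.0
    hits = 0
    for i, chunk in enumerate(chunks, start=1):
        seq, quality = basecall(chunk)   # basecalling step for this chunk
        bases.append(seq)
        quality_sum += quality
        hits += seed_hits(seq)           # mapping work starts on the same chunk
        if i >= check_after:
            if quality_sum / i < min_quality:
                return None              # low-quality read: stop early
            if hits == 0:
                return None              # no seed hits so far: treat as unmapped
    return "".join(bases)
```

Because the checks run inside the per-chunk loop, a rejected read never pays for basecalling its remaining chunks, which is the kind of wasted computation the abstract refers to.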


