GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping

04/06/2016
by Mohammed Alser, et al.

Motivation: High-throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments, called short reads, that impose a significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and "candidate" locations in that reference genome. The similarity measurement, called alignment, is formulated as an approximate string matching problem and is the computational bottleneck because (1) it is implemented using quadratic-time dynamic programming algorithms, and (2) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment operations.

Results: We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. GateKeeper can be integrated with any mapper that performs sequence alignment for verification. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing up to 105-fold and 215-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively.

Availability: GateKeeper is available at https://github.com/BilkentCompGen/GateKeeper.
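To make the filter-then-verify idea concrete, the following is a minimal software sketch in Python. It is not GateKeeper's hardware design, nor the exact Adjacency Filter or SHD algorithm. It is a simplified shifted-comparison filter written under our own assumptions, and the names `passes_prefilter`, `edit_distance`, and `map_read` are hypothetical. The filter counts read positions that match the candidate at no shift within the edit threshold. If more than `max_edits` such positions exist, the pair cannot align within that threshold, so the quadratic-time verification is skipped.

```python
from typing import List


def passes_prefilter(read: str, ref: str, max_edits: int) -> bool:
    """Cheap check: return False only if read/ref cannot align within max_edits.

    Simplified illustration (an assumption of this sketch, not the paper's
    design): a read position is "explained" if it equals the reference base
    at some shift in [-max_edits, +max_edits].
    """
    unexplained = 0
    for i, base in enumerate(read):
        explained = False
        for shift in range(-max_edits, max_edits + 1):
            j = i + shift
            # Out-of-range positions are treated as matches, so the filter
            # stays conservative and never rejects a pair that could align.
            if j < 0 or j >= len(ref) or base == ref[j]:
                explained = True
                break
        if not explained:
            unexplained += 1
            if unexplained > max_edits:
                return False  # too many positions that no shift can explain
    return True


def edit_distance(a: str, b: str) -> int:
    """Quadratic-time dynamic programming (Levenshtein) verification step."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def map_read(read: str, candidates: List[str], max_edits: int) -> List[str]:
    """Filter candidates first; run the costly alignment only on survivors."""
    hits = []
    for ref in candidates:
        if not passes_prefilter(read, ref, max_edits):
            continue  # rejected cheaply, no dynamic programming needed
        if edit_distance(read, ref) <= max_edits:
            hits.append(ref)
    return hits


if __name__ == "__main__":
    candidates = ["ACGTACGT", "TTTTTTTT", "ACGAACGT"]
    print(map_read("ACGTACGT", candidates, max_edits=1))
```

In this sketch the filter can let some incorrect candidates through (false accepts), which the verification step then rejects, but it does not discard a candidate that is within the edit threshold. That is the trade-off the abstract describes: a fast, slightly permissive filter placed in front of an exact but expensive aligner.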

