SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping

by Damla Senol Cali, et al.

A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in a population. Mapping reads to the graph-based reference genome (i.e., sequence-to-graph mapping) results in notable quality improvements in genome analysis. Unfortunately, while sequence-to-sequence mapping is well studied with many available tools and accelerators, sequence-to-graph mapping is a more difficult computational problem, with a much smaller number of practical software tools currently available. We analyze two state-of-the-art sequence-to-graph mapping tools and reveal four key issues. We find that there is a pressing need to have a specialized, high-performance, scalable, and low-cost algorithm/hardware co-design that alleviates bottlenecks in both the seeding and alignment steps of sequence-to-graph mapping. To this end, we propose SeGraM, a universal algorithm/hardware co-designed genomic mapping accelerator that can effectively and efficiently support both sequence-to-graph mapping and sequence-to-sequence mapping, for both short and long reads. To our knowledge, SeGraM is the first algorithm/hardware co-design for accelerating sequence-to-graph mapping. SeGraM consists of two main components: (1) MinSeed, the first minimizer-based seeding accelerator; and (2) BitAlign, the first bitvector-based sequence-to-graph alignment accelerator. We demonstrate that SeGraM provides significant improvements for multiple steps of the sequence-to-graph and sequence-to-sequence mapping pipelines.
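To make the idea of minimizer-based seeding concrete, the following is a minimal software sketch of the general technique, not SeGraM's MinSeed hardware design. It assumes a linear reference string (not a genome graph), plain lexicographic ordering of k-mers, and illustrative parameters k and w; the function names `minimizers`, `build_index`, and `seed_hits` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs selected as minimizers.

    For every window of w consecutive k-mers, the lexicographically
    smallest k-mer is kept (leftmost one on ties). Ordering by plain
    string comparison is an assumption made for illustration.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal element, so ties go leftmost.
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return sorted(selected)


def build_index(reference, k, w):
    """Map each minimizer k-mer of the reference to its positions."""
    index = defaultdict(list)
    for pos, kmer in minimizers(reference, k, w):
        index[kmer].append(pos)
    return index


def seed_hits(read, index, k, w):
    """Return (read_pos, ref_pos) pairs where a read minimizer
    exactly matches a reference minimizer (candidate seed locations)."""
    hits = []
    for read_pos, kmer in minimizers(read, k, w):
        for ref_pos in index.get(kmer, []):
            hits.append((read_pos, ref_pos))
    return hits


if __name__ == "__main__":
    ref = "ACGTACGT"
    idx = build_index(ref, k=3, w=2)
    print(minimizers(ref, 3, 2))
    print(seed_hits("TACG", idx, k=3, w=2))
```

In a mapping pipeline, each seed hit would then be passed to an alignment step that decides whether the read actually matches the reference near that location. Storing only minimizers rather than every k-mer keeps the index smaller, which is the usual motivation for this sampling scheme.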




