A Memory-Efficient FM-Index Constructor for Next-Generation Sequencing Applications on FPGAs

by Nae-Chyun Chen, et al.
National Taiwan University

The FM-index is an efficient data structure for string search and is widely used in next-generation sequencing (NGS) applications such as sequence alignment and de novo assembly. Recently, FM-indexing has even been performed down to the read level, raising demand for an efficient FM-index construction algorithm. In this work, we propose a hardware-compatible Self-Aided Incremental Indexing (SAII) algorithm and its hardware architecture. This novel algorithm builds the FM-index with no memory overhead, and the hardware system that realizes it can be very compact. A parallel architecture and a special prefetch controller are designed to enhance computational efficiency. An SAII-based FM-index constructor is implemented on an Altera Stratix V FPGA board. The presented constructor supports DNA sequences of up to 131,072 bp, which is enough for small-scale references and reads obtained from current major platforms. Because the proposed constructor needs very few hardware resources, it can be easily integrated into different hardware accelerators designed for FM-index-based applications.
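For readers unfamiliar with the data structure, the following is a minimal Python sketch of what an FM-index contains and how it answers a string-search query. It is not the SAII algorithm described in the abstract: it builds the index by naively sorting all suffixes, which does incur the kind of memory overhead SAII is designed to avoid. The function names `build_fm_index` and `count_occurrences` and the `$` sentinel are assumptions chosen for illustration.

```python
from collections import Counter


def build_fm_index(text, sentinel="$"):
    """Naive FM-index builder (illustrative only, not SAII)."""
    s = text + sentinel
    # Suffix array by direct sorting; quadratic-ish cost, fine for a toy input.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps for suffix 0).
    bwt = "".join(s[i - 1] for i in sa)
    # c_table[ch] = number of characters in s strictly smaller than ch.
    counts = Counter(s)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][k] = number of occurrences of ch in bwt[:k].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, b in enumerate(bwt):
        for ch in occ:
            occ[ch][k + 1] = occ[ch][k] + (1 if ch == b else 0)
    return bwt, c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    """Backward search: count occurrences of pattern in the indexed text."""
    lo, hi = 0, n  # half-open interval over the sorted suffixes
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


bwt, c_table, occ = build_fm_index("ACGTACGT")
print(bwt)
print(count_occurrences("ACG", c_table, occ, len(bwt)))
```

The query cost depends only on the pattern length, which is why the structure suits read alignment; the expensive part is construction, which is the step the proposed hardware constructor targets.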



