A Memory-Efficient FM-Index Constructor for Next-Generation Sequencing Applications on FPGAs

02/05/2021
by   Nae-Chyun Chen, et al.
0

FM-index is an efficient data structure for string search and is widely used in next-generation sequencing (NGS) applications such as sequence alignment and de novo assembly. Recently, FM-indexing is even performed down to the read level, raising a demand of an efficient algorithm for FM-index construction. In this work, we propose a hardware-compatible Self-Aided Incremental Indexing (SAII) algorithm and its hardware architecture. This novel algorithm builds FM-index with no memory overhead, and the hardware system for realizing the algorithm can be very compact. Parallel architecture and a special prefetch controller is designed to enhance computational efficiency. An SAII-based FM-index constructor is implemented on an Altera Stratix V FPGA board. The presented constructor can support DNA sequences of sizes up to 131,072-bp, which is enough for small-scale references and reads obtained from current major platforms. Because the proposed constructor needs very few hardware resource, it can be easily integrated into different hardware accelerators designed for FM-index-based applications.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/16/2018

Efficient Construction of a Complete Index for Pan-Genomics Read Alignment

While short read aligners, which predominantly use the FM-index, are abl...
research
11/29/2021

Bounding the Last Mile: Efficient Learned String Indexing

We introduce the RadixStringSpline (RSS) learned index structure for eff...
research
01/13/2021

EXMA: A Genomics Accelerator for Exact-Matching

Genomics is the foundation of precision medicine, global food security a...
research
03/13/2018

An FPGA-Based Hardware Accelerator for Energy-Efficient Bitmap Index Creation

Bitmap index is recognized as a promising candidate for online analytics...
research
05/18/2020

A Two-level Spatial In-Memory Index

Very large volumes of spatial data increasingly become available and dem...
research
08/06/2023

Nucleotide String Indexing using Range Matching

The two most common data-structures for genome indexing, FM-indices and ...
research
10/10/2019

E2FM: an encrypted and compressed full-text index for collections of genomic sequences

Next Generation Sequencing (NGS) platforms and, more generally, high-thr...

Please sign up or login with your details

Forgot password? Click here to reset