Vectorized Character Counting for Faster Pattern Matching

11/15/2018
by Roman Snytsar, et al.

Many modern sequence alignment tools implement fast string matching using a space-efficient data structure called the FM-index. The succinct nature of this data structure presents unique challenges for algorithm designers. In this paper, we explore opportunities for parallelizing exact and inexact matching and present an efficient SIMD solution for the Occ portion of the algorithm. Our implementation computes all eight Occ values required for a step of the inexact match algorithm in a single pass. We showcase the algorithm's performance in a multi-core genome aligner and discuss the effects of memory prefetching.
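
To make the role of Occ concrete, here is a minimal scalar sketch in Python of FM-index backward search over the DNA alphabet. It is not the authors' SIMD implementation: the function names (`bwt`, `occ4`, `occ8`, `count_exact`), the naive rotation-sort construction, and the unsampled prefix scan are all assumptions made for illustration. The sketch also assumes that the "eight Occ values" are the counts of the four nucleotides at the two boundaries of the current suffix-array interval, which is one plausible reading of the abstract.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet; '$' is the end marker


def bwt(text):
    """Naive Burrows-Wheeler transform via sorted rotations (illustration only)."""
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def occ4(bwt_str, i):
    """Occ for all four symbols: counts of A, C, G, T in bwt_str[:i]."""
    counts = Counter(bwt_str[:i])
    return tuple(counts[c] for c in ALPHABET)


def occ8(bwt_str, lo, hi):
    """Four counts at each interval boundary, eight values in total."""
    return occ4(bwt_str, lo) + occ4(bwt_str, hi)


def count_exact(text, pattern):
    """Count occurrences of pattern in text by backward search."""
    b = bwt(text)
    totals = Counter(b)
    # C[c]: number of symbols (including '$') that sort before c
    C, running = {}, totals["$"]
    for c in ALPHABET:
        C[c] = running
        running += totals[c]
    lo, hi = 0, len(b)  # half-open interval of matching suffixes
    for c in reversed(pattern):
        idx = ALPHABET.index(c)
        lo = C[c] + occ4(b, lo)[idx]
        hi = C[c] + occ4(b, hi)[idx]
        if lo >= hi:
            return 0
    return hi - lo
```

An exact match needs only the Occ entry for the current pattern symbol, whereas an inexact match step may branch on every symbol, which is why obtaining all counts for both boundaries at once is attractive. A production index would store sampled Occ checkpoints rather than rescanning the prefix on every call.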

research
05/22/2019

Cartesian Tree Matching and Indexing

We introduce a new metric of match, called Cartesian tree matching, whic...
research
11/08/2020

Scout Algorithm For Fast Substring Matching

Exact substring matching is a common task in many software applications....
research
01/23/2023

Sliding Window String Indexing in Streams

Given a string S over an alphabet Σ, the 'string indexing problem' is to...
research
05/22/2018

copMEM: Finding maximal exact matches via sampling both genomes

Genome-to-genome comparisons require designating anchor points, which ar...
research
03/17/2020

An Efficient Implementation of Manacher's Algorithm

Manacher's algorithm has been shown to be optimal to the longest palindr...
research
07/19/2018

The colored longest common prefix array computed via sequential scans

Due to the increased availability of large datasets of biological sequen...
research
04/06/2020

SOPanG 2: online searching over a pan-genome without false positives

The pan-genome can be stored as elastic-degenerate (ED) string, a recent...
