conLSH: Context based Locality Sensitive Hashing for Mapping of noisy SMRT Reads

03/11/2019
by   Angana Chakraborty, et al.

Single Molecule Real-Time (SMRT) sequencing is a recent advance in next-generation sequencing technology developed by Pacific Biosciences (PacBio). It produces large volumes of long, noisy reads, and getting the most out of them demands new research. To deal with the high error probability of SMRT data, this article proposes a novel algorithm based on contextual Locality Sensitive Hashing (conLSH) that can effectively align noisy SMRT reads to the reference genome. Here, sequences are hashed together based not only on their closeness but also on the similarity of their context. The algorithm has a space requirement of O(n^{ρ+1}), where n is the number of sequences in the corpus and ρ is a constant. The indexing time and querying time are bounded by O(n^{ρ+1} · ln n / ln(1/P_2)) and O(n^ρ) respectively, where P_2 > 0 is a probability value. The algorithm is particularly useful for retrieving similar sequences, a widely used task in biology. The proposed conLSH-based aligner is compared with rHAT, a popular aligner for SMRT reads, and is found to outperform it in both speed and memory requirements. In particular, it takes approximately 24.2% less processing time while saving about 70.3% in peak memory requirement on the H. sapiens PacBio dataset.
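
The idea of hashing sequences on both closeness and context can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes equal-length sequences, and the function names and the parameters `k` (positions sampled per hash), `context` (symbols taken on each side of a sampled position), and `seed` are hypothetical choices made for illustration. Each hash function samples a few positions and concatenates the window around every one of them, so two sequences share a bucket only if they agree on the sampled symbols and on their neighbours.

```python
import random
from collections import defaultdict


def make_context_hash(seq_len, k, context, seed):
    """Return a hash function that samples k positions and reads the
    window of `context` symbols on each side of every sampled position."""
    rng = random.Random(seed)
    # Keep positions far enough from the ends for a full window.
    positions = sorted(rng.sample(range(context, seq_len - context), k))

    def h(seq):
        return "".join(seq[p - context:p + context + 1] for p in positions)

    return h


def build_index(seqs, hashes):
    """Build one hash table per hash function, mapping key -> sequence ids."""
    tables = [defaultdict(list) for _ in hashes]
    for idx, seq in enumerate(seqs):
        for table, h in zip(tables, hashes):
            table[h(seq)].append(idx)
    return tables


def query(q, tables, hashes):
    """Collect ids of sequences that share a bucket with q in any table."""
    candidates = set()
    for table, h in zip(tables, hashes):
        candidates.update(table.get(h(q), []))
    return candidates
```

In this sketch, several independent hash functions (different seeds) play the role of the multiple tables in a standard LSH scheme; a real aligner would then verify each candidate with an alignment step.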


