HQAlign: Aligning nanopore reads for SV detection using current-level modeling

01/10/2023
by Dhaivat Joshi, et al.

Motivation: Detecting structural variants (SVs) from the alignment of sample DNA reads to a reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, together with accurate alignment of those reads, play an important role in identifying novel SVs. Long-read technologies such as nanopore sequencing address this problem by providing very long reads, but their high error rates make accurate alignment challenging. Many of the errors induced by nanopore sequencing are biased because of the physics of the sequencing process, and proper use of these error characteristics can play an important role in designing a robust aligner for SV detection. In this paper, we design and evaluate HQAlign, an aligner for SV detection using nanopore-sequenced reads. The key ideas of HQAlign include (i) using basecalled nanopore reads along with the nanopore physics to improve alignments for SVs, (ii) incorporating SV-specific changes into the alignment pipeline, and (iii) adapting these into an existing state-of-the-art long-read aligner pipeline, minimap2 (v2.24), for efficient alignment.

Results: We show that HQAlign captures about 4 [...] different datasets which are missed by minimap2 alignments, while having standalone performance on par with minimap2 for real nanopore read data. For the SV calls common to HQAlign and minimap2, HQAlign improves the start and end breakpoint accuracy for about 10 [...] datasets. Moreover, HQAlign improves the alignment rate to 89.35% [...] 85.64% [...] assembly, and it improves to 86.65% [...] GRCh37 human genome.

research
02/12/2019

Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm

A large proportion of the basepairs in the long reads that third-generat...
research
02/05/2020

FPGA Acceleration of Sequence Alignment: A Survey

Genomics is changing our understanding of humans, evolution, diseases, a...
research
12/19/2021

Lerna: Transformer Architectures for Configuring Error Correction Tools for Short- and Long-Read Genome Sequencing

Sequencing technologies are prone to errors, making error correction (EC...
research
01/27/2020

diBELLA: Distributed Long Read to Long Read Alignment

We present a parallel algorithm and scalable implementation for genome a...
research
04/30/2018

FPGA Acceleration of Short Read Alignment

Aligning millions of short DNA or RNA reads, of 75 to 250 base pairs eac...
research
05/10/2019

Alignment- and reference-free phylogenomics with colored de-Bruijn graphs

We present a new whole-genome based approach to infer large-scale phylog...
research
04/05/2016

Designing robust watermark barcodes for multiplex long-read sequencing

A method for designing sequencing barcodes that can withstand a large nu...
