ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly

11/27/2019
by Abishek Sankararaman, et al.

Background: Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual's susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing the lengths of sequencing reads and insert sizes helps improve the accuracy of reconstruction, it also exacerbates the computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data.

Results: We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then one uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet, a novel assembly algorithm for diploid and polyploid haplotypes that allows both biallelic and multi-allelic variants.

Conclusions: The performance of the proposed algorithm is benchmarked on simulated data as well as on experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum tuberosum (potato). The results demonstrate the efficacy of the proposed method and show that it compares favorably with existing techniques.
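
The two-step view in the Results (label the reads first, then assemble the haplotypes from the labels) can be illustrated with a toy Python sketch. This is not the ComHapDet algorithm. The function names, the edge weight (agreements minus disagreements on shared SNP positions), and the greedy seeding and refinement used for community detection are all assumptions made for illustration, standing in for the spatial community detection method the paper proposes. Each read is modeled as a dict mapping SNP position to observed allele.

```python
from collections import Counter
from itertools import combinations


def build_read_graph(reads):
    """Return {(i, j): weight} for each pair of reads sharing SNP positions.

    Assumed weight: number of agreeing alleles minus number of disagreeing
    alleles on the shared positions (an illustrative choice, not the paper's).
    """
    graph = {}
    for i, j in combinations(range(len(reads)), 2):
        shared = reads[i].keys() & reads[j].keys()
        if shared:
            graph[(i, j)] = sum(
                1 if reads[i][p] == reads[j][p] else -1 for p in shared
            )
    return graph


def detect_communities(n_reads, graph, k, n_sweeps=10):
    """Step 1: estimate a community label in range(k) for every read.

    Greedy stand-in for a real community detection method: place reads one
    at a time, then let each read move to the community it agrees with most.
    """
    neighbors = {i: [] for i in range(n_reads)}
    for (i, j), w in graph.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    labels = [None] * n_reads

    def scores(i):
        s = [0] * k
        for j, w in neighbors[i]:
            if labels[j] is not None:
                s[labels[j]] += w
        return s

    # Seeding: open a new community when a read fits no existing one.
    used = 0
    for i in range(n_reads):
        s = scores(i)
        best = max(range(used), key=lambda c: s[c], default=None)
        if best is None or (s[best] <= 0 and used < k):
            labels[i] = used
            used += 1
        else:
            labels[i] = best

    # Refinement: repeat local moves until nothing changes (ties keep the label).
    for _ in range(n_sweeps):
        changed = False
        for i in range(n_reads):
            s = scores(i)
            best = max(range(k), key=lambda c: (s[c], c == labels[i]))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


def assemble_haplotypes(reads, labels, k, n_snps):
    """Step 2: per community, take the majority allele at each SNP position."""
    votes = [[Counter() for _ in range(n_snps)] for _ in range(k)]
    for read, label in zip(reads, labels):
        for pos, allele in read.items():
            votes[label][pos][allele] += 1
    # Positions not covered by any read in a community are reported as None.
    return [
        [c.most_common(1)[0][0] if c else None for c in hap] for hap in votes
    ]
```

As a usage example, take seven toy reads drawn from a diploid with haplotypes 0101 and 1010, where the last read carries one sequencing error: `[{0: 0, 1: 1}, {1: 1, 2: 0}, {0: 1, 1: 0}, {1: 0, 2: 1}, {2: 0, 3: 1}, {2: 1, 3: 0}, {1: 0, 2: 0, 3: 1}]`. Running the three functions in order with `k=2` and `n_snps=4` groups the reads by their source haplotype, and the majority vote in the second step outvotes the erroneous allele. The greedy seeding assumes reads arrive roughly in positional order so that each new read overlaps earlier ones; it carries no guarantee on real data.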


