Graph Coloring via Neural Networks for Haplotype Assembly and Viral Quasispecies Reconstruction

10/21/2022
by Hansheng Xue, et al.

Understanding genetic variation in organisms, e.g., through mutations, is crucial to unraveling its effects on the environment and human health. A fundamental characterization can be obtained by solving the haplotype assembly problem, which yields the variation across multiple copies of chromosomes. Variations among fast-evolving viruses that lead to different strains (called quasispecies) are deciphered with similar approaches. In both cases, high-throughput sequencing technologies, which provide oversampled mixtures of large noisy fragments (reads) of genomes, are used to infer the constituent components (haplotypes or quasispecies). The problem is harder for polyploid species, which have more than two copies of each chromosome. State-of-the-art neural approaches to this NP-hard problem do not adequately model the relations among reads that are important for deconvolving the input signal. We address this limitation by developing a new method, called NeurHap, that combines graph representation learning with combinatorial optimization. Our experiments demonstrate substantially better performance of NeurHap on real and synthetic datasets compared to competing approaches.
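To make the graph-coloring view named in the title concrete, here is a minimal Python sketch. It is not NeurHap and does not reflect the authors' implementation; it is a plain greedy heuristic on a toy, noise-free input. The read encoding (strings over SNP positions with `-` for uncovered positions) and the helper names `conflicts`, `greedy_color`, and `consensus` are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def conflicts(r1, r2):
    """Return True if two reads disagree at a position that both cover.

    Assumption: reads are equal-length strings; '-' marks an uncovered position.
    """
    return any(a != b and a != "-" and b != "-" for a, b in zip(r1, r2))


def greedy_color(reads):
    """Color the read conflict graph greedily (a heuristic, not NeurHap).

    Reads that conflict get different colors; each color is one candidate
    haplotype (or quasispecies strain).
    """
    n = len(reads)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if conflicts(reads[i], reads[j]):
            adj[i].add(j)
            adj[j].add(i)

    colors = {}
    # Visit high-degree reads first; ties keep the input order (stable sort).
    for i in sorted(range(n), key=lambda v: -len(adj[v])):
        used = {colors[j] for j in adj[i] if j in colors}
        c = 0
        while c in used:
            c += 1
        colors[i] = c
    return [colors[i] for i in range(n)]


def consensus(reads, colors):
    """Majority vote per position within each color class."""
    groups = {}
    for read, c in zip(reads, colors):
        groups.setdefault(c, []).append(read)

    haplotypes = {}
    for c, members in groups.items():
        chars = []
        for column in zip(*members):
            votes = Counter(ch for ch in column if ch != "-")
            chars.append(votes.most_common(1)[0][0] if votes else "-")
        haplotypes[c] = "".join(chars)
    return haplotypes


# Toy input: four error-free reads drawn from two hypothetical haplotypes.
reads = ["0101--", "--0100", "1010--", "--1011"]
colors = greedy_color(reads)
haplotypes = consensus(reads, colors)
```

On this toy input the two color classes recover the two underlying sequences. Real reads are noisy, so a hard conflict rule like this one would be too strict in practice; the sketch only illustrates why grouping reads can be posed as coloring a graph of read relations.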

research
12/22/2021

RepBin: Constraint-based Graph Representation Learning for Metagenomic Binning

Mixed communities of organisms are found in many environments (from the ...
research
11/27/2019

ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly

Background: Haplotypes, the ordered lists of single nucleotide variation...
research
11/13/2019

A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

Reconstructing components of a genomic mixture from data obtained by mea...
research
10/18/2022

Phase transition in the computational complexity of the shortest common superstring and genome assembly

Genome assembly, the process of reconstructing a long genetic sequence b...
research
11/25/2019

Orienting Ordered Scaffolds: Complexity and Algorithms

Despite the recent progress in genome sequencing and assembly, many of t...
research
12/13/2021

ViQUF: de novo Viral Quasispecies reconstruction using Unitig-based Flow networks

During viral infection, intrahost mutation and recombination can lead to...
