A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

11/13/2019
by Ziqi Ke, et al.

Reconstructing components of a genomic mixture from data obtained by means of DNA sequencing is a challenging problem encountered in a variety of applications, including single individual haplotyping and studies of viral communities. High-throughput DNA sequencing platforms oversample mixture components to provide massive amounts of reads whose relative positions can be determined by mapping the reads to a known reference genome; assembly of the components, however, requires discovering the origin of each read, an NP-hard problem that existing methods struggle to solve with the required level of accuracy. In this paper, we present a learning framework based on a graph auto-encoder designed to exploit structural properties of sequencing data. The algorithm is a neural network that essentially learns to ignore sequencing errors and infers the posterior probabilities of the origin of sequencing reads. Mixture components are then reconstructed by finding the consensus of the reads determined to originate from the same genomic component. Results on realistic synthetic as well as experimental data demonstrate that the proposed framework reliably assembles haplotypes and reconstructs viral communities, often significantly outperforming state-of-the-art techniques.
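
The final consensus step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `consensus`, the gap marker `-`, and the toy reads are assumptions made for illustration, and the hard read-to-component labels stand in for whatever the network's inferred origin probabilities would yield (for example, by taking the most probable component per read).

```python
from collections import Counter

GAP = "-"  # assumed marker for reference positions a read does not cover


def consensus(reads, assignment, num_components):
    """Rebuild each component by per-position majority vote.

    reads: equal-length strings aligned to the reference (GAP where uncovered).
    assignment: assignment[i] is the component index assumed for reads[i].
    """
    length = len(reads[0])
    components = []
    for k in range(num_components):
        members = [r for r, a in zip(reads, assignment) if a == k]
        bases = []
        for j in range(length):
            # Count only the reads that actually cover position j.
            votes = Counter(r[j] for r in members if r[j] != GAP)
            bases.append(votes.most_common(1)[0][0] if votes else GAP)
        components.append("".join(bases))
    return components


# Toy data: two components; the third read carries one sequencing error.
reads = ["ACG-", "-CGT", "AGG-", "TGC-", "-GCA", "TG--"]
assignment = [0, 0, 0, 1, 1, 1]  # hypothetical labels, one per read
haplotypes = consensus(reads, assignment, 2)
print(haplotypes)
```

Because each position is decided by a vote among the reads that cover it, an isolated sequencing error is outvoted as long as the reads are grouped correctly, which is why the quality of the read-origin inference drives the quality of the reconstruction.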

research
11/16/2020

Real-Time Radio Technology and Modulation Classification via an LSTM Auto-Encoder

Identification of the type of communication technology and/or modulation...
research
01/13/2018

Scalable De Novo Genome Assembly Using Pregel

De novo genome assembly is the process of stitching short DNA sequences ...
research
06/01/2022

Learning to Untangle Genome Assembly with Graph Convolutional Networks

A quest to determine the complete sequence of a human DNA from telomere ...
research
10/21/2022

Graph Coloring via Neural Networks for Haplotype Assembly and Viral Quasispecies Reconstruction

Understanding genetic variation, e.g., through mutations, in organisms i...
research
06/13/2018

Matrix Completion and Performance Guarantees for Single Individual Haplotyping

Single individual haplotyping is an NP-hard problem that emerges when at...
research
11/27/2019

ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly

Background: Haplotypes, the ordered lists of single nucleotide variation...