Learning to Untangle Genome Assembly with Graph Convolutional Networks

06/01/2022
by Lovro Vrček, et al.

The quest to determine the complete sequence of human DNA from telomere to telomere started three decades ago and was finally completed in 2021. This accomplishment was the result of a tremendous effort by numerous experts who engineered various tools and performed laborious manual inspection to achieve the first gapless genome sequence. However, such a method can hardly serve as a general approach to assembling different genomes, especially when assembly speed is critical given the large amount of data. In this work, we explore a different approach to the central part of the genome assembly task: untangling a large assembly graph from which a genomic sequence needs to be reconstructed. Our main motivation is to reduce reliance on human-engineered heuristics and to use deep learning to develop more generalizable reconstruction techniques. Specifically, we introduce a new learning framework that trains a graph convolutional network to resolve assembly graphs by finding a correct path through them. Training is supervised with a dataset generated from the resolved CHM13 human sequence, and the model is tested on assembly graphs built from real human PacBio HiFi reads. Experimental results show that a model trained on simulated graphs generated solely from a single chromosome resolves all other chromosomes remarkably well. Moreover, the model outperforms hand-crafted heuristics from a state-of-the-art de novo assembler on the same graphs. Chromosomes reconstructed with graph networks are more accurate at the nucleotide level, consist of fewer contigs, and achieve a higher reconstructed genome fraction and higher NG50/NGA50 assessment metrics.
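NG50 is one of the assessment metrics cited above. It is the length L such that contigs of length L or longer together cover at least half of the reference genome size. NGA50 applies the same procedure to blocks of contigs aligned to the reference instead of to raw contigs. The following is a minimal sketch of the NG50 computation only; the function name `ng50` and the toy contig lengths are illustrative assumptions, not taken from the paper.

```python
from typing import Iterable


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    """Return the NG50 of an assembly (illustrative helper, not from the paper).

    NG50 is the length L such that contigs of length >= L together cover
    at least half of the reference genome size. Returns 0 when the whole
    assembly covers less than half of the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        # Compare 2 * covered to avoid floating-point division.
        if 2 * covered >= genome_size:
            return length
    return 0


if __name__ == "__main__":
    contigs = [50, 30, 20, 10]  # hypothetical contig lengths
    print(ng50(contigs, 100))  # 50 + ... reaches 50 bases at the first contig
    print(ng50(contigs, 200))  # needs 50 + 30 + 20 = 100 bases
    print(ng50(contigs, 300))  # assembly covers under half the genome
```

Because NG50 divides by the reference genome size rather than the total assembly length, it does not reward an assembly for adding spurious sequence. This is one reason it is preferred over plain N50 when a reference such as CHM13 is available.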

research
11/10/2020

A step towards neural genome assembly

De novo genome assembly focuses on finding connections between a vast am...
research
01/13/2018

Scalable De Novo Genome Assembly Using Pregel

De novo genome assembly is the process of stitching short DNA sequences ...
research
10/20/2020

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

One of the most computationally intensive tasks in computational biology...
research
11/13/2019

A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

Reconstructing components of a genomic mixture from data obtained by mea...
research
11/25/2020

Genome assembly, a universal theoretical framework: unifying and generalizing the safe and complete algorithms

Genome assembly is a fundamental problem in Bioinformatics, requiring to...
research
10/03/2022

Sequential Brick Assembly with Efficient Constraint Satisfaction

We address the problem of generating a sequence of LEGO brick assembly w...
research
09/19/2018

Extreme Scale De Novo Metagenome Assembly

Metagenome assembly is the process of transforming a set of short, overl...