ViQUF: de novo Viral Quasispecies reconstruction using Unitig-based Flow networks

12/13/2021
by Borja Freire, et al.

During viral infection, intrahost mutation and recombination can drive significant evolution, producing a population of viruses that harbors multiple haplotypes. Reconstructing these haplotypes from short-read sequencing data is called viral quasispecies assembly, and it can be categorized as a multiassembly problem. We consider the de novo version of the problem, in which no reference is available. We present ViQUF, a de novo viral quasispecies assembler that addresses both haplotype assembly and quantification. ViQUF first obtains a draft assembly graph from a de Bruijn graph. Next, for each pair of adjacent vertices, it builds a flow network from their paired-end information and solves a min-cost flow over it. The result is an approximate paired assembly graph whose edges are labeled with suggested frequency values, which serve as a first frequency estimate. The original haplotypes are then obtained through a greedy path reconstruction guided by a min-cost flow solution on the approximate paired assembly graph. ViQUF outputs the contigs together with their frequency estimates. Results on real and simulated data show that ViQUF is at least four times faster and uses at most half the memory of previous methods, while matching, and in some cases surpassing, the high assembly and frequency-estimation quality of overlap graph-based methods, which are known to be more accurate but slower than de Bruijn graph-based approaches.
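The final step, turning a flow into haplotype paths with frequencies, can be illustrated with a generic greedy flow decomposition. The sketch below is an illustration under stated assumptions, not ViQUF's implementation: it takes a hypothetical dict of edge flows on a small graph (standing in for a min-cost flow solution on the approximate paired assembly graph), repeatedly extracts the source-to-sink path with the largest bottleneck flow (the "widest path"), records that bottleneck as the path's frequency, and subtracts it from every edge on the path. The function names `widest_path` and `greedy_decomposition`, the node labels, and the flow values are all hypothetical.

```python
import heapq
from collections import defaultdict


def widest_path(flow, source, sink):
    """Return (bottleneck, path) for the source-sink path whose smallest edge
    flow is largest. Uses a max-bottleneck variant of Dijkstra's algorithm
    and only follows edges with positive flow; returns (0, []) if the sink
    is unreachable."""
    adj = defaultdict(list)
    for (u, v), f in flow.items():
        if f > 0:
            adj[u].append((v, f))
    best = {source: float("inf")}  # best bottleneck found so far per node
    parent = {}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(u, 0):
            continue  # stale heap entry
        if u == sink:
            break  # sink popped with its final (maximal) bottleneck
        for v, f in adj[u]:
            w = min(width, f)
            if w > best.get(v, 0):
                best[v] = w
                parent[v] = u
                heapq.heappush(heap, (-w, v))
    if sink not in best:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


def greedy_decomposition(flow, source, sink, min_flow=1e-9):
    """Greedily peel widest paths off a copy of `flow` until no path with
    bottleneck above `min_flow` remains. Returns [(path, frequency), ...]."""
    residual = dict(flow)  # leave the caller's flow untouched
    paths = []
    while True:
        width, path = widest_path(residual, source, sink)
        if width <= min_flow:
            break
        for u, v in zip(path, path[1:]):
            residual[(u, v)] -= width
        paths.append((path, width))
    return paths


# Toy network: two hypothetical haplotypes share segments A and D.
flow = {("s", "A"): 10, ("A", "B"): 7, ("A", "C"): 3,
        ("B", "D"): 7, ("C", "D"): 3, ("D", "t"): 10}
print(greedy_decomposition(flow, "s", "t"))
# → [(['s', 'A', 'B', 'D', 't'], 7), (['s', 'A', 'C', 'D', 't'], 3)]
```

On this toy network the two peeled paths play the role of two haplotypes with abundances 7 and 3. Widest-path peeling is a common heuristic: it does not guarantee the minimum number of paths, and ViQUF's actual reconstruction may score and extend paths differently.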
