A Crowdsourced Gameplay for Whole-Genome Assembly via Short Reads

03/06/2022
by Dulani Meedeniya, et al.

Next-generation sequencing has revolutionized genomics by enabling accurate, rapid, and cost-effective genome analysis through high-throughput sequencing technologies. This has intensified the need for accurate, efficient genome assemblers that can assemble the large sets of short reads these technologies produce. Genome assembly is an NP-hard problem and therefore computationally challenging. As a result, current methods rely on heuristic and approximation algorithms, which prevents them from arriving at the most accurate solution. This paper presents a novel approach that gamifies whole-genome shotgun assembly from next-generation sequencing data: we present "Geno", a human-computing game designed to improve the accuracy of whole-genome shotgun assembly. We evaluate the feasibility of crowdsourcing whole-genome shotgun assembly by breaking the problem into small subtasks. For single-cell Escherichia coli K-12 substr. MG1655 with a read length of 25 bp at 40x coverage, this produced 144,867 game instances with a mean of 25 sequences per instance, which indicates that genome assembly can feasibly be divided into subtasks and solved through crowdsourcing.
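To give a sense of the scale behind these figures, the following minimal sketch works through the standard coverage relation (coverage = number of reads × read length / genome length) alongside the instance counts quoted above. The genome length used here is the commonly cited E. coli K-12 MG1655 reference length and is an assumption, since the abstract does not state it. The function name `reads_for_coverage` is hypothetical and is not taken from Geno.

```python
import math


def reads_for_coverage(genome_length_bp: int, read_length_bp: int, coverage: float) -> int:
    """Smallest number of reads whose total bases reach the target coverage."""
    return math.ceil(coverage * genome_length_bp / read_length_bp)


# Assumption: reference length of E. coli K-12 MG1655 (not stated in the abstract).
GENOME_LENGTH_BP = 4_641_652

# Figures quoted in the abstract.
READ_LENGTH_BP = 25
COVERAGE = 40
GAME_INSTANCES = 144_867
MEAN_SEQUENCES_PER_INSTANCE = 25

# Reads implied by the coverage relation under the assumed genome length.
reads = reads_for_coverage(GENOME_LENGTH_BP, READ_LENGTH_BP, COVERAGE)

# Approximate number of sequences shown to players across all game instances.
total_sequences = GAME_INSTANCES * MEAN_SEQUENCES_PER_INSTANCE

print(f"reads implied by {COVERAGE}x coverage: {reads:,}")
print(f"sequences across all game instances: {total_sequences:,}")
```

The two numbers need not match: the abstract does not say how reads are mapped to the sequences shown in a game instance, so this is only a back-of-the-envelope estimate of the problem size.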

