GapPredict: A Language Model for Resolving Gaps in Draft Genome Assemblies

05/21/2021
by Eric Chen, et al.

Short-read DNA sequencing instruments can yield over 1e+12 bases per run, typically composed of reads 150 bases long. Despite this high throughput, de novo assembly algorithms have difficulty reconstructing contiguous genome sequences from short reads because these genomes contain both repetitive and difficult-to-sequence regions. Some short-read assembly challenges are mitigated by scaffolding assembled sequences using paired-end reads. However, unresolved sequences in these scaffolds appear as "gaps". Here, we introduce GapPredict, a tool that uses a character-level language model to predict unresolved nucleotides in scaffold gaps. We benchmarked GapPredict against the state-of-the-art gap-filling tool Sealer, and observed that the former can fill 65.6% [...] the practical utility of deep learning approaches to the gap-filling problem in genome sequence assembly.

research
01/13/2018

Scalable De Novo Genome Assembly Using Pregel

De novo genome assembly is the process of stitching short DNA sequences ...
research
03/06/2022

A Crowdsourced Gameplay for Whole-Genome Assembly via Short Reads

Next-generation sequencing has revolutionized the field of genomics by p...
research
06/30/2018

Fast Characterization of Segmental Duplications in Genome Assemblies

Segmental duplications (SDs), or low-copy repeats (LCR), are segments of...
research
10/18/2022

Phase transition in the computational complexity of the shortest common superstring and genome assembly

Genome assembly, the process of reconstructing a long genetic sequence b...
research
11/15/2022

Taming Large-Scale Genomic Analyses via Sparsified Genomics

Searching for similar genomic sequences is an essential and fundamental ...
research
09/20/2013

mTim: Rapid and accurate transcript reconstruction from RNA-Seq data

Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology ...
