mTim: Rapid and accurate transcript reconstruction from RNA-Seq data

09/20/2013
by Georg Zeller, et al.

Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology have revolutionized transcriptome studies. A major motivation for RNA-Seq is to map the structure of expressed transcripts at nucleotide resolution. With accurate computational tools for transcript reconstruction, this technology may also become useful for genome (re-)annotation, which has so far relied mostly on de novo gene finding, where gene structures are inferred primarily from the genome sequence. We developed a machine-learning method, called mTim (margin-based transcript inference method), for transcript reconstruction from RNA-Seq read alignments; it is based on discriminatively trained hidden Markov support vector machines. In addition to features derived from read alignments, it utilizes characteristic genomic sequences, e.g., around splice sites, to improve transcript predictions. mTim inferred transcripts that were highly accurate and relatively robust to alignment errors compared with those from Cufflinks, a widely used transcript assembly method.
