PromID: human promoter prediction by deep learning

10/02/2018
by Ramzan Umarov, et al.

Computational identification of promoters is notoriously difficult because human genes often have unique promoter sequences that regulate transcription and interact with the transcription initiation complex. Although there have been many attempts to develop computational promoter identification methods, there is still no reliable tool for analyzing long genomic sequences. In this work we further develop our deep learning approach, which was relatively successful at discriminating short promoter sequences from non-promoter sequences. Instead of focusing on classification accuracy, we predict the exact positions of transcription start sites (TSS) inside genomic sequences by testing every possible location. We studied human promoters to find the regions that are effective for discrimination and built corresponding deep learning models. These models use an adaptively constructed negative set, which iteratively improves the model's discriminative ability. The resulting promoter identification models significantly outperform previously developed promoter prediction programs by considerably reducing the number of false positive predictions. The best model we have built has recall 0.76, precision 0.77 and MCC 0.76, while the next best tool, FPROM, achieved precision 0.48 and MCC 0.60 at a recall of 0.75. Our method is available at http://www.cbrc.kaust.edu.sa/PromID/.
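The three reported metrics are related, and a short calculation may help in reading them. The sketch below computes precision, recall and the Matthews correlation coefficient (MCC) from confusion-matrix counts. The counts are hypothetical, chosen only to reproduce a recall of 0.76 and a precision of about 0.77; they are not taken from the paper. The function name `precision_recall_mcc` is likewise an illustrative assumption. When every position of a long sequence is tested, true negatives vastly outnumber the other counts, and MCC then approaches the geometric mean of precision and recall, which is consistent with the figures quoted above for both PromID and FPROM.

```python
import math


def precision_recall_mcc(tp, fp, fn, tn):
    """Return (precision, recall, MCC) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts (NOT from the paper): 1000 true TSS positions,
# 760 recovered, 227 false alarms, and ten million true negatives,
# as happens when every position of a long sequence is scored.
p, r, m = precision_recall_mcc(tp=760, fp=227, fn=240, tn=10_000_000)

# With TN dominating, MCC is close to sqrt(precision * recall).
geometric_mean = math.sqrt(p * r)

print(f"precision={p:.3f} recall={r:.3f} "
      f"MCC={m:.3f} sqrt(P*R)={geometric_mean:.3f}")

# The same approximation applied to the FPROM figures quoted above
# (precision 0.48 at recall 0.75).
fprom_estimate = math.sqrt(0.48 * 0.75)
print(f"FPROM sqrt(P*R)={fprom_estimate:.2f}")
```

One practical consequence of this relationship is that, in a genome-wide scan, MCC rewards a method only if it keeps both precision and recall high; a large pool of easy negatives does not inflate it.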

