Protein identification with deep learning: from abc to xyz

by Ngoc Hieu Tran, et al.

Proteins are the main workhorses of biological functions in a cell, a tissue, or an organism. Identifying and quantifying the proteins in a given sample, e.g. a cell type under normal or disease conditions, are fundamental tasks for understanding human health and disease. In this paper, we present DeepNovo, a deep learning-based tool that addresses the problem of protein identification from tandem mass spectrometry data. The idea was first proposed in the context of de novo peptide sequencing [1], in which convolutional neural networks and recurrent neural networks were applied to predict the amino acid sequence of a peptide from its spectrum, a task similar to generating a caption for an image. We further develop DeepNovo to perform sequence database search, the main technique for peptide identification, which benefits greatly from the numerous existing protein databases. We combine the two modules, de novo sequencing and database search, into a single deep learning framework for peptide identification, and integrate a de Bruijn graph assembly technique to offer a complete solution for reconstructing protein sequences from tandem mass spectrometry data. This paper describes a comprehensive protocol of DeepNovo for protein identification, including training neural network models, dynamic programming search, database querying, estimation of the false discovery rate, and de Bruijn graph assembly. Training and testing data, model implementations, and comprehensive tutorials in the form of IPython notebooks are available in our GitHub repository.
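
To make the final assembly step more concrete, the following is a minimal sketch of de Bruijn graph assembly applied to overlapping peptide sequences. It is not DeepNovo's implementation: the function names `build_de_bruijn` and `assemble`, the choice of k = 4, and the toy peptides are all assumptions made for illustration, and the walk only handles the simplest case of a single unbranched path.

```python
from collections import defaultdict


def build_de_bruijn(peptides, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Hypothetical helper for illustration; not taken from DeepNovo.
    """
    graph = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            kmer = pep[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble(graph):
    """Follow the single unbranched path from the unique start node.

    Stops at the first node with zero or multiple successors, or when
    a node would be revisited (a cycle).
    """
    targets = {t for successors in graph.values() for t in successors}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one node without incoming edges")

    node = starts[0]
    contig = node
    visited = {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:
            break
        contig += nxt[-1]  # each step extends the contig by one residue
        visited.add(nxt)
        node = nxt
    return contig


# Toy, made-up peptides that overlap along one sequence.
peptides = ["MKTAYIAK", "AYIAKQRQ", "KQRQISFV"]
graph = build_de_bruijn(peptides, k=4)
print(assemble(graph))  # MKTAYIAKQRQISFV
```

Real data would require handling branches, repeats, and sequencing errors in the identified peptides, which this sketch deliberately leaves out.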




