Improving the Results of De novo Peptide Identification via Tandem Mass Spectrometry Using a Genetic Programming-based Scoring Function for Re-ranking Peptide-Spectrum Matches

08/12/2019
by Samaneh Azari, et al.

De novo peptide sequencing algorithms have been widely used in proteomics to analyse tandem mass spectra (MS/MS) and assign them to peptides, but quality-control methods for evaluating the confidence of de novo peptide sequencing are lagging behind. A fundamental part of a quality-control method is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). Here, we propose a genetic programming (GP) based method, called GP-PSM, to learn a PSM scoring function for improving the rate of confident peptide identification from MS/MS data. The GP method learns from thousands of MS/MS spectra. Important characteristics of match quality are extracted from the learning set and incorporated into the GP scoring functions. We compare GP-PSM with two other methods, Support Vector Regression (SVR) and Random Forest (RF). GP-PSM, RF, and SVR are each used to post-process the peptide identification results of PEAKS, a commonly used de novo sequencing method. The results show that GP-PSM outperforms RF and SVR and discriminates accurately between correct and incorrect PSMs. It correctly assigns peptides to 10 evaluation dataset containing 120 MS/MS spectra and decreases the false positive rate (FPR) of peptide identification.
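
The re-ranking step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names (`matched_ion_fraction`, `mass_error_ppm`), the hand-written `score` expression, and the toy candidate list are hypothetical and are not taken from GP-PSM or PEAKS. In the paper's setting the scoring expression would be learned from training spectra; here a fixed formula stands in for it.

```python
from collections import defaultdict

# Hypothetical PSM records: (spectrum_id, candidate_peptide, features).
# Feature names and values are illustrative assumptions, not the paper's feature set.
psms = [
    ("s1", "PEPTIDEK", {"matched_ion_fraction": 0.82, "mass_error_ppm": 3.0}),
    ("s1", "PEPTLDEK", {"matched_ion_fraction": 0.60, "mass_error_ppm": 8.0}),
    ("s2", "ACDEFGHK", {"matched_ion_fraction": 0.40, "mass_error_ppm": 1.0}),
    ("s2", "ACDEFGKH", {"matched_ion_fraction": 0.75, "mass_error_ppm": 2.0}),
]


def score(features):
    """Stand-in for a learned scoring expression (assumed form, not GP-PSM's).

    Rewards a high fraction of matched fragment ions and penalises
    precursor mass error.
    """
    return features["matched_ion_fraction"] - 0.01 * features["mass_error_ppm"]


def rerank(psm_records, score_fn):
    """Group candidates by spectrum and sort each group by descending score."""
    by_spectrum = defaultdict(list)
    for spectrum_id, peptide, features in psm_records:
        by_spectrum[spectrum_id].append((score_fn(features), peptide))
    return {
        spectrum_id: [pep for _, pep in sorted(cands, key=lambda t: t[0], reverse=True)]
        for spectrum_id, cands in by_spectrum.items()
    }


ranked = rerank(psms, score)
# The top-ranked candidate per spectrum becomes the reported identification.
best = {spectrum_id: peptides[0] for spectrum_id, peptides in ranked.items()}
print(best)
```

Swapping `score` for a regression model's prediction function would give the same post-processing pipeline with a different scorer, which is the kind of comparison the abstract describes for SVR and RF.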
