Peptide-Spectra Matching from Weak Supervision

08/20/2018
by Samuel S. Schoenholz, et al.

As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain. Instead, we have access to imperfect hand-coded models crafted by domain experts. In this paper, we apply deep neural networks to an important step of the protein identification problem, the pairing of mass spectra with short sequences of amino acids called peptides. We train our model to differentiate between top-scoring results from a state-of-the-art classical system and hard-negative second- and third-place results. Our resulting model is much better at identifying peptides with spectra than the model used to generate its training data. In particular, we achieve a 43% improvement over standard matching methods and a 10% improvement over a combination of the matching method and an industry-standard cross-spectra reranking tool. Importantly, in a more difficult experimental regime that reflects current challenges facing biologists, our advantage over the previous state-of-the-art grows to 15% even after reranking. We believe this approach will generalize to other challenging scientific problems.
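
The weak-supervision setup described in the abstract can be illustrated with a minimal sketch: for each spectrum, the classical system's top-scoring peptide is treated as a positive example, and its second- and third-place candidates as hard negatives. The function name `build_weak_pairs`, the input layout (a mapping from spectrum ID to scored candidates), and the assumption that higher scores are better are all hypothetical choices made for illustration; the abstract does not specify the authors' data format or pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical types: a candidate is (peptide sequence, classical search score).
Candidate = Tuple[str, float]
Example = Tuple[str, str, int]  # (spectrum id, peptide sequence, weak label)


def build_weak_pairs(
    hits: Dict[str, List[Candidate]], num_negatives: int = 2
) -> List[Example]:
    """Turn ranked classical-search results into weakly labeled pairs.

    Assumes a higher score means a better match. The top candidate gets
    label 1; the next `num_negatives` candidates get label 0.
    """
    examples: List[Example] = []
    for spectrum_id, candidates in hits.items():
        # Rank candidates from best to worst classical score.
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        if not ranked:
            continue  # no candidates for this spectrum, nothing to label
        # Top-scoring result: weak positive.
        examples.append((spectrum_id, ranked[0][0], 1))
        # Second- and third-place results: hard negatives.
        for peptide, _score in ranked[1 : 1 + num_negatives]:
            examples.append((spectrum_id, peptide, 0))
    return examples


# Illustrative input only; the sequences and scores are made up.
hits = {
    "spectrum_001": [
        ("PEPTIDEK", 3.1),
        ("LSSPATLNSR", 4.7),
        ("AGFAGDDAPR", 2.2),
        ("VLDELTLAR", 1.0),
    ],
}
pairs = build_weak_pairs(hits)
```

The resulting `pairs` list could then feed any binary classifier over (spectrum, peptide) inputs. Because the negatives are the classical system's own near-misses, the classifier is pushed to learn distinctions the hand-coded scorer finds hard, which is the intuition behind using second- and third-place results rather than random peptides.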

11/21/2018

Predicting Electron-Ionization Mass Spectrometry using Neural Networks

When confronted with a substance of unknown identity, researchers often ...
10/02/2020

Machine-learning-enhanced time-of-flight mass spectrometry analysis

Mass spectrometry is a widespread approach to work out what are the cons...
11/30/2020

AGNet: Weighing Black Holes with Machine Learning

Supermassive black holes (SMBHs) are ubiquitously found at the centers o...
11/08/2021

MassFormer: Tandem Mass Spectrum Prediction with Graph Transformers

Mass spectrometry is a key tool in the study of small molecules, playing...
03/22/2019

Artificial intelligence-based process for metal scrap sorting

Machine learning offers remarkable benefits for improving workplaces and...
01/23/2019

Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning

Rapid identification of bacteria is essential to prevent the spread of i...
09/04/2019

Gradients of Generative Models for Improved Discriminative Analysis of Tandem Mass Spectra

Tandem mass spectrometry (MS/MS) is a high-throughput technology used to...