Peptide-Spectra Matching from Weak Supervision

by Samuel S. Schoenholz, et al.

As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain. Instead, we have access to imperfect hand-coded models crafted by domain experts. In this paper, we apply deep neural networks to an important step of the protein identification problem, the pairing of mass spectra with short sequences of amino acids called peptides. We train our model to differentiate between top-scoring results from a state-of-the-art classical system and hard-negative second- and third-place results. Our resulting model is much better at matching peptides to spectra than the model used to generate its training data. In particular, we achieve a 43% improvement over standard matching methods and a 10% improvement over a combination of the matching method and an industry standard cross-spectra reranking tool. Importantly, in a more difficult experimental regime that reflects current challenges facing biologists, our advantage over the previous state-of-the-art grows to 15% even after reranking. We believe this approach will generalize to other challenging scientific problems.
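
The weak-supervision setup described in the abstract can be illustrated with a minimal sketch. It assumes that a classical search engine has already scored several candidate peptides per spectrum; the top-scoring candidate is treated as a positive example and the next two as hard negatives. The function name `build_weak_labels`, the input layout, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical input: spectrum ID -> list of (peptide, classical score).
Candidates = Dict[str, List[Tuple[str, float]]]


def build_weak_labels(
    candidates: Candidates, num_negatives: int = 2
) -> List[Tuple[str, str, int]]:
    """Label the top-scoring peptide 1 and the next-ranked peptides 0."""
    examples = []
    for spectrum_id, scored in candidates.items():
        # Rank candidates by the classical system's score, best first.
        ranked = sorted(scored, key=lambda item: item[1], reverse=True)
        if not ranked:
            continue
        # First place becomes the (noisy) positive example.
        examples.append((spectrum_id, ranked[0][0], 1))
        # Second and third place become hard negatives.
        for peptide, _ in ranked[1:1 + num_negatives]:
            examples.append((spectrum_id, peptide, 0))
    return examples


# Toy data with made-up peptides and scores.
toy = {
    "spectrum_0": [
        ("PEPTIDEK", 3.1),
        ("PEPTLDEK", 2.7),
        ("AAGK", 0.4),
        ("PEPSIDEK", 2.9),
    ]
}
examples = build_weak_labels(toy)
```

The resulting (spectrum, peptide, label) triples could then be fed to any binary classifier; the labels are noisy by construction, since the classical system's top hit is not guaranteed to be correct.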

