Peptide-Spectra Matching from Weak Supervision

08/20/2018
by Samuel S. Schoenholz, et al.

As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain. Instead, we have access to imperfect hand-coded models crafted by domain experts. In this paper, we apply deep neural networks to an important step of the protein identification problem, the pairing of mass spectra with short sequences of amino acids called peptides. We train our model to differentiate between top-scoring results from a state-of-the-art classical system and hard-negative second- and third-place results. Our resulting model is much better at matching peptides to spectra than the model used to generate its training data. In particular, we achieve a 43% improvement over standard matching methods and a 10% improvement over a combination of the matching method and an industry standard cross-spectra reranking tool. Importantly, in a more difficult experimental regime that reflects current challenges facing biologists, our advantage over the previous state-of-the-art grows to 15% even after reranking. We believe this approach will generalize to other challenging scientific problems.
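
The weak-labeling scheme described above (top-ranked candidate as positive, second and third place as hard negatives) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `build_weak_labels`, the `(peptide, score)` input layout, and the `num_negatives` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: spectrum id -> list of (peptide, classical score).
Candidates = Dict[str, List[Tuple[str, float]]]


def build_weak_labels(
    ranked: Candidates, num_negatives: int = 2
) -> List[Tuple[str, str, int]]:
    """Turn a classical system's ranked candidates into weak labels.

    The top-scoring peptide for each spectrum is labeled 1 (positive);
    the next `num_negatives` candidates are labeled 0 (hard negatives).
    """
    examples = []
    for spectrum_id, candidates in ranked.items():
        # Sort by classical score, best first (assumes higher is better).
        ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
        if not ordered:
            continue
        examples.append((spectrum_id, ordered[0][0], 1))
        for peptide, _score in ordered[1 : 1 + num_negatives]:
            examples.append((spectrum_id, peptide, 0))
    return examples


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    demo = {"s1": [("PEPTIDEK", 3.2), ("PEPTLDEK", 4.1), ("AAAK", 1.0), ("GGGK", 0.5)]}
    for row in build_weak_labels(demo):
        print(row)
```

The resulting triples could then feed any binary classifier over (spectrum, peptide) pairs; the labels are noisy because the top classical match is not guaranteed to be correct.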
