Scalable Gaussian process regression enables accurate prediction of protein and small molecule properties with uncertainty quantitation

02/07/2023
by Jonathan Parkinson, et al.

A Gaussian process (GP) is a Bayesian model that provides several advantages for regression tasks in machine learning, such as reliable quantitation of uncertainty and improved interpretability. The adoption of GPs has been limited by their high computational cost and by the difficulty of adapting them to sequences (e.g. amino acid and nucleotide sequences) and graphs (e.g. those representing small molecules). In this study, we develop efficient and scalable approaches for fitting GP models, as well as fast convolution kernels that scale linearly with graph or sequence size. We implement these improvements in an open-source Python library called xGPR. We compare the performance of xGPR with the reported performance of various deep learning models on 20 benchmarks, including small molecule, protein sequence and tabular data, and show that xGPR achieves highly competitive performance with much shorter training time. We also develop new kernels for sequence and graph data and show that xGPR generally outperforms convolutional neural networks at predicting key properties of proteins and small molecules. Importantly, xGPR provides uncertainty information not available from typical deep learning models, along with a representation of the input data that can be used for clustering and data visualization. These results demonstrate that xGPR is a powerful and generic tool that can be broadly useful in protein engineering and drug discovery.
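To make the idea of scalable GP regression with uncertainty concrete, here is a minimal NumPy sketch of one common approximation: Bayesian linear regression on random Fourier features that approximate an RBF kernel. This is an illustration under stated assumptions, not the xGPR API, and the abstract does not say which approximation xGPR uses. The function names (`rff_features`, `fit_predict`) and all hyperparameter values are hypothetical.

```python
import numpy as np


def rff_features(X, W, b):
    """Map inputs to random Fourier features approximating an RBF kernel."""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def fit_predict(X_train, y_train, X_test, n_features=200,
                lengthscale=1.0, noise_var=0.01, seed=0):
    """Approximate GP regression; returns predictive mean and variance.

    Hypothetical helper for illustration only. Cost is linear in the
    number of training points (O(n * n_features**2)).
    """
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    # Frequencies sampled from the spectral density of the RBF kernel.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    Z = rff_features(X_train, W, b)   # shape (n_train, n_features)
    Zs = rff_features(X_test, W, b)   # shape (n_test, n_features)

    # Weight posterior under a standard normal prior:
    #   A = Z^T Z + noise_var * I
    #   mean = A^{-1} Z^T y,  covariance = noise_var * A^{-1}
    A = Z.T @ Z + noise_var * np.eye(n_features)
    L = np.linalg.cholesky(A)
    mean_w = np.linalg.solve(L.T, np.linalg.solve(L, Z.T @ y_train))

    mean = Zs @ mean_w
    # Variance of the latent function: noise_var * z^T A^{-1} z.
    V = np.linalg.solve(L, Zs.T)
    var = noise_var * np.sum(V ** 2, axis=0)
    return mean, var
```

Calling `fit_predict` with training inputs drawn from a limited range and test points both inside and far outside that range should return a larger variance for the distant points, which is the kind of uncertainty information the abstract contrasts with typical deep learning models.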
