Development and Evaluation of Conformal Prediction Methods for QSAR

04/03/2023
by   Yuting Xu, et al.
The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting the biological activities of compounds from their molecular descriptors. Predictions from QSAR models can help, for example, to optimize molecular structure, prioritize compounds for further experimental testing, and estimate their toxicity. In addition to an accurate estimate of the activity, it is highly desirable to obtain some estimate of the uncertainty associated with the prediction, e.g., a prediction interval (PI) containing the true molecular activity with a pre-specified probability, say 70%. Most machine learning (ML) algorithms that achieve superior predictive performance require some add-on method for estimating the uncertainty of their predictions. The development of such methods is an active area of research in the statistical and ML communities, but their implementation for QSAR modeling remains limited. Conformal prediction (CP) is a promising approach. It is agnostic to the prediction algorithm and can produce valid prediction intervals under weak assumptions on the data distribution. We propose computationally efficient CP algorithms tailored to the most advanced ML models, including Deep Neural Networks and Gradient Boosting Machines. The validity and efficiency of the proposed conformal predictors are demonstrated on a diverse collection of QSAR datasets as well as in simulation studies.
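To make the idea of a model-agnostic prediction interval concrete, the following is a minimal sketch of generic split (inductive) conformal prediction for regression. It is not the authors' algorithm: the abstract does not specify their method, so the function names (`conformal_quantile`, `split_conformal`), the least-squares base model, and the synthetic data are all assumptions chosen for illustration. Any fitted regressor could stand in for `predict`.

```python
import numpy as np


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(residuals)
    # Rank ceil((n + 1) * (1 - alpha)) among the sorted residuals.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points to certify this coverage level.
        return np.inf
    return np.sort(residuals)[k - 1]


def split_conformal(predict, X_cal, y_cal, X_new, alpha=0.1):
    """Symmetric prediction intervals around predict(X_new).

    `predict` is any already-fitted point predictor (hypothetical interface:
    it maps a 2-D array of descriptors to a 1-D array of predictions).
    """
    scores = np.abs(y_cal - predict(X_cal))  # absolute-residual nonconformity
    q = conformal_quantile(scores, alpha)
    center = predict(X_new)
    return center - q, center + q


# Synthetic stand-in for a QSAR dataset (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + rng.normal(scale=0.5, size=3000)

# Disjoint training, calibration, and test splits.
X_train, y_train = X[:1000], y[:1000]
X_cal, y_cal = X[1000:2000], y[1000:2000]
X_test, y_test = X[2000:], y[2000:]

# Base model: ordinary least squares, standing in for any ML regressor.
w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def predict(A):
    return A @ w_hat


lower, upper = split_conformal(predict, X_cal, y_cal, X_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage at nominal 90%: {coverage:.3f}")
```

Under exchangeability of calibration and test points, intervals built this way cover the true value with probability at least 1 - alpha on average. In this simple variant every interval has the same width; narrower intervals for a given coverage level correspond to the "efficiency" mentioned in the abstract.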


