A semi-supervised learning framework for quantitative structure-activity regression modelling

01/07/2020
by Oliver P Watson, et al.

Supervised learning models, also known as quantitative structure-activity regression (QSAR) models, are increasingly used to assist preclinical, small-molecule drug discovery. The models are trained on data consisting of a finite-dimensional representation of molecular structures and their corresponding target-specific activities. These models can then be used to predict the activity of novel, previously unmeasured compounds. In this work we address two problems related to this approach. The first is to estimate the extent to which the quality of the model predictions degrades for compounds that are very different from those in the training data. The second is to adjust for the screening-dependent selection bias inherent in many training data sets. In the most extreme cases, only compounds that pass an activity-dependent screen are reported. Using a semi-supervised learning framework, we show that it is possible to make predictions that take into account the similarity of the test compounds to those in the training data and that adjust for the reporting selection bias. We illustrate this approach using publicly available structure-activity data on a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set) to inhibit in vitro P. falciparum growth.
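
To make the notion of "similarity of a test compound to the training data" concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not say which molecular representation or similarity measure is used. The sketch assumes binary fingerprints stored as sets of "on" bit indices and uses Tanimoto similarity, a common choice in cheminformatics. The function names and the toy fingerprints are hypothetical.

```python
from typing import FrozenSet, Iterable


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def max_similarity_to_training(
    query: FrozenSet[int], training: Iterable[FrozenSet[int]]
) -> float:
    """Highest Tanimoto similarity between a query compound and any training compound."""
    return max((tanimoto(query, t) for t in training), default=0.0)


# Hypothetical toy fingerprints (illustrative only, not real compounds).
training = [frozenset({1, 4, 7, 9}), frozenset({2, 4, 8})]
near = frozenset({1, 4, 7})   # shares most bits with the first training compound
far = frozenset({3, 5, 6})    # shares no bits with any training compound

near_score = max_similarity_to_training(near, training)
far_score = max_similarity_to_training(far, training)
print(near_score, far_score)
```

A low maximum similarity, as for `far` above, flags a compound for which a model trained on this data may be less reliable. How such a score should feed into the prediction itself is the subject of the paper and is not reproduced here.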

research
11/11/2022

Semi-supervised Variational Autoencoder for Regression: Application on Soft Sensors

We present the development of a semi-supervised regression method using ...
research
02/03/2019

Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning

Predicting bioactivity and physical properties of small molecules is a c...
research
07/28/2022

Learning to Adapt Classifier for Imbalanced Semi-supervised Learning

Pseudo-labeling has proven to be a promising semi-supervised learning (S...
research
04/19/2019

Semi-Supervised First-Person Activity Recognition in Body-Worn Video

Body-worn cameras are now commonly used for logging daily life, sports, ...
research
04/06/2018

Ensemble Manifold Segmentation for Model Distillation and Semi-supervised Learning

Manifold theory has been the central concept of many learning methods. H...
research
07/24/2018

A decision theoretic approach to model evaluation in computational drug discovery

Artificial intelligence, trained via machine learning or computational s...
research
08/27/2021

Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function

DNA-encoded library (DEL) screening and quantitative structure-activity ...
