Validating the Validation: Reanalyzing a large-scale comparison of Deep Learning and Machine Learning models for bioactivity prediction

05/28/2019
by   Matthew C. Robinson, et al.

Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening, and instead suggest that area under the precision-recall curve should be used in conjunction with the receiver operating characteristic. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.
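The contrast between the two metrics can be illustrated with a small synthetic sketch. This is not the authors' experiment or data. It assumes Gaussian-distributed scores for actives and inactives, two arbitrary class ratios, and hypothetical helper names (`auroc`, `average_precision`, `simulate`). With the same score separation, the area under the receiver operating characteristic curve should stay roughly constant as actives become rare, while the area under the precision-recall curve falls, which is the situation typical of virtual screening.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC as the probability that a random active outranks a random inactive."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every active with every inactive; ties count as half a win.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Tied scores are ordered arbitrarily; this is acceptable for continuous scores.
    """
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    # Average the precision at the rank of each retrieved active.
    return precision_at_k[hits].mean()

def simulate(n_actives, n_inactives, seed=0):
    """Assumed toy model: active scores ~ N(1, 1), inactive scores ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    y = np.r_[np.ones(n_actives, dtype=bool), np.zeros(n_inactives, dtype=bool)]
    s = np.r_[rng.normal(1.0, 1.0, n_actives), rng.normal(0.0, 1.0, n_inactives)]
    return y, s

# Balanced set versus a screening-like set with about 1% actives.
for n_act, n_inact in [(500, 500), (50, 5000)]:
    y, s = simulate(n_act, n_inact)
    print(f"actives={n_act:4d} inactives={n_inact:5d} "
          f"AUROC={auroc(y, s):.3f} AUPRC={average_precision(y, s):.3f}")
```

In this toy setting the score distributions are identical in both runs; only the class ratio changes. That is why reporting the precision-recall area alongside the receiver operating characteristic gives a fuller picture when actives are scarce.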
