Most Ligand-Based Benchmarks Measure Overfitting Rather than Accuracy

06/20/2017
by Izhar Wallach, et al.

Undetected overfitting can occur when there is significant redundancy between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity among inactive molecules as well as among active ones. We investigate nine widely used benchmarks for virtual screening and QSAR and show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods, irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. Therefore, the previously reported performance of most ligand-based methods can likely be explained by overfitting to benchmarks rather than by good prospective accuracy.
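The abstract does not give a formula for AVE. As an illustration only, here is a minimal Python sketch of an AVE-style bias score. It assumes binary fingerprints stored as sets of on-bit indices, Tanimoto distance, and a nearest-neighbour "coverage" averaged over a uniform grid of distance thresholds. These choices, the 101-step grid, and every function name below are assumptions of this sketch, not the authors' reference implementation. The score does reflect the abstract's central point: the inactive-to-inactive redundancy term is counted alongside the active-to-active one.

```python
from typing import AbstractSet, List, Sequence

# Assumption: a fingerprint is represented as the set of indices of its "on" bits.
Fingerprint = AbstractSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - |a & b| / |a | b|; two empty fingerprints are treated as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def nearest_neighbor_distances(
    validation: Sequence[Fingerprint], training: Sequence[Fingerprint]
) -> List[float]:
    """Distance from each validation molecule to its closest training molecule."""
    if not validation or not training:
        raise ValueError("validation and training sets must be non-empty")
    return [min(tanimoto_distance(v, t) for t in training) for v in validation]


def coverage(
    validation: Sequence[Fingerprint],
    training: Sequence[Fingerprint],
    thresholds: Sequence[float],
) -> float:
    """Fraction of validation molecules whose nearest training neighbour lies
    within distance d (inclusive, an assumption), averaged over all thresholds d."""
    dists = nearest_neighbor_distances(validation, training)
    fractions = [sum(x <= d for x in dists) / len(dists) for d in thresholds]
    return sum(fractions) / len(fractions)


def ave_bias(
    val_actives: Sequence[Fingerprint],
    val_inactives: Sequence[Fingerprint],
    train_actives: Sequence[Fingerprint],
    train_inactives: Sequence[Fingerprint],
    steps: int = 101,
) -> float:
    """AVE-style bias (hypothetical sketch): how much closer each validation class
    sits to its own training class than to the opposite one, summed over both classes."""
    thresholds = [i / (steps - 1) for i in range(steps)]  # 0.0, 0.01, ..., 1.0
    active_term = coverage(val_actives, train_actives, thresholds) - coverage(
        val_actives, train_inactives, thresholds
    )
    inactive_term = coverage(val_inactives, train_inactives, thresholds) - coverage(
        val_inactives, train_actives, thresholds
    )
    return active_term + inactive_term


if __name__ == "__main__":
    # Validation molecules duplicate the training molecules: bias near its maximum of 2.
    print(ave_bias([{1, 2, 3}], [{7, 8, 9}], [{1, 2, 3}], [{7, 8, 9}]))
    # Each validation molecule is equally far from both training classes: bias of 0.
    print(ave_bias([{1, 3}], [{2, 4}], [{1, 2}], [{3, 4}]))
```

Under this sketch, a split whose validation set copies the training set scores close to 2, a split in which every validation molecule is equidistant from both training classes scores 0, and a negative score would mean validation molecules tend to sit closer to the opposite training class.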


research
01/27/2022

The Implicit Bias of Benign Overfitting

The phenomenon of benign overfitting, where a predictor perfectly fits n...
research
06/06/2016

ROCS-Derived Features for Virtual Screening

Rapid overlay of chemical structures (ROCS) is a standard tool for the c...
research
11/12/2019

SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery

In drug-discovery-related tasks such as virtual screening, machine learn...
research
12/13/2021

Addressing Bias in Active Learning with Depth Uncertainty Networks... or Not

Farquhar et al. [2021] show that correcting for active learning bias wit...
research
05/24/2019

The advantages of multiple classes for reducing overfitting from test set reuse

Excessive reuse of holdout data can lead to overfitting. However, there ...
research
08/08/2019

Optimal multiclass overfitting by sequence reconstruction from Hamming queries

A primary concern of excessive reuse of test datasets in machine learnin...
research
05/24/2019

Perturbed Model Validation: A New Framework to Validate Model Relevance

This paper introduces PMV (Perturbed Model Validation), a new technique ...