Enrichment Score: a better quantitative metric for evaluating the enrichment capacity of molecular docking models

10/19/2022
by Ian Scott Knight, et al.

LogAUC, the standard quantitative metric for evaluating enrichment capacity, depends on a cutoff parameter that sets the minimum value of the log-scaled x-axis. Unless this parameter is chosen carefully for a given ROC curve, one of two problems occurs: either (1) some fraction of the first inter-decoy intervals of the ROC curve is discarded and contributes nothing to the metric, or (2) the very first inter-decoy interval contributes too much to the metric at the expense of all subsequent inter-decoy intervals. We fix this problem by showing a simple way to choose the cutoff parameter based on the number of decoys, which forces the first inter-decoy interval to always make a stable, sensible contribution to the total value. Moreover, we introduce a normalized version of LogAUC, called enrichment score, which (1) enforces stability by selecting the cutoff parameter in the manner described, (2) yields scores that are more intuitively meaningful, and (3) allows reliable comparison of the enrichment capacities exhibited by different ROC curves, even those produced using different numbers of decoys. Finally, we demonstrate the advantage of enrichment score over unbalanced metrics using data from a real retrospective docking study performed with the program DOCK 3.7 on the target receptor TRYB1 from the DUDE-Z benchmark.
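To make the role of the cutoff concrete, here is a minimal Python sketch of a LogAUC computation. It is an illustration under stated assumptions, not the authors' implementation: the ROC curve is treated as a step function over inter-decoy intervals, the area is normalized by the width of the log-scaled axis, and the names `log_auc`, `labels`, and `cutoff` are hypothetical. The abstract does not give the paper's rule for choosing the cutoff or the normalization used for enrichment score, so neither is implemented here.

```python
import math


def log_auc(labels, cutoff):
    """Area under a step-function ROC curve on a log10-scaled x-axis.

    labels: docking results ranked best-first, 1 = active, 0 = decoy.
    cutoff: minimum false-positive rate kept on the x-axis, 0 < cutoff < 1.
    Returns the area divided by log10(1 / cutoff), so a perfect ranking gives 1.
    (Assumed conventions; the paper's exact definition may differ.)
    """
    n_actives = sum(labels)
    n_decoys = len(labels) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    if not 0.0 < cutoff < 1.0:
        raise ValueError("cutoff must lie strictly between 0 and 1")

    area = 0.0
    actives_seen = 0
    decoys_seen = 0
    for label in labels:
        if label == 1:
            actives_seen += 1
            continue
        # Each decoy closes one inter-decoy interval [lo, hi] on the x-axis.
        lo = decoys_seen / n_decoys
        decoys_seen += 1
        hi = decoys_seen / n_decoys
        if hi <= cutoff:
            # Problem (1): the interval lies left of the cutoff and is dropped.
            continue
        # Problem (2): a tiny cutoff stretches the first interval's log width.
        lo = max(lo, cutoff)
        tpr = actives_seen / n_actives
        area += tpr * (math.log10(hi) - math.log10(lo))
    return area / -math.log10(cutoff)


if __name__ == "__main__":
    ranking = [1, 0, 0, 1, 0, 0]  # toy data: 2 actives, 4 decoys
    for cutoff in (0.25, 1e-3, 1e-6):
        print(cutoff, log_auc(ranking, cutoff))
```

With the toy ranking, a cutoff of 0.25 discards the first inter-decoy interval entirely, while a cutoff of 1e-6 makes that same interval account for most of the log-scaled axis. This is the sensitivity the abstract describes.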


