DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery – A Focus on Affinity Prediction Problems with Noise Annotations

01/24/2022
by Yuanfeng Ji, et al.

AI-aided drug discovery (AIDD) is gaining popularity because it promises to make the search for new pharmaceuticals quicker, cheaper and more efficient. Despite its extensive use in fields such as ADMET prediction, virtual screening, protein folding and generative chemistry, the out-of-distribution (OOD) learning problem with noise, which is inevitable in real-world AIDD applications, remains largely unexplored. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug-target binding affinity prediction, which involves both a macromolecule (the protein target) and a small molecule (the drug compound). Rather than providing only fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since molecular data are often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies show a significant performance gap between in-distribution and out-of-distribution experiments, highlighting the need for better schemes that enable OOD generalization under noise in AIDD.
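To make the in-distribution versus out-of-distribution distinction concrete, the snippet below is a minimal, illustrative sketch of a domain-based split: every sample carries a domain label, and whole domains are held out so that no test domain is ever seen during training. It is not the DrugOOD package API. The function name `domain_split`, the record fields (`id`, `scaffold`) and the `test_frac` parameter are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def domain_split(samples, domain_key, test_frac=0.2, seed=0):
    """Hypothetical helper: hold out entire domains for the test set.

    samples:    list of dicts, each with a domain label under `domain_key`
    domain_key: name of the field holding the domain (e.g. a scaffold or assay ID)
    test_frac:  approximate fraction of *domains* (not samples) to hold out
    """
    # Group samples by their domain label.
    by_domain = defaultdict(list)
    for s in samples:
        by_domain[s[domain_key]].append(s)

    # Shuffle domain labels deterministically, then pick the test domains.
    domains = sorted(by_domain)
    rng = random.Random(seed)
    rng.shuffle(domains)
    n_test = max(1, int(round(len(domains) * test_frac)))
    test_domains = set(domains[:n_test])

    # Every sample of a domain goes to the same side of the split.
    train = [s for d in domains if d not in test_domains for s in by_domain[d]]
    test = [s for d in domains if d in test_domains for s in by_domain[d]]
    return train, test, test_domains


# Toy usage with made-up records: 20 samples spread over 5 hypothetical scaffolds.
toy = [{"id": i, "scaffold": f"S{i % 5}"} for i in range(20)]
train, test, held_out = domain_split(toy, "scaffold")
print(len(train), len(test), sorted(held_out))
```

Grouping by domain rather than by individual sample is the point of the sketch: a random per-sample split would scatter each domain across both sides and therefore measure in-distribution performance, whereas holding out whole domains forces the model to generalize to unseen ones.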

Related research
09/16/2022

ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery

The last decade has witnessed a prosperous development of computational ...
01/15/2023

Geometric Graph Learning with Extended Atom-Types Features for Protein-Ligand Binding Affinity Prediction

Understanding and accurately predicting protein-ligand binding affinity ...
03/12/2018

Spatial Graph Convolutions for Drug Discovery

Predicting the binding free energy, or affinity, of a small molecule for...
08/17/2023

Embracing assay heterogeneity with neural processes for markedly improved bioactivity predictions

Predicting the bioactivity of a ligand is one of the hardest and most im...
10/29/2021

DOCKSTRING: easy molecular docking yields better benchmarks for ligand design

The field of machine learning for drug discovery is witnessing an explos...
02/19/2016

Learning to SMILE(S)

This paper shows how one can directly apply natural language processing ...
11/07/2022

ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

In computer-aided drug discovery (CADD), virtual screening (VS) is used ...