ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

by Andac Demir, et al.
The University of Texas at Dallas

In computer-aided drug discovery (CADD), virtual screening (VS) is used to identify, within a large library of compounds, the drug candidates that are most likely to bind to a molecular target. Most VS methods to date have either used canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generated alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs have led to significant improvements in VS performance, these methods lose performance when scaled to large virtual compound datasets, and their results have improved only incrementally over the past few years. To address this problem, we developed a novel method based on multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is to frame the VS process as a new topology-based graph ranking problem: we partition a compound into chemical substructures informed by the periodic properties of its atoms and extract their persistent homology features at multiple resolution levels. We show that margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and in ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93 for the DUD-E Diverse dataset).
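The margin-loss ranking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the margin value of 1.0, the two-dimensional toy vectors, and the names `triplet_margin_loss` and `rank_by_distance` are all assumptions made for illustration, standing in for the MP fingerprints and the pretrained Triplet network described above.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (an assumed hyperparameter here).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_by_distance(query, library):
    """Rank hypothetical compound names by embedding distance to a query."""
    return sorted(library, key=lambda name: euclidean(query, library[name]))


# Toy embeddings (hypothetical, not real compound fingerprints).
anchor = [0.0, 0.0]    # known active compound
positive = [0.0, 1.0]  # another active, distance 1 from the anchor
negative = [3.0, 4.0]  # a decoy, distance 5 from the anchor

loss_easy = triplet_margin_loss(anchor, positive, negative)  # 1 - 5 + 1 < 0, clipped to 0
loss_hard = triplet_margin_loss(anchor, negative, positive)  # 5 - 1 + 1 = 5
ranking = rank_by_distance(anchor, {"cpd_a": negative, "cpd_b": positive})
```

In this toy setting the well-separated triplet contributes no loss, while the swapped triplet is penalized, and ranking by distance to the anchor places the active-like compound first.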


