ALMERIA: Boosting pairwise molecular contrasts with scalable methods

04/28/2023
by Rafael Mena-Yedra, et al.

Searching for potentially active compounds in large databases is a necessary step for reducing time and costs in modern drug discovery pipelines. Virtual screening methods aim to provide predictions that narrow down this search space. Although cheminformatics has made great progress in exploiting the available big data, caution is needed to avoid introducing bias and to ensure that predictions remain useful for new compounds. In this work, we propose the decision-support tool ALMERIA (Advanced Ligand Multiconformational Exploration with Robust Interpretable Artificial Intelligence) for estimating compound similarity and predicting activity based on pairwise molecular contrasts while accounting for conformational variability. The methodology covers the entire pipeline, from data preparation to model selection and hyperparameter optimization. It has been implemented with scalable software and methods so that it can exploit large volumes of data (on the order of several terabytes) while responding quickly even to a large batch of queries. The implementation and experiments were carried out on a distributed computer cluster using the publicly accessible DUD-E benchmark database. In addition to cross-validation, detailed data split criteria were used to evaluate the models on different data partitions and so assess their true ability to generalize to new compounds. Experiments show state-of-the-art performance for molecular activity prediction (ROC AUC: 0.99, 0.96, 0.87), indicating that the chosen data representation and modeling approach generalize well. The effect of molecular conformations has also been evaluated, in terms of both prediction performance and sensitivity analysis. Finally, an interpretability analysis has been performed using the SHAP method.
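To make the evaluation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a pairwise contrast is the element-wise absolute difference between two descriptor vectors, that a pair is labeled positive when both compounds share the same activity class, and that "new compounds" are simulated by holding out every pair that involves a held-out compound. The function names, the toy descriptors, and the scoring rule are all hypothetical; only the ROC AUC definition (the probability that a random positive outranks a random negative) is standard.

```python
from itertools import combinations


def pairwise_contrast(a, b):
    """Hypothetical contrast: element-wise absolute difference of two descriptor vectors."""
    return [abs(x - y) for x, y in zip(a, b)]


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute ROC AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_pairs_by_compound(pairs, held_out):
    """Train on pairs with no held-out compound; test on pairs that involve at least one."""
    train = [p for p in pairs if p[0] not in held_out and p[1] not in held_out]
    test = [p for p in pairs if p[0] in held_out or p[1] in held_out]
    return train, test


# Toy data (invented for illustration): name -> (descriptor vector, activity class).
compounds = {
    "a1": ([1.0, 0.9, 0.1], 1),
    "a2": ([0.9, 1.0, 0.2], 1),
    "a3": ([1.1, 0.8, 0.0], 1),
    "d1": ([0.1, 0.2, 1.0], 0),
    "d2": ([0.0, 0.1, 0.9], 0),
    "d3": ([0.2, 0.0, 1.1], 0),
}

all_pairs = list(combinations(sorted(compounds), 2))
train_pairs, test_pairs = split_pairs_by_compound(all_pairs, held_out={"a3", "d3"})

# Score each unseen pair: a smaller total contrast means "more similar".
test_labels, test_scores = [], []
for left, right in test_pairs:
    vec_l, cls_l = compounds[left]
    vec_r, cls_r = compounds[right]
    test_labels.append(1 if cls_l == cls_r else 0)
    test_scores.append(-sum(pairwise_contrast(vec_l, vec_r)))

test_auc = roc_auc(test_labels, test_scores)
print(f"train pairs: {len(train_pairs)}, test pairs: {len(test_pairs)}, test ROC AUC: {test_auc}")
```

Splitting by compound rather than by pair is the design choice that matters here: if pairs were shuffled at random, the same compound could appear on both sides of the split, and the reported score would overstate how well the approach handles compounds it has never seen.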

