ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation

09/11/2023
by   Gregory W. Kyro, et al.
The capabilities of generative artificial intelligence models have led to their application in drug discovery, and there is considerable interest in methodologies that enhance the abilities and applicability of these tools. In this work, we present a novel and efficient semi-supervised active learning methodology that allows for the fine-tuning of a generative model with respect to an objective function by strategically operating within a constructed representation of the sample space. In the context of targeted molecular generation, we demonstrate the ability to fine-tune a GPT-based molecular generator with respect to an attractive interaction-based scoring function by strategically operating within a chemical space proxy, thereby maximizing attractive interactions between the generated molecules and a protein target. Importantly, our approach does not require the individual evaluation of all data points that are used for fine-tuning, enabling the incorporation of computationally expensive metrics. We are hopeful that the generality of this methodology will allow it to remain applicable as the field evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.
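The central idea, selecting fine-tuning data without scoring every candidate, can be illustrated with a toy sketch. The code below is not the ChemSpaceAL package API. The function names, the use of k-means clustering, the softmax weighting, and the 2-D points standing in for a chemical space proxy are all assumptions made for illustration. It scores only a few points per cluster, then draws the fine-tuning set from all points with probability weighted by each cluster's mean score.

```python
import numpy as np

def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means (illustrative); returns one cluster label per point."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels

def select_for_finetuning(points, score_fn, k=5, per_cluster=3, n_select=50, seed=0):
    """Hypothetical helper: pick n_select points while scoring only a few per cluster."""
    rng = np.random.default_rng(seed)
    labels = kmeans(points, k, rng)

    cluster_scores = np.full(k, -np.inf)
    n_scored = 0
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if len(members) == 0:
            continue
        # Evaluate the (possibly expensive) objective on a small sample only.
        sample = rng.choice(members, size=min(per_cluster, len(members)), replace=False)
        cluster_scores[c] = np.mean([score_fn(points[i]) for i in sample])
        n_scored += len(sample)

    # Softmax over non-empty clusters; empty clusters get zero weight.
    finite = np.isfinite(cluster_scores)
    weights = np.zeros(k)
    weights[finite] = np.exp(cluster_scores[finite] - cluster_scores[finite].max())
    weights /= weights.sum()

    # Split each cluster's weight evenly among its members, then sample points.
    counts = np.bincount(labels, minlength=k)
    point_probs = weights[labels] / counts[labels]
    chosen = rng.choice(len(points), size=n_select, replace=False, p=point_probs)
    return chosen, n_scored

# Toy data: 200 random 2-D points; the score favors points near (2, 2).
demo_rng = np.random.default_rng(1)
demo_points = demo_rng.normal(size=(200, 2))
demo_score = lambda x: -float(np.linalg.norm(x - np.array([2.0, 2.0])))
chosen, n_scored = select_for_finetuning(demo_points, demo_score)
print(f"selected {len(chosen)} points after scoring only {n_scored} of {len(demo_points)}")
```

In this sketch the number of objective evaluations depends on the number of clusters and the per-cluster sample size, not on the size of the selected set. This is the property that makes expensive scoring functions affordable.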
