An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries

10/19/2022
by Aryan Pedawi, et al.

Virtual, make-on-demand chemical libraries have transformed early-stage drug discovery by unlocking vast, synthetically accessible regions of chemical space. In recent years these libraries have grown rapidly, from millions to trillions of compounds, and they hide undiscovered, potent hits for a variety of therapeutic targets. However, they are quickly approaching a size that no longer permits explicit enumeration, which presents new challenges for virtual screening. To overcome these challenges, we propose the Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE). The proposed generative model represents such libraries as a differentiable, hierarchically organized database. Given a compound from the library, the molecular encoder constructs a query for retrieval, which the molecular decoder uses to reconstruct the compound by first decoding its chemical reaction and then decoding its reactants. Our design minimizes autoregression in the decoder, facilitating the generation of large, valid molecular graphs. Our method performs fast, parallel batch inference for ultra-large synthesis libraries, enabling a number of important applications in early-stage drug discovery. Compounds proposed by our method are guaranteed to be in the library, and are thus synthetically and cost-effectively accessible. Importantly, CSLVAE can also encode out-of-library compounds and search for in-library analogues. In experiments, we demonstrate the capabilities of the proposed method in navigating massive combinatorial synthesis libraries.
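The two-stage retrieval described in the abstract (reaction first, then reactants) can be illustrated with a toy sketch. This is not the CSLVAE implementation: the library contents, the two-dimensional key vectors, and the names `LIBRARY`, `dot`, `library_size`, and `decode` are all hypothetical, and a single query with a greedy arg-max stands in for the learned, probabilistic queries the model would produce. The sketch only shows why a hierarchical lookup never needs to enumerate products and why every decoded compound is in the library by construction.

```python
from math import prod

# Hypothetical toy library (an assumption for illustration only).
# Each reaction has a key vector and a list of reactant slots;
# each slot maps a reactant name to its key vector.
LIBRARY = {
    "amide_coupling": {
        "key": [1.0, 0.0],
        "slots": [
            {"acid_A": [0.9, 0.1], "acid_B": [0.1, 0.9]},
            {"amine_A": [0.8, 0.2], "amine_B": [0.2, 0.8]},
        ],
    },
    "suzuki_coupling": {
        "key": [0.0, 1.0],
        "slots": [
            {"boronate_A": [0.7, 0.3], "boronate_B": [0.3, 0.7]},
            {"halide_A": [0.6, 0.4], "halide_B": [0.4, 0.6], "halide_C": [0.5, 0.5]},
        ],
    },
}


def dot(u, v):
    """Inner product used as the retrieval score."""
    return sum(a * b for a, b in zip(u, v))


def library_size(library):
    """Count products without enumerating them: sum over reactions of the
    product of slot sizes."""
    return sum(prod(len(slot) for slot in rxn["slots"]) for rxn in library.values())


def decode(query, library):
    """Greedy two-stage lookup: pick the best-scoring reaction, then the
    best-scoring reactant in each of that reaction's slots."""
    reaction = max(library, key=lambda name: dot(query, library[name]["key"]))
    reactants = tuple(
        max(slot, key=lambda name: dot(query, slot[name]))
        for slot in library[reaction]["slots"]
    )
    return reaction, reactants


if __name__ == "__main__":
    print(library_size(LIBRARY))
    print(decode([0.2, 0.9], LIBRARY))
```

Because each slot is scored independently once the reaction is fixed, the cost of a lookup in this sketch grows with the number of reactions plus the number of reactants, not with the number of products.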

