Improving Small Molecule Generation using Mutual Information Machine

08/18/2022
by   Danny Reidenbach, et al.

We address the task of controlled generation of small molecules: finding novel molecules with desired properties under certain constraints (e.g., similarity to a reference molecule). Here we introduce MolMIM, a probabilistic auto-encoder for small-molecule drug discovery that learns an informative and clustered latent space. MolMIM is trained with Mutual Information Machine (MIM) learning and provides a fixed-length representation of variable-length SMILES strings. Since encoder-decoder models can learn representations with "holes" of invalid samples, we propose a novel extension to the training procedure that promotes a dense latent space and allows the model to sample valid molecules from random perturbations of latent codes. We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation as measured by validity, uniqueness, and novelty. We then apply CMA-ES, a naive black-box, gradient-free search algorithm, in MolMIM's latent space for the task of property-guided molecule optimization. We achieve state-of-the-art results on several constrained single-property optimization tasks, as well as on the challenging task of multi-objective optimization, improving over the previous success-rate SOTA by more than 5%. We attribute these strong results to MolMIM's latent representation, which clusters similar molecules together in the latent space, allowing even a simple baseline optimizer such as CMA-ES to perform well. We also demonstrate that MolMIM is favourable in a compute-limited regime, making it an attractive model for such cases.
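The black-box latent-space search described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `decode_and_score` is a hypothetical stand-in for MolMIM's decoder combined with a property oracle, and the optimizer is a simplified Gaussian evolution strategy rather than full CMA-ES (which additionally adapts a covariance matrix, e.g. via the `pycma` package).

```python
import numpy as np

def decode_and_score(z):
    """Hypothetical stand-in: in MolMIM this would decode the latent
    code z to a SMILES string and score it with a property oracle.
    Here the score simply peaks at a known latent optimum."""
    target = np.full_like(z, 0.5)
    return -np.sum((z - target) ** 2)

def optimize_latent(z0, sigma=0.3, pop_size=16, iters=50, seed=0):
    """Simplified Gaussian evolution strategy over a fixed-length
    latent code (a stand-in for CMA-ES: no covariance adaptation)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(z0, dtype=float)
    best_z, best_score = mean, decode_and_score(mean)
    for _ in range(iters):
        # Sample a population of perturbed latent codes around the mean.
        pop = mean + sigma * rng.standard_normal((pop_size, mean.size))
        scores = np.array([decode_and_score(z) for z in pop])
        # Move the mean toward the better half of the population.
        elite = pop[np.argsort(scores)[-pop_size // 2:]]
        mean = elite.mean(axis=0)
        if scores.max() > best_score:
            best_score = scores.max()
            best_z = pop[np.argmax(scores)]
        sigma *= 0.97  # anneal the step size
    return best_z, best_score
```

In this setting, `z0` would be the encoding of a seed molecule, and the dense latent space promoted during training is what makes such random perturbations decode to valid molecules rather than "holes".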

Related research

- Conditional β-VAE for De Novo Molecular Generation (05/01/2022)
- Improving black-box optimization in VAE latent space using decoder uncertainty (06/30/2021)
- A Two-Step Graph Convolutional Decoder for Molecule Generation (06/08/2019)
- A COLD Approach to Generating Optimal Samples (05/23/2019)
- SentenceMIM: A Latent Variable Language Model (02/18/2020)
- Improving Chemical Autoencoder Latent Space and Molecular De novo Generation Diversity with Heteroencoders (06/25/2018)
- Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures (09/03/2019)
