Keeping it Simple: Language Models can learn Complex Molecular Distributions

by Daniel Flam-Shepherd, et al.

Deep generative models of molecules have grown immensely in popularity; trained on relevant datasets, these models are used to search through chemical space. The downstream utility of generative models for the inverse design of novel functional compounds depends on their ability to learn a training distribution of molecules. The simplest example is a language model that takes the form of a recurrent neural network and generates molecules using a string representation. More sophisticated are graph generative models, which sequentially construct molecular graphs and typically achieve state-of-the-art results. However, recent work has shown that language models are more capable than once thought, particularly in the low-data regime. In this work, we investigate the capacity of simple language models to learn distributions of molecules. For this purpose, we introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules. On each task, we evaluate the ability of language models as compared with two widely used graph generative models. The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions, and that they yield better performance than the graph models. Language models can accurately generate distributions of the highest-scoring penalized LogP molecules in ZINC15, multi-modal molecular distributions, and the largest molecules in PubChem.
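To make the idea of generating molecules from a string representation concrete, here is a minimal sketch of autoregressive sampling over SMILES strings. It is not the paper's model: a simple bigram count table stands in for the recurrent neural network, the toy training strings are illustrative, and names such as `train_bigram`, `sample`, `BOS`, and `EOS` are hypothetical. Character-level splitting is also a simplification (a multi-character atom such as `Cl` would be split in two), and sampled strings are not guaranteed to be valid SMILES.

```python
import random
from collections import defaultdict

# Hypothetical start/end tokens; multi-character so they cannot collide
# with any single SMILES character.
BOS, EOS = "<bos>", "<eos>"

def train_bigram(smiles_list):
    """Count token-to-next-token transitions over a list of SMILES strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for smi in smiles_list:
        seq = [BOS] + list(smi) + [EOS]  # naive character-level tokenization
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def sample(counts, rng, max_len=50):
    """Generate one string token by token until EOS or max_len is reached."""
    out, prev = [], BOS
    for _ in range(max_len):
        tokens = list(counts[prev])
        weights = [counts[prev][t] for t in tokens]
        nxt = rng.choices(tokens, weights=weights, k=1)[0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

# Toy training set (illustrative only, not from the paper's datasets).
toy_smiles = ["CCO", "CCN", "CCC", "CC(=O)O", "c1ccccc1"]
model = train_bigram(toy_smiles)
generated = [sample(model, random.Random(seed)) for seed in range(5)]
```

A recurrent network replaces the count table with a learned next-token distribution conditioned on the whole prefix, but the sampling loop has the same shape: start from a begin token, draw the next token, and stop at the end token.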




