Data-Efficient Graph Grammar Learning for Molecular Generation

03/15/2022
by Minghao Guo, et al.

Molecular generation has received significant attention recently. Existing methods are typically based on deep neural networks and require training on large datasets with tens of thousands of samples. In practice, however, class-specific chemical datasets are usually small (e.g., dozens of samples) because experimentation and data collection are labor-intensive. This makes it hard for deep generative models to describe the molecular design space comprehensively. Another major challenge is to generate only physically synthesizable molecules. This is non-trivial for neural network-based generative models, since the relevant chemical knowledge can only be extracted and generalized from the limited training data. In this work, we propose a data-efficient generative model that can be learned from datasets orders of magnitude smaller than common benchmarks. At the heart of this method is a learnable graph grammar that generates molecules from a sequence of production rules. These production rules are constructed automatically from training data, without any human assistance. Additional chemical knowledge can be incorporated into the model through further grammar optimization. Our learned graph grammar yields state-of-the-art results on generating high-quality molecules for three monomer datasets that contain only ∼20 samples each. Our approach also achieves remarkable performance in a challenging polymer generation task with only 117 training samples and is competitive with existing methods trained on 81k data points. Code is available at https://github.com/gmh14/data_efficient_grammar.
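
To make the idea of "generating a molecule from a sequence of production rules" concrete, here is a minimal toy sketch. It is not the method from the paper or the linked repository: it swaps the learned graph grammar for a hand-written string-rewriting grammar over SMILES-like fragments, and the names `RULES`, `NONTERMINAL`, and `generate` are hypothetical, chosen only for illustration.

```python
# Toy illustration only: a string-rewriting grammar over SMILES-like fragments.
# The paper learns graph production rules from data; these rules are
# hand-written assumptions used to show how a rule sequence yields a molecule.
NONTERMINAL = "X"

RULES = [
    "C" + NONTERMINAL,      # rule 0: append a carbon, keep growing
    "O" + NONTERMINAL,      # rule 1: append an oxygen, keep growing
    "C(=O)" + NONTERMINAL,  # rule 2: append a carbonyl group, keep growing
    "",                     # rule 3: terminate the derivation
]


def generate(rule_sequence, start=NONTERMINAL):
    """Apply each rule, in order, to the leftmost nonterminal."""
    current = start
    for idx in rule_sequence:
        if NONTERMINAL not in current:
            raise ValueError("no nonterminal left to expand")
        current = current.replace(NONTERMINAL, RULES[idx], 1)
    if NONTERMINAL in current:
        raise ValueError("derivation incomplete: nonterminal remains")
    return current


# X -> CX -> CC(=O)X -> CC(=O)OX -> CC(=O)OCX -> CC(=O)OC (methyl acetate)
molecule = generate([0, 2, 1, 0, 3])
print(molecule)  # prints CC(=O)OC
```

In this sketch every complete derivation ends with the terminating rule, so the set of rules alone determines which strings can be produced. That is the property a grammar-based generator relies on: constraints are encoded in the rules rather than inferred from a large training set.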


