Probabilistic hypergraph grammars for efficient molecular optimization

06/05/2019
by   Egor Kraev, et al.

We present an approach to making molecular optimization more efficient. We infer a hypergraph replacement grammar from the ChEMBL database, count how often each rule is used to expand each nonterminal in other rules, and use these frequencies as conditional priors for the policy model. By sampling random molecules from the resulting probabilistic grammar, we show that conditional priors yield a molecular distribution closer to the training set than either equal rule probabilities or unconditional priors. We then treat molecular optimization as a reinforcement learning problem, using a novel modification of the policy gradient algorithm, batch-advantage, which weights the log-probability loss by each individual reward minus the batch-average reward. The reinforcement learning agent is tasked with building molecules using this grammar, with the goal of maximizing benchmark scores available from the literature. To do so, the agent has two policies: one chooses the next node in the graph to expand, and the other selects the next grammar rule to apply. Both policies are implemented using the Transformer architecture, with the partially expanded graph as the input at each step. We show that using the empirical priors as the starting point for a policy eliminates the need for pre-training and allows us to reach optima faster. We achieve competitive performance on common benchmarks from the literature, such as penalized logP and QED, with only hundreds of training steps on a budget GPU instance.
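The batch-advantage weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `batch_advantage_loss`, the use of one summed log-probability per episode, and the averaging over the batch are all assumptions made for illustration. The abstract only states that individual rewards minus the batch-average reward weight the log-probability loss.

```python
import math


def batch_advantage_loss(log_probs, rewards):
    """Policy-gradient surrogate loss with the batch mean reward as baseline.

    Hypothetical helper, not taken from the paper.
    log_probs[i]: summed log-probability of the actions taken in episode i.
    rewards[i]:   scalar reward of the molecule built in episode i.
    """
    if len(log_probs) != len(rewards) or not rewards:
        raise ValueError("need equally sized, non-empty batches")
    # Baseline: the average reward over the current batch.
    baseline = sum(rewards) / len(rewards)
    # Advantage: how much better or worse each episode did than the batch.
    advantages = [r - baseline for r in rewards]
    # Minimizing this loss raises the log-probability of above-average
    # episodes and lowers it for below-average ones.
    # Averaging over the batch is an assumption of this sketch.
    weighted = sum(a * lp for a, lp in zip(advantages, log_probs))
    return -weighted / len(rewards)


# Toy batch of three episodes (made-up numbers, for illustration only).
log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]
rewards = [1.0, 2.0, 6.0]
loss = batch_advantage_loss(log_probs, rewards)
flat_loss = batch_advantage_loss(log_probs, [2.0, 2.0, 2.0])
```

When every episode in a batch earns the same reward, all advantages are zero and the loss vanishes, so such a batch contributes no gradient. In an automatic-differentiation framework, the advantages would normally be treated as constants so that gradients flow only through the log-probabilities.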


