Composing Molecules with Multiple Property Constraints

02/08/2020 ∙ by Wengong Jin, et al. ∙ MIT

Drug discovery aims to find novel compounds with specified chemical property profiles. In terms of generative modeling, the goal is to learn to sample molecules in the intersection of multiple property constraints. This task becomes increasingly challenging when there are many property constraints. We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales. These rationales are identified from molecules as substructures that are likely responsible for each property of interest. We then learn to expand rationales into a full molecule using graph generative models. Our final generative model composes molecules as mixtures of multiple rationale completions, and this mixture is fine-tuned to preserve the properties of interest. We evaluate our model on various drug design tasks and demonstrate significant improvements over state-of-the-art baselines in terms of accuracy, diversity, and novelty of generated compounds.




1 Introduction

Figure 1: Illustration of our rationale-based generative model. To generate a dual inhibitor against biological targets GSK3 and JNK3, our model first identifies rationale substructures for each property, and then learns to compose them into a full molecule. Note that rationales are not provided as domain knowledge.

The key challenge in drug discovery is to find molecules that satisfy multiple constraints, ranging from potency and safety to desired metabolic profiles. Optimizing these constraints simultaneously is challenging for existing computational models. The primary difficulty lies in the lack of training instances of molecules that conform to all the constraints. For this reason, Jin et al. (2019a), for example, report over 60% performance loss when moving beyond the single-constraint setting.

In this paper, we propose a novel approach to multi-property molecular optimization. Our strategy is inspired by fragment-based drug discovery (Murray & Rees, 2009) often followed by medicinal chemists. The idea is to start with substructures (e.g., functional groups or later pieces) that drive specific properties of interest, and then combine these building blocks into a target molecule. To automate this process, our model has to learn two complementary tasks illustrated in Figure 1: (1) identification of the building blocks that we call rationales, and (2) assembling multiple rationales together into a fully formed target molecule. In contrast to competing methods, our generative model does not build molecules from scratch, but instead assembles them from automatically extracted rationales already implicated for specific properties (see Figure 1).

We implement this idea using a generative model of molecules where the rationale choices play the role of latent variables. Specifically, a molecular graph $G$ is generated from underlying rationale sets $S$ according to:

$$P(G) = \sum_{S} P(G \mid S)\, P(S)$$

As ground truth rationales (e.g., functional groups or subgraphs) are not provided, the model has to extract candidate rationales from molecules with the help of a property predictor. We formulate this task as a discrete optimization problem efficiently solved by Monte Carlo tree search. Our rationale conditioned graph generator, $P(G \mid S)$, is initially trained on a large collection of real molecules so that it is capable of expanding any subgraph into a full molecule. The mixture model is then fine-tuned using reinforcement learning to ensure that the generated molecules preserve all the properties of interest. This training paradigm enables us to realize molecules that satisfy multiple constraints without observing any such instances in the training set.

The proposed model is evaluated on molecule design tasks under different combinations of property constraints. Our baselines include state-of-the-art molecule generation methods (Olivecrona et al., 2017; You et al., 2018a). Across all tasks, our model achieves state-of-the-art results in terms of accuracy, novelty and diversity of generated compounds. In particular, we outperform the best baseline with 38% absolute improvement in the task with three property constraints. We further provide ablation studies to validate the benefit of our architecture in the low-resource scenario. Finally, we show that identified rationales are chemically meaningful in a toxicity prediction task (Sushko et al., 2012).

2 Related Work

Reinforcement Learning One of the prevailing paradigms for drug design is reinforcement learning (RL) (You et al., 2018a; Olivecrona et al., 2017; Popova et al., 2018), which seeks to maximize the expected reward defined as the sum of predicted property scores using the property predictors. Their approach learns a distribution $P(G)$ (a neural network) for generating molecules. Ideally, the model should achieve a high success rate in generating molecules that meet all the constraints, while maintaining the diversity of $P(G)$.


The main challenge of RL lies in the sparsity of rewards, especially when there are multiple competing constraints. For illustration, we tested a state-of-the-art reinforcement learning method (Olivecrona et al., 2017) under three property constraints: biological activity to target DRD2, GSK3 and JNK3 (Li et al., 2018). As shown in Figure 2, the success rate and diversity are initially high when given only one of the constraints, but they decrease dramatically when all the property constraints are added. The reason for this failure is that the property predictor (i.e., reward function) remains a black box and the model has limited understanding of how and why certain molecules are desirable.

Our framework offsets this complexity by understanding the property landscape through rationales. At a high level, the rationales are analogous to options (Sutton et al., 1999; Stolle & Precup, 2002), which are macro-actions leading the agent faster to its goal. The rationales are automatically discovered from molecules with labeled properties.

Figure 2: Challenge of multi-objective drug design. Standard reinforcement learning method (Olivecrona et al., 2017) fails under three property constraints due to reward sparsity.
Figure 3: Overview of our approach. We first construct rationales for each individual property and then combine them as multi-property rationales. The method learns a graph completion model and rationale distribution in order to generate positive molecules.

Molecule Generation Previous work has adopted various approaches for generating molecules under specific property constraints. Roughly speaking, existing methods can be divided along two axes: representation and optimization. On the representation side, they either operate on SMILES strings (Gómez-Bombarelli et al., 2018; Segler et al., 2017; Kang & Cho, 2018) or directly on molecular graphs (Simonovsky & Komodakis, 2018; Jin et al., 2018; Samanta et al., 2018; Liu et al., 2018; De Cao & Kipf, 2018; Ma et al., 2018; Seff et al., 2019). On the optimization side, the task has been formulated as reinforcement learning (Guimaraes et al., 2017; Olivecrona et al., 2017; Popova et al., 2018; You et al., 2018a; Zhou et al., 2018), continuous optimization in the latent space learned by variational autoencoders (Gómez-Bombarelli et al., 2018; Kusner et al., 2017; Dai et al., 2018; Jin et al., 2018; Kajino, 2018; Liu et al., 2018), or graph-to-graph translation (Jin et al., 2019b). In contrast to existing approaches, our model focuses on the multi-objective setting of the problem and offers a different formulation for molecule generation based on rationales.

Interpretability Our rationale based generative model seeks to provide transparency (Doshi-Velez & Kim, 2017) for molecular design. The choice of rationales is visible to users and can be easily controlled by human experts. Prior work on interpretability primarily focuses on finding rationales (i.e., explanations) of model predictions in image and text classification (Lei et al., 2016; Ribeiro et al., 2016; Sundararajan et al., 2017) and molecule property prediction (McCloskey et al., 2019; Ying et al., 2019; Lee et al., 2019). In contrast, our model uses rationales as building blocks for molecule generation.

3 Composing Molecules using Rationales

Molecules are represented as graphs $G$ with atoms as nodes and bonds as edges. The goal of drug discovery is to find novel compounds satisfying given property constraints (e.g., drug-likeness, binding affinity, etc.). Without loss of generality, we assume the property constraints to be of the following form:

$$\text{Find molecules } G \quad \text{subject to} \quad r_i(G) \ge \delta_i, \qquad i = 1, \dots, M$$

For each property $i$, the property score $r_i(G)$ of molecule $G$ must reach the threshold $\delta_i$. A molecule $G$ is called positive to property $i$ if $r_i(G) \ge \delta_i$ and negative otherwise.

Following previous work (Olivecrona et al., 2017; Popova et al., 2018), $r_i(G)$ is the output of property prediction models (e.g., random forests) which effectively approximate empirical measurements. The prediction model is trained over a set of molecules with labeled properties gathered from real experimental data. The property predictor is then fixed throughout the rest of the training process.
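The constraint check described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `is_positive`, the toy predictors, and the threshold value 0.5 are all assumptions made for the example, and real predictors would be trained models rather than string checks.

```python
from typing import Callable, Dict

Predictor = Callable[[str], float]


def is_positive(smiles: str,
                predictors: Dict[str, Predictor],
                thresholds: Dict[str, float]) -> bool:
    """Return True if every property score reaches its threshold."""
    return all(predictors[name](smiles) >= thresholds[name]
               for name in thresholds)


# Toy stand-ins for trained property predictors (hypothetical scores).
toy_predictors = {
    "GSK3": lambda smiles: 0.8 if "N" in smiles else 0.1,
    "JNK3": lambda smiles: 0.7 if "O" in smiles else 0.2,
}
# Hypothetical thresholds, one per property.
toy_thresholds = {"GSK3": 0.5, "JNK3": 0.5}
```

A molecule is positive for the multi-property task only when all constraints hold at once, which is why positives become rare as constraints are added.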

Overview Our model generates molecules by first sampling a rationale $S$ from the vocabulary $V_S^{[M]}$, and then completing it into a molecule $G$. The generative model is defined as

$$P(G) = \sum_{S \in V_S^{[M]}} P(S)\, P(G \mid S)$$


As shown in Figure 3, our model consists of three modules:

  • Rationale Extraction: Construct the rationale vocabulary $V_S^i$ for each individual property $i$ and combine these rationales for multiple properties into $V_S^{[M]}$ (see §3.1).

  • Graph Completion $P(G \mid S)$: Generate molecules $G$ using multi-property rationales $S \in V_S^{[M]}$. The model is first pre-trained on natural compounds and then fine-tuned to generate molecules satisfying multiple constraints (see §3.2 for its architecture and §3.3 for fine-tuning).

  • Rationale Distribution $P(S)$: The rationale distribution $P(S)$ is learned based on the properties of complete molecules generated from $P(G \mid S)$. A rationale is sampled more frequently if it is more likely to be expanded into a positive molecule (see §3.3).

3.1 Rationale Extraction from Predictive Models

Single-property Rationale We define a rationale $S^i$ for a single property $i$ as a subgraph of some molecule $G$ which causes $G$ to be positive (see Figure 1). To be specific, let $V_S^i$ be the vocabulary of such rationales for property $i$. Each rationale $S^i \in V_S^i$ should satisfy the following two criteria:

  1. The size of $S^i$ should be small (less than $N_s$ atoms).

  2. Its predicted property score $r_i(S^i) \ge \delta_i$.

For a single property $i$, we propose to extract its rationales from the set of positive molecules used to train the property predictor. For each such molecule $G$, we find a rationale subgraph with high predicted property and small size:

$$\text{Find subgraphs } S^i \subseteq G \quad \text{subject to} \quad r_i(S^i) \ge \delta_i, \quad |S^i| < N_s$$


Solving the above problem is challenging because the rationale $S^i$ is discrete and the potential number of subgraphs grows exponentially with the size of $G$. To limit the search space, we add the constraint that $S^i$ has to be a connected subgraph [1]. In this case, we can find a rationale by iteratively removing some peripheral bonds while maintaining its property. Therefore, the key is learning to prune the molecule.

[1] This assumption is valid in many cases. For instance, rationales for toxicity (i.e., toxicophores) are connected subgraphs in most cases (Sushko et al., 2012).

This search problem can be efficiently solved by Monte Carlo Tree Search (MCTS) (Silver et al., 2017). The root of the search tree is $G$ and each state $S$ in the search tree is a subgraph derived from a sequence of bond deletions. To ensure that each subgraph is chemically valid and stays connected, we only allow deletion of one peripheral non-aromatic bond or one peripheral ring from each state. As shown in Figure 4, a bond or a ring $a$ is called peripheral if $S$ stays connected after deleting $a$.
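To make the notion of a peripheral bond concrete, here is a small graph-only sketch. It assumes a molecule is given as a set of `(u, v)` atom-index pairs and treats a bond as peripheral when removing it (and dropping any atom left without bonds) keeps the rest connected. It deliberately ignores aromaticity and whole-ring deletion, which the method described above also handles, so it is an illustration of the definition rather than the extraction procedure itself.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Bond = Tuple[int, int]


def is_connected(bonds: Set[Bond]) -> bool:
    """Check that the atoms touched by `bonds` form one connected component."""
    if not bonds:
        return True
    adjacency = defaultdict(set)
    for u, v in bonds:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(adjacency)


def peripheral_bonds(bonds: Set[Bond]) -> List[Bond]:
    """Bonds whose removal leaves the remaining bonded atoms connected."""
    return sorted(b for b in bonds if is_connected(bonds - {b}))
```

On a four-atom chain 0-1-2-3 only the two end bonds qualify, since cutting the middle bond splits the chain in two.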

During the search process, each state $S$ in the search tree contains edges $(S, a)$ for all legal deletions $a$. Following Silver et al. (2017), each edge $(S, a)$ stores the following statistics:

  • $N(S, a)$ is the visit count of deletion $a$, which is used for exploration-exploitation tradeoff in the search process.

  • $W(S, a)$ is the total action value, which indicates how likely the deletion $a$ will lead to a good rationale.

  • $R(S, a)$ is the predicted property score of the new subgraph derived from deleting $a$ from $S$.

Guided by these statistics, MCTS searches for rationales in multiple iterations. Each iteration consists of two phases:

  1. Forward pass: Select a path $S_0, \dots, S_L$ from the root $S_0$ to a leaf state $S_L$ with less than $N_s$ atoms and evaluate its property score $r_i(S_L)$. At each state $S_k$, a deletion $a_k$ is selected according to the statistics in the search tree:

    $$a_k = \arg\max_a \; Q(S_k, a) + U(S_k, a), \qquad Q(S_k, a) = \frac{W(S_k, a)}{N(S_k, a)}, \qquad U(S_k, a) = c_{puct}\, R(S_k, a)\, \frac{\sqrt{\sum_b N(S_k, b)}}{1 + N(S_k, a)}$$

    where $c_{puct}$ determines the level of exploration. This search strategy is a variant of the PUCT algorithm (Rosin, 2011). It initially prefers to explore deletions with high $R(S, a)$ and low visit count, but asymptotically prefers deletions that are likely to lead to good rationales.

  2. Backward pass: The edge statistics are updated for each state $S_k$ on the path. Specifically, $N(S_k, a_k) \leftarrow N(S_k, a_k) + 1$ and $W(S_k, a_k) \leftarrow W(S_k, a_k) + r_i(S_L)$.

In the end, we collect all the leaf states $S$ with $r_i(S) \ge \delta_i$ and add them to the rationale vocabulary $V_S^i$.
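The two phases can be sketched as follows. The scoring rule used here is the usual PUCT form from Silver et al. (2017) with the prior replaced by the predicted property score, i.e. mean action value plus `c_puct * R * sqrt(total visits) / (1 + N)`; that exact form, the `EdgeStats` container, and the convention that an unvisited edge has mean value 0 are assumptions of this sketch rather than details confirmed by the text.

```python
import math
from dataclasses import dataclass
from typing import Dict, Hashable, List


@dataclass
class EdgeStats:
    R: float        # predicted property score of the subgraph after the deletion
    N: int = 0      # visit count
    W: float = 0.0  # total action value

    @property
    def Q(self) -> float:
        """Mean action value; taken as 0 for unvisited edges (assumption)."""
        return self.W / self.N if self.N > 0 else 0.0


def select_deletion(edges: Dict[Hashable, EdgeStats], c_puct: float) -> Hashable:
    """Forward pass at one state: pick the deletion maximising Q + U."""
    total_visits = sum(edge.N for edge in edges.values())

    def score(edge: EdgeStats) -> float:
        exploration = c_puct * edge.R * math.sqrt(total_visits) / (1 + edge.N)
        return edge.Q + exploration

    return max(edges, key=lambda action: score(edges[action]))


def backup(path: List[EdgeStats], leaf_score: float) -> None:
    """Backward pass: update every edge on the path with the leaf's score."""
    for edge in path:
        edge.N += 1
        edge.W += leaf_score
```

With few visits the exploration term dominates and favours deletions with a high predicted score; as visit counts grow it shrinks and the mean action value takes over.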

Figure 4: Illustration of Monte Carlo tree search for molecules. Peripheral bonds and rings are highlighted in red. In the forward pass, the model deletes a peripheral bond or ring from each state which has maximum value. In the backward pass, the model updates the statistics of each state.

Multi-property Rationale For a set of $M$ properties, we can similarly define its rationale $S^{[M]}$ by imposing $M$ property constraints at the same time, namely

$$V_S^{[M]} = \{\, S \;:\; r_i(S) \ge \delta_i \;\; \forall i, \quad |S| < N_s \,\}$$

In principle, we can apply MCTS to extract rationales from molecules that satisfy all the property constraints. However, in many cases there are no such molecules available. As a result, we propose to construct rationales from single-property rationales $S^i$ for each property $i$. Specifically, each multi-property rationale $S^{[M]}$ is a disconnected graph with $M$ connected components $S^1, \dots, S^M$:

$$V_S^{[M]} \approx \{\, S^{[M]} = S^1 \oplus \cdots \oplus S^M \;:\; S^i \in V_S^i \,\}$$

where $S^1 \oplus S^2$ means to concatenate two graphs $S^1$ and $S^2$. For notational convenience, we denote both single and multi-property rationales as $S$. In the rest of the paper, $S$ is a rationale graph with one or multiple connected components.
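One simple way to picture this construction is on SMILES strings, where a dot separates disconnected components. The sketch below builds every combination of one rationale per property; the function name and the fragment strings are hypothetical and chosen only for illustration, and representing rationales as SMILES is an assumption of the example.

```python
from itertools import product
from typing import Iterable, List


def combine_rationales(*vocabularies: Iterable[str]) -> List[str]:
    """Concatenate one rationale per property into a disconnected SMILES string."""
    return [".".join(parts) for parts in product(*vocabularies)]


# Hypothetical single-property rationale vocabularies.
gsk3_rationales = ["c1ccncc1", "C(=O)N"]
jnk3_rationales = ["c1ccccc1"]

dual_rationales = combine_rationales(gsk3_rationales, jnk3_rationales)
```

The size of the combined vocabulary is the product of the single-property vocabulary sizes, which is why some form of pruning or clustering becomes useful as the number of properties grows.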

3.2 Graph Completion

This module is a variational autoencoder which completes a full molecule $G$ given a rationale $S$. Since each rationale can be realized into many different molecules, we introduce a latent variable $z$ to generate diverse outputs:

$$P(G \mid S) = \int_z P(G \mid S, z)\, P(z)\, dz$$

where $P(z)$ is the prior distribution. Different from standard graph generation, our graph decoder must generate graphs that contain subgraph $S$. Our VAE architecture is adapted from existing atom-by-atom generative models (You et al., 2018b; Liu et al., 2018) to incorporate the subgraph constraint. For completeness, we present our architecture here:

Encoder Our encoder is a message passing network (MPN) which learns the approximate posterior $Q(z \mid G, S)$ for variational inference. Let $e(a_u)$ be the embedding of atom $u$ with atom type $a_u$, and $e(b_{uv})$ be the embedding of bond $(u, v)$ with bond type $b_{uv}$. The MPN computes atom representations $\{h_v \mid v \in G\}$. For simplicity, we denote the MPN encoding process as $\mathrm{MPN}(\cdot)$, which is detailed in the appendix. The atom vectors are aggregated to represent $G$ as a single vector $h_G$. Finally, we sample the latent vector $z_G$ from the approximate posterior with mean $\mu(h_G)$ and log variance $\Sigma(h_G)$.
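Sampling a latent vector from a Gaussian given its mean and log variance is commonly done with the reparameterization trick. The sketch below shows the standard form, with the standard deviation computed as `exp(0.5 * log_var)`; it is a generic illustration in plain Python, and whether the authors scale the noise in exactly this way is not stated in the text.

```python
import math
import random
from typing import List, Sequence


def sample_latent(mean: Sequence[float],
                  log_var: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Draw z = mean + std * eps with eps ~ N(0, 1), one draw per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]
```

Writing the sample as a deterministic function of the mean, the log variance and independent noise is what lets gradients flow through the sampling step during training.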



Decoder The decoder generates molecule $G$ according to its breadth-first order. In each step, the model generates a new atom and all its connecting edges. During generation, we maintain a queue $\mathcal{Q}$ that contains frontier nodes in the graph which still have neighbors to be generated. Let $G^{(t)}$ be the partial graph generated till step $t$. To ensure $G$ contains $S$ as a subgraph, we set the initial state $G^{(1)} = S$ and put all the peripheral atoms of $S$ into the queue $\mathcal{Q}$ (only peripheral atoms are needed due to the rationale extraction algorithm).

In the $t$-th generation step, the decoder first runs an MPN over the current graph $G^{(t)}$ to compute atom representations $h_v^{(t)}$. The current graph $G^{(t)}$ is represented as the sum of its atom vectors, $h_{G^{(t)}} = \sum_{v \in G^{(t)}} h_v^{(t)}$. Suppose the first atom in $\mathcal{Q}$ is $v_t$. The decoder decides to expand $v_t$ in three steps:

  1. Predict whether there will be a new atom attached to $v_t$. The prediction is made by a ReLU network (MLP) whose input is a concatenation of multiple vectors.

  2. If no new atom is predicted, discard $v_t$ and move on to the next node in $\mathcal{Q}$. Stop generation if $\mathcal{Q}$ is empty. Otherwise, create a new atom $u_t$ and predict its atom type.

  3. Predict the bond type between $u_t$ and other frontier nodes in $\mathcal{Q}$. Since atoms are generated in breadth-first order, there are no bonds between $u_t$ and atoms not in $\mathcal{Q}$.

To fully capture edge dependencies, we predict the bonds between $u_t$ and atoms in $\mathcal{Q}$ sequentially and update the representation of $u_t$ when new bonds are added to $G^{(t)}$. In the $k$-th step, the bond type between $u_t$ and the $k$-th frontier node $q_k$ is predicted from the new representation of $u_t$, computed after the first $k - 1$ bonds have been added to $G^{(t)}$.
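The control flow of this breadth-first expansion can be sketched independently of the neural networks. In the sketch below the three predictions are passed in as plain callables (`wants_new_atom`, `pick_atom_type`, `pick_bond_type`), which are hypothetical stand-ins for the learned predictors; the dictionary-based graph representation and the `max_atoms` safety cap are likewise assumptions of the example.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Optional, Tuple

Atoms = Dict[int, str]               # atom index -> atom type
Bonds = Dict[Tuple[int, int], str]   # (atom, atom) -> bond type


def complete_graph(atoms: Atoms,
                   bonds: Bonds,
                   frontier: Iterable[int],
                   wants_new_atom: Callable[[Atoms, Bonds, int], bool],
                   pick_atom_type: Callable[[Atoms, Bonds, int], str],
                   pick_bond_type: Callable[[Atoms, Bonds, int, int, int], Optional[str]],
                   max_atoms: int = 50) -> Tuple[Atoms, Bonds]:
    """Grow a partial graph breadth-first, starting from the rationale's frontier atoms."""
    atoms, bonds = dict(atoms), dict(bonds)
    queue = deque(frontier)
    while queue and len(atoms) < max_atoms:
        v = queue[0]
        if not wants_new_atom(atoms, bonds, v):
            queue.popleft()          # v has no more neighbors to generate
            continue
        u = max(atoms) + 1           # index of the newly created atom
        atoms[u] = pick_atom_type(atoms, bonds, v)
        for q in list(queue):        # bonds to frontier atoms, predicted one at a time
            bond = pick_bond_type(atoms, bonds, u, q, v)
            if bond is not None:
                bonds[(q, u)] = bond
        queue.append(u)              # the new atom becomes a frontier atom
    return atoms, bonds
```

Because the rationale's atoms and bonds are copied into the starting state and never removed, every completed graph contains the rationale as a subgraph by construction.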

3.3 Training Procedure

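Our training objective is to maximize the expected reward of generated molecules $G$, where the reward $R(G)$ is an indicator of $r_i(G) \ge \delta_i$ for all properties $1 \le i \le M$:

$$\sum_G R(G)\, P(G) + \lambda\, \mathbb{H}[P(S)] \;=\; \sum_G R(G) \sum_{S} P(S)\, P(G \mid S) + \lambda\, \mathbb{H}[P(S)] \qquad (15)$$

We incorporate an entropy regularization term $\mathbb{H}[P(S)]$ to encourage the model to explore different types of rationales. The rationale distribution $P(S)$ is a categorical distribution over the rationale vocabulary. Let $f(S) = \sum_G R(G)\, P(G \mid S)$ be the expected reward of molecules completed from rationale $S$. It is easy to show that the optimal $P(S)$ has a closed form solution:

$$P(S = S_k) = \frac{\exp\big(f(S_k) / \lambda\big)}{\sum_{S'} \exp\big(f(S') / \lambda\big)} \qquad (16)$$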
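Maximizing an expected reward plus an entropy bonus over a categorical distribution yields a softmax of the per-rationale expected rewards, with the entropy weight acting as a temperature. The sketch below computes that distribution; the names `expected_reward` and `lam` are assumptions of the example, and in practice the expected rewards would be estimated from sampled completions.

```python
import math
from typing import Dict


def rationale_distribution(expected_reward: Dict[str, float],
                           lam: float) -> Dict[str, float]:
    """Softmax of expected rewards with temperature `lam` (entropy weight)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    top = max(expected_reward.values())  # subtract the max for numerical stability
    weights = {s: math.exp((f - top) / lam) for s, f in expected_reward.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}
```

A small `lam` concentrates the mass on the rationales with the highest expected reward, while a large `lam` pushes the distribution towards uniform and so keeps more rationale types in play.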


The remaining question is how to train the graph generator $P(G \mid S)$. The generator seeks to produce molecules that are realistic and positive. However, Eq.(15) itself does not take into account whether generated molecules are realistic or not. To encourage the model to generate realistic compounds, we train the graph generator in two phases:

  • Pre-training $P(G \mid S)$ using real molecules.

  • Fine-tuning $P(G \mid S)$ using policy gradient with reward from property predictors.

The overall training algorithm is shown in Algorithm 1.

Algorithm 1 Training method with $M$ property constraints.
  for $i = 1$ to $M$ do
      $V_S^i \leftarrow$ rationales extracted from existing molecules positive to property $i$ (see §3.1)
  end for
  Construct multi-property rationales $V_S^{[M]}$.
  Pre-train $P(G \mid S)$ on the pre-training dataset.
  Fine-tune model $P(G \mid S)$ on $V_S^{[M]}$ for $L$ iterations using policy gradient.
  Compute $P(S)$ based on Eq.(16) using the fine-tuned model $P(G \mid S)$.

3.3.1 Pre-training

In addition to satisfying all the property constraints, the output of the model should constitute a realistic molecule. For this purpose, we pre-train the graph generator on a large set of molecules from ChEMBL (Gaulton et al., 2017). Each training example is a pair $(S, G)$, where $S$ is a (random) subgraph of a molecule $G$. The task is to take subgraphs as input and complete them into full molecules. Given a molecule $G$, we consider two types of random subgraphs $S$:

  • $S$ is a connected subgraph of $G$ with a bounded number of atoms.

  • $S$ is a disconnected subgraph of $G$ with multiple connected components. This is to simulate the case of generating molecules from multi-property rationales.

Finally, we train the graph generator to maximize the likelihood of the pre-training dataset.
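One simple way to draw a random connected subgraph for such training pairs is to grow a set of atoms outward from a random seed atom. The sketch below does this on an adjacency-list graph; the growth procedure, the function name and the `max_atoms` parameter are assumptions made for illustration, and the text does not specify how the authors sample their subgraphs.

```python
import random
from typing import Dict, List, Set


def random_connected_subgraph(adjacency: Dict[int, List[int]],
                              max_atoms: int,
                              rng: random.Random) -> Set[int]:
    """Grow a connected set of atoms from a random seed, up to `max_atoms` atoms."""
    start = rng.choice(sorted(adjacency))
    chosen = {start}
    target = rng.randint(1, min(max_atoms, len(adjacency)))
    while len(chosen) < target:
        # Atoms adjacent to the current set that are not yet included.
        boundary = sorted({n for a in chosen for n in adjacency[a]} - chosen)
        if not boundary:
            break
        chosen.add(rng.choice(boundary))
    return chosen
```

Each new atom is adjacent to one already chosen, so the result is connected by construction. A disconnected training subgraph could be obtained by taking the union of two such sets grown from seeds that are far apart.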

3.3.2 Fine-tuning

After pre-training, we further fine-tune the graph generator on property-specific rationales in order to maximize Eq.(15). The model is fine-tuned through multiple iterations using policy gradient (Sutton et al., 2000). Let $P_{\theta^t}(G \mid S)$ be the model trained till the $t$-th iteration. In each iteration, we perform the following two steps:

  1. Initialize the fine-tuning set $\mathcal{D}^f = \emptyset$. For each rationale $S$, use the current model to sample $K$ molecules $G_1, \dots, G_K \sim P_{\theta^t}(G \mid S)$. Add $(G_k, S)$ to set $\mathcal{D}^f$ if $G_k$ is predicted to be positive.

  2. Update the model $P_{\theta^t}(G \mid S)$ on the fine-tuning set $\mathcal{D}^f$ using the policy gradient method.

After fine-tuning $P(G \mid S)$, we compute the rationale distribution $P(S)$ based on Eq.(16).
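The first step of each fine-tuning iteration amounts to a filtered sampling loop. The sketch below shows that loop with the generator and the property check passed in as callables; `sample_completion`, `is_positive` and `samples_per_rationale` are hypothetical names, and the model update in the second step is left out because it depends on the training framework.

```python
from typing import Callable, Iterable, List, Tuple


def build_finetuning_set(rationales: Iterable[str],
                         sample_completion: Callable[[str], str],
                         is_positive: Callable[[str], bool],
                         samples_per_rationale: int) -> List[Tuple[str, str]]:
    """Sample completions for every rationale and keep the predicted positives."""
    finetuning_set = []
    for rationale in rationales:
        for _ in range(samples_per_rationale):
            molecule = sample_completion(rationale)
            if is_positive(molecule):
                finetuning_set.append((molecule, rationale))
    return finetuning_set
```

Only completions that pass the property check are kept, so the subsequent update pushes the generator towards completions that preserve the properties of the rationale they started from.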

4 Experiments

Method | DRD2 Success | DRD2 Novelty | DRD2 Diversity | GSK3 Success | GSK3 Novelty | GSK3 Diversity | JNK3 Success | JNK3 Novelty | JNK3 Diversity
GVAE + RL | 55.4% | 56.2% | 0.858 | 33.2% | 76.4% | 0.874 | 57.7% | 62.6% | 0.832
GCPN | 40.1% | 12.8% | 0.880 | 42.4% | 11.6% | 0.904 | 32.3% | 4.4% | 0.884
REINVENT | 98.1% | 28.8% | 0.798 | 99.3% | 61.0% | 0.733 | 98.5% | 31.6% | 0.729
Ours | 100% | 84.9% | 0.866 | 100% | 76.8% | 0.850 | 100% | 81.4% | 0.847
Table 1: Results on molecule design with single property constraint.

Method | DRD2 + GSK3 Success | DRD2 + GSK3 Novelty | DRD2 + GSK3 Diversity | DRD2 + JNK3 Success | DRD2 + JNK3 Novelty | DRD2 + JNK3 Diversity | GSK3 + JNK3 Success | GSK3 + JNK3 Novelty | GSK3 + JNK3 Diversity
GVAE + RL | 96.1% | 0% | 0.447 | 98.5% | 0% | 0.0 | 40.7% | 80.3% | 0.783
GCPN | 0.1% | 40% | 0.793 | 0.1% | 25% | 0.785 | 3.5% | 8.0% | 0.874
REINVENT | 98.7% | 100% | 0.584 | 91.6% | 100% | 0.533 | 97.4% | 39.7% | 0.595
Ours | 100% | 100% | 0.866 | 100% | 100% | 0.837 | 100% | 94.4% | 0.844
Table 2: Results on molecule design with two property constraints.

Method | DRD2 + GSK3 + JNK3 Success | DRD2 + GSK3 + JNK3 Novelty | DRD2 + GSK3 + JNK3 Diversity
GVAE + RL | 0% | 0% | 0.0
GCPN | 0% | 0% | 0.0
REINVENT | 48.3% | 100% | 0.166
Ours | 86.2% | 100% | 0.726
Table 3: Molecule design with three property constraints.

Figure 5: Left & middle: t-SNE plot of the extracted rationales for GSK3 and JNK3. For both properties, rationales mostly cover the chemical space populated by existing positive molecules. Right: t-SNE plot of generated GSK3+JNK3 dual inhibitors.
Figure 6: Left: Examples of molecules generated from dual-property rationales. The model learns to combine two disjoint rationale graphs. The added subgraphs are highlighted in red. Right: Example structural alerts in the toxicity dataset. The ground truth rationale (Azobenzene) is highlighted in red. Our learned rationale almost matches the ground truth (error highlighted in dashed circle).

We evaluate our method on molecule design tasks under various combinations of property constraints. In our experiments, we consider the following three properties:

  • DRD2: Dopamine type 2 receptor activity. The DRD2 activity prediction model is trained on the dataset provided by Olivecrona et al. (2017), which contains 7219 positive and 100K negative compounds.

  • GSK3: Inhibition against glycogen synthase kinase-3 beta. The GSK3 prediction model is trained on the dataset from Li et al. (2018), which contains 2665 positive and 50K negative compounds.

  • JNK3: Inhibition against c-Jun N-terminal kinase-3. The JNK3 prediction model is also trained on the dataset from Li et al. (2018) with 740 positives and 50K negatives.

Following Li et al. (2018), the property prediction model is a random forest using Morgan fingerprint features (Rogers & Hahn, 2010). We set the positive threshold .

Multi-property Constraints We also consider various combinations of property constraints:

  • GSK3 + JNK3: Jointly inhibiting JNK3 and GSK3 may provide potential benefit for the treatment of Alzheimer’s disease (Li et al., 2018). There are 316 dual inhibitors already available in the dataset.

  • DRD2 + JNK3: JNK3 inhibitors active to DRD2.

  • DRD2 + GSK3: GSK3 inhibitors active to DRD2.

  • DRD2 + GSK3 + JNK3: Combining all constraints.

Except for the first case, all combinations have fewer than three positive molecules in the datasets, which poses a significant challenge for molecule design methods.

Evaluation Metrics Our evaluation effort measures various aspects of molecule design. For each method, we generate $n$ molecules and compute the following metrics:

  • Success: The fraction of sampled molecules predicted to be positive (i.e., satisfying all property constraints). A good model should have a high success rate. Following previous work (Olivecrona et al., 2017; You et al., 2018a), we only consider the success rate under property prediction, as it is hard to obtain real property measurements.

  • Diversity: It is also important for a model to generate a diverse range of positive molecules. To this end, we measure the diversity of generated positive compounds by computing their pairwise molecular distance, which is defined as the Tanimoto distance over Morgan fingerprints of two molecules.

  • Novelty: Crucially, a good model should discover novel positive compounds. In this regard, for each generated positive compound $G$, we find its nearest neighbor $G_{SNN}$ from positive molecules in the training set. We define the novelty as the fraction of molecules with nearest neighbor similarity lower than 0.4 (Olivecrona et al., 2017):

    $$\mathrm{Novelty} = \frac{1}{n} \sum_{G} \mathbf{1}\big[\mathrm{sim}(G, G_{SNN}) < 0.4\big]$$
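The three metrics can be sketched on fingerprints represented as sets of "on" bit indices. This is a minimal illustration under stated assumptions: fingerprints would normally come from a cheminformatics toolkit, the function names are hypothetical, diversity is taken as the mean pairwise Tanimoto distance, and a generated molecule counts as novel when its highest similarity to any training positive is below the threshold.

```python
from itertools import combinations
from typing import FrozenSet, Sequence

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def success_rate(positive_flags: Sequence[bool]) -> float:
    """Fraction of sampled molecules predicted positive."""
    return sum(positive_flags) / len(positive_flags) if positive_flags else 0.0


def diversity(fingerprints: Sequence[Fingerprint]) -> float:
    """Mean pairwise Tanimoto distance (1 - similarity)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(x, y) for x, y in pairs) / len(pairs)


def novelty(generated: Sequence[Fingerprint],
            training_positives: Sequence[Fingerprint],
            threshold: float = 0.4) -> float:
    """Fraction of generated molecules whose nearest training positive is dissimilar."""
    if not generated:
        return 0.0
    novel = sum(
        1 for g in generated
        if max((tanimoto(g, t) for t in training_positives), default=0.0) < threshold
    )
    return novel / len(generated)
```

Diversity and novelty are computed only over the generated molecules that are predicted positive, so a model cannot score well on them by producing varied but inactive compounds.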

Baselines We compare our method against the following state-of-the-art generation methods for molecule design:

  • REINVENT (Olivecrona et al., 2017) is an RL model generating molecules based on their SMILES strings (Weininger, 1988). To generate realistic molecules, their model is pre-trained over one million molecules from ChEMBL and then fine-tuned under property reward.

  • GCPN (You et al., 2018a) is an RL model which generates molecular graphs atom by atom. It uses a GAN (Goodfellow et al., 2014) to help generate realistic molecules.

  • GVAE + RL is a graph variational autoencoder which generates molecules atom by atom. The graph VAE architecture is the same as our model, but it generates molecules from scratch without using rationales (i.e., $S = \emptyset$). The model is pre-trained on the same ChEMBL dataset and fine-tuned for each property using policy gradient. This is an ablation study of our method to show the importance of using rationales.

Rationales Details of the rationales used in our model:

  • Single property: For single properties, the rationale size is required to be less than 20 atoms. For each positive molecule, we run 20 iterations of MCTS. In total, we extracted 8611, 6299 and 417 different rationales for DRD2, GSK3 and JNK3.

  • Two properties: We constructed dual-property rationales by pairing the single-property rationales of the two properties. To limit the vocabulary size, we cluster the rationales using the algorithm from Butina (1999) and keep the cluster centroids only. In total, we extracted 60K, 9.9K, 9.2K rationales for DRD2+GSK3, DRD2+JNK3, GSK3+JNK3 respectively.

  • Three properties: The rationale for three properties is constructed analogously by combining rationales of all three properties. In total, we extracted 8163 different rationales for this task.

Model Setup Our model and all baselines are pre-trained on the same ChEMBL dataset from Olivecrona et al. (2017). We fine-tune our model for $L$ iterations, where each rationale is expanded $K$ times.

4.1 Results

The results on all design tasks are reported in Tables 1-3. On the single-property design tasks, our model and REINVENT demonstrate nearly perfect success rates since there is only one constraint. We also outperform all the baselines in terms of the novelty metric. Though our diversity score is slightly lower than that of GCPN, our success rate and novelty score are much higher.

Table 2 summarizes the results on dual-property design tasks. On the GSK3+JNK3 task, we achieved a 100% success rate while maintaining 94.4% novelty and a 0.844 diversity score. Meanwhile, GCPN fails to discover positive compounds on all the tasks due to reward sparsity. GVAE + RL fails to discover novel positives on the DRD2+GSK3 and DRD2+JNK3 tasks because there are fewer than three positive compounds available for training. Therefore it can only learn to replicate existing positives. Our method is able to succeed in all cases regardless of training data scarcity.

Table 3 shows the results on the three-property design task, which is the most challenging. The difference between our model and the baselines becomes significantly larger. In fact, GVAE + RL and GCPN completely fail in this task due to reward sparsity. Our model outperforms REINVENT by a wide margin (success rate: 86.2% versus 48.3%; diversity: 0.726 versus 0.166).

When there are multiple properties, our method significantly outperforms GVAE + RL, which has the same generative architecture but does not utilize rationales and generates molecules from scratch. This demonstrates the importance of rationales for multi-property molecular design.

Visualization We further provide visualizations to help understand our model. In Figure 5, we show a t-SNE (Maaten & Hinton, 2008) plot of the extracted rationales for GSK3 and JNK3. For both properties, rationales mostly cover the chemical space populated by existing positive molecules. The generated GSK3+JNK3 dual inhibitors mostly appear in the region where GSK3 and JNK3 rationales overlap. In Figure 6, we show examples of molecules generated from dual-property rationales. Each rationale has two connected components: one from GSK3 and the other from JNK3.

4.2 Rationale Sanity Check

Method | Partial Match | Exact Match
Integrated Gradient | 0.857 | 39.4%
MCTS Rationale | 0.861 | 46.0%
Table 4: Rationale accuracy on the toxicity dataset.

While our rationales are mainly extracted for generation, it is also important for them to be chemically relevant. In other words, the extracted rationales should accurately explain the property of interest. As there are no ground truth rationales available for DRD2, JNK3 and GSK3, we turn to an auxiliary toxicity dataset for evaluating rationale quality. Specifically, our dataset contains 100K molecules randomly selected from ChEMBL. Each molecule is labeled as toxic if it contains structural alerts (Sushko et al., 2012), which are chemical substructures correlated with human or environmental hazards (see Figure 6) [2]. We trained a neural network to predict toxicity on this dataset. Under this setup, the structural alerts are ground truth rationales and we evaluate how often the extracted rationales match them.

[2] Structural alerts used in our paper are from knowledgebase/169485-non-medchem-friendly-smarts

We compare our MCTS-based rationale extraction with integrated gradient (Sundararajan et al., 2017), which has been applied to explain property prediction models (McCloskey et al., 2019). We report two metrics: partial match AUC (the attribution AUC metric used in McCloskey et al. (2019)) and exact match accuracy, which measures how often a rationale graph exactly matches the true rationale in the molecule. As shown in Table 4, our method significantly outperforms the baseline in terms of exact matching (46.0% versus 39.4%). The extracted rationales have decent overlap with the true rationales, with 0.86 partial match on average. Therefore, our model is capable of finding rationales that are chemically meaningful.

5 Conclusion

In this paper, we developed a rationale-based generative model for molecular design. Our model generates molecules in two phases: 1) identifying rationales whose presence indicates strong positive signals for each property; 2) expanding rationale graphs into molecules using graph generative models and fine-tuning them towards the desired combination of properties. Our model demonstrates strong improvement over prior reinforcement learning methods in various tasks.