1 Introduction
The key challenge in drug discovery is to find molecules that satisfy multiple constraints, ranging from potency and safety to desired metabolic profiles. Optimizing these constraints simultaneously is challenging for existing computational models. The primary difficulty lies in the lack of training instances of molecules that conform to all the constraints. For this reason, for example, Jin et al. (2019a) report over 60% performance loss when moving beyond the single-constraint setting.
In this paper, we propose a novel approach to multi-property molecular optimization. Our strategy is inspired by fragment-based drug discovery (Murray & Rees, 2009) often followed by medicinal chemists. The idea is to start with substructures (e.g., functional groups or larger pieces) that drive specific properties of interest, and then combine these building blocks into a target molecule. To automate this process, our model has to learn two complementary tasks illustrated in Figure 1: (1) identifying the building blocks that we call rationales, and (2) assembling multiple rationales together into a fully formed target molecule. In contrast to competing methods, our generative model does not build molecules from scratch, but instead assembles them from automatically extracted rationales already implicated for specific properties (see Figure 1).
We implement this idea using a generative model of molecules where the rationale choices play the role of latent variables. Specifically, a molecular graph G is generated from underlying rationales S according to:

P(G) = Σ_{S ∈ V_S} P(S) P(G | S)  (1)
As ground truth rationales (e.g., functional groups or subgraphs) are not provided, the model has to extract candidate rationales from molecules with the help of a property predictor. We formulate this task as a discrete optimization problem efficiently solved by Monte Carlo tree search. Our rationale-conditioned graph generator, P(G | S), is initially trained on a large collection of real molecules so that it is capable of expanding any subgraph into a full molecule. The mixture model is then finetuned using reinforcement learning to ensure that the generated molecules preserve all the properties of interest. This training paradigm enables us to realize molecules that satisfy multiple constraints without observing any such instances in the training set.
The proposed model is evaluated on molecule design tasks under different combinations of property constraints. Our baselines include state-of-the-art molecule generation methods (Olivecrona et al., 2017; You et al., 2018a). Across all tasks, our model achieves state-of-the-art results in terms of accuracy, novelty and diversity of generated compounds. In particular, we outperform the best baseline by a 38% absolute improvement on the task with three property constraints. We further provide ablation studies to validate the benefit of our architecture in the low-resource scenario. Finally, we show that the identified rationales are chemically meaningful in a toxicity prediction task (Sushko et al., 2012).
2 Related Work
Reinforcement Learning One of the prevailing paradigms for drug design is reinforcement learning (RL) (You et al., 2018a; Olivecrona et al., 2017; Popova et al., 2018), which seeks to maximize the expected reward, defined as the sum of predicted property scores from the property predictors. These approaches learn a distribution P(G) (a neural network) for generating molecules. Ideally, the model should achieve a high success rate in generating molecules that meet all the constraints, while maintaining the diversity of generated compounds. The main challenge of RL lies in the sparsity of rewards, especially when there are multiple competing constraints. For illustration, we tested a state-of-the-art reinforcement learning method (Olivecrona et al., 2017) under three property constraints: biological activity to targets DRD2, GSK3 and JNK3 (Li et al., 2018). As shown in Figure 2, the success rate and diversity are initially high when given only one of the constraints, but they decrease dramatically when all the property constraints are added. The reason for this failure is that the property predictor (i.e., the reward function) remains a black box, and the model has limited understanding of how and why certain molecules are desirable.
Our framework offsets this complexity by understanding the property landscape through rationales. At a high level, rationales are analogous to options (Sutton et al., 1999; Stolle & Precup, 2002): macro-actions that lead the agent to its goal faster. The rationales are automatically discovered from molecules with labeled properties.
Molecule Generation Previous work has adopted various approaches for generating molecules under specific property constraints. Roughly speaking, existing methods can be divided along two axes: representation and optimization. On the representation side, they either operate on SMILES strings (Gómez-Bombarelli et al., 2018; Segler et al., 2017; Kang & Cho, 2018) or directly on molecular graphs (Simonovsky & Komodakis, 2018; Jin et al., 2018; Samanta et al., 2018; Liu et al., 2018; De Cao & Kipf, 2018; Ma et al., 2018; Seff et al., 2019). On the optimization side, the task has been formulated as reinforcement learning (Guimaraes et al., 2017; Olivecrona et al., 2017; Popova et al., 2018; You et al., 2018a; Zhou et al., 2018), continuous optimization in the latent space learned by variational autoencoders (Gómez-Bombarelli et al., 2018; Kusner et al., 2017; Dai et al., 2018; Jin et al., 2018; Kajino, 2018; Liu et al., 2018), or graph-to-graph translation (Jin et al., 2019b). In contrast to existing approaches, our model focuses on the multi-objective setting of the problem and offers a different formulation for molecule generation based on rationales.

Interpretability Our rationale-based generative model seeks to provide transparency (Doshi-Velez & Kim, 2017) for molecular design. The choice of rationales is visible to users and can be easily controlled by human experts. Prior work on interpretability primarily focuses on finding rationales (i.e., explanations) of model predictions in image and text classification (Lei et al., 2016; Ribeiro et al., 2016; Sundararajan et al., 2017) and molecule property prediction (McCloskey et al., 2019; Ying et al., 2019; Lee et al., 2019). In contrast, our model uses rationales as building blocks for molecule generation.
3 Composing Molecules using Rationales
Molecules are represented as graphs with atoms as nodes and bonds as edges. The goal of drug discovery is to find novel compounds satisfying given property constraints (e.g., drug-likeness, binding affinity, etc.). Without loss of generality, we assume the property constraints to be of the following form:
r_1(G) ≥ δ_1 ∧ r_2(G) ≥ δ_2 ∧ ⋯ ∧ r_M(G) ≥ δ_M  (2)

For each property i, the property score r_i(G) of molecule G must be higher than the threshold δ_i. A molecule G is called positive to property i if r_i(G) ≥ δ_i, and negative otherwise.
Following previous work (Olivecrona et al., 2017; Popova et al., 2018), r_i(·) is the output of a property prediction model (e.g., a random forest) which effectively approximates empirical measurements. The prediction model is trained over a set of molecules with labeled properties gathered from real experimental data. The property predictor is then fixed throughout the rest of the training process.

Overview Our model generates molecules by first sampling a rationale S from a vocabulary V_S, and then completing it into a molecule G. The generative model is defined as
P(G) = Σ_{S ∈ V_S} P(S) P(G | S)  (3)
As shown in Figure 3, our model consists of three modules:


Rationale Extraction: Constructs a rationale vocabulary V_S^i for each individual property and combines these rationales for multiple properties (see §3.1).

Graph Completion P(G | S): Completes a rationale S into a full molecule G (see §3.2).

Rationale Distribution P(S): The rationale distribution is learned based on the properties of complete molecules generated from P(G | S). A rationale S is sampled more frequently if it is more likely to be expanded into a positive molecule (see §3.3).
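The two-stage generative process above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `rationale_dist` stands in for the learned P(S), and `complete` stands in for the graph completion model P(G|S), both hypothetical toy functions.

```python
import random

# Toy stand-in for the learned rationale distribution P(S).
rationale_dist = {"rationale_A": 0.7, "rationale_B": 0.3}

def complete(rationale, rng):
    # Stand-in for the graph decoder P(G|S): expands a rationale into a molecule.
    return rationale + "-expanded-%d" % rng.randrange(10)

def sample_molecule(rng):
    # Step 1: sample a rationale S ~ P(S).
    rationales, probs = zip(*rationale_dist.items())
    s = rng.choices(rationales, weights=probs, k=1)[0]
    # Step 2: complete S into a full molecule G ~ P(G|S).
    return complete(s, rng)

rng = random.Random(0)
mols = [sample_molecule(rng) for _ in range(5)]
```

Marginalizing over S as in Eq. (3) corresponds exactly to this two-step ancestral sampling.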
3.1 Rationale Extraction from Predictive Models
Single-property Rationale We define a rationale for a single property i as a subgraph S^{(i)} of some molecule G which causes G to be positive (see Figure 1). To be specific, let V_S^i be the vocabulary of such rationales for property i. Each candidate subgraph should satisfy the following two criteria to be considered a rationale:


The size of S^{(i)} should be small (less than N atoms).

Its predicted property score should be high: r_i(S^{(i)}) ≥ δ_i.
For a single property i, we propose to extract its rationales from the set of positive molecules used to train the property predictor. For each such molecule G, we find a rationale subgraph S with high predicted property score and small size (|S| ≤ N):

S^{(i)} = argmax_{S ⊆ G, |S| ≤ N} r_i(S)  (4)
Solving the above problem is challenging because the rationale is discrete and the potential number of subgraphs grows exponentially with the size of G. To limit the search space, we add an additional constraint that S has to be a connected subgraph (this assumption is valid in many cases; for instance, rationales for toxicity, i.e., toxicophores, are connected subgraphs in most cases (Sushko et al., 2012)). In this case, we can find a rationale by iteratively removing peripheral bonds while maintaining the property score. Therefore, the key is learning how to prune the molecule.
This search problem can be efficiently solved by Monte Carlo Tree Search (MCTS) (Silver et al., 2017). The root of the search tree is G, and each state s in the search tree is a subgraph derived from a sequence of bond deletions. To ensure that each subgraph is chemically valid and stays connected, we only allow deletion of one peripheral non-aromatic bond or one peripheral ring from each state. As shown in Figure 4, a bond or a ring b is called peripheral if the graph stays connected after deleting b.
During the search process, each state s in the search tree contains edges (s, a) for all legal deletions a. Following Silver et al. (2017), each edge (s, a) stores the following statistics:


N(s, a) is the visit count of deletion a, which is used for the exploration-exploitation tradeoff in the search process.

W(s, a) is the total action value, which indicates how likely the deletion a will lead to a good rationale.

R(s, a) is the predicted property score of the new subgraph s' derived from deleting a from s.
Guided by these statistics, MCTS searches for rationales in multiple iterations. Each iteration consists of two phases:


Forward pass: Select a path from the root to a leaf state s_L with fewer than N atoms and evaluate its property score r_i(s_L). At each state s_t, a deletion a_t is selected according to the statistics in the search tree:
a_t = argmax_a [W(s_t, a)/N(s_t, a) + U(s_t, a)]  (5)
U(s_t, a) = c_puct · R(s_t, a) · √(Σ_b N(s_t, b)) / (1 + N(s_t, a))  (6)

where c_puct determines the level of exploration. This search strategy is a variant of the PUCT algorithm (Rosin, 2011). It initially prefers to explore deletions with high R(s, a) and low visit count, but asymptotically prefers deletions that are likely to lead to good rationales.

Backward pass: The edge statistics are updated for each state s_t on the selected path. Specifically, N(s_t, a_t) ← N(s_t, a_t) + 1 and W(s_t, a_t) ← W(s_t, a_t) + r_i(s_L).
In the end, we collect all the leaf states s_L with r_i(s_L) ≥ δ_i and add them to the rationale vocabulary V_S^i.
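The search loop above can be sketched on a toy problem. This is a hedged illustration, not the paper's code: the "molecule" is a path graph whose peripheral deletions remove an endpoint node, and `score` is a hypothetical stand-in for the property predictor r_i; the PUCT constants and forward/backward passes follow the description above.

```python
import math
from collections import defaultdict

C_PUCT, MIN_SIZE, DELTA = 2.0, 3, 0.5  # assumed toy hyperparameters

def score(state):                       # stand-in for r_i(subgraph)
    return 1.0 if 2 in state and 3 in state else 0.1

def deletions(state):                   # peripheral, connectivity-preserving moves
    return [min(state), max(state)] if len(state) > MIN_SIZE else []

N = defaultdict(int)                    # visit counts N(s, a)
W = defaultdict(float)                  # total action values W(s, a)

def search(root, iters=50):
    rationales = set()
    for _ in range(iters):
        state, path = root, []
        while deletions(state):         # forward pass: PUCT selection
            acts = deletions(state)
            total = sum(N[(state, a)] for a in acts)
            def puct(a):
                child = state - {a}
                q = W[(state, a)] / N[(state, a)] if N[(state, a)] else 0.0
                u = C_PUCT * score(child) * math.sqrt(total) / (1 + N[(state, a)])
                return q + u
            a = max(acts, key=puct)
            path.append((state, a))
            state = state - {a}
        r = score(state)                # evaluate leaf property score
        if r >= DELTA:
            rationales.add(state)       # keep leaves passing the threshold
        for s, a in path:               # backward pass: update statistics
            N[(s, a)] += 1
            W[(s, a)] += r
    return rationales

found = search(frozenset(range(6)))
```

The exploration bonus steers early iterations toward high-scoring children, while the averaged action value dominates as visit counts grow.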
Multi-property Rationale For a set of M properties, we can similarly define its rationale by imposing all M property constraints at the same time, namely r_i(S) ≥ δ_i for each property i.
In principle, we can apply MCTS to extract rationales from molecules that satisfy all the property constraints. However, in many cases there are no such molecules available. As a result, we propose to construct multi-property rationales from the single-property rationales extracted for each property i. Specifically, each multi-property rationale S is a (possibly disconnected) graph with connected components S^{(1)}, …, S^{(M)}:
S = S^{(1)} ⊕ S^{(2)} ⊕ ⋯ ⊕ S^{(M)},  S^{(i)} ∈ V_S^i  (7)
where ⊕ means concatenating two graphs into one. For notational convenience, we denote both single- and multi-property rationales as S. In the rest of the paper, S is a rationale graph with one or multiple connected components.
3.2 Graph Completion
This module is a variational autoencoder which completes a full molecule G given a rationale S. Since each rationale S can be realized into many different molecules, we introduce a latent variable z to generate diverse outputs:

P(G | S) = ∫_z P(G | S, z) P(z) dz  (8)
where P(z) = N(0, I) is the prior distribution. Different from standard graph generation, our graph decoder must generate graphs that contain S as a subgraph. Our VAE architecture is adapted from existing atom-by-atom generative models (You et al., 2018b; Liu et al., 2018) to incorporate the subgraph constraint. For completeness, we present our architecture here:
Encoder Our encoder is a message passing network (MPN) which learns the approximate posterior Q(z | G, S) for variational inference. Let e(a_v) be the embedding of atom v with atom type a_v, and e(b_uv) be the embedding of bond (u, v) with bond type b_uv. The MPN computes atom representations {c_v | v ∈ G}:

{c_v} = MPN(G, {e(a_v)}, {e(b_uv)})  (9)

For simplicity, we denote the MPN encoding process as MPN(·), which is detailed in the appendix. The atom vectors are aggregated to represent G as a single vector c_G = Σ_v c_v. Finally, we sample the latent vector z_G from Q(z | G, S) with mean μ(c_G) and log variance log σ²(c_G):

z_G = μ(c_G) + exp(log σ²(c_G) / 2) · ε,  ε ∼ N(0, I)  (10)
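The reparameterized sampling of the latent vector can be sketched in a few lines. This is a minimal illustration of the sampling step only; the toy `mu` and `log_var` values stand in for the MPN encoder outputs, which are assumed rather than taken from the paper.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    # z = mu + exp(log_var / 2) * eps, with eps ~ N(0, I)
    # (the standard VAE reparameterization trick)
    return [m + math.exp(lv / 2) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

rng = random.Random(0)
# Toy 2-dimensional posterior with variance 0.25 in each dimension.
z = sample_latent([0.0, 1.0], [math.log(0.25)] * 2, rng)
```

Sampling through a deterministic function of (μ, log σ²) keeps the draw differentiable with respect to the encoder parameters, which is what makes VAE training tractable.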
Decoder The decoder generates molecule G according to its breadth-first order. In each step, the model generates a new atom and all its connecting edges. During generation, we maintain a queue Q that contains the frontier nodes in the graph which still have neighbors to be generated. Let G_t be the partial graph generated up to step t. To ensure G contains S as a subgraph, we set the initial state G_0 = S and put all the peripheral atoms of S into the queue (only peripheral atoms are needed due to the rationale extraction algorithm).
In generation step t, the decoder first runs an MPN over the current graph G_t to compute atom representations {c_v^t}:

{c_v^t} = MPN(G_t, {e(a_v)}, {e(b_uv)})  (11)

The current graph G_t is represented as the sum of its atom vectors: h_{G_t} = Σ_{v ∈ G_t} c_v^t. Suppose the first atom in Q is v_t. The decoder decides how to expand G_t in three steps:


Predict whether there will be a new atom attached to v_t:

p_t = σ(MLP(z_G, c_{v_t}^t, h_{G_t}))  (12)

where MLP(·) is a ReLU network whose input is a concatenation of multiple vectors.

If p_t < 0.5, discard v_t and move on to the next node in Q. Stop generation if Q is empty. Otherwise, create a new atom u_t and predict its atom type:

P(a_{u_t}) = softmax(MLP(z_G, c_{v_t}^t, h_{G_t}))  (13)
Predict the bond type between u_t and the other frontier nodes q_k in Q. Since atoms are generated in breadth-first order, there are no bonds between u_t and atoms outside of Q.
To fully capture edge dependencies, we predict the bonds between u_t and atoms in Q sequentially and update the representation of u_t when new bonds are added to G_t. In the k-th step, we predict the bond type of (u_t, q_k) as follows:

P(b_{u_t, q_k}) = softmax(MLP(z_G, c_{u_t}^{t,k}, c_{q_k}^t))  (14)

where c_{u_t}^{t,k} is the new representation of u_t after the bonds (u_t, q_1), …, (u_t, q_{k-1}) have been added to G_t.
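The decoder's control flow can be sketched as a queue-driven loop. This is a hedged skeleton, not the paper's model: the neural decisions (expand?, atom type, bond type) are replaced by a hypothetical `policy` dictionary so the breadth-first expansion and the subgraph constraint (the rationale atoms are never modified, only extended) are runnable.

```python
from collections import deque

def complete_graph(rationale_atoms, peripheral, policy):
    atoms = list(rationale_atoms)          # G_0 = S: rationale atoms kept intact
    bonds = []
    queue = deque(peripheral)              # frontier: peripheral atoms of S
    next_id = len(atoms)
    while queue:
        v = queue[0]
        # Eq. (12) stand-in: should a new atom be attached to v?
        if not policy.get(("expand", v), False):
            queue.popleft()                # retire v from the frontier
            continue
        u = next_id                        # create new atom u_t
        next_id += 1
        atoms.append(policy.get(("atom_type", v), "C"))       # Eq. (13) stand-in
        bonds.append((v, u, policy.get(("bond_type", v, u), 1)))  # Eq. (14) stand-in
        queue.append(u)                    # u joins the frontier
        policy[("expand", v)] = False      # toy policy: each atom expands once
    return atoms, bonds

atoms, bonds = complete_graph(
    ["C", "N"], peripheral=[0, 1],
    policy={("expand", 0): True, ("expand", 1): True})
```

Because generation starts from G_0 = S and only appends atoms and bonds, every output graph contains the rationale as a subgraph by construction.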
3.3 Training Procedure
Our training objective is to maximize the expected reward of generated molecules G ∼ P(G), where the reward R(G) is an indicator of r_i(G) ≥ δ_i for all properties i:

E_{S ∼ P(S)} E_{G ∼ P(G|S)} [R(G)],  R(G) = 1[r_i(G) ≥ δ_i, ∀i]  (15)
We incorporate an entropy regularization term over P(S) to encourage the model to explore different types of rationales. The rationale distribution P(S) is a categorical distribution over the rationale vocabulary V_S. Let Q(S) = E_{G ∼ P(G|S)}[R(G)]. It is easy to show that the optimal P(S) has a closed-form solution, a softmax over Q(S) with temperature α set by the regularization weight:

P(S) = exp(Q(S)/α) / Σ_{S' ∈ V_S} exp(Q(S')/α)  (16)
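The closed-form rationale distribution is a temperature-scaled softmax and can be computed directly. The Q values below are toy numbers standing in for Monte Carlo estimates of E_{G ∼ P(G|S)}[R(G)]; the temperature value is assumed for illustration.

```python
import math

def rationale_distribution(q_values, alpha=0.1):
    # Softmax of Q(S)/alpha over the rationale vocabulary, as in Eq. (16).
    exps = {s: math.exp(q / alpha) for s, q in q_values.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}

# Toy estimated success rates Q(S) for three rationales.
p = rationale_distribution({"S1": 0.9, "S2": 0.5, "S3": 0.1})
```

A small α concentrates mass on the rationales most likely to yield positive molecules, while a large α (stronger entropy regularization) keeps the distribution closer to uniform.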
The remaining question is how to train the graph generator P(G | S). The generator seeks to produce molecules that are both realistic and positive. However, Eq.(15) itself does not take into account whether generated molecules are realistic or not. To encourage the model to generate realistic compounds, we train the graph generator in two phases:


Pretraining using real molecules.

Finetuning using policy gradient with reward from property predictors.
The overall training algorithm is shown in Algorithm 1.
3.3.1 Pretraining
In addition to satisfying all the property constraints, the output of the model should constitute a realistic molecule. For this purpose, we pretrain the graph generator on a large set of molecules from ChEMBL (Gaulton et al., 2017). Each training example is a pair (G', G), where G' is a (random) subgraph of a molecule G. The task is to take the subgraph G' as input and complete it into the full molecule G. Given a molecule G, we consider two types of random subgraphs:


G' is a connected subgraph of G with up to N atoms.

G' is a disconnected subgraph of G with multiple connected components. This is to simulate the case of generating molecules from multi-property rationales.
Finally, we train the graph generator to maximize the likelihood of the pretraining dataset.
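One simple way to draw a random connected subgraph for pre-training is breadth-first traversal from a random seed atom up to a size budget. This is a hedged sketch under that assumption, not the paper's sampling procedure; the adjacency-list "molecule" is a toy stand-in for a molecular graph.

```python
import random
from collections import deque

def random_connected_subgraph(adj, max_atoms, rng):
    # BFS from a random seed atom, stopping at the size budget; the visited
    # set is connected by construction.
    seed = rng.choice(list(adj))
    chosen, queue = {seed}, deque([seed])
    while queue and len(chosen) < max_atoms:
        v = queue.popleft()
        for u in adj[v]:
            if u not in chosen and len(chosen) < max_atoms:
                chosen.add(u)
                queue.append(u)
    return chosen

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # a 5-atom chain
rng = random.Random(0)
sub = random_connected_subgraph(adj, max_atoms=3, rng=rng)
```

Disconnected training subgraphs can then be formed by taking the union of two such samples from different regions of the molecule.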
3.3.2 Finetuning
After pretraining, we further finetune the graph generator on property-specific rationales in order to maximize Eq.(15). The model is finetuned through multiple iterations using policy gradient (Sutton et al., 2000). Let P^{(t)}(G | S) be the model trained up to iteration t. In each iteration, we perform the following two steps:


Initialize the finetuning set D_t = ∅. For each rationale S, use the current model to sample molecules G ∼ P^{(t)}(G | S). Add (S, G) to D_t if G is predicted to be positive.

Update the model on the finetuning set D_t using the policy gradient method.
After finetuning, we compute the rationale distribution P(S) based on Eq.(16).
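The data-collection half of a fine-tuning iteration can be sketched as follows. This is an illustrative skeleton only: `sample_molecule` and `predict_positive` are hypothetical stand-ins for P(G|S) and the fixed property predictor, and the policy-gradient parameter update itself is elided.

```python
import random

def finetune_iteration(rationales, sample_molecule, predict_positive,
                       samples_per_rationale, rng):
    finetune_set = []
    for s in rationales:
        for _ in range(samples_per_rationale):
            g = sample_molecule(s, rng)      # G ~ P^(t)(G|S)
            if predict_positive(g):          # keep predicted positives only
                finetune_set.append((s, g))
    # A policy-gradient update on `finetune_set` would follow here.
    return finetune_set

rng = random.Random(0)
# Toy run: a "molecule" is (rationale, score); positives have score > 0.5.
data = finetune_iteration(
    ["S1", "S2"],
    sample_molecule=lambda s, r: (s, r.random()),
    predict_positive=lambda g: g[1] > 0.5,
    samples_per_rationale=4, rng=rng)
```

Filtering to predicted positives before the update is what keeps the gradient signal dense even when the overall success rate is low.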
4 Experiments
Table 1: Results on the single-property design tasks.

Method     | DRD2                          | GSK3                          | JNK3
           | Success  Novelty  Diversity   | Success  Novelty  Diversity   | Success  Novelty  Diversity
GVAE + RL  | 55.4%    56.2%    0.858       | 33.2%    76.4%    0.874       | 57.7%    62.6%    0.832
GCPN       | 40.1%    12.8%    0.880       | 42.4%    11.6%    0.904       | 32.3%    4.4%     0.884
REINVENT   | 98.1%    28.8%    0.798       | 99.3%    61.0%    0.733       | 98.5%    31.6%    0.729
Ours       | 100%     84.9%    0.866       | 100%     76.8%    0.850       | 100%     81.4%    0.847
Table 2: Results on the dual-property design tasks.

Method     | DRD2 + GSK3                   | DRD2 + JNK3                   | GSK3 + JNK3
           | Success  Novelty  Diversity   | Success  Novelty  Diversity   | Success  Novelty  Diversity
GVAE + RL  | 96.1%    0%       0.447       | 98.5%    0%       0.0         | 40.7%    80.3%    0.783
GCPN       | 0.1%     40%      0.793       | 0.1%     25%      0.785       | 3.5%     8.0%     0.874
REINVENT   | 98.7%    100%     0.584       | 91.6%    100%     0.533       | 97.4%    39.7%    0.595
Ours       | 100%     100%     0.866       | 100%     100%     0.837       | 100%     94.4%    0.844
Table 3: Results on the three-property design task (DRD2 + GSK3 + JNK3).

Method     | Success  Novelty  Diversity
GVAE + RL  | 0%       0%       0.0
GCPN       | 0%       0%       0.0
REINVENT   | 48.3%    100%     0.166
Ours       | 86.2%    100%     0.726
We evaluate our method on molecule design tasks under various combinations of property constraints. In our experiments, we consider the following three properties:


DRD2: Dopamine type 2 receptor activity. The DRD2 activity prediction model is trained on the dataset provided by Olivecrona et al. (2017), which contains 7219 positive and 100K negative compounds.

GSK3: Inhibition against glycogen synthase kinase-3 beta. The GSK3 prediction model is trained on the dataset from Li et al. (2018), which contains 2665 positive and 50K negative compounds.

JNK3: Inhibition against c-Jun N-terminal kinase-3. The JNK3 prediction model is also trained on the dataset from Li et al. (2018), with 740 positives and 50K negatives.
Following Li et al. (2018), the property prediction model is a random forest using Morgan fingerprint features (Rogers & Hahn, 2010). We set the positive threshold δ_i = 0.5.
Multi-property Constraints We also consider various combinations of property constraints:


GSK3 + JNK3: Jointly inhibiting JNK3 and GSK3 may provide potential benefit for the treatment of Alzheimer's disease (Li et al., 2018). There are 316 dual inhibitors already available in the dataset.

DRD2 + JNK3: JNK3 inhibitors active to DRD2.

DRD2 + GSK3: GSK3 inhibitors active to DRD2.

DRD2 + GSK3 + JNK3: Combining all constraints.
Except for the first case, all other combinations have fewer than three positive molecules in the datasets, which imposes a significant challenge for molecule design methods.
Evaluation Metrics Our evaluation measures various aspects of molecule design. For each method, we generate a large set of molecules and compute the following metrics:


Success: The fraction of sampled molecules predicted to be positive (i.e., satisfying all property constraints). A good model should have a high success rate. Following previous work (Olivecrona et al., 2017; You et al., 2018a), we only consider the success rate under property prediction, as it is hard to obtain real property measurements.

Diversity: It is also important for a model to generate a diverse range of positive molecules. To this end, we measure the diversity of generated positive compounds by computing their average pairwise molecular distance, defined as the Tanimoto distance over the Morgan fingerprints of two molecules.

Novelty: Crucially, a good model should discover novel positive compounds. In this regard, for each generated positive compound, we find its nearest neighbor among the positive molecules in the training set. We define novelty as the fraction of molecules whose nearest-neighbor similarity is lower than 0.4 (Olivecrona et al., 2017).
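The diversity and novelty metrics can be sketched with Tanimoto similarity over fingerprint bit sets. The small integer sets below are toy fingerprints standing in for Morgan fingerprints (which in practice would come from a cheminformatics toolkit such as RDKit); the 0.4 threshold follows the text above.

```python
def tanimoto(a, b):
    # Tanimoto similarity of two fingerprint bit sets: |A ∩ B| / |A ∪ B|.
    return len(a & b) / len(a | b) if a | b else 1.0

def diversity(fps):
    # Average pairwise Tanimoto *distance* among generated positives.
    pairs = [(i, j) for i in range(len(fps)) for j in range(i + 1, len(fps))]
    return sum(1 - tanimoto(fps[i], fps[j]) for i, j in pairs) / len(pairs)

def novelty(gen_fps, train_fps, threshold=0.4):
    # Fraction of generated positives whose nearest training positive falls
    # below the similarity threshold.
    novel = [g for g in gen_fps
             if max(tanimoto(g, t) for t in train_fps) < threshold]
    return len(novel) / len(gen_fps)

gen = [{1, 2, 3}, {3, 4, 5}, {7, 8, 9}]   # toy generated fingerprints
train = [{1, 2, 3, 4}]                    # toy training positives
div, nov = diversity(gen), novelty(gen, train)
```

Success, by contrast, only needs the fixed property predictors and a count of how many samples pass all thresholds.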
Baselines We compare our method against the following state-of-the-art generation methods for molecule design:


GVAE + RL is a graph variational autoencoder which generates molecules atom by atom. The graph VAE architecture is the same as our model, but it generates molecules from scratch without using rationales (i.e., G_0 = ∅). The model is pretrained on the same ChEMBL dataset and finetuned for each property using policy gradient. This is an ablation of our method that shows the importance of using rationales.

GCPN (You et al., 2018a) is a graph convolutional policy network which constructs molecules atom by atom and is trained with policy gradient against the property-based reward.

REINVENT (Olivecrona et al., 2017) is a SMILES-based method which finetunes a pretrained recurrent language model with reinforcement learning so that generated strings satisfy the property constraints.
Rationales Details of the rationales used in our model:


Single property: For single properties, the rationale size is required to be less than 20 atoms. For each positive molecule, we run 20 iterations of MCTS. In total, we extracted 8611, 6299 and 417 different rationales for DRD2, GSK3 and JNK3, respectively.

Two properties: We constructed dual-property rationales by concatenating pairs of single-property rationales, S = S^{(i)} ⊕ S^{(j)}. To limit the vocabulary size, we cluster the rationales using the algorithm from Butina (1999) and keep only the cluster centroids. In total, we extracted 60K, 9.9K and 9.2K rationales for DRD2+GSK3, DRD2+JNK3 and GSK3+JNK3, respectively.

Three properties: The rationale for three properties is constructed by concatenating single-property rationales for all three properties. In total, we extracted 8163 different rationales for this task.
Model Setup Our model (and all baselines) are pretrained on the same ChEMBL dataset from Olivecrona et al. (2017). We finetune our model for multiple iterations, where each rationale is expanded multiple times per iteration.
4.1 Results
The results on all design tasks are reported in Tables 1-3. On the single-property design tasks, our model and REINVENT demonstrate nearly perfect success rates since there is only one constraint. We also outperform all the baselines in terms of the novelty metric. Though our diversity score is slightly lower than GCPN's, our success rate and novelty score are much higher.
Table 2 summarizes the results on the dual-property design tasks. On the GSK3+JNK3 task, we achieve a 100% success rate while maintaining 94.4% novelty and a 0.844 diversity score. Meanwhile, GCPN fails to discover positive compounds on all the tasks due to reward sparsity. GVAE + RL fails to discover novel positives on the DRD2+GSK3 and DRD2+JNK3 tasks because there are fewer than three positive compounds available for training; it can therefore only learn to replicate existing positives. Our method succeeds in all cases regardless of training data scarcity.
Table 3 shows the results on the three-property design task, which is the most challenging. The difference between our model and the baselines becomes significantly larger. In fact, GVAE + RL and GCPN completely fail in this task due to reward sparsity. Our model outperforms REINVENT by a wide margin (success rate: 86.2% versus 48.3%; diversity: 0.726 versus 0.166).
When there are multiple properties, our method significantly outperforms GVAE + RL, which has the same generative architecture but does not utilize rationales and generates molecules from scratch. This confirms the importance of rationales for multi-property molecular design.
Visualization We further provide visualizations to help understand our model. In Figure 5, we plot a t-SNE (Maaten & Hinton, 2008) embedding of the extracted rationales for GSK3 and JNK3. For both properties, the rationales mostly cover the chemical space populated by existing positive molecules. The generated GSK3+JNK3 dual inhibitors mostly appear where the GSK3 and JNK3 rationales overlap. In Figure 6, we show examples of molecules generated from dual-property rationales. Each rationale has two connected components: one from GSK3 and the other from JNK3.
4.2 Rationale Sanity Check
Table 4: Rationale quality evaluation on the toxicity dataset.

Method               | Partial Match  | Exact Match
Integrated Gradient  | 0.857          | 39.4%
MCTS Rationale       | 0.861          | 46.0%
While our rationales are mainly extracted for generation, it is also important for them to be chemically relevant. In other words, the extracted rationales should accurately explain the property of interest. As there are no ground truth rationales available for DRD2, JNK3 and GSK3, we turn to an auxiliary toxicity dataset for evaluating rationale quality. Specifically, our dataset contains 100K molecules randomly selected from ChEMBL. Each molecule is labeled as toxic if it contains structural alerts (Sushko et al., 2012), chemical substructures that are correlated with human or environmental hazards (see Figure 6). The structural alerts used in our paper are from surechembl.org/knowledgebase/169485nonmedchemfriendlysmarts. We trained a neural network to predict toxicity on this dataset. Under this setup, the structural alerts are ground truth rationales, and we evaluate how often the extracted rationales match them.
We compare our MCTS-based rationale extraction with integrated gradients (Sundararajan et al., 2017), which have been applied to explain property prediction models (McCloskey et al., 2019). We report two metrics: partial match AUC (the attribution AUC metric used in McCloskey et al. (2019)) and exact match accuracy, which measures how often a rationale graph exactly matches the true rationale in the molecule. As shown in Table 4, our method significantly outperforms the baseline in terms of exact matching. The extracted rationales also have decent overlap with the true rationales, with 0.86 partial match AUC on average. Therefore, our model is capable of finding rationales that are chemically meaningful.
5 Conclusion
In this paper, we developed a rationale-based generative model for molecular design. Our model generates molecules in two phases: 1) identifying rationales whose presence indicates a strong positive signal for each property; and 2) expanding rationale graphs into molecules using a graph generative model finetuned towards the desired combination of properties. Our model demonstrates strong improvement over prior reinforcement learning methods across various tasks.
References
 Butina (1999) Butina, D. Unsupervised data base clustering based on daylight’s fingerprint and tanimoto similarity: A fast and automated way to cluster small and large data sets. Journal of Chemical Information and Computer Sciences, 39(4):747–750, 1999.
 Dai et al. (2018) Dai, H., Tian, Y., Dai, B., Skiena, S., and Song, L. Syntax-directed variational autoencoder for structured data. arXiv preprint arXiv:1802.08786, 2018.
 De Cao & Kipf (2018) De Cao, N. and Kipf, T. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, 2018.
 Doshi-Velez & Kim (2017) Doshi-Velez, F. and Kim, B. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.
 Gaulton et al. (2017) Gaulton, A., Hersey, A., Nowotka, M., Bento, A. P., Chambers, J., Mendez, D., Mutowo, P., Atkinson, F., Bellis, L. J., CibriánUhalte, E., et al. The chembl database in 2017. Nucleic acids research, 45(D1):D945–D954, 2017.
 Gómez-Bombarelli et al. (2018) Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 2018. doi: 10.1021/acscentsci.7b00572.
 Goodfellow et al. (2014) Goodfellow, I., PougetAbadie, J., Mirza, M., Xu, B., WardeFarley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
 Guimaraes et al. (2017) Guimaraes, G. L., Sanchez-Lengeling, B., Farias, P. L. C., and Aspuru-Guzik, A. Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. arXiv preprint arXiv:1705.10843, 2017.
 Jin et al. (2018) Jin, W., Barzilay, R., and Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. International Conference on Machine Learning, 2018.
 Jin et al. (2019a) Jin, W., Barzilay, R., and Jaakkola, T. Hierarchical graph-to-graph translation for molecules. arXiv preprint arXiv:1907.11223, 2019a.
 Jin et al. (2019b) Jin, W., Yang, K., Barzilay, R., and Jaakkola, T. Learning multimodal graphtograph translation for molecular optimization. International Conference on Learning Representations, 2019b.
 Kajino (2018) Kajino, H. Molecular hypergraph grammar with its application to molecular optimization. arXiv preprint arXiv:1809.02745, 2018.
 Kang & Cho (2018) Kang, S. and Cho, K. Conditional molecular design with deep generative models. Journal of chemical information and modeling, 59(1):43–52, 2018.
 Kusner et al. (2017) Kusner, M. J., Paige, B., and HernándezLobato, J. M. Grammar variational autoencoder. arXiv preprint arXiv:1703.01925, 2017.
 Lee et al. (2019) Lee, G.-H., Jin, W., Alvarez-Melis, D., and Jaakkola, T. S. Functional transparency for structured data: a game-theoretic approach. arXiv preprint arXiv:1902.09737, 2019.
 Lei et al. (2016) Lei, T., Barzilay, R., and Jaakkola, T. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155, 2016.
 Li et al. (2018) Li, Y., Zhang, L., and Liu, Z. Multiobjective de novo drug design with conditional graph generative model. arXiv preprint arXiv:1801.07299, 2018.
 Liu et al. (2018) Liu, Q., Allamanis, M., Brockschmidt, M., and Gaunt, A. L. Constrained graph variational autoencoders for molecule design. Neural Information Processing Systems, 2018.
 Ma et al. (2018) Ma, T., Chen, J., and Xiao, C. Constrained generation of semantically valid graphs via regularizing variational autoencoders. In Advances in Neural Information Processing Systems, pp. 7113–7124, 2018.
 Maaten & Hinton (2008) Maaten, L. v. d. and Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
 McCloskey et al. (2019) McCloskey, K., Taly, A., Monti, F., Brenner, M. P., and Colwell, L. J. Using attribution to decode binding mechanism in neural network models for chemistry. Proceedings of the National Academy of Sciences, 116(24):11624–11629, 2019.
 Murray & Rees (2009) Murray, C. W. and Rees, D. C. The rise of fragmentbased drug discovery. Nature chemistry, 1(3):187, 2009.
 Olivecrona et al. (2017) Olivecrona, M., Blaschke, T., Engkvist, O., and Chen, H. Molecular de-novo design through deep reinforcement learning. Journal of Cheminformatics, 9(1):48, 2017.
 Popova et al. (2018) Popova, M., Isayev, O., and Tropsha, A. Deep reinforcement learning for de novo drug design. Science advances, 4(7):eaap7885, 2018.

Ribeiro et al. (2016)
Ribeiro, M. T., Singh, S., and Guestrin, C.
” why should i trust you?” explaining the predictions of any classifier.
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144, 2016.  Rogers & Hahn (2010) Rogers, D. and Hahn, M. Extendedconnectivity fingerprints. Journal of chemical information and modeling, 50(5):742–754, 2010.

 Rosin (2011) Rosin, C. D. Multi-armed bandits with episode context. Annals of Mathematics and Artificial Intelligence, 61(3):203–230, 2011.
 Samanta et al. (2018) Samanta, B., De, A., Jana, G., Chattaraj, P. K., Ganguly, N., and Gomez-Rodriguez, M. NeVAE: A deep generative model for molecular graphs. arXiv preprint arXiv:1802.05283, 2018.
 Seff et al. (2019) Seff, A., Zhou, W., Damani, F., Doyle, A., and Adams, R. P. Discrete object generation with reversible inductive construction. In Advances in Neural Information Processing Systems, pp. 10353–10363, 2019.
 Segler et al. (2017) Segler, M. H., Kogej, T., Tyrchan, C., and Waller, M. P. Generating focussed molecule libraries for drug discovery with recurrent neural networks. arXiv preprint arXiv:1701.01329, 2017.
 Silver et al. (2017) Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, 2017.
 Simonovsky & Komodakis (2018) Simonovsky, M. and Komodakis, N. Graphvae: Towards generation of small graphs using variational autoencoders. arXiv preprint arXiv:1802.03480, 2018.
 Stolle & Precup (2002) Stolle, M. and Precup, D. Learning options in reinforcement learning. In International Symposium on abstraction, reformulation, and approximation, pp. 212–223. Springer, 2002.
 Sundararajan et al. (2017) Sundararajan, M., Taly, A., and Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3319–3328. JMLR.org, 2017.
 Sushko et al. (2012) Sushko, I., Salmina, E., Potemkin, V. A., Poda, G., and Tetko, I. V. Toxalerts: a web server of structural alerts for toxic chemicals and compounds with potential adverse reactions, 2012.
 Sutton et al. (1999) Sutton, R. S., Precup, D., and Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181–211, 1999.
 Sutton et al. (2000) Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pp. 1057–1063, 2000.
 Weininger (1988) Weininger, D. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988.
 Ying et al. (2019) Ying, Z., Bourgeois, D., You, J., Zitnik, M., and Leskovec, J. GNNExplainer: Generating explanations for graph neural networks. In Advances in Neural Information Processing Systems, pp. 9240–9251, 2019.
 You et al. (2018a) You, J., Liu, B., Ying, R., Pande, V., and Leskovec, J. Graph convolutional policy network for goaldirected molecular graph generation. arXiv preprint arXiv:1806.02473, 2018a.
 You et al. (2018b) You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec, J. GraphRNN: A deep generative model for graphs. arXiv preprint arXiv:1802.08773, 2018b.
 Zhou et al. (2018) Zhou, Z., Kearnes, S., Li, L., Zare, R. N., and Riley, P. Optimization of molecules via deep reinforcement learning. arXiv preprint arXiv:1810.08678, 2018.