1 Introduction
Deep learning models for molecule property prediction and molecule generation are improving at a fast pace. Work to date has adopted primarily two types of building blocks for representing and building molecules: atom-by-atom strategies (Li et al., 2018; You et al., 2018a; Liu et al., 2018), or substructure-based ones (either rings or bonds) (Jin et al., 2018, 2019). While these methods have been successful for small molecules, their performance degrades significantly for larger molecules such as polymers (see Figure 1). The failure is likely due to the many generation steps required to realize larger molecules and the associated difficulty of propagating gradients across these iterative steps.
Large molecules such as polymers exhibit clear hierarchical structure, being built from repeated structural motifs. We hypothesize that explicitly incorporating such motifs as building blocks in the generation process can significantly improve reconstruction and generation accuracy, as already illustrated in Figure 1. While previous work (Jin et al., 2018) used substructures as building blocks, their approach could not scale to larger motifs. Indeed, their decoding process required each substructure neighborhood to be assembled in one go, making it combinatorially challenging to handle large components with many possible attachment points.
In this paper, we propose a motif-based hierarchical encoder-decoder for graph generation. The motifs themselves are extracted separately at the outset from frequently occurring substructures, regardless of size. During generation, molecules are built step by step by attaching motifs, large or small, to the emerging molecule. The decoder operates hierarchically, in a coarse-to-fine manner, and makes three key consecutive predictions in each pass: which new motif to add, which part of it attaches, and the points of contact with the current molecule. These decisions are highly coupled and are naturally modeled autoregressively. Moreover, each decision is directly guided by the information encoded in the corresponding layer of the mirroring hierarchical encoder. The feed-forward fine-to-coarse encoding performs iterative graph convolutions at each level, conditioned on the results from the layer below.
The proposed model is evaluated on tasks ranging from polymer generative modeling to graph translation for molecule property optimization. Our baselines include state-of-the-art graph generation methods (You et al., 2018a; Liu et al., 2018; Jin et al., 2019). On polymer generation, our model achieved state-of-the-art results under various metrics, outperforming the best baseline by over 20% absolute in reconstruction accuracy. On graph translation tasks, our model outperformed all the baselines, yielding absolute improvements of 3.3% and 8.1% in success rate on the QED and DRD2 optimization tasks. During decoding, our model runs 6.3 times faster than previous substructure-based methods (Jin et al., 2019). We further conduct ablation studies to validate the advantages of using larger motifs and of the model architecture.
2 Background and Motivation
Molecules are represented as graphs with atoms as nodes and bonds as edges. Graphs are challenging objects to generate, especially for larger molecules such as polymers. The polymer dataset used in our experiments contains thousands of molecules with more than 80 atoms. To illustrate the challenge, we tested two state-of-the-art variational autoencoders (Liu et al., 2018; Jin et al., 2018) on this dataset and found that these models often fail to reconstruct molecules from their latent embeddings (see Figure 1). The reason for this failure is that these methods generate molecules from small building blocks. Previous autoregressive models for molecular graph generation¹ can be roughly divided into two categories:

¹ We restrict our discussion to molecule generation. You et al. (2018b) and Liao et al. (2019) developed generative models for other types of graphs, such as social networks. Their current implementations do not support the prediction of node and edge attributes and cannot be directly applied to molecules, so their methods are not tested here.

Atom-based methods (Li et al., 2018; You et al., 2018a; Liu et al., 2018) generate molecules atom by atom.

Substructure-based methods (Jin et al., 2018, 2019) generate molecules from substructures restricted to rings and bonds.
As the building blocks are typically small, current models require many decoding steps to reconstruct polymers and are therefore prone to errors when generating large molecules. At the same time, many of these molecules consist of structural motifs that go beyond simple substructures. The number of decoding steps can be significantly reduced if graphs are generated motif by motif. As shown in Figure 1, our motif-based method achieves much higher reconstruction accuracy.
Motivation for New Architecture The current substructure-based method (Jin et al., 2018) requires a combinatorial enumeration to assemble substructures, whose time complexity is exponential in the substructure size. Its enumeration algorithm assumes the substructures to be of certain types (single cycles or bonds). In practice, the method often fails (e.g., with memory errors) when handling rings with more than 10 atoms. Unlike substructures, motifs are typically much larger and can have flexible structures (see Figure 1). As a result, this method cannot be directly extended to use motifs in practice.
To this end, we propose a hierarchical encoder-decoder for graph generation. Our decoder allows arbitrary types of motifs and can assemble them efficiently without combinatorial explosion. Our encoder learns a hierarchical representation that allows the decoding process to depend on both coarse-grained motif and fine-grained atom connectivity.
2.1 Motif Extraction
We define a motif $S_i = (V_i, E_i)$ as a subgraph of molecule $G = (V, E)$ induced by atoms in $V_i$ and bonds in $E_i$. Given a molecule, we extract its motifs $S_1, \dots, S_n$ such that their union covers the entire molecular graph: $V = \bigcup_i V_i$ and $E = \bigcup_i E_i$. To extract motifs, we decompose a molecule into disconnected fragments by breaking all the bridge bonds whose removal will not violate chemical validity (illustrations in the appendix).


Find all the bridge bonds $(u, v) \in E$, where both $u$ and $v$ have degree $\Delta_u, \Delta_v \ge 2$ and either $u$ or $v$ is part of a ring. Detach all these bridge bonds from their neighbors.

Now the graph $G$ becomes a set of disconnected subgraphs $G_1, \dots, G_N$. Select $G_i$ as a motif in $G$ if its occurrence in the training set is more than a frequency threshold.

If $G_i$ is not selected as a motif, further decompose it into rings and bonds and select them as motifs in $G$.
We apply the above procedure to all the molecules in the training set and construct a vocabulary of motifs $\mathcal{V}_S$. In the following section, we describe how we encode and decode molecules using the extracted motifs.
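The extraction procedure above can be sketched in plain Python. This is a simplified sketch under stated assumptions, not the authors' implementation. A molecule is a hypothetical adjacency dict `{atom: [neighbors]}`. "Part of a ring" is taken to mean "incident to a non-bridge bond", since ring bonds are exactly the non-bridge edges of the graph. The degree condition in step 1 is assumed to be degree at least 2. The frequency threshold and the canonical fragment `key` are caller-supplied placeholders. Step 3 (splitting rare fragments into rings and bonds) and chemical-validity checks are omitted. All function names are hypothetical.

```python
from collections import Counter


def make_adj(n_atoms, bonds):
    """Build an undirected adjacency dict from a list of (u, v) bonds."""
    adj = {i: [] for i in range(n_atoms)}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def find_bridges(adj):
    """Return all bridge edges (as frozensets) using Tarjan's low-link DFS."""
    disc, low, bridges = {}, {}, set()
    clock = [0]

    def dfs(u, parent):
        disc[u] = low[u] = clock[0]
        clock[0] += 1
        for w in adj[u]:
            if w not in disc:
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:  # w's subtree has no back edge above u
                    bridges.add(frozenset((u, w)))
            elif w != parent:
                low[u] = min(low[u], disc[w])

    for atom in adj:
        if atom not in disc:
            dfs(atom, None)
    return bridges


def fragment(adj):
    """Step 1: cut bridge bonds whose endpoints both have degree >= 2 (assumed
    threshold) and at least one endpoint lies in a ring; return the fragments."""
    bridges = find_bridges(adj)
    in_ring = {u for u in adj for w in adj[u] if frozenset((u, w)) not in bridges}
    cut = {e for e in bridges
           if all(len(adj[x]) >= 2 for x in e) and any(x in in_ring for x in e)}
    seen, fragments = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative DFS that never crosses a cut bond
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen and frozenset((u, w)) not in cut:
                    seen.add(w)
                    stack.append(w)
        fragments.append(frozenset(comp))
    return fragments


def build_vocab(molecules, key, threshold):
    """Step 2: keep fragments occurring more than `threshold` times.
    `molecules` is a list of (adj, atom_labels); `key` should canonicalize a
    fragment (a placeholder here, e.g. a canonical SMILES in practice)."""
    counts = Counter(key(labels, frag)
                     for adj, labels in molecules for frag in fragment(adj))
    return {k for k, c in counts.items() if c > threshold}
```

For example, two six-membered rings joined by a single bond, with a methyl group on one ring, split into a 7-atom fragment (ring plus methyl, because the methyl carbon has degree 1 and its bond is kept) and a 6-atom fragment.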
3 Hierarchical Graph Generation
Our approach extends the variational autoencoder (Kingma & Welling, 2013) to molecular graphs by introducing a hierarchical decoder and a matching encoder. In our framework, the probability of a graph $G$ is modeled as a joint distribution over the structural motifs $S_1, \dots, S_n$ constituting $G$, together with their attachments $A_1, \dots, A_n$. Each attachment $A_i = \{v_j \mid v_j \in S_i \cap S_k\}$ indicates the intersecting atoms between $S_i$ and its neighbor motifs $S_k$. To capture the complex dependencies in the joint distribution of motifs and their attachments, we propose an autoregressive factorization of $P(G)$:

$$P(G) = \int_{\mathbf{z}} P(\mathbf{z}) \prod_{k} P(S_k, A_k \mid S_{<k}, A_{<k}, \mathbf{z})\, d\mathbf{z} \tag{1}$$
As illustrated in Figure 3, in each generation step our decoder adds a new motif $S_k$ (motif prediction) and its attachment configuration $A_k$ (attachment prediction). It then decides how the new motif should be attached to the current graph (graph prediction).
To support the above hierarchical generation, we need a matching encoder that represents molecules at multiple resolutions and provides the information needed by each decoding step. Therefore, we propose to represent a molecule by a hierarchical graph with three layers (see Figure 2):


Motif layer: This layer represents how the motifs are coarsely connected in the graph and provides essential information for motif prediction during decoding. Specifically, this layer contains the nodes $S_1, \dots, S_n$ and the edges $\{(S_i, S_j) \mid S_i \cap S_j \neq \emptyset\}$ for all intersecting motifs $S_i, S_j$. This layer is tree-structured due to the way we construct motifs.

Attachment layer: This layer encodes the connectivity between motifs at a fine-grained level. Each node $A_i = (S_i, \{v_j\})$ in this layer represents a particular attachment configuration of motif $S_i$, where $\{v_j\}$ are atoms in the intersection between $S_i$ and one of its neighbor motifs (see Figure 2). This layer provides crucial information for the attachment prediction step during decoding, which helps reduce the space of candidate attachments between $S_i$ and its neighbor motifs. Just like the motif vocabulary $\mathcal{V}_S$, all the attachment configurations of $S_i$ form a motif-specific vocabulary $\mathcal{A}(S_i)$, which is computed from the training set.²

² In our experiments, the average size of the attachment vocabulary $|\mathcal{A}(S_i)|$ and the size of the motif vocabulary $|\mathcal{V}_S|$ …

Atom layer: The atom layer is the molecular graph of $G$, representing how its atoms are connected. Each atom node $v$ is associated with a label $a_v$ indicating its atom type and charge. Each edge $(u, v)$ in the atom layer is labeled with $b_{uv}$ indicating its bond type. This layer provides the information needed for the graph prediction step during decoding.
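Given a motif decomposition, the motif and attachment layers, together with the cross-layer edges introduced in the next paragraph, can be assembled as in the following hypothetical helper. It assumes that neighboring motifs overlap at their shared atoms, so that a motif's attachment is the set of atoms it shares with its neighbors (following the definition of attachments given earlier in this section), and that attachment-layer edges mirror the motif-layer edges. It returns plain index lists rather than objects from any particular graph library.

```python
from itertools import combinations


def build_hierarchy(motifs):
    """Motif-layer, attachment-layer and cross-layer edges (illustrative sketch).

    motifs: list of atom-index sets S_1..S_n; neighboring motifs share atoms.
    """
    n = len(motifs)
    # Motif layer: connect every pair of intersecting motifs.
    motif_edges = [(i, j) for i, j in combinations(range(n), 2)
                   if motifs[i] & motifs[j]]
    # Attachment layer: node i records the atoms S_i shares with its neighbors.
    attach_atoms = []
    for i in range(n):
        shared = set()
        for j in range(n):
            if j != i:
                shared |= motifs[i] & motifs[j]
        attach_atoms.append(frozenset(shared))
    attach_edges = list(motif_edges)  # assumption: mirrors the motif layer
    # Cross-layer edges: atom v -> attachment node i if v in S_i; node i -> motif i.
    atom_to_attach = [(v, i) for i in range(n) for v in sorted(motifs[i])]
    attach_to_motif = [(i, i) for i in range(n)]
    return {
        "motif_edges": motif_edges,
        "attach_atoms": attach_atoms,
        "attach_edges": attach_edges,
        "atom_to_attach": atom_to_attach,
        "attach_to_motif": attach_to_motif,
    }
```

For a ring, a single bond and a second ring chained together (motifs `{0..5}`, `{5, 6}`, `{6..11}`), the motif layer is the path 0, 1, 2, and the bond motif's attachment is the atom pair `{5, 6}`.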
We further introduce edges that connect the atoms and motifs between different layers in order to propagate information between them. In particular, we draw a directed edge from atom $v$ in the atom layer to node $A_i$ in the attachment layer if $v \in S_i$. We also draw edges from $A_i$ to $S_i$ in the motif layer. This gives us the hierarchical graph $\mathcal{H}_G$ for molecule $G$, which will be encoded by a hierarchical message passing network (MPN). During encoding, each node $S_i$ is represented as a one-hot encoding in the motif vocabulary $\mathcal{V}_S$. Likewise, each node $A_i$ is represented as a one-hot encoding in the attachment vocabulary $\mathcal{A}(S_i)$.
3.1 Hierarchical Graph Encoder
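Our encoder contains three MPNs that encode each of the three layers in the hierarchical graph. For simplicity, we denote the MPN encoding process as $\mathrm{MPN}_\psi(\cdot)$ with parameters $\psi$, and denote by $\mathrm{MLP}(\mathbf{x}, \mathbf{y})$ a multi-layer neural network whose input is the concatenation of $\mathbf{x}$ and $\mathbf{y}$. The details of the MPN architecture are given in the appendix.

Atom Layer MPN We first encode the atom layer of $\mathcal{H}_G$ (denoted as $\mathcal{H}_G^g$). The inputs to this MPN are the embedding vectors $\{\mathbf{e}(a_u)\}$ and $\{\mathbf{e}(b_{uv})\}$ of all the atoms and bonds in $G$. During encoding, the network propagates message vectors between atoms for $T$ iterations and then outputs the representation $\mathbf{h}_v$ of each atom $v$:

$$\mathbf{c}_G^g = \{\mathbf{h}_v\} = \mathrm{MPN}_{\psi_1}\big(\mathcal{H}_G^g, \{\mathbf{e}(a_u)\}, \{\mathbf{e}(b_{uv})\}\big) \tag{2}$$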
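Eq. (2) treats the MPN as a black box whose exact form is deferred to the appendix. For intuition, the following NumPy sketch implements one common variant: directed edge messages, sum aggregation that excludes the reverse message, and ReLU updates, run for $T$ iterations, followed by a node readout. The weight shapes, the ReLU nonlinearity and the name `mpn` are our assumptions and need not match the architecture described in the appendix.

```python
import numpy as np


def mpn(node_feats, edges, edge_feats, W_msg, W_out, T=3):
    """Directed message passing over an undirected graph (illustrative sketch).

    node_feats: (N, d_n) node embeddings; edges: list of E pairs (u, v);
    edge_feats: (E, d_e) edge embeddings;
    W_msg: (d_n + d_e + d_h, d_h); W_out: (d_n + d_h, d_out).
    Returns an (N, d_out) array with one vector per node.
    """
    E, d_h = len(edges), W_msg.shape[1]
    # Message m is u -> v for edge k; entry m + E is the reverse v -> u.
    directed = [(u, v, k) for k, (u, v) in enumerate(edges)]
    directed += [(v, u, k) for k, (u, v) in enumerate(edges)]

    def incoming_sum(msgs):
        total = np.zeros((node_feats.shape[0], d_h))
        for m, (_, v, _) in enumerate(directed):
            total[v] += msgs[m]
        return total

    msgs = np.zeros((2 * E, d_h))
    for _ in range(T):
        incoming = incoming_sum(msgs)
        new_msgs = np.empty_like(msgs)
        for m, (u, v, k) in enumerate(directed):
            reverse = m + E if m < E else m - E
            # Sum of messages into u, excluding the one that came from v.
            agg = incoming[u] - msgs[reverse]
            x = np.concatenate([node_feats[u], edge_feats[k], agg])
            new_msgs[m] = np.maximum(0.0, x @ W_msg)
        msgs = new_msgs
    readout = np.concatenate([node_feats, incoming_sum(msgs)], axis=1)
    return np.maximum(0.0, readout @ W_out)
```

Under this reading, the output rows play the role of the atom representations; the attachment and motif layers could reuse the same routine with their own node and edge inputs.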
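Attachment Layer MPN The input feature of each node $A_i$ in the attachment layer $\mathcal{H}_G^a$ is the concatenation of the embedding $\mathbf{e}(A_i)$ and the sum of its atom vectors $\{\mathbf{h}_v \mid v \in S_i\}$:

$$\mathbf{f}_{A_i} = \mathrm{MLP}\Big(\mathbf{e}(A_i), \sum_{v \in S_i} \mathbf{h}_v\Big) \tag{3}$$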
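The input feature for each edge $(A_i, A_j)$ in this layer is an embedding vector $\mathbf{e}(d_{ij})$, where $d_{ij}$ describes the relative ordering between nodes $A_i$ and $A_j$ during decoding. Specifically, we set $d_{ij} = k$ if node $A_i$ is the $k$-th child of node $A_j$, and $d_{ij} = 0$ if $A_i$ is the parent. We then run $T$ iterations of message passing over $\mathcal{H}_G^a$ to compute the representations of the attachment nodes:

$$\mathbf{c}_G^a = \{\mathbf{h}_{A_i}\} = \mathrm{MPN}_{\psi_2}\big(\mathcal{H}_G^a, \{\mathbf{f}_{A_i}\}, \{\mathbf{e}(d_{ij})\}\big) \tag{4}$$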
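The ordering labels on attachment-layer edges can be read off the depth-first decoding order: a node gets label k toward its parent if it is the parent's k-th child, and label 0 toward each of its children. A small sketch, assuming the layer is a tree given as a hypothetical adjacency dict whose neighbor lists are already in generation order (in the model, that order comes from the depth-first traversal used during decoding):

```python
def dfs_children(adj, root):
    """Ordered children of every node in a tree, visited depth-first from root."""
    children = {}

    def visit(u, parent):
        children[u] = [w for w in adj[u] if w != parent]
        for w in children[u]:
            visit(w, u)

    visit(root, None)
    return children


def edge_order_labels(children):
    """d[(i, j)] = k if node i is the k-th child of node j, and 0 if i is j's parent."""
    d = {}
    for parent, kids in children.items():
        for k, child in enumerate(kids, start=1):
            d[(child, parent)] = k   # child -> parent edge carries its rank
            d[(parent, child)] = 0   # parent -> child edge carries 0
    return d
```

Each undirected tree edge thus yields two directed labels, which is why the embedding lookup distinguishes "which child" from "parent".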
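Motif Layer MPN Similarly, the input feature of node $S_i$ in this layer is computed as the concatenation of the embedding $\mathbf{e}(S_i)$ and the node vector $\mathbf{h}_{A_i}$ from the previous layer. We then run message passing over the motif layer $\mathcal{H}_G^s$ to obtain the motif representations:

$$\mathbf{f}_{S_i} = \mathrm{MLP}\big(\mathbf{e}(S_i), \mathbf{h}_{A_i}\big) \tag{5}$$

$$\mathbf{c}_G^s = \{\mathbf{h}_{S_i}\} = \mathrm{MPN}_{\psi_3}\big(\mathcal{H}_G^s, \{\mathbf{f}_{S_i}\}, \{\mathbf{e}(d_{ij})\}\big) \tag{6}$$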
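Finally, we represent a molecule $G$ by a latent vector $\mathbf{z}_G$ sampled through the reparameterization trick, with mean $\mu(\mathbf{h}_{S_1})$ and log variance $\Sigma(\mathbf{h}_{S_1})$:

$$\mathbf{z}_G = \mu(\mathbf{h}_{S_1}) + \exp\big(\tfrac{1}{2}\Sigma(\mathbf{h}_{S_1})\big) \odot \epsilon, \quad \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \tag{7}$$

where $S_1$ is the root motif (i.e., the first motif to be generated during reconstruction).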
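Eq. (7) is the standard reparameterization trick. The sketch below assumes that the mean and log-variance heads are single linear maps of the root motif vector (hypothetical weights `W_mu`, `W_logvar`). Because the second head outputs a log variance, the standard deviation is its exponential divided by two in the exponent, i.e. `exp(0.5 * log_var)`.

```python
import numpy as np


def encode_latent(h_root, W_mu, W_logvar, rng):
    """Sample z = mu + exp(0.5 * log_var) * eps with eps ~ N(0, I).

    W_mu, W_logvar: hypothetical linear heads applied to the root motif vector.
    """
    mu = h_root @ W_mu
    log_var = h_root @ W_logvar
    eps = rng.standard_normal(mu.shape)
    # The noise enters additively, so in an autodiff framework gradients
    # would flow to mu and log_var through this expression.
    return mu + np.exp(0.5 * log_var) * eps, mu, log_var
```

A very negative log variance makes the sample collapse onto the mean, which is a quick sanity check for the scaling.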
3.2 Hierarchical Graph Decoder
As illustrated in Figure 3, our graph decoder generates a molecule by incrementally expanding its hierarchical graph. In generation step $t$, we first use the same hierarchical MPN architecture to encode all the motifs and atoms in $\mathcal{H}_G^{(t)}$, the (partial) hierarchical graph generated up to step $t$. This gives us motif vectors $\mathbf{h}_{S_k}$ and atom vectors $\mathbf{h}_{v_j}$ for the existing motifs and atoms.
During decoding, the model maintains a set of frontier nodes $\mathcal{F}$, where each node $S_k \in \mathcal{F}$ is a motif that still has neighbors to be generated. $\mathcal{F}$ is implemented as a stack because motifs are generated in depth-first order. Suppose $S_k$ is at the top of the stack $\mathcal{F}$ in step $t$; the model then makes the following predictions conditioned on the latent representation $\mathbf{z}_G$:


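Motif Prediction: The model predicts the next motif $S_t$ to be attached to $S_k$. This is cast as a classification task over the motif vocabulary $\mathcal{V}_S$:

$$\mathbf{p}_{S_t} = \mathrm{softmax}\big(\mathrm{MLP}(\mathbf{h}_{S_k}, \mathbf{z}_G)\big) \tag{8}$$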
Attachment Prediction: Next, the model predicts the attachment configuration $A_t$ of motif $S_t$ (i.e., which atoms belong to the intersection of $S_t$ and its neighbor motifs). This is also cast as a classification task, over the attachment vocabulary $\mathcal{A}(S_t)$:

$$\mathbf{p}_{A_t} = \mathrm{softmax}\big(\mathrm{MLP}(\mathbf{h}_{S_k}, \mathbf{z}_G)\big) \tag{9}$$

This prediction step is crucial because it significantly reduces the space of possible attachments between $S_t$ and its neighbor motifs.

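Graph Prediction: Finally, the model must decide how $S_t$ should be attached to $S_k$. The attachment between $S_t$ and $S_k$ is defined as a set of atom pairs $\mathcal{M}_{tk} = \{(u_j, v_j) \mid u_j \in A_k, v_j \in A_t\}$, where atoms $u_j$ and $v_j$ are attached together. The probability of a candidate attachment $M$ is computed from the atom vectors $\mathbf{h}_{u_j}$ and $\mathbf{h}_{v_j}$, with the softmax taken over all candidate attachments:

$$p_M = \mathrm{softmax}\big(\mathbf{h}_M \cdot \mathbf{z}_G\big) \tag{10}$$

$$\mathbf{h}_M = \sum_j \mathrm{MLP}(\mathbf{h}_{u_j}, \mathbf{h}_{v_j}) \tag{11}$$

The number of possible attachments is limited because the number of attaching atoms between two motifs is small and the attaching points must be consecutive.³

³ In our experiments, the number of possible attachments is usually less than 20 for both polymers and small molecules.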
The three predictions above together give an autoregressive factorization of the distribution over the next motif and its attachment. Each of the three decoding steps depends on the outcome of the previous step, and the predicted attachments in turn affect the prediction of subsequent motifs.
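The depth-first decoding procedure with the frontier stack can be summarized as a control-flow sketch. The three predictor callables are hypothetical stand-ins for the motif, attachment and graph predictions (Eqs. (8) to (10)) and are assumed to re-encode the partial hierarchical graph internally. We also assume that the motif predictor returns `None` to signal that the motif on top of the stack has no more neighbors, a stopping mechanism the text above does not spell out.

```python
def decode(root, predict_motif, predict_attachment, predict_graph, max_steps=100):
    """Depth-first, motif-by-motif decoding driven by a frontier stack (sketch).

    predict_motif(motifs, k)            -> next motif for motif k, or None (assumed stop)
    predict_attachment(s_t, motifs, k)  -> attachment configuration of s_t
    predict_graph(s_t, a_t, motifs, k)  -> atom pairs joining s_t to motif k
    """
    motifs = [root]
    edges = []            # (parent index, child index, atom pairs)
    frontier = [0]        # stack of motifs that may still receive neighbors
    for _ in range(max_steps):
        if not frontier:
            break
        k = frontier[-1]                            # motif on top of the stack
        s_t = predict_motif(motifs, k)              # motif prediction
        if s_t is None:
            frontier.pop()                          # motif k is complete
            continue
        a_t = predict_attachment(s_t, motifs, k)    # attachment prediction
        pairs = predict_graph(s_t, a_t, motifs, k)  # graph prediction
        motifs.append(s_t)
        edges.append((k, len(motifs) - 1, pairs))
        frontier.append(len(motifs) - 1)            # expand the new motif next (DFS)
    return motifs, edges
```

With scripted predictors that give the root A the children B and C and give B the child D, the loop emits A, B, D, C: it finishes B's subtree before returning to A, matching the depth-first order used for teacher forcing.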
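Training During training, we apply teacher forcing to the above generation process, where the generation order is determined by a depth-first traversal over the ground-truth molecule. Given a training set of molecules, we seek to minimize the negative ELBO:

$$\sum_{G} -\mathbb{E}_{\mathbf{z} \sim Q(\mathbf{z} \mid G)}\big[\log P(G \mid \mathbf{z})\big] + \mathcal{D}_{\mathrm{KL}}\big[Q(\mathbf{z} \mid G) \,\|\, P(\mathbf{z})\big] \tag{12}$$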
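With a diagonal Gaussian posterior and a standard normal prior, the KL term in Eq. (12) has a closed form, and under teacher forcing the reconstruction term is the summed negative log-probability of every ground-truth motif, attachment and graph decision. A single-sample sketch with hypothetical function names:

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL[N(mu, diag(exp(log_var))) || N(0, I)], summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)


def negative_elbo(step_log_probs, mu, log_var):
    """Single-sample estimate of the per-molecule loss.

    step_log_probs: log-probabilities that the decoder, run with teacher forcing
    on one sampled z, assigns to each ground-truth decision.
    """
    reconstruction = -np.sum(step_log_probs)
    return reconstruction + gaussian_kl(mu, log_var)
```

The KL term is exactly zero when the posterior equals the prior (zero mean, zero log variance), so the loss then reduces to the reconstruction term.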
3.3 Extension to Graph-to-Graph Translation
The proposed architecture can be naturally extended to graph-to-graph translation (Jin et al., 2019) for molecular optimization, which seeks to modify compounds in order to improve their biochemical properties. Given a corpus of molecular pairs $\{(X, Y)\}$, where $Y$ is a structural analog of $X$ with better chemical properties, the model is trained to translate an input molecular graph into its better form. In this case, we seek to learn a translation model $P(Y \mid X)$ parameterized by our encoder-decoder architecture. We also introduce attention layers into our model, which are crucial for translation performance (Bahdanau et al., 2014).
Training In graph translation, a compound $X$ can be associated with multiple outputs $Y$, since there are many ways to modify $X$ to improve its properties. In order to generate diverse outputs, we follow previous work (Zhu et al., 2017; Jin et al., 2019) and incorporate latent variables $\mathbf{z}$ into the translation model:

$$P(Y \mid X) = \int_{\mathbf{z}} P(Y \mid X, \mathbf{z})\, P(\mathbf{z})\, d\mathbf{z} \tag{13}$$

where the latent vector $\mathbf{z}$ indicates the intended mode of translation and is sampled from a prior $P(\mathbf{z})$ during testing.
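The model is trained as a conditional variational autoencoder. Given a training example $(X, Y)$, we sample $\mathbf{z}$ from the approximate posterior $Q(\mathbf{z} \mid X, Y)$. To compute $Q(\mathbf{z} \mid X, Y)$, we first encode $X$ and $Y$ into their representations $\mathbf{c}_X$ and $\mathbf{c}_Y$ and then compute difference vectors that summarize the structural changes from molecule $X$ to $Y$ at both the motif and the atom level:

$$\delta_{X,Y}^{s} = \sum_{S_i \in Y} \mathbf{h}_{S_i} - \sum_{S_i \in X} \mathbf{h}_{S_i}, \qquad \delta_{X,Y}^{g} = \sum_{v \in Y} \mathbf{h}_{v} - \sum_{v \in X} \mathbf{h}_{v}$$

Finally, we compute the mean and log variance of $Q(\mathbf{z} \mid X, Y)$ from these difference vectors and sample $\mathbf{z}$ using the reparameterization trick. The latent code $\mathbf{z}$ is passed to the decoder along with the input representation $\mathbf{c}_X$ to reconstruct the output $Y$. The training objective is to minimize a negative ELBO similar to Eq. (12).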
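The posterior computation can be sketched as follows, assuming the motif-level and atom-level difference vectors are the differences between the summed motif vectors (and summed atom vectors) of $Y$ and of $X$. Mapping their concatenation to a mean and log variance with single bias-free linear layers (hypothetical `W_mu`, `W_logvar`) is our simplifying assumption, not necessarily the network used by the authors.

```python
import numpy as np


def translation_posterior(hS_x, hv_x, hS_y, hv_y, W_mu, W_logvar, rng):
    """Sample z ~ Q(z | X, Y) from motif- and atom-level difference vectors.

    hS_*: (n_motifs, d) motif vectors; hv_*: (n_atoms, d) atom vectors.
    W_mu, W_logvar: hypothetical (2d, d_z) bias-free linear heads.
    """
    delta_s = hS_y.sum(axis=0) - hS_x.sum(axis=0)   # motif-level change
    delta_g = hv_y.sum(axis=0) - hv_x.sum(axis=0)   # atom-level change
    delta = np.concatenate([delta_s, delta_g])
    mu, log_var = delta @ W_mu, delta @ W_logvar
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    return z, mu, log_var
```

When $X = Y$ the difference vectors vanish, so with these bias-free heads the posterior coincides with the standard normal prior; the latent code only moves away from the prior when the pair actually differs.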
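Attention For graph translation, the input molecule $X$ is embedded by our hierarchical encoder into a set of vectors $\mathbf{c}_X = \mathbf{c}_X^s \cup \mathbf{c}_X^a \cup \mathbf{c}_X^g$, representing the molecule at multiple resolutions. These vectors are fed into the decoder through attention mechanisms (Luong et al., 2015). Specifically, we modify the motif prediction (Eq. 8) into

$$\mathbf{p}_{S_t} = \mathrm{softmax}\big(\mathrm{MLP}(\mathbf{h}_{S_k}, \alpha_t^s)\big) \tag{14}$$

$$\alpha_t^s = \mathrm{attention}_\phi(\mathbf{h}_{S_k}, \mathbf{c}_X^s) \tag{15}$$

where $\mathrm{attention}_\phi(\mathbf{h}_*, \mathbf{c}_X)$ is a bilinear attention over the vectors $\mathbf{c}_X$ with query vector $\mathbf{h}_*$. The attachment prediction (Eq. 9) is modified similarly, with its attention over $\mathbf{c}_X^a$. The graph prediction (Eq. 10) is modified into

$$p_M = \mathrm{softmax}\big(\mathbf{h}_M \cdot \alpha_M^g\big) \tag{16}$$

$$\alpha_M^g = \mathrm{attention}_\phi(\mathbf{h}_M, \mathbf{c}_X^g) \tag{17}$$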
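A bilinear attention scores each encoder vector $\mathbf{c}_i$ against the query as $\mathbf{h}_*^\top \mathbf{W} \mathbf{c}_i$, normalizes the scores with a softmax, and returns the weighted sum of the encoder vectors. A NumPy sketch for a single query, with a hypothetical weight matrix `W` and no batching:

```python
import numpy as np


def bilinear_attention(query, keys, W):
    """Attend over encoder vectors with scores s_i = query^T W key_i.

    query: (d_q,), keys: (n, d_k) encoder vectors, W: (d_q, d_k).
    Returns the context vector (d_k,) and the attention weights (n,).
    """
    scores = keys @ (W.T @ query)        # s_i = (W^T query) . key_i
    scores = scores - scores.max()       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()
    return weights @ keys, weights
```

The bilinear form lets the query and the encoder vectors live in spaces of different dimension, which is convenient when the same routine attends over motif, attachment and atom vectors.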
4 Experiments
We evaluate our method on two application tasks. The first task is polymer generative modeling. This experiment validates our argument in Section 2 that our model is advantageous when molecules are large. The second task is graph-to-graph translation for small molecules. Here we show that the proposed architecture also benefits small molecules compared to previous state-of-the-art graph generation methods.
4.1 Polymer Generative Modeling
Method | Reconstruction / Sample Quality (↑) | Property Statistics (↓) | Structural Statistics (↑)
 | Recon. | Valid | Unique | Div. | logP | SA | QED | MW | SNN | Frag. | Scaf.
Real data | - | 100% | 100% | 0.823 | 0.094 | 6.7e-5 | 1.7e-5 | 82.3 | 0.706 | 0.995 | 0.462
SMILES | 21.5% | 93.1% | 97.3% | 0.821 | 1.471 | 0.011 | 5.4e-4 | 4963 | 0.704 | 0.981 | 0.385
CGVAE | 42.4% | 100% | 96.2% | 0.879 | 3.958 | 2.600 | 0.0030 | 3944 | 0.204 | 0.372 | 0.001
JTVAE | 58.5% | 100% | 94.1% | 0.864 | 2.645 | 0.157 | 0.0075 | 10867 | 0.522 | 0.925 | 0.297
HierVAE | 79.9% | 100% | 97.0% | 0.817 | 0.525 | 0.007 | 5.7e-4 | 1928 | 0.708 | 0.984 | 0.390
Small motif | 71.0% | 100% | 97.2% | 0.835 | 0.872 | 0.042 | 0.0019 | 5320 | 0.575 | 0.949 | 0.191
Dataset Our method is evaluated on the polymer dataset from St. John et al. (2019), which contains 86K polymers in total (after removing duplicates). The dataset is divided into 76K, 5K and 5K for training, validation and testing. Using our motif extraction, we collected 436 different motifs (examples shown in Figure 4). On average, each motif has 5.24 different attachment configurations. The distribution of motif size and their frequencies are reported in Figure 5.
Evaluation Metrics Our evaluation effort measures various aspects of molecule generation proposed in Kusner et al. (2017); Polykovskiy et al. (2018). Besides basic metrics like chemical validity and diversity, we compare distributional statistics between generated and real compounds. A good generative model should generate molecules which present similar aggregate statistics to real compounds. Our metrics include (with details shown in the appendix):


Reconstruction accuracy: We measure how often the model can completely reconstruct a given molecule from its latent embedding $\mathbf{z}$. The reconstruction accuracy is computed over the 5K compounds in the test set.

Validity: Percentage of chemically valid compounds.

Uniqueness: Percentage of unique compounds.

Diversity: We compute the pairwise molecular distance among generated compounds. The molecular distance is defined as the Tanimoto distance over Morgan fingerprints (Rogers & Hahn, 2010) of two molecules.

Property statistics: We compare property statistics between generated molecules and real data. Our properties include partition coefficient (logP), synthetic accessibility (SA), drug-likeness (QED) and molecular weight (MW). To quantitatively evaluate the distance between two distributions, we compute the Frechet distance between the property distributions of molecules in the generated and test sets (Polykovskiy et al., 2018).

Structural statistics: We also compute structural statistics between generated molecules and real data. Nearest neighbor similarity (SNN) is the average similarity of generated molecules to their nearest molecule in the test set. Fragment similarity (Frag) and scaffold similarity (Scaf) are cosine similarities between the vectors of fragment or scaffold frequencies of the generated set and the test set.
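The fingerprint-based and distributional metrics above reduce to short formulas. The sketch below represents each Morgan fingerprint as a Python set of on-bit indices (computing the fingerprints themselves, for instance with RDKit, is outside the sketch) and approximates the Frechet distance for a scalar property by fitting a 1-D Gaussian to each sample, where the squared Frechet distance reduces to $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. Function names are hypothetical, and the reference implementation of Polykovskiy et al. (2018) may differ in details.

```python
from itertools import combinations
from statistics import mean, pstdev


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def diversity(fps):
    """Mean pairwise Tanimoto distance (1 - similarity); needs >= 2 fingerprints."""
    return mean(1.0 - tanimoto(a, b) for a, b in combinations(fps, 2))


def snn(gen_fps, ref_fps):
    """Average similarity of each generated fingerprint to its nearest reference."""
    return mean(max(tanimoto(g, r) for r in ref_fps) for g in gen_fps)


def frechet_1d(x, y):
    """Squared Frechet distance between 1-D Gaussians fitted to samples x and y."""
    return (mean(x) - mean(y)) ** 2 + (pstdev(x) - pstdev(y)) ** 2
```

For instance, three fingerprints of which two are identical have a diversity of 2/3: one pair contributes distance 0 and the other two contribute distance 1.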
Baselines We compare our method against three state-of-the-art variational autoencoders for molecular graphs. SMILES VAE (Gómez-Bombarelli et al., 2018) is a sequence-to-sequence VAE that generates molecules based on their SMILES strings (Weininger, 1988). CGVAE (Liu et al., 2018) is a graph-based VAE generating molecules atom by atom. JTVAE (Jin et al., 2018) is also a graph-based VAE, generating molecules based on simple substructures restricted to rings and bonds. Finally, we report the oracle performance of distributional statistics by using real molecules in the training set as our generated samples.
4.1.1 Results
The performance of different methods is summarized in Table 1. Our method (HierVAE) significantly outperforms all previous methods in terms of reconstruction accuracy (79.9% vs 58.5%). This validates the advantage of utilizing large structural motifs, which reduces the number of generation steps. In terms of distributional statistics, our method achieves state-of-the-art results on logP (0.525 vs 1.471), molecular weight Frechet distance (1928 vs 4863) and all the structural similarity metrics. Since our model requires fewer generation steps, it trains much faster than other graph-based methods (see Figure 5).
Ablation Study To validate the importance of utilizing large structural motifs, we further experiment with a variant of our model (Small motif), which keeps the same architecture but replaces the large structural motifs with basic substructures such as rings and bonds (with fewer than ten atoms). As shown in Table 1, its performance is significantly worse than that of our full model, even though it builds on the same hierarchical architecture.
4.2 Graph-to-Graph Translation
Method | logP (stricter δ) | | logP (looser δ) | | Drug likeness | | DRD2 |
 | Improvement | Diversity | Improvement | Diversity | Success | Diversity | Success | Diversity
JTVAE | | - | | - | 8.8% | - | 3.4% | -
CGVAE | | - | | - | 4.8% | - | 2.3% | -
GCPN | | - | | - | 9.4% | 0.216 | 4.4% | 0.152
MMPA | | 0.329 | | 0.496 | 32.9% | 0.236 | 46.4% | 0.275
Seq2Seq | | 0.331 | | 0.471 | 58.5% | 0.331 | 75.9% | 0.176
JTNN | | 0.333 | | 0.480 | 59.9% | 0.373 | 77.8% | 0.156
AtomG2G | | 0.379 | 3.98 ± 1.54 | 0.563 | 73.6% | 0.421 | 75.8% | 0.128
HierG2G | 2.49 ± 1.09 | 0.381 | 3.98 ± 1.46 | 0.564 | 76.9% | 0.477 | 85.9% | 0.192
Method | QED | DRD2
HierG2G | 76.9% | 85.9%
atom-based decoder | 76.1% | 75.0%
two-layer encoder | 75.8% | 83.5%
one-layer encoder | 67.8% | 74.1%
We follow the experimental design of Jin et al. (2019) and evaluate our model on their graph-to-graph translation tasks. Following their setup, we require the molecular similarity between the input $X$ and the output $Y$ to be above a certain threshold at test time. This prevents the model from ignoring the input and translating it into an arbitrary compound. Here the molecular similarity $\mathrm{sim}(X, Y)$ is defined as the Tanimoto similarity between the Morgan fingerprints of $X$ and $Y$.
Dataset The dataset consists of four property optimization tasks. In each task, we train and evaluate our model on their provided training and test sets.


LogP: The penalized logP score (Kusner et al., 2017) measures the solubility and synthetic accessibility of a compound. In this task, the model needs to translate an input $X$ into an output $Y$ such that $\mathrm{logP}(Y) > \mathrm{logP}(X)$ and $\mathrm{sim}(X, Y) \ge \delta$. We experiment with two values of the similarity threshold $\delta$.

QED: The QED score (Bickerton et al., 2012) quantifies a compound's drug-likeness. In this task, the model is required to translate molecules with QED scores in a lower range into a higher range, subject to a similarity constraint on $\mathrm{sim}(X, Y)$.

DRD2: This task involves optimizing a compound's biological activity against the dopamine type 2 receptor (DRD2). The model needs to translate inactive compounds into active compounds, where the bioactivity is assessed by a property prediction model from Olivecrona et al. (2017). Outputs must also satisfy a similarity constraint on $\mathrm{sim}(X, Y)$.
Evaluation Metrics Our evaluation metrics include translation accuracy and diversity. Each test molecule $X$ is translated multiple times with different latent codes sampled from the prior distribution. On logP optimization, we select as the final translation of $X$ the compound $Y$ that gives the highest property improvement and satisfies $\mathrm{sim}(X, Y) \ge \delta$. We then report the average property improvement $\frac{1}{|\mathcal{D}|}\sum_{X \in \mathcal{D}} \big(\mathrm{logP}(Y) - \mathrm{logP}(X)\big)$ over the test set $\mathcal{D}$. For the other tasks, we report the translation success rate. A compound is successfully translated if one of its translation candidates satisfies all the similarity and property constraints of the task. To measure diversity, for each molecule we compute the average pairwise Tanimoto distance between all its successfully translated compounds.
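The two evaluation protocols can be sketched as follows, assuming property scores and similarities have already been computed for every sampled candidate. Counting a test molecule with no candidate above the similarity threshold as zero improvement is our assumption; the text above does not state how such cases are handled. Function names are hypothetical.

```python
from statistics import mean


def logp_improvement(results, delta):
    """Average logP improvement over the test set.

    results: one (logp_x, candidates) pair per test molecule, where candidates
    is a list of (logp_y, sim_xy) tuples for its sampled translations.
    """
    improvements = []
    for logp_x, candidates in results:
        feasible = [logp_y - logp_x for logp_y, sim in candidates if sim >= delta]
        # Assumption: no feasible candidate counts as zero improvement.
        improvements.append(max(feasible) if feasible else 0.0)
    return mean(improvements)


def success_rate(results, is_success):
    """Fraction of test molecules with at least one candidate meeting all constraints.

    results: one list of candidates per test molecule; is_success(candidate) -> bool.
    """
    return mean(1.0 if any(is_success(c) for c in candidates) else 0.0
                for candidates in results)
```

Note that the logP protocol keeps only the best feasible candidate per molecule, whereas the success rate only asks whether any candidate qualifies.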
Baselines We compare our method against baselines including GCPN (You et al., 2018a), MMPA (Dalke et al., 2018), and the translation-based methods Seq2Seq and JTNN (Jin et al., 2019). Seq2Seq is a sequence-to-sequence model that generates molecules by their SMILES strings. JTNN is a graph-to-graph architecture that generates molecules structure by structure, but its decoder is not fully autoregressive.
To make a direct comparison between our method and atom-based generation possible, we further developed an atom-based translation model (AtomG2G) as a baseline. It makes three predictions in each generation step. First, it predicts whether the decoding process has completed (no more new atoms). If not, it creates a new atom $a_t$ and predicts its atom type. Lastly, it predicts the bond type between $a_t$ and other atoms autoregressively to fully capture edge dependencies (You et al., 2018b). The encoder of AtomG2G encodes only the atom-layer graph, and the decoder attention only sees the atom vectors $\mathbf{c}_X^g$. All translation models are trained under the same variational objective. Details of the baseline architectures are in the appendix.
4.2.1 Results
As shown in Table 2, our model (HierG2G) achieves new state-of-the-art results on the four translation tasks. In particular, our model significantly outperforms JTNN in both translation accuracy (e.g., 76.9% versus 59.9% on the QED task) and output diversity (e.g., 0.564 versus 0.480 on the logP task). While both methods generate molecules by structures, our decoder is autoregressive and can therefore learn more expressive mappings. In addition, our model runs 6.3 times faster than JTNN during decoding. Our model also outperforms AtomG2G on three datasets, with over 10% improvement on the DRD2 task. This shows the advantage of our hierarchical model.
Ablation Study To understand the importance of different architecture choices, we report ablation studies on the QED and DRD2 tasks in Table 3. We first replace our hierarchical decoder with the atom-based decoder of AtomG2G to see how much motif-based decoding benefits us. We keep the same hierarchical encoder but modify the input of the decoder attention to include both atom and motif vectors. Using this setup, the model performance decreases by 0.8% and 10.9% on the two tasks. We suspect the DRD2 task benefits more from motif-based decoding because biological target binding often depends on the presence of specific functional groups.
Our second experiment reduces the number of hierarchies in our encoder and decoder MPN, while keeping the same hierarchical decoding process. When the top motif layer is removed (two-layer encoder), the translation accuracy drops slightly, by 1.1% and 2.4%. When we further remove the attachment layer (one-layer encoder), the performance degrades significantly on both datasets. This is because all the motif information is lost, and the model needs to infer what the motifs are and how the motif layers are constructed for each molecule. This shows the importance of the hierarchical representation.
5 Related Work
Graph Generation Previous work has adopted various approaches for generating molecular graphs. Gómez-Bombarelli et al. (2018); Segler et al. (2017); Kusner et al. (2017); Dai et al. (2018); Guimaraes et al. (2017); Olivecrona et al. (2017); Popova et al. (2018); Kang & Cho (2018) generated molecules based on their SMILES strings (Weininger, 1988). Simonovsky & Komodakis (2018); De Cao & Kipf (2018); Ma et al. (2018) developed generative models that output the adjacency matrices and node labels of the graphs at once. You et al. (2018b); Li et al. (2018); Samanta et al. (2018); Liu et al. (2018); Zhou et al. (2018) proposed generative models that decode molecules sequentially node by node. Seff et al. (2019) developed an edit-based model that generates molecules based on insertions and deletions.
Our model is closely related to Liao et al. (2019), which generates graphs one block of nodes and edges at a time. While their encoder operates on the original graphs, our encoder operates on multiple hierarchies and learns multi-resolution representations of input graphs. Our work is also closely related to Jin et al. (2018, 2019), which generate molecules based on substructures. Their decoder first generates a junction tree with substructures as nodes, and then predicts how the substructures should be attached to each other. Their substructure attachment process involves combinatorial enumeration, and therefore their model cannot scale to substructures more complex than simple rings and bonds. In contrast, our model allows motifs to have flexible structures.
Graph Encoders Graph neural networks have been extensively studied for graph encoding (Scarselli et al., 2009; Bruna et al., 2013; Li et al., 2015; Niepert et al., 2016; Kipf & Welling, 2017; Hamilton et al., 2017; Lei et al., 2017; Velickovic et al., 2017; Xu et al., 2018). Our method is related to graph encoders for molecules (Duvenaud et al., 2015; Kearnes et al., 2016; Dai et al., 2016; Gilmer et al., 2017; Schütt et al., 2017). Different from these approaches, our method represents molecules as hierarchical graphs spanning from atom-level to motif-level graphs.
Our work is most closely related to Defferrard et al. (2016), Ying et al. (2018) and Gao & Ji (2019), which learn to represent graphs in a hierarchical manner. In particular, Defferrard et al. (2016) utilized graph coarsening algorithms to construct multiple layers of graph hierarchy, and Ying et al. (2018) and Gao & Ji (2019) proposed to learn the graph hierarchy jointly with the encoding process. Despite some differences, all of these methods learn the hierarchy for regression or classification tasks. In contrast, our hierarchy is constructed for efficient graph generation.
6 Conclusion
In this paper, we developed a hierarchical encoder-decoder architecture that generates molecular graphs using structural motifs as building blocks. The experimental results show that our model outperforms prior atom- and substructure-based methods in both the small molecule and polymer domains.
References
 Bahdanau et al. (2014) Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
 Bickerton et al. (2012) Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S., and Hopkins, A. L. Quantifying the chemical beauty of drugs. Nature chemistry, 4(2):90, 2012.
 Bruna et al. (2013) Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
 Dai et al. (2016) Dai, H., Dai, B., and Song, L. Discriminative embeddings of latent variable models for structured data. In International Conference on Machine Learning, pp. 2702–2711, 2016.
 Dai et al. (2018) Dai, H., Tian, Y., Dai, B., Skiena, S., and Song, L. Syntax-directed variational autoencoder for structured data. arXiv preprint arXiv:1802.08786, 2018.

 Dalke et al. (2018) Dalke, A., Hert, J., and Kramer, C. mmpdb: An open-source matched molecular pair platform for large multiproperty data sets. Journal of chemical information and modeling, 2018.
 De Cao & Kipf (2018) De Cao, N. and Kipf, T. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, 2018.
 Defferrard et al. (2016) Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852, 2016.
 Duvenaud et al. (2015) Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232, 2015.
 Gao & Ji (2019) Gao, H. and Ji, S. Graph U-Nets. International Conference on Machine Learning, 2019.
 Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212, 2017.
 Gómez-Bombarelli et al. (2018) Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 2018. doi: 10.1021/acscentsci.7b00572.
 Guimaraes et al. (2017) Guimaraes, G. L., SanchezLengeling, B., Farias, P. L. C., and AspuruGuzik, A. Objectivereinforced generative adversarial networks (organ) for sequence generation models. arXiv preprint arXiv:1705.10843, 2017.
 Hamilton et al. (2017) Hamilton, W. L., Ying, R., and Leskovec, J. Inductive representation learning on large graphs. arXiv preprint arXiv:1706.02216, 2017.
 Jin et al. (2018) Jin, W., Barzilay, R., and Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. International Conference on Machine Learning, 2018.
 Jin et al. (2019) Jin, W., Yang, K., Barzilay, R., and Jaakkola, T. Learning multimodal graph-to-graph translation for molecular optimization. International Conference on Learning Representations, 2019.
 Kang & Cho (2018) Kang, S. and Cho, K. Conditional molecular design with deep generative models. Journal of chemical information and modeling, 59(1):43–52, 2018.
 Kearnes et al. (2016) Kearnes, S., McCloskey, K., Berndl, M., Pande, V., and Riley, P. Molecular graph convolutions: moving beyond fingerprints. Journal of computer-aided molecular design, 30(8):595–608, 2016.
 Kingma & Welling (2013) Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
 Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations, 2017.
 Kusner et al. (2017) Kusner, M. J., Paige, B., and Hernández-Lobato, J. M. Grammar variational autoencoder. arXiv preprint arXiv:1703.01925, 2017.
 Lei et al. (2017) Lei, T., Jin, W., Barzilay, R., and Jaakkola, T. Deriving neural architectures from sequence and graph kernels. International Conference on Machine Learning, 2017.
 Li et al. (2015) Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.
 Li et al. (2018) Li, Y., Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324, 2018.
 Liao et al. (2019) Liao, R., Li, Y., Song, Y., Wang, S., Hamilton, W., Duvenaud, D. K., Urtasun, R., and Zemel, R. Efficient graph generation with graph recurrent attention networks. In Advances in Neural Information Processing Systems, pp. 4257–4267, 2019.
 Liu et al. (2018) Liu, Q., Allamanis, M., Brockschmidt, M., and Gaunt, A. L. Constrained graph variational autoencoders for molecule design. Neural Information Processing Systems, 2018.
 Luong et al. (2015) Luong, M.-T., Pham, H., and Manning, C. D. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.
 Ma et al. (2018) Ma, T., Chen, J., and Xiao, C. Constrained generation of semantically valid graphs via regularizing variational autoencoders. In Advances in Neural Information Processing Systems, pp. 7113–7124, 2018.
 Niepert et al. (2016) Niepert, M., Ahmed, M., and Kutzkov, K. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pp. 2014–2023, 2016.
 Olivecrona et al. (2017) Olivecrona, M., Blaschke, T., Engkvist, O., and Chen, H. Molecular de-novo design through deep reinforcement learning. Journal of cheminformatics, 9(1):48, 2017.
 Polykovskiy et al. (2018) Polykovskiy, D., Zhebrak, A., Sanchez-Lengeling, B., Golovanov, S., Tatanov, O., Belyaev, S., Kurbanov, R., Artamonov, A., Aladinskiy, V., Veselov, M., Kadurin, A., Nikolenko, S., Aspuru-Guzik, A., and Zhavoronkov, A. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. arXiv preprint arXiv:1811.12823, 2018.
 Popova et al. (2018) Popova, M., Isayev, O., and Tropsha, A. Deep reinforcement learning for de novo drug design. Science advances, 4(7):eaap7885, 2018.
 Rogers & Hahn (2010) Rogers, D. and Hahn, M. Extended-connectivity fingerprints. Journal of chemical information and modeling, 50(5):742–754, 2010.
 Samanta et al. (2018) Samanta, B., De, A., Jana, G., Chattaraj, P. K., Ganguly, N., and Gomez-Rodriguez, M. NeVAE: A deep generative model for molecular graphs. arXiv preprint arXiv:1802.05283, 2018.
 Scarselli et al. (2009) Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.
 Schütt et al. (2017) Schütt, K., Kindermans, P.-J., Felix, H. E. S., Chmiela, S., Tkatchenko, A., and Müller, K.-R. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems, pp. 992–1002, 2017.
 Seff et al. (2019) Seff, A., Zhou, W., Damani, F., Doyle, A., and Adams, R. P. Discrete object generation with reversible inductive construction. In Advances in Neural Information Processing Systems, pp. 10353–10363, 2019.
 Segler et al. (2017) Segler, M. H., Kogej, T., Tyrchan, C., and Waller, M. P. Generating focussed molecule libraries for drug discovery with recurrent neural networks. arXiv preprint arXiv:1701.01329, 2017.
 Simonovsky & Komodakis (2018) Simonovsky, M. and Komodakis, N. GraphVAE: Towards generation of small graphs using variational autoencoders. arXiv preprint arXiv:1802.03480, 2018.
 St. John et al. (2019) St. John, P. C., Phillips, C., Kemper, T. W., Wilson, A. N., Guan, Y., Crowley, M. F., Nimlos, M. R., and Larsen, R. E. Message-passing neural networks for high-throughput polymer screening. The Journal of chemical physics, 150(23):234111, 2019.
 Velickovic et al. (2017) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
 Weininger (1988) Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988.
 Xu et al. (2018) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
 Ying et al. (2018) Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pp. 4800–4810, 2018.
 You et al. (2018a) You, J., Liu, B., Ying, R., Pande, V., and Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. arXiv preprint arXiv:1806.02473, 2018a.
 You et al. (2018b) You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec, J. GraphRNN: A deep generative model for graphs. arXiv preprint arXiv:1802.08773, 2018b.
 Zhou et al. (2018) Zhou, Z., Kearnes, S., Li, L., Zare, R. N., and Riley, P. Optimization of molecules via deep reinforcement learning. arXiv preprint arXiv:1810.08678, 2018.
 Zhu et al. (2017) Zhu, J.-Y., Zhang, R., Pathak, D., Darrell, T., Efros, A. A., Wang, O., and Shechtman, E. Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems, pp. 465–476, 2017.