Hierarchical Generation of Molecular Graphs using Structural Motifs

02/08/2020 ∙ by Wengong Jin, et al. ∙ MIT

Graph generation techniques are increasingly being adopted for drug discovery. Previous graph generation approaches have utilized relatively small molecular building blocks such as atoms or simple cycles, limiting their effectiveness to smaller molecules. Indeed, as we demonstrate, their performance degrades significantly for larger molecules. In this paper, we propose a new hierarchical graph encoder-decoder that employs significantly larger and more flexible graph motifs as basic building blocks. Our encoder produces a multi-resolution representation for each molecule in a fine-to-coarse fashion, from atoms to connected motifs. Each level integrates the encoding of constituents below with the graph at that level. Our autoregressive coarse-to-fine decoder adds one motif at a time, interleaving the decision of selecting a new motif with the process of resolving its attachments to the emerging molecule. We evaluate our model on multiple molecule generation tasks, including polymers, and show that our model significantly outperforms previous state-of-the-art baselines.







1 Introduction

Figure 1: Left: Illustration of structural motifs in polymers. Right: Reconstruction accuracy for polymers with various sizes (number of atoms). Notably, the atom-based generative model CG-VAE (Liu et al., 2018) fails to reconstruct molecules over 80 atoms. In contrast, the proposed model maintains high accuracy for large molecules by utilizing motifs as building blocks for generation (red curve).

Deep learning models for molecule property prediction and molecule generation are improving at a fast pace. Work to date has primarily adopted two types of building blocks for representing and building molecules: atom-by-atom strategies (Li et al., 2018; You et al., 2018a; Liu et al., 2018), or substructure-based ones (either rings or bonds) (Jin et al., 2018, 2019). While these methods have been successful for small molecules, their performance degrades significantly for larger molecules such as polymers (see Figure 1). The failure is likely due to the many generation steps required to realize larger molecules, and to the associated difficulty of propagating gradients across those iterative steps.

Large molecules such as polymers exhibit a clear hierarchical structure, because they are built from repeated structural motifs. We hypothesize that explicitly incorporating such motifs as building blocks in the generation process can significantly improve reconstruction and generation accuracy, as Figure 1 already illustrates. Previous work (Jin et al., 2018) did consider substructures as building blocks, but their approach could not scale to larger motifs. Their decoding process required each substructure neighborhood to be assembled in one go, which makes it combinatorially challenging to handle large components with many possible attachment points.

In this paper, we propose a motif-based hierarchical encoder-decoder for graph generation. The motifs are extracted separately at the outset from frequently occurring substructures, regardless of their size. During generation, molecules are built step by step by attaching motifs, large or small, to the emerging molecule. The decoder operates hierarchically, in a coarse-to-fine manner, and makes three key consecutive predictions in each pass: which new motif to select, which part of it attaches, and its points of contact with the current molecule. These decisions are highly coupled and are naturally modeled auto-regressively. Moreover, each decision is directly guided by the information in the corresponding layer of the mirroring hierarchical encoder. The feed-forward, fine-to-coarse encoding performs iterative graph convolutions at each level, conditioned on the results from the layer below.

The proposed model is evaluated on tasks ranging from polymer generative modeling to graph translation for molecule property optimization. Our baselines include state-of-the-art graph generation methods (You et al., 2018a; Liu et al., 2018; Jin et al., 2019). On polymer generation, our model achieves state-of-the-art results under various metrics and outperforms the best baselines by 20% absolute in reconstruction accuracy. On graph translation tasks, our model outperforms all the baselines, with improvements of 3.3% and 8.1% on the QED and DRD2 optimization tasks. During decoding, our model runs 6.3 times faster than previous substructure-based methods (Jin et al., 2019). We further conduct ablation studies that validate the advantages of using larger motifs and of our model architecture.

2 Background and Motivation

Molecules are represented as graphs with atoms as nodes and bonds as edges. Graphs are challenging objects to generate, especially for larger molecules such as polymers. The polymer dataset used in our experiments contains thousands of molecules with more than 80 atoms. To illustrate the challenge, we tested two state-of-the-art variational autoencoders (Liu et al., 2018; Jin et al., 2018) on this dataset and found that these models often fail to reconstruct molecules from their latent embeddings (see Figure 1).

This failure occurs because these methods generate molecules from small building blocks. Among autoregressive models, previous work on molecular graph generation can be roughly divided into two categories:¹

¹ We restrict our discussion to molecule generation. You et al. (2018b) and Liao et al. (2019) developed generative models for other types of graphs, such as social networks. Their current implementations do not support the prediction of node and edge attributes and cannot be directly applied to molecules, so we do not test their methods here.


  • Atom-based methods (Li et al., 2018; You et al., 2018a; Liu et al., 2018) generate molecules atom by atom.

  • Substructure-based methods (Jin et al., 2018, 2019) generate molecules from small substructures restricted to rings and bonds (often no more than six atoms).

Because the building blocks are typically small, current models need many decoding steps to reconstruct a polymer and are therefore prone to errors on large molecules. At the same time, many of these molecules consist of structural motifs that go beyond simple substructures. Generating graphs motif by motif can therefore reduce the number of decoding steps significantly. As shown in Figure 1, our motif-based method achieves a much higher reconstruction accuracy.

Motivation for New Architecture The current substructure-based method (Jin et al., 2018) assembles substructures through a combinatorial enumeration whose time complexity is exponential in the substructure size. Its enumeration algorithm assumes that substructures are of certain types (single cycles or bonds). In practice, the method often fails (e.g., with memory errors) on rings with more than 10 atoms. Motifs, in contrast, are typically much larger and can have flexible structures (see Figure 1). As a result, this method cannot be directly extended to use motifs in practice.

To this end, we propose a hierarchical encoder-decoder for graph generation. Our decoder allows arbitrary types of motifs and can assemble them efficiently without combinatorial explosion. Our encoder learns a hierarchical representation that allows the decoding process to depend on both coarse-grained motif and fine-grained atom connectivity.

2.1 Motif Extraction

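We define a motif $\mathcal{S}_i = (\mathcal{V}_i, \mathcal{E}_i)$ as a subgraph of molecule $\mathcal{G}$ induced by the atoms in $\mathcal{V}_i$ and the bonds in $\mathcal{E}_i$. Given a molecule, we extract its motifs $\mathcal{S}_1, \dots, \mathcal{S}_n$ so that their union covers the entire molecular graph: $\mathcal{V} = \bigcup_i \mathcal{V}_i$ and $\mathcal{E} = \bigcup_i \mathcal{E}_i$. To extract motifs, we decompose a molecule into disconnected fragments by breaking every bridge bond whose removal does not violate chemical validity (illustrations in the appendix):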

  1. Find all the bridge bonds $(u, v) \in \mathcal{E}$ where both $u$ and $v$ have degree $\deg(u), \deg(v) \geq 2$ and either $u$ or $v$ is part of a ring. Detach all these bridge bonds from their neighbors.

  2. The graph $\mathcal{G}$ now becomes a set of disconnected subgraphs $\mathcal{G}_1, \dots, \mathcal{G}_N$. Select $\mathcal{G}_i$ as a motif in $\mathcal{G}$ if it occurs in the training set more often than a threshold $\Delta$.

  3. If $\mathcal{G}_i$ is not selected as a motif, further decompose it into rings and bonds and select those as motifs in $\mathcal{G}$.
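The first two extraction steps can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code. A molecule is a hypothetical adjacency dict `{atom: set(neighbors)}`, and an atom counts as "in a ring" when at least one of its bonds is not a bridge. The degree test follows step 1 literally. The fragment label used for counting comes from a caller-supplied `key`; in practice this would likely be a canonical SMILES string. Step 3, which splits rare fragments into rings and bonds, is omitted.

```python
from collections import Counter


def find_bridges(adj):
    """Return the bridge edges (as frozensets) of an undirected simple graph.

    `adj` maps each atom index to the set of its neighbor indices. Uses the
    standard DFS low-link test: tree edge (u, v) is a bridge iff low[v] > disc[u].
    """
    disc, low, bridges = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:  # already visited: back (or forward) edge
                low[u] = min(low[u], disc[v])
            else:  # tree edge
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(frozenset((u, v)))

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return bridges


def split_into_fragments(adj):
    """Step 1: cut qualifying bridge bonds and return the atom sets of the pieces."""
    bridges = find_bridges(adj)
    # An atom lies on a ring iff at least one of its bonds is not a bridge.
    in_ring = {u: any(frozenset((u, v)) not in bridges for v in adj[u]) for u in adj}
    cut = {
        e for e in bridges
        if all(len(adj[x]) >= 2 for x in e) and any(in_ring[x] for x in e)
    }
    # Connected components after removing the selected bonds.
    seen, fragments = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen and frozenset((u, v)) not in cut:
                    seen.add(v)
                    stack.append(v)
        fragments.append(frozenset(comp))
    return fragments


def frequent_fragments(molecules, key, threshold):
    """Step 2: keep fragment labels occurring more than `threshold` times.

    `key(adj, atom_set)` is a hypothetical caller-supplied canonical label.
    """
    counts = Counter()
    for adj in molecules:
        for frag in split_into_fragments(adj):
            counts[key(adj, frag)] += 1
    return {k for k, c in counts.items() if c > threshold}
```

Consider two six-membered rings joined by a single bond, with a methyl group on one ring. The sketch cuts only the ring-to-ring bond. The methyl bond is kept because its terminal atom has degree 1.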

We apply the above procedure to all the molecules in the training set and construct a vocabulary of motifs $V_S$. The following section describes how we encode and decode molecules using the extracted motifs.

3 Hierarchical Graph Generation

Figure 2: Hierarchical graph encoder. Dashed arrows connect each atom to the motifs it belongs to. In the attachment layer, each node $\mathcal{A}_i$ is a particular attachment configuration of motif $\mathcal{S}_i$. The atoms in the intersection between each motif and its neighbors are highlighted in faded blocks.
Figure 3: Hierarchical graph decoder. In each step, the decoder first runs hierarchical message passing to compute motif, attachment and atom vectors. It then performs motif and attachment prediction for the next motif node. Finally, it decides how the new motif should be attached to the current graph via graph prediction.

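Our approach extends the variational autoencoder (Kingma & Welling, 2013) to molecular graphs by introducing a hierarchical decoder and a matching encoder. In our framework, the probability of a graph $\mathcal{G}$ is modeled as a joint distribution over the structural motifs $\mathcal{S}_1, \dots, \mathcal{S}_n$ constituting $\mathcal{G}$, together with their attachments $\mathcal{A}_1, \dots, \mathcal{A}_n$. Each attachment $\mathcal{A}_i$ indicates the intersecting atoms between $\mathcal{S}_i$ and its neighbor motifs. To capture the complex dependencies in the joint distribution of motifs and their attachments, we propose an auto-regressive factorization of $P(\mathcal{G})$:

$$P(\mathcal{G}) = \int_{\mathbf{z}} P(\mathbf{z}) \prod_{k} P(\mathcal{S}_k, \mathcal{A}_k \mid \mathcal{S}_{<k}, \mathcal{A}_{<k}, \mathbf{z}) \, d\mathbf{z}$$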


As illustrated in Figure 3, in each generation step, our decoder adds a new motif (motif prediction) and its attachment configuration (attachment prediction). Then it decides how the new motif should be attached to the current graph (graph prediction).

To support the above hierarchical generation, we need to design a matching encoder representing molecules at multiple resolutions in order to provide necessary information for each decoding step. Therefore, we propose to represent a molecule by a hierarchical graph with three layers (see Figure 2):

  1. Motif layer: This layer represents how the motifs are coarsely connected in the graph, and it provides essential information for motif prediction during decoding. Specifically, this layer contains $n$ nodes $\mathcal{S}_1, \dots, \mathcal{S}_n$ and an edge $(\mathcal{S}_i, \mathcal{S}_j)$ for every pair of intersecting motifs ($\mathcal{S}_i \cap \mathcal{S}_j \neq \emptyset$). This layer is tree-structured because of the way we construct motifs.

  2. Attachment layer: This layer encodes the connectivity between motifs at a fine-grained level. Each node $\mathcal{A}_i = (\mathcal{S}_i, \{v_j\})$ in this layer represents a particular attachment configuration of motif $\mathcal{S}_i$, where $\{v_j\}$ are the atoms in the intersection between $\mathcal{S}_i$ and one of its neighbor motifs (see Figure 2). This layer provides crucial information for the attachment prediction step during decoding and helps reduce the space of candidate attachments between $\mathcal{S}_i$ and its neighbor motifs. Just as motifs form the vocabulary $V_S$, the attachment configurations of $\mathcal{S}_i$ form a motif-specific vocabulary $\mathcal{A}(\mathcal{S}_i)$, which is computed from the training set.²

     ² In our experiments, both vocabularies remain compact. On the polymer dataset, for example, the motif vocabulary contains 436 motifs, and each motif has 5.24 attachment configurations on average.

  3. Atom layer: The atom layer is the molecular graph, representing how its atoms are connected. Each atom node $v$ is associated with a label $a_v$ indicating its atom type and charge. Each edge $(u, v)$ in the atom layer is labeled with $b_{uv}$ indicating its bond type. This layer provides the information needed for the graph prediction step during decoding.

We further introduce edges that connect atoms and motifs across layers so that information propagates between them. In particular, we draw a directed edge from atom $v$ in the atom layer to node $\mathcal{A}_i$ in the attachment layer if $v \in \mathcal{S}_i$. We also draw edges from $\mathcal{A}_i$ to $\mathcal{S}_i$ in the motif layer. This gives us the hierarchical graph $\mathcal{H}_{\mathcal{G}}$ for molecule $\mathcal{G}$, which is encoded by a hierarchical message passing network (MPN). During encoding, each node $\mathcal{S}_i$ is represented as a one-hot encoding in the motif vocabulary $V_S$. Likewise, each node $\mathcal{A}_i$ is represented as a one-hot encoding in the attachment vocabulary $\mathcal{A}(\mathcal{S}_i)$.
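The three layers and the cross-layer edges can be built from a list of motif atom sets plus the depth-first (parent, child) order of the motif tree. `HierGraph` and `build_hierarchy` are hypothetical names. The sketch assumes that the attachment atoms of $\mathcal{S}_i$ are all the atoms it shares with any neighbor. The edge labels follow the relative-ordering rule for $d_{ij}$ used by the attachment layer MPN in Section 3.1, with the value 0 assumed for the parent direction.

```python
from dataclasses import dataclass


@dataclass
class HierGraph:
    motif_edges: list      # (i, j): motifs S_i and S_j share at least one atom
    attach_atoms: dict     # i -> atoms of S_i shared with neighbor motifs (defines A_i)
    atom_to_attach: list   # (v, i): directed edge from atom v to attachment node A_i
    attach_to_motif: list  # (i, i): directed edge from A_i to motif node S_i
    edge_order: dict       # (i, j) -> d_ij, relative decoding order of node i w.r.t. j


def build_hierarchy(motifs, dfs_edges):
    """motifs: list of atom-index sets S_i; dfs_edges: (parent, child) pairs in DFS order."""
    n = len(motifs)
    # Motif layer: connect every pair of intersecting motifs.
    motif_edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                   if motifs[i] & motifs[j]]
    # Attachment layer: atoms that S_i shares with its neighbors.
    attach_atoms = {
        i: frozenset().union(*(motifs[i] & motifs[j] for j in range(n) if j != i))
        for i in range(n)
    }
    # Cross-layer edges: atom v -> A_i whenever v is in S_i, and A_i -> S_i.
    atom_to_attach = [(v, i) for i, s in enumerate(motifs) for v in sorted(s)]
    attach_to_motif = [(i, i) for i in range(n)]
    # Edge labels: child gets k (k-th child of its parent), parent direction gets 0.
    edge_order, n_children = {}, [0] * n
    for parent, child in dfs_edges:
        n_children[parent] += 1
        edge_order[(child, parent)] = n_children[parent]
        edge_order[(parent, child)] = 0  # assumed label for the parent direction
    return HierGraph(motif_edges, attach_atoms, atom_to_attach, attach_to_motif, edge_order)
```

Because motifs overlap only at their shared atoms, the intersections that define the motif-layer edges also define each node's attachment atoms.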

3.1 Hierarchical Graph Encoder

Our encoder contains three MPNs, one for each of the three layers in the hierarchical graph. For simplicity, we denote the MPN encoding process as $\mathrm{MPN}_{\psi}(\cdot)$ with parameters $\psi$, and denote by $\mathrm{MLP}(\mathbf{x}, \mathbf{y})$ a multi-layer neural network whose input is the concatenation of $\mathbf{x}$ and $\mathbf{y}$. The details of the MPN architecture are given in the appendix.

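Atom Layer MPN We first encode the atom layer of $\mathcal{H}_{\mathcal{G}}$ (denoted $\mathcal{H}^{g}_{\mathcal{G}}$). The inputs to this MPN are the embedding vectors $\{\mathbf{e}(a_v)\}$ and $\{\mathbf{e}(b_{uv})\}$ of all the atoms and bonds in $\mathcal{G}$. During encoding, the network propagates message vectors between atoms for $T$ iterations and then outputs the atom representation $\mathbf{h}_v$ for each atom $v$:

$$\mathbf{c}^{g}_{\mathcal{G}} = \{\mathbf{h}_v\} = \mathrm{MPN}_{\psi_1}\big(\mathcal{H}^{g}_{\mathcal{G}}, \{\mathbf{e}(a_v)\}, \{\mathbf{e}(b_{uv})\}\big)$$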


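Attachment Layer MPN The input feature of each node $\mathcal{A}_i$ in the attachment layer combines its embedding $\mathbf{e}(\mathcal{A}_i)$ with the sum of its atom vectors $\{\mathbf{h}_v \mid v \in \mathcal{S}_i\}$:

$$\mathbf{f}_{\mathcal{A}_i} = \mathrm{MLP}\Big(\mathbf{e}(\mathcal{A}_i), \sum_{v \in \mathcal{S}_i} \mathbf{h}_v\Big)$$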


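The input feature for each edge $(\mathcal{A}_i, \mathcal{A}_j)$ in this layer is an embedding vector $\mathbf{e}(d_{ij})$, where $d_{ij}$ describes the relative ordering between nodes $\mathcal{A}_i$ and $\mathcal{A}_j$ during decoding. Specifically, we set $d_{ij} = k$ if node $\mathcal{A}_i$ is the $k$-th child of node $\mathcal{A}_j$, and $d_{ij} = 0$ if $\mathcal{A}_i$ is the parent. We then run $T$ iterations of message passing over the attachment layer $\mathcal{H}^{a}_{\mathcal{G}}$ to compute the attachment representations:

$$\mathbf{c}^{a}_{\mathcal{G}} = \{\mathbf{h}_{\mathcal{A}_i}\} = \mathrm{MPN}_{\psi_2}\big(\mathcal{H}^{a}_{\mathcal{G}}, \{\mathbf{f}_{\mathcal{A}_i}\}, \{\mathbf{e}(d_{ij})\}\big)$$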


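Motif Layer MPN Similarly, the input feature of node $\mathcal{S}_i$ in this layer is computed from its embedding $\mathbf{e}(\mathcal{S}_i)$ concatenated with the node vector $\mathbf{h}_{\mathcal{A}_i}$ from the previous layer. We then run message passing over the motif layer $\mathcal{H}^{s}_{\mathcal{G}}$ to obtain the motif representations:

$$\mathbf{f}_{\mathcal{S}_i} = \mathrm{MLP}\big(\mathbf{e}(\mathcal{S}_i), \mathbf{h}_{\mathcal{A}_i}\big), \qquad \mathbf{c}^{s}_{\mathcal{G}} = \{\mathbf{h}_{\mathcal{S}_i}\} = \mathrm{MPN}_{\psi_3}\big(\mathcal{H}^{s}_{\mathcal{G}}, \{\mathbf{f}_{\mathcal{S}_i}\}, \{\mathbf{e}(d_{ij})\}\big)$$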
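The three-level encoding pipeline can be sketched with NumPy. This is an illustrative stand-in, not the appendix architecture. Each MPN is a simple node-level update $\mathbf{h} \leftarrow \tanh(\mathbf{x} W_{\text{in}} + A \mathbf{h} W_{\text{msg}})$ with no edge features, so $\mathbf{e}(b_{uv})$ and $\mathbf{e}(d_{ij})$ are dropped. Each MLP is a single tanh layer on the concatenated inputs. The attachment and motif layers share the motif-tree adjacency. Parameter names such as `a_in` or `f_att` are hypothetical.

```python
import numpy as np


def mpn(x, adj, w_in, w_msg, steps=3):
    """Toy message passing: h <- tanh(x W_in + A h W_msg), repeated `steps` times."""
    h = np.zeros((x.shape[0], w_msg.shape[0]))
    for _ in range(steps):
        h = np.tanh(x @ w_in + adj @ h @ w_msg)
    return h


def mlp(a, b, w):
    """One-layer stand-in for MLP(a, b): a tanh layer on the concatenation [a; b]."""
    return np.tanh(np.concatenate([a, b], axis=1) @ w)


def init_params(d_atom, d_emb, d, rng):
    shapes = {"a_in": (d_atom, d), "a_msg": (d, d), "f_att": (d_emb + d, d),
              "t_in": (d, d), "t_msg": (d, d), "f_mot": (d_emb + d, d),
              "m_in": (d, d), "m_msg": (d, d)}
    return {k: 0.1 * rng.standard_normal(s) for k, s in shapes.items()}


def hier_encode(atom_x, atom_adj, membership, motif_adj, attach_emb, motif_emb, p):
    """Fine-to-coarse encoding: atoms -> attachment nodes -> motif nodes."""
    h_atoms = mpn(atom_x, atom_adj, p["a_in"], p["a_msg"])        # atom layer
    f_attach = mlp(attach_emb, membership @ h_atoms, p["f_att"])  # e(A_i) with sum of h_v
    h_attach = mpn(f_attach, motif_adj, p["t_in"], p["t_msg"])    # attachment layer
    f_motif = mlp(motif_emb, h_attach, p["f_mot"])                # e(S_i) with h_{A_i}
    h_motif = mpn(f_motif, motif_adj, p["m_in"], p["m_msg"])      # motif layer
    return h_atoms, h_attach, h_motif
```

The `membership` matrix has shape motifs × atoms, with a 1 wherever $v \in \mathcal{S}_i$. It implements the atom-to-attachment edges, so `membership @ h_atoms` is exactly the sum $\sum_{v \in \mathcal{S}_i} \mathbf{h}_v$.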


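Finally, we represent a molecule $\mathcal{G}$ by a latent vector $\mathbf{z}_{\mathcal{G}}$, sampled through the reparameterization trick with mean $\mu(\mathbf{h}_{\mathcal{S}_1})$ and log variance $\Sigma(\mathbf{h}_{\mathcal{S}_1})$:

$$\mathbf{z}_{\mathcal{G}} = \mu(\mathbf{h}_{\mathcal{S}_1}) + \exp\big(\tfrac{1}{2}\Sigma(\mathbf{h}_{\mathcal{S}_1})\big) \odot \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$

where $\mathcal{S}_1$ is the root motif (i.e., the first motif to be generated during reconstruction).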
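The reparameterization trick moves the randomness into an independent noise variable, which lets gradients flow through the mean and log variance. Below is a minimal pure-Python sketch. It assumes $\mu(\cdot)$ and $\Sigma(\cdot)$ have already been evaluated on $\mathbf{h}_{\mathcal{S}_1}$, and it treats $\Sigma$ as a per-dimension log variance, so the standard deviation is $\exp(\Sigma/2)$.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """z = mu + exp(log_var / 2) * eps with eps ~ N(0, I), element-wise.

    The noise eps does not depend on mu or log_var, so z is a differentiable
    function of both (in an autodiff framework).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
```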

3.2 Hierarchical Graph Decoder

As illustrated in Figure 3, our graph decoder generates a molecule by incrementally expanding its hierarchical graph. In generation step $t$, we first use the same hierarchical MPN architecture to encode all the motifs and atoms in $\mathcal{H}^{(t)}$, the (partial) hierarchical graph generated up to step $t$. This gives us motif vectors $\mathbf{h}_{\mathcal{S}_k}$ and atom vectors $\mathbf{h}_v$ for the existing motifs and atoms.

During decoding, the model maintains a set of frontier nodes $\mathcal{F}$, where each node $\mathcal{S}_k \in \mathcal{F}$ is a motif that still has neighbors to be generated. $\mathcal{F}$ is implemented as a stack because motifs are generated in depth-first order. Suppose $\mathcal{S}_k$ is at the top of the stack in step $t$. The model then makes the following predictions, conditioned on the latent representation $\mathbf{z}$:

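  1. Motif Prediction: The model predicts the next motif $\mathcal{S}_t$ to be attached to $\mathcal{S}_k$. This is cast as a classification task over the motif vocabulary $V_S$:

    $$p_{\mathcal{S}_t} = \mathrm{softmax}\big(\mathrm{MLP}(\mathbf{h}_{\mathcal{S}_k}, \mathbf{z})\big) \tag{8}$$

  2. Attachment Prediction: The model then predicts the attachment configuration $\mathcal{A}_t$ of motif $\mathcal{S}_t$ (i.e., which atoms belong to the intersection of $\mathcal{S}_t$ and its neighbor motifs). This is also cast as a classification task, here over the attachment vocabulary $\mathcal{A}(\mathcal{S}_t)$:

    $$p_{\mathcal{A}_t} = \mathrm{softmax}\big(\mathrm{MLP}(\mathbf{h}_{\mathcal{S}_k}, \mathbf{z})\big) \tag{9}$$

    where the softmax ranges over $\mathcal{A}(\mathcal{S}_t)$. This prediction step is crucial because it significantly reduces the space of possible attachments between $\mathcal{S}_t$ and its neighbor motifs.

  3. Graph Prediction: Finally, the model must decide how $\mathcal{S}_t$ should be attached to $\mathcal{S}_k$. The attachment between $\mathcal{S}_t$ and $\mathcal{S}_k$ is defined as a set of atom pairs $\mathcal{M}_{tk} = \{(u_j, v_j) \mid u_j \in \mathcal{A}_k, v_j \in \mathcal{A}_t\}$, where atoms $u_j$ and $v_j$ are attached together. The probability of a candidate attachment $M$ is computed from the atom vectors $\mathbf{h}_{u_j}$ and $\mathbf{h}_{v_j}$:

    $$p_M = \mathrm{softmax}\big(\mathbf{h}_M \cdot \mathbf{z}\big), \qquad \mathbf{h}_M = \sum_{j} \mathrm{MLP}(\mathbf{h}_{u_j}, \mathbf{h}_{v_j}) \tag{10}$$

    The number of possible attachments is limited because the number of attaching atoms between two motifs is small and the attaching points must be consecutive.³

    ³ In our experiments, the number of possible attachments is usually less than 20, for both polymers and small molecules.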
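The candidate set for graph prediction can be enumerated cheaply. This sketch encodes one possible reading of the "consecutive attaching points" constraint, not necessarily the authors' rule. It assumes the parent's available atoms are listed in ring order, the child's attaching atoms form an ordered path, and a candidate maps that path onto consecutive ring atoms in either direction. Scoring follows the form $p_M = \mathrm{softmax}(\mathbf{h}_M \cdot \mathbf{z})$ with $\mathbf{h}_M = \sum_j \mathrm{MLP}(\mathbf{h}_{u_j}, \mathbf{h}_{v_j})$. The learned MLP is replaced by the hypothetical stand-in $\mathbf{h}_u + \mathbf{h}_v$.

```python
import math


def candidate_attachments(parent_ring, child_atoms):
    """Enumerate ways to glue `child_atoms` onto consecutive atoms of `parent_ring`.

    parent_ring: parent-motif atoms in cyclic order.
    child_atoms: ordered attaching atoms of the child motif.
    Returns a list of tuples of (parent_atom, child_atom) pairs, without duplicates.
    """
    n, m = len(parent_ring), len(child_atoms)
    seen, out = set(), []
    for start in range(n):
        for step in (1, -1):  # walk around the ring in either direction
            run = [parent_ring[(start + step * j) % n] for j in range(m)]
            pairs = tuple(zip(run, child_atoms))
            if len(set(run)) == m and frozenset(pairs) not in seen:
                seen.add(frozenset(pairs))
                out.append(pairs)
    return out


def attachment_probs(candidates, h, z):
    """p_M = softmax(h_M . z), with h_M = sum_j (h_u + h_v) standing in for the MLP."""
    scores = []
    for pairs in candidates:
        h_m = [0.0] * len(z)
        for u, v in pairs:
            h_m = [a + b + c for a, b, c in zip(h_m, h[u], h[v])]
        scores.append(sum(a * b for a, b in zip(h_m, z)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For a 6-atom ring and a 2-atom attachment, this yields 12 candidates (6 ring bonds times 2 orientations). This shows why the candidate count stays small.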

Together, the three predictions above give an autoregressive factorization of the distribution over the next motif and its attachment. Each decoding step depends on the outcome of the previous one, and the predicted attachments in turn affect the prediction of subsequent motifs.
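The control flow of decoding, with the frontier stack and the three predictions per step, can be sketched as follows. The predictor callables are hypothetical stand-ins for the learned networks. Signalling "no more neighbors" by returning `None` from the motif predictor is an assumption of this sketch, because the text does not specify how the model decides to pop $\mathcal{S}_k$ from the stack.

```python
def decode(root_motif, predict_motif, predict_attachment, predict_graph, max_steps=50):
    """Depth-first, motif-by-motif decoding with a frontier stack.

    Hypothetical predictors standing in for motif, attachment and graph prediction:
      predict_motif(parent_node)        -> next motif label, or None if parent is done
      predict_attachment(motif)         -> attachment configuration for that motif
      predict_graph(parent, child)      -> how the child is glued onto the parent
    Returns the generated (motif, attachment) nodes and (parent, child, glue) links.
    """
    nodes = [(root_motif, predict_attachment(root_motif))]
    links, frontier = [], [0]  # the frontier stack holds indices into `nodes`
    for _ in range(max_steps):
        if not frontier:
            break
        k = frontier[-1]  # S_k: the motif at the top of the stack
        motif = predict_motif(nodes[k])
        if motif is None:  # S_k has no more neighbors to generate
            frontier.pop()
            continue
        attach = predict_attachment(motif)
        nodes.append((motif, attach))
        t = len(nodes) - 1
        links.append((k, t, predict_graph(nodes[k], nodes[t])))
        frontier.append(t)  # depth-first: expand the newest motif next
    return nodes, links
```

Pushing each new motif onto the stack is what makes the generation order depth-first.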

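Training During training, we apply teacher forcing to the above generation process, with the generation order determined by a depth-first traversal over the ground-truth molecule. Given a training set of molecules, we seek to minimize the negative ELBO:

$$\min_{\phi, \theta} \sum_{\mathcal{G}} \Big[ -\mathbb{E}_{Q_\phi(\mathbf{z} \mid \mathcal{G})}\big[\log P_\theta(\mathcal{G} \mid \mathbf{z})\big] + D_{\mathrm{KL}}\big(Q_\phi(\mathbf{z} \mid \mathcal{G}) \,\|\, P(\mathbf{z})\big) \Big] \tag{12}$$

where $Q_\phi$ is the hierarchical encoder (approximate posterior), $P_\theta$ is the hierarchical decoder, and $P(\mathbf{z})$ is the prior over latent vectors.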
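The teacher-forced objective can be sketched as a one-sample estimate. The sketch makes several assumptions: the prior is a standard Gaussian, the posterior is a diagonal Gaussian (matching the reparameterization with mean and log variance), and a single sample of $\mathbf{z}$ is used. `step_probs` is a hypothetical input holding, for each step, the probabilities the decoder assigned to the ground-truth motif, attachment and graph choices. `dfs_order` is a hypothetical helper that produces the depth-first teacher-forcing order over a ground-truth motif tree.

```python
import math


def dfs_order(children, root):
    """(parent, child) steps of a depth-first traversal of the ground-truth motif tree."""
    steps = []

    def visit(node):
        for child in children.get(node, []):
            steps.append((node, child))
            visit(child)

    visit(root)
    return steps


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) = 0.5 * sum(exp(s) + mu^2 - 1 - s)."""
    return 0.5 * sum(math.exp(s) + m * m - 1.0 - s for m, s in zip(mu, log_var))


def negative_elbo(step_probs, mu, log_var):
    """Reconstruction NLL over all teacher-forced predictions plus the KL term."""
    nll = -sum(math.log(p) for step in step_probs for p in step)
    return nll + gaussian_kl(mu, log_var)
```

Under teacher forcing, every step conditions on the ground-truth prefix. The reconstruction term is therefore just a sum of independent log-probabilities, one per prediction.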


3.3 Extension to Graph-to-Graph Translation

The proposed architecture extends naturally to graph-to-graph translation (Jin et al., 2019) for molecular optimization, which seeks to modify compounds to improve their biochemical properties. Given a corpus of molecular pairs $\{(X, Y)\}$, where $Y$ is a structural analog of $X$ with better chemical properties, the model is trained to translate an input molecular graph $X$ into its better form $Y$. In this case, we seek to learn a translation model $P(Y \mid X)$ parameterized by our encoder-decoder architecture. We also introduce attention layers into our model, which are crucial for translation performance (Bahdanau et al., 2014).

Training In graph translation, a compound $X$ can be associated with multiple outputs $Y$, since there are many ways to modify $X$ to improve its properties. To generate diverse outputs, we follow previous work (Zhu et al., 2017; Jin et al., 2019) and incorporate latent variables $\mathbf{z}$ into the translation model:

$$P(Y \mid X) = \int_{\mathbf{z}} P(Y \mid X, \mathbf{z}) \, P(\mathbf{z}) \, d\mathbf{z}$$

where the latent vector $\mathbf{z}$ indicates the intended mode of translation and is sampled from the prior $P(\mathbf{z})$ during testing.

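The model is trained as a conditional variational autoencoder. Given a training example $(X, Y)$, we sample $\mathbf{z}$ from the approximate posterior $Q(\mathbf{z} \mid X, Y)$. To compute $Q(\mathbf{z} \mid X, Y)$, we first encode $X$ and $Y$ into their representations $\mathbf{c}_X$ and $\mathbf{c}_Y$. We then compute difference vectors that summarize the structural changes from molecule $X$ to $Y$ at both the atom and the motif level:

$$\boldsymbol{\delta}^{g}_{X,Y} = \sum \mathbf{c}^{g}_{Y} - \sum \mathbf{c}^{g}_{X}, \qquad \boldsymbol{\delta}^{s}_{X,Y} = \sum \mathbf{c}^{s}_{Y} - \sum \mathbf{c}^{s}_{X}$$

Here the superscripts $g$ and $s$ denote atom-level and motif-level vectors, and each sum runs over all vectors of that level. Finally, we compute the mean $\mu_{X,Y} = \mu(\boldsymbol{\delta}^{g}_{X,Y}, \boldsymbol{\delta}^{s}_{X,Y})$ and the log variance $\Sigma_{X,Y} = \Sigma(\boldsymbol{\delta}^{g}_{X,Y}, \boldsymbol{\delta}^{s}_{X,Y})$ of $Q(\mathbf{z} \mid X, Y)$ and sample $\mathbf{z}$ using the reparameterization trick. The latent code $\mathbf{z}$ is passed to the decoder along with the input representation $\mathbf{c}_X$ to reconstruct the output $Y$. The training objective is to minimize the negative ELBO, as in Eq. (12).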
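The posterior computation for translation can be sketched with NumPy. The sums run over all atom-level and motif-level vectors of each molecule. A single bias-free linear map per output stands in for the learned networks that produce $\mu_{X,Y}$ and $\Sigma_{X,Y}$; this is an assumption, and the weight names are hypothetical.

```python
import numpy as np


def translation_posterior(cx_atoms, cx_motifs, cy_atoms, cy_motifs, w_mu, w_logvar, rng):
    """Sample z ~ Q(z | X, Y) from atom- and motif-level difference vectors.

    delta^g = sum of Y's atom vectors - sum of X's atom vectors
    delta^s = sum of Y's motif vectors - sum of X's motif vectors
    """
    delta = np.concatenate([cy_atoms.sum(axis=0) - cx_atoms.sum(axis=0),
                            cy_motifs.sum(axis=0) - cx_motifs.sum(axis=0)])
    mu, log_var = delta @ w_mu, delta @ w_logvar
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    return z, mu, log_var
```

In this bias-free sketch, identical inputs $X = Y$ make both difference vectors vanish, so the posterior collapses to a zero mean with unit variance.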

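Attention For graph translation, our hierarchical encoder embeds the input molecule $X$ into a set of vectors $\mathbf{c}_X = \mathbf{c}^{s}_X \cup \mathbf{c}^{a}_X \cup \mathbf{c}^{g}_X$, representing the molecule at multiple resolutions (motif, attachment and atom level). These vectors are fed into the decoder through attention mechanisms (Luong et al., 2015). Specifically, we modify the motif prediction (Eq. 8) into

$$p_{\mathcal{S}_t} = \mathrm{softmax}\big(\mathrm{MLP}(\mathbf{h}_{\mathcal{S}_k}, \alpha(\mathbf{h}_{\mathcal{S}_k}, \mathbf{c}^{s}_X), \mathbf{z})\big)$$

where $\alpha(\mathbf{h}_{\mathcal{S}_k}, \mathbf{c}^{s}_X)$ is a bilinear attention over the vectors $\mathbf{c}^{s}_X$ with query vector $\mathbf{h}_{\mathcal{S}_k}$. The attachment prediction (Eq. 9) is modified similarly, with its attention over $\mathbf{c}^{a}_X$. The graph prediction (Eq. 10) is modified into

$$p_M = \mathrm{softmax}\big(\mathbf{h}_M \cdot \mathrm{MLP}(\alpha(\mathbf{h}_M, \mathbf{c}^{g}_X), \mathbf{z})\big)$$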
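The sketch below shows a bilinear attention $\alpha(\mathbf{q}, C) = \sum_i \mathrm{softmax}_i(\mathbf{q}^\top W \mathbf{c}_i)\,\mathbf{c}_i$ and its use in the modified motif prediction. It uses the standard bilinear ("general") score of Luong et al. (2015) and is not the authors' exact implementation. The weight matrix `w_att` and the one-layer output map `w_out`, which stands in for the MLP, are hypothetical.

```python
import numpy as np


def bilinear_attention(query, keys, w):
    """alpha(q, C) = sum_i softmax_i(q^T W c_i) c_i over the rows c_i of `keys`."""
    scores = keys @ (w.T @ query)          # q^T W c_i for every row c_i
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights @ keys, weights


def motif_probs_with_attention(h_sk, cx_motifs, z, w_att, w_out):
    """Motif prediction with attention over the input's motif vectors c_X^s."""
    ctx, _ = bilinear_attention(h_sk, cx_motifs, w_att)
    x = np.concatenate([h_sk, ctx, z])     # input to the (one-layer) MLP
    logits = w_out @ x
    p = np.exp(logits - logits.max())
    return p / p.sum()
```

With `w_att` set to zero, every input motif gets equal weight and the context vector is just the mean of $\mathbf{c}^{s}_X$. Learning $W$ lets the decoder focus on the input motifs relevant to $\mathcal{S}_k$.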


4 Experiments

We evaluate our method on two application tasks. The first is polymer generative modeling. This experiment validates our argument in Section 2 that our model is advantageous when the molecules are large. The second is graph-to-graph translation for small molecules, where we show that the proposed architecture also benefits small molecules compared with previous state-of-the-art graph generation methods.

4.1 Polymer Generative Modeling

Method Reconstruction / Sample Quality (↑) Property Statistics (↓) Structural Statistics (↑)
Recon. Valid Unique Div. logP SA QED MW SNN Frag. Scaf.
Real data - 100% 100% 0.823 0.094 6.7e-5 1.7e-5 82.3 0.706 0.995 0.462
SMILES 21.5% 93.1% 97.3% 0.821 1.471 0.011 5.4e-4 4963 0.704 0.981 0.385
CG-VAE 42.4% 100% 96.2% 0.879 3.958 2.600 0.0030 3944 0.204 0.372 0.001
JT-VAE 58.5% 100% 94.1% 0.864 2.645 0.157 0.0075 10867 0.522 0.925 0.297
HierVAE 79.9% 100% 97.0% 0.817 0.525 0.007 5.7e-4 1928 0.708 0.984 0.390
Small motif 71.0% 100% 97.2% 0.835 0.872 0.042 0.0019 5320 0.575 0.949 0.191
Table 1: Results on polymer generative modeling. The first row reports the oracle performance obtained by using real data as the generated samples. The last row (Small motif) is a variant of our model in which the motif vocabulary is restricted to single rings and bonds (similar to JT-VAE). “Recon.” means reconstruction accuracy; “Div.” means diversity; “SNN” means nearest neighbor similarity; “Frag.” and “Scaf.” mean fragment and scaffold similarity. Property statistics are Frechet distances to the test set (lower is better); for all other metrics, higher is better.

Dataset Our method is evaluated on the polymer dataset from St. John et al. (2019), which contains 86K polymers in total (after removing duplicates). The dataset is split into 76K, 5K and 5K molecules for training, validation and testing. Our motif extraction yields 436 different motifs (examples in Figure 4). On average, each motif has 5.24 different attachment configurations. Figure 5 reports the distribution of motif sizes and their frequencies.

Evaluation Metrics Our evaluation measures various aspects of molecule generation proposed in Kusner et al. (2017) and Polykovskiy et al. (2018). Besides basic metrics such as chemical validity and diversity, we compare distributional statistics between generated and real compounds. A good generative model should generate molecules whose aggregate statistics are similar to those of real compounds. Our metrics include the following (details in the appendix):


  • Reconstruction accuracy: We measure how often the model can completely reconstruct a given molecule from its latent embedding $\mathbf{z}$. Reconstruction accuracy is computed over the 5K compounds in the test set.

  • Validity: Percentage of chemically valid compounds.

  • Uniqueness: Percentage of unique compounds.

  • Diversity: We compute the pairwise molecular distance among generated compounds. The molecular distance is defined as the Tanimoto distance over Morgan fingerprints (Rogers & Hahn, 2010) of two molecules.

  • Property statistics: We compare property statistics between generated molecules and real data. Our properties include partition coefficient (logP), synthetic accessibility (SA), drug-likeness (QED) and molecular weight (MW). To quantitatively evaluate the distance between two distributions, we compute Frechet distance between property distributions of molecules in the generated and test sets (Polykovskiy et al., 2018).

  • Structural statistics: We also compare structural statistics between generated molecules and real data. Nearest neighbor similarity (SNN) is the average similarity of generated molecules to their nearest molecule in the test set. Fragment similarity (Frag) and scaffold similarity (Scaf) are cosine similarities between the vectors of fragment or scaffold frequencies of the generated set and the test set.
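The sketch below computes these metrics once molecules have been featurized, under simplifying assumptions. Fingerprints are given as sets of on-bits; in practice these would be Morgan fingerprints from a cheminformatics toolkit. Fragment and scaffold counts are given as dictionaries. The Frechet distance for a property is computed between 1-D Gaussians fitted to the two samples, where it reduces to $(\mu_x-\mu_y)^2 + (\sigma_x-\sigma_y)^2$.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def diversity(fps):
    """Mean pairwise Tanimoto distance (1 - similarity); needs at least two fingerprints."""
    pairs = [(i, j) for i in range(len(fps)) for j in range(i + 1, len(fps))]
    return sum(1.0 - tanimoto(fps[i], fps[j]) for i, j in pairs) / len(pairs)


def snn(generated, reference):
    """Average similarity of each generated molecule to its nearest reference molecule."""
    return sum(max(tanimoto(g, r) for r in reference) for g in generated) / len(generated)


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity of two fragment (or scaffold) frequency dictionaries."""
    keys = set(counts_a) | set(counts_b)
    dot = sum(counts_a.get(k, 0) * counts_b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b)


def frechet_1d(xs, ys):
    """Frechet distance between 1-D Gaussians fitted to two property samples."""
    def stats(v):
        m = sum(v) / len(v)
        return m, math.sqrt(sum((t - m) ** 2 for t in v) / len(v))

    (mx, sx), (my, sy) = stats(xs), stats(ys)
    return (mx - my) ** 2 + (sx - sy) ** 2
```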

Figure 4: Examples of motif structures used by our model. These motifs consist of multiple rings and bonds and are substantially more complex than the substructures used in previous methods (Jin et al., 2018).

Baselines We compare our method against three state-of-the-art variational autoencoders for molecular graphs. SMILES VAE (Gómez-Bombarelli et al., 2018) is a sequence-to-sequence VAE that generates molecules as SMILES strings (Weininger, 1988). CG-VAE (Liu et al., 2018) is a graph-based VAE that generates molecules atom by atom. JT-VAE (Jin et al., 2018) is also a graph-based VAE, but it generates molecules from simple substructures restricted to rings and bonds. Finally, we report the oracle performance on distributional statistics obtained by using real molecules from the training set as the generated samples.

4.1.1 Results

Figure 5: Left: Histogram of motif frequencies with respect to their sizes (i.e., number of atoms). Right: Training speed comparison between our method and baselines (on the same hardware).

Table 1 summarizes the performance of the different methods. Our method (HierVAE) significantly outperforms all previous methods in reconstruction accuracy (79.9% vs 58.5% for JT-VAE). This validates the advantage of using large structural motifs, which reduce the number of generation steps. On distributional statistics, our method achieves state-of-the-art results on logP (0.525 vs 1.471 for SMILES VAE), the molecular weight Frechet distance (1928 vs 3944 for CG-VAE) and all the structural similarity metrics. Because our model requires fewer generation steps, it also trains much faster than the other graph-based methods (see Figure 5).

Ablation Study To validate the importance of large structural motifs, we further experiment with a variant of our model (Small motif). It keeps the same architecture but replaces the large structural motifs with basic substructures such as rings and bonds (fewer than ten atoms). As shown in Table 1, this variant performs significantly worse than the full model (e.g., 71.0% vs 79.9% reconstruction accuracy), even though it builds on the same hierarchical architecture.

4.2 Graph-to-Graph Translation

Method logP () logP () Drug likeness DRD2
Improvement Diversity Improvement Diversity Success Diversity Success Diversity
JT-VAE - - 8.8% - 3.4% -
CG-VAE - - 4.8% - 2.3% -
GCPN - - 9.4% 0.216 4.4% 0.152
MMPA 0.329 0.496 32.9% 0.236 46.4% 0.275
Seq2Seq 0.331 0.471 58.5% 0.331 75.9% 0.176
JTNN 0.333 0.480 59.9% 0.373 77.8% 0.156
AtomG2G 0.379 3.98 1.54 0.563 73.6% 0.421 75.8% 0.128
HierG2G 2.49 1.09 0.381 3.98 1.46 0.564 76.9% 0.477 85.9% 0.192
Table 2: Results on graph translation tasks from Jin et al. (2019). We report average improvement for continuous properties (logP), and success rate for binary properties (e.g., DRD2).
Method QED DRD2
HierG2G 76.9% 85.9%
atom-based decoder 76.1% 75.0%
two-layer encoder 75.8% 83.5%
one-layer encoder 67.8% 74.1%
Table 3: Ablation study: the importance of hierarchical graph encoding, LSTM MPN architecture and structure-based decoding.

We follow the experimental design of Jin et al. (2019) and evaluate our model on their graph-to-graph translation tasks. Following their setup, we require the molecular similarity between the input $X$ and the output $Y$ to exceed a threshold $\delta$ at test time. This prevents the model from ignoring the input and translating it into an arbitrary compound. Here the molecular similarity $\mathrm{sim}(X, Y)$ is defined as the Tanimoto similarity over Morgan fingerprints.

Dataset The dataset consists of four property optimization tasks. In each task, we train and evaluate our model on their provided training and test sets.


  • LogP: The penalized logP score (Kusner et al., 2017) measures the solubility and synthetic accessibility of a compound. In this task, the model needs to translate an input $X$ into an output $Y$ such that $\mathrm{logP}(Y) > \mathrm{logP}(X)$. We experiment with two similarity thresholds $\delta$.

  • QED: The QED score (Bickerton et al., 2012) quantifies a compound’s drug-likeness. In this task, the model is required to translate molecules with QED scores in a lower range into a higher range, subject to the similarity constraint $\mathrm{sim}(X, Y) \geq \delta$.

  • DRD2: This task involves optimizing a compound’s biological activity against the dopamine type 2 receptor (DRD2). The model needs to translate inactive compounds into active compounds, where bioactivity is assessed by a property prediction model from Olivecrona et al. (2017). A similarity constraint $\mathrm{sim}(X, Y) \geq \delta$ also applies.

Evaluation Metrics Our evaluation metrics are translation accuracy and diversity. Each test molecule $X$ is translated $K$ times with different latent codes sampled from the prior distribution. On the logP task, we take as the final translation of $X$ the compound $Y$ that gives the highest property improvement while satisfying $\mathrm{sim}(X, Y) \geq \delta$. We then report the average property improvement $\frac{1}{|\mathcal{D}|} \sum_{X \in \mathcal{D}} \big(\mathrm{logP}(Y) - \mathrm{logP}(X)\big)$ over the test set $\mathcal{D}$. For the other tasks, we report the translation success rate. A compound is successfully translated if one of its translation candidates satisfies all the similarity and property constraints of the task. To measure diversity, we compute, for each molecule, the average pairwise Tanimoto distance between all its successfully translated compounds.
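The selection and aggregation rules can be sketched as follows. `sim`, `logp` and `succeeded` are hypothetical callables, for example Tanimoto similarity over fingerprints, a penalized logP scorer, and the task's property check. This sketch makes three further assumptions: a source with no valid candidate counts as zero improvement, successful outputs are de-duplicated, and molecules with fewer than two successful translations are skipped in the diversity average.

```python
def logp_improvement(sources, candidates, sim, logp, delta):
    """Average over X of the best logP gain among candidates with sim(X, Y) >= delta."""
    total = 0.0
    for x, ys in zip(sources, candidates):
        gains = [logp(y) - logp(x) for y in ys if sim(x, y) >= delta]
        total += max(gains, default=0.0)  # assumption: no valid candidate -> no gain
    return total / len(sources)


def success_and_diversity(sources, candidates, sim, succeeded, delta):
    """Success rate and mean pairwise (1 - sim) distance among successful outputs."""
    n_success, divs = 0, []
    for x, ys in zip(sources, candidates):
        # Unique candidates that satisfy both the similarity and property constraints.
        good = list(dict.fromkeys(y for y in ys if sim(x, y) >= delta and succeeded(y)))
        if good:
            n_success += 1
        if len(good) >= 2:
            pairs = [(a, b) for i, a in enumerate(good) for b in good[i + 1:]]
            divs.append(sum(1.0 - sim(a, b) for a, b in pairs) / len(pairs))
    diversity = sum(divs) / len(divs) if divs else 0.0
    return n_success / len(sources), diversity
```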

Baselines We compare our method against baselines including GCPN (You et al., 2018a), MMPA (Dalke et al., 2018) and the translation-based methods Seq2Seq and JTNN (Jin et al., 2019). Seq2Seq is a sequence-to-sequence model that generates molecules as SMILES strings. JTNN is a graph-to-graph architecture that generates molecules structure by structure, but its decoder is not fully autoregressive.

To allow a direct comparison between our method and atom-based generation, we further developed an atom-based translation model (AtomG2G) as a baseline. It makes three predictions in each generation step. First, it predicts whether decoding has completed (no more new atoms). If not, it creates a new atom $a_t$ and predicts its atom type. Finally, it predicts the bond types between $a_t$ and the other atoms autoregressively, to fully capture edge dependencies (You et al., 2018b). The encoder of AtomG2G encodes only the atom-layer graph, and the decoder attention sees only the atom vectors $\mathbf{c}^{g}_X$. All translation models are trained under the same variational objective. Details of the baseline architectures are in the appendix.
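AtomG2G's per-step decisions can be sketched as a loop. The three predictor callables are hypothetical placeholders for the learned networks, and the `max_atoms` cap is an added safeguard. The loop preserves the order of decisions: a stop check, then the atom type, then bonds to all earlier atoms one at a time.

```python
def atom_by_atom_decode(predict_stop, predict_atom_type, predict_bond, max_atoms=100):
    """Sketch of AtomG2G's three predictions per generation step.

    Hypothetical predictors:
      predict_stop(atoms, bonds)          -> True once decoding is complete
      predict_atom_type(atoms, bonds)     -> label of the new atom a_t
      predict_bond(atoms, bonds, t, j)    -> bond type between a_t and atom j, or None
    """
    atoms, bonds = [], {}
    while len(atoms) < max_atoms and not predict_stop(atoms, bonds):
        t = len(atoms)
        atoms.append(predict_atom_type(atoms, bonds))
        # Bonds to earlier atoms are predicted one at a time, each seeing the bonds
        # decided so far, so edge dependencies are modeled autoregressively.
        for j in range(t):
            b = predict_bond(atoms, bonds, t, j)
            if b is not None:
                bonds[(j, t)] = b
    return atoms, bonds
```

Each new atom triggers up to $t$ bond predictions. This is why atom-based decoding needs far more steps than motif-based decoding on large molecules.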

4.2.1 Results

As shown in Table 2, our model (HierG2G) achieves new state-of-the-art results on all four translation tasks. In particular, it significantly outperforms JTNN in both translation accuracy (e.g., 76.9% versus 59.9% on the QED task) and output diversity (e.g., 0.564 versus 0.480 on the logP task). Both methods generate molecules structure by structure, but our decoder is fully autoregressive, which allows it to learn more expressive mappings. In addition, our model runs 6.3 times faster than JTNN during decoding. Our model also outperforms AtomG2G on three of the four tasks, with an improvement of over 10% on DRD2. This shows the advantage of our hierarchical model.

Ablation Study To understand the importance of different architecture choices, we report ablation studies on the QED and DRD2 tasks in Table 3. First, we replace our hierarchical decoder with the atom-based decoder of AtomG2G to measure how much motif-based decoding helps. We keep the same hierarchical encoder but modify the input of the decoder attention to include both atom and motif vectors. With this setup, performance decreases by 0.8% and 10.9% on the two tasks. We suspect that the DRD2 task benefits more from motif-based decoding because binding to a biological target often depends on the presence of specific functional groups.

Our second experiment reduces the number of hierarchies in our encoder and decoder MPN while keeping the same hierarchical decoding process. When the top motif layer is removed (two-layer encoder), translation accuracy drops slightly, by 1.1% and 2.4%. When we further remove the attachment layer (one-layer encoder), performance degrades significantly on both datasets, to 67.8% and 74.1%. This is because all the motif information is lost, and the model must infer what the motifs are and how the motif layers are constructed for each molecule. This shows the importance of the hierarchical representation.

5 Related Work

Graph Generation Previous work has adopted various approaches for generating molecular graphs. Gómez-Bombarelli et al. (2018); Segler et al. (2017); Kusner et al. (2017); Dai et al. (2018); Guimaraes et al. (2017); Olivecrona et al. (2017); Popova et al. (2018); Kang & Cho (2018) generated molecules as SMILES strings (Weininger, 1988). Simonovsky & Komodakis (2018); De Cao & Kipf (2018); Ma et al. (2018) developed generative models that output the adjacency matrices and node labels of the graphs at once. You et al. (2018b); Li et al. (2018); Samanta et al. (2018); Liu et al. (2018); Zhou et al. (2018) proposed generative models that decode molecules sequentially, node by node. Seff et al. (2019) developed an edit-based model that generates molecules through insertions and deletions.

Our model is closely related to Liao et al. (2019), which generates graphs one block of nodes and edges at a time. While their encoder operates on the original graphs, our encoder operates on multiple hierarchies and learns multi-resolution representations of the input graphs. Our work is also closely related to Jin et al. (2018, 2019), which generate molecules from substructures. Their decoder first generates a junction tree with substructures as nodes and then predicts how the substructures should be attached to each other. Because this attachment process involves combinatorial enumeration, their model cannot scale to substructures more complex than simple rings and bonds. In contrast, our model allows motifs to have flexible structures.

Graph Encoders Graph neural networks have been extensively studied for graph encoding (Scarselli et al., 2009; Bruna et al., 2013; Li et al., 2015; Niepert et al., 2016; Kipf & Welling, 2017; Hamilton et al., 2017; Lei et al., 2017; Velickovic et al., 2017; Xu et al., 2018). Our method is related to graph encoders for molecules (Duvenaud et al., 2015; Kearnes et al., 2016; Dai et al., 2016; Gilmer et al., 2017; Schütt et al., 2017). Unlike these approaches, our method represents molecules as hierarchical graphs spanning from atom-level to motif-level graphs.

Our work is most closely related to methods that learn to represent graphs hierarchically (Defferrard et al., 2016; Ying et al., 2018; Gao & Ji, 2019). In particular, Defferrard et al. (2016) used graph coarsening algorithms to construct multiple layers of graph hierarchy, while Ying et al. (2018) and Gao & Ji (2019) proposed to learn the graph hierarchy jointly with the encoding process. Despite their differences, all of these methods learn the hierarchy for regression or classification tasks. In contrast, our hierarchy is constructed for efficient graph generation.

6 Conclusion

In this paper, we developed a hierarchical encoder-decoder architecture generating molecular graphs using structural motifs as building blocks. The experimental results show our model outperforms prior atom and substructure based methods in both small molecule and polymer domains.