ConfVAE (ICML 2021)
Predicting molecular conformations (or 3D structures) from molecular graphs is a fundamental problem in many applications. Most existing approaches are divided into two steps: first predicting the distances between atoms, and then generating a 3D structure by optimizing a distance geometry problem. However, the distances predicted by such two-stage approaches may not consistently preserve the geometry of local atomic neighborhoods, making the generated structures unsatisfactory. In this paper, we propose an end-to-end solution for molecular conformation prediction called ConfVAE, based on the conditional variational autoencoder framework. Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our proposed approach over existing state-of-the-art approaches.
Recently, we have witnessed much success of deep learning for molecular modeling in a variety of applications, ranging from molecular property prediction
(Gilmer et al., 2017) and molecule generation (You et al., 2018; Shi et al., 2020b) to retrosynthesis planning (Shi et al., 2020a). In these applications, molecules are generally represented as graphs, with atoms as nodes and covalent chemical bonds as edges. Although this representation is empirically effective, molecules are in reality better described by their 3D structures (also known as conformations), in which each atom is characterized by 3D Cartesian coordinates. Such 3D structures are more intrinsic and informative, determining many chemical and biological properties such as chemical sensing and therapeutic interactions with proteins. However, determining 3D structures experimentally is challenging and costly, so effectively predicting valid and low-energy conformations has long been an important and active topic in computational chemistry. Traditional computational approaches are typically based on Markov chain Monte Carlo (MCMC) or molecular dynamics (MD)
(De Vivo et al., 2016) to propose conformations, combined with simulations that assign energies through cheap empirical potentials or expensive quantum chemical calculations (Ballard et al., 2015). Recently, there has been growing interest in machine learning approaches (Mansimov et al., 2019; Simm and Hernández-Lobato, 2020; Xu et al., 2021) that model the conditional distribution of stable conformations given the molecular graph, trained on a collection of molecules with available stable conformations. Specifically, two recent works (Simm and Hernández-Lobato, 2020; Xu et al., 2021) propose to first predict the distances between atoms and then generate molecular conformations from the predicted distances by solving a distance geometry problem (Liberti et al., 2014). Such distance-geometry-based approaches naturally account for the rotational and translational invariance of molecular conformations and have achieved very promising performance. However, these two-stage approaches, which predict distances and conformations separately, still have a significant limitation: the predicted distances might not properly preserve the 3D geometry of local atomic neighborhoods. Some invalid combinations of distances can be assigned high likelihood by the distance prediction model, and the errors in these distances can be significantly exaggerated by the distance geometry program in the second stage, yielding unrealistic 3D structures. This is not surprising, since the distance prediction model is trained by maximizing a factorized likelihood over distances, while the end goal is to predict valid and stable conformations. We propose to address this issue with an end-to-end solution that directly predicts the conformation given the molecular graph. Indeed, in the related problem of predicting the 3D structures of proteins (
a.k.a. protein structure prediction) from amino-acid sequences, the recent success of the AlphaFold2 algorithm demonstrates the importance and effectiveness of an end-to-end solution compared to the earlier AlphaFold algorithm (though exact details of AlphaFold2 are still unpublished) (Senior et al., 2020a; Jumper et al., 2020). In this paper, we propose such an end-to-end solution, called ConfVAE, for molecular conformation generation based on bilevel programming. To model the rotational and translational invariance of conformations, we still use the pairwise distances between atoms as intermediate variables. However, instead of learning to predict distances by minimizing errors in distance space, we formulate the whole problem as a bilevel program (Franceschi et al., 2018) in which the distance prediction problem and the distance geometry problem for conformation generation are optimized simultaneously. The framework is built on the conditional variational autoencoder (VAE) (Kingma and Welling, 2013): the molecular graph is first encoded into the VAE latent space, and conformations are generated from the latent variable together with the molecular graph. During training, we iteratively sample a set of distances from the distance prediction model, generate the 3D structures by minimizing an inner objective (defined by the distance geometry problem), and then update the distance prediction model by optimizing the outer objective, i.e., a likelihood defined directly on the conformations.
To the best of our knowledge, ConfVAE is the first method for molecular conformation generation that can be trained end-to-end while preserving rotational and translational invariance. Extensive experiments demonstrate the superior performance of the proposed method over existing state-of-the-art approaches on several widely used benchmarks, including conformation generation and distance distribution modeling. We also verify that the end-to-end objective is of vital importance for generating realistic and meaningful conformations.
Notations. Following existing work (Simm and Hernández-Lobato, 2020; Xu et al., 2021), each molecule is represented as an attributed atom-bond graph G = (V, E), where V is the set of vertices representing atoms and E is the set of edges representing inter-atomic bonds. Each node v in V carries the chosen atomic features such as element type. Each edge e_uv in E describes the chemical bond connecting atoms u and v, and is labeled with its bond type. Since the distances along bonds in the molecular graph are not sufficient to determine a unique conformation (e.g. due to so-called internal rotations around bond axes), we adopt the common preprocessing of existing works (Simm and Hernández-Lobato, 2020; Xu et al., 2021) and expand the graph with auxiliary edges, which impose multi-hop distance constraints that eliminate some ambiguities in the 3D conformation, as elaborated in Appendix A.
For the geometry R, each atom v in V is represented by a 3D coordinate vector r_v ∈ R^3, and the full set of positions is collected in the matrix R ∈ R^{|V|×3}. Let d_uv = ||r_u − r_v|| denote the Euclidean distance between the u-th and v-th atoms; all distances between connected nodes can then be summarized as a vector d ∈ R^{|E|}.

Problem Definition. Molecular conformation generation is a conditional generation problem, where the goal is to model the conditional distribution of conformations R given the graph G, i.e., p(R|G).
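As a concrete illustration of this notation, the distance vector d can be assembled from a coordinate matrix R and an (expanded) edge set in a few lines. This is a minimal sketch; the function name and the toy 4-atom molecule are illustrative, not from the paper.

```python
import math

def edge_distances(R, edges):
    """Assemble the distance vector d from 3D coordinates.

    R: list of (x, y, z) atom positions; edges: list of (u, v) index pairs.
    Returns the Euclidean distance d_uv for every edge, in edge order.
    """
    return [math.dist(R[u], R[v]) for (u, v) in edges]

# A toy 4-atom chain: three bonded pairs plus one auxiliary 2-hop edge.
R = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
d = edge_distances(R, edges)
```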
Bilevel programs are defined as optimization problems in which a set of variables involved in the (outer) objective function is obtained by solving another (inner) optimization problem (Colson et al., 2007). Formally, given the outer objective F and the inner objective f, with corresponding outer and inner variables θ and w, a bilevel program can be formulated as
min_θ F(w_θ, θ)   such that   w_θ ∈ argmin_w f(w, θ).   (1)
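To make the formulation concrete, the toy bilevel program below (objectives chosen purely for illustration, not from the paper) uses inner objective f(w, θ) = (w − θ²)² and outer objective F(w, θ) = (w − 2)² + 0.1 θ². The inner minimizer w_θ = θ² is approximated with gradient descent, in the spirit of the approximation scheme discussed next.

```python
def inner_loss(w, theta):
    # f(w, theta): inner objective, minimized at w = theta**2
    return (w - theta ** 2) ** 2

def outer_loss(w, theta):
    # F(w, theta): outer objective, evaluated at the inner solution
    return (w - 2.0) ** 2 + 0.1 * theta ** 2

def solve_inner(theta, w0=0.0, lr=0.4, steps=50):
    # Gradient descent on f; df/dw = 2 * (w - theta**2)
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * (w - theta ** 2)
    return w

theta = 1.2
w_star = solve_inner(theta)          # converges to theta**2 = 1.44
F_val = outer_loss(w_star, theta)    # outer objective at the inner solution
```

With step size 0.4 the inner iteration contracts geometrically (ratio 0.2), so 50 steps recover the inner minimizer to machine precision.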
Bilevel programs have shown effectiveness in a variety of situations such as hyperparameter optimization, adversarial and multi-task learning, as well as meta-learning
(Maclaurin et al., 2015; Bengio, 2000; Bennett et al., 2006; Flamary et al., 2014; Muñoz-González et al., 2017; Franceschi et al., 2018). Solving equation 1 exactly is typically intractable, since the solution set of the inner problem is rarely available in closed form (Bengio, 2000). A common approach is to replace the exact minimizer of the inner objective with an approximate solution obtained through an iterative optimization dynamics Φ,
such as stochastic gradient descent (SGD)
(Domke, 2012; Maclaurin et al., 2015; Franceschi et al., 2017). Starting from an initial parameter w_{θ,0}, we obtain the approximate solution by running T iterations of the inner optimization dynamics Φ, i.e., w_{θ,1} = Φ(w_{θ,0}, θ), w_{θ,2} = Φ(w_{θ,1}, θ), and so on. In the general case where θ and w are real-valued and the objectives and optimization dynamics are smooth, the gradient of the objective F w.r.t. θ, named the hypergradient ∇_θ F, can be computed by:

∇_θ F(w_{θ,T}, θ) = ∂_θ F(w_{θ,T}, θ) + (d w_{θ,T} / dθ)^T ∂_w F(w_{θ,T}, θ),   (2)
where ∂ denotes a partial derivative (the Jacobian with respect to the immediate variables), while d·/dθ denotes a total derivative that takes into account the recursive calls to Φ. The above gradient can be efficiently calculated by unrolling the optimization dynamics with backpropagation, i.e., reverse-mode automatic differentiation (Griewank and Walther, 2008), where we repeatedly substitute w_{θ,t} = Φ(w_{θ,t−1}, θ)
and apply the chain rule.
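The sketch below makes this unrolling concrete for scalar w and θ, with inner objective f(w, θ) = ½(w − θ)², dynamics Φ(w, θ) = w − η(w − θ), and outer objective F(w) = ½(w − 3)². The hypergradient is accumulated by reversing the unrolled dynamics and checked against a finite difference. The setup is a toy for illustration, not the paper's model.

```python
def hypergradient(theta, w0=0.0, lr=0.3, T=20):
    """Unroll T inner steps, then reverse-mode through the dynamics."""
    # Forward pass: w_{t+1} = Phi(w_t, theta) = w_t - lr * (w_t - theta).
    w = w0
    for _ in range(T):
        w = w - lr * (w - theta)
    F = 0.5 * (w - 3.0) ** 2            # outer objective F(w_T)
    # Backward pass: g carries dF/dw_t back through the cached dynamics;
    # dPhi/dw = 1 - lr and dPhi/dtheta = lr for this linear dynamics.
    g = w - 3.0                          # dF/dw_T
    dtheta = 0.0
    for _ in range(T):
        dtheta += g * lr                 # contribution via dPhi/dtheta
        g *= (1.0 - lr)                  # propagate via dPhi/dw
    return F, dtheta

theta = 1.0
F, grad = hypergradient(theta)

# Finite-difference check of the hypergradient.
eps = 1e-5
Fp, _ = hypergradient(theta + eps)
Fm, _ = hypergradient(theta - eps)
fd = (Fp - Fm) / (2 * eps)
```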
In this section we elaborate on the proposed end-to-end framework. We first present a high-level description of our bilevel formulation in Sec. 3.1. Then we present the model schematic and training objectives in Sec. 3.2. Finally, we show how to learn the model via hypergradient descent in Sec. 3.3 and how to draw samples in Sec. 3.4.
Since a molecule can have multiple stable conformations, we model the distribution of conformations conditioned on the molecular graph, i.e. p(R|G), with a conditional variational autoencoder (CVAE) (Kingma and Welling, 2013), in which a latent variable z is introduced to capture the uncertainty in conformation generation. The CVAE model consists of a prior distribution p(z|G) over the latent variable and a decoder p(R|z, G) that captures the conditional distribution of R given z and G. During training, we also introduce an inference model (encoder) q(z|R, G). The encoder and decoder are jointly trained to maximize the evidence lower bound (ELBO) of the data log-likelihood:
L_ELBO = E_{q(z|R,G)}[log p(R|z, G)] − KL(q(z|R, G) || p(z|G)).   (3)
The ELBO can be interpreted as the sum of a negative reconstruction error (the first term) and a latent-space prior regularizer (the second term). In practice, q(z|R, G) and p(z|G) are both modeled as diagonal Gaussians, N(μ_q, σ_q²I) and N(μ_p, σ_p²I), whose means and standard deviations are predicted by graph neural networks. To optimize the ELBO efficiently during training, sampling from q(z|R, G) is performed by reparametrizing z as z = μ_q + σ_q ⊙ ε, where ε ~ N(0, I). With similar encoder and prior models across methods, the key differences among existing approaches lie in the architecture and learning method of the decoder (generator) p(R|z, G), i.e., how to parameterize the decoder and train it with respect to the reconstruction loss. Let D(z, G) denote the decoder function taking the latent variable and graph to produce a distance vector; we now elaborate how we formulate the optimization problem of the decoder as a bilevel program:
Inner objective: Directly generating conformations as Cartesian coordinates is sensitive to arbitrary rotations and translations. Therefore, previous effective approaches (Simm and Hernández-Lobato, 2020; Xu et al., 2021) instead make the decoder generate inter-atomic distances, i.e., d = D(z, G). The distances serve as intermediate variables that are invariant to rotation and translation. To generate a conformation R, one first generates the set of distances d and then post-processes it to obtain the 3D positions by solving a distance geometry optimization problem:
R* = argmin_R Σ_{e_uv ∈ E} (||r_u − r_v|| − d_uv)²,   (4)
which we take as the inner loop objective.
Outer objective: Ultimately, we are interested in directly minimizing the error on 3D structures, so that the generated conformation is consistent with the ground truth up to rotation and translation. The post-alignment root-mean-square deviation (RMSD) is a widely used metric for this purpose. To compute it, an aligned conformation is first obtained through an alignment function A(R̂, R), which rotates and translates the reference conformation R̂ to have the smallest distance to the generated conformation R under the RMSD metric:
RMSD(R, R̂) = ( (1/n) Σ_{i=1}^{n} ||r_i − r̂_i||² )^{1/2},   (5)
where n is the number of atoms. The reconstruction objective can then be written as:
L_recon(R, R̂) = Σ_{i=1}^{n} ||r_i − A(R̂, R)_i||²,   (6)
which is the outer-loop objective for computing the reconstruction loss and maximizing the log-likelihood.
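The alignment function A in equation 5 is commonly implemented with the Kabsch algorithm. Below is a minimal NumPy sketch (names are illustrative); the determinant-sign correction keeps the result a proper rotation rather than a reflection.

```python
import numpy as np

def kabsch_align(R_ref, R_gen):
    """Rotate/translate R_ref onto R_gen (Kabsch algorithm), minimizing RMSD."""
    ref = R_ref - R_ref.mean(axis=0)          # center both point clouds
    gen = R_gen - R_gen.mean(axis=0)
    H = ref.T @ gen                           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    sign = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, sign])             # avoid improper rotations
    rot = Vt.T @ D @ U.T
    return ref @ rot.T + R_gen.mean(axis=0)

def rmsd(Ra, Rb):
    return np.sqrt(np.mean(np.sum((Ra - Rb) ** 2, axis=1)))

# A rotated + translated copy of a point cloud should align back to RMSD ~ 0.
rng = np.random.default_rng(0)
R = rng.normal(size=(8, 3))
angle = 0.7
Rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
R_moved = R @ Rot.T + np.array([1.0, -2.0, 0.5])
aligned = kabsch_align(R, R_moved)
```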
Bilevel program: We can now consider equation 4 and equation 6 as the inner and outer objectives of a bilevel programming problem. In this formulation, the outer objective aims to model the true conditional distribution p(R|G), while the inner objective solves for the conformation given a set of predicted distances. Taking the expectation over the latent variable z, the resulting bilevel program for the reconstruction term in equation 3 can be written as:
min  E_{q(z|R̂,G)} [ L_recon(R_d, R̂) ],   (7)

such that  R_d = argmin_R Σ_{e_uv ∈ E} (||r_u − r_v|| − d_uv)²,  where d = D(z, G).   (8)
The derived bilevel problem is still challenging because: 1) the solution R_d of the inner problem is not available in closed form; 2) computing the expectation over the continuous latent space exactly is intractable. In practice, we therefore compute an empirical estimate of the objective with a variational inference model and the reparametrization trick. We elaborate on how we address these issues in the following parts.
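The VAE-side ingredients used above — reparametrized sampling and the closed-form KL between diagonal Gaussians — can be sketched in a few lines. The dimensions and values below are toy placeholders, not the paper's parameterization.

```python
import math
import random

def reparam_sample(mu, sigma, eps=None):
    """z = mu + sigma * eps with eps ~ N(0, I); gradients flow through mu, sigma."""
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

def kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL(q || p) for diagonal Gaussians, summed over dimensions."""
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, sig_q, mu_p, sig_p)
    )

mu, sigma = [0.5, -1.0], [1.0, 0.2]
z = reparam_sample(mu, sigma, eps=[0.0, 0.0])   # with eps = 0, z equals mu
kl_same = kl_diag_gaussians(mu, sigma, mu, sigma)
kl_to_prior = kl_diag_gaussians(mu, sigma, [0.0, 0.0], [1.0, 1.0])
```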
We now have the tools needed to define our conditional generative model of molecular conformation. The cornerstone of all modules (encoder, prior and decoder) is the message-passing neural network (MPNN) (Gilmer et al., 2017), a variant of graph neural networks that achieves state-of-the-art performance in representation learning for molecules (Scarselli et al., 2008; Bruna et al., 2013; Duvenaud et al., 2015; Kipf and Welling, 2016; Kearnes et al., 2016; Schütt et al., 2017). The MPNN operates directly on the graph representation and is invariant to graph isomorphism. In each convolutional (message-passing) layer, atomic embeddings are updated by aggregating information from neighboring nodes.
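As an illustration of the message-passing update, a single GIN-style layer (the MPNN variant used later for implementation) can be written with plain matrix operations; the two-layer MLP, the weights, and the toy path graph are illustrative placeholders.

```python
import numpy as np

def gin_layer(h, adj, W1, W2, eps=0.0):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum of neighbor embeddings).

    h: (n, d) node embeddings; adj: (n, n) 0/1 adjacency matrix;
    W1: (d, d_hidden) and W2: (d_hidden, d_out) parameterize a 2-layer MLP.
    """
    agg = (1.0 + eps) * h + adj @ h          # aggregate self + neighbors
    hidden = np.maximum(agg @ W1, 0.0)       # MLP layer 1 (ReLU)
    return hidden @ W2                       # MLP layer 2 (linear)

rng = np.random.default_rng(1)
n, d = 5, 8
h = rng.normal(size=(n, d))
adj = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:   # a path graph
    adj[u, v] = adj[v, u] = 1.0
W1, W2 = rng.normal(size=(d, 16)), rng.normal(size=(16, d))
h_out = gin_layer(h, adj, W1, W2)
```

Because the update only sums over neighbors, relabeling the nodes permutes the output rows accordingly (permutation equivariance), which is what makes the representation invariant to graph isomorphism.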
For the encoder q(z|R, G) and prior p(z|G), we use the same MPNN architecture as Mansimov et al. (2019) and Simm and Hernández-Lobato (2020). Since bilevel optimization has a relatively high memory cost, we use an ordinary differential equation (ODE)-based continuous normalizing flow (CNF) (Chen et al., 2018) for the decoder, which has constant memory cost. We describe the details of our decoder model below.

Decoder Architecture. As described in Sec. 3.1, our decoder is composed of two cascaded levels: a distance prediction model that decodes the latent variable z back into a set of distances d, and a differentiable distance geometry procedure that recovers the geometry R from the distances d. The distance model is implemented as a conditional extension of the CNF, which transforms noise variables d(t0) (the initial distances of the CNF ODE trajectory) sampled from a prior distribution into the final distances d = d(t1). The transformation is conditioned on the latent variable z as well as the graph G:
d = F_θ(d(t0), z, G) = d(t0) + ∫_{t0}^{t1} g_θ(d(t), t; z, G) dt,   (9)
where g_θ is an MPNN that defines the continuous-time dynamics of the flow conditioned on z and G. Note that, given the true distances d, the initial state d(t0) can be easily computed by reversing the continuous dynamics: d(t0) = F_θ^{-1}(d, z, G). Thus the exact conditional log-likelihood of the distances given z can be computed by:
log p(d|z, G) = log p(d(t0)) − ∫_{t0}^{t1} Tr( ∂g_θ / ∂d(t) ) dt.   (10)
An ODE solver can then be applied to estimate the gradients of the parameters for optimization. In practice, log p(d|z, G) serves as an auxiliary objective defined on distances to supervise the training. In summary, the training objective can be interpreted as the sum of three parts:
L = L_recon + α L_aux + β L_KL,  where L_aux = −E_q[log p(d|z, G)] and L_KL = KL(q(z|R, G) || p(z|G)),   (11)
where α and β are hyperparameters that reweight each component. The overall framework is illustrated in Fig. 1.
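To illustrate how a CNF trades ODE integration for exact likelihoods (equations 9 and 10), the one-dimensional sketch below Euler-integrates linear dynamics g(d) = a·d, where the constant a stands in for dynamics conditioned on (z, G), and accumulates the divergence term; reversing the dynamics approximately inverts the flow. This is a toy, not the paper's MPNN-parameterized dynamics.

```python
import math

def cnf_forward(d0, a, t0=0.0, t1=1.0, steps=1000):
    """Euler-integrate dd/dt = g(d) = a * d and track the log-density change.

    Returns (d1, delta_logp), where delta_logp approximates
    -∫ Tr(dg/dd) dt = -a * (t1 - t0) for this linear dynamics.
    """
    dt = (t1 - t0) / steps
    d, delta_logp = d0, 0.0
    for _ in range(steps):
        d += dt * a * d            # Euler step of the ODE
        delta_logp -= dt * a       # divergence of g(d) = a * d is a
    return d, delta_logp

d0, a = 0.8, 0.5
d1, dlogp = cnf_forward(d0, a)          # exact solution: d0 * exp(a)
d_back, _ = cnf_forward(d1, -a)         # reversing the dynamics inverts the flow
```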
We now discuss how to optimize the bilevel problem defined by equations 7 and 8 with a practical algorithm. The inner problem in equation 8 is a classic distance geometry problem: inferring 3D coordinates from pairwise distances (Anand and Huang, 2018; Simm and Hernández-Lobato, 2020; Xu et al., 2021). Others have used semidefinite programming (SDP) to infer protein structure from nuclear magnetic resonance data (Alipanahi et al., 2013), or an alternating direction method of multipliers (ADMM) algorithm to fold proteins into 3D Cartesian coordinates (Anand and Huang, 2018). In this initial work we choose gradient descent (GD), whose learning dynamics Φ is tractable, to approximately solve for the geometry:
R_{t+1} = Φ(R_t, d) = R_t − η ∇_R Σ_{e_uv ∈ E} (||r_u − r_v|| − d_uv)²,   (12)
where η is the learning rate and d is the distance set generated by the distance prediction model. Under appropriate assumptions and for a sufficient number of updates T, GD converges to a proper geometry that depends on the predicted pairwise distances (Bottou, 2010).
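A minimal NumPy version of the gradient-descent dynamics in equation 12 is shown below, recovering coordinates for a toy target (an equilateral triangle with unit side lengths); the learning rate, step count, and initialization scale are illustrative choices, not the paper's settings.

```python
import numpy as np

def dg_stress(R, edges, d):
    """Inner objective: sum over edges of (||r_u - r_v|| - d_uv)^2."""
    return sum((np.linalg.norm(R[u] - R[v]) - duv) ** 2
               for (u, v), duv in zip(edges, d))

def dg_gradient_descent(edges, d, n, lr=0.05, steps=2000, seed=0):
    """Recover coordinates from target distances by plain gradient descent."""
    rng = np.random.default_rng(seed)
    R = rng.normal(scale=0.5, size=(n, 3))       # random initial geometry
    for _ in range(steps):
        grad = np.zeros_like(R)
        for (u, v), duv in zip(edges, d):
            diff = R[u] - R[v]
            dist = np.linalg.norm(diff) + 1e-12  # guard against zero division
            g = 2.0 * (dist - duv) * diff / dist # gradient of one edge term
            grad[u] += g
            grad[v] -= g
        R -= lr * grad                           # one step of Phi(R_t, d)
    return R

# Target: an equilateral triangle with unit side lengths.
edges = [(0, 1), (1, 2), (0, 2)]
d = [1.0, 1.0, 1.0]
R = dg_gradient_descent(edges, d, n=3)
```

The recovered geometry is only determined up to rotation and translation, which is exactly why the outer objective in equation 6 aligns conformations before comparing them.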
Now we consider how to calculate the hypergradient of the outer-loop reconstruction objective (equation 7) to train the model. Let R_{θ,T} denote the conformation generated by approximately solving the distance geometry problem with T steps of gradient descent. The hypergradient can be written as:

∇_θ L_recon = ∂_R L_recon(R_{θ,T}, R̂) · (d R_{θ,T} / d d) · (d d / dθ),   (13)

where the total derivative d R_{θ,T}/d d is computed by fully unrolling the inner-loop dynamics from t = T back to t = 0. Specifically, in the forward computation, the successive geometries R_0, …, R_T produced by the optimization dynamics are cached. In the backward pass, the cached geometries are used to compute gradients as a series of vector-Jacobian products (VJPs). During the reverse computation, the gradient starting from R_T is propagated to the intermediate geometries R_t through Φ:

dL_recon/dR_t = (dL_recon/dR_{t+1}) · ∂Φ(R_t, d)/∂R_t = (dL_recon/dR_{t+1}) · (I − η ∇²_R G(R_t, d)),   (14)
where ∇²_R G(R_t, d) denotes the Hessian of the inner objective G (equation 4) w.r.t. R. With the iteratively computed derivatives dL_recon/dR_t, the adjoint on the distances d can be computed in the form of VJPs and further backpropagated to the parameters of the encoder and decoder. Formally, dL_recon/dd is computed by:

dL_recon/dd = Σ_{t=0}^{T−1} (dL_recon/dR_{t+1}) · ∂Φ(R_t, d)/∂d,   (15)

where dL_recon/dR_{t+1} can be substituted using equation 14. The computation can be done efficiently with reverse-mode automatic differentiation software such as PyTorch
(Paszke et al., 2019). A schematic illustration of the forward and backward computational graph through distance geometry is presented in Fig. 2. We provide a detailed algorithm of the training procedure in Appendix B.

Given the graph G, to generate a conformation we first draw the latent variable z from the prior distribution p(z|G). We then sample random initial distances d(t0) from a Gaussian distribution and pass them through the invertible neural ODE, conditioned on z and G, to obtain the distance set d. Finally, we produce the conformation R by solving the distance geometry optimization problem defined in equation 4.

Evaluation Tasks. Following previous work on conformation generation (Mansimov et al., 2019; Simm and Hernández-Lobato, 2020; Xu et al., 2021), we conduct extensive experiments comparing our method with state-of-the-art baselines on several standard tasks. Conformation generation, formulated by Xu et al. (2021), measures a model's capacity to generate realistic and diverse molecular conformations. Distance distribution modeling, first proposed by Simm and Hernández-Lobato (2020), evaluates whether a method can model the underlying distribution of distances.
Baselines. We compare our proposed model with the following state-of-the-art conformation generation methods. CVGAE (Mansimov et al., 2019) is a conditional VAE-based model that applies several graph neural network layers to learn atom representations from the molecular graph and then directly predicts the 3D coordinates. GraphDG (Simm and Hernández-Lobato, 2020) also employs the conditional VAE framework; instead of directly generating conformations in 3D coordinates, it learns a distribution over distances, which are then converted into conformations by a distance geometry algorithm. CGCF (Xu et al., 2021), another two-stage method, uses continuous normalizing flows to predict the atomic pairwise distances. Following the baselines, we also compare our model with RDKit (Riniker and Landrum, 2015), a classical distance geometry approach built upon an extensive collection of empirical edge lengths from computational chemistry.
Featurization and Implementation. The MPNNs used for the encoder, prior and decoder are all implemented as Graph Isomorphism Networks (Xu et al., 2018; Hu et al., 2019). As input features for the graph representation, we derive only the atom and bond types from the molecular graphs. By default, all MPNNs use the same number of layers and the same hidden embedding dimension. We train ConfVAE on a single Tesla V100 GPU until convergence, using the Adam optimizer (Kingma and Ba, 2014).
Datasets. Following Xu et al. (2021), we use the recently proposed GEOM-QM9 and GEOM-Drugs (Axelrod and Gomez-Bombarelli, 2020) datasets for the conformation generation task. The Geometric Ensemble Of Molecules (GEOM) dataset contains millions of high-quality stable conformations, making it well suited to this task. The GEOM-Drugs dataset consists of generally medium-sized organic compounds, with an average of 44.2 atoms. Following the setting of Xu et al. (2021), we randomly take 50,000 conformation-molecule pairs as the training set and another 9,161 conformations (covering 100 molecular graphs) as the test split. By contrast, GEOM-QM9 is a much smaller dataset, limited to small molecules with up to 9 heavy atoms. Similarly, we randomly draw 50,000 conformation-molecule pairs to constitute the training set and another 17,813 conformations covering 150 molecular graphs as the test set.
Evaluation metrics. For this task we want the generated samples to be of both high quality and high diversity. Following previous work (Hawkins, 2017; Mansimov et al., 2019; Xu et al., 2021), we compute the RMSD over the heavy atoms between generated and reference conformations. Given a generated conformation R and a reference R̂, we apply the alignment function defined in equation 5 to obtain the aligned conformation, and then compute RMSD(R, R̂) as in equation 5, where n is the number of heavy atoms. Built upon the RMSD metric, Xu et al. (2021) define the Coverage (COV) and Matching (MAT) scores to measure diversity and quality, respectively. COV counts the fraction of conformations in the reference set that are covered by at least one conformation in the generated set:

COV(S_g, S_r) = (1/|S_r|) · |{ R̂ ∈ S_r : RMSD(R, R̂) < δ for some R ∈ S_g }|,   (16)
where S_g and S_r denote the generated and reference conformation sets, respectively. A higher COV score typically indicates better diversity in covering the complex true distribution.
While COV can detect mode collapse, it does not guarantee the quality of the generated samples. The MAT score is therefore defined as a complementary metric that focuses on quality (Xu et al., 2021):
MAT(S_g, S_r) = (1/|S_r|) Σ_{R̂ ∈ S_r} min_{R ∈ S_g} RMSD(R, R̂).   (17)
Generally, more realistic generated samples lead to a lower MAT score.
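Given a precomputed RMSD matrix between generated and reference conformations, both scores reduce to a few lines; the threshold δ and the toy RMSD values below are illustrative.

```python
def cov_mat_scores(rmsd, delta=1.25):
    """COV and MAT from a matrix rmsd[i][j] = RMSD(generated_i, reference_j).

    COV: fraction of references matched by some generated sample within delta.
    MAT: mean over references of the best (minimum) RMSD achieved.
    """
    n_ref = len(rmsd[0])
    best = [min(row[j] for row in rmsd) for j in range(n_ref)]
    cov = sum(1 for b in best if b < delta) / n_ref
    mat = sum(best) / n_ref
    return cov, mat

# Two generated conformations scored against three references (toy RMSDs).
rmsd = [
    [0.4, 2.0, 1.1],
    [1.5, 0.9, 3.0],
]
cov, mat = cov_mat_scores(rmsd, delta=1.25)
```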
Results. We compute the COV and MAT metrics on both GEOM-QM9 and GEOM-Drugs for all baselines and summarize the results in Tab. 1; several representative examples are visualized in Fig. 3. ConfVAE outperforms all existing strong baselines by an obvious margin (top rows). By incorporating an end-to-end training objective via bilevel optimization, we consistently achieve better results on all four metrics. By contrast, the current state-of-the-art models GraphDG and CGCF perform much worse due to their two-stage generation process, in which the extra error introduced by the distance geometry step cannot be accounted for during training. CVGAE enjoys the same training and testing objective but still shows inferior performance, since it fails to preserve the vital translational and rotational invariance.
Similar to previous work (Mansimov et al., 2019; Xu et al., 2021), we further test all models by incorporating a rule-based empirical force field (Halgren, 1996b) and compare against the classic RDKit toolkit. Specifically, we first generate conformations with the generative models as initial structures, and then use the force field to further optimize the generated structures. The additional results are reported in Tab. 1 (bottom rows). ConfVAE still achieves the best results among all generative models. More importantly, our method outperforms RDKit on all but one of the evaluations and achieves competitive results on the remaining one, making our method practically useful for real-world applications.
Ablation Study. So far we have demonstrated the superior performance of the proposed method. However, because we adopt a slightly different architecture, it remains unclear where the effectiveness comes from. In this part, we conduct an ablation study by removing the bilevel component defined in equation 7 during training, i.e., removing L_recon and learning the model with only L_aux and L_KL. We denote this variant as ConfVAE (w/o bilevel) and summarize the additional results in Tab. 1.
As shown in the table, removing the bilevel component clearly hurts performance. These results verify that learning from distances alone introduces an extra bias into the generated conformations, and that our end-to-end method of learning directly on the 3D structures overcomes this issue. Another observation is that, as a combination of flow-based and VAE-based models, even ConfVAE (w/o bilevel) achieves significantly better results than the flow-based CGCF and the VAE-based GraphDG, with exactly the same training and sampling process. This indicates that incorporating both a global latent variable (z) and local latent variables (the CNF noise) contributes to the generated conformations, helping to capture both global and local geometric structure and atomic interactions.
Table 2. MMD between the generated and ground-truth distributions of inter-atomic distances: marginals of single edges (Single), pairs of edges (Pair), and the joint distribution over all edges (All) connecting C and O atoms. We report the mean and median of the MMD metric. Molecular graphs are taken from the test set of ISO17.

Method                   Single            Pair              All
                         Mean    Median    Mean    Median    Mean    Median
RDKit                    3.4513  3.1602    3.8452  3.6287    4.0866  3.7519
CVGAE                    4.1789  4.1762    4.9184  5.1856    5.9747  5.9928
GraphDG                  0.7645  0.2346    0.8920  0.3287    1.1949  0.5485
CGCF                     0.4490  0.1786    0.5509  0.2734    0.8703  0.4447
ConfVAE (w/o bilevel)    0.2551  0.1352    0.2719  0.1742    0.2968  0.2132
ConfVAE                  0.1809  0.1153    0.1946  0.1455    0.2113  0.2047

Dataset. For the distance modeling task, we follow Simm and Hernández-Lobato (2020) and Xu et al. (2021) and use the ISO17 dataset (Simm and Hernández-Lobato, 2020). ISO17 is constructed from snapshots of ab initio molecular dynamics simulations, so the coordinates are not just equilibrium conformations but samples reflecting the underlying density around equilibrium states. Following previous work, we split ISO17 into a training set of 167 molecules and a test set of the remaining 30 molecules.
Evaluation metrics. To obtain a distribution over distances from a distribution over conformations, we sample a set of conformations and then compute the corresponding inter-atomic distances between C and O atoms (H atoms are usually ignored). Let p(d_uv | G) denote the conditional distribution of the distance on edge e_uv given a molecular graph G. To evaluate the distance distributions, we use the maximum mean discrepancy (MMD) (Gretton et al., 2012) to the ground-truth distributions. More specifically, we evaluate the MMD against the ground truth for the marginal distribution of each individual edge's distance p(d_uv | G), for pairs of distances p(d_uv, d_st | G), and for the joint distribution of all distances p(d | G). For this benchmark, the generated sample set has the same size as the reference set.
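A biased squared-MMD estimate with an RBF kernel can be computed directly from two samples of distances; the kernel bandwidth and the synthetic "C-O distance" samples below are illustrative placeholders, not taken from ISO17.

```python
import numpy as np

def mmd2_rbf(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between 1D samples x and y (RBF kernel)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
ref = rng.normal(1.4, 0.1, size=500)     # stand-in for ground-truth distances
close = rng.normal(1.4, 0.1, size=500)   # a model matching the distribution
far = rng.normal(2.0, 0.3, size=500)     # a model missing the distribution
mmd_close = mmd2_rbf(ref, close)
mmd_far = mmd2_rbf(ref, far)
```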
Results. The MMD results are summarized in Tab. 2. The statistics show that the distance distributions generated by ConfVAE are significantly closer to the ground-truth distributions than those of the baseline models. These results demonstrate that our method can not only generate realistic conformations but also model the density around equilibrium states. By contrast, though RDKit shows competitive performance for conformation generation, it struggles with the distribution modeling benchmark. This is because RDKit is designed only to find equilibrium states using an empirical force field (Halgren, 1996a), and thus lacks the capacity to capture the underlying distribution. The ablation study between ConfVAE and its variant without the bilevel objective further verifies the effectiveness of the bilevel optimization component.
In recent years, deep learning has driven significant progress in 3D structure generation. Several works use neural networks as energy prediction models, which then serve as faster alternatives to quantum mechanics-based energy calculations (Schütt et al., 2017; Smith et al., 2017) for molecular dynamics simulation or molecule optimization (Wang et al., 2020). However, though accelerated by neural networks, these approaches remain time-consuming due to the lengthy sampling process. Recently, Gebauer et al. (2019) and Hoffmann and Noé (2019) proposed methods to generate new 3D molecules with deep generative models, while Simm et al. (2020b) and Simm et al. (2020a) employ reinforcement learning to search the vast geometric space. However, none of these methods is designed to generate conformations from a given molecular graph, making them orthogonal to our framework.
Many other works (Lemke and Peter, 2019; AlQuraishi, 2019; Ingraham et al., 2019; Noé et al., 2019) also learn to directly predict 3D structures, but focus on the protein folding problem. Notably, Senior et al. (2020b) and Jumper et al. (2020) significantly advanced this field with an end-to-end attention-based model called AlphaFold. However, proteins are amino-acid sequences with low chemical diversity and much larger scale, for which abundant structural data exist, while general molecules are highly structured graphs with a variety of cycles and a much broader chemical composition, making it unclear whether these methods transfer to the general conformation generation task.
In this paper, we propose ConfVAE, an end-to-end framework for molecular conformation generation via bilevel programming. Thanks to end-to-end training based on bilevel programming, our generative model overcomes the significant errors of previous two-stage models while preserving rotational and translational invariance. Experimental results demonstrate the superior performance of our method over all state-of-the-art baselines on several standard benchmarks. Future work includes combining our bilevel optimization framework with other kinds of generative models, and extending our method to other challenging structures such as proteins.
This project is supported by the Natural Sciences and Engineering Research Council (NSERC) Discovery Grant, the Canada CIFAR AI Chair Program, collaboration grants between Microsoft Research and Mila, Samsung Electronics Co., Ltd., an Amazon Faculty Research Award, the Tencent AI Lab Rhino-Bird Gift Fund and an NRC Collaborative R&D Project (AI4D-CORE-06). This project was also partially funded by IVADO Fundamental Research Project grant PRF-2019-3583139727.
Inspired by classic molecular distance geometry (Crippen et al., 1988), our framework also generates conformations by taking the inter-atomic distances as intermediate variables, which provides invariance to rotation and translation. In practice, the chemical bonds present in the molecular graph are not sufficient to determine a conformation, so we follow existing works (Simm and Hernández-Lobato, 2020; Xu et al., 2021) and first expand the graph with auxiliary edges. Specifically, atoms that are two or three hops apart are connected by virtual bonds, labeled differently from the real bonds of the original molecule. These extra bonds help reduce the degrees of freedom of the 3D coordinates and characterize the graph uniquely: the edges between 2-hop neighbors help fix the angles between atoms, and those between 3-hop neighbors fix the dihedral angles.

This task was first proposed by Simm and Hernández-Lobato (2020); it estimates the expected molecular properties of a molecular graph from a set of generated conformations. It further demonstrates the effectiveness and quality of the generated samples, and is important for many real-world applications such as drug and material design.
Dataset. Following Simm and Hernández-Lobato (2020), we again employ the ISO17 dataset; more details about the dataset can be found in Sec. 4.3.
Evaluation metrics. For comparison, we calculate the ensemble properties of each molecular graph by averaging over a set of generated conformations. Specifically, we calculate the total electronic energy E, the energies of the HOMO and the LUMO, and the dipole moment μ, using the quantum chemistry package Psi4 (Smith et al., 2020). In practice, we generate samples from each method to estimate the properties, and report the median error of the averaged properties to measure accuracy. Following Simm and Hernández-Lobato (2020), we exclude CVGAE from this analysis due to its poor sample quality.

Results. The results are shown in Tab. 3. ConfVAE outperforms all other generative models and shows competitive results compared with RDKit. Closer inspection indicates that CGCF struggles with this task because its generated conformations suffer from extremely high variance. By contrast, our proposed method enjoys the best performance thanks to the high quality of its generated samples.
Table 3. Median errors of ensemble properties estimated from generated conformations: total electronic energy E, HOMO and LUMO energies, and dipole moment μ.

Method     E       HOMO    LUMO    μ
RDKit      42.7    0.08    0.15    0.29
GraphDG    58.0    0.10    0.09    0.33
CGCF       208.2   0.80    1.11    0.46
ConfVAE    40.2    0.10    0.08    0.29
In this section, we give additional results for the Coverage score under different thresholds δ; details of the COV score can be found in Sec. 4.2. The results are shown in Fig. 4. ConfVAE consistently outperforms previous state-of-the-art models, demonstrating that our proposed method is capable of generating more realistic samples.
In Fig. 5, we investigate the accuracy of the generated conformations by visualizing the marginal distributions of all pairwise distances between C and O atoms for a molecular graph in the ISO17 test set. As shown in the figure, though primarily designed for learning 3D structures end-to-end, our method still estimates the distance distributions much better than the state-of-the-art model for molecular geometry modeling. As a representative pairwise property between atoms, the inter-atomic distances demonstrate our model's capacity to capture inter-atomic interactions.