1 Introduction
Graph-based representations of molecules, where nodes represent atoms and edges represent bonds, have been widely applied to a variety of molecular modeling tasks, ranging from property prediction (Gilmer et al., 2017; Hu et al., 2020) to molecule generation (Jin et al., 2018; You et al., 2018; Shi et al., 2020). However, a more natural and intrinsic representation of a molecule is its three-dimensional structure, in which atoms are described by 3D coordinates. This structure is also widely known as the molecular geometry or conformation.
Nevertheless, obtaining valid and stable conformations for a molecule remains a long-standing challenge. Existing approaches to molecular conformation generation mainly rely on molecular dynamics (MD) (De Vivo et al., 2016), where the coordinates of atoms are sequentially updated based on the forces acting on each atom. The per-atom forces come either from computationally intensive Density Functional Theory (DFT) (Parr, 1989) or from hand-designed force fields (Rappé et al., 1992; Halgren, 1996), which are often crude approximations of the actual molecular energies (Kanal et al., 2017). Recently, there has been growing interest in developing machine learning methods for conformation generation (Mansimov et al., 2019; Simm and Hernandez-Lobato, 2020; Xu et al., 2021). In general, these approaches (Simm and Hernandez-Lobato, 2020; Xu et al., 2021) proceed in two stages: they first predict the molecular distance geometry, i.e., the interatomic distances, and then convert the distances to 3D coordinates via post-processing algorithms (Liberti et al., 2014). Such methods effectively model the roto-translation equivariance of molecular conformations and have achieved impressive performance. Nevertheless, because they generate conformations in a two-stage fashion, noise in the generated distances may affect the reconstruction of 3D coordinates, often leading to less accurate or even erroneous structures. We therefore seek an approach that generates molecular conformations within a single stage.
Inspired by the traditional force field methods for molecular dynamics simulation (Frenkel and Smit, 1996), in this paper we propose a novel and principled approach, called ConfGF, for molecular conformation generation. The basic idea is to directly learn the gradient fields of the log density w.r.t. the atomic coordinates, which can be viewed as pseudo-forces acting on each atom. With the estimated gradient fields, conformations can be generated directly using Langevin dynamics (Welling and Teh, 2011), which amounts to moving atoms toward high-likelihood regions guided by the pseudo-forces. The key challenge with this approach is that such gradient fields of atomic coordinates are roto-translation equivariant, i.e., the gradients rotate together with the molecular system and are invariant under translation.
To tackle this challenge, we observe that the interatomic distances are continuously differentiable w.r.t. the atomic coordinates. Therefore, we propose to first estimate the gradient fields of the log density w.r.t. the interatomic distances, and show that, under a minor assumption, the gradient fields of the log density w.r.t. atomic coordinates can be calculated from those w.r.t. distances in closed form using the chain rule. We also demonstrate that the gradient fields designed in this way satisfy roto-translation equivariance. In contrast to distance-based methods (Simm and Hernandez-Lobato, 2020; Xu et al., 2021), our approach generates conformations in a one-stage fashion, i.e., the sampling process is carried out directly in the coordinate space, which avoids the errors induced by post-processing and greatly enhances generation quality. In addition, our approach requires no surrogate losses and allows flexible network architectures, making it more capable of generating diverse and realistic conformations. We conduct extensive experiments on the GEOM (Axelrod and Gomez-Bombarelli, 2020) and ISO17 (Simm and Hernandez-Lobato, 2020) benchmarks, and compare ConfGF against previous state-of-the-art neural methods as well as the empirical method on multiple tasks ranging from conformation generation and distance modeling to property prediction. Numerical results show that ConfGF outperforms previous state-of-the-art baselines by a clear margin.
2 Related Work
Molecular Conformation Generation The standard approach to molecular conformation generation is molecular dynamics (MD), which is considered the gold standard for sampling equilibrium states of molecules. MD starts with an initial conformation and repeatedly applies small updates to it based on the forces acting on each atom until it converges. However, these methods are inefficient, especially for large molecules, as they involve computationally expensive quantum mechanical calculations (Parr, 1989; Shim and MacKerell Jr, 2011; Ballard et al., 2015). Another, much more efficient but less accurate, class of approaches is the empirical method, which fixes distances and angles between atoms in a molecule to idealized values according to a set of rules (Blaney and Dixon, 2007).
Recent advances in deep generative models open the door to data-driven approaches, which have shown great potential to strike a good balance between computational efficiency and accuracy. For example, Mansimov et al. (2019) introduce the Conditional Variational Graph Auto-Encoder (CVGAE), which takes molecular graphs as input and directly generates 3D coordinates for each atom. This method does not model the roto-translation equivariance of molecular conformations, resulting in large discrepancies between generated structures and ground-truth structures. To address this issue, Simm and Hernandez-Lobato (2020) and Xu et al. (2021) propose to model the interatomic distances of molecular conformations using a VAE and a continuous normalizing flow respectively. These approaches preserve roto-translation equivariance and consist of two stages: first generating distances between atoms, and then feeding the distances to distance geometry algorithms (Liberti et al., 2014) to reconstruct 3D coordinates. However, the distance geometry algorithms are vulnerable to noise in the generated distances, which may lead to inaccurate or even erroneous structures. Another line of research (Gogineni et al., 2020) leverages reinforcement learning for conformation generation by sequentially determining torsional angles; it relies on an additional classical force field for state transition and reward evaluation. Nevertheless, it is incapable of modeling other geometric properties such as bond angles and bond lengths, which distinguishes it from all the other works.
Neural Force Fields Neural networks have also been employed to estimate molecular potential energies and force fields (Schütt et al., 2017a; Zhang et al., 2018; Hu et al., 2021). The goal of these models is to predict energies and forces as accurately as possible, serving as an alternative to quantum chemical methods such as Density Functional Theory (DFT). Training these models requires access to molecular dynamics trajectories along with ground-truth energies and forces from expensive quantum mechanical calculations. Different from these methods, our goal is to generate equilibrium conformations within a single stage. To this end, we define gradient fields analogous to force fields (such gradient fields, however, are not force fields, as they are not energy-conserving), which serve as pseudo-forces acting on each atom that gradually move the atoms toward high-density areas until they converge to an equilibrium. The only data we need to train the model are equilibrium conformations.
3 Preliminaries
3.1 Problem Definition
Notations In this paper, a molecular graph is represented as an undirected graph $G = \langle V, E \rangle$, where $V$ is the set of nodes representing atoms and $E$ is the set of edges between atoms in the molecule. Each node $v \in V$ is associated with a nuclear charge $Z_v$ and a 3D vector $\mathbf{r}_v \in \mathbb{R}^3$, indicating its atomic type and atomic coordinate respectively. Each edge $e_{uv} \in E$ is associated with a bond type and a scalar $d_{uv} = \|\mathbf{r}_u - \mathbf{r}_v\|_2$ denoting the Euclidean distance between the positions of $u$ and $v$. All distances between connected nodes can be collected into a vector $\mathbf{d} \in \mathbb{R}^{|E|}$. As the bonded edges in a molecule are not sufficient to characterize a conformation (bonds are rotatable), we extend the original molecular graph by adding auxiliary edges, i.e., virtual bonds, between atoms that are 2 or 3 hops away from each other, a widely-used technique in previous work (Simm and Hernandez-Lobato, 2020; Xu et al., 2021) for reducing the degrees of freedom in 3D coordinates. The extra edges between second neighbors help fix angles between atoms, while those between third neighbors fix dihedral angles. Hereafter, we assume all molecular graphs are extended unless stated otherwise.
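The 2-/3-hop graph-extension step described above can be sketched as follows; this is an illustrative implementation (the function name is our own), assuming only the bond list is available:

```python
def extend_graph(num_nodes, bonds):
    """Add virtual bonds between atoms that are 2 or 3 hops apart.

    Returns the sorted list of extra (u, v) pairs with u < v."""
    adj = {i: set() for i in range(num_nodes)}
    for u, v in bonds:
        adj[u].add(v)
        adj[v].add(u)

    def hops_from(src):
        # BFS hop distances from src to every reachable node
        dist, frontier = {src: 0}, [src]
        while frontier:
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        nxt.append(w)
            frontier = nxt
        return dist

    extra = set()
    for src in range(num_nodes):
        for dst, h in hops_from(src).items():
            if src < dst and h in (2, 3):  # second/third neighbors
                extra.add((src, dst))
    return sorted(extra)

# A butane-like chain 0-1-2-3: angles are fixed by (0,2) and (1,3),
# and the dihedral by (0,3).
print(extend_graph(4, [(0, 1), (1, 2), (2, 3)]))  # [(0, 2), (0, 3), (1, 3)]
```

For realistic molecules one would of course derive the bond list from the molecular graph library in use; the BFS here is only meant to make the "2 or 3 hops" rule concrete.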
Problem Definition Given an extended molecular graph $G$, our goal is to learn a generative model $p_\theta(R \mid G)$ that generates molecular conformations $R = (\mathbf{r}_1, \dots, \mathbf{r}_n) \in \mathbb{R}^{n \times 3}$ based on the molecular graph.
3.2 ScoreBased Generative Modeling
The score function of a continuously differentiable probability density $p(x)$ is defined as $s(x) = \nabla_x \log p(x)$. Score-based generative modeling (Song and Ermon, 2019; Song et al., 2020; Meng et al., 2020; Song and Ermon, 2020) is a class of generative models that approximate the score function of $p(x)$ using neural networks and generate new samples with Langevin dynamics (Welling and Teh, 2011). Modeling the score function instead of the density function avoids the intractable partition function of $p(x)$ as well as the extra computation needed to obtain the higher-order gradients of an energy-based model (Song and Ermon, 2019). To address the difficulty of estimating score functions in regions without training data, Song and Ermon (2019) propose to perturb the data using Gaussian noise of various intensities and jointly estimate the score functions, i.e., $s_\theta(\tilde{x}, \sigma)$, for all noise levels. Specifically, given a data distribution $p_{\text{data}}(x)$ and a sequence of noise levels $\{\sigma_i\}_{i=1}^{L}$ with a noise distribution $q_\sigma(\tilde{x} \mid x)$, e.g., $\mathcal{N}(\tilde{x} \mid x, \sigma^2 I)$, the training objective for each noise level $\sigma$ is as follows:

$\ell(\theta; \sigma) = \frac{1}{2}\, \mathbb{E}_{p_{\text{data}}(x)}\, \mathbb{E}_{q_\sigma(\tilde{x} \mid x)} \left[ \left\| s_\theta(\tilde{x}, \sigma) + \frac{\tilde{x} - x}{\sigma^2} \right\|_2^2 \right]$  (1)
After the noise conditional score networks are trained, samples are generated using annealed Langevin dynamics (Song and Ermon, 2019), where samples from each noise level serve as initializations for the Langevin dynamics of the next noise level. For detailed design choices of noise levels and sampling hyperparameters, we refer readers to Song and Ermon (2020).

3.3 Equivariance in Molecular Geometry
Equivariance is ubiquitous in physical systems, e.g., the 3D roto-translation equivariance of molecular conformations or point clouds (Thomas et al., 2018; Weiler et al., 2018; Fuchs et al., 2020; Miller et al., 2020; Simm et al., 2021). Endowing models with such inductive biases is critical for better generalization and successful learning (Köhler et al., 2020). Formally, a function $f$ being equivariant can be represented as follows (strictly speaking, the transformation need not be the same on both sides of the equation, as long as both are representations of the same group; note also that invariance is a special case of equivariance):
$f\left(T_g(x)\right) = T_g\left(f(x)\right)$  (2)

where $T_g$ is a transformation function, e.g., a rotation, indexed by a group element $g$. Intuitively, Eq. 2 says that applying the transformation to the input has the same effect as applying it to the output.
In this paper, we aim to model the score function of $p(R \mid G)$, i.e., $\nabla_{R} \log p(R \mid G)$. As $p(R \mid G)$ is roto-translation invariant with respect to conformations, the score function is roto-translation equivariant with respect to conformations. To pursue a generalizable and elegant approach, we explicitly build such equivariance into the model design.
4 Proposed Method
4.1 Overview
Our approach extends denoising score matching (Vincent, 2011; Song and Ermon, 2019) to molecular conformation generation by estimating the gradient fields of the log density of atomic coordinates, i.e., $\nabla_{R} \log p(R \mid G)$. Directly parameterizing such gradient fields using vanilla Graph Neural Networks (GNNs) (Duvenaud et al., 2015; Kearnes et al., 2016; Kipf and Welling, 2016; Gilmer et al., 2017; Schütt et al., 2017b, a) is problematic, as these GNNs leverage only node, edge, or distance information, so the resulting node representations are roto-translation invariant. However, the gradient fields of atomic coordinates should be roto-translation equivariant (see Section 3.3), i.e., the vector fields should rotate together with the molecular system and be invariant under translation.
To tackle this issue, we assume that the log density of atomic coordinates given the molecular graph can be parameterized, up to a constant, as $f_{G}(d(R))$, where $d(\cdot)$ denotes the function that maps a set of atomic coordinates $R$ to the corresponding set of interatomic distances $\mathbf{d}$, and $f_G$ is a graph neural network that estimates the negative energy of a molecule based on the interatomic distances $\mathbf{d}$ and the graph representation $G$. Such a design is common practice in the existing literature (Schütt et al., 2017a; Hu et al., 2020; Klicpera et al., 2020) and is favored because it ensures several physical invariances, e.g., rotation and translation invariance of energy prediction. We observe that the interatomic distances $\mathbf{d}$ are continuously differentiable w.r.t. the atomic coordinates $R$. Therefore, the two gradient fields, over distances and over coordinates, are connected by the chain rule, and the gradients can be back-propagated from $\mathbf{d}$ to $R$ as follows:

$s_\theta(R)_i = \nabla_{\mathbf{r}_i} \log p_\theta(R \mid G) = \sum_{j \in N(i)} \frac{\partial d_{ij}}{\partial \mathbf{r}_i} \cdot s_\theta(\mathbf{d})_{ij} = \sum_{j \in N(i)} \frac{\mathbf{r}_i - \mathbf{r}_j}{d_{ij}} \cdot s_\theta(\mathbf{d})_{ij}$  (3)

where $s_\theta(\mathbf{d})_{ij} = \partial f_G / \partial d_{ij}$ and $N(i)$ denotes atom $i$'s neighbors in the extended graph. The second equality in Eq. 3 holds because of the chain rule applied to the partial derivative of the composite function $f_G(d(R))$.
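The conversion in Eq. 3 can be made concrete with a small NumPy sketch, in which a dictionary of per-edge scores stands in for the trained distance score network; the check at the end illustrates the roto-translation equivariance discussed below (distances, and hence the edge scores, do not change under a rigid motion):

```python
import numpy as np

def coord_scores(R, edges, dist_scores):
    """Chain-rule conversion of Eq. 3: distribute each edge's distance score
    along the unit vector (r_i - r_j) / d_ij and sum the contributions per atom."""
    s = np.zeros_like(R)
    for (i, j) in edges:
        diff = R[i] - R[j]
        unit = diff / np.linalg.norm(diff)
        s[i] += dist_scores[(i, j)] * unit
        s[j] -= dist_scores[(i, j)] * unit  # opposite direction for atom j
    return s

rng = np.random.default_rng(0)
R = rng.normal(size=(4, 3))                       # toy conformation
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
scores = {e: float(k + 1) for k, e in enumerate(edges)}  # stand-in edge scores
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random orthogonal matrix
t = rng.normal(size=3)                            # random translation
print(np.allclose(coord_scores(R @ Q.T + t, edges, scores),
                  coord_scores(R, edges, scores) @ Q.T))  # True
```

Because the per-edge scores depend only on distances, rotating and translating the input coordinates rotates the output vectors and leaves them otherwise unchanged, exactly the equivariance property claimed in Proposition 1.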
Inspired by the above observations, we take the interatomic pairwise distances as intermediate variables and propose to first estimate the gradient fields of the interatomic distances corresponding to different noise levels, i.e., training a noise conditional score network $s_\theta(\tilde{\mathbf{d}}, \sigma)$ to approximate the distance scores. We then calculate the gradient fields of atomic coordinates, i.e., $s_\theta(R)$, from the distance scores via Eq. 3, without relying on any extra parameters. The score network designed in this way enjoys the property of roto-translation equivariance. Formally, we have:
Proposition 1.
Under the assumption that we parameterize $f_G$ as a function of the interatomic distances $\mathbf{d}$ and the molecular graph representation $G$, the score network defined in Eq. 3 is roto-translation equivariant, i.e., the gradients rotate together with the molecular system and are invariant under translation.
Proof Sketch.
The score network of distances is roto-translation invariant as it depends only on interatomic distances, and the vector $\mathbf{r}_i - \mathbf{r}_j$ rotates together with the molecular system, which endows the score network of atomic coordinates with roto-translation equivariance. See the Supplementary material for a formal proof. ∎
After training the score network, based on the estimated gradient fields of atomic coordinates corresponding to different noise levels, conformations are generated by annealed Langevin dynamics (Song and Ermon, 2019), combining information from all noise levels. The overview of our proposed method is illustrated in Figure 1 and Figure 2. Below we describe the design of the noise conditional score network in Section 4.2, and introduce the generation procedure in Section 4.3. Finally, we show how the designed gradient fields are connected with the classical laws of mechanics in Section 4.4.
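As a self-contained toy illustration of annealed Langevin dynamics, the sketch below samples from a 2D standard Gaussian; the analytic score of the perturbed density stands in for a trained score network, and all hyperparameters (noise range, step size, step counts) are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Perturbing N(0, I) data with N(0, sigma^2 I) noise gives N(0, (1 + sigma^2) I),
# whose score is grad_x log q_sigma(x) = -x / (1 + sigma^2).
def score(x, sigma):
    return -x / (1.0 + sigma ** 2)

def annealed_langevin(sigmas, n_steps=100, eps=0.01, n_samples=2000):
    x = rng.uniform(-8, 8, size=(n_samples, 2))    # arbitrary prior
    for sigma in sigmas:                           # anneal large -> small
        alpha = eps * (sigma / sigmas[-1]) ** 2    # per-level step size
        for _ in range(n_steps):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * z
    return x

samples = annealed_langevin(np.geomspace(5.0, 0.1, num=10))
# Final samples approximate the target N(0, I): mean near 0, std near 1.
print(np.round(samples.mean(axis=0), 2), np.round(samples.std(axis=0), 2))
```

In ConfGF the analytic score is replaced by the learned distance score network composed with the chain rule of Eq. 3, but the annealing loop has the same shape: samples from each noise level initialize the dynamics at the next, smaller level.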
4.2 Noise Conditional Score Networks for Distances
Let $\sigma_1 > \sigma_2 > \cdots > \sigma_L$ be a positive geometric progression with common ratio $\gamma$, i.e., $\sigma_i / \sigma_{i-1} = \gamma < 1$. We define the perturbed conditional distance distribution as $q_\sigma(\tilde{\mathbf{d}} \mid G) = \int p(\mathbf{d} \mid G)\, \mathcal{N}(\tilde{\mathbf{d}} \mid \mathbf{d}, \sigma^2 I)\, \mathrm{d}\mathbf{d}$. In this setting, we aim to learn a conditional score network $s_\theta(\tilde{\mathbf{d}}, \sigma)$ to jointly estimate the scores of all perturbed distance distributions, i.e., $s_\theta(\tilde{\mathbf{d}}, \sigma) \approx \nabla_{\tilde{\mathbf{d}}} \log q_\sigma(\tilde{\mathbf{d}} \mid G)$ for all $\sigma \in \{\sigma_i\}_{i=1}^{L}$. Note that $s_\theta(\tilde{\mathbf{d}}, \sigma) \in \mathbb{R}^{|E|}$, i.e., one scalar per edge; we therefore formulate score estimation as an edge regression problem.
Given a molecular graph $G$ with a set of interatomic distances $\mathbf{d}$ computed from its conformation $R$, we first embed the node attributes and the edge attributes into low-dimensional feature spaces using feed-forward networks:

$\mathbf{h}_v^{0} = \mathrm{MLP}_{\text{node}}\left(Z_v\right), \qquad \mathbf{h}_{e_{uv}} = \mathrm{MLP}_{\text{edge}}\left(e_{uv}, d_{uv}\right)$  (4)
We then adopt a variant of the Graph Isomorphism Network (GIN) (Xu et al., 2019; Hu et al., 2020) to compute node embeddings based on graph structures. At each layer, node embeddings are updated by aggregating messages from neighboring nodes and edges:

$\mathbf{h}_v^{k} = \mathrm{MLP}\left( \mathbf{h}_v^{k-1} + \sum_{u \in N(v)} \mathrm{ReLU}\left( \mathbf{h}_u^{k-1} + \mathbf{h}_{e_{uv}} \right) \right)$  (5)
where $N(v)$ denotes atom $v$'s neighbors. After $K$ rounds of message passing, we compute the final edge embeddings by concatenating the corresponding node embeddings for each edge as follows:

$\mathbf{h}_{e_{uv}}^{\text{out}} = \left[ \mathbf{h}_u^{K};\, \mathbf{h}_v^{K} \right]$  (6)
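A minimal NumPy sketch of the message-passing update in Eq. 5 and the edge read-out in Eq. 6; the random weights stand in for trained parameters, and the single-matrix ReLU "MLP" is a deliberate simplification:

```python
import numpy as np

rng = np.random.default_rng(0)

def gin_forward(node_h, edge_h, edges, weights):
    """K rounds of GIN-style message passing (Eq. 5), then edge embeddings
    obtained by concatenating endpoint embeddings (Eq. 6)."""
    for W in weights:  # one weight matrix per round stands in for an MLP
        agg = {v: h.copy() for v, h in node_h.items()}
        for (u, v) in edges:  # symmetric messages along each undirected edge
            agg[v] += np.maximum(node_h[u] + edge_h[(u, v)], 0.0)
            agg[u] += np.maximum(node_h[v] + edge_h[(u, v)], 0.0)
        node_h = {v: np.maximum(W @ h, 0.0) for v, h in agg.items()}
    return {e: np.concatenate([node_h[e[0]], node_h[e[1]]]) for e in edges}

dim, edges = 8, [(0, 1), (1, 2), (0, 2)]
node_h = {v: rng.normal(size=dim) for v in range(3)}        # Eq. 4 stand-ins
edge_h = {e: rng.normal(size=dim) for e in edges}
weights = [0.1 * rng.normal(size=(dim, dim)) for _ in range(2)]  # K = 2 rounds
out = gin_forward(node_h, edge_h, edges, weights)
print(out[(0, 1)].shape)  # (16,) -- concatenation doubles the dimension
```

A real implementation would use a deep-learning framework with trainable multi-layer MLPs; the point here is only the dataflow: node states absorb neighbor-plus-edge messages each round, and each edge's final embedding is the concatenation of its endpoints' states.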
where $[\cdot;\cdot]$ denotes the vector concatenation operation, and $\mathbf{h}_{e_{uv}}^{\text{out}}$ denotes the final embedding of edge $e_{uv}$. Following the suggestions of Song and Ermon (2020), we parameterize the noise conditional network as $s_\theta(\tilde{\mathbf{d}}, \sigma) = s_\theta(\tilde{\mathbf{d}}) / \sigma$, with:

$s_\theta(\tilde{\mathbf{d}})_{uv} = \mathrm{MLP}\left( \mathbf{h}_{e_{uv}}^{\text{out}} \right)$  (7)
where $s_\theta(\tilde{\mathbf{d}})$ is an unconditional score network whose MLP maps the embedding of each edge to a scalar. Note that $s_\theta(\tilde{\mathbf{d}}, \sigma)$ is roto-translation invariant as we only use interatomic distances. Since $\nabla_{\tilde{\mathbf{d}}} \log q_\sigma(\tilde{\mathbf{d}} \mid \mathbf{d}) = -(\tilde{\mathbf{d}} - \mathbf{d})/\sigma^2$ for the Gaussian noise distribution, the training loss of $s_\theta(\tilde{\mathbf{d}}, \sigma)$ is thus:

$\mathcal{L}\left(\theta; \{\sigma_i\}_{i=1}^{L}\right) = \frac{1}{L} \sum_{i=1}^{L} \lambda(\sigma_i)\, \frac{1}{2}\, \mathbb{E}_{p(\mathbf{d} \mid G)}\, \mathbb{E}_{q_{\sigma_i}(\tilde{\mathbf{d}} \mid \mathbf{d})} \left[ \left\| s_\theta(\tilde{\mathbf{d}}, \sigma_i) + \frac{\tilde{\mathbf{d}} - \mathbf{d}}{\sigma_i^2} \right\|_2^2 \right]$  (8)
where $\lambda(\sigma_i) = \sigma_i^2$ is the coefficient function weighting the losses of different noise levels, following Song and Ermon (2019), and all expectations can be efficiently estimated using Monte Carlo estimation.
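The Monte Carlo estimation of the loss in Eq. 8 can be sketched as follows; the analytic "oracle" score of a toy data distribution (a point mass, so the perturbed density is Gaussian) stands in for the trained network, and all hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, d, sigmas, n_noise=64):
    """Monte Carlo estimate of the loss in Eq. 8 with lambda(sigma) = sigma^2.
    `score_fn(d_tilde, sigma)` stands in for the trained score network."""
    total = 0.0
    for sigma in sigmas:
        noise = sigma * rng.standard_normal((n_noise, d.size))
        d_tilde = d + noise
        target = -noise / sigma ** 2            # = (d - d_tilde) / sigma^2
        pred = score_fn(d_tilde, sigma)
        total += sigma ** 2 * (0.5 * ((pred - target) ** 2).sum(axis=1)).mean()
    return total / len(sigmas)

# Sanity check: the exact score of the perturbed distribution (data is a point
# mass at 0, so the perturbed density is N(0, sigma^2 I)) achieves a lower
# loss than an uninformative score function.
d = np.zeros(5)                                 # toy "distance" vector
oracle = lambda dt, s: -dt / s ** 2             # true perturbed score
zero = lambda dt, s: np.zeros_like(dt)
sigmas = np.geomspace(1.0, 0.1, num=5)
print(dsm_loss(oracle, d, sigmas) < dsm_loss(zero, d, sigmas))  # True
```

The oracle achieves exactly zero loss here, which is the sense in which the objective in Eq. 8 drives the network toward the true perturbed scores.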
4.3 Conformation Generation via Annealed Langevin Dynamics
Given the molecular graph representation $G$ and the well-trained noise conditional score networks, molecular conformations are generated using annealed Langevin dynamics, i.e., by gradually annealing down the noise level from large to small. We start annealed Langevin dynamics by sampling an initial conformation from some fixed prior distribution, e.g., a uniform or Gaussian distribution. Empirically, the choice of the prior distribution is arbitrary as long as the supports of the perturbed distance distributions cover it, and we take the Gaussian distribution as the prior in our case. Then, we update the conformations by sampling from the series of trained noise conditional score networks sequentially. For each noise level $\sigma_i$, starting from the final sample of the previous noise level, we run Langevin dynamics for a fixed number of steps with a gradually decreasing step size. Specifically, at each sampling step, we first obtain the interatomic distances from the current conformation and calculate the score of the atomic coordinates via Eq. 3. The conformation is then updated using the gradient information from the score network. The whole sampling algorithm is presented in Algorithm 1.

4.4 Discussion
From a physical perspective, $s_\theta(\tilde{\mathbf{d}})$ reflects how the distances between connected atoms should change to increase the probability density of the conformation $R$, and can thus be viewed as a network that predicts the repulsion or attraction forces between atoms. Moreover, as shown in Figure 1, $s_\theta(R)_i$ can be viewed as the resultant force acting on atom $i$, computed from Eq. 3, where $(\mathbf{r}_i - \mathbf{r}_j)/d_{ij}$ is a unit vector indicating the direction of each component force and $s_\theta(\tilde{\mathbf{d}})_{ij}$ specifies its strength.
5 Experiments
Following existing works on conformation generation, we evaluate the proposed method using the following three standard tasks:

Conformation Generation tests a model's capacity to learn the distribution of conformations by measuring the quality of generated conformations compared with reference ones (Section 5.1).

Distributions Over Distances evaluates the discrepancy in distance geometry between the generated conformations and the ground-truth conformations (Section 5.2).

Property Prediction, first proposed by Simm and Hernandez-Lobato (2020), estimates chemical properties for molecular graphs based on a set of sampled conformations (Section 5.3). This can be useful in a variety of applications such as drug discovery, where we want to predict molecular properties from microscopic structures.
Below we describe setups that are shared across the tasks. Additional setups are provided in the taskspecific sections.
Data Following Xu et al. (2021), we use the GEOM-QM9 and GEOM-Drugs (Axelrod and Gomez-Bombarelli, 2020) datasets for the conformation generation task. We randomly draw 40,000 molecules and select the 5 most likely conformations (sorted by energy) for each molecule, and draw 200 molecules from the remaining data, resulting in a training set with 200,000 conformations and test sets with 22,408 and 14,324 conformations for GEOM-QM9 and GEOM-Drugs respectively. We evaluate the distance modeling task on the ISO17 dataset (Simm and Hernandez-Lobato, 2020), using the default split of Simm and Hernandez-Lobato (2020): a training set with 167 molecules and a test set with 30 molecules. For the property prediction task, we randomly draw 30 molecules from the GEOM-QM9 dataset with the training molecules excluded.
Baselines We compare our approach with four state-of-the-art methods for conformation generation. Specifically, CVGAE (Mansimov et al., 2019) is built upon a conditional variational graph auto-encoder, which directly generates atomic coordinates based on molecular graphs. GraphDG (Simm and Hernandez-Lobato, 2020) and CGCF (Xu et al., 2021) are distance-based methods, which rely on post-processing algorithms to convert distances into final conformations. The two methods differ in how the neural networks are designed, i.e., VAE vs. continuous flow. RDKit (Riniker and Landrum, 2015) is a classical approach based on Euclidean distance geometry. Results of all baselines across experiments are obtained by running their official code unless stated otherwise.

Model Configuration
ConfGF is implemented in PyTorch (Paszke et al., 2017). The GIN uses a fixed number of message-passing layers, and the hidden dimension is set to 256 across all modules. For training, we use an exponentially decaying learning rate starting from 0.001 with a decay rate of 0.95. The model is optimized with the Adam (Kingma and Ba, 2014) optimizer on a single Tesla V100 GPU. All hyperparameters related to noise levels as well as annealed Langevin dynamics are selected according to Song and Ermon (2020). See the Supplementary material for full details.

5.1 Conformation Generation
Setup The first task is to generate conformations with high diversity and accuracy based on molecular graphs. For each molecular graph in the test set, we sample twice as many conformations as the reference ones from each model. To measure the discrepancy between two conformations, following existing work (Hawkins, 2017; Li et al., 2021), we adopt the Root of Mean Squared Deviations (RMSD) of the heavy atoms between aligned conformations:

$\mathrm{RMSD}(R, \hat{R}) = \min_{\Phi} \left( \frac{1}{n} \sum_{i=1}^{n} \left\| \Phi(R_i) - \hat{R}_i \right\|^2 \right)^{1/2}$  (9)
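The alignment $\Phi$ in Eq. 9 can be computed with the Kabsch algorithm; the sketch below is our own illustrative implementation, not the paper's evaluation code:

```python
import numpy as np

def aligned_rmsd(R_ref, R_gen):
    """RMSD after optimal rigid alignment (Kabsch): center both point sets,
    find the proper rotation minimizing the residual, then apply Eq. 9."""
    P = R_ref - R_ref.mean(axis=0)            # centered reference
    Q = R_gen - R_gen.mean(axis=0)            # centered candidate
    U, _, Vt = np.linalg.svd(Q.T @ P)         # SVD of the cross-covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # keep det(M) = +1 (no reflection)
    M = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # optimal rotation
    return float(np.sqrt(np.mean(np.sum((P - Q @ M.T) ** 2, axis=1))))

# A conformation and a rigidly transformed copy of it have RMSD ~ 0.
rng = np.random.default_rng(0)
R = rng.normal(size=(6, 3))
Q_rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Q_rot *= np.linalg.det(Q_rot)                 # force a proper rotation
moved = R @ Q_rot.T + rng.normal(size=3)
print(round(aligned_rmsd(R, moved), 6))  # 0.0
```

Centering removes the translation and the determinant correction prevents the SVD from returning an improper (reflecting) rotation, which would otherwise understate the RMSD.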
where $n$ is the number of heavy atoms and $\Phi$ is an alignment function that aligns two conformations by rotation and translation. Following Xu et al. (2021), we report the Coverage (COV) and Matching (MAT) scores to measure diversity and accuracy respectively:

$\mathrm{COV}(S_g, S_r) = \frac{1}{|S_r|} \left| \left\{ R \in S_r \,\middle|\, \mathrm{RMSD}(R, \hat{R}) < \delta,\ \exists\, \hat{R} \in S_g \right\} \right|$  (10)

$\mathrm{MAT}(S_g, S_r) = \frac{1}{|S_r|} \sum_{R \in S_r} \min_{\hat{R} \in S_g} \mathrm{RMSD}(R, \hat{R})$

where $S_g$ and $S_r$ are the generated and reference conformation sets respectively, and $\delta$ is a given RMSD threshold. In general, a higher COV score indicates better diversity, while a lower MAT score indicates better accuracy.
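Given a pairwise RMSD matrix between the reference and generated sets, both metrics in Eq. 10 reduce to row-wise minima; a sketch with toy numbers:

```python
import numpy as np

def cov_mat(rmsd, delta):
    """COV and MAT from a pairwise RMSD matrix of shape (|S_r|, |S_g|):
    rows are reference conformations, columns are generated ones."""
    best = rmsd.min(axis=1)              # best generated match per reference
    cov = float((best < delta).mean())   # fraction covered within delta
    mat = float(best.mean())             # mean best-match RMSD
    return cov, mat

# Toy matrix: 3 reference conformations vs 2 generated ones (values in Å)
rmsd = np.array([[0.2, 0.9],
                 [1.4, 0.6],
                 [2.0, 1.8]])
cov, mat = cov_mat(rmsd, delta=1.0)
print(round(cov, 3), round(mat, 3))  # 0.667 0.867
```

The third reference conformation has no generated match within the threshold, so COV is 2/3 while MAT averages the three best-match RMSDs.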
Results We evaluate the mean and median COV and MAT scores on both the GEOM-QM9 and GEOM-Drugs datasets for all baselines, where the mean and median are taken over all molecules in the test set. Results in Table 1 show that ConfGF achieves state-of-the-art performance on all four metrics. In particular, as a score-based model that directly estimates the gradients of the log density of atomic coordinates, our approach consistently outperforms the other neural models by a large margin. By contrast, the performance of GraphDG and CGCF is inferior to ours due to their two-stage generation procedure, which incurs extra errors when converting distances to conformations. CVGAE struggles on both datasets as it fails to model the roto-translation equivariance of conformations.
When compared with the rule-based method, our approach outperforms RDKit on 7 out of 8 metrics without relying on any post-processing, e.g., force fields, as done in previous work (Mansimov et al., 2019; Xu et al., 2021), making it appealing in real-world applications. Figure 3 visualizes several conformations generated by different methods that are best-aligned with the reference ones, demonstrating ConfGF's strong ability to generate realistic and diverse conformations.
Table 1: COV and MAT scores of different methods on the GEOM-QM9 and GEOM-Drugs datasets.

Dataset  Method   COV Mean (%)  COV Median (%)  MAT Mean (Å)  MAT Median (Å)
QM9      RDKit    83.26         90.78           0.3447        0.2935
         CVGAE    0.09          0.00            1.6713        1.6088
         GraphDG  73.33         84.21           0.4245        0.3973
         CGCF     78.05         82.48           0.4219        0.3900
         ConfGF   88.49         94.13           0.2673        0.2685
Drugs    RDKit    60.91         65.70           1.2026        1.1252
         CVGAE    0.00          0.00            3.0702        2.9937
         GraphDG  8.27          0.00            1.9722        1.9845
         CGCF     53.96         57.06           1.2487        1.2247
         ConfGF   62.15         70.93           1.1629        1.1596
Table 2: MMD between generated and ground-truth distance distributions on ISO17.

Method   Single Mean  Single Median  Pair Mean  Pair Median  All Mean  All Median
RDKit    3.4513       3.1602         3.8452     3.6287       4.0866    3.7519
CVGAE    4.1789       4.1762         4.9184     5.1856       5.9747    5.9928
GraphDG  0.7645       0.2346         0.8920     0.3287       1.1949    0.5485
CGCF     0.4490       0.1786         0.5509     0.2734       0.8703    0.4447
ConfGF   0.3684       0.2358         0.4582     0.3206       0.6091    0.4240
5.2 Distributions Over Distances
Setup The second task is to match the generated distributions over distances with the ground-truth distributions. As ConfGF is not designed to model molecular distance geometry directly, we treat the interatomic distances calculated from generated conformations as the generated distance distributions. For each molecular graph in the test set, we sample 1000 conformations across all methods. Following previous work (Simm and Hernandez-Lobato, 2020; Xu et al., 2021), we report the discrepancy between the generated and ground-truth distributions by calculating the maximum mean discrepancy (MMD) (Gretton et al., 2012) with a Gaussian kernel. Specifically, for each graph $G$ in the test set, we ignore edges connected to H atoms and evaluate the distributions of all distances (All), pairwise distances (Pair), and individual distances (Single).
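The MMD computation can be sketched as follows (a biased V-statistic estimator of the squared MMD with a Gaussian kernel; the bandwidth, sample sizes, and toy distributions are illustrative):

```python
import numpy as np

def mmd_gaussian(x, y, bandwidth=1.0):
    """Squared MMD between 1-D samples x and y with a Gaussian kernel,
    following Gretton et al. (2012)."""
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return float(k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean())

rng = np.random.default_rng(0)
# Two "distance" samples drawn from the same distribution vs. a shifted one
same = mmd_gaussian(rng.normal(1.5, 0.1, 500), rng.normal(1.5, 0.1, 500))
diff = mmd_gaussian(rng.normal(1.5, 0.1, 500), rng.normal(2.5, 0.1, 500))
print(same < diff)  # True
```

Matching distributions drive the statistic toward zero, while a systematic shift between generated and ground-truth distances inflates it, which is why lower values in Table 2 indicate better distance modeling.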
Results As shown in Table 2, CVGAE performs worst due to its limited capacity. Despite its superior performance on the conformation generation task, RDKit falls far short of the other neural models here. This is because RDKit incorporates an additional molecular force field (Rappé et al., 1992) to find the equilibrium states of conformations after the initial coordinates are generated. By contrast, although not tailored for molecular distance geometry, ConfGF still achieves competitive results on all evaluation metrics. In particular, ConfGF outperforms the state-of-the-art method CGCF on 4 out of 6 metrics, and achieves comparable results on the other two.
5.3 Property Prediction
Setup The third task is to predict the ensemble properties (Axelrod and Gomez-Bombarelli, 2020) of molecular graphs. Specifically, the ensemble property of a molecular graph is calculated by aggregating the properties of its different conformations. For each molecular graph in the test set, we first calculate the energy, HOMO, and LUMO of all the ground-truth conformations using the quantum chemistry package Psi4 (Smith et al., 2020). Then, we calculate ensemble properties, including the average energy $\bar{E}$, lowest energy $E_{\min}$, average HOMO-LUMO gap $\bar{\Delta\epsilon}$, minimum gap $\Delta\epsilon_{\min}$, and maximum gap $\Delta\epsilon_{\max}$, based on the conformational properties of each molecular graph. Next, we use ConfGF and the other baselines to generate 50 conformations for each molecular graph and calculate the above ensemble properties in the same way. We use the mean absolute error (MAE) to measure the accuracy of ensemble property prediction. We exclude CVGAE from this analysis due to its poor sample quality, following Simm and Hernandez-Lobato (2020).
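The aggregation step can be sketched as follows, with toy per-conformer numbers (illustrative values, not real chemistry):

```python
import numpy as np

def ensemble_props(E, gap):
    """Reduce per-conformer energies E and HOMO-LUMO gaps to the five
    ensemble properties described above: mean/min energy, mean/min/max gap."""
    return np.array([E.mean(), E.min(), gap.mean(), gap.min(), gap.max()])

# Ground-truth conformer properties vs. properties of generated conformers
truth = ensemble_props(np.array([1.0, 1.2, 0.9]), np.array([5.0, 5.4, 5.2]))
pred = ensemble_props(np.array([1.1, 1.0, 1.3]), np.array([5.1, 5.5, 5.0]))
mae = float(np.abs(truth - pred).mean())
print(round(mae, 4))  # 0.06
```

Note that min/max aggregates are sensitive to single outlier conformations, which is consistent with the outlier effect observed in the results below.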
Results As shown in Table 3, ConfGF outperforms all the machine-learning-based methods by a large margin and achieves better accuracy than RDKit on the lowest energy $E_{\min}$ and the maximum HOMO-LUMO gap $\Delta\epsilon_{\max}$. Closer inspection of the generated samples shows that, although ConfGF has better generation quality in terms of diversity and accuracy, it sometimes generates a few outliers which have a negative impact on the prediction accuracy. We further present the median of absolute errors in Table 4. When measured by the median absolute error, ConfGF has the best accuracy on the average energy $\bar{E}$, and its accuracy on the other properties also improves, which reveals the impact of outliers.

Table 3: Mean absolute errors of predicted ensemble properties.

Method   $\bar{E}$  $E_{\min}$  $\bar{\Delta\epsilon}$  $\Delta\epsilon_{\min}$  $\Delta\epsilon_{\max}$
RDKit    0.9233     0.6585      0.3698                  0.8021                   0.2359
GraphDG  9.1027     0.8882      1.7973                  4.1743                   0.4776
CGCF     28.9661    2.8410      2.8356                  10.6361                  0.5954
ConfGF   2.7886     0.1765      0.4688                  2.1843                   0.1433
Table 4: Median absolute errors of predicted ensemble properties.

Method   $\bar{E}$  $E_{\min}$  $\bar{\Delta\epsilon}$  $\Delta\epsilon_{\min}$  $\Delta\epsilon_{\max}$
RDKit    0.8914     0.6629      0.2947                  0.5196                   0.1617
ConfGF   0.5328     0.1145      0.3207                  0.7365                   0.1337
Table 5: Ablation study: COV and MAT scores of ConfGF and ConfGF-Dist.

Dataset  Method       COV Mean (%)  COV Median (%)  MAT Mean (Å)  MAT Median (Å)
QM9      CGCF         78.05         82.48           0.4219        0.3900
         ConfGF-Dist  81.94         85.80           0.3861        0.3571
         ConfGF       88.49         94.13           0.2673        0.2685
Drugs    CGCF         53.96         57.06           1.2487        1.2247
         ConfGF-Dist  53.40         52.58           1.2493        1.2366
         ConfGF       62.15         70.93           1.1629        1.1596
5.4 Ablation Study
So far, we have justified the effectiveness of the proposed ConfGF on multiple tasks. However, it remains unclear whether the proposed strategy, i.e., directly estimating the gradient fields of the log density of atomic coordinates, contributes to the superior performance of ConfGF. To gain insight into the working behavior of ConfGF, we conduct an ablation study in this section.
We adopt a variant of ConfGF, called ConfGF-Dist, which works in a way similar to distance-based methods (Simm and Hernandez-Lobato, 2020; Xu et al., 2021). Specifically, ConfGF-Dist bypasses the chain rule and directly generates interatomic distances via annealed Langevin dynamics using the estimated gradient fields of interatomic distances (Section 4.2), after which we convert them to conformations by the same post-processing algorithm as used in Xu et al. (2021). Following the setup of Section 5.1, we evaluate the two approaches on the conformation generation task and summarize the results in Table 5. We observe that ConfGF-Dist performs significantly worse than ConfGF on both datasets, and performs on par with the state-of-the-art method CGCF. The results indicate that the two-stage generation procedure of distance-based methods does harm the quality of conformations, and that our strategy effectively addresses this issue, leading to much more accurate and diverse conformations.
6 Conclusion and Future Work
In this paper, we propose a novel principle for molecular conformation generation called ConfGF, where samples are generated using Langevin dynamics within a single stage, guided by physically inspired gradient fields of the log density of atomic coordinates. We develop a novel algorithm to effectively estimate these gradients while preserving their roto-translation equivariance. Comprehensive experiments over multiple tasks verify that ConfGF outperforms previous state-of-the-art baselines by a significant margin. Our future work will include extending the ConfGF approach to 3D molecular design tasks and many-body particle systems.
References
 GEOM: energyannotated molecular conformations for property prediction and molecular generation. arXiv preprint arXiv:2006.05531. Cited by: §1, §5.3, §5.
 Exploiting the potential energy landscape to sample free energy. Wiley Interdisciplinary Reviews: Computational Molecular Science 5 (3), pp. 273–289. Cited by: §2.
 Distance geometry in molecular modeling. ChemInform 25. Cited by: §2.
 Role of molecular dynamics and related methods in drug discovery. Journal of medicinal chemistry 59 (9), pp. 4035–4061. Cited by: §1.
 Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pp. 2224–2232. Cited by: §4.1.
 Understanding molecular simulation: from algorithms to applications. Cited by: §1.
 SE(3)transformers: 3d rototranslation equivariant attention networks. NeurIPS. Cited by: §3.3.
 Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine LearningVolume 70, pp. 1263–1272. Cited by: §1, §4.1.
 TorsionNet: a reinforcement learning approach to sequential conformer search. ArXiv abs/2006.07078. Cited by: §2.
 A kernel twosample test. The Journal of Machine Learning Research 13 (1), pp. 723–773. Cited by: §5.2.
 Merck molecular force field. v. extension of mmff94 using experimental data, additional computational data, and empirical rules. Journal of Computational Chemistry 17 (56), pp. 616–641. Cited by: §1.
 Conformation generation: the state of the art. Journal of Chemical Information and Modeling 57 (8), pp. 1747–1756. Cited by: §5.1.
 Strategies for pretraining graph neural networks. In International Conference on Learning Representations, Cited by: §1, §4.1, §4.2.
 ForceNet: a graph neural network for large-scale quantum chemistry simulation. Cited by: §2.
 Junction tree variational autoencoder for molecular graph generation. arXiv preprint arXiv:1802.04364. Cited by: §1.
 A sobering assessment of small-molecule force field methods for low energy conformer predictions. International Journal of Quantum Chemistry 118. Cited by: §1.
 Molecular graph convolutions: moving beyond fingerprints. Journal of Computer-Aided Molecular Design 30 (8), pp. 595–608. Cited by: §4.1.
 Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, Cited by: §5.
 Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §4.1.
 Directional message passing for molecular graphs. In International Conference on Learning Representations, Cited by: §4.1.
 Equivariant flows: exact likelihood generative learning for symmetric densities. In Proceedings of the 37th International Conference on Machine Learning, Cited by: §3.3.
 Conformation-guided molecular representation with Hamiltonian neural networks. In International Conference on Learning Representations, Cited by: §5.1.
 Euclidean distance geometry and applications. SIAM review 56 (1), pp. 3–69. Cited by: §1, §2.
 Molecular geometry prediction using a deep generative graph neural network. arXiv preprint arXiv:1904.00314. Cited by: §1, §2, §5.1, §5.
 Autoregressive score matching. NeurIPS. Cited by: §3.2.
 Relevance of rotationally equivariant convolutions for predicting molecular properties. arXiv preprint arXiv:2008.08461. Cited by: §3.3.
 Density-functional theory of atoms and molecules. Cited by: §1, §2.
 Automatic differentiation in PyTorch. In NIPS Workshop, Cited by: §5.
 UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. Journal of the American Chemical Society 114 (25), pp. 10024–10035. Cited by: §1, §5.2.
 Better informed distance geometry: using what we know to improve conformation generation. Journal of Chemical Information and Modeling 55 (12), pp. 2562–2574. Cited by: §5.
 SchNet: a continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems, pp. 991–1001. Cited by: §2, §4.1, §4.1.
 Quantum-chemical insights from deep tensor neural networks. Nature Communications 8, pp. 13890. Cited by: §4.1.
 GraphAF: a flow-based autoregressive model for molecular graph generation. arXiv preprint arXiv:2001.09382. Cited by: §1.
 Computational ligand-based rational design: role of conformational sampling and force fields in model development. MedChemComm 2 (5), pp. 356–370. Cited by: §2.
 A generative model for molecular distance geometry. In Proceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh (Eds.), Vol. 119, pp. 8949–8958. Cited by: §1, §1, §1, §2, §3.1, 3rd item, §5.2, §5.3, §5.4, §5, §5.
 Symmetry-aware actor-critic for 3D molecular design. In International Conference on Learning Representations, Cited by: §3.3.
 Psi4 1.4: open-source software for high-throughput quantum chemistry. The Journal of Chemical Physics. Cited by: §5.3.
 Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, Vol. 32, pp. 11918–11930. Cited by: §3.2, §3.2, §3.2, §4.1, §4.1, §4.2.
 Improved techniques for training score-based generative models. NeurIPS. Cited by: §3.2, §3.2, §4.2, §5.
 Sliced score matching: a scalable approach to density and score estimation. In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, R. P. Adams and V. Gogate (Eds.), Vol. 115, pp. 574–584. Cited by: §3.2.
 Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. arXiv. Cited by: §3.3.
 A connection between score matching and denoising autoencoders. Neural Computation. Cited by: §4.1.
 3D steerable CNNs: learning rotationally equivariant features in volumetric data. In NeurIPS, Cited by: §3.3.
 Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning, ICML'11, pp. 681–688. Cited by: §1, §3.2.
 How powerful are graph neural networks?. In International Conference on Learning Representations, Cited by: §4.2.
 Learning neural generative dynamics for molecular conformation generation. In International Conference on Learning Representations, Cited by: §1, §1, §2, §3.1, §5.1, §5.1, §5.2, §5.4, Table 1, Table 2, §5, §5.
 Graph convolutional policy network for goaldirected molecular graph generation. In Advances in neural information processing systems, pp. 6410–6421. Cited by: §1.
 Deep Potential Molecular Dynamics: A Scalable Model with the Accuracy of Quantum Mechanics. Physical Review Letters 120 (14), pp. 143001. Cited by: §2.
Appendix A Proof of Proposition 1
Let $R = (r_1, \dots, r_n)$ denote a molecular conformation, where $r_i \in \mathbb{R}^3$ is the coordinate of the $i$-th atom. Let $T_t$ denote any 3D translation function where $T_t(r_i) = r_i + t$, and let $T_\rho$ denote any 3D rotation function whose rotation matrix representation is $Q \in \mathbb{R}^{3 \times 3}$, i.e., $T_\rho(r_i) = Q r_i$. In our context, we say a score function $s_\theta(R)$ of atomic coordinates is roto-translation equivariant if it satisfies:
(11) $\quad s_\theta\big(T_\rho(T_t(R))\big)_i = Q\, s_\theta(R)_i, \quad \forall i.$
Intuitively, the above equation says that applying any translation to the input has no effect on the output, and applying any rotation to the input has the same effect as applying it to the output, i.e., the gradients rotate together with the molecular system and are invariant under translation.
Proof.
Denote $\hat{R} = T_\rho(T_t(R))$. According to the definitions of the translation and rotation functions, we have $\hat{r}_i = Q(r_i + t)$. According to Eq. 3, we have
(12) $\quad s_\theta(\hat{R})_i = \sum_{j \in N(i)} \frac{\partial \hat{d}_{ij}}{\partial \hat{r}_i}\, s_\theta(\hat{d})_{ij} = \sum_{j \in N(i)} \frac{\hat{r}_i - \hat{r}_j}{\hat{d}_{ij}}\, s_\theta(\hat{d})_{ij} = Q \sum_{j \in N(i)} \frac{r_i - r_j}{d_{ij}}\, s_\theta(d)_{ij} = Q\, s_\theta(R)_i.$
Here $\hat{d}_{ij} = d_{ij}$ and $s_\theta(\hat{d}) = s_\theta(d)$ because rotation and translation do not change interatomic distances. Together with Eq. 11 and Eq. 12, we conclude that the score network defined in Eq. 3 is roto-translation equivariant. ∎
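The equivariance established above can also be checked numerically. The sketch below plugs a toy scalar score over distances into the chain-rule construction of Eq. 12 and verifies that the coordinate score rotates with the molecule and ignores translations; `dist_score` is a hypothetical stand-in for a learned distance score, used for illustration only:

```python
import numpy as np

def coord_score(R, dist_score):
    """Chain-rule score of coordinates from a score over pairwise distances:
    s(R)_i = sum_j (r_i - r_j) / d_ij * s(d)_ij  (cf. Eq. 12)."""
    n = R.shape[0]
    S = np.zeros_like(R)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = R[i] - R[j]
            d = np.linalg.norm(diff)
            S[i] += diff / d * dist_score(d)
    return S

# Toy distance score (hypothetical, any scalar function of distance works).
dist_score = lambda d: np.sin(d)

rng = np.random.default_rng(0)
R = rng.normal(size=(5, 3))                    # random conformation
t = rng.normal(size=3)                         # random translation
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
R_hat = (R + t) @ Q.T                          # roto-translated conformation

# Equivariance: the score rotates with the molecule and is invariant
# under translation, i.e. s(R_hat) = s(R) Q^T row-wise.
assert np.allclose(coord_score(R_hat, dist_score),
                   coord_score(R, dist_score) @ Q.T)
```

The check relies only on the orthogonality of $Q$, mirroring the fact that the proof uses nothing beyond the invariance of interatomic distances.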
Appendix B Additional Hyperparameters
We summarize additional hyperparameters of our ConfGF in Table 6, including the biggest noise level $\sigma_1$, the smallest noise level $\sigma_L$, the number of noise levels $L$, the number of sampling steps per noise level $T$, the smallest step size $\epsilon$, the batch size, and the number of training epochs.
Dataset  $\sigma_1$  $\sigma_L$  $L$  $T$  $\epsilon$  Batch size  Training epochs
GEOM-QM9  10  0.01  50  100  2.4e-6  128  200
GEOM-Drugs  10  0.01  50  100  2.4e-6  128  200
ISO17  3  0.1  30  100  2.0e-4  128  100
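As a sketch of how the entries of Table 6 fit together: score-based models in this family typically space the $L$ noise levels geometrically between $\sigma_1$ and $\sigma_L$ and anneal the Langevin step size as $\alpha_l = \epsilon \cdot \sigma_l^2 / \sigma_L^2$ (following Song and Ermon, 2020). The geometric spacing is an assumption here, not stated in the table; the values are the GEOM-QM9 row:

```python
import numpy as np

# Assumed geometric schedule: L = 50 noise levels from sigma_1 = 10
# down to sigma_L = 0.01 (GEOM-QM9 row of Table 6).
sigmas = np.geomspace(10.0, 0.01, 50)

# Per-level Langevin step sizes, annealed so the final level uses the
# smallest step size eps from Table 6.
eps = 2.4e-6
alphas = eps * sigmas ** 2 / sigmas[-1] ** 2
```

Under this schedule the step size shrinks with the noise level, so sampling takes large moves early and fine adjustments late.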
Appendix C Additional Experiments
We present more results of the Coverage (COV) score at different thresholds $\delta$ for the GEOM-QM9 and GEOM-Drugs datasets in Tables 7 and 8, respectively. Results in Tables 7 and 8 indicate that the proposed ConfGF consistently outperforms previous state-of-the-art baselines, including GraphDG and CGCF. In addition, we also report the Mismatch (MIS) score, defined as follows:
(13) $\quad \mathrm{MIS}\big(S_g(G), S_r(G)\big) = \frac{1}{|S_g(G)|}\,\Big|\big\{ R \in S_g(G) \;\big|\; \mathrm{RMSD}(R, R') > \delta,\ \forall R' \in S_r(G) \big\}\Big|$
where $\delta$ is the threshold, and $S_g(G)$ and $S_r(G)$ denote the sets of generated and reference conformations of a molecular graph $G$. The MIS score measures the fraction of generated conformations that are not matched by any ground-truth conformation in the reference set given a threshold $\delta$. A lower MIS score indicates fewer invalid samples and better generation quality. As shown in Tables 7 and 8, the MIS scores of ConfGF are consistently lower than those of the competing baselines, demonstrating that ConfGF generates fewer invalid conformations than existing models.
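Given a matrix of RMSDs between generated and reference conformations (computed elsewhere, e.g. with an alignment tool such as RDKit), both COV and MIS reduce to simple reductions over that matrix. A minimal sketch; the matrix layout and names are assumptions:

```python
import numpy as np

def cov_and_mis(rmsd, delta):
    """COV/MIS at threshold delta (in angstroms) from an |S_g| x |S_r|
    matrix of RMSDs between generated (rows) and reference (columns)
    conformations."""
    # COV: fraction of reference conformations matched by at least one
    # generated conformation within delta.
    cov = float(np.mean(rmsd.min(axis=0) <= delta))
    # MIS: fraction of generated conformations farther than delta from
    # every reference conformation (cf. Eq. 13).
    mis = float(np.mean(rmsd.min(axis=1) > delta))
    return cov, mis

# Toy example: 3 generated vs. 2 reference conformations.
rmsd = np.array([[0.1, 2.0],
                 [1.5, 0.3],
                 [3.0, 3.0]])
cov, mis = cov_and_mis(rmsd, delta=0.5)  # cov = 1.0, mis = 1/3
```

Both references are covered within 0.5 Å, while the third generated conformation matches neither reference, illustrating how COV and MIS capture complementary failure modes.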
QM9  Mean COV (%)  Median COV (%)  Mean MIS (%)  Median MIS (%)
$\delta$ (Å)  GraphDG  CGCF  ConfGF  GraphDG  CGCF  ConfGF  GraphDG  CGCF  ConfGF  GraphDG  CGCF  ConfGF
0.10  1.03  0.14  17.59  0.00  0.00  10.29  99.70  99.93  92.30  100.00  100.00  96.23 
0.20  13.05  10.97  43.60  2.99  3.95  37.92  96.12  96.91  81.67  99.04  99.04  85.65 
0.30  32.26  31.02  61.94  18.81  22.94  59.66  87.15  89.21  73.06  94.44  94.44  77.27 
0.40  53.53  53.65  75.45  50.00  52.63  80.64  72.60  78.35  65.38  82.63  82.63  70.00 
0.50  73.33  78.05  88.49  84.21  82.48  94.13  56.09  63.51  53.56  64.66  64.66  56.59 
0.60  88.24  94.85  97.71  98.83  98.79  100.00  40.36  44.82  34.78  43.73  43.73  35.86 
0.70  95.93  99.05  99.52  100.00  100.00  100.00  27.93  29.64  21.00  23.38  23.38  15.64 
0.80  98.70  99.47  99.68  100.00  100.00  100.00  19.15  20.98  12.86  10.72  10.72  5.23 
0.90  99.33  99.50  99.77  100.00  100.00  100.00  12.76  16.74  8.98  3.65  3.65  1.53 
1.00  99.48  99.50  99.86  100.00  100.00  100.00  8.00  14.19  6.76  0.47  0.47  0.36 
1.10  99.51  99.51  99.91  100.00  100.00  100.00  4.99  12.26  5.57  0.00  0.00  0.00 
1.20  99.51  99.51  99.94  100.00  100.00  100.00  2.95  9.68  3.48  0.00  0.00  0.00 
1.30  99.51  99.51  99.96  100.00  100.00  100.00  1.65  7.48  1.91  0.00  0.00  0.00 
1.40  99.51  99.51  99.96  100.00  100.00  100.00  0.84  5.94  1.05  0.00  0.00  0.00 
1.50  99.52  99.51  99.97  100.00  100.00  100.00  0.41  4.94  0.70  0.00  0.00  0.00 
Drugs  Mean COV (%)  Median COV (%)  Mean MIS (%)  Median MIS (%)
$\delta$ (Å)  GraphDG  CGCF  ConfGF  GraphDG  CGCF  ConfGF  GraphDG  CGCF  ConfGF  GraphDG  CGCF  ConfGF
0.25  0.00  0.06  0.17  0.00  0.00  0.00  100.00  99.99  99.97  100.00  100.00  100.00 
0.50  0.26  0.80  1.15  0.00  0.00  0.00  99.95  99.80  99.52  100.00  100.00  100.00 
0.75  0.75  5.81  9.15  0.00  0.00  0.50  99.69  97.86  96.94  100.00  100.00  99.75 
1.00  2.39  24.67  30.60  0.00  11.81  18.89  99.14  90.82  89.63  100.00  96.50  95.58 
1.25  8.27  53.96  62.15  0.00  57.06  70.93  97.92  78.32  76.58  100.00  86.28  84.48 
1.50  19.96  79.37  86.62  4.00  92.46  98.79  94.40  63.80  60.06  99.14  66.39  63.81 
1.75  36.86  91.47  96.53  26.58  100.00  100.00  87.68  49.72  43.63  95.83  47.09  41.72 
2.00  55.79  96.73  98.62  55.26  100.00  100.00  76.99  37.53  29.80  87.35  30.90  22.44 
2.25  71.43  99.05  99.83  80.00  100.00  100.00  61.76  27.30  18.68  69.74  20.07  10.93 
2.50  83.53  99.47  100.00  95.45  100.00  100.00  44.32  18.97  11.09  42.96  12.33  3.31 
2.75  91.09  99.60  100.00  100.00  100.00  100.00  27.92  12.52  6.32  16.67  6.82  0.74 
3.00  95.00  99.96  100.00  100.00  100.00  100.00  15.97  7.67  3.36  2.46  3.32  0.00 
Appendix D More Generated Samples
We present more visualizations of conformations generated by our ConfGF in Figure 4, including models trained on the GEOM-QM9 and GEOM-Drugs datasets.