1 Introduction
Recently there has been great interest in deep learning based on graph structures
[1, 2, 3] and point clouds [4, 5, 6]. A prominent application example is that of molecules, for which both inference based on the chemical compound, i.e., the molecular graph structure [7, 8, 9], and based on the geometry, i.e. the positions of atoms in 3D space [10, 11, 44, 13] are active areas of research.A particularly interesting branch of machine learning for molecules is the reverse problem of generating molecular structures, as it opens the door for designing molecules, e.g., obtain new materials
[14, 15, 16, 17], design or discover pharmacological molecules such as inhibitors or antibodies [18, 19], optimize biotechnological processes [20]. While this area of research has exploded in the past few years, the vast body of work has been done on the generation of new molecular compounds, i.e. the search for new molecular graphs, based on string encodings of that graph structure or other representations [21, 22]. On the other hand, exploring the geometry space of the individual chemical compound is equally important, as the molecular geometries and their probabilities determine all equilibrium properties, such as binding affinity, solubility etc. Sampling different geometric structures is, however, still largely left to molecular dynamics (MD) simulation that suffers from the rare event sampling problem, although recently machine learning has been used to speed up MD simulation
[23, 24, 25, 26, 27] or to perform sampling of the equilibrium distribution directly, without MD [28]. All of these techniques only sample one single chemical compound in geometry space.Here we explore—to our best knowledge for the first time in depth—the simultaneous generation of chemical compounds and geometries. The only related work we are aware of [29, 30]
demonstrates the generation of chemical compounds, placing atom by atom with an autoregressive model. It was shown that the model can recover compounds contained in the QM9 database of small molecules
[31, 32] when trained on a subset, but different configurations of the same molecule were not analyzed.While autoregressive models seem to work well in the case of small ( heavy atoms) molecules like the ones in the QM9 database, they can be tricky for larger structures as the probability to completely form complex structures, such as rings, decays with the number of involved steps.
To avoid this limitation, in our method we investigate in one shot models for point clouds which have no absolute orientation, i.e., the point cloud structure is considered to be the same independent of its rotation, translation, and of the permutation of points.
A natural representation, which is independent of rotation and translation is the Euclidean distance matrix, which is the matrix of all squared pairwise Euclidean distances. Furthermore, Euclidean distance matrices are useful determinants of valid molecular structures.
While a symmetric and nonnegative matrix with a zero diagonal can easily be parameterized by, e.g., a neural network, it is not immediately clear that this matrix indeed belongs to a set of points in Euclidean space and even then, that this space has the right dimension.
Here we develop a new method to parameterize and generate valid Euclidean distance matrices without placing coordinates directly, hereby taking away a lot of the ambiguity.
We furthermore propose a Wasserstein GAN architecture for learning distributions of pointclouds, e.g., molecular structures invariant to rotation, translation, and permutation. To this end the data distribution as well as the generator distribution are represented in terms of Euclidean distance matrices.
In summary, our main contributions are as follows:

We introduce a new method of training neural networks so that their output are Euclidean distance matrices with a predefined embedding dimension.

We propose a GAN architecture, which can learn distributions of Euclidean distance matrices, while treating the structures described by the distance matrices as set, i.e., invariant under their permutations.

We apply the proposed architecture to a set of isomers contained in the QM9 database and show that it can recover parts of the training set as well as generalize out of it.
2 Generating Euclidean distance matrices
We describe a way to generate Euclidean distance matrices without placing coordinates in Cartesian space. This means in particular that the parameterized output is invariant to translation and rotation.
A matrix is in by definition if there exist points such that for all . Such a matrix defines a structure in Euclidean space up to a combination of translation, rotation, and mirroring. The smallest integer for which a set of points in exists that reproduces the matrix is called the embedding dimension.
The general idea of the generation process is to produce a hollow (i.e., zeros on the diagonal) symmetric matrix and then weakly enforce through a term in the loss. It can be shown that
(1) 
where and [33, 34]. However trying to use this relationship directly in the context of deep learning by parameterizing the matrix poses a problem, as the set of EDMs forms a cone ([35]
) and not a vector space, which is the underlying assumption of the standard optimizers in common deep learning frameworks. One can either turn to optimization techniques on Riemannian manifolds (
[36]) or find a reparameterization in which the network’s output behaves like a vector space and that can be transformed into an EDM.Here, we leverage a connection between EDMs and positive semidefinite matrices [37, 38] in order to parameterize the problem in a space that behaves like a vector space. In particular, for by definition there exist points generating . The EDM has a corresponding Gram matrix by the relationship
(2) 
with and vice versa
(3) 
The matrix furthermore has a specific structure
(4) 
with
and is symmetric and positive semidefinite. It therefore admits an eigenvalue decomposition
which, assuming that with , reveals a composition of coordinates in the first rows where is the embedding dimension and the number of nonzero eigenvalues of associated to .Therefore, the embedding dimension of is given by the rank of or equivalently the number of positive eigenvalues. In principle it would be sufficient to parameterize a symmetric positive semidefinite matrix , as it then automatically is also a Gram matrix for some set of vectors. However, also the set of symmetric positive semidefinite matrices behaves like a cone, which precludes the use of standard optimization techniques.
Instead, we propose to parameterize an arbitrary symmetric matrix , as the set of symmetric matrices behaves like a vector space. This symmetric matrix can be transformed into a symmetric positive semidefinite matrix
(5) 
by any nonnegative function and then used to reconstruct via (3) and (4).
This approach is shown in Algorithm 1 for the context of neural networks and the particular choice of
, the softplus activation function. A symmetric matrix
is parameterized and transformed into a Gram matrix and a matrix . For there is a loss in place that drives it towards a specific rank and for we introduce a penalty on negative eigenvalues of (1).3 Euclidean distance matrix WGAN
We consider the class of generative adversarial networks ([39]) (GANs) and in particular Wasserstein GANs (WGANs), i.e., the ones that minimize the Wasserstein1 distance in contrast to the original formulation, where the former can be related to minimizing the JensenShannon divergence [40]. WGANs consist of two networks, one generator network , which transforms a prior distribution into a target distribution which should match the data’s underlying distribution as closely as possible. The other network is a socalled critic network , which assigns scalar values to individual samples from either distribution. The overall optimization objective reads
(6) 
where is the set of all Lipschitz continuous functions with a Lipschitz constant . We enforce the Lipschitz constant using a gradient penalty (WGANGP) introduced in [41]. One can observe that the maximum in Eq. (6) is attained when as large as possible values are assigned to samples from and as small as possible values to samples from . Meanwhile the minimum over the generator network tries to minimize that difference, which turns out to be exactly the Wasserstein1 distance according to the Kantorovich–Rubinstein duality ([42]). Since the Wasserstein1 distance is a proper metric of distributions, the generated distribution is exactly the data distribution if and only if the maximum in Eq. (6) is zero. The networks and are trained in an alternating fashion.
We choose for the critic network the messagepassing neural network SchNet [43, 44, 45] , which was originally designed to compute energies of molecules.
It operates on the pairwise distances , in a structure and the atom types . If there is no atom type information present, these can be just constant vectors that initially carry no information. These atom types are then embedded into a state vector and transformed with variable sharing across all atoms. Furthermore there are layers in which continuous convolutions are performed based on the relative distances between the atoms. In a physical sense this corresponds to learning energy contributions of, e.g., bonds and angles. Finally all states are mapped to a scalar and then pooled in a sum.
Due to the pooling and the use of only relative distances but never absolute coordinates, the output is invariant under translation, rotation, and permutation.
The generator network employs the construction of Section 2 to produce approximately EDMs with a fixed embedding dimension. Therefore this architecture is able to learn distributions of Euclidean distance matrices.
4 Application and results
The WGANGP introduced in Sec. 3 is applied to a subset of the QM9 dataset consisting of isomers with the chemical formula . To this end the distribution not only consists of the Euclidean distance matrices describing the molecular structure but also of the atom types. The generator produces an additional type vector in a multitask fashion which is checked against a constant type reference with a crossentropy loss. Furthermore the prior of a minimal distance between atoms is applied, i.e., we have a loss penalizing distances that are too small. Altogether we optimize the losses
(original WGAN loss)  (7)  
(gradient penalty of WGANGP)  (8)  
(drift term [46])  (9)  
(original WGAN loss)  (10)  
(cross entropy for types)  (11)  
(harmonic repulsion)  (12)  
(for EDMs, see Alg. 1)  (13) 
with being a SchNet critic, a reference type order, , , , and being the minimal pairwise distance achieved in the considered QM9 subset. Although in principle the cross entropy loss (11) is not required we found in our experiments that it qualitatively helps convergence. The generator network uses a combination of deconvolution and dense layers.
The function ensuring positive semidefiniteness (5) was chosen to be the softplus activation for the largest three eigenvalues and we explicitly set all other eigenvalues to zero. This leads to a Gram matrix with exactly the right rank and the constraint does not need to be weakly enforced anymore in the generator’s loss.
Prior to training the data was split into training and test data. After training on the training data set we evaluate the distribution of pairwise distances between different types of atoms, see Fig. 1. The overall shape of the distributions is picked up and only the distance between pairs of oxygen atoms are not completely correctly distributed.
After generation we perform a computationally cheap validity test by inferring bonds and bond orders with Open Babel [47]. On the inferred bonding graph we check for connectivity and valency, i.e., if for each atom the number of inferred bonds add up to its respective valency. This leaves us with roughly of the generated samples.
For the valid samples we infer canonical SMILES representations which are a fingerprint of the molecule’s topology in order to determine how many different molecule types can be produced using the trained generator. Fig. 2 shows the cumulative number of unique SMILES fingerprints when producing roughly
valid samples. It can be observed that the network is able to generalize out of the training set and is able to generate not only topologies which can be found in the test set but also entirely new ones. Nevertheless, the current performance with respect to the number of found topologies is not optimal and can likely be improved by a better hyperparameter selection.
While SMILES can be used to get an idea about the different bonding structures that were generated, it contains no information about different possible conformations in these bonding structures. To analyze the number of unique conformations that were generated, we compared each generated structure against all structures in the considered QM9 subset. Since the architecture is designed in such a way that it is permutation invariant, i.e., applying the critic onto a matrix and for some permutation yields the same result, one first has to find the best possible assignment of atoms.
To this end, we apply the Hungarian algorithm [48] onto a cost matrix for EDMs and type vectors with
(14) 
Intuitively this means that the cost of assigning atom in the first structure to atom in the second structure depends on whether their atom types match, in which case we compare the mean distance from the th atom to all other atoms in its structure to the same quantity for the th atom in the second structure. If the atom types do not match, we assign a very high number so that this particular mapping is not considered. After we have found an assignment between the atoms, we superpose the structures using functionality from the software package MDTraj ([49]) and evaluate the maximal atomic distance between all heavy atoms (i.e., carbons and oxygens) after alignment. The cutoff at which we consider a structure to be a distinct conformation is a maximal atomic distance between heavy atoms of more than , i.e., more than half a carbon–carbon bond length.
The results of this analysis are depicted in Fig. 3. One can observe that although the reported number of unique molecular structures via SMILES is rather low, under our similarity criterion a lot of different valid conformations are discovered; in particular also some new conformations of structures that were already contained in the QM9 database.
Finally, we also check for the approximate total energies of the generated molecules compared to the database’s. To this end, we use the semiempirical method provided by the software package MOPAC [50] to relax all structures in the considered QM9 subset as well as all valid generated valid structures, see Fig. 4. It can be observed that after relaxation all energies are contained within the same range of roughly to .
In Figure 5 we show examples of generated molecules in the top row (A–D) and the closest respective matches in the QM9 database in the bottom row (A’–D’). The closeness of a match was determined by the maximal atomic distance after assignment of atom identities and superposition. Configurations A and B could be matched with a maximal atomic distance of less than .
5 Conclusion and discussion
We have developed a way to parameterize the output of a neural network so that it produces valid Euclidean distance matrices with a predefined embedding dimension without placing coordinates in Cartesian space directly. This enables us to be naturally invariant under rotation and translation of the described object. Given a network that is able to produce valid Euclidean distance matrices we introduce a Wasserstein GAN that can learn to oneshot generate distributions of point clouds irrespective of their orientation, translation, or permutation. The permutation invariance is achieved by incorporating the message passing neural network SchNet as critic.
We applied the introduced WGAN to the isomer subset of the QM9 molecules database and could generalize out of the training set as well as achieve a good representation of the distribution of pairwise distances in this set of molecules.
In future work we want to improve on the performance of the network on the isomer subset as well as extend it to molecules of varying size and chemical composition. We expect the ideas of this work to be applicable for, e.g., generating, transforming, coarse graining, or upsampling point clouds.
Acknowledgments
We acknowledge funding from European Commission (ERC CoG 772230 “ScaleCell”), Deutsche Forschungsgemeinschaft (CRC1114/A04), the MATH+ Berlin Mathematics research center (AA1x6, EF1x2). We are grateful for inspiring discussions with Tim Hempel, Brooke Husic, Simon Olsson, Jan Hermann, and Mohsen Sadeghi (FU Berlin).
References
 [1] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pages 3844–3852, 2016.
 [2] Thomas N Kipf and Max Welling. Semisupervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
 [3] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine LearningVolume 70, pages 1263–1272. JMLR. org, 2017.

[4]
Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas.
Pointnet: Deep learning on point sets for 3d classification and
segmentation.
In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pages 652–660, 2017.  [5] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on xtransformed points. In Advances in Neural Information Processing Systems, pages 820–830, 2018.
 [6] Guandao Yang, Xun Huang, Zekun Hao, MingYu Liu, Serge Belongie, and Bharath Hariharan. Pointflow: 3d point cloud generation with continuous normalizing flows. arXiv preprint arXiv:1906.12320, 2019.
 [7] Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, and Patrick Riley. Molecular graph convolutions: moving beyond fingerprints. Journal of computeraided molecular design, 30(8):595–608, 2016.
 [8] Jon Paul Janet and Heather J Kulik. Predicting electronic structure properties of transition metal complexes with neural networks. Chemical science, 8(7):5137–5152, 2017.
 [9] R. Winter, F. Montanari, F. Noé, and D.A. Clevert. Learning continuous and datadriven molecular descriptors by translating equivalent chemical representations. Chem. Sci., 10:1692–1701, 2019.
 [10] J. Behler and M. Parrinello. Generalized neuralnetwork representation of highdimensional potentialenergy surfaces. Phys. Rev. Lett., 98:146401, 2007.
 [11] M. Rupp, A. Tkatchenko, K.R. Müller, and O. A. Von Lilienfeld. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett., 108:058301, 2012.
 [12] Kristof Schütt, PieterJan Kindermans, Huziel Enoc Sauceda, Stefan Chmiela, Alexandre Tkatchenko, and KlausRobert Müller. Schnet: A continuousfilter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst., pages 991–1001, 2017.
 [13] J. S. Smith, O. Isayev, and A. E. Roitberg. Ani1: an extensible neural network potential with dft accuracy at force field computational cost. Chem. Sci., 8:3192–3203, 2017.
 [14] Benjamin SanchezLengeling and Alán AspuruGuzik. Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400):360–365, 2018.
 [15] Brian C Barnes, Daniel C Elton, Zois Boukouvalas, DeCarlos E Taylor, William D Mattson, Mark D Fuge, and Peter W Chung. Machine learning of energetic material properties. arXiv preprint arXiv:1807.06156, 2018.
 [16] Daniel C Elton, Zois Boukouvalas, Mark S Butrico, Mark D Fuge, and Peter W Chung. Applying machine learning techniques to predict the properties of energetic materials. Scientific reports, 8(1):9059, 2018.

[17]
Haichen Li, Christopher R Collins, Thomas G Ribelli, Krzysztof Matyjaszewski,
Geoffrey J Gordon, Tomasz Kowalewski, and David J Yaron.
Tuning the molecular weight distribution from atom transfer radical polymerization using deep reinforcement learning.
Molecular Systems Design & Engineering, 3(3):496–508, 2018.  [18] Mariya Popova, Olexandr Isayev, and Alexander Tropsha. Deep reinforcement learning for de novo drug design. Science advances, 4(7):eaap7885, 2018.

[19]
Edward J Griffen, Alexander G Dossetter, Andrew G Leach, and Shane Montague.
Can we accelerate medicinal chemistry by augmenting the chemist with big data and artificial intelligence?
Drug discovery today, 2018.  [20] Gabriel Lima Guimaraes, Benjamin SanchezLengeling, Carlos Outeiral, Pedro Luis Cunha Farias, and Alán AspuruGuzik. Objectivereinforced generative adversarial networks (organ) for sequence generation models. arXiv preprint arXiv:1705.10843, 2017.
 [21] R. GómezBombarelli, J. N. Wei, D. Duvenaud, J. M. HernándezLobato, B. SánchezLengeling, D. Sheberla, J. AguileraIparraguirre, T. D Hirzel, R. P Adams, and A. AspuruGuzik. Automatic chemical design using a datadriven continuous representation of molecules. ACS Cent. Sci., 4:268–276, 2018.
 [22] R. Winter, F. Montanari, A. Steffen, H. Briem, F. Noé, and D. A. Clevert. Efficient multiobjective molecular optimization in a continuous latent space. https://doi.org/10.26434/chemrxiv.7971101.v1, 2019.

[23]
João Marcelo Lamim Ribeiro, Pablo Bravo, Yihang Wang, and Pratyush Tiwary.
Reweighted autoencoded variational bayes for enhanced sampling (rave).
J. Chem. Phys., 149:072301, 2018.  [24] L. Bonati, Y.Y.Zhang, and M. Parrinello. Neural networksbased variationally enhanced sampling. Proc. Natl. Acad. Sci. USA, DOI: 10.1073/pnas.1907975116, 2019.
 [25] J. Zhang, Y. I. Yang, and F. Noé. Targeted adversarial learning optimized sampling. ChemRxiv. DOI: 10.26434/chemrxiv.7932371, 2019.
 [26] N. Plattner, S. Doerr, G. De Fabritiis, and F. Noé. Proteinprotein association and binding mechanism resolved in atomic detail. Nat. Chem., 9:1005–1011, 2017.
 [27] S. Doerr and G. De Fabritiis. Onthefly learning and sampling of ligand binding by highthroughput molecular simulations. J. Chem. Theory Comput., 10:2064–2069, 2014.
 [28] Frank Noé, Simon Olsson, Jonas Köhler, and Hao Wu. Boltzmann generators  sampling equilibrium states of manybody systems with deep learning. arXiv:1812.01729, 2019.
 [29] Niklas WA Gebauer, Michael Gastegger, and Kristof T Schütt. Generating equilibrium molecules with deep neural networks. arXiv preprint arXiv:1810.11347, 2018.
 [30] Niklas WA Gebauer, Michael Gastegger, and Kristof T Schütt. Symmetryadapted generation of 3d point sets for the targeted discovery of molecules. arXiv preprint arXiv:1906.00957, 2019.
 [31] Lars Ruddigkeit, Ruud Van Deursen, Lorenz C Blum, and JeanLouis Reymond. Enumeration of 166 billion organic small molecules in the chemical universe database gdb17. Journal of chemical information and modeling, 52(11):2864–2875, 2012.
 [32] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1, 2014.
 [33] Isaac J Schoenberg. Remarks to maurice frechet’s article“sur la definition axiomatique d’une classe d’espace distances vectoriellement applicable sur l’espace de hilbert. Annals of Mathematics, pages 724–732, 1935.
 [34] John Clifford Gower. Euclidean distance geometry. Math. Sci, 7(1):1–14, 1982.
 [35] Jon Dattorro. Convex optimization & Euclidean distance geometry. Lulu.com, 2010.
 [36] Hongyi Zhang, Sashank J Reddi, and Suvrit Sra. Riemannian svrg: Fast stochastic optimization on riemannian manifolds. In Advances in Neural Information Processing Systems, pages 4592–4600, 2016.
 [37] Abdo Y Alfakih, Amir Khandani, and Henry Wolkowicz. Solving euclidean distance matrix completion problems via semidefinite programming. Computational optimization and applications, 12(13):13–30, 1999.
 [38] Nathan Krislock and Henry Wolkowicz. Euclidean distance matrices and applications. In Handbook on semidefinite, conic and polynomial optimization, pages 879–914. Springer, 2012.
 [39] Ian Goodfellow, Jean PougetAbadie, Mehdi Mirza, Bing Xu, David WardeFarley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
 [40] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International conference on machine learning, pages 214–223, 2017.
 [41] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
 [42] Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.

[43]
Kristof T Schütt, Farhad Arbabzadah, Stefan Chmiela, Klaus R Müller,
and Alexandre Tkatchenko.
Quantumchemical insights from deep tensor neural networks.
Nature communications, 8:13890, 2017.  [44] Kristof Schütt, PieterJan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and KlausRobert Müller. Schnet: A continuousfilter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems, pages 991–1001, 2017.
 [45] Kristof T Schütt, Huziel E Sauceda, PJ Kindermans, Alexandre Tkatchenko, and KR Müller. Schnet–a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24):241722, 2018.
 [46] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
 [47] Noel M O’Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. Open babel: An open chemical toolbox. Journal of cheminformatics, 3(1):33, 2011.
 [48] H. W. Kuhn. The hungarian method for the assignment problem. Nav. Res. Logist. Quart., 2:83–97, 1955.
 [49] R. T. McGibbon, K. A. Beauchamp, M. P. Harrigan, C. Klein, J. M. Swails, C. X. Hernández, C. R. Schwantes, L. P. Wang, T. J. Lane, and V. S. Pande. Mdtraj: A modern open library for the analysis of molecular dynamics trajectories. Biophys J., 109:1528–1532, 2015.
 [50] James JP Stewart. Mopac: a semiempirical molecular orbital program. Journal of computeraided molecular design, 4(1):1–103, 1990.
Comments
There are no comments yet.