Compressing physical properties of atomic species for improving predictive chemistry

10/31/2018 ∙ by John E. Herr, et al. ∙ University of Notre Dame

The answers to many unsolved problems lie in the intractable chemical space of molecules and materials. Machine learning techniques are rapidly growing in popularity as a way to compress and explore chemical space efficiently. One of the most important aspects of machine learning techniques is representation through the feature vector, which should contain the most important descriptors necessary to make accurate predictions, not least of which is the atomic species in the molecule or material. In this work we introduce a compressed representation of physical properties for atomic species we call the elemental modes. The elemental modes provide an excellent representation by capturing many of the nuances of the periodic table and the similarity of atomic species. We apply the elemental modes to several different tasks for machine learning algorithms and show that they enable us to make improvements to these tasks even beyond simply achieving higher accuracy predictions.




I Introduction

Machine learning is being applied at an unprecedented rate to efficiently explore the vastness of chemical space. Researchers have used a wide array of machine learning techniques to make reaction outcome predictions,Schwaller et al. (2018); Nam and Kim (2016); Liu et al. (2017) reaction yield predictions,Nielsen et al. (2018); Ahneman et al. (2018) to predict bond energies,Yao et al. (2017) partial charges,Nebgen et al. (2018); Sifain et al. (2018) formation energies,Faber et al. (2016); Zhou et al. (2018) among other properties of electronic structure.Montavon et al. (2013); Kitchin (2018); Schütt et al. (2017a); Rupp et al. (2012); Hansen et al. (2015a) There has also been great interest in generating machine learned model chemistries which promise to greatly reduce the cost of running quantum-accuracy simulations by compressing the corpus of completed results.Yao et al. (2018); Smith, Isayev, and Roitberg (2017); Behler and Parrinello (2007); Behler (2011a); Behler et al. (2008); Behler, Lorenz, and Reuter (2007); Yao, Herr, and Parkhill (2017); Yao and Parkhill (2016); Gastegger et al. (2016); Gastegger, Behler, and Marquetand (2017); Brockherde et al. (2017); Li et al. (2016); Snyder et al. (2013, 2012); Behler (2011b); Handley and Popelier (2010); Zhang et al. (2018); Han et al. (2018); Chmiela et al. (2017); Schütt et al. (2017b, a)
There are many challenges that come with the design of machine learning algorithms for predictive chemistry. They must respect physical invariances,Thomas et al. (2018) the predictions must fall within the scope of the underlying data,Herr et al. (2018); Smith et al. (2018) and predictions must be smooth with respect to geometrical changes in the molecule. Depending on the target problem, one particularly challenging aspect is the representation of atomic species and how predictions change with a change in atomic number. Typically this can be dealt with by parameterizing distinct machine learning models for each atomic species,Behler, Lorenz, and Reuter (2007); Behler (2011a); Yao et al. (2018); Smith, Isayev, and Roitberg (2017); Yao et al. (2017) or by representation in the feature vector.Schütt et al. (2018, 2017c); Gastegger et al. (2018); De et al. (2016); Bartók et al. (2017); Willatt, Musil, and Ceriotti (2018); Faber et al. (2018); Zhou et al. (2018)

There are several issues with the former solution, including increased computational cost and memory requirements, but more importantly, distinct parameterizations do not allow similar atomic species to share information in the learning process. Using a single machine learning model for all atomic species would encourage faster training with less data. This can be thought of as transfer learning, since making predictions for different atomic species is often a highly similar task with some variation in the desired outcome.Pan, Yang et al. (2010)
Furthermore, the feature vector of an atom must often also represent the species of its neighboring atoms, as is the case for the symmetry functions introduced by Behler and Parrinello.Behler and Parrinello (2007); Behler, Lorenz, and Reuter (2007) When limited to a single system, atomic species can often be implied since the data will always be consistent in this way, but in the more general case, when we wish to make predictions for unique molecules or materials, the change in species must be represented in the feature vectors so that the machine learning model can account for it. In the case of the symmetry functions this has been done by splitting the features of neighboring atoms into element channels for the radial descriptor, and element-pair channels for the angular descriptor.Smith, Isayev, and Roitberg (2017); Yao et al. (2018) This leads to a quadratically growing size of the feature vector, which rapidly becomes unmanageable when treating more than a few unique atomic species.
Recently there has been work focused on representation of atomic species in several machine learning applications. Naturally the first thought is to use atomic number as a multiplying factor to the features.Gastegger et al. (2018) Others have sought to apply similar logic by using the group and period a species belongs to,Faber et al. (2018, 2017) by using one or two physical properties,De et al. (2016); Bartók et al. (2017); Willatt, Musil, and Ceriotti (2018) or by random initialization and allowing the multiplying factor to be learned during the training process.Schütt et al. (2017c, 2018) More recently, Zhou et al. proposed an atom vector which is derived from the surrounding environments in which each atomic species appears in a materials dataset.Zhou et al. (2018)
In this work we introduce what we refer to as the elemental modes. The elemental modes improve upon the previous works discussed above by providing an atomic species vector which places similar species near one another, based on a compressed representation of many physical properties fundamental to each atom. We avoid using a dataset of atomic environments, as this would bias the representation towards the given dataset. We show that the elemental modes are highly generalizable by using them successfully in distinctly different learning tasks.

Figure 1: Scheme of fundamental properties being encoded to elemental modes. 10 different properties were reduced to 6 elemental modes. Blue is negative, white is zero and red is positive.

II Elemental modes

We compiled a dataset of fundamental physical properties for each element with atomic number up to 83 excluding f-block elements. The fundamental properties include atomic number, atomic mass, number of s, p, and d valence electrons, atomic radius, electronegativity, ionization energy, electron affinity, and polarizability. Using this data, we then trained an ordinary auto-encoderHinton and Salakhutdinov (2006) with each element representing one case, and the ten physical properties listed above concatenated into the feature vector for each element. Figure 1 shows heat map plots of the physical properties and the resulting elemental modes features as derived from the auto-encoder.

The encoder and decoder branches each contained two hidden layers with 64 neurons in each layer. The activation function used was the hyperbolic tangent function. After training the auto-encoder to convergence, the latent space vector for each element was taken as the elemental modes. We tested different sizes of the latent space vectors, and a dimension of size four gave the best trade-off between reproduction of the physical properties from the decoder and compression of the original data.
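The architecture described above can be sketched as a forward pass in plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the random weight initialization, and the choice of a linear (non-activated) latent and output layer are all assumptions, and no training loop is shown.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, activation=None):
    """Apply one dense layer, optionally followed by an activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

def build_autoencoder(n_props=10, hidden=64, latent=4, seed=0):
    """Encoder and decoder, each with two hidden layers of `hidden` neurons."""
    rng = random.Random(seed)
    encoder = [make_layer(n_props, hidden, rng),
               make_layer(hidden, hidden, rng),
               make_layer(hidden, latent, rng)]
    decoder = [make_layer(latent, hidden, rng),
               make_layer(hidden, hidden, rng),
               make_layer(hidden, n_props, rng)]
    return encoder, decoder

def encode(x, encoder):
    """Map the ten physical properties to the latent vector (elemental modes)."""
    h = dense(x, encoder[0], math.tanh)
    h = dense(h, encoder[1], math.tanh)
    return dense(h, encoder[2])  # linear latent layer (assumption)

def decode(z, decoder):
    """Reconstruct the physical properties from the latent vector."""
    h = dense(z, decoder[0], math.tanh)
    h = dense(h, decoder[1], math.tanh)
    return dense(h, decoder[2])  # linear output layer (assumption)

def reconstruction_mse(x, encoder, decoder):
    """Mean squared error between the input and its reconstruction."""
    x_hat = decode(encode(x, encoder), decoder)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
```

After training, calling `encode` on each element's property vector would yield that element's elemental modes; the latent size is left as a parameter because it was tuned in the work.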

To examine trends learned by the auto-encoder, we performed principal component analysis on the resulting elemental modes and plot the first and second principal components in Figure 2. The trend is strikingly similar to the periodic table. Alkali metals and alkaline earth metals tend towards large positive (negative) values in the first (second) principal component. Noble gases and halogens are the opposite, tending towards large positive values in the second principal component, and large negative or smaller positive values in the first principal component. Atomic species in later periods also tend towards lower values in both the first and second principal components than those in the same group appearing in earlier periods. Having satisfied ourselves that the elemental modes do indeed capture sensible trends that encode the relationship between atomic species, we next turn our attention to their utility.
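For readers who want to reproduce this kind of projection, the following is a minimal sketch of principal component analysis using power iteration on the covariance matrix. It assumes the elemental modes are available as a list of equal-length rows, one per element; the function names are hypothetical, and a library routine would normally be used instead.

```python
import math

def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def top_eigenvector(matrix, iters=500):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return eigenvalue, v

def project(centered, direction):
    return [sum(a * b for a, b in zip(row, direction)) for row in centered]

def first_two_components(rows):
    """Return the PC1 and PC2 scores for every row."""
    x = center(rows)
    cov = covariance(x)
    val1, pc1 = top_eigenvector(cov)
    d = len(cov)
    # Deflate: remove the PC1 direction so the next leading vector is PC2.
    deflated = [[cov[a][b] - val1 * pc1[a] * pc1[b] for b in range(d)]
                for a in range(d)]
    _, pc2 = top_eigenvector(deflated)
    return project(x, pc1), project(x, pc2)
```

The sign of each component is arbitrary, so a plot produced this way may be mirrored relative to Figure 2.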

Figure 2: The first (PC1) and second (PC2) principal components of the elemental modes color coded by grouping into alkali metals (red), alkaline earth metals (orange), transition metals (yellow), post-transition metals (green), metalloids (teal), nonmetals (blue), halogens (pink), and noble gases (purple).

III Formation energy prediction

Recently several works have demonstrated machine learning algorithms which are capable of predicting the formation energies of elpasolites, crystals with the chemical formula ABC$_2$D$_6$.Faber et al. (2016); Zhou et al. (2018) The dataset used in these works consists of 10,000 formation energies from density functional theory (DFT) calculations of crystal structures in the elpasolite configuration containing only main group elements.Faber et al. (2016) Restricting the data to materials of the elpasolite structure allows for the learning problem to be greatly simplified, as the periodic nature and common crystalline structure of the materials allows for a lot of information to be inherently assumed in the learning problem. As such, the most important information that varies between each structure is the atomic species corresponding to A, B, C, and D in the ABC$_2$D$_6$ formula.

We applied the elemental modes as features for a neural network to predict formation energies on this dataset. Our feature vector is the concatenation of the elemental modes for the atomic species A, B, C, and D for a particular elpasolite structure. The feature vectors were then fed into a standard feed-forward neural network with two hidden layers of 32 neurons and a softplus activation function.

The dataset of structures and formation energies was split into an 80:10:10 ratio for training, testing, and validation data. The network was found to perform best with two hidden layers and 32 neurons in each layer. After training had converged, the mean absolute error (MAE) on the independent test set was 0.086 eV/atom, outperforming previous works on this task. Figure 3 shows the resulting distribution of error in the predicted formation energies for the independent test set.
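The feature construction, data split, and error metric described above can be sketched as follows. This is an illustrative outline under stated assumptions: `modes` is a hypothetical dictionary mapping element symbols to their elemental modes, the shuffling seed is arbitrary, and the neural network itself is omitted.

```python
import random

def elpasolite_features(site_species, modes):
    """Concatenate the elemental modes of the species at the A, B, C, D sites."""
    features = []
    for symbol in site_species:
        features.extend(modes[symbol])
    return features

def split_80_10_10(items, seed=0):
    """Shuffle and split into training, testing, and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid

def mean_absolute_error(predicted, reference):
    """MAE, here in whatever units the energies carry (eV/atom in the text)."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)
```

With four-dimensional elemental modes, each structure is described by only sixteen numbers, which is why a small network suffices for this task.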

Figure 3: Formation energy error prediction for the independent test set of elpasolites with the formula ABC$_2$D$_6$. Inset in the top left is the crystal structure for elpasolite (AlNaK$_2$F$_6$).

IV Neural network model chemistry

Next we turn our attention to the more difficult task of predicting accurate potential energies, atomic forces, and partial charges for a wide range of small molecules. The difficulty is a result of how robust the machine learning algorithm must be in this case. Unlike periodic crystal structures, there is little information which can be inherently assumed by the algorithm. In this task the feature vector must represent not only the atomic species present, but also the full geometry of the molecule. There are several different proposed methods for making this representation,Schütt et al. (2017c, 2018); Bartók, Kondor, and Csányi (2013); Bartók et al. (2017); Faber et al. (2016); Huang and Von Lilienfeld (2016); Hansen et al. (2015b); Collins et al. (2017) but we will restrict this work to the symmetry functions with high-dimensional neural network potentials (HD-NNPs) as first introduced by Behler and Parrinello.Behler and Parrinello (2007); Behler, Lorenz, and Reuter (2007); Behler (2011a, b)
Analogously to the work above on elpasolites, where the geometry could be inherently assumed, the initial works using the symmetry functions and HD-NNPs were restricted instead to cases where the atomic species could be inherently assumed. Subsequent researchers were able to generalize the symmetry functions to include atomic species by splitting the symmetry functions into channels for each element in the radial descriptor, and channels for each pair of elements in the angular descriptor.Smith, Isayev, and Roitberg (2017); Yao et al. (2018) For a dataset containing molecules restricted to C, H, N, and O atoms, this resulted in four radial channels and ten angular channels. Allowing many more atomic species in the dataset quickly grows the size of the symmetry functions beyond what is reasonable given the memory constraints of the modern graphics processing units (GPUs) needed for training these machine learning algorithms.
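The channel counts quoted above follow from simple combinatorics: one radial channel per element and one angular channel per unordered pair of elements. A short sketch makes the growth concrete. The second function assumes that, when elemental modes replace element channels, the angular channels come from a full outer product of two mode vectors; both function names are illustrative.

```python
def element_channel_counts(n_species):
    """Radial and angular channel counts when splitting by element."""
    radial = n_species
    angular = n_species * (n_species + 1) // 2  # unordered pairs, incl. same-element
    return radial, angular

def elemental_mode_channel_counts(mode_dim):
    """Channel counts when embedding with elemental modes of size mode_dim.

    Assumes a full outer product for the angular part, so the count depends
    only on the mode dimension and not on the number of species.
    """
    return mode_dim, mode_dim * mode_dim
```

For C, H, N, and O, `element_channel_counts(4)` gives the four radial and ten angular channels mentioned in the text, while ten species would already require 55 angular channels.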
To remedy this problem, we applied the elemental modes as an embedding factor into the channels of the symmetry functions to keep the size constant while allowing for any number of atomic species to be included in the dataset. More precisely, for a neighboring atom $j$ in the environment of atom $i$, the radial descriptor is given by


and for two neighboring atoms $j$ and $k$ in the environment of atom $i$ the angular descriptor is given by


where $R_{ij}$ is the distance between atoms $i$ and $j$, $\theta_{ijk}$ is the angle between the atoms $i$, $j$, and $k$, $\mathbf{e}_j$ is the elemental modes for the atomic species of atom $j$, $\eta$, $R_s$, $\zeta$, and $\theta_s$ are parameters of the symmetry functions, $\otimes$ represents the outer product, and $f_c$ is a smooth radial cutoff function given by


The total radial environment for atom $i$ is then given by summing over all atoms $j$ in the environment of atom $i$, and the total angular environment is given by summing over all pairs of atoms $j$ and $k$ in the environment of atom $i$. The feature vector of atom $i$ is then the concatenation of the radial and angular environments. Because the radial and angular environments are split into channels corresponding to the elemental modes, the machine learning algorithm must infer the atomic species of atoms based on the relative scaling of values across the channels of the symmetry functions.
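The displayed equations for the descriptors are not present in this copy of the text. As a hedged sketch only, the following shows one plausible form consistent with the verbal definitions above, assuming the modified Behler-Parrinello symmetry functions of the cited earlier works, with the element channels replaced by elemental-mode factors. All symbol names, the cosine form of the cutoff, and the cutoff radius $R_c$ are assumptions and may differ from the authors' equations.

```latex
% Assumed radial descriptor for neighbor j of atom i
G^{\mathrm{rad}}_{ij} = \mathbf{e}_j \, e^{-\eta (R_{ij} - R_s)^2} f_c(R_{ij})

% Assumed angular descriptor for neighbors j and k of atom i
G^{\mathrm{ang}}_{ijk} = \left(\mathbf{e}_j \otimes \mathbf{e}_k\right) 2^{1-\zeta}
  \left(1 + \cos(\theta_{ijk} - \theta_s)\right)^{\zeta}
  e^{-\eta \left(\frac{R_{ij} + R_{ik}}{2} - R_s\right)^2} f_c(R_{ij}) f_c(R_{ik})

% Assumed smooth cutoff with cutoff radius R_c
f_c(R) =
\begin{cases}
  \tfrac{1}{2}\left[\cos\!\left(\pi R / R_c\right) + 1\right], & R \le R_c \\
  0, & R > R_c
\end{cases}
```

In this form the elemental modes of the neighbors act as channel weights, so the descriptor length depends on the dimension of the elemental modes rather than on the number of species in the dataset.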
Furthermore, we also wish to eliminate the need for separate neural networks to be trained for predicting the energy contribution of different atomic species. HD-NNPs make the assumption that the energy of a molecule can be broken down by summing the embedded atomic energy predicted for each atom in a molecule. Partitioning the total energy into a sum of embedded components has a history of success in the computational chemical sciences.Richard and Herbert (2012); Mayhall and Raghavachari (2012); Medders et al. (2015); Ruff, Harmon, and Pappu (2015); John and Csanyi (2017) A choice of molecular partitioning is a compromise between two limits. If large fragments are chosen, nuanced intra-fragment physical forces are coarse-grained out of the learning problem; however, the number of such unique fragments will be large. Choosing a small fragment such as an atom requires a neural network to learn challenging physical interactions, but readily generalizes to new unseen fragments.
An analogy can be drawn to word-levelBrown et al. (1992); Mikolov et al. (2013); Turian, Ratinov, and Bengio (2010) and character-levelKim et al. (2016); Sutskever, Martens, and Hinton (2011) language modeling. The meaning of a character is strongly affected by its environment, whereas a word has significant meaning on its own. We took particular note of the work of Sutskever et al. on character-level language modeling.Sutskever, Martens, and Hinton (2011) To account for the differences between interactions of various characters, they used multiplicative interactions, which allow the network weights to respond to the identity of a character embedded inside a larger sentence. We applied similar logic by allowing our neural network to respond to the atomic species of an embedded atom. Given the feature vector of an atom and the elemental modes corresponding to the atomic species of that atom, we introduce two learnable matrices. These matrices are used to interact an atom's elemental modes with its feature vector by


Allowing the feature vector of an atom to interact with its own elemental modes then provides a way for the neural network to respond to the atomic species of an atom so that the predicted embedded atomic energy can change accordingly.
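To make the idea of a multiplicative interaction concrete, here is a minimal sketch in the spirit of the factored form used by Sutskever et al.: each input is projected by its own matrix and the two projections are multiplied element-wise. This is an assumption about the general shape of such an interaction, not a reproduction of the authors' equation; the function and argument names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def multiplicative_interaction(features, modes, w_features, w_modes):
    """Gate the projected feature vector by the projected elemental modes.

    w_features and w_modes stand in for the two learnable matrices; they
    must map their inputs to vectors of the same length.
    """
    projected_features = matvec(w_features, features)
    projected_modes = matvec(w_modes, modes)
    if len(projected_features) != len(projected_modes):
        raise ValueError("projections must have the same length")
    return [f * m for f, m in zip(projected_features, projected_modes)]
```

Because the species enters as a multiplicative gate rather than an extra additive input, the same geometric features can be rescaled differently for each species while sharing all remaining network weights.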
We trained two neural networks, one which predicts embedded atomic energies and one which predicts partial charges on each atom. In a similar vein to our previous work,Yao et al. (2018) the predicted partial charges were used to calculate Coulomb energies for a long-range and smoothly cutoff Coulomb kernel. As we have shown previously this helps to account for long-range interactions that the neural network cannot learn due to the short-range nature of the symmetry functions.
Our dataset consists of about 4.3 million geometries from 65,000 unique molecules. The atomic species represented in our dataset include all nonmetals, which allows us to make predictions on a drastically more diverse set of molecules than previous works, where predictions were limited to molecules containing only C, H, N, and O atoms. Initial molecular geometries were downloaded from the ChemSpider database and optimized to convergence. Then a subset of the 65,000 molecules was used for running metadynamics simulations to efficiently sample a more diverse set of geometries according to our previous work.Herr et al. (2018) Potential energies, atomic forces, and Mulliken charges were calculated using Q-ChemShao et al. (2015) with the ωB97X-D exchange-correlation functionalChai and Head-Gordon (2008) and 6-311G** basis set.

The neural networks worked well with three hidden layers and 512 neurons in each hidden layer. The loss function used to train our network is the mean square error and included terms from the energy, atomic force, and partial charge errors. We used a softplus activation function which was chosen because it has fewer problems with vanishing gradients, similar to the ReLU or ELU activation functions, but is also continuously differentiable at least up to order two. This property is important when using atomic forces in the loss function. Since the atomic forces predicted by the neural network are the negative gradient of the predicted potential energy with respect to atomic position, then to calculate the gradients to update the network parameters will require taking second order gradients in the backpropagation algorithm.
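The smoothness argument can be checked directly. The sketch below implements softplus together with its first and second derivatives, which are the logistic sigmoid and the sigmoid's derivative. The function names are illustrative, and the numerically stable formulation is a standard rewrite rather than anything specific to this work.

```python
import math

def softplus(x):
    """log(1 + exp(x)), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_first_derivative(x):
    """The logistic sigmoid, evaluated stably for either sign of x."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softplus_second_derivative(x):
    """sigmoid(x) * (1 - sigmoid(x)); smooth and nonzero everywhere."""
    s = softplus_first_derivative(x)
    return s * (1.0 - s)
```

By contrast, the second derivative of ReLU is zero wherever it is defined, so a force term in the loss would pass no curvature information through a ReLU network.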

The root mean square error (RMSE) on the independent test set of the energy is 0.0976 kcal/mol per atom and the RMSE of the atomic forces is 3.71 kcal/mol/Å. We note that the largest errors tend to occur in molecules which contain atoms that occur less frequently throughout the dataset. The nature of known small molecules drastically overrepresents carbon and hydrogen, and to a lesser extent nitrogen and oxygen. While our network reduces this problem to an extent, since a single network is used for predicting embedded atomic energies, imbalances in the dataset can still lead to larger errors for atomic species which are underrepresented. Another source of error could be the difficulty in learning the two matrices for interacting the feature vectors with the corresponding elemental modes. As Sutskever and coworkers pointed out, learning a tensor decomposition like this is a difficult problem with first-order optimization methods alone.Sutskever, Martens, and Hinton (2011) Future improvements will likely focus in this direction.
As stated above, using a single network to make the embedded atomic energy predictions should allow the network to use information learned from one atomic species to improve the predictions of another species. To assess how well the network was able to learn across species, we trained two more networks on subsets of the data. First we took the subset of geometries which contained at least one nonmetal other than C, H, N, and O. Then we took this same subset, but additionally removed any molecules which contained Cl atoms. We trained networks on both of these sets of data, and compared the distributions of embedded atomic energy predictions from different species, including Cl.
Figure 4 shows the atomic energy distributions from both networks for N, O and Cl for the set of Cl-containing molecules which neither network has been trained on. We note that since the embedded atomic energy was not a learning target of our neural network, it should not be expected that these distributions look identical, but we do notice a strong similarity for both N and O. Looking at the distribution for the Cl atomic energies, we also notice a strong similarity between the two distributions. This shows that, even though one of the networks was never trained using any molecules which contain Cl atoms, it has nevertheless learned to make sensible predictions for the embedded atomic energies of Cl atoms.

Figure 4: Probability distribution of embedded atomic energies predicted by two networks trained only on non-metals. Top, middle, and bottom panels are for nitrogen, oxygen, and chlorine predictions respectively. Distributions are included from the networks trained with (blue) and without (green) chlorine data included.

V Alchemical transformations

Another advantage of our neural network model chemistry is that it allows us to interpolate between the elemental modes of two atomic species and make "alchemical" energy predictions. Alchemical free energy calculations are an important tool for pharmaceutical drug discovery used by many researchers today.Hauser et al. (2018); Mey, Jiménez, and Michel (2018); Harger et al. (2017); Williams-Noonan, Yuriev, and Chalmers (2017); Matricon et al. (2017) These calculations are often difficult to set up, requiring great care in switching force field parameters on and off. Further, many molecular dynamics software packages do not include support for alchemical free energies. We have implemented the ability for our network to make potential energy predictions for the intermediate states of an alchemical transition. We use a linear switching term to interpolate the feature vectors of two atoms to make the feature vectors of the intermediate states by


where the switching parameter sets the relative weight of the feature vectors of the atom present before the alchemical transition and the atom present after it. The elemental modes of the two atoms are similarly interpolated before interacting with the feature vectors through the two learnable matrices in equation 4. The interpolated feature vector is then fed into the network as normal to make predictions for the intermediate states.
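A linear switch of this kind can be sketched in a few lines. The convention assumed here, which may differ from the authors' notation, is that a switching parameter of 0 gives the feature vector of the atom present before the transition and a value of 1 gives that of the atom present after it; the function name is hypothetical.

```python
def interpolate(switch, before, after):
    """Linearly interpolate two equal-length vectors.

    switch = 0 returns `before`, switch = 1 returns `after` (assumed convention).
    """
    if not 0.0 <= switch <= 1.0:
        raise ValueError("switching parameter must lie in [0, 1]")
    if len(before) != len(after):
        raise ValueError("vectors must have the same length")
    return [(1.0 - switch) * b + switch * a for b, a in zip(before, after)]
```

The same function can be applied to the elemental modes of the two species, since they are interpolated in the same way as the feature vectors.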
We used our network to run a molecular dynamics simulation with an alchemical transformation of an ethanol dimer into a water hexamer by slowly transitioning the carbon atoms of each ethanol into oxygen atoms. Figure 5 shows the atomization energies predicted by our network over this simulation. The total simulation time was 10 picoseconds (ps) with a 0.5 femtosecond (fs) time step. The first 2 ps were run as purely the ethanol dimer, allowing for some equilibration time. At 2 ps, the alchemical transition began and was spread out over 3 ps. At 5 ps, the transition had completed, leaving hydronium and hydroxide ions along with water molecules; proton transfers followed at about 6 and 6.5 ps. We take particular note that during the transition time of 3 ps no pathological behavior occurs, so we believe our network should be able to provide suitable alchemical free energies. We also note that because our implementation is in Google's TensorFlow package, taking derivatives with respect to the switching parameter is done by a single line of code, which greatly simplifies the work needed for thermodynamic integration calculations.
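The simulation protocol above implies a simple schedule for the switching parameter, sketched here with the timings taken from the text. The linear ramp and the function names are assumptions, and the central finite difference is only a stand-in for the automatic differentiation that TensorFlow provides.

```python
def switching_parameter(time_ps, start_ps=2.0, duration_ps=3.0):
    """0 before the transition, a linear ramp during it, 1 afterwards."""
    if time_ps <= start_ps:
        return 0.0
    if time_ps >= start_ps + duration_ps:
        return 1.0
    return (time_ps - start_ps) / duration_ps

def number_of_steps(total_ps=10.0, step_fs=0.5):
    """Number of MD steps for the given total time and time step."""
    return round(total_ps * 1000.0 / step_fs)

def energy_derivative(energy_fn, switch, h=1e-5):
    """Central finite-difference estimate of dE/d(switch).

    energy_fn is any callable returning an energy for a given switching
    parameter; in the work itself this derivative comes from automatic
    differentiation instead.
    """
    return (energy_fn(switch + h) - energy_fn(switch - h)) / (2.0 * h)
```

With these settings the trajectory has 20,000 steps, of which 6,000 fall within the 3 ps transition window.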

Figure 5: Plot: Atomization energy of an MD trajectory with an alchemical transformation of an ethanol dimer into a water hexamer over a 10 ps simulation time. Insets: a) The initial ethanol dimer geometry. b) The ethanol dimer geometry just before the alchemical transformation begins at 2 ps. c) The water hexamer just after the alchemical transformation completes at 5 ps. d) The first and e) second proton transfers from hydronium to hydroxide at about 6 and 6.5 ps. Proton transfer is denoted with a dashed line.

VI Discussion and Conclusions

Our work has presented a machine learned representation of atomic species, which is suitable to make improvements in several areas of computational and predictive chemistry. The atomic species representation is learned by compressing many physical properties into a smaller dimensional space using an auto-encoder. The compressed representation, which we have called the elemental modes, was shown to retain many of the periodic trends.
We then showed that the elemental modes perform well in tasks where we wish to rapidly screen materials such as elpasolites, helping to predict which structures may be stable and worth pursuing experimentally. This same task can similarly be performed on other datasets of materials which all follow the same structural pattern and only differ in the atomic species. Prediction of other materials properties, such as band gaps, should also be a trivial extension. Further generalizations should be readily achievable as well. For example, allowing mixed species to occupy the D lattice site in elpasolites could be achieved by extending the feature vector of a material to allow for both species to exist.
We have also shown the elemental modes to be useful in parameterizing neural network model chemistries. Previous works have been limited to predictions of molecules with only four different atomic species. Extending this was not straightforward since it would quickly cause issues with both computational efficiency and memory limitations of GPUs. The elemental modes allowed us to eliminate these problems and to improve the efficiency of training these neural networks. Further, we also showed the potential to simplify the process of making alchemical transformations. All of the code used in this work will be made available, as will the trained neural network for the elpasolite formation energy predictions.
Machine learning is becoming a well established method for making chemical predictions and reducing research costs. The vast size of chemical space is well beyond what can be explored by experiment and current computational methods alone. Machine learning algorithms show the promise to increase the rate at which we can find new candidate molecules and materials for many areas of important research by orders of magnitude. Representation of the candidate in the feature vector is certainly one of the most challenging and important parts of further research in this area. We believe that condensed representations such as the elemental modes will play an important role in improving much of the research to come in this blooming field.

The authors gratefully acknowledge Notre Dame's College of Science for startup funding, Oak Ridge National Laboratory for a grant of supercomputer resources, and NVIDIA Corporation.


  • Schwaller et al. (2018) P. Schwaller, T. Gaudin, D. Lanyi, C. Bekas,  and T. Laino, Chemical science 9, 6091 (2018).
  • Nam and Kim (2016) J. Nam and J. Kim, arXiv preprint arXiv:1612.09529  (2016).
  • Liu et al. (2017) B. Liu, B. Ramsundar, P. Kawthekar, J. Shi, J. Gomes, Q. Luu Nguyen, S. Ho, J. Sloane, P. Wender,  and V. Pande, ACS central science 3, 1103 (2017).
  • Nielsen et al. (2018) M. K. Nielsen, D. T. Ahneman, O. Riera,  and A. G. Doyle, Journal of the American Chemical Society 140, 5004 (2018).
  • Ahneman et al. (2018) D. T. Ahneman, J. G. Estrada, S. Lin, S. D. Dreher,  and A. G. Doyle, Science 360, 186 (2018).
  • Yao et al. (2017) K. Yao, J. E. Herr, S. N. Brown,  and J. Parkhill, J. Phys. Chem. Lett.  (2017).
  • Nebgen et al. (2018) B. Nebgen, N. Lubbers, J. S. Smith, A. E. Sifain, A. Lokhov, O. Isayev, A. E. Roitberg, K. Barros,  and S. Tretiak, Journal of chemical theory and computation 14, 4687 (2018).
  • Sifain et al. (2018) A. E. Sifain, N. Lubbers, B. T. Nebgen, J. S. Smith, A. Y. Lokhov, O. Isayev, A. E. Roitberg, K. Barros,  and S. Tretiak, chemrXiv  (2018).
  • Faber et al. (2016) F. A. Faber, A. Lindmaa, O. A. Von Lilienfeld,  and R. Armiento, Physical review letters 117, 135502 (2016).
  • Zhou et al. (2018) Q. Zhou, P. Tang, S. Liu, J. Pan, Q. Yan,  and S.-C. Zhang, Proceedings of the National Academy of Sciences , 201801181 (2018).
  • Montavon et al. (2013) G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko, K.-R. Müller,  and O. A. von Lilienfeld, New J. Phys. 15, 095003 (2013).
  • Kitchin (2018) J. R. Kitchin, Nature Catalysis 1, 230 (2018).
  • Schütt et al. (2017a) K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller,  and A. Tkatchenko, Nat. Commun. 8, 13890 (2017a).
  • Rupp et al. (2012) M. Rupp, A. Tkatchenko, K.-R. Müller,  and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
  • Hansen et al. (2015a) K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. Von Lilienfeld, K.-R. Müller,  and A. Tkatchenko, The journal of physical chemistry letters 6, 2326 (2015a).
  • Yao et al. (2018) K. Yao, J. E. Herr, D. W. Toth, R. Mckintyre,  and J. Parkhill, Chemical Science  (2018).
  • Smith, Isayev, and Roitberg (2017) J. S. Smith, O. Isayev,  and A. E. Roitberg, Chem. Sci.  (2017).
  • Behler and Parrinello (2007) J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007).
  • Behler (2011a) J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011a).
  • Behler et al. (2008) J. Behler, R. Martoňák, D. Donadio,  and M. Parrinello, Physical review letters 100, 185501 (2008).
  • Behler, Lorenz, and Reuter (2007) J. Behler, S. Lorenz,  and K. Reuter, The Journal of chemical physics 127, 07B603 (2007).
  • Yao, Herr, and Parkhill (2017) K. Yao, J. E. Herr,  and J. Parkhill, J. Chem. Phys. 146, 014106 (2017).
  • Yao and Parkhill (2016) K. Yao and J. Parkhill, J. Chem. Theory Comput. 12, 1139 (2016).
  • Gastegger et al. (2016) M. Gastegger, C. Kauffmann, J. Behler,  and P. Marquetand, J. Chem. Phys. 144, 194110 (2016).
  • Gastegger, Behler, and Marquetand (2017) M. Gastegger, J. Behler,  and P. Marquetand, Chem. Sci. 8, 6924 (2017).
  • Brockherde et al. (2017) F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke,  and K.-R. Müller, Nature communications 8, 872 (2017).
  • Li et al. (2016) L. Li, J. C. Snyder, I. M. Pelaschier, J. Huang, U.-N. Niranjan, P. Duncan, M. Rupp, K.-R. Müller,  and K. Burke, Int. J. Quantum Chem. 116, 819 (2016).
  • Snyder et al. (2013) J. C. Snyder, M. Rupp, K. Hansen, L. Blooston, K.-R. Müller,  and K. Burke, J. Chem. Phys. 139, 224104 (2013).
  • Snyder et al. (2012) J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller,  and K. Burke, Phys. Rev. Lett. 108, 253002 (2012).
  • Behler (2011b) J. Behler, J. Chem. Phys. 134, 074106 (2011b).
  • Handley and Popelier (2010) C. M. Handley and P. L. Popelier, J. Phys. Chem. A 114, 3371 (2010).
  • Zhang et al. (2018) L. Zhang, J. Han, H. Wang, R. Car,  and E. Weinan, Physical review letters 120, 143001 (2018).
  • Han et al. (2018) J. Han, L. Zhang, R. Car,  and W. E, Commun. Comput. Phys. 23, 629 (2018).
  • Chmiela et al. (2017) S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt,  and K.-R. Müller, Sci. Adv. 3, e1603015 (2017).
  • Schütt et al. (2017b) K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko, Nat. Commun. 8, 13890 (2017b).
  • Thomas et al. (2018) N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff,  and P. Riley, arXiv preprint arXiv:1802.08219  (2018).
  • Herr et al. (2018) J. E. Herr, K. Yao, R. McIntyre, D. W. Toth, and J. Parkhill, J. Chem. Phys. 148, 241710 (2018).
  • Smith et al. (2018) J. S. Smith, B. Nebgen, N. Lubbers, O. Isayev, and A. E. Roitberg, J. Chem. Phys. 148, 241733 (2018).
  • Schütt et al. (2018) K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller, J. Chem. Phys. 148, 241722 (2018).
  • Schütt et al. (2017c) K. Schütt, P.-J. Kindermans, H. E. S. Felix, S. Chmiela, A. Tkatchenko,  and K.-R. Müller, in Advances in Neural Information Processing Systems (2017) pp. 991–1001.
  • Gastegger et al. (2018) M. Gastegger, L. Schwiedrzik, M. Bittermann, F. Berzsenyi, and P. Marquetand, J. Chem. Phys. 148, 241709 (2018).
  • De et al. (2016) S. De, A. P. Bartók, G. Csányi, and M. Ceriotti, Phys. Chem. Chem. Phys. 18, 13754 (2016).
  • Bartók et al. (2017) A. P. Bartók, S. De, C. Poelking, N. Bernstein, J. R. Kermode, G. Csányi, and M. Ceriotti, Sci. Adv. 3, e1701816 (2017).
  • Willatt, Musil, and Ceriotti (2018) M. J. Willatt, F. Musil,  and M. Ceriotti, arXiv preprint arXiv:1807.00236  (2018).
  • Faber et al. (2018) F. A. Faber, A. S. Christensen, B. Huang, and O. A. von Lilienfeld, J. Chem. Phys. 148, 241717 (2018).
  • Pan, Yang et al. (2010) S. J. Pan, Q. Yang, et al., IEEE Trans. Knowl. Data Eng. 22, 1345 (2010).
  • Faber et al. (2017) F. A. Faber, L. Hutchison, B. Huang, J. Gilmer, S. S. Schoenholz, G. E. Dahl, O. Vinyals, S. Kearnes, P. F. Riley,  and O. A. von Lilienfeld, J. Chem. Theory Comput.  (2017).
  • Hinton and Salakhutdinov (2006) G. E. Hinton and R. R. Salakhutdinov, Science 313, 504 (2006).
  • Bartók, Kondor, and Csányi (2013) A. P. Bartók, R. Kondor, and G. Csányi, Phys. Rev. B 87, 184115 (2013).
  • Huang and Von Lilienfeld (2016) B. Huang and O. A. Von Lilienfeld, J. Chem. Phys. 145, 161102 (2016).
  • Hansen et al. (2015b) K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. Von Lilienfeld, K.-R. Müller,  and A. Tkatchenko, J. Phys. Chem. Lett. 6, 2326 (2015b).
  • Collins et al. (2017) C. R. Collins, G. J. Gordon, O. A. von Lilienfeld,  and D. J. Yaron, arXiv preprint arXiv:1701.06649  (2017).
  • Richard and Herbert (2012) R. M. Richard and J. M. Herbert, J. Chem. Phys. 137, 064113 (2012).
  • Mayhall and Raghavachari (2012) N. J. Mayhall and K. Raghavachari, J. Chem. Theory Comput. 8, 2669 (2012).
  • Medders et al. (2015) G. R. Medders, A. W. Götz, M. A. Morales, P. Bajaj,  and F. Paesani, J. Chem. Phys. 143, 104102 (2015).
  • Ruff, Harmon, and Pappu (2015) K. M. Ruff, T. S. Harmon, and R. V. Pappu, J. Chem. Phys. 143, 12B607_1 (2015).
  • John and Csanyi (2017) S. T. John and G. Csanyi, J. Phys. Chem. B 121, 10934 (2017).
  • Brown et al. (1992) P. F. Brown, P. V. Desouza, R. L. Mercer, V. J. D. Pietra, and J. C. Lai, Comput. Linguist. 18, 467 (1992).
  • Mikolov et al. (2013) T. Mikolov, K. Chen, G. Corrado,  and J. Dean, arXiv preprint arXiv:1301.3781  (2013).
  • Turian, Ratinov, and Bengio (2010) J. Turian, L. Ratinov, and Y. Bengio, in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics, 2010) pp. 384–394.
  • Kim et al. (2016) Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI Press, 2016) pp. 2741–2749.
  • Sutskever, Martens, and Hinton (2011) I. Sutskever, J. Martens,  and G. E. Hinton, in Proceedings of the 28th International Conference on Machine Learning (ICML-11) (2011) pp. 1017–1024.
  • Shao et al. (2015) Y. Shao, Z. Gan, E. Epifanovsky, A. T. Gilbert, M. Wormit, J. Kussmann, A. W. Lange, A. Behn, J. Deng, X. Feng, et al., Mol. Phys. 113, 184 (2015).
  • Chai and Head-Gordon (2008) J.-D. Chai and M. Head-Gordon, Phys. Chem. Chem. Phys. 10, 6615 (2008).
  • Hauser et al. (2018) K. Hauser, C. Negron, S. K. Albanese, S. Ray, T. Steinbrecher, R. Abel, J. D. Chodera, and L. Wang, Commun. Biol. 1, 70 (2018).
  • Mey, Jiménez, and Michel (2018) A. S. Mey, J. J. Jiménez, and J. Michel, J. Comput.-Aided Mol. Des. 32, 199 (2018).
  • Harger et al. (2017) M. Harger, D. Li, Z. Wang, K. Dalby, L. Lagardère, J.-P. Piquemal, J. Ponder, and P. Ren, J. Comput. Chem. 38, 2047 (2017).
  • Williams-Noonan, Yuriev, and Chalmers (2017) B. J. Williams-Noonan, E. Yuriev, and D. K. Chalmers, J. Med. Chem. 61, 638 (2017).
  • Matricon et al. (2017) P. Matricon, A. Ranganathan, E. Warnick, Z.-G. Gao, A. Rudling, C. Lambertucci, G. Marucci, A. Ezzati, M. Jaiteh, D. Dal Ben, et al., Sci. Rep. 7, 6398 (2017).