Coarse-Grained (CG) molecular modeling has been used extensively to simulate complex molecular processes at a lower computational cost than all-atom simulations Agostino et al. (2017); Huang et al. (2010). By reducing the full atomistic model to a smaller number of pseudo-atoms, CG methods focus on the slow collective atomic motions and average out fast local motions. Current approaches generally focus on parametrizing coarse-grained potentials from atomistic simulations (bottom-up) Noid et al. (2008) or experimental statistics (top-down) Marrink et al. (2007); Periole et al. (2009). Although it has been shown that the design of the all-atom to CG mapping plays an important role in recovering structural distribution functions with fidelity Rudzinski and Noid (2014), no systematic methods have been proposed in the literature to address the learning of coarse-grained representations from atomistic trajectories. Collective atomistic motions are key in phenomena such as glass formation Pazmiño Betancourt et al. (2018) or protein dynamics He et al. (2011), but the criteria for hand-picking CG mappings are usually based on a priori considerations and chemical intuition.
One of the central themes in learning theory is finding optimal hidden representations from complex data sets Lawrence (2005). Such hidden representations can be used to capture the highest possible fidelity over complex statistical distributions with the fewest variables. We propose that finding the coarse-grained variables can be formulated as a problem of learning the latent variables in the atomistic data distribution. Recent works in unsupervised learning have shown great potential to uncover the hidden structure of complex data Tolstikhin et al. (2018); Kingma and Ba (2015); Goodfellow et al. (2014). As a powerful unsupervised learning technique, variational auto-encoders (VAEs) compress data through an information bottleneck Tishby and Zaslavsky (2015) that continuously maps an otherwise complex data set into a low-dimensional and easy-to-sample space. VAEs have been applied successfully to a variety of tasks, from image de-noising Vincent et al. (2010) to learning compressed representations for text Bowman et al. (2015), celebrity faces Tolstikhin et al. (2018), arbitrary grammars Kusner et al. (2017), and molecular structures Gómez-Bombarelli et al. (2018); Jin et al. (2018). Recent works have applied VAE-like architectures to learn collective molecular motions by reconstructing time-lagged configurations Wehmeyer and Noé (2018); Mardt et al. (2018).
Here we apply an auto-encoder architecture with constraints to: 1) compress atomistic molecular dynamics (MD) data into a rigorously coarse-grained representation in 3D space; 2) train a reconstruction loss to help capture salient collective features from the all-atom data; and 3) adopt a supervised instantaneous force-matching approach to variationally find the optimal coarse-grained potential that matches the instantaneous mean force acting on the all-atom training data.
We propose a semi-supervised learning approach based on auto-encoders to create an all-atom to CG mapping function and a potential in CG coordinates that can later be used to carry out new simulations of larger systems at lower computational cost. The latent structure is thus shaped by training both an unsupervised reconstruction task and a supervised force-matching task. To learn corresponding force fields that are transferable, the model carries out variational coarse-grained force matching that incorporates the learning of the coarse-grained mapping into the force-matching functional.
II.1 Coarse-Graining Auto-encoding
Noid et al. have studied the general requirements for a physically rigorous mapping function Noid et al. (2008). To address those requirements, Autograin is trained to optimize the reconstruction of atomistic configurations by propagating them through a low-dimensional bottleneck in Cartesian coordinates. Unlike most instances of VAEs, the dimensions of the CG latent space have physical meaning: since the CG space needs to represent the system in position and momentum space, the latent dimensions must correspond to real-space Cartesian coordinates and maintain the structural information of the molecules.
We make our encoding function a linear mapping in Cartesian space, $z_j = \sum_i E_{ij} x_i$, where $n$ is the number of atoms and $N$ is the desired number of coarse-grained particles.
Each atom contributes to at most one coarse-grained variable
where $E_{ij}$ is the matrix element in $E$, $i$ is the index for atoms, and $j$ is the index for coarse-grained atoms. Requirement (2) defines the coarse-grained variables as the statistical averages of the Cartesian coordinates of the contributing atoms. To maintain consistency in momentum space after the coarse-grained mapping, the coarse-grained masses are rigorously redefined as $M_j = \left( \sum_i E_{ij}^2 / m_i \right)^{-1}$ Darve (2006); Noid et al. (2008); this definition of mass is a corollary of requirement (3).
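The normalized linear mapping and the consistent mass redefinition can be sketched numerically. The following is a minimal illustration, assuming a toy assignment matrix, coordinates, and masses that are placeholders rather than values from this work:

```python
import numpy as np

def cg_map_and_masses(E, x, m):
    """Apply a normalized linear CG mapping and compute consistent CG masses.

    E : (n, N) assignment-weight matrix; column j holds the non-negative,
        normalized contributions of the n atoms to CG bead j.
    x : (n, 3) atomistic Cartesian coordinates.
    m : (n,) atomistic masses.
    """
    # CG coordinates are weighted averages of the contributing atoms.
    z = E.T @ x                                       # (N, 3)
    # Momentum-space consistency: M_j = (sum_i E_ij^2 / m_i)^-1
    M = 1.0 / ((E ** 2) / m[:, None]).sum(axis=0)     # (N,)
    return z, M

# Toy example: 4 atoms mapped onto 2 beads with equal weights per bead.
E = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
E = E / E.sum(axis=0)                                 # normalize each column
x = np.random.randn(4, 3)
m = np.array([12.0, 1.0, 12.0, 1.0])
z, M = cg_map_and_masses(E, x, m)
```

With one-hot (requirement (3)) weights, each bead position reduces to the average of its member atoms, and the mass formula reduces to the corresponding effective mass.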
To specifically satisfy requirement (3), we design the encoder based on Gumbel-Softmax Jang et al. (2017) with a tunable fictitious "temperature" that can be adjusted during training to learn discrete variables. The detailed algorithm is described in Algorithm 1.
The softmax function is thus used to ensure that the encoding function represents normalized atomic contributions to each of the coarse-grained pseudo-atoms. We apply the Gumbel-Softmax function with a fictitious inverse "temperature" $\beta$ on a separate weight matrix that is used as a mask on the encoding weight matrix. By gradually increasing $\beta$ to a sufficiently high value, the mask asymptotically chooses only one coarse-grained variable for each atom, which satisfies requirement (3). This is equivalent to an attention mechanism, which is widely used in deep learning Vaswani et al. (2017).
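The annealed soft-assignment mask can be sketched as follows. This is a minimal numpy illustration of the Gumbel-Softmax idea, not the trained Algorithm 1; the logits, sizes, and temperatures are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_mask(logits, beta):
    """Soft one-hot atom-to-bead assignments at inverse 'temperature' beta.

    logits : (n, N) learnable assignment scores (random here for illustration).
    beta   : fictitious inverse temperature, annealed upward during training.
    """
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = beta * (logits + g)
    y = y - y.max(axis=1, keepdims=True)                  # numerical stability
    p = np.exp(y)
    return p / p.sum(axis=1, keepdims=True)               # rows sum to 1

logits = rng.normal(size=(5, 2))
soft = gumbel_softmax_mask(logits, beta=1.0)    # diffuse assignments
hard = gumbel_softmax_mask(logits, beta=50.0)   # nearly one-hot per atom
```

At low beta each atom contributes to every bead; as beta grows, each row collapses toward a single bead, recovering a discrete mapping while remaining differentiable during training.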
The decoding of coarse-grained pseudo-atoms has received little attention in the literature, so we opt for a simple decoding approach: a matrix $D$ of dimension $n \times N$ that maps the coarse-grained variables back to the original space. Hence, both the encoding and decoding mappings are deterministic. Further developments toward more powerful decoding functions are discussed below. The unsupervised optimization task is thus to minimize the reconstruction loss $\mathcal{L}_{recon} = \langle \| x - D E^{T} x \|^2 \rangle$.
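A minimal sketch of this deterministic reconstruction loss, assuming a linear decoder chosen as the pseudo-inverse of the encoder (an illustrative choice, not the trained decoder):

```python
import numpy as np

def reconstruction_loss(x, E, D):
    """Mean-squared reconstruction error for the deterministic linear
    encoder/decoder pair: z = E^T x, x_hat = D z."""
    z = E.T @ x          # (N, 3) coarse-grained coordinates
    x_hat = D @ z        # (n, 3) decoded atomistic coordinates
    return np.mean((x - x_hat) ** 2)

# With D as the pseudo-inverse of E^T, decoding is the least-squares
# reconstruction, so the loss measures information lost in the bottleneck.
rng = np.random.default_rng(1)
E = np.abs(rng.random((6, 2)))
E = E / E.sum(axis=0)
D = np.linalg.pinv(E.T)
x = rng.standard_normal((6, 3))
loss = reconstruction_loss(x, E, D)
```

Configurations lying in the column space of $E$ are reconstructed exactly; everything orthogonal to it is averaged out, which is the irreversible information loss discussed later.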
II.2 Variational Force Matching
The CG auto-encoder provides an unsupervised variational method to learn the coarse-grained coordinates. To learn the coarse-grained potential energy as a function of the also-learned coarse-grained coordinates, we propose an instantaneous force-matching functional that is conditioned on the encoder. The proposed functional enables the learning of empirical force-field parameters and the encoder simultaneously by including the optimization of $E$ in the force-matching procedure. Training empirical potentials from forces has a series of advantages: (i) the explicit contribution to every atom is available, rather than just pooled contributions to the energy; (ii) it is easier to learn smooth, energy-conserving potential energy surfaces Chmiela et al. (2017); and (iii) instantaneous dynamics, which represent a trade-off in coarse-graining, can be captured better. Forces are always available if the training data comes from molecular dynamics simulations, and for common electronic structure methods based on density functional theory, forces can be calculated at nearly the same cost as self-consistent energies.
The force-matching approach builds on the idea that the average force generated by the coarse-grained potential should reproduce the coarse-grained atomistic forces from the thermodynamic ensemble Izvekov et al. (2005); Zhang et al. (2018); Ciccotti et al. (2005). Given an atomistic potential energy function $V(x)$, the probability distribution of atomistic configurations is $p(x) \propto e^{-V(x)/k_B T}$.
The distribution function of the coarse-grained variables and the corresponding many-body potential of mean force are $P(z) = \int dx\, p(x)\, \delta(E^T x - z)$ and $U(z) = -k_B T \ln P(z)$, where $F(z) = -\nabla_z U(z)$ is the mean force and $\mathcal{F}(x)$ represents a family of possible vector fields such that $\langle \mathcal{F}(x) \rangle_{E^T x = z} = F(z)$. We further define $\mathcal{F}(x)$ to be the instantaneous force, whose conditional expectation is equal to the mean force $F(z)$. It is important to note that $\mathcal{F}(x)$ is not unique and depends on the specific choice of vector field Kalligiannaki et al. (2015); Ciccotti et al. (2005); Den Otter (2000), but the conditional averages return the same mean force.
With $U_{CG}$ as a function of $z$, we adopt the force-matching scheme introduced by Izvekov et al. Izvekov and Voth (2006); Izvekov et al. (2005), in which the mean square error is used to match the mean force, and the "coarse-grained force" is the negative gradient of the coarse-grained potential. The optimizing functional, developed based on Izvekov et al., is $\chi^2[\theta] = \langle \| F_{CG}(E^T x; \theta) - F(E^T x) \|^2 \rangle$, where $\theta$ are the parameters in $U_{CG}$ and $F_{CG} = -\nabla_z U_{CG}$ represents the "coarse-grained forces", which can be obtained from automatic differentiation as implemented in open-source packages like PyTorch Paszke et al. (2017). However, computing the mean force $F(z)$ would require constrained dynamics Ciccotti et al. (2005) to obtain the average of the fluctuating microscopic force. According to Zhang et al. Zhang et al. (2018), the force-matching functional can alternatively be formulated by treating the instantaneous mean force $\mathcal{F}(x)$ as an instantaneous observable whose well-defined average is the mean force $F(z)$, on the condition that $\langle \mathcal{F}(x) \rangle_{E^T x = z} = F(z)$.
Now the original variational functional becomes instantaneous in nature and can be reformulated as the following minimization target: $\mathcal{L}_{FM} = \langle \| F_{CG}(E^T x; \theta) - \mathcal{F}(x) \|^2 \rangle$.
Instead of matching mean forces that would need to be obtained from constrained dynamics, our model minimizes $\mathcal{L}_{FM}$ with respect to $\theta$ and $E$. With some algebra, $\mathcal{L}_{FM}$ can be shown to differ from $\chi^2$ only by a term that does not depend on $\theta$ Zhang et al. (2018). This functional provides a variational way to find a CG mapping and its associated force-field functions.
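A minimal numpy sketch of the instantaneous force-matching loss for a single harmonic CG bond. The actual model obtains CG forces by automatic differentiation in PyTorch; here the gradient is taken analytically, and the parameters and configurations are illustrative placeholders:

```python
import numpy as np

def cg_force(z, k=2.0, r0=1.0):
    """CG force -grad U_CG for a harmonic bond U = k/2 (|z1 - z2| - r0)^2.
    (k and r0 are placeholder parameters to be fitted.)"""
    d = z[0] - z[1]
    r = np.linalg.norm(d)
    f1 = -k * (r - r0) * d / r          # force on bead 1
    return np.stack([f1, -f1])          # Newton's third law for bead 2

def force_matching_loss(z_batch, f_inst_batch):
    """Mean-squared deviation between CG forces and instantaneous forces
    over a mini-batch of CG configurations."""
    pred = np.stack([cg_force(z) for z in z_batch])
    return np.mean((pred - f_inst_batch) ** 2)
```

In the full method, the loss is minimized jointly over the potential parameters and the encoder, with gradients of both obtained by back-propagation.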
II.3 Model Training
The overall loss function to be optimized is the joint loss of the reconstruction and instantaneous force-matching losses, $\mathcal{L} = \mathcal{L}_{recon} + \lambda \mathcal{L}_{FM}$, where $\lambda$ weights the two tasks. The schematic for the optimization stack is shown in Fig. 1.
We train the model from atomistic trajectories with the atomistic forces associated with each atom at each frame. The model is trained to minimize the reconstruction loss along with force-matching loss as shown in Figure 1. It is propagated in the feed-forward direction and its parameters are optimized using back-propagation Hecht-Nielsen (1989).
In practice, we first train the auto-encoder in an unsupervised way to obtain a representative coarse-graining mapping. The supervised force-matching task is then trained jointly with the auto-encoder to variationally find $U_{CG}$ and further optimize $E$, yielding a final coarse-grained mapping and its associated force field.
For convenience of optimization, we choose our functional form to be empirical force fields that include bonded and non-bonded interactions. Empirical classical force fields may not be sufficiently expressive to fit the complicated many-body potential of mean force, but they are fast to evaluate and transferable between different systems, with the speed and scaling desired for large-scale CG simulations. One can also use cubic splines and neural network potentials Behler and Parrinello (2007), which have more flexibility in fitting.
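The empirical bonded and non-bonded terms referred to above take standard classical forms; a minimal sketch with placeholder parameters (the fitted values come from force matching, not from this snippet):

```python
import numpy as np

def harmonic_bond(r, k, r0):
    """Classical harmonic bond term U = k/2 (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2

def lennard_jones(r, eps, sigma):
    """Standard 12-6 Lennard-Jones non-bonded term."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 ** 2 - sr6)

# Example evaluation with placeholder parameters to be fitted:
u = harmonic_bond(1.1, k=100.0, r0=1.0) + lennard_jones(1.5, eps=0.3, sigma=1.0)
```

Because these forms are smooth and few-parameter, their gradients (the CG forces) are cheap to evaluate and well behaved during optimization.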
Autograin is first demonstrated on coarse-graining single-molecule trajectories of ortho-terphenyl (OTP) and aniline (C6H5NH2) in vacuum. We initially train an auto-encoder for reconstruction and subsequently include the supervised force-matching task.
For the OTP molecule, we choose $N = 3$ as the dimension of the coarse-grained space, and each coarse-grained super-atom is treated as a different species. The model is first initialized with random weights and trained as described in Algorithm 1 by gradually increasing the inverse temperature. The coarse-graining encoder gradually learns the most representative coarse-grained mapping by minimizing the reconstruction loss. For OTP, the coarse-graining rule automatically captured by the model is to group each phenyl ring into a bead (Fig. 2b). For the coarse-graining of aniline into two pseudo-atoms, our model selects the mapping that partitions the molecule into two beads: one contains the amino group, the three closest phenyl carbons, and their attached hydrogens; the other groups the remaining three carbons and their associated hydrogens (Fig. 2b). This optimal mapping is not necessarily the first intuitive mapping one would propose, a more obvious choice being one particle on the phenyl ring and one on the amino group.
We then performed new simulations in the coarse-grained variables using the trained CG potential to obtain validation trajectories in CG space, and compared the equilibrium structural correlations with held-out data from the all-atom simulations. As shown in Figure 3, the mapped atomistic distributions agree well with the Boltzmann distributions derived from the CG potential for each degree of freedom in the case of OTP. Figure 4 shows good agreement between bond distributions for aniline.
Generally in coarse-graining, an arbitrary, highly complex potential can always be trained to reproduce radial distribution functions, often at the expense of non-physicality (multiple local minima in two-body potentials, repulsive regions in between attractive regions, etc.). Our approach was able to learn simple harmonic potentials that should result in higher transferability. When a highly expressive neural potential is trained, the curves are reproduced almost exactly, but at the expense of a less physical functional form.
Fig. 5 demonstrates the decoding of the OTP molecule. Because the coarse-graining encoder condenses the atomistic trajectories through an information bottleneck, CG structures do not contain all the structural information in its original detail. Inspecting the decoded structure of OTP, we note that while the central phenyl ring is decoded back with good fidelity, the two side phenyl rings cannot be recovered at the original resolution. This is unsurprising, because the coarse-grained representation lacks the degrees of freedom to describe the relative orientations of the phenyl rings. The coarse-grained super-atoms condense different relative rotations of the two side phenyl rings into the same coarse-grained states, and the information about rotational degrees of freedom is lost. Therefore, the decoder learns to map the coarse-grained variables onto an averaged mean structure that represents the ensemble of relative rotations of the two side phenyl rings. The prospect of stochastic decoding functions to capture thermodynamic up-scaling is discussed below.
We have also applied Autograin to liquid systems of methane and ethane. The training trajectories each contain 64 all-atom molecules. The encoder and force-matching functional were trained as described above. After training, the learned coarse-grained mapping and potential were applied to coarse-grain test systems of 512 methane and 343 ethane molecules at the same density. The relevant pairwise structural correlation functions for each individual system were then compared.
For methane, we choose $N = 1$ and include only a 12-6 Lennard-Jones interaction in the CG potential. As shown in Fig. 6, the pair correlation function of the coarse-grained particles shows nearly perfect agreement between the CG and atomistic simulations. This is an expected result, because the pairwise term is the only potential energy term in the CG potential and therefore there are no cross correlations between different energy terms.
For ethane, we choose $N = 2$ and include a bonded potential and a 9-6 Lennard-Jones potential to describe the van der Waals interactions. From training, we obtain a coarse-grained mapping that groups each CH3 moiety into one pseudo-atom. As seen in Figure 7, reasonable agreement is obtained in the correlation function between the CG and mapped atomistic trajectories. We postulate that the discrepancy arises from a combination of: 1) the CG potential including only classical bonded and non-bonded terms, and thus lacking the flexibility to fit arbitrary interaction potentials; as discussed above, during coarse-graining it is common to compensate for the high-order correlations lost from the spatial coordinates with complex, spurious contributions to the potential; and 2) the force-matching method not addressing structural cross-correlations, so it is not guaranteed to recover the atomistic correlation functions perfectly, as discussed by Noid et al. and Lu et al. Noid (2013); Lu et al. (2013). Structural cross-correlations are addressed in other CG methods like the generalized Yvon-Born-Green method Mullinax and Noid (2010) and iterative force matching Lu et al. (2013). However, these methods cannot be directly incorporated into our proposed framework based on stochastic gradient descent optimization.
IV Discussion and further research
In this work, we propose a coarse-grain mapping optimization scheme based on semi-supervised deep learning. In Autograin, an auto-encoder framework is coupled to a rigorous mini-batch force-matching algorithm. By training on all-atom molecular dynamics simulations, the model can simultaneously learn optimal coarse-grain mappings and an accompanying potential with the desired functional form, from classical to neural. Results on simple model systems show that Autograin can learn both intuitive and non-obvious mappings, and effectively train potentials that recover structural properties with good accuracy.
Within the current framework, there are several possibilities for future research directions, regarding both the supervised and unsupervised parts.
Here, we have presented a choice of deterministic encoder and decoder. However, such a deterministic CG mapping results, by construction, in an irreversible loss of information. This is reflected in the reconstruction of average all-atom structures instead of the reference instantaneous configurations. A probabilistic auto-encoder can go further by learning a reconstruction probability distribution that reflects the thermodynamics of the degrees of freedom averaged out by the coarse-graining. Using this framework as a bridge between different scales of simulation, generative models can help build better hierarchical understanding of multi-scale simulations.
Here, very simple forms were chosen for the coarse-grained potential, consisting of classical approximate explicit forms for bonded and non-bonded terms. This is convenient for optimization and less prone to over-fitting and converging to contrived functional forms with poor transferability. Autograin can be extended to use spline potentials, which are common in coarse-graining, or neural network force fields; these have the ability to capture higher-order correlations and non-linear effects in the potential of mean force Behler and Parrinello (2007). The chosen force-matching approach does not guarantee the recovery of individual pair correlation functions derived from full atomistic trajectories Lu et al. (2013); Noid (2013). To include the learning of structural cross-correlations, our method could be extended to incorporate iterative force matching Lu et al. (2013) and relative entropy Shell (2016).
The automatic learning of multi-particle force fields on the fly requires automatic classification of atoms and variationally building empirical force-field topologies at training time. In the current model, a pre-determined topology is needed to calculate the total potential energy. It would be ideal to develop a probabilistic way to generate force-field topologies for discrete particle types that are variationally optimized alongside the coarse-graining encoding. Recent advances in learning on graphs shed some light on this line of research Jin et al. (2018); Xie and Grossman (2018); Duvenaud et al. (2015).
Methods based on force matching, like other bottom-up approaches such as relative entropy, attempt to reproduce structural correlation functions at one point in thermodynamic space. As such, they are not guaranteed to capture non-equilibrium transport properties Noid (2013); Davtyan et al. (2015) and are not necessarily transferable among different thermodynamic conditions Noid (2013); Carbone et al. (2008); Krishna et al. (2009). The data-driven approach we propose enables learning over different thermodynamic conditions. In addition, this framework opens new routes to understanding how the coarse-grained representation influences transport properties by training on time-series data. A related example in the literature is the use of a time-lagged auto-encoder Wehmeyer and Noé (2018) to learn a latent representation that best captures molecular kinetics.
In summary, we propose to treat coarse-grained coordinates as latent variables which can be sampled with molecular dynamics. By regularizing the latent space with force matching, we jointly train the encoding mapping, a deterministic decoding, and a transferable potential that can be used to simulate larger systems for longer times and thus accelerate molecular dynamics. Our work also opens up possibilities to use statistical learning as a basis to bridge across multiscale simulations.
V Computational details
Molecular trajectory data was obtained using the OPLS force field generated with the LigParGen server Dodda et al. (2017). For gas-phase single-molecule trajectories, we use a 6-ps trajectory of 3000 frames obtained by Langevin dynamics with a friction coefficient of 1. Models were pre-trained for 100 epochs with mini-batches of size 10 in the unsupervised stage. The Adam optimizer Kingma and Ba (2015) was used. We used PyTorch for training our model Paszke et al. (2017).
For single-molecule OTP, we learn a classical potential consisting of two bonds and one angle; we train the force-matching task to find the harmonic bond- and angle-potential parameters that best match the forces from the training data. The CG structural distribution is obtained by computing the normalized Boltzmann probability for the bond and angle distributions, $p(r) \propto \exp\left(-k_r (r - r_0)^2 / 2 k_B T\right)$ and $p(\theta) \propto \exp\left(-k_\theta (\theta - \theta_0)^2 / 2 k_B T\right)$, where $k_r$, $r_0$, $k_\theta$, and $\theta_0$ are obtained from training the CG potential.
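The normalized Boltzmann probability for a harmonic bond can be sketched as follows; the force constant, equilibrium length, and grid are illustrative placeholders rather than fitted values from this work:

```python
import numpy as np

def boltzmann_bond_dist(r, k, r0, kBT=1.0):
    """Normalized Boltzmann density p(r) ~ exp(-k (r - r0)^2 / (2 kBT))
    on a discrete grid, for a harmonic bond with assumed parameters."""
    w = np.exp(-0.5 * k * (r - r0) ** 2 / kBT)
    dr = r[1] - r[0]
    return w / (w.sum() * dr)          # normalize so the density integrates to 1

r = np.linspace(0.5, 1.5, 501)
p = boltzmann_bond_dist(r, k=100.0, r0=1.0)
```

The analogous expression with the fitted angle parameters gives the angle distribution, and both can be compared directly against histograms of the mapped atomistic trajectory.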
In the case of the molecular liquids, the training trajectories are obtained in the NVT ensemble at 100 K for 64 methane molecules and 120 K for 64 ethane molecules.
WW thanks Toyota Research Institute for financial support. RGB thanks MIT DMSE and Toyota Faculty Chair for support. WW and RGB thank Prof. Adam P. Willard for helpful discussions.
- Agostino et al. (2017) M. D. Agostino, H. J. Risselada, A. Lürick, C. Ungermann, and A. Mayer, Nature 551, 634 (2017).
- Huang et al. (2010) D. M. Huang, R. Faller, K. Do, and A. J. Moulé, Journal of Chemical Theory and Computation 6, 1 (2010).
- Noid et al. (2008) W. G. Noid, J. W. Chu, G. S. Ayton, V. Krishna, S. Izvekov, G. A. Voth, A. Das, and H. C. Andersen, Journal of Chemical Physics 128, 243116 (2008).
- Marrink et al. (2007) S. J. Marrink, H. J. Risselada, S. Yefimov, D. P. Tieleman, and A. H. De Vries, Journal of Physical Chemistry B 111, 7812 (2007).
- Periole et al. (2009) X. Periole, M. Cavalli, S.-J. Marrink, and M. A. Ceruso, Journal of Chemical Theory and Computation 5, 2531 (2009).
- Rudzinski and Noid (2014) J. F. Rudzinski and W. G. Noid, The Journal of Physical Chemistry B 118, 8295 (2014).
- Pazmiño Betancourt et al. (2018) B. A. Pazmiño Betancourt, F. W. Starr, and J. F. Douglas, The Journal of Chemical Physics 148, 104508 (2018).
- He et al. (2011) Y. He, J.-Y. Chen, J. R. Knab, W. Zheng, and A. G. Markelz, Biophysj 100, 1058 (2011).
- Lawrence (2005) N. Lawrence, Journal of Machine Learning Research 6, 1783 (2005).
- Tolstikhin et al. (2018) I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schölkopf, in International Conference on Learning Representations (2018).
- Kingma and Ba (2015) D. P. Kingma and J. Ba, in International Conference on Learning Representations (2015) arXiv:1412.6980 .
- Goodfellow et al. (2014) I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Advances in Neural Information Processing Systems (2014).
- Tishby and Zaslavsky (2015) N. Tishby and N. Zaslavsky, arXiv:1503.02406 (2015).
- Vincent et al. (2010) P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, Journal of Machine Learning Research 11, 3371 (2010).
- Bowman et al. (2015) S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, and S. Bengio, (2015), arXiv:1511.06349 .
- Kusner et al. (2017) M. J. Kusner, B. Paige, and J. M. Hernández-Lobato, in Proceedings of the 34th International Conference on Machine Learning (2017).
- Gómez-Bombarelli et al. (2018) R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik, ACS Central Science 4, 268 (2018).
- Jin et al. (2018) W. Jin, R. Barzilay, and T. Jaakkola, in Proceedings of the 35th International Conference on Machine Learning (2018).
- Wehmeyer and Noé (2018) C. Wehmeyer and F. Noé, Journal of Chemical Physics 148 (2018).
- Mardt et al. (2018) A. Mardt, L. Pasquali, H. Wu, and F. Noé, Nature Communications 9, 5 (2018).
- Darve (2006) E. Darve, in New Algorithms for Macromolecular Simulation (2006) pp. 213–249.
- Jang et al. (2017) E. Jang, S. Gu, and B. Poole, in International Conference on Learning Representations (2017).
- Vaswani et al. (2017) A. Vaswani, G. Brain, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, in Advances in Neural Information Processing Systems (2017).
- Chmiela et al. (2017) S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, and K.-R. Müller, Science Advances 3, e1603015 (2017).
- Izvekov et al. (2005) S. Izvekov and G. A. Voth, Journal of Physical Chemistry B 109, 2469 (2005).
- Zhang et al. (2018) L. Zhang, J. Han, H. Wang, R. Car, and W. E, The Journal of Chemical Physics 149, 034101 (2018).
- Ciccotti et al. (2005) G. Ciccotti, R. Kapral, and E. Vanden-Eijnden, ChemPhysChem 6, 1809 (2005).
- Kalligiannaki et al. (2015) E. Kalligiannaki, V. Harmandaris, M. A. Katsoulakis, and P. Plecháč, The Journal of Chemical Physics 143, 84105 (2015).
- Den Otter (2000) W. K. Den Otter, Journal of Chemical Physics 112, 7283 (2000).
- Izvekov and Voth (2006) S. Izvekov and G. A. Voth, Journal of Chemical Theory and Computation 2, 637 (2006).
- Paszke et al. (2017) A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, in NIPS-W (2017).
- Hecht-Nielsen (1989) Hecht-Nielsen, in International Joint Conference on Neural Networks (IEEE, 1989) pp. 593–605 vol.1.
- Behler and Parrinello (2007) J. Behler and M. Parrinello, Physical Review Letters 98, 146401 (2007).
- Noid (2013) W. G. Noid, “Perspective: Coarse-grained models for biomolecular systems,” (2013).
- Lu et al. (2013) L. Lu, J. F. Dama, and G. A. Voth, Journal of Chemical Physics 139, 121906 (2013).
- Mullinax and Noid (2010) J. W. Mullinax and W. G. Noid, The Journal of Physical Chemistry C 114, 5661 (2010).
- Shell (2016) M. S. Shell, in Advances in Chemical Physics, Vol. 161 (Wiley-Blackwell, 2016) pp. 395–441.
- Xie and Grossman (2018) T. Xie and J. C. Grossman, Physical Review Letters 120, 145301 (2018).
- Duvenaud et al. (2015) D. K. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, in Advances in Neural Information Processing Systems (2015) pp. 2215–2223.
- Davtyan et al. (2015) A. Davtyan, J. F. Dama, G. A. Voth, and H. C. Andersen, The Journal of Chemical Physics 142, 154104 (2015).
- Carbone et al. (2008) P. Carbone, H. A. K. Varzaneh, X. Chen, and F. Müller-Plathe, Journal of Chemical Physics 128, 64904 (2008).
- Krishna et al. (2009) V. Krishna, W. G. Noid, and G. A. Voth, The Journal of Chemical Physics 131, 24103 (2009).
- Dodda et al. (2017) L. S. Dodda, I. Cabeza de Vaca, J. Tirado-Rives, and W. L. Jorgensen, Nucleic Acids Research 45, W331 (2017).