Molecular Latent Space Simulators

07/01/2020 ∙ by Hythem Sidky, et al. ∙ The University of Chicago 71

Small integration time steps limit molecular dynamics (MD) simulations to millisecond time scales. Markov state models (MSMs) and equation-free approaches learn low-dimensional kinetic models from MD simulation data by performing configurational or dynamical coarse-graining of the state space. The learned kinetic models enable the efficient generation of dynamical trajectories over vastly longer time scales than are accessible by MD, but the discretization of configurational space and/or absence of a means to reconstruct molecular configurations precludes the generation of continuous all-atom molecular trajectories. We propose latent space simulators (LSS) to learn kinetic models for continuous all-atom simulation trajectories by training three deep learning networks to (i) learn the slow collective variables of the molecular system, (ii) propagate the system dynamics within this slow latent space, and (iii) generatively reconstruct molecular configurations. We demonstrate the approach in an application to Trp-cage miniprotein to produce novel ultra-long synthetic folding trajectories that accurately reproduce all-atom molecular structure, thermodynamics, and kinetics at six orders of magnitude lower cost than MD. The dramatically lower cost of trajectory generation enables greatly improved sampling and greatly reduced statistical uncertainties in estimated thermodynamic averages and kinetic rates.



There are no comments yet.


page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

footnotetext:  Pritzker School of Molecular Engineering, University of Chicago, Chicago, USA. E-mail: andrewferguson@uchicago.edufootnotetext:  Department of Physics, University of Illinois at Urbana-Champaign, Urbana, USA.

1 Introduction

Molecular dynamics (MD) simulates the dynamical evolution of molecular systems by numerically integrating the classical equations of motion frenkel2002understanding. Modern computer hardware stone2010gpu, Shaw2014 and efficient and scalable simulation algorithms phillips2005scalable, chow2008desmond, glaser2015strong, plimpton1993fast have enabled the simulation of billion abraham2002work, abraham2002brittle and trillion-atom systems tchipev2019twetris. Advancing the barrier in time scale has proven far more challenging. Stability of the numerical integration requires time steps on the order of femtoseconds commensurate with the fastest atomic motions elber2016perspective, which limits simulations to microseconds on commodity processors elber2016perspective and milliseconds on special purpose hardware Shaw2014. Enhanced sampling techniques apply accelerating biases and analytical corrections to recover thermodynamic averages torrie1977nonphysical, mcdonald1967machine, abrams2014enhanced, miao2016unconstrained, sidky2020machine but – except in special cases and the limit of small bias chodera2011dynamical, donati2018girsanov – no analogous approaches exist to recover unbiased dynamical trajectories from biased simulations.

The MD algorithm propagates a molecular configuration at time to via transition densities noe2018machine, Fernandez2020

. Assuming ergodicity, the probability density over microstates converges to the stationary distribution as

. Breaking the time scale barrier requires a surrogate model for that can be more efficiently evaluated and with larger time steps than MD. Accurately approximating this propagator in the high-dimensional -atom configurational space is intractable. In general, for sufficiently large there is an emergent low-dimensional simplicity that admits accurate modeling of the dynamics by a low-dimensional propagator within a latent space . The relation between MD and latent space dynamics can be represented as noe2018machine, Fernandez2020,


This scheme defines three learning problems noe2018machine: (i) encoding of molecular configurations to the latent space , (ii) propagation of the latent space dynamics according to transition densities , and (iii) decoding (or generating) of molecular configurations from the latent space noe2018machine.

Markov state models (MSM) husic2018markov, pande2010everything, Prinz2011, bowman2013introduction, Sidky2019, Wehmeyer2019, Mardt2018, wu2017variational and the equation-free approach of of Kevrekidis and co-workers kevrekidis2003equation, kevrekidis2004equation, kevrekidis2009equation, Mori1965, Zwanzig1973, zwanzig2001nonequilibrium, risken2012fokker respectively employ configurational and dynamical coarse graining to parameterize low-dimensional propagators, but both methods lack molecular decoders. Recently, numerous deep learning approaches have been proposed to learn , , and

from MD trajectories, including time-lagged autoencoders

Wehmeyer2018, time-lagged variational autoencoders Hernandez2017, and time-lagged autoencoders with propagators Lusch2018. Training these networks requires a time-lagged reconstruction term within the loss, which can cause the network to fail to approximate the true slow modes Chen2019. Further, time-lagged autoencoders and time-lagged variational autoencoders do not learn valid propagators noe2018machine, and the inherent stochasticity of MD appears to frustrate learning of the propagator and decoder in time-lagged autoencoders with propagators noe2018machine. Deep generative MSMs (DeepGenMSM) simultaneously learn a fuzzy encoding to metastable states and “landing probabilities” to decode molecular configurations Wu2018. The method computes a proper propagator and generatively decodes novel molecular structures, but – as with all MSM-based approaches – it configurationally discretizes the latent space and relies on the definition of long-lived metastable states.

In this work, we propose molecular latent space simulators (LSS) as a means to train kinetic models over limited MD simulation data that are capable of producing novel all-atom molecular trajectories at orders of magnitude lower cost. The LSS can be conceived as means to augment conventional MD by distilling a kinetic model from training data, efficiently generating continuous all-atom trajectories, and computing high-accuracy estimates of any all-atom structural, thermodynamic, or kinetic observable.

The LSS is based on three deep learning networks independently trained to (i) learn an encoding into a latent space of slow variables using state-free reversible VAMPnets (SRV) Chen2019a, (ii) learn a propagator to evolve the system dynamics within this latent space using mixture density networks (MDN) Bishop1994, bishop2006pattern, and (iii) learn a decoding from the latent space to molecular configurations using a conditional Wasserstein generative adversarial network (cWGAN) Gulrajani2017. Separation of the learning problems in this manner makes training and deployment of the LSS modular and simple. The stochastic nature of the MDN propagator means that the trained kinetic model generates novel trajectories and does not simply recapitulate copies of the training data. The approach is distinguished from MSM-based approaches in that it requires no discretization into metastable states Fernandez2020, Wu2018. The continuous formulation of the propagator in the slow latent space shares commonalities with the equation-free approach kevrekidis2003equation, kevrekidis2009equation, but we eschew parameterizing a stochastic differential equation in favor of a simple and efficient deep learning approach, and also equip our simulator with a generative molecular decoder.

2 Methods

A schematic diagram of the LSS and the three deep networks of which it is comprised is presented in Fig. 1. We describe each of the three component networks in turn and then describe LSS training and deployment.

Fig.  1:

Schematic diagram of the latent space simulator (LSS) comprising three back-to-back deep neural networks. A state-free reversible VAMPnet (SRV) learns an encoding

of molecular configurations into a latent space spanned by the leading eigenfunctions of the transfer operator (Eqn. 

1). A mixture density network (MDN) learns a propagator to sample transition probabilities within the latent space. A conditional Wasserstein GAN (cWGAN) learns a generative decoding of molecular configurations conditioned on the latent space coordinates. The trained LSS is used to to generate ultra-long synthetic trajectories by projecting the initial configuration into the latent space using the SRV, sampling from the MDN to generate long latent space trajectories, and decoding to molecular configurations using the cWGAN.

2.1 Encoder: State-free Reversible VAMPnets

The transfer operator at a lag time

is the propagator of probability distributions over microstates with respect to the equilibrium density

under transition densities koltai2018optimal, klus2018data. For sufficiently large the dynamics may be approximated as Markovian so is time homogenous,


In equilibrium systems obeying detailed balance , is identical to the Koopman operator, self-adjoint with respect to , and possesses a complete orthonormal set of eigenfunctions

with real eigenvalues

Noe2013, Nuske2014, klus2018data, Chen2019a, wu2020variational,


The pair (=,=) corresponds to the equilibrium distribution at and the remainder to a hierarchy of increasingly quicker relaxing processes with implied time scales Chen2019a. The evolution of after applications of is expressed in this basis as,


The variational approach to conformational dynamics (VAC) defines a variational principle to approximate these eigenfunctions as within a basis by solving for optimal expansion coefficients Noe2013, Nuske2014, Chen2019a. SRVs Chen2019a – themselves based on VAMPnets, a deep learning-based method for MSM construction Mardt2018, and closely related to extended dynamic mode decomposition with dictionary learning li2017extended – employ deep canonical correlation analysis (DCCA) Andrew2013 to learn both the optimal expansion coefficients and

optimal basis functions as nonlinear transformations of the (featurized) molecular coordinates. This is achieved by training twin-lobed deep neural networks to minimize a VAMP-r loss function

Mardt2018. SRVs trained over MD trajectories furnish an encoding (Eqn. 1) into a -dimensional latent space spanned by , where is determined by a gap in the eigenvalue spectrum. This spectral encoding into the leading modes of neglects fast processes with implied timescales (Eqn. 4) and is an optimal parameterization of the system for a low-dimensional long-time propagator noe2018machine.

2.2 Propagator: Mixture Density Networks

At sufficiently large the latent space = supports an autonomous dynamical system in the leading modes of . We train MDNs to learn transition densities from MD trajectories projected in the latent space. MDNs combine deep neural networks with mixture density models to overcome poor performance of standard networks in learning multimodal distributions Bishop1994, bishop2006pattern. Transition densities are approximated as a linear combination of kernels,


where we choose to be -dimensional Gaussians. The -dependent Gaussian means

, variances

, and linear mixing coefficients are learned by a deep feedforward neural network that minimizes the loss function , where indexes pairs of time-lagged training data observations. The normalization =1 is enforced by softmax activations and the bounded using sigmoid activations.

The trained MDN defines the latent space propagator (Eqn. 1) and we sample transition densities to advance the system in time (Fig. 1). Propagation is conducted entirely within the latent space and does not require recurrent decoding and encoding to the molecular representations that can lead to accumulation of errors and numerical instability noe2018machine, Pathak2018. The transition densities are learned from the statistics transitions in the training data and new trajectories are generated by sampling from these transition densities. These new trajectories therefore represent novel dynamical pathways over the latent space and are not simply recapitulations or approximate copies of those in the training data. Successful MDN training is contingent on the low-dimensional and Markovian nature of the latent space dynamics at large discovered by the SRVs.

2.3 Decoder: Conditional Wasserstein GAN

Generative adversarial networks are a leading neural network architecture for generative modeling Goodfellow2014. We employ a cWGAN Arjovsky2017, Gulrajani2017 to decode from the latent space to molecular configurations by performing adversarial training between a generator that outputs molecular configurations from inputs and a critic that evaluates the quality of a molecular configuration . The networks are jointly trained to minimize a loss function based on the Wasserstein (Earth Mover’s) distance,


where is the distribution over molecular configurations observed in the MD training trajectory and is a family of -Lipschitz functions enforced through a gradient penalty Arjovsky2017, Gulrajani2017. To generate molecular configurations consistent with particular states in the latent space we pass as a conditioning variable to and Mirza2014 and drive the generator with -dimensional Gaussian noise . The noise enables to generate multiple molecular configurations consistent with each latent space location. We train the cWGAN over pairs by encoding each frame of the MD training trajectory into the latent space using the SRV. The trained cWGAN decoder (Eqn. 1) generates molecular configurations from the latent space trajectory produced by the propagator (Fig. 1).

3 Results and Discussion

3.1 4-well potential

We validate the LSS in an application to a 1D four-well potential Prinz2011 for which analytical solutions are available. In this simple 1D system we construct the propagator directly in , so encoding and decoding are unnecessary and this test validates that the MDN can learn transition densities to accurately reproduce the system thermodynamics and kinetics. We generate a time step Brownian dynamics trajectory in a dimensionless gauge with diffusivity ==1000 and a time step =0.001 beauchamp2011msmbuilder2. A MDN was trained using Adam kingma2014adam with early stopping over the scaled trajectory at a lag time of =100, with

=8 Gaussian kernels, and two hidden layers of 100 neurons with ReLU activations

goodfellow2016deep. The trained MDN was used to generate a step trajectory of the same length as the training data. Analytical transition densities were computed by partitioning the domain into 100 equal bins and defining the probability of moving from bin to bin as for , where is the potential at the center of bin and normalizes the total transition probability of bin Chen2019a.

The Brownian dynamics and synthetic LSS stationary distributions are in quantitative agreement with the analytical solution for the stationary density (Fig. 2a) and show very similar kinetic behaviors in their transitions between the four metastable wells (Fig. 2b). This agreement is due to the excellent correspondence between the analytical and learned transition densities (Fig. 2c,d).

Fig.  2: Validation of the LSS in a 1D four-well potential. The MDN propagator predicts (a) a stationary distribution, (b) kinetic transitions, and (c,d) transition densities in excellent accord with analytical and Brownian dynamics results.

3.2 Trp-cage miniprotein

We train our LSS over the 208 s all-atom simulation of the 20-residue TC10b K8A mutant of the Trp-cage mini-protein performed by D.E. Shaw Research Lindorff-Larsen2011. Generation of these MD trajectories would require 2.5 days (2 million CPU-h) on the special purpose Anton-2 supercomputer or 6 months on a commodity GPU card Shaw2014.

The SRV encoder was trained over a featurization the trajectory employing backbone and sidechain torsions and C pairwise distances as informative and roto-translationally invariant descriptors Sidky2019. We trained a SRV with two hidden layers with 100 neurons,

activations, and batch normalization using Adam

kingma2014adam with a batch size of 200,000, learning rate of 0.01, and early stopping based on the validation VAMP-2 score Chen2019a, Sidky2019. A lag-time of =20 ns was chosen based on convergence of the transfer operator eigenvalues, and a =3-dimensional latent space encoding based on a gap in the eigenvalue spectrum. The MDN propagator was trained over the latent space projection of the MD trajectory at a lag time of =20 ns using Adam kingma2014adam with early stopping, =24 Gaussian kernels, and two hidden layers of 100 neurons with ReLU activations. The cWGAN decoder comprised a generator and critic with three hidden layers of 200 neurons with Swish ramachandran2017swish activations and a

=50-dimensional noise vector. The training loss stabilized after 52 epochs. The cWGAN is trained to generate the Trp-cage C

backbone by roto-translationally aligning MD training configurations to a reference structure. Training of the full LSS pipeline required 1 GPU-h on a NVIDIA GeForce GTX 1080 Ti GPU core.

The trained LSS was used to produce 100208 s synthetic trajectories each requiring 5 s on a single NVIDIA GeForce GTX 1080 Ti GPU core. The LSS trajectories comprise the same total number of frames as the 208 s all-atom trajectory but contain 1070 folding/unfolding transitions compared to just 12 in the training data and were generated at six orders of magnitude lower cost. This observation illuminates the crux of the value of the approach: the LSS learns a kinetic model over limited MD training data and is then used to generate vastly longer novel all-atom trajectories that enable the observation of states and events that are only sparsely sampled in the training data. We now validate the thermodynamic, structural, and kinetic predictions of the LSS.

Thermodynamics. The free energy profiles projected into the slowest latent space coordinate = show excellent correspondence between the MD and LSS (Fig. 3). The free energy of the folded (0) and unfolded (0.9) basins and transition barrier are in quantitative agreement with a root mean squared error between the aligned profiles of 0.91 . The LSS profiles contain 10-fold lower statistical uncertainties than the MD over the same number of frames due to the 100-fold longer LSS data set enabled by their exceedingly low computational cost.

Fig.  3: Free energy profiles for the MD and LSS trajectories projected into the slowest latent space coordinate

. Shaded backgrounds represent standard errors estimated by five-fold block averaging. The profiles agree within a 0.91

root mean squared error. Ten representative structures from the MD and LSS ensembles are sampled from the folded (0), unfolded (0.9), and metastable (0.45) regions.

Structures. The MD and LSS molecular structures within the folded basin () and metastable transition state () possess a relative -RMSD of 0.29 nm and 0.37 nm, respectively. Relative to the Trp-cage native state (PDB ID: 2JOF), the MD and LSS folded configurations possess a

-RMSD of 0.20 nm and 0.28 nm, respectively. The mean and standard deviation of the time-averaged radii of gyration (

) for the MD (0.870.16) nm and LSS (0.870.13) nm trajectories are indistinguishable with standard errors computed by five-fold block averaging. These results demonstrate that the LSS molecular structures are in excellent accord with MD.

Kinetics. We compare the MD and LSS kinetics through the autocorrelation times corresponding to the relaxation time scales associated with the =3 leading kinetic processes. All three time scales are in excellent agreement and again the LSS uncertainties are approximately 10-fold lower than the MD (Table 1).

Timescale MD (s) LSS (s)
3.00 0.61 2.89 0.12
0.54 0.37 0.43 0.04
0.45 0.12 0.42 0.01
Table 1: Implied time scales of leading Trp-cage modes. Standard errors estimated by five-fold block averaging.

We then employ time-lagged independent component analysis (TICA)

perez2013identification, noe2013variational, nuske2014variational, noe2015kinetic, noe2016commute, perez2016hierarchical, schwantes2013improvements, klus2018data to determine whether the LSS trajectory possesses the same slow (linear) subspace as the MD. We featurize the trajectories with pairwise C distances and perform TICA at a lag time of =20 ns. Projection of the free energy surfaces into the leading three MD TICA coordinates show that the leading kinetic variance in the MD data is quite accurately reproduced by the LSS (Fig. 4). The only substantive disagreement is absence in the LSS projection of a small high-free energy metastable state at (TIC1 , TIC3 ) corresponding to configurations with Pro18 dihedral angles . These configurations are only transiently occupied due to rare Pro18 dihedral flips that occur only twice during the 208 s MD trajectory and are not contained in the =3-dimensional latent space.

Fig.  4: Free energy profiles of the MD and LSS trajectories projected into the leading three MD TICA coordinates.

4 Conclusions

We have presented LSS as a method to learn efficient kinetic models by training three state-of-the-art deep learning networks over MD training data and then using the trained model to generate novel all-atom trajectories at six orders of magnitude lower cost. The spirit of the approach is similar to MSM-based and equation-free approaches that use limited MD training data to parameterize highly-efficient kinetic models that can then be used to generate dynamical trajectories over vastly longer time scales than are possible with conventional MD. In contrast to these approaches, the absence of any discretization of the configurational space and provisioning with a molecular decoder enables the LSS to produce continuous all-atom molecular trajectories. Importantly, the probabilistic and generative nature of the approach means that the generated molecular trajectories are novel and not simply a reproduction of the training data, and the statistics of these trajectories accurately reproduce the structural, thermodynamic and kinetic properties of the molecular system.

The dramatic reduction in the cost of trajectory generation opens a host of valuable possibilities: vastly improved sampling of configurational space and dynamical transitions enable estimates of thermodynamic averages and kinetic rates with exceedingly low statistical uncertainties; parameterization of kinetic models with modest training data enable the production of ultra-long trajectories on commodity computing hardware; representation of the kinetic model as the parameters of a trio of deep networks enables efficient sharing of a “simulator in a box” that can then be used for rapid on-demand trajectory generation. The properties of the trained kinetic model – the dimensionality of the slow latent space, the structural correspondence of the slow modes, and the transition probabilities of the propagator – also provide fundamental insight and understanding of the physical properties of the molecular system.

As with all data-driven approaches, the primary deficiency of the LSS approach is that the resulting kinetic models are not necessarily transferable to other conditions or systems and are subject to systematic errors due to approximations in the molecular force fields and incomplete sampling of the relevant configurational space in the training data. The latter issue means that although the generated LSS trajectories are – similar to MSM-based and equation-free approaches – largely interpolative. The stochastic nature of the MDN propagator and generative nature of the cWGAN generator means that we may anticipate local extrapolations beyond the exact training configurations

Wu2018. There is no expectation, however, that the trained model will discover new metastable states or kinetic transitions, and certainly not do so with the correct thermodynamic weights or dynamical time scales.

The present work has demonstrated LSS in a data-rich training regime where the MD training data comprehensively samples configurational space. The next step is to establish an adaptive sampling paradigm – similar to that in MSM construction pande2010everything and some enhanced sampling techniques chen2018collective, chiavazzo2017intrinsic, preto2014fast, zheng2013rapid – to enable its application in a data-poor regime. The adaptive sampling approach interrogates the kinetic model to identify under-sampled states and transitions that contribute most to uncertainties in the model predictions (i.e., “known unknowns”) and initializes new MD simulations to collect additional training data in these regions. This interleaving of MD training data collection and model retraining can dramatically reduce the required quantity of training data pande2010everything. Moreover, new simulations initialized in under-sampled regions may also occasionally be expected to transition into new configurational states not present in the initial training data (i.e., “unknown unknowns”) preto2014fast. Iterating this process until convergence can expand the range of the trained kinetic model to encompass the relevant configurational space and minimize the cost of training data collection.

Finally, we also envisage applications of the LSS approach to other fields of dynamical modeling where stiff or multi-scale systems of ordinary or partial differential equations, or the presence of activated processes or rare events introduces a separation of time scales between the integration time step and events of interest. For example, there may be profitable adaptations of the approach for dynamical modeling in such fields as cosmology, ecology, immunology, epidemiology, and climatology.

Conflicts of interest

A.L.F. is a consultant of Evozyne and a co-author of US Provisional Patents 62/853,919 and 62/900,420 and International Patent Application PCT/US2020/035206.


This work was supported by MICCoM (Midwest Center for Computational Materials), as part of the Computational Materials Science Program funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division and the National Science Foundation under Grant No. CHE-1841805. H.S. acknowledges support from the National Science Foundation Molecular Software Sciences Institute (MolSSI) Software Fellows program (Grant No. ACI-1547580) krylov2018perspective, wilkins2018nsf. We are grateful to D.E. Shaw Research for sharing the Trp-cage simulation trajectories.

Notes and references