Exploring the configuration space of large atomic and molecular systems is a problem of fundamental importance for many applications, including protein folding, materials design, and understanding chemical reactions, etc. There are several difficulties associated with these applications. The first is that the dimensionality of the configuration space is typically very high. The second is that there are often high energy barriers associated with the exploration. Both difficulties can be reduced by the introduction of collective variables (CVs) and the mapping of the problem to the CV space. The problem then becomes finding the free energy surface (FES) associated with the set of CVs, a problem that has attracted a great deal of interest in the last few decades kumar1992weighted ; kumar1995multidimensional ; voter1997hyperdynamics ; sugita1999replica ; vandevondele2000efficient ; earl2005parallel ; christ2008multiple ; gao2008self ; laio2002escaping ; barducci2008well ; maragliano2006temperature ; maragliano2008single ; abrams2008efficient ; yu2011temperature . One of the most effective techniques is metadynamics laio2002escaping , which computes a biasing potential by depositing Gaussian bases along the trajectory in the CV space. It is shown that the biasing potential converges to the inverted free energy at the end of the calculation barducci2008well
. Also closely related to our work are the recent papers that propose to use machine learning methods to help parameterizing FESstecher2014free ; mones2016exploration ; galvelis2017neural ; schneider2017stochastic . In particular, the deep neural network (DNN) model has shown promise in effectively representing the FES defined on high dimensional CV space galvelis2017neural ; schneider2017stochastic .
In this work, we take metadynamics and machine learning methods one step further by making an analogy between reinforcement learning sutton1998reinforcement
and the task of configuration space exploration and FES calculation. Classical reinforcement learning scheme involves a state space, an action space, and a reward function. The objective is to find the best policy function, which is a mapping from the state space to the action space, that optimizes the cumulative reward function. Our problem can be thought of as being a multi-scale reinforcement learning problem. We have a micro-state space, the configuration space of the detailed atomic system, and a macro-state space, the space of the CVs. The action space will be represented by the biasing potential in the biased molecular dynamics on the micro-state space. The optimal policy function is the inverted FES, defined on the macro-state space. The FES is parameterized by a carefully designed DNN model. Among other things, this allows us to handle cases with a large set of CVs. In the absence of an explicit reward function, we introduce an uncertainty indicator that can be used to quantify the accuracy of the FES representation. It is defined as the standard deviation of the predictions from an ensemble of DNN models, which are trained using the same dataset but different initialization of the model parameters. The bias is only adopted in regions where the uncertainty indicator is low, i.e. regions that are sufficiently explored, and thus the exploration in the insufficiently explored region is encouraged. We call this scheme the “reinforced dynamics”, to signal its analogy with reinforcement learning.
Roughly speaking, reinforced dynamics works as follows: The biasing potential, or the action, is initialized at 0 and is expected to converge to the inverted FES as the dynamics proceeds. Each step of the macro-iteration involves the following components. First, a biased MD is performed, in which the system is biased only in the regions where the uncertainty indicator is low. The biased simulation is likely to visit the CV regions never visited before or where the FES representation quality is poor. Next, a certain number of the newly visited CV values in regions where the uncertainty indicator is high are added to the training dataset. A restrained MD is performed to obtain the mean force, or the negative gradient of the FES, at each of the newly added CV values. Finally, the accumulated CV values and the mean forces are used as labels to train several network models, which give the current estimate of the biasing potential and the uncertainty indicator. This process is repeated iteratively until convergence is achieved, when the newly visited CV values all fall in the regions where the uncertainty indicator is low.
The quality of the free energy surface is determined by the quality of the CVs. Ideally we would like the FES to capture the structural and dynamic information of the system, such as the important metastable states and transitions between the metastable states. For many years, since our ability to accurately approximate the FES has been limited to systems with a small number of CVs, we have always faced the dilemma that choosing the right CVs is both critical and practically impossible. We believe that the ability of the reinforced dynamics to handle a large set of CVs will make the issue of choosing the right CVs much less critical.
In this paper, we give a systematic presentation of the theoretical and practical aspects of reinforced dynamics. We first focus on methodology and introduce the theory and flowchart of the reinforced dynamics scheme. Then we use the classical example of alanine dipeptide and tripeptide with two and four CVs, respectively, as illustrations due to their intuitive appeal. The solvent effect is explicitly considered in both examples. The FESs constructed by the reinforced dynamics are compared with those constructed by long brute-force simulations (5.1 s for alanine dipeptide and 47.7 s for tripeptide) to demonstrate the accuracy and efficiency of the method. Finally, an application to the structural optimization of the polyalanine-10 system with 20 CVs is presented to demonstrate the practical promise of reinforced dynamics.
ii.1 Free energy and mean forces
We assume that the system we are studying has atoms, with their positions denoted by . The potential energy of the system is denoted by . Without loss of generality, we consider the system in a canonical ensemble. Given predefined CVs, denoted by , the free energy defined on the CV space is
with being the normalization factor. The brute-force way of computing the free energy (1
) is to sample the CV space exhaustively and to approximate the probability distributionby making a histogram of the CVs. This approach may easily become prohibitively expensive. In such a case, an alternative way of constructing the FES is to fit the mean forces acting on the CVs, i.e.,
Several ways of computing have been proposed ciccotti2005blue ; maragliano2006temperature ; abrams2008efficient . We will adopt the approach of restrained dynamics proposed in maragliano2006temperature . In this formulation, a new term is added to the potential of the system to represent the effect of the spring forces between the configuration variables and the CVs. It can be shown that the mean force is given by for , where the -th component of is defined to be
Here is the normalization factor, are the spring constants for the harmonic restraining potentials, and is defined by
In practice, the spring constants are chosen to be large enough to guarantee the convergence to the mean forces. The time duration for the restrained dynamics should be longer than the largest relaxation timescale of the fast modes of the system, in order for the ensemble average in Eq. (3) to be approximated adequately by the time average. In the rest of the paper, we do not explicitly distinguish and .
ii.2 Free energy representation
The free energy will be represented by a deep neural network (DNN) model, in which the input CVs are first preprocessed, then passed through multiple fully connected hidden layers, and, in the end, mapped to the free energy. The structure of the DNN model is schematically illustrated in Fig. 1. Mathematically, a DNN representation with hidden layers is given by
where “” denotes function composition. The differentiable operator represents the system-dependent preprocessing procedure for the CVs, which will be illustrated by the examples in Sec. III. For the -th hidden layer, which has neurons , is the operation that maps to , using:
Here and are coefficients of a linear mapping, often called weights.
is the so-called activation function, which is in general nonlinear. In this project we use the component-wise hyperbolic tangent function for. The output layer is defined by
where and are the weights of the linear mapping. Finally, and constitute all the DNN model parameters to be determined. We note that the gradient, representing the mean force
is well defined since each layer of the construction (5) is differentiable, and hence the DNN representation of the free energy is also differentiable.
It should be noted that the design of the DNN model can be adapted to different kinds of problems. We use the fully-connected DNN model here for simplicity of discussion. For example, for some condensed systems, an alternative network model resembling the one used in the Deep Potential method should be preferred han2017deep ; zhang2017deep . We leave this to future work.
ii.3 Training and uncertainty indicator
The DNN representation of the free energy is obtained by solving the following minimization problem
The loss functionis defined by
where denotes the set of training data and denotes the size of the dataset . Here comes from the DNN model, and is the collected mean force for the data . Precise ways of collecting the data will be discussed later. It should be noted that at the beginning of the training process, we have no data. Data is collected as the training process proceeds.
To guarantee accuracy for this model, we require that the CV values in is an adequate sample of the CV space. This is made difficult due to the barriers on the energy landscape. The MD will tend to be stuck at metastable states without being able to escape. To help overcome this problem, we introduce a biased dynamics. Details of that will be discussed in the next subsection.
A key notion for reinforced dynamics is the uncertainty indicator. This quantity is important in the data collection step as well as in the biased dynamics step. Our intuition is that the DNN model should produce a reasonably accurate prediction of the free energy in regions that are adequately covered by , but is much less so in regions that are covered poorly by (or have not been visited by the MD). To quantify this, we introduce a small ensemble of DNN models, where the only difference between these models is the random weights used to initialize them. We can then define the uncertainty indicator as , the standard deviation of the force predictions, viz.
where the ensemble average is taken over this ensemble of models. One expects that this ensemble of models give rise to predictions of the mean forces that are close to each other in regions well covered by . In the regions that are covered poorly by , the predictions will scatter much more. This is confirmed by our numerical results.
Finally, it is worth noting that the minimization problem (9
) is solved by the stochastic gradient descent (SGD) method combined with the back-propagation algorithmlecun2012efficient . This has become the de facto standard algorithm for training DNN models. In all the test examples, we first adopt a random initialization procedure for the weights, where each component in in Eq. (6
) is initialized from a normal distribution with mean 0 and standard deviation, and each component in is initialized from a normal distribution with mean 0 and standard deviation 1. Then at each training step, the weights are updated based on the evaluation of the loss function on a small batch, or subset of the training data , i.e.,
ii.4 Adaptive biasing
A way of encouraging the MD to overcome the barriers in the energy landscape and escape metastable regions is to add a bias to the potential. The force on the -th atom then becomes:
Since the FES is the best approximation of the potential energy in the space of CVs, it is natural to use the current approximation of the FES, with a negative sign added, as the biasing potential, as is done in metadynamics laio2002escaping ; barducci2008well . We will adopt the same strategy but we propose to switch on the biasing potential only in regions where we have low uncertainty on the DNN representation of the FES:
where the biasing potential is the mean of the predefined ensemble of DNN models, and is a smooth switching function defined by
Here and are two uncertainty levels for the accuracy of the DNN model. In regions where the uncertainty indicator is smaller than the level , the accuracy of the DNN representation of is adequate, and hence the system will be biased by . In the regions where is larger than level , the accuracy of the DNN representation is inadequate, and the system will follow the original dynamics governed by the potential energy . In between and , the DNN model is partially used to bias the system via a rescaled force term .
ii.5 Data collection
After the biased MD, a number of the newly visited CV values that are in the regions with high uncertainty are added to the training dataset . The regions with high uncertainty are defined to be the CV values that give rise to large uncertainty indicator, viz., . A reasonable choice of the threshold is . For each value of the CV in , we use the restrained dynamics to calculate the mean forces via Eq. (3). These values, together with those computed in previous iterations, are used as the labels for training the next updated model.
ii.6 The reinforced dynamics scheme
Fig. 2 is a flowchart of the reinforced dynamics scheme. Given an initial guess of the FES represented by the DNN, a biased MD, i.e. Eq. (14), is performed to sample the CV space from an arbitrarily chosen starting point. If no a priori information on the FES is available, then a standard MD is carried out. The visited CV values are recorded at a certain time interval and tested by the uncertainty indicator to see whether they belongs to a region with high uncertainty in the CV space. If all the newly sampled CV values from the biased MD trajectory belong to the region with low uncertainty, it can be (1) the biased MD is not long enough, so parts of the CV space are not explored, (2) the interval for recording CV values along the biased MD is not small enough, so some visited CV values belonging to the region with high uncertainty are missed, or (3) the DNN representation for FES is fully converged, then the iteration should be stopped and one can output the DNN representation for the FES, namely the mean of the predefined ensemble of models. Case (1) can be excluded by systematically increasing the length of the biased simulation. Case (2) can be excluded by decreasing the recording interval.
If CV values belonging to the region with high uncertainty are discovered, they will be added to the training dataset . The CV values that are already in the training dataset should be retained and serve as training data for later iterations. The mean forces at the added CV values are computed by the restrained dynamics Eq. (3). A new ensemble of DNN models for the FES are then trained, using different random initial guesses for . The standard deviation of the predictions from these models is again used to estimate the uncertainty indicator . The iteration starts again using the biased MD simulation with the new DNN models.
Finally, it is worth noting that the restrained MD simulations for mean forces, which take over most of the computation time in the reinforced dynamics scheme, are embarrassingly parallelizable. The training of the ensemble of DNN models is also easily parallelizable. Several independent walkers can be set up simultaneously for a parallelized biased simulation, and this provides a more efficient exploration of the FES. These techniques can help accelerating the data collection process and benefit large-scale simulations for complex systems.
Iii Numerical examples: alanine dipeptide and tripeptide
iii.1 Simulation setup
We investigate the FES of the alanine dipeptide (ACE-ALA-NME) and alanine tripeptide (ACE-ALA-ALA-NME) modeled by the Amber99SB force field hornak2006comparison . The molecules are dissolved in 342 and 341 TIP3P jorgensen1983comparison water molecules, respectively, in a periodic simulation cell. All the MD simulations are performed using the package GROMACS 5.1.4 abraham2015gromacs . The cut-off radius of the van der Waals interaction is 0.9 nm. The dispersion correction due to the finite cut-off radius is applied to both energy and pressure calculations. The Coulomb interaction is treated with smooth particle mesh Ewald method essmann1995spm with a real space cut-off 0.9 nm and reciprocal space grid spacing 0.12 nm. The system is integrated with the leap-frog scheme at timestep 2 fs. The temperature of the system is set to 300 K by velocity-rescale thermostat bussi2007canonical with a relaxation time 0.2 ps. The solute and solvent are coupled to two independent thermostats to avoid the hot-solvent/cold-solute problem lingenheil2008hot . Parrinello-Rahman barostat parrinello1981polymorphic (GROMACS implementation) with a relaxation timescale 1.5 ps and compressibility is coupled to the system to control the pressure to 1 Bar. For both the alanine dipeptide and tripeptide, any covalent bond that connects a hydrogen atom is constrained by the LINCS algorithm hess1997lincs . The H-O bond and H-O-H angle of water molecules are constrained by the SETTLE algorithm miyamoto2004settle .
For the alanine dipeptide, two torsion angles (C, N, , C) and (N, , C, N), are chosen as CVs for this system, i.e. . While for the alanine tripeptide, the same torsion angles associated with the first and second s, denoted by , , and , , respectively, are used as CVs for the system, i.e. . The GROMACS source code is modified and linked to PLUMED 2.4b tribello2014plumed to carry out the biased and restrained simulations. The PLUMED package is modified to compute the DNN biasing force, viz., Eq. (14). The DNN models used in both examples contain three hidden layers of size . The preprocessing operator for the alanine dipeptide is taken as , so the periodic condition of the FES is guaranteed. Similarly, the preprocessing operator for the alanine tripeptide isabadi2016tensorflow , using the Adam stochastic gradient descent algorithm kingma2014adam with a batch size of . The learning rate is 0.001 in the beginning and decays exponentially according to , where is the training step, is the decay rate, and is the decay step. The total number of training steps is
. Currently, the DNN structure and hyperparameters in the training algorithm are decided empirically. Before performing the full reinforced dynamics, we typically accumulate some data from some small scale simulations, test the performance of different DNN models and training schemes, and then fix the optimal strategy in terms of accuracy and efficiency. In practice, we find that a DNN model with a decreasing number of nodes going from the innermost to the outermost hidden layers performs better in our test cases.
In each reinforced dynamics step, four DNN models with independent random initialization are trained in the same way to compute the uncertainty indicator. The biased MD simulations of alanine dipeptide and tripeptide last for 100 ps and 140 ps, respectively. The CV values along the MD trajectories are computed and recorded in every 0.2 ps. We assume no a priori information regarding the FES, so a brute-force simulation is performed for the 0th iteration step (we count the iterations from 0). In each iteration at most 50 recorded CV values in the region with high uncertainty are added to the training dataset . Restrained MD simulations with spring constant are performed to estimate the mean forces by Eq. (3). Each restrained MD simulation is ps and 140 ps long for the alanine dipeptide and tripeptide, respectively. The CV values are recorded in every 0.01 ps along the restrained MD trajectory to estimate the mean forces. Both of the alanine dipeptide and tripeptide examples are carried out on a desktop computer with an Intel i7-3770 CPU and 32 GB memory.
iii.2 Free energy surface construction
The FES of the alanine dipeptide on the - plane (known as the Ramachandran plot) is reported in Fig. 3. We perform 6 independent brute-force MD simulations, with each 860 ns long, thus in total 5.1 s MD trajectories are used to estimate the FES and compare with the reinforced dynamics result. The system has 5 metastable states , , , and , as noted in Fig. 3 (a). The , regions correspond to the dihedral angles observed in the -strands conformations. The and regions correspond to the dihedral angles of right- and left-handed -helix conformations, respectively. The transition between the and has to go over an energy barrier of 25 kJ/mol, or equivalently . The mean first passage time from the state to is shown to be 43 ns for the same model trendelkamp2016efficient .
In Fig. 3, the FES of alanine dipeptide sampled by the brute-force MD (a) is compared with the one constructed by reinforced dynamics (b) with uncertainty levels kJ/mol/rad and kJ/mol/rad. At the 9th iteration for (b), the biased simulation does not produce any CV value that belongs to the region with high uncertainty, thus the computation stops. In total (from the 0th to the 8th iteration) 198 CV values are added to the training dataset to train the FES. It is observed that the reinforced dynamics is able to reproduce, with satisfactory accuracy, the FES at the important metastable states and transition paths of the system. The difference between (a) and (b) is plotted in (c). The error of FES at states , and is below 0.5 kJ/mol, while the error at and is around 1.5 kJ/mol. The total biased MD simulation time is . The total restrained MD simulation time is . Thus the total MD simulation time is 20.8 ns, which is only % of the brute-force simulation length and half of the mean first passage time from to of the brute-force MD simulation. The total wall time of all the trainings is s, while the total wall time of all the restrained MD simulations is s.
It is noted that the accuracy of the FES can be systematically improved by using more strict uncertainty levels. The result of using kJ/mol/rad and kJ/mol/rad is reported in Fig. 3 (d) and (e). In this case, the biased MD simulation does not generate CV values belonging to the region with high uncertainty at the 21st iteration. In total (from the 0th to the 20th iteration) 303 CV values are added to the training dataset to construct the FES. The error of FES at all metastable states and transition regions is uniformly below 0.5 kJ/mol. The total biased MD simulation time is . The total restrained MD simulation time is . Thus the total MD simulation time is 32.5 ns, which is 50 % longer than the reinforced dynamics with higher uncertainty levels ( kJ/mol/rad and kJ/mol/rad), but still shorter than the mean first passage time from to of the brute-force simulation (43 ns).
The information of the four-dimensional FES of the alanine tripeptide constructed by brute-force MD sampling and the reinforced dynamics is presented in Fig. 4, by projecting on the , and planes. For example, the projection onto the variables is defined by
where is a constant that is chosen to normalize the minimum value of to zero. Projected free energies and are defined analogously. The uncertainty levels of the reinforced dynamics are set to kJ/mol/rad and kJ/mol/rad. The biased MD simulation of the 72nd iteration does not find any CV value belonging to the region with high uncertainty, so the process stops. From the 0th to the 71st iteration, 1363 CV values are added to the training dataset . The total biased MD simulation time is ns, while the total restrained MD simulation time is ns. The total wall time of the restrained MD simulations is s, while the total wall time for training the networks is s. For comparison, we carried out 18 independent brute-force MD simulation, each of which is 2.65 s long, so the total length of brute-force MD trajectories is 47.7 s. Fig. 4 shows that the reinforced dynamics is able to reproduce the FES with satisfactory accuracy on all the projected planes. It is noted that the projected FESs on both the and variables are different from the FES of alanine dipeptide, which indicates the correlation of backbone atoms.
iii.3 Illustration of the adaptive feature
To highlight the adaptive feature of the reinforced dynamics, we take the alanine dipeptide as an example, and illustrate in Fig. 5 the CV values visited in each biased MD simulation and those iteratively added to the training dataset . The uncertainty levels are , kJ/mol/rad, and the reinforced dynamics stops at the 9th iteration. In the 0th iteration, no a priori information of the FES is available, so the MD simulation is not biased. The starting state of the simulation is , and the system spontaneously transforms to states and in the 0.1 ns simulation 111 The mean first passage time from to and are 0.041 and 0.255 ns, respectively. See Ref. trendelkamp2016efficient ., thus the visited CV values cover , and , and 50 of them are randomly chosen as training data. The first DNN representation of FES is trained by these CV values, and is used to bias the system at the 1st iteration. Since the first DNN representation is of good quality at states , and , the system diffuses out of , and , and is trapped by a new metastable state . Only the visited CV values that sample the metastable state are added to the training dataset. The DNN representation trained by the updated dataset is of good quality at states , , and .
Following this observation, in the 2nd iteration, although the visited CV values cover a wide region including the metastable states , , and , only those in the transition regions between and , and between and / are added to the training set. The CV values added in the 3rd iteration are those that sample the metastable state and the transition region between and . The CV values added in the 4th iteration are those that sample the transition region between and .
From the 5th to the 8th iteration, the DNN representation of the FES is of relatively good quality. The CV values added to the training dataset are those that sample the border of high energy peaks at rad and rad. At the 9th iteration, no CV value belonging to regions with high uncertainty is found because the pushing-back events happen so quickly that the CV values are not recorded by the biased MD trajectory with the 0.2 ps recording interval. However, if we reduce the CV recording interval from 0.2 ps to 0.04 ps, 19 CV values can still be identified to be in the regions with high uncertainty and used to start the next biasing-and-training iteration. Since the construction of high energy FES peaks is of less interest, for the sake of computational cost, we do not use the smaller recording interval in our result. This means that the we ignore the FES regions with sharp gradient so that the biased system can only stay for a time scale that is much shorter than the recording interval. Better stopping criteria that guarantee the representation quality of the important structures of FES and excludes the irrelevant energy peaks are left for future studies.
iii.4 Remark on the choice of CVs
One important issue is to find the right set of CVs in order to capture the structural and dynamics information that we are interested in. However, this is a difficult problem and is not the topic of this work. Here, we will study how the enhanced sampling and free-energy estimation are affected when (unnecessary) additional CVs are included. We will see that the estimated free energy for the larger set of CVs is consistent with the one for the smaller set of CVs in the sense that after projecting the former onto the smaller set of CVs, one recovers the latter.
To this end, we compute the FES of alanine dipeptide in a four-dimensional CV space with two additional torsion angles (O, C, N, ) and (, C, N, H). The uncertainty levels that we use are and kJ/mol/rad. The result is shown in Fig. 6. The 4-dimensional FES projected on the – plane is shown in plot (c) of the figure. The native state locates at , while three metastable states are discovered at , and . They are denoted by , , and , respectively. The barrier between the native state and the metastable state / is around 70 kJ/mol. The free energy of metastable states , , and are 21 kJ/mol, 24 kJ/mol, and 46 kJ/mol, respectively, thus their contribution to the free energy projection on the – plane is negligible. A direct comparison of the free energy projection on the – plane with the brute-force MD result is shown in plot (b) of Fig. 6. The error is less than 2 kJ/mol. The result is consistent with the – free energy computed using reinforced dynamics (shown in Fig. 3).
Iv Application to polyalanine-10
In reinforced dynamics, both the neural network representation of the FES and the restrained simulation for mean forces are relatively insensitive to the dimensionality of the CV space. Thus it has the potential to be able to handle systems with a large set of CVs. As an illustrative example, we investigate the metastable conformations of a polyalanine-10 (ACE-(ALA)-NME) molecule. In this example, rather than constructing an accurate free energy in the whole space of CVs, our goal is to efficiently search for the most stable structures in the conformational space of the system. We will demonstrate that reinforced dynamics allows us to explore very efficiently the most relevant metastable conformations of this molecule, including the -helix and -strand conformations, and to provide estimates for the relative stability between different metastable states.
One technical remark is that for computational efficiency, we adopt a multi-walker scheme of reinforced dynamics for this relatively high-dimensional case. In each iteration of this scheme, different walkers undergo biased dynamics independently under the same biased potential. Next, a set of CV values with high uncertainty are selected and restrained simulations are performed to calculate the associated mean forces. Finally, the selected CV values and associated mean forces provided by all the walkers are merged and added to the dataset. An ensemble of new neural network models are then trained with this larger dataset. The multi-walker scheme improves the efficiency of the data collection step and it helps to accelerate the exploration procedure.
iv.1 Simulation setup
The system of polyalanine-10 (ACE-(ALA)-NME) is modeled by the Amber96 forcefield kollman1996advances . The molecule is in the gas phase and is set in a simulation region. To start with, we prepare misfolded initial configurations of the molecule in three stages. In the first stage, starting from an alpha-helix configuration, two ends of the molecule is pulled along the direction in an extended simulation region () at rate 0.1 nm/ps for 100 ps. During this process, no thermostat is used for the system. At the end of this stage, the backbone of the molecule is fully extended, and the temperature of the system increases to 1155 K. In the second stage, the pulling force is removed and the molecule is equilibrated at 300 K for 200 ps, using the velocity-rescaling thermostat bussi2007canonical with 0.2 ps of relaxation time and an integration time step of 1 fs. In the third stage, an unbiased productive simulation of 200 ps is carried out at 300 K with a time step of 2 fs. 100 candidate configurations along the trajectory of this simulation are saved in every other 2 ps. Finally, 14 independent walkers are initialized with randomly chosen configurations from these candidates.
The torsion angles and associated to all the s are used as CVs for the system, thus the dimension of the CV space is 20. The DNN model used in this example consists of 5 hidden layers of size . We found that the following procedure to be more efficient for the network training. In the first 6 iterations, the weights in different DNN models are randomly initialized, and are trained using the Adam stochastic gradient descent algorithm kingma2014adam with a batch size of . The learning rate is 0.003 in the beginning and decays exponentially according to , where is the training step, is the decay rate, and is the decay step. The total number of training steps is . After iteration 6, instead of randomly initializing the weights, we restart the training process using weights inherited from the previous iteration. We use the same batch size and decay rate of the first 6 iterations, but use different learning rate of 0.0003 and decay step of . This reduces the total number of training steps in each iteration to . The biased MD simulations are 100 ps long. The uncertainty levels are set to and kJ/mol/rad. The CV values are computed and recorded in every 0.2 ps along the biased trajectories. For each walker, at most 12 CV values in the region with high uncertainty are added to the training dataset . For each added CV value, a ps restrained MD simulation is carried out, wherein the CV values are recorded in every 0.01 ps to estimate the mean forces using Eq. (3). The simulations are carried out on one cluster node with two Intel Xeon E5-2680 v4 CPUs and 64 GB memory.
iv.2 Structure optimization
To find different metastable states and their relative stability, we combine the exploration stage, provided by the adaptively biasing procedure in reinforced dynamics, with an optimization stage, which can be viewed as a postprocessing of the explored configurations. In the exploration stage, due to the complexity of the 20-dimensional FES, we do not wait for the reinforced dynamics to stop by itself. Instead, we stop the process at the 210th iteration. The outputs of the biased MD simulations in each iterations, in total configurations, are thus selected for the next stage. We remark that basins associated to important metastable conformations may not be visited during the 210 iterations. This seems to be a common issue of algorithms for conformation space exploration, no matter by enhanced sampling or by brute-force simulation. However, reinforced dynamics drastically accelerates the efficiency of exploration and, due to the biasing procedure, new low-energy states are more likely to be explored in earlier iterations. Although we stop the process at a certain number of iteration, further tests based on the accumulated dataset and restarted from the simulation can always be performed to check the results. In the optimization stage, the 2954 configurations are first relaxed by brute-force MD for 200 ps. Then the CV values corresponding to the relaxed configurations are taken as initial guesses for the unconstrained minimization on the DNN represented FES, which is solved by the Broyden-Fletcher-Goldfarb-Shanno (known as BFGS) method fletcher1987practical , and the solutions are local minima of the FES. The configurations are further relaxed with a restrained MD simulation centered at the corresponding local minima for 100 ps at a time step of 1 fs.
The local minimum with the lowest free energy corresponds to the native conformation, which is the -helix conformation (see C004 in Fig. 7). The FES is thus shifted by the -helix free energy so that the global minimum takes the value of 0. Among the 2954 configurations, 1047 configurations that have the free energy lower than 31.67 kJ/mol are collected. These configurations are clustered into 30 clusters according to the root mean square deviation (RMSD) of s by using the agglomerative clustering method with average-linkage criterion scikit-learn , and are coded as . The largest averaged pairwise RMSD within one cluster is 0.86 Å (C003), which indicates a high conformational similarity within the clusters. The configuration with the lowest free energy in one cluster is assigned to be the representative of that cluster, and its free energy is referred to as “the free energy” of the conformation.
The native conformation (C004) and five metastable conformations with the lowest free energies are presented in Fig. 7. Their relative stability with respect to the native state and the standard deviations of the free energy predictions are also presented in the figure. The metastable conformation C000 corresponds to the -strand conformation, while the metastable conformations C008, C009, C012 and C027 are misfolded conformations. The predicted free energies of the metastable conformations are very close, thus considering the uncertainties in these free energies, we can not tell whether one metastable state is more stable than another from the current reinforced dynamics simulation.
We also computed the transition paths from the native state to the five metastable state using the string method e2002string ; e2007simplified . The strings are discretized by 224 nodes. At each node a restrained MD of length 1600 ps is performed, and the CV values are recorded every 0.01 ps to compute the mean force by Eq. (3). The free energies are then computed by using thermodynamic integration along the string (see the green lines in Fig. 8). As a comparison, the free energies predicted by the reinforced dynamics along the same paths are plotted as the red lines, the standard deviations in the free energy model predictions are presented by the red shadows. The free energy predicted by the reinforced dynamics is in satisfactory agreement with the thermodynamic integration for the transitions C004C000, C004C009 and C004C012. The computation of the transition paths C004C009 and C004C012 are easier, because the -helical segments in the conformations C009 and C012 make them closer to the native state. It is also observed that the free energy barriers in transitions C004C009 and C004C012 are lower than others. Along the paths C004C008 and C004C027, the reinforced dynamics is quite accurate near the native and the metastable states. However, in the middle section of the paths, there are clear differences from the result of the thermodynamic integration. Many factors may contribute to this: Between the native and a metastable state, there may exist multiple transition paths; The path computed by the string method may not be the most probable path; Some conformations along the path may not be well sampled by the reinforced dynamics.
V Conclusion and perspective
In summary, reinforced dynamics is a promising tool for exploring the configuration space and calculating the free energy of atomistic systems. Even though we only presented examples of bio-molecules, it should be clear that the same strategy should also be applicable to many different tasks like studying the phase diagrams of condensed systems. In particular, due to the ability of the deep neural networks in representing high dimensional functions han2017deep ; zhang2017deep ; schneider2017stochastic ; lecun2015deep , we expect the reinforced dynamics to be particularly powerful when the dimensionality of the CV space is high. In addition, one should be able to couple it with optimization algorithms in order to perform structural optimization.
Acknowledgements.We are grateful to Jiequn Han and Eric Vanden-Eijnden for their helpful comments. We also thank Luca Maragliano for sharing with us the data of alanine dipeptide from the single-sweep method. The work of L. Zhang and W. E is supported in part by ONR grant N00014-13-1-0338, DOE grants DE-SC0008626 and DE-SC0009248, and NSFC grants U1430237 and 91530322. The work of H. Wang is supported by the National Science Foundation of China under Grants 11501039 and 91530322, the National Key Research and Development Program of China under Grants 2016YFB0201200 and 2016YFB0201203, and the Science Challenge Project No. JCKY2016212A502. Part of the computational resources is provided by the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund under Grant No.U1501501.
- (1) Shankar Kumar, John M Rosenberg, Djamal Bouzida, Robert H Swendsen, and Peter A Kollman. The weighted histogram analysis method for free-energy calculations on biomolecules. i. the method. Journal of computational chemistry, 13(8):1011–1021, 1992.
- (2) Shankar Kumar, John M Rosenberg, Djamal Bouzida, Robert H Swendsen, and Peter A Kollman. Multidimensional free-energy calculations using the weighted histogram analysis method. Journal of Computational Chemistry, 16(11):1339–1350, 1995.
- (3) Arthur F Voter. Hyperdynamics: Accelerated molecular dynamics of infrequent events. Physical Review Letters, 78(20):3908, 1997.
- (4) Yuji Sugita and Yuko Okamoto. Replica-exchange molecular dynamics method for protein folding. Chemical physics letters, 314(1):141–151, 1999.
- (5) Joost VandeVondele and Ursula Rothlisberger. Efficient multidimensional free energy calculations for ab initio molecular dynamics using classical bias potentials. The Journal of Chemical Physics, 113(12):4863–4868, 2000.
- (6) David J Earl and Michael W Deem. Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23):3910–3916, 2005.
- (7) Clara D Christ and Wilfred F van Gunsteren. Multiple free energies from a single simulation: Extending enveloping distribution sampling to nonoverlapping phase-space distributions. The Journal of chemical physics, 128:174112, 2008.
- (8) Y.Q. Gao. Self-adaptive enhanced sampling in the energy and trajectory spaces: Accelerated thermodynamics and kinetic calculations. The Journal of chemical physics, 128:134111, 2008.
- (9) Alessandro Laio and Michele Parrinello. Escaping free-energy minima. Proceedings of the National Academy of Sciences, 99(20):12562–12566, 2002.
- (10) Alessandro Barducci, Giovanni Bussi, and Michele Parrinello. Well-tempered metadynamics: A smoothly converging and tunable free-energy method. Physical review letters, 100(2):020603, 2008.
- (11) Luca Maragliano and Eric Vanden-Eijnden. A temperature accelerated method for sampling free energy and determining reaction pathways in rare events simulations. Chemical physics letters, 426(1):168–175, 2006.
- (12) Luca Maragliano and Eric Vanden-Eijnden. Single-sweep methods for free energy calculations. The Journal of chemical physics, 128(18):184110, 2008.
- (13) Jerry B Abrams and Mark E Tuckerman. Efficient and direct generation of multidimensional free energy surfaces via adiabatic dynamics without coordinate transformations. The Journal of Physical Chemistry B, 112(49):15742–15757, 2008.
- (14) Tang-Qing Yu and Mark E Tuckerman. Temperature-accelerated method for exploring polymorphism in molecular crystals based on free energy. Physical review letters, 107(1):015701, 2011.
- (15) T Stecher, N Bernstein, and G Csányi. Free energy surface reconstruction from umbrella samples using gaussian process regression. Journal of chemical theory and computation, 10(9):4079, 2014.
- (16) L Mones, N Bernstein, and G Csányi. Exploration, sampling, and reconstruction of free energy surfaces with gaussian process regression. J Chem Theory Comput, 12:5100–5110, 2016.
- (17) Raimondas Galvelis and Yuji Sugita. Neural network and nearest neighbor algorithms for enhancing sampling of molecular dynamics. J. Chem. Theory Comput, 13(6):2489–2500, 2017.
- (18) Elia Schneider, Luke Dai, Robert Q Topper, Christof Drechsel-Grau, and Mark E Tuckerman. Stochastic neural network approach for learning high-dimensional free energy surfaces. Physical Review Letters, 119(15):150601, 2017.
- (19) Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998.
- (20) Giovanni Ciccotti, Raymond Kapral, and Eric Vanden-Eijnden. Blue moon sampling, vectorial reaction coordinates, and unbiased constrained dynamics. ChemPhysChem, 6(9):1809–1814, 2005.
- (21) Jequn Han, Linfeng Zhang, Roberto Car, and Weinan E. Deep potential: a general representation of a many-body potential energy surface. Communications in Computational Physics, 23(3):629–639, 2018.
- (22) Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, and Weinan E. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. arXiv preprint arXiv:1707.09571, 2017.
- (23) Yann A LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. Efficient backprop. In Neural networks: Tricks of the trade, pages 9–48. Springer, 2012.
- (24) Viktor Hornak, Robert Abel, Asim Okur, Bentley Strockbine, Adrian Roitberg, and Carlos Simmerling. Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins: Structure, Function, and Bioinformatics, 65(3):712–725, 2006.
- (25) William L Jorgensen, Jayaraman Chandrasekhar, Jeffry D Madura, Roger W Impey, and Michael L Klein. Comparison of simple potential functions for simulating liquid water. The Journal of chemical physics, 79(2):926–935, 1983.
- (26) Mark James Abraham, Teemu Murtola, Roland Schulz, Szilárd Páll, Jeremy C Smith, Berk Hess, and Erik Lindahl. Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1:19–25, 2015.
- (27) U. Essmann, L. Perera, M.L. Berkowitz, T. Darden, H. Lee, and L.G. Pedersen. A smooth particle mesh ewald method. The Journal of Chemical Physics, 103(19):8577, 1995.
- (28) G. Bussi, D. Donadio, and M. Parrinello. Canonical sampling through velocity rescaling. The Journal of chemical physics, 126:014101, 2007.
- (29) M. Lingenheil, R. Denschlag, R. Reichold, and P. Tavan. The ”hot-solvent/cold-solute” problem revisited. Journal of Chemical Theory and Computation, 4(8):1293–1306, 2008.
- (30) M. Parrinello and A. Rahman. Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied Physics, 52:7182, 1981.
- (31) B. Hess, H. Bekker, H.J.C. Berendsen, and J.G.E.M. Fraaije. Lincs: a linear constraint solver for molecular simulations. Journal of Computational Chemistry, 18(12):1463–1472, 1997.
- (32) S. Miyamoto and P.A. Kollman. Settle: an analytical version of the shake and rattle algorithm for rigid water models. Journal of Computational Chemistry, 13(8):952–962, 2004.
- (33) Gareth A Tribello, Massimiliano Bonomi, Davide Branduardi, Carlo Camilloni, and Giovanni Bussi. Plumed 2: New feathers for an old bird. Computer Physics Communications, 185(2):604–613, 2014.
- (34) Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
- (35) Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- (36) Benjamin Trendelkamp-Schroer and Frank Noe. Efficient estimation of rare-event kinetics. Phys. Rev. X, 6:011009, 2016.
- (37) Peter A Kollman. Advances and continuing challenges in achieving realistic and predictive simulations of the properties of organic and biological molecules. Accounts of Chemical Research, 29(10):461–469, 1996.
- (38) R Fletcher. Practical methods of optimization. 1987.
- (39) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- (40) Weinan E, Weiqing Ren, and Eric Vanden-Eijnden. String method for the study of rare events. Physical Review B, 66(5):052301, 2002.
- (41) Weinan E, Weiqing Ren, and Eric Vanden-Eijnden. Simplified and improved string method for computing the minimum energy paths in barrier-crossing events. The Journal of Chemical Physics, 126(16):164103, 2007.
- (42) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.