Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations

Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes. Accurate MD simulations require computationally demanding quantum-mechanical calculations, being practically limited to short timescales and few atoms. For larger systems, efficient, but much less reliable empirical force fields are used. Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations, offering similar accuracy as ab initio methods at orders-of-magnitude speedup. Until now, MLFFs mainly capture short-range interactions in small molecules or periodic materials, due to the increased complexity of constructing models and obtaining reliable reference data for large molecules, where long-ranged many-body effects become important. This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations (GEMS) by training on "bottom-up" and "top-down" molecular fragments of varying size, from which the relevant physicochemical interactions can be learned. GEMS is applied to study the dynamics of alanine-based peptides and the 46-residue protein crambin in aqueous solution, allowing nanosecond-scale MD simulations of >25k atoms at essentially ab initio quality. Our findings suggest that structural motifs in peptides and proteins are more flexible than previously thought, indicating that simulations at ab initio accuracy might be necessary to understand dynamic biomolecular processes such as protein (mis)folding, drug-protein binding, or allosteric regulation.


page 3

page 8

page 24

page 25


BIGDML: Towards Exact Machine Learning Force Fields for Materials

Machine-learning force fields (MLFF) should be accurate, computationally...

Symmetry-adapted graph neural networks for constructing molecular dynamics force fields

Molecular dynamics is a powerful simulation tool to explore material pro...

Protofold II: Enhanced Model and Implementation for Kinetostatic Protein Folding

A reliable prediction of 3D protein structures from sequence data remain...

Representations of molecules and materials for interpolation of quantum-mechanical simulations via machine learning

Computational study of molecules and materials from first principles is ...

An Extendible, Graph-Neural-Network-Based Approach for Accurate Force Field Development of Large Flexible Organic Molecules

An accurate force field is the key to the success of all molecular mecha...

SE(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials

This work presents Neural Equivariant Interatomic Potentials (NequIP), a...

Accelerated Simulations of Molecular Systems through Learning of their Effective Dynamics

Simulations are vital for understanding and predicting the evolution of ...


Figure 1: Insights from GEMS simulations. (A) Overview of the GEMS method. Different interaction scales on the potential-energy surface (PES) of a large system are learned from a combination of ab initio reference data for top-down and bottom-up fragments. The resulting model is able to accurately reconstruct the PES of the molecular system and then used to study its dynamics. (B) Prediction accuracy for energies and forces of AceAlaNme conformations of GEMS compared to AmberFF [Lindorff-Larsen2010AMBER99SB-ILDN] with respect to the PBE0/def2-TZVPP+MBD [adamo1999toward, weigend2005balanced, tkatchenko2012accurate, ambrosetti2014] reference. (C) GEMS simulations show that the folding of AceAlaNme from a fully extended structure (left) to a helical conformation (right) occurs via intermediate conformations characterized by hydrogen bonding between backbone atoms of adjacent residues (middle). (D

) Overlay of representative conformations (obtained from cluster analysis) sampled during an aggregated 10 ns of dynamics of crambin in aqueous solution. Simulations with GEMS (blue) lead to greater structural fluctuations compared to AmberFF (red), indicating that the protein is more flexible.

Molecular dynamics (MD) simulations allow to determine the motion of individual atoms in chemical and biological processes, enabling mechanistic insights into molecular properties and functions, as well as providing a detailed interpretation of experimental studies. MD simulations require a reliable model of the forces acting on each atom at every time step of the dynamics [karplus2002molecular]. It is most desirable to obtain atomic forces from accurate solutions to the many-body Schrödinger equation [schrodinger1926undulatory], but this is only feasible for short MD simulations of few atoms for the foreseeable future [mouvet2022recent].

For larger systems, it is common practice to derive the forces from empirical models of the potential energy. Such force fields (FFs) approximate chemical interactions with computationally efficient terms and enable MD simulations of millions of atoms [schulz2009scaling] for up to several milliseconds of dynamics [shaw2008anton]. A disadvantage of FFs is their limited accuracy due to the neglect of important quantum-mechanical effects, such as changes to hybridization states, interactions between orbitals delocalized over several atoms, or electronic correlations between distant molecular fragments. Further, many FFs require a predetermined covalent bonding structure, preventing bond breaking and formation. When additional accuracy and flexibility is required, for example to study an enzymatic reaction, a possible alternative are quantum mechanics/molecular mechanics (QM/MM) simulations [senn2009qm, mouvet2022recent]: The system is divided into a small QM region modelled with ab initio methods (e.g. substrate and active site of an enzyme) and an MM region (e.g. the remaining protein and solvent molecules) described with an FF. However, the high computational cost associated with an accurate treatment of the QM region and the fact that it is often unclear which atoms need to be included for an adequate description of the process of interest [kulik2016large] may limit the applicability of QM/MM methods.

In recent years, machine learned force fields (MLFFs) have emerged as an alternative means to execute MD simulations, combining the computational efficiency of traditional FFs with the high accuracy of quantum-chemistry methods [unke2021machine]. To construct an MLFF, a machine learning (ML) model is trained on ab initio reference data to predict energies and forces from atomic positions – without the need to explicitly solve the Schrödinger equation outside of the reference data. MLFFs have led to numerous insights, e.g. regarding reaction mechanisms [rivero2019reactive], or the importance of quantum-mechanical effects for the dynamics of molecules [sauceda2019molecular]. Despite these successes, until now, MLFFs have been applied primarily to MD simulations of small to medium-sized systems (tens of atoms), or periodic materials (e.g. metallic copper) [jia2020pushing]. Applications to large heterogeneous systems, like proteins or other biologically relevant systems, have remained elusive, due to the increased complexity of constructing physically-informed ML architectures and obtaining reliable reference data for long-range interactions, which are known to play a key role in biomolecular dynamics  [rossi2015stability, stoehr2019quantum].

This work proposes a general approach to constructing accurate machine-learned force fields for large-scale molecular simulations (GEMS). Based on the divide-and-conquer principle, MLFFs for large heterogeneous systems are trained on molecular fragments of varying size, which cover the relevant chemical interactions, but are still amenable to electronic-structure calculations. From this fragment data, the model infers to recompose the original system, which allows GEMS to successfully address the long-standing challenge of biomolecular simulations at ab initio quality (Fig. 1A). While MLFFs can successfully learn local chemical interactions from small molecules [huang2020quantum], a sufficient number of larger fragments is needed to learn long-range effects necessary to generalize to larger systems and achieve high prediction accuracy ( meV/atom for energies and  meV/Å for forces). In this work, we rely on the recently proposed SpookyNet architecture [unke2021spookynet], which models dispersion and electrostatics explicitly by embedding physically motivated interaction terms into the ML architecture and learning their parameters from reference data. In addition, an empirical term for short-ranged repulsion between atomic nuclei increases the robustness of the model for strong bond distortions. SpookyNet also includes a mechanism to describe effects like non-local charge transfer, which other MLFFs (with some exceptions [ko2021fourth]) are typically unable to. Taken together, these components enable the model to generalize to larger molecules when trained on appropriate reference data. However, ultimately, the quality and reliability of an MLFF should be judged by its predictions of experimental measurements – for example, we show that GEMS is able to quantitatively reproduce experimental results regarding the helix stability of polyalanine systems at different temperatures and correctly describe infrared peak positions of water for a solvated protein.

GEMS is applied to MD simulations of model peptides and the 46-residue protein crambin in aqueous solution (25k atoms). When comparing to conventional force fields, such as AMBER99SB-ILDN [Lindorff-Larsen2010AMBER99SB-ILDN] (AmberFF), GEMS approximates energies and forces computed from density-functional theory much more closely (Fig. 1B). Interestingly, our findings reveal previously unknown intermediates in the folding pathway of poly-alanine peptides (Fig. 1C) and a dynamical equilibrium between - and -helices. In the simulations of solvated crambin, GEMS indicates that protein motions are qualitatively different and more flexible when compared to computations with a conventional FF (Fig. 1D), showing contrasting short and long timescale dynamics. These results suggest that simulations at ab initio accuracy may be necessary to fully understand dynamic processes like protein (mis)folding, drug–protein binding, or allosteric regulation.


Machine learning force fields for large systems

Figure 2: Generation of top-down and bottom-up fragments. (A) Top-down fragments are generated by cutting out a spherical region around an atom (including solvent molecules) and saturating all dangling bonds (the right side shows four top-down fragments generated from different regions). They are crucial for learning weak but long-ranged interactions, which are important for the dynamics of large systems. (B) Bottom-up fragments are generated by constructing chemical graphs consisting of one to eight non-hydrogen atoms (not all possible graphs are shown). The graphs are then converted to three-dimensional structures by adding hydrogen atoms. Due to their small size, multiple ab initio calculations for many different conformers of each generated structure can be performed, allowing extensive sampling of the potential energy surface which is necessary for training robust models.

We start by generating reference data for smaller molecular fragments in order to train an MLFF, where the learned model accurately reflects the full large system. There are several strategies to achieve this goal. On one hand, the model needs to be able to learn all relevant chemical interactions that are necessary to reconstruct a complete and accurate picture of the system of interest from the fragment data. This is important to capture weak, but long-ranged interactions, which collectively dominate e.g. relative energy differences of different conformations of large molecules. On the other hand, it is necessary to prevent “holes” in the potential energy surface (PES) [behler2014representing] – regions with low potential energy corresponding to unphysical structures, e.g. featuring unnaturally large or short bond lengths. The existence of holes in the PES prevents stable MD simulations, because long trajectories eventually may become trapped by such artefacts and behave unphysically [stocker2022robust]. To achieve both requirements, we propose the use of two complementary methods to construct fragments, which allow models to learn different aspects of the PES of large systems. The first method follows a top-down approach, where fragments are constructed by “cutting out” spherical regions of the system of interest, which also includes solvent molecules in the condensed phase (Fig. 2A) [gastegger2017machine]. They are chosen as large as possible to sample important long-range effects, but still small enough such that reference energies and forces computed with quantum chemistry methods are accessible in a reasonable time. As our tests on poly-alanine systems demonstrate (see below), the top-down fragments we choose are sufficiently large for the systems studied in this work.

Since it is difficult to collect enough reference data for the large top-down fragments to train robust models, they are enriched by smaller fragments, for which data points for many different conformations can be collected. Starting from single atoms, molecules similar to local bonding patterns of the system of interest [huang2020quantum] are systematically constructed by growing chemical graphs in a bottom-up fashion (Fig. 2B). By limiting the size of these fragments, it is possible to sample many different conformations, allowing models to learn the effects of strong distortions in local structural patterns, which is key to preventing holes in the PES. As a result, the combination of bottom-up and top-down fragments enables learning accurate and robust MLFFs for large systems.

Poly-alanine systems

Figure 3: Accurate simulations of poly-alanine systems with GEMS. (A) Relative stabilization of the -helical conformation of AceAlaLys + H per added alanine residue. Shown here is the double difference , where is the relative energy of the -helical conformation and the fully extended structure (FES) of AceAlaLys + H in gas phase. The prediction of GEMS (blue) is compared to ab initio reference data computed at the PBE0-MBD [adamo1999toward, weigend2005balanced, tkatchenko2012accurate, ambrosetti2014] level of theory. (B) Number of - and -helical H-bonds during MD simulations of helical AceAlaLys + H in gas phase at 700 K and 800 K with GEMS. The sharp drop in the number of H-bonds in the dynamics at 800 K indicates the formation of a random coil (see Fig. LABEL:Sfig:hbonds_mlffA for an extended version of this figure with a greater range of temperatures). (C

) Secondary structural motifs determined by STRIDE

[frishman1995knowledge] along typical folding trajectories of AceAla

Nme. Dotted vertical lines indicate the temporal position of the shown snapshots. The trajectory computed with GEMS (top) folds via a distinct “wavy” intermediate (classified primarily as “turn”) and settles into a dynamic equilibrium between

- and -helices. In contrast, the trajectory computed with the AmberFF (bottom) folds more directly and then stays primarily -helical (see Fig. LABEL:Sfig:acala15nme_additional_folding_trajectories for an analysis of additional trajectories).

We apply GEMS to predict the properties and dynamics of several peptides consisting primarily of alanine. These are popular model systems for proteins and well studied both theoretically and experimentally. In addition, by limiting the number of residues, it is still possible to perform electronic-structure calculations for the full system. Thus, the predictions of an ML model trained only on fragment data can be directly compared to reference calculations, which allows to verify the ability of GEMS to reconstruct the properties of larger systems from the chemical knowledge extracted from smaller molecules.

As a first test case, we consider the cooperativity between hydrogen bonds in poly-alanine peptides capped with an N-terminal acetyl group and a protonated lysine residue at the C-terminus (AceAlaLys + H). In

-helices, the local dipole moments of hydrogen bonds formed between backbone peptide groups are aligned, leading to a cooperative polarization effect

[park2000stabilization]. Thus, the relative stabilization energy of an -helix compared to a fully extended structure (FES) fluctuates non-trivially with helix length and is a challenging prediction task. We find that GEMS closely agrees with the reference ab initio method, demonstrating that large-scale effects can be learned effectively from fragment data (Fig. 3A).

Alanine-based peptides have a strong tendency to form helical structures. While short isolated helices are only marginally stable in solution, AceAlaLys + H is known to form stable helices in gas phase. Experimental results suggest that AceAlaLys + H remains helical up to temperatures of 725 K [kohtani2004extreme], allowing a direct comparison with theoretical predictions. By running GEMS simulations at different temperatures, we confirm that the peptide remains primarily helical up to 700 K, but forms a random coil at 800 K (see supplementary video 1 at An analysis of the formed hydrogen bonds reveals that the average number of -helical hydrogen bonds decreases with increasing temperature (see Fig. LABEL:Sfig:hbonds_mlffA), while the number of -helical hydrogen bonds remains almost constant until a sudden drop at 800 K (see Fig. 3B), which agrees with results from ab initio simulations [tkatchenko2011unraveling]. Interestingly, the long-ranged interactions learned from top-down fragments seem to be crucial to reproduce the experimental results, as a model that was only trained on bottom-up fragments predicts reduced thermal stability (see Fig. LABEL:Sfig:hbonds_mlffB).

To investigate whether there are fundamental differences between GEMS and dynamics simulations performed with conventional FFs, we study the room temperature (300 K) folding process of a pure poly-alanine peptide capped with an N-terminal acetyl group and a C-terminal N-methyl amide group (AceAlaNme) in gas phase. Starting from the FES, MD simulations with GEMS suggest that AceAlaNme has a strong tendency to form H-bonds between peptide groups of directly adjacent residues within the first 100 ps of dynamics. The formed arrangements exhibit a “wavy” structure and  and  backbone dihedral angles of  and , which lie in a sparsely populated region of the Ramachandran plot. These intermediates are typically short-lived with lifetimes of 25–50 ps, and fold readily into helical configurations via a characteristic twisting motion. There is still some controversy between theoretical and experimental results regarding the predominance of different helical conformations [topol2001alpha]. We find that there may be cases where no single motif is preferred: Once a helix is formed, its structure fluctuates between pure - and -helices, as well as hybrids of both helix types (see supplementary  video 2 at for a complete folding trajectory). A 10 ns trajectory of the helical state suggests a dynamical equilibrium with a 38%/62% mixture of - and -helices. In contrast, MD simulations with a conventional FF yield qualitatively different results, suggesting that a more rigid and primarily -helical configuration is formed from the FES without distinct structural intermediates (see Fig. 3C).

As a final test for the accuracy of GEMS, we compare the predictions of the ML model to ab initio data computed at the at the PBE0/def2-TZVPP+MBD [adamo1999toward, weigend2005balanced, tkatchenko2012accurate, ambrosetti2014] level of theory. To this end, we use 1554 and 1000 AceAlaNme structures sampled from densely and sparsely populated regions (rare events) of the configurational space visited in 100 aggregated 250 ps MD trajectories (25 ns total) in the NVT ensemble at 300 K simulated with GEMS (see Section LABEL:Ssec:AlanineClusters for details). We find that predicted energies and forces are in good agreement with the reference values in both cases. For energies and forces, correlation coefficients are and , respectively, and mean absolute errors (MAEs) are  meV/atom and  meV/Å. Again, we find that the inclusion of top-down fragments during training is crucial for high accuracy, as prediction errors for a model trained only on bottom-up fragments are much larger (see Fig. LABEL:Sfig:correlations_ml_no_topdown). In comparison, even though predictions with the conventional AmberFF [Lindorff-Larsen2010AMBER99SB-ILDN] are remarkably accurate with correlation coefficients of (for energy) and (for forces), the MAEs are much larger at  meV/atom and

 meV/Å (distributions of predicted and reference energy values were shifted to have a mean of zero prior to computing MAEs in both cases, such that constant energy offsets between different methods do not influence the results). As a general trend, we observe that predictions with GEMS reproduce the reference across the whole range of values without the presence of a single outlier, whereas the AmberFF systematically under- and over-predicts small and large energy values, respectively (see Fig. 

1B). These findings show that GEMS gives accurate predictions even for rare configurations and the simulated MD trajectories are essentially ab initio quality (see Figs. LABEL:Sfig:correlations_ml and LABEL:Sfig:correlations_ff for a more detailed analysis of correlations within the different subsets of configurations).


Figure 4: Analysis of dynamics simulations of crambin in aqueous solution. (A) Power spectrum of crambin in water obtained from 125 ps of dynamics computed with the AmberFF and GEMS. In the GEMS spectrum, peaks associated with bending and stretching vibrations of water molecules are closer to experimentally expected values at around 1600 cm–1 and 3500 cm–1 [ir_reference] (vertical black lines). (B) Ramachandran map for crambin (color-coded by residue number). The scatter shows the (, )-dihedral angles sampled during an aggregated 10 ns of dynamics, points with black outline show values of the crystal structure [bang2009role] for reference. Dynamics with GEMS (right) generally sample a broader distribution compared to AmberFF (left), indicating that the protein is more flexible. (C) Two-dimensional UMAP [mcinnes2018umap] projection of the path through conformational space sampled during a 5 ns trajectory of crambin in aqueous solution. Compared to the trajectory computed with the AmberFF, dynamics with GEMS sample a wider distribution and are less likely to revisit previously visited regions of conformational space. (D) Distribution of root mean square deviations (RMSDs, excluding hydrogen atoms) between conformations sampled at times and . Solid lines depict the mean, whereas the shaded region indicates the area between the 25th and 75th percentiles. Dynamics with the AmberFF (red) show larger structural fluctuations on short time scales, whereas fluctuations on longer timescales are larger for dynamics computed with GEMS (blue).

GEMS enables accurate molecular simulations in the condensed phase. The 46-residue protein crambin in aqueous solution (25257 atoms) is chosen as a model system. Crambin contains 15 out of the 20 natural amino acids and forms common structural motifs such as -sheets, -helices, turns/loops, and disulfide bridges. To assess qualitative differences between simulations with a conventional FF (here, the AmberFF is chosen) and GEMS, we consider the power spectrum [thomas2013computing] computed from 125 ps of dynamics at a temporal resolution of 2.5 fs (Fig. 4A). The power spectrum is related to the internal motions of the system and reveals the dominant frequencies of molecular vibrations, which are influenced by the atomic structure and characteristic for the presence of certain functional groups. In comparison to the results obtained from the dynamics with a conventional FF, peaks in the power spectrum computed with GEMS are shifted towards lower wavenumbers and lie close to the frequency ranges expected from measured infrared spectra. For example, the dominant peaks above 1000 cm–1 correspond to bending and stretching vibrations of water molecules, which are experimentally expected at around 1600 cm–1 and 3500 cm–1, respectively [ir_reference], which is consistent with the GEMS spectrum. In contrast, the corresponding peaks for the conventional FF are blue-shifted several hundreds of wavenumbers. Additionally, peaks in the GEMS spectrum are broader, indicating that the frequencies of characteristic vibrations are influenced stronger by intermolecular interactions, hence broadening their frequency range. Long-ranged interactions may particularly influence slow proteins motions, i.e. the low-frequency parts of the power spectrum, where notable differences between GEMS and AmberFF can be observed.

Similar to the results for AceAlaNme, we find that in comparison to simulations with the AmberFF, crambin is more flexible in GEMS simulations (Fig. 1D). Although a straightforward quantitative comparison is not possible, qualitatively, the increased flexibility agrees more closely to structures modelled from nuclear magnetic resonance (NMR) spectroscopy measurements (see Fig. LABEL:Sfig:nmr_comparison). The increased flexibility is also indicated by a Ramachandran map of the backbone dihedral angles of crambin (Fig. 4B), which shows that a wider range of values is sampled in simulations with GEMS. This becomes even more apparent by projecting the trajectories into a low-dimensional space that allows a direct visualization of the path taken through conformational space (Fig. 4C). However, a time-resolved analysis of the trajectories reveals that structural fluctuations with GEMS are only larger on timescales in excess of 200 ps (Fig. 4D). On shorter time scales on the other hand, the trend is reversed. This suggests that there are qualitative differences between simulations with conventional FFs and GEMS on all timescales and simulations with ab initio accuracy might be crucial to fully understand protein dynamics.

Discussion and conclusion

Figure 5: Towards accurate quantum-mechanical protein–protein interactions: Gas-phase binding curves of the ACE2 (blue) and the receptor binding domain (RBD) of the SARS-CoV spike protein (red). The ACE2 and RBD proteins are displaced along the line connecting their centers of mass relative to their equilibrium position in solution (computed with the AmberFF), keeping their internal structure fixed. Different binding motifs (taken from Ref.  delgado2021molecular) are distinguished (values in brackets are the maximum well-depth for the corresponding motif). All energy values are referenced with respect to infinite separation of ACE2 and RBD. The displayed value gives the difference in well-depth (averaged over all binding motifs) between SARS-CoV-2 and SARS-CoV-1. The between AmberFF (B) and GEMS (A) is  eV.

Modeling quantum-mechanical properties of large molecules is an outstanding challenge and it holds promise for broad application in chemistry, biology, pharmacology, and medicine. We have developed a general framework for constructing MLFFs – GEMS – for large molecular systems such as proteins by learning from ab initio reference data of small(er) fragments without the need to perform electronic-structure calculations for a whole protein – as the latter would constitute a computationally impractical task. The proposed divide-and-conquer strategy employing a library of 3 million DFT+MBD computations on fragments and using a machine learning model that incorporates physical constraints and long-range interactions allows to efficiently construct MLFFs that accurately reflect the quantum-mechanical energies and atomic forces in large molecules. An interesting insight of our ab initio accurate simulations is that proteins are significantly more flexible than previously thought. These molecular fluctuations and associated low-frequency vibrations are expected to strongly contribute to dynamical processes such as protein folding, drug–protein binding, and allosteric regulation [levy1982molecular].

While our work focuses exclusively on the study of peptides and proteins, the proposed framework can be applied to any atomic system too large to study with ab initio methods. We find that even small poly-alanine peptides display qualitatively different dynamics when simulated with GEMS in comparison to dynamics with conventional FFs. For example, GEMS simulations suggest that the folding of AceAlaNme from the FES to a helical conformation occurs via short-lived intermediates characterized by hydrogen bonding between peptide groups of adjacent residues. Once a helix is formed, its structure fluctuates between - and -helices in a dynamical equilibrium. This is in stark contrast to simulations with a conventional FF, where the peptide forms a rigid -helix without visiting a common intermediate. These results are reminiscent of the first MD study of a protein [mccammon1977dynamics], which showed that proteins are less rigid than previously thought [phillips1981biomolecular]. The current findings, already alluded to above, indicate that proteins might be even more flexible, and our simulations of crambin suggest that the general trend observed for peptides in gas phase also holds for proteins in solution. In particular, crambin samples a larger conformational space in GEMS simulations and its backbone dihedral angles have broader distributions. Interestingly however, structural fluctuations on short time scales are reduced in comparison to simulations with a conventional FF. These observations show that there are qualitative differences in the dynamics of proteins when they are simulated with ab initio quantum-mechanical accuracy. We conjecture that these variations in the dynamics might be crucial for the understanding of effects like allostery, or processes like enzyme catalysis and protein (mis)folding. For example, a description of the potential energy at ab initio quality might be necessary to gain deeper insights into Levinthal’s paradox [levinthal1968there]. A promising avenue for future work is to extend GEMS to larger systems and longer timescales, for example by distributing GEMS simulations over multiple accelerators, which requires non-trivial modifications to the way the MLFF is evaluated. Other possible extensions to GEMS include incorporating nuclear quantum effects, which were demonstrated to significantly change the dynamics of small molecules [sauceda2021dynamical]. It is likely that similar effects can be observed for larger systems.

Let us discuss some limits of MLFFs when compared to classical MD simulations. Although MLFFs are orders of magnitude more computationally efficient than ab initio calculations, their computational efficiency is lower than that of conventional FFs (as to be expected). For example, simulating a single timestep of NPT dynamics of crambin in aqueous solution on an NVIDIA A100 GPU with GEMS takes roughly 500 ms, whereas GROMACS [hess2008gromacs] only requires 2 ms for a single time step with a conventional FF on similar hardware. Consequently, at this moment, GEMS simulations are limited to shorter time scales. In addition, evaluating MLFFs usually requires increased memory compared to conventional FFs, limiting the maximum system size that can be simulated with GEMS. Nonetheless, GEMS allows to simulate several nanoseconds of dynamics for systems consisting of thousands of atoms with ab initio accuracy (it may be possible to combine GEMS with other ML methods to achieve further speed-ups, e.g. by allowing larger time steps during the dynamics [winkler2022super]). Furthermore, GEMS like every other MLFF may lead to unphysical dynamics, if not properly trained (see e.g. Ref.  stocker2022robust for a discussion of such phenomena). As a rule, MLFF simulations should therefore always be subjected to more scrutiny than results from mechanistic FFs. In particular, the resulting trajectories need to be carefully checked for unphysical bond breaking or formation, or otherwise unphysically distorted conformations, which are prevented in traditional FFs by construction. Nevertheless it should be emphasized again that compared to simulations with a conventional FF, GEMS offers highly improved accuracy as well as enables to study chemical transformations such as the making and breaking of chemical bonds and proton transfer processes.

Another advantage of using accurate MLFFs is the availability of arbitrary derivatives – including the potential to obtain alchemical derivatives [saravanan2017alchemical, sheppard2010alchemical]. This may enable the optimization of accessible observables, such as docking/binding energies, with respect to local (nearly isosteric) mutations. In a more conventional approach, MLFFs can be used to describe the effects of mutations via thermodynamic integration as regularly performed with classical FFs [blas2002theoretical, beierlein2011predicting, krepl2013effect]. Given the incorporation of non-locality in the present methodology, such analyses could naturally account for longer-range phenomena like (static) allosteric effects and the inherent non-additivity of interactions known to be relevant for the free energy of binding or stability  [dicera1998sitespecific]. In a similar vein, this may allow to perform sensitivity analyses or engage explainability methods (see, e.g., Ref.  samek2021explaining) to identify allosteric hotspots and networks, which have also been speculated to play an important role for the evolutionary aspects of proteins and the biomolecular machinery [lockless1999evolutionarily, raman2016allostery]. Again, we would like to stress that such studies and the above approaches are not limited to biomolecular systems. They may equally well be applied in materials design; for example, studying and optimizing point defects in solid state systems as relevant to the design of quantum materials.

Finally, we would like to highlight a promising application of GEMS to modeling protein-protein interactions. Fig. 5 shows the binding curves of the angiotensin converting enzyme 2 (ACE2) and the receptor binding domain (RBD) of the spike protein of SARS-CoV-1 and SARS-CoV-2 variants using either AmberFF or GEMS (in gas phase). Here, as expected from experimental evidence  [wrapp2020cryo] we observe a stronger binding of the SARS-CoV-2 both for the classical FF as well as GEMS. Interestingly, however, GEMS yields a substantially stronger binding by –1.1 eV. While the obtained binding energies do not account for solvation or entropic effects and cannot be directly compared to experimental binding affinities, this application provides evidence of the importance of ab initio accuracy when studying interactions between complex biological systems. We therefore would like to stress the high promise of GEMS for enabling quantum-mechanical insight in broad application domains across enzyme and protein chemistry or heterogeneous materials. Although top-down fragments in this work are system-specific, in the future, they may be generated to cover a wider range of systems and enable GEMS simulations with a chemically transferable and size-extensive “universal” machine-learned force field.

Materials and methods

Construction of fragment data

Bottom-up fragments

The construction of bottom-up fragments follows an approach similar in spirit to the one described by Huang et al. [huang2020quantum]. Ignoring hydrogen atoms, increasingly large chemical graphs with the same local bonding patterns as the system of interest are constructed until a maximum number of heavy atoms is reached. This is achieved by starting from graphs consisting of a single heavy atom. Larger graphs are constructed by successively adding additional heavy atoms and pruning graphs which do not appear as substructures in the original system. Once the graphs are constructed, they are converted to bottom-up fragments by saturating all valencies with hydrogen atoms and generating the corresponding three-dimensional molecular structure (e.g. using Open Babel [o2011open]). For all structures, multiple conformers are sampled using either MD simulations at high temperatures or normal mode sampling [smith2017ani]. Here, we use the bottom-up fragments for proteins generated in earlier work  [unke2019solvated] (see Ref.  unke2019physnet for details). Since all similar local structures are covered by the same graphs, the bottom-up fragments optimally exploit any structural redundancies, often resulting in a surprisingly small number of fragments. For example, just 2 307 chemical graphs (with a maximum of eight heavy atoms) are sufficient to cover all local bonding patterns appearing in proteins consisting of the twenty natural amino acids, even when considering different protonation states and the possibility of disulfide bridges [unke2019physnet].

Top-down fragments

Starting from an MD snapshot of the system of interest (sampled from conventional MD simulations, see below and Section LABEL:Ssec:input_for_top_down_fragments for more details), all atoms outside a spherical region around a central atom are deleted. The cutoff radius for selecting the spherical region should be chosen as large as possible, but still resulting in fragment sizes for which reference calculations are feasible (in this work, we choose 8 Å). Then, any resulting dangling single bonds on heavy atoms are saturated with hydrogen atoms. Valencies situated on hydrogen atoms or corresponding to double bonds are eliminated by including the bonded atom in the original system (outside the cutoff). This process is repeated until all valencies are saturated [gastegger2017machine]. By choosing different central atoms, several (partially overlapping) top-down fragments can be constructed from a single configuration of the original system.

Density Functional Theory calculations

Density Functional Theory (DFT) reference calculations were performed using the Psi4 software package [parrish2017psi4, smith2018psi4numpy] at the PBE0/def2-TZVPP+MBD [adamo1999toward, weigend2005balanced, tkatchenko2012accurate, ambrosetti2014] level of theory on Google Cloud Platform (GCP). This level of theory has been found to be well-suited for modelling interactions of proteins in water [distasio2014individual, stoehr2019quantum, schubert2015exploring]. Each fragment was run on an independent Docker container within a cloud compute engine virtual machine. We mostly used n2d-higmem-4 and n2-highmem-4 virtual machine instances with 4 cores, 32 GB RAM and 768 GB of disk space each, with some larger fragments being manually relaunched on higher-memory machines if they crashed with out-of-memory errors. Execution was parallelized on up to 20 000 CPU cores. Calculations where shut down if they did not complete within 21 days, which was the case for a few outliers, but median execution time per fragment was hours.

Training the MLFF

All MLFFs in this work use the recently proposed SpookyNet architecture (see Ref.  unke2021spookynet for details). We use three different trained ML models in this work, one for the simulations of all poly-alanine systems, one for the simulation of crambin in aqueous solution, and one for the gas-phase ACE2/SARS CoV-1/2 RBD binding curves shown in Fig. 5

(CoV model). The poly-alanine and CoV models use the recommended architectural hyperparameters of

interaction modules and features [unke2021spookynet]. Due to hardware limitations when performing MD simulations for thousands of atoms, the crambin model uses interaction modules and features to reduce memory requirements. All models use a short-range cutoff of ( Å). The crambin and CoV models additionally employ a long-range cutoff of ( Å) for the computation of the analytical electrostatic and dispersion correction terms included in the SpookyNet energy prediction (to achieve sub-quadratic scaling with respect to the number of atoms). We follow the training protocol described in Ref.  unke2021spookynet

for fitting the parameters to reference energies, forces, and dipole moments, however, the mean squared loss function was replaced by the adaptive robust loss described in Ref. 

barron2019general. All models were trained on NVIDIA V100 GPUs on GCP using the same 2 713 986 bottom-up fragments, and 45 948 (for the poly-alanine model), 5 624 (for the crambin model), or 129 942 (for the CoV model) top-down fragments. During training, structures were randomly drawn in equal amounts from bottom-up and top-down fragments, i.e. top-down fragments were oversampled to mitigate the imbalance in the numbers of bottom-up/top-down fragments.

MD simulations

Conventional FF

All classical MD simulations have been performed with the GROMACS 2020.3 software package using NVIDIA V100 or A100 GPUs in a Kubernetes system on GCP. Throughout this work, we have employed the AMBER99SB-ILDN force field [Lindorff-Larsen2010AMBER99SB-ILDN] for the conventional MD simulations. Standard amino acid definitions have been adapted to accommodate charged Lys + H termini in accordance with the AMBER99SB-ILDN parametrizations where needed. In the MD simulations of ACE2/SARS CoV-2 RBD, the binding of the Zn-cofactor in ACE2 has been described via harmonic restraints to the experimentally-determined ligands in order to avoid potential shortcomings in the description of the metal–ligand interaction. All solvated systems presented in this manuscript or used for sampling representative structures for generating top-down fragments were initially resolvated, optimized to a maximum atomic force of 1000 kJ/mol/nm, and equilibrated according to the protocol detailed in section LABEL:Ssec:md_simulations

. Simulations for studying non-equilibrium processes (i.e. the gas phase folding/unfolding of poly-alanine systems) have been started directly from optimized structures with velocities drawn from a Maxwell-Boltzmann distribution at twice the simulation temperature (such that the average kinetic energy during the simulation corresponds to the desired temperature

[Konermann2018pseudogp]). The gasphase simulations have thereby been realized in a pseudo-gasphase setting as proposed and validated in Ref.  Konermann2018pseudogp. All constant temperature MD simulations have been performed using temperature coupling via stochastic velocity rescaling  [Bussi2007vrescale] and a Parrinello-Rahman barostat  [Parrinello1981barostat] has been used for NPT simulations. To speed up computations, standard MD simulations involved the commonly employed constraint of bonds involving hydrogen, while the power spectra reported in this work have been obtained from fully unconstrained simulations. The starting structures of poly-alanine systems have been generated with the Avogadro software [hanwell2012avogadro] and the initial structure of crambin has been taken from PDB entry 2FD7 [bang2009role], where the incorrectly described residues (SER11 and VAL15) have been remodeled using PyMOL [delano2002pymol]. Our simulations of the ACE2/SARS CoV-1/2 RBD complex have been initiated from a set of representative conformations as identified in Ref.  delgado2021molecular or pointwise mutations thereof. Currently available experimental results on the mutations present in the -, -, -, and -variants of SARS CoV-2 do not indicate considerable structural changes to the spike RBD. After partial relaxation, simple pointwise mutations of the structural representatives obtained for the -variant can thus be assumed to represent viable starting points for MD simulations of the different variants.


All MD simulations with the GEMS method were performed using the SchNetPack [schuett2018schnetpack] MD toolbox with a timestep of 0.5 fs and without any bond constraints. Simulations for poly-alanine systems were performed on NVIDIA V100 GPUs on GCP, whereas crambin simulations were performed on NVIDIA A100 GPUs with 80GB. To mimic experimental conditions [kohtani2004extreme], the simulations of AceAlaLys + H helix stability were performed in the NVE ensemble starting from an optimized structure with initial velocities drawn from a Maxwell-Boltzmann distribution at twice the simulation temperature as explained above. The folding simulations of AceAlaNme were performed in the NVT ensemble at 300 K starting from the optimized FES using the same method to assign initial velocities. Simulations of crambin in aqueous solution were performed in the NPT ensemble at a temperature of 300 K and a pressure of 1.01325 bar, starting from an optimized structure and initial velocities drawn from a Maxwell-Boltzmann distribution according to the simulation temperature (the first 1 ns of dynamics was discarded to allow the system to equilibrate). Constant temperature and/or pressure simulations use the Nosé-Hoover chain thermostat/barostat [tobias1993molecular, martyna1996explicit] implemented in SchNetPack using a chain length of 3.


We thank Michael Brenner for insightful comments. OTU acknowledges funding from the Swiss National Science Foundation (Grant No. P2BSP2_188147). MG works at the BASLEARN – TU Berlin/BASF Joint Lab for Machine Learning, co-financed by TU Berlin and BASF SE. MS and AT acknowledge financial support from the Fond National de la Recherche Luxembourg (FNR) under AFR PhD grant “CNDTEC (11274975)” as well as from the European Research Council via ERC Consolidator Grant “BeStMo (725291)”. This work was supported in part by the German Ministry for Education and Research under Grant Nos. 01IS14013A-E, 01GQ1115, 01GQ0850, 01IS18025A, 031L0207D, and 01IS18037A. KRM was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea government(MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program, Korea University and No. 2022-0-00984, Development of Artificial Intelligence Technology for Personalized Plug-and-Play Explanation and Verification of Explanation). Correspondence to OTU, AT and KRM. There are no competing interests to declare. DFT reference data for training the models and scripts for running MD simulations with GEMS will be made available upon publication.