Computing spatially resolved rotational hydration entropies from atomistic simulations

by   Leonard P. Heinz, et al.

For a first principles understanding of macromolecular processes, a quantitative understanding of the underlying free energy landscape and in particular its entropy contribution is crucial. The stability of biomolecules, such as proteins, is governed by the hydrophobic effect, which arises from competing enthalpic and entropic contributions to the free energy of the solvent shell. While molecular dynamics simulations have revealed much insight, solvation shell entropies are notoriously difficult to calculate, especially when spatial resolution is required. Here, we present a method that allows for the computation of spatially resolved rotational solvent entropies via a non-parametric k-nearest-neighbor density estimator. We validated our method using analytic test distributions and applied it to an atomistic simulation of a water box. With an accuracy of better than 9.6 resolution should shed new light on the hydrophobic effect and the thermodynamics of solvation in general.



There are no comments yet.


page 11


A microscopic description of acid-base equilibrium

Acid-base reactions are ubiquitous in nature. Understanding their mechan...

Nearest neighbor density functional estimation based on inverse Laplace transform

A general approach to L_2-consistent estimation of various density funct...

Population Synthesis via k-Nearest Neighbor Crossover Kernel

The recent development of multi-agent simulations brings about a need fo...

Graphical Gaussian Process Regression Model for Aqueous Solvation Free Energy Prediction of Organic Molecules in Redox Flow Battery

The solvation free energy of organic molecules is a critical parameter i...

BOOST-Ising: Bayesian Modeling of Spatial Transcriptomics Data via Ising Model

Recent technology breakthrough in spatial molecular profiling has enable...

Understanding the Sources of Error in MBAR through Asymptotic Analysis

Multiple sampling strategies commonly used in molecular dynamics, such a...

Nearest neighbor ratio imputation with incomplete multi-nomial outcome in survey sampling

Nonresponse is a common problem in survey sampling. Appropriate treatmen...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Competing enthalpic and entropic contributions to the solvation free energies give rise to the hydrophobic effect1, which is vital for protein function and folding2, 3, 4. Despite extensive theoretical work1, 5, a quantitative understanding of the hydrophobic effect particularly at heterogeneous surfaces, such as of proteins and mixed bilayers, remains elusive.

Because surface-water shows a significantly altered behaviour compared to bulk6, 7, it is essential for our understanding of the thermodynamics and energetics of protein solvation to better characterize, e.g., the relative contributions by different solvation shells or the effect of individual protein side chains on the solvent. Molecular dynamics (MD) simulations describe the hydrophobic effect at an atomic level8, 9, but a deeper understanding of the molecular driving forces, requires a quantitative and spatially resolved picture of the solvation shell thermodynamics, which poses considerable challenges.

Methods like thermodynamic integration (TI)10, 11 allow for the calculation of solvation entropies based on MD simulations, but the lack of a spatial resolution precludes detailed analysis of how local features of the solvent-surface interface contribute and interact. Various order parameters12, 13, 14, 15, 16 assess both the local translational and the local rotational order of water molecules, but yield only a qualitative picture of the thermodynamic entropy.

Here, we limit our analysis to absolute rotational water entropies and present a method to reach a spatial resolution from atomistic simulations or Monte Carlo ensembles. Our method employs a mutual information expansion (MIE) to calculate the total entropy of water molecules based on the contributions of each molecule individually and the entropy losses due to correlations between molecule pairs and triples. A similar approach was taken e.g. by the grid inhomogeneous solvation theory (GIST)17, 18, 19, 20, 21, 22, which, like 3D-2PT23, 24, 25, considers contributions by voxels, rather then entropic contributions by the correlations between individual molecules directly. Convergence of these correlation terms is challenging, as they require sampling and density estimates in high-dimensional configuration spaces.

MIE terms were calculated using a

-nearest-neighbor (kNN) density estimator, typically used in Euclidean spaces

26, 27, 28, which we modified and optimized for , the cartesian products of the group of rotations. We considered different metrices for the -nearest neighbors in , determined an optimal -value, and provide a computationally efficient framework for rotational entropy calculation.

For easier notation, will develop our method for water molecules, although it is general and applicable to any system with rotational degrees of freedom.

In the following sections, we will first provide the conceptual foundation and then describe our rotation entropy approach. Subsequently, we will apply it to analytical test distributions, as well as to MD water boxes.

2 Theory

2.1 Absolute entropy

When separating the entropy of water into rotational and translational contributions

correlations between translation and rotation give rise to the mutual information (MI) term . Here, we focus on the rotational contribution.

Let the rotation of water molecules of the simulation system be described by the Hamiltonian , with angular momenta , orientations , the kinetic energy , and the potential energy , typically described by a molecular dynamics force field. The total entropy is

with the Boltzmann constant , Planck’s constant , and the phase space density . Because factorizes, the entropy can be split into a kinetic and a configurational term

where with arbitrary , and

are the eigenvalues of the moment-of-inertia tensor of a water molecule.

Because can be solved analytically, the challenge is to estimate .

2.2 Entropy estimation

Because the rotational entropy integral in dimensions usually cannot be computed directly, we use a truncated mutual information expansion29, 30, 31, 32 (see section 2.2.1) to expand the full-high dimensional integral into multiple low-dimensional integrals over marginal distributions, which can be calculated numerically. To obtain these marginal entropies, a -nearest-neighbor estimator (see section 2.2.2), which estimates the density at each sample point by finding the closest neighboring sample points and dividing by the volume of a ball that encloses the points, was used. Here, the orientations of water molecules in different samples, e.g., frames of a computer simulation trajectory, were represented by a series of quaternions (see section 2.2.3) with . We then defined suitable distance metrics, as required by the kNN algorithm, which are not trivial in the curved spaces of rotations (see section 2.2.4), and then calculated the volumes of balls, as induced by the metrics (see section 2.2.5). We will finally present a computationally efficient framework that allows finding neighbors to each sample point (see section 3.1).

2.2.1 Mutual information expansion

Figure 1A shows an example of an entropy expansion into mutual information (MI) terms of a system containing three subsystems, such as three water molecules, in a Venn diagram: The full entropy () is expanded into MI terms (), of which the first term represents the entropies of each molecule individually and the further terms are correlation terms of 2nd and 3rd order, respectively,


In this notation, is the entropy of the marginal distribution with respect to molecules with indices .

Figure 1: (A) Mutual information expansion illustrated for the entropy-breakdown of 3 particles. (B) Sketch of density estimation on SO(3) (here represented as a 2-sphere). Each dot on the sphere represents an orientation. For each point , the th neighbor according to a distance metric (e.g., or ) is found. The density is estimated via the volume of a ball with radius . (C) Visualization of the fill mode approach: A correlated dataset is shown on the left hand side. The identical data is decorrelated by applying a random permutation along one axis, as shown on the right. The entropy of the decorrelated data is the sum of both "marginal entropies".

For water molecules, the expansion consists of MI orders, of which the th term involves -dimensional integrals and takes all possible -molecule correlations into account. Approximating the full entropy by a truncated expansion thus leads to lower dimensional integrals, which can be better sampled. Although there is no guarantee that truncated orders are small and can be neglected, it has been shown that a truncated expansion provides accurate entropy estimates if the physical interactions are short ranged33, as for water.

Here we take up to -molecule correlations into account by truncating after the 3rd order, hence


where the first order includes the kinetic entropy contribution and a correction of , due to the two-fold symmetry of the water molecule.

2.2.2 kNN entropy estimation

To evaluate eq 2 from a given sample of orientations with , the marginal entropies from eq 1 are calculated using a kNN entropy estimator26, 34, 27, 28, 35. For , the th nearest neighbor with respect to the sample point is defined by a metric (see Figure 1B), and is estimated as , where is a fixed integer, is the volume of a ball with radius , the distance between and its th neighbor, and is a normalization constant. Results for and are obtained by generalizing the metric and the volume to higher dimensions. The choice of metrices, on which the results may depend for finite sampling, and their corresponding volumes in will be discussed in section 2.2.4 and section 2.2.5. The entropy is

where is a correction which accounts for the bias introduced by the th neighbors being, by definition, on the edges of the balls28. is the digamma function.

Because eqs 1b and 1c are sums and differences of integrals of different dimensionalities, biases are introduced: With increasing dimensionality and thus reducing sampling, the kNN estimator yields increasingly smoothed versions of the underlying true distributions. The estimator therefore overestimates entropies of distributions with higher-dimensional supports more than of those defined in lower-dimensional spaces, resulting in biases if entropies of different dimensionality are added or subtracted. To overcome this problem, the sampling space is expanded to equal dimensionality by using fill modes36, 32. , defined in eq 1b as the sum of integrals in and , can be rewritten as a sum of two integrals

if the corresponding joint distribution

factorizes to . To achieve statistical independence, the sample points corresponding to index were subjected to a random permutation , which decorrelates and , but leaves the marginal distributions unchanged, as sketched in Figure 1C. The joint entropy is thus the sum of the initial marginal entropies .

Similarly, the 3rd order MI term reads

2.2.3 Parametrization of orientations

From different parametrizations of orientations in 3d space, such as Euler angles, Tait-Bryan angles, Hopf coordinates37, 38, and spherical coordinates, we used quaternions39, which, contrary to most other charts of , do not suffer from Gimbal lock. They are defined as , where and are a normalized rotation axis and a rotation angle, respectively, and thus can be interpreted as elements of the 3-sphere, i.e., . Because there is a one-to-one mapping of the 3-sphere to the Special Unitary group , which in turn provides a double-covering of , each orientation is described by two equivalent quaternions, which differ only by a sign40.

2.2.4 Choice of metrics in

We next considered the proper choice of metrics in . At first sight, one might think that, of many possible metrics in 40, 41, only one, e.g., the geodesic metric shown in Figure 1B, yields the correct entropy. However, in the limit of infinite sampling, kNN entropy estimation with any metric is possible if used with its induced ball-volumes (see section 2.2.5)42. Our choice was therefore guided by the speed of convergence and computational efficiency.

We chose the quaternion metric43, 40

sketched in Figure 1B, which defines a metric between two rotations as the minimum Euclidean distance between unit quaternions, taking the sign ambiguity into account. In , the quaternion metric and the more natural geodesic metric yield identical nearest neighbors. They are functionally equivalent because a positive continuous strictly increasing function , such that (and vice versa), exists40. does not require evaluation of the inverse cosine function and thus is computationally more efficient; it was therefore preferred over .

Metrices in and were obtained by combining with the Euclidean norms in and , respectively,

with . When combined with the Euclidean norms, the quaternion metric and the more natural geodesic metric are not functionally equivalent and hence yield, in general, different nearest neighbors. For small distances, i.e., for high sampling, the metrices are asymptotically identical.

To test whether this choice of metrics impacts the accuracy of the MI results, we compared our choice to the composite metric using and the maximum-norm in , which is functionally equivalent to the geodesic composite metric but slightly less efficient to evaluate than the Euclidean norm. For frames, no significant difference between the MI values was seen.

2.2.5 Volumes of balls in

The volumes (dark green in Figure 1B), enclosed by the kNN radius , read

for in . The respective volumes for (in ) and (in ) reduce to

respectively, with and . The integrals were solved numerically for equally spaced values of using the software Mathematica 10.044

and the multidimensional rule; the results were stored in a lookup table. Cubic interpolation was used to obtain results from the stored values.

3 Methods

3.1 Nearest-neighbor search

Nearest-neighbor searches were performed using the Non-Metric Space Library (NMSLIB)111 using the above metrics. Each data set was indexed in a vantage-point tree46, 47 (VP-tree) that rests on the triangle inequality. Our version of the NMSLIB, modified to include the orientational metrices, is available online222

3.2 Accuracy assessment

3.2.1 Test distributions

To assess the accuracy of our method, we used analytical test-distributions in , , and , derived from

with a quaternion , the first quaternion component , the first azimuthal angle in spherical coordinates for the 3-sphere , and the appropriate normalization constant (Figure 2A). The analytical expression for the configurational entropy reads

using for simpler notation and the gamma function . As illustrated in Figure 2A, the distribution depends on the localization parameter ; a value of

yields a uniform distribution; larger

values yield increasingly narrower distributions.

For and

, probability distributions

and were used to obtain uncorrelated distributions with entropies , and , respectively.

To also assess the accuracy for correlated distributions with , the test-distribution

was used, which was designed such that the marginals with respect to and are and , respectively. The localization here controls the degree of correlation between and , ranging from an uncorrelated uniform distribution () on to strongly correlated distributions for lager values. The entropy of this distribution is

where is the entropy of a free rotor.

Samples were obtained using a rejection method: First, a random point in was drawn from a uniform distribution by drawing quaternions from the uniform distribution on the 3-sphere. Next, a random number was drawn from a uniform distribution between and and was accepted if and rejected otherwise. This process was repeated until the desired number of samples was obtained.

The accuracy of our method was assessed for each test distribution for localization parameters between and , nearest-neighbor -values of , and , and with to frames (). The computed entropy and MI values were compared to the analytical results. To obtain statistical error estimates, the calculations for each parameterset was repeated times.

3.3 Molecular dynamics simulations

All MD simulations were carried out using a modified version333 of the software package Gromacs 201848, 49, 50, 51, 52 with an additional center of mass motion (COM) removal53 method, used to individually constrain all oxygen atoms. We furthermore made small additional changes to apply COM removal to individual atoms and to overcome the limit of COM removal groups444 The CHARMM36m force field54, 55, 56, 57, 58 and the CHARMM-TIP3P water model59 were used. All water molecules were subjected to SETTLE60 constraints, and the leap frog integrator with a time step of fs was used. Electrostatic forces were calculated using the Particle-Mesh Ewald (PME) method61 with a nm real space cut-off; the same cut-off was used for Lennard-Jones potentials62. In all simulations, the V-rescale thermostat63 with a time constant of ps, and, if applicable, the Parrinello-Rahman barostat64, 65 with a time constant of ps and bar pressure were used.

A total of water molecules was placed within a cubic simulation box, and the system was equilibrated for ns at K as a NPT ensemble. From the equilibrated system, resulting in a box size of approximately nm, three s production runs were started, as shown in the first column of Figure 3: Run m ("mobile") was carried out as described above. To benchmark our method against the established method of thermodynamic integration (TI), a system with only rotational degrees of freedom was constructed. To this end, all oxygens were position-constrained using COM removal as shown in the first column of Figure 3 in run p ("pinned"), allowing only rotational movements around the oxygen atom under NVT conditions.

Because at K, the water molecules formed an almost rigid, ice-like hydrogen bond network with only very little dynamics, the temperature was increased to K. System sp ("sliced & pinned") was simulated as p but all water molecules within a slice of nm width were removed to create a water-vacuum interface.

3.3.1 Entropy calculation

For all three test systems, the entropy of rotation was calculated as described in section 2, each using a s trajectory with frames. For the MI terms, a cut-off on the distance between average molecule positions was used. Whereas including the MIs of many molecule pairs by using a large cut-off distance gives rise to a more accurate MIE, it also introduces larger noise due to limited sampling. For pairwise MI terms, the cut-off was chosen as nm, because for larger distances, the MI terms vanish within statistical errors (see Figure 3B and D). Similarly, triple MI contributions were cut off at nm.

Because the water molecules in system m were mobile, average positions across the obtained trajectory were unusable to define a cut-off. Therefore the water molecules were relabeled in each frame, such that they remained as close as possible to a simple-cubic reference structure using permutation reduction66, 67, which leaves the physics of the system unchanged. In systems p and sp, the molecules were immobilized and the oxygen positions where used for applying the cut-off.

To quantify the precision of the method, the MD simulations and the subsequent entropy analyses were repeated in 10 independent calculations.

3.3.2 Thermodynamic integration reference

Reference entropy values for systems p and sp were obtained using thermodynamic integration10, 11, 66 (TI). Interactions between water molecules were gradually switched off in a stepwise fashion to obtain the entropy difference between real water and non-interacting water. The absolute rotational entropy was obtained as the sum of the excess entropy, obtained via TI, and the ideal gas contribution,

where are the eigenvalues of the moment of inertia tensor of a water molecule.

Both TI calculations were performed using soft-core68 parameters and . Coulomb interactions were linearly switched off in windows of ns each and further windows were used to subsequently switch off the van-der-Waals interactions. The first nanosecond of each window was discarded.

4 Results and discussion

4.1 Test distributions

Figure 2: Analytic test distribution (A) compared to the entropies and MI values obtained from density estimates. Panel (A) shows the distribution for increasing localizations , illustrated by different colors. Here we represent as a 1-sphere and the distribution was renormalized accordingly for this 1d representation. The north pole and the quaternion are indicated with black crosses. Panels (B), (C), and (D) show entropies and MI values obtained using the test distributions , and for varying coupling parameters . Panel (E) shows the convergence of the results for increasing sample sizes. In panels (B) to (E), the analytical result is shown by a dashed line; results for different -values colored according to the legend at the bottom left of the figure. Values that were fixed during the calculation, such as the choice of the test distribution, the number of frames , or the coupling parameter , are stated in a corner of the respective panel. The shown errors denote regions.

We first assessed the accuracy of our method for three uncorrelated analytical test distributions (defined in , , and ) and one correlated analytical test distributions (defined in ), as described in section 3.2.1. The distributions depend on the localization parameter , which, for the uncorrelated distributions , and determines their width, as demonstrated in Figure 2A, and, for , controls the strength of the correlation.

As can be seen in Figure 2B, the kNN estimator largely agrees with the analytic results (dashed lines) for the uncorrelated distributions , and for between (uniform distribution) and (strongly peaked), and the tested -values between and . The graphs for the three distributions are scaled and offset as indicated in the figure. We find that for distributions and , our method accurately reproduces the true entropy for all tested -values within statistical errors, even for the small number of frames. The statistical errors amount to nats (natural units of information) for , , or less. Also for and , the analytical result is matched within statistical errors (nats at at maximum), whereas larger values of lead to overestimated entropies of up to nats (, ), caused by the limited sampling of just frames and the increased dimensionality of compared to the other tested distributions.

Next, we assessed the accuracy for the correlated test distribution. The panels Figure 2C and D show the entropy (calculated via the MIE as defined in eq 1b) and the MI of for frames, respectively. For the uniform distribution (), the algorithm yields the analytic values of and for entropy and mutual information, respectively. With increasing correlation , the entropy is increasingly overestimated as MI is underestimated. Both effects are more pronounced for larger -values: Whereas for , the algorithm yields accurate values within statistical errors up to a correlation of , the results deviate significantly for even for very small -values. Overall, small -values, such as , yield high accuracy but with reduced precision (i.e., larger statistical errors) compared to large -values like , which give rise to smaller statistical errors but reduced accuracy.

To further assess this trade-off and the convergence properties of our method, we calculated the relative entropy errors for for sampling between and frames, shown in Figure 2E. For and only frames, the method overestimates the true entropy by to , which quickly drops to below for more than frames. For larger -values, the entropy errors increase and the convergence becomes slower, e.g., requires frames to achieve an entropy error of less than %. The statistical errors at frames are % and % for values of and , respectively. Overall, yields somewhat lower precision but significantly faster convergence compared to larger values, which becomes even more pronounced in higher dimensions. We therefore find this value the optimal choice for the systems at hand and used it for all subsequent analyses.

The kNN entropy estimator rests on the assumption that the density is approximately constant and isotropic within each -nearest-neighbor ball (see Figure 1B). This assumption implies that features of the true distribution that are smaller than the average distance between sample points are not resolved, which, in case of poor sampling, inevitably leads to an overestimated entropy, as seen for with large or as shown in Figure 2E. The assumption of isotropy no longer holds for highly correlated datasets, such as for large values of . In this case, also the -nearest neighbors to each sample point are correlated and thus not isotropically distributed, which is not reflected by an isotropic kernel, i.e., a ball. For Euclidean spaces, this problem was addressed by using anisotropic kernels69, 70. Although this idea could also be applied in , we find that the correlation of water molecules at standard conditions is weak enough (see Figure 3A) to allow for sufficiently accurate results under the isotropy assumption.

The trade-off between accuracy and precision with respect to the -value is a general property of kNN entropy estimators, which has been characterized previously71, 70 and is intuitively accessible: Whereas averaging over an increasing number of neighbors reduces statistical uncertainties and thus improves precision, the assumptions of approximately constant isotropic densities are applied to increasingly larger balls, resulting in increasingly overestimated entropies for distributions with small scale features or strong correlations.

Overall, the kNN method with yields the most accurate results while being only slightly less precise than estimators with lager . It retrieves the analytical entropies within statistical errors for the uncorrelated distributions, as well as for the correlated distribution with using just and frames, respectively.

4.2 Entropy calculated from molecular dynamics simulations

Having assessed our rotational entropy method against analytic test distributions, we tested its accuracy for more realistic systems of up to interacting water molecules. To this end, we simulated three atomistic water MD systems (Figure 3 left column), as described in section 3.3. System m ("mobile") comprised 1728 unconstrained water molecules. In system p ("pinned"), the position of the oxygen atom of each individual molecule was fixed, permitting only rotations in order to allow direct entropy comparison using TI10, 11. In system sp ("sliced & pinned"), a nm slice of molecules was removed from the center of the simulations box (see Figure 3) in addition to pinning the oxygen atoms to assess the accuracy of the method also for systems with spatial variation. For all systems, 10 independent MD simulations were performed and for each, entropies were calculated via a MIE as explained in section 3.3.1. For the pinned systems, reference entropies were obtained by TI, as described in section 3.3.2.

Figure 3: Entropies and MI contributions for the systems m, p, and sp. The first column shows the three considered MD systems. Panels (A), (C), and (E) show the rotational entropy, computed using the mutual information expansion in purple; its breakdown into contributions by 1st to 3rd order is visualized underneath. For systems p and sp, the result is drawn in comparison to the TI values in gray. Panels (B) and (D) show the mutual information between all pairs of considered water molecules depending on their distance. The blue lines correspond to running Gaussian averages. Panel (F) displays between pairs of molecules that are closer than nm in relation to the their distance to the vacuum slice in system sp. The inset in green shows the molecule pair density with respect to the center of mass distance to the slice.

As shown in Figure 3A, an absolute rotational entropy of JmolK per molecule was obtained for system m, to which the first, second, and third MI orders contribute kJmol, JmolK, and JmolK

, respectively. Note that the provided values are average and standard deviation of the 10 independent calculations and that the errors are too small to be illustrated in Figure 

3. The pair mutual information terms , shown in Figure 3B, reach a maximum of JmolK for very close water molecules and vanish monotonically for molecules that are, after permutation reduction and on average, separated by more than nm. Note that the discrete nature of distances in Figure 3B is due to the choice of a simple cubic reference structure for permutation reduction.

To compare the obtained absolute entropies to TI, the water movement was restricted to the rotational degrees of freedom in system p by pinning each molecule as described in section 3.3. Here, the rotational entropy, shown in panel C, is reduced to JmolK. The 2nd and 3rd order mutual information terms contribute JmolK and JmolK, respectively. Compared to the results from TI shown in gray, the entropy is underestimated by % due to the limited sampling of the strongly correlated system. Similar to what we observed for the analytical test case of Figure 2D, the MI terms are underestimated for strong correlations, of which the 3rd order is most severely affected due to the high dimensionality of the sampling space.

The terms, illustrated in Figure 3D, show a maximum of JmolK and indicate that water molecules decorrelate beyond nm. The distribution shows secondary and tertiary peaks around nm and nm, which arise from indirect coupling via one or two mediating water molecules, as indicated by the structures shown in Figure 3D. In this case, the correlations between the molecule pairs are not due to direct interactions; instead, mediating water molecules (orange) enhance distant orientation correlations via short hydrogen-bonded chains (shown in red). This finding demonstrates that the method is able to identify regions of locally coupled water molecules and to quantify the resulting entropy losses, thus providing a spatially resolved picture of entropy changes.

To further assess and demonstrate the accuracy of the method for systems with spatial features, we included position dependent properties by introducing a vacuum-slice in system sp, for which the water molecules close to the surface show a different dynamic behavior than those within the bulk. For system sp, the accuracy of our entropy estimation relative to TI improved slightly to %, whereas the contributions by higher MI orders remained almost identical (see Figure 3E). We assume that the improved accuracy is due to the slightly smaller number of molecules (1728 vs. 1493 with slice). It may further be due to the vacuum slice limiting the range of many-particle correlations, which would not be captured by a 3rd order approximation.

Figure 3F shows the terms of molecule pairs that are closer than nm, i.e., are within their first hydration shells, relative to their distance to the slice, yielding increased correlations for pairs that are closer to the vacuum interface of JmolK on average compared to JmolK in bulk. The increased correlations at the surface and their associated entropy loss contribute to the thermodynamic unfavorability of water at a (hydrophobic) vacuum interface.

The MIE approaches the TI values for systems p and sp to % and %, respectively, and additionally yields information about individual correlations and their associated entropy losses, thus providing spatial resolution. Remarkably, about 25-fold less computer time was required for the MIE compared to TI for the shown examples.

The large 2nd and 3rd order contributions, illustrated in Figure 3C and E, show that both systems with pinned water exhibit strong correlations between water molecules. As for the test distributions illustrated in Figure 2, strong correlations result in systematically underestimated MI values. Due to their high dimensionality, and thus low sampling density, we expect the 3rd order MIE contributions for systems p and sp to be mostly affected, contributing to their overall underestimated entropy. For the same reason, we expect entropies calculated from more loosely coupled mobile water to yield markedly more accurate results.

Although a direct comparison to TI is impossible for system m, we expect that the errors due to the truncation of higher order MI terms, observed for the more tightly correlated systems p and sp, are larger than for unconstrained water. Therefore, the approximation of the truncated MIE yields more accurate results for realistic solute systems. These two effects combined, the performances obtained for the more correlated pinned water systems provide upper bounds for the expected errors.

5 Conclusion

We have developed an estimator for spatially resolved rotational solvent entropies based on a truncated mutual information expansion and the -nearest-neighbor algorithm on . Accuracy and computational efficiency were assessed for both analytical test distributions, and systems of up to 1728 water molecules, described by atomistic MD simulations.

For the uncorrelated test distributions in , , and , the estimator with yielded quite accurate entropies for as little as sample points. For the correlated test distribution , the entropies were overestimated for increasing coupling, caused by underestimating mutual information terms. The latter effect is especially pronounced for large -values. Precision increased only marginally for larger at the cost of decreased accuracy, which led us to conclude that represents the best trade-off for the problem at hand. We furthermore demonstrated convergence within frames for a correlated distribution () and therefore expect our approach to accurately describe correlations of water molecules already in relatively short MD trajectories of ns to s.

For the considered MD systems, we find agreement within % and % with TI for pinned waters in systems p and sp, respectively, corresponding to deviations of kJmol and kJmol per water molecule at K. The obtained rotational entropies are precise within kJmol and kJmol, respectively. For the binding of a small ligand that displaces water molecules at the binding pocket, we therefore expect to obtain absolute rotational entropies with an accuracy of at least kJmol and to resolve rotational entropy differences of at least kJmol. As seen in the second column of Figure 3, fully mobile water exhibits considerably smaller correlations than pinned water, rendering the tests using pinned water quite a tough benchmark compared to realistic solute systems. For a protein/water system, we would therefore expect markedly smaller error margins.

The algorithm provides spatial resolution by assessing the mutual information contributions on the level of individual molecules, distinguishing it from, e.g. GIST17, 18, 19, 20, 21, 22. E.g., for the hydrophobic vacuum interface, we calculated an entropy loss due to an increase in mutual information close to the surface. This ability to resolve the origin of entropy changes renders the method a promising tool to enhance our understanding of processes like the hydrophobic effect and the thermodynamics of solvated complex heterogeneous biomolecules in general.

Work on assessing the contributions by the translational entropy and the translation-rotation correlation to the overall entropy is in progress and will be published elsewhere.

Although here, we restricted the application and assessment of our approach to water only, generalization to other solvents is straight forward as long as the internal dynamics of the solvent molecules does not couple to their rotational dynamics. An implementation is available for download555 as a python package with a C++ backend for fast neighbor search.

L.P.H thanks the International Max Planck Research School for Physics of Biological and Complex Systems for support through a PhD Fellowship.


  • Ben-Naim 1975 Ben-Naim, A. Hydrophobic interaction and structural changes in the solvent. Biopolymers: Original Research on Biomolecules 1975, 14, 1337–1355.
  • Chandler 2005 Chandler, D. Interfaces and the driving force of hydrophobic assembly. Nature 2005, 437, 640.
  • Berne et al. 2009 Berne, B. J.; Weeks, J. D.; Zhou, R. Dewetting and hydrophobic interaction in physical and biological systems. Annual review of physical chemistry 2009, 60.
  • Dias et al. 2010 Dias, C. L.; Ala-Nissila, T.; Wong-ekkabut, J.; Vattulainen, I.; Grant, M.; Karttunen, M. The hydrophobic effect and its role in cold denaturation. Cryobiology 2010, 60, 91–99.
  • Grdadolnik et al. 2017 Grdadolnik, J.; Merzel, F.; Avbelj, F. Origin of hydrophobicity and enhanced water hydrogen bond strength near purely hydrophobic solutes. Proceedings of the National Academy of Sciences 2017, 114, 322–327.
  • Cheng and Rossky 1998 Cheng, Y.-K.; Rossky, P. J. Surface topography dependence of biomolecular hydrophobic hydration. Nature 1998, 392, 696.
  • Tarek and Tobias 2000 Tarek, M.; Tobias, D. J. The dynamics of protein hydration water: a quantitative comparison of molecular dynamics simulations and neutron-scattering experiments. Biophysical journal 2000, 79, 3244–3257.
  • Sugita and Okamoto 1999 Sugita, Y.; Okamoto, Y. Replica-exchange molecular dynamics method for protein folding. Chemical physics letters 1999, 314, 141–151.
  • De Vries et al. 2004 De Vries, A. H.; Mark, A. E.; Marrink, S. J. Molecular dynamics simulation of the spontaneous formation of a small DPPC vesicle in water in atomistic detail. Journal of the American Chemical Society 2004, 126, 4488–4489.
  • Kirkwood 1935 Kirkwood, J. G. Statistical mechanics of fluid mixtures. The Journal of Chemical Physics 1935, 3, 300–313.
  • Peter et al. 2004 Peter, C.; Oostenbrink, C.; van Dorp, A.; van Gunsteren, W. F. Estimating entropies from molecular dynamics simulations. The Journal of chemical physics 2004, 120, 2652–2661.
  • Errington and Debenedetti 2001 Errington, J. R.; Debenedetti, P. G. Relationship between structural order and the anomalies of liquid water. Nature 2001, 409, 318.
  • Errington et al. 2002 Errington, J. R.; Debenedetti, P. G.; Torquato, S. Cooperative origin of low-density domains in liquid water. Physical review letters 2002, 89, 215503.
  • Yan et al. 2007 Yan, Z.; Buldyrev, S. V.; Kumar, P.; Giovambattista, N.; Debenedetti, P. G.; Stanley, H. E. Structure of the first-and second-neighbor shells of simulated water: Quantitative relation to translational and orientational order. Physical Review E 2007, 76, 051201.
  • Godec et al. 2011 Godec, A.; Smith, J. C.; Merzel, F. Increase of both order and disorder in the first hydration shell with increasing solute polarity. Physical review letters 2011, 107, 267801.
  • Heyden 2019 Heyden, M. Heterogeneity of water structure and dynamics at the protein-water interface. The Journal of chemical physics 2019, 150, 094701.
  • Wallace 1987 Wallace, D. C. On the role of density fluctuations in the entropy of a fluid. The Journal of chemical physics 1987, 87, 2282–2284.
  • Baranyai and Evans 1989 Baranyai, A.; Evans, D. J. Direct entropy calculation from computer simulation of liquids. Physical Review A 1989, 40, 3817.
  • Lazaridis 1998 Lazaridis, T. Inhomogeneous fluid approach to solvation thermodynamics. 1. Theory. The Journal of Physical Chemistry B 1998, 102, 3531–3541.
  • Lazaridis 1998 Lazaridis, T. Inhomogeneous fluid approach to solvation thermodynamics. 2. Applications to simple fluids. The Journal of Physical Chemistry B 1998, 102, 3542–3550.
  • Nguyen et al. 2012 Nguyen, C. N.; Kurtzman Young, T.; Gilson, M. K. Grid inhomogeneous solvation theory: hydration structure and thermodynamics of the miniature receptor cucurbit [7] uril. The Journal of chemical physics 2012, 137, 044101.
  • Nguyen et al. 2012 Nguyen, C. N.; Young, T. K.; Gilson, M. K. Erratum:“Grid inhomogeneous solvation theory: Hydration structure and thermodynamics of the miniature receptor cucurbit [7] uril”[J. Chem. Phys. 137, 044101 (2012)]. The Journal of chemical physics 2012, 137, 149901.
  • Lin et al. 2003 Lin, S.-T.; Blanco, M.; Goddard III, W. A. The two-phase model for calculating thermodynamic properties of liquids from molecular dynamics: Validation for the phase diagram of Lennard-Jones fluids. The Journal of chemical physics 2003, 119, 11792–11805.
  • Lin et al. 2010 Lin, S.-T.; Maiti, P. K.; Goddard III, W. A. Two-phase thermodynamic model for efficient and accurate absolute entropy of water from molecular dynamics simulations. The Journal of Physical Chemistry B 2010, 114, 8191–8198.
  • Persson et al. 2017 Persson, R. A.; Pattni, V.; Singh, A.; Kast, S. M.; Heyden, M. Signatures of solvation thermodynamics in spectra of intermolecular vibrations. Journal of chemical theory and computation 2017, 13, 4467–4481.
  • Kozachenko and Leonenko 1987

    Kozachenko, L.; Leonenko, N. N. Sample estimate of the entropy of a random vector.

    Problemy Peredachi Informatsii 1987, 23, 9–16.
  • Singh et al. 2003 Singh, H.; Misra, N.; Hnizdo, V.; Fedorowicz, A.; Demchuk, E. Nearest neighbor estimates of entropy. American journal of mathematical and management sciences 2003, 23, 301–321.
  • Kraskov et al. 2004 Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Physical review E 2004, 69, 066138.
  • Matsuda 2000 Matsuda, H. Physical nature of higher-order mutual information: Intrinsic correlations and frustration. Physical Review E 2000, 62, 3096.
  • Hnizdo et al. 2007 Hnizdo, V.; Darian, E.; Fedorowicz, A.; Demchuk, E.; Li, S.; Singh, H. Nearest-neighbor nonparametric method for estimating the configurational entropy of complex molecules. Journal of computational chemistry 2007, 28, 655–668.
  • Hnizdo et al. 2008 Hnizdo, V.; Tan, J.; Killian, B. J.; Gilson, M. K. Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods. Journal of computational chemistry 2008, 29, 1605–1614.
  • Fengler 2011 Fengler, M. Estimating Orientational Water Entropy at Protein Interfaces. Ph.D. thesis, Georg-August-Universität Göttingen, 2011.
  • Goethe et al. 2017

    Goethe, M.; Fita, I.; Rubi, J. M. Testing the mutual information expansion of entropy with multivariate Gaussian distributions.

    The Journal of chemical physics 2017, 147, 224102.
  • Tsybakov and Van der Meulen 1996 Tsybakov, A. B.; Van der Meulen, E. Root-n consistent estimators of entropy for densities with unbounded support. Scandinavian Journal of Statistics 1996, 75–83.
  • Evans 2008 Evans, D. A computationally efficient estimator for mutual information. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. 2008; pp 1203–1215.
  • Hensen et al. 2010 Hensen, U.; Lange, O. F.; Grubmüller, H. Estimating absolute configurational entropies of macromolecules: the minimally coupled subspace approach. PloS one 2010, 5, e9179.
  • Hopf 1964 Hopf, H. Selecta Heinz Hopf; Springer, 1964; pp 38–63.
  • Yershova et al. 2010 Yershova, A.; Jain, S.; Lavalle, S. M.; Mitchell, J. C. Generating uniform incremental grids on SO (3) using the Hopf fibration. The International journal of robotics research 2010, 29, 801–812.
  • Karney 2007 Karney, C. F. Quaternions in molecular modeling. Journal of Molecular Graphics and Modelling 2007, 25, 595–604.
  • Huynh 2009 Huynh, D. Q. Metrics for 3D rotations: Comparison and analysis. Journal of Mathematical Imaging and Vision 2009, 35, 155–164.
  • Huggins 2014 Huggins, D. J. Comparing distance metrics for rotation using the k-nearest neighbors algorithm for entropy estimation. Journal of computational chemistry 2014, 35, 377–385.
  • Singh and Póczos 2016 Singh, S.; Póczos, B. Analysis of k-nearest neighbor distances with application to entropy estimation. arXiv preprint arXiv:1603.08578 2016,
  • Ravani and Roth 1983 Ravani, B.; Roth, B. Motion synthesis using kinematic mappings. Journal of mechanisms, Transmissions, and Automation in Design 1983, 105, 460–467.
  • 44 Inc., W. R. Mathematica, Version 10.0. Champaign, IL, 2014.
  • Boytsov and Naidan 2013 Boytsov, L.; Naidan, B. Engineering efficient and effective non-metric space library. International Conference on Similarity Search and Applications. 2013; pp 280–293.
  • Uhlmann 1991 Uhlmann, J. K. Satisfying general proximity/similarity queries with metric trees. Information processing letters 1991, 40, 175–179.
  • Yianilos 1993 Yianilos, P. N. Data structures and algorithms for nearest neighbor search in general metric spaces. SODA. 1993; pp 311–321.
  • Berendsen et al. 1995 Berendsen, H. J.; van der Spoel, D.; van Drunen, R. GROMACS: a message-passing parallel molecular dynamics implementation. Computer Physics Communications 1995, 91, 43–56.
  • Van Der Spoel et al. 2005 Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J. GROMACS: fast, flexible, and free. Journal of computational chemistry 2005, 26, 1701–1718.
  • Hess et al. 2008 Hess, B.; Kutzner, C.; Van Der Spoel, D.; Lindahl, E. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of chemical theory and computation 2008, 4, 435–447.
  • Pronk et al. 2013 Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 2013, 845–854.
  • Pall et al. 2014 Pall, S.; Abraham, M. J.; Kutzner, C.; Hess, B.; Lindahl, E. Tackling exascale software challenges in molecular dynamics simulations with GROMACS. International Conference on Exascale Applications and Software. 2014; pp 3–27.
  • Amadei et al. 2000 Amadei, A.; Chillemi, G.; Ceruso, M.; Grottesi, A.; Di Nola, A. Molecular dynamics simulations with constrained roto-translational motions: theoretical basis and statistical mechanical consistency. The Journal of Chemical Physics 2000, 112, 9–23.
  • Brooks et al. 1983 Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. Journal of computational chemistry 1983, 4, 187–217.
  • MacKerell Jr et al. 1998 MacKerell Jr, A. D.; Bashford, D.; Bellott, M.; Dunbrack Jr, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher III, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. The journal of physical chemistry B 1998, 102, 3586–3616.
  • Brooks et al. 2009 Brooks, B. R.; Brooks, C. L.; MacKerell, A. D.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L. S. D.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodos̆ c̆ek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Jun, M.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: the biomolecular simulation program. Journal of computational chemistry 2009, 30, 1545–1614.
  • Huang and MacKerell 2013 Huang, J.; MacKerell, A. D. CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data. Journal of computational chemistry 2013, 34, 2135–2145.
  • Huang et al. 2017 Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B. L.; Grubmüller, H.; MacKerell Jr, A. D. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. nature methods 2017, 14, 71.
  • Jorgensen et al. 1983 Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. The Journal of chemical physics 1983, 79, 926–935.
  • Miyamoto and Kollman 1992 Miyamoto, S.; Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. Journal of computational chemistry 1992, 13, 952–962.
  • Darden et al. 1993 Darden, T.; York, D.; Pedersen, L. Particle mesh Ewald: An N log (N) method for Ewald sums in large systems. The Journal of chemical physics 1993, 98, 10089–10092.
  • Jones 1924 Jones, J. E. On the determination of molecular fields.—II. From the equation of state of a gas. Proc. R. Soc. Lond. A 1924, 106, 463–477.
  • Bussi et al. 2007 Bussi, G.; Donadio, D.; Parrinello, M. Canonical sampling through velocity rescaling. The Journal of chemical physics 2007, 126, 014101.
  • Andersen 1980 Andersen, H. C. Molecular dynamics simulations at constant pressure and/or temperature. The Journal of chemical physics 1980, 72, 2384–2393.
  • Parrinello and Rahman 1981 Parrinello, M.; Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied physics 1981, 52, 7182–7190.
  • Reinhard and Grubmüller 2007 Reinhard, F.; Grubmüller, H. Estimation of absolute solvent and solvation shell entropies via permutation reduction. The Journal of chemical physics 2007, 126, 014102.
  • Reinhard et al. 2009 Reinhard, F.; Lange, O. F.; Hub, J. S.; Haas, J.; Grubmüller, H. g_permute: Permutation-reduced phase space density compaction. Computer Physics Communications 2009, 180, 455–458.
  • Beutler et al. 1994 Beutler, T. C.; Mark, A. E.; van Schaik, R. C.; Gerber, P. R.; Van Gunsteren, W. F. Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations. Chemical physics letters 1994, 222, 529–539.
  • Hensen et al. 2009 Hensen, U.; Grubmüller, H.; Lange, O. F. Adaptive anisotropic kernels for nonparametric estimation of absolute configurational entropies in high-dimensional configuration spaces. Physical Review E 2009, 80, 011913.
  • Gao et al. 2015

    Gao, S.; Ver Steeg, G.; Galstyan, A. Efficient estimation of mutual information for strongly dependent variables. Artificial Intelligence and Statistics. 2015; pp 277–286.

  • Khan et al. 2007 Khan, S.; Bandyopadhyay, S.; Ganguly, A. R.; Saigal, S.; Erickson III, D. J.; Protopopescu, V.; Ostrouchov, G. Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. Physical Review E 2007, 76, 026209.